Text generation with language models has many moving parts: the model family (n-gram, RNN, Transformer) and its parameters, the data used to fit the model, and the sampling and decoding strategy. It is difficult to understand how each of these parts influences generation performance. In this project, we want to build a configurable system in which we can easily exchange individual parts and observe how each change influences the generated text.
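To make the idea of exchangeable parts concrete, here is a minimal sketch of such a system, not the project's actual design. It assumes a registry-based layout in which the model and the decoding strategy are selected by name from a configuration object. All names (`GenerationConfig`, `MODELS`, `DECODERS`, `fit_bigram`, `generate`) are hypothetical, and a toy bigram model stands in for the n-gram, RNN, and Transformer models.

```python
import random
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interface: a fitted model maps a token context to a
# probability distribution over the next token.
NextTokenDist = Dict[str, float]
Model = Callable[[List[str]], NextTokenDist]


def fit_bigram(corpus: List[str]) -> Model:
    """Fit a toy bigram model on whitespace-tokenized sentences."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1

    def next_dist(context: List[str]) -> NextTokenDist:
        # Only the last token matters for a bigram model.
        successors = counts.get(context[-1], Counter())
        total = sum(successors.values())
        return {tok: n / total for tok, n in successors.items()} if total else {}

    return next_dist


def greedy(dist: NextTokenDist, rng: random.Random) -> str:
    """Pick the most probable token (ties broken alphabetically)."""
    return max(sorted(dist), key=dist.get)


def sample(dist: NextTokenDist, rng: random.Random) -> str:
    """Draw a token in proportion to its probability."""
    tokens = sorted(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


# Registries: swapping a part means changing one string in the config.
MODELS = {"bigram": fit_bigram}
DECODERS = {"greedy": greedy, "sample": sample}


@dataclass
class GenerationConfig:
    model: str = "bigram"
    decoder: str = "greedy"
    max_tokens: int = 10
    seed: int = 0


def generate(config: GenerationConfig, corpus: List[str], prompt: str) -> str:
    """Fit the configured model and decode from a non-empty prompt."""
    next_dist = MODELS[config.model](corpus)
    decode = DECODERS[config.decoder]
    rng = random.Random(config.seed)
    tokens = prompt.split()
    for _ in range(config.max_tokens):
        dist = next_dist(tokens)
        if not dist:  # no known continuation: stop early
            break
        tokens.append(decode(dist, rng))
    return " ".join(tokens)


corpus = ["the cat sat", "the cat ran", "the cat sat down"]
print(generate(GenerationConfig(decoder="greedy"), corpus, "the"))
print(generate(GenerationConfig(decoder="sample", seed=1), corpus, "the"))
```

With this layout, comparing two decoding strategies on the same model and data only requires changing the `decoder` field. Other model families could be added to `MODELS` as long as they expose the same assumed context-to-distribution interface.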