Pre-trained LSTM language models

This webpage contains LSTM language models trained with TF-LM, a language modeling toolkit based on TensorFlow.
Last update: 13th November 2017

We provide both discourse-level and sentence-level models. A discourse-level model trains on batches of a fixed length; a batch may contain (parts of) multiple sentences, and sentences are delimited with an end-of-sentence symbol.
Sentence-level batches all have the length of the longest sentence in the corpus, and shorter sentences are padded (with e.g. '@'). This wastes quite some memory, but for bidirectional LSTMs, for example, sentence-level batches are preferred. The sketch below illustrates the two batching schemes.
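
As a rough illustration of the two schemes (this is not TF-LM's actual data pipeline; the function names are made up, and the '<eos>' and '@' symbols follow the description above):

```python
def discourse_level_batches(sentences, num_steps=35):
    """Concatenate all sentences into one token stream, delimited by '<eos>',
    and cut the stream into fixed-length chunks (e.g. 35 steps of unrolling)."""
    stream = []
    for sentence in sentences:
        stream.extend(sentence + ["<eos>"])
    return [stream[i:i + num_steps]
            for i in range(0, len(stream) - num_steps + 1, num_steps)]

def sentence_level_batches(sentences, pad_symbol="@"):
    """Pad every sentence to the length of the longest sentence in the corpus."""
    max_len = max(len(sentence) for sentence in sentences)
    return [sentence + [pad_symbol] * (max_len - len(sentence))
            for sentence in sentences]

corpus = [["the", "cat", "sat"], ["a", "dog", "barked", "loudly"]]
print(discourse_level_batches(corpus, num_steps=3))
# [['the', 'cat', 'sat'], ['<eos>', 'a', 'dog'], ['barked', 'loudly', '<eos>']]
print(sentence_level_batches(corpus))
# [['the', 'cat', 'sat', '@'], ['a', 'dog', 'barked', 'loudly']]
```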

English Penn TreeBank

English WikiText

Dutch CGN (Corpus of Spoken Dutch)

Dutch Subtitles

Statistics for all datasets

Data set        Vocabulary size    # words training    # words validation    # words test
Penn TreeBank   10k                900k                70k                   80k
WikiText        33k                2M                  210k                  240k
CGN             100k               10M                 550k                  /
Subtitles       100k               45M                 310k                  /

Hyperparameters for all models

Hyperparameter                                        Penn TreeBank   WikiText   CGN   Subtitles
# LSTM layers                                         1   2
# LSTM units                                          512
# steps of unrolling                                  35
Initialization scale                                  -0.05 to +0.05
Dropout                                               50%
Threshold for clipping the norm of the gradients      5
Optimizer                                             Stochastic gradient descent
Initial learning rate                                 1
Learning rate decay                                   0.8   0.8 (discourse) / 0.6 (sentence)   0.8
Start learning rate decay after x epochs              6   2
Softmax                                               Full   Sampled
Early stopping (stop after x times no improvement)    /   3   2

A single value applies to all four corpora; where a row lists several values, each value covers one or more consecutive corpora, from left to right.
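
Read column-wise, the first value in each row applies to the Penn TreeBank model. The sketch below collects those values in a plain Python dictionary (the field names are illustrative, not TF-LM's configuration format) together with one plausible reading of the learning-rate schedule: the initial rate is kept until the decay starts, then multiplied by the decay factor once per epoch.

```python
# Hyperparameters of the Penn TreeBank model, read off the table above.
# The dictionary keys are assumed names, not TF-LM's own configuration format.
ptb_config = {
    "num_layers": 1,          # LSTM layers
    "hidden_size": 512,       # LSTM units
    "num_steps": 35,          # steps of unrolling
    "init_scale": 0.05,       # weights initialized in [-0.05, +0.05]
    "dropout": 0.5,           # 50% dropout
    "max_grad_norm": 5,       # threshold for clipping the norm of the gradients
    "optimizer": "sgd",       # stochastic gradient descent
    "learning_rate": 1.0,     # initial learning rate
    "lr_decay": 0.8,          # learning rate decay factor
    "decay_start_epoch": 6,   # start decaying the learning rate after this epoch
    "softmax": "full",        # full softmax (10k vocabulary)
}

def learning_rate(epoch, config):
    """Learning rate for a given (1-based) training epoch: constant at first,
    then multiplied by the decay factor once per epoch."""
    decay_steps = max(epoch - config["decay_start_epoch"], 0)
    return config["learning_rate"] * config["lr_decay"] ** decay_steps

for epoch in range(1, 10):
    print(epoch, round(learning_rate(epoch, ptb_config), 4))
# epochs 1-6: 1.0, epoch 7: 0.8, epoch 8: 0.64, epoch 9: 0.512
```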