- Jun 15, 2018
Myle Ott authored
Changelog:
- 97b58b46: add Transformer model from Vaswani et al. (2017)
- b2374e52: faster Transformer inference with improved caching
- 2d27ae08: simulate large mini-batch training with delayed updates (`--update-freq`)
- 7ee1d284: add FP16 training support (`--fp16`)
- 2a84f46b: faster inference by removing completed sentences from the batch
- 663fd806: batched interactive generation
- 4c2ef2de: add language modeling / gated convolutional model from Dauphin et al. (2017)
- b59815bc: add Hierarchical Neural Story Generation model from Fan et al. (2018)
- ff68a9ef: add FairseqTask to modularize task definitions (e.g., translation, language modeling)
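As a rough illustration of combining the new training flags (the data-bin path below is a placeholder, not part of this release), delayed updates and FP16 might be used together like:

python train.py data-bin/wmt16_en_de --arch transformer --fp16 --update-freq 16 --max-tokens 4000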
-
Myle Ott authored
Add transformer models and replace list with table
-
Myle Ott authored
A Task defines the data format, stores shared state (e.g., dictionaries) and provides helpers for building the model/criterion and calculating the loss.

Changes:
- Add TranslationTask and LanguageModelingTask. New tasks can be registered with the @register_task decorator.
- Add EpochBatchIterator to encapsulate batching and saving/restoring dataloader position.
- Remove LEFT_PAD_* constants and make them configurable per task.
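A minimal sketch of the new extension point, assuming FairseqTask and register_task are importable from fairseq.tasks and that add_args/setup_task follow the pattern of the bundled tasks; the task name and class below are hypothetical:

from fairseq.tasks import FairseqTask, register_task

@register_task('dummy_lm')
class DummyLanguageModelingTask(FairseqTask):
    # Hypothetical example task; a real task would build dictionaries and
    # datasets and expose them to the model/criterion builders.

    @staticmethod
    def add_args(parser):
        # task-specific command-line arguments
        parser.add_argument('data', help='path to the data directory')

    @classmethod
    def setup_task(cls, args, **kwargs):
        # load shared state (e.g. dictionaries) here before training starts
        return cls(args)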
-
Sergey Edunov authored
torch.arange's default return type changed in the latest PyTorch version: https://github.com/pytorch/pytorch/pull/7016
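For context, a hedged illustration of the behavioral change (not the exact code touched by this commit): with numpy-style type inference, integer arguments now produce an integer tensor, so code that depends on a particular type is safer requesting it explicitly.

import torch

# assumes a PyTorch version that accepts the dtype keyword (0.4+)
idx = torch.arange(0, 5, dtype=torch.long)  # explicit dtype, robust across versions
pos = torch.arange(0.0, 1.0, 0.25)          # float arguments still yield a float tensor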
-
Myle Ott authored
Co-authored-by: pmichel31415 <pmichel@fb.com>
-
Alexei Baevski authored
Save the best validation loss in the checkpoint and also print the best so far. This way, when training continues from an existing checkpoint, we don't immediately override checkpoint_best with a worse loss.
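A minimal sketch of the idea (hypothetical helper, not the fairseq implementation): carry the best validation loss inside the checkpoint dict and only overwrite checkpoint_best when the new loss improves on it.

import torch

def save_checkpoint(state, val_loss, last_path='checkpoint_last.pt', best_path='checkpoint_best.pt'):
    # remember the best validation loss seen so far inside the checkpoint itself
    prev_best = state.get('best_loss', float('inf'))
    state['best_loss'] = min(prev_best, val_loss)
    torch.save(state, last_path)
    # only replace checkpoint_best when the loss actually improved
    if val_loss < prev_best:
        torch.save(state, best_path)
    print('best validation loss so far: {:.4f}'.format(state['best_loss']))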
-
alexeib authored
This implements the convolutional language model from https://arxiv.org/pdf/1612.08083.pdf

There are 3 modes for constructing batches:
- token block: fill each sample with a specified number of tokens without regard for sentence delimiters - this is what was used for training in the paper
- complete: fill each sample with a specified number of tokens but make sure it contains only complete sentences (i.e. if the next sentence goes over the token block limit, move it to the next sample) - this was used for evaluation in the paper
- eos: one sentence per sample (skip blank lines)

Some results (perplexity):
GCNN-13 - GBW - 37.46
GCNN-14B - GBW - 33.88
GCNN-8 - Wiki103 - 43.76
GCNN-14 - Wiki103 - 35.66

Train:
python train.py /private/home/abaevski/data/wiki103 --save-dir /tmp --fp16 --max-epoch 35 --save-interval 1 --save-interval-updates 1000 --keep-interval-updates 25 --arch fconv_lm --optimizer nag --lr 1.0 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.5 --decoder-embed-dim 280 --decoder-layers '[(850, 6)] * 3 + [(850,1)] + [(850,5)] * 4 + [(850,1)] + [(850,4)] * 3 + [(1024,4)] + [(2048, 4)]' --clip-norm 0.1 --dropout 0.2 --weight-decay 5e-06 --criterion cross_entropy --max-tokens 1024 --max-target-positions 1024 --seed 1 --log-format json --log-interval 500

Eval:
python eval_lm.py ~abaevski/data/wiki103 --path '/checkpoint02/abaevski/2018-04-27/lm_wiki.fp16.mxup300000.fconv.adam.lrs=reduce_lr_on_plateau.emb280.layers(850,6)*3+(850,1)+(850,5)*4+(850,1)+(850,4)*3+(1024,1)+(2048,4).lr0.0005.clp0.1.drp0.3.wd0.0.crt=cross_entropy.mxtk2048.smptk256.seed1.ngpu8/checkpoint_last.pt'
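A toy sketch of the "token block" mode described above (hypothetical helper, not the fairseq data loader): concatenate all token ids into one stream and cut fixed-size blocks, ignoring sentence boundaries.

def token_blocks(sentences, block_size):
    # sentences: list of lists of token ids; returns blocks of block_size
    # tokens each (the last block may be shorter)
    stream = [tok for sent in sentences for tok in sent]
    return [stream[i:i + block_size] for i in range(0, len(stream), block_size)]

# token_blocks([[1, 2, 3], [4, 5], [6, 7, 8, 9]], 4) -> [[1, 2, 3, 4], [5, 6, 7, 8], [9]]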
-