- Jun 15, 2018
Myle Ott authored
Changelog:
- 97b58b46: add Transformer model from Vaswani et al. (2017)
- b2374e52: faster Transformer inference with improved caching
- 2d27ae08: simulate large mini-batch training with delayed updates (`--update-freq`)
- 7ee1d284: add FP16 training support (`--fp16`)
- 2a84f46b: faster inference by removing completed sentences from the batch
- 663fd806: batched interactive generation
- 4c2ef2de: add language modeling / gated convolutional model from Dauphin et al. (2017)
- b59815bc: add Hierarchical Neural Story Generation model from Fan et al. (2018)
- ff68a9ef: add FairseqTask to modularize task definitions (e.g., translation, language modeling)
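As a rough illustration of combining the new training flags (the data-bin path below is a placeholder, not part of this release), delayed updates and FP16 might be used together like:

python train.py data-bin/wmt16_en_de --arch transformer --fp16 --update-freq 16 --max-tokens 4000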
-
Myle Ott authored
Add transformer models and replace list with table
-
Myle Ott authored
A Task defines the data format, stores shared state (e.g., dictionaries) and provides helpers for building the model/criterion and calculating the loss.

Changes:
- Add TranslationTask and LanguageModelingTask. New tasks can be registered with the @register_task decorator.
- Add EpochBatchIterator to encapsulate batching and saving/restoring dataloader position.
- Remove LEFT_PAD_* constants and make them configurable per task.
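A minimal sketch of the new extension point, assuming FairseqTask and register_task are importable from fairseq.tasks and that add_args/setup_task follow the pattern of the bundled tasks; the task name and class below are hypothetical:

from fairseq.tasks import FairseqTask, register_task

@register_task('dummy_lm')
class DummyLanguageModelingTask(FairseqTask):
    # Hypothetical example task; a real task would build dictionaries and
    # datasets and expose them to the model/criterion builders.

    @staticmethod
    def add_args(parser):
        # task-specific command-line arguments
        parser.add_argument('data', help='path to the data directory')

    @classmethod
    def setup_task(cls, args, **kwargs):
        # load shared state (e.g. dictionaries) here before training starts
        return cls(args)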
-
Sergey Edunov authored
torch.arange's default return type changed in the latest PyTorch version: https://github.com/pytorch/pytorch/pull/7016
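For context, a hedged illustration of the behavioral change (not the exact code touched by this commit): with numpy-style type inference, integer arguments now produce an integer tensor, so code that depends on a particular type is safer requesting it explicitly.

import torch

# assumes a PyTorch version that accepts the dtype keyword (0.4+)
idx = torch.arange(0, 5, dtype=torch.long)  # explicit dtype, robust across versions
pos = torch.arange(0.0, 1.0, 0.25)          # float arguments still yield a float tensor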
-
Myle Ott authored
Co-authored-by: pmichel31415 <pmichel@fb.com>
-
Alexei Baevski authored
Save the best validation loss in the checkpoint and also print the best so far. This way, when training continues from an existing checkpoint, we don't immediately override checkpoint_best with a worse loss.
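A minimal sketch of the idea (hypothetical helper, not the fairseq implementation): carry the best validation loss inside the checkpoint dict and only overwrite checkpoint_best when the new loss improves on it.

import torch

def save_checkpoint(state, val_loss, last_path='checkpoint_last.pt', best_path='checkpoint_best.pt'):
    # remember the best validation loss seen so far inside the checkpoint itself
    prev_best = state.get('best_loss', float('inf'))
    state['best_loss'] = min(prev_best, val_loss)
    torch.save(state, last_path)
    # only replace checkpoint_best when the loss actually improved
    if val_loss < prev_best:
        torch.save(state, best_path)
    print('best validation loss so far: {:.4f}'.format(state['best_loss']))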
-
alexeib authored
This implements the convolutional language model from https://arxiv.org/pdf/1612.08083.pdf

There are 3 modes for constructing batches:
- token block: fill each sample with a specified number of tokens without regard for sentence delimiters - this is what was used for training in the paper
- complete: fill each sample with a specified number of tokens but make sure it contains only complete sentences (i.e. if the next sentence goes over the token block limit, move it to the next sample) - this was used for evaluation in the paper
- eos: one sentence per sample (skip blank lines)

Some results (perplexity):
GCNN-13 - GBW - 37.46
GCNN-14B - GBW - 33.88
GCNN-8 - Wiki103 - 43.76
GCNN-14 - Wiki103 - 35.66

Train:
python train.py /private/home/abaevski/data/wiki103 --save-dir /tmp --fp16 --max-epoch 35 --save-interval 1 --save-interval-updates 1000 --keep-interval-updates 25 --arch fconv_lm --optimizer nag --lr 1.0 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.5 --decoder-embed-dim 280 --decoder-layers '[(850, 6)] * 3 + [(850,1)] + [(850,5)] * 4 + [(850,1)] + [(850,4)] * 3 + [(1024,4)] + [(2048, 4)]' --clip-norm 0.1 --dropout 0.2 --weight-decay 5e-06 --criterion cross_entropy --max-tokens 1024 --max-target-positions 1024 --seed 1 --log-format json --log-interval 500

Eval:
python eval_lm.py ~abaevski/data/wiki103 --path '/checkpoint02/abaevski/2018-04-27/lm_wiki.fp16.mxup300000.fconv.adam.lrs=reduce_lr_on_plateau.emb280.layers(850,6)*3+(850,1)+(850,5)*4+(850,1)+(850,4)*3+(1024,1)+(2048,4).lr0.0005.clp0.1.drp0.3.wd0.0.crt=cross_entropy.mxtk2048.smptk256.seed1.ngpu8/checkpoint_last.pt'
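A toy sketch of the "token block" mode described above (hypothetical helper, not the fairseq data loader): concatenate all token ids into one stream and cut fixed-size blocks, ignoring sentence boundaries.

def token_blocks(sentences, block_size):
    # sentences: list of lists of token ids; returns blocks of block_size
    # tokens each (the last block may be shorter)
    stream = [tok for sent in sentences for tok in sent]
    return [stream[i:i + block_size] for i in range(0, len(stream), block_size)]

# token_blocks([[1, 2, 3], [4, 5], [6, 7, 8, 9]], 4) -> [[1, 2, 3, 4], [5, 6, 7, 8], [9]]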
-