Switch to DistributedDataParallelC10d and bump version 0.5.0 -> 0.6.0
- no more FP16Trainer; we just have an FP16Optimizer wrapper
- most of the distributed code is moved to a new wrapper class called DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time
- Trainer now requires an extra dummy_batch argument at initialization, which we do fwd/bwd on when there's an uneven number of batches per worker. We hide the gradients from these dummy batches by multiplying the loss by 0 (see the sketch after this list)
- Trainer.train_step now takes a list of samples, which will allow a cleaner --update-freq implementation
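As a rough illustration of the dummy-batch trick and the list-of-samples train_step, here is a minimal PyTorch sketch. The model, criterion, and batch layout are hypothetical stand-ins for this example, not fairseq's actual APIs:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for illustration; fairseq's real Trainer,
# criterion, and batch format differ.
model = nn.Linear(8, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def train_step(samples):
    """Accumulate gradients over a list of (sample, is_dummy) pairs.

    Dummy batches still run fwd/bwd (so every worker participates in the
    gradient all-reduce under DistributedDataParallel), but their loss is
    multiplied by 0 so they contribute no gradient.
    """
    optimizer.zero_grad()
    for sample, is_dummy in samples:
        loss = criterion(model(sample["input"]), sample["target"])
        if is_dummy:
            # hide the gradients from the dummy batch by zeroing the loss
            loss = loss * 0.0
        loss.backward()  # gradients accumulate across the list of samples
    optimizer.step()

batch = {"input": torch.randn(4, 8), "target": torch.randint(0, 2, (4,))}
train_step([(batch, False), (batch, True)])  # one real + one dummy sample
```

Passing a list of samples is what makes --update-freq cleaner: gradient accumulation happens inside a single train_step call instead of being stitched together across calls.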