Improve memory efficiency of FP16 optimization (#404)
Summary: Previously when training with --fp16 we stored a full FP32 copy of the model parameters for the optimizer step, which consumed a lot of memory. The alternative is to do the FP16 -> FP32 conversions on the fly, which lets the caching allocator reuse that memory between steps. This reduces peak memory usage by ~20% with a negligible reduction in training speed (~2% slower) when training a big transformer on WMT En-De on 8 GPUs with --update-freq=16. It does not affect convergence: models train exactly as they did before.

Pull Request resolved: https://github.com/pytorch/fairseq/pull/404

Differential Revision: D13394376

Pulled By: myleott

fbshipit-source-id: 2b9f808548df4782110513c9cfc9f7c6159bcbbf
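Below is a minimal sketch, not the fairseq implementation, contrasting the two approaches for a single parameter with a plain SGD-style update (the function names and learning rate are hypothetical). The point is that the on-the-fly variant only allocates FP32 tensors for the duration of the step, so the caching allocator can recycle that memory, instead of keeping a persistent FP32 master copy alive for the whole run.

```python
import torch


def sgd_step_persistent_copy(p16, p32_master, lr=0.1):
    """Old approach (sketch): p32_master is a long-lived FP32 copy of the param."""
    p32_master.grad = p16.grad.float()            # FP16 grad -> FP32
    p32_master.data.add_(p32_master.grad, alpha=-lr)
    p16.data.copy_(p32_master.data)               # write updated value back to FP16 model


def sgd_step_on_the_fly(p16, lr=0.1):
    """New approach (sketch): FP32 tensors are temporaries, freed after the step."""
    p32 = p16.data.float()                        # temporary FP32 copy of the param
    g32 = p16.grad.float()                        # temporary FP32 copy of the grad
    p32.add_(g32, alpha=-lr)
    p16.data.copy_(p32)                           # p32/g32 are released to the caching allocator


if __name__ == "__main__":
    p16 = torch.nn.Parameter(torch.randn(4).half())
    p16.grad = torch.randn(4).half()
    sgd_step_on_the_fly(p16)
    print(p16.data)
```

In the real optimizer the FP32 optimizer state (e.g. Adam moments) is still kept across steps; only the FP32 copies of the parameters and gradients become temporaries, which is where the ~20% peak-memory saving comes from at the cost of the extra per-step conversions.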