Commit 03a57dec authored by Myle Ott, committed by Facebook GitHub Bot

Improve memory efficiency of FP16 optimization (#404)

Summary:
Previously, when training with --fp16, we stored a persistent FP32 copy of the model parameters for optimization, which consumed a lot of memory. An alternative is to do the conversions to FP32 on the fly, which lets the caching allocator reuse the temporary FP32 buffers and save memory.
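The sketch below illustrates the idea (it is not the fairseq implementation): a persistent FP32 master copy keeps parameter memory alive twice for the whole run, while on-the-fly conversion allocates FP32 tensors only for the duration of the step, so PyTorch's caching allocator can reclaim them. The helper names (`sgd_update`, `lr`) are hypothetical stand-ins for the real wrapped optimizer.

```python
import torch

lr = 0.1

def sgd_update(p32, g32):
    # Hypothetical FP32 update rule; real training would apply the
    # wrapped optimizer (e.g. Adam) to these FP32 tensors instead.
    p32.add_(g32, alpha=-lr)

def step_with_master_copy(params_fp16, master_fp32):
    # Old approach: master_fp32 is a long-lived FP32 copy of every
    # parameter, so parameter memory is held twice for the entire run.
    for p16, p32 in zip(params_fp16, master_fp32):
        sgd_update(p32, p16.grad.float())
        p16.data.copy_(p32)

def step_on_the_fly(params_fp16):
    # New approach: convert to FP32 only inside the step. The temporary
    # FP32 tensors are released right after the update and their memory
    # is reused by the caching allocator, lowering peak usage.
    for p16 in params_fp16:
        p32 = p16.data.float()
        sgd_update(p32, p16.grad.float())
        p16.data.copy_(p32)
```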

This reduces peak memory usage by ~20% with a negligible reduction in training speed (~2% slower) when training a big Transformer on 8 GPUs on WMT En-De with --update-freq=16.

This does not affect convergence, i.e., models will train exactly as they did before.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/404

Differential Revision: D13394376

Pulled By: myleott

fbshipit-source-id: 2b9f808548df4782110513c9cfc9f7c6159bcbbf
parent 0f833526