Add --fp16-scale-tolerance (#397)
Summary: Let's only decrease the loss scale if a large enough percentage of batches overflow. Pull Request resolved: https://github.com/pytorch/fairseq/pull/397 Differential Revision: D13355159 Pulled By: myleott fbshipit-source-id: e17dde73d34a639519b4348c013fdd19d2b314e6
Loading
Please register or sign in to comment