Adafactor Optimizer (#472)
Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/472

Implementation of "Adafactor: Adaptive Learning Rates with Sublinear Memory Cost" (https://arxiv.org/abs/1804.04235)

Differential Revision: D13388049

fbshipit-source-id: 24ad30f4bac248e6aeaced5064bb83784058f03d