Fix initial learning rate (#453) (2210fa71) · Commits · Simon Will / fairseq

Commit 2210fa71 authored Jan 16, 2019 by Myle Ott; committed by Facebook GitHub Bot Jan 16, 2019

Fix initial learning rate (#453)

Summary:
There was a very subtle bug here 😢 When we recently removed this line (7633129b), the learning rate scheduler was no longer initialized until after the first update. Unfortunately, PyTorch optimizers store the learning rate in their internal state, so some learning rate schedulers use their `__init__` method to reset the learning rate to a sane initial value. This is especially problematic for LR schedulers that include a warmup: the optimizer is likely to hold the peak learning rate at initialization, and only the LR scheduler's `__init__` sets the (much smaller) warmup value.
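To make the failure mode concrete, here is a minimal pure-Python sketch. `ToyOptimizer`, `ToyWarmupScheduler`, `train`, and all parameter values are hypothetical stand-ins, not fairseq or PyTorch code. The only property borrowed from PyTorch is that the learning rate lives in `optimizer.param_groups[i]["lr"]`. The toy loop steps the scheduler after each update, and the `lazy_scheduler=True` path mimics the buggy ordering where the scheduler is only constructed after the first update.

```python
class ToyOptimizer:
    """Hypothetical stand-in: like torch.optim.Optimizer, it keeps lr in param_groups."""

    def __init__(self, lr):
        self.param_groups = [{"lr": lr}]
        self.lrs_used = []  # records the lr applied at each update

    def step(self):
        self.lrs_used.append(self.param_groups[0]["lr"])


class ToyWarmupScheduler:
    """Hypothetical linear-warmup scheduler whose __init__ resets the lr."""

    def __init__(self, optimizer, peak_lr, warmup_init_lr, warmup_updates):
        self.optimizer = optimizer
        self.peak_lr = peak_lr
        self.warmup_init_lr = warmup_init_lr
        self.warmup_updates = warmup_updates
        self.set_lr(warmup_init_lr)  # the reset that the bug effectively skipped

    def set_lr(self, lr):
        for group in self.optimizer.param_groups:
            group["lr"] = lr

    def step_update(self, num_updates):
        # linear warmup from warmup_init_lr to peak_lr
        frac = min(num_updates / self.warmup_updates, 1.0)
        self.set_lr(self.warmup_init_lr + frac * (self.peak_lr - self.warmup_init_lr))


def train(num_steps, lazy_scheduler):
    opt = ToyOptimizer(lr=1e-3)  # optimizer is constructed with the peak lr
    sched = None if lazy_scheduler else ToyWarmupScheduler(opt, 1e-3, 1e-7, 4000)
    for num_updates in range(1, num_steps + 1):
        opt.step()
        if sched is None:
            # buggy ordering: the scheduler is only built after the first update
            sched = ToyWarmupScheduler(opt, 1e-3, 1e-7, 4000)
        sched.step_update(num_updates)
    return opt.lrs_used
```

With `lazy_scheduler=True`, the first update runs at the peak rate `1e-3` instead of the warmup rate `1e-7`. Every later update matches the correct ordering, because the late `__init__` still resets the rate before the second update.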

For example, the inverse_sqrt scheduler resets the learning rate upon initialization:
https://github.com/pytorch/fairseq/blob/7853818c2e33a63ec17a31bcfe20e4fc75d94130/fairseq/optim/lr_scheduler/inverse_square_root_schedule.py#L48-L50
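As a rough illustration of why that reset matters, the sketch below computes the learning rate of an inverse-square-root schedule with linear warmup, modeled loosely on the linked scheduler. The function name and parameter names are assumptions for illustration, not fairseq's API. At `num_updates == 0` the rate is `warmup_init_lr`, which is the value the scheduler's `__init__` writes into the optimizer. Skipping that write leaves the optimizer at `peak_lr` for the first update.

```python
import math


def inverse_sqrt_lr(num_updates, peak_lr, warmup_init_lr, warmup_updates):
    """Sketch of an inverse-sqrt schedule with linear warmup (names are hypothetical)."""
    if num_updates < warmup_updates:
        # linear warmup from warmup_init_lr towards peak_lr
        lr_step = (peak_lr - warmup_init_lr) / warmup_updates
        return warmup_init_lr + num_updates * lr_step
    # after warmup, decay proportionally to 1/sqrt(num_updates),
    # scaled so the schedule equals peak_lr at num_updates == warmup_updates
    decay_factor = peak_lr * math.sqrt(warmup_updates)
    return decay_factor / math.sqrt(num_updates)
```

For example, with `peak_lr=5e-4` and `warmup_updates=4000`, the rate reaches `5e-4` at update 4000 and halves to `2.5e-4` at update 16000.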

**Impact:** For the last ~1.5 weeks, the first training update would use the optimizer's default learning rate instead of the initial rate set by the LR scheduler. All subsequent updates used the correct learning rates. This primarily affects LR schedulers with warmups.

Pull Request resolved: https://github.com/pytorch/fairseq/pull/453

Differential Revision: D13704453

Pulled By: myleott

fbshipit-source-id: a946da30100f837c66bdc6b9b77b014ab4eb8764

parent 7853818c
