The current batch multiplier now only gets changed to fit the leftover...
The current batch multiplier is now adjusted to fit the leftover examples only if batch_type == "sentence". The optimizer gradients are now set to zero at the start of each epoch, which effectively skips the leftover examples of the previous epoch whose gradients are still computed if batch_type == "token".
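The behaviour described above can be illustrated with a minimal, framework-free sketch. It is not the code of this commit: `ToyOptimizer`, `train`, and all parameter names are hypothetical, and the optimizer only counts how many batches have been accumulated instead of holding real gradients. It assumes that the batch multiplier is the number of batches accumulated per optimizer update.

```python
class ToyOptimizer:
    """Hypothetical stand-in for an optimizer; counts batches instead of storing gradients."""

    def __init__(self):
        self.accumulated = 0  # batches whose gradients are currently accumulated
        self.steps = []       # number of batches that went into each update

    def zero_grad(self):
        self.accumulated = 0

    def backward(self):
        self.accumulated += 1

    def step(self):
        self.steps.append(self.accumulated)


def train(num_epochs, num_batches, batch_multiplier, batch_type):
    """Return the number of batches behind every optimizer update."""
    opt = ToyOptimizer()
    leftover = num_batches % batch_multiplier
    for _ in range(num_epochs):
        # Zero the gradients at the start of each epoch so that leftover
        # batches from the previous epoch never reach an update.
        opt.zero_grad()
        multiplier = batch_multiplier
        count = multiplier
        for i in range(num_batches):
            # Assumption: only for "sentence" batches is the multiplier
            # shrunk so the final, incomplete group still gets an update.
            if batch_type == "sentence" and leftover and i == num_batches - leftover:
                multiplier = leftover
                count = multiplier
            opt.backward()
            count -= 1
            if count == 0:
                opt.step()
                opt.zero_grad()
                count = multiplier
    return opt.steps
```

With 5 batches per epoch and a multiplier of 2, the "sentence" case performs a third, smaller update per epoch for the single leftover batch, while the "token" case performs only the two full updates and discards the leftover batch when the next epoch starts.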