Bugfixes in training.py regarding batch_multiplier
Fixed logging of the batch loss if batch_multiplier > 1:
- Previously only the last portion of size batch_size of an effective batch (batch_multiplier * batch_size) was logged.
- Previously every portion (batch_size) was written to the tb_writer, not just the correctly computed loss of the full effective batch (batch_multiplier * batch_size). This now behaves the same as in the batch_multiplier = 1 case.

Fixed a bug occurring if train_data is not divisible by (batch_multiplier * batch_size) and batch_multiplier > 1:
- Previously, in this case the leftover training examples at the end of an epoch could not fill a full effective batch (batch_multiplier * batch_size). Therefore no update was carried out and the leftover gradients were summed on top of those of the first batch of the next epoch. This also resulted in a faulty scaling of the loss normalization. This now behaves properly by reducing batch_multiplier for the last batch of an epoch to match the number of leftover examples.
- The way the losses are accumulated in _train_batch has changed to reduce the rounding error that occurred from dividing every portion of the batch loss by batch_multiplier. Now only the final sum is divided by batch_multiplier (see the sketch below).
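For illustration, here is a minimal sketch of the corrected accumulation behaviour, assuming a PyTorch-style training loop; `run_epoch`, `model`, `optimizer`, `batch_iter` and `log_tensorboard` are hypothetical stand-ins, not the actual training.py API:

```python
def run_epoch(model, optimizer, batch_iter, batch_multiplier, log_tensorboard):
    """Sketch of gradient accumulation with a shrinking last multiplier.

    `model(batch)` is assumed to return the un-normalized loss of one
    portion of size batch_size; `log_tensorboard(tag, value)` stands in
    for writing a scalar to the tensorboard writer.
    """
    batches = list(batch_iter)
    optimizer.zero_grad()
    accumulated_loss = 0.0   # sum of portion losses for the current effective batch
    count = 0                # portions processed since the last update

    for i, batch in enumerate(batches):
        # Portions remaining in this epoch, including the current one.
        leftover = len(batches) - i
        # Shrink the multiplier for the final, incomplete effective batch so
        # the leftover gradients still trigger an update within this epoch
        # instead of leaking into the next epoch.
        multiplier = min(batch_multiplier, count + leftover)

        loss = model(batch)                # un-normalized portion loss
        (loss / multiplier).backward()     # normalize by the actual multiplier
        accumulated_loss += loss.item()    # sum first, divide only once below
        count += 1

        if count == multiplier:
            optimizer.step()
            optimizer.zero_grad()
            # Log the loss of the whole effective batch, once per update,
            # dividing the accumulated sum a single time to limit rounding error.
            log_tensorboard("train/batch_loss", accumulated_loss / multiplier)
            accumulated_loss = 0.0
            count = 0
```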