I set two states: the EOS token (zeros) and exceeding the max length (-1s). I also add a penalty to the reward on the last state.
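A rough sketch of how this could look, not the actual code from the change. The names `terminal_state`, `shaped_reward`, `eos_id`, `max_len`, `state_dim` and the penalty value are all hypothetical, and it assumes "last state" means the max-length overflow state.

```python
from typing import List, Optional, Tuple

# Hypothetical penalty size; the comment does not say how large it is.
OVERFLOW_PENALTY = 1.0


def terminal_state(
    tokens: List[int], eos_id: int, max_len: int, state_dim: int
) -> Optional[Tuple[List[int], str]]:
    """Return (state, kind) if the episode has ended, otherwise None."""
    # EOS reached: the state is all zeros.
    if tokens and tokens[-1] == eos_id:
        return [0] * state_dim, "eos"
    # Max length exceeded: the state is all -1s.
    if len(tokens) > max_len:
        return [-1] * state_dim, "overflow"
    return None


def shaped_reward(
    base_reward: float, tokens: List[int], eos_id: int, max_len: int, state_dim: int
) -> float:
    """Subtract the penalty only when the episode ended by overflow (assumption)."""
    end = terminal_state(tokens, eos_id, max_len, state_dim)
    if end is not None and end[1] == "overflow":
        return base_reward - OVERFLOW_PENALTY
    return base_reward
```

Keeping the two end states distinct (zeros vs. -1s) lets the reward tell a clean EOS ending apart from a sequence that just ran out of length.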