* We evaluated our results with the toolkit provided on the COCO website, but had to adjust the code to fit the 2017 version of the data that we used.
* For nearly all metrics (except CIDEr) our scores are lower by 0.1 to 0.2. That is quite a margin, but it can be explained by
the fact that we used only ~7% of the training data for the LSTM.
* Further reasons for our lower results: the CNN wasn't fully trained (its precision was still low), we did no extensive hyperparameter tuning for the LSTM, and we did not implement beam search.
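
Since beam search is listed as missing, here is a minimal sketch of what it could look like. It is not part of our code: `beam_search`, `step_fn` and `toy_step` are hypothetical names, and `toy_step` is a hand-written stand-in for the LSTM that returns next-token probabilities for a partial caption. The sketch keeps the `beam_width` best partial captions at every step instead of only the single most likely one, and it does not apply length normalization.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[List[str], float]  # (tokens so far, summed log-probability)


def beam_search(
    step_fn: Callable[[List[str]], Dict[str, float]],
    start_token: str,
    end_token: str,
    beam_width: int = 3,
    max_len: int = 20,
) -> List[str]:
    """Return the highest-scoring token sequence found by beam search.

    step_fn is an assumed interface: it maps the tokens generated so far
    to a dict of next-token probabilities.
    """
    beams: List[Hypothesis] = [([start_token], 0.0)]
    finished: List[Hypothesis] = []

    for _ in range(max_len):
        # Expand every open hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            for token, prob in step_fn(tokens).items():
                if prob > 0.0:
                    candidates.append((tokens + [token], score + math.log(prob)))
        if not candidates:
            break

        # Keep only the beam_width best candidates.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end_token:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break

    # Hypotheses cut off at max_len still compete with finished ones.
    finished.extend(beams)
    return max(finished, key=lambda c: c[1])[0]


def toy_step(tokens: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for the LSTM: probabilities depend on the last token."""
    table = {
        "<s>": {"a": 0.6, "the": 0.4},
        "a": {"x": 0.5, "y": 0.5},
        "the": {"dog": 0.9, "</s>": 0.1},
        "x": {"</s>": 1.0},
        "y": {"</s>": 1.0},
        "dog": {"</s>": 1.0},
    }
    return table[tokens[-1]]


greedy = beam_search(toy_step, "<s>", "</s>", beam_width=1)
wider = beam_search(toy_step, "<s>", "</s>", beam_width=2)
print(greedy)  # ['<s>', 'a', 'x', '</s>']       (probability 0.30)
print(wider)   # ['<s>', 'the', 'dog', '</s>']   (probability 0.36)
```

With `beam_width=1` the search is greedy and commits to "a" because it is the most likely first word, while a width of 2 also keeps "the" alive and ends up with the more probable full sequence. In a real setup `step_fn` would wrap the trained LSTM, which is the part this sketch leaves open.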