diff --git a/lrp.ipynb b/lrp.ipynb
index 8d3cf41822a60a7f38e3defdbe2723f1bfe32922..deaccdcbc780c91b65465e4385945082fb8065a2 100644
--- a/lrp.ipynb
+++ b/lrp.ipynb
@@ -145,7 +145,7 @@
    "source": [
     "Each image has multiple captions describing different things or scenes in the image. The Embedding Network is trained to map the embedded representations of image features and text features. In case of image-sentence retrieval task: given a query image and the candidate sentences, to obtain the embed representations we pass the image and text features through two branches of the network respectively. After that, we can compute the similarity scores (Euclidean distance in [Wang et al. (2017)](#1)) and rank the candidates in decreasing order. The correct match, e.g. the 5 corresponding captions of the image should be on top of the ranking. As testing on the [MSCOCO](http://cocodataset.org/#home) testset, the authors used the sentences to query images through ranking K similarity scores between the embedded representations and achieved the accuracy of 43.3% if K=1, 76.4% if K=5 and 87.5% if K=10. K stands for the number of the top matches.\n",
     "\n",
-    "From the above examples we know that, the captions of one image contain very different words or phrases describing different objects or regions of one image, can the network really distinguish the difference between the captions and match different captions to different regions from one image, which features from the sentences are corresponding to the images in the retrieval task?"
+    "From the above examples we know that, the captions of one image contain very different words or phrases describing different objects or regions of one image, can the network really distinguish the difference between the captions and match different captions to different regions from one image, which features from the sentences are corresponding to the images in the retrieval task? "
    ]
   },
   {
@@ -184,7 +184,8 @@
     "    - [Obtain word-level relevance](#wordrel)\n",
     "    \n",
     "4. [Word-level relevance visualization and discussion](#result)\n",
-    "    \n"
+    "    \n",
+    "5. [Future work](#future)"
    ]
   },
   {
@@ -202,7 +203,7 @@
    "source": [
     "<a id=\"model\"> </a>\n",
     "#### Load trained Embedding Network\n",
-    "Let's first import the network and construct the configuration for loading the trained model. The parameters of the trained model is saved in the file `model_best.pth.tar`. We just need to reload the parameters to the network. Run `model.eval()` and the model is ready for computing the embedded representation for new input features."
+    "Let's first import the network and construct the configuration for loading the trained model. The parameters of the trained model is saved in the file `model_best.pth.tar`. We just need to reload the parameters to the network. Run `model.eval()` and the model is then ready for computing the embedded representations for new input features."
    ]
   },
   {
@@ -379,501 +380,9 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "ResNet(\n",
-       "  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)\n",
-       "  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "  (relu): ReLU(inplace=True)\n",
-       "  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)\n",
-       "  (layer1): Sequential(\n",
-       "    (0): Bottleneck(\n",
-       "      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "      (downsample): Sequential(\n",
-       "        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      )\n",
-       "    )\n",
-       "    (1): Bottleneck(\n",
-       "      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (2): Bottleneck(\n",
-       "      (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "  )\n",
-       "  (layer2): Sequential(\n",
-       "    (0): Bottleneck(\n",
-       "      (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "      (downsample): Sequential(\n",
-       "        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)\n",
-       "        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      )\n",
-       "    )\n",
-       "    (1): Bottleneck(\n",
-       "      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (2): Bottleneck(\n",
-       "      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (3): Bottleneck(\n",
-       "      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (4): Bottleneck(\n",
-       "      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (5): Bottleneck(\n",
-       "      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (6): Bottleneck(\n",
-       "      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (7): Bottleneck(\n",
-       "      (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "  )\n",
-       "  (layer3): Sequential(\n",
-       "    (0): Bottleneck(\n",
-       "      (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "      (downsample): Sequential(\n",
-       "        (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)\n",
-       "        (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      )\n",
-       "    )\n",
-       "    (1): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (2): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (3): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (4): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (5): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (6): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (7): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (8): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (9): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (10): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (11): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (12): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (13): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (14): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (15): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (16): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (17): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (18): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (19): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (20): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (21): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (22): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (23): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (24): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (25): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (26): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (27): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (28): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (29): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (30): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (31): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (32): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (33): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (34): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (35): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "  )\n",
-       "  (layer4): Sequential(\n",
-       "    (0): Bottleneck(\n",
-       "      (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "      (downsample): Sequential(\n",
-       "        (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)\n",
-       "        (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      )\n",
-       "    )\n",
-       "    (1): Bottleneck(\n",
-       "      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "    (2): Bottleneck(\n",
-       "      (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
-       "      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
-       "      (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
-       "      (relu): ReLU(inplace=True)\n",
-       "    )\n",
-       "  )\n",
-       "  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))\n",
-       "  (fc): Linear(in_features=2048, out_features=1000, bias=True)\n",
-       ")"
-      ]
-     },
-     "execution_count": 2,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "outputs": [],
    "source": [
     "ResNet = torchvision.models.resnet152(pretrained=True);ResNet.eval()"
    ]
@@ -882,7 +391,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Check the computed features of the network for the given images through returning the classification results. The first ten predicted object categories will be showed in order to have an interpretation about what the extracted features may represent for. "
+    "Check the computed features of the network for the given images through returning the classification results. The first ten predicted object categories will be shown in order to have an interpretation about what the extracted features may represent for. "
    ]
   },
   {
@@ -965,40 +474,11 @@
     "    X[i+1]=(m.forward(X[i]))"
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "bottleneck = [name for name in modules[-3]._modules.keys()]"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "36"
-      ]
-     },
-     "execution_count": 8,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "len(bottleneck)"
-   ]
-  },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "At the end, we can extract the output from the average pool layer before the last output layer from the network as image features representations."
+    "At the end, we can extract the output from the average pool layer before the last output layer from the network as image features representation."
    ]
   },
   {
@@ -1025,7 +505,7 @@
    "source": [
     "<a id=\"evaluate\"> </a>\n",
     "## Compute embedded representations and obtain relevance\n",
-    "After the image and text features for the above two examples are prepared and the trained parameters are loaded into the model, we can project the text and image features into one joint latent space through two branches of the network and compute the multiplication between two representations at the end. The similarity scores can be considered as relevance which can be propagated back to the input features. According to the redistribution rules of LRP, the elements from the input which are relevant to the similarity result should receive high relevance [(Bach et al. 2015)](#2). \n",
+    "After the image and text features for the above two examples are prepared and the trained parameters are loaded into the model, we can project the text and image features into one joint latent space through two branches of the network and compute the multiplication between two representations at the end. The similarity scores can be considered as relevance which can be propagated back to the input features. According to the redistribution rules of LRP, the elements from the input which are relevant to the similarity result should receive high relevance [(Bach et al., 2015)](#2). \n",
     "\n",
     "Firstly we can decompose each branch of the Embedding Network and extract all the layers from each branch to obtain the output of each layer."
    ]
@@ -1053,7 +533,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Since the Embedding Network receives one features vector for each sentence as input representation, we can't feed the text branch with the sequence features directly but we can first compute an average value of the embedding features of each token by the length of the sentence and summarize the values of all tokens in one sentence to obtain a sentence vector. The sentence representation has as a result the same dimension as the word embedding size. "
+    "Since the Embedding Network receives one feature vector for each sentence as input representation, we can't feed the text branch with the sequence directly but we need to firstly compute an average value of the embedding features of each token by the length of the sentence and summarize the values of all tokens in one sentence to obtain a sentence vector. The sentence representation has as a result the same dimension as the word embedding size."
    ]
   },
   {
@@ -1082,7 +562,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Initialize a list of layer outputs for the text branch, output from the lower layer is passed to the connective layer as input. The output of the last layer will be normalized by its L2 norm."
+    "Now we can collect the layer outputs for the text branch layer by layer in the forward direction where we feed the output from the lower layer to the higher connective layer as input. The output of the last layer will be normalized by its L2 norm."
    ]
   },
   {
@@ -1161,7 +641,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "To check if all the layers are linear connected and if we can propagate the relevance in the reversed direction through the collections of layers, we can validate the last layer outputs from both decomposed text and image branches with the outputs computed by the model."
+    "We can validate the last layer outputs from both decomposed text and image branches with the outputs computed by the model To check if all the layers are linear connected. "
    ]
   },
   {
@@ -1186,6 +666,13 @@
     "print(torch.all(text_feats.eq(text_feats_normalized)))"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "At last, we have to compute the similarity scores which indicate the matching results between the given sentence-image pairs. \n"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -1327,6 +814,13 @@
     "print(text_relevance_1.size())"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now, we have finished the forward computation procedure. We have extracted the outputs of each layer from the network, obtained two relevance vectors corresponding to two different images for the text representations. In the following section, we will learn how to propagate these relevances back to the word embeddings features, so that we can investigate which tokens are relevant to the matching results between the captions and the given image."
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -1337,8 +831,7 @@
     "\n",
     "Because $Z_j$ is possible to be equal or close to zero and the relevance will explore. In this case, we can modify the redistribution rule with a stabilizer $\\epsilon$, e.g.: $R_i^{l} =  \\sum_j Z_{ij}\\frac{R_j^{(l+1)}}{ Z_j + \\epsilon}$, where if  $Z_j < 0 $ then $Z_j - \\epsilon$. \n",
     "\n",
-    "According to the original paper [Bach et al. (2015)](#2), the exact propagation rule definition depends on the type of a component within architecture of the network. In the forward computation of each branch in the Embedding Network, we have seen the sequential modules as follows: Linear, BatchNorm1d, ReLu, Dropout and again a Linear layer at the last. Because of the different properties of different layer, we need to define different rules in different layer accordingly. The main components which contains learning parameters in Embedding Network are the linear layers and the batch normalization layers. The propagation rules in these layers depend on the message pass in the forward computation, which be detailed in the following sections. In the activation layer and dropout layer there is no mixing of inputs to compute outputs and the dimensions of input and output is identical, hence $R^{(l)} = R^{(l+1)}$.\n",
-    " "
+    "According to the original paper [Bach et al. (2015)](#2), the exact propagation rule depends on the architecture of the network. Because of different computation in different layer, the propagation algorithm varies from layer to layer. In the forward computation of each branch in the Embedding Network, we have seen the sequential model consists of different modules as follows: Linear, BatchNorm1d, ReLu, Dropout, followed by a Linear layer at the last. Because of the different properties of different layer, we need to define different rules in different layer accordingly. The main components which contains learning parameters in Embedding Network are the linear layers and the batch normalization layers. The propagation rules in these layers depend on the message pass in the forward computation, which will be detailed below. In the activation layer and dropout layer there is no mixing of inputs to compute outputs and the dimensions of input and output is identical, hence in these layers, we can define $R^{(l)} = R^{(l+1)}$."
    ]
   },
   {
@@ -1577,7 +1070,7 @@
     "<a id=\"result\"></a>\n",
     "## Word-level relevance visualization and discussion\n",
     "\n",
-    "In this post, we have just propagated the relevance in the text branch. As it is mentioned at the beginning, the relevance propagation for the image branch involves further propagation in the vision model, which is not covered in this post. But we still can investigate the token relevance since the visualization of each individual token according their score values supplies evidence of the matching results of the Embedding Network. In order to compare the word level relevance to the image relevant features, we can use the image classification results as reference, because the classification results show that what the image features are representing for.\n",
+    "In this post, we have just propagated the relevance in the text branch. As it is mentioned at the beginning, the relevance propagation for the image branch involves further propagation in the vision model, which is not covered in this post. But we can investigate the token relevance since the visualization of each individual token according their score values supplies evidence of the matching results of the Embedding Network. In order to compare the word level relevance to the image relevant features, we can use the image classification results as reference, because the classification results show that what the image features are representing for.\n",
     "\n",
     "As it is mentioned above, sentence-image retrieval is different to classification task, we obtain the relevance from the similarity scores, hence, we will investigate the relevant features of all captions given one image, not just the positive ones. In order to obtain the word-level visualization for the example captions written in HTML format please check the heatmap functions in the `utils` file and import the assisting function `html_table` and return the HTML table which is the visualization of relevant tokens.\n",
     "\n",
@@ -1763,9 +1256,11 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The matching scores have shown us that the best match for the first image is the second caption, and the relevance visualization provides us with the evidence for the score with the most relevant token \"toy\". And this relevant token can be also found in the fifth caption that also received a higher score than the other matches. Of course, it is difficult to look into every example and find out specific patterns from the language side to explain the sentence-image matching results. But from the relevance result, we obtain an intuitive view about the reasons for the similarity results.\n",
+    "The matching scores have shown us that the best match for the first image is the second caption, and the relevance visualization provides us with the evidence for the score with the highlighted tokens. For example, the second caption received the highest matched score. The only found relevant token \"toy\" from this caption is the only reason for this result.\n",
     "\n",
-    "There are several possible quantitative analysis methods to validate the relevant tokens or using the the relevance for evaluating the working of the network depending on the task and data. For example, if we have marked phrases annotated in the data set, we will be able to calculate the overlaps between the annotated key phrases and the found relevant tokens. To validate the relevant tokens, we can move out the negative tokens (in blue) and just keep the positive relevant tokens in the captions and execute the testing process. If the matching results doesn't change much that can proof the found relevant tokens."
+    "When we look at the five captions of the first image, different captions describe different objects and scenarios in the image, as a result, we find different relevant features in each caption. The results have also shown us that the relevant features from the text are discrete, the text branch of the Embedding Network didn't captured the dependency information of the sequence, namely, image features are not really matched to complete phrases. \n",
+    "\n",
+    "The similarity scores of the positive matches are not much higher than the negative ones which indicates the challenges of the tasks. Because not only images are paired with multiple sentences but also same sentences can be applied to different images. The found relevant tokens from the wrong captions can tell us which text features can also be matched to the given image, even though they might not obtain much relevance."
    ]
   },
   {
@@ -1919,9 +1414,22 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### Future work \n",
-    "In this post, we go through the whole procedures about how to compute the layer outputs of both branches in the Embedding Network and obtain the relevance to be propagated. After that, we have learned the knowledge about how to apply the layer-wise relevance propagation approach to redistribute the relevance back to the input word embeddings for obtaining the word-level evidence of the sentence-image matching results. What can we in the future further pro is that we can try is to experiment the approach on the Flickr30K dataset and propagate the image relevance back to vision model and further to get the evidence from the original image features. In this way, we have to  the matched relevant features from both side and \n",
-    "compute the relevance in the Embeddrelevance Apply the whole procedure again on the examples from on Flickr30K dataset."
+    "From the results of the relevant token regarding the second image, we can verify the last findings when we look at the matching results for the first image. The matching results between the captions and the second are not as expected and very unconfident. The wrong captions obtained higher similarity scores than the right ones. And the most relevant tokens from the wrong captions are \"toy\". From the correct captions tokens like \"sign\" \"street\" \"station\" are significant features corresponding to the second image. However they are found not the most relevant tokens comparing to the token \"bottle\", which is the only one contribution to the retrieval score in its caption.\n",
+    "\n",
+    "\n",
+    "Of course, it is difficult to look into every example and find out specific patterns from the language side to explain the sentence-image matching results. But from the relevance result, we obtain an intuitive view about the reasons for the similarity results. For examples, we can conclude that the Embedding Network is not confident when matching the last five captions to the image features. Because\n",
+    "although in the wrong captions there are many relevant features are found but the similarity scores are in contrast lower than the correct captions. The network couldn't distinguish the text features from these captions comparing to the first five captions.\n",
+    "\n",
+    "There are several possible quantitative analysis methods to validate the relevant tokens or using the the relevance for evaluating the working of the network depending on the task and data. For example, if we have marked phrases annotated in the data set, we will be able to calculate the overlaps between the annotated key phrases and the found relevant tokens. To validate the relevant tokens, we can move out the negative tokens (in blue) and just keep the positive relevant tokens in the captions and execute the testing process. If the matching results doesn't change much that can verify the found relevant tokens."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a id=\"future\"></a>\n",
+    "## Future work \n",
+    "In this post, we have applied the layer-wise relevance propagation approach for explaining the network results using the relevant tokens as evidence. We go through the whole procedures about how to compute the layer outputs of both branches in the Embedding Network, obtain the relevance to be propagated and at the end propagate the the relevance to the word embedding features. Yet, this post has just supplied a practical instruction about applying the method, we still need further statistical analyses based on the found relevant features for more comprehensive explanations about the working of the network. Furthermore we can also compare the relevant features between different datasets, i.e [Flickr30K](http://bryanplummer.com/Flickr30kEntities/) against [MSCOCO](http://cocodataset.org/#home) given different text and image features, not just explain the matching between the whole sentences and images but also between phrases and image regions."
    ]
   },
   {