Add explanation to tables in readme

b2d85986 · wernicke · df5a9fd1 · b2d85986
Commit b2d85986 authored 3 years ago by wernicke
--- a/fine_tuning/README.md
+++ b/fine_tuning/README.md
@@ -11,12 +11,12 @@
    - [Installation](#installation)
    - [Usage](#usage)
    - [Results](#results)
+  - [Baselines](#baselines)
+      - [Fine-Grained Relations](#fine-grained-relations)
+      - [Coarse-Grained Relations](#coarse-grained-relations)
  - [Fine-Tuning](#fine-tuning)
    - [Epochs, Learning Rate and Batch Size Variations](#epochs-learning-rate-and-batch-size-variations)
  - [Testing](#testing)
-  - [Evaluation](#evaluation)
-    - [Fine-Grained Relations](#fine-grained-relations)
-    - [Coarse-Grained Relations](#coarse-grained-relations)

 ## Task Description
 Fine-tuning is the process where a pre-trained language model is trained to perform a specialized task. In our case BERT needs to be fine-tuned to predict the relation between the two components of a nominal compound based on the sentence they occur in.
@@ -81,6 +81,69 @@ fineBert.py t file_test.csv
 ### Results
 The model resulting from fine-tuning is stored in a directory named after the input file, batch size and learning rate. We also output NumPy arrays with the true labels and the predictions, which are saved in the working directory. For clarity, we moved them into the results directory.

+## Baselines 
+
+For all 17 data variations, the fine-tuned model was compared to each of three baselines. As for the probing, the baselines were a majority baseline, a random uniform prediction, and a random classification weighted by category frequency (stratified). The model's performance was compared to each baseline using McNemar's test for paired nominal data. As the tables below indicate, the model performed slightly better on the coarse-grained than on the fine-grained relations. The model's predictions were significantly better than the uniform baseline for both granularity levels, and significantly better than the majority baseline for coarse-grained relations, for all data set variations. That the predictions are not better than the stratified random baseline for some data variations further illustrates that the model does not simply memorize nouns, but indeed seems to *understand* the relation between the noun compound elements.
+<br>
+The expandable tables below show the p-values of McNemar's test for all combinations of data set variation and baseline in which the model was significantly better. Blank cells indicate that there was no significant difference in prediction or that the baseline was significantly better.
+
+#### Fine-Grained Relations
+
+<details>
+  <summary markdown="span"><i>🔎 Expand to see the results of McNemar's test on the fine-grained relations for all data variations.</i></summary>
+
+
+|                                     | Most Frequent | Uniform   | Stratified |
+| ----------------------------------- | ------------- | --------- | ---------- |
+| **1 Fine 50 Test Set**              | 2.41E-120     |           |            |
+| **2 Nohead Alt Fine 50 Test Set**   | 5.91E-107     | 0         | 0          |
+| **3 Nohead Fine 50 Test Set**       | 6.92E-214     | 1.51E-08  |            |
+| **4 Nohead Sep Fine 50 Test Set**   | 0             | 0         | 0          |
+| **5 Nomodi Alt Fine 50 Test Set**   | 0             | 0         | 0          |
+| **6 Nomodi Fine 50 Test Set**       | 2.92E-79      |           |            |
+| **7 Nomodi Sep Fine 50 Test Set**   | 0             | 0         | 0          |
+| **8 Sep Fine 50 Test Set**          | 0             | 0         | 0          |
+| **9 Rndhead Alt Fine 50 Test Set**  | 1.42E-144     | 0         | 0          |
+| **10 Rndhead Fine 50 Test Set**     | 0             | 2.68E-292 |            |
+| **11 Rndhead Sep Fine 50 Test Set** | 0             | 0         | 0          |
+| **12 Rndmodi Alt Fine 50 Test Set** | 0             | 0         | 0          |
+| **13 Rndmodi Fine 50 Test Set**     | 0             | 0         |            |
+| **14 Rndmodi Sep Fine 50 Test Set** | 0             | 0         | 0          |
+| **15 Rndsent Alt Fine 50 Test Set** | 1.09E-156     | 0.850621  |            |
+| **16 Rndsent Fine 50 Test Set**     | 0             | 0         |            |
+| **17 Rndsent Sep Fine 50 Test Set** | 0             | 0         | 0          |
+
+</details>
+
+#### Coarse-Grained Relations
+
+
+<details>
+  <summary markdown="span"><i>🔎 Expand to see the results of McNemar's test on the coarse-grained relations for all data variations.</i></summary>
+
+|                                       | Most Frequent | Uniform   | Stratified |
+| ------------------------------------- | ------------- | --------- | ---------- |
+| **1 Coarse 50 Test Set**              | 1.79E-133     | 3.63E-12  |            |
+| **2 Nohead Alt Coarse 50 Test Set**   | 0             | 1.04E-240 |            |
+| **3 Nohead Coarse 50 Test Set**       | 7.66E-199     | 4.54E-36  |            |
+| **4 Nohead Sep Coarse 50 Test Set**   | 0             | 0         | 0          |
+| **5 Nomodi Alt Coarse 50 Test Set**   | 8.83E-42      | 0         | 0          |
+| **6 Nomodi Coarse 50 Test Set**       | 5.42E-135     | 4.37E-12  |            |
+| **7 Nomodi Sep Coarse 50 Test Set**   | 0             | 0         | 0          |
+| **8 Sep Coarse 50 Test Set**          | 0             | 0         | 0          |
+| **9 Rndhead Alt Coarse 50 Test Set**  | 0             | 1.20E-228 |            |
+| **10 Rndhead Coarse 50 Test Set**     | 0             | 0         |            |
+| **11 Rndhead Sep Coarse 50 Test Set** | 0             | 0         | 0          |
+| **12 Rndmodi Alt Coarse 50 Test Set** | 3.66E-167     | 0         | 0          |
+| **13 Rndmodi Coarse 50 Test Set**     | 0             | 3.85E-256 |            |
+| **14 Rndmodi Sep Coarse 50 Test Set** | 0             | 0         | 0          |
+| **15 Rndsent Alt Coarse 50 Test Set** | 1.13E-56      |           |            |
+| **16 Rndsent Coarse 50 Test Set**     | 0             | 0         |            |
+| **17 Rndsent Sep Coarse 50 Test Set** | 0             | 0         | 0          |
+
+</details>
+
+
 ## Fine-Tuning
 Fine-tuning uses `BertForSequenceClassification`, a BERT model with a classification layer on top. For practical purposes and to improve performance, we make use of the following `transformers` and PyTorch utilities: `DataLoader`, `torch.optim.AdamW` and `get_scheduler`:
 - `DataLoader` provides an iterable over the data that yields batches, with no need for a manual batching loop
@@ -152,60 +215,3 @@ The last testing experiment was on the test set with the components flipped. Int
 </div></p>


-## Evaluation
-
-### Fine-Grained Relations
-
-<details>
-  <summary markdown="span"><i>🔎Expend to see results from McNemar's test for all fine-grained model.</i></summary>
-
-
-
-|                                 | Most Frequent | Uniform   | Stratified | 1 | 2         | 3 | 4        | 5        | 6 | 7        | 8 | 9        | 10        | 11       | 12       | 13        | 14       | 15 | 16 | 17 |
-| ------------------------------- | ------------- | --------- | ---------- | - | --------- | - | -------- | -------- | - | -------- | - | -------- | --------- | -------- | -------- | --------- | -------- | -- | -- | -- |
-| **1 Fine 50 Test Set**              | 7.53E-120     |           |            |   |           |   |          |          |   |          |   |          |           |          |          |           |          |    |    |
-| **2 Nohead Alt Fine 50 Test Set**   | 5.91E-107     | 0         | 0          |   |           | 0 |          |          | 0 |          |   |          | 2.12E-144 |          |          | 1.32E-129 |          |    |    |    |
-| **3 Nohead Fine 50 Test Set**       | 9.50E-226     | 1.32E-07  |            |   |           |   |          | 4.16E-92 |   |          |   |          |           |          |          |           |          |    |    |
-| **4 Nohead Sep Fine 50 Test Set**   | 0             | 0         | 0          |   | 0         | 0 |          | 0        | 0 |          |   | 0        | 0         | 0.000112 | 0        | 0         | 0.653796 |    |    |    |
-| **5 Nomodi Alt Fine 50 Test Set**   | 0             | 0         | 0          |   | 0         | 0 |          |          | 0 |          |   | 0        | 0         |          | 0.000708 | 0         |          |    |    |    |
-| **6 Nomodi Fine 50 Test Set**       | 2.38E-84      |           |            |   |           |   |          |          |   |          |   |          |           |          |          |           |          |    |    |
-| **7 Nomodi Sep Fine 50 Test Set**   | 0             | 0         | 0          |   | 0         | 0 | 0.002389 | 0        | 0 |          |   | 0        | 0         | 5.47E-08 | 0        | 0         | 0.030416 |    |    |    |
-| **8 Sep Fine 50 Test Set**          | 0             | 0         | 0          |   | 0         | 0 | 7.16E-37 | 0        | 0 | 4.14E-29 |   | 0        | 0         | 6.86E-28 | 0        | 0         | 1.50E-14 |    |    |    |
-| **9 Rndhead Alt Fine 50 Test Set**  | 1.42E-144     | 0         | 0          |   | 1.07E-13  | 0 |          |          | 0 |          |   |          | 3.70E-192 |          |          | 2.30E-170 |          |    |    |    |
-| **10 Rndhead Fine 50 Test Set**     | 0             | 2.93E-296 |            |   | 5.91E-170 |   |          | 0        |   |          |   |          |           |          |          |           |          |    |    |
-| **11 Rndhead Sep Fine 50 Test Set** | 0             | 0         | 0          |   | 0         | 0 |          | 0        | 0 |          |   | 0        | 0         |          | 0        | 0         |          |    |    |    |
-| **12 Rndmodi Alt Fine 50 Test Set** | 0             | 0         | 0          |   | 0         | 0 |          |          | 0 |          |   | 0        | 0         |          |          | 0         |          |    |    |    |
-| **13 Rndmodi Fine 50 Test Set**     | 0             | 4.10E-285 |            |   | 5.09E-181 |   |          | 0        |   |          |   | 0.000702 |           |          |          |           |          |    |    |
-| **14 Rndmodi Sep Fine 50 Test Set** | 0             | 0         | 0          |   | 0         | 0 |          | 0        | 0 |          |   | 0        | 0         | 7.29E-08 | 0        | 0         |          |    |    |    |
-| **15 Rndsent Alt Fine 50 Test Set** | 1.50E-164     | 0.930872  | 1.88E-05   |   |           |   |          |          |   |          |   |          |           |          |          |           |          |    |    |
-| **16 Rndsent Fine 50 Test Set**     | 0             | 0         | 0          |   |           |   |          |          |   |          |   |          |           |          |          |           | 0        |    |    |
-| **17 Rndsent Sep Fine 50 Test Set** | 0             | 0         | 0          | 0 |           |   |          |          |   |          |   |          |           |          |          |           |          | 0  | 0  |    |
-
-</details>
-
-### Coarse-Grained Relations
-
-<details>
-  <summary markdown="span"><i>🔎Expend to see results from McNemar's test for all coarse-grained model.</i></summary>
-
-|                                   | Most Frequent | Uniform   | Stratified | 1        | 2         | 3 | 4        | 5         | 6 | 7        | 8        | 9         | 10        | 11       | 12       | 13        | 14       | 15 | 16 | 17 |
-| --------------------------------- | ------------- | --------- | ---------- | -------- | --------- | - | -------- | --------- | - | -------- | -------- | --------- | --------- | -------- | -------- | --------- | -------- | -- | -- | -- |
-| **1 Coarse 50 Test Set**              | 1.59E-119     | 4.92E-11  |            |          |           |   |          |           |   |          |          |           |           |          |          |           | 1.70E-16 |    |    |
-| **2 Nohead Alt Coarse 50 Test Set**   | 0             | 1.88E-233 |            |          | 2.90E-113 |   |          | 9.38E-167 |   |          | 0.036084 |           |           |          | 0.003429 |           |          |    |    |
-| **3 Nohead Coarse 50 Test Set**       | 6.88E-224     | 5.57E-42  |            |          |           |   |          | 7.46E-35  |   |          |          |           |           |          |          |           |          |    |    |
-| **4 Nohead Sep Coarse 50 Test Set**   | 0             | 0         | 0          |          | 0         | 0 |          | 0         | 0 | 0.18876  |          | 0         | 0         | 1.58E-22 | 0        | 0         | 3.32E-15 |    |    |    |
-| **5 Nomodi Alt Coarse 50 Test Set**   | 8.83E-42      | 0         | 0          |          | 1.90E-95  | 0 |          |           | 0 |          |          | 2.42E-105 | 1.66E-77  |          |          | 5.80E-104 |          |    |    |    |
-| **6 Nomodi Coarse 50 Test Set**       | 2.41E-138     | 1.63E-13  |            |          |           |   |          |           |   |          |          |           |           |          |          |           |          |    |    |
-| **7 Nomodi Sep Coarse 50 Test Set**   | 0             | 0         | 0          |          | 0         | 0 |          | 0         | 0 |          |          | 0         | 0         | 6.34E-19 | 0        | 0         | 6.77E-13 |    |    |    |
-| **8 Sep Coarse 50 Test Set**          | 0             | 0         | 0          |          | 0         | 0 | 3.78E-05 | 0         | 0 | 2.34E-10 |          | 0         | 0         | 5.60E-32 | 0        | 0         | 1.76E-23 |    |    |    |
-| **9 Rndhead Alt Coarse 50 Test Set**  | 0             | 5.26E-222 |            |          | 1.93E-93  |   |          | 3.70E-149 |   |          |          |           |           |          | 0.093448 |           |          |    |    |
-| **10 Rndhead Coarse 50 Test Set**     | 0             | 4.78E-307 |            | 0.678678 | 1.83E-138 |   |          | 5.55E-228 |   |          | 0.085439 |           |           |          | 1.93E-11 |           |          |    |    |
-| **11 Rndhead Sep Coarse 50 Test Set** | 0             | 0         | 0          |          | 0         | 0 |          | 0         | 0 |          |          | 0         | 0         |          | 0        | 0         |          |    |    |    |
-| **12 Rndmodi Alt Coarse 50 Test Set** | 3.66E-167     | 0         | 0          |          | 5.59E-278 | 0 |          | 3.79E-141 | 0 |          |          | 2.84E-300 | 1.90E-240 |          |          | 1.57E-288 |          |    |    |    |
-| **13 Rndmodi Coarse 50 Test Set**     | 0             | 1.22E-257 |            |          | 3.38E-101 |   |          | 1.24E-179 |   |          |          |           |           |          |          |           |          |    |    |
-| **14 Rndmodi Sep Coarse 50 Test Set** | 0             | 0         | 0          |          | 0         | 0 |          | 0         | 0 |          |          | 0         | 0         | 0.005821 | 0        | 0         |          |    |    |    |
-| **15 Rndsent Alt Coarse 50 Test Set** | 2.96E-70      |           |            |          |           |   |          |           |   |          |          |           |           |          |          |           |          |    |    |
-| **16 Rndsent Coarse 50 Test Set**     | 0             | 0         | 1.29E-298  |          |           |   |          |           |   |          |          |           |           |          |          |           | 0        |    |    |
-| **17 Rndsent Sep Coarse 50 Test Set** | 0             | 0         | 0          | 0        |           |   |          |           |   |          |          |           |           |          |          |           |          | 0  | 0  |    |
-
-</details>
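
The baseline comparison described in the new Baselines section can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the code used to produce the tables: the function names, the toy relation labels, and the choice to derive the baselines from training-label frequencies are all hypothetical, and the README does not say whether the exact (binomial) or the chi-square variant of McNemar's test was used. The sketch uses the exact two-sided variant.

```python
import random
from collections import Counter
from math import comb


def majority_baseline(train_labels, n):
    """Predict the most frequent training label for every test item."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n


def uniform_baseline(train_labels, n, rng):
    """Predict a label drawn uniformly at random from the label set."""
    labels = sorted(set(train_labels))
    return [rng.choice(labels) for _ in range(n)]


def stratified_baseline(train_labels, n, rng):
    """Predict a label drawn in proportion to its training frequency."""
    counts = Counter(train_labels)
    labels = sorted(counts)
    weights = [counts[label] for label in labels]
    return rng.choices(labels, weights=weights, k=n)


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on the discordant pairs.

    Returns (b, c, p_value): b counts items only classifier A got right,
    c counts items only classifier B got right.
    """
    b = c = 0
    for truth, a, other in zip(y_true, pred_a, pred_b):
        a_ok, other_ok = a == truth, other == truth
        if a_ok and not other_ok:
            b += 1
        elif other_ok and not a_ok:
            c += 1
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs, no evidence of a difference
    # Under H0 the discordant pairs follow Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the random baselines are repeatable
    # Toy labels, invented for illustration only.
    train = ["PURPOSE"] * 6 + ["LOCATION"] * 3 + ["TIME"]
    gold = ["PURPOSE", "PURPOSE", "LOCATION", "TIME", "PURPOSE",
            "LOCATION", "PURPOSE", "TIME", "LOCATION", "PURPOSE"]
    model = ["PURPOSE", "PURPOSE", "LOCATION", "TIME", "PURPOSE",
             "PURPOSE", "PURPOSE", "TIME", "LOCATION", "LOCATION"]
    baselines = {
        "most frequent": majority_baseline(train, len(gold)),
        "uniform": uniform_baseline(train, len(gold), rng),
        "stratified": stratified_baseline(train, len(gold), rng),
    }
    for name, baseline in baselines.items():
        b, c, p = mcnemar_exact(gold, model, baseline)
        print(f"{name}: model-only correct={b}, baseline-only correct={c}, p={p:.4g}")
```

Because McNemar's test is two-sided, a small p-value alone does not say which side is better. Comparing `b` and `c` gives the direction, which matters for a table that lists only the cases where the model won. With very large test sets the p-value can underflow to `0.0` in floating point, which would be one possible reading of the many `0` entries in the tables.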