This project explores various named entity recognition (NER) approaches, focusing on
named entity classification (NEC).
These methods include:
- Natural language inference (NLI) with T5 (see the sketch below)
- Two different approaches to masked language modeling (MLM) with entity and class masking
- Classification based on Word2Vec embeddings
- LLM prompting
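
To make the first idea concrete: an entity mention can be classified by turning each candidate class into an entailment hypothesis and keeping the class whose hypothesis the NLI model accepts. Below is a minimal sketch of that idea using a generic `t5-base` checkpoint and T5's MNLI prompt format; the model, prompt wording, class set, and decision rule are illustrative assumptions, not the project's actual implementation (see [`/models`](src/models) and the report for that).

```python
# Hypothetical sketch of NEC via NLI with T5; the project's actual prompts,
# checkpoints, and scoring differ and live in src/models.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

sentence = "Angela Merkel gave a speech in Berlin."
entity = "Berlin"
candidate_classes = ["person", "location", "organization"]


def nli_label(premise: str, hypothesis: str) -> str:
    # t5-base was multi-task trained on MNLI; the exact prefix wording here
    # follows the published T5 task format but is an assumption in this context.
    prompt = f"mnli hypothesis: {hypothesis} premise: {premise}"
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=5)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Keep the first class whose hypothesis is judged as entailed; a real
# implementation would compare entailment scores across all classes instead.
for cls in candidate_classes:
    if "entailment" in nli_label(sentence, f"{entity} is a {cls}."):
        print(f"Predicted class for '{entity}': {cls}")
        break
```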
More information about these methods can be found in our project report, which is included in this repository.
## Project Structure
Test cases for implemented models, datasets, and individual approaches (useful for debugging) can be found in [`/tests`](tests).
Models are defined in [`/models`](src/models) and are accessed via the [`common_interface`](src/common_interface.py).
Datasets can be found in [`/data`](data) and are accessed via the [`data manager`](data/data_manager.py).
Scripts for executing code on the Computerlinguistik cluster or BwUniCluster are located in [`/scripts`](scripts).
The experiments conducted as part of this project and some of their results are located in [`/src/experiments`](src/experiments).
## Setup and Requirements
Note: A CUDA-enabled GPU is required to run the finetuning and LLM-based experiments.
1. Ensure you have Python 3.8 or newer installed.
2. Run `pip install -r requirements.txt` to install the necessary dependencies.
3. If you want to use DeepSeek-R1 (via HuggingFace), follow the instructions in [`.env-example`](.env-example) (a rough sketch of how such a token is typically loaded follows this list).
4. Run the desired experiment either locally (see "Running locally") or on the Computerlinguistik cluster or BwUniCluster (see "Running on a cluster"). The required models and datasets will be downloaded and preprocessed automatically.
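
For orientation, token handling along these lines is common in HuggingFace-based projects; the sketch below assumes `python-dotenv` and a variable named `HF_TOKEN`, which may not match this project's actual setup, so defer to [`.env-example`](.env-example) for the real names.

```python
# Rough sketch only; the actual variable name(s) are defined in .env-example,
# and the use of python-dotenv is an assumption here.
import os

from dotenv import load_dotenv
from huggingface_hub import login

load_dotenv()                          # read key=value pairs from a local .env file
login(token=os.environ["HF_TOKEN"])    # "HF_TOKEN" is an assumed variable name
```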
## Running locally
For correct module loading, all experiments must be run from the project's root folder.
### Example: Running a test
The following command executes the "test_NEC" test, which tests all implemented models with an example sentence. The required models will be downloaded automatically.
`python3 -m src.tests.test_NEC`
### Example: Using a finetuned model
1. First, start the finetuning for the desired model using one of the experiments under [`/src/experiments/finetune_T5`](src/experiments/finetune_T5/).
2. Model checkpoints will be saved in a subfolder under [`/models`](src/models), depending on the model.
3. Modify the last line of the model implementation in [`/models`](src/models) to load the desired checkpoint instead of the base model (see the comment in the source and the sketch below).
4. Running the desired experiment will now use the finetuned model.
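
For illustration, swapping the base model for a checkpoint usually only changes the path passed to `from_pretrained`. The model class and checkpoint path below are made-up examples; the exact line to change is marked by a comment in the model source.

```python
# Sketch only; the actual model class and checkpoint path depend on the model
# and finetuning run (the path below is a hypothetical example).
from transformers import T5ForConditionalGeneration

# Before: the last line of the model implementation loads the base model.
# model = T5ForConditionalGeneration.from_pretrained("t5-base")

# After: point from_pretrained at the saved checkpoint directory instead.
model = T5ForConditionalGeneration.from_pretrained(
    "src/models/T5/checkpoint-1000"  # hypothetical example path
)
```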
## Running on a cluster
Slurm scripts are provided for most of the experiments and tests in the [`/scripts`](scripts) folder.
The scripts ending in "_cl" are intended for execution on the Computerlinguistik cluster and the scripts ending in "_bwuni" are intended for execution on the BwUniCluster.
The scripts must be executed from the project's root folder.
### Example command
The following command submits the "NEC_evaluation" experiment, which computes NEC prediction accuracies for all implemented model and dataset combinations, to the Computerlinguistik cluster:
`sbatch scripts/NEC_eval_cl.sh`