This file explains the functionality of each script.
## Description
#### text_extraction.py
Downloads the Kaggle dataset of poems and uses the `wikipediaapi` package to extract the required Wikipedia articles. The BBC News articles are scraped. Only the first few sentences of each text are extracted, to keep the survey short.
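The truncation step could look roughly like the sketch below. It is not taken from `text_extraction.py`: the function name `first_sentences`, the default of three sentences and the naive punctuation-based split are all assumptions made for illustration.

```python
import re


def first_sentences(text: str, n: int = 3) -> str:
    """Return the first n sentences of text (hypothetical helper).

    Sentences are split naively after '.', '!' or '?' followed by
    whitespace, so abbreviations such as "Dr. Smith" are split too.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:n])


if __name__ == "__main__":
    print(first_sentences("One. Two! Three? Four.", n=2))
```

A proper sentence tokenizer would handle abbreviations better; the regular expression only keeps the sketch dependency-free.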
#### models.py
Contains the code that fine-tunes GPT-2 and OPT and prompts them for text generation.
#### compute_metrics.py
Contains the functionality to compute the four metrics (FRE, PMI, TF-IDF, TTR). PMI and TF-IDF are trained on the Poetry Foundation dataset, excluding the first 9 instances, which are used for the survey.
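As an illustration of two of these metrics, here is a minimal sketch. It assumes that TTR stands for type-token ratio and that PMI is estimated from adjacent word pairs; the tokenizer and the function names are hypothetical and may differ from what `compute_metrics.py` actually does.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    """Number of distinct tokens divided by the total number of tokens."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def pmi(word_a, word_b, corpus_tokens):
    """PMI of the adjacent pair (word_a, word_b), estimated from counts.

    Returns -inf when the pair never occurs in corpus_tokens.
    """
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    n_uni = len(corpus_tokens)
    n_bi = max(n_uni - 1, 1)
    p_ab = bigrams[(word_a, word_b)] / n_bi
    if p_ab == 0:
        return float("-inf")
    p_a = unigrams[word_a] / n_uni
    p_b = unigrams[word_b] / n_uni
    return math.log2(p_ab / (p_a * p_b))


if __name__ == "__main__":
    print(type_token_ratio("the cat and the dog"))
    print(pmi("a", "b", ["a", "b", "a", "b"]))
```

In this sketch the corpus passed to `pmi` would be the tokenized training portion of the poetry dataset, so the survey texts do not influence the estimated probabilities.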
#### automatic_prediciton.py
Contains the code to extract the required sentences and lines from the .txt files in the "Data" folder. It also assigns coherence, conciseness, creativity and clarity scores to each pair of texts; for each score, the code simply outputs "ai" or "human", whichever text scores higher. Finally, it predicts which of the two texts it would pick: an output of "human" means it would pick the human-written text.
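The comparison logic might be sketched as follows. The function names, the tie-breaking rule (ties go to "human") and the majority vote used for the final pick are assumptions for illustration only; the script may combine the scores differently.

```python
def higher_label(human_score, ai_score):
    """Return "ai" or "human" for the text with the higher score.

    Ties go to "human" (an arbitrary choice made for this sketch).
    """
    return "ai" if ai_score > human_score else "human"


def predict_pick(scores):
    """Pick a text by majority vote over the per-criterion labels.

    scores maps a criterion name to a (human_score, ai_score) tuple.
    """
    labels = [higher_label(h, a) for h, a in scores.values()]
    return "ai" if labels.count("ai") > labels.count("human") else "human"


if __name__ == "__main__":
    # Made-up scores, used only to show the call shape.
    example = {
        "coherence": (0.7, 0.5),
        "conciseness": (0.4, 0.6),
        "creativity": (0.9, 0.2),
        "clarity": (0.5, 0.5),
    }
    print(predict_pick(example))
```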
#### asses_results.py
Contains the code to extract the survey data, which is stored in a .csv file. Most of the code extracts answers from the .csv file, such as how many participants correctly identified the LLM text in Section 3 or how long each participant took to finish the survey.
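Counting correct guesses from such a file could be done along these lines. The column name `section3_guess`, the expected value and the sample rows are invented for this sketch and do not reflect the real survey export.

```python
import csv
import io


def count_correct(rows, answer_col, correct_value):
    """Count rows whose answer in answer_col matches correct_value.

    The comparison ignores case and surrounding whitespace.
    """
    target = correct_value.strip().lower()
    return sum(1 for row in rows if row[answer_col].strip().lower() == target)


if __name__ == "__main__":
    # Invented sample standing in for the survey .csv file.
    sample = io.StringIO(
        "participant,section3_guess\n"
        "p1,Text A\n"
        "p2,text b\n"
        "p3,Text A \n"
    )
    rows = list(csv.DictReader(sample))
    print(count_correct(rows, "section3_guess", "Text A"))
```

With a real file, `io.StringIO` would be replaced by `open(path, newline="")`, as the `csv` module documentation recommends.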
#### display_results.py
Uses the previous scripts to collect all the answers and outputs, and displays them using matplotlib.