Commit 87312bc0 authored by perov

add utility and code to assess results from survey

parent 9b280edb
Zeitstempel,Do you study computerlinguistics or work in a similar field?,How often do you use AIs?,What time is it now?,Which text was written by the LLM?,Spalte 5,Whats the specific reason you picked this text as LLM generated?,Rate Coherence [Text 1],Rate Coherence [Text 2],Rate Conciseness [Text 1],Rate Conciseness [Text 2],Rate Creativity [Text 1],Rate Creativity [Text 2],Which text was written by the LLM?,Spalte 14,Whats the specific reason you picked this text as LLM generated? ,Rate Coherence [Text 1],Rate Coherence [Text 2],Rate Conciseness [Text 1],Rate Conciseness [Text 2],Rate Creativity [Text 1],Rate Creativity [Text 2],Which text was written by the LLM?,Spalte 23,Whats the specific reason you picked this text as LLM generated? ,Rate Coherence [Text 1],Rate Coherence [Text 2],Rate Conciseness [Text 1],Rate Conciseness [Text 2],Rate Creativity [Text 1],Rate Creativity [Text 2],Which text was written by the LLM?,Spalte 32,Whats the specific reason you picked this text as LLM generated? ,Rate Coherence [Text 1],Rate Coherence [Text 2],Rate Conciseness [Text 1],Rate Conciseness [Text 2],Rate Creativity [Text 1],Rate Creativity [Text 2],Which text was written by the LLM?,Spalte 41,Whats the specific reason you picked this text as LLM generated? ,Rate Coherence [Text 1],Rate Coherence [Text 2],Rate Conciseness [Text 1],Rate Conciseness [Text 2],Rate Creativity [Text 1],Rate Creativity [Text 2],Which text was written by the LLM?,Spalte 50,Whats the specific reason you picked this text as LLM generated? ,Rate Coherence [Text 1],Rate Coherence [Text 2],Rate Conciseness [Text 1],Rate Conciseness [Text 2],Rate Creativity [Text 1],Rate Creativity [Text 2],Which text was written by the LLM?,Spalte 59,Whats the specific reason you picked this text as LLM generated? 
,Rate Coherence [Text 1],Rate Coherence [Text 2],Rate Conciseness [Text 1],Rate Conciseness [Text 2],Rate Creativity [Text 1],Rate Creativity [Text 2],Which text was written by the LLM?,Spalte 68,Whats the specific reason you picked this text as LLM generated? ,Rate Coherence [Text 1],Rate Coherence [Text 2],Rate Conciseness [Text 1],Rate Conciseness [Text 2],Rate Clarity of Concept [Text 1],Rate Clarity of Concept [Text 2],Which text was written by the LLM?,Spalte 77,Whats the specific reason you picked this text as LLM generated? ,Rate Coherence [Text 1],Rate Coherence [Text 2],Rate Conciseness [Text 1],Rate Conciseness [Text 2],Rate Clarity of Concept [Text 1],Rate Clarity of Concept [Text 2],Which text was written by the LLM?,Spalte 86,Whats the specific reason you picked this text as LLM generated? ,Rate Coherence [Text 1],Rate Coherence [Text 2],Rate Conciseness [Text 1],Rate Conciseness [Text 2],Rate Clarity of Concept [Text 1],Rate Clarity of Concept [Text 2],Which text was written by the LLM?,Spalte 95,Whats the specific reason you picked this text as LLM generated? ,Rate Coherence [Text 1],Rate Coherence [Text 2],Rate Conciseness [Text 1],Rate Conciseness [Text 2],Rate Clarity of Concept [Text 1],Rate Clarity of Concept [Text 2],Which text was written by the LLM?,Spalte 104,Whats the specific reason you picked this text as LLM generated? ,Rate Coherence [Text 1],Rate Coherence [Text 2],Rate Conciseness [Text 1],Rate Conciseness [Text 2],Rate Clarity of Concept [Text 1],Rate Clarity of Concept [Text 2],Which text was written by the LLM?,Spalte 113,Whats the specific reason you picked this text as LLM generated? , Rate Coherence [Text 1], Rate Coherence [Text 2],Rate Conciseness [Text 1],Rate Conciseness [Text 2],Which text was written by the LLM?,Spalte 120,Whats the specific reason you picked this text as LLM generated? 
,Rate Coherence [Text 1],Rate Coherence [Text 2],Rate Conciseness [Text 1],Rate Conciseness [Text 2],Which text was written by the LLM?,Spalte 127,Whats the specific reason you picked this text as LLM generated? ,Rate Coherence [Text 1],Rate Coherence [Text 2],Rate Conciseness [Text 1],Rate Conciseness [Text 2],Which text was written by the LLM?,Spalte 134,Whats the specific reason you picked this text as LLM generated? ,Rate Coherence [Text 1],Rate Coherence [Text 2],Rate Conciseness [Text 1],Rate Conciseness [Text 2],Which text was written by the LLM?,Spalte 141,Whats the specific reason you picked this text as LLM generated? ,Rate Coherence [Text 1],Rate Coherence [Text 2],Rate Conciseness [Text 1],Rate Conciseness [Text 2],Which text was written by the LLM?,Spalte 148,Whats the specific reason you picked this text as LLM generated? ,Rate Coherence [Text 1],Rate Coherence [Text 2],Rate Conciseness [Text 1],Rate Conciseness [Text 2],Please enter the time.,If you didnt enter the time in the beginning write below a rough estimate of how long it took you to complete this survey,I only ticked the first question
24.03.2025 09:01:36,Yes,Very often (everyday for all kind of things and tasks),08:28:00,Text 2,,Elmer’s glue is specific,3,3,2,3,4,2,Text 2,I picked randomly because they both look Human/AI,,3,3,3,3,3,2,Text 1,,weird theological view,3,3,2,3,2,4,Text 2,,coherence,3,1,3,2,3,3,Text 2,I picked randomly because they both look Human/AI,,3,3,3,3,2,3,Text 2,,creativity,3,3,3,3,3,2,Text 2,,,3,3,3,2,2,2,Text 1,,,3,3,2,3,3,3,Text 2,,,2,3,1,4,2,3,Text 1,I picked randomly because they both look Human/AI,,2,3,2,3,2,3,Text 1,,general,3,2,3,3,3,3,Text 2,I picked randomly because they both look Human/AI,,2,3,2,3,2,3,Text 1,,"“Wembley Stadium, London, England”",2,3,3,2,Text 2,I picked randomly because they both look Human/AI,,3,2,2,3,Text 1,I picked randomly because they both look Human/AI,,2,3,3,2,Text 2,I picked randomly because they both look Human/AI,,3,3,2,3,Text 1,I picked randomly because they both look Human/AI,,3,3,3,3,Text 2,I picked randomly because they both look Human/AI,,,,,,09:01:00,,
24.03.2025 09:23:18,No,Often,08:40:00,Text 1,,spelling mistake in 2nd Text. LLMs dont do that,1,4,1,3,5,2,Text 1,,"""or something"" is human. Not LLM",5,2,3,4,4,2,Text 2,,there is no continuity or rhytm while reading text 2,5,3,2,3,2,4,Text 1,,"the part ""(she did not know what realness meant)"" should be human",3,4,4,3,4,4,Text 2,,it gods from evolution to chevy trucks? no human would write like that.,5,5,2,3,3,5,Text 1,,"again, text 1 has no rhythm",2,5,2,3,2,4,Text 2,I picked randomly because they both look Human/AI,,5,5,5,4,1,1,Text 1,,a lot of history in explanation ->llm,5,5,3,5,3,3,Text 2,,narrator in text 1,4,5,2,4,5,4,Text 2,,"again, narrator in text 1",5,5,2,5,3,5,Text 1,,the sentence building reminds me of chatgpt,5,4,4,3,4,3,Text 2,,no narrator in text 2,5,5,2,4,5,5,Text 2,,LLMs do not give URLs,5,5,4,5,Text 1,I picked randomly because they both look Human/AI,,5,5,3,3,Text 2,,"""breathtaking game""",5,5,5,4,Text 2,,no emotions in text 2,5,3,3,5,Text 1,I picked randomly because they both look Human/AI,,4,5,2,5,Text 2,,weird sentences,5,3,4,3,09:23:00,,
24.03.2025 09:28:48,No,Sometimes,08:41:00,Text 1,,Its written more structured.,5,4,2,4,3,4,Text 1,,The second Poem feels more human.,5,3,4,3,5,2,Text 2,,"The first text has more thought in it, also it uses stylistic means, like 3x !.",5,2,4,3,5,2,Text 1,I picked randomly because they both look Human/AI,No clue.,4,4,4,2,4,3,Text 2,,The first text makes more sense to me.,5,2,4,3,5,1,Text 1,,Text 1 is very complex and doesnt really feel like a Poem,3,4,5,4,3,5,Text 2,,Its more precise,4,5,3,5,1,1,Text 1,,"Its more precise, the text is easier to read. The reading flows more.",5,4,5,5,3,2,Text 2,,"The firs text contains ""I am talking"". I dont feel.like AI would write that.",4,5,5,5,4,3,Text 2,,"Like the texts before. More precise and the first text contains ""Im""",3,5,4,5,4,3,Text 1,,The second text gives the vibe of a human talking to me.,4,5,4,4,4,5,Text 2,,Again feels like a human is talking to me.,5,4,4,5,4,2,Text 1,,The second could be on the Radio,3,4,5,5,Text 2,,The first one could be on the Radio. The second is more like a written summary,4,1,5,5,Text 1,I picked randomly because they both look Human/AI,No clue,5,5,5,5,Text 1,,The second one feels more natural.,4,5,5,5,Text 2,,The first Text feels less professional ,3,5,3,5,Text 2,,First feels more professional,4,2,5,3,09:28:00,,
24.03.2025 12:28:22,Yes,Very often (everyday for all kind of things and tasks),12:06:00,Text 2,I picked randomly because they both look Human/AI,More logic,1,5,1,5,5,5,Text 2,I picked randomly because they both look Human/AI,More logic,1,5,1,5,1,5,Text 1,,More creative ,5,5,5,5,5,4,Text 2,,More emotional ,,5,,5,,5,Text 1,I picked randomly because they both look Human/AI,More organized ,5,,5,,5,,Text 2,,More real,5,,,5,,5,Text 1,,More exact,5,4,5,4,2,3,Text 1,,,5,4,5,3,3,3,Text 2,I picked randomly because they both look Human/AI,,5,5,,,4,5,Text 2,,More exact ,5,5,5,5,,,Text 2,I picked randomly because they both look Human/AI,I cannot tell,5,5,5,5,5,5,Text 2,I picked randomly because they both look Human/AI,,5,5,5,5,5,5,Text 2,I picked randomly because they both look Human/AI,,,5,,5,Text 2,I picked randomly because they both look Human/AI,,,,,,Text 2,I picked randomly because they both look Human/AI,,5,,5,,Text 2,I picked randomly because they both look Human/AI,,,,,,Text 2,I picked randomly because they both look Human/AI,,,,,,Text 2,I picked randomly because they both look Human/AI,,,,,,12:40:00,0:30:00,
24.03.2025 13:56:48,No,Very often (everyday for all kind of things and tasks),13:12:00,Text 1,,,1,3,1,2,1,4,Text 1,,Because the other one sounds more human through phrases like”on his own time or something”,2,2,2,2,3,2,Text 2,I picked randomly because they both look Human/AI,,3,2,2,3,2,3,Text 1,,Text 2 sounds more personal/human,3,4,3,3,4,3,Text 2,,It doesn’t have the structure of a poem written by a human,2,1,4,2,3,2,Text 1,,,3,3,3,2,2,3,Text 2,I picked randomly because they both look Human/AI,,4,3,4,3,2,2,Text 1,,"It is very informative, but lacks on style",2,3,3,3,2,3,Text 2,,The author in Text 1 talks in a parentheses about himself (seems more human),3,4,3,4,3,4,Text 2,,,3,2,3,4,2,4,Text 1,,,3,2,3,1,4,4,Text 2,,Sounds more precise and to the point,2,3,1,2,2,2,Text 1,,,2,2,3,2,Text 1,,Doesn’t sound human,3,3,3,3,Text 1,,Text 2 is written way more emotional ,2,4,3,3,Text 2,,,3,,3,2,Text 2,I picked randomly because they both look Human/AI,,3,4,3,4,Text 2,I picked randomly because they both look Human/AI,,3,2,2,2,13:56:00,,
24.03.2025 15:59:53,No,Very often (everyday for all kind of things and tasks),14:18:00,Text 2,,I feel like i cant get the grasp of what the Text want to tell me by always changing what ist told,2,1,2,1,3,1,Text 2,,"Usage of "";"" and sentences fastly Changes from 1 thing to the other",5,3,5,2,4,2,Text 2,,Just because Text 1 used to many dots and always different amount and wouldnt use () to desribe Something.,5,3,2,3,3,3,Text 1,,Text 1 used to many complicated words. Text 2 hast some spelling mistakes wich an AI wouldnt do,4,2,4,3,3,3,Text 1,,Only used basic poem rules nothing spezial or hidden in the rhyme,3,5,3,5,3,5,Text 2,I picked randomly because they both look Human/AI,,3,5,4,5,4,3,Text 2,,Knows to much,5,5,4,5,1,1,Text 1,,Text 1 talks all around the topic,3,5,4,4,4,5,Text 2,,Text one ist much more personalyzed,5,5,3,4,5,5,Text 2,,Text 1 more Person to Person and Text 2 uses to many -,3,5,4,5,5,5,Text 1,,Text 1 Sound to bot like,4,5,4,4,3,3,Text 2,,Text mich more personalyzed and not Sure at something,5,5,4,4,3,3,Text 2,,Text 1 has No consistancy ( If it says Date at something it will Always say it),5,5,5,5,Text 2,,Text 1 Just seems more human,5,5,3,3,Text 2,,Too overly explained,5,5,5,3,Text 1,,,4,4,3,3,Text 1,,Tells it too story like,4,4,3,4,Text 2,,Too story like,4,4,3,2,14:23:00,1:25:00,No
24.03.2025 17:03:29,No,Often,16:28:00,Text 1,,it lacks emotion,1,4,1,4,2,5,Text 1,,,3,3,3,3,4,4,Text 1,I picked randomly because they both look Human/AI,,4,3,5,5,4,5,Text 2,,incoherent,4,1,4,1,3,3,Text 2,,words,5,2,4,3,3,2,Text 1,,lack of depth,3,5,3,5,3,2,Text 2,,using three things as an example,5,5,5,5,3,3,Text 1,,three phrases as an example,5,5,5,5,3,3,Text 2,,use of specific words,3,5,5,5,4,5,Text 1,I picked randomly because they both look Human/AI,,5,5,5,5,3,5,Text 1,,three things as an example,5,5,5,5,5,4,Text 2,,"""compare linked list""",5,5,5,5,5,4,Text 1,,,5,4,4,3,Text 2,,"""respectively"" is not of good use here",5,5,5,5,Text 2,,no one writes like this,5,5,5,5,Text 1,,,5,3,4,4,Text 1,,sounds like AI,5,5,5,5,Text 2,,,5,4,4,4,17:03:00,0:35:02,No
24.03.2025 19:27:20,No,Rarely,13:40:00,Text 2,,"Too generous, No original Work, very unspecific topic",1,5,1,5,5,1,Text 2,,again very generous and with No deeper Message,5,5,2,5,5,2,Text 1,,unplausible Change of topic,1,3,3,4,2,5,Text 2,,"Strange Change of topic, Strange use of emotional icons ",5,2,5,2,5,2,Text 2,,"The First Text has impure rhymes, AI Text has a Strange change",5,3,5,3,2,5,Text 1,,Rime,5,5,5,5,5,5,Text 1,,Not original ,5,5,5,5,1,5,Text 1,,"More datas, from Alm the WWW",5,5,5,5,5,5,Text 2,,Basic informations ,5,5,5,5,5,5,Text 2,,First Text has subtexts that are creative for better understanding,5,5,5,5,5,3,Text 1,,Basic informations ,4,4,4,4,4,4,Text 1,,Basic informations ,4,4,4,4,4,4,Text 2,,Very funktional Language ,4,4,4,4,Text 2,,Not good sentences ,4,4,4,4,Text 1,,Text 2 has Lots of creative adjectives,4,4,3,3,Text 2,,Again more creative adjectives ,4,4,4,4,Text 1,,Some sentences makes No sense,3,4,3,4,Text 2,,"Not very exciting, very basic",4,3,4,3,19:26:00,0:45:00,No
24.03.2025 22:56:47,No,Very often (everyday for all kind of things and tasks),22:22:00,Text 2,,Makes more sense than the text above. Maybe the 1st text is extra stupid made my human to irritate.,1,3,1,2,3,3,Text 1,,No sense..,1,3,1,3,2,2,Text 1,,Grandparents written 2 times,2,3,3,3,1,4,Text 1,,No sense again..,3,1,2,2,1,2,Text 2,,Both could be AI.. But the text is again nonsense,,1,,1,2,1,Text 1,I picked randomly because they both look Human/AI,,,,,,,,Text 2,,Precise answer,4,4,4,4,2,2,Text 1,,,4,4,4,4,4,4,Text 2,,Sounds like AI.,3,3,3,4,3,4,Text 2,,Written very flatly,2,5,3,5,2,5,Text 1,,,5,5,5,5,5,5,Text 2,,,3,5,2,5,3,5,Text 2,I picked randomly because they both look Human/AI,,,,,,Text 2,I picked randomly because they both look Human/AI,,,,,,Text 2,I picked randomly because they both look Human/AI,,,,,,Text 2,I picked randomly because they both look Human/AI,,,,,,Text 1,I picked randomly because they both look Human/AI,,,,,,Text 2,I picked randomly because they both look Human/AI,,,,,,22:55:00,,No
25.03.2025 14:01:28,No,Very often (everyday for all kind of things and tasks),11:34:00,Text 2,,It's too perfect,5,5,5,5,3,5,Text 2,,Too perfect without mistakes,4,4,4,5,5,5,Text 2,I picked randomly because they both look Human/AI,Difficult to say,4,4,4,4,4,4,Text 1,,It's too perfect. Humans are not perfect,3,5,3,4,5,5,Text 1,,Too good,5,3,5,3,5,3,Text 2,,,4,4,4,4,4,4,Text 2,,,,,,,,,Text 1,,,,,,,,,Text 2,,,,,,,,,Text 2,,,,,,,,,Text 1,,,,,,,,,Text 2,,,,,,,,,Text 2,,,,,,,Text 1,,,,,,,Text 2,,,,,,,Text 2,,,,,,,Text 2,,,,,,,Text 1,,,,,,,14:01:00,,Yes
25.03.2025 15:01:42,No,Often,14:33:00,Text 2,,"The other one is not a typical poem, which makes me think the LLM would not generate such a result",,,,,,,Text 2,,"The other one used the radiator as the lyrical me, which seems to creative for an LLM. Additionally I don't think ""..."" gets used often in human written poems, and in this survey it occurred in both poems that I think are generated. ",,,,,,,Text 1,,"It is so far off what a normal poem looks like, that I believe the LLM was instructed to do something special. Using a citation within a poem and stating the origin does not feel poem-y at all. ",,,,,,,Text 2,,The emojis,,,,,,,Text 1,,Emerald seas doesn't make sense to me,,,,,,,Text 1,,,,,,,,,Text 2,,,,,,,,,Text 2,,,,,,,,,Text 1,,,,,,,,,Text 2,,,,,,,,,Text 1,,,,,,,,,Text 2,,,,,,,,,Text 2,,,,,,,Text 2,,,,,,,Text 2,,,,,,,Text 2,I picked randomly because they both look Human/AI,,,,,,Text 2,,,,,,,Text 1,,,,,,,15:01:00,,Yes
25.03.2025 15:43:25,No,Sometimes,15:20:00,Text 2,,,2,4,5,3,4,2,Text 1,,,2,5,1,3,4,3,Text 2,,,4,1,2,3,3,5,Text 1,,,1,3,2,4,4,2,Text 1,,,3,1,5,1,4,4,Text 2,I picked randomly because they both look Human/AI,,2,4,3,3,4,4,Text 1,,,5,5,4,3,1,1,Text 1,,,5,4,3,3,5,4,Text 2,,,2,5,4,3,2,5,Text 2,,,3,5,4,3,3,5,Text 1,,,5,4,3,3,5,3,Text 2,,,5,3,3,4,5,2,Text 1,,,5,5,4,2,Text 1,I picked randomly because they both look Human/AI,,4,4,1,4,Text 1,,,5,5,4,3,Text 1,I picked randomly because they both look Human/AI,,5,5,3,5,Text 1,,,5,5,3,5,Text 1,,,5,3,4,2,15:45:00,,No
25.03.2025 17:40:16,No,Never,16:41:00,Text 2,,"It seems coherent but at the same time feels strange, while the first one seems like a kind of Ulysses by Joyce type text.",4,4,1,1,5,3,Text 2,I picked randomly because they both look Human/AI,"This time, I don't know",3,3,1,3,3,3,Text 1,,"the ""grandparents/ grandparents etc."". It doesn't make much sense to differentiate between the same word repeated.",1,4,2,1,3,5,Text 1,,"I don't know y. Gut feeling probably, might be, therefore, wrong.",2,2,1,1,4,4,Text 2,,"The rhymes I would say, they not always seems to fit completely. But I might be wrong, I probably am wrong",4,3,1,3,3,4,Text 1,,"I don't really know. The first one is very strange, the second one very basic. Could be either way.",3,5,2,1,3,3,Text 1,,"More straight to the point, less exciting/exicted",5,5,5,4,2,3,Text 1,,"In the second one there's the ""for example"". I don't know, seemed more human to me.",5,5,5,5,5,5,Text 2,,"Text one contains the ""and I'm not talking about...""",5,5,4,5,4,4,Text 1,,"The last question is not a question. Is a sentence with a question mark at the end, seemed strange to me.",4,5,4,5,4,5,Text 1,,The Second one seems the beginning of an essay type of text.,5,5,5,3,5,4,Text 2,I picked randomly because they both look Human/AI,"No, the first one is kinda crazy. The second one is full on basic.",2,5,1,5,2,5,Text 1,,Second text contains specific terms and thus seems more human to me.,5,5,5,5,Text 2,,"I don't know honestly. I didn't pick completely randomly, but almost",5,5,3,5,Text 1,,More coincise,5,5,4,3,Text 2,,"Again, more coincise",5,5,3,5,Text 2,,"Text 1 contains ""I"" and ""we"". I don't know how much AI likes to use pronouns this way.",5,5,4,5,Text 2,,"Seems strange. I have no Better explanation, unfortunately, but it seems strange to me",5,4,4,4,17:28:00,,No
25.03.2025 17:57:56,Yes,Often,17:46:00,Text 2,,,3,5,4,3,5,2,Text 2,,,5,3,4,4,5,2,Text 1,,,3,4,3,4,3,5,Text 1,,,4,4,4,3,4,3,Text 1,,,5,4,5,5,4,2,Text 1,,,3,4,3,5,3,5,Text 2,,,5,5,5,5,3,3,Text 1,,,5,5,5,4,5,4,Text 1,,,4,5,3,4,4,4,Text 2,,,5,5,4,5,5,5,Text 1,,,5,5,4,3,4,3,Text 2,,,5,5,3,5,4,5,Text 2,,,4,4,4,,Text 2,,,4,3,4,3,Text 1,I picked randomly because they both look Human/AI,,5,5,4,4,Text 2,,,4,4,4,4,Text 2,,,5,5,5,3,Text 2,,,5,5,5,4,18:00:00,,No
25.03.2025 19:26:11,Yes,Very often (everyday for all kind of things and tasks),18:50:00,Text 2,I picked randomly because they both look Human/AI,"I think it's quite difficult to determine which of the texts ist human vs AI generated. Artistic freedom allows many stylistic expressions. Even those that seem strange under normal circumstances, and may be considered AI-generated as a result of this. Hence, I would think that an LLM-generated poem with a prompt like ""generate a poem about xyz"" adheres to common poems it knows, but even the most gibberish could be a human product, though considered ""art"", because of artistic freedom. The first text forces more imagination while the second leads through the scene.",2,4,4,3,4,2,Text 1,,"While the second text is very understandable, I have no idea about the first one.",2,5,1,4,2,3,Text 1,,Bad punctuation,1,4,2,4,1,5,Text 2,,,,,,,,,Text 2,,,,,,,,,Text 1,,,,,,,,,Text 2,,"The second text seems LLM-generated because I could observe that the author of the second text tried to stuff more information into it, potentially trying to name as much facts as possible. It also has a very common structure of LLM-generated texts ",5,4,5,4,,,Text 1,,Text 1 has a common structure of LLM-generated explanations of technical terms,4,5,3,5,3,5,Text 1,,"Again, a common structure observed in LLM-generated texts is manifesting in text 1. Further more, text 1 seems a little off, i.e., wrong, and has Markdown in it … ",2,5,2,5,1,5,Text 1,I picked randomly because they both look Human/AI,Text 1 is so confusing that I suspect an LLM could not generate this nonsense,,,,,,,Text 1,,,,,,,,,Text 2,,Text 2 is very concise and coherent compared to text 1. I therefore believe an LLM would not generate such a bad answer as text 1,2,5,2,5,2,5,Text 1,,,,,,,Text 2,,,,,,,Text 1,,,,,,,Text 2,,,,,,,Text 1,,,,,,,Text 2,,,,,,,18:25:00,,Yes
25.03.2025 20:40:04,No,Very often (everyday for all kind of things and tasks),19:50:00,Text 2,,"Text 2 has overly strict conjunctions, which is very wordy and is not something a normal person would normally write.",1,4,4,1,4,2,Text 1,,"Text 1 has overly strict conjunctions, which is very wordy and is not something a normal person would normally write.In addition, according to my experience, LLM does not like to use...... ",4,2,4,3,2,2,Text 1,,"Feeling. The feeling of text 1 is strange, as a poem, it lacks human feelings",2,1,4,4,2,2,Text 2,,"Text 1 mentions ""black bat wings"" and ""something saturated from the ribs down"". These words are too obscure and meaningless, and AI usually does not generate such words.",1,1,2,3,4,2,Text 1,,"I can only rule out text 1 by realizing that text 2 was written by a person. The reason why text 2 was written by a human is the same as the previous question. The scene mentioned in text 1 is too classic, which may expose it, but this is not strong evidence.",4,1,1,3,1,4,Text 2,,Just like the answer to the previous question,2,4,3,2,4,1,Text 2,,"The writing style of text 2 is too similar to AI, and in my experience, when AI introduces something, it likes to mention its application in the last paragraph.",4,4,4,2,1,1,Text 1,,"The writing style of text 1 is too similar to AI, and in my experience, when AI introduces something, it likes to mention its application in the last paragraph.",4,4,3,3,1,1,Text 1,,Did you leave the“ * ”and“ / ”in text 1 on purpose?,4,3,1,3,4,3,Text 2,,"I firmly believe that AI will not use ""......""",4,3,2,3,1,3,Text 1,,"When AI describes a concept, it does not imagine that it is talking to you, and therefore does not mention ""you"".",4,3,4,3,4,2,Text 2,,"The answer is similar to the previous question. 
When AI describes a concept, it does not imagine itself talking to you, so there is no ""you"" or ""we"".",3,4,3,4,2,4,Text 1,I picked randomly because they both look Human/AI,,2,4,3,2,Text 2,,"The ""*"" in text2 looks like the one AI uses for line breaks.",4,3,4,2,Text 2,,The adjectives and tone used in text 2 to describe the game scene are too AI-like,4,4,4,2,Text 1,,The answer is the same as the previous one,4,2,3,4,Text 2,I picked randomly because they both look Human/AI,"The word ""we"" used in the second half of text 1 does not seem to be written by Ai",3,4,2,4,Text 1,,"The story in text 2 is very incomplete, and the names of the players are not mentioned, and the tone is also very strange, which is not the style of AI.",4,2,4,1,20:39:00,0:50:00,No
26.03.2025 08:13:22,Yes,Often,07:58:00,Text 2,,The language used in text 2 seems overly flowery,1,5,1,4,3,5,Text 1,,,,,,,,,Text 1,,,,,,,,,Text 1,,,,,,,,,Text 1,,,,,,,,,Text 2,,,,,,,,,Text 2,,,,,,,,,Text 1,,,,,,,,,Text 2,,,,,,,,,Text 2,,,,,,,,,Text 1,,,,,,,,,Text 2,,,,,,,,,Text 1,,,,,,,Text 1,,,,,,,Text 1,,,,,,,Text 2,,,,,,,Text 1,,,,,,,Text 1,,,,,,,08:15:00,,Yes
26.03.2025 17:33:57,Yes,Very often (everyday for all kind of things and tasks),22:30:00,Text 1,,It sounds unartistic,3,4,5,3,5,2,Text 1,,Syntax errors and weird picture overall,2,4,2,3,4,3,Text 2,,"Eric told me so. (jokes aside, the selected text sounds much more precise in its vocabulary use)",3,4,2,4,1,5,Text 2,,it sounds as if an LLM was prompted to artificially and overtly mimic human day-to-day speech/texting,4,3,3,1,4,1,Text 2,,"The second text is horrible, while the first sounds fucking awesome",5,2,5,2,5,2,Text 1,,Unidiomatic and psychologically unfounded--a paradigm instance of fabrication,2,5,4,2,5,3,Text 1,,this sounds much more precise in the choice of words: N-Gram-Style! Heck-yeah.,5,4,5,5,1,1,Text 1,,More precise and smooth in choice of consecutive words.,5,5,5,5,5,4,Text 1,,This text was either written bei a degenerate or by an AI that attempts to mimic informal speech but lacking the prerequisite data,2,4,1,4,2,5,Text 1,,"This again is like a concoction of random, informal patterns of speech which are assembled in an unknowledgable order",3,5,4,5,3,5,Text 2,,Ibid.,4,3,5,4,5,4,Text 1,,grammatical and syntactical errors that no human would make,4,4,3,4,4,4,Text 2,,first text includes BBC links,3,3,4,4,Text 2,,,,,,,Text 2,,,,,,,Text 2,,,,,,,Text 2,,,,,,,Text 2,,,,,,,17:33:00,0:30:00,No
\ No newline at end of file
from pathlib import Path
import csv
from datetime import datetime, timedelta


def get_all_data_from_folder(foldername, datatype="txt"):
    """Read the first matching file in the given folder and return its rows."""
    script_dir = Path(__file__).resolve().parent
    data_dir = script_dir.parent / foldername
    files = list(data_dir.rglob(f"*.{datatype}"))
    if not files:
        raise FileNotFoundError(f"no *.{datatype} files found in {data_dir}")
    answers = []
    # only gets first file from files
    # newline='' lets the csv module handle line breaks inside quoted fields
    with open(files[0], 'r', encoding='utf-8', newline='') as file:
        reader = csv.reader(file)
        for row in reader:
            answers.append(row)
    return answers


def process_survey_data(headers, answers):
    """Group the answers by column; repeated columns get a "/<text pair number>" suffix."""
    survey_group_answers = {}
    for responses in answers:
        question_num = 0
        for idx, question in enumerate(headers):
            if "Which text was" in question:
                question_num += 1
            # columns 4..153 repeat for every text pair, so the pair number keeps the keys unique
            key = f"{question}/{question_num}" if 3 < idx < 154 else question
            survey_group_answers.setdefault(key, []).append(responses[idx])
    return survey_group_answers


class Proccess_Data(object):
    def __init__(self, correct_labels, survey_data, models):
        self.correct_labels = correct_labels
        self.survey_data = survey_data
        self.models = models

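    @staticmethod
    def calculate_total_time(time_start, time_end):
        """Return (average duration in minutes, {respondent number (1-based): minutes})."""
        total_time = timedelta()
        user_time = {}
        for num, (start, end) in enumerate(zip(time_start, time_end), start=1):
            try:
                start_time = datetime.strptime(start, "%H:%M:%S")
                end_time = datetime.strptime(end, "%H:%M:%S")
            except ValueError:
                continue  # missing or malformed time entry
            if end_time < start_time:
                continue  # implausible entry: end before start
            time = end_time - start_time
            if time.total_seconds() / 60 < 180:  # ignore durations of three hours or more
                total_time += time
                user_time[num] = time.total_seconds() // 60
        if not user_time:
            return 0.0, user_time
        # average over the respondents that were actually counted, not over all rows
        return round((total_time.total_seconds() / 60) / len(user_time), 2), user_time
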
    def calculate_correct_answers(self, indicies_flag=False, indicies=None):
        """Count correct LLM guesses per text pair; key 0 holds the overall totals."""
        correct_total = 0
        incorrect_total = 0
        results = {}
        # iterate over the text pairs (one correct label each), not over the respondents
        for num in range(1, len(self.correct_labels) + 1):
            all_answers = self.survey_data[f'Which text was written by the LLM?/{num}']
            correct_group = 0
            incorrect_group = 0
            for idx, single_answer in enumerate(all_answers):
                if indicies_flag and idx not in indicies:
                    continue  # respondent is not part of the selected subgroup
                if single_answer == self.correct_labels[num - 1]:
                    correct_group += 1
                else:
                    incorrect_group += 1
            total_group = correct_group + incorrect_group
            correct_total += correct_group
            incorrect_total += incorrect_group
            if total_group > 0:
                results[num] = [correct_group, incorrect_group, round(correct_group / total_group, 2)]
            # else:
            #     results[num] = [correct_group, incorrect_group, 0]
        total = correct_total + incorrect_total
        # guard against an empty subgroup, which would otherwise divide by zero
        results[0] = [correct_total, incorrect_total, round(correct_total / total, 2) if total else 0]
        return results  # {survey_num: [correct, false, correct_percentage]}

    def compare_groups(self, group):
        """Re-run calculate_correct_answers for one subgroup: "expert", "ai_usage" or "time"."""
        if group == "expert":
            output = self.survey_data['Do you study computerlinguistics or work in a similar field?']
            yes_indices = [idx for idx, answer in enumerate(output) if answer == "Yes"]
            return self.calculate_correct_answers(indicies_flag=True, indicies=yes_indices)
        if group == "ai_usage":
            output = self.survey_data['How often do you use AIs?']
            very_often_indices = [
                idx for idx, answer in enumerate(output)
                if answer == 'Very often (everyday for all kind of things and tasks)'
            ]
            return self.calculate_correct_answers(indicies_flag=True, indicies=very_often_indices)
        if group == "time":
            time_start = self.survey_data['What time is it now?']
            time_end = self.survey_data['Please enter the time.']
            avg_time, user_time = self.calculate_total_time(time_start, time_end)
            above_avg = []
            below_avg = []
            for respondent, minutes in user_time.items():
                # calculate_total_time numbers respondents from 1; answer lists are indexed from 0
                if minutes > avg_time + 10:
                    above_avg.append(respondent - 1)
                if minutes < avg_time - 10:
                    below_avg.append(respondent - 1)
            above_avg_indicies = self.calculate_correct_answers(indicies_flag=True, indicies=above_avg)
            below_avg_indicies = self.calculate_correct_answers(indicies_flag=True, indicies=below_avg)
            return above_avg_indicies, below_avg_indicies
        raise ValueError(f"unknown group: {group!r}")

    def compare_ai(self, correct_percentage):
        """Average the per-pair share of correct answers for each model."""
        totals = {"gpt2": 0, "opt": 0, "gpt4o": 0}
        counts = {"gpt2": 0, "opt": 0, "gpt4o": 0}
        for ai, correctness in zip(self.models, correct_percentage):
            if ai in totals:
                totals[ai] += correctness
                counts[ai] += 1
        # a model without any text pair yields 0 instead of a division by zero
        return {ai: round(totals[ai] / counts[ai], 2) if counts[ai] else 0 for ai in totals}

    def average_parameter(self):
        """Looks at the parameters like coherence, conciseness, creativity and clarity of concept and calculates the average."""
        pass


if __name__ == "__main__":
    answers = get_all_data_from_folder("results", "csv")
    headers = answers[0]
    only_answers = answers[1:]
    survey_data = process_survey_data(headers, only_answers)
    # print(survey_data.keys())

    # correct answers for each survey group (LLM generated text is the correct answer)
    correct_text = ["Text 2", "Text 2", "Text 1", "Text 2", "Text 1", "Text 2",
                    "Text 2", "Text 2", "Text 1", "Text 1", "Text 2", "Text 1",
                    "Text 1", "Text 2", "Text 2", "Text 1", "Text 1", "Text 2"]
    models = ["gpt2", "gpt2", "opt", "opt", "gpt4o", "gpt4o",
              "gpt4o", "gpt4o", "opt", "opt", "gpt2", "gpt2",
              "opt", "opt", "gpt4o", "gpt4o", "gpt2", "gpt2"]
    evaluator = Proccess_Data(correct_text, survey_data, models)

    # total_correct = evaluator.calculate_correct_answers()
    # expert_group = evaluator.compare_groups("expert")
    # ai_usage_group = evaluator.compare_groups("ai_usage")
    # time_group = evaluator.compare_groups("time")
    # correct_percentage = [i[2] for i in total_correct.values()]  # extracts average percentage of correct answers
    # model_results = evaluator.compare_ai(correct_percentage)
    test = evaluator.average_parameter()
    print(test)
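
Note on average_parameter: in this commit the method body is only `pass`, so the script prints None. Below is a minimal sketch of how the averaging described in its docstring might look. It assumes the "Rate <parameter> [Text N]/<pair number>" keys that process_survey_data builds, and ratings stored as digit strings with blanks for skipped answers. The helper name average_ratings, the regular expression and the sample data are illustrative assumptions, not part of the commit.

```python
import re
from collections import defaultdict

# Assumed key shape, e.g. "Rate Coherence [Text 1]/3"; one header carries a leading space.
RATING_KEY = re.compile(r"^\s*Rate (?P<parameter>.+?) \[(?P<text>Text \d)\]/(?P<group>\d+)$")


def average_ratings(survey_data):
    """Return {parameter: {text: mean rating}}, skipping blank or non-numeric cells."""
    collected = defaultdict(lambda: defaultdict(list))
    for key, values in survey_data.items():
        match = RATING_KEY.match(key)
        if match is None:
            continue  # not a rating column
        for value in values:
            value = value.strip()
            if value.isdigit():
                collected[match["parameter"]][match["text"]].append(int(value))
    return {
        parameter: {text: round(sum(ratings) / len(ratings), 2) for text, ratings in by_text.items()}
        for parameter, by_text in collected.items()
    }


# Hypothetical miniature of the grouped survey data, for illustration only.
sample = {
    "Rate Coherence [Text 1]/1": ["3", "1", ""],
    "Rate Coherence [Text 2]/1": ["3", "4", "5"],
    " Rate Coherence [Text 1]/13": ["5"],
    "Which text was written by the LLM?/1": ["Text 2", "Text 1", "Text 1"],
}
print(average_ratings(sample))  # {'Coherence': {'Text 1': 3.0, 'Text 2': 4.0}}
```

This pools the ratings over all text pairs. A per-model or human-versus-LLM breakdown would additionally need the correct_text and models lists from the main block to decide which of the two texts in each pair is the generated one.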