hillengass / SynDRA

Commit f8b0b084 authored 1 year ago by pracht
Annotation script
parent 3e11de63
Changes: 1 changed file

elise/elise/mwoz_annotation.py (new file, mode 100644): 70 additions, 0 deletions
```python
import json
import re

import requests
from outlines import prompt

from .schema import MultiWOZ

### Prompt

@prompt
def llama_prompt_no_schema(dialog):
    """
    <s>[INST] <<SYS>>
    You are a helpful annotator. You read the text carefully and annotate all valid fields in the schema.
    Make sure to only annotate attractions like museums, clubs or other tourist attractions as such.
    If you are not sure about an annotation, you should annotate None instead.
    <</SYS>>
    {{dialog}} [/INST]
    """
```
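The template follows the Llama 2 chat format: the system instructions sit between `<<SYS>>` and `<</SYS>>` inside an `[INST] ... [/INST]` block, and the model's answer begins after `[/INST]`, which is why the replies are later split on that marker. The sketch below is a stdlib-only approximation of the rendered string, assuming outlines dedents the docstring; exact whitespace may differ, and `build_llama2_prompt` is a hypothetical helper, not part of outlines or this project.

```python
def build_llama2_prompt(system: str, user: str) -> str:
    """Wrap a system message and a user turn in Llama 2 chat markers.

    Hypothetical helper approximating what llama_prompt_no_schema renders;
    the exact whitespace produced by outlines may differ.
    """
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n{user} [/INST]"


if __name__ == "__main__":
    text = build_llama2_prompt(
        "You are a helpful annotator.",
        "USER: I need a cheap hotel in the north.",
    )
    print(text)
    # The prompt contains exactly one "[/INST]", at its very end.
    print(text.count("[/INST]"))
```

Because the marker appears only once in the rendered prompt, anything after it in an echoed reply is model output, as long as the dialogue text itself never contains `[/INST]`.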
```python
### Requests

def main():
    # Read dialogue data; only the first element of each record's "dialogue" field is used
    with open("elise/output_dialogues_50.json", "r") as file:
        data = json.load(file)
    dialogues = [d["dialogue"][0] for d in data]

    # Request annotations
    for dia in dialogues:
        # Named dialog_prompt so it does not shadow the imported outlines `prompt` decorator
        dialog_prompt = llama_prompt_no_schema(dia)
        # Send request to vLLM server; "schema" asks for output constrained to the MultiWOZ JSON schema
        response = requests.post(
            "http://localhost:8000/generate",
            json={
                "prompt": dialog_prompt,
                "schema": MultiWOZ.model_json_schema(),
                "max_tokens": 1024,  # Find reasonable limit
                "n": 1,
            },
        )
        response.raise_for_status()

        # The returned text includes the prompt, so keep only the part after "[/INST]"
        for reply in response.json()["text"]:
            annotation = reply.split("[/INST]")[1]
            # Clean up the whitespace left by some erroneous generations
            annotation = annotation.replace("\n", "")
            annotation = re.sub(r"\s+", " ", annotation)

            with open("output_annotations.txt", "a") as file:
                # Catch annotation errors like invalid JSON.
                # Most of the time only a closing bracket is missing.
                try:
                    annotation_json = json.loads(annotation)
                    file.write(json.dumps(annotation_json) + "\n")
                except json.JSONDecodeError:
                    annotation = annotation + "}"
                    try:
                        annotation_json = json.loads(annotation)
                        file.write(json.dumps(annotation_json) + "\n")
                    except json.JSONDecodeError:
                        file.write(f"PARSING ERROR: {annotation}\n")


if __name__ == "__main__":
    main()
```
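The recovery step in `main()` retries exactly once after appending a single `}`. A somewhat more general variant, sketched here as an assumption rather than the author's method, closes every `{` or `[` still open (ignoring brackets inside JSON strings) and strips the exact echoed prompt instead of splitting on `[/INST]`, which would misfire if a dialogue contained that marker. All function names below are hypothetical.

```python
import json
import re
from typing import Any, List, Optional


def extract_completion(reply: str, prompt: str, marker: str = "[/INST]") -> str:
    """Return the generated text that follows the prompt in a server reply."""
    if reply.startswith(prompt):
        # Exact prefix strip: safe even if the dialogue itself contains the marker.
        return reply[len(prompt):]
    # Fallback close to the original script: everything after the first marker.
    return reply.split(marker, 1)[1] if marker in reply else reply


def close_open_brackets(text: str) -> str:
    """Append closers for any '{' or '[' left open, skipping JSON strings."""
    closers: List[str] = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers:
            closers.pop()
    # Close an unterminated string first, then brackets innermost-first.
    return text + ('"' if in_string else "") + "".join(reversed(closers))


def parse_annotation(raw: str) -> Optional[Any]:
    """Normalize whitespace as the script does, then parse, repairing once."""
    text = re.sub(r"\s+", " ", raw.replace("\n", ""))
    for candidate in (text, close_open_brackets(text)):
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    return None
```

Inside the reply loop, calling `parse_annotation` on `extract_completion(reply, ...)` with the rendered prompt would replace the nested `try` blocks, and a `None` result would be written as a `PARSING ERROR` line. Closing brackets in stack order keeps nested truncations (for example a cut-off list inside an object) repairable, which a single appended `}` cannot handle.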