Hi,
Like #2, I also tried to reproduce the results of the FED paper using the FED data (http://shikib.com/fed_data.json).
However, I couldn't obtain the same results as the paper.
- Average scores of annotators
By applying the data processing method described in the paper, I could only reproduce similar results for the dialog-level evaluation, not the turn-level evaluation. How can I reproduce the turn-level results?
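For reference, here is a minimal sketch of how I split the data and average the annotator scores. The field names (`"response"`, `"annotations"`) and the assumption that turn-level entries carry a `"response"` key while dialog-level entries don't are based on my reading of fed_data.json, so please correct me if the repository processes it differently:

```python
import statistics

def split_levels(items):
    # Assumption: turn-level entries have a "response" field,
    # dialog-level entries do not.
    turn = [it for it in items if "response" in it]
    dialog = [it for it in items if "response" not in it]
    return turn, dialog

def mean_annotator_score(item, quality):
    # Average the per-annotator scores for one quality dimension.
    return statistics.mean(item["annotations"][quality])

# Tiny synthetic example in the same assumed shape as fed_data.json:
items = [
    {"context": "Hi", "response": "Hello!",
     "annotations": {"Relevant": [2, 2, 1]}},
    {"context": "Hi\nHello!",
     "annotations": {"Coherent": [3, 2, 2]}},
]
turn, dialog = split_levels(items)
print(len(turn), len(dialog))                               # 1 1
print(round(mean_annotator_score(turn[0], "Relevant"), 3))  # 1.667
```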
- Correlation between Follow-up Utterance (FU) scores and average annotator scores
I also calculated the correlation between FU scores and the average scores from the human evaluation. I obtained the FU scores with the DialoGPT (large) model, following the guidance in the README file (i.e., preprocessing the inputs and using the FED module).
However, the correlation results were totally different from the paper. I wonder if the FU scores in the paper were calculated in the same way as in this repository. How can I reproduce the same correlation results?
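To rule out the correlation computation itself as the source of the discrepancy, this is the calculation I'm using. I'm assuming the paper reports Spearman correlation (i.e., Pearson correlation of the ranks); `rankdata` and `spearman` are my own helpers, and the input arrays below are hypothetical:

```python
from statistics import mean

def rankdata(xs):
    # 1-based ranks; ties receive the average of their positions.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman(a, b):
    # Spearman = Pearson correlation computed on the ranks.
    ra, rb = rankdata(a), rankdata(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)

fu_scores = [0.12, 0.40, 0.33, 0.55]  # hypothetical FU scores
human = [1.0, 2.5, 2.0, 3.0]          # hypothetical mean annotator scores
print(spearman(fu_scores, human))     # ≈ 1.0 (orderings fully agree)
```

If the paper's numbers were produced with a different coefficient (e.g., Pearson on raw scores) or a different per-quality aggregation, that alone could explain a large gap.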
- (Dialog-level) FU scores and annotator evaluations that I've obtained: Calcuated_results.zip