
I can't reproduce the same results in the paper #3

@DSincerity

Description

Hi,
Like #2, I also tried to reproduce the FED paper's results using the released FED data (http://shikib.com/fed_data.json), but I couldn't obtain the same results as reported in the paper.

  1. Average scores of annotators.
    Applying the data processing described in the paper, I could reproduce similar results only for the dialog-level evaluation, not the turn-level one. How can I reproduce the turn-level results?
  • Avg. scores in the paper: [image]

  • Avg. scores in the FED data: [image]
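For reference, this is how I aggregate the annotator scores (a minimal sketch; the field names are my reading of fed_data.json — I assume turn-level entries carry a "response" key, dialog-level entries do not, and "annotations" maps each quality to a list of per-annotator scores):

```python
import json
from statistics import mean

def average_scores(data, quality, turn_level):
    """Average the per-annotator scores for one quality over all
    turn-level or dialog-level entries in the loaded FED data."""
    per_item = []
    for item in data:
        is_turn = "response" in item  # assumed convention in fed_data.json
        if is_turn != turn_level:
            continue
        scores = item.get("annotations", {}).get(quality)
        if scores:
            # first average the annotators for this item
            per_item.append(mean(scores))
    # then average across items
    return mean(per_item)

# Usage with the released data:
# data = json.load(open("fed_data.json"))
# print(average_scores(data, "Interesting", turn_level=True))
```

If the paper averaged across all raw scores rather than per-item means first, the numbers would differ slightly; that alone might explain part of the gap.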

  2. Correlation between follow-up utterance (FU) scores and annotators' average scores.
    I also calculated the correlation between FU scores and the averaged human scores. I obtained the FU scores with the DialoGPT (large) model, following the README (i.e., preprocessing the inputs and using the FED module).
    However, the correlations were totally different from those in the paper. Were the FU scores in the paper calculated in the same way as in this repository? How can I reproduce the reported correlations?
  • Correlation in the paper: [image]

  • Reproduced correlation: [image]

  • (Dialog-level) FU scores and annotator evaluations that I obtained:
    Calcuated_results.zip
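In case the discrepancy comes from the correlation metric itself: the paper reports Spearman correlation, which I compute as the Pearson correlation of rank vectors. A self-contained sketch (no SciPy dependency; `x` and `y` stand for the per-dialog FU scores and averaged annotator scores):

```python
def ranks(xs):
    """1-based average ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rho = Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```

If the paper used Pearson instead of Spearman (or vice versa), the two can disagree substantially, so it would help to know which one the reported tables use.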
