
Cannot interpret fed results #4

@Youmna-H

Description


Hello,
I've run the example provided in fed_demo.py and I am having difficulty interpreting the results. The scores I got are:

{'interesting': -0.28983132044474313, 'engaging': -0.40840943654378226, 'specific': -0.22960980733235647, 'relevant': 7.028880596160889, 'correct': 7.123448371887207, 'semantically appropriate': 0.2597320079803467, 'understandable': 0.21886169910430908, 'fluent': 0.23042782147725394, 'coherent': 7.030221144358317, 'error recovery': 6.849422454833984, 'consistent': 7.3398823738098145, 'diverse': 7.251625696818034, 'depth': 7.140579700469971, 'likeable': -0.23120896021525006, 'understand': 7.056127548217773, 'flexible': -0.09475564956665039, 'informative': -0.16989962259928415, 'inquisitive': -0.34922027587890625}

In the paper, all the scores are in the ranges 1-3, 0-1, or 1-5. For the scores above, I don't know what the upper and lower bounds are. Also, in fed.py the score is calculated by: scores[metric] = (low_score - high_score)
Does this mean that a negative score is a good thing? Please advise on how to interpret these scores.
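Since scores[metric] = (low_score - high_score) is an unbounded difference of model likelihoods, one workaround (an assumption on my part, not something from the paper) is to treat the raw values as only relatively meaningful and min-max normalize each metric across several dialogs, so every metric lands on a common 0-1 scale. The helper below is a hypothetical sketch of that idea; normalize_per_metric is not part of fed.py:

```python
# Hypothetical helper (not part of fed.py): min-max normalize FED-style
# scores to [0, 1] per metric across a set of dialogs. This assumes that,
# within a single metric, a larger (low_score - high_score) difference
# means a better rating; the raw values have no fixed bounds.

def normalize_per_metric(score_dicts):
    """Min-max normalize each metric across several dialogs' score dicts."""
    metrics = score_dicts[0].keys()
    normalized = [dict(d) for d in score_dicts]
    for m in metrics:
        vals = [d[m] for d in score_dicts]
        lo, hi = min(vals), max(vals)
        span = (hi - lo) or 1.0  # avoid division by zero if all values are equal
        for d in normalized:
            d[m] = (d[m] - lo) / span
    return normalized

# Example with two dialogs' (truncated) score dicts:
dialog_a = {"interesting": -0.29, "relevant": 7.03}
dialog_b = {"interesting": 0.10, "relevant": 6.50}
norm_a, norm_b = normalize_per_metric([dialog_a, dialog_b])
print(norm_a)  # {'interesting': 0.0, 'relevant': 1.0}
```

With this view, a negative raw score is not necessarily bad on its own; what matters is how it compares to the same metric computed on other dialogs.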
