Different results with different tfkit version

Hi, I'm trying to reproduce your fantastic results with the BART model. I'm using the trained model you provided:
https://github.com/voidful/BDG/releases/download/v2.0/BDG_ANPM.pt

When I use tfkit==0.7.0 (the version suggested by the README), I get this result:
`{'Bleu_1': 0.4116063603355367, 'Bleu_2': 0.2629480211200134, 'Bleu_3': 0.19128546675900487, 'Bleu_4': 0.1484759134861437, 'ROUGE_L': 0.2184638476496905, 'CIDEr': 0.07954905358236805}`
The ROUGE_L value is much lower than the reported one, while the BLEU values are similar to the reported ones. Evaluation takes about half an hour.

However, when I use tfkit==0.8.1 (the latest version), I get this result:
`'Bleu_1': 0.40226892712763984, 'Bleu_2': 0.2566475644205321, 'Bleu_3': 0.18535836171285228, 'Bleu_4': 0.14348238003117275, 'ROUGE_L': 0.3556143135035776, 'CIDEr': 0.6532226297900213`
These values are similar to the reported ones, but evaluation takes much longer (about 2.5 hours) on the same GPU, and tqdm doesn't show a progress bar.
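
To make the gap between the two runs concrete, here is a minimal sketch that computes the relative change of each metric from the tfkit==0.7.0 run to the tfkit==0.8.1 run. The numbers are copied from the outputs above; the names `run_070`, `run_081`, and `relative_change` are my own and are not part of tfkit.

```python
# Scores copied from the two evaluation outputs in this issue.
run_070 = {'Bleu_1': 0.4116063603355367, 'Bleu_2': 0.2629480211200134,
           'Bleu_3': 0.19128546675900487, 'Bleu_4': 0.1484759134861437,
           'ROUGE_L': 0.2184638476496905, 'CIDEr': 0.07954905358236805}
run_081 = {'Bleu_1': 0.40226892712763984, 'Bleu_2': 0.2566475644205321,
           'Bleu_3': 0.18535836171285228, 'Bleu_4': 0.14348238003117275,
           'ROUGE_L': 0.3556143135035776, 'CIDEr': 0.6532226297900213}


def relative_change(old, new):
    """Return (new - old) / old for every metric present in both runs."""
    return {name: (new[name] - old[name]) / old[name]
            for name in old if name in new}


changes = relative_change(run_070, run_081)

# Print the metrics with the largest absolute relative change first.
for name, change in sorted(changes.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:8s} {run_070[name]:.4f} -> {run_081[name]:.4f} ({change:+.1%})")
```

The BLEU scores move by only a few percent, while ROUGE_L and CIDEr change by a much larger factor, so the discrepancy seems to be specific to those two metrics.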

Why do different tfkit versions produce different results and such different evaluation times? Which version should I use?
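
In case it helps with reproducing this, below is a small sketch of how I would confirm which tfkit version is actually installed in each environment before running the evaluation. It uses only the standard library; the helper name `installed_version` is my own, and I am assuming the package is installed under the distribution name `tfkit`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumption: the distribution name on PyPI is "tfkit".
print("tfkit version:", installed_version("tfkit"))
```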
Thank you very much!
