Discrepancy of 0.04% in ActivityNet-QA evaluation code #28

@israwal

Description

Hi, I am consistently finding a difference of -0.04% in the reported performance on the ActivityNet-QA dataset when using the official evaluation code (https://github.com/MILVLG/activitynet-qa).
Replicating the results on ActivityNet-QA:

  1. Accuracy from the Singularity evaluation code (Singularity-Temporal, n=12 frames, num_temporal_layers=2, ckpt: ft_anet_qa_singularity_temporal_17m.pth): 44.01%
  2. Accuracy from the official ActivityNet-QA evaluation code: 43.97%

Bonus: the official ActivityNet-QA evaluation code also reports accuracy per question sub-type (see the sketch after the breakdown below) :)
Accuracy (per question type):
Motion: 32.2500%
Spatial Relation: 22.6250%
Temporal Relation: 4.1250%
Free: 75.7523%
All: 43.9750%
Accuracy of the Free type questions (per answer type):
Yes/No: 75.1194%
Color: 51.3630%
Object: 27.6730%
Location: 39.8964%
Number: 54.4554%
Other: 36.2241%
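
For anyone replicating the breakdown above: the per-type numbers come from bucketing question ids by their annotated type before scoring each bucket. A minimal sketch of that idea (names and data are illustrative; this is not the MILVLG script itself):

```python
from collections import defaultdict

def accuracy_by_type(preds, gts, qtypes):
    # Group question ids by their annotated type, then score each group
    # with plain exact-match accuracy. A missing prediction counts as wrong.
    buckets = defaultdict(list)
    for qid, qtype in qtypes.items():
        buckets[qtype].append(qid)
    return {
        t: 100.0 * sum(preds.get(q) == gts[q] for q in qids) / len(qids)
        for t, qids in buckets.items()
    }

# Toy example (hypothetical ids, answers, and type labels).
preds  = {"q1": "yes", "q2": "run"}
gts    = {"q1": "yes", "q2": "walk"}
qtypes = {"q1": "yes_no", "q2": "motion"}
print(accuracy_by_type(preds, gts, qtypes))  # {'yes_no': 100.0, 'motion': 0.0}
```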

P.S.: The -0.04% difference is consistent across all my experiments on ActivityNet-QA.
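
For context, ActivityNet-QA's test split has 8,000 questions (800 videos × 10 questions), so a consistent 0.04% gap corresponds to just over 3 questions being scored differently. One plausible cause (a guess on my part, not verified against either codebase) is a small mismatch in answer normalization, or in how missing predictions are counted, between the two scripts. A minimal, self-contained sketch with hypothetical names showing how either knob can flip a handful of samples:

```python
# Hypothetical sketch: two common reasons evaluation scripts disagree by a
# few samples. Not taken from either repo's actual code.

def normalize(ans: str) -> str:
    # Lowercase and strip surrounding whitespace before exact match.
    return ans.strip().lower()

def accuracy(preds: dict, gts: dict, normalize_fn=None) -> float:
    correct = 0
    for qid, gt in gts.items():
        pred = preds.get(qid, "")  # a missing prediction counts as wrong here
        if normalize_fn is not None:
            pred, gt = normalize_fn(pred), normalize_fn(gt)
        correct += int(pred == gt)
    return 100.0 * correct / len(gts)

# Toy example: the same predictions score differently under the two settings.
preds = {"q1": "Yes", "q2": "red ", "q3": "two"}
gts   = {"q1": "yes", "q2": "red",  "q3": "2"}
print(f"raw exact match: {accuracy(preds, gts):.4f}%")             # 0.0000%
print(f"normalized:      {accuracy(preds, gts, normalize):.4f}%")  # 66.6667%
```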

Thanks in advance!
