Hi, I followed the provided steps to test your code, but I wasn't able to reproduce the metrics reported in the paper. I suspect the discrepancy comes from differences in how the frames were extracted from the HMDB and UCF datasets.
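Could you share the exact code or settings used for frame extraction (for example, the sampling frame rate, output resolution, and image format)? That would make it possible to rule out preprocessing as the cause and would help with reproducibility.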
Thanks in advance!