Dear Dr. Reuben,
I am part of the same reproduction team as @whalekeykeeper. It is a pleasure to make your acquaintance, and thank you for your support of our project.
As a team member, I am specifically trying to recreate the evaluation script from the original paper. You describe two models as necessary for this step: the main model, which generates the captions, and a second model, trained on different data, which informs a listener that identifies the target image of a caption from among a group of distractors.
In the paper, you write as follows:
We train our production and evaluation models on separate sets consisting of regions in the Visual Genome dataset (Krishna et al., 2017) and full images in MSCOCO (Chen et al., 2015).
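To check that I have understood the setup, here is a minimal sketch (in Python) of the evaluation loop I have in mind; `score_image_given_caption` is a placeholder of my own for whatever the listener model actually computes, not a function from your code.

```python
from typing import Callable, List, Sequence


def listener_accuracy(
    items: Sequence[dict],
    score_image_given_caption: Callable[[object, str], float],
) -> float:
    """Fraction of items where the listener ranks the target image highest.

    Each item holds the generated `caption`, the `target` image, and its
    `distractors` (the other images in the cluster).
    """
    correct = 0
    for item in items:
        candidates: List = [item["target"]] + list(item["distractors"])
        scores = [score_image_given_caption(img, item["caption"]) for img in candidates]
        # The listener "identifies" the target if the target (index 0) gets the top score.
        correct += int(max(range(len(scores)), key=scores.__getitem__) == 0)
    return correct / len(items)
```

Please correct me if this misrepresents how the evaluation model is meant to be used.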
You have graciously included pretrained parameters for the encoder-decoder, more than one set in fact. However, I cannot tell whether the two sets of parameters, coco-[encoder/decoder] and vg-[encoder/decoder], correspond to these two models. If they do not, would you say it is still fine to use one pair for generating captions and the other pair for evaluation?
Since both TS1 and TS2 (as defined in Section 4.1 of the paper) are constructed from Visual Genome, it feels intuitively right to use vg-[encoder/decoder] for caption generation and coco-[encoder/decoder] for evaluation. Unfortunately, we do not have the compute at this time to train new models from scratch, so being able to use the provided pretrained parameters would be a huge help.
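To be concrete, this is roughly how I would wire up the pretrained pairs under that guess. I am assuming the provided parameter files are ordinary PyTorch checkpoints, and the file names below are placeholders for however the vg-/coco- parameters are actually named in the repository.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Production (speaker) pair: would be used only to generate the captions.
speaker_encoder_state = torch.load("vg-encoder", map_location=device)
speaker_decoder_state = torch.load("vg-decoder", map_location=device)

# Evaluation (listener) pair: would be used only to score captions against the
# target image and its distractors, never to generate text.
listener_encoder_state = torch.load("coco-encoder", map_location=device)
listener_decoder_state = torch.load("coco-decoder", map_location=device)
```

If the mapping should be the other way around, or if the two provided pairs are not the two models from the paper at all, I would be very grateful for a correction.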
I thank you again for your time and support.