Hi authors,
Thanks for your great work! I am currently studying your dataset and have a few questions about the recaptioning process you mention:
- Model Selection: Could you clarify which Multimodal Large Language Model (MLLM) was used for the recaptioning pipeline?
- Prompt Details: What specific prompts were used to guide the model during this process?
- Experimental Analysis: Does the paper include any experimental analysis or ablation studies on the impact or quality of the recaptioning results? If so, could you point me to the relevant section?
Thanks in advance for your time and help!