- Model and Dataset: Ran the Qwen2-VL-2B-Instruct model (available on GitHub and Hugging Face) and evaluated it on the MathVista dataset (Hugging Face).
- Accelerated Inference: Integrated the vLLM toolkit to speed up inference.
- Performance Evaluation:
- Assessed model performance on multimodal reasoning tasks.
- Compared inference time with and without vLLM.
- Analyzed model predictions and identified failure cases.
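The timing comparison above can be sketched with a small harness. The two `*_generate` functions below are hypothetical stubs standing in for the plain Hugging Face `transformers` generation path and the vLLM path (`LLM.generate`); in the real experiment they would wrap the actual model calls.

```python
import time
from typing import Callable, Dict, List

def time_inference(generate: Callable[[List[str]], List[str]],
                   prompts: List[str]) -> Dict[str, float]:
    """Run `generate` over the prompts and report wall-clock latency."""
    start = time.perf_counter()
    generate(prompts)
    elapsed = time.perf_counter() - start
    return {
        "n_prompts": len(prompts),
        "seconds": elapsed,
        "prompts_per_second": len(prompts) / elapsed if elapsed > 0 else float("inf"),
    }

# Hypothetical stand-ins for the two back ends being compared.
def hf_generate(prompts: List[str]) -> List[str]:
    # Placeholder for the baseline transformers `model.generate` path.
    return [p.upper() for p in prompts]

def vllm_generate(prompts: List[str]) -> List[str]:
    # Placeholder for the vLLM `LLM.generate` path.
    return [p.upper() for p in prompts]

prompts = ["question 1", "question 2"]
baseline = time_inference(hf_generate, prompts)
accelerated = time_inference(vllm_generate, prompts)
speedup = baseline["seconds"] / accelerated["seconds"]
```

With real models, the same harness would be fed batches of MathVista questions, so the reported speedup reflects vLLM's batched, paged-attention serving rather than single-prompt latency.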
- VQGAN: Reviewed the VQGAN paper and code (GitHub), focusing on the image tokenization process.
- Model Execution: Ran VQGAN in Google Colab on a 200-image subset of the DIV2K dataset.
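The tokenization step VQGAN performs can be illustrated with a minimal nearest-neighbor vector quantizer in NumPy: each encoder output vector is mapped to the index of its closest codebook entry, and those indices are the "token IDs". The codebook and features here are random toy data, not the actual model weights.

```python
import numpy as np

def quantize(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each feature vector to the index of its nearest codebook
    entry (squared Euclidean distance), as VQGAN's quantizer does."""
    # dists[i, j] = ||features[i] - codebook[j]||^2
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))  # toy codebook: 16 codes of dim 4
feats = rng.normal(size=(8, 4))      # 8 toy "encoder outputs"
ids = quantize(feats, codebook)      # token IDs, shape (8,)
```

In the real model the features come from the convolutional encoder and the grid of token IDs is what the subsequent frequency and embedding analyses operate on.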
- Visualizations:
- Visualized quantized token ID frequency distributions.
- Performed dimensionality reduction (PCA and t-SNE) on token ID embeddings to study how uniformly the codebook is used.
- Compared the quantized tokens of two similar images to measure how many token IDs differ.
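The three analyses above can be sketched on toy data. `pca_2d` is a plain-NumPy PCA standing in for the scikit-learn PCA/t-SNE used in practice, and the token maps and codebook embeddings are random placeholders rather than real VQGAN outputs.

```python
import numpy as np

def token_frequencies(ids: np.ndarray, vocab_size: int) -> np.ndarray:
    """Histogram of quantized token IDs over a batch of token maps."""
    return np.bincount(ids.ravel(), minlength=vocab_size)

def pca_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project embeddings to 2-D via SVD-based PCA (mean-centered)."""
    centered = embeddings - embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

def token_diff_fraction(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of positions where two images' token maps disagree."""
    return float((a != b).mean())

rng = np.random.default_rng(0)
ids = rng.integers(0, 32, size=(4, 16, 16))  # toy token maps, 4 images
freq = token_frequencies(ids, vocab_size=32) # for the frequency plot
emb = rng.normal(size=(32, 8))               # toy codebook embeddings
proj = pca_2d(emb)                           # 2-D points for scatter plot
diff = token_diff_fraction(ids[0], ids[1])   # similarity of two token maps
```

In the report's experiments, `freq` would be plotted as a histogram, `proj` scattered to inspect how evenly the codes spread, and `diff` computed between the two similar DIV2K images.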