I'm a PhD student in Artificial Intelligence at MICC, University of Florence, working under the guidance of Prof. Andrew D. Bagdanov and Prof. Marco Bertini. With a background in Computer Engineering and AI, my research focuses on pushing the boundaries of Multimodal Vision-Language Models (like CLIP) and their real-world applications.
This expertise is demonstrated through my first-author publications in top-tier venues, including ECCV (main conference), ICLR (main conference), and a NeurIPS workshop. These works reflect my dedication to solving challenging problems and advancing the field of AI.
I recently completed an Applied Scientist Internship at Amazon (RufusX Team, London), where I worked on foundational research and development in Generative AI and Multimodal Large Language Models (MLLMs) as part of the Amazon Rufus initiative.
For more information, feel free to visit my website: marcomistretta.github.io
Cross the Gap: Inter-modal CLIP Representations Are Superior for Intra-modal Tasks
ICLR 2025 (main paper)
Authors: Marco Mistretta*, Alberto Baldrati*, Lorenzo Agnolucci*, Marco Bertini, Andrew D. Bagdanov
Code: GitHub Repository
Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation
ECCV 2024 (main paper)
Authors: Marco Mistretta*, Alberto Baldrati*, Marco Bertini, Andrew D. Bagdanov
Code: GitHub Repository
RE-tune: Incremental Fine Tuning of Biomedical Vision-Language Models for Multi-label Chest X-ray Classification
NeurIPS 2023, Medical Imaging meets NeurIPS Workshop
Authors: Marco Mistretta, Andrew D. Bagdanov
Applied Scientist Intern, Amazon (RufusX Team, London): July 2025 – December 2025
- Worked on Generative AI and Multimodal Large Language Models (MLLMs) within the Amazon Rufus initiative.
- Fine-tuned, evaluated, and deployed large-scale multimodal models impacting millions of customers.
- Collaborated with scientists and engineers to advance real-world multimodal reasoning and generation.
I'm really into:
- Multimodal Learning: Combining visual and language data to get a richer understanding of the world.
- Natural Language Processing (NLP): Teaching machines to understand and communicate in human language.
- Contrastive Self-Supervised Learning: Finding patterns in data without the need for human labels.
- Incremental Learning: Allowing AI models to keep learning from new information without forgetting what they already know.
- Few-Shot Adaptation: Quickly adapting AI to diverse data distributions with minimal examples.
- Prompt Learning: Tuning only a few learnable parameters, so-called "prompts", to maximize VLM performance.
- Test-Time Adaptation: Letting models adjust during inference to handle unseen data on the fly.
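To give a flavor of the prompt-learning idea above: instead of fine-tuning a full vision-language model, only a handful of "prompt" vectors prepended to the text embeddings are trained while the backbone stays frozen. The sketch below is a minimal toy illustration in PyTorch; the dimensions, random stand-in embeddings, and class names are hypothetical, and no real CLIP weights are involved.

```python
import torch
import torch.nn as nn

class PromptLearner(nn.Module):
    """Toy prompt learner: a few shared learnable context vectors are
    prepended to frozen (here: random stand-in) class-name embeddings."""

    def __init__(self, n_prompts=4, dim=512, n_classes=10):
        super().__init__()
        # Only these vectors are trained; everything else is frozen.
        self.prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)
        # Frozen stand-in for precomputed class-name token embeddings.
        self.register_buffer("class_tokens", torch.randn(n_classes, 1, dim))

    def forward(self):
        # Prepend the shared prompts to every class embedding, then pool
        # into a single text feature per class.
        n_classes = self.class_tokens.shape[0]
        ctx = self.prompts.unsqueeze(0).expand(n_classes, -1, -1)
        tokens = torch.cat([ctx, self.class_tokens], dim=1)
        return tokens.mean(dim=1)  # shape: (n_classes, dim)

learner = PromptLearner()
text_features = learner()                  # (10, 512), differentiable w.r.t. prompts
image_feature = torch.randn(1, 512)        # stand-in for a frozen image encoder output
logits = image_feature @ text_features.t() # zero-shot-style class scores, (1, 10)
```

In a real setup the pooled text features would come from a frozen CLIP text encoder and the logits would be trained with a standard classification loss, so that only `n_prompts * dim` parameters are updated.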
- Programming Languages: Python, Java, C++, MATLAB, R
- Frameworks & Tools: PyTorch, TensorFlow, Hugging Face, OpenCV
- Research Areas: Vision-Language Models, Self-Supervised Learning, Few-Shot Learning, Prompt Learning, Incremental Learning
I'd love to connect! Feel free to reach out on:
