Issue Description
There is a mismatch between the CLIP implementation used in our research Jupyter notebook (`validate_clip_rankings.ipynb`) and the one used in our production code, which may lead to unexpected differences in scoring behavior.
Details
Current Implementation
- Research Notebook (`validate_clip_rankings.ipynb`):
  - Uses OpenAI's original CLIP implementation directly
  - Loads the model with `clip.load("ViT-B/32", device=device)`
  - Tokenizes and encodes text with `model.encode_text(clip.tokenize(text).to(device))`
  - Calculates scores with a matrix multiplication: `(text_features @ image_features.T).item()`
- Production Code (`calculate_scores_payout.py`):
  - Uses HuggingFace's Transformers implementation via our custom `ClipEmbedder` class
  - Initializes with `self.embedder = ClipEmbedder()`
  - Encodes text with `self.embedder.get_text_embedding(guess)`
  - Calculates scores with a NumPy dot product: `np.dot(text_features, image_features)`
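For reference, here is a minimal side-by-side sketch of the two scoring paths. The `ClipEmbedder` internals are not shown in this issue, so the production-style path below is only an assumption that it wraps HuggingFace's `openai/clip-vit-base-patch32` checkpoint via `CLIPModel`/`CLIPProcessor`; the guess text and image path are placeholders.

```python
# Sketch only: the notebook's OpenAI CLIP scoring next to a stand-in for the
# production path. The stand-in assumes ClipEmbedder wraps HuggingFace's
# openai/clip-vit-base-patch32; the real class is not shown in this issue.
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

# Notebook path: OpenAI CLIP, torch matrix multiplication on raw features.
oai_model, oai_preprocess = clip.load("ViT-B/32", device=device)

def notebook_clip_score(guess: str, image_path: str) -> float:
    with torch.no_grad():
        text_features = oai_model.encode_text(clip.tokenize([guess]).to(device))
        image = oai_preprocess(Image.open(image_path)).unsqueeze(0).to(device)
        image_features = oai_model.encode_image(image)
    return (text_features @ image_features.T).item()

# Production-style path: HuggingFace Transformers, NumPy dot product
# (assumed approximation of what ClipEmbedder does internally).
hf_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
hf_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def production_style_score(guess: str, image_path: str) -> float:
    with torch.no_grad():
        text_emb = hf_model.get_text_features(
            **hf_processor(text=[guess], return_tensors="pt", padding=True)
        )[0].numpy()
        image_emb = hf_model.get_image_features(
            **hf_processor(images=Image.open(image_path), return_tensors="pt")
        )[0].numpy()
    return float(np.dot(text_emb, image_emb))

if __name__ == "__main__":
    # Placeholder guess text and image path.
    print("notebook:        ", notebook_clip_score("a photo of a dog", "example.jpg"))
    print("production-style:", production_style_score("a photo of a dog", "example.jpg"))
```

Neither path in this sketch L2-normalizes the embeddings, so both scores are unbounded inner products rather than cosine similarities; if one of the real implementations normalizes and the other does not, thresholds tuned in the notebook will not carry over to production.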
Impact
This inconsistency means that:
- The baseline adjustment behavior might differ between research and production
- Scoring thresholds determined in the notebook might not directly transfer to production
- Research findings might not fully apply to the deployed system
Potential Solutions
- Update `ScoreValidator` to use the same direct CLIP approach as the notebook
- Modify the notebook to use `ClipEmbedder` for consistency (see the sketch after this list)
- Perform a comparative analysis to determine which approach performs better
- Standardize on a single CLIP implementation across all code
Reproduction Steps
- Run baseline adjustment tests in the notebook
- Run the same tests with the `ScoreValidator` implementation
- Compare the results for the same input texts and images
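To make the comparison step concrete, here is a sketch that reuses `notebook_clip_score` and `production_style_score` from the sketch under Current Implementation; the `(guess, image_path)` pairs are placeholders, and in a real run the production side should go through the actual `ScoreValidator`.

```python
# Sketch of the comparison step: score the same (guess, image) pairs through both
# paths. notebook_clip_score / production_style_score come from the earlier sketch;
# swap the latter for the real ScoreValidator call.
import numpy as np
from scipy.stats import spearmanr

test_pairs = [  # placeholder (guess, image_path) pairs
    ("a photo of a dog", "images/dog.jpg"),
    ("a red sports car", "images/car.jpg"),
    ("a bowl of ramen", "images/ramen.jpg"),
]

notebook_scores = np.array([notebook_clip_score(g, p) for g, p in test_pairs])
production_scores = np.array([production_style_score(g, p) for g, p in test_pairs])

# Scale differences matter wherever fixed score thresholds are applied.
print("max abs difference:", np.max(np.abs(notebook_scores - production_scores)))

# Rank agreement matters wherever only the relative ordering of guesses is used.
rho, _ = spearmanr(notebook_scores, production_scores)
print("Spearman rank correlation:", rho)
```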
Priority
Medium - This won't break the system but should be addressed for consistent behavior between research and production.