@amisevsk
Contributor

Description

File suffix matching for models is too broad for *.bin files, which are not necessarily models. As far as we can tell, only Hugging Face Transformers (legacy format) uses this suffix for model weights. To avoid false matches, treat a .bin file as model weights only if its filename contains pytorch_model.

Linked issues

Closes #1052

AI-Assisted Code

  • This PR contains AI-generated code that I have reviewed and tested
  • I take full responsibility for all code in this PR, regardless of how it was created

File suffix matching for models is too broad for *.bin files, which are
not necessarily models. From what we can tell, only huggingface
transformers (legacy) uses this suffix. To avoid false matches, restrict
matching .bin files to model weights only if the filename contains
`pytorch_model`

Signed-off-by: Angel Misevski <amisevsk@gmail.com>

Copilot AI left a comment

Pull request overview

This PR refines the file type detection logic to avoid incorrectly classifying .bin files as model weights. Instead of treating all .bin files as models, the code now only matches .bin files that contain pytorch_model in their filename, which aligns with HuggingFace transformers' legacy naming convention.

Changes:

  • Removed .bin from the generic modelWeightsSuffixes list
  • Added a new modelWeightsPatterns variable with glob-style pattern matching for *pytorch_model*.bin files
  • Introduced anyPattern() function to match filenames against glob patterns
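The changes listed above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the names modelWeightsSuffixes, modelWeightsPatterns, and anyPattern come from the review summary, but their signatures, the suffix entries shown, and the helpers anySuffix and isModelWeights are assumptions made for the example. It relies on the standard library's filepath.Match for glob matching.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Generic suffixes treated as model weights. ".bin" is no longer listed.
// The entries here are illustrative assumptions, not the project's real list.
var modelWeightsSuffixes = []string{".safetensors", ".gguf"}

// Glob patterns for files that count as weights only under a specific name.
var modelWeightsPatterns = []string{"*pytorch_model*.bin"}

// anyPattern reports whether the base name of filename matches any glob
// pattern. The signature is assumed for this sketch.
func anyPattern(filename string, patterns []string) bool {
	base := filepath.Base(filename)
	for _, pattern := range patterns {
		// filepath.Match only errors on a malformed pattern; treat that as no match.
		if ok, err := filepath.Match(pattern, base); err == nil && ok {
			return true
		}
	}
	return false
}

// anySuffix is a hypothetical helper for the plain suffix check.
func anySuffix(filename string, suffixes []string) bool {
	for _, suffix := range suffixes {
		if strings.HasSuffix(filename, suffix) {
			return true
		}
	}
	return false
}

// isModelWeights is a hypothetical wrapper combining both checks.
func isModelWeights(filename string) bool {
	return anySuffix(filename, modelWeightsSuffixes) ||
		anyPattern(filename, modelWeightsPatterns)
}

func main() {
	names := []string{
		"pytorch_model.bin",
		"pytorch_model-00001-of-00002.bin",
		"firmware.bin",
		"model.safetensors",
	}
	for _, name := range names {
		fmt.Printf("%-36s weights=%v\n", name, isModelWeights(name))
	}
}
```

With this approach a sharded legacy checkpoint such as pytorch_model-00001-of-00002.bin is still detected, while an unrelated file such as firmware.bin is not.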

Member

@gorkem gorkem left a comment

Seems to work fine.

I thought we had a test somewhere that used .bin and would need to be modified, but I could not find it.

Development

Successfully merging this pull request may close these issues.

kit init uses a too broad .bin for pytorch
