Skip to content

Conversation

@nmcclatchey
Copy link

CNPY originally only supported NPY 1.0 headers, with a 2-byte length field. This PR causes parse_npy_header() to respond to NPY 2.0 and 3.0 by reading the header length as 4 bytes.

CNPY originally only supported NPY 1.0 headers, with a 2-byte length
field. This commit causes parse_npy_header to respond to NPY 2.0 and 3.0
by reading the header length as 4 bytes.
@ZJUGuoShuai
Copy link

I only saw version 2.0 from NEP 1, can you tell me where is version 3.0 defined? Thanks.

@nmcclatchey
Copy link
Author

You can find all 3 versions defined on numpy.org. Specifically, on the numpy.lib.format page.

Version 3.0 merely swaps ASCII strings to Unicode strings.

cnpy.cpp Outdated
header_len = *reinterpret_cast<uint32_t*>(buffer+8);
else
header_len = *reinterpret_cast<uint16_t*>(buffer+8);
std::string header(reinterpret_cast<char*>(buffer+(extended_header ? 11 : 9)),header_len);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since header_len is uint16_t, I think it should be?

Suggested change
std::string header(reinterpret_cast<char*>(buffer+(extended_header ? 11 : 9)),header_len);
std::string header(reinterpret_cast<char*>(buffer+(extended_header ? 12 : 10)),header_len);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right. It's been working because the only use of header has been based on relative locations discovered using find and similar, but we could improve performance (ever so slightly) by using the correct offset.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks for pointing this out.

The `header` string was started too early, and had included the final byte of the length field (and omitting the final byte). This commit adjusts where it starts to fix the issue.

Note: No ill effects occurred earlier because each parsed field is located using `find`, and the final byte is typically `}` and irrelevant to the parsed fields.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants