Skip to content

Extended buffer protocol demo#5

Open
da-woods wants to merge 8 commits intomasterfrom
wip-buffer-protocol
Open

Extended buffer protocol demo#5
da-woods wants to merge 8 commits intomasterfrom
wip-buffer-protocol

Conversation

@da-woods
Copy link
Owner

@da-woods da-woods commented Mar 9, 2025

Quick demo of how the extended buffer protocol could work in Cython.

Essentially you can tag a struct (and probably other c types in future) with the decorators

  • extended_buffer_name (for an exact match)
  • extended_buffer_startswith (to match the start of the string)
  • extended_buffer_regex (to use the Python re module)

In principle a single type can be tagged with multiple possible decorators (but this hasn't really been tested).

In principle this can be mixed in with other existing buffer types in a single buffer, but this really hasn't been tested.

I've also added format as an attribute to our typed memoryviews, to get the format string.

I've created a quick demo with a toy datetime and string class. The string class stores some information in the format string so demonstrates that this could be viable.

Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>
else:
assert False, dtype

scope = getattr(dtype, 'scope', None)
Copy link

@seberg seberg Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above, assert len(fields) > 0 also doesn't work, I think it should be:

assert len(fields) > 0 or getattr(dtype, 'scope', None) is not None

(i.e. for StringDType where there is no simple time inside the struct.)

(I can make a PR to the PR some time, was just trying out a bit)

EDIT: There are is at least one more issues with that anyway. May not be worth to hash out for the PoC.
(The whole fact that npy_packed_string is an opaque struct makes it a bit more complex.)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes - in the past we haven't allowed a buffer on an opaque structure (because it doesn't hasn't really made sense).

However with this it would make sense so we'd probably have to change some of the assumptions.

I can have a look at this later this week if useful - depends how useful a complete PoC is.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, edited above, but to be clearer. I am not sure, I am running into multiple issues down the line (also because npy_packed_string is an opaque struct.

@ngoldbaum might be more interested, I think for a PoC, it is maybe best to stick to timedelta, a bit less interesting, but it should work as is.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for a PoC, it is maybe best to stick to timedelta, a bit less interesting, but it should work as is.

The C structure is opaque but it has a well-defined size inside NumPy, we only wanted to give ourselves the freedom to be able to change the layout of the struct in the future.

From NumPy's point of view, the size of the struct in the buffer is well-defined, so NumPy could advertise e.g. "this is StringDType, which is the same as char[16]" and then leave it up to buffer protocol consumers to cast to npy_packed_static_string * when they need to call into the C API.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's possibly two kinds of opaqueness - opaque to Cython (but where C sizeof(type) works) and opaque to C.

There's two issues there I think:

  • Cython currently assumes that that the structure isn't opaque to it. This is something that could definitely be worked around - I just hadn't considered it.
  • opaque to C is more of a problem - at very least it implies that the structure can only work as the last one in the buffer. But there may be other issues there too.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Yeah, just to try it briefly, it seems as soon as I do something slightly more tricky than what your test does, cython generates some functions that can't compile:

static PyObject* __pyx_convert__to_py_npy_packed_static_string(npy_packed_static_string s)

i.e. even just a loop, maybe?

    for i in range(len(mview)):
        ret = cnp.NpyString_pack(NULL, &mview[i], string, size)
        if ret < 0:
            break

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static PyObject* __pyx_convert__to_py_npy_packed_static_string(npy_packed_static_string s)

That looks to me like you've somehow tried to convert to it a Python object (possibly accidentally). I suspect that's something for a better, early diagnostic.

i.e. even just a loop, maybe?

    for i in range(len(mview)):
        ret = cnp.NpyString_pack(NULL, &mview[i], string, size)
        if ret < 0:
            break

At least in my demo I definitely had to declare it as a non-contiguous array for indexing to work. That might be the issue here? Obviously the demo was very simple and cut-down but I did try getting the address of an array element, which is all I think you're doing here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I thought so first too and was confused. Coming back to it now, I identified the problem :).

If I replace the len(mview) with mview.shape[0] it works fine, I pushed it to my branch/pr: seberg/numpy#50

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point is to say "this is probably enough to prove the point that we can work with the feature even if the Cython side clearly needs a lot of tidying before it's released to the world at large".

I'll keep thinking a bit about how to improve the interface - there's bits of what I've done here that I don't really like. I probably won't implement them at this stage, but at least they'll be ready if the PEP is accepted. I don't think any of those thoughts really affect what should go in the PEP though - just how Cython presents it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I think this is plenty for now. For times/stringdtype the coolest thing might be custom properties/methods (i.e. for the unit/allocator); but I don't think it is important at all for an initial implementation.
Also, while that would be cool for these dtypes, I can also see sufficient use-cases that are not parametric.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants