Conversation
They aren't modified, so const will hopefully help the compiler out a little (and prevent them from being modified in future)
Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>
| else: | ||
| assert False, dtype | ||
|
|
||
| scope = getattr(dtype, 'scope', None) |
There was a problem hiding this comment.
Above, assert len(fields) > 0 also doesn't work, I think it should be:
assert len(fields) > 0 or getattr(dtype, 'scope', None) is not None
(i.e. for StringDType where there is no simple time inside the struct.)
(I can make a PR to the PR some time, was just trying out a bit)
EDIT: There are is at least one more issues with that anyway. May not be worth to hash out for the PoC.
(The whole fact that npy_packed_string is an opaque struct makes it a bit more complex.)
There was a problem hiding this comment.
Ah yes - in the past we haven't allowed a buffer on an opaque structure (because it doesn't hasn't really made sense).
However with this it would make sense so we'd probably have to change some of the assumptions.
I can have a look at this later this week if useful - depends how useful a complete PoC is.
There was a problem hiding this comment.
Sorry, edited above, but to be clearer. I am not sure, I am running into multiple issues down the line (also because npy_packed_string is an opaque struct.
@ngoldbaum might be more interested, I think for a PoC, it is maybe best to stick to timedelta, a bit less interesting, but it should work as is.
There was a problem hiding this comment.
I think for a PoC, it is maybe best to stick to timedelta, a bit less interesting, but it should work as is.
The C structure is opaque but it has a well-defined size inside NumPy, we only wanted to give ourselves the freedom to be able to change the layout of the struct in the future.
From NumPy's point of view, the size of the struct in the buffer is well-defined, so NumPy could advertise e.g. "this is StringDType, which is the same as char[16]" and then leave it up to buffer protocol consumers to cast to npy_packed_static_string * when they need to call into the C API.
There was a problem hiding this comment.
There's possibly two kinds of opaqueness - opaque to Cython (but where C sizeof(type) works) and opaque to C.
There's two issues there I think:
- Cython currently assumes that that the structure isn't opaque to it. This is something that could definitely be worked around - I just hadn't considered it.
- opaque to C is more of a problem - at very least it implies that the structure can only work as the last one in the buffer. But there may be other issues there too.
There was a problem hiding this comment.
Cool! Yeah, just to try it briefly, it seems as soon as I do something slightly more tricky than what your test does, cython generates some functions that can't compile:
static PyObject* __pyx_convert__to_py_npy_packed_static_string(npy_packed_static_string s)
i.e. even just a loop, maybe?
for i in range(len(mview)):
ret = cnp.NpyString_pack(NULL, &mview[i], string, size)
if ret < 0:
break
There was a problem hiding this comment.
static PyObject* __pyx_convert__to_py_npy_packed_static_string(npy_packed_static_string s)
That looks to me like you've somehow tried to convert to it a Python object (possibly accidentally). I suspect that's something for a better, early diagnostic.
i.e. even just a loop, maybe?
for i in range(len(mview)): ret = cnp.NpyString_pack(NULL, &mview[i], string, size) if ret < 0: break
At least in my demo I definitely had to declare it as a non-contiguous array for indexing to work. That might be the issue here? Obviously the demo was very simple and cut-down but I did try getting the address of an array element, which is all I think you're doing here.
There was a problem hiding this comment.
Right, I thought so first too and was confused. Coming back to it now, I identified the problem :).
If I replace the len(mview) with mview.shape[0] it works fine, I pushed it to my branch/pr: seberg/numpy#50
There was a problem hiding this comment.
At this point is to say "this is probably enough to prove the point that we can work with the feature even if the Cython side clearly needs a lot of tidying before it's released to the world at large".
I'll keep thinking a bit about how to improve the interface - there's bits of what I've done here that I don't really like. I probably won't implement them at this stage, but at least they'll be ready if the PEP is accepted. I don't think any of those thoughts really affect what should go in the PEP though - just how Cython presents it.
There was a problem hiding this comment.
Agreed, I think this is plenty for now. For times/stringdtype the coolest thing might be custom properties/methods (i.e. for the unit/allocator); but I don't think it is important at all for an initial implementation.
Also, while that would be cool for these dtypes, I can also see sufficient use-cases that are not parametric.
Quick demo of how the extended buffer protocol could work in Cython.
Essentially you can tag a struct (and probably other c types in future) with the decorators
extended_buffer_name(for an exact match)extended_buffer_startswith(to match the start of the string)extended_buffer_regex(to use the Pythonremodule)In principle a single type can be tagged with multiple possible decorators (but this hasn't really been tested).
In principle this can be mixed in with other existing buffer types in a single buffer, but this really hasn't been tested.
I've also added
formatas an attribute to our typed memoryviews, to get the format string.I've created a quick demo with a toy datetime and string class. The string class stores some information in the format string so demonstrates that this could be viable.