Open
Labels
enhancement (New feature or request)
Milestone
Description
Summary
Implement NumPy ufunc interoperability for the Arkouda pandas ExtensionArray by adding a correct, well-scoped __array_ufunc__ implementation. This will allow common ufuncs (e.g., np.add, np.subtract, np.negative, np.logical_and, and the comparison ufuncs) to operate on Arkouda-backed Series/arrays without silently materializing to NumPy, while preserving pandas semantics where required.
Background / Motivation
Today, many NumPy ufunc operations on Arkouda-backed pandas objects either:
- fall back to object/NumPy materialization (breaking scalability), or
- error in inconsistent ways, or
- route through pandas code paths that expect __array_ufunc__ and __array_priority__ behavior for ExtensionArrays.
A minimal-but-correct __array_ufunc__ enables:
- predictable behavior for arithmetic and elementwise operations,
- better pandas compatibility (pandas frequently triggers ufunc paths),
- clear errors for unsupported dtypes (e.g., Strings, Categorical) or unsupported ufuncs/methods.
Goals
- Add __array_ufunc__ to the Arkouda ExtensionArray implementation.
- Support elementwise ufuncs for numeric and boolean dtypes where there is a reasonable Arkouda mapping.
- Handle method="__call__" and method="reduce" (as appropriate) with clear scoping.
- Respect pandas expectations:
  - return an ExtensionArray (or Series via pandas) when appropriate,
  - propagate np.nan / missing values correctly (where applicable),
  - preserve dtype where possible.
- Avoid accidental conversion to NumPy unless explicitly requested (e.g., via out being a NumPy array, or ufunc not supported).
Non-goals (for this ticket)
- Full coverage of every NumPy ufunc and method (accumulate, reduceat, outer, etc.).
- Supporting ufuncs for Arkouda Strings and Categorical unless there is a clear, existing Arkouda primitive (should raise a helpful TypeError for now).
- Implementing NumPy array protocol conversions beyond what is needed for ufunc interoperability.
Proposed Behavior
Supported inputs
- self is the Arkouda ExtensionArray.
- Additional inputs may include:
  - scalar Python numbers/bools,
  - NumPy scalars,
  - other Arkouda ExtensionArray instances,
  - pandas arrays/Series that wrap Arkouda arrays (unwrap as needed).
Dispatch rules
- Reject unsupported method values with NotImplemented (or TypeError if pandas expects it), except for:
  - __call__ (required)
  - reduce (optional, only for a small safe subset such as np.add.reduce, np.logical_or.reduce if Arkouda equivalents exist)
- If any input is a higher-priority type that should handle the ufunc, return NotImplemented.
- Map ufuncs to Arkouda server-side ops:
  - Unary: negative, absolute, invert (for bool/int), etc.
  - Binary: add, subtract, multiply, true_divide, floor_divide, power (if supported), comparisons (equal, not_equal, less, greater, etc.), logical ops for bool.
- If out is provided:
  - If out contains Arkouda ExtensionArrays: write into those (if we support it), else reject with a clear error.
  - If out contains NumPy arrays: either materialize (explicit) or raise (preferred); pick one and document it.
- Return type:
  - For elementwise ops: return a new Arkouda ExtensionArray with the result.
  - For reduce: return a scalar (Python/NumPy scalar) or a 0-dim equivalent consistent with pandas expectations.
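To make the rules above concrete, here is a minimal sketch of how they could fit together. Everything in it is an assumption for illustration: ToyArray is a hypothetical stand-in (not the Arkouda class), a NumPy array plays the role of the server-side buffer, the supported ufunc sets are arbitrary, and the out= policy shown (always raise) is just one of the two options listed.

```python
import numpy as np


class ToyArray:
    """Hypothetical stand-in; a NumPy array plays the server-side buffer."""

    # Assumed subsets, for illustration only.
    _ELEMENTWISE = {np.add, np.subtract, np.multiply, np.negative, np.equal, np.less}
    _REDUCIBLE = {np.add, np.logical_or}

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Policy picked for this sketch: reject out= instead of materializing.
        if kwargs.get("out") is not None:
            raise TypeError(f"out= is not supported for NumPy ufunc '{ufunc.__name__}'")

        # Unwrap our own type, pass scalars through, defer on anything else.
        raw = []
        for x in inputs:
            if isinstance(x, ToyArray):
                raw.append(x._data)
            elif np.isscalar(x):
                raw.append(x)
            else:
                return NotImplemented

        if method == "__call__" and ufunc in self._ELEMENTWISE:
            # Elementwise ops return a new wrapped array.
            return ToyArray(ufunc(*raw, **kwargs))
        if method == "reduce" and ufunc in self._REDUCIBLE:
            # Reductions return a scalar.
            return ufunc.reduce(*raw, **kwargs)
        # Unsupported ufunc or method (accumulate, outer, ...).
        return NotImplemented
```

With this sketch, np.add(ToyArray([1, 2, 3]), 5) comes back as a ToyArray, np.add.reduce(ToyArray([1, 2, 3])) comes back as a scalar, and np.add.accumulate(...) ends in a TypeError raised by NumPy because every operand returned NotImplemented.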
Error messages
- For unsupported dtypes (Strings/Categorical): raise TypeError like: "NumPy ufunc '<name>' is not supported for Arkouda dtype '<dtype>'"
- For unsupported ufuncs: raise NotImplementedError or return NotImplemented depending on pandas expectations; include a message guiding users to convert explicitly if they really want NumPy.
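One thing worth keeping in mind when choosing between raising and returning NotImplemented: if every operand returns NotImplemented, NumPy raises its own generic TypeError, so a custom message only reaches the user when __array_ufunc__ raises directly. A small sketch of the difference, using hypothetical stand-in classes and a hypothetical helper name (unsupported_dtype_error):

```python
import numpy as np


def unsupported_dtype_error(ufunc, dtype_name):
    """Build the TypeError proposed above (hypothetical helper)."""
    return TypeError(
        f"NumPy ufunc '{ufunc.__name__}' is not supported for Arkouda dtype '{dtype_name}'"
    )


class ToyStrings:
    """Hypothetical string-backed stand-in that rejects every ufunc with a custom error."""

    dtype_name = "string"

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        raise unsupported_dtype_error(ufunc, self.dtype_name)


class ToyDeferring:
    """Returns NotImplemented instead, so NumPy supplies its own message."""

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return NotImplemented


def message_for(obj):
    """Return the TypeError text produced by np.add(obj, 'x'), or None."""
    try:
        np.add(obj, "x")
    except TypeError as exc:
        return str(exc)
    return None
```

Here message_for(ToyStrings()) yields the custom text, while message_for(ToyDeferring()) yields whatever NumPy generates for the all-NotImplemented case.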
Implementation Notes
- Location: Arkouda pandas ExtensionArray class
- Consider implementing a small internal dispatcher:
  - _UFUNC_TABLE: dict[np.ufunc, callable] or mapping by ufunc.__name__.
  - Centralize dtype checks and missing-value handling.
- Ensure correct behavior with:
  - __array_priority__ (set high enough to win dispatch vs NumPy when appropriate),
  - __array__ (if implemented) does not accidentally trigger conversions in the ufunc path.
- Make sure __array_ufunc__ does not break Series ops that pandas already routes through its own arithmetic machinery.
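A rough sketch of the table-driven dispatcher idea, under stated assumptions: the table layout (a callable plus the accepted NumPy dtype.kind codes), the apply_ufunc name, and the use of Python operator functions as stand-ins for Arkouda server-side ops are all hypothetical. Plain NumPy arrays are used as inputs so the example runs without a server.

```python
import operator

import numpy as np

# Hypothetical table: ufunc -> (stand-in backend op, accepted dtype.kind codes).
# Kind codes follow NumPy: "b" bool, "i" signed int, "u" unsigned int, "f" float.
_UFUNC_TABLE = {
    np.negative: (operator.neg, "if"),
    np.invert: (operator.invert, "biu"),
    np.add: (operator.add, "iuf"),
    np.subtract: (operator.sub, "iuf"),
    np.equal: (operator.eq, "biuf"),
    np.logical_and: (operator.and_, "b"),
}


def apply_ufunc(ufunc, *inputs):
    """Look up the backend op and run the dtype check in one place."""
    entry = _UFUNC_TABLE.get(ufunc)
    if entry is None:
        raise NotImplementedError(
            f"NumPy ufunc '{ufunc.__name__}' has no backend mapping; "
            "convert explicitly if a NumPy result is really wanted"
        )
    op, kinds = entry
    for x in inputs:
        dtype = getattr(x, "dtype", None)  # plain Python scalars carry no dtype
        if dtype is not None and np.dtype(dtype).kind not in kinds:
            raise TypeError(
                f"NumPy ufunc '{ufunc.__name__}' is not supported for Arkouda dtype '{dtype}'"
            )
    return op(*inputs)
```

Keying the table by the ufunc object avoids string comparisons; keying by ufunc.__name__ would work just as well and may be easier to log.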
Repro / Expected UX
Example (should stay on Arkouda)
>>> import arkouda as ak
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([1, 2, 3], dtype="ak")
>>> (np.add(s.array, 5)).to_numpy() # materialize only at the end
array([6, 7, 8])

Example (unsupported dtype gives helpful error)
>>> import arkouda as ak
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(["a", "b"], dtype="ak")
>>> np.add(s.array, "x")
TypeError: NumPy ufunc 'add' is not supported for Arkouda dtype 'string'

Tests
Add unit tests covering:
- Unary ufunc: np.negative, np.absolute (numeric)
- Binary ufunc: np.add, np.subtract, np.multiply, np.true_divide (numeric)
- Comparisons: np.equal, np.less, etc. (numeric/bool)
- Mixed scalar + EA and EA + EA
- out= behavior (whatever policy is chosen)
- Unsupported ufunc raises/returns NotImplemented in a predictable way
- Unsupported dtype (Strings/Categorical) raises a clear TypeError
- Ensure no silent to_numpy() / materialization occurs in the supported paths:
  - validate the result is an Arkouda ExtensionArray (or wraps one)
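One possible way to test the "no silent materialization" item is a test double whose __array__ fails loudly, so any implicit conversion shows up as a test failure. The sketch below is only an illustration: NoMaterialize is a hypothetical class backed by a Python list, not the Arkouda ExtensionArray, and a real test would more likely patch the conversion methods on the actual class.

```python
import numpy as np


class NoMaterialize:
    """Hypothetical test double: implicit conversion to NumPy fails loudly."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        raise AssertionError("silent NumPy materialization")

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only np.negative is "supported" in this toy; everything else defers.
        if ufunc is np.negative and method == "__call__":
            return NoMaterialize(-v for v in self._values)
        return NotImplemented


def test_negative_stays_on_backend():
    result = np.negative(NoMaterialize([1, -2, 3]))
    # The result keeps the wrapper type and __array__ was never called.
    assert isinstance(result, NoMaterialize)
    assert result._values == [-1, 2, -3]


def test_unsupported_ufunc_does_not_materialize():
    try:
        np.sqrt(NoMaterialize([4]))
    except TypeError:
        return  # NumPy's error for the all-NotImplemented case
    raise AssertionError("expected TypeError")
```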
Acceptance Criteria
- __array_ufunc__ is implemented on the Arkouda ExtensionArray.
- Core elementwise numeric ufuncs work end-to-end without NumPy materialization.
- Unsupported ufuncs/dtypes produce clear, consistent errors.
- Test suite includes coverage for supported, unsupported, and edge cases (including out=).
- Documentation/comments explain the supported ufunc surface and rationale for exclusions.