Implement __array_ufunc__ for Arkouda-backed pandas ExtensionArray #5437

@ajpotts

Summary

Implement NumPy ufunc interoperability for the Arkouda pandas ExtensionArray by adding a correct, well-scoped __array_ufunc__ implementation. This will allow common ufuncs (e.g., np.add, np.subtract, np.negative, np.logical_and, and the comparison ufuncs) to operate on Arkouda-backed Series/arrays without silently materializing to NumPy, while preserving pandas semantics where required.

Background / Motivation

Today, many NumPy ufunc operations on Arkouda-backed pandas objects either:

  • fall back to object/NumPy materialization (breaking scalability), or
  • error in inconsistent ways, or
  • route through pandas code paths that expect ExtensionArrays to define __array_ufunc__ and __array_priority__.

A minimal-but-correct __array_ufunc__ enables:

  • predictable behavior for arithmetic and elementwise operations,
  • better pandas compatibility (pandas frequently triggers ufunc paths),
  • clear errors for unsupported dtypes (e.g., Strings, Categorical) or unsupported ufuncs/methods.

Goals

  • Add __array_ufunc__ to the Arkouda ExtensionArray implementation.
  • Support elementwise ufuncs for numeric and boolean dtypes where there is a reasonable Arkouda mapping.
  • Handle method="__call__" and method="reduce" (as appropriate) with clear scoping.
  • Respect pandas expectations:
    • return an ExtensionArray (or Series via pandas) when appropriate,
    • propagate np.nan / missing values correctly (where applicable),
    • preserve dtype where possible.
  • Avoid accidental conversion to NumPy unless explicitly requested (e.g., via out being a NumPy array, or ufunc not supported).

Non-goals (for this ticket)

  • Full coverage of every NumPy ufunc and method (accumulate, reduceat, outer, etc.).
  • Supporting ufuncs for Arkouda Strings and Categorical unless there is a clear, existing Arkouda primitive (should raise a helpful TypeError for now).
  • Implementing NumPy array protocol conversions beyond what is needed for ufunc interoperability.

Proposed Behavior

Supported inputs

  • self is the Arkouda ExtensionArray.
  • Additional inputs may include:
    • scalar Python numbers/bools,
    • NumPy scalars,
    • other Arkouda ExtensionArray instances,
    • pandas arrays/Series that wrap Arkouda arrays (unwrap as needed).

Dispatch rules

  1. Return NotImplemented (or raise TypeError if pandas expects it) for every method value except:
    • __call__ (required)
    • reduce (optional, only for a small safe subset such as np.add.reduce, np.logical_or.reduce if Arkouda equivalents exist)
  2. If any input is a higher-priority type that should handle the ufunc, return NotImplemented.
  3. Map ufuncs to Arkouda server-side ops:
    • Unary: negative, absolute, invert (for bool/int), etc.
    • Binary: add, subtract, multiply, true_divide, floor_divide, power (if supported), comparisons (equal, not_equal, less, greater, etc.), logical ops for bool.
  4. If out is provided:
    • If out contains Arkouda ExtensionArrays: write into those (if we support it), else reject with a clear error.
    • If out contains NumPy arrays: either materialize (explicit) or raise (preferred) — pick one and document it.
  5. Return type:
    • For elementwise ops: return a new Arkouda ExtensionArray with the result.
    • For reduce: return a scalar (Python/NumPy scalar) or a 0-dim equivalent consistent with pandas expectations.
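The dispatch rules above can be sketched roughly as follows. This is a minimal illustration, not the Arkouda implementation: `SketchArray`, `_ELEMENTWISE`, `_REDUCTIONS`, and `_unwrap` are hypothetical names, a NumPy array stands in for server-side data so the example runs without an Arkouda server, and the `out=` policy shown (raise) is just one of the two options listed in rule 4.

```python
import operator

import numpy as np


class SketchArray:
    """Hypothetical stand-in for the Arkouda ExtensionArray.

    A NumPy array plays the role of the server-side data so the sketch
    runs without an Arkouda server.
    """

    # Rule 3: ufunc -> callable standing in for a server-side op.
    _ELEMENTWISE = {
        np.negative: operator.neg,
        np.add: operator.add,
        np.subtract: operator.sub,
        np.less: operator.lt,
    }
    # Rule 1: a small "safe subset" of reductions.
    _REDUCTIONS = {np.add: lambda a: a.sum()}

    def __init__(self, data):
        self._data = np.asarray(data)

    @staticmethod
    def _unwrap(x):
        """Return the backing data, a scalar, or None for unknown operands."""
        if isinstance(x, SketchArray):
            return x._data
        if isinstance(x, (bool, int, float, np.generic)):
            return x
        return None

    def __array_ufunc__(self, ufunc, method, *inputs, out=None, **kwargs):
        # Rule 4 (one possible policy): refuse out= instead of materializing.
        if out is not None:
            raise TypeError("out= is not supported in this sketch")
        # Rule 1: only __call__ and reduce are considered at all.
        table = {"__call__": self._ELEMENTWISE, "reduce": self._REDUCTIONS}.get(method)
        # Unknown method, unmapped ufunc, or extra keyword arguments: defer.
        if table is None or ufunc not in table or kwargs:
            return NotImplemented
        # Rule 2: defer when an operand is of a type this class does not know.
        operands = [self._unwrap(x) for x in inputs]
        if any(op is None for op in operands):
            return NotImplemented
        result = table[ufunc](*operands)
        # Rule 5: elementwise results are re-wrapped, reductions stay scalar.
        return type(self)(result) if method == "__call__" else result
```

With this sketch, `np.add(SketchArray([1, 2, 3]), 5)` returns another `SketchArray`, while an unmapped ufunc such as `np.multiply` makes NumPy raise `TypeError` because every operand returned `NotImplemented`.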

Error messages

  • For unsupported dtypes (Strings/Categorical): raise TypeError like:
    • "NumPy ufunc '<name>' is not supported for Arkouda dtype '<dtype>'"
  • For unsupported ufuncs: raise NotImplementedError or return NotImplemented depending on pandas expectations; include a message guiding users to convert explicitly if they really want NumPy.
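One way to keep the dtype error wording consistent is to build it in a single helper. The sketch below is illustrative only: `unsupported_dtype_error`, `check_dtype_supported`, and the `_UNSUPPORTED_DTYPES` set are hypothetical names, and the dtype name `"categorical"` is an assumption (only `'string'` appears in the example output in this issue).

```python
# Hypothetical dtype names for which no ufunc mapping exists yet.
# "string" matches the example error in this issue; "categorical" is assumed.
_UNSUPPORTED_DTYPES = frozenset({"string", "categorical"})


def unsupported_dtype_error(ufunc_name: str, dtype_name: str) -> TypeError:
    """Build the TypeError with the wording proposed in this issue."""
    return TypeError(
        f"NumPy ufunc '{ufunc_name}' is not supported for Arkouda dtype '{dtype_name}'"
    )


def check_dtype_supported(ufunc_name: str, dtype_name: str) -> None:
    """Raise the standard error if the dtype has no ufunc support."""
    if dtype_name in _UNSUPPORTED_DTYPES:
        raise unsupported_dtype_error(ufunc_name, dtype_name)
```

Calling `check_dtype_supported(ufunc.__name__, ...)` at the top of `__array_ufunc__` would give every unsupported dtype the same message.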

Implementation Notes

  • Location: Arkouda pandas ExtensionArray class
  • Consider implementing a small internal dispatcher:
    • _UFUNC_TABLE: dict[np.ufunc, callable] or mapping by ufunc.__name__.
    • Centralize dtype checks and missing-value handling.
  • Ensure correct behavior with:
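    • __array_priority__ (NumPy's operator deferral consults it only for operands that do not define __array_ufunc__, so confirm whether pandas still relies on it once __array_ufunc__ exists),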
    • __array__ (if implemented) does not accidentally trigger conversions in the ufunc path.
  • Make sure __array_ufunc__ does not break Series ops that pandas already routes through its own arithmetic machinery.
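A rough sketch of the internal dispatcher idea, with the dtype check centralized in one lookup. Everything here is an assumption made for illustration: the table layout (implementation plus allowed NumPy dtype "kind" characters), the `resolve` helper, and the choice of which kinds each ufunc accepts. Python operators stand in for Arkouda server-side calls.

```python
import operator

import numpy as np

# Hypothetical table: ufunc -> (stand-in implementation, allowed dtype kinds).
# Kinds follow NumPy's dtype.kind codes: b=bool, i=signed int, u=unsigned, f=float.
_UFUNC_TABLE = {
    np.negative: (operator.neg, "iuf"),
    np.invert: (operator.invert, "biu"),
    np.add: (operator.add, "iuf"),
    np.logical_and: (operator.and_, "b"),
}


def resolve(ufunc, dtype):
    """Return the implementation for (ufunc, dtype).

    Returns None when the ufunc is not mapped (the caller would then
    return NotImplemented) and raises TypeError when the ufunc is mapped
    but the dtype is outside its allowed kinds.
    """
    entry = _UFUNC_TABLE.get(ufunc)
    if entry is None:
        return None
    impl, kinds = entry
    dt = np.dtype(dtype)
    if dt.kind not in kinds:
        raise TypeError(
            f"NumPy ufunc '{ufunc.__name__}' is not supported for Arkouda dtype '{dt.name}'"
        )
    return impl
```

Keying the table by the ufunc object rather than by `ufunc.__name__` avoids collisions with user-defined ufuncs that reuse a name, at the cost of a less readable repr when debugging.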

Repro / Expected UX

Example (should stay on Arkouda)

>>> import arkouda as ak
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([1, 2, 3], dtype="ak")
>>> np.add(s.array, 5).to_numpy()  # materialize only at the end
array([6, 7, 8])

Example (unsupported dtype gives helpful error)

>>> import arkouda as ak
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(["a", "b"], dtype="ak")
>>> np.add(s.array, "x")
TypeError: NumPy ufunc 'add' is not supported for Arkouda dtype 'string'

Tests

Add unit tests covering:

  • Unary ufunc: np.negative, np.absolute (numeric)
  • Binary ufunc: np.add, np.subtract, np.multiply, np.true_divide (numeric)
  • Comparisons: np.equal, np.less, etc. (numeric/bool)
  • Mixed scalar + EA and EA + EA
  • out= behavior (whatever policy is chosen)
  • Unsupported ufunc raises/returns NotImplemented in a predictable way
  • Unsupported dtype (Strings/Categorical) raises a clear TypeError
  • Ensure no silent to_numpy() / materialization occurs in the supported paths:
    • validate the result is an Arkouda ExtensionArray (or wraps one)
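For the "no silent materialization" check, one possible test helper is a context manager that temporarily makes the conversion methods raise. This is a sketch using only the standard library: `forbid_materialization` and `ToyArray` are hypothetical names, and the method names patched (`to_numpy`, `__array__`) are assumed to be the conversion entry points worth guarding.

```python
from contextlib import ExitStack, contextmanager
from unittest import mock


@contextmanager
def forbid_materialization(cls, names=("to_numpy", "__array__")):
    """Make the named conversion methods of `cls` raise inside the block."""

    def _boom(*args, **kwargs):
        raise AssertionError(f"{cls.__name__} was materialized")

    with ExitStack() as stack:
        for name in names:
            # Only patch methods the class actually has.
            if hasattr(cls, name):
                stack.enter_context(mock.patch.object(cls, name, _boom))
        yield  # originals are restored when the block exits


class ToyArray:
    """Hypothetical array type used only to demonstrate the helper."""

    def __init__(self, values):
        self.values = list(values)

    def to_numpy(self):
        return list(self.values)

    def add_scalar(self, k):
        # Stays "on the server": never calls to_numpy().
        return ToyArray(v + k for v in self.values)
```

A test would wrap the ufunc call in `with forbid_materialization(ArrayClass):` and then assert on the result type, materializing only after the block exits.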

Acceptance Criteria

  • __array_ufunc__ is implemented on the Arkouda ExtensionArray.
  • Core elementwise numeric ufuncs work end-to-end without NumPy materialization.
  • Unsupported ufuncs/dtypes produce clear, consistent errors.
  • Test suite includes coverage for supported, unsupported, and edge cases (including out=).
  • Documentation/comments explain the supported ufunc surface and rationale for exclusions.

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests
