Implement ArkoudaCategoricalArray.__setitem__ (pandas-aligned) #5430

@ajpotts

Summary

Implement ArkoudaCategoricalArray.__setitem__ to support
pandas-compatible item assignment into Arkouda-backed categorical
ExtensionArrays.

This is required for common pandas workflows such as:

  • Series.loc[...] = ... / Series.iloc[...] = ...
  • boolean mask assignment
  • where/mask
  • fillna and other in-place manager paths
  • categorical value replacement without dtype loss
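For reference, the workflows above can be illustrated with a plain pandas categorical Series. This is only a sketch of the behavior the Arkouda-backed array is expected to match; it uses NumPy-backed pandas objects, not Arkouda, and the variable names are illustrative.

```python
import pandas as pd

dtype = pd.CategoricalDtype(categories=["a", "b"])
s = pd.Series(["a", "b", None, "a"], dtype=dtype)

s.iloc[0] = "b"        # positional assignment
s[s == "a"] = "b"      # boolean mask assignment

# Both of these fill the missing entry with an existing category.
filled = s.fillna("a")
replaced = s.where(s.notna(), "a")

# In every case the categorical dtype should survive the operation.
print(s.dtype, filled.dtype, replaced.dtype)
```

With a working __setitem__, the same operations on an Arkouda-backed Series should keep the categorical dtype rather than falling back to object.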

Currently, assignment into Arkouda categorical arrays is missing or
inconsistent, leading to TypeError/NotImplementedError or pandas
fallback behavior (often converting to object/NumPy).


Background / Why

pandas Categorical supports item assignment with strict rules:

  • Assigned values must be existing categories or missing
  • New categories are not implicitly added (unless user explicitly
    adds them via add_categories or similar higher-level API)
  • Missing values are supported and propagate through codes/mask
  • Assignment must preserve dtype
    (CategoricalDtype(categories=..., ordered=...))
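A minimal sketch of these rules using pandas' own Categorical as the reference behavior. The exception type is caught broadly here because it can differ between pandas versions and code paths.

```python
import pandas as pd

cat = pd.Categorical(["a", "b", "a"], categories=["a", "b"])
original_dtype = cat.dtype

cat[0] = "b"     # existing category: allowed
cat[1] = None    # missing value: allowed, stored as code -1

try:
    cat[2] = "c"  # "c" is not a category, so this should be rejected
    raised = False
except (TypeError, ValueError):
    raised = True

print(raised)                       # whether the new category was rejected
print(cat.codes.tolist())           # integer codes after the assignments
print(cat.dtype == original_dtype)  # dtype (categories + ordered) is unchanged
```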

For Arkouda-backed categoricals, we want identical semantics while
keeping operations server-side where possible.


Requirements / Expected pandas Semantics

Given categories ["a", "b"]:

  1. Assign existing category:
    • cat[0] = "b" is allowed
  2. Assign missing:
    • cat[0] = None / pd.NA is allowed and marks entry missing
  3. Assign value not in categories:
    • cat[0] = "c" should raise (typically
      TypeError/ValueError depending on path)
    • pandas message often indicates: "Cannot setitem on a Categorical
      with a new category..."
  4. Assignment via indexers should work:
    • int, slice, boolean mask, integer array indexer
  5. Broadcasting rules:
    • scalar value broadcasts to all targeted positions
    • array-like values must match number of targeted positions
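The indexer and broadcasting requirements (items 4 and 5) could be pinned down in tests along the following lines. The sketch runs against pandas' Categorical as the reference; the helper name `fresh` and the `results` dict are illustrative, and an Arkouda-backed test would construct the array under test instead.

```python
import numpy as np
import pandas as pd


def fresh():
    """Return a new six-element categorical so each case starts from the same state."""
    return pd.Categorical(["a"] * 6, categories=["a", "b"])


results = {}

cat = fresh()
cat[0] = "b"                                # int position
results["int"] = list(cat)

cat = fresh()
cat[1:3] = "b"                              # slice, scalar broadcasts
results["slice"] = list(cat)

cat = fresh()
cat[np.array([True, False] * 3)] = "b"      # boolean mask of the same length
results["mask"] = list(cat)

cat = fresh()
cat[np.array([5, 0])] = ["b", "a"]          # integer indexer, values paired by position
results["positions"] = list(cat)

# Array-like values whose length does not match the selection should fail.
cat = fresh()
try:
    cat[np.array([0, 1, 2])] = ["b", "b"]
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```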

Scope

In Scope

  • Implement ArkoudaCategoricalArray.__setitem__(key, value)
  • Support keys:
    • int position
    • slice
    • boolean mask (same length)
    • integer indexer (array-like positions)
  • Support values:
    • scalar category label
    • scalar missing (None, pd.NA, possibly np.nan)
    • array-like of labels/missing matching target selection length
    • another ArkoudaCategoricalArray (assignment by position)
  • Enforce "no new categories" rule
  • Preserve:
    • categories
    • ordered flag
    • dtype and internal representation (codes + categories + missing
      marker/mask)
  • Add unit tests
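One possible shape for the implementation, sketched with NumPy arrays standing in for server-side Arkouda arrays. Everything here is hypothetical: the class `CodesBackedCategorical`, its attributes, and the use of -1 as the missing code are assumptions for illustration, not the existing ArkoudaCategoricalArray internals or the Arkouda API.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like


class CodesBackedCategorical:
    """Hypothetical stand-in: integer codes plus fixed categories; -1 marks missing."""

    def __init__(self, values, categories, ordered=False):
        self.categories = list(categories)
        self.ordered = ordered
        self._lookup = {label: i for i, label in enumerate(self.categories)}
        self.codes = self._encode(values)

    def _encode_scalar(self, value):
        if pd.isna(value):  # None, pd.NA and np.nan all map to the missing code
            return -1
        try:
            return self._lookup[value]
        except KeyError:
            raise TypeError(
                f"Cannot setitem on a Categorical with a new category ({value!r})"
            ) from None

    def _encode(self, value):
        if isinstance(value, CodesBackedCategorical):
            if value.categories != self.categories:
                raise TypeError("Cannot set with a categorical that has different categories")
            return value.codes.copy()
        if is_list_like(value):
            return np.array([self._encode_scalar(v) for v in value], dtype=np.int64)
        return np.int64(self._encode_scalar(value))

    def __setitem__(self, key, value):
        if is_list_like(key):
            key = np.asarray(key)          # boolean mask or integer positions
        new_codes = self._encode(value)    # validate fully before mutating anything
        if np.ndim(new_codes) == 1 and new_codes.size != np.size(self.codes[key]):
            raise ValueError("Length of values does not match number of targeted positions")
        self.codes[key] = new_codes        # a scalar code broadcasts over the selection

    def to_list(self):
        return [None if c < 0 else self.categories[c] for c in self.codes]


arr = CodesBackedCategorical(["a", "b", "a", "b"], categories=["a", "b"])
arr[0] = "b"
arr[1:3] = None
arr[[False, False, False, True]] = "a"
print(arr.to_list())
```

Translating values to codes before touching `self.codes` means a rejected assignment leaves the array unchanged, and categories and the ordered flag are never modified by setitem. A real implementation would presumably perform the final code assignment on the server rather than in NumPy.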

Out of Scope

  • Adding categories automatically during setitem
  • Implementing add_categories / remove_categories (if not already
    present)
  • 2D assignment (categorical EA is 1D)
  • Alignment by Index labels (handled by pandas, not EA)

Labels: enhancement (New feature or request)