Summary
Implement ArkoudaCategoricalArray.__setitem__ to support
pandas-compatible item assignment into Arkouda-backed categorical
ExtensionArrays.
This is required for common pandas workflows such as:
- Series.loc[...] = ... / Series.iloc[...] = ...
- boolean mask assignment
- where / mask
- fillna and other in-place manager paths
- categorical value replacement without dtype loss
Currently, assignment into Arkouda categorical arrays is missing or
inconsistent, leading to TypeError/NotImplementedError or pandas
fallback behavior (often converting to object/NumPy).
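For reference, here is how some of the workflows listed above behave on a plain pandas categorical Series. This is only a sketch of the target behavior: it uses pandas' own Categorical, not ArkoudaCategoricalArray, and the variable names are illustrative.

```python
import pandas as pd

# Plain pandas categorical Series, used here as the reference behavior.
s = pd.Series(pd.Categorical(["a", None, "b"], categories=["a", "b"]))
expected_dtype = s.dtype

s.iloc[0] = "b"          # positional assignment
s.loc[s == "b"] = "a"    # boolean mask assignment
filled = s.fillna("a")   # fill missing entries with an existing category

print(filled.tolist())
print(filled.dtype == expected_dtype)
```

In each step the dtype stays categorical with the original categories. That is the property the Arkouda-backed array should keep instead of falling back to object.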
Background / Why
pandas Categorical supports item assignment with strict rules:
- Assigned values must be existing categories or missing
- New categories are not implicitly added (unless the user explicitly
adds them via add_categories or a similar higher-level API)
- Missing values are supported and propagate through codes/mask
- Assignment must preserve dtype
(CategoricalDtype(categories=..., ordered=...))
For Arkouda-backed categoricals, we want identical semantics while
keeping operations server-side where possible.
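The rules above can be checked against pandas directly. The snippet below is a small sketch using pd.Categorical as the reference. It catches both TypeError and ValueError because the exception type may vary by pandas version and code path.

```python
import pandas as pd

cat = pd.Categorical(["a", "b", "a"], categories=["a", "b"])
dtype_before = cat.dtype

cat[0] = "b"     # existing category: allowed
cat[1] = None    # missing: stored as code -1

rejected = None
try:
    cat[2] = "c"  # "c" is not a category, so this should raise
except (TypeError, ValueError) as exc:
    rejected = exc

codes_after = cat.codes.tolist()
dtype_preserved = cat.dtype == dtype_before

# New categories require an explicit opt-in, which returns a new array
# with a different dtype.
widened = cat.add_categories(["c"])
widened[2] = "c"
```

Note that the failed assignment leaves the array unchanged, and that add_categories produces a wider dtype rather than mutating the original one.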
Requirements / Expected pandas Semantics
Given categories ["a", "b"]:
- Assign existing category:
- cat[0] = "b" is allowed
- Assign missing:
- cat[0] = None / pd.NA is allowed and marks entry missing
- Assign value not in categories:
- cat[0] = "c" should raise (typically TypeError / ValueError depending on path)
- pandas message often indicates: "Cannot setitem on a Categorical with a new category..."
- Assignment via indexers should work:
- int, slice, boolean mask, integer array indexer
- Broadcasting rules:
- scalar value broadcasts to all targeted positions
- array-like values must match number of targeted positions
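The indexer and broadcasting rules can be illustrated with pd.Categorical as the reference implementation. This sketch shows the behavior to match, not Arkouda code. The length-mismatch case is expected to surface as a ValueError from the underlying NumPy assignment.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["a"] * 6, categories=["a", "b"])

cat[0] = "b"                                   # int position
cat[1:3] = "b"                                 # slice: scalar broadcasts
mask = np.array([False, False, False, True, False, False])
cat[mask] = "b"                                # boolean mask of the same length
cat[np.array([4, 5])] = ["b", None]            # integer indexer, array-like value

mismatch = None
try:
    # Three values for two targeted positions: lengths do not match.
    cat[np.array([0, 1])] = ["a", "b", "a"]
except ValueError as exc:
    mismatch = exc

final_codes = cat.codes.tolist()
```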
Scope
In Scope
- Implement ArkoudaCategoricalArray.__setitem__(key, value)
- Support keys:
- int position
- slice
- boolean mask (same length)
- integer indexer (array-like positions)
- Support values:
- scalar category label
- scalar missing (None, pd.NA, possibly np.nan)
- array-like of labels/missing matching target selection length
- another ArkoudaCategoricalArray (assignment by position)
- Enforce "no new categories" rule
- Preserve:
- categories
- ordered flag
- dtype and internal representation (codes + categories + missing
marker/mask)
- Add unit tests
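One possible shape for the in-scope behavior is to translate the assigned labels to integer codes first, reject anything that is not an existing category, and only then write into the codes array. The class below is a hypothetical stand-in, not the actual ArkoudaCategoricalArray: CodesBackedCategorical, MISSING_CODE and _encode are names invented for this sketch, and the codes live in a NumPy array where a real implementation would presumably keep them server-side.

```python
import numpy as np
import pandas as pd

MISSING_CODE = -1  # assumption: missing entries use code -1, as in pandas


class CodesBackedCategorical:
    """Hypothetical stand-in; NumPy codes replace server-side arrays."""

    def __init__(self, values, categories, ordered=False):
        self.categories = pd.Index(categories)
        self.ordered = ordered
        self.codes = self._encode(values)

    def _encode(self, value):
        """Map labels/missing to codes; raise on labels outside categories."""
        if isinstance(value, CodesBackedCategorical):
            if not value.categories.equals(self.categories):
                raise TypeError("categories of the assigned array differ")
            return value.codes
        scalar = not pd.api.types.is_list_like(value)
        labels = np.array([value] if scalar else list(value), dtype=object)
        missing = pd.isna(labels)
        codes = np.full(labels.shape, MISSING_CODE, dtype=np.int64)
        codes[~missing] = self.categories.get_indexer(labels[~missing])
        if (codes[~missing] == -1).any():
            raise TypeError("cannot setitem with a new category")
        return codes[0] if scalar else codes

    def __setitem__(self, key, value):
        # Validate before mutating, so a rejected value changes nothing.
        new_codes = self._encode(value)
        if pd.api.types.is_list_like(key):
            key = np.asarray(key)  # boolean mask or integer positions
        # NumPy enforces the scalar-broadcast / matching-length rule.
        self.codes[key] = new_codes

    def to_pandas(self):
        return pd.Categorical.from_codes(
            self.codes, categories=self.categories, ordered=self.ordered
        )


arr = CodesBackedCategorical(["a", "b", "a", "b"], categories=["a", "b"])
arr[0] = "b"                                   # scalar label
arr[1:3] = None                                # scalar missing over a slice
arr[[True, False, False, True]] = ["a", "a"]   # boolean mask, array-like value
```

Only the codes are written, so categories and the ordered flag are untouched by construction. Unit tests could assert exactly that, plus that a rejected assignment leaves the codes unchanged.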
Out of Scope
- Adding categories automatically during setitem
- Implementing add_categories / remove_categories (if not already
present)
- 2D assignment (categorical EA is 1D)
- Alignment by Index labels (handled by pandas, not EA)