
Closes #5335: Bug: pd.array fails on Arkouda-backed Categorical with dtype specified #5448

Open
ajpotts wants to merge 1 commit into Bears-R-Us:main from
ajpotts:5335_pd.array_fails_on_Arkouda-backed_Categorical_with_dtype_specified


ajpotts commented Feb 27, 2026

Summary

Fixes an issue where calling pd.array() on an Arkouda-backed
Categorical with an explicit dtype would fail with a
NotImplementedError.

The failure occurred because pd.array() routed through
ArkoudaArray._from_sequence, which ultimately attempted to iterate
over the Categorical. Arkouda intentionally disallows iteration on
Categorical objects to prevent implicit data transfer from the server.

This change introduces a safe conversion path that:

  • Detects Arkouda-backed Categorical inputs in _from_sequence
  • Extracts the server-side categorical codes
  • Casts the codes if a dtype is provided
  • Constructs the ArkoudaArray directly from the server-side pdarray

This avoids iteration entirely and preserves server-side semantics.


Root Cause

Calling pd.array(cat, dtype="ak_int64") triggered the following call chain:

  1. ArkoudaArray._from_sequence
  2. ak_array(scalars, ...)
  3. list(scalars) inside ak_array
  4. Categorical.__iter__, which raises NotImplementedError

Since iteration is intentionally blocked for Categoricals, a direct
server-side conversion path was required.
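To see why the generic path breaks, the call chain above can be reproduced in a few lines. This is a minimal sketch that needs no Arkouda server: BlockedCategorical, generic_ak_array, and from_sequence_before_fix are hypothetical stand-ins, not Arkouda code. The only point it makes is that list() calls __iter__, so any constructor that materializes its input with list() surfaces the Categorical's NotImplementedError.

```python
from typing import Any, List


class BlockedCategorical:
    """Hypothetical stand-in for a Categorical whose iteration is disabled."""

    def __iter__(self):
        raise NotImplementedError("Categorical does not support iteration")


def generic_ak_array(scalars: Any) -> List[Any]:
    """Hypothetical stand-in for ak_array (step 2)."""
    # Step 3: list() invokes __iter__ (step 4), which raises.
    return list(scalars)


def from_sequence_before_fix(scalars: Any) -> List[Any]:
    """Hypothetical stand-in for the pre-fix _from_sequence (step 1)."""
    return generic_ak_array(scalars)


try:
    from_sequence_before_fix(BlockedCategorical())
except NotImplementedError as exc:
    print(f"NotImplementedError: {exc}")
# prints: NotImplementedError: Categorical does not support iteration
```

The same generic path works for ordinary iterables, which is why the bug only shows up for Categorical inputs.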


Changes

_arkouda_array.py

  • Added special-case handling for Arkouda Categorical in
    _from_sequence
  • Extract categorical codes directly
  • Cast codes when a dtype is provided
  • Return cls(codes) without invoking ak_array
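The special case listed above can be sketched as follows. This is a self-contained illustration, not the actual patch. StandInCategorical and StandInArray are hypothetical stand-ins for Arkouda's Categorical and ArkoudaArray, plain Python lists stand in for server-side pdarrays, and the _CASTS table is a hypothetical substitute for a server-side cast. Only the control flow mirrors the change: check for a Categorical first, take its codes, cast them if a dtype is given, and return before the generic path.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical dtype-name -> converter table; stands in for a server-side cast.
_CASTS: Dict[str, Callable[[Any], Any]] = {"int64": int, "float64": float}


class StandInCategorical:
    """Hypothetical stand-in for an Arkouda Categorical: codes plus categories."""

    def __init__(self, codes: List[int], categories: List[str]) -> None:
        self.codes = codes
        self.categories = categories

    def __iter__(self):
        # Mirrors Arkouda: iterating a Categorical is deliberately blocked.
        raise NotImplementedError("Categorical does not support iteration")


class StandInArray:
    """Hypothetical stand-in for ArkoudaArray; _data stands in for a pdarray."""

    def __init__(self, data: List[Any]) -> None:
        self._data = data

    @classmethod
    def _from_sequence(
        cls, scalars: Any, *, dtype: Optional[str] = None, copy: bool = False
    ) -> "StandInArray":
        if isinstance(scalars, StandInCategorical):
            # Special case: take the codes directly; never iterate the Categorical.
            codes = scalars.codes
            if dtype is not None:
                # Cast the codes (not the labels) to the requested dtype.
                codes = [_CASTS[dtype](c) for c in codes]
            return cls(codes)
        # Generic path: list() calls __iter__, which is what raised
        # NotImplementedError for Categoricals before the fix.
        data = list(scalars)
        if dtype is not None:
            data = [_CASTS[dtype](x) for x in data]
        return cls(data)


cat = StandInCategorical([0, 0, 1], ["a", "b"])
arr = StandInArray._from_sequence(cat, dtype="float64")
print(arr._data)  # [0.0, 0.0, 1.0]
```

Returning early is the key design choice here: the iteration guard on the Categorical is never reached, so it can stay in place to keep blocking implicit transfers elsewhere.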

Tests

Added:

def test_pd_array_with_dtype_on_ak_categorical_should_not_iterate(self):

This verifies that:

  • pd.array(cat, dtype="ak_int64") succeeds
  • The resulting values match cat.codes
  • No iteration occurs
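One way to check the "no iteration occurs" property is to spy on __iter__ during the conversion. The sketch below is a hypothetical, self-contained version of such a test: StandInCategorical and convert are stand-ins rather than Arkouda code, and it uses unittest.mock.patch.object to swap in a recording __iter__ for the duration of the call. The test added in this PR may verify the property differently.

```python
from typing import Any, Callable, List, Optional
from unittest import mock


class StandInCategorical:
    """Hypothetical stand-in for an Arkouda Categorical."""

    def __init__(self, codes: List[int]) -> None:
        self.codes = codes

    def __iter__(self):
        raise NotImplementedError("Categorical does not support iteration")


def convert(scalars: Any, dtype: Optional[Callable[[Any], Any]] = None) -> List[Any]:
    """Hypothetical stand-in for the fixed conversion path."""
    if isinstance(scalars, StandInCategorical):
        codes = scalars.codes
        return [dtype(c) for c in codes] if dtype is not None else list(codes)
    return list(scalars)


def test_convert_on_categorical_should_not_iterate() -> None:
    cat = StandInCategorical([0, 0, 1])
    iter_calls: List[Any] = []

    def spy_iter(self):
        # Record the attempt instead of raising, so the test can assert on it.
        iter_calls.append(self)
        return iter(self.codes)

    # Temporarily replace __iter__ on the class with the recording spy.
    with mock.patch.object(StandInCategorical, "__iter__", spy_iter):
        result = convert(cat, dtype=int)

    assert result == cat.codes  # values match the categorical codes
    assert iter_calls == []  # no iteration occurred


test_convert_on_categorical_should_not_iterate()
```

Recording calls rather than relying on the built-in NotImplementedError makes the intent explicit: the test fails with a clear assertion if a future change reintroduces iteration.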

Behavioral Impact

Before:

  • pd.array(Categorical(...), dtype=...) raised NotImplementedError

After:

  • Returns an ArkoudaArray backed by the categorical codes
  • No implicit data transfer
  • Fully server-side conversion path


Example

import arkouda as ak
import pandas as pd
from arkouda.pandas import Categorical

ak.connect()  # assumes a reachable Arkouda server with default connection settings
cat = Categorical(ak.array(["a", "a", "b"]))
arr = pd.array(cat, dtype="ak_int64")

# arr now contains the categorical codes
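For readers unfamiliar with categorical codes: each label is replaced by the integer position of its category, so equal labels always share a code. The encode helper below is a hypothetical, stdlib-only illustration of that idea. It assumes sorted categories; Arkouda may order categories differently, so the specific code assigned to each label can differ even though equal labels still share one.

```python
from typing import List, Tuple


def encode(labels: List[str]) -> Tuple[List[int], List[str]]:
    """Hypothetical helper: map each label to the index of its category."""
    # Assumption: categories are sorted; Arkouda may order them differently.
    categories = sorted(set(labels))
    index = {cat: i for i, cat in enumerate(categories)}
    codes = [index[label] for label in labels]
    return codes, categories


codes, categories = encode(["a", "a", "b"])
print(codes, categories)  # [0, 0, 1] ['a', 'b']
# Decoding recovers the labels: codes are positions into categories.
print([categories[c] for c in codes])  # ['a', 'a', 'b']
```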

Closes #5335: Bug: pd.array fails on Arkouda-backed Categorical with dtype specified

ajpotts force-pushed the 5335_pd.array_fails_on_Arkouda-backed_Categorical_with_dtype_specified branch from 4c76949 to 5252427 (February 27, 2026 21:54)
ajpotts force-pushed the 5335_pd.array_fails_on_Arkouda-backed_Categorical_with_dtype_specified branch from 5252427 to 9088871 (February 27, 2026 22:02)
ajpotts requested a review from jaketrookman (February 27, 2026 22:11)
ajpotts marked this pull request as ready for review (February 27, 2026 22:11)