Closes #5335: Bug: pd.array fails on Arkouda-backed Categorical with dtype specified #5448
Open
ajpotts wants to merge 1 commit into Bears-R-Us:main from
Summary
Fixes an issue where calling `pd.array()` on an Arkouda-backed `Categorical` with an explicit `dtype` would fail with a `NotImplementedError`.

The failure occurred because `pd.array()` routed through `ArkoudaArray._from_sequence`, which ultimately attempted to iterate over the `Categorical`. Arkouda intentionally disallows iteration on `Categorical` objects to prevent implicit data transfer from the server.

This change introduces a safe conversion path that:
- Detects Arkouda-backed `Categorical` inputs in `_from_sequence`
- Extracts the server-side categorical `codes`
- Casts the codes if a `dtype` is provided
- Constructs the `ArkoudaArray` directly from the server-side `pdarray`

This avoids iteration entirely and preserves server-side semantics.
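The conversion path listed in the summary can be sketched roughly as follows. This is a minimal illustration only, not the code in this PR: `FakeCategorical` and `FakeArkoudaArray` are hypothetical stand-ins that use plain Python lists in place of server-side `pdarray` objects, so the sketch runs without an Arkouda server.

```python
class FakeCategorical:
    """Hypothetical stand-in for an Arkouda Categorical (not the real class)."""

    def __init__(self, values):
        self.categories = sorted(set(values))
        # Integer codes index into `categories`, as in a categorical encoding.
        self.codes = [self.categories.index(v) for v in values]

    def __iter__(self):
        # Mirrors the behavior described above: iteration is blocked.
        raise NotImplementedError("iteration over Categorical is not supported")


class FakeArkoudaArray:
    """Hypothetical stand-in for ArkoudaArray; wraps a list, not a pdarray."""

    def __init__(self, data):
        self._data = data

    @classmethod
    def _from_sequence(cls, scalars, dtype=None):
        if isinstance(scalars, FakeCategorical):
            # Take the codes directly instead of iterating the Categorical.
            codes = scalars.codes
            if dtype is not None:
                # Stand-in for a cast; with real Arkouda this would need to
                # happen on the server rather than element by element.
                codes = [dtype(c) for c in codes]
            return cls(codes)
        # Generic path: materializes the input by iterating it.
        return cls(list(scalars))


cat = FakeCategorical(["b", "a", "b", "c"])
arr = FakeArkoudaArray._from_sequence(cat, dtype=int)
print(arr._data)  # [1, 0, 1, 2]
```

The point of the early `isinstance` branch is that the generic `list(scalars)` path is never reached for categorical input, so `__iter__` is never called.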
Root Cause
`pd.array(cat, dtype="ak_int64")` triggered:
1. `ArkoudaArray._from_sequence`
2. `ak_array(scalars, ...)`
3. `list(scalars)` inside `ak_array`
4. `Categorical.__iter__`, which raises `NotImplementedError`

Since iteration is intentionally blocked for Categoricals, a direct server-side conversion path was required.
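The failure mode in this call chain can be reproduced in isolation. The sketch below assumes nothing about Arkouda internals: `IterationBlocked` and `naive_array` are hypothetical names that only mimic an object whose `__iter__` raises and a constructor that materializes its input with `list(...)`.

```python
class IterationBlocked:
    """Hypothetical object that refuses iteration, like the Categorical above."""

    def __init__(self, values):
        self._values = list(values)

    def __iter__(self):
        raise NotImplementedError("iteration is intentionally disabled")


def naive_array(scalars):
    # Hypothetical generic constructor: list() calls scalars.__iter__().
    return list(scalars)


blocked = IterationBlocked(["a", "b", "a"])
try:
    naive_array(blocked)
    raised = False
except NotImplementedError:
    # The error surfaces from __iter__, several frames below the caller.
    raised = True

print(raised)  # True
```

Because the exception originates inside `__iter__`, any generic path that materializes its input will hit it, which is why a dedicated branch is needed before that point.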
Changes
`_arkouda_array.py`
- Detects `Categorical` in `_from_sequence`
- Constructs the result via `cls(codes)` without invoking `ak_array`

Tests
Added:
This verifies that:
- `pd.array(cat, dtype="ak_int64")` succeeds
- The result is backed by `cat.codes`

Behavioral Impact
Before:
- `pd.array(Categorical(...), dtype=...)` raised `NotImplementedError`

After:
- Returns an `ArkoudaArray` backed by categorical codes
- No implicit data transfer
- Fully server-side conversion path
Example