Description
Summary
Implement the equals method for:
- ArkoudaStringArray
- ArkoudaCategoricalArray

to match pandas ExtensionArray semantics.
pandas relies on .equals() for correctness checks, testing utilities,
alignment logic, and some internal fast-path decisions. Missing or
incorrect implementations can cause false negatives/positives in
comparisons and may trigger slow fallbacks (e.g., converting to
object/NumPy).
Background / Why
In pandas, ExtensionArray.equals(other) answers:
Are these two arrays the same length and do they contain equal
elements in the same positions, treating missing values as equal to
missing values?
Key points:
- This is not elementwise comparison (==); it returns a single boolean.
- Missing values compare equal only when both are missing in the same positions.
- For categoricals, equality also depends on dtype metadata (categories/order).
This method is used in:
- pandas tests/assertions (tm.assert_* helpers)
- Series.equals, Index.equals
- some optimization checks (e.g., short-circuiting operations)
Expected pandas Semantics
Strings
Two arrays are equal if:
- Same length
- For each position:
  - both missing → equal
  - both non-missing and strings equal → equal
  - otherwise not equal
Example:
- ["a", None, "b"] equals ["a", None, "b"] → True
- ["a", None, "b"] equals ["a", "x", "b"] → False
- ["a", None] equals ["a"] → False
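
The rules above can be sketched in plain Python. This is only an illustration of the intended semantics, not the Arkouda implementation: `strings_equal` is a hypothetical helper, and missing values are assumed to be represented as `None`.

```python
from typing import Optional, Sequence


def strings_equal(
    left: Sequence[Optional[str]], right: Sequence[Optional[str]]
) -> bool:
    """Return a single bool, not an elementwise mask."""
    # Different lengths can never be equal.
    if len(left) != len(right):
        return False
    for a, b in zip(left, right):
        # Missing matches only missing at the same position.
        if (a is None) != (b is None):
            return False
        # Both present: the strings themselves must match.
        if a is not None and a != b:
            return False
    return True
```

The three examples listed above map directly onto calls such as `strings_equal(["a", None, "b"], ["a", "x", "b"])`.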
Categoricals
Two categoricals are equal if:
- Same length
- Same dtype metadata. pandas behavior requires:
  - same categories (typically same values and same order)
  - same ordered flag
- The codes (including missing) match positionally
Examples:
- Categorical(["a", None], categories=["a","b"]) equals same dtype and same values → True
- Same values but different categories order → False (pandas treats dtype mismatch as not equal)
- Same categories but different ordered flag → False
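Note: The rules above are the strict reading. pandas appears to compare
unordered categoricals up to a permutation of their categories, in which
case "different categories order" would return False only when
ordered=True. Either way, we should match exactly what pandas does for
Categorical.equals and confirm it with a pandas baseline test.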
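
A minimal sketch of the strict rules listed above, using a toy container rather than the real Arkouda or pandas classes. `ToyCategorical`, `categoricals_equal`, and the use of code `-1` for missing values are assumptions made for illustration. Whether category order should matter for unordered data is the open question in the note, so treat the category check here as a placeholder.

```python
from dataclasses import dataclass
from typing import List

MISSING_CODE = -1  # assumption: missing values are stored as code -1


@dataclass
class ToyCategorical:
    codes: List[int]        # one integer code per element
    categories: List[str]   # dtype metadata: category values, in order
    ordered: bool = False   # dtype metadata: ordered flag


def categoricals_equal(left: ToyCategorical, right: ToyCategorical) -> bool:
    """Strict reading: metadata must match before codes are compared."""
    if len(left.codes) != len(right.codes):
        return False
    if left.ordered != right.ordered:
        return False
    # Strict check: same category values in the same order.
    if left.categories != right.categories:
        return False
    # Codes include MISSING_CODE, so missing positions must line up too.
    return left.codes == right.codes
```

Checking the metadata first keeps the comparison cheap when the dtypes differ, since the codes are never touched in that case.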
Scope
In Scope
- Implement:
  - ArkoudaStringArray.equals(other) -> bool
  - ArkoudaCategoricalArray.equals(other) -> bool
- Accept other as:
  - same Arkouda array type
  - pandas equivalent array type where reasonable (e.g., pandas StringArray/Categorical)
  - array-like (optional; if not supported, return False)
- Ensure missing-value semantics match pandas
- Avoid full materialization for large arrays (no .to_numpy() of full data)
- Add unit tests comparing to pandas baselines
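
One possible shape for the method, sketched with a toy class: reject unsupported types up front, then compare in bounded chunks and stop at the first mismatch so the full data is never copied at once. `ToyStringArray`, `_chunks`, and the `chunk_size` parameter are hypothetical. In Arkouda the comparison would more likely be a server-side reduction to a single boolean, which this sketch does not attempt to model.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def _chunks(values: Iterable, size: int) -> Iterator[List]:
    """Yield successive lists of at most `size` items."""
    it = iter(values)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


class ToyStringArray:
    def __init__(self, values: Iterable[Optional[str]]) -> None:
        self._values = list(values)

    def __len__(self) -> int:
        return len(self._values)

    def equals(self, other: object, chunk_size: int = 1024) -> bool:
        # Unsupported types are not coerced; they are simply not equal.
        if not isinstance(other, ToyStringArray):
            return False
        if len(self) != len(other):
            return False
        # Compare one bounded chunk at a time and exit on the first mismatch.
        pairs = zip(
            _chunks(self._values, chunk_size),
            _chunks(other._values, chunk_size),
        )
        for left, right in pairs:
            if left != right:
                return False
        return True
```

For example, `ToyStringArray(["a", None]).equals(["a", None])` returns False because a plain list is not a supported type in this sketch.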
Out of Scope
- Elementwise comparisons (==), handled elsewhere
- Cross-dtype "coercive" equality (should generally return False)