Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 5 additions & 12 deletions cdisc_rules_engine/operations/record_count.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ def _execute_operation(self) -> pd.Series:
if self.params.regex
else effective_grouping
)
if self.params.regex:
if self.params.regex and filtered is None:
group_df = self._get_regex_grouped_counts(
self.params.dataframe, grouping_for_operations
)
Expand Down Expand Up @@ -102,19 +102,12 @@ def _get_regex_grouped_counts(self, dataframe, grouping_columns):
grouped_counts[col] = grouped_counts[col].astype(
df_for_grouping[col].dtype
)
original_with_idx = dataframe[grouping_columns].copy()
original_with_idx["_idx"] = range(len(original_with_idx))
transformed_with_idx = df_for_grouping.copy()
transformed_with_idx["_idx"] = range(len(transformed_with_idx))
transformed_with_counts = transformed_with_idx.merge(
transformed_with_counts = df_for_grouping.merge(
grouped_counts, on=grouping_columns, how="left"
)
original_with_idx["size"] = transformed_with_counts["size"].values
result = (
original_with_idx.drop(columns=["_idx"])
.groupby(grouping_columns, as_index=False, dropna=False)
.first()
)
result = dataframe[grouping_columns].copy()
result["size"] = transformed_with_counts["size"].values
result = result.groupby(grouping_columns, as_index=False, dropna=False).first()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new logic breaks row alignments and i snot equivalent to the previous logic. This can cause silent bugs. Could you please mention if you think this is correct and that regex transformations will not reorder rows?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RamilCDISC why do you believe it will break row alignment? df_for_grouping is transformed by the apply_regex_to_grouping_columns but this performs only element-wise operations on selected columns. No rows are removed/added/reordered by this. The left merge keep all rows in the original order. I dont know where the row alignment not changing would be introduced

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I actually wanted to tag line 105. The merge operation can change the ordering. The previous code had 'idx' which could be used for reorder to original. This may not be a big issue. Please let me know your thoughts.

Copy link
Collaborator Author

@SFJohnson24 SFJohnson24 Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RamilCDISC
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html
"left: use only keys from left frame, similar to a SQL left outer join; preserve key order "

It is a left merge so the key order is preserved.

return result

def _apply_regex_to_grouping_columns(
Expand Down
Loading