Open
Labels
SDTM (CDISC SDTM/SDTMIG), question (Further information is requested)
Feature Description
CORE: 0.14.0
The SUPP merge does not work in the following case:
(1) The data is in Excel format
(2) The merge key is a --SEQ variable (numeric)
In this case, the numeric values are read as floats, so they carry a trailing ".0" in the pandas DataFrame:
  STUDYID DOMAIN USUBJID SPDEVID AESEQ ...
0    TEST     AE    A001    None   1.0 ...
1    TEST     AE    A001    None   2.0 ...
On the other hand, the IDVARVAL variable in the SUPP dataset is character type:
  STUDYID RDOMAIN USUBJID IDVAR IDVARVAL ...
0    TEST      AE    A001 AESEQ        1 ...
1    TEST      AE    A001 AESEQ        2 ...
This merge is handled by merge_pivot_supp_dataset (cdisc_rules_engine/utilities/data_processor.py).
The relevant code is shown below. It casts both key columns to str, so the float key becomes "1.0" while IDVARVAL stays "1", and the two values are treated as unequal:
if not is_blank:
    common_keys.append(dynamic_key)
    current_supp = right_dataset.rename(columns={"IDVARVAL": dynamic_key})
    current_supp = current_supp.drop(columns=["IDVAR"])
    left_dataset[dynamic_key] = left_dataset[dynamic_key].astype(str)
    current_supp[dynamic_key] = current_supp[dynamic_key].astype(str)
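The mismatch can be reproduced outside the engine with plain pandas. This is only a minimal sketch: the two small frames and the QVAL values are made-up stand-ins for the AE and SUPPAE data above, and the merge call is a simplified assumption, not the engine's actual merge logic.

```python
import pandas as pd

# Hypothetical stand-ins: AESEQ read from Excel as float, IDVARVAL already
# renamed to AESEQ and stored as character data.
ae = pd.DataFrame({"USUBJID": ["A001", "A001"], "AESEQ": [1.0, 2.0]})
supp = pd.DataFrame(
    {"USUBJID": ["A001", "A001"], "AESEQ": ["1", "2"], "QVAL": ["Y", "N"]}
)

# Same casts as in the snippet above: float 1.0 becomes the string "1.0".
ae["AESEQ"] = ae["AESEQ"].astype(str)
supp["AESEQ"] = supp["AESEQ"].astype(str)

merged = ae.merge(supp, on=["USUBJID", "AESEQ"], how="left")

# Count parent rows that found no SUPP record.
unmatched = int(merged["QVAL"].isna().sum())
print(ae["AESEQ"].tolist(), supp["AESEQ"].tolist(), unmatched)
```

In this sketch, every parent row is left without a matching SUPP record, because "1.0" never equals "1".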
Normalizing the numeric key values might resolve this issue. One option is to add a new function like this...
def normalize_numeric_key(x):
    if pd.isna(x):
        return x
    try:
        f = float(x)
        if f.is_integer():
            return str(int(f))
        return str(f)
    except (ValueError, TypeError):
        return str(x)
... and apply it to the left dataset:
if not is_blank:
    common_keys.append(dynamic_key)
    current_supp = right_dataset.rename(columns={"IDVARVAL": dynamic_key})
    current_supp = current_supp.drop(columns=["IDVAR"])
    left_dataset[dynamic_key] = left_dataset[dynamic_key].apply(normalize_numeric_key)
    current_supp[dynamic_key] = current_supp[dynamic_key].astype(str)
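To check the idea in isolation, the proposed function can be run against the same kind of toy data. Again, this is a sketch under assumptions: the frames and QVAL values are hypothetical, and a plain pandas merge stands in for the engine's merge.

```python
import pandas as pd


def normalize_numeric_key(x):
    # Leave missing values untouched.
    if pd.isna(x):
        return x
    try:
        f = float(x)
        # Whole numbers lose the trailing ".0"; other numbers keep decimals.
        if f.is_integer():
            return str(int(f))
        return str(f)
    except (ValueError, TypeError):
        # Non-numeric keys are passed through as strings.
        return str(x)


# Hypothetical stand-ins for the parent domain and its SUPP dataset.
ae = pd.DataFrame({"USUBJID": ["A001", "A001"], "AESEQ": [1.0, 2.0]})
supp = pd.DataFrame(
    {"USUBJID": ["A001", "A001"], "AESEQ": ["1", "2"], "QVAL": ["Y", "N"]}
)

ae["AESEQ"] = ae["AESEQ"].apply(normalize_numeric_key)
supp["AESEQ"] = supp["AESEQ"].astype(str)

merged = ae.merge(supp, on=["USUBJID", "AESEQ"], how="left")
print(merged)
```

With the left key normalized, both rows pick up their QVAL. One caveat worth considering: if IDVARVAL itself ever contained "1.0", the right side would need the same normalization for the keys to line up.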