SUPP dataset merge by numeric variable #1660

@HajimeShimizu

Feature Description

CORE: 0.14.0

The SUPP merge fails when both of the following hold:
(1) The data is in Excel format
(2) The merge key is a numeric --SEQ variable

In this case, the numeric values carry a trailing ".0" in the pandas DataFrame:

  STUDYID DOMAIN USUBJID SPDEVID AESEQ  ...
0    TEST     AE    A001    None   1.0  ...
1    TEST     AE    A001    None   2.0  ...

The IDVARVAL variable in the SUPP dataset, on the other hand, is character type:

  STUDYID RDOMAIN USUBJID IDVAR IDVARVAL  ...
0    TEST      AE    A001 AESEQ        1  ...
1    TEST      AE    A001 AESEQ        2  ...

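The merge is handled by merge_pivot_supp_dataset in cdisc_rules_engine/utilities/data_processor.py.

I see the following code there. Both key columns are cast with astype(str), so the left side becomes "1.0" while the right side stays "1", and the two values are treated as unequal: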

            if not is_blank:
                common_keys.append(dynamic_key)
                current_supp = right_dataset.rename(columns={"IDVARVAL": dynamic_key})
                current_supp = current_supp.drop(columns=["IDVAR"])
                left_dataset[dynamic_key] = left_dataset[dynamic_key].astype(str)
                current_supp[dynamic_key] = current_supp[dynamic_key].astype(str)
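
The mismatch can be reproduced outside the engine. The following is only a minimal sketch: the two small frames and the QVAL values are made up for illustration, and a plain pandas merge stands in for the merge step of merge_pivot_supp_dataset.

```python
import pandas as pd

# Hypothetical parent (AE) data: AESEQ arrives as float, as when read from Excel.
parent = pd.DataFrame({"USUBJID": ["A001", "A001"], "AESEQ": [1.0, 2.0]})

# Hypothetical SUPP data after IDVARVAL has been renamed to AESEQ (character type).
supp = pd.DataFrame(
    {"USUBJID": ["A001", "A001"], "AESEQ": ["1", "2"], "QVAL": ["Y", "N"]}
)

# Same casts as in the quoted code: floats become "1.0" and "2.0".
parent["AESEQ"] = parent["AESEQ"].astype(str)
supp["AESEQ"] = supp["AESEQ"].astype(str)

# Left merge on the string keys; "1.0" never equals "1", so nothing matches.
merged = parent.merge(supp, on=["USUBJID", "AESEQ"], how="left")
unmatched = int(merged["QVAL"].isna().sum())
print(parent["AESEQ"].tolist(), unmatched)
```

Every parent row ends up without a SUPP value, which matches the behaviour described above.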

Normalizing the numeric key values might resolve this issue. For example, add a new function like this:

        def normalize_numeric_key(x):
            if pd.isna(x):
                return x
            try:
                f = float(x)
                if f.is_integer():
                    return str(int(f))
                return str(f)
            except (ValueError, TypeError):
                return str(x)

... and apply it to the left dataset:

            if not is_blank:
                common_keys.append(dynamic_key)
                current_supp = right_dataset.rename(columns={"IDVARVAL": dynamic_key})
                current_supp = current_supp.drop(columns=["IDVAR"])
                left_dataset[dynamic_key] = left_dataset[dynamic_key].apply(normalize_numeric_key)
                current_supp[dynamic_key] = current_supp[dynamic_key].astype(str)
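
To check the idea in isolation, here is a small sketch with made-up frames and QVAL values, using a plain pandas merge rather than the engine's own code path. Applying the function to the SUPP side as well is an extra assumption of this sketch, not part of the change above; it would also cover an IDVARVAL stored as "1.0".

```python
import pandas as pd


def normalize_numeric_key(x):
    """Render integer-valued numbers without a trailing '.0'; pass others through as str."""
    if pd.isna(x):
        return x
    try:
        f = float(x)
        if f.is_integer():
            return str(int(f))
        return str(f)
    except (ValueError, TypeError):
        return str(x)


# Hypothetical parent data with a float key, and SUPP data with a character key.
parent = pd.DataFrame({"USUBJID": ["A001", "A001"], "AESEQ": [1.0, 2.0]})
supp = pd.DataFrame(
    {"USUBJID": ["A001", "A001"], "AESEQ": ["1", "2"], "QVAL": ["Y", "N"]}
)

# Normalize both sides so that 1.0, "1.0" and "1" all become "1".
parent["AESEQ"] = parent["AESEQ"].apply(normalize_numeric_key)
supp["AESEQ"] = supp["AESEQ"].apply(normalize_numeric_key)

merged = parent.merge(supp, on=["USUBJID", "AESEQ"], how="left")
print(merged)
```

With the keys normalized, each parent row picks up its SUPP value. Non-numeric keys such as "ABC" fall through to str(x) unchanged.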

Labels: SDTM (CDISC SDTM/SDTMIG), question (Further information is requested)