Added make_mt_IHME.py to convert IHME flat files with ICD codes and morbidity data into mt and ht. by rahulg603 · Pull Request #3 · Nealelab/ukb_common

rahulg603 · 2021-01-08T19:39:15Z

The upstream code that generates the compressed flat file is R code that I still have locally and need to clean up.


konradjk · 2021-01-11T16:57:07Z

make_mt_IHME.py

+                         'must be present in the MatrixTable column fields.')
+
+
+parser = argparse.ArgumentParser()


I always put this inside the if __name__ == '__main__' block. It probably doesn't matter, but it's not needed if you were to import from this script, so I think it makes sense there.
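One common variant of that layout, sketched below, keeps the parser in a helper that is only called from the `__main__` guard, so importing functions from the script builds no parser and reads no arguments. `build_parser` and `main` are illustrative names, not functions from the PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Build the CLI parser; wrapped in a function so nothing runs at import time.
    parser = argparse.ArgumentParser()
    parser.add_argument('--input',
                        help='Location of the table containing the data for merging. '
                             'Expects this to contain data for a single year.')
    return parser


def main(argv=None):
    # argv=None makes argparse fall back to sys.argv[1:].
    args = build_parser().parse_args(argv)
    return args


if __name__ == '__main__':
    main()
```

Passing an explicit `argv` list also makes the argument handling easy to exercise from a test.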

konradjk · 2021-01-11T16:57:23Z

make_mt_IHME.py

+
+
+parser = argparse.ArgumentParser()
+parser.add_argument('--input', type=str, help='Location of the table containing the data for merging. Expects this to contain data for a single year.')


type=str is the default. It's fine to leave it if you want, but you can also omit it.
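For instance, argparse hands values back as strings unless `type` says otherwise, so these two declarations parse the same command line identically:

```python
import argparse

# Without type=..., argparse stores the raw command-line string.
implicit = argparse.ArgumentParser()
implicit.add_argument('--input')

# Spelling out type=str changes nothing.
explicit = argparse.ArgumentParser()
explicit.add_argument('--input', type=str)

argv = ['--input', 'data.tsv.gz']
implicit_args = implicit.parse_args(argv)
explicit_args = explicit.parse_args(argv)
# Both yield the string 'data.tsv.gz'.
```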

konradjk · 2021-01-13T14:47:09Z

make_mt_IHME.py

+                 entry_fields: list, field_names: list):
+    if not row_key in field_names:
+        raise ValueError('Row key must be in set of fields.')
+    if(len(row_other) == 0):


You don't need the outer parentheses here.

konradjk · 2021-01-13T14:47:35Z

make_mt_IHME.py

+        row_other_l = row_other.split(',')
+        if not all([row in field_names for row in row_other_l]):
+            raise ValueError('All row fields must be in the columns of the read table.')
+    return row_key, list(np.setdiff1d(list(set(row_other_l)),[row_key]))


You don't modify row_key; is there a compelling reason to return it?
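A sketch of the helper returning only the derived list, so the caller keeps using its own `row_key`. The name `parse_row_fields` and the comma-separated `row_other` string are assumptions read off the diff, and the body of the empty-input branch is not visible in the hunk, so returning `[]` there is a guess. It also folds in the smaller nits: `x not in y`, no outer parentheses, and a generator instead of a list inside `all()`.

```python
def parse_row_fields(row_key: str, row_other: str, field_names: list) -> list:
    """Validate the requested row fields and return the non-key ones."""
    if row_key not in field_names:
        raise ValueError('Row key must be in set of fields.')
    if not row_other:
        # Assumed behavior: no extra row fields were requested.
        return []
    row_other_l = row_other.split(',')
    if not all(row in field_names for row in row_other_l):
        raise ValueError('All row fields must be in the columns of the read table.')
    # np.setdiff1d returns sorted unique values, so sorted(set - {key}) mirrors it.
    return sorted(set(row_other_l) - {row_key})
```

This also drops the NumPy dependency for what is a plain set difference.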

konradjk · 2021-01-13T14:50:15Z

make_mt_IHME.py

+
+    ht : Hail Table
+        Should contain columns that will be transformed into the new struct.
+


Can you provide a simple pictorial example of what this function does? It's hard to envision from the docstring alone.
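One way to answer this is a before/after example in the docstring. The sketch below assumes, hypothetically, that the function packs a set of flat columns into one struct-valued field; plain dicts stand in for Hail rows, and `pack_into_struct`, `metrics`, `val`, and `upper` are illustrative names only.

```python
def pack_into_struct(rows: list, fields: list, struct_name: str) -> list:
    """Move `fields` from each flat row into a nested dict under `struct_name`.

    With fields=['val', 'upper'] and struct_name='metrics':
        before: {'cause': 'A', 'val': 1.0, 'upper': 2.0}
        after:  {'cause': 'A', 'metrics': {'val': 1.0, 'upper': 2.0}}
    """
    out = []
    for row in rows:
        # Fields not being packed are carried over unchanged.
        new_row = {k: v for k, v in row.items() if k not in fields}
        new_row[struct_name] = {f: row[f] for f in fields}
        out.append(new_row)
    return out
```

In Hail itself the same reshaping would typically be an `ht.annotate(...)` with `hl.struct(...)` followed by `ht.drop(...)` on the original columns.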

konradjk · 2021-01-13T14:52:10Z

make_mt_IHME.py

+    age_groups = list(set(age_groups))
+    age_groups_not_in_vals = np.setdiff1d(age_groups,age_vals)
+    if len(age_groups_not_in_vals) == 0:
+        mt_fc = mt_f.filter_cols(hl.literal(age_groups).contains(mt_f.age_name))


What is mt_f?
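`mt_f` isn't defined in the visible part of the function; presumably it is meant to be the MatrixTable passed in. The validation around it can be sketched in plain Python, assuming (since the other branch isn't shown) that requested age groups missing from the data should raise. `check_age_groups` is a hypothetical name; in the script, the returned list would feed `filter_cols(hl.literal(...).contains(...))` on the input MatrixTable.

```python
def check_age_groups(age_groups: list, age_vals: list) -> list:
    """Deduplicate requested age groups and confirm each one exists in the data."""
    age_groups = sorted(set(age_groups))
    present = set(age_vals)  # build once instead of per lookup
    missing = [g for g in age_groups if g not in present]
    if missing:
        raise ValueError(f'Age groups not found in the data: {missing}')
    return age_groups
```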

konradjk · 2021-01-13T14:53:40Z

make_mt_IHME.py

+    args = parser.parse_args()
+
+    # Import
+    ht = hl.import_table(args.input, force=True, impute=True)


If you're assuming this is gzipped, then you must have a particular file in mind. If so, I would suggest adding an entry in resources/phenotypes.py like the other resources and reading it from there rather than taking it as an input argument.
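A rough sketch of that resource idea. The path is a placeholder rather than the real file, and `IHME_FLAT_FILE_PATH`, `import_table_kwargs`, and `get_ihme_ht` are hypothetical names. The helper only chooses the compression flag for Hail's `import_table`: `.bgz` files are read as block-gzipped automatically, while a plain `.gz` file needs `force=True` (read serially on one core), or `force_bgz=True` if it was in fact block-compressed.

```python
# Hypothetical resource entry; bucket and filename are placeholders.
IHME_FLAT_FILE_PATH = 'gs://<bucket>/ihme/ihme_flat_file.tsv.gz'


def import_table_kwargs(path: str) -> dict:
    """Choose hl.import_table keyword arguments from the file extension."""
    kwargs = {'impute': True}
    if path.endswith('.gz'):
        # Plain gzip can't be split, so Hail only reads it with force=True.
        kwargs['force'] = True
    return kwargs


def get_ihme_ht(path: str = IHME_FLAT_FILE_PATH):
    import hail as hl  # imported lazily so the helper above works without Hail
    return hl.import_table(path, **import_table_kwargs(path))
```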

konradjk · 2021-01-13T14:55:22Z

make_mt_IHME.py

+    mt = ht_m.to_matrix_table(row_key=[row_key], row_fields=row_other_fields,
+                              col_key=[column_key], col_fields=column_other_fields,
+                              n_partitions=100)
+    mt = mt.annotate_globals(year=args.year)


Is this for tracking? Is the date in the filename? If so, the filename might be better than a user-specified year: if a user picks a new file, they might not notice this argument and forget to update it.
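A sketch of deriving the year from the filename instead of a separate argument. It assumes, hypothetically, that the basename contains exactly one four-digit year between 1900 and 2099; `year_from_filename` is an illustrative name, and its result could go into `mt.annotate_globals(year=...)`, perhaps alongside the filename itself.

```python
import os
import re


def year_from_filename(path: str) -> int:
    """Extract the single four-digit year (1900-2099) from a file's basename."""
    name = os.path.basename(path)  # ignore years that appear in directory names
    years = re.findall(r'(?<!\d)(?:19|20)\d{2}(?!\d)', name)
    if len(set(years)) != 1:
        raise ValueError(f'Expected exactly one year in {name!r}, found {years}')
    return int(years[0])
```

Failing loudly on zero or several matches keeps a renamed file from silently being tagged with the wrong year.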

konradjk · 2021-01-13T14:56:00Z

make_mt_IHME.py

+    mt_f = mt.annotate_rows(**modifications)
+
+    # Output MatrixTable
+    mt_f.write(args.output, overwrite=True)


Parameterize this with an args.overwrite flag (action='store_true').
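A minimal sketch of such a flag with argparse; `build_parser` is an illustrative name, and only the `--output` and `--overwrite` arguments are shown.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument('--output', required=True, help='Path for the output MatrixTable.')
    # store_true: defaults to False and becomes True only when the flag is passed.
    parser.add_argument('--overwrite', action='store_true',
                        help='Overwrite the output MatrixTable if it already exists.')
    return parser

# In the script, the write would then become:
#     mt_f.write(args.output, overwrite=args.overwrite)
```

Defaulting to False means an accidental rerun fails instead of silently replacing an existing output.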

Added make_mt_IHME.py to convert IHME flat files with ICD codes and morbidity data into mt and ht. (a1c4889)

rahulg603 requested a review from konradjk January 8, 2021 19:39

konradjk reviewed Jan 13, 2021

rahulg603 wants to merge 1 commit into master from ukb_common_ihme


2 participants
