Added make_mt_IHME.py to convert IHME flat files with ICD codes and morbidity data into mt and ht. #3
                         'must be present in the MatrixTable column fields.')

    parser = argparse.ArgumentParser()
I always put this inside the if __name__ == '__main__' block. It probably doesn't matter, but it's not needed if you were to import from this script, so I think it makes sense there.
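A minimal sketch of that layout, for illustration only. The helper names build_parser and main are hypothetical and not part of the PR, and the help text is shortened; only --input comes from the diff.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the CLI parser (hypothetical helper, not from the PR)."""
    parser = argparse.ArgumentParser()
    # --input appears in the diff; the help text here is abbreviated.
    parser.add_argument('--input', help='Table containing the data for merging.')
    return parser


def main(argv=None):
    # argv=None makes argparse fall back to sys.argv[1:].
    args = build_parser().parse_args(argv)
    return args


if __name__ == '__main__':
    # Runs only when the file is executed as a script, not when it is imported.
    main()
```

With this shape, importing a function from the script does not construct a parser or read the command line.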
    parser = argparse.ArgumentParser()
    parser.add_argument('--input', type=str, help='Location of the table containing the data for merging. Expects this to contain data for a single year.')
type=str is the default. Fine to leave it if you want, but you can also omit it.
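A quick way to see this, as a sketch: argparse passes values through as strings when no type is given, so a converter is only needed for other types. Treating --year as an int is an assumption here; the PR only shows that args.year exists.

```python
import argparse

parser = argparse.ArgumentParser()
# No type given: the value is kept as the string from the command line.
parser.add_argument('--input')
# A converter is only needed for non-string values (int is an assumption).
parser.add_argument('--year', type=int)

args = parser.parse_args(['--input', 'data.tsv', '--year', '2019'])
print(type(args.input).__name__, type(args.year).__name__)
```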
                     entry_fields: list, field_names: list):
        if not row_key in field_names:
            raise ValueError('Row key must be in set of fields.')
        if(len(row_other) == 0):
You don't need the outer parens here.
        row_other_l = row_other.split(',')
        if not all([row in field_names for row in row_other_l]):
            raise ValueError('All row fields must be in the columns of the read table.')
        return row_key, list(np.setdiff1d(list(set(row_other_l)),[row_key]))
You don't modify row_key - is there a compelling reason to return it?
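One possible shape for this, as a sketch rather than the author's function: return only the non-key row fields and let the caller keep using its own row_key. The function name other_row_fields is hypothetical, the empty-string branch is an assumption (that part of the diff is not shown), and a sorted set difference stands in for np.setdiff1d, which also returns sorted unique values.

```python
def other_row_fields(row_key: str, row_other: str, field_names: list) -> list:
    """Validate the row fields and return the non-key ones (hypothetical helper)."""
    if row_key not in field_names:
        raise ValueError('Row key must be in set of fields.')
    if len(row_other) == 0:
        # Assumption: an empty string means there are no extra row fields.
        return []
    row_other_l = row_other.split(',')
    if not all(field in field_names for field in row_other_l):
        raise ValueError('All row fields must be in the columns of the read table.')
    # Deduplicate, drop the key, and sort for a deterministic order.
    return sorted(set(row_other_l) - {row_key})
```

The caller already holds row_key, so nothing is lost by dropping it from the return value.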
        ht : Hail Table
            Should contain columns that will be transformed into the new struct.
Can you provide a simple pictorial example of what this function does? It's hard to envision from just the docstring.
        age_groups = list(set(age_groups))
        age_groups_not_in_vals = np.setdiff1d(age_groups,age_vals)
        if len(age_groups_not_in_vals) == 0:
            mt_fc = mt_f.filter_cols(hl.literal(age_groups).contains(mt_f.age_name))

    args = parser.parse_args()

    # Import
    ht = hl.import_table(args.input, force=True, impute=True)
If you're making an assumption that this is gzipped, then you must have a particular file in mind. If so, I would suggest making an entry in resources/phenotypes.py like one of the other resources and read it from there rather than taking it as an input arg
    mt = ht_m.to_matrix_table(row_key=[row_key], row_fields=row_other_fields,
                              col_key=[column_key], col_fields=column_other_fields,
                              n_partitions=100)
    mt = mt.annotate_globals(year=args.year)
Is this for tracking? Is the date in the filename? If so, the filename might be better than a user-specified year: if a user picks a new file, they might not notice this argument and forget to update it.
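If the year does appear in the filename, it could be derived instead of passed in. The sketch below assumes a single four-digit year somewhere in the base name; the function name year_from_filename and the example path are hypothetical, since the actual IHME filename is not shown in this PR.

```python
import re
from pathlib import PurePosixPath


def year_from_filename(path: str) -> int:
    """Extract a four-digit year (19xx or 20xx) from the file's base name."""
    name = PurePosixPath(path).name
    # Lookarounds keep the match from landing inside a longer digit run.
    match = re.search(r'(?<!\d)(?:19|20)\d{2}(?!\d)', name)
    if match is None:
        raise ValueError(f'No four-digit year found in filename: {name}')
    return int(match.group(0))


# Hypothetical path, for illustration only.
print(year_from_filename('gs://bucket/IHME_morbidity_2017.tsv.gz'))
```

The result could then be passed to annotate_globals in place of args.year, or the filename itself could be stored as the global.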
    mt_f = mt.annotate_rows(**modifications)

    # Output MatrixTable
    mt_f.write(args.output, overwrite=True)
Parameterize this as args.overwrite (action='store_true').
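A small sketch of that flag. Only --output, args.output, and the overwrite keyword come from the diff; the help text is illustrative.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--output')
# store_true: the flag defaults to False and becomes True only when passed.
parser.add_argument('--overwrite', action='store_true',
                    help='Overwrite the output if it already exists.')

args = parser.parse_args(['--output', 'out.mt'])
print(args.overwrite)

# The write call from the diff would then become:
# mt_f.write(args.output, overwrite=args.overwrite)
```

Defaulting to no overwrite means an existing MatrixTable is replaced only when the user asks for it explicitly.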
The upstream code that generates the compressed flat file is R code that I still have locally and need to clean up.