Description
Similar to Issue #391 on cohort query abstraction, do the same for the common task of clinical feature extraction from the Stanford clinical EMR databases. Implement it in SQL so that we can efficiently reuse the BigQuery infrastructure without depending on Python, R, or other layers. (Application logic layers can still be used for more advanced feature engineering and manipulation.)
Do we have SQL code for common clinical feature extraction / engineering / a feature matrix factory? I know we have several versions of people's Python code for common feature engineering; @ccorbin's is probably the most robust at this point. But just as we can consolidate cohort construction in SQL alone (given how bizarrely efficient BigQuery is), it would be worth doing the same for feature extraction. We can still use Python or other downstream code for further feature engineering or manipulation. I think @Grace K might have had some examples of constructing common features using just SQL queries to add feature columns to a cohort table/query?
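To make the request concrete, here is a rough sketch of what "adding feature columns to a cohort table using just SQL" could look like. Everything in it is an assumption for illustration: the `cohort` and `lab_result` tables, their columns, the creatinine feature, and the 7-day lookback window are hypothetical and not the actual Stanford schema. Python and SQLite serve only as a self-contained harness so the query can be run locally; the SQL string is the part that matters, and in BigQuery the window arithmetic would use that dialect's own date functions rather than SQLite's `datetime()`.

```python
import sqlite3

# Hypothetical feature query: one row per cohort entry, with summary features
# of creatinine results in the 7 days before the index time.
# LEFT JOIN keeps cohort rows that have no matching lab results.
FEATURE_SQL = """
SELECT
    c.patient_id,
    c.index_time,
    COUNT(l.value) AS n_creatinine,
    AVG(l.value)   AS mean_creatinine,
    MAX(l.value)   AS max_creatinine
FROM cohort AS c
LEFT JOIN lab_result AS l
    ON  l.patient_id  = c.patient_id
    AND l.lab_name    = 'creatinine'
    AND l.result_time <  c.index_time                       -- no leakage past index
    AND l.result_time >= datetime(c.index_time, '-7 days')  -- SQLite syntax
GROUP BY c.patient_id, c.index_time
ORDER BY c.patient_id
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with toy cohort and lab tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cohort (patient_id INTEGER, index_time TEXT);
        CREATE TABLE lab_result (
            patient_id INTEGER, lab_name TEXT, result_time TEXT, value REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO cohort VALUES (?, ?)",
        [(1, "2020-01-10 00:00:00"), (2, "2020-02-01 00:00:00")],
    )
    conn.executemany(
        "INSERT INTO lab_result VALUES (?, ?, ?, ?)",
        [
            (1, "creatinine", "2020-01-05 08:00:00", 1.0),  # inside window
            (1, "creatinine", "2020-01-09 08:00:00", 2.0),  # inside window
            (1, "creatinine", "2020-01-11 08:00:00", 9.0),  # after index time
            (1, "creatinine", "2019-12-01 08:00:00", 5.0),  # before window
            (1, "sodium", "2020-01-08 08:00:00", 140.0),    # different lab
        ],
    )
    return conn


def extract_features(conn: sqlite3.Connection) -> list:
    """Run the feature query and return one tuple per cohort row."""
    return conn.execute(FEATURE_SQL).fetchall()


if __name__ == "__main__":
    for row in extract_features(build_demo_db()):
        print(row)
```

A design note on the sketch: putting the time-window conditions in the `ON` clause rather than in `WHERE` is what lets cohort rows with no qualifying labs survive with a count of 0 and NULL summaries, which is usually what a feature matrix needs.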