Conversation
scripts/api/api.py
Outdated
| headers=headers,
| data=data_file_data,
| )
| print(r)
I don't think it should print. Maybe a log.info?
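Roughly, the idea would be something like this (a sketch only; it assumes the module already imports and configures logging, as the later snippets suggest):

```python
# sketch: replace the bare print(r) after the requests.post call
logging.info("create_data_file response: %s %s", r.status_code, r.reason)
```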
scripts/api/api.py
Outdated
| headers=headers,
| data=metadata_file_data,
| )
| print(r)
Same as above. Now that I think about it, we probably want to return the status or success of r?
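For example, a rough sketch of propagating success to the caller (whether a boolean or the status code is the right contract is up for discussion):

```python
r = requests.post(
    request_url,
    headers=headers,
    data=metadata_file_data,
)
logging.info("create_metadata_file response: %s", r.status_code)
return r.ok  # True when status_code is below 400
```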
| def diverge_files(self, path):
|     basename = os.path.basename(path)
|     is_data_file = self.DATAFILE.match(basename)
What do you think about calling these vars data_file and metadata_file? It's not a boolean expression, I believe it returns the matched regex or None.
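For reference, a quick illustration of the Match-or-None behaviour (the pattern here is a placeholder, not the real DATAFILE regex):

```python
import re

DATAFILE = re.compile(r".*\.(?P<extension>csv)$")  # placeholder pattern, not the real one

data_file = DATAFILE.match("study-subject-day1to10.csv")
print(type(data_file))              # <class 're.Match'>
print(DATAFILE.match("notes.txt"))  # None
```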
|     is_metadata_file = self.METADATA.match(basename)
|
|     if is_data_file:
|         file_extension_to_dict = self.DATAFILE.match(basename).groupdict()
Can use the var declared on 26 here.
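i.e. something along these lines (a sketch, folding in the `data_file` rename suggested above):

```python
# reuse the Match object instead of running the regex a second time
if data_file:
    data_file_matches = data_file.groupdict()
    return self.process_data_file(path, data_file_matches)
```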
|     is_metadata_file = self.METADATA.match(basename)
|
|     if is_data_file:
|         file_extension_to_dict = self.DATAFILE.match(basename).groupdict()
Suggested change:
- file_extension_to_dict = self.DATAFILE.match(basename).groupdict()
+ data_file_matches = self.DATAFILE.match(basename).groupdict()
|         return self.process_data_file(path, file_extension_to_dict)
|     if is_metadata_file:
|         metadata_file_extension = self.METADATA.match(basename).groupdict()
Suggested change:
- metadata_file_extension = self.METADATA.match(basename).groupdict()
+ metadata_file_matches = self.METADATA.match(basename).groupdict()
|     self.data_file_list = data_file_list
|     self.metadata_file_list = metadata_file_list
|
| def diverge_files(self, path):
Suggested change:
- def diverge_files(self, path):
+ def process_file(self, path):
@Elkrival Is there an example config containing the new URL directive pointing to the API endpoint? I couldn't find it.
Force-pushed from 17a12ba to 7858a39
@ngmaloney commit 733d948 updates the example config.
build/scripts-3.11/import.py
Outdated
| parser = ap.ArgumentParser()
| parser.add_argument('-c', '--config')
| parser.add_argument('-d', '--dbname', default='dpdata')
| parser.add_argument('-v', '--verbose', action='store_true')
| parser.add_argument('expr')
| args = parser.parse_args()
build/scripts-3.11/import.py
Outdated
| # dirname = os.path.dirname(f)
| # basename = os.path.basename(f)
| # # probe for dpdash-compatibility and gather information
| # probe = dpimport.probe(f)
| # if not probe:
| #     logger.debug('document is unknown %s', basename)
| #     continue
| # # nothing to be done
| # if db.exists(probe):
| #     logger.info('document exists and is up to date %s', probe['path'])
| #     continue
| # logger.info('document does not exist or is out of date %s', probe['path'])
| # # import the file
| # logger.info('importing file %s', f)
| # dppylib.import_file(db.db, probe)
|
| # logger.info('cleaning metadata')
| # lastday = get_lastday(db.db)
| # if lastday:
| #     clean_metadata(db.db, lastday)
Suggested change:
| # dirname = os.path.dirname(f)
| # basename = os.path.basename(f)
| # # probe for dpdash-compatibility and gather information
| # probe = dpimport.probe(f)
| # if not probe:
| #     logger.debug('document is unknown %s', basename)
| #     continue
| # # nothing to be done
| # if db.exists(probe):
| #     logger.info('document exists and is up to date %s', probe['path'])
| #     continue
| # logger.info('document does not exist or is out of date %s', probe['path'])
| # # import the file
| # logger.info('importing file %s', f)
| # dppylib.import_file(db.db, probe)
| # logger.info('cleaning metadata')
| # lastday = get_lastday(db.db)
| # if lastday:
| #     clean_metadata(db.db, lastday)
| DPimport is a command line tool for importing files into DPdash using a
| - simple [`glob`](https://en.wikipedia.org/wiki/Glob_(programming)) expression.
| + simple [`glob`](<https://en.wikipedia.org/wiki/Glob_(programming)>) expression.
Why does this need the <>s? Is it because of the parens in the url?
There were pushes done to main so maybe this was done for a reason. I'm unsure what.
| ## MongoDB
Is it worth noting something like "This used to require Mongo but doesn't anymore"?
The app still uses mongo, but we don't write to it directly anymore
build/scripts-3.11/import.py
Outdated
| logging.basicConfig(level=level)
|
| with open(os.path.expanduser(args.config), 'r') as fo:
|     config = yaml.load(fo, Loader=yaml.SafeLoader)
yes it's used in the import.py file within the scripts directory
build/scripts-3.11/import.py
Outdated
| studies[subject['_id']['study']] = {}
| studies[subject['_id']['study']]['subject'] = []
| studies[subject['_id']['study']]['max_day'] = 0
subject['_id']['study'] is referenced a lot in this loop. Feels like it would be worth it to both give it a clear name/reason and also to reduce noise on any given line.
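Something like this, perhaps (a sketch only; the loop shape and what actually gets appended to `studies[study]['subject']` are assumptions based on the snippets above):

```python
for subject in subjects:
    study = subject['_id']['study']  # name the repeated lookup once

    if study not in studies:
        studies[study] = {'subject': [], 'max_day': 0}

    studies[study]['subject'].append(subject)
    studies[study]['max_day'] = max(studies[study]['max_day'], subject['days'])
```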
build/scripts-3.11/import.py
Outdated
| {
|     '_id' : True,
|     'collection' : True,
|     'synced' : True
Does this line request that the sync status be part of the result set?
not sure, but this code is not part of the work (655c34d)
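For context, in a MongoDB projection a field mapped to True is included in each returned document. A minimal pymongo sketch (the collection name here is hypothetical):

```python
cursor = db["toc"].find(
    {},  # no filter: every document
    {
        '_id': True,
        'collection': True,
        'synced': True,  # include the sync status field in each returned document
    },
)
```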
build/scripts-3.11/import.py
Outdated
| if doc['synced'] is False and 'collection' in doc:
|     db[doc['collection']].drop()
Why does it look like you're dropping the whole collection?
it's not part of the original work; it's been removed in 655c34d
build/scripts-3.11/import.py
Outdated
| subject_metadata['days'] = subject['days']
| subject_metadata['study'] = subject['_id']['study']
|
| studies[subject['_id']['study']]['max_day'] = studies[subject['_id']['study']]['max_day'] if (studies[subject['_id']['study']]['max_day'] >= subject['days']) else subject['days']
this is outdated; it's been removed in 655c34d
scripts/api/api.py
Outdated
| def create_data_file(api_url, data_file_data):
|     request_url = api_url + "day"
|     headers = {"content-type": "application/json"}
|     r = requests.post(
|         request_url,
|         headers=headers,
|         data=data_file_data,
|     )
|     status = r.status_code
|     if status != 200:
|         response = r.json()["message"]
|         logging.info(response)
|     else:
|         response = r.json()["data"]
|         logging.info(response)
|
|
| def create_metadata_file(api_url, metadata_file_data):
|     request_url = api_url + "metadata"
|     headers = {"content-type": "application/json"}
|     r = requests.post(
|         request_url,
|         headers=headers,
|         data=metadata_file_data,
|     )
|     status = r.status_code
|     if status != 200:
|         response = r.json()["message"]
|         logging.info(response)
|     else:
|         response = r.json()["data"]
|         logging.info(response)
Unless I'm missing something, these look identical aside from the url's postfix. Could you move this functionality into a common function that both create_data_file and create_metadata_file both call?
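Something in this direction, maybe (a sketch only; the helper name and its return value are placeholders, not a prescription):

```python
import logging

import requests


def post_to_api(api_url, endpoint, payload):
    """Hypothetical shared helper: POST payload to api_url + endpoint and log the outcome."""
    request_url = api_url + endpoint
    headers = {"content-type": "application/json"}
    r = requests.post(request_url, headers=headers, data=payload)
    if r.status_code != 200:
        logging.info(r.json()["message"])
    else:
        logging.info(r.json()["data"])
    return r


def create_data_file(api_url, data_file_data):
    return post_to_api(api_url, "day", data_file_data)


def create_metadata_file(api_url, metadata_file_data):
    return post_to_api(api_url, "metadata", metadata_file_data)
```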
* Added service that parses data csv and metadata csv
* Using the new data structures the service parses the data and creates a json
* Added tests to the service

Updated import script for v2
* Removed all files that parse mongodb and removed dependency
* Added service that handles incoming csv data for metadata and participant csv data
* Service also has a method to convert data to json
* Added api request with a configuration for url

Add requests
* Added requests dependency

Updates to script to add hash collection
* Added hash collection
Force-pushed from 655c34d to 9fdfcc4
Test CSV Load generator
Add authentication to request
* Subject ID and uppercase Study keys needed to be updated for import
* updated key in metadata * Updated tests
* Handle infinity and nan values * If compute error, import as string values
* Updates to import
* Updates for new data migration structure
* Updated tests
* Test consent and synced
* extract variables from assessment
@Elkrival what's the status of this PR? Does it need to be rebased/merged or closed?
* removed print
* added variables as dictionaries
* Pr comments
Extract variables
* print structure
When connecting with local/self-signed/dev environments it is sometimes necessary to bypass SSL verification. This commit adds a config option to do so but it is not the default nor is it considered secure/best-practice.
This should also speed up the import API.
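A rough sketch of how that option might look on the request side (the `verify_ssl` key name is an assumption, not necessarily what the commit uses, and the surrounding `config`/`request_url`/`payload` variables are taken from the snippets above):

```python
# verification stays on by default; disable it only for local/self-signed dev setups
verify_ssl = config.get("verify_ssl", True)

r = requests.post(
    request_url,
    headers={"content-type": "application/json"},
    data=payload,
    verify=verify_ssl,  # requests skips certificate verification when this is False
)
```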
Omit verbosity
This PR updates the script to parse CSV files and make requests to the API. It no longer depends on pymongo to parse the database and make the updates.
The PR removes unused code and adds a service that collects the parsed data; once the data is collected, it makes a request to the backend Node API to insert the data.
It also adds an api_url configuration option that points to the server route.
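For reference, a minimal sketch of how that api_url value could be loaded and combined with the endpoints shown above (the config path and example URL are placeholders):

```python
import os

import yaml

with open(os.path.expanduser("~/.dpimport/config.yml")) as fo:
    config = yaml.safe_load(fo)

api_url = config["api_url"]               # e.g. "https://dpdash.example.org/api/v1/"
day_endpoint = api_url + "day"            # used by create_data_file
metadata_endpoint = api_url + "metadata"  # used by create_metadata_file
```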
Tests and more tests.
Figma link to workflow. https://www.figma.com/file/EYWZPBDHD25Of0KUkwMksC/import-Workflow?type=whiteboard&node-id=0-1&t=DDCkCP8SeU9rSRy6-0