Open
Description
There is currently a src/resources/ folder that holds datasets for the processing scripts. However, I don't think this is the right place to keep them. The two main reasons I can think of are:
- When/if the project becomes an installable package via pip, this resources/ folder will not be captured in the install. We can of course move it under the installed files; however, I feel that location should be reserved for code (just my opinion, and I think this is up for discussion).
- If people clone the repository, they will download all the data in this folder. If it's just a few small CSVs, it's no big deal, but this can quickly get out of hand if all of us start adding datasets to this folder.
This issue is likely part of the bigger question of "where to store our data?" However, I would consider this a successful PR if:
- Datasets are moved out of the resources/ folder
- Scripts will only download the required datasets for their functionality
- Scripts will check if data have already been locally downloaded first, and only download when required (probably a utility function)
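The "check locally first, download only when required" utility from the last criterion could look roughly like the sketch below. This is only an illustration: the function name `fetch_dataset`, the `DEFAULT_CACHE_DIR` location, and the idea of passing the dataset URL in explicitly are all assumptions, not anything that exists in the repository yet.

```python
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path

# Hypothetical default cache location; the real location is still an
# open question ("where to store our data?").
DEFAULT_CACHE_DIR = Path.home() / ".cache" / "project-datasets"


def fetch_dataset(filename, url, cache_dir=DEFAULT_CACHE_DIR):
    """Return a local path to `filename`, downloading from `url` only if missing."""
    cache_dir = Path(cache_dir)
    target = cache_dir / filename
    if target.exists():
        # Already downloaded: skip the network entirely.
        return target

    cache_dir.mkdir(parents=True, exist_ok=True)
    # Download to a temporary file first, so an interrupted download
    # never leaves a partial file that looks like a valid cache entry.
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir, suffix=".part")
    os.close(fd)
    try:
        with urllib.request.urlopen(url) as response, open(tmp_name, "wb") as out:
            shutil.copyfileobj(response, out)
        os.replace(tmp_name, target)  # atomic move into place
    finally:
        # Clean up the temporary file if the download failed midway.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
    return target
```

Each script would then call `fetch_dataset(...)` only for the files it actually needs, which covers the second criterion as well. A real version would probably also want a checksum or version check so that stale cached files get refreshed.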