Explore use of Datalad for storing datasets  #18

@trevorb1

It might be useful to explore Datalad for pulling and versioning large datasets. I am not sure Datalad fits our use case perfectly, but I think it is still worth exploring.

This site gives an overview of how Datalad works with git-annex. In particular, this section of the site explains how to "publish a dataset on GitHub with publicly-accessible annexed files"; the key point is that these files are not downloaded locally automatically. We would still need a place to store the files, but this may ease the process for large datasets.
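To make the "not downloaded locally automatically" point concrete, here is a minimal sketch of the consumer side, assuming the `datalad` command-line tool is installed. It wraps the `datalad clone` and `datalad get` commands from Python. The repository URL, the file path, and the helper names (`clone_cmd`, `get_cmd`, `run`) are hypothetical placeholders, not anything from our repository.

```python
import shutil
import subprocess
from typing import List, Optional

# Hypothetical placeholders: neither value comes from this issue.
DATASET_URL = "https://github.com/example-org/example-dataset.git"
LARGE_FILE = "data/large_file.csv"


def clone_cmd(url: str, dest: str) -> List[str]:
    """Build the command that clones a dataset.

    Cloning fetches the git history and file pointers only; the content
    of annexed files is not downloaded at this step.
    """
    return ["datalad", "clone", url, dest]


def get_cmd(path: str) -> List[str]:
    """Build the command that fetches the content of one annexed file.

    Intended to be run from inside the cloned dataset directory.
    """
    return ["datalad", "get", path]


def run(cmd: List[str], cwd: Optional[str] = None) -> Optional[int]:
    """Run a command if its executable is on PATH, otherwise just print it."""
    if shutil.which(cmd[0]) is None:
        print("not installed; would run:", " ".join(cmd))
        return None
    return subprocess.run(cmd, cwd=cwd, check=True).returncode
```

With a real dataset URL, the flow would be `run(clone_cmd(DATASET_URL, "example-dataset"))` followed by `run(get_cmd(LARGE_FILE), cwd="example-dataset")`, so only the files someone actually needs are pulled from wherever we end up storing them.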

More information on Datalad can be found here:
Website: https://www.datalad.org/
GitHub: https://github.com/datalad/datalad
Documentation: http://handbook.datalad.org/en/latest/index.html
Introduction presentation: https://training.westdri.ca/materials/datalad_for_hpc_1_1.pdf
