This is the use case template for the EcoViz+AI workshop.
This should be done once by the project lead. After the project lead creates their copy of the repository on GitHub, everyone else in the group will create their own fork.
To use this template, clone this repo and rename it.
From the directory where you keep git repositories (e.g., ~/Documents/GitHub/), run this command.
```sh
git clone https://github.com/FlukeAndFeather/ecoviztemplate.git [your repo name]
```
Make sure to replace [your repo name] with the repo name you've chosen for your use case; otherwise the local folder will be named "ecoviztemplate".
Create a remote repository on GitHub with the same name.
Then, back at the command line, run this command (replacing the username and repo name).
```sh
git remote add origin https://github.com/[your username]/[your repo name].git
```
Then push your local repository to the remote.
```sh
git push -u origin main
```
The template is organized to accomplish the three goals of a research compendium.
- Folders and files organized according to conventions
- Modular organization of data, methods, and outputs
- Record of the computational environment
This template supports analyses in Python, R, or a combination of the two.
```
├── data
├── figs
├── notebooks
├── output
├── paper
├── pipeline
├── src
│   ├── Python
│   └── R
└── traintest
    ├── test
    └── train
```
- `data/` - raw data only
- `figs/` - static figures generated during the pipeline
- `notebooks/` - literate programming files (e.g., Jupyter, Quarto) explaining the pipeline
- `output/` - derived data generated during the pipeline
- `paper/` - the analysis manuscript (preferably in .md format)
- `pipeline/` - pipeline scripts
- `src/` - source files supporting the pipeline (e.g., functions, classes, constants)
- `traintest/` - specifies train/test splits for reproducibility
```
├── DESCRIPTION
└── requirements.txt
```
These are conventional files used for recording package dependencies.
DESCRIPTION is a file used by R packages to capture package metadata. It is useful for analyses because it records package dependencies following commonly used conventions. Add packages (with version numbers) to the Imports field. When other users clone your repo, they can install dependencies by calling devtools::install_deps() in R.
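For example, a minimal DESCRIPTION might look like the following sketch (the package name, title, and pinned versions are illustrative; substitute your own):

```
Package: myusecase
Title: EcoViz+AI Use Case Analysis
Version: 0.0.1
Imports:
    dplyr (>= 1.1.4),
    ggplot2 (>= 3.5.0)
```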
requirements.txt is a file used to record Python package dependencies. Users can install dependencies by running pip install -r requirements.txt at the command line.
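For example, a requirements.txt might look like this (the packages and pinned versions are illustrative; record whatever your analysis actually uses):

```
numpy==1.26.4
pandas==2.2.2
matplotlib==3.8.4
```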
```
├── data
├── figs
├── output
├── pipeline
│   ├── 00_download_data.R
│   ├── 00_download_data.py
│   ├── _run_pipeline.R
│   └── _run_pipeline.py
├── src
│   ├── Python
│   └── R
└── traintest
    ├── test
    └── train
```
The pipeline folder structure supports modular organization of your code. This modular organization makes it easier to:
- Distribute development among collaborators
- Document pipeline functionality
- Reuse or replace components over time
Here's how the pipeline flows.
- `pipeline/_run_pipeline.R` or `pipeline/_run_pipeline.py` is a single script that runs the full pipeline.
- `_run_pipeline.*` calls the pipeline scripts in order, e.g., `00_download_data.R`, `01_preprocess_data.R`, `02_fit_model.R`, ..., `10_render_notebooks.R`.
- Pipeline scripts use functions, classes, and constants defined by the modules in `src/Python/` or the scripts in `src/R/`.
- Pipeline scripts generate processed data in `output/` and static visualizations in `figs/`.
- Train/test splits for AI models use the identifiers in `traintest/train/` and `traintest/test/`.
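For the Python side, the driver could be sketched like this (a minimal illustration, not the template's actual `_run_pipeline.py`; the helper names are hypothetical):

```python
"""A minimal sketch of what pipeline/_run_pipeline.py could look like."""
import subprocess
import sys
from pathlib import Path


def numbered_scripts(pipeline_dir="pipeline"):
    """Find numbered pipeline scripts (e.g., 00_download_data.py) in run order."""
    return sorted(Path(pipeline_dir).glob("[0-9][0-9]_*.py"))


def run_pipeline(pipeline_dir="pipeline"):
    """Run each numbered script in order, stopping on the first failure."""
    for script in numbered_scripts(pipeline_dir):
        print(f"Running {script}...")
        subprocess.run([sys.executable, str(script)], check=True)


if __name__ == "__main__":
    run_pipeline()
```

Because the scripts are numbered, lexicographic sorting doubles as execution order, and `_run_pipeline.py` itself is skipped by the `[0-9][0-9]_` pattern.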
00_download_data.R and 00_download_data.py are provided as examples. Replace them and add additional pipeline scripts as appropriate.
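A replacement download step might look like this sketch (the function and directory layout follow the template's `data/` convention, but the code is illustrative, not the template's actual example script):

```python
"""A sketch of what a download step like pipeline/00_download_data.py
might contain."""
from pathlib import Path
from urllib.request import urlretrieve

DATA_DIR = Path("data")


def download(url, filename):
    """Download a raw data file into data/, skipping files already present."""
    DATA_DIR.mkdir(exist_ok=True)
    dest = DATA_DIR / filename
    if not dest.exists():
        urlretrieve(url, dest)
    return dest
```

Skipping files that already exist keeps repeated pipeline runs fast and avoids overwriting raw data.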
The last pipeline script should render all literate programming documents (see next section).
├── notebooks
└── paper
The literate programming documents in notebooks/ explain the pipeline components. They should render relatively quickly, so keep long-running commands in the pipeline scripts and keep rendering times short by using the processed data in output/.
Rendering the documents in notebooks/ should be automated by the last pipeline script (e.g., pipeline/10_render_notebooks.R). For Jupyter notebooks, render them to HTML using nbconvert: jupyter nbconvert --to html notebooks/your_notebook.ipynb. For Quarto documents, use quarto render notebooks/your_notebook.qmd --to html.
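That final step could be automated with a short Python script along these lines (a sketch assuming Jupyter notebooks and that nbconvert is installed; the script name and helpers are hypothetical):

```python
"""A sketch of a final pipeline step (e.g., pipeline/10_render_notebooks.py)
that renders every Jupyter notebook in notebooks/ to HTML."""
import subprocess
from pathlib import Path


def notebooks_to_render(notebook_dir="notebooks"):
    """Collect notebooks in a stable, sorted order."""
    return sorted(Path(notebook_dir).glob("*.ipynb"))


def render_all(notebook_dir="notebooks"):
    for nb in notebooks_to_render(notebook_dir):
        # Assumes Jupyter's nbconvert is available on the PATH
        subprocess.run(["jupyter", "nbconvert", "--to", "html", str(nb)], check=True)


if __name__ == "__main__":
    render_all()
```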
paper/ contains a manuscript describing your use case. It should preferably be in Markdown or a literate programming script.