
Soil ID Algorithm

Requirements

  • Python: 3.12 or later

Contributing

Configure git to automatically lint your code and validate your commit messages:

$ make setup_git_hooks

Set up a virtual environment and install dependencies:

$ uv venv
$ source .venv/bin/activate
$ make install && make install_dev

Configure the .env file to connect to the local database:

$ cp .env.sample .env

Explanation of algorithm

Terminology

  • soil map unit: a (possibly disjoint) geographic area associated with per-component percentage / areal coverage
  • soil series: collection of related soil components
  • soil component: description of various soil properties at specific depth intervals
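As a rough illustration, the terminology above could be modeled as Python dataclasses. The names and fields here are hypothetical, for explanation only, and are not taken from the codebase:

```python
from dataclasses import dataclass


@dataclass
class SoilComponent:
    """Description of various soil properties at specific depth intervals."""
    name: str
    series: str  # name of the soil series this component belongs to
    # property values keyed by (top_cm, bottom_cm) depth interval
    properties: dict[tuple[int, int], float]


@dataclass
class SoilMapUnit:
    """A (possibly disjoint) geographic area with component coverage."""
    key: str
    # component name -> percentage of the map unit's area it covers
    component_percentages: dict[str, float]
```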

References

Dependencies

Algorithm

Input: a specific point in lat/lon, and a set of depth intervals.

  1. Query for all map units within 1km of the point.
  2. If SSURGO data is incomplete, fall back to STATSGO at 10km; if the area has not been surveyed at all, declare that data is not available.
  3. Associate each map unit with the minimum distance from its polygons to the point in question.
  4. Infill missing components by rescaling each map unit's component percentages to sum to 100.
  5. Calculate each component's probability by dividing the distance-weighted sum of that component's percentage across map units by the total distance-weighted sum over all components.
  6. Limit to components in the top 12 component series by probability.
  7. Query the local database for the component horizons.
  8. Return the probabilities of data at each horizon based on the weighted sum of each component's data at that horizon.
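The distance weighting in steps 3-6 and the horizon aggregation in step 8 can be sketched roughly as follows. This is an illustration only: the function names, the inverse-distance weighting, and the handling of components missing horizon data are all assumptions, not the codebase's actual implementation.

```python
def component_probabilities(
    map_units: list[dict[str, float]],  # per map unit: component -> percentage
    distances_m: list[float],           # min distance from the point to each map unit
) -> dict[str, float]:
    # Step 4: rescale each map unit's percentages to sum to 100.
    rescaled = []
    for mu in map_units:
        total = sum(mu.values())
        rescaled.append({c: p * 100 / total for c, p in mu.items()})

    # Steps 3/5: weight each map unit inversely by distance (an assumed
    # weighting; clamp to 1m so a containing map unit doesn't divide by zero).
    weights = [1.0 / max(d, 1.0) for d in distances_m]

    scores: dict[str, float] = {}
    for mu, w in zip(rescaled, weights):
        for comp, pct in mu.items():
            scores[comp] = scores.get(comp, 0.0) + w * pct

    total_score = sum(scores.values())
    probs = {c: s / total_score for c, s in scores.items()}

    # Step 6: keep only the top 12 components by probability.
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:12]
    return dict(top)


def weighted_horizon_value(
    component_probs: dict[str, float],  # component -> probability (from step 5)
    horizon_data: dict[str, float],     # component -> property value at one horizon
) -> float:
    # Step 8: aggregate per-horizon data across components using the component
    # probabilities as weights; renormalize over components that actually have
    # data at this horizon (an assumption about how gaps are handled).
    present = {c: p for c, p in component_probs.items() if c in horizon_data}
    total = sum(present.values())
    return sum(p * horizon_data[c] for c, p in present.items()) / total
```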

Resources

SoilID Project Box Folder

• This folder contains the data schema and processed soil database tables that are ingested into the MySQL database.

https://nrcs.app.box.com/s/vs999nq9ruyetb9b4l7okmssdggh8okn

SSURGO/STATSGO2 metadata:

https://www.nrcs.usda.gov/resources/data-and-reports/ssurgo/stats2go-metadata

SSURGO/STATSGO data:

https://nrcs.app.box.com/v/soils/folder/17971946225

Testing

Regular tests

There are several smaller test suites:

  • There is a set of "unit" tests which exercise most of the codebase but don't rely on any external API services, instead using snapshotted data from those services. You can run these tests with make test_unit.
    • These tests mostly produce snapshots of algorithm output rather than validating specific properties of it, so they verify that the algorithm hasn't changed (or show how it has changed) rather than that it is correct. If the snapshots have changed in a desirable way, you can update them with make test_update_unit_snapshots.
  • For the US only, there is a set of "integration" tests which run the algorithm against the live API services. They just confirm that the algorithm doesn't crash; they don't validate the output, since it can change over time. These can be run with make test_integration.
  • The unit and integration tests can be run together with make test for convenience: this is what must pass for a PR to be mergeable.
  • The API snapshots themselves can be checked against the live API for drift using make test_api_snapshot. They can be updated to the new live API values using make test_update_api_snapshots.

Bulk test

There is a large suite of integration tests which takes many hours to run. It is split into two scripts:

  • Run make generate_bulk_test_results_us or make generate_bulk_test_results_global to run the algorithm over a collection of thousands of soil pits with soil IDs assigned by trained data collectors, accumulating the results in a log file. This can take several hours or need to run overnight (the US tests are especially slow due to the speed of the external API services).
  • Run RESULTS_FILE=$RESULTS_FILE make process_bulk_test_results_us or RESULTS_FILE=$RESULTS_FILE make process_bulk_test_results_global to view statistics calculated over that log file. This can be run concurrently with generate_bulk_test_results to see statistics over the soil pits which have been run so far.
  • Keeping these as two separate scripts has been useful: you can iterate on the processing and display of statistics without interrupting data collection.
  • It would also be valuable to run the US tests against snapshotted API data, but collecting and updating that data would be much more onerous.

Acknowledgements
