Releases: amel-github/sars-ani
Releasing new data fields in SARS-ANI dataset
The original SARS-ANI dataset displayed common and scientific names of the animal host as found in the information source and/or inferred from the literature or expert knowledge.
Misspelled animal names and errors in taxonomy can lead to inaccurate scientific conclusions and poor policy design. Moreover, harmonized host names make it easier to integrate other datasets (e.g. data on host biological traits, geographic distribution, or association with other pathogens).
Therefore, for each event, we programmatically performed taxonomic validation of the animal host name, using the R package taxize (Chamberlain et al. 2013). For more information on our validation process, see the R script sars_ani_validation.R.
Version 1.1 contains seven fields related to the identification of the animal host:
- host_com_orig: Most specific designation of the animal host provided by the source(s), in English.
- host_sci_orig: Scientific name of the animal host as mentioned in the source(s) (scientific names are harmonized so that only the first letter of the genus is capitalized).
- host_com_res: Common name of the animal host, harmonized against the National Center for Biotechnology Information (NCBI) taxonomic backbone.
- host_sci_res: Scientific name of the animal host (resolved to species or subspecies level), harmonized against the National Center for Biotechnology Information (NCBI) taxonomic backbone.
- host_colloq: The colloquial name of the host, i.e. the name commonly used to identify the animal in non-specialist language (e.g. “tiger” for “Sumatran tiger”).
- host_sci_spec_res: The scientific name of the host resolved to the species level.
- family: Animal family of the animal host.
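As an illustration of how these fields might be used, here is a minimal Python sketch. It is not the project's validation code (that lives in the R script sars_ani_validation.R and relies on taxize). It shows the capitalization rule described for host_sci_orig, and how a species-level name such as host_sci_spec_res could serve as a join key to another dataset. The function names, the sample record, and the trait table are hypothetical and made up for the example.

```python
def harmonize_scientific_name(name: str) -> str:
    """Capitalize only the first letter of the genus; lowercase everything else."""
    parts = name.strip().split()
    if not parts:
        return ""
    genus = parts[0].capitalize()          # e.g. "panthera" -> "Panthera"
    epithets = [p.lower() for p in parts[1:]]
    return " ".join([genus, *epithets])


def attach_traits(events, traits, key="host_sci_spec_res"):
    """Merge trait columns into each event, matching on a harmonized host name.

    `traits` is a hypothetical lookup: species name -> dict of trait values.
    Events with no matching species are returned unchanged.
    """
    merged = []
    for event in events:
        extra = traits.get(event.get(key, ""), {})
        merged.append({**event, **extra})
    return merged


# Made-up sample record, for illustration only (not taken from the dataset).
events = [
    {
        "host_sci_orig": harmonize_scientific_name("panthera TIGRIS Sumatrae"),
        "host_sci_spec_res": "Panthera tigris",
        "host_colloq": "tiger",
        "family": "Felidae",
    }
]

# Hypothetical external trait table keyed by species-level scientific name.
traits = {"Panthera tigris": {"diet": "carnivore"}}

events_with_traits = attach_traits(events, traits)
```

Joining on the species-level field rather than on the original names is a design choice for this sketch: names copied from sources can differ in spelling or rank, whereas the resolved fields are intended to be consistent across events.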
sars-ani v1.0
SARS-ANI Version 1.0
(April 2022)
This is the first release of the SARS-ANI dataset.