tensorchiefs/data

Tensorchiefs Data Collection and Tools

This repository contains datasets as well as R and Python packages for teaching statistics, data science, and related subjects. The goal is for the package to be usable from Python or R, on a local machine or in the cloud, with the same syntax. All files are cached, and dataset-specific functionality can be defined in an optional documentation markdown file.

Installation and Simple Usage

R

Installing the Released Version (no release yet)

install.packages("https://github.com/tensorchiefs/data/releases/download/testrelease2/edudat_0.1.tar.gz", repos = NULL, type = "source")

Using the R Package:

Load the edudat package and datasets:

library(edudat)
df <- load_data("challenger.csv")

Showing the dataset and other functionality

show_data(df)
list_cache_files() # Lists all cached files

Sourcing additional functions (currently only in R)

source_extra_code(df, verbose = TRUE)
plot(df) + ggtitle("Challenger dataset")
to_celcius(df$Temp)

Note that not all datasets have additional functions. These functions need to be defined in an accompanying qmd script, in a code section named extra; see data/challenger.qmd for an example.

Python

Installation of the Python Package: Install the edudat package from PyPI:

pip install edudat

Using the Python Package: Load the CSV data in Python:

from edudat import load_data
df1 = load_data("challenger.csv")
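The caching mentioned in the introduction can be pictured with a small sketch. This is not the edudat implementation: the function name cached_fetch and the cache directory argument are hypothetical, and only the Python standard library is used. It illustrates the general download-once, reuse-later pattern.

```python
from pathlib import Path
from urllib.request import urlopen


def cached_fetch(url, cache_dir):
    """Download `url` into `cache_dir` once; later calls reuse the local copy.

    Hypothetical helper for illustration only, not part of edudat.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Use the last path component of the URL as the cached file name.
    target = cache_dir / url.rsplit("/", 1)[-1]

    # Only fetch when no cached copy exists yet.
    if not target.exists():
        with urlopen(url) as response:
            target.write_bytes(response.read())
    return target
```

Calling cached_fetch twice with the same URL reads from the source only on the first call; the second call returns the path of the existing local file.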

Additional Information and Functionality for Datasets

The datasets can be described by Quarto (qmd) files. These files contain additional information about a dataset, such as a description, the source, the variables, and the data types. The qmd files are located in the data/ directory and are rendered into the docs branch. The rendered files can be found at https://github.com/tensorchiefs/data/tree/main/docs.

In the qmd files, it is also possible to provide additional code for the datasets. Have a look at the challenger.qmd file for an example, where the R code plot_data is defined as a named code chunk.

{r plot_data, echo=TRUE, eval=FALSE}

Please ensure that eval=FALSE is set in the code chunk options if the code is not supposed to be executed in the automatic rendering.
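To make the idea of a named code chunk concrete, here is a minimal Python sketch that pulls the body of one labelled chunk out of qmd text. It is an illustration under assumptions, not how edudat reads these files: extract_chunk and the sample document are hypothetical, and the fence string is built programmatically only so that it can be shown inside this code block.

```python
import re

FENCE = "`" * 3  # the three-backtick fence used by qmd code chunks


def extract_chunk(qmd_text, name, lang="r"):
    """Return the body of the chunk labelled `name`, or None if absent.

    Hypothetical helper for illustration only, not part of edudat.
    """
    pattern = re.compile(
        # Header line, e.g. {r plot_data, echo=TRUE, eval=FALSE}
        r"^" + FENCE + r"\{" + re.escape(lang) + r"\s+" + re.escape(name)
        + r"(?=[,\s}])[^}]*\}[ \t]*\n"
        # Chunk body, matched lazily up to the next closing fence
        + r"(.*?)"
        + r"^" + FENCE + r"[ \t]*$",
        re.DOTALL | re.MULTILINE,
    )
    match = pattern.search(qmd_text)
    return match.group(1) if match else None


# A made-up qmd document containing one named chunk.
example = "\n".join([
    "# Challenger",
    FENCE + "{r plot_data, echo=TRUE, eval=FALSE}",
    "plot(df)",
    FENCE,
    "",
])

print(extract_chunk(example, "plot_data"))
```

The lookahead after the label keeps a request for plot from matching a chunk called plot_data; asking for a label that does not exist returns None.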

Structure

  • data/: Contains the CSV data.
  • R/: Contains the R package edudat.
  • python/: Contains the Python package edudat.
  • docs/: Contains documentation on the datasets.

Advanced Issues

R

Installing the R package from GitHub (current state of the main branch):

# Install the devtools package if you haven't already
install.packages("devtools")
# Install the edudat package directly from GitHub
devtools::install_github("tensorchiefs/data/R/edudat")

Contributing

Contributions will be welcome at a later stage; have a look at the contribution how-to.
