# Preprocessing codes for diabetes dataset

Datasets:

- Diabetes prediction leaked removed - selection stage 1 - 240 attributes
- Diabetes prediction leaked removed - selection from reduced dataset - To Be Uploaded
Say the dataset URL is https://huggingface.co/datasets/user_name/dataset_name; then use `user_name/dataset_name` as the dataset identifier when loading it:
```python
import pandas as pd
from datasets import load_dataset

# The default split is "train"; if the dataset has other splits, use them as necessary.
dataset = load_dataset("user_name/dataset_name", split="train")

# Use the dataset as-is as an Arrow dataset, or convert to pandas if needed.
df = dataset.to_pandas()
```

Scripts:

- `merge_nhanes_files.py`: Merges multiple NHANES files into a single dataset.
- `parquet_to_csv.py`: Converts Parquet files to CSV format for easier data handling.
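The merge step performed by `merge_nhanes_files.py` can be sketched as an outer join of the individual files on a shared respondent identifier. This is a minimal illustration, not the script itself: the join column name `SEQN` (the NHANES respondent sequence number) and the outer-join strategy are assumptions.

```python
import pandas as pd
from functools import reduce

def merge_nhanes_files(frames):
    """Outer-join a list of DataFrames on the shared respondent ID.

    Assumption: every NHANES table carries the SEQN identifier.
    """
    return reduce(
        lambda left, right: pd.merge(left, right, on="SEQN", how="outer"),
        frames,
    )

# Example with two toy tables sharing SEQN values:
demo = merge_nhanes_files([
    pd.DataFrame({"SEQN": [1, 2], "BMI": [25.0, 30.0]}),
    pd.DataFrame({"SEQN": [2, 3], "GLU": [90, 110]}),
])
```

An outer join keeps respondents that appear in only some of the files, filling the missing measurements with NaN.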
The project uses environment variables to manage input and output directories, as well as options for generating CSV files. You can set these variables in a `.env` file based on the provided `.env.example`.
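As a sketch of how a script might read those variables (the variable names below are hypothetical; see `.env.example` for the actual ones, and note that python-dotenv is commonly used to load a `.env` file into the environment, while this sketch sticks to the standard library):

```python
import os

# Hypothetical variable names with fallback defaults.
INPUT_DIR = os.environ.get("INPUT_DIR", "data/input")
OUTPUT_DIR = os.environ.get("OUTPUT_DIR", "data/output")
# Option flags arrive as strings and must be parsed explicitly.
GENERATE_CSV = os.environ.get("GENERATE_CSV", "false").lower() == "true"
```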
- Python 3.13 or higher
- Ensure you have poetry installed for dependency management.
1. Clone the repository to your local machine.
2. Navigate to the project directory.
3. Install the required dependencies using poetry:

   ```shell
   poetry install
   ```

4. Create a `.env` file in the project root directory and set the necessary environment variables as shown in the `.env.example` file.
5. Run the preprocessing scripts as needed.
To merge NHANES files, run the following command from the project root directory:

```shell
poetry run python code/merge_nhanes_files.py
```