43 changes: 43 additions & 0 deletions TODO_PIPELINES/nathanbao_pipelines.md
@@ -0,0 +1,43 @@
# APIs
## Basketball APIs
NBA: https://github.com/swar/nba_api
Scraping: https://www.basketball-reference.com/

## Baseball APIs
MLB: https://statsapi.mlb.com/

## Kalshi API
https://docs.kalshi.com/welcome?redirect=%2F

# Control Flow

## 1. Collect Data
Fetch NBA and MLB data from the APIs above, and possibly injury reports as well. Make the API calls from Python, using aiohttp for asynchronous requests.
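
A minimal sketch of the concurrent fetch, assuming the endpoints return JSON. The helper names `fetch_json` and `fetch_all` are hypothetical, and the URL list is left to the caller because the exact endpoints have not been chosen yet.

```python
import asyncio


async def fetch_json(session, url):
    """GET one URL and decode the JSON body."""
    async with session.get(url) as resp:
        resp.raise_for_status()  # surface HTTP 4xx/5xx as exceptions
        return await resp.json()


async def fetch_all(urls, timeout_s=30):
    """Fetch every URL concurrently; failures are returned, not raised."""
    import aiohttp  # third-party; imported lazily so fetch_json can be tested without it

    timeout = aiohttp.ClientTimeout(total=timeout_s)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        results = await asyncio.gather(
            *(fetch_json(session, url) for url in urls),
            return_exceptions=True,  # one failing endpoint does not cancel the rest
        )
    # Map each URL to its decoded payload or to the exception it raised.
    return dict(zip(urls, results))
```

Calling `asyncio.run(fetch_all(urls))` returns a dict keyed by URL. Sharing one `ClientSession` across requests reuses connections, which is the usual reason to prefer aiohttp over sequential calls here.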

## 2. Clean Data and Normalization
Use Pandas and NumPy to filter, clean, and normalize the data.
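
One possible shape for this step, as a sketch: drop duplicate games, coerce stat columns to numbers, discard rows that fail to parse, and z-score the rest. The `game_id` column and the `clean_games` helper are assumptions for illustration, not part of any API listed above.

```python
import numpy as np
import pandas as pd


def clean_games(raw: pd.DataFrame, numeric_cols: list[str]) -> pd.DataFrame:
    """Deduplicate, coerce to numeric, drop bad rows, then z-score the columns."""
    # Assumes a hypothetical `game_id` column uniquely identifies a game.
    df = raw.drop_duplicates(subset="game_id").copy()
    # Unparseable values (e.g. "N/A") become NaN instead of raising.
    df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")
    df = df.dropna(subset=numeric_cols)
    # Z-score normalization; a constant column has std 0, so map it to 0.0.
    mean = df[numeric_cols].mean()
    std = df[numeric_cols].std(ddof=0).replace(0, np.nan)
    df[numeric_cols] = ((df[numeric_cols] - mean) / std).fillna(0.0)
    return df.reset_index(drop=True)
```

In a real pipeline the mean and standard deviation would be computed on the training period only and reused for later data, so that normalization does not leak future information.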

## 3. Feature Engineering
Candidate features: past head-to-head performance, offensive/defensive ratings, player availability, pitch speed and batting stats (for MLB), and Kalshi market metrics (price, volume, spread).

Scikit-Learn
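
To illustrate two of the features above, here is a sketch that derives a rolling win rate and a market spread with Pandas. All column names (`team`, `game_date`, `won`, `yes_bid`, `yes_ask`) are hypothetical; the real names depend on how the collected data is stored.

```python
import pandas as pd


def add_features(games: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Add a leakage-free rolling win rate plus market spread and mid price."""
    df = games.sort_values(["team", "game_date"]).copy()
    # Win rate over each team's previous `window` games.
    # shift(1) excludes the current game, so a row never sees its own outcome.
    df["recent_win_rate"] = df.groupby("team")["won"].transform(
        lambda s: s.shift(1).rolling(window, min_periods=1).mean()
    )
    # Market metrics from the quoted YES bid/ask prices (assumed to be in cents).
    df["market_spread"] = df["yes_ask"] - df["yes_bid"]
    df["market_mid"] = (df["yes_ask"] + df["yes_bid"]) / 2
    return df
```

A team's first game has no history, so its `recent_win_rate` is NaN and needs imputing or dropping before training.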

## 4. Model Training
Candidate models: logistic regression, random forest, time-series models, and a simple neural network.

PyTorch, Scikit-Learn
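
As a baseline for the first model in the list, a logistic regression can be sketched with scikit-learn as follows. The data is synthetic and stands in for the engineered features; the train/test split is chronological (first 300 rows, then the last 100) on the assumption that rows are ordered by game date.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for real features: 400 games, 3 features, binary outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
logits = X @ np.array([1.5, -1.0, 0.5]) + rng.normal(scale=0.5, size=400)
y = (logits > 0).astype(int)

# Chronological split: train on earlier games, evaluate on later ones.
split = 300
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Scaling inside the pipeline is fit on the training rows only.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1
accuracy = accuracy_score(y_test, model.predict(X_test))
loss = log_loss(y_test, proba)
print(f"accuracy={accuracy:.3f} log_loss={loss:.3f}")
```

Log loss matters more than accuracy for this project, because the predicted probability is what gets compared against a market price.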

## 5. Test Predictions on Kalshi/Fine-Tuning
For a few days, feed the model custom parameters and record its predictions. Once the outcomes are known, compare them against the predictions to see how the model did.

FastAPI, Docker
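
The trial period can be run as paper trading before any real orders are placed. The sketch below assumes a binary contract that pays 100 cents if the market resolves YES and nothing otherwise, and it ignores fees. `PaperTrade`, `should_buy_yes`, `settle`, and the 5-point edge threshold are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class PaperTrade:
    ticker: str
    model_prob: float     # model's probability that the market resolves YES
    yes_price_cents: int  # cost of one YES contract, in cents


def should_buy_yes(trade: PaperTrade, min_edge: float = 0.05) -> bool:
    """Buy only when the model is more bullish than the market by `min_edge`."""
    implied_prob = trade.yes_price_cents / 100  # price read as a probability
    return trade.model_prob - implied_prob >= min_edge


def settle(trade: PaperTrade, resolved_yes: bool) -> int:
    """PnL in cents for one YES contract, ignoring fees (an assumption)."""
    payout = 100 if resolved_yes else 0
    return payout - trade.yes_price_cents
```

For example, a YES contract bought at 55 cents with a model probability of 0.62 has a 7-point edge; it earns 45 cents if the market resolves YES and loses 55 cents otherwise.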

## 6. Track Record
Track profit and loss (PnL).

PostgreSQL/SQLite or Amazon S3 Bucket
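
For the SQLite option, the standard-library `sqlite3` module is enough to start. The schema and helper names below are assumptions for illustration; the same table would carry over to PostgreSQL with minor changes.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the trade log; ':memory:' keeps it in RAM for testing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS trades (
            id          INTEGER PRIMARY KEY,
            ticker      TEXT NOT NULL,
            placed_at   TEXT NOT NULL,     -- ISO 8601 timestamp
            model_prob  REAL NOT NULL,
            price_cents INTEGER NOT NULL,
            pnl_cents   INTEGER            -- NULL until the market settles
        )
        """
    )
    return conn


def record_trade(conn, ticker, placed_at, model_prob, price_cents):
    """Insert an open trade and return its row id."""
    cur = conn.execute(
        "INSERT INTO trades (ticker, placed_at, model_prob, price_cents) "
        "VALUES (?, ?, ?, ?)",
        (ticker, placed_at, model_prob, price_cents),
    )
    conn.commit()
    return cur.lastrowid


def settle_trade(conn, trade_id, pnl_cents):
    """Fill in the realized PnL once the outcome is known."""
    conn.execute("UPDATE trades SET pnl_cents = ? WHERE id = ?", (pnl_cents, trade_id))
    conn.commit()


def total_pnl(conn):
    """Sum of realized PnL; unsettled trades (NULL) are ignored by SUM."""
    (total,) = conn.execute("SELECT COALESCE(SUM(pnl_cents), 0) FROM trades").fetchone()
    return total
```

Storing the model probability alongside the price makes it possible to check calibration later, not just net profit.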

## 7. Evaluate Performance
Plot graphs and metrics such as wins vs. losses or net profit.

Pandas/Matplotlib or Streamlit
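
The numbers behind those plots can be computed with Pandas first. This sketch assumes a table of settled trades with hypothetical `settled_at` and `pnl_cents` columns.

```python
import pandas as pd


def summarize(trades: pd.DataFrame) -> tuple[pd.DataFrame, dict]:
    """Return the cumulative PnL curve and headline win/loss statistics."""
    df = trades.sort_values("settled_at").reset_index(drop=True)
    df["cum_pnl_cents"] = df["pnl_cents"].cumsum()  # running net money over time
    wins = int((df["pnl_cents"] > 0).sum())
    losses = int((df["pnl_cents"] < 0).sum())
    stats = {
        "wins": wins,
        "losses": losses,
        "win_rate": wins / len(df) if len(df) else float("nan"),
        "net_pnl_cents": int(df["pnl_cents"].sum()),
    }
    return df, stats
```

With Matplotlib installed, `curve.plot(x="settled_at", y="cum_pnl_cents")` on the returned frame draws the net-money curve; the same frame can be passed to a Streamlit chart.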