Open
26 commits
58804e0
Updated calculate peer scores function and fixed fstrings
CodexVeritas May 2, 2025
455a676
reorganized data files into subdirectories
CodexVeritas May 2, 2025
222d883
Added pseudocode that for a refactor, and data models that can be use…
CodexVeritas May 2, 2025
28edce0
Converted baseline scoring function to be independent of dataframes
CodexVeritas May 2, 2025
956097a
Added baseline scoring tests
CodexVeritas May 2, 2025
2575e6c
Added peer scoring tests
CodexVeritas May 2, 2025
c20f0eb
Minor updates
CodexVeritas May 2, 2025
a7e7d5e
Added some comments
CodexVeritas May 3, 2025
4498342
Added peer score function from previous versions
CodexVeritas May 3, 2025
9e9c794
Refactored spot peer scoring functions
CodexVeritas May 3, 2025
33e65c9
Got all binary scoring tests passing
CodexVeritas May 3, 2025
12e767a
Got MC scoring tests passing
CodexVeritas May 3, 2025
02d3635
Got all scoring tests passing except numeric baseline max/min
CodexVeritas May 6, 2025
cf44360
Fixed some dataframe row to scoring parameter conversion
CodexVeritas May 7, 2025
2cf9c6f
Got calculate_all_peer_scores working
CodexVeritas May 7, 2025
9037133
unified peer and head to head functions
CodexVeritas May 7, 2025
ae1eefc
Small touchups
CodexVeritas May 7, 2025
3f4d40e
Small touchup
CodexVeritas May 7, 2025
b2deadb
Updated resolution types for numeric tests
CodexVeritas May 7, 2025
aed12bf
Fixed option parsing problem, and provided median MC question better
CodexVeritas May 21, 2025
8eca5b0
Moved community prediction comparison files to archived
CodexVeritas May 21, 2025
3f63771
Moved another cp comparison csv
CodexVeritas May 21, 2025
3a9daee
Moved discrimination chart above failing cell
CodexVeritas May 21, 2025
2aea159
Debugging calibration curve
CodexVeritas May 22, 2025
a309c7e
Fixed second calibration graph
CodexVeritas May 22, 2025
1e59b86
calibration bug fix :bug:
May 22, 2025
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
.venv/
.env
__pycache__
.personal/
9,244 changes: 5,671 additions & 3,573 deletions AI_BENCHMARKING_ANALYSIS.ipynb

Large diffs are not rendered by default.

7 files renamed without changes.
47 changes: 0 additions & 47 deletions bootstrapped_h2h_bot_vs_pros.csv

This file was deleted.
