
Optimize scikit-learn pipeline and add --scikit_model_name flag #37

Open

vile319 wants to merge 2 commits into Gleghorn-Lab:main from vile319:scikit-pipeline-optimization

Conversation


vile319 commented on Feb 10, 2026

Optimizes the scikit-learn pipeline to handle large datasets and adds the ability to skip LazyPredict when a model is already known.

Changes
main.py

  • When --scikit_model_name is specified, skips LazyPredict and goes directly to hyperparameter tuning + training
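
As a rough illustration of that branch, here is a minimal sketch assuming hypothetical helper names run_lazy_predict and tune_and_train (the real identifiers in main.py may differ):

# Illustrative control flow only; helper names are placeholders, not the
# actual functions in this repository.
def run_scikit_pipeline(args, X_train, y_train, X_test, y_test):
    if args.scikit_model_name:
        # A model is already known: skip LazyPredict and go straight to
        # hyperparameter tuning + training of the named estimator.
        return tune_and_train(
            args.scikit_model_name, X_train, y_train,
            n_iter=args.scikit_n_iter, n_jobs=args.n_jobs,
        )
    # Otherwise rank candidates with LazyPredict first, then tune the best one.
    best_name = run_lazy_predict(X_train, y_train, X_test, y_test)
    return tune_and_train(best_name, X_train, y_train,
                          n_iter=args.scikit_n_iter, n_jobs=args.n_jobs)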

lazy_predict.py

  • Precompute preprocessing once instead of refitting the StandardScaler/Imputer for every model (see the sketch after this list)
  • Added n_jobs=-1 to parallelizable models (RandomForest, etc.)
  • Removed slow models from LazyPredict: SVC, NuSVC, AdaBoost, KNeighbors, DecisionTree, LDA/QDA, etc.
  • Correctly registered XGBoost/LightGBM in the model dictionaries
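
A rough sketch of the preprocessing refactor, assuming X_train/X_test/y_train/y_test are already in memory; the model dictionary shown is a placeholder, not the actual list in lazy_predict.py. The imputer and scaler are fit once on the training split and the transformed arrays are reused for every candidate, instead of being refit inside each model's pipeline.

from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Fit preprocessing a single time, then reuse the transformed arrays.
imputer = SimpleImputer(strategy="mean")
scaler = StandardScaler()
X_train_pre = scaler.fit_transform(imputer.fit_transform(X_train))
X_test_pre = scaler.transform(imputer.transform(X_test))

# Parallelizable estimators get n_jobs=-1; XGBClassifier/LGBMClassifier would be
# registered here in the same way.
models = {
    "RandomForestClassifier": RandomForestClassifier(n_jobs=-1),
    "LogisticRegression": LogisticRegression(n_jobs=-1, max_iter=1000),
}

scores = {}
for name, model in models.items():
    model.fit(X_train_pre, y_train)
    scores[name] = model.score(X_test_pre, y_test)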

scikit_classes.py

  • Fixed --scikit_model_name CLI arg mapping to model_name
  • Added hyperparameter tuning when using --scikit_model_name directly
  • Added verbose logging to RandomizedSearchCV
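
A minimal sketch of that direct tuning path, assuming an XGBClassifier target and placeholder parameter distributions (the real search space and defaults live in scikit_classes.py):

from scipy.stats import randint, uniform
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

# Placeholder search space for illustration only.
param_distributions = {
    "n_estimators": randint(100, 1000),
    "max_depth": randint(3, 12),
    "learning_rate": uniform(0.01, 0.3),
    "subsample": uniform(0.5, 0.5),
}

search = RandomizedSearchCV(
    estimator=XGBClassifier(n_jobs=-1),
    param_distributions=param_distributions,
    n_iter=10,       # mirrors --scikit_n_iter 10 from the usage example below
    cv=5,
    n_jobs=-1,
    verbose=2,       # verbose logging of each candidate fit
    random_state=42,
)
search.fit(X_train, y_train)  # X_train/y_train assumed to be precomputed embeddings
print(search.best_params_, search.best_score_)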

New Usage
Full pipeline (LazyPredict → best model → hyperparameter tuning)
python main.py --model_names ESMC-600 --data_names gold-ppi --embedding_pooling_types mean var --use_scikit --n_jobs -1

Skip LazyPredict, go straight to XGBoost tuning
python main.py --model_names ESMC-600 --data_names gold-ppi --embedding_pooling_types mean var --use_scikit --n_jobs -1 --scikit_model_name XGBClassifier --scikit_n_iter 10

