From cabb19439cd13936916c338606d311fdf019dbd2 Mon Sep 17 00:00:00 2001
From: tariq-hasan
Date: Mon, 2 Feb 2026 03:09:24 -0500
Subject: [PATCH 1/2] gsoc: add dynamic llm trainer for gsoc 2026

Signed-off-by: tariq-hasan
---
 .../en/events/upcoming-events/gsoc-2026.md | 53 +++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/content/en/events/upcoming-events/gsoc-2026.md b/content/en/events/upcoming-events/gsoc-2026.md
index ed2529cea1..9fff6fabc8 100644
--- a/content/en/events/upcoming-events/gsoc-2026.md
+++ b/content/en/events/upcoming-events/gsoc-2026.md
@@ -297,3 +297,56 @@ Tracking issue: https://github.com/kubeflow/sdk/issues/238
 - Familiarity with the Kubeflow SDK and Trainer codebase.
 - Understanding of the Kubeflow Ecosystem and basic Kubernetes concepts.
 - Engage and contribute to Kubeflow community on Slack and GitHub.
+
+### Project 10: Dynamic LLM Trainer Framework for Kubeflow
+
+**Components:**
+[kubeflow/trainer](https://www.github.com/kubeflow/trainer),
+[kubeflow/sdk](https://www.github.com/kubeflow/sdk)
+
+**Mentors:**
+[@andreyvelich](https://github.com/andreyvelich),
+[@tariq-hasan](https://github.com/tariq-hasan),
+[TBD]
+
+**Contributor:**
+
+**Details:**
+
+Kubeflow Trainer provides Kubernetes-native distributed ML training with a Python-first experience. It currently supports LLM fine-tuning through TorchTune as a built-in backend, but TorchTune is no longer actively adding new features, limiting support for emerging models and post-training methods (DPO, PPO, ORPO).
+
+This project proposes a **Dynamic LLM Trainer Framework** that decouples Kubeflow Trainer from any single fine-tuning backend. The goal is to introduce a pluggable architecture enabling multiple frameworks to integrate seamlessly while preserving backward compatibility and a simple Python SDK. This builds on the existing plugin architecture in `pkg/runtime/framework/plugins/torch/` and extends the `BuiltinTrainer` pattern in the SDK.
+
+**The framework will provide:**
+
+- A backend-agnostic LLM Trainer interface, symmetric to TrainingRuntime on the control plane
+- Dynamic backend registration for in-tree and external frameworks
+- TorchTune refactored as a first-class pluggable backend
+- Faster day-0/day-1 support for new models and fine-tuning strategies
+- Backward compatibility for existing TorchTune-based workflows
+
+**Initial backends to explore:**
+
+| Backend | Rationale |
+|---------|-----------|
+| TorchTune | Preserve existing functionality |
+| TRL | Widely used Hugging Face library for SFT/DPO/PPO |
+| Unsloth | Reported ~2× faster fine-tuning with ~70% less memory |
+| LlamaFactory | Supports 100+ models |
+
+Beyond in-tree backends, the SDK should support external framework registration, mirroring how TrainingRuntime enables custom runtimes.
+
+This project is well-suited for contributors interested in ML systems, API design, and bridging modern LLM tooling with production Kubernetes platforms.
+
+Tracking issue: [kubeflow/trainer#2839](https://github.com/kubeflow/trainer/issues/2839)
+
+**Difficulty:** Hard
+
+**Size:** 350 hours (Large)
+
+**Skills Required/Preferred:**
+- Python, Go
+- Familiarity with Kubernetes and Kubeflow Trainer architecture
+- Experience with LLM fine-tuning frameworks (TRL, TorchTune, Unsloth)
+- Understanding of distributed training concepts
+- Interest in API and framework design

From bb110e8ab57f4d995e35b39b11f7b84ea07ca45b Mon Sep 17 00:00:00 2001
From: tariq-hasan
Date: Mon, 2 Feb 2026 08:02:03 -0500
Subject: [PATCH 2/2] gsoc: update list of mentors

Signed-off-by: tariq-hasan
---
 content/en/events/upcoming-events/gsoc-2026.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/content/en/events/upcoming-events/gsoc-2026.md b/content/en/events/upcoming-events/gsoc-2026.md
index 9fff6fabc8..5ec96bca71 100644
--- a/content/en/events/upcoming-events/gsoc-2026.md
+++ b/content/en/events/upcoming-events/gsoc-2026.md
@@ -305,9 +305,8 @@ Tracking issue: https://github.com/kubeflow/sdk/issues/238
 [kubeflow/sdk](https://www.github.com/kubeflow/sdk)
 
 **Mentors:**
-[@andreyvelich](https://github.com/andreyvelich),
 [@tariq-hasan](https://github.com/tariq-hasan),
-[TBD]
+[@andreyvelich](https://github.com/andreyvelich)
 
 **Contributor:**
 