Merged
52 changes: 52 additions & 0 deletions content/en/events/upcoming-events/gsoc-2026.md
@@ -297,3 +297,55 @@ Tracking issue: https://github.com/kubeflow/sdk/issues/238
- Familiarity with the Kubeflow SDK and Trainer codebase.
- Understanding of the Kubeflow Ecosystem and basic Kubernetes concepts.
- Engage and contribute to Kubeflow community on Slack and GitHub.

### Project 10: Dynamic LLM Trainer Framework for Kubeflow

**Components:**
[kubeflow/trainer](https://github.com/kubeflow/trainer),
[kubeflow/sdk](https://github.com/kubeflow/sdk)

**Mentors:**
[@tariq-hasan](https://github.com/tariq-hasan),
[@andreyvelich](https://github.com/andreyvelich)
Comment on lines +307 to +309 (Member):
@kramaranya @astefanutti @szaher @abhijeet-dhumal Do you want to help @tariq-hasan with mentoring a student for this work as well?
I guess that could be nice work for Kubeflow SDK enhancements.


**Contributor:**

**Details:**

Kubeflow Trainer provides Kubernetes-native distributed ML training with a Python-first experience. It currently supports LLM fine-tuning through TorchTune as a built-in backend, but TorchTune is no longer actively adding new features, limiting support for emerging models and post-training methods (DPO, PPO, ORPO).

This project proposes a **Dynamic LLM Trainer Framework** that decouples Kubeflow Trainer from any single fine-tuning backend. The goal is to introduce a pluggable architecture enabling multiple frameworks to integrate seamlessly while preserving backward compatibility and a simple Python SDK. This builds on the existing plugin architecture in `pkg/runtime/framework/plugins/torch/` and extends the `BuiltinTrainer` pattern in the SDK.
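To make the idea concrete, here is a minimal Python sketch of what a backend-agnostic trainer interface might look like on the SDK side. All names here (`LLMTrainerBackend`, `FineTuneConfig`, `container_command`, the `sft_recipe` string) are illustrative assumptions for this proposal, not existing Kubeflow Trainer or TorchTune APIs:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class FineTuneConfig:
    """Minimal, backend-neutral fine-tuning request; fields are illustrative."""
    model: str        # e.g. "meta-llama/Llama-3.2-1B"
    method: str       # e.g. "sft", "dpo", "ppo"
    num_nodes: int = 1


class LLMTrainerBackend(ABC):
    """Contract each fine-tuning framework would implement to plug in."""

    name: str  # registry key, e.g. "torchtune", "trl", "unsloth"

    @abstractmethod
    def container_command(self, config: FineTuneConfig) -> list[str]:
        """Translate the generic config into the backend's launch command."""


class TorchTuneBackend(LLMTrainerBackend):
    """TorchTune refactored as one pluggable backend among several."""

    name = "torchtune"

    def container_command(self, config: FineTuneConfig) -> list[str]:
        # Illustrative only; the real TorchTune CLI takes recipe + config args.
        return ["tune", "run", f"{config.method}_recipe", "--model", config.model]
```

The SDK's existing `BuiltinTrainer` path would then dispatch to whichever backend the user selects, rather than assuming TorchTune.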

**The framework will provide:**

- A backend-agnostic LLM Trainer interface, symmetric to TrainingRuntime on the control plane
- Dynamic backend registration for in-tree and external frameworks
- TorchTune refactored as a first-class pluggable backend
- Faster day-0/day-1 support for new models and fine-tuning strategies
- Backward compatibility for existing TorchTune-based workflows

**Initial backends to explore:**

| Backend | Rationale |
|---------|-----------|
| TorchTune | Preserves existing functionality |
| TRL | Industry standard for SFT/DPO/PPO |
| Unsloth | Claims ~2× faster training and ~70% lower memory use |
| LlamaFactory | Supports 100+ models |

Beyond in-tree backends, the SDK should support external framework registration, mirroring how TrainingRuntime enables custom runtimes.
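A rough sketch of how such external registration could work in the SDK. The registry, the `register_backend` decorator, and the `TRLBackend` command shown are hypothetical illustrations for this proposal, not real Kubeflow or TRL APIs:

```python
from typing import Dict

# Global registry mapping backend names to their implementation classes.
_BACKENDS: Dict[str, type] = {}


def register_backend(cls: type) -> type:
    """Class decorator exposing a backend (in-tree or external) by name."""
    _BACKENDS[cls.name] = cls
    return cls


def get_backend(name: str):
    """Instantiate a registered backend, failing loudly on unknown names."""
    if name not in _BACKENDS:
        raise ValueError(
            f"unknown LLM trainer backend {name!r}; registered: {sorted(_BACKENDS)}"
        )
    return _BACKENDS[name]()


@register_backend
class TRLBackend:
    """Stand-in for an externally registered TRL backend (illustrative only)."""

    name = "trl"

    def launch_args(self, model: str, method: str) -> list[str]:
        # Not TRL's real CLI; shown only to make the registry concrete.
        return ["trl-train", "--method", method, "--model", model]
```

An external package could call `register_backend` at import time, mirroring how TrainingRuntime lets platform teams register custom runtimes on the control plane.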

This project is well-suited for contributors interested in ML systems, API design, and bridging modern LLM tooling with production Kubernetes platforms.

Tracking issue: [kubeflow/trainer#2839](https://github.com/kubeflow/trainer/issues/2839)

**Difficulty:** Hard

**Size:** 350 hours (Large)

**Skills Required/Preferred:**
- Python, Go
- Familiarity with Kubernetes and Kubeflow Trainer architecture
- Experience with LLM fine-tuning frameworks (TRL, TorchTune, Unsloth)
- Understanding of distributed training concepts
- Interest in API and framework design