feat: add framework-level validation for conflicting env vars across …#3237
feat: add framework-level validation for conflicting env vars across …#3237krishdef7 wants to merge 5 commits intokubeflow:masterfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
🎉 Welcome to the Kubeflow Trainer! 🎉 Thanks for opening your first PR! We're happy to have you as part of our community 🚀 Here's what happens next:
Join the community:
Feel free to ask questions in the comments if you need any help or clarification! |
…plugins - Add MPIRunReservedEnvNames constant for OpenMPI reserved envs - Add user-env conflict validation to MPI plugin (mirrors Torch pattern) - Add EnvVarsReserverPlugin interface for plugins to expose reserved envs - Implement ReservedEnvVarNames on MPI and Torch plugins - Add cross-plugin env conflict detection in RunCustomValidationPlugins - Add unit tests for MPI env validation Closes kubeflow#3147 Signed-off-by: krishdef7 <gargkrish06@gmail.com>
1976685 to
9ce322c
Compare
There was a problem hiding this comment.
Pull request overview
This PR adds framework-level validation to detect conflicts between environment variables managed by different runtime plugins (MPI, Torch) and between plugin-reserved envs and user-provided envs in a TrainJob.
Changes:
- Introduces
EnvVarsReserverPlugininterface for plugins to expose reserved environment variable names - Implements the interface on MPI and Torch plugins to return their respective reserved env var sets
- Adds cross-plugin env conflict detection in the framework's validation logic
- Adds user-env conflict validation to the MPI plugin (mirroring existing Torch pattern)
- Adds comprehensive test coverage for MPI env validation
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/runtime/framework/interface.go | Defines new EnvVarsReserverPlugin interface for exposing reserved env var names |
| pkg/runtime/framework/core/framework.go | Implements cross-plugin env conflict detection in RunCustomValidationPlugins |
| pkg/runtime/framework/plugins/mpi/mpi.go | Implements EnvVarsReserverPlugin and adds user-env validation |
| pkg/runtime/framework/plugins/torch/torch.go | Implements EnvVarsReserverPlugin interface |
| pkg/constants/constants.go | Defines MPIRunReservedEnvNames set with OpenMPI reserved env vars |
| pkg/runtime/framework/plugins/mpi/mpi_test.go | Adds tests for MPI reserved env validation |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: krishdef7 <157892833+krishdef7@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: krishdef7 <157892833+krishdef7@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: krishdef7 <157892833+krishdef7@users.noreply.github.com>
Signed-off-by: krishdef7 <gargkrish06@gmail.com>
|
Hi @jinchihe @kuizhiqing, Adding a quick concrete example to clarify the validation: If a user configures:
This currently passes validation but can lead to runtime issues. With this PR, validation fails early with a clear error indicating the conflicting env. Happy to adjust the validation scope or placement if needed. |
What this PR does / why we need it
Adds framework-level validation to detect conflicts between environment variables managed by different runtime plugins (MPI, Torch), and between plugin-reserved envs and user-provided envs in a TrainJob.
Changes
MPIRunReservedEnvNamesconstant for OpenMPI reserved env varsEnvVarsReserverPlugininterface for plugins to expose reserved env var namesReservedEnvVarNames()on MPI and Torch pluginsRunCustomValidationPluginsWhich issue(s) this PR fixes
Closes #3147