Skip to content

feat: add framework-level validation for conflicting env vars across …#3237

Open
krishdef7 wants to merge 5 commits intokubeflow:masterfrom
krishdef7:feature/framework-env-conflict-validation
Open

feat: add framework-level validation for conflicting env vars across …#3237
krishdef7 wants to merge 5 commits intokubeflow:masterfrom
krishdef7:feature/framework-env-conflict-validation

Conversation

@krishdef7
Copy link

What this PR does / why we need it

Adds framework-level validation to detect conflicts between environment variables managed by different runtime plugins (MPI, Torch), and between plugin-reserved envs and user-provided envs in a TrainJob.

Changes

  • Add MPIRunReservedEnvNames constant for OpenMPI reserved env vars
  • Add user-env conflict validation to MPI plugin (mirrors existing Torch pattern)
  • Add EnvVarsReserverPlugin interface for plugins to expose reserved env var names
  • Implement ReservedEnvVarNames() on MPI and Torch plugins
  • Add cross-plugin env conflict detection in RunCustomValidationPlugins
  • Add unit tests for MPI env validation

Which issue(s) this PR fixes

Closes #3147

Copilot AI review requested due to automatic review settings February 23, 2026 13:16
@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jeffwan for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@github-actions
Copy link

🎉 Welcome to the Kubeflow Trainer! 🎉

Thanks for opening your first PR! We're happy to have you as part of our community 🚀

Here's what happens next:

  • If you haven't already, please check out our Contributing Guide for repo-specific guidelines and the Kubeflow Contributor Guide for general community standards.
  • Our team will review your PR soon! cc @kubeflow/kubeflow-trainer-team

Join the community:

Feel free to ask questions in the comments if you need any help or clarification!
Thanks again for contributing to Kubeflow! 🙏

…plugins

- Add MPIRunReservedEnvNames constant for OpenMPI reserved envs
- Add user-env conflict validation to MPI plugin (mirrors Torch pattern)
- Add EnvVarsReserverPlugin interface for plugins to expose reserved envs
- Implement ReservedEnvVarNames on MPI and Torch plugins
- Add cross-plugin env conflict detection in RunCustomValidationPlugins
- Add unit tests for MPI env validation

Closes kubeflow#3147

Signed-off-by: krishdef7 <gargkrish06@gmail.com>
@krishdef7 krishdef7 force-pushed the feature/framework-env-conflict-validation branch from 1976685 to 9ce322c Compare February 23, 2026 13:17
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds framework-level validation to detect conflicts between environment variables managed by different runtime plugins (MPI, Torch) and between plugin-reserved envs and user-provided envs in a TrainJob.

Changes:

  • Introduces EnvVarsReserverPlugin interface for plugins to expose reserved environment variable names
  • Implements the interface on MPI and Torch plugins to return their respective reserved env var sets
  • Adds cross-plugin env conflict detection in the framework's validation logic
  • Adds user-env conflict validation to the MPI plugin (mirroring existing Torch pattern)
  • Adds comprehensive test coverage for MPI env validation

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
pkg/runtime/framework/interface.go Defines new EnvVarsReserverPlugin interface for exposing reserved env var names
pkg/runtime/framework/core/framework.go Implements cross-plugin env conflict detection in RunCustomValidationPlugins
pkg/runtime/framework/plugins/mpi/mpi.go Implements EnvVarsReserverPlugin and adds user-env validation
pkg/runtime/framework/plugins/torch/torch.go Implements EnvVarsReserverPlugin interface
pkg/constants/constants.go Defines MPIRunReservedEnvNames set with OpenMPI reserved env vars
pkg/runtime/framework/plugins/mpi/mpi_test.go Adds tests for MPI reserved env validation

krishdef7 and others added 4 commits February 23, 2026 18:54
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: krishdef7 <157892833+krishdef7@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: krishdef7 <157892833+krishdef7@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: krishdef7 <157892833+krishdef7@users.noreply.github.com>
Signed-off-by: krishdef7 <gargkrish06@gmail.com>
@krishdef7
Copy link
Author

Hi @jinchihe @kuizhiqing,

Adding a quick concrete example to clarify the validation:

If a user configures:

  • MPI plugin (reserves OMPI_* envs)
  • Torch plugin
  • and manually sets OMPI_MCA_orte_default_hostfile in spec.trainer.env

This currently passes validation but can lead to runtime issues.

With this PR, validation fails early with a clear error indicating the conflicting env.

Happy to adjust the validation scope or placement if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Framework-level validation for conflicting environment variables across runtime plugins

2 participants