feat: Run dataset and model initializers in parallel #313

priyank766 wants to merge 1 commit into kubeflow:main from
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: (none). The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
🎉 Welcome to the Kubeflow SDK! 🎉 Thanks for opening your first PR! We're happy to have you as part of our community 🚀 Here's what happens next:
Join the community:
Feel free to ask questions in the comments if you need any help or clarification!
Pull request overview
This PR refactors the ContainerBackend to run dataset and model initializers in parallel using Python's ThreadPoolExecutor, reducing total initialization time from sequential (sum of both downloads) to parallel (duration of the longest download). This is particularly beneficial for LLM fine-tuning workloads with large datasets and models.
Changes:
- Refactored `_run_initializers` to use `concurrent.futures.ThreadPoolExecutor` with `max_workers=2`
- Updated method documentation to reflect parallel execution
- Modified logging messages to indicate queueing and parallel completion
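The refactor described above can be sketched roughly as follows. This is a minimal, self-contained illustration of the technique, not the actual `ContainerBackend` code; the `run_initializers` helper and its signature are assumptions for the example.

```python
import concurrent.futures
import time


def run_initializers(initializers, max_workers=2):
    """Run named initializer callables in parallel threads.

    Hypothetical standalone version of the pattern the PR describes:
    `initializers` maps a name (e.g. "dataset", "model") to a
    zero-argument callable. Total wall time is the duration of the
    longest initializer, not the sum of all of them. Returns the set
    of names that completed; any initializer exception is re-raised
    by future.result().
    """
    done = set()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit every configured initializer up front so they overlap.
        futures = {executor.submit(fn): name for name, fn in initializers.items()}
        for future in concurrent.futures.as_completed(futures):
            future.result()  # surfaces errors instead of swallowing them
            done.add(futures[future])
    return done
```

With two fake 0.2-second downloads, `run_initializers({"dataset": lambda: time.sleep(0.2), "model": lambda: time.sleep(0.2)})` finishes in roughly 0.2 seconds rather than 0.4, matching the "longest single download" claim in the PR description.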
Force-pushed from af5276e to be3c0a8
I wanted to ask whether this issue is still open. I submitted a PR, but I had a problem with the PR title. Can anyone help me? I didn't understand the required format for the PR title.
/retitle feat: Run dataset and model initializers in parallel
…ow#290) Signed-off-by: priyank <priyank8445@gmail.com>
Force-pushed from bb80231 to c05ed89
@astefanutti The E2E test failures are resolved here.
What this PR does / why we need it:
Currently, the ContainerBackend (Docker/Podman) runs dataset and model initializers sequentially. For Large Language Model (LLM) fine-tuning or heavy data workloads, downloading both a massive dataset and a base model separately can add significant overhead to the job startup time.
This PR refactors the _run_initializers logic to use
concurrent.futures.ThreadPoolExecutor. If both a dataset and a model are configured, they are now initialized in parallel threads. This reduces the total initialization time to the duration of the longest single download, rather than the sum of both.

Key technical changes:
- Added `import concurrent.futures` to the backend.

Which issue(s) this PR fixes:
Fixes #290
Checklist:
- Ran `uv run pytest kubeflow/trainer/backends/container/backend_test.py`
- Ran `make verify`

Verification Results:
Ran the specific backend tests to confirm the threading logic works as expected without race conditions:
uv run pytest kubeflow/trainer/backends/container/backend_test.py # Result: 19 passed
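The "no race conditions" claim can be checked with a test that proves the two initializers are actually in flight at the same time. The sketch below is a self-contained illustration, not a test from `backend_test.py`; it uses a `threading.Barrier` that only releases if both threads reach it concurrently.

```python
import concurrent.futures
import threading


def test_initializers_overlap():
    """Verify two initializers really run in parallel.

    A Barrier(2) only releases when two threads are waiting on it at
    the same time. If the executor ran the initializers sequentially,
    the first wait() would time out and raise BrokenBarrierError.
    """
    barrier = threading.Barrier(2, timeout=2)

    def initializer():
        barrier.wait()  # blocks until the other initializer arrives

    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(initializer) for _ in range(2)]
        for f in futures:
            f.result()  # re-raises BrokenBarrierError on sequential runs
    return True
```

A `max_workers=1` pool would fail this test, since the second initializer can never start while the first is blocked on the barrier.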