
feat: Support for Spark Client in Kubeflow SDK #158

Closed
Shekharrajak wants to merge 14 commits into kubeflow:main from Shekharrajak:feature/spark-client


@Shekharrajak
Member

@Shekharrajak Shekharrajak commented Nov 11, 2025

This PR introduces the Kubeflow Spark Client - a cloud-native
Python client for managing Apache Spark applications on
Kubernetes. It provides a unified, Pythonic interface
for submitting, monitoring, and managing Spark jobs using the
Kubeflow Spark Operator.
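
For context, a job handled by the Spark Operator is ultimately a SparkApplication custom resource. The sketch below builds such a manifest as a plain Python dict. It does not use this PR's client: the helper name `build_spark_application`, the image, the service account name, and the resource sizes are all illustrative assumptions.

```python
import json


def build_spark_application(name, image, main_file, namespace="default",
                            executors=2, spark_version="3.5.0"):
    """Return a SparkApplication manifest (sparkoperator.k8s.io/v1beta2) as a dict.

    Hypothetical helper for illustration only; not part of the Kubeflow SDK.
    """
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Python",
            "mode": "cluster",
            "image": image,
            "mainApplicationFile": main_file,
            "sparkVersion": spark_version,
            "restartPolicy": {"type": "Never"},
            # "spark" is an assumed service account name; use whatever your cluster defines.
            "driver": {"cores": 1, "memory": "512m", "serviceAccount": "spark"},
            "executor": {"cores": 1, "instances": executors, "memory": "512m"},
        },
    }


manifest = build_spark_application(
    name="pi-example",
    image="example.org/spark-py:latest",  # placeholder image, not a real registry path
    main_file="local:///opt/spark/examples/src/main/python/pi.py",
)
print(json.dumps(manifest, indent=2))
```

A client like the one proposed here would presumably generate and submit a resource of this shape so that users do not have to write it by hand.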

KEP: #163

A few examples are added in the examples/spark directory to try out.

# Set up the environment
cd examples/spark

./setup_test_environment.sh

# Run a simple example
python test_spark_client_integration.py

# Spark Connect
./setup_spark_connect.sh
python ipython_spark_connect_demo.py

Slack thread: https://cloud-native.slack.com/archives/C074588U7EG/p1763656387742729?thread_ts=1763568656.642239&cid=C074588U7EG

@google-oss-prow
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign astefanutti for approval. For more information see the Kubernetes Code Review Process.


Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@andreyvelich
Member

/retitle feat: Support for Spark Client in Kubeflow SDK

@google-oss-prow google-oss-prow bot changed the title from "Spark Client" to "feat: Support for Spark Client in Kubeflow SDK" on Nov 12, 2025
@andreyvelich
Member

Thanks a lot for this @Shekharrajak 🚀
We will review this PR after KubeCon + CloudNativeCon NA!
cc @kubeflow/kubeflow-sdk-team

@andreyvelich
Member

@google-oss-prow
Contributor

@andreyvelich: GitHub didn't allow me to request PR reviews from the following users: aagumin, shravan-achar, bigsur0.

Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs.


In response to this:

/cc @akshaychitneni @shravan-achar @bigsur0 @vara-bonthu @nabuskey @ChenYi015 @jacobsalway @aagumin @ImpSy


@kramaranya
Contributor

kramaranya commented Nov 12, 2025

Shall we create a KEP first since this is quite a big addition to Kubeflow SDK?

@Shekharrajak
Member Author

> Shall we create a KEP first since this is quite a big addition to Kubeflow SDK?

I can create one, but this PR follows the same pattern as the trainer client.

@aagumin

aagumin commented Nov 17, 2025

Is there a plan for the SDK to support working with Spark Connect? For example, a data scientist might have a dynamic infrastructure where they can create a Spark Connect cluster on demand.
It would also be great to see the required Kubernetes RBAC in the documentation so that all examples work. Ideally, it would be limited to CRDs and pods/logs.
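
As a starting point for that discussion, here is a minimal sketch of a namespaced Role restricted to the Spark Operator CRD and to pods and their logs. It is an assumption about what the examples would need, not the PR's documented requirements: the helper name, the Role name, the namespace, and the verb lists are illustrative. Note that the Kubernetes subresource for logs is named `pods/log`.

```python
import json


def build_spark_client_role(namespace, name="spark-client"):
    """Return an RBAC Role manifest as a dict (hypothetical helper, for illustration)."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {
                # Spark Operator custom resources; verbs are an assumed minimum.
                "apiGroups": ["sparkoperator.k8s.io"],
                "resources": ["sparkapplications"],
                "verbs": ["create", "get", "list", "watch", "delete"],
            },
            {
                # Read-only access to driver/executor pods and their logs.
                "apiGroups": [""],
                "resources": ["pods", "pods/log"],
                "verbs": ["get", "list", "watch"],
            },
        ],
    }


role = build_spark_client_role("spark-jobs")  # "spark-jobs" is a placeholder namespace
print(json.dumps(role, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, then bound to the user's service account with a RoleBinding.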

@andreyvelich
Member

@Shekharrajak Let's create a simple KEP which identifies use cases and user patterns for interacting with the SparkApplication CRD.
It doesn't need to be super detailed like HPO: https://github.com/kubeflow/sdk/tree/main/docs/proposals/46-hyperparameter-optimization, but we can discuss the initial API design there.

@andreyvelich
Member

> Is there a plan for the SDK to support working with Spark Connect? For example, a data scientist might have a dynamic infrastructure where they can create a Spark Connect cluster on demand.
> It would also be great to see the required Kubernetes RBAC in the documentation so that all examples work. Ideally, it would be limited to CRDs and pods/logs.

Yeah, I think we can talk about solutions to connect to an existing Spark cluster, and where Kubeflow SDK APIs might be helpful.

I know that @lresende and @fresende added instructions on how to connect Jupyter Notebooks to a Spark cluster via Jupyter Enterprise Gateway, but we can discuss various options for Spark Connect too.
https://www.kubeflow.org/docs/components/spark-operator/user-guide/notebooks-spark-operator/
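
For reference, connecting to an already running Spark Connect server from Python needs only a connection string. The sketch below uses PySpark's own `SparkSession.builder.remote` rather than anything from this PR. The helper names and the Service DNS name are hypothetical, and 15002 is assumed as the port because it is the Spark Connect default.

```python
def spark_connect_url(host, port=15002):
    """Build a Spark Connect connection string of the form sc://host:port."""
    return f"sc://{host}:{port}"


def open_session(url):
    """Open a remote SparkSession (needs pyspark>=3.4 installed with Spark Connect support)."""
    # Imported lazily so the URL helper above works without pyspark installed.
    from pyspark.sql import SparkSession
    return SparkSession.builder.remote(url).getOrCreate()


# Hypothetical in-cluster Service name for a Spark Connect server.
url = spark_connect_url("spark-connect.spark.svc.cluster.local")
print(url)

# With a reachable server, the session could then be used like a local one:
# spark = open_session(url)
# spark.range(5).show()
```

An SDK helper could add value on top of this by creating the Spark Connect server on demand and returning a ready session.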

@Shekharrajak
Member Author

> @Shekharrajak Let's create a simple KEP which identifies use cases and user patterns for interacting with the SparkApplication CRD. It doesn't need to be super detailed like HPO: https://github.com/kubeflow/sdk/tree/main/docs/proposals/46-hyperparameter-optimization, but we can discuss the initial API design there.

Created the doc: #163. Please have a look.

@Shekharrajak Shekharrajak force-pushed the feature/spark-client branch 4 times, most recently from b3b3941 to 210ad60 on November 19, 2025 05:46
Signed-off-by: shekharrajak <shekharrajak@live.com>
updated the spark connect backend and examples

Signed-off-by: shekharrajak <shekharrajak@live.com>
Shekharrajak and others added 12 commits November 20, 2025 23:15
Support for Spark connect backend in Spark Client
updated the spark connect backend and examples

Signed-off-by: shekharrajak <shekharrajak@live.com>
Spark connect backend for Spark Client
Update Python docstrings to use SparkSessionClient and BatchSparkClient
…_status

- Add get_job_logs(submission_id, executor_id, follow) method for retrieving logs
- Rename wait_for_job to wait_for_job_status for TrainerClient API consistency
- Update docstring examples to reflect the new method name
- Add get_job_status(job_id) method to retrieve the current status of a job
- Update documentation to include usage examples for the new method
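
To illustrate how the methods named in these commits might fit together, here is a minimal sketch of a `wait_for_job_status` loop built on `get_job_status(job_id)`. It is not the PR's implementation: the `FakeSparkClient` class, the status strings, and the `expected`, `timeout`, and `poll_interval` parameters are assumptions made so the example runs on its own.

```python
import time


class FakeSparkClient:
    """Stand-in client that replays a scripted list of statuses (illustrative only)."""

    def __init__(self, statuses):
        self._statuses = list(statuses)

    def get_job_status(self, job_id):
        # Return the next scripted status; keep returning the last one once exhausted.
        if len(self._statuses) > 1:
            return self._statuses.pop(0)
        return self._statuses[0]


def wait_for_job_status(client, job_id, expected=("COMPLETED", "FAILED"),
                        timeout=60.0, poll_interval=0.01):
    """Poll get_job_status until it returns one of `expected` or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = client.get_job_status(job_id)
        if status in expected:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout}s")
        time.sleep(poll_interval)


client = FakeSparkClient(["SUBMITTED", "RUNNING", "COMPLETED"])
final_status = wait_for_job_status(client, "job-123")
print(final_status)
```

Using a monotonic clock for the deadline keeps the timeout correct even if the system clock is adjusted while waiting.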
@Shekharrajak
Member Author

Since there are a few design changes, I started fresh in #225.


4 participants