feat: Support for Spark Client in Kubeflow SDK #158
Shekharrajak wants to merge 14 commits into kubeflow:main from
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files: Approvers can indicate their approval by writing
/retitle feat: Support for Spark Client in Kubeflow SDK
Thanks a lot for this @Shekharrajak 🚀
@andreyvelich: GitHub didn't allow me to request PR reviews from the following users: aagumin, shravan-achar, bigsur0. Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Shall we create a KEP first, since this is quite a big addition to the Kubeflow SDK?
I can create one, but this PR follows the same pattern as the Trainer client.
Is there a plan for the SDK to support working with Spark Connect? For example, a data scientist might have a dynamic infrastructure where they can create a Spark Connect cluster on demand.
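For context on the Spark Connect question: a Spark Connect client reaches the server through a connection string of the form sc://host:port/;key=value;key=value, where 15002 is the default server port. Below is a minimal sketch of a helper that assembles such a string. The name spark_connect_url and the in-cluster host used in the usage note are hypothetical, and the helper is not part of the Kubeflow SDK.

```python
# Default Spark Connect server port.
DEFAULT_SPARK_CONNECT_PORT = 15002


def spark_connect_url(host: str, port: int = DEFAULT_SPARK_CONNECT_PORT, **params: str) -> str:
    """Build a Spark Connect connection string: sc://host:port/;key1=value1;key2=value2.

    Hypothetical helper for illustration only.
    """
    for key, value in params.items():
        # ';' separates parameters and '=' separates a key from its value,
        # so reject them where they would make the string ambiguous.
        if ";" in key or "=" in key or ";" in str(value):
            raise ValueError(f"invalid character in parameter {key!r}")
    url = f"sc://{host}:{port}"
    if params:
        # Parameters follow a "/;" after the port, in the order given.
        url += "/;" + ";".join(f"{key}={value}" for key, value in params.items())
    return url
```

With PySpark 3.4 or later installed with its Spark Connect dependencies (pip install "pyspark[connect]"), the resulting string can be passed to SparkSession.builder.remote(...).getOrCreate(). For example, spark_connect_url("spark-connect.default.svc") yields a URL for a hypothetical in-cluster service on the default port.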
@Shekharrajak Let's create a simple KEP that identifies use cases and user patterns for interacting with the SparkApplication CRD.
Yeah, I think we can discuss solutions for connecting to an existing Spark cluster, and where Kubeflow SDK APIs might help. I know that @lresende and @fresende added instructions on how to connect Jupyter Notebooks to a Spark cluster via Jupyter Enterprise Gateway, but we can discuss various options for Spark Connect too.
Created the doc: #163. Please have a look.
Force-pushed from b3b3941 to 210ad60
Signed-off-by: shekharrajak <shekharrajak@live.com>
Force-pushed from 210ad60 to fa9c89e
Updated the Spark Connect backend and examples
Signed-off-by: shekharrajak <shekharrajak@live.com>
Support for Spark Connect backend in Spark Client
Spark Connect backend for Spark Client
Update Python docstrings to use SparkSessionClient and BatchSparkClient
Make APIs consistent with TrainerClient
…_status
- Add get_job_logs(submission_id, executor_id, follow) method for retrieving logs
- Rename wait_for_job to wait_for_job_status for TrainerClient API consistency
- Update docstring examples to reflect the new method name
- Add get_job_status(job_id) method to retrieve the current status of a job
- Update documentation to include usage examples for the new method
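The commits above rename wait_for_job to wait_for_job_status and add get_job_status(job_id). The snippet below is not the SDK's implementation. It is a minimal polling sketch of how such a wait can be built on top of a status getter. The getter is passed in as a callable, and the set of terminal state names is an assumption modeled on Spark Operator application states.

```python
import time
from typing import Callable, Iterable

# Assumed terminal states, modeled on Spark Operator application states.
TERMINAL_STATES = {"COMPLETED", "FAILED", "SUBMISSION_FAILED"}


def wait_for_job_status(
    get_job_status: Callable[[str], str],
    job_id: str,
    status: Iterable[str] = ("COMPLETED",),
    timeout: float = 600.0,
    polling_interval: float = 2.0,
) -> str:
    """Poll get_job_status(job_id) until it reports one of the wanted states.

    Raises RuntimeError if the job reaches a different terminal state first,
    and TimeoutError if `timeout` seconds pass without a wanted state.
    """
    wanted = set(status)
    deadline = time.monotonic() + timeout
    while True:
        current = get_job_status(job_id)
        if current in wanted:
            return current
        if current in TERMINAL_STATES:
            # The job finished, but not in a state the caller asked for.
            raise RuntimeError(f"job {job_id} ended in state {current}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not reach {sorted(wanted)} within {timeout}s")
        time.sleep(polling_interval)
```

Taking the status getter as a parameter keeps the loop testable without a cluster. Using time.monotonic() keeps the timeout unaffected by wall-clock adjustments.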
Since there are a few design changes, I started fresh here: #225
This PR introduces the Kubeflow Spark Client, a cloud-native Python client for managing Apache Spark applications on Kubernetes. It provides a unified, Pythonic interface for submitting, monitoring, and managing Spark jobs using the Kubeflow Spark Operator.
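The Kubeflow Spark Operator runs jobs described by SparkApplication custom resources. The following minimal sketch is independent of this PR's code and shows what a submission reduces to at the Kubernetes level. It assumes the operator's sparkoperator.k8s.io/v1beta2 API and the official kubernetes Python client. The function names, resource sizes, and default service account name are illustrative assumptions. Match the service account to your operator installation.

```python
from typing import Any, Dict


def build_spark_application(
    name: str,
    image: str,
    main_application_file: str,
    spark_version: str,
    namespace: str = "default",
    executor_instances: int = 2,
    service_account: str = "spark",  # assumption: depends on how the operator was installed
) -> Dict[str, Any]:
    """Return a SparkApplication custom resource as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Python",
            "mode": "cluster",
            "image": image,
            "mainApplicationFile": main_application_file,
            "sparkVersion": spark_version,
            "driver": {"cores": 1, "memory": "512m", "serviceAccount": service_account},
            "executor": {"cores": 1, "instances": executor_instances, "memory": "512m"},
        },
    }


def submit_spark_application(app: Dict[str, Any]) -> Dict[str, Any]:
    """Create the resource through the Kubernetes API (hypothetical helper)."""
    # Imported lazily so that building a manifest works without the
    # `kubernetes` package or a reachable cluster.
    from kubernetes import client, config

    config.load_kube_config()
    return client.CustomObjectsApi().create_namespaced_custom_object(
        group="sparkoperator.k8s.io",
        version="v1beta2",
        namespace=app["metadata"]["namespace"],
        plural="sparkapplications",
        body=app,
    )
```

Building the manifest needs no cluster. For example, build_spark_application("pyspark-pi", "spark:3.5.0", "local:///opt/spark/examples/src/main/python/pi.py", "3.5.0") describes a run of the PySpark Pi example shipped in the Apache Spark image. After submission, the operator reconciles the resource into driver and executor pods.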
KEP: #163
A few examples have been added to the examples/spark directory to experiment with.
Slack thread: https://cloud-native.slack.com/archives/C074588U7EG/p1763656387742729?thread_ts=1763568656.642239&cid=C074588U7EG