Add Spark script example for WXD-Confluent TableFlow integration #41

Open
shibil-rahman wants to merge 2 commits into IBM:main from shibil-rahman:WXD_confluent

@shibil-rahman

📋 Summary

This PR adds a new tutorial demonstrating how to integrate IBM watsonx.data with Confluent Tableflow to read data from Confluent-managed Iceberg tables using WXD Spark.

📁 Changes

  • New Directory: Tutorials/WXD - Confluent Integration/
  • Files Added:
    • read_confluent_table_standalone.py - Complete PySpark script for Confluent Tableflow integration
    • README.md - Comprehensive documentation with usage instructions

✨ Features

The tutorial provides:

  • Confluent Tableflow Integration: Connect to Confluent's REST catalog using API credentials
  • Auto-Discovery: Automatically discovers available namespaces and tables in the catalog
  • Table Inspection: Describes table schemas and displays metadata
  • Data Querying: Retrieves and displays sample data from Confluent Tableflow tables
  • Standalone Execution: Runs independently with embedded Spark configuration
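
For reviewers who want a quick picture of the flow listed above, here is a minimal sketch, not the contents of `read_confluent_table_standalone.py`. It assumes the catalog is configured through the standard Apache Iceberg REST catalog properties for Spark. The helper names (`quote_ident`, `build_catalog_conf`, `discovery_statements`) and the catalog name, endpoint, and credentials in the usage note are hypothetical placeholders.

```python
from typing import Dict, List


def quote_ident(name: str) -> str:
    """Backtick-quote one identifier part for Spark SQL, doubling any backtick."""
    return "`" + name.replace("`", "``") + "`"


def build_catalog_conf(catalog: str, rest_uri: str,
                       api_key: str, api_secret: str) -> Dict[str, str]:
    """Spark properties for an Iceberg REST catalog (assumed property set)."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": rest_uri,
        # Iceberg's REST client expects "<key>:<secret>" in a single property.
        f"{prefix}.credential": f"{api_key}:{api_secret}",
    }


def discovery_statements(catalog: str, namespace: str,
                         table: str, limit: int = 10) -> List[str]:
    """SQL for the discover -> inspect -> sample sequence described above."""
    cat = quote_ident(catalog)
    ns = f"{cat}.{quote_ident(namespace)}"
    tbl = f"{ns}.{quote_ident(table)}"
    return [
        f"SHOW NAMESPACES IN {cat}",
        f"SHOW TABLES IN {ns}",
        f"DESCRIBE TABLE {tbl}",
        f"SELECT * FROM {tbl} LIMIT {int(limit)}",
    ]
```

With a Spark session available, each property would be passed to `SparkSession.builder.config(key, value)` and each statement to `spark.sql(stmt).show()`. Identifiers are quoted because namespace or table names may contain characters such as hyphens that Spark SQL does not accept unquoted.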

📖 Documentation Highlights

The README includes:

  • Clear overview of what the integration does
  • Storage authentication options:
    • ✅ Confluent Managed Storage (no additional config needed)
    • ✅ Integrated AWS S3 Storage (with required S3 credentials)
  • Three execution methods:
    1. 🔬 Using SparkLab (VS Code Development Environment)
    2. 🚀 Submit via Spark Application REST API
    3. 💻 Submit via CPDCTL CLI
  • Configuration parameters reference
  • Troubleshooting guide
  • Links to relevant IBM watsonx.data documentation
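
The two storage options in the list above differ only in whether extra S3 properties are supplied. The sketch below illustrates that difference using Apache Iceberg's AWS property names (`io-impl`, `s3.access-key-id`, `s3.secret-access-key`, `client.region`). Whether the README uses these exact properties is an assumption, and `storage_conf` is a hypothetical helper name.

```python
from typing import Dict, Optional


def storage_conf(catalog: str,
                 s3_access_key: Optional[str] = None,
                 s3_secret_key: Optional[str] = None,
                 s3_region: Optional[str] = None) -> Dict[str, str]:
    """Extra Spark properties needed for the chosen storage option.

    Confluent Managed Storage: no credentials passed, so nothing is added.
    Integrated AWS S3 Storage: both S3 keys are required.
    """
    if s3_access_key is None and s3_secret_key is None:
        return {}  # Confluent Managed Storage: no additional config
    if not (s3_access_key and s3_secret_key):
        raise ValueError("S3 access key and secret key must be given together")

    prefix = f"spark.sql.catalog.{catalog}"
    conf = {
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.s3.access-key-id": s3_access_key,
        f"{prefix}.s3.secret-access-key": s3_secret_key,
    }
    if s3_region:
        # Assumed optional: region of the bucket backing the Tableflow tables.
        conf[f"{prefix}.client.region"] = s3_region
    return conf
```

The returned dictionary can be merged into the rest of the Spark configuration regardless of which execution method is used. Returning an empty dictionary for managed storage lets the same script serve both options.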

🎯 Use Cases

This integration enables users to:

  • Query Confluent Tableflow data directly from watsonx.data
  • Leverage Spark's processing capabilities on Confluent-managed data
  • Build data pipelines that span both platforms
  • Perform analytics on streaming data stored in Confluent

✅ Testing

  • Script tested with Confluent Tableflow REST catalog
  • Verified auto-discovery of namespaces and tables
  • Confirmed data retrieval and display functionality

@shibil-rahman
Author

Hi @liuljun, this Spark example is needed for the official WXD documentation pages, and we would like to make the script available in the public repo. Could you help with this? Thanks.
