rosetta/docs/markdowns/generate.md at develop · rosettadb/rosetta

Generate Spark code for data transfers (Python or Scala)

Command: generate

This command generates a Spark data transfer script in either Python (PySpark) or Scala. It first extracts the schema from the source database and reads the connection properties from the source connection. It then writes a Python or Scala file that translates the schema and is ready to transfer data from source to target.

rosetta [-c, --config CONFIG_FILE] generate [-h, --help] [-s, --source CONNECTION_NAME] [-t, --target CONNECTION_NAME] [--pyspark] [--scala]

Parameter	Description
-h, --help	Show the help message and exit.
-c, --config CONFIG_FILE	YAML config file. If omitted, main.conf in the current directory is used, if it exists.
-s, --source CONNECTION_NAME	The source connection name to extract schema from.
-t, --target CONNECTION_NAME	The target connection name where the data will be transferred.
--pyspark	Generates the Spark Python (PySpark) file.
--scala	Generates the Spark Scala file.
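
In the usage line above, the global -c option comes before the generate subcommand, while -s, -t, and the output flag come after it. The following is a minimal Python sketch of assembling that command line from a script. The helper name build_generate_command and its parameters are hypothetical and not part of rosetta; only the flags listed in the table are used.

```python
import shlex


def build_generate_command(source, target, flavor="pyspark", config=None):
    """Assemble the argument list for `rosetta generate` (hypothetical helper)."""
    if flavor not in ("pyspark", "scala"):
        raise ValueError("flavor must be 'pyspark' or 'scala'")
    cmd = ["rosetta"]
    # The config option is global, so it goes before the subcommand.
    if config is not None:
        cmd += ["--config", config]
    cmd += ["generate", "--source", source, "--target", target, f"--{flavor}"]
    return cmd


# Print the command as a single shell-quoted string.
print(shlex.join(build_generate_command("source_db_connection", "target_db_connection")))
```

The returned list can be passed to subprocess.run on a machine where the rosetta executable is on the PATH.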

Example Command:

Here’s a basic example command that uses the generate function:

rosetta generate -s source_db_connection -t target_db_connection --pyspark

This command will:

Connect to the specified source and target databases using the connection details provided.
Extract the schema from the source.
Generate a PySpark or Scala script, depending on the selected flag (--pyspark or --scala), which is ready to transfer data from source to target.

Additional Notes

JDBC Drivers: Ensure you have the correct JDBC drivers for both the source and target databases. The driver JARs should be listed in spark.driver.extraClassPath.
Database Configuration: Modify source_jdbc_url, target_jdbc_url, and the other connection parameters to match your environment.
Mode Options: The mode("overwrite") option in .save() will overwrite any existing data in the target table. Change it as needed (e.g., append, ignore, error).
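
To make the notes above concrete, here is a minimal sketch of the kind of PySpark JDBC transfer they describe. It is not the script rosetta actually emits. The helper names (jdbc_options, transfer_table, main), the table name actor, the JDBC URLs, the credentials, and the driver JAR paths are all assumptions chosen for illustration. Only spark.driver.extraClassPath, source_jdbc_url, target_jdbc_url, mode(...), and .save() come from the notes.

```python
def jdbc_options(url, table, user, password):
    """Collect the options that Spark's JDBC data source reads."""
    return {"url": url, "dbtable": table, "user": user, "password": password}


def transfer_table(spark, source_opts, target_opts, mode="overwrite"):
    """Read one table over JDBC and write it to the target over JDBC."""
    df = spark.read.format("jdbc").options(**source_opts).load()
    # mode may be "overwrite", "append", "ignore", or "error".
    df.write.format("jdbc").options(**target_opts).mode(mode).save()


def main():
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("transfer-sketch")
        # Placeholder JAR paths; ':' is the classpath separator on Linux and macOS.
        .config(
            "spark.driver.extraClassPath",
            "/path/to/source-driver.jar:/path/to/target-driver.jar",
        )
        .getOrCreate()
    )

    # Placeholder connection details; adjust to your environment.
    source_jdbc_url = "jdbc:mysql://localhost:3306/source_db"
    target_jdbc_url = "jdbc:postgresql://localhost:5432/target_db"

    transfer_table(
        spark,
        jdbc_options(source_jdbc_url, "actor", "source_user", "source_password"),
        jdbc_options(target_jdbc_url, "actor", "target_user", "target_password"),
        mode="overwrite",
    )
    spark.stop()
```

Call main() in an environment where PySpark and both JDBC drivers are available. Switching mode to "append" keeps existing rows in the target table instead of replacing them.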
