Skip to content

Add support for Oracle databases & dialect in PyDough#484

Open
john-sanchez31 wants to merge 52 commits intomainfrom
John/oracle
Open

Add support for Oracle databases & dialect in PyDough#484
john-sanchez31 wants to merge 52 commits intomainfrom
John/oracle

Conversation

@john-sanchez31
Copy link
Contributor

Resolves #479

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@john-sanchez31 john-sanchez31 requested review from a team, hadia206, juankx-bodo and knassre-bodo and removed request for a team March 9, 2026 15:50
Copy link
Contributor

@hadia206 hadia206 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work John.
Please see my comments below


# Default attempts for connection if not given
if not (attempts := kwargs.pop("attempts", None)):
attempts = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be 3 per the docstring?
``attempts (not an Oracle connector parameter): Number of connection attempts (default: 3)

format string supplied by the user (which typically uses Python
`strftime`-style specifiers) into a form that Oracle will
understand. Only a small subset of directives is currently
supported; additional tokens may be added as needed.
Copy link
Contributor

@hadia206 hadia206 Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the user used something that is not currently supported?

self,
args: list[SQLGlotExpression],
types: list[PyDoughType],
) -> SQLGlotExpression:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment to explain why we needed to override the base, i.e. difference between this and the base?

)
match operator:
case pydop.DEFAULT_TO:
# sqlglot convert COALESCE in NVL for Oracle, which is fine for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# sqlglot convert COALESCE in NVL for Oracle, which is fine for
# sqlglot convert COALESCE to NVL for Oracle, which is fine for

# The length of the first string given: LENGH(X)
len_string: SQLGlotExpression = sqlglot_expressions.Length(this=string)

# The length of the replaced string: NVL(LENGH(REPLACE(X, Y, "")), 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a comment to clarify why NVL is necessary here? (In Oracle, empty strings are NULL so ...)

sqlglot_expressions.Literal.string("YYYY-MM-DD"),
],
)
if isinstance(literal_expression.value, datetime.datetime):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datetime.datetime is a subclass of datetime.date, so a datetime value matches both checks. That means if you have a datetime it'll go to both and the second would silently overwrites the first.

So 2 things:
This should be if - elif and the check for datetime.datetime should come first

if isinstance(literal_expression.value, datetime.datetime):
    ...
elif isinstance(literal_expression.value, datetime.date):
    ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same should be applied to ANSI portion

"PyDough does not yet support datetime values with a timezone"
)
literal = sqlglot_expressions.Anonymous(
this="TO_DATE",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not TO_TIMESTAMP like oracle_transform_bindings?

# PYDOUGH CHANGE: ignore SYSTIMESTAMP since it is a special case
# of a column that should not be considered external
if isinstance(c.this, exp.Identifier)
and c.this.this != "SYSTIMESTAMP"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it special for other dialects too? Let's confirm that.
If it's not, we need to figure out how to make this change only for Oracle

or document it clearly that SYSTEMTIMESTAMP is not recommended to be used as column name because ...

external_columns = get_scope_external_columns(scope)
if not parent:
continue
if scope.external_columns:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be updated too?

Suggested change
if scope.external_columns:
if external_columns:

pytest.param(DatabaseDialect.SNOWFLAKE, id="snowflake"),
pytest.param(DatabaseDialect.MYSQL, id="mysql"),
pytest.param(DatabaseDialect.POSTGRES, id="postgres"),
pytest.param(DatabaseDialect.ORACLE, id="oracle"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add Oracle to all_dialects_tpch_db_context as well.
See how it's used for division by zero test.
We need to start moving our tests to use that so that it's one function used for all. Perhaps you can do that for one test.

Pick your poison test_pipeline_e2e_oracle_tpch or test_pipeline_e2e_oracle_tpch_custom and I'll do the other one in my PR 🥲

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for Oracle databases & dialect in PyDough

2 participants