Add support for Oracle databases & dialect in PyDough#484
Add support for Oracle databases & dialect in PyDough#484john-sanchez31 wants to merge 52 commits intomainfrom
Conversation
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
…iff days, trunc week, trunc day, join_strings, extract quarters, cast int
hadia206
left a comment
There was a problem hiding this comment.
Good work John.
Please see my comments below
|
|
||
| # Default attempts for connection if not given | ||
| if not (attempts := kwargs.pop("attempts", None)): | ||
| attempts = 1 |
There was a problem hiding this comment.
Shouldn't this be 3 per the docstring?
``attempts (not an Oracle connector parameter): Number of connection attempts (default: 3)
| format string supplied by the user (which typically uses Python | ||
| `strftime`-style specifiers) into a form that Oracle will | ||
| understand. Only a small subset of directives is currently | ||
| supported; additional tokens may be added as needed. |
There was a problem hiding this comment.
What happens if the user used something that is not currently supported?
| self, | ||
| args: list[SQLGlotExpression], | ||
| types: list[PyDoughType], | ||
| ) -> SQLGlotExpression: |
There was a problem hiding this comment.
Add comment to explain why we needed to override the base, i.e. difference between this and the base?
| ) | ||
| match operator: | ||
| case pydop.DEFAULT_TO: | ||
| # sqlglot convert COALESCE in NVL for Oracle, which is fine for |
There was a problem hiding this comment.
| # sqlglot convert COALESCE in NVL for Oracle, which is fine for | |
| # sqlglot convert COALESCE to NVL for Oracle, which is fine for |
| # The length of the first string given: LENGH(X) | ||
| len_string: SQLGlotExpression = sqlglot_expressions.Length(this=string) | ||
|
|
||
| # The length of the replaced string: NVL(LENGH(REPLACE(X, Y, "")), 0) |
There was a problem hiding this comment.
Let's add a comment to clarify why NVL is necessary here? (In Oracle, empty strings are NULL so ...)
| sqlglot_expressions.Literal.string("YYYY-MM-DD"), | ||
| ], | ||
| ) | ||
| if isinstance(literal_expression.value, datetime.datetime): |
There was a problem hiding this comment.
datetime.datetime is a subclass of datetime.date, so a datetime value matches both checks. That means if you have a datetime it'll go to both and the second would silently overwrites the first.
So 2 things:
This should be if - elif and the check for datetime.datetime should come first
if isinstance(literal_expression.value, datetime.datetime):
...
elif isinstance(literal_expression.value, datetime.date):
...
There was a problem hiding this comment.
The same should be applied to ANSI portion
| "PyDough does not yet support datetime values with a timezone" | ||
| ) | ||
| literal = sqlglot_expressions.Anonymous( | ||
| this="TO_DATE", |
There was a problem hiding this comment.
Why not TO_TIMESTAMP like oracle_transform_bindings?
| # PYDOUGH CHANGE: ignore SYSTIMESTAMP since it is a special case | ||
| # of a column that should not be considered external | ||
| if isinstance(c.this, exp.Identifier) | ||
| and c.this.this != "SYSTIMESTAMP" |
There was a problem hiding this comment.
Is it special for other dialects too? Let's confirm that.
If it's not, we need to figure out how to make this change only for Oracle
or document it clearly that SYSTEMTIMESTAMP is not recommended to be used as column name because ...
| external_columns = get_scope_external_columns(scope) | ||
| if not parent: | ||
| continue | ||
| if scope.external_columns: |
There was a problem hiding this comment.
Shouldn't this be updated too?
| if scope.external_columns: | |
| if external_columns: |
| pytest.param(DatabaseDialect.SNOWFLAKE, id="snowflake"), | ||
| pytest.param(DatabaseDialect.MYSQL, id="mysql"), | ||
| pytest.param(DatabaseDialect.POSTGRES, id="postgres"), | ||
| pytest.param(DatabaseDialect.ORACLE, id="oracle"), |
There was a problem hiding this comment.
Add Oracle to all_dialects_tpch_db_context as well.
See how it's used for division by zero test.
We need to start moving our tests to use that so that it's one function used for all. Perhaps you can do that for one test.
Pick your poison test_pipeline_e2e_oracle_tpch or test_pipeline_e2e_oracle_tpch_custom and I'll do the other one in my PR 🥲
Resolves #479