Conversation
… SQLGlot bug discovered along the way [RUN CI]
… into John/df_collection
```py
# Query from the materialized table - direct method call works for simple queries
result = asian_tmp.CALCULATE(name)
```
Can we add the actual result for each example? I think it would be really helpful for understanding what this API creates.
documentation/usage.md
Outdated
The first argument it takes in is the PyDough node for the collection being materialized. The second argument is the name of the view/table to create. It can optionally take in the following keyword arguments:

- `as_view`: If `True`, create a VIEW. If `False` (default), create a TABLE.
- `replace`: If `True`, drop table/view if exists and then create the table/view. For Snowflake, use `CREATE OR REPLACE` to allow replacing an existing view/table. Default is `False`.

What happens if replace=False and the user tries to create a view/table that already exists? Can we specify it here?

Agreed. Also, let's make the format of how defaults are declared consistent between this vs as_view.
Not sure what you mean?
If replace=False and the user tries to create one that already exists, it will fail, as expected by all SQL engines.
I'll add the note, but I want to make sure I understand: you mean just stating it, and I'm not missing something?
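To make the semantics discussed above concrete, here is a minimal sketch of how `replace`, `as_view`, and `temp` could map to DDL statements. This is not PyDough's actual code; the function and parameter names are illustrative. When `replace=False` and the object already exists, the emitted `CREATE` simply fails on the database side:

```python
def build_create_sql(select_sql, name, as_view=False, replace=False, temp=False,
                     supports_or_replace=True):
    """Illustrative sketch: map to_table-style flags to DDL statements."""
    kind = "VIEW" if as_view else "TABLE"
    stmts = []
    if replace and supports_or_replace:
        # Dialects like Snowflake can replace atomically.
        head = f"CREATE OR REPLACE {'TEMP ' if temp else ''}{kind}"
    else:
        if replace:
            # Dialects without CREATE OR REPLACE fall back to DROP + CREATE.
            stmts.append(f"DROP {kind} IF EXISTS {name}")
        # With replace=False, this CREATE fails if the object already exists.
        head = f"CREATE {'TEMP ' if temp else ''}{kind}"
    stmts.append(f"{head} {name} AS {select_sql}")
    return stmts
```

For example, `build_create_sql("SELECT 1", "t", replace=True, supports_or_replace=False)` yields the two-statement DROP + CREATE fallback.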
```py
- actual_temp is the final temp value (may differ from input due to dialect limitations)
"""
# Handle differences in CREATE syntax for different databases.
create_caps = CREATE_CAPABILITIES[db_dialect]
)

# Check if we can use CREATE OR REPLACE
can_replace = create_caps.replace_view if as_view else create_caps.replace_table

raise PyDoughException(
    f"TEMPORARY views are not supported for {session.database.dialect.name}"
)
# session.metadata = graph
```

Is this an actual comment?

Nope, outdated comment. Removed.
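As a sketch of what a `CREATE_CAPABILITIES` lookup like the one above might contain: the dataclass and the per-dialect entries below are my guesses based on the dialect table discussed in this PR, not PyDough's actual definitions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CreateCapabilities:
    # Whether CREATE OR REPLACE is supported natively for each object kind
    # (hypothetical field names; the real structure may differ).
    replace_table: bool
    replace_view: bool
    # Whether TEMPORARY objects are supported for each kind.
    temp_table: bool
    temp_view: bool

# Illustrative entries only.
CREATE_CAPABILITIES = {
    "sqlite": CreateCapabilities(False, False, True, True),
    "snowflake": CreateCapabilities(True, True, True, False),
}

def can_use_or_replace(dialect: str, as_view: bool) -> bool:
    caps = CREATE_CAPABILITIES[dialect]
    return caps.replace_view if as_view else caps.replace_table
```

Dialects where this returns `False` would take the DROP + CREATE fallback path instead.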
```py
the created view/table.

"""
_validate_table_name(name)
```
There is a function from error_utils called is_valid_sql_identifier() that can be used here. There are more functions that can be used for validation like unique_properties_predicate.verify() for unique_columns. Also don't forget to manage quoted names for the table name and for columns. For reference, see how I use normalize_column_name in create_constant_table. I think you can use it as well.
Thanks. I missed that.
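Here is a rough stand-in for the kind of validation the reviewer suggests. These are simplified look-alikes of the `error_utils` helpers (`is_valid_sql_identifier`, `normalize_column_name`), not the real implementations, just to illustrate handling quoted names:

```python
import re

_IDENT_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def is_valid_sql_identifier(name: str) -> bool:
    """Simplified stand-in for the error_utils helper."""
    return bool(_IDENT_RE.match(name))

def normalize_column_name(name: str) -> str:
    """Strip surrounding double quotes so quoted names compare consistently
    (modeled loosely on the helper used by create_constant_table)."""
    if len(name) >= 2 and name[0] == '"' and name[-1] == '"':
        return name[1:-1]
    return name

def validate_table_name(name: str) -> None:
    # Normalize first so quoted identifiers like "orders" pass validation.
    if not is_valid_sql_identifier(normalize_column_name(name)):
        raise ValueError(f"Invalid table name: {name!r}")
```

The same normalize-then-validate pattern would apply to column names passed for uniqueness.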
```
@@ -0,0 +1,126 @@
"""
A user-defined collection representing a database [temporary] view/table.
```
Because this is inside the user_collections folder can we add the documentation on the README.md file? Just a brief description of this class would be fine.
```py
# then CALCULATE on materialized view
pytest.param(
    PyDoughPandasTest(
        "asian_nations = nations.WHERE(region.name == 'ASIA')\n"
```

Can we create a view collection from a range/dataframe collection? If so, we should add a test. Can we combine user-generated collections? For example, CROSS a dataframe/range collection with a view collection.
We should also have tests where the uniqueness columns from to_table come into play (e.g. .BEST where the per=... ancestor is the to_table collection).
tests/test_pipeline_tpch_custom.py
Outdated
```py
PyDoughPandasTest(
    "asian_nations = nations.WHERE(region.name == 'ASIA').CALCULATE(nation_key=key, nation_name=name)\n"
    "asian_tmp = pydough.to_table(asian_nations, name='asian_nations_t4', replace=True)\n"
    "result = CROSS(asian_tmp).CALCULATE(nation_key, nation_name).ORDER_BY(nation_key.ASC())",
```
Can we use CROSS like this without anything before? I thought it must have something before like collection1.CROSS(collection2). (Just making sure)
EDIT: Based on Kian's comment elsewhere, this behavior should not happen. Updated code to error if this is used and updated the tests.
knassre-bodo
left a comment
Initial review done; overall great work but some things that need to get iterated on.
```
| SQLite     | No (uses DROP + CREATE) | Yes | No (uses DROP + CREATE) | Yes |
| Snowflake  | Yes                     | Yes | Yes                     | No  |
| PostgreSQL | No (uses DROP + CREATE) | Yes | Yes                     | No  |
| MySQL      | No (uses DROP + CREATE) | Yes | Yes                     | No  |
```

Let's include Oracle and BodoSQL here (which reminds me... this PR will probably need some brief tests for both of those)
Sure. The PR was up for review before these were merged; as of today only BodoSQL is, so I'll work on that for now.
EDIT: BodoSQL relies on Bodo, which dropped Mac/Intel support. I'll disable the feature for BodoSQL until I make the machine switch and can resume it.
#### Example 1: Basic Table Materialization

Below is an example of using `pydough.to_table` to materialize a filtered query as a temporary table, then query from it:

```py
%%pydough
# Create a temporary table with Asian nations
asian_nations = nations.WHERE(region.name == 'ASIA')
asian_tmp = pydough.to_table(asian_nations, name='asian_nations', temp=True)

# Query from the materialized table - direct method call
result = asian_tmp.CALCULATE(name)
pydough.to_df(result)
```
Let's also show an example with the sql for both steps: what does the DDL sql look like, and what does the final to_df sql look like?
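To illustrate the two-step shape being asked for, here is a self-contained emulation on an in-memory SQLite database. The schema, the DDL, and the final query are hand-written stand-ins, not the SQL PyDough actually generates:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Tiny TPC-H-like fixture (illustrative data, not the real benchmark tables).
con.executescript("""
CREATE TABLE nation(n_nationkey INT, n_name TEXT, n_regionkey INT);
CREATE TABLE region(r_regionkey INT, r_name TEXT);
INSERT INTO region VALUES (1, 'ASIA'), (2, 'EUROPE');
INSERT INTO nation VALUES (10, 'JAPAN', 1), (20, 'FRANCE', 2);
""")

# Step 1: the kind of DDL a to_table(..., temp=True) call would execute.
con.execute(
    "CREATE TEMP TABLE asian_nations AS "
    "SELECT n.n_name AS name FROM nation n "
    "JOIN region r ON n.n_regionkey = r.r_regionkey "
    "WHERE r.r_name = 'ASIA'"
)

# Step 2: the final to_df query reads from the materialized table.
rows = con.execute("SELECT name FROM asian_nations").fetchall()
print(rows)  # [('JAPAN',)]
```

The point is that the second query no longer references `nation`/`region` at all; it only touches the materialized object.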
```py
# Handle the case where the ancestor is a ChildOperatorChildAccess
# (which happens when using CROSS at the top level with a
# generated collection). In that case, unwrap it and process the
# inner child_access (typically a GlobalContext).
# Only do this when parent is None (top-level), otherwise let normal
# ChildOperatorChildAccess handling below process it.
ancestor_context = node.ancestor_context
if (
    isinstance(ancestor_context, ChildOperatorChildAccess)
    and parent is None
):
    ancestor_context = ancestor_context.child_access
hybrid = self.make_hybrid_tree(ancestor_context, parent, is_aggregate)
```

I think the problem here is that something else should have raised an error earlier but didn't. Using CROSS in that way without a context doesn't make sense, since CROSS has to be combining two different sides together, but just doing CROSS(asian_tmp) by itself makes no sense.
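A sketch of the raise-instead-of-unwrap behavior the reviewer is asking for. The class and function here are dummies standing in for PyDough's internals; only the shape of the check is the point:

```python
class ChildOperatorChildAccess:
    """Dummy stand-in for PyDough's ChildOperatorChildAccess node."""
    def __init__(self, child_access):
        self.child_access = child_access

def check_cross_context(ancestor_context, parent):
    # Instead of silently unwrapping a top-level ChildOperatorChildAccess,
    # reject it: CROSS must combine two sides, so a bare CROSS(x) is invalid.
    if isinstance(ancestor_context, ChildOperatorChildAccess) and parent is None:
        raise ValueError(
            "CROSS cannot be used at the top level; it must combine two "
            "collections, e.g. collection1.CROSS(collection2)."
        )
    return ancestor_context
```

This matches the resolution noted earlier in the thread: the code now errors on top-level CROSS and the tests were updated.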
```py
# TODO: (gh #175) enable typed DataFrames.
data = self.cursor.fetchall()
return pd.DataFrame(data, columns=column_names)

def execute_ddl(self, sql: str) -> None:
```

We may need to revise how this works (in the DatabaseContext dataclass) to account for BodoSQL, since the way that PR works, it revises DatabaseContext.connection to be either a DatabaseConnector or a BodoSQLContext.
will be addressed in followup PR as discussed offline
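For reference, a minimal sketch of what an `execute_ddl` method looks like, demonstrated on SQLite. This is a simplified stand-in, not PyDough's `DatabaseConnection` (which, per the comment above, may also need a BodoSQL-aware shape):

```python
import sqlite3

class DatabaseConnection:
    """Simplified sketch; PyDough's real DatabaseConnection differs."""

    def __init__(self, connection):
        self.connection = connection
        self.cursor = connection.cursor()

    def execute_ddl(self, sql: str) -> None:
        # DDL produces no result set; commit so later queries see the object.
        self.cursor.execute(sql)
        self.connection.commit()

conn = DatabaseConnection(sqlite3.connect(":memory:"))
conn.execute_ddl("CREATE TABLE t(x INT)")
conn.execute_ddl("DROP TABLE IF EXISTS t")
# After the drop, no user tables remain.
tables = conn.cursor.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

Unlike the query path, nothing is fetched and no DataFrame is built; the method exists purely for statements like `CREATE ... TABLE/VIEW` and `DROP ... IF EXISTS`.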
```py
"name": "regions",
"type": "simple table",
"table path": "TPCH_SF1.REGION",
"table path": "SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.REGION",
```
Also, have you re-run pdunit_update -m "not execute" on all the tests? Because I imagine this change to the graph would change the SQL for all of our TPCH snowflake SQL tests.
.gitignore
Outdated
```
# Ignore tpch.db file
tpch.db
```

We already ignore *.db earlier, so this is redundant.

Yes, I noticed that and forgot to remove it.
```py
"as_view, replace, temp",
[
    (False, False, False),
    (False, False, True),
    (True, False, True),
    (True, False, False),
    (False, True, False),
    (False, True, True),
    (True, True, False),
    (True, True, True),
],
)
```

Won't it be highly problematic to run these tests multiple times, especially with some contexts like Snowflake, since if temp is False it is just adding a bunch of tables which will still be there during the next test, or worse during the next pytest run? Won't we need some kind of cleanup step?
There's a cleanup step that handles that in run_e2e_test_to_table
```py
cleanup_statement = f"DROP {table_or_view} IF EXISTS {table_name}"
```
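The cleanup idea can be sketched end-to-end on SQLite; the helper name below is illustrative (the real logic lives in `run_e2e_test_to_table`), but it shows why `IF EXISTS` makes the step safe to run repeatedly:

```python
import sqlite3

def cleanup_to_table(connection, table_name, as_view=False):
    # DROP ... IF EXISTS succeeds whether or not the object is present,
    # so cleanup is idempotent across tests and pytest runs.
    table_or_view = "VIEW" if as_view else "TABLE"
    connection.execute(f"DROP {table_or_view} IF EXISTS {table_name}")

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE asian_nations_t4(x INT)")
cleanup_to_table(con, "asian_nations_t4")
cleanup_to_table(con, "asian_nations_t4")  # second run: no error
```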
```py
num_journals=n_jours,
ratio=n_pubs / n_jours,
)
.ORDER_BY(year.ASC(na_pos="last"))
```

Not related to this PR.
As I'm the lucky person to have tests fail on my runs, this Snowflake test decided to fail for me 🤣
Fix: match the SQLite text in "defog_sql_text_academic_gen14" with `ORDER BY publication.year NULLS LAST;` and make the return deterministic.

```
E   AssertionError: DataFrame.iloc[:, 0] (column name="year") are different
DataFrame.iloc[:, 0] (column name="year") values are different (100.0 %)
[index]: [0, 1]
[left]:  [2021, 2020]
[right]: [2020, 2021]
```
```py
schema=schema_name,
)

# Sqlite's datetime functions operate in UTC,
```
Unrelated to the PR.
The defog Snowflake e2e tests compare PyDough results on Snowflake against reference SQL on SQLite. SQLite always uses UTC, but Snowflake defaults to Pacific Time, so time-relative queries ("last week", "today", etc.) diverge in certain day/time runs. This fix ensures the Snowflake test connection sets TIMEZONE = 'UTC' to match SQLite's behavior.
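The statement used to pin a Snowflake session's timezone can be sketched as below; `ALTER SESSION SET TIMEZONE` is standard Snowflake SQL, though whether this PR applies it via a statement or via connection parameters is not shown here, and the builder function is purely illustrative:

```python
def make_session_setup_sql(tz: str = "UTC") -> str:
    # Builds the ALTER SESSION statement that pins the Snowflake session
    # timezone so date-relative queries match SQLite's UTC behavior.
    return f"ALTER SESSION SET TIMEZONE = '{tz}'"

print(make_session_setup_sql())  # ALTER SESSION SET TIMEZONE = 'UTC'
```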
Summary

This PR implements the `to_table` functionality for PyDough, allowing users to materialize PyDough queries as database tables or views, and then use them in subsequent queries.

Workflow

PyDough Query -> `to_table()` -> DDL executed -> `ViewGeneratedCollection` -> use in new PyDough Query

- Write a PyDough query for the collection to materialize
- Call `to_table()` to materialize it (executes DDL such as `CREATE TABLE AS SELECT ...`)
- Use the returned collection (`ViewGeneratedCollection`) in new PyDough queries

Example

Main Changes

Added `to_table()` function:
- `as_view=True` to create views instead of tables
- `replace=True` to replace existing tables/views
- `temp=True` to create temporary tables

Added `ViewGeneratedCollection`.

Added `execute_ddl()` method to DatabaseConnection:
- Executes DDL statements (`CREATE [OR REPLACE TEMP] TABLE/VIEW`, `DROP TABLE/VIEW IF EXISTS`)

Test Infrastructure

- Added `reset_active_session` fixture to automatically reset the global active session after each test, avoiding session overlap which led to some duplicate-writing errors

closes #499