From 8f80dcd728cafc7c1b9750f5a2809b4905218a84 Mon Sep 17 00:00:00 2001
From: "mintlify[bot]" <109931778+mintlify[bot]@users.noreply.github.com>
Date: Mon, 9 Feb 2026 12:14:18 +0000
Subject: [PATCH 1/2] Update guides/developer/dbt-model-best-practices.mdx

Co-Authored-By: mintlify[bot] <109931778+mintlify[bot]@users.noreply.github.com>
---
 guides/developer/dbt-model-best-practices.mdx | 1 +
 1 file changed, 1 insertion(+)

diff --git a/guides/developer/dbt-model-best-practices.mdx b/guides/developer/dbt-model-best-practices.mdx
index c2d8ba39..debdc145 100644
--- a/guides/developer/dbt-model-best-practices.mdx
+++ b/guides/developer/dbt-model-best-practices.mdx
@@ -78,6 +78,7 @@ All Lightdash queries run against your data warehouse. Beyond using wide, flat t
 | Strategy | Performance impact | Cost impact |
 |----------|-------------------|-------------|
 | [Materialize as tables](#materialize-models-as-tables) | High | High |
+| [Index and partition data](#index-and-partition-your-data) | High | High |
 | [Minimize joins](#minimize-joins-at-query-time) | High | Medium |
 | [Enable caching](#leverage-caching) | Medium | High |
 | [Limit exposed models](#limit-models-exposed-to-the-bi-layer) | Low | Medium |

From 54545a2de42b22642d74ff8cfffa74e592d6efaf Mon Sep 17 00:00:00 2001
From: "mintlify[bot]" <109931778+mintlify[bot]@users.noreply.github.com>
Date: Mon, 9 Feb 2026 12:14:30 +0000
Subject: [PATCH 2/2] Update guides/developer/dbt-model-best-practices.mdx

Co-Authored-By: mintlify[bot] <109931778+mintlify[bot]@users.noreply.github.com>
---
 guides/developer/dbt-model-best-practices.mdx | 34 +++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/guides/developer/dbt-model-best-practices.mdx b/guides/developer/dbt-model-best-practices.mdx
index debdc145..cb68a2d5 100644
--- a/guides/developer/dbt-model-best-practices.mdx
+++ b/guides/developer/dbt-model-best-practices.mdx
@@ -98,6 +98,40 @@ Views re-execute SQL on every query.
 [Tables](https://docs.getdbt.com/docs/build
 Schedule dbt runs (daily/hourly) to keep tables fresh while avoiding on-demand computation. For large datasets with append-only updates, consider [incremental models](https://docs.getdbt.com/docs/build/incremental-models).
 
+### Index and partition your data
+
+Proper indexing and partitioning in your data warehouse can dramatically improve query performance and reduce costs. These optimizations happen at the warehouse level and benefit all queries, including those from Lightdash.
+
+**Partitioning** divides large tables into smaller segments based on a column (typically a date). Queries that filter on the partition column scan only the relevant partitions instead of the entire table.
+
+**Clustering/indexing** organizes data within partitions to speed up filtering and sorting on frequently queried columns.
+
+| Warehouse | Partitioning | Clustering/Indexing |
+|-----------|--------------|---------------------|
+| BigQuery | `partition_by` | `cluster_by` |
+| Snowflake | Automatic micro-partitions | `cluster_by` |
+| Redshift | `dist` and `sort` keys | `sort` keys |
+| Databricks | `partition_by` | `zorder` |
+
+Example dbt configuration for BigQuery (a Jinja config block, placed at the top of the model's `.sql` file):
+
+```sql
+{{ config(
+    materialized='table',
+    partition_by={
+        "field": "created_at",
+        "data_type": "date",
+        "granularity": "day"
+    },
+    cluster_by=["customer_id", "status"]
+) }}
+```
+
+Best practices:
+- Partition by date columns used in time-based filters (e.g., `created_at`, `order_date`)
+- Cluster by columns frequently used in `WHERE` clauses or `GROUP BY`
+- Review your warehouse's query history to identify high-cost queries that could benefit from partitioning
+
 ### Minimize joins at query time
 
 Pre-join data in your dbt models rather than joining at query time. As discussed in [wide, flat tables](#use-wide-flat-tables-in-the-bi-layer), this approach outperforms runtime joins.
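
---

Reviewer note (not part of the patch): the warehouse table in patch 2 notes that Snowflake partitions automatically, so its dbt config differs from the BigQuery example — only a clustering key is declared. A minimal sketch of a complete Snowflake model; the model, column names, and the `stg_orders` ref are hypothetical, not from the guide:

```sql
-- Hypothetical models/orders.sql for Snowflake (sketch).
-- Snowflake micro-partitions automatically, so no partition_by is
-- needed; cluster_by declares a clustering key so that filters on
-- order_date can prune micro-partitions.
{{ config(
    materialized='table',
    cluster_by=['order_date']
) }}

select
    order_id,
    customer_id,
    order_date,
    status
from {{ ref('stg_orders') }}
```

As with the BigQuery example, clustering keys are worth the maintenance cost only on large tables with selective, frequently filtered columns.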