stddev smoketest on product metrics + small changes #31

Open
andrew-sha wants to merge 7 commits into main from
ashannon/scopuli/product_timeseries_metrics_smoketest

Conversation

@andrew-sha
Member

Contains a tests/ directory for custom tests. Inside are two things:

  1. A generic test designed to check timeseries data for records in the column_name column which fall outside stddev_coef standard deviations of the mean of the previous lookback records
  2. A singular test which essentially implements the above check specifically for the product_timeseries_metrics table

NOTE: The product_timeseries_metrics records have some quirks--i.e. they can be grouped at different geographic levels (country, province, census division) as well as in different units (lbs, /1kg). For a test of this kind to be meaningful, the records need to be partitioned by all of these columns. Otherwise, we would end up doing weird things like comparing the daily price from a specific census division such as Wellington against the past month's prices in that specific region but also across all of Ontario and all of Canada. I didn't see a clean way to design the generic test to accommodate all these extra dimensions, so I made a singular test instead. The generic test is still included in case we ever want to use it.
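For reference, the rough shape of such a generic dbt test is sketched below. This is an illustrative sketch, not the actual implementation: the parameter names (column_name, stddev_coef, lookback) come from the description above, while the calendar_date ordering column and the file path are assumptions.

```sql
-- tests/generic/stddev_smoketest.sql -- illustrative sketch only
{% test stddev_smoketest(model, column_name, stddev_coef, lookback) %}

with stats as (
    select
        *
        -- rolling mean and stddev over the previous `lookback` records,
        -- excluding the current row (calendar_date ordering is an assumption)
        , avg({{ column_name }}) over (
            order by calendar_date
            rows between {{ lookback }} preceding and 1 preceding
          ) as lookback_mean
        , stddev({{ column_name }}) over (
            order by calendar_date
            rows between {{ lookback }} preceding and 1 preceding
          ) as lookback_stddev
    from {{ model }}
)

-- a dbt test fails on any rows it returns
select *
from stats
where abs({{ column_name }} - lookback_mean) > {{ stddev_coef }} * lookback_stddev

{% endtest %}
```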

Also, my linter isn't working so I will eventually recommit this when I fix it :(

@thecartercodes left a comment

From a design standpoint, I have some ideas.

Let's go ahead and calculate the stddev and 30d lookbacks in the model itself, i.e. product_timeseries_metrics. Let's also calculate the number of listings above or below the stddev on each calendar_date. Then we can configure a native test in the .yml with severity conditions, e.g.

https://docs.getdbt.com/reference/resource-configs/severity

This would solve the issue you mentioned earlier where you can't partition by the combination of groups. What's more, we could then make these test assertions without window clauses in the tests, and we'd get these useful statistics as part of our analytics, which I was interested in releasing anyway.

To solve your issue with more of a hack, you could just change the param partition_column_name to partition_clause and generate a Jinja expression in the .yml config that concatenates the columns together, delimited by `,`.
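A minimal sketch of what that severity-based config could look like (the test name and the thresholds here are hypothetical; the `severity` / `warn_if` / `error_if` keys are dbt's native config):

```yaml
# models/schema.yml -- test name and thresholds are hypothetical
models:
  - name: product_timeseries_metrics
    tests:
      - outlier_listing_count_check:
          config:
            severity: error
            warn_if: ">0"
            error_if: ">20"
```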

@andrew-sha andrew-sha closed this Dec 6, 2023
@andrew-sha andrew-sha reopened this Dec 6, 2023
@andrew-sha
Member Author

> From a design standpoint, I have some ideas.
>
> Let's go ahead and calculate the stddev and 30d lookbacks in the model itself, i.e. product_timeseries_metrics. Let's also calculate the number of listings above or below the stddev on each calendar_date. Then we can configure a native test in the .yml with severity conditions, e.g.
>
> https://docs.getdbt.com/reference/resource-configs/severity
>
> This would solve the issue you mentioned earlier where you can't partition by the combination of groups. What's more, we could then make these test assertions without window clauses in the tests, and we'd get these useful statistics as part of our analytics, which I was interested in releasing anyway.
>
> To solve your issue with more of a hack, you could just change the param partition_column_name to partition_clause and generate a Jinja expression in the .yml config that concatenates the columns together, delimited by `,`.

I'm not sure how a severity test would work in this case. A severity config lets you grade a failure based on the number of failed rows--i.e. if five rows fail the test you can throw a warning, but if more than five fail, the entire test errors. Importantly, I think you still need to define a custom test that retrieves those failing rows in the first place. If you want each row in product_timeseries_metrics to have a column counting the number of outlier listings, I think we could use the out-of-the-box accepted_values test--i.e. we just put an accepted_values condition with values: [0] on that new column in the .yml.
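A sketch of that accepted_values config, assuming the new column is named num_outlier_listings (hypothetical name):

```yaml
models:
  - name: product_timeseries_metrics
    columns:
      - name: num_outlier_listings   # hypothetical column name
        tests:
          - accepted_values:
              values: [0]
              quote: false   # compare 0 as a number, not a string
```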

@thecartercodes
Member

> I'm not sure how a severity test would work in this case. A severity config lets you grade a failure based on the number of failed rows--i.e. if five rows fail the test you can throw a warning, but if more than five fail, the entire test errors. Importantly, I think you still need to define a custom test that retrieves those failing rows in the first place. If you want each row in product_timeseries_metrics to have a column counting the number of outlier listings, I think we could use the out-of-the-box accepted_values test--i.e. we just put an accepted_values condition with values: [0] on that new column in the .yml.

Yeah, it would be a custom test with a severity condition.

@andrew-sha
Member Author

product_timeseries_metrics now has a column that counts the number of listings beyond one stddev of the mean of the price listings within the corresponding geographic region. I didn't add the severity test yet because in principle I'm a bit unsure what it achieves. The number of days/regions that have some large number of outlier listings doesn't seem like something that should be determined in a test, since I don't think the existence of a lot of outlier listings necessarily indicates a failure in the tech. If anything, that might be useful information for the consumer and thus belong outside a test?
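Roughly, the new column is computed along these lines (an illustrative sketch with assumed table, column, and ref names, not the actual model code):

```sql
-- illustrative sketch only; names are assumptions
with stats as (
    select
        *
        , avg(price) over (partition by calendar_date, region_code) as mean_price
        , stddev(price) over (partition by calendar_date, region_code) as stddev_price
    from {{ ref('product_listings') }}  -- hypothetical upstream model
)
select
    calendar_date
    , region_code
    -- count listings more than one stddev from the regional mean
    , count(*) filter (
        where abs(price - mean_price) > stddev_price
      ) as sum_outside_one_stddev
from stats
group by 1, 2
```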

@thecartercodes
Member

It'll be useful in a test. To your point though, the word "test" is probably better substituted with "audit". Specifically, the audit should check the % of outliers relative to total product listings, which should also be in the metrics table. Importantly, lines 46-50 in product_timeseries_metrics still have the issue we found in an earlier pair-programming session: we can't use the grouping set's resulting conditions (i.e. it's grouped by region_code when that column is not null) as part of our case condition for collapsing the residues into a single statistic.

```sql
    , plh.price
    , abs(price - avg(price) over (partition by acd.val, product_id, region_code, census_division_id, currency, unit)) as residue_cd
    , abs(price - avg(price) over (partition by acd.val, product_id, region_code, currency, unit)) as residue_re
    , abs(price - avg(price) over (partition by acd.val, product_id, currency, unit)) as residue_ca
```
@thecartercodes

What's the reason for directly calculating the residuals here?

@andrew-sha

The residuals have to be calculated using those windows, but the average prices need to be calculated using grouping sets. My understanding is that combining windows and grouping sets in a single query requires a bit of finesse, so the logically simplest approach was to compute the windows in a subquery that the next subquery can easily refer to via those case expressions. I suspect there are a few other ways to do it though.
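The structure being described is roughly the following (table and column names are illustrative only):

```sql
-- illustrative structure only
with windowed as (
    -- window functions first, in their own subquery
    select
        calendar_date
        , region_code
        , census_division_id
        , abs(price - avg(price) over (partition by calendar_date, region_code)) as residue_re
    from listings
)
-- then aggregate the precomputed residues with grouping sets
select
    region_code
    , census_division_id
    , avg(residue_re) as avg_residue
from windowed
group by grouping sets ((region_code, census_division_id), (region_code), ())
```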

@andrew-sha
Member Author

> It'll be useful in a test. To your point though, the word "test" is probably better substituted with "audit". Specifically, the audit should check the % of outliers relative to total product listings, which should also be in the metrics table. Importantly, lines 46-50 in product_timeseries_metrics still have the issue we found in an earlier pair-programming session: we can't use the grouping set's resulting conditions (i.e. it's grouped by region_code when that column is not null) as part of our case condition for collapsing the residues into a single statistic.

What exactly is the issue with lines 46-50? The query seemed to execute fine, and I believe the order of execution in a SQL query does grouping before select, so shouldn't those case conditions accurately handle the three different levels of geographic granularity?

@thecartercodes
Member

> What exactly is the issue with lines 46-50? The query seemed to execute fine, and I believe the order of execution in a SQL query does grouping before select, so shouldn't those case conditions accurately handle the three different levels of geographic granularity?

Yeah, you're right. I didn't think the grouping set's returned results were what the case expressions in lines 46-50 are evaluated against, but they are. This snippet also demonstrates the behavior:

```sql
with data as (
    select
      *
    from (
        values
            ('ON', 'Div 1', 1)
            , ('ON', 'Div 2', 2)
            , ('ON', 'Div 3', 2)
    ) as t (region_code, census_division, metric)
)
select
  region_code
  , census_division
  , sum(metric)
  , case
      when region_code is null then 'Region Code is null'
      else 'Region Code is not null'
    end as is_region_code_null
  , case
      when census_division is null then 'Census Division is null'
      else 'Census Division is not null'
    end as is_census_division_null
from data
group by grouping sets ((1,2), (1), ())
```
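For the record, the rolled-up rows produced by the (1) and () grouping sets come back with null census_division (and null region_code for the grand total), so the case expressions in the select list do see the grouping-set nulls. The expected result (row order not guaranteed) looks like:

```text
 region_code | census_division | sum | region_code check       | census_division check
 ON          | Div 1           |  1  | Region Code is not null | Census Division is not null
 ON          | Div 2           |  2  | Region Code is not null | Census Division is not null
 ON          | Div 3           |  2  | Region Code is not null | Census Division is not null
 ON          | (null)          |  5  | Region Code is not null | Census Division is null
 (null)      | (null)          |  5  | Region Code is null     | Census Division is null
```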

@thecartercodes left a comment

lgtm

```yaml
- accepted_values:
    values: ['NL', 'PE', 'NS', 'NB', 'QC', 'ON', 'MB', 'SK', 'AB', 'BC', 'YT', 'NT', 'NU']
```
```yaml
- name: sum_outside_one_stddev
```
@thecartercodes

Suggested change:

```diff
- - name: sum_outside_one_stddev
+ - name: sum_listings_outside_one_stddev
```
