Fix(tests): Address some test flakiness#5209
Conversation
.circleci/continue_config.yml
Outdated
| - bigquery | ||
| - clickhouse-cloud | ||
| - athena | ||
| #- snowflake |
There was a problem hiding this comment.
TODO: revert prior to merge
dcff7e8 to
85c2dbf
Compare
| catalogs: | ||
| memory: ':memory:' | ||
| testing: 'testing.duckdb' | ||
| testing: "{{ var('tmp_path') }}/testing.duckdb" |
There was a problem hiding this comment.
This was causing flakiness in style_and_cicd_tests, example
duckdb.duckdb.IOException: IO Error: Could not set lock on file "testing.duckdb": Conflicting lock is held in /home/circleci/.pyenv/versions/3.12.11/bin/python3.12 (PID 3249). See also https://duckdb.org/docs/stable/connect/concurrency
The duckdb integration tests get run as part of that and without prefixing the path, testing.duckdb gets created in the sqlmesh root dir rather than the unique dir for each test.
This causes it to be re-used between tests and potentially accessed in parallel from multiple workers although there is an xdist_group that forces most of the tests to run sequentially
| sushi_state_schema = ctx.add_test_suffix("sushi_state") | ||
| raw_test_schema = ctx.add_test_suffix("raw") | ||
|
|
||
| config = load_config_from_paths( |
There was a problem hiding this comment.
This was duplicating logic already in TestContext, since it pre-dated TestContext.create_context()
|
|
||
| init_example_project(tmp_path, ctx.engine_type, schema_name=schema_name) | ||
|
|
||
| config = load_config_from_paths( |
There was a problem hiding this comment.
This was duplicating logic already in TestContext, since it pre-dated TestContext.create_context()
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| @pytest.fixture(scope="session") |
There was a problem hiding this comment.
This was session-scoped because its predecessor was session scoped and the session scope got retained through an earlier refactor.
However, I was encountering some very hard-to-pin-down issues that I have seen a bunch of times regarding StashKey, example:
teardown failed on attempt 2! Exiting immediately!
Traceback (most recent call last):
File "/home/********/.pyenv/versions/3.12.11/lib/python3.12/site-packages/_pytest/runner.py", line 344, in from_call
result: TResult | None = func()
^^^^^^
KeyError: <_pytest.stash.StashKey object at 0x7f1666072320>
That looks like a concurrency issue and I had a theory it was related to session and function scoped fixtures being mixed.
So i've made this function scoped because I couldnt see a strong reason for it to be session scoped
fd3dfe1 to
5f5a27d
Compare
|
|
||
|
|
||
| @pytest.hookimpl(hookwrapper=True, tryfirst=True) | ||
| def pytest_runtest_makereport(item: pytest.Item, call: pytest.CallInfo): |
There was a problem hiding this comment.
This seems to be the only hook that can catch the StashKey errors. Fixtures that yield their values (like tmp_path) appear to hit different codepaths than other types of fixtures.
I tried both pytest_fixture_post_finalizer and pytest_runtest_teardown before resorting to this
5f5a27d to
2ad5091
Compare
| # Note: the cluster doesnt need to be running to create / drop catalogs, but it does need to be running to run the integration tests | ||
| echo "Ensuring cluster is running" | ||
| databricks clusters start $CLUSTER_ID || true | ||
| databricks clusters start $CLUSTER_ID |
There was a problem hiding this comment.
This is to make Databricks fail early with the cluster start error rather than time out for 40mins trying to run tests
This reverts commit dedc368.
6a024ea to
e5f4287
Compare
I notice the Fabric tests failing on main with:
The cause was essentially the create warehouse API calls failing if the warehouse already existed.
SQLMesh code generally expects
CREATE IF NOT EXISTS/DROP IF EXISTSsemantics, so I implemented these.In addition, some other tests are starting to become more flaky with issues like:
and
so I had a go at addressing these too. The general theme is we run tests concurrently for speed but aren't always good at ensuring each test gets its own unique copy of things and can work in isolation from other tests