FIX-#7675: Allow backend switching to backends other than provided arguments (#7679)
…n provided arguments Signed-off-by: Jonathan Shi <jonathan.shi@snowflake.com>
modin/core/storage_formats/base/query_compiler_calculator.py
sfc-gh-mvashishtha
left a comment
I have some minor comments. Also, I have some questions:
- Have you run any benchmarks with this change? Does it solve the pathological merge case that motivated it?
- Is it possible or likely that we switch to an unexpected and/or suboptimal backend during multi-dataset operations? For example, could we switch to Ray for a Snowflake-pandas merge? Is there a good way to test whether this happens in practice?
I ran the pathological merge as a sanity check and it went from 140s -> 6s (it takes around 2s with hybrid disabled, but there's some thrashing because an unnecessary switch occurs after
Right now we don't allow automatic switching to Ray, as it's omitted from
from modin.logging.metrics import emit_metric

def all_switchable_backends() -> list[str]:
I don't understand why this cannot be part of envvars
We could make this configurable, but I just refactored this out from a function in the QC caster:
modin/modin/core/storage_formats/pandas/query_compiler_caster.py
Lines 800 to 807 in b002708
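As a rough illustration of the envvar question above, here is a minimal sketch of how a list of switchable backends could be filtered and optionally overridden from the environment. This is not Modin's implementation: `REGISTERED_BACKENDS`, `EXCLUDED_FROM_AUTO_SWITCH`, and the variable name `EXAMPLE_SWITCHABLE_BACKENDS` are all hypothetical, and only the exclusion of Ray is taken from the discussion.

```python
import os

# Hypothetical values for illustration only; not read from Modin.
REGISTERED_BACKENDS = ["Pandas", "Ray", "Snowflake"]
EXCLUDED_FROM_AUTO_SWITCH = {"Ray"}  # the thread says Ray is currently omitted
ENV_VAR = "EXAMPLE_SWITCHABLE_BACKENDS"  # hypothetical variable name


def all_switchable_backends(environ=None) -> list[str]:
    """Return the backends eligible for automatic switching."""
    environ = os.environ if environ is None else environ
    override = environ.get(ENV_VAR)
    if override:
        # A comma-separated override wins, but unknown names are dropped.
        requested = [name.strip() for name in override.split(",") if name.strip()]
        return [name for name in requested if name in REGISTERED_BACKENDS]
    # Default: every registered backend except the excluded ones.
    return [b for b in REGISTERED_BACKENDS if b not in EXCLUDED_FROM_AUTO_SWITCH]
```

Passing a mapping as `environ` keeps the function easy to test without touching the real process environment.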
"""
Calculate which query compiler we should cast to.

Switching calculation is performed as follows:
+1 to the documentation here.
@sfc-gh-mvashishtha After doing some more testing I realized it made more sense to only switch to other backends if we explicitly registered a function as a switch point, as is the case for 0/1-argument functions. I've updated the code to reflect this.
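The rule described in this comment can be sketched in a few lines of plain Python. This is a toy model, not the calculator in the PR: `candidate_backends`, `choose_backend`, and the cost table are hypothetical, and the "lowest total cost" criterion is an assumption made for illustration.

```python
def candidate_backends(arg_backends, is_switch_point, switchable):
    """Backends to consider for a call with several query compiler arguments."""
    # Always consider the backends of the arguments, deduplicated in order.
    candidates = list(dict.fromkeys(arg_backends))
    if is_switch_point:
        # Only registered switch points may also consider other backends.
        candidates += [b for b in switchable if b not in candidates]
    return candidates


def choose_backend(arg_backends, cost, is_switch_point, switchable):
    """Pick the candidate with the lowest summed cost over all arguments."""
    def total_cost(target):
        return sum(cost(source, target) for source in arg_backends)

    return min(
        candidate_backends(arg_backends, is_switch_point, switchable),
        key=total_cost,
    )


# Hypothetical per-argument costs: running on Backend_A is slow (10),
# while moving an argument to Backend_B and running there is cheap (3).
EXAMPLE_COSTS = {
    ("Backend_A", "Backend_A"): 10,
    ("Backend_A", "Backend_B"): 3,
}


def example_cost(source, target):
    return EXAMPLE_COSTS[(source, target)]


args = ["Backend_A", "Backend_A"]
switchable = ["Backend_A", "Backend_B"]
not_registered = choose_backend(args, example_cost, False, switchable)
registered = choose_backend(args, example_cost, True, switchable)
```

With these made-up costs, the unregistered call stays on `Backend_A` because no other backend is considered, while the registered call moves to `Backend_B`.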
What do these changes do?
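After this PR, `AutoSwitchBackend` now has 2 separate behaviors for functions with multiple query compiler arguments:

For example, after calling `pd.concat([A1, A2])`, we previously would only consider switching to the backends of the query compilers of arguments `A1` and `A2`. Now, after calling `register_function_for_pre_op_switch(class_name=None, backend="Backend_A", method="concat")`, Modin may move arguments to some third backend `Backend_B`.

- `flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py`
- `black --check modin/ asv_bench/benchmarks scripts/doc_checker.py`
- `git commit -s`
- `docs/development/architecture.rst` is up-to-date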
pd.concat([A1, A2]), we previously would only consider switching to the backends of the query compilers of argumentsA1andA2. Now, after callingregister_function_for_pre_op_switch(class_name=None, backend="Backend_A", method="concat"), Modin may now move arguments to some third backendBackend_B.flake8 modin/ asv_bench/benchmarks scripts/doc_checker.pyblack --check modin/ asv_bench/benchmarks scripts/doc_checker.pygit commit -sdocs/development/architecture.rstis up-to-date