Conversation

dorian-K (Contributor)

No description provided.


dorian-K (Contributor, Author) commented Dec 19, 2025

Todo:

  • tracers that use energy and att_weights are broken now, and there is no straightforward fix: even if we automatically fall back to the RETURNN implementation once we detect a tracer, that implementation lives in the backend class, not in rf.dot_attention, so current tracer implementations don't find those variables. As far as I can tell this is only used in attention-weight analyses and in a test in test_rf_attention.py, so the impact is small, but it is still annoying. Plan: add an internal flag to fall back to the vanilla implementation, duplicate some of the tests in test_rf_attention to cover both the new and the vanilla implementation, and test att_weights / energy only for the vanilla implementation
  • write tests to verify that torch scaled_dot_product_attention produces the same result as the RETURNN fallback implementation
  • convert the existing building blocks in attention to use the efficient is_causal=True parameter
  • maybe rename scaled_dot_product_attention to just dot_attention
  • there have been significant changes to torch scaled_dot_product_attention between version 2.0.0 and now; verify that all of them are compatible with the current implementation
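The equivalence check in the second item could look roughly like the sketch below: a plain softmax(QK^T / sqrt(d))V reference is compared against `torch.nn.functional.scaled_dot_product_attention`, both without a mask and with the fused `is_causal=True` path. This is a minimal standalone sketch assuming torch >= 2.0; `vanilla_dot_attention` is a hypothetical reference helper, not RETURNN's actual fallback implementation.

```python
import math
import torch

def vanilla_dot_attention(q, k, v, is_causal=False):
    # Hypothetical reference implementation: softmax(q k^T / sqrt(d)) v.
    # The intermediate "energy" and "att_weights" tensors exist explicitly here,
    # which is what tracer-based attention-weight analyses rely on.
    d = q.shape[-1]
    energy = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d)
    if is_causal:
        t_q, t_k = q.shape[-2], k.shape[-2]
        mask = torch.ones(t_q, t_k, dtype=torch.bool).tril()
        energy = energy.masked_fill(~mask, float("-inf"))
    att_weights = torch.softmax(energy, dim=-1)
    return torch.matmul(att_weights, v)

torch.manual_seed(0)
# (batch, heads, time, head_dim)
q, k, v = (torch.randn(2, 4, 10, 16) for _ in range(3))

# non-causal: fused kernel vs. reference
out_ref = vanilla_dot_attention(q, k, v)
out_sdpa = torch.nn.functional.scaled_dot_product_attention(q, k, v)
assert torch.allclose(out_ref, out_sdpa, atol=1e-4)

# causal: fused is_causal=True vs. explicit lower-triangular mask in the reference
out_ref_c = vanilla_dot_attention(q, k, v, is_causal=True)
out_sdpa_c = torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)
assert torch.allclose(out_ref_c, out_sdpa_c, atol=1e-4)
```

The real tests would additionally need to cover masked (variable-length) sequences and the different dtypes/backends that the fused kernel can dispatch to.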

albertz (Member) commented Dec 19, 2025

tracers that use energy and att_weights are broken now, and there isn't a straightforward way to fix because even if we automatically fall back to the returnn impl once we detect a tracer, that implementation is in the backend class and not in rf.dot_attention so current implementations don't find those variables. As far as I can tell this is only used in attention weight analyses and a test in test_rf_attention.py, so not huge impact but still annoying

I think it is to be expected that using such tracers can never be reliable and stable. We don't guarantee that, and we don't need to guarantee that. So this is not really an issue.

We should still keep the existing tests. For that, there should be a flag (maybe only internal flag) to disable this and fall back to the current vanilla implementation.
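Such a flag could be as simple as a module-level switch that tests flip to exercise the vanilla path. A minimal sketch, where the flag name, function names, and dispatch structure are all assumptions for illustration rather than RETURNN's actual API:

```python
import math
import torch

# Hypothetical internal flag (name is an assumption): tests and tracer-based
# analyses set it to True to force the vanilla implementation, where the
# intermediate energy / att_weights tensors still exist as named variables.
_use_vanilla_dot_attention = False

def _vanilla_dot_attention(q, k, v):
    # explicit computation; tracers can observe energy and att_weights here
    energy = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(q.shape[-1])
    att_weights = torch.softmax(energy, dim=-1)
    return torch.matmul(att_weights, v)

def dot_attention(q, k, v):
    # dispatch: fused kernel by default, vanilla path when the flag is set
    if _use_vanilla_dot_attention:
        return _vanilla_dot_attention(q, k, v)
    return torch.nn.functional.scaled_dot_product_attention(q, k, v)

torch.manual_seed(1)
q, k, v = (torch.randn(2, 4, 8, 16) for _ in range(3))
out_fused = dot_attention(q, k, v)
_use_vanilla_dot_attention = True
out_vanilla = dot_attention(q, k, v)
# both paths should agree numerically, so the duplicated tests stay meaningful
assert torch.allclose(out_fused, out_vanilla, atol=1e-4)
```

With this shape, the existing tests in test_rf_attention can be parametrized over the flag, and the att_weights / energy checks run only with the flag set.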
