Fix incorrect gradient in Solve for structured assume_a (sym/pos/her) #1887
WHOIM1205 wants to merge 3 commits into pymc-devs:main
Conversation
Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
Is this related to #1230?
This PR is not directly related to #1230.
I think the question was whether it's the same kind of issue.
tests/tensor/test_slinalg.py (Outdated)

```python
analytic = f(A_val, b_val)

# Numerical gradient: perturb only the read triangle
```
Can we concoct a graph that allows us to use `verify_grad` instead and still works as a regression test?
Something whose input is just the triangular entries? I'm assuming they were being half-counted?
You can still verify they came out as zeros on an explicit grad fn.
Thanks for the suggestion, I've updated the test accordingly.
The manual finite-difference loop has been replaced with `utt.verify_grad` using a triangular parameterization of the structured entries: the symmetric matrix is reconstructed inside the graph, and I've also added an explicit assertion that the unread triangle has zero gradients.
All parametrized cases are passing locally.
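For illustration, a minimal sketch of what such a test can look like (hypothetical code: the seed, sizes, and the use of `pytensor.gradient.verify_grad` in place of the test suite's `utt` wrapper are assumptions, not the PR's actual test):

```python
import numpy as np
import pytensor
import pytensor.tensor as pt

n = 3
tril_idx = np.tril_indices(n)

def solve_from_tril(tri_vals, b):
    # Scatter the independent triangular entries into an otherwise-zero
    # matrix; solve(..., assume_a="sym", lower=True) reads only that triangle.
    A = pt.set_subtensor(pt.zeros((n, n))[tril_idx], tri_vals)
    return pt.linalg.solve(A, b, assume_a="sym", lower=True)

rng = np.random.RandomState(218)
M = rng.normal(size=(n, n))
A_val = M @ M.T + n * np.eye(n)   # well-conditioned values for stable finite differences
tri_vals = A_val[tril_idx]
b_val = rng.normal(size=(n, 1))

# Off-diagonal entries were being half-counted, so this fails before the fix.
pytensor.gradient.verify_grad(solve_from_tril, [tri_vals, b_val], rng=rng)

# Explicit check that the unread (upper) triangle gets exactly zero gradient.
A = pt.dmatrix("A")
b = pt.dmatrix("b")
x = pt.linalg.solve(A, b, assume_a="sym", lower=True)
g_fn = pytensor.function([A, b], pt.grad(x.sum(), A))
assert np.allclose(np.triu(g_fn(A_val, b_val), 1), 0.0)
```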
I suppose this is a mistaken comment from some other work.
Ah, thanks for clarifying.
Apologies, that was clearly pasted from another PR by mistake.
Replace manual finite-difference loop with verify_grad using triangular parameterization. Add explicit zero-gradient assertion for unread triangle. Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
pre-commit.ci autofix
for more information, see https://pre-commit.ci
```python
# triangle contribute to both (i,j) and (j,i) of the effective matrix,
# so we must accumulate the symmetric contribution and zero the unread triangle.
if self.lower:
    res[0] = ptb.tril(A_bar) + ptb.tril(A_bar.mT, -1)
```

Suggested change:

```diff
-    res[0] = ptb.tril(A_bar) + ptb.tril(A_bar.mT, -1)
+    res[0] = ptb.tril(A_bar) + ptb.tril(A_bar.conj().mT, -1)
```
Otherwise the Hermitian case is wrong.
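A sketch of why the conjugate is needed, assuming the usual convention that the pullback through complex conjugation conjugates the cotangent: with `lower=True` the Hermitian case effectively factorizes

$$A_\mathrm{eff} = \mathrm{tril}(A) + \mathrm{tril}(A, -1)^{H},$$

so a strictly-lower entry $A_{ij}$ enters once directly at $(i, j)$ and once conjugated at $(j, i)$. The chain rule then accumulates

$$\bar{A}_{ij} = (\bar{A}_\mathrm{eff})_{ij} + \overline{(\bar{A}_\mathrm{eff})_{ji}},$$

which is `ptb.tril(A_bar) + ptb.tril(A_bar.conj().mT, -1)`; a plain `.mT` would add the unconjugated contribution, which is only correct in the real symmetric cases.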
I think it would be good to have a simple closed-form test. Given:

$$A = \begin{bmatrix} 1 & 0 \\ \rho & 1 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 0 \end{bmatrix},$$

`solve` with `assume_a='pos'` reads only the lower triangle, so the matrix it effectively factorizes is $A_\mathrm{eff} = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$. The inverse is analytically computable:

$$A_\mathrm{eff}^{-1} = \frac{1}{1-\rho^2}\begin{bmatrix} 1 & -\rho \\ -\rho & 1 \end{bmatrix}.$$

Given $x = A_\mathrm{eff}^{-1} b$, the loss is $x_1 = -\rho/(1-\rho^2)$. And the gradient with respect to $\rho$ is

$$\frac{\partial x_1}{\partial \rho} = -\frac{1+\rho^2}{(1-\rho^2)^2}.$$

Checking pytensor:

```python
import pytensor
import pytensor.tensor as pt

rho = pt.dscalar('rho')
A = pt.stacklists([[1., 0.], [rho, 1.]])
b = pt.stacklists([[1.], [0.]])
x = pt.linalg.solve(A, b, assume_a='pos')
loss = x[1, 0]
dL_dx = pt.grad(loss, rho)
fn = pytensor.function([rho], dL_dx)
fn(0.88)  # array(-19.64815653)

# Analytical:
def expected_grad(rho):
    return -(1 + rho ** 2) / (1 - rho ** 2) ** 2

expected_grad(0.88)  # -34.86368894924802
```

Note though that this only occurs because we passed a non-PSD matrix to solve. It is a numerical quirk that the underlying algorithm is able to treat this matrix as PSD, notwithstanding that it is not. If you instead pass
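For a quick numerical cross-check of that closed form (an illustrative snippet, not from the thread; it solves against the effective symmetric matrix directly):

```python
import numpy as np

# Finite-difference check of the closed form above: solve against the
# effective symmetric matrix [[1, rho], [rho, 1]] and perturb rho.
rho, eps = 0.88, 1e-6

def x1(r):
    A_eff = np.array([[1.0, r], [r, 1.0]])
    return np.linalg.solve(A_eff, np.array([1.0, 0.0]))[1]

print((x1(rho + eps) - x1(rho)) / eps)  # ~ -34.8637, matching expected_grad(0.88)
```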
Fix gradient handling in `Solve` for structured `assume_a` cases

Summary

`SolveBase.L_op` computes gradients assuming all entries of `A` are independent. This is correct for `assume_a="gen"`. However, when using structured assumptions (`"sym"`, `"her"`, `"pos"`), the solver only reads one triangle of `A`. The backward pass did not account for this, resulting in incorrect gradients when a pre-structured matrix was passed directly into `pt.linalg.solve`. Existing tests did not catch this because they wrapped the input matrix with a symmetrization transform, which masked the issue via the chain rule.
Fix

Updated `pytensor/tensor/slinalg.py`:

```python
# Before
# Inherited from SolveBase
A_bar = -outer(b_bar, c)
```
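Reconstructed from the review diff above, the corrected branch plausibly looks like this (a sketch, not the verbatim patch; `outer`, `ptb`, and `res` are as in the snippets above):

```python
# After (sketch): accumulate both (i, j) and (j, i) contributions into the
# read triangle and zero the unread one; "her" additionally conjugates.
A_bar = -outer(b_bar, c)
if self.assume_a in ("sym", "pos", "her"):
    A_bar_T = A_bar.conj().mT if self.assume_a == "her" else A_bar.mT
    if self.lower:
        A_bar = ptb.tril(A_bar) + ptb.tril(A_bar_T, -1)
    else:
        A_bar = ptb.triu(A_bar) + ptb.triu(A_bar_T, 1)
res[0] = A_bar
```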