Is there a way to speed up the solver by running the computation on a GPU? If so, which tensors should be moved to 'cuda'?