How to accelerate inference? #16

Description

@dengtianbi

Hi,

I enabled the cuBLAS compilation option.

The problem is that it does not load or process everything in GPU memory (VRAM).

What is the best command line to build and run each model as fast as possible on an RTX 3090 with 24 GB of VRAM?
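
For reference, a minimal sketch of what such a build-and-run pair might look like, assuming a llama.cpp-style setup where cuBLAS is a Makefile option; the model path and the -ngl layer count below are placeholder assumptions, not values from this issue:

    # Build with the cuBLAS backend enabled (llama.cpp-style Makefile flag)
    make clean && make LLAMA_CUBLAS=1

    # Run with layers offloaded to the GPU: -ngl (--n-gpu-layers) sets how
    # many transformer layers are placed in VRAM; 99 requests all of them.
    # -c sets the context size; the model path is a placeholder.
    ./main -m ./models/model.gguf -ngl 99 -c 2048 -p "Hello"

Setting -ngl higher than the model's layer count offloads the whole model; if CUDA reports out-of-memory on the 24 GB card, reduce -ngl until it fits.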
