First of all, amazing work — thank you for sharing this!
I noticed in the blog post that inference was performed using an H200 GPU. While that's impressive, such hardware is far beyond the reach of most individual users.
I'm wondering if it's possible to run inference on more accessible, consumer-grade GPUs — for example, an RTX 4090 with 24GB of VRAM. Would that be sufficient? Are there any recommended optimizations or settings for running on such hardware?
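For context, this is roughly the kind of setup I was imagining on a 24GB card: a minimal sketch assuming the model can be loaded through Hugging Face transformers with bitsandbytes 4-bit quantization (the model id below is just a placeholder, since I don't know which loader this project actually uses):

```python
# Hypothetical sketch: 4-bit quantized loading to fit within 24 GB of VRAM.
# "your-org/your-model" is a placeholder, not the real checkpoint name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "your-org/your-model"  # placeholder model id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for speed/stability
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # offload layers to CPU if they don't all fit on the GPU
)

prompt = "Hello!"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Would something along these lines work here, or is there a recommended path (e.g. a smaller checkpoint, FP8, or offloading settings) that you'd suggest instead?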
Looking forward to your advice, and thanks again for the great work!