A high-performance VRAM virtualization and relay system that offloads AI inference (LLM and vision workloads) from Android devices to remote GPU nodes over a custom low-latency binary protocol.
Topics: `cplusplus` · `docker-container` · `distributed-computing` · `low-latency` · `socket-programming` · `android-jni` · `remote-gpu` · `llm-inference` · `vram-virtualization`
Updated Feb 6, 2026 · Language: C