VRAM-Relay is a high-performance infrastructure that virtualizes GPU memory (VRAM), offloading it from mobile devices to remote compute servers.
Built for the era of local-first AI, it enables Android devices to run heavyweight models (LLMs, Stable Diffusion, vision models) by transparently leveraging the raw GPU power of a nearby PC or server.
The server node is responsible for GPU memory management and AI execution.

- **VRAM Manager**: Dynamically allocates and releases GPU buffers based on client sessions and model requirements.
- **Socket Engine**: An asynchronous, high-throughput TCP server capable of handling multiple concurrent inference streams.
- **Discovery Provider**: A lightweight UDP broadcast service that lets mobile devices on the local network detect relay nodes instantly (see the sketch after this list).
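For a feel of how discovery works on the server side, here is a minimal sketch of a responder loop. It assumes a probe/reply exchange on UDP port 8766; the message strings are illustrative placeholders, not the actual wire format.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.nio.charset.StandardCharsets;

// Minimal discovery-responder sketch. Port 8766 matches the Docker
// mapping below; the probe/reply strings are illustrative placeholders.
public class DiscoveryResponder {
    public static void main(String[] args) throws Exception {
        try (DatagramSocket socket = new DatagramSocket(8766)) {
            byte[] buf = new byte[256];
            while (true) {
                DatagramPacket probe = new DatagramPacket(buf, buf.length);
                socket.receive(probe); // block until a client probe arrives
                String msg = new String(probe.getData(), 0, probe.getLength(),
                        StandardCharsets.UTF_8);
                if (!msg.startsWith("VRAM_RELAY_PROBE")) continue;
                byte[] reply = "VRAM_RELAY_HERE:8765".getBytes(StandardCharsets.UTF_8);
                socket.send(new DatagramPacket(reply, reply.length,
                        probe.getAddress(), probe.getPort()));
            }
        }
    }
}
```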
The Android client is optimized for minimal latency and seamless integration.

- **Native Bridge**: An ultra-fast C/C++ (JNI) layer that reduces serialization and network overhead.
- **Discovery Service**: A background Android Service that scans the local Wi-Fi network for available relay nodes (sketched below).
- **Transparent API**: A simplified Java/Kotlin interface for loading models, sending prompts, and retrieving tensors or text outputs.
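The client side of that exchange, conceptually what the Discovery Service does under the hood, broadcasts a probe and collects replies until a timeout expires. The port, the message strings, and the probe/reply shape are assumptions carried over from the responder sketch above.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Client-side scan sketch: broadcast one probe, then gather replies
// until the window closes. Details mirror the (assumed) responder above.
public class DiscoveryScan {
    public static List<InetAddress> scan(int timeoutMs) throws Exception {
        List<InetAddress> nodes = new ArrayList<>();
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setBroadcast(true);
            socket.setSoTimeout(timeoutMs);
            byte[] probe = "VRAM_RELAY_PROBE".getBytes(StandardCharsets.UTF_8);
            socket.send(new DatagramPacket(probe, probe.length,
                    InetAddress.getByName("255.255.255.255"), 8766));
            byte[] buf = new byte[256];
            while (true) {
                try {
                    DatagramPacket reply = new DatagramPacket(buf, buf.length);
                    socket.receive(reply); // each reply identifies one relay node
                    nodes.add(reply.getAddress());
                } catch (SocketTimeoutException done) {
                    return nodes; // scan window closed
                }
            }
        }
    }
}
```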
Client and server communicate over a compact custom wire protocol.

- **Pure Binary Protocol**: No JSON, no HTTP; minimal overhead with deterministic performance.
- **Fixed 16-byte Header**: Includes packet type, payload size, sequencing, and a CRC checksum (a sketch of one possible layout follows this list).
- **Network Optimization**: Optional LZ4 compression for large tensor transfers and model weights.
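As a rough illustration, the sketch below packs the header as four 4-byte fields. The actual field order, widths, and endianness are not documented here, so treat this layout as an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.CRC32;

// Hypothetical layout: type (4 B) | payload size (4 B) | sequence (4 B) | CRC-32 (4 B).
// The real field order, widths, and endianness may differ.
final class RelayHeader {
    static final int SIZE = 16;

    static byte[] encode(int type, int sequence, byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);                  // checksum the payload
        return ByteBuffer.allocate(SIZE)
                .order(ByteOrder.BIG_ENDIAN)  // network byte order
                .putInt(type)                 // packet type
                .putInt(payload.length)       // payload size in bytes
                .putInt(sequence)             // per-connection sequence number
                .putInt((int) crc.getValue()) // CRC-32 of the payload
                .array();
    }
}
```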
- **OS**: Linux (Ubuntu 22.04+ recommended)
- **Hardware**: NVIDIA GPU (Pascal architecture or newer)
- **Software**: Docker & NVIDIA Container Toolkit
```bash
# Clone the repository
git clone https://github.com/your-repo/vram-relay.git
cd vram-relay

# Build the optimized Docker image
sudo docker build -t vram-relay-node -f docker/Dockerfile .

# Run the relay node
# Ports:
#   - 8765/tcp: data & inference
#   - 8766/udp: service discovery
sudo docker run --gpus all \
  -p 8765:8765/tcp \
  -p 8766:8766/udp \
  --restart unless-stopped \
  --name vram-relay \
  vram-relay-node
```

- Open the `android-client/` directory in Android Studio.
- Ensure the network permissions are present in `AndroidManifest.xml` (already included).
- Build the project:

```bash
./gradlew assembleDebug
```

The protocol is optimized for near real-time responsiveness.
| Operation | Latency | Effective Throughput |
|---|---|---|
| UDP Discovery | < 100 ms | N/A |
| RTT (Ping/Pong) | 2–8 ms | < 1 KB payload |
| Model Loading | 200–1200 ms | Up to 120 MB/s |
| LLM Inference | 50–250 ms | GPU-dependent |
Typical client usage:

```java
VRAMRelayClient client = new VRAMRelayClient(context);
client.initialize();

// 1. Automatic server discovery
client.discoverServers(3000, servers -> {
    if (!servers.isEmpty()) {
        // 2. Connect to the first available GPU node
        client.connect(servers.get(0));

        // 3. Load a model into remote VRAM
        client.loadModel("llama-3-8b-q4", true);
    }
});
```
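Prompt submission is not shown above. Continuing the snippet, it might look like the fragment below, where the `generate` method and its callback shape are hypothetical placeholders rather than the confirmed API:

```java
// Hypothetical continuation: `generate` and its callback shape are
// placeholders for the prompt-sending part of the Transparent API.
client.generate("Summarize this article in two sentences.", output -> {
    System.out.println("Remote GPU replied: " + output);
});
```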
⚠️ **Important**

- **Local Network Only**: The protocol is unencrypted by default to maximize throughput. For access over the Internet, always use a secure VPN tunnel (WireGuard or Tailscale).
- **Process Isolation**: Docker confines the relay process and limits access to the host file system.
- **Timeout Handling**: Idle connections automatically release allocated VRAM after 5 minutes of inactivity (a conceptual sketch follows this list).
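Conceptually, the idle timeout is a periodic sweep over per-session last-activity timestamps, for example as below. The session bookkeeping and the `freeVram` hook are hypothetical, not the relay's actual internals.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative idle sweep: every 30 s, drop sessions quiet for 5+ minutes.
// The session map and freeVram() hook are hypothetical.
class IdleReaper {
    static final long IDLE_LIMIT_MS = 5 * 60 * 1000;
    final Map<String, Long> lastActivity = new ConcurrentHashMap<>();

    void start(ScheduledExecutorService scheduler) {
        scheduler.scheduleAtFixedRate(() -> {
            long now = System.currentTimeMillis();
            lastActivity.entrySet().removeIf(e -> {
                if (now - e.getValue() < IDLE_LIMIT_MS) return false;
                freeVram(e.getKey()); // release the session's GPU buffers
                return true;
            });
        }, 30, 30, TimeUnit.SECONDS);
    }

    void freeVram(String sessionId) { /* hypothetical VRAM release hook */ }
}
```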
- Multi-GPU support (NVLink / SLI)
- Token-by-token streaming inference
- Adaptive compression based on Wi-Fi signal quality
- Native plugins for PyTorch and Hugging Face
This project is distributed under the MIT License.
Contributions are welcome via Pull Requests. Bug reports, performance improvements, and protocol extensions are highly encouraged.
VRAM-Relay aims to break the VRAM barrier and bring truly powerful AI inference to mobile and constrained devices — without compromise.