| title | description | type | status |
|---|---|---|---|
WebSocket Binary Protocol Reference |
VisionFlow uses a custom binary WebSocket protocol optimized for real-time XR collaboration, semantic graph synchronization, and low-latency node updates. The protocol achieves 36 bytes per node up... |
reference |
stable |
VisionFlow uses a custom binary WebSocket protocol optimized for real-time XR collaboration, semantic graph synchronization, and low-latency node updates. The protocol achieves 36 bytes per node update at 90 Hz, enabling smooth multi-user immersive experiences.
Client → Server:
MESSAGE-TYPE: 0x00 (HELLO)
PROTOCOL-VERSION: u32 (current: 1)
CLIENT-ID: UUID (128 bits)
CAPABILITIES: u32 (bitmask)
bit 0: hand-tracking
bit 1: eye-tracking
bit 2: voice-enabled
bit 3: ar-supported
bit 4: vr-supported
PLATFORM: u8
0: WebXR
1: Meta Quest
2: Apple Vision Pro
3: SteamVR
4: Desktop/Fallback
Total: 1 + 4 + 16 + 4 + 1 = 26 bytes
Server → Client:
MESSAGE-TYPE: 0x01 (WELCOME)
SESSION-ID: UUID (128 bits)
WORLD-ID: UUID (128 bits)
PROTOCOL-VERSION: u32
CAPABILITY-FLAGS: u32 (server capabilities)
TIMESTAMP: u64 (server time in milliseconds)
STATE-SNAPSHOT-SIZE: u32
[STATE-SNAPSHOT] (variable, gzip compressed)
Total: 1 + 16 + 16 + 4 + 4 + 8 + 4 + variable
Heartbeat (bidirectional, 30-second interval):
MESSAGE-TYPE: 0x02 (PING)
TIMESTAMP: u64
SEQUENCE: u32
Response:
MESSAGE-TYPE: 0x03 (PONG)
TIMESTAMP: u64
SEQUENCE: u32
Total: 13 bytes
┌─────────────────┬──────────────────┬────────────┬─────────────┐
│ Message Type │ User ID │ Timestamp │ Data Length │
│ (1 byte) │ (4 bytes) │ (4 bytes) │ (2 bytes) │
├─────────────────┼──────────────────┼────────────┼─────────────┤
│ u8 │ u32 (hash) │ u32 (delta)│ u16 │
├─────────────────┴──────────────────┴────────────┴─────────────┤
│ Payload (variable, up to 512 bytes) │
└────────────────────────────────────────────────────────────────┘
class WebSocketFrame:
def --init--(self):
self.message-type: u8
self.user-id: u32 # Hash of UUID for compactness
self.timestamp: u32 # Delta from last frame (4-byte window)
self.data-length: u16 # 0-512 bytes
self.payload: bytes| Type | Name | Purpose | Response |
|---|---|---|---|
| 0x01 | WELCOME | Server greeting + snapshot | None |
| 0x02 | PING | Connection check | PONG (0x03) |
| 0x03 | PONG | Ping response | None |
| 0x04 | SYNC-REQUEST | Request full sync | SYNC-RESPONSE |
| 0x05 | SYNC-RESPONSE | Full world state | None |
| Type | Name | Purpose | Frequency |
|---|---|---|---|
| 0x10 | POSE-UPDATE | User head/hand transforms | 90 Hz |
| 0x11 | AVATAR-STATE | Avatar appearance/status | On change |
| 0x12 | USER-JOIN | New user entered space | On event |
| 0x13 | USER-LEAVE | User left space | On event |
| 0x14 | VOICE-DATA | Audio stream | ~50 Hz (16kHz mono) |
| Type | Name | Purpose | Frequency |
|---|---|---|---|
| 0x20 | GESTURE-EVENT | Hand gesture recognized | On gesture |
| 0x21 | VOICE-COMMAND | Voice command | On speech |
| 0x22 | OBJECT-SELECT | Object interaction | On action |
| 0x23 | OBJECT-GRAB | Object grabbed | On action |
| 0x24 | OBJECT-RELEASE | Object released | On action |
| Type | Name | Purpose | Frequency |
|---|---|---|---|
| 0x30 | NODE-CREATE | New ontology node | On creation |
| 0x31 | NODE-UPDATE | Update node properties | On change |
| 0x32 | NODE-DELETE | Remove node | On deletion |
| 0x33 | EDGE-CREATE | New relationship | On creation |
| 0x34 | EDGE-DELETE | Remove relationship | On deletion |
| 0x35 | CONSTRAINT-APPLY | Physics constraint | On change |
| Type | Name | Purpose | Frequency |
|---|---|---|---|
| 0x40 | AGENT-ACTION | Agent-initiated action | On action |
| 0x41 | AGENT-RESPONSE | Agent response data | On response |
| 0x42 | AGENT-STATUS | Agent status update | 1 Hz |
| Type | Name | Purpose | Frequency |
|---|---|---|---|
| 0x50 | ERROR | Error notification | On error |
| 0x51 | ACK | Message acknowledgment | On receipt |
| 0x52 | NACK | Negative acknowledgment | On reject |
Optimized transform update for user pose (head + hands):
┌──────────────┬──────────────┬──────────────┬──────────────┐
│ Position X │ Position Y │ Position Z │ Rotation X │
│ float16 (2) │ float16 (2) │ float16 (2) │ float16 (2) │
├──────────────┼──────────────┼──────────────┼──────────────┤
│ Rotation Y │ Rotation Z │ Rotation W │ Hand State │
│ float16 (2) │ float16 (2) │ float16 (2) │ u16 (2) │
├──────────────┴──────────────┴──────────────┴──────────────┤
│ Velocity (velocity estimation for smooth interpolation) │
│ float16 x3 (6 bytes) = [vx, vy, vz] │
├─────────────────────────────────────────────────────────────┤
│ Hand State: 16-bit packed │
│ Left Hand: 4 bits (open, pinch, point, fist) │
│ Right Hand: 4 bits (open, pinch, point, fist) │
│ Head Rotation Confidence: 4 bits (0-15) │
│ Tracking State: 4 bits (calibrated, tracking, lost, etc)│
└─────────────────────────────────────────────────────────────┘
Total: 8 + 8 + 12 + 2 + 6 = 36 bytes (efficient!)
Typescript Example:
class PoseUpdate {
position: Vector3; // XYZ coords (float16 each = 6 bytes)
rotation: Quaternion; // XYZW (float16 each = 8 bytes)
handState: u16; // Packed hand gesture + tracking state
velocity: Vector3; // Movement direction (float16 each = 6 bytes)
serialize(): Buffer {
const buf = Buffer.alloc(36);
// Compress position to float16
buf.writeUInt16LE(this.position.x, 0);
buf.writeUInt16LE(this.position.y, 2);
buf.writeUInt16LE(this.position.z, 4);
// ... etc for rotation and velocity
return buf;
}
}Update ontology node in shared graph:
┌──────────────┬──────────────┬──────────────┬──────────────┐
│ Node ID │ Property ID │ Value Type │ Value Data │
│ UUID (16) │ u16 │ u8 │ variable │
├──────────────┴──────────────┴──────────────┴──────────────┤
│ Value Data Types: │
│ 0x00: null (0 bytes) │
│ 0x01: boolean (1 byte) │
│ 0x02: u32 (4 bytes) │
│ 0x03: f32 (4 bytes) │
│ 0x04: string (1 + length bytes) │
│ 0x05: vector3 (12 bytes) │
│ 0x06: uri (variable) │
└────────────────────────────────────────────────────────────┘
Minimum: 16 + 2 + 1 = 19 bytes
Opus-encoded audio at 16kHz mono:
┌──────────────┬──────────────┬──────────────────────────────┐
│ Sequence │ Frame Type │ Opus Payload │
│ u16 │ u8 │ (variable, ~160 bytes) │
├──────────────┼──────────────┼──────────────────────────────┤
│ Frame Types: │ │ │
│ 0: speech │ 1: noise │ 2: silence │
│ 3: end-frame │ │ │
└──────────────┴──────────────┴──────────────────────────────┘
Total: ~160 bytes at 50 fps (20 ms frames) = 8 KB/s per user
Error reporting:
┌─────────────┬──────────────┬──────────────┬──────────────┐
│ Error Code │ Severity │ Message Len │ Message │
│ u16 │ u8 (0-3) │ u8 │ ASCII string │
├─────────────┼──────────────┼──────────────┴──────────────┤
│ Severity Codes: │ │
│ 0: Info 1: Warning 2: Error 3: Fatal │
└─────────────────────────────────────────────────────────┘
Error codes reference: [Error Codes Reference](./error-codes.md)
// Only send changed fields
class DeltaPose {
flags: u8; // Bitmask of which fields changed
// bit 0: position changed
// bit 1: rotation changed
// bit 2: velocity changed
// bits 3-7: reserved
payload: Buffer; // Only includes changed fields
// If position changed: 6 bytes (3x float16)
// If rotation changed: 8 bytes (4x float16)
// If velocity changed: 6 bytes (3x float16)
// Example: position + rotation = 1 + 6 + 8 = 15 bytes
// vs full update = 36 bytes (58% reduction!)
}Uses gzip for graph updates:
// On server
const graphDelta = computeChanges(previousState, currentState);
const compressed = gzip(serialize(graphDelta));
// Threshold: send full state if delta > 80% of full size
if (compressed.length > fullState.length * 0.8) {
sendFullState();
} else {
sendDeltaUpdate(compressed);
}| Content | Message Type | Frequency | Bandwidth |
|---|---|---|---|
| Pose | POSE-UPDATE | 90 Hz | 36 bytes × 90 = 3.24 MB/s |
| Voice | VOICE-DATA | 50 Hz (20ms frames) | ~160 bytes × 50 = 8 KB/s |
| Gestures | GESTURE-EVENT | ~5-10 per sec | ~50 bytes × 10 = 500 B/s |
| Graph | NODE-UPDATE | Variable | ~1-10 KB/s |
| Overhead | Headers + keepalive | Constant | ~1 KB/s |
| TOTAL | - | - | ~13-15 KB/s per user |
- 10 concurrent users: 130-150 KB/s (1 Mb/s bandwidth)
- 100 concurrent users: 1.3-1.5 MB/s (10 Mb/s bandwidth)
- 1000 concurrent users: 13-15 MB/s (100 Mb/s bandwidth)
class WebSocketClient {
private reconnectAttempts = 0;
private maxReconnectAttempts = 10;
private reconnectDelay = 1000; // 1 second
onDisconnect() {
if (this.reconnectAttempts < this.maxReconnectAttempts) {
setTimeout(() => {
this.connect();
this.reconnectAttempts++;
}, this.reconnectDelay * Math.pow(2, this.reconnectAttempts));
} else {
this.showFatalError("Unable to reconnect to server");
}
}
}// All incoming messages validate:
function validateMessage(frame: WebSocketFrame): boolean {
// 1. Check message type is valid
if (frame.messageType > 0x5F) return false;
// 2. Check payload size
if (frame.dataLength > 512) return false;
// 3. Check timestamp is reasonable (max 30 second skew)
const timeDelta = Math.abs(Date.now() - frame.timestamp);
if (timeDelta > 30000) return false;
// 4. Check sequence numbers (prevent duplicates/reordering)
if (frame.sequence <= this.lastSequence) return false;
return true;
}For concurrent edits:
// Both users edit node simultaneously
User1: NODE-UPDATE { id: 'node-1', value: 100, timestamp: 1000 }
User2: NODE-UPDATE { id: 'node-1', value: 200, timestamp: 1001 }
// Server resolves with later timestamp
Result: value = 200 (User2 wins)
// User1 receives NACK + corrected value
Server: NACK { reason: "concurrent-edit", correctValue: 200 }For concurrent graph modifications:
// Conflict-free replicated data type strategy
// Each user has unique ID prefix
User1-ID: "user-a"
User2-ID: "user-b"
// Create operations get unique identifiers
Operation: { id: "user-a-1000", timestamp: 1000, ... }
Operation: { id: "user-b-1000", timestamp: 1000, ... }
// Server merges using CRDT rules (commutative, idempotent)
// Both operations can be applied in any order with same result- All messages validated against schema
- Payload sizes capped at 512 bytes
- User IDs verified against authentication context
- All traffic uses WSS (WebSocket Secure = TLS)
- Sensitive data (voice, positioning) encrypted end-to-end
- Eye gaze data encrypted per-frame
// Per-user rate limiting
const LIMITS = {
POSE-UPDATE: 100, // max per second
NODE-UPDATE: 10,
GESTURE-EVENT: 20,
VOICE-DATA: 60
};
function checkRateLimit(userId: string, msgType: u8): boolean {
const key = `${userId}:${msgType}`;
const count = this.rateLimitMap.get(key) || 0;
if (count >= LIMITS[msgType]) {
return false; // Drop message
}
this.rateLimitMap.set(key, count + 1);
return true;
}| Parameter | Value | Notes |
|---|---|---|
| Max Connections | 1000 | Per server instance |
| Pose Update Rate | 90 Hz | Match HMD refresh rate |
| Heartbeat Interval | 30 sec | Keep-alive |
| Max Message Size | 512 B | Prevents flooding |
| Compression | gzip | For graph updates |
| Voice Codec | Opus 16kHz | High quality, low latency |
| Buffer Size | 64 KB | Per-connection |
- Error Codes Reference - Error code definitions
-
- HTTP/REST interface
- XR Immersive System Architecture - XR platform integration
- Network Architecture - Low-level networking
Last Updated: 2025-11-04 Protocol Version: 1.0 Status: Production Ready Changelog: First release (November 2025)