
HIPAA-eligible DIY Twilio alternative for voice AI telephone applications. Uses Asterisk PBX and AWS Chime SIP trunking.


VectorlyApp/open-telephony-stack

HIPAA-eligible DIY Twilio alternative

For more context, see the blog release post: https://vectorly.app/blog/open-telephony-stack

This repository contains the complete infrastructure for building a HIPAA-eligible phone system using Asterisk and AWS Chime SDK SIP Media Application.

This is a production-ready alternative to Twilio that gives you full control over your telephony infrastructure while maintaining HIPAA eligibility. It includes a sample voice agent server for integrating with AI voice services (e.g., the OpenAI Realtime API).

Why this matters

Twilio has limitations for healthcare applications:

  • Limited control over media routing
  • Complex HIPAA BAA requirements
  • $2,000/mo charge for HIPAA-compliant services
  • Media Streams only available on select plans

This solution provides:

  • HIPAA eligible solution - You control all infrastructure
  • Complete control - Own your PBX, customize everything
  • Production-ready - Battle-tested in real healthcare applications
  • Scalable - Handles concurrent calls with ease

What this is

A complete and secure telephony system built to handle both inbound and outbound calls:

  1. Receives calls via AWS Chime Voice Connector (you get a real phone number)
  2. Terminates SIP/TLS on Asterisk running in Docker
  3. Bridges the audio via RTP to a WebSocket connection
  4. Streams base64 μ-law audio to your AI voice server
  5. Exposes a Twilio-like API: the WebSocket interface is modeled after Twilio's Media Streams API

You bring your own AI. This just handles the phone infrastructure.

Who this is for

Use case examples:

  • Building voice AI in healthcare and need HIPAA compliance without Twilio's BAA costs
  • Customizing call handling in ways Twilio doesn't allow
  • Wanting full control over your telephony stack
  • Learning how telephony infrastructure works and building a VoIP stack from scratch

Consider alternatives if:

  • You just need basic voice for a side project (Twilio is easier)
  • You don't want to manage infrastructure
  • You don't have any special compliance needs

Infrastructure requires time and maintenance. That's the trade-off.

Architecture

The system routes calls from AWS Chime Voice Connector through an Asterisk PBX server, which bridges RTP audio streams to a FastAPI shim server. The shim server converts the audio to WebSocket events compatible with Twilio's Media Streams API, allowing seamless integration with your voice AI services.

[Figure: system architecture diagram (images/system-architecture.png)]

Port reference

| Service | Port | Protocol | Description |
| --- | --- | --- | --- |
| Asterisk SIP | 5061 | TCP/TLS | SIP signaling with AWS Chime |
| Asterisk ARI | 8088 | HTTP | Asterisk REST Interface (localhost only) |
| Shim server | 8080 | HTTP | FastAPI server, health endpoints |
| RTP media | 10000-10299 | UDP | Audio streams to/from Asterisk |

Call flow

Here's what happens when someone calls your number:

  1. Caller dials your AWS Chime phone number
  2. Chime sends SIP INVITE to your Asterisk server (TLS:5061)
  3. Asterisk matches the call in extensions.conf
    • Answer()
    • Stasis(voice-agent)
  4. ARI sends StasisStart event to shim server via WebSocket
  5. Shim server:
    • Opens WebSocket to your voice server
    • Creates ARI mixing bridge
    • Adds PSTN channel to bridge
    • Allocates UDP port for RTP (10000-10299); each live call gets its own port
    • Creates ExternalMedia channel pointing to that port
    • Adds ExternalMedia channel to bridge
  6. Audio flows: PSTN ↔ Bridge ↔ ExternalMedia ↔ Shim (RTP) ↔ Voice Server (WSS)
  7. Caller hangs up (or AI ends call via a tool call)
  8. ARI sends ChannelHangupRequest / ChannelDestroyed
  9. Shim cleans up: closes WebSocket, deletes bridge, releases port
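
Steps 5 and 9 above allocate one UDP port per live call and release it on hangup. The following is a minimal sketch of how such a pool could work; the class name `RtpPortPool` and its methods are hypothetical and are not taken from the repository's code.

```python
class RtpPortPool:
    """Hands out UDP ports from an inclusive range, one per live call."""

    def __init__(self, start: int = 10000, end: int = 10299) -> None:
        if start > end:
            raise ValueError("start must not exceed end")
        # Stored in descending order so that pop() yields the lowest port first.
        self._free = list(range(end, start - 1, -1))
        self._in_use = set()

    def acquire(self) -> int:
        """Reserve a port for a new call, or fail if every port is busy."""
        if not self._free:
            raise RuntimeError("no free RTP ports: all call slots are busy")
        port = self._free.pop()
        self._in_use.add(port)
        return port

    def release(self, port: int) -> None:
        """Return a port to the pool; repeated releases are ignored."""
        if port in self._in_use:
            self._in_use.remove(port)
            self._free.append(port)

    @property
    def available(self) -> int:
        return len(self._free)
```

Making `release` idempotent means a hangup handler and a shutdown handler can both call it for the same call without corrupting the pool.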

Environment and dependencies with uv

This project uses uv to manage Python environments and pinned dependencies.

Setup

# install uv
pip install uv
uv --version

# create a venv in .venv
uv venv --prompt open-telephony-stack

# activate env
source .venv/bin/activate  # or use `uv run` without activating

# install exactly what lockfile says (if uv.lock exists)
uv sync

# dev install (editable mode, so `import _` from src/ works live)
uv pip install -e .

Files

  • pyproject.toml: Project metadata + declared dependencies.
  • uv.lock: Fully pinned versions for reproducible installs (generated by uv lock). Do not manually edit.
  • src/: Source package (open-telephony-stack).

Common commands

# run shim server in env
uv run uvicorn servers.asterisk_shim_server:app --reload

Updating dependencies

Add new libraries to the dependencies list in pyproject.toml, and then:

# 1. after editing pyproject.toml, regenerate the lockfile
uv lock
# 2. sync the environment to the new lockfile
uv sync

Or, if you just want uv.lock to use the latest version of dependencies (including transitive dependencies):

uv lock --upgrade --no-cache

Docker builds also install from pyproject.toml for consistency.

Architecture overview

Signal flow

  1. Inbound Call: PSTN → AWS Chime SIP → Asterisk (via SIP/TLS)
  2. ARI Events: Asterisk notifies Shim Server via ARI WebSocket
  3. Bridge Setup: Shim creates mixing bridge, ExternalMedia channel
  4. RTP Streaming: Asterisk ↔ Shim Server (μ-law RTP on UDP)
  5. WebSocket Bridge: Shim ↔ Voice Server (WSS to internal ALB)
  6. AI Processing: Voice Server ↔ AI Voice Agent Server

Key components

1. Asterisk PBX (deployment/asterisk-server/)

The core telephony engine running in Docker:

  • SIP/TLS termination with AWS Chime SDK
  • Dialplan routing for inbound/outbound calls
  • ARI (Asterisk REST Interface) for programmatic control
  • ExternalMedia channels for RTP bridging
  • Auto-renewing Let's Encrypt TLS certificates

Config Files:

  • pjsip.conf - SIP trunk configuration for AWS Chime
  • extensions.conf - Call routing dialplan
  • ari.conf - REST API credentials
  • http.conf - HTTP server for ARI
  • modules.conf - Loaded Asterisk modules

2. Shim Server (src/servers/asterisk_shim_server.py)

FastAPI application that bridges Asterisk to your voice AI service:

  • ARI Supervisor - Manages WebSocket connection to Asterisk
  • CallSession - Per-call state machine handling RTP ↔ WSS
  • RTP Pacer - Sends one audio frame every 20ms, independent of WebSocket jitter
  • Health endpoints - Monitor active calls and system status

3. Call Session Manager (src/ari/call_session.py)

The heart of the real-time audio processing:

  • RTP socket management - Allocates ports from configurable range
  • ExternalMedia lifecycle - Creates/destroys ARI channels
  • Bidirectional audio - Concurrent RTP → WSS and WSS → RTP loops
  • Buffer management - Queues audio for smooth 20ms frame delivery
  • Mark/ACK handling - Supports interruption via Twilio-compatible events
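
To illustrate the buffer-management bullet, here is a minimal sketch of a byte buffer that always yields full 20ms frames, padding with μ-law silence when it runs short. `FrameBuffer` and its method names are hypothetical; the repository's actual buffering may differ.

```python
FRAME_BYTES = 160      # 20 ms of 8 kHz mu-law audio, one byte per sample
ULAW_SILENCE = 0xFF    # mu-law code for zero amplitude


class FrameBuffer:
    """Accumulates arbitrary-sized audio chunks and emits fixed-size frames."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def push(self, chunk: bytes) -> None:
        """Append decoded audio of any length."""
        self._buf.extend(chunk)

    def clear(self) -> None:
        """Drop all queued audio, e.g. on a barge-in."""
        self._buf.clear()

    def pop_frame(self) -> bytes:
        """Return exactly FRAME_BYTES bytes, padded with silence if needed."""
        frame = bytes(self._buf[:FRAME_BYTES])
        del self._buf[:FRAME_BYTES]
        missing = FRAME_BYTES - len(frame)
        if missing:
            frame += bytes([ULAW_SILENCE]) * missing
        return frame
```

Because `pop_frame` never returns a short frame, the sender can transmit on every tick without checking whether enough audio has arrived.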

4. ARI Supervisor (src/ari/asterisk_ari_supervisor.py)

Manages all active calls:

  • WebSocket lifecycle - Auto-reconnect on disconnection
  • StasisStart events - Spawns CallSession for new calls
  • Cleanup orchestration - Graceful shutdown on hangup
  • Health monitoring - Tracks active sessions
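
The auto-reconnect behavior can be pictured as a retry loop with exponential backoff. This is only a sketch under assumptions: `run_with_reconnect`, `backoff_delays`, the 0.5s base and the 30s cap are all illustrative and do not come from the repository.

```python
import asyncio
from typing import Awaitable, Callable, Iterator, Optional


def backoff_delays(base: float = 0.5, cap: float = 30.0) -> Iterator[float]:
    """Yield base, 2*base, 4*base, ... capped at `cap` seconds."""
    delay = base
    while True:
        yield delay
        delay = min(cap, delay * 2)


async def run_with_reconnect(
    connect_and_serve: Callable[[], Awaitable[None]],
    *,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> int:
    """Run `connect_and_serve` repeatedly; return the number of attempts made."""
    attempts = 0
    delays = backoff_delays()
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            await connect_and_serve()
            delays = backoff_delays()  # the session ended cleanly: reset backoff
        except OSError:
            pass  # includes ConnectionError; retry after the next delay
        await sleep(next(delays))
    return attempts
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; in production the default `asyncio.sleep` would be used and `max_attempts` left as `None`.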

5. Voice Agent Server (src/servers/voice_agent_server.py)

An example FastAPI application that bridges the shim server to OpenAI Realtime API:

  • WebSocket endpoint - Receives Twilio-compatible media streams from shim server
  • OpenAI Realtime API integration - Connects to OpenAI for voice-to-voice AI conversations
  • Call state management - Tracks call metadata, transcripts, and session state
  • Tool call handling - Supports agent tools (e.g., end_call) for call control
  • Interruption handling - Detects caller speech and truncates assistant audio
  • Transcript logging - Captures conversation transcripts for both user and assistant

Quick start

Prerequisites

  • AWS Account with Chime SDK SIP Media Application
  • EC2 Instance (t3.medium or larger, Amazon Linux 2023)
  • Elastic IP assigned to EC2
  • Domain name pointing to the Elastic IP
  • AI Voice Agent Server (external service that handles voice AI processing)
  • Docker and Docker Compose installed

1. Set up AWS Chime SIP trunk

  1. Create a SIP Media Application in AWS Chime console
  2. Provision a phone number and associate it with the SIP Media Application
  3. Configure outbound calling hostnames (your Asterisk server domain)
  4. Note your Chime outbound hostname (e.g., +1XXXXXXXXXX.voiceconnector.chime.aws)

2. Configure DNS

Before setting up TLS certificates, you need to configure DNS so that AWS Chime can resolve your Asterisk server's hostname. Create an A record pointing your SIP subdomain to your EC2 instance's Elastic IP:

| Record Type | Name | Value | TTL |
| --- | --- | --- | --- |
| A | sip.yourdomain.com | Your Elastic IP (e.g., 54.123.45.67) | 300 (or default) |

This DNS record must be in place before:

  1. Requesting Let's Encrypt certificates (Certbot validates domain ownership)
  2. Configuring AWS Chime Voice Connector termination (Chime needs to resolve the hostname)
  3. Setting external_signaling_address in pjsip.conf (must match the DNS name)

After creating the record, wait for DNS propagation (usually a few minutes, but can take up to 48 hours depending on TTL). You can verify with:

dig sip.yourdomain.com
# or
nslookup sip.yourdomain.com

3. Install Asterisk server

cd deployment/asterisk-server

# Install Docker & Docker Compose (if needed)
sudo yum -y install docker
sudo systemctl start docker && sudo systemctl enable docker
sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

# Start Asterisk
docker-compose up -d

# View logs
docker logs -f asterisk-server

# Access Asterisk CLI
docker exec -it asterisk-server asterisk -rvvvvv

4. Configure TLS certificates

AWS Chime requires TLS for SIP. Use Let's Encrypt:

# Install certbot
sudo yum install -y certbot

# Get certificate (ensure port 80 is open)
sudo certbot certonly --standalone \
  --preferred-challenges http \
  -d sip.yourdomain.com \
  --agree-tos -m your@email.com

# Enable auto-renewal
sudo systemctl enable --now certbot-renew.timer

# Set up deploy hook to reload Asterisk
sudo mkdir -p /etc/letsencrypt/renewal-hooks/deploy
sudo tee /etc/letsencrypt/renewal-hooks/deploy/reload-asterisk.sh > /dev/null <<'EOF'
#!/bin/sh
docker exec asterisk-server asterisk -rx "core reload"
EOF
sudo chmod +x /etc/letsencrypt/renewal-hooks/deploy/reload-asterisk.sh

5. Configure Asterisk

Edit deployment/asterisk-server/asterisk-config/pjsip.conf:

[chime-out]
type=endpoint
context=from-pstn
disallow=all
allow=ulaw
aors=chime-out
outbound_auth=chime-out
media_encryption=sdes
; ... (see full config in file)

[chime-in]
type=endpoint
context=from-pstn
disallow=all
allow=ulaw
; ... (see full config in file)

Edit deployment/asterisk-server/asterisk-config/extensions.conf:

[from-pstn]
exten => s,1,Answer()
 same => n,Stasis(voice-agent)
 same => n,Hangup()

6. Deploy shim server

# Set environment variables
cp .env.example .env
# Edit .env with your values:
# - ARI_BASE=http://127.0.0.1:8088/ari
# - ARI_USER=ariuser
# - ARI_PASS=your_password
# - ECS_MEDIA_WSS_URL=wss://your-voice-server.com/voice/voice
# - EXTERNAL_MEDIA_HOST=127.0.0.1
# - RTP_PORT_START=10000
# - RTP_PORT_END=10299

# Build Docker image
docker build -t asterisk-shim -f deployment/shim-server/Dockerfile .

# Run shim server (host networking for RTP)
docker run -d --env-file .env --network host --name asterisk-shim asterisk-shim

# View logs
docker logs -f asterisk-shim

7. Test the system

Call your AWS Chime phone number. You should see:

  1. Asterisk logs: Inbound SIP INVITE
  2. Shim logs: CallSession started, RTP sockets allocated
  3. Voice Server logs: WebSocket connection established
  4. AI Voice Agent Server: Session created and processing audio

Configuration reference

Environment variables

# Asterisk ARI Configuration
ARI_BASE=http://127.0.0.1:8088/ari      # Asterisk HTTP API endpoint
ARI_USER=ariuser                          # ARI username (matches ari.conf)
ARI_PASS=your_secure_password             # ARI password (matches ari.conf)
ARI_APP=voice-agent                       # Stasis app name (matches extensions.conf)

# RTP Configuration
EXTERNAL_MEDIA_HOST=127.0.0.1             # IP where shim binds RTP sockets
RTP_PORT_START=10000                      # Start of RTP port range
RTP_PORT_END=10299                        # End of RTP port range (300 ports = 300 concurrent calls)

# Voice Server Integration
ECS_MEDIA_WSS_URL=wss://voice.internal.yourdomain.com/voice/voice  # Internal ALB endpoint
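
As an illustration of how these variables fit together, the sketch below loads them into a typed object and validates the RTP port range. `ShimConfig` and `load_config` are hypothetical names; the repository has its own src/env_config.py, which may work differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ShimConfig:
    ari_base: str
    ari_user: str
    ari_pass: str
    ari_app: str
    external_media_host: str
    rtp_port_start: int
    rtp_port_end: int
    ecs_media_wss_url: str

    @property
    def max_concurrent_calls(self) -> int:
        # One RTP port per live call, and the range is inclusive.
        return self.rtp_port_end - self.rtp_port_start + 1


def load_config(env: Mapping[str, str] = os.environ) -> ShimConfig:
    """Build a ShimConfig from environment variables; missing keys raise KeyError."""
    start = int(env["RTP_PORT_START"])
    end = int(env["RTP_PORT_END"])
    if not (1 <= start <= end <= 65535):
        raise ValueError(f"invalid RTP port range {start}-{end}")
    return ShimConfig(
        ari_base=env["ARI_BASE"].rstrip("/"),
        ari_user=env["ARI_USER"],
        ari_pass=env["ARI_PASS"],
        ari_app=env["ARI_APP"],
        external_media_host=env["EXTERNAL_MEDIA_HOST"],
        rtp_port_start=start,
        rtp_port_end=end,
        ecs_media_wss_url=env["ECS_MEDIA_WSS_URL"],
    )
```

With the values shown above, `load_config(...).max_concurrent_calls` evaluates to 300, matching the comment on RTP_PORT_END.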

Asterisk configuration files

pjsip.conf - SIP trunking to AWS Chime:

  • Transport TLS on port 5061
  • Outbound/inbound endpoints for Chime
  • Media encryption (SDES)
  • Authentication credentials

extensions.conf - Dialplan routing:

  • from-pstn context for inbound calls
  • Answer → Stasis(voice-agent) → Hangup
  • Outbound calling via PJSIP/+1XXXXXXXXXX@chime-out

ari.conf - REST API access:

  • Username/password for shim server
  • Allowed origins (CORS)

http.conf - HTTP server settings:

  • Bind address 0.0.0.0:8088
  • Enable HTTP for ARI

Real-time audio format

The WebSocket API is modeled after Twilio's Media Streams. If you've integrated with Twilio before, this will look familiar: same event structure, same audio format.

Start event (shim → voice server)

{
    "event": "start",
    "start": {
        "streamSid": "unique-stream-id",
        "callSid": "asterisk-channel-id",
        "customParameters": {
            "source": "asterisk-shim",
            "format": "ulaw"
        }
    }
}

Media event (bidirectional)

{
    "event": "media",
    "streamSid": "unique-stream-id",
    "media": {
        "payload": "base64-encoded-ulaw-audio",
        "timestamp": 1234
    }
}

Audio specs:

  • Format: μ-law (PCMU)
  • Sample rate: 8000 Hz
  • Frame size: 160 bytes (20ms)
  • Encoding: Base64
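
The 160-byte frame size follows from the other specs: 8000 samples/s × 0.020 s = 160 samples, and μ-law uses one byte per sample. The sketch below shows that arithmetic and a round trip through the media event format; the helper names `media_event` and `decode_media_event` are illustrative, not part of the repository.

```python
import base64
import json

SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
BYTES_PER_SAMPLE = 1  # mu-law is 8 bits per sample

FRAME_BYTES = SAMPLE_RATE_HZ * FRAME_MS // 1000 * BYTES_PER_SAMPLE


def media_event(stream_sid: str, frame: bytes, timestamp_ms: int) -> str:
    """Serialize one audio frame as a Twilio-style media event."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {
            "payload": base64.b64encode(frame).decode("ascii"),
            "timestamp": timestamp_ms,
        },
    })


def decode_media_event(message: str) -> bytes:
    """Extract the raw mu-law bytes from a media event."""
    event = json.loads(message)
    if event.get("event") != "media":
        raise ValueError("not a media event")
    return base64.b64decode(event["media"]["payload"])
```

A full 160-byte frame becomes a 216-character Base64 payload, so each media message carries roughly 35% more bytes than the raw audio before JSON overhead.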

Clear event (voice server → shim)

{
    "event": "clear"
}

Clears the audio buffer immediately. Used for barge-in / interruption handling.

Mark event (bidirectional)

{
    "event": "mark",
    "streamSid": "unique-stream-id",
    "mark": {"name": "responsePart"}
}

Used for tracking audio playback position. The shim ACKs marks when the corresponding audio has actually been transmitted.
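
One way to implement clear and mark semantics is a single playout queue that interleaves audio frames and mark names, so a mark is acknowledged only once every frame queued before it has been sent. The sketch below is an assumption about how this could be done, not the shim's actual code; returning the pending marks on clear mirrors Twilio's documented Media Streams behavior.

```python
from collections import deque
from typing import List, Tuple

FRAME_BYTES = 160
SILENCE = bytes([0xFF]) * FRAME_BYTES  # mu-law silence


class PlayoutQueue:
    """Ordered queue of audio frames (bytes) and mark names (str)."""

    def __init__(self) -> None:
        self._items = deque()

    def push_audio(self, frame: bytes) -> None:
        self._items.append(frame)

    def push_mark(self, name: str) -> None:
        self._items.append(name)

    def _drain_marks(self) -> List[str]:
        # Pop every mark sitting at the head of the queue.
        names = []
        while self._items and isinstance(self._items[0], str):
            names.append(self._items.popleft())
        return names

    def next_tick(self) -> Tuple[bytes, List[str]]:
        """Return the frame to send on this 20 ms tick and the marks to ACK."""
        acked = self._drain_marks()
        frame = self._items.popleft() if self._items else SILENCE
        acked.extend(self._drain_marks())  # marks reached by sending this frame
        return frame, acked

    def clear(self) -> List[str]:
        """Drop queued audio (barge-in) and return the marks that were pending."""
        pending = [item for item in self._items if isinstance(item, str)]
        self._items.clear()
        return pending
```

The caller would send a mark event back to the voice server for each name returned by `next_tick` or `clear`.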

Stop event (either direction)

{
    "event": "stop",
    "streamSid": "unique-stream-id"
}

Architecture deep dive

Why this topology?

Asterisk is mature, stable, and handles SIP/RTP at scale. But it doesn't natively support AI voice interactions. By using ARI (Asterisk REST Interface) and ExternalMedia channels, we can:

  1. Keep Asterisk doing what it does best (SIP, RTP, call routing)
  2. Bridge to modern WebSocket-based AI services
  3. Maintain frame-perfect 20ms audio cadence
  4. Handle interruptions gracefully

RTP → WebSocket bridge

The shim server maintains two concurrent audio loops per call:

Loop 1: RTP → WebSocket (Caller Audio)

while not closed:
    # Receive 20ms μ-law frame from Asterisk
    datagram = await sock.recvfrom(2048)
    payload = rtp_strip_header(datagram)

    # Forward to voice server via WebSocket
    await ecs_ws.send(json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {
            "payload": base64.b64encode(payload).decode(),
            "timestamp": timestamp_ms
        }
    }))
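
The loop above relies on a helper named rtp_strip_header. Its implementation is not shown here, so the following is a sketch of what such a function could look like when written against the RTP fixed header layout from RFC 3550 (12 bytes, plus optional CSRC entries, header extension and padding).

```python
import struct

RTP_FIXED_HEADER = 12


def rtp_strip_header(datagram: bytes) -> bytes:
    """Return the payload of an RTP packet, skipping header fields and padding."""
    if len(datagram) < RTP_FIXED_HEADER:
        raise ValueError("datagram shorter than the RTP fixed header")
    first = datagram[0]
    if first >> 6 != 2:
        raise ValueError("unsupported RTP version")
    has_padding = bool(first & 0x20)
    has_extension = bool(first & 0x10)
    csrc_count = first & 0x0F

    offset = RTP_FIXED_HEADER + 4 * csrc_count
    if has_extension:
        if len(datagram) < offset + 4:
            raise ValueError("truncated header extension")
        # The extension length is counted in 32-bit words after its 4-byte header.
        (ext_words,) = struct.unpack_from("!H", datagram, offset + 2)
        offset += 4 + 4 * ext_words

    end = len(datagram)
    if has_padding:
        end -= datagram[-1]  # last byte holds the padding length, itself included
    if offset > end:
        raise ValueError("truncated RTP packet")
    return datagram[offset:end]
```

For the plain PCMU packets Asterisk typically sends on an ExternalMedia channel, this reduces to dropping the first 12 bytes; the extra branches only guard against less common packet shapes.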

Loop 2: WebSocket → RTP (AI Audio)

# Separate pacer task sends RTP every 20ms
async def rtp_pacer():
    while not closed:
        await asyncio.sleep(0.02)  # 20ms tick

        # Pull frame from buffer (or send silence)
        if buffer:
            frame = buffer.pop(Config.FRAME_BYTES)
        else:
            frame = bytes([0xFF]) * Config.FRAME_BYTES  # μ-law silence

        rtp_send(frame, marker=is_first_frame)

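This decouples RTP timing from the WebSocket: Asterisk expects a frame every 20ms, so the pacer sends one on every tick and substitutes silence whenever the buffer is empty because of network jitter on the WebSocket connection.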
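
A fixed `asyncio.sleep(0.02)` adds the time spent building and sending each packet to every tick, so the cadence slowly drifts. One common remedy, sketched here as an assumption rather than as the repository's implementation, is to sleep until an absolute deadline that advances by exactly 20ms per frame. The sketch also shows how the RTP header fields advance for PCMU; `RtpPacketizer` and `paced_send` are hypothetical names.

```python
import asyncio
import struct
from typing import Callable

FRAME_BYTES = 160        # 20 ms of 8 kHz mu-law
FRAME_SECONDS = 0.020
PCMU_PAYLOAD_TYPE = 0    # static RTP payload type for G.711 mu-law


class RtpPacketizer:
    """Builds RTP packets with a running sequence number and timestamp."""

    def __init__(self, ssrc: int, seq: int = 0, timestamp: int = 0) -> None:
        self.ssrc, self.seq, self.timestamp = ssrc, seq, timestamp

    def build(self, frame: bytes, marker: bool = False) -> bytes:
        second = PCMU_PAYLOAD_TYPE | (0x80 if marker else 0x00)
        header = struct.pack("!BBHII", 0x80, second, self.seq, self.timestamp, self.ssrc)
        self.seq = (self.seq + 1) & 0xFFFF
        # The RTP timestamp counts samples: 160 per 20 ms frame at 8 kHz.
        self.timestamp = (self.timestamp + FRAME_BYTES) & 0xFFFFFFFF
        return header + frame


async def paced_send(
    next_frame: Callable[[], bytes],
    send: Callable[[bytes], None],
    frames: int,
    ssrc: int = 0x1234ABCD,
) -> None:
    """Send `frames` packets, one per 20 ms, against absolute deadlines."""
    loop = asyncio.get_running_loop()
    packetizer = RtpPacketizer(ssrc)
    deadline = loop.time()
    for i in range(frames):
        send(packetizer.build(next_frame(), marker=(i == 0)))
        deadline += FRAME_SECONDS
        # Sleep only for the time remaining until the next deadline.
        await asyncio.sleep(max(0.0, deadline - loop.time()))
```

If one tick runs late, the next sleep is shortened by the same amount, so the average rate stays at 50 packets per second.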

ExternalMedia channels

Asterisk's ExternalMedia channel type creates a client-mode RTP connection:

  1. Shim allocates UDP port (e.g., 10000)
  2. Creates ExternalMedia channel pointing to 127.0.0.1:10000
  3. Asterisk sends RTP to that socket
  4. Asterisk exposes UNICASTRTP_LOCAL_ADDRESS and UNICASTRTP_LOCAL_PORT variables
  5. Shim queries these variables to discover where to send return audio
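
The steps above map onto two ARI REST calls: `POST /channels/externalMedia` to create the channel and `GET /channels/{channelId}/variable` to read the UNICASTRTP_* values. The sketch below only builds the request URLs, with no HTTP client involved; the helper names and the chosen parameter values are illustrative assumptions, and authentication is omitted.

```python
from typing import Tuple
from urllib.parse import quote, urlencode


def external_media_request(ari_base: str, app: str, host: str, port: int) -> Tuple[str, str]:
    """Return (method, url) for creating an ExternalMedia channel."""
    params = {
        "app": app,                        # Stasis app that will own the channel
        "external_host": f"{host}:{port}", # where Asterisk sends RTP
        "format": "ulaw",
        "encapsulation": "rtp",
        "transport": "udp",
        "connection_type": "client",
        "direction": "both",
    }
    return "POST", f"{ari_base}/channels/externalMedia?{urlencode(params)}"


def channel_variable_request(ari_base: str, channel_id: str, variable: str) -> Tuple[str, str]:
    """Return (method, url) for reading one channel variable."""
    path = f"/channels/{quote(channel_id, safe='')}/variable"
    return "GET", f"{ari_base}{path}?{urlencode({'variable': variable})}"
```

For example, `channel_variable_request(base, channel_id, "UNICASTRTP_LOCAL_PORT")` yields the URL whose JSON response carries the port to which the shim should send return audio.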

Mixing bridge

Each call uses an ARI mixing bridge:

PSTN Channel ────┐
                 │
                 ├──► Mixing Bridge ──► Mixed Audio
                 │
ExternalMedia────┘

This allows future enhancements like:

  • Conference calling
  • Music on hold
  • Call recording
  • Call transfer
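
For reference, the bridge lifecycle for a single call corresponds to a short sequence of ARI requests: `POST /bridges`, two `POST /bridges/{bridgeId}/addChannel` calls, and a final `DELETE /bridges/{bridgeId}`. The sketch below lists them as data so the order is easy to see; the function names are hypothetical and the paths are relative to the ARI base URL.

```python
from typing import Dict, List, Tuple
from urllib.parse import quote

Request = Tuple[str, str, Dict[str, str]]  # (method, path, query parameters)


def bridge_setup_requests(bridge_id: str, pstn_channel_id: str, media_channel_id: str) -> List[Request]:
    """Requests that create a mixing bridge and join both legs of the call."""
    bridge_path = f"/bridges/{quote(bridge_id, safe='')}"
    return [
        ("POST", "/bridges", {"type": "mixing", "bridgeId": bridge_id}),
        ("POST", f"{bridge_path}/addChannel", {"channel": pstn_channel_id}),
        ("POST", f"{bridge_path}/addChannel", {"channel": media_channel_id}),
    ]


def bridge_teardown_requests(bridge_id: str) -> List[Request]:
    """Requests that destroy the bridge once the call has ended."""
    return [("DELETE", f"/bridges/{quote(bridge_id, safe='')}", {})]
```

Adding a third participant, for a conference or a recording leg, would be one more `addChannel` request against the same bridge.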

Production considerations

Scaling

Vertical Scaling:

  • Each CallSession uses ~5-10MB RAM
  • t3.medium handles 50+ concurrent calls
  • t3.large handles 200+ concurrent calls

Horizontal Scaling:

  • Run multiple Asterisk+Shim instances
  • Use AWS Chime load balancing across SIP endpoints
  • Shared-nothing architecture (each instance is independent)

Security

TLS Everywhere:

  • SIP over TLS (port 5061) to AWS Chime
  • WSS (WebSocket Secure) to voice server
  • Let's Encrypt auto-renewal

Firewall Rules (Security Groups):

| Port | Protocol | Source | Description |
| --- | --- | --- | --- |
| 22 | TCP | Your IP | SSH |
| 80 | TCP | 0.0.0.0/0 | Let's Encrypt ACME challenge |
| 5061 | TCP | AWS Chime IPs | SIP/TLS |
| 10000-10299 | UDP | AWS Chime IPs | RTP media |

The repo includes a Lambda function that automatically updates your security group when AWS publishes new IP ranges for AMAZON, EC2, and CHIME_VOICECONNECTOR services.
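
AWS publishes its ranges as a JSON document whose `prefixes` entries carry `ip_prefix`, `region` and `service` fields. The core of such an updater is a filter over that document; the sketch below shows only that filtering step, using made-up documentation prefixes. The function name is hypothetical and this is not the code in aws_lambda/update_telephony_vm_sg.py.

```python
import ipaddress
from typing import Iterable, List, Optional


def service_cidrs(
    ip_ranges: dict,
    services: Iterable[str] = ("CHIME_VOICECONNECTOR",),
    region: Optional[str] = None,
) -> List[str]:
    """Return the sorted, de-duplicated IPv4 CIDRs for the given services."""
    wanted = set(services)
    cidrs = set()
    for entry in ip_ranges.get("prefixes", []):
        if entry.get("service") not in wanted:
            continue
        if region is not None and entry.get("region") != region:
            continue
        # ip_network validates the prefix and normalizes its string form.
        cidrs.add(ipaddress.ip_network(entry["ip_prefix"]))
    return [str(net) for net in sorted(cidrs)]


# Sample data in the published format; the prefixes are placeholders.
SAMPLE_RANGES = {
    "prefixes": [
        {"ip_prefix": "198.51.100.0/24", "region": "us-east-1", "service": "CHIME_VOICECONNECTOR"},
        {"ip_prefix": "192.0.2.0/24", "region": "us-west-2", "service": "CHIME_VOICECONNECTOR"},
        {"ip_prefix": "203.0.113.0/24", "region": "us-east-1", "service": "EC2"},
    ]
}
```

The resulting list can then be diffed against the security group's current ingress rules so that only changed CIDRs are added or revoked.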

Credentials:

  • ARI username/password in environment variables
  • Never hardcode in config files
  • Rotate regularly

Monitoring

Health Endpoints:

# Shim server health
curl http://localhost:8080/health

# Returns:
{
  "status": "ok",
  "supervisor_task_status": "running",
  "config": { ... },
  "running": true,
  "active_sessions": 5,
  "active_channels": ["channel-id-1", "channel-id-2", ...]
}

Asterisk CLI:

docker exec -it asterisk-server asterisk -rx 'core show channels'
docker exec -it asterisk-server asterisk -rx 'pjsip show endpoints'
docker exec -it asterisk-server asterisk -rx 'ari show apps'

Logs:

# Asterisk logs
docker logs -f asterisk-server

# Shim server logs
docker logs -f asterisk-shim

# Enable verbose Asterisk logging
docker exec -it asterisk-server asterisk -rx 'core set verbose 10'
docker exec -it asterisk-server asterisk -rx 'pjsip set logger on'

Troubleshooting

Call not connecting

Check Asterisk SIP registration:

docker exec -it asterisk-server asterisk -rx 'pjsip show endpoints'

Check ARI connectivity:

curl 'http://127.0.0.1:8088/ari/asterisk/info?api_key=ariuser:your-password'

Check shim server status:

curl http://localhost:8080/health

No audio

Check RTP ports are open:

sudo netstat -tulpn | grep -E ':10[0-2][0-9][0-9] '

Check ExternalMedia channel:

docker exec -it asterisk-server asterisk -rx 'core show channels'
# Should see UnicastRTP/ channel

Enable RTP debugging:

docker exec -it asterisk-server asterisk -rx 'rtp set debug on'

High latency

Check network to AWS Chime:

ping $(dig +short [your-chime-hostname])

Check CPU usage:

top -b -n 1 | grep asterisk

Reduce concurrent calls if CPU > 80%

File structure

.
├── LICENSE
├── README.md
├── aws_lambda
│   └── update_telephony_vm_sg.py
├── deployment
│   ├── asterisk-server
│   │   ├── README.md
│   │   ├── asterisk-config
│   │   │   ├── ari.conf
│   │   │   ├── asterisk.conf
│   │   │   ├── extensions.conf
│   │   │   ├── http.conf
│   │   │   ├── logger.conf
│   │   │   ├── modules.conf
│   │   │   ├── pjsip.conf
│   │   │   └── rtp.conf
│   │   └── docker-compose.yml
│   ├── shim-server
│   │   └── Dockerfile
│   └── voice-agent-server
│       └── Dockerfile
├── images
│   └── system-architecture.png
├── pyproject.toml
├── src
│   ├── __init__.py
│   ├── ari
│   │   ├── __init__.py
│   │   ├── asterisk_ari_supervisor.py
│   │   └── call_session.py
│   ├── env_config.py
│   ├── servers
│   │   ├── __init__.py
│   │   ├── asterisk_shim_server.py
│   │   └── voice_agent_server.py
│   └── utils
│       ├── logger.py
│       └── telephony_utils.py
└── uv.lock

