For more context, see the blog release post: https://vectorly.app/blog/open-telephony-stack
This repository contains the complete infrastructure for building a HIPAA-eligible phone system using Asterisk and AWS Chime SDK SIP Media Application.
This is a production-ready alternative to Twilio that gives you full control over your telephony infrastructure while maintaining HIPAA eligibility. It also includes a sample voice agent server for integrating with AI voice services (e.g., OpenAI Realtime API).
Twilio has limitations for healthcare applications:
- Limited control over media routing
- Complex HIPAA BAA requirements
- $2,000/mo charge for HIPAA-compliant services
- Media Streams only available on select plans
This solution provides:
- HIPAA-eligible solution - You control all infrastructure
- Complete control - Own your PBX, customize everything
- Production-ready - Battle-tested in real healthcare applications
- Scalable - Handles concurrent calls with ease
A complete and secure telephony system built to handle both inbound and outbound calls:
- Receives calls via AWS Chime Voice Connector (you get a real phone number)
- Terminates SIP/TLS on Asterisk running in Docker
- Bridges the audio via RTP to a WebSocket connection
- Streams base64 μ-law audio to your AI voice server
- Twilio-like API (the WebSocket interface is modeled after Twilio's Media Streams API)
You bring your own AI. This just handles the phone infrastructure.
Use case examples:
- Building voice AI in healthcare and need HIPAA compliance without Twilio's BAA costs
- Customizing call handling in ways Twilio doesn't allow
- Wanting full control over your telephony stack
- Learning how telephony infrastructure works and building a VoIP stack from scratch
Consider alternatives if:
- You just need basic voice for a side project (Twilio is easier)
- You don't want to manage infrastructure
- You don't have any special compliance needs
Infrastructure requires time and maintenance. That's the trade-off.
The system routes calls from AWS Chime Voice Connector through an Asterisk PBX server, which bridges RTP audio streams to a FastAPI shim server. The shim server converts the audio to WebSocket events compatible with Twilio's Media Streams API, allowing seamless integration with your voice AI services.
| Service | Port | Protocol | Description |
|---|---|---|---|
| Asterisk SIP | 5061 | TCP/TLS | SIP signaling with AWS Chime |
| Asterisk ARI | 8088 | HTTP | Asterisk REST Interface (localhost only) |
| Shim server | 8080 | HTTP | FastAPI server, health endpoints |
| RTP media | 10000-10299 | UDP | Audio streams to/from Asterisk |
Here's what happens when someone calls your number:
- Caller dials your AWS Chime phone number
- Chime sends SIP INVITE to your Asterisk server (TLS:5061)
- Asterisk matches the call in `extensions.conf`: `Answer()`, then `Stasis(voice-agent)`
- ARI sends `StasisStart` event to shim server via WebSocket
- Shim server:
  - Opens WebSocket to your voice server
  - Creates ARI mixing bridge
  - Adds PSTN channel to bridge
  - Allocates UDP port for RTP (10000-10299); each live call gets its own port
  - Creates `ExternalMedia` channel pointing to that port
  - Adds `ExternalMedia` channel to bridge
- Audio flows: PSTN ↔ Bridge ↔ `ExternalMedia` ↔ Shim (RTP) ↔ Voice Server (WSS)
- Caller hangs up (or AI ends call via a tool call)
- ARI sends `ChannelHangupRequest` / `ChannelDestroyed`
- Shim cleans up: closes WebSocket, deletes bridge, releases port
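The bridge setup steps in the flow above map onto a handful of ARI REST calls. The following is a minimal sketch, not the repository's implementation: `AriRequest` and `call_setup_plan` are hypothetical names, and the function only builds the list of requests (method, path, query parameters) rather than sending them, so it can be read and tested without a running Asterisk. The endpoints used (`POST /bridges`, `POST /bridges/{id}/addChannel`, `POST /channels/externalMedia`) are standard ARI resources.

```python
from dataclasses import dataclass, field


@dataclass
class AriRequest:
    """One ARI REST call: HTTP method, path under /ari, and query parameters."""
    method: str
    path: str
    params: dict = field(default_factory=dict)


def call_setup_plan(pstn_channel_id: str, bridge_id: str, media_channel_id: str,
                    rtp_host: str, rtp_port: int, app: str = "voice-agent") -> list:
    """Return the ARI calls for 'create bridge' through 'add ExternalMedia to bridge'."""
    return [
        # Create a mixing bridge with a caller-chosen ID.
        AriRequest("POST", "/bridges", {"type": "mixing", "bridgeId": bridge_id}),
        # Put the inbound PSTN channel into the bridge.
        AriRequest("POST", f"/bridges/{bridge_id}/addChannel",
                   {"channel": pstn_channel_id}),
        # Ask Asterisk to send μ-law RTP to the UDP port the shim allocated.
        AriRequest("POST", "/channels/externalMedia", {
            "app": app,
            "channelId": media_channel_id,
            "external_host": f"{rtp_host}:{rtp_port}",
            "format": "ulaw",
        }),
        # Join the ExternalMedia channel to the same bridge.
        AriRequest("POST", f"/bridges/{bridge_id}/addChannel",
                   {"channel": media_channel_id}),
    ]


# Example IDs are placeholders chosen for illustration.
for request in call_setup_plan("pstn-1", "bridge-1", "media-1", "127.0.0.1", 10000):
    print(request.method, request.path, request.params)
```

Keeping the plan as data makes the ordering explicit: the bridge has to exist before either channel is added, and the RTP port has to be allocated before the ExternalMedia channel is created.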
This project uses uv to manage Python environments and pinned dependencies.
# install uv
pip install uv
uv --version
# create a venv in .venv
uv venv --prompt open-telephony-stack
# activate env
source .venv/bin/activate # or use `uv run` without activating
# install exactly what lockfile says (if uv.lock exists)
uv sync
# dev install (editable mode, so `import _` from src/ works live)
uv pip install -e .
- `pyproject.toml`: Project metadata + declared dependencies.
- `uv.lock`: Fully pinned versions for reproducible installs (generated by `uv lock`). Do not manually edit.
- `src/`: Source package (open-telephony-stack).
# run shim server in env
uv run uvicorn servers.asterisk_shim_server:app --reload
Add new library items to dependencies in pyproject.toml, and then:
# 1. edit pyproject.toml
uv lock
# 2. regenerate + sync
uv sync
Or, if you just want uv.lock to use the latest version of dependencies (including submodules):
uv lock --upgrade --no-cache
Docker builds also install from pyproject.toml for consistency.
- Inbound Call: PSTN → AWS Chime SIP → Asterisk (via SIP/TLS)
- ARI Events: Asterisk notifies Shim Server via ARI WebSocket
- Bridge Setup: Shim creates mixing bridge, ExternalMedia channel
- RTP Streaming: Asterisk ↔ Shim Server (μ-law RTP on UDP)
- WebSocket Bridge: Shim ↔ Voice Server (WSS to internal ALB)
- AI Processing: Voice Server ↔ AI Voice Agent Server
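The RTP hop in this flow carries one 20 ms μ-law frame per packet behind a 12-byte RTP header. As a rough illustration of what packing and stripping that header involves, here is a minimal sketch based on the RTP fixed-header layout from RFC 3550. The functions `rtp_pack` and `rtp_strip_header` are hypothetical stand-ins written for this example; they are not the helpers from this repository.

```python
import struct

RTP_VERSION = 2
RTP_FIXED_HEADER_BYTES = 12
PCMU_PAYLOAD_TYPE = 0  # static payload type for G.711 μ-law (RFC 3551)


def rtp_pack(payload: bytes, seq: int, timestamp: int, ssrc: int,
             marker: bool = False) -> bytes:
    """Prepend a fixed RTP header (no padding, no extension, no CSRCs)."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | PCMU_PAYLOAD_TYPE
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def rtp_strip_header(datagram: bytes) -> bytes:
    """Return the payload, skipping CSRC entries, a header extension, and padding."""
    if len(datagram) < RTP_FIXED_HEADER_BYTES:
        raise ValueError("datagram shorter than the fixed RTP header")
    first = datagram[0]
    if first >> 6 != RTP_VERSION:
        raise ValueError("not an RTP version 2 packet")
    offset = RTP_FIXED_HEADER_BYTES + 4 * (first & 0x0F)  # CSRC count
    if first & 0x10:  # extension bit set
        if len(datagram) < offset + 4:
            raise ValueError("truncated RTP header extension")
        (ext_words,) = struct.unpack_from("!H", datagram, offset + 2)
        offset += 4 + 4 * ext_words
    # With the padding bit set, the last byte holds the padding length.
    end = len(datagram) - (datagram[-1] if first & 0x20 else 0)
    return datagram[offset:end]


silence = bytes([0xFF]) * 160  # one 20 ms frame of μ-law silence
packet = rtp_pack(silence, seq=1, timestamp=160, ssrc=0x1234, marker=True)
print(len(packet))  # 12-byte header + 160-byte payload
```

For consecutive frames a sender would typically increment `seq` by 1 and `timestamp` by 160 (the number of samples per frame at 8000 Hz).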
1. Asterisk PBX (deployment/asterisk-server/)
The core telephony engine running in Docker:
- SIP/TLS termination with AWS Chime SDK
- Dialplan routing for inbound/outbound calls
- ARI (Asterisk REST Interface) for programmatic control
- ExternalMedia channels for RTP bridging
- Auto-renewing Let's Encrypt TLS certificates
Config Files:
- `pjsip.conf` - SIP trunk configuration for AWS Chime
- `extensions.conf` - Call routing dialplan
- `ari.conf` - REST API credentials
- `http.conf` - HTTP server for ARI
- `modules.conf` - Loaded Asterisk modules
2. Shim Server (src/servers/asterisk_shim_server.py)
FastAPI application that bridges Asterisk to your voice AI service:
- ARI Supervisor - Manages WebSocket connection to Asterisk
- CallSession - Per-call state machine handling RTP ↔ WSS
- RTP Pacer - Maintains perfect 20ms cadence for audio
- Health endpoints - Monitor active calls and system status
3. Call Session Manager (src/ari/call_session.py)
The heart of the real-time audio processing:
- RTP socket management - Allocates ports from configurable range
- ExternalMedia lifecycle - Creates/destroys ARI channels
- Bidirectional audio - Concurrent RTP → WSS and WSS → RTP loops
- Buffer management - Queues audio for smooth 20ms frame delivery
- Mark/ACK handling - Supports interruption via Twilio-compatible events
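To make the port allocation idea concrete, here is a minimal sketch of a pool that hands out one UDP port per live call and takes it back on cleanup. `RtpPortPool` is a hypothetical class written for illustration; the defaults mirror the 10000-10299 range used elsewhere in this README. A real implementation would likely also bind the socket to confirm the port is actually free on the host.

```python
class RtpPortPool:
    """Tracks which ports in an inclusive range are free or assigned to a call."""

    def __init__(self, start: int = 10000, end: int = 10299):
        if start > end:
            raise ValueError("start must not exceed end")
        # Stored in descending order so pop() hands out the lowest port first.
        self._free = list(range(end, start - 1, -1))
        self._in_use = set()

    def acquire(self) -> int:
        """Reserve a port for a new call, or fail if the range is exhausted."""
        if not self._free:
            raise RuntimeError("no free RTP ports; the concurrent call limit is reached")
        port = self._free.pop()
        self._in_use.add(port)
        return port

    def release(self, port: int) -> None:
        """Return a port to the pool; releasing an unknown port is a no-op."""
        if port in self._in_use:
            self._in_use.remove(port)
            self._free.append(port)

    @property
    def available(self) -> int:
        return len(self._free)


pool = RtpPortPool()
port = pool.acquire()   # reserved for one call
pool.release(port)      # returned when the call ends
```

Because each call holds exactly one port, the size of the range is also the ceiling on concurrent calls for one shim instance.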
4. ARI Supervisor (src/ari/asterisk_ari_supervisor.py)
Manages all active calls:
- WebSocket lifecycle - Auto-reconnect on disconnection
- StasisStart events - Spawns CallSession for new calls
- Cleanup orchestration - Graceful shutdown on hangup
- Health monitoring - Tracks active sessions
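The supervisor's job can be pictured as a small dispatcher over ARI events. The sketch below is an illustration under stated assumptions, not the repository's code: `SessionRegistry` is a hypothetical class, a plain dict stands in for a real CallSession, and it assumes ExternalMedia channels can be recognized by a channel name starting with `UnicastRTP/` so that their own `StasisStart` events do not spawn a second session.

```python
class SessionRegistry:
    """Maps ARI channel IDs to per-call state and reacts to ARI events."""

    def __init__(self):
        self.sessions = {}

    def handle_event(self, event: dict) -> str:
        channel = event.get("channel") or {}
        channel_id = channel.get("id")
        if not channel_id:
            return "ignored"
        kind = event.get("type")
        if kind == "StasisStart":
            # Assumption: ExternalMedia legs are named UnicastRTP/...; skip them.
            if channel.get("name", "").startswith("UnicastRTP/"):
                return "ignored"
            if channel_id in self.sessions:
                return "ignored"
            self.sessions[channel_id] = {"state": "active"}
            return "started"
        if kind in ("ChannelHangupRequest", "ChannelDestroyed"):
            # pop() makes cleanup idempotent when both hangup events arrive.
            if self.sessions.pop(channel_id, None) is not None:
                return "cleaned_up"
            return "ignored"
        return "ignored"


registry = SessionRegistry()
print(registry.handle_event(
    {"type": "StasisStart", "channel": {"id": "chan-1", "name": "PJSIP/chime-in-00000001"}}
))
print(registry.handle_event({"type": "ChannelDestroyed", "channel": {"id": "chan-1"}}))
```

Treating cleanup as idempotent matters because a single hangup can produce both `ChannelHangupRequest` and `ChannelDestroyed` for the same channel.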
5. Voice Agent Server (src/servers/voice_agent_server.py)
An example FastAPI application that bridges the shim server to OpenAI Realtime API:
- WebSocket endpoint - Receives Twilio-compatible media streams from shim server
- OpenAI Realtime API integration - Connects to OpenAI for voice-to-voice AI conversations
- Call state management - Tracks call metadata, transcripts, and session state
- Tool call handling - Supports agent tools (e.g., `end_call`) for call control
- Interruption handling - Detects caller speech and truncates assistant audio
- Transcript logging - Captures conversation transcripts for both user and assistant
- AWS Account with Chime SDK SIP Media Application
- EC2 Instance (t3.medium or larger, Amazon Linux 2023)
- Elastic IP assigned to EC2
- Domain name pointing to the Elastic IP
- AI Voice Agent Server (external service that handles voice AI processing)
- Docker and Docker Compose installed
- Create a SIP Media Application in AWS Chime console
- Provision a phone number and associate it with the SIP Media Application
- Configure outbound calling hostnames (your Asterisk server domain)
- Note your Chime outbound hostname (e.g., `+1XXXXXXXXXX.voiceconnector.chime.aws`)
Before setting up TLS certificates, you need to configure DNS so that AWS Chime can resolve your Asterisk server's hostname. Create an A record pointing your SIP subdomain to your EC2 instance's Elastic IP:
| Record Type | Name | Value | TTL |
|---|---|---|---|
| A | sip.yourdomain.com | Your Elastic IP (e.g., 54.123.45.67) | 300 (or default) |
This DNS record must be in place before:
- Requesting Let's Encrypt certificates (Certbot validates domain ownership)
- Configuring AWS Chime Voice Connector termination (Chime needs to resolve the hostname)
- Setting `external_signaling_address` in `pjsip.conf` (must match the DNS name)
After creating the record, wait for DNS propagation (usually a few minutes, but can take up to 48 hours depending on TTL). You can verify with:
dig sip.yourdomain.com
# or
nslookup sip.yourdomain.com
cd deployment/asterisk-server
# Install Docker & Docker Compose (if needed)
sudo yum -y install docker
sudo systemctl start docker && sudo systemctl enable docker
sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
# Start Asterisk
docker-compose up -d
# View logs
docker logs -f asterisk-server
# Access Asterisk CLI
docker exec -it asterisk-server asterisk -rvvvvv
AWS Chime requires TLS for SIP. Use Let's Encrypt:
# Install certbot
sudo yum install -y certbot
# Get certificate (ensure port 80 is open)
sudo certbot certonly --standalone \
--preferred-challenges http \
-d sip.yourdomain.com \
--agree-tos -m your@email.com
# Enable auto-renewal
sudo systemctl enable --now certbot-renew.timer
# Set up deploy hook to reload Asterisk
sudo mkdir -p /etc/letsencrypt/renewal-hooks/deploy
sudo tee /etc/letsencrypt/renewal-hooks/deploy/reload-asterisk.sh > /dev/null <<'EOF'
#!/bin/sh
docker exec asterisk-server asterisk -rx "core reload"
EOF
sudo chmod +x /etc/letsencrypt/renewal-hooks/deploy/reload-asterisk.sh
Edit deployment/asterisk-server/asterisk-config/pjsip.conf:
[chime-out]
type=endpoint
context=from-pstn
disallow=all
allow=ulaw
aors=chime-out
outbound_auth=chime-out
media_encryption=sdes
; ... (see full config in file)
[chime-in]
type=endpoint
context=from-pstn
disallow=all
allow=ulaw
; ... (see full config in file)
Edit deployment/asterisk-server/asterisk-config/extensions.conf:
[from-pstn]
exten => s,1,Answer()
 same => n,Stasis(voice-agent)
 same => n,Hangup()
# Set environment variables
cp .env.example .env
# Edit .env with your values:
# - ARI_BASE=http://127.0.0.1:8088/ari
# - ARI_USER=ariuser
# - ARI_PASS=your_password
# - ECS_MEDIA_WSS_URL=wss://your-voice-server.com/voice/voice
# - EXTERNAL_MEDIA_HOST=127.0.0.1
# - RTP_PORT_START=10000
# - RTP_PORT_END=10299
# Build Docker image
docker build -t asterisk-shim -f deployment/shim-server/Dockerfile .
# Run shim server (host networking for RTP)
docker run -d --env-file .env --network host --name asterisk-shim asterisk-shim
# View logs
docker logs -f asterisk-shim
Call your AWS Chime phone number. You should see:
- Asterisk logs: Inbound SIP INVITE
- Shim logs: CallSession started, RTP sockets allocated
- Voice Server logs: WebSocket connection established
- AI Voice Agent Server: Session created and processing audio
# Asterisk ARI Configuration
ARI_BASE=http://127.0.0.1:8088/ari # Asterisk HTTP API endpoint
ARI_USER=ariuser # ARI username (matches ari.conf)
ARI_PASS=your_secure_password # ARI password (matches ari.conf)
ARI_APP=voice-agent # Stasis app name (matches extensions.conf)
# RTP Configuration
EXTERNAL_MEDIA_HOST=127.0.0.1 # IP where shim binds RTP sockets
RTP_PORT_START=10000 # Start of RTP port range
RTP_PORT_END=10299 # End of RTP port range (300 ports = 300 concurrent calls)
# Voice Server Integration
ECS_MEDIA_WSS_URL=wss://voice.internal.yourdomain.com/voice/voice # Internal ALB endpoint
pjsip.conf - SIP trunking to AWS Chime:
- Transport TLS on port 5061
- Outbound/inbound endpoints for Chime
- Media encryption (SDES)
- Authentication credentials
extensions.conf - Dialplan routing:
- `from-pstn` context for inbound calls
- Answer → Stasis(voice-agent) → Hangup
- Outbound calling via PJSIP/+1XXXXXXXXXX@chime-out
ari.conf - REST API access:
- Username/password for shim server
- Allowed origins (CORS)
http.conf - HTTP server settings:
- Bind address 0.0.0.0:8088
- Enable HTTP for ARI
The WebSocket API is modeled after Twilio's Media Streams. If you've integrated with Twilio before, this will look familiar: same event structure, same audio format.
{
  "event": "start",
  "start": {
    "streamSid": "unique-stream-id",
    "callSid": "asterisk-channel-id",
    "customParameters": {
      "source": "asterisk-shim",
      "format": "ulaw"
    }
  }
}

{
  "event": "media",
  "streamSid": "unique-stream-id",
  "media": {
    "payload": "base64-encoded-ulaw-audio",
    "timestamp": 1234
  }
}

Audio specs:
- Format: μ-law (PCMU)
- Sample rate: 8000 Hz
- Frame size: 160 bytes (20ms)
- Encoding: Base64
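The frame size follows directly from the other two numbers: 8000 samples per second × 0.020 s = 160 samples, and μ-law uses one byte per sample. A minimal sketch of building and decoding a `media` event under these specs is shown here; `build_media_event` and `decode_media_event` are hypothetical helper names, and the event shape is the one documented above.

```python
import base64
import json

SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
# 8000 samples/s * 20 ms / 1000 = 160 samples; μ-law is 1 byte per sample.
FRAME_BYTES = SAMPLE_RATE_HZ * FRAME_MS // 1000


def build_media_event(stream_sid: str, frame: bytes, timestamp_ms: int) -> str:
    """Wrap one 20 ms μ-law frame in a Twilio-style media event."""
    if len(frame) != FRAME_BYTES:
        raise ValueError(f"expected {FRAME_BYTES} bytes, got {len(frame)}")
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {
            "payload": base64.b64encode(frame).decode("ascii"),
            "timestamp": timestamp_ms,
        },
    })


def decode_media_event(message: str) -> bytes:
    """Extract the raw μ-law bytes from a media event."""
    event = json.loads(message)
    if event.get("event") != "media":
        raise ValueError("not a media event")
    return base64.b64decode(event["media"]["payload"])


silence = bytes([0xFF]) * FRAME_BYTES
message = build_media_event("unique-stream-id", silence, 20)
assert decode_media_event(message) == silence
```

Base64 expands every 3 bytes into 4 characters, so each 160-byte frame becomes a 216-character payload string.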
{
  "event": "clear"
}
Clears the audio buffer immediately. Used for barge-in / interruption handling.
{
  "event": "mark",
  "streamSid": "unique-stream-id",
  "mark": {"name": "responsePart"}
}
Used for tracking audio playback position. The shim ACKs marks when the corresponding audio has actually been transmitted.
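One way to picture "ACK when the audio has actually been transmitted" is a single outbound queue that holds both audio frames and marks in arrival order. The sketch below is an assumption about how such tracking could work, not the shim's actual implementation; `OutboundQueue` is a hypothetical class. A mark becomes due once every frame queued ahead of it has been handed to the sender.

```python
from collections import deque
from typing import List, Optional, Tuple


class OutboundQueue:
    """Holds audio frames and marks in the order the voice server sent them."""

    def __init__(self):
        self._items = deque()  # entries are ("audio", bytes) or ("mark", str)

    def push_audio(self, frame: bytes) -> None:
        self._items.append(("audio", frame))

    def push_mark(self, name: str) -> None:
        self._items.append(("mark", name))

    def _drain_marks(self, due: List[str]) -> None:
        # Marks at the head of the queue have no unsent audio ahead of them.
        while self._items and self._items[0][0] == "mark":
            due.append(self._items.popleft()[1])

    def next_frame(self) -> Tuple[Optional[bytes], List[str]]:
        """Called once per 20 ms tick: returns (frame or None, marks to ACK)."""
        due: List[str] = []
        self._drain_marks(due)
        frame = None
        if self._items:
            frame = self._items.popleft()[1]
            self._drain_marks(due)
        return frame, due


queue = OutboundQueue()
queue.push_audio(b"\xff" * 160)
queue.push_mark("responsePart")
frame, marks_to_ack = queue.next_frame()  # marks_to_ack == ["responsePart"]
```

With this arrangement the voice server can tell how much of a response the caller actually heard before an interruption, because a mark is only acknowledged after the audio in front of it has left the queue.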
{
  "event": "stop",
  "streamSid": "unique-stream-id"
}
Asterisk is mature, stable, and handles SIP/RTP at scale. But it doesn't natively support AI voice interactions. By using ARI (Asterisk REST Interface) and ExternalMedia channels, we can:
- Keep Asterisk doing what it does best (SIP, RTP, call routing)
- Bridge to modern WebSocket-based AI services
- Maintain frame-perfect 20ms audio cadence
- Handle interruptions gracefully
The shim server maintains two concurrent audio loops per call:
Loop 1: RTP → WebSocket (Caller Audio)
while not closed:
    # Receive 20ms μ-law frame from Asterisk
    # (recvfrom yields the datagram and the sender address)
    datagram, _addr = await sock.recvfrom(2048)
    payload = rtp_strip_header(datagram)
    # Forward to voice server via WebSocket
    await ecs_ws.send(json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {
            "payload": base64.b64encode(payload).decode(),
            "timestamp": timestamp_ms
        }
    }))
Loop 2: WebSocket → RTP (AI Audio)
# Separate pacer task sends RTP every 20ms
async def rtp_pacer():
    while not closed:
        await asyncio.sleep(0.02)  # 20ms tick
        # Pull frame from buffer (or send silence)
        if buffer:
            frame = buffer.pop(Config.FRAME_BYTES)
        else:
            frame = bytes([0xFF]) * Config.FRAME_BYTES  # μ-law silence
        rtp_send(frame, marker=is_first_frame)
This keeps outbound audio on a steady 20ms cadence: Asterisk expects a frame every 20ms, regardless of network jitter on the WebSocket connection.
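One caveat with sleeping a fixed 20 ms per iteration is that the time spent doing work inside each tick is added on top of the sleep, so the cadence can slowly drift. A common way to avoid this is to compute each tick's deadline from the start time instead. The following is a minimal sketch of that idea, not the repository's pacer; `tick_deadline` and `run_paced` are hypothetical names.

```python
import asyncio
from typing import Callable

FRAME_SECONDS = 0.02  # 20 ms


def tick_deadline(start: float, tick_index: int,
                  interval: float = FRAME_SECONDS) -> float:
    """Absolute time of the Nth tick, measured from start so errors do not accumulate."""
    return start + tick_index * interval


async def run_paced(n_ticks: int, on_tick: Callable[[int], None],
                    interval: float = FRAME_SECONDS) -> None:
    """Call on_tick(1..n_ticks), sleeping until each tick's absolute deadline."""
    loop = asyncio.get_running_loop()
    start = loop.time()  # monotonic clock of the event loop
    for i in range(1, n_ticks + 1):
        delay = tick_deadline(start, i, interval) - loop.time()
        if delay > 0:
            await asyncio.sleep(delay)
        # If we are already late, send immediately and catch up on later ticks.
        on_tick(i)


sent = []
asyncio.run(run_paced(5, sent.append, interval=0.001))
assert sent == [1, 2, 3, 4, 5]
```

In a real pacer, `on_tick` would pull a frame from the buffer (or substitute silence) and send it as RTP, as in the loop above.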
Asterisk's ExternalMedia channel type creates a client-mode RTP connection:
- Shim allocates UDP port (e.g., 10000)
- Creates ExternalMedia channel pointing to `127.0.0.1:10000`
- Asterisk sends RTP to that socket
- Asterisk exposes `UNICASTRTP_LOCAL_ADDRESS` and `UNICASTRTP_LOCAL_PORT` variables
- Shim queries these variables to discover where to send return audio
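The last step, querying the two channel variables, can be sketched with ARI's channel variable endpoint (`GET /channels/{channelId}/variable`), which returns a JSON object with a `value` field. The helper names below (`variable_request`, `parse_return_address`) are hypothetical, and the functions only build the request and parse the responses so that no running Asterisk is needed to follow along.

```python
from typing import Tuple


def variable_request(channel_id: str, name: str) -> Tuple[str, str, dict]:
    """Describe the ARI call that reads one channel variable."""
    return ("GET", f"/channels/{channel_id}/variable", {"variable": name})


def parse_return_address(address_response: dict, port_response: dict) -> Tuple[str, int]:
    """Combine the two variable responses into a (host, port) pair for return RTP."""
    host = address_response["value"]
    port = int(port_response["value"])
    if not host:
        raise ValueError("UNICASTRTP_LOCAL_ADDRESS is empty")
    if not 0 < port < 65536:
        raise ValueError(f"invalid RTP port: {port}")
    return host, port


# The channel ID and response values here are placeholders for illustration.
requests = [
    variable_request("media-1", "UNICASTRTP_LOCAL_ADDRESS"),
    variable_request("media-1", "UNICASTRTP_LOCAL_PORT"),
]
destination = parse_return_address({"value": "127.0.0.1"}, {"value": "18234"})
```

The resulting pair is where the shim would address the AI audio it sends back toward the caller.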
Each call uses an ARI mixing bridge:
PSTN Channel ────┐
                 │
                 ├──► Mixing Bridge ──► Mixed Audio
                 │
ExternalMedia────┘
This allows future enhancements like:
- Conference calling
- Music on hold
- Call recording
- Call transfer
Vertical Scaling:
- Each CallSession uses ~5-10MB RAM
- t3.medium handles 50+ concurrent calls
- t3.large handles 200+ concurrent calls
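For a rough sense of the network side of these numbers, the per-call RTP bandwidth can be estimated from the audio specs. This is a back-of-the-envelope sketch that assumes plain RTP over IPv4 and ignores SRTP authentication tags, Ethernet framing, and the WebSocket leg; the function name is hypothetical.

```python
PAYLOAD_BYTES = 160       # one 20 ms μ-law frame
RTP_HEADER_BYTES = 12
UDP_HEADER_BYTES = 8
IPV4_HEADER_BYTES = 20
PACKETS_PER_SECOND = 50   # 1000 ms / 20 ms per frame


def rtp_kbps_per_direction(calls: int = 1) -> float:
    """Estimated RTP bandwidth in kbit/s for one direction of the given number of calls."""
    bytes_per_packet = (PAYLOAD_BYTES + RTP_HEADER_BYTES
                        + UDP_HEADER_BYTES + IPV4_HEADER_BYTES)
    return calls * bytes_per_packet * PACKETS_PER_SECOND * 8 / 1000


print(rtp_kbps_per_direction(1))    # 80.0
print(rtp_kbps_per_direction(50))   # 4000.0
```

At 80 kbit/s per direction per call, 50 concurrent calls come to about 4 Mbit/s each way on the RTP leg.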
Horizontal Scaling:
- Run multiple Asterisk+Shim instances
- Use AWS Chime load balancing across SIP endpoints
- Share nothing architecture (each instance independent)
TLS Everywhere:
- SIP over TLS (port 5061) to AWS Chime
- WSS (WebSocket Secure) to voice server
- Let's Encrypt auto-renewal
Firewall Rules (Security Groups):
| Port | Protocol | Source | Description |
|---|---|---|---|
| 22 | TCP | Your IP | SSH |
| 80 | TCP | 0.0.0.0/0 | Let's Encrypt ACME challenge |
| 5061 | TCP | AWS Chime IPs | SIP/TLS |
| 10000-10299 | UDP | AWS Chime IPs | RTP media |
The repo includes a Lambda function that automatically updates your security group when AWS publishes new IP ranges for AMAZON, EC2, and CHIME_VOICECONNECTOR services.
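The core of such an update is filtering AWS's published `ip-ranges.json` by service. The sketch below shows only that filtering step under stated assumptions: `cidrs_for_services` is a hypothetical function, not the Lambda in this repository, and the sample data uses reserved documentation CIDRs rather than real AWS ranges. The `prefixes`, `ip_prefix`, `region`, and `service` keys follow the published file format.

```python
from typing import Optional


def cidrs_for_services(ip_ranges: dict, services: set,
                       region: Optional[str] = None) -> list:
    """Return sorted, de-duplicated IPv4 CIDRs for the given AWS service names."""
    cidrs = {
        prefix["ip_prefix"]
        for prefix in ip_ranges.get("prefixes", [])
        if prefix.get("service") in services
        and (region is None or prefix.get("region") == region)
    }
    return sorted(cidrs)


# Placeholder data: these CIDRs are documentation ranges, not real AWS prefixes.
SAMPLE_IP_RANGES = {
    "prefixes": [
        {"ip_prefix": "192.0.2.0/24", "region": "us-east-1",
         "service": "CHIME_VOICECONNECTOR", "network_border_group": "us-east-1"},
        {"ip_prefix": "198.51.100.0/24", "region": "us-west-2",
         "service": "CHIME_VOICECONNECTOR", "network_border_group": "us-west-2"},
        {"ip_prefix": "203.0.113.0/24", "region": "us-east-1",
         "service": "S3", "network_border_group": "us-east-1"},
    ]
}

allowed = cidrs_for_services(SAMPLE_IP_RANGES, {"CHIME_VOICECONNECTOR"}, region="us-east-1")
print(allowed)  # ['192.0.2.0/24']
```

The resulting list would then be compared against the security group's current ingress rules for ports 5061 and 10000-10299, adding and revoking entries as needed.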
Credentials:
- ARI username/password in environment variables
- Never hardcode in config files
- Rotate regularly
Health Endpoints:
# Shim server health
curl http://localhost:8080/health
# Returns:
{
  "status": "ok",
  "supervisor_task_status": "running",
  "config": { ... },
  "running": true,
  "active_sessions": 5,
  "active_channels": ["channel-id-1", "channel-id-2", ...]
}
Asterisk CLI:
docker exec -it asterisk-server asterisk -rx 'core show channels'
docker exec -it asterisk-server asterisk -rx 'pjsip show endpoints'
docker exec -it asterisk-server asterisk -rx 'ari show apps'
Logs:
# Asterisk logs
docker logs -f asterisk-server
# Shim server logs
docker logs -f asterisk-shim
# Enable verbose Asterisk logging
docker exec -it asterisk-server asterisk -rx 'core set verbose 10'
docker exec -it asterisk-server asterisk -rx 'pjsip set logger on'
Check Asterisk SIP registration:
docker exec -it asterisk-server asterisk -rx 'pjsip show endpoints'
Check ARI connectivity:
# Quote the URL so the shell does not treat "?" as a glob character
curl "http://127.0.0.1:8088/ari/asterisk/info?api_key=ariuser:your-password"
Check shim server status:
curl http://localhost:8080/health
Check RTP ports are open:
# Matches ports 10000-10299
sudo netstat -tulpn | grep ':10[0-2][0-9][0-9]'
Check ExternalMedia channel:
docker exec -it asterisk-server asterisk -rx 'core show channels'
# Should see UnicastRTP/ channel
Enable RTP debugging:
docker exec -it asterisk-server asterisk -rx 'rtp set debug on'
Check network to AWS Chime:
ping $(dig +short [your-chime-hostname])
Check CPU usage:
top -b -n 1 | grep asterisk
Reduce concurrent calls if CPU usage stays above 80%.
.
├── LICENSE
├── README.md
├── aws_lambda
│   └── update_telephony_vm_sg.py
├── deployment
│   ├── asterisk-server
│   │   ├── README.md
│   │   ├── asterisk-config
│   │   │   ├── ari.conf
│   │   │   ├── asterisk.conf
│   │   │   ├── extensions.conf
│   │   │   ├── http.conf
│   │   │   ├── logger.conf
│   │   │   ├── modules.conf
│   │   │   ├── pjsip.conf
│   │   │   └── rtp.conf
│   │   └── docker-compose.yml
│   ├── shim-server
│   │   └── Dockerfile
│   └── voice-agent-server
│       └── Dockerfile
├── images
│   └── system-architecture.png
├── pyproject.toml
├── src
│   ├── __init__.py
│   ├── ari
│   │   ├── __init__.py
│   │   ├── asterisk_ari_supervisor.py
│   │   └── call_session.py
│   ├── env_config.py
│   ├── servers
│   │   ├── __init__.py
│   │   ├── asterisk_shim_server.py
│   │   └── voice_agent_server.py
│   └── utils
│       ├── logger.py
│       └── telephony_utils.py
└── uv.lock
