HIPAA-eligible DIY Twilio alternative

For more context, see the blog release post: https://vectorly.app/blog/open-telephony-stack

This repository contains the complete infrastructure for building a HIPAA-eligible phone system using Asterisk and AWS Chime SDK SIP Media Application.

This is a production-ready alternative to Twilio that gives you full control over your telephony infrastructure while maintaining HIPAA eligibility. It includes a sample voice agent server for integrating with AI voice services (e.g., OpenAI Realtime API).

Why this matters

Twilio has limitations for healthcare applications:

Limited control over media routing
Complex HIPAA BAA requirements
$2,000/mo charge for HIPAA-compliant services
Media Streams only available on select plans

This solution provides:

HIPAA-eligible solution - You control all infrastructure
Complete control - Own your PBX, customize everything
Production-ready - Battle-tested in real healthcare applications
Scalable - Each live call gets its own RTP port; the default range (10000-10299) allows up to 300 concurrent calls per instance

What this is

A complete and secure telephony system built to handle both inbound and outbound calls:

Receives calls via AWS Chime Voice Connector (you get a real phone number)
Terminates SIP/TLS on Asterisk running in Docker
Bridges the audio via RTP to a WebSocket connection
Streams base64 μ-law audio to your AI voice server
Twilio-like API (the WebSocket interface is modeled after Twilio's Media Streams API)

You bring your own AI. This just handles the phone infrastructure.

Who this is for

Use case examples:

Building voice AI in healthcare and need HIPAA compliance without Twilio's BAA costs
Customizing call handling in ways Twilio doesn't allow
Wanting full control over your telephony stack
Learning how telephony infrastructure works and building a VoIP stack from scratch

Consider alternatives if:

You just need basic voice for a side project (Twilio is easier)
You don't want to manage infrastructure
You don't have any special compliance needs

Infrastructure requires time and maintenance. That's the trade-off.

Architecture

The system routes calls from AWS Chime Voice Connector through an Asterisk PBX server, which bridges RTP audio streams to a FastAPI shim server. The shim server converts the audio to WebSocket events compatible with Twilio's Media Streams API, allowing seamless integration with your voice AI services.

Port reference

Service	Port	Protocol	Description
Asterisk SIP	5061	TCP/TLS	SIP signaling with AWS Chime
Asterisk ARI	8088	HTTP	Asterisk REST Interface (localhost only)
Shim server	8080	HTTP	FastAPI server, health endpoints
RTP media	10000-10299	UDP	Audio streams to/from Asterisk

Call flow

Here's what happens when someone calls your number:

Caller dials your AWS Chime phone number
Chime sends SIP INVITE to your Asterisk server (TLS:5061)
Asterisk matches the call in extensions.conf
- Answer()
- Stasis(voice-agent)
ARI sends StasisStart event to shim server via WebSocket
Shim server:
- Opens WebSocket to your voice server
- Creates ARI mixing bridge
- Adds PSTN channel to bridge
- Allocates UDP port for RTP (10000-10299); each live call gets its own port
- Creates ExternalMedia channel pointing to that port
- Adds ExternalMedia channel to bridge
Audio flows: PSTN ↔ Bridge ↔ ExternalMedia ↔ Shim (RTP) ↔ Voice Server (WSS)
Caller hangs up (or AI ends call via a tool call)
ARI sends ChannelHangupRequest / ChannelDestroyed
Shim cleans up: closes WebSocket, deletes bridge, releases port
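
The port bookkeeping in this flow (allocate one UDP port per live call, release it during cleanup) can be sketched as a small pool. This is a minimal illustration, not the repository's implementation: `RtpPortPool` and its method names are hypothetical, and the default range simply mirrors the 10000-10299 range from the port reference.

```python
class RtpPortPool:
    """Hands out one UDP port per live call from an inclusive range.

    Hypothetical helper for illustration; the real shim may track ports differently.
    """

    def __init__(self, start: int = 10000, end: int = 10299) -> None:
        if start > end:
            raise ValueError("start must not exceed end")
        # Stored in descending order so that pop() returns the lowest free port.
        self._free = list(range(end, start - 1, -1))
        self._in_use = set()

    @property
    def capacity(self) -> int:
        """Total number of ports, i.e. the ceiling on concurrent calls."""
        return len(self._free) + len(self._in_use)

    def acquire(self) -> int:
        """Reserve a port for a new call, or fail if every port is busy."""
        if not self._free:
            raise RuntimeError("no free RTP ports; call capacity reached")
        port = self._free.pop()
        self._in_use.add(port)
        return port

    def release(self, port: int) -> None:
        """Return a port to the pool; releasing an unknown port is a no-op."""
        if port in self._in_use:
            self._in_use.remove(port)
            self._free.append(port)
```

Pairing `acquire()` with `release()` in a `try`/`finally` around the call's lifetime keeps a crashed session from leaking its port.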

Environment and dependencies with `uv`

This project uses uv to manage Python environments and pinned dependencies.

Setup

# install uv
pip install uv
uv --version

# create a venv in .venv
uv venv --prompt open-telephony-stack

# activate env
source .venv/bin/activate  # or use `uv run` without activating

# install exactly what lockfile says (if uv.lock exists)
uv sync

# dev install (editable mode, so `import _` from src/ works live)
uv pip install -e .

Files

pyproject.toml: Project metadata + declared dependencies.
uv.lock: Fully pinned versions for reproducible installs (generated by uv lock). Do not manually edit.
src/: Source package (open-telephony-stack).

Common commands

# run shim server in env (uvicorn defaults to port 8000; 8080 matches the port reference)
uv run uvicorn servers.asterisk_shim_server:app --reload --port 8080

Updating dependencies

Add the new packages to dependencies in pyproject.toml, then:

# 1. regenerate uv.lock from the edited pyproject.toml
uv lock
# 2. sync the environment to the new lockfile
uv sync

Or, if you just want uv.lock to pick up the latest versions of all dependencies (including transitive ones):

uv lock --upgrade --no-cache

Docker builds also install from pyproject.toml for consistency.

Architecture overview

Signal flow

Inbound Call: PSTN → AWS Chime SIP → Asterisk (via SIP/TLS)
ARI Events: Asterisk notifies Shim Server via ARI WebSocket
Bridge Setup: Shim creates mixing bridge, ExternalMedia channel
RTP Streaming: Asterisk ↔ Shim Server (μ-law RTP on UDP)
WebSocket Bridge: Shim ↔ Voice Server (WSS to internal ALB)
AI Processing: Voice Server ↔ AI Voice Agent Server

Key components

1. Asterisk PBX (deployment/asterisk-server/)

The core telephony engine running in Docker:

SIP/TLS termination with AWS Chime SDK
Dialplan routing for inbound/outbound calls
ARI (Asterisk REST Interface) for programmatic control
ExternalMedia channels for RTP bridging
Auto-renewing Let's Encrypt TLS certificates

Config Files:

pjsip.conf - SIP trunk configuration for AWS Chime
extensions.conf - Call routing dialplan
ari.conf - REST API credentials
http.conf - HTTP server for ARI
modules.conf - Loaded Asterisk modules

2. Shim Server (src/servers/asterisk_shim_server.py)

FastAPI application that bridges Asterisk to your voice AI service:

ARI Supervisor - Manages WebSocket connection to Asterisk
CallSession - Per-call state machine handling RTP ↔ WSS
RTP Pacer - Sends outbound audio on a steady 20ms cadence
Health endpoints - Monitor active calls and system status

3. Call Session Manager (src/ari/call_session.py)

The heart of the real-time audio processing:

RTP socket management - Allocates ports from configurable range
ExternalMedia lifecycle - Creates/destroys ARI channels
Bidirectional audio - Concurrent RTP → WSS and WSS → RTP loops
Buffer management - Queues audio for smooth 20ms frame delivery
Mark/ACK handling - Supports interruption via Twilio-compatible events

4. ARI Supervisor (src/ari/asterisk_ari_supervisor.py)

Manages all active calls:

WebSocket lifecycle - Auto-reconnect on disconnection
StasisStart events - Spawns CallSession for new calls
Cleanup orchestration - Graceful shutdown on hangup
Health monitoring - Tracks active sessions
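
As a rough sketch of the supervisor's first two responsibilities, the snippet below builds the ARI events WebSocket URL and decides whether an incoming event should spawn a new session. The function names are hypothetical and this is not the repository's code. It assumes ExternalMedia channels also enter the Stasis app and show up with a `UnicastRTP/` name prefix, so they are skipped to avoid starting a session for the shim's own media leg.

```python
import json
import urllib.parse
from typing import Optional


def ari_events_url(ari_base: str, user: str, password: str, app: str) -> str:
    """Turn an ARI base URL (e.g. http://127.0.0.1:8088/ari) into its events WebSocket URL."""
    parts = urllib.parse.urlsplit(ari_base)
    scheme = "wss" if parts.scheme == "https" else "ws"
    query = urllib.parse.urlencode({"api_key": f"{user}:{password}", "app": app})
    path = parts.path.rstrip("/") + "/events"
    return urllib.parse.urlunsplit((scheme, parts.netloc, path, query, ""))


def new_call_channel_id(raw_event: str) -> Optional[str]:
    """Return the channel id if this event is a StasisStart for a caller channel."""
    event = json.loads(raw_event)
    if event.get("type") != "StasisStart":
        return None
    channel = event.get("channel", {})
    # Assumption: the shim's own ExternalMedia legs are named UnicastRTP/...
    if channel.get("name", "").startswith("UnicastRTP/"):
        return None
    return channel.get("id")
```

A reconnect loop around the WebSocket client of your choice would call `new_call_channel_id` on each message and create a session only when it returns an id.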

5. Voice Agent Server (src/servers/voice_agent_server.py)

An example FastAPI application that bridges the shim server to OpenAI Realtime API:

WebSocket endpoint - Receives Twilio-compatible media streams from shim server
OpenAI Realtime API integration - Connects to OpenAI for voice-to-voice AI conversations
Call state management - Tracks call metadata, transcripts, and session state
Tool call handling - Supports agent tools (e.g., end_call) for call control
Interruption handling - Detects caller speech and truncates assistant audio
Transcript logging - Captures conversation transcripts for both user and assistant

Quick start

Prerequisites

AWS Account with Chime SDK SIP Media Application
EC2 Instance (t3.medium or larger, Amazon Linux 2023)
Elastic IP assigned to EC2
Domain name pointing to the Elastic IP
AI Voice Agent Server (external service that handles voice AI processing)
Docker and Docker Compose installed

1. Set up AWS Chime SIP trunk

Create a SIP Media Application in AWS Chime console
Provision a phone number and associate it with the SIP Media Application
Configure outbound calling hostnames (your Asterisk server domain)
Note your Chime outbound hostname (e.g., +1XXXXXXXXXX.voiceconnector.chime.aws)

2. Configure DNS

Before setting up TLS certificates, you need to configure DNS so that AWS Chime can resolve your Asterisk server's hostname. Create an A record pointing your SIP subdomain to your EC2 instance's Elastic IP:

Record Type	Name	Value	TTL
A	`sip.yourdomain.com`	Your Elastic IP (e.g., `54.123.45.67`)	300 (or default)

This DNS record must be in place before:

Requesting Let's Encrypt certificates (Certbot validates domain ownership)
Configuring AWS Chime Voice Connector termination (Chime needs to resolve the hostname)
Setting external_signaling_address in pjsip.conf (must match the DNS name)

After creating the record, wait for DNS propagation (usually a few minutes, but can take up to 48 hours depending on TTL). You can verify with:

dig sip.yourdomain.com
# or
nslookup sip.yourdomain.com

3. Install Asterisk server

cd deployment/asterisk-server

# Install Docker & Docker Compose (if needed)
sudo yum -y install docker
sudo systemctl start docker && sudo systemctl enable docker
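# Compose v2 release assets use lowercase OS names (e.g. docker-compose-linux-x86_64)
sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m)" -o /usr/local/bin/docker-compose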
sudo chmod +x /usr/local/bin/docker-compose

# Start Asterisk
docker-compose up -d

# View logs
docker logs -f asterisk-server

# Access Asterisk CLI
docker exec -it asterisk-server asterisk -rvvvvv

4. Configure TLS certificates

AWS Chime requires TLS for SIP. Use Let's Encrypt:

# Install certbot
sudo yum install -y certbot

# Get certificate (ensure port 80 is open)
sudo certbot certonly --standalone \
  --preferred-challenges http \
  -d sip.yourdomain.com \
  --agree-tos -m your@email.com

# Enable auto-renewal
sudo systemctl enable --now certbot-renew.timer

# Set up deploy hook to reload Asterisk
sudo mkdir -p /etc/letsencrypt/renewal-hooks/deploy
sudo tee /etc/letsencrypt/renewal-hooks/deploy/reload-asterisk.sh > /dev/null <<'EOF'
#!/bin/sh
docker exec asterisk-server asterisk -rx "core reload"
EOF
sudo chmod +x /etc/letsencrypt/renewal-hooks/deploy/reload-asterisk.sh

5. Configure Asterisk

Edit deployment/asterisk-server/asterisk-config/pjsip.conf:

[chime-out]
type=endpoint
context=from-pstn
disallow=all
allow=ulaw
aors=chime-out
outbound_auth=chime-out
media_encryption=sdes
; ... (see full config in file)

[chime-in]
type=endpoint
context=from-pstn
disallow=all
allow=ulaw
; ... (see full config in file)

Edit deployment/asterisk-server/asterisk-config/extensions.conf:

[from-pstn]
exten => s,1,Answer()
 same => n,Stasis(voice-agent)
 same => n,Hangup()

6. Deploy shim server

# Set environment variables
cp .env.example .env
# Edit .env with your values:
# - ARI_BASE=http://127.0.0.1:8088/ari
# - ARI_USER=ariuser
# - ARI_PASS=your_password
# - ECS_MEDIA_WSS_URL=wss://your-voice-server.com/voice/voice
# - EXTERNAL_MEDIA_HOST=127.0.0.1
# - RTP_PORT_START=10000
# - RTP_PORT_END=10299

# Build Docker image
docker build -t asterisk-shim -f deployment/shim-server/Dockerfile .

# Run shim server (host networking for RTP)
docker run -d --env-file .env --network host --name asterisk-shim asterisk-shim

# View logs
docker logs -f asterisk-shim

7. Test the system

Call your AWS Chime phone number. You should see:

Asterisk logs: Inbound SIP INVITE
Shim logs: CallSession started, RTP sockets allocated
Voice Server logs: WebSocket connection established
AI Voice Agent Server: Session created and processing audio

Configuration reference

Environment variables

# Asterisk ARI Configuration
ARI_BASE=http://127.0.0.1:8088/ari      # Asterisk HTTP API endpoint
ARI_USER=ariuser                          # ARI username (matches ari.conf)
ARI_PASS=your_secure_password             # ARI password (matches ari.conf)
ARI_APP=voice-agent                       # Stasis app name (matches extensions.conf)

# RTP Configuration
EXTERNAL_MEDIA_HOST=127.0.0.1             # IP where shim binds RTP sockets
RTP_PORT_START=10000                      # Start of RTP port range
RTP_PORT_END=10299                        # End of RTP port range (300 ports = 300 concurrent calls)

# Voice Server Integration
ECS_MEDIA_WSS_URL=wss://voice.internal.yourdomain.com/voice/voice  # Internal ALB endpoint
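
One way to load and sanity-check these variables at startup is sketched below. `ShimConfig` and `load_config` are hypothetical names (the repository has its own `src/env_config.py`, which may look quite different), and treating the ARI credentials and WSS URL as required is an assumption made for the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ShimConfig:
    ari_base: str
    ari_user: str
    ari_pass: str
    ari_app: str
    external_media_host: str
    rtp_port_start: int
    rtp_port_end: int
    media_wss_url: str

    @property
    def max_concurrent_calls(self) -> int:
        # The range is inclusive and each call holds exactly one port.
        return self.rtp_port_end - self.rtp_port_start + 1


def load_config(env: Optional[Mapping[str, str]] = None) -> ShimConfig:
    """Build a ShimConfig from environment variables, failing fast on bad input."""
    env = os.environ if env is None else env
    required = ("ARI_USER", "ARI_PASS", "ECS_MEDIA_WSS_URL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    start = int(env.get("RTP_PORT_START", "10000"))
    end = int(env.get("RTP_PORT_END", "10299"))
    if not 0 < start <= end <= 65535:
        raise ValueError(f"invalid RTP port range {start}-{end}")
    return ShimConfig(
        ari_base=env.get("ARI_BASE", "http://127.0.0.1:8088/ari"),
        ari_user=env["ARI_USER"],
        ari_pass=env["ARI_PASS"],
        ari_app=env.get("ARI_APP", "voice-agent"),
        external_media_host=env.get("EXTERNAL_MEDIA_HOST", "127.0.0.1"),
        rtp_port_start=start,
        rtp_port_end=end,
        media_wss_url=env["ECS_MEDIA_WSS_URL"],
    )
```

With the defaults shown in this section, `load_config().max_concurrent_calls` works out to 300, matching the comment on `RTP_PORT_END`.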

Asterisk configuration files

pjsip.conf - SIP trunking to AWS Chime:

Transport TLS on port 5061
Outbound/inbound endpoints for Chime
Media encryption (SDES)
Authentication credentials

extensions.conf - Dialplan routing:

from-pstn context for inbound calls
Answer → Stasis(voice-agent) → Hangup
Outbound calling via PJSIP/+1XXXXXXXXXX@chime-out

ari.conf - REST API access:

Username/password for shim server
Allowed origins (CORS)

http.conf - HTTP server settings:

Bind address 0.0.0.0:8088 (port 8088 is not opened in the security group, so ARI is reachable only from the host)
Enable HTTP for ARI

Real-time audio format

The WebSocket API is modeled after Twilio's Media Streams. If you've integrated with Twilio before, this will look familiar: same event structure, same audio format.

Start event (shim → voice server)

{
    "event": "start",
    "start": {
        "streamSid": "unique-stream-id",
        "callSid": "asterisk-channel-id",
        "customParameters": {
            "source": "asterisk-shim",
            "format": "ulaw"
        }
    }
}

Media event (bidirectional)

{
    "event": "media",
    "streamSid": "unique-stream-id",
    "media": {
        "payload": "base64-encoded-ulaw-audio",
        "timestamp": 1234
    }
}

Audio specs:

Format: μ-law (PCMU)
Sample rate: 8000 Hz
Frame size: 160 bytes (20ms)
Encoding: Base64
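
The frame size follows directly from the other numbers: 8000 samples per second × 0.020 s × 1 byte per μ-law sample = 160 bytes. A short sketch of wrapping one such frame in a media event is shown below; `build_media_event` is a hypothetical helper, not a function from the repository.

```python
import base64
import json

SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
BYTES_PER_SAMPLE = 1  # mu-law encodes each sample in 8 bits
FRAME_BYTES = SAMPLE_RATE_HZ * FRAME_MS // 1000 * BYTES_PER_SAMPLE


def build_media_event(stream_sid: str, frame: bytes, timestamp_ms: int) -> str:
    """Serialize one 20 ms mu-law frame as a Twilio-style media event."""
    if len(frame) != FRAME_BYTES:
        raise ValueError(f"expected {FRAME_BYTES} bytes, got {len(frame)}")
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {
            "payload": base64.b64encode(frame).decode("ascii"),
            "timestamp": timestamp_ms,
        },
    })
```

Base64 expands every 3 bytes into 4 characters, so each 160-byte frame becomes a 216-character payload, and a call carries 50 such events per second in each direction.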

Clear event (voice server → shim)

{
    "event": "clear"
}

Clears the audio buffer immediately. Used for barge-in / interruption handling.

Mark event (bidirectional)

{
    "event": "mark",
    "streamSid": "unique-stream-id",
    "mark": {"name": "responsePart"}
}

Used for tracking audio playback position. The shim ACKs marks when the corresponding audio has actually been transmitted.
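
To make the interaction between media, mark, and clear concrete, here is a minimal sketch of a playout buffer that reports a mark only after all audio queued ahead of it has been handed to the pacer. `PlayoutBuffer` is a hypothetical class, not the repository's implementation. Returning the dropped mark names from `clear()` is an assumption borrowed from how Twilio's Media Streams behave; what the shim does with pending marks on clear is not stated here.

```python
from collections import deque

FRAME_BYTES = 160
SILENCE = b"\xff" * FRAME_BYTES  # mu-law silence


class PlayoutBuffer:
    def __init__(self) -> None:
        self._audio = bytearray()
        self._marks = deque()  # (byte offset at which the mark is due, name)
        self._queued = 0       # total audio bytes ever queued
        self._sent = 0         # total audio bytes handed to the pacer

    def push_audio(self, chunk: bytes) -> None:
        self._audio += chunk
        self._queued += len(chunk)

    def push_mark(self, name: str) -> None:
        # The mark is due once everything queued so far has been sent.
        self._marks.append((self._queued, name))

    def clear(self) -> list:
        """Drop pending audio (barge-in) and return the names of dropped marks."""
        dropped = [name for _, name in self._marks]
        self._audio.clear()
        self._marks.clear()
        self._queued = self._sent
        return dropped

    def next_frame(self) -> tuple:
        """Return (frame, marks_to_ack) for the next 20 ms tick."""
        if self._audio:
            chunk = bytes(self._audio[:FRAME_BYTES])
            del self._audio[:FRAME_BYTES]
            self._sent += len(chunk)
            frame = chunk.ljust(FRAME_BYTES, b"\xff")  # pad a short tail with silence
        else:
            frame = SILENCE
        due = []
        while self._marks and self._marks[0][0] <= self._sent:
            due.append(self._marks.popleft()[1])
        return frame, due
```

The pacer would call `next_frame()` once per tick and send a `mark` event back to the voice server for each name in the returned list.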

Stop event (either direction)

{
    "event": "stop",
    "streamSid": "unique-stream-id"
}

Architecture deep dive

Why this topology?

Asterisk is mature, stable, and handles SIP/RTP at scale. But it doesn't natively support AI voice interactions. By using ARI (Asterisk REST Interface) and ExternalMedia channels, we can:

Keep Asterisk doing what it does best (SIP, RTP, call routing)
Bridge to modern WebSocket-based AI services
Maintain a steady 20ms audio cadence toward Asterisk
Handle interruptions gracefully

RTP → WebSocket bridge

The shim server maintains two concurrent audio loops per call:

Loop 1: RTP → WebSocket (Caller Audio)

while not closed:
    # Receive 20ms μ-law frame from Asterisk
    datagram = await sock.recvfrom(2048)
    payload = rtp_strip_header(datagram)

    # Forward to voice server via WebSocket
    await ecs_ws.send(json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {
            "payload": base64.b64encode(payload).decode(),
            "timestamp": timestamp_ms
        }
    }))
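
The `rtp_strip_header` helper is not shown in this README. A minimal sketch of what such a function typically does, following the RTP fixed-header layout from RFC 3550, is given below. It is an illustration under that assumption and may differ from the repository's version.

```python
import struct


def rtp_strip_header(datagram: bytes) -> bytes:
    """Return the payload of an RTP packet, skipping header, CSRCs, and extension."""
    if len(datagram) < 12:
        raise ValueError("datagram shorter than the 12-byte RTP fixed header")
    first = datagram[0]
    if first >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    has_padding = bool(first & 0x20)
    has_extension = bool(first & 0x10)
    csrc_count = first & 0x0F

    offset = 12 + 4 * csrc_count
    if has_extension:
        if len(datagram) < offset + 4:
            raise ValueError("truncated RTP header extension")
        # The extension header is 2 bytes of profile data plus a 2-byte length in 32-bit words.
        (ext_words,) = struct.unpack_from("!H", datagram, offset + 2)
        offset += 4 + 4 * ext_words

    end = len(datagram)
    if has_padding:
        end -= datagram[-1]  # the last byte holds the padding length
    if offset > end:
        raise ValueError("malformed RTP packet")
    return datagram[offset:end]
```

For plain PCMU from Asterisk the header is normally just the fixed 12 bytes, so the payload is the trailing 160 bytes of each 172-byte datagram.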

Loop 2: WebSocket → RTP (AI Audio)

# Separate pacer task sends RTP every 20ms
async def rtp_pacer():
    while not closed:
        await asyncio.sleep(0.02)  # 20ms tick

        # Pull frame from buffer (or send silence)
        if buffer:
            frame = buffer.pop(Config.FRAME_BYTES)
        else:
            frame = bytes([0xFF]) * Config.FRAME_BYTES  # μ-law silence

        rtp_send(frame, marker=is_first_frame)

This keeps outbound audio on a steady cadence: Asterisk expects a frame every 20ms, regardless of network jitter on the WebSocket connection.
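
A loop that sleeps a fixed 20 ms per iteration waits at least that long each time, and the time spent building and sending each frame is added on top, so it can slowly fall behind. One common mitigation, sketched here as an assumption rather than as the repository's approach, is to schedule each tick against an absolute monotonic deadline. The sketch also shows a minimal RTP packet builder for PCMU (payload type 0); `build_rtp_packet` and `paced_loop` are hypothetical names.

```python
import asyncio
import struct
import time

FRAME_SECONDS = 0.020
SAMPLES_PER_FRAME = 160  # RTP timestamp advances by this much per frame at 8 kHz


def build_rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
                     marker: bool = False, payload_type: int = 0) -> bytes:
    """Prepend a 12-byte RTP v2 header (no CSRCs, no extension) to the payload."""
    byte0 = 0x80  # version 2, no padding, no extension, zero CSRCs
    byte1 = (0x80 if marker else 0x00) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


async def paced_loop(send_frame, frames: int, clock=time.monotonic) -> None:
    """Call send_frame(i) once per 20 ms tick, measured from a fixed start time."""
    start = clock()
    for i in range(frames):
        send_frame(i)
        # Sleep until the absolute deadline for the next tick, so per-tick
        # processing time does not accumulate into drift.
        delay = start + (i + 1) * FRAME_SECONDS - clock()
        if delay > 0:
            await asyncio.sleep(delay)
```

In a real pacer, `send_frame` would pull the next frame from the buffer, wrap it with `build_rtp_packet` using `seq=i` and `timestamp=i * SAMPLES_PER_FRAME`, and write it to the UDP socket.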

ExternalMedia channels

Asterisk's ExternalMedia channel type creates a client-mode RTP connection:

Shim allocates UDP port (e.g., 10000)
Creates ExternalMedia channel pointing to 127.0.0.1:10000
Asterisk sends RTP to that socket
Asterisk exposes UNICASTRTP_LOCAL_ADDRESS and UNICASTRTP_LOCAL_PORT variables
Shim queries these variables to discover where to send return audio
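
These steps correspond to two ARI REST calls: `POST /channels/externalMedia` to create the channel and `GET /channels/{channelId}/variable` to read back the address Asterisk chose. The sketch below only builds the requests with the standard library. The helper names are hypothetical, and Basic authentication is used here as one of the ways ARI accepts credentials.

```python
import base64
import urllib.parse
import urllib.request


def ari_request(base: str, user: str, password: str, method: str,
                path: str, **params: str) -> urllib.request.Request:
    """Build an authenticated ARI request with parameters in the query string."""
    query = urllib.parse.urlencode(params)
    url = f"{base}{path}" + (f"?{query}" if query else "")
    request = urllib.request.Request(url, method=method)
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request


def external_media_request(base: str, user: str, password: str,
                           app: str, host: str, port: int) -> urllib.request.Request:
    """Ask Asterisk to send mu-law RTP for this app to host:port."""
    return ari_request(base, user, password, "POST", "/channels/externalMedia",
                       app=app, external_host=f"{host}:{port}", format="ulaw")


def channel_variable_request(base: str, user: str, password: str,
                             channel_id: str, variable: str) -> urllib.request.Request:
    """Read a channel variable such as UNICASTRTP_LOCAL_PORT."""
    path = f"/channels/{urllib.parse.quote(channel_id, safe='')}/variable"
    return ari_request(base, user, password, "GET", path, variable=variable)
```

Passing a request to `urllib.request.urlopen` and decoding the JSON body yields the new channel's `id` for the first call and a `{"value": ...}` object for the second.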

Mixing bridge

Each call uses an ARI mixing bridge:

PSTN Channel ────┐
                 │
                 ├──► Mixing Bridge ──► Mixed Audio
                 │
ExternalMedia────┘

This allows future enhancements like:

Conference calling
Music on hold
Call recording
Call transfer
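
For reference, the per-call bridge wiring maps onto a short sequence of ARI operations. The sketch below lists them as plain `(method, path, params)` tuples so the order is easy to see. The function names are hypothetical, and the teardown order is an assumption rather than a description of the shim's cleanup code.

```python
def bridge_setup_plan(bridge_id: str, pstn_channel_id: str, media_channel_id: str) -> list:
    """ARI calls that create a mixing bridge and join both legs of a call."""
    return [
        ("POST", f"/bridges/{bridge_id}", {"type": "mixing"}),
        ("POST", f"/bridges/{bridge_id}/addChannel", {"channel": pstn_channel_id}),
        ("POST", f"/bridges/{bridge_id}/addChannel", {"channel": media_channel_id}),
    ]


def bridge_teardown_plan(bridge_id: str, media_channel_id: str) -> list:
    """ARI calls that hang up the ExternalMedia leg and destroy the bridge."""
    return [
        ("DELETE", f"/channels/{media_channel_id}", {}),
        ("DELETE", f"/bridges/{bridge_id}", {}),
    ]
```

Because a mixing bridge accepts more than two channels, features such as recording or a third party would add further `addChannel` steps to the same bridge.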

Production considerations

Scaling

Vertical Scaling:

Each CallSession uses ~5-10MB RAM
t3.medium handles 50+ concurrent calls
t3.large handles 200+ concurrent calls

Horizontal Scaling:

Run multiple Asterisk+Shim instances
Use AWS Chime load balancing across SIP endpoints
Shared-nothing architecture (each instance is independent)

Security

TLS Everywhere:

SIP over TLS (port 5061) to AWS Chime
WSS (WebSocket Secure) to voice server
Let's Encrypt auto-renewal

Firewall Rules (Security Groups):

Port	Protocol	Source	Description
22	TCP	Your IP	SSH
80	TCP	0.0.0.0/0	Let's Encrypt ACME challenge
5061	TCP	AWS Chime IPs	SIP/TLS
10000-10299	UDP	AWS Chime IPs	RTP media

The repo includes a Lambda function that automatically updates your security group when AWS publishes new IP ranges for AMAZON, EC2, and CHIME_VOICECONNECTOR services.
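
The core of such an updater is filtering AWS's published `ip-ranges.json` (available at https://ip-ranges.amazonaws.com/ip-ranges.json) down to the services you care about. The sketch below shows only that filtering step on an already-parsed document. `service_cidrs` is a hypothetical name, and the repository's Lambda in `aws_lambda/update_telephony_vm_sg.py` may be structured differently.

```python
import ipaddress
from typing import Iterable, Optional


def service_cidrs(ip_ranges: dict,
                  services: Iterable[str] = ("CHIME_VOICECONNECTOR",),
                  region: Optional[str] = None) -> list:
    """Return sorted, de-duplicated IPv4 CIDRs for the given AWS service names."""
    wanted = set(services)
    cidrs = set()
    for entry in ip_ranges.get("prefixes", []):
        if entry.get("service") not in wanted:
            continue
        if region is not None and entry.get("region") != region:
            continue
        # ip_network() validates the prefix and normalizes its string form.
        cidrs.add(ipaddress.ip_network(entry["ip_prefix"]))
    return [str(network) for network in sorted(cidrs)]
```

The resulting list can then be compared with the security group's current ingress rules for ports 5061 and 10000-10299, adding and revoking only the differences.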

Credentials:

ARI username/password in environment variables
Never hardcode in config files
Rotate regularly

Monitoring

Health Endpoints:

# Shim server health
curl http://localhost:8080/health

# Returns:
{
  "status": "ok",
  "supervisor_task_status": "running",
  "config": { ... },
  "running": true,
  "active_sessions": 5,
  "active_channels": ["channel-id-1", "channel-id-2", ...]
}

Asterisk CLI:

docker exec -it asterisk-server asterisk -rx 'core show channels'
docker exec -it asterisk-server asterisk -rx 'pjsip show endpoints'
docker exec -it asterisk-server asterisk -rx 'ari show apps'

Logs:

# Asterisk logs
docker logs -f asterisk-server

# Shim server logs
docker logs -f asterisk-shim

# Enable verbose Asterisk logging
docker exec -it asterisk-server asterisk -rx 'core set verbose 10'
docker exec -it asterisk-server asterisk -rx 'pjsip set logger on'

Troubleshooting

Call not connecting

Check Asterisk SIP endpoint status:

docker exec -it asterisk-server asterisk -rx 'pjsip show endpoints'

Check ARI connectivity:

# Quote the URL so the shell does not interpret the "?"
curl "http://127.0.0.1:8088/ari/asterisk/info?api_key=ariuser:your-password"

Check shim server status:

curl http://localhost:8080/health

No audio

Check RTP ports are open:

sudo netstat -tulpn | grep '10[0-9][0-9][0-9]'

Check ExternalMedia channel:

docker exec -it asterisk-server asterisk -rx 'core show channels'
# Should see UnicastRTP/ channel

Enable RTP debugging:

docker exec -it asterisk-server asterisk -rx 'rtp set debug on'

High latency

Check network to AWS Chime:

ping $(dig +short [your-chime-hostname])

Check CPU usage:

top -b -n 1 | grep asterisk

Reduce concurrent calls if CPU > 80%

File structure

.
├── LICENSE
├── README.md
├── aws_lambda
│   └── update_telephony_vm_sg.py
├── deployment
│   ├── asterisk-server
│   │   ├── README.md
│   │   ├── asterisk-config
│   │   │   ├── ari.conf
│   │   │   ├── asterisk.conf
│   │   │   ├── extensions.conf
│   │   │   ├── http.conf
│   │   │   ├── logger.conf
│   │   │   ├── modules.conf
│   │   │   ├── pjsip.conf
│   │   │   └── rtp.conf
│   │   └── docker-compose.yml
│   ├── shim-server
│   │   └── Dockerfile
│   └── voice-agent-server
│       └── Dockerfile
├── images
│   └── system-architecture.png
├── pyproject.toml
├── src
│   ├── __init__.py
│   ├── ari
│   │   ├── __init__.py
│   │   ├── asterisk_ari_supervisor.py
│   │   └── call_session.py
│   ├── env_config.py
│   ├── servers
│   │   ├── __init__.py
│   │   ├── asterisk_shim_server.py
│   │   └── voice_agent_server.py
│   └── utils
│       ├── logger.py
│       └── telephony_utils.py
└── uv.lock

License

VectorlyApp/open-telephony-stack
