CalledIt: A Serverless Prediction Verification Platform

⚠️ DEMONSTRATION PROJECT ONLY
This is a demo/educational project showcasing serverless AI architecture patterns. NOT intended for production use. See License and Disclaimers for important usage restrictions.

CalledIt is a serverless web application that uses AI agents to convert natural language predictions into structured, verifiable formats. Built on AWS serverless services, it lets users create, manage, and verify predictions, and it classifies each prediction by how it can be verified.

The application combines Amazon Cognito for authentication, AWS Lambda for serverless compute, and Amazon DynamoDB for data persistence. The frontend is built with React and TypeScript. The backend uses Strands agents for AI orchestration, Amazon Bedrock for reasoning, and WebSocket streaming to show progress in real time while a prediction is processed.

🎯 Key Innovation: Verifiability Categorization

CalledIt automatically classifies every prediction into one of 5 verifiability categories, enabling future automated verification:

🧠 Agent Verifiable - Pure reasoning/knowledge (e.g., "The sun will rise tomorrow")
⏰ Current Tool Verifiable - Time-based verification (e.g., "It's past 11 PM")
🔧 Strands Tool Verifiable - Mathematical/computational (e.g., "Calculate compound interest")
🌐 API Tool Verifiable - External data required (e.g., "Bitcoin will hit $100k")
👤 Human Verifiable Only - Subjective assessment (e.g., "I will feel happy")

Each prediction includes AI-generated reasoning for its categorization, creating a structured foundation for automated verification systems.
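
The five categories can be modeled as an enumeration so that unknown labels are rejected early. The sketch below is only an illustration: the string values match the category identifiers listed under Troubleshooting, while the `VerifiabilityCategory` class and the `parse_category` helper are hypothetical names, not taken from the project's code.

```python
from enum import Enum


class VerifiabilityCategory(str, Enum):
    """The five category identifiers (the class name itself is hypothetical)."""

    AGENT_VERIFIABLE = "agent_verifiable"
    CURRENT_TOOL_VERIFIABLE = "current_tool_verifiable"
    STRANDS_TOOL_VERIFIABLE = "strands_tool_verifiable"
    API_TOOL_VERIFIABLE = "api_tool_verifiable"
    HUMAN_VERIFIABLE_ONLY = "human_verifiable_only"


def parse_category(raw: str) -> VerifiabilityCategory:
    """Normalize a label (whitespace, case) and map it to a known category.

    Raises ValueError when the label is not one of the five categories.
    """
    return VerifiabilityCategory(raw.strip().lower())


if __name__ == "__main__":
    print(parse_category(" API_Tool_Verifiable ").value)
```

Failing loudly on an unknown label is one possible design; a real handler might instead fall back to a default category.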

Repository Structure

.
├── backend/                      # Backend serverless application
│   └── calledit-backend/
│       ├── handlers/            # Lambda function handlers
│       │   ├── auth_token/      # Cognito token management
│       │   ├── strands_make_call/ # Strands agent with streaming
│       │   ├── websocket/       # WebSocket connection handlers
│       │   ├── list_predictions/# Retrieve user predictions
│       │   ├── write_to_db/     # DynamoDB write operations
│       │   └── verification/    # Automated verification system
│       ├── template.yaml        # SAM template for AWS resources
│       └── tests/               # Backend unit tests
├── frontend/                    # React TypeScript frontend
│   ├── src/
│   │   ├── components/         # React components with category display
│   │   ├── services/          # API, auth, and WebSocket services
│   │   ├── types/             # TypeScript interfaces (CallResponse)
│   │   ├── hooks/             # Custom React hooks for state management
│   │   └── utils/             # Utility functions
│   └── package.json           # Frontend dependencies
├── testing/                     # Comprehensive testing framework
│   ├── active/                 # Working tests (100% success rate)
│   ├── integration/            # End-to-end integration tests
│   ├── automation/             # Automated testing tools
│   ├── deprecated/             # Archived/non-functional tests
│   ├── demo_prompts.py         # 40 compelling test prompts (5 categories)
│   ├── demo_api_test.py        # WebSocket API testing with results capture
│   └── demo_results_writer.py  # DynamoDB writer for demo data
├── verification/                # Automated verification system (core functionality #2)
│   ├── verify_predictions.py   # Main verification runner
│   ├── verification_agent.py   # Strands verification agent
│   ├── ddb_scanner.py          # DynamoDB scanner for pending predictions
│   └── email_notifier.py       # SNS email notifications ("crying" system)
├── strands/                     # Strands agent development
│   ├── demos/                  # Agent development examples
│   └── my_agent/               # Custom agent implementation
├── docs/                       # Organized documentation structure
│   ├── current/                # Up-to-date documentation
│   │   ├── API.md              # REST and WebSocket API documentation
│   │   ├── TRD.md              # Technical Requirements Document
│   │   ├── TESTING.md          # Testing strategy and coverage
│   │   ├── VERIFICATION_SYSTEM.md # Automated verification documentation
│   │   └── infra.svg           # Infrastructure diagram
│   ├── implementation-plans/   # Feature implementation plans
│   ├── historical/             # Archived documentation
│   └── archive/                # Deprecated documentation
└── CHANGELOG.md                # Version history and feature tracking

Usage Instructions

Prerequisites

Node.js 16.x or later
Python 3.12
AWS CLI configured with appropriate credentials
AWS SAM CLI installed
Docker (for local development)
Strands agents library (installed via pip)

Installation

Backend Setup

# Set up virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Navigate to backend directory
cd backend/calledit-backend

# Install Python dependencies (including Strands)
pip install -r requirements.txt

# Create SAM config from example
cp samconfig.toml.example samconfig.toml
# Edit samconfig.toml with your stack name and region

# Deploy to AWS
sam build
sam deploy --guided

Frontend Setup

# Navigate to frontend directory (from the repository root)
cd frontend

# Install dependencies
npm install

# Create .env file from example
cp .env.example .env

# Update .env with your AWS configuration:
# - Replace YOUR-API-ID with your API Gateway ID
# - Replace YOUR-WEBSOCKET-ID with your WebSocket API ID
# - Replace YOUR-REGION with your AWS region
# - Replace Cognito values with your User Pool details
# - Replace CloudFront domain with your distribution

Testing Setup

# Install testing dependencies
pip install -r testing/requirements.txt

# Validate deployment with automated tests
python testing/verifiability_category_tests.py wss://your-websocket-url/prod

Quick Start

Start the frontend development server:

cd frontend
npm run dev

Open your browser to http://localhost:5173
Log in using your Cognito credentials
Create a prediction using streaming:
- Click "Streaming Call" tab
- Enter your prediction in the input field
- Click "Make Call" and watch real-time AI processing
- See the verifiability category with visual badge and reasoning
- Review the generated verification method
- Click "Log Call" to save your prediction with category

More Detailed Examples

Making a Streaming Prediction with Verifiability Categorization

The application uses Strands agents for intelligent prediction processing with automatic categorization:

// Example streaming prediction flow
1. User enters: "Bitcoin will hit $100k before 3pm today"
2. Strands agent processes with tools:
   - current_time tool for date/time context
   - Reasoning model for verification method generation
   - Verifiability categorization analysis
3. Real-time streaming shows:
   - "Processing your prediction with AI agent..."
   - "[Using tool: current_time]"
   - Generated verification method with timezone handling
   - Category analysis and reasoning
4. Final structured output with verifiability categorization:
{
  "prediction_statement": "Bitcoin will reach $100,000 before 15:00:00 on 2025-01-27",
  "verification_date": "2025-01-27T15:00:00Z",
  "verifiable_category": "api_tool_verifiable",
  "category_reasoning": "Verifying Bitcoin's price requires real-time financial data through external APIs",
  "verification_method": {
    "source": ["CoinGecko API", "CoinMarketCap"],
    "criteria": ["BTC/USD price exceeds $100,000 before 15:00 UTC"],
    "steps": ["Check BTC price at 15:00:00 on January 27, 2025"]
  },
  "date_reasoning": "Converted 3pm to 15:00 24-hour format for precision"
}
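
The date_reasoning field above mentions converting "3pm" to 24-hour time. A minimal sketch of that kind of conversion, using only the standard library, might look like this. The helper name `to_utc_iso` and the fixed UTC offset parameter are assumptions made for illustration; the actual agent relies on its current_time tool and model reasoning rather than on this code.

```python
from datetime import date, datetime, timedelta, timezone


def to_utc_iso(clock_text: str, on_day: date, utc_offset_hours: float = 0.0) -> str:
    """Convert a 12-hour clock string such as "3pm" into an ISO 8601 UTC timestamp."""
    # %I parses the 12-hour value and %p the AM/PM marker.
    parsed = datetime.strptime(clock_text.strip().upper(), "%I%p")
    # A fixed offset stands in for the user's timezone (assumption for this sketch).
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local_dt = datetime.combine(on_day, parsed.time(), tzinfo=local_zone)
    return local_dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    print(to_utc_iso("3pm", date(2025, 1, 27)))
```

With the default offset of zero, "3pm" on 2025-01-27 maps to the same verification_date shown in the example output.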

UI Display with Category Badges

The frontend displays verifiability categories with visual indicators:

Call Details:
- Prediction: "Bitcoin will hit $100k before 3pm today"
- Verification Date: 1/27/2025, 3:00:00 PM
- Verifiability: 🌐 API Verifiable
- Category Reasoning: "Verifying Bitcoin's price requires real-time financial data..."
- Status: PENDING

Troubleshooting

Common Issues

WebSocket Connection Issues

# Check WebSocket API deployment
aws apigatewayv2 get-apis

# Verify WebSocket URL in frontend .env
# VITE_WEBSOCKET_URL=wss://your-websocket-id.execute-api.region.amazonaws.com/prod

Strands Agent Errors

# Check agent function logs
sam logs -n MakeCallStreamFunction --stack-name calledit-backend

# Verify Strands dependencies in requirements.txt
# strands-agents>=0.1.0
# strands-agents-tools>=0.1.0

Streaming Issues

Ensure WebSocket permissions are configured
Check connection timeout settings (5 minutes default)
Verify Bedrock streaming permissions:

# Required permissions:
# bedrock:InvokeModel
# bedrock:InvokeModelWithResponseStream
# execute-api:ManageConnections
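
For reference, the three actions above can be expressed as an IAM policy document. This is a sketch, not the policy from the project's SAM template: the `build_streaming_policy` helper is hypothetical, and the wildcard resource is a placeholder that should be narrowed to specific model and API ARNs.

```python
import json
from typing import List, Optional


def build_streaming_policy(resources: Optional[List[str]] = None) -> dict:
    """Return an IAM policy document allowing the streaming-related actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                    "execute-api:ManageConnections",
                ],
                # "*" is a placeholder; scope this to specific ARNs in practice.
                "Resource": resources or ["*"],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_streaming_policy(), indent=2))
```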

Authentication Issues

# Verify Cognito configuration
aws cognito-idp describe-user-pool --user-pool-id YOUR_POOL_ID

# Check user status
aws cognito-idp admin-get-user --user-pool-id YOUR_POOL_ID --username USER_EMAIL

Deployment Issues

# Check CloudFormation stack status
aws cloudformation describe-stacks --stack-name calledit-backend

# View deployment events
aws cloudformation describe-stack-events --stack-name calledit-backend

# Validate SAM template
sam validate

Verifiability Category Issues

# Test category classification
python testing/verifiability_category_tests.py

# Check agent logs for category processing
sam logs -n MakeCallStreamFunction --stack-name calledit-backend

# Verify category validation logic
# Categories: agent_verifiable, current_tool_verifiable, strands_tool_verifiable, api_tool_verifiable, human_verifiable_only

Data Flow

The application follows a serverless event-driven architecture with real-time streaming capabilities.

User -> Cognito Auth -> WebSocket API -> Strands Agent -> Bedrock (Reasoning)
                    |                      |              |
                    |                      -> Tools -> Real-time Stream
                    |
                    -> REST API -> Lambda Functions -> DynamoDB

Key component interactions:

User authenticates through Cognito user pool
WebSocket connection established for real-time streaming
Strands agent orchestrates between reasoning model and tools
Streaming responses sent back to frontend via WebSocket
Bedrock provides AI reasoning with InvokeModelWithResponseStream
Tools (current_time, etc.) provide context to the agent
Final predictions stored in DynamoDB via REST API
Frontend receives real-time updates during processing
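
The step where streaming responses are sent back over the WebSocket can be sketched as follows. The `post_to_connection` call is the API Gateway Management API operation exposed by boto3; everything else is an assumption: the `send_stream_chunk` helper, the `{"type", "content"}` message shape, and the `RecordingClient` stand-in, which exists only so the sketch runs without AWS access.

```python
import json


def send_stream_chunk(client, connection_id: str, text: str, kind: str = "text") -> None:
    """Push one chunk of agent output to a connected WebSocket client.

    `client` is expected to behave like
    boto3.client("apigatewaymanagementapi", endpoint_url=...).
    The message shape used here is hypothetical.
    """
    payload = json.dumps({"type": kind, "content": text}).encode("utf-8")
    client.post_to_connection(ConnectionId=connection_id, Data=payload)


class RecordingClient:
    """Stand-in client that records calls instead of contacting AWS."""

    def __init__(self):
        self.sent = []

    def post_to_connection(self, ConnectionId, Data):
        self.sent.append((ConnectionId, Data))


if __name__ == "__main__":
    demo = RecordingClient()
    send_stream_chunk(demo, "abc123", "[Using tool: current_time]", kind="tool")
    print(demo.sent)
```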

Infrastructure

The application uses the following AWS resources:

API Gateways

CallitAPI (AWS::Serverless::Api): REST API for CRUD operations
- Handles authentication and data persistence
- Implements CORS and Cognito authorization
WebSocketApi (AWS::ApiGatewayV2::Api): Real-time streaming
- Handles WebSocket connections for streaming responses
- Routes: $connect, $disconnect, makecall
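
A Lambda function behind a WebSocket route receives the connection ID and route key in the event's `requestContext`. The sketch below shows one way to pull out those fields and derive the callback URL used for sending messages back. The `WebSocketRequest` type and `parse_websocket_event` helper are hypothetical names, and the project's handlers may structure this differently.

```python
import json
from typing import NamedTuple


class WebSocketRequest(NamedTuple):
    connection_id: str
    route_key: str
    callback_url: str
    body: dict


def parse_websocket_event(event: dict) -> WebSocketRequest:
    """Extract the fields a handler typically needs from a WebSocket event."""
    ctx = event["requestContext"]
    # $connect and $disconnect events carry no body, so default to an empty object.
    raw_body = event.get("body") or "{}"
    return WebSocketRequest(
        connection_id=ctx["connectionId"],
        route_key=ctx["routeKey"],
        callback_url=f"https://{ctx['domainName']}/{ctx['stage']}",
        body=json.loads(raw_body),
    )
```

The callback URL is what a management API client would use as its endpoint when posting messages to the connection.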

Lambda Functions

MakeCallStreamFunction: Strands agent with streaming via WebSocket
ConnectFunction/DisconnectFunction: WebSocket connection management
LogCall: Writes predictions to DynamoDB
ListPredictions: Retrieves user predictions
AuthTokenFunction: Handles Cognito token exchange

AI & Orchestration

Strands Agents: Orchestrate between reasoning models and tools
Amazon Bedrock: AI reasoning with streaming support
Custom Tools: current_time, date parsing utilities

Authentication

CognitoUserPool: Manages user authentication
UserPoolClient: Configures OAuth flows
UserPoolDomain: Provides hosted UI for authentication

Database

DynamoDB table "calledit-db" for storing predictions and verification data

Key Features

🎯 Verifiability Categorization: Automatic classification into 5 categories with AI reasoning
⚡ Real-time Streaming: WebSocket-based streaming for immediate feedback
🤖 Agent Orchestration: Strands agents coordinate AI reasoning and tool usage
🌍 Timezone Intelligence: Automatic timezone handling and 12/24-hour conversion
📋 Structured Verification: AI-generated verification methods with reasoning
🧪 Automated Testing: 100% success rate testing suite for all categories
📊 Visual Category Display: Beautiful UI badges with icons and explanations
💾 Complete Data Persistence: Categories and reasoning stored in DynamoDB
📢 "Crying" System: Celebrate successful predictions with notifications and social sharing
📧 Email Notifications: Get notified when your predictions are verified as TRUE
⚡ Reduced Cold Starts: Provisioned concurrency on 3 key functions keeps initialized instances ready, avoiding cold starts for requests within the provisioned capacity

Deployment

Production Deployment

Prerequisites

AWS CLI configured with deployment permissions
Virtual environment activated
All dependencies installed

Backend Deployment

# Activate virtual environment
source venv/bin/activate

# Navigate to backend
cd backend/calledit-backend

# Build and deploy
sam build
sam deploy --no-confirm-changeset

# Note the output URLs:
# - REST API URL for VITE_API_URL
# - WebSocket URL for VITE_WEBSOCKET_URL
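
One way to avoid copying the URLs by hand is to read them from the stack outputs. The sketch below assumes the response shape of CloudFormation's DescribeStacks call (`Stacks[0].Outputs` entries with `OutputKey` and `OutputValue`). The output key names `RestApiUrl` and `WebSocketUrl` are hypothetical and would need to match whatever the SAM template actually declares.

```python
from typing import Dict, List


def env_lines_from_stack(response: dict, key_map: Dict[str, str]) -> List[str]:
    """Turn stack outputs into KEY=value lines for the frontend .env file.

    `key_map` maps a stack output key to the environment variable name.
    """
    outputs = {
        item["OutputKey"]: item["OutputValue"]
        for item in response["Stacks"][0].get("Outputs", [])
    }
    missing = [key for key in key_map if key not in outputs]
    if missing:
        raise KeyError(f"stack outputs not found: {missing}")
    return [f"{env_name}={outputs[key]}" for key, env_name in key_map.items()]


if __name__ == "__main__":
    # Hand-written sample in the DescribeStacks response shape (values are placeholders).
    sample = {
        "Stacks": [
            {
                "Outputs": [
                    {"OutputKey": "RestApiUrl", "OutputValue": "https://your-api-gateway-url/Prod"},
                    {"OutputKey": "WebSocketUrl", "OutputValue": "wss://your-websocket-url/prod"},
                ]
            }
        ]
    }
    mapping = {"RestApiUrl": "VITE_API_URL", "WebSocketUrl": "VITE_WEBSOCKET_URL"}
    print("\n".join(env_lines_from_stack(sample, mapping)))
```

In practice the response would come from `boto3.client("cloudformation").describe_stacks(StackName=...)`.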

Frontend Deployment

# Navigate to frontend
cd frontend

# Update environment variables
# Edit .env with URLs from backend deployment
VITE_API_URL=https://your-api-gateway-url/Prod
VITE_WEBSOCKET_URL=wss://your-websocket-url/prod

# Build for production
npm run build

# Deploy dist/ folder to your hosting service
# (AWS S3 + CloudFront, Netlify, Vercel, etc.)

Deployment Validation

# Run automated tests to verify deployment
python testing/verifiability_category_tests.py wss://your-websocket-url/prod

# Expected: 100% test success rate across all 5 categories

Testing

Automated Verifiability Testing

The project includes a comprehensive automated testing suite that validates the 5-category verifiability system:

# Run the complete test suite
python testing/verifiability_category_tests.py

# Expected output:
# 🚀 Starting Verifiability Category Tests
# ✅ Agent Verifiable - Natural Law
# ✅ Current Tool Verifiable - Time Check  
# ✅ Strands Tool Verifiable - Math Calculation
# ✅ API Tool Verifiable - Market Data
# ✅ Human Verifiable Only - Subjective Feeling
# 📊 Success Rate: 100.0%

Test Categories

Unit Tests: Backend Lambda functions (/backend/calledit-backend/tests/)
Integration Tests: API endpoints and WebSocket flows
End-to-End Tests: Complete verifiability categorization validation
Performance Tests: Real-time streaming and response times
Provisioned Concurrency Tests: Verify zero cold starts on critical functions

Provisioned Concurrency Monitoring

# Test all functions have proper alias + provisioned concurrency setup
python backend/calledit-backend/tests/test_provisioned_concurrency.py

# Expected output:
# 🎯 Overall: 3/3 tests passed
# 🎉 All provisioned concurrency tests PASSED!
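
A check of this kind can be approximated with the Lambda `get_provisioned_concurrency_config` operation, which reports a `Status` and the number of available instances for a given alias. The sketch below is not the project's test: the helper name, the `live` alias, and the stub client are assumptions, and the stub exists only so the example runs without AWS credentials.

```python
def provisioned_concurrency_ready(client, function_name: str, alias: str) -> bool:
    """Return True when the alias has provisioned instances in the READY state.

    `client` is expected to behave like boto3.client("lambda").
    """
    config = client.get_provisioned_concurrency_config(
        FunctionName=function_name, Qualifier=alias
    )
    available = config.get("AvailableProvisionedConcurrentExecutions", 0)
    return config.get("Status") == "READY" and available > 0


class StubLambdaClient:
    """Returns a canned configuration instead of calling AWS."""

    def __init__(self, config: dict):
        self._config = config

    def get_provisioned_concurrency_config(self, FunctionName, Qualifier):
        return self._config


if __name__ == "__main__":
    stub = StubLambdaClient({"Status": "READY", "AvailableProvisionedConcurrentExecutions": 1})
    # "live" is a hypothetical alias name.
    print(provisioned_concurrency_ready(stub, "MakeCallStreamFunction", "live"))
```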

See docs/current/TESTING.md for comprehensive testing documentation.

Documentation

Core Documentation

CHANGELOG.md - Version history and feature releases
docs/current/API.md - REST and WebSocket API documentation
docs/current/TRD.md - Technical Requirements Document
docs/current/TESTING.md - Testing strategy and coverage

Additional Resources

docs/current/infra.svg - Infrastructure architecture diagram
docs/UI_IMPROVEMENTS.md - UI/UX improvement plan and timeline
testing/README.md - Testing framework overview
strands/demos/ - Strands agent development examples

Environment Configuration

Backend Environment Variables

Managed automatically by AWS SAM template
Cognito User Pool and Client IDs auto-configured
DynamoDB table name: calledit-db

Frontend Environment Variables

# .env file
VITE_API_URL=https://your-api-gateway-url/Prod
VITE_WEBSOCKET_URL=wss://your-websocket-url/prod
VITE_APIGATEWAY=https://your-api-gateway-url/Prod
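
A small script can catch a missing variable or a wrong URL scheme before the frontend is built. This is a sketch under stated assumptions: the variable names come from the .env example above, while the `parse_env` and `find_env_problems` helpers and the scheme expectations (https for the REST endpoints, wss for the WebSocket) are illustrative and not part of the project.

```python
from urllib.parse import urlparse

# Expected URL scheme per variable (assumed from the example values above).
EXPECTED_SCHEMES = {
    "VITE_API_URL": "https",
    "VITE_WEBSOCKET_URL": "wss",
    "VITE_APIGATEWAY": "https",
}


def parse_env(text: str) -> dict:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def find_env_problems(text: str) -> list:
    """Return a list of human-readable problems; empty when the file looks fine."""
    values = parse_env(text)
    problems = []
    for key, scheme in EXPECTED_SCHEMES.items():
        if key not in values:
            problems.append(f"{key} is missing")
        elif urlparse(values[key]).scheme != scheme:
            problems.append(f"{key} should start with {scheme}://")
    return problems
```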

Monitoring & Maintenance

Health Checks

# Check API health
curl https://your-api-gateway-url/Prod/hello

# Check WebSocket connectivity
# Use browser dev tools or WebSocket testing tool

Log Monitoring

# View Lambda function logs
sam logs -n MakeCallStreamFunction --stack-name calledit-backend --tail

# List the log groups for the stack's Lambda functions
aws logs describe-log-groups --log-group-name-prefix /aws/lambda/calledit-backend

Performance Monitoring

CloudWatch Metrics: Lambda invocations, duration, errors
API Gateway Metrics: Request count, latency, 4XX/5XX errors
DynamoDB Metrics: Read/write capacity, throttling
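
As an illustration of the Lambda metrics listed above, the following sketch builds the arguments for CloudWatch's `get_metric_statistics` operation to sum errors over a recent window. The helper name `lambda_error_query` is hypothetical, and the function name passed in must be the deployed (physical) Lambda name, which usually differs from the logical ID in the SAM template.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def lambda_error_query(function_name: str, hours: int = 1,
                       now: Optional[datetime] = None) -> dict:
    """Build keyword arguments for cloudwatch.get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # five-minute buckets
        "Statistics": ["Sum"],
    }
```

The result can be passed as `boto3.client("cloudwatch").get_metric_statistics(**lambda_error_query("your-function-name"))`.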

Rollback Procedures

Backend Rollback

# Cancel an in-progress stack update (CloudFormation rolls the stack back to its previous state)
aws cloudformation cancel-update-stack --stack-name calledit-backend

# Or deploy previous version
git checkout previous-commit
sam build && sam deploy --no-confirm-changeset

Frontend Rollback

# Rollback to previous build
git checkout previous-commit
npm run build
# Redeploy dist/ folder

Project Status

Current Version: v1.5.1 - 🔧 PRODUCTION DEPLOYMENT & SECURITY HARDENING (2025-08-23)

✅ Verifiability Categorization System: Complete 5-category classification
✅ Real-time Streaming: WebSocket-based AI processing
✅ Automated Testing: 100% success rate test suite
✅ Visual UI: Category badges with reasoning display
✅ Data Persistence: Complete DynamoDB integration
✅ Comprehensive Documentation: API, TRD, and testing docs
✅ Automated Verification System: Strands agent processes ALL predictions every 15 minutes
✅ Production Deployment: EventBridge scheduling, S3 logging, SNS notifications
✅ Frontend Integration: Real-time verification status display with confidence scores
✅ Tool Gap Analysis: MCP tool suggestions for missing verification capabilities
✅ "Crying" Notifications: Email alerts for successful predictions with social sharing setup
✅ Modern UI Design: Complete responsive redesign with educational UX and streaming text effects
✅ Lambda Provisioned Concurrency: Eliminated cold starts on 3 key functions with alias-based architecture
✅ MCP Sampling Review & Improvement System: FULLY OPERATIONAL
- Complete MCP Sampling pattern with multiple field updates
- WebSocket routing for improvement workflow (improve_section, improvement_answers)
- Server-initiated sampling with client-facilitated LLM interactions
- Human-in-the-loop design with floating status indicators
- Date conflict resolution ("today" vs "tomorrow" assumptions)
- Enterprise-grade state management with 4 custom React hooks
✅ Production Infrastructure: CloudFront deployment with security hardening
- CloudFront distribution (d2w6gdbi1zx8x5.cloudfront.net) with 10s cache TTL
- Comprehensive security fixes (KMS encryption, log injection prevention)
- CORS resolution and mobile UI improvements
- Environment variable configuration management

✅ MCP SAMPLING SYSTEM: PRODUCTION READY

🔍 Strands Review Agent: Complete MCP Sampling implementation
- Multiple field updates: prediction_statement improvements update verification_date and verification_method
- Date conflict resolution: Handles "today" vs "tomorrow" assumption conflicts intelligently
- JSON response processing: Proper parsing of complex improvement responses
🌐 WebSocket Infrastructure: Complete routing and state management
- Full routing: improve_section and improvement_answers with proper permissions
- Multiple field update handling: Backend processes complex JSON responses
- Real-time status indicators: Floating UI elements with smart timing
🎨 Enterprise UX: Production-grade user experience
- 4 custom React hooks: useReviewState, useErrorHandler, useWebSocketConnection, useImprovementHistory
- Floating review indicator: Always-visible status during improvement processing
- Smart state management: Proper status clearing and error handling
🧪 Validation Complete: End-to-end workflow tested and operational
- Test case: "it will rain" → "NYC tomorrow" → multiple field updates working
- All components tested: ReviewAgent (10/10), WebSocket routing (3/3), Frontend integration (15/15)
- Production deployment: All fixes applied and validated

✅ PREVIOUS: Automated Verification System

🤖 Strands Verification Agent: AI-powered prediction verification with 5-category routing
⏰ Automated Processing: Every 15 minutes via EventBridge, processes ALL predictions
🎯 Real-time Status Updates: Frontend displays actual verification results
📊 Tool Gap Detection: Automatic MCP tool suggestions for missing capabilities
📧 Smart Notifications: SNS email alerts for verified TRUE predictions
🗂️ Complete Audit Trail: S3 logging with structured JSON for analysis

Future Roadmap (Phase 3+)

🌐 MCP Tool Integration: Weather, sports, and financial API tools
📊 Analytics Dashboard: User statistics and accuracy tracking
📱 Mobile Application: React Native mobile app
📢 Social Media Integration: Auto-post successful predictions to Twitter, LinkedIn, Facebook
🏆 Leaderboards: Community prediction accuracy rankings
🎉 Crying Dashboard: Showcase your successful predictions with social proof

See CHANGELOG.md for detailed version history.

Contributing

When contributing to CalledIt:

Follow the testing requirements in docs/current/TESTING.md
Ensure all verifiability category tests pass
Update documentation for new features
Maintain the 5-category classification system integrity

Disclaimers

⚠️ DEMONSTRATION PROJECT ONLY

This is a demo/educational project showcasing serverless AI architecture patterns. It is NOT intended for production use.

🚫 Not Production Ready

This software is provided for demonstration and educational purposes only
DO NOT deploy in production environments without significant additional security review, testing, and hardening
No warranties or guarantees are provided regarding security, scalability, or reliability
Use entirely at your own risk

💰 AWS Costs Warning

This project deploys AWS resources that WILL incur costs
You are solely responsible for any AWS charges
Monitor your AWS billing dashboard when running this demo
Consider using AWS cost alerts and budgets

🔒 Security Notice

While security best practices are attempted, this is a demonstration project
May contain security vulnerabilities not suitable for production
Conduct your own security assessment before any use
See SECURITY.md for security considerations

📋 Usage Restrictions

This software may NOT be used for:

Any illegal activities under applicable law
Harassment, abuse, or harm to individuals or organizations
Fraud, deception, or misrepresentation
Violation of privacy or data protection laws
Any malicious or unethical purposes

🛡️ Liability Disclaimer

Use at your own risk - no liability accepted for any damages or issues
Authors disclaim all warranties and liability
Users assume full responsibility for any consequences of use
This software is provided "AS IS" without any guarantees

License

This project is licensed under the MIT License with additional disclaimers - see the LICENSE file for details.

This project is part of an educational/research initiative focused on AI-powered prediction verification systems.

dimafarer/calledit
