⚠️ DEMONSTRATION PROJECT ONLY
This is a demo/educational project showcasing serverless AI architecture patterns. NOT intended for production use. See License and Disclaimers for important usage restrictions.
CalledIt is a serverless web application that converts natural language predictions into structured, verifiable formats using AI agents. Built on AWS serverless architecture, it provides a robust platform for creating, managing, and validating predictions with intelligent verifiability categorization.
The application combines AWS Cognito for authentication, AWS Lambda for serverless compute, and DynamoDB for data persistence. The frontend is built with React and TypeScript, providing a responsive and intuitive user interface. The backend leverages Strands agents for AI orchestration, Amazon Bedrock for reasoning, and real-time WebSocket streaming for immediate user feedback during prediction processing.
CalledIt automatically classifies every prediction into one of 5 verifiability categories, enabling future automated verification:
- 🧠 Agent Verifiable - Pure reasoning/knowledge (e.g., "The sun will rise tomorrow")
- ⏰ Current Tool Verifiable - Time-based verification (e.g., "It's past 11 PM")
- 🔧 Strands Tool Verifiable - Mathematical/computational (e.g., "Calculate compound interest")
- 🌐 API Tool Verifiable - External data required (e.g., "Bitcoin will hit $100k")
- 👤 Human Verifiable Only - Subjective assessment (e.g., "I will feel happy")
Each prediction includes AI-generated reasoning for its categorization, creating a structured foundation for automated verification systems.
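As a rough illustration of the five categories above, the sketch below models them as a Python enum using the category identifiers that appear in the troubleshooting notes of this README. The `VerifiabilityCategory` and `parse_category` names, and the choice to fall back to human review for unknown labels, are assumptions made for this example and are not taken from the project code.

```python
from enum import Enum


class VerifiabilityCategory(str, Enum):
    """The five category identifiers listed in this README."""

    AGENT_VERIFIABLE = "agent_verifiable"
    CURRENT_TOOL_VERIFIABLE = "current_tool_verifiable"
    STRANDS_TOOL_VERIFIABLE = "strands_tool_verifiable"
    API_TOOL_VERIFIABLE = "api_tool_verifiable"
    HUMAN_VERIFIABLE_ONLY = "human_verifiable_only"


def parse_category(raw: str) -> VerifiabilityCategory:
    """Normalize a raw label and map it to a category.

    Unknown labels fall back to human review (an assumed policy,
    chosen here because it is the most conservative option).
    """
    try:
        return VerifiabilityCategory(raw.strip().lower())
    except ValueError:
        return VerifiabilityCategory.HUMAN_VERIFIABLE_ONLY


if __name__ == "__main__":
    print(parse_category("api_tool_verifiable").value)
```

Keeping the identifiers in one place like this makes it harder for the frontend badge mapping and the backend validation to drift apart.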
.
├── backend/                        # Backend serverless application
│   └── calledit-backend/
│       ├── handlers/               # Lambda function handlers
│       │   ├── auth_token/         # Cognito token management
│       │   ├── strands_make_call/  # Strands agent with streaming
│       │   ├── websocket/          # WebSocket connection handlers
│       │   ├── list_predictions/   # Retrieve user predictions
│       │   ├── write_to_db/        # DynamoDB write operations
│       │   └── verification/       # Automated verification system
│       ├── template.yaml           # SAM template for AWS resources
│       └── tests/                  # Backend unit tests
├── frontend/                       # React TypeScript frontend
│   ├── src/
│   │   ├── components/             # React components with category display
│   │   ├── services/               # API, auth, and WebSocket services
│   │   ├── types/                  # TypeScript interfaces (CallResponse)
│   │   ├── hooks/                  # Custom React hooks for state management
│   │   └── utils/                  # Utility functions
│   └── package.json                # Frontend dependencies
├── testing/                        # Comprehensive testing framework
│   ├── active/                     # Working tests (100% success rate)
│   ├── integration/                # End-to-end integration tests
│   ├── automation/                 # Automated testing tools
│   ├── deprecated/                 # Archived/non-functional tests
│   ├── demo_prompts.py             # 40 compelling test prompts (5 categories)
│   ├── demo_api_test.py            # WebSocket API testing with results capture
│   └── demo_results_writer.py      # DynamoDB writer for demo data
├── verification/                   # Automated verification system (core functionality #2)
│   ├── verify_predictions.py       # Main verification runner
│   ├── verification_agent.py       # Strands verification agent
│   ├── ddb_scanner.py              # DynamoDB scanner for pending predictions
│   └── email_notifier.py           # SNS email notifications ("crying" system)
├── strands/                        # Strands agent development
│   ├── demos/                      # Agent development examples
│   └── my_agent/                   # Custom agent implementation
├── docs/                           # Organized documentation structure
│   ├── current/                    # Up-to-date documentation
│   │   ├── API.md                  # REST and WebSocket API documentation
│   │   ├── TRD.md                  # Technical Requirements Document
│   │   ├── TESTING.md              # Testing strategy and coverage
│   │   ├── VERIFICATION_SYSTEM.md  # Automated verification documentation
│   │   └── infra.svg               # Infrastructure diagram
│   ├── implementation-plans/       # Feature implementation plans
│   ├── historical/                 # Archived documentation
│   └── archive/                    # Deprecated documentation
└── CHANGELOG.md                    # Version history and feature tracking
- Node.js 16.x or later
- Python 3.12
- AWS CLI configured with appropriate credentials
- AWS SAM CLI installed
- Docker (for local development)
- Strands agents library (installed via pip)
# Set up virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Navigate to backend directory
cd backend/calledit-backend
# Install Python dependencies (including Strands)
pip install -r requirements.txt
# Create SAM config from example
cp samconfig.toml.example samconfig.toml
# Edit samconfig.toml with your stack name and region
# Deploy to AWS
sam build
sam deploy --guided
# Navigate to frontend directory
cd frontend
# Install dependencies
npm install
# Create .env file from example
cp .env.example .env
# Update .env with your AWS configuration:
# - Replace YOUR-API-ID with your API Gateway ID
# - Replace YOUR-WEBSOCKET-ID with your WebSocket API ID
# - Replace YOUR-REGION with your AWS region
# - Replace Cognito values with your User Pool details
# - Replace CloudFront domain with your distribution
# Install testing dependencies
pip install -r testing/requirements.txt
# Validate deployment with automated tests
python testing/verifiability_category_tests.py wss://your-websocket-url/prod
- Start the frontend development server:
cd frontend
npm run dev
- Open your browser to http://localhost:5173
- Log in using your Cognito credentials
- Create a prediction using streaming:
  - Click the "Streaming Call" tab
  - Enter your prediction in the input field
  - Click "Make Call" and watch real-time AI processing
  - See the verifiability category with visual badge and reasoning
  - Review the generated verification method
  - Click "Log Call" to save your prediction with its category
The application uses Strands agents for intelligent prediction processing with automatic categorization:
// Example streaming prediction flow
1. User enters: "Bitcoin will hit $100k before 3pm today"
2. Strands agent processes with tools:
- current_time tool for date/time context
- Reasoning model for verification method generation
- Verifiability categorization analysis
3. Real-time streaming shows:
- "Processing your prediction with AI agent..."
- "[Using tool: current_time]"
- Generated verification method with timezone handling
- Category analysis and reasoning
4. Final structured output with verifiability categorization:
{
"prediction_statement": "Bitcoin will reach $100,000 before 15:00:00 on 2025-01-27",
"verification_date": "2025-01-27T15:00:00Z",
"verifiable_category": "api_tool_verifiable",
"category_reasoning": "Verifying Bitcoin's price requires real-time financial data through external APIs",
"verification_method": {
"source": ["CoinGecko API", "CoinMarketCap"],
"criteria": ["BTC/USD price exceeds $100,000 before 15:00 UTC"],
"steps": ["Check BTC price at 15:00:00 on January 27, 2025"]
},
"date_reasoning": "Converted 3pm to 15:00 24-hour format for precision"
}
The frontend displays verifiability categories with visual indicators:
Call Details:
- Prediction: "Bitcoin will hit $100k before 3pm today"
- Verification Date: 1/27/2025, 3:00:00 PM
- Verifiability: 🌐 API Verifiable
- Category Reasoning: "Verifying Bitcoin's price requires real-time financial data..."
- Status: PENDING
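To make the structured output above more concrete, here is a minimal sketch of how a client might check that a response carries the fields shown in the example JSON before displaying or saving it. The field names come from that example; the `validate_call_response` helper and its error messages are hypothetical and are not part of the project's API.

```python
import json
from datetime import datetime

# Top-level fields taken from the example response in this README.
REQUIRED_FIELDS = {
    "prediction_statement",
    "verification_date",
    "verifiable_category",
    "category_reasoning",
    "verification_method",
    "date_reasoning",
}
# Keys of the nested verification_method object in the same example.
METHOD_FIELDS = ("source", "criteria", "steps")


def validate_call_response(payload: str) -> dict:
    """Parse a JSON response and check its shape (hypothetical helper)."""
    data = json.loads(payload)
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")

    # fromisoformat() only accepts a trailing "Z" from Python 3.11 onward,
    # so normalize it to an explicit UTC offset first.
    datetime.fromisoformat(data["verification_date"].replace("Z", "+00:00"))

    method = data["verification_method"]
    if not isinstance(method, dict):
        raise ValueError("verification_method must be an object")
    for key in METHOD_FIELDS:
        if not isinstance(method.get(key), list):
            raise ValueError(f"verification_method.{key} must be a list")
    return data
```

A check like this is cheap to run on every streamed result and surfaces malformed agent output early, before it reaches DynamoDB.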
- WebSocket Connection Issues
# Check WebSocket API deployment
aws apigatewayv2 get-apis
# Verify WebSocket URL in frontend .env
# VITE_WEBSOCKET_URL=wss://your-websocket-id.execute-api.region.amazonaws.com/prod
- Strands Agent Errors
# Check agent function logs
sam logs -n MakeCallStreamFunction --stack-name calledit-backend
# Verify Strands dependencies in requirements.txt
# strands-agents>=0.1.0
# strands-agents-tools>=0.1.0
- Streaming Issues
- Ensure WebSocket permissions are configured
- Check connection timeout settings (5 minutes default)
- Verify Bedrock streaming permissions:
# Required permissions:
# bedrock:InvokeModel
# bedrock:InvokeModelWithResponseStream
# execute-api:ManageConnections
- Authentication Issues
# Verify Cognito configuration
aws cognito-idp describe-user-pool --user-pool-id YOUR_POOL_ID
# Check user status
aws cognito-idp admin-get-user --user-pool-id YOUR_POOL_ID --username USER_EMAIL
- Deployment Issues
# Check CloudFormation stack status
aws cloudformation describe-stacks --stack-name calledit-backend
# View deployment events
aws cloudformation describe-stack-events --stack-name calledit-backend
# Validate SAM template
sam validate
- Verifiability Category Issues
# Test category classification
python testing/verifiability_category_tests.py
# Check agent logs for category processing
sam logs -n MakeCallStreamFunction --stack-name calledit-backend
# Verify category validation logic
# Categories: agent_verifiable, current_tool_verifiable, strands_tool_verifiable, api_tool_verifiable, human_verifiable_only
The application follows a serverless event-driven architecture with real-time streaming capabilities.
User -> Cognito Auth -> WebSocket API -> Strands Agent -> Bedrock (Reasoning)
                    |                         |              |
                    |                         -> Tools    -> Real-time Stream
                    |
                    -> REST API -> Lambda Functions -> DynamoDB
Key component interactions:
- User authenticates through Cognito user pool
- WebSocket connection established for real-time streaming
- Strands agent orchestrates between reasoning model and tools
- Streaming responses sent back to frontend via WebSocket
- Bedrock provides AI reasoning with InvokeModelWithResponseStream
- Tools (current_time, etc.) provide context to the agent
- Final predictions stored in DynamoDB via REST API
- Frontend receives real-time updates during processing
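The streaming part of the interactions above can be sketched as a small relay loop: each chunk produced by the agent is forwarded to the client as soon as it arrives, followed by a final completion message. This is only an illustration. The `relay_stream` function and the `type`/`content` message shape are assumptions, and `send` stands in for whatever pushes data to the WebSocket connection (in a Lambda handler this would typically wrap the API Gateway Management API `post_to_connection` call).

```python
import json
from typing import Callable, Iterable


def relay_stream(chunks: Iterable[str], send: Callable[[str], None]) -> str:
    """Forward each agent chunk to the client, then send a completion message.

    `chunks` is any iterable of text fragments (for example, a streamed
    model response); `send` delivers one serialized message to the client.
    Returns the full concatenated text.
    """
    parts = []
    for chunk in chunks:
        parts.append(chunk)
        # Hypothetical message shape: one JSON object per streamed fragment.
        send(json.dumps({"type": "text", "content": chunk}))

    full_text = "".join(parts)
    send(json.dumps({"type": "complete", "content": full_text}))
    return full_text


if __name__ == "__main__":
    # Print each message instead of pushing it over a real WebSocket.
    relay_stream(["Processing ", "your prediction..."], print)
```

Passing `send` in as a parameter keeps the loop testable without a live WebSocket connection.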
The application uses the following AWS resources:
- CallitAPI (AWS::Serverless::Api): REST API for CRUD operations
  - Handles authentication and data persistence
  - Implements CORS and Cognito authorization
- WebSocketApi (AWS::ApiGatewayV2::Api): Real-time streaming
  - Handles WebSocket connections for streaming responses
  - Routes: $connect, $disconnect, makecall
- MakeCallStreamFunction: Strands agent with streaming via WebSocket
- ConnectFunction/DisconnectFunction: WebSocket connection management
- LogCall: Writes predictions to DynamoDB
- ListPredictions: Retrieves user predictions
- AuthTokenFunction: Handles Cognito token exchange
- Strands Agents: Orchestrate between reasoning models and tools
- Amazon Bedrock: AI reasoning with streaming support
- Custom Tools: current_time, date parsing utilities
- CognitoUserPool: Manages user authentication
- UserPoolClient: Configures OAuth flows
- UserPoolDomain: Provides hosted UI for authentication
- DynamoDB table "calledit-db" for storing predictions and verification data
- Verifiability Categorization: Automatic classification into 5 categories with AI reasoning
- Real-time Streaming: WebSocket-based streaming for immediate feedback
- Agent Orchestration: Strands agents coordinate AI reasoning and tool usage
- Timezone Intelligence: Automatic timezone handling and 12/24-hour conversion
- Structured Verification: AI-generated verification methods with reasoning
- Automated Testing: 100% success rate testing suite for all categories
- Visual Category Display: Beautiful UI badges with icons and explanations
- Complete Data Persistence: Categories and reasoning stored in DynamoDB
- "Crying" System: Celebrate successful predictions with notifications and social sharing
- Email Notifications: Get notified when your predictions are verified as TRUE
- Zero Cold Starts: Provisioned concurrency on critical functions eliminates delays
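The timezone handling and 12/24-hour conversion listed above can be illustrated with the standard library alone. The sketch below is an assumption about how such a conversion could look, not the project's implementation: `to_utc_iso` is a hypothetical helper, and it takes a fixed UTC offset in hours rather than a named timezone to keep the example self-contained.

```python
from datetime import datetime, timedelta, timezone


def to_utc_iso(local_date: str, clock_time: str, utc_offset_hours: float) -> str:
    """Convert a 12-hour clock time on a given date to a UTC ISO 8601 string.

    local_date is "YYYY-MM-DD"; clock_time is e.g. "3pm" or "11:30PM";
    utc_offset_hours is the local offset from UTC (e.g. -5 for UTC-05:00).
    """
    cleaned = clock_time.strip().replace(" ", "")
    # Pick the format based on whether minutes were given.
    fmt = "%Y-%m-%d %I:%M%p" if ":" in cleaned else "%Y-%m-%d %I%p"
    naive = datetime.strptime(f"{local_date} {cleaned}", fmt)

    # Attach the local offset, then convert to UTC.
    local = naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    print(to_utc_iso("2025-01-27", "3pm", -5))
```

A fixed offset ignores daylight saving transitions; a real deployment would more likely resolve the user's named timezone.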
- AWS CLI configured with deployment permissions
- Virtual environment activated
- All dependencies installed
# Activate virtual environment
source venv/bin/activate
# Navigate to backend
cd backend/calledit-backend
# Build and deploy
sam build
sam deploy --no-confirm-changeset
# Note the output URLs:
# - REST API URL for VITE_API_URL
# - WebSocket URL for VITE_WEBSOCKET_URL
# Navigate to frontend
cd frontend
# Update environment variables
# Edit .env with URLs from backend deployment
VITE_API_URL=https://your-api-gateway-url/Prod
VITE_WEBSOCKET_URL=wss://your-websocket-url/prod
# Build for production
npm run build
# Deploy dist/ folder to your hosting service
# (AWS S3 + CloudFront, Netlify, Vercel, etc.)
# Run automated tests to verify deployment
python testing/verifiability_category_tests.py wss://your-websocket-url/prod
# Expected: 100% test success rate across all 5 categories
The project includes a comprehensive automated testing suite that validates the 5-category verifiability system:
# Run the complete test suite
python testing/verifiability_category_tests.py
# Expected output:
# Starting Verifiability Category Tests
# ✅ Agent Verifiable - Natural Law
# ✅ Current Tool Verifiable - Time Check
# ✅ Strands Tool Verifiable - Math Calculation
# ✅ API Tool Verifiable - Market Data
# ✅ Human Verifiable Only - Subjective Feeling
# Success Rate: 100.0%
- Unit Tests: Backend Lambda functions (/backend/calledit-backend/tests/)
- Integration Tests: API endpoints and WebSocket flows
- End-to-End Tests: Complete verifiability categorization validation
- Performance Tests: Real-time streaming and response times
- Provisioned Concurrency Tests: Verify zero cold starts on critical functions
# Test all functions have proper alias + provisioned concurrency setup
python backend/calledit-backend/tests/test_provisioned_concurrency.py
# Expected output:
# Overall: 3/3 tests passed
# All provisioned concurrency tests PASSED!
See docs/TESTING.md for comprehensive testing documentation.
- CHANGELOG.md - Version history and feature releases
- docs/API.md - REST and WebSocket API documentation
- docs/TRD.md - Technical Requirements Document
- docs/TESTING.md - Testing strategy and coverage
- docs/infra.svg - Infrastructure architecture diagram
- docs/UI_IMPROVEMENTS.md - UI/UX improvement plan and timeline
- testing/README.md - Testing framework overview
- strands/demos/ - Strands agent development examples
- Managed automatically by AWS SAM template
- Cognito User Pool and Client IDs auto-configured
- DynamoDB table name: calledit-db
# .env file
VITE_API_URL=https://your-api-gateway-url/Prod
VITE_WEBSOCKET_URL=wss://your-websocket-url/prod
VITE_APIGATEWAY=https://your-api-gateway-url/Prod
# Check API health
curl https://your-api-gateway-url/Prod/hello
# Check WebSocket connectivity
# Use browser dev tools or WebSocket testing tool
# View Lambda function logs
sam logs -n MakeCallStreamFunction --stack-name calledit-backend --tail
# View all function logs
aws logs describe-log-groups --log-group-name-prefix /aws/lambda/calledit-backend
- CloudWatch Metrics: Lambda invocations, duration, errors
- API Gateway Metrics: Request count, latency, 4XX/5XX errors
- DynamoDB Metrics: Read/write capacity, throttling
# Cancel an in-progress stack update (CloudFormation then rolls back to the previous configuration)
aws cloudformation cancel-update-stack --stack-name calledit-backend
# Or deploy previous version
git checkout previous-commit
sam build && sam deploy --no-confirm-changeset
# Rollback to previous build
git checkout previous-commit
npm run build
# Redeploy dist/ folder
- ✅ Verifiability Categorization System: Complete 5-category classification
- ✅ Real-time Streaming: WebSocket-based AI processing
- ✅ Automated Testing: 100% success rate test suite
- ✅ Visual UI: Category badges with reasoning display
- ✅ Data Persistence: Complete DynamoDB integration
- ✅ Comprehensive Documentation: API, TRD, and testing docs
- ✅ Automated Verification System: Strands agent processes ALL predictions every 15 minutes
- ✅ Production Deployment: EventBridge scheduling, S3 logging, SNS notifications
- ✅ Frontend Integration: Real-time verification status display with confidence scores
- ✅ Tool Gap Analysis: MCP tool suggestions for missing verification capabilities
- ✅ "Crying" Notifications: Email alerts for successful predictions with social sharing setup
- ✅ Modern UI Design: Complete responsive redesign with educational UX and streaming text effects
- ✅ Lambda Provisioned Concurrency: Eliminated cold starts on 3 key functions with alias-based architecture
- ✅ MCP Sampling Review & Improvement System: FULLY OPERATIONAL
  - Complete MCP Sampling pattern with multiple field updates
  - WebSocket routing for improvement workflow (improve_section, improvement_answers)
  - Server-initiated sampling with client-facilitated LLM interactions
  - Human-in-the-loop design with floating status indicators
  - Date conflict resolution ("today" vs "tomorrow" assumptions)
  - Enterprise-grade state management with 4 custom React hooks
- ✅ Production Infrastructure: CloudFront deployment with security hardening
  - CloudFront distribution (d2w6gdbi1zx8x5.cloudfront.net) with 10s cache TTL
  - Comprehensive security fixes (KMS encryption, log injection prevention)
  - CORS resolution and mobile UI improvements
  - Environment variable configuration management
- Strands Review Agent: Complete MCP Sampling implementation
  - Multiple field updates: prediction_statement improvements update verification_date and verification_method
  - Date conflict resolution: Handles "today" vs "tomorrow" assumption conflicts intelligently
  - JSON response processing: Proper parsing of complex improvement responses
- WebSocket Infrastructure: Complete routing and state management
  - Full routing: improve_section and improvement_answers with proper permissions
  - Multiple field update handling: Backend processes complex JSON responses
  - Real-time status indicators: Floating UI elements with smart timing
- Enterprise UX: Production-grade user experience
  - 4 custom React hooks: useReviewState, useErrorHandler, useWebSocketConnection, useImprovementHistory
  - Floating review indicator: Always-visible status during improvement processing
  - Smart state management: Proper status clearing and error handling
- Validation Complete: End-to-end workflow tested and operational
  - Test case: "it will rain" → "NYC tomorrow" → multiple field updates working
  - All components tested: ReviewAgent (10/10), WebSocket routing (3/3), Frontend integration (15/15)
  - Production deployment: All fixes applied and validated
- Strands Verification Agent: AI-powered prediction verification with 5-category routing
- Automated Processing: Every 15 minutes via EventBridge, processes ALL predictions
- Real-time Status Updates: Frontend displays actual verification results
- Tool Gap Detection: Automatic MCP tool suggestions for missing capabilities
- Smart Notifications: SNS email alerts for verified TRUE predictions
- Complete Audit Trail: S3 logging with structured JSON for analysis
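The 5-category routing mentioned above can be pictured as a lookup from each prediction's category to a verification strategy. The sketch below is purely illustrative: the strategy labels, the `route_prediction` and `group_by_route` helpers, and the fallback to human review are assumptions, and only the category identifiers and the `verifiable_category` field name come from this README.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Category identifier -> hypothetical strategy label.
ROUTES: Dict[str, str] = {
    "agent_verifiable": "reason_with_model",
    "current_tool_verifiable": "check_current_time",
    "strands_tool_verifiable": "run_strands_tool",
    "api_tool_verifiable": "call_external_api",
    "human_verifiable_only": "ask_human",
}


def route_prediction(prediction: dict) -> str:
    """Pick a strategy for one prediction; unknown categories go to a human."""
    return ROUTES.get(prediction.get("verifiable_category"), "ask_human")


def group_by_route(predictions: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group a batch of pending predictions by the strategy that handles them."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for prediction in predictions:
        groups[route_prediction(prediction)].append(prediction)
    return dict(groups)


if __name__ == "__main__":
    batch = [
        {"prediction_statement": "Bitcoin will hit $100k",
         "verifiable_category": "api_tool_verifiable"},
        {"prediction_statement": "I will feel happy",
         "verifiable_category": "human_verifiable_only"},
    ]
    for strategy, items in group_by_route(batch).items():
        print(strategy, len(items))
```

Grouping a scheduled batch this way would let each strategy run once per cycle instead of once per prediction.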
- MCP Tool Integration: Weather, sports, and financial API tools
- Analytics Dashboard: User statistics and accuracy tracking
- Mobile Application: React Native mobile app
- Social Media Integration: Auto-post successful predictions to Twitter, LinkedIn, Facebook
- Leaderboards: Community prediction accuracy rankings
- Crying Dashboard: Showcase your successful predictions with social proof
See CHANGELOG.md for detailed version history.
When contributing to CalledIt:
- Follow the testing requirements in docs/TESTING.md
- Ensure all verifiability category tests pass
- Update documentation for new features
- Maintain the 5-category classification system integrity
This is a demo/educational project showcasing serverless AI architecture patterns. It is NOT intended for production use.
- This software is provided for demonstration and educational purposes only
- DO NOT deploy in production environments without significant additional security review, testing, and hardening
- No warranties or guarantees are provided regarding security, scalability, or reliability
- Use entirely at your own risk
- This project deploys AWS resources that WILL incur costs
- You are solely responsible for any AWS charges
- Monitor your AWS billing dashboard when running this demo
- Consider using AWS cost alerts and budgets
- While security best practices are attempted, this is a demonstration project
- May contain security vulnerabilities not suitable for production
- Conduct your own security assessment before any use
- See SECURITY.md for security considerations
This software may NOT be used for:
- Any illegal activities under applicable law
- Harassment, abuse, or harm to individuals or organizations
- Fraud, deception, or misrepresentation
- Violation of privacy or data protection laws
- Any malicious or unethical purposes
- Use at your own risk - no liability accepted for any damages or issues
- Authors disclaim all warranties and liability
- Users assume full responsibility for any consequences of use
- This software is provided "AS IS" without any guarantees
This project is licensed under the MIT License with additional disclaimers - see the LICENSE file for details.
This project is part of an educational/research initiative focused on AI-powered prediction verification systems.