This guide provides comprehensive instructions for setting up the Content Processing Solution Accelerator for local development across Windows and Linux platforms.
This application consists of three separate services that run independently:
- ContentProcessorAPI - REST API server for the frontend
- ContentProcessor - Background processor that handles document processing from Azure Storage Queue
- ContentProcessorWeb - React-based user interface
⚠️ Critical: Each service must run in its own terminal/console window
- Do NOT close terminals while services are running
- Open 3 separate terminal windows for local development
- Each service will occupy its terminal and show live logs
Terminal Organization:
- Terminal 1: ContentProcessorAPI - HTTP server on port 8000
- Terminal 2: ContentProcessor - Runs continuously, polls Azure Storage Queue
- Terminal 3: ContentProcessorWeb - Development server on port 3000
All paths in this guide are relative to the repository root directory:
content-processing-solution-accelerator/    ← Repository root (start here)
├── src/
│   ├── ContentProcessorAPI/
│   │   ├── .venv/          ← Virtual environment
│   │   └── app/
│   │       ├── main.py     ← API entry point
│   │       └── .env        ← API config file
│   ├── ContentProcessor/
│   │   ├── .venv/          ← Virtual environment
│   │   └── src/
│   │       ├── main.py     ← Processor entry point
│   │       └── .env        ← Processor config file
│   └── ContentProcessorWeb/
│       ├── node_modules/
│       └── .env            ← Frontend config file
└── docs/                   ← Documentation (you are here)
Before starting any step, ensure you are in the repository root directory:
# Verify you're in the correct location
pwd # Linux/macOS - should show: .../content-processing-solution-accelerator
Get-Location # Windows PowerShell - should show: ...\content-processing-solution-accelerator
# If not, navigate to repository root
cd path/to/content-processing-solution-accelerator
This project uses separate .env files in each service directory with different configuration requirements:
- ContentProcessorAPI: src/ContentProcessorAPI/app/.env - Azure App Configuration URL and local dev settings
- ContentProcessor: src/ContentProcessor/src/.env - Azure App Configuration URL and local dev settings
- ContentProcessorWeb: src/ContentProcessorWeb/.env - API base URL, authentication settings
When copying .env samples, always navigate to the specific service directory first.
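The .env locations listed above can be checked before starting any service. The following is a minimal Python sketch and not part of the repository: the helper name missing_env_files and the ENV_FILES mapping are hypothetical, and only the three relative paths come from this guide.

```python
from pathlib import Path

# Relative .env locations as listed in this guide (one per service).
ENV_FILES = {
    "ContentProcessorAPI": Path("src/ContentProcessorAPI/app/.env"),
    "ContentProcessor": Path("src/ContentProcessor/src/.env"),
    "ContentProcessorWeb": Path("src/ContentProcessorWeb/.env"),
}


def missing_env_files(repo_root: Path) -> list[str]:
    """Return the names of services whose .env file does not exist yet."""
    return [
        service
        for service, relative_path in ENV_FILES.items()
        if not (repo_root / relative_path).is_file()
    ]
```

Calling missing_env_files(Path.cwd()) from the repository root should return an empty list once all three files have been created.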
# Install Python 3.12+ and Git
winget install Python.Python.3.12
winget install Git.Git
# Install Node.js for frontend
winget install OpenJS.NodeJS.LTS
# Verify installations
python --version # Should show Python 3.12.x
node --version # Should show v18.x or higher
npm --version
# Install prerequisites (Debian/Ubuntu)
sudo apt update && sudo apt install python3.12 python3.12-venv python3-pip git curl nodejs npm -y
# Verify installations
python3.12 --version
node --version
npm --version
# Install prerequisites (RHEL/Fedora)
sudo dnf install python3.11 python3.11-devel git curl gcc nodejs npm -y
# Verify installations
python3.11 --version
node --version
npm --version
git clone https://github.com/microsoft/content-processing-solution-accelerator.git
cd content-processing-solution-accelerator
Before configuring services, authenticate with Azure:
# Login to Azure CLI
az login
# Set your subscription
az account set --subscription "your-subscription-id"
# Verify authentication
az account show
After deploying Azure resources (using azd up or the Bicep template), gather the following information:
# List resources in your resource group
az resource list -g <resource-group-name> -o table
# Get App Configuration endpoint
az appconfig show -n <appconfig-name> -g <resource-group-name> --query endpoint -o tsv
# Get Cosmos DB endpoint
az cosmosdb show -n <cosmos-name> -g <resource-group-name> --query documentEndpoint -o tsv
Example resource names from deployment:
- App Configuration: appcs-{suffix}.azconfig.io
- Cosmos DB: cosmos-{suffix}.documents.azure.com
- Storage Account: st{suffix}.queue.core.windows.net
- Content Understanding: aicu-{suffix}.cognitiveservices.azure.com
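To illustrate how the example names above relate to a single deployment suffix, here is a small Python sketch. It assumes your deployment follows exactly the example patterns shown; actual names can differ, so treat the az commands above as the source of truth. The function names example_hosts and app_config_endpoint are hypothetical.

```python
def example_hosts(suffix: str) -> dict[str, str]:
    """Build the example host names from this guide for a given deployment suffix."""
    return {
        "app_configuration": f"appcs-{suffix}.azconfig.io",
        "cosmos_db": f"cosmos-{suffix}.documents.azure.com",
        "storage_queue": f"st{suffix}.queue.core.windows.net",
        "content_understanding": f"aicu-{suffix}.cognitiveservices.azure.com",
    }


def app_config_endpoint(suffix: str) -> str:
    """Return the https URL form used for APP_CONFIG_ENDPOINT in the .env files."""
    return f"https://{example_hosts(suffix)['app_configuration']}"
```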
To run the application locally, your Azure account needs the following role assignments on the deployed resources:
# Get your principal ID for role assignments
PRINCIPAL_ID=$(az ad signed-in-user show --query id -o tsv)
echo $PRINCIPAL_ID
# Get your subscription ID
SUBSCRIPTION_ID=$(az account show --query id -o tsv)
echo $SUBSCRIPTION_ID
# 1. App Configuration Data Reader
az role assignment create \
--role "App Configuration Data Reader" \
--assignee $PRINCIPAL_ID \
--scope "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/<resource-group>/providers/Microsoft.AppConfiguration/configurationStores/<appconfig-name>"
# 2. Cosmos DB Built-in Data Contributor
az role assignment create \
--role "Cosmos DB Built-in Data Contributor" \
--assignee $PRINCIPAL_ID \
--scope "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/<resource-group>/providers/Microsoft.DocumentDB/databaseAccounts/<cosmos-name>"
# 3. Storage Blob Data Contributor (for document upload/download)
az role assignment create \
--role "Storage Blob Data Contributor" \
--assignee $PRINCIPAL_ID \
--scope "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account-name>"
# 4. Storage Queue Data Contributor (for message processing)
az role assignment create \
--role "Storage Queue Data Contributor" \
--assignee $PRINCIPAL_ID \
--scope "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account-name>"
# 5. Cognitive Services User
az role assignment create \
--role "Cognitive Services User" \
--assignee $PRINCIPAL_ID \
--scope "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/<resource-group>/providers/Microsoft.CognitiveServices/accounts/<content-understanding-name>"
Note: RBAC permission changes can take 5-10 minutes to propagate. If you encounter "Forbidden" errors after assigning roles, wait a few minutes and try again.
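The five role assignments above differ only in role name and resource scope, so they can be generated rather than typed by hand. The sketch below only builds the az argument lists from the commands shown in this guide; it does not run them. The names ROLE_TARGETS, role_assignment_commands, and as_shell_lines, and the keys of the names mapping, are hypothetical.

```python
import shlex

# (role name, resource provider path, key into the names mapping),
# mirroring the five commands in this guide.
ROLE_TARGETS = [
    ("App Configuration Data Reader", "Microsoft.AppConfiguration/configurationStores", "appconfig"),
    ("Cosmos DB Built-in Data Contributor", "Microsoft.DocumentDB/databaseAccounts", "cosmos"),
    ("Storage Blob Data Contributor", "Microsoft.Storage/storageAccounts", "storage"),
    ("Storage Queue Data Contributor", "Microsoft.Storage/storageAccounts", "storage"),
    ("Cognitive Services User", "Microsoft.CognitiveServices/accounts", "content_understanding"),
]


def role_assignment_commands(
    principal_id: str,
    subscription_id: str,
    resource_group: str,
    names: dict[str, str],
) -> list[list[str]]:
    """Build one 'az role assignment create' argument list per role."""
    commands = []
    for role, provider_path, key in ROLE_TARGETS:
        scope = (
            f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
            f"/providers/{provider_path}/{names[key]}"
        )
        commands.append(
            ["az", "role", "assignment", "create",
             "--role", role, "--assignee", principal_id, "--scope", scope]
        )
    return commands


def as_shell_lines(commands: list[list[str]]) -> list[str]:
    """Render the argument lists as shell-quoted lines for review."""
    return [shlex.join(command) for command in commands]
```

Printing the result of as_shell_lines lets you review the commands before running them; each argument list could also be passed to subprocess.run if you prefer to execute from Python.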
📋 Terminal Reminder: Open a dedicated terminal window (Terminal 1) for the ContentProcessorAPI service. All commands in this section assume you start from the repository root directory.
The ContentProcessorAPI provides REST endpoints for the frontend and handles API requests.
# From repository root
cd src/ContentProcessorAPI
# Create virtual environment
python -m venv .venv
# Activate virtual environment
.venv\Scripts\Activate.ps1 # Windows PowerShell
# or
source .venv/bin/activate # Linux/macOS
Note for PowerShell Users: If you get an error about scripts being disabled, run:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
# Install uv package manager if not already installed
pip install uv
# Install all dependencies using uv
uv sync --python 3.12
Note: This project uses uv as the package manager with pyproject.toml. The uv sync command automatically installs all dependencies with proper version resolution.
Create a .env file in the src/ContentProcessorAPI/app/ directory:
cd app
# Create .env file
New-Item .env # Windows PowerShell
# or
touch .env # Linux/macOS
Add the following to the .env file:
# App Configuration endpoint - ALL other settings are read from App Configuration
APP_CONFIG_ENDPOINT=https://<your-appconfig-name>.azconfig.io
# Local development settings - CRITICAL for local authentication
APP_ENV=dev
APP_AUTH_ENABLED=False
AZURE_IDENTITY_EXCLUDE_MANAGED_IDENTITY_CREDENTIAL=True
# Logging settings (required)
APP_LOGGING_LEVEL=INFO
AZURE_PACKAGE_LOGGING_LEVEL=WARNING
AZURE_LOGGING_PACKAGES=azure.core,azure.storage,azure.identity
⚠️ Important:
- Replace <your-appconfig-name> with your actual App Configuration resource name
- APP_ENV=dev is REQUIRED for local development - it enables Azure CLI credential usage instead of Managed Identity
- All other settings (Cosmos DB, Storage, AI endpoints) are automatically loaded from Azure App Configuration
- Get your resource names from the Azure Portal or by running:
az resource list -g <resource-group-name>
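Before starting the API, it can help to confirm that the .env file contains every key listed above and that the placeholder has been replaced. The following Python sketch is an illustration only: parse_env and env_problems are hypothetical helpers, the parser handles just simple KEY=VALUE lines, and the key list is copied from the sample in this guide.

```python
# Keys from the ContentProcessorAPI .env sample in this guide.
REQUIRED_API_KEYS = (
    "APP_CONFIG_ENDPOINT",
    "APP_ENV",
    "APP_AUTH_ENABLED",
    "AZURE_IDENTITY_EXCLUDE_MANAGED_IDENTITY_CREDENTIAL",
    "APP_LOGGING_LEVEL",
    "AZURE_PACKAGE_LOGGING_LEVEL",
    "AZURE_LOGGING_PACKAGES",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def env_problems(text: str) -> list[str]:
    """Return human-readable problems found in the .env content."""
    values = parse_env(text)
    problems = [f"missing {key}" for key in REQUIRED_API_KEYS if not values.get(key)]
    if "<your-appconfig-name>" in values.get("APP_CONFIG_ENDPOINT", ""):
        problems.append("APP_CONFIG_ENDPOINT still contains the placeholder")
    if values.get("APP_ENV") not in (None, "", "dev"):
        problems.append("APP_ENV should be dev for local development")
    return problems
```

For example, env_problems(Path("src/ContentProcessorAPI/app/.env").read_text()) (with pathlib.Path imported) should return an empty list for a complete file.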
Edit src/ContentProcessorAPI/app/main.py and add the CORS middleware configuration.
Add the import at the top:
from fastapi.middleware.cors import CORSMiddleware
Then after the line app = FastAPI(redirect_slashes=False), add:
# Configure CORS for local development
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],  # Frontend URL
    allow_credentials=True,
    allow_methods=["*"],  # Allow all HTTP methods
    allow_headers=["*"],  # Allow all headers
)
Note: This CORS configuration is only needed for local development. Azure deployment handles CORS at the infrastructure level.
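Once the API is running, one way to confirm the CORS middleware is active is to send the same kind of preflight request a browser would. This is a standard-library sketch under stated assumptions: preflight_allowed_origin is a hypothetical helper, and the URL you pass is up to you (this guide mentions http://localhost:8000/health).

```python
import urllib.error
import urllib.request


def preflight_allowed_origin(api_url: str, origin: str = "http://localhost:3000"):
    """Send a CORS preflight (OPTIONS) request.

    Returns the Access-Control-Allow-Origin response header, or None if the
    server did not send one. Raises urllib.error.URLError if the API is not
    reachable at all.
    """
    request = urllib.request.Request(
        api_url,
        method="OPTIONS",
        headers={
            "Origin": origin,
            "Access-Control-Request-Method": "GET",
        },
    )
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return response.headers.get("Access-Control-Allow-Origin")
    except urllib.error.HTTPError as error:
        # A rejected preflight still carries headers worth inspecting.
        return error.headers.get("Access-Control-Allow-Origin")
```

With the middleware configured as described, preflight_allowed_origin("http://localhost:8000/health") should return http://localhost:3000; a result of None suggests the middleware was not added or the API was not restarted.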
# Make sure you're in the ContentProcessorAPI directory with activated venv
cd .. # Go back to ContentProcessorAPI root if in app/
# Run with uvicorn
python -m uvicorn app.main:app --reload --port 8000
The ContentProcessorAPI will start at:
- API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
Keep this terminal open - the API server will continue running and show request logs.
📋 Terminal Reminder: Open a second dedicated terminal window (Terminal 2) for the ContentProcessor. Keep Terminal 1 (API) running. All commands assume you start from the repository root directory.
The ContentProcessor handles background document processing from Azure Storage Queue.
# From repository root
cd src/ContentProcessor
# Create virtual environment
python -m venv .venv
# Activate virtual environment
.venv\Scripts\Activate.ps1 # Windows PowerShell
# or
source .venv/bin/activate # Linux/macOS
# Install uv package manager if not already installed
pip install uv
# Install all dependencies using uv
uv sync --python 3.12
Note: This project uses uv as the package manager with pyproject.toml. The uv sync command automatically installs all dependencies with proper version resolution.
Create a .env file in the src/ContentProcessor/src/ directory:
cd src
# Create .env file
New-Item .env # Windows PowerShell
# or
touch .env # Linux/macOS
Add the following to the .env file:
# App Configuration endpoint - ALL other settings are read from App Configuration
APP_CONFIG_ENDPOINT=https://<your-appconfig-name>.azconfig.io
# Local development settings
APP_ENV=dev
APP_AUTH_ENABLED=False
AZURE_IDENTITY_EXCLUDE_MANAGED_IDENTITY_CREDENTIAL=True
# Logging settings
APP_LOGGING_LEVEL=INFO
APP_LOGGING_ENABLE=True
# Azure package logging configuration (required)
AZURE_PACKAGE_LOGGING_LEVEL=WARNING
AZURE_LOGGING_PACKAGES=azure.core,azure.storage,azure.identity
The code currently uses .env.dev by default. Update it to use the standard .env file:
- Open src/ContentProcessor/src/main.py
- Find line 25 (inside the __init__ method)
- Change:
env_file_path=os.path.join(os.path.dirname(__file__), ".env.dev"),
to:
env_file_path=os.path.join(os.path.dirname(__file__), ".env"),
⚠️ Important:
- The .env file must be located in the src/ContentProcessor/src/ directory, not in the src/ContentProcessor/ root
- After making this change, the application will look for the .env file in the same directory as main.py
- All Azure resource settings (Cosmos DB, Storage, AI endpoints) are automatically loaded from Azure App Configuration
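The env_file_path expression resolves relative to the module file rather than to the current working directory, which is why the .env file has to sit next to main.py. A minimal sketch of that resolution, using a hypothetical helper env_file_for that mirrors the expression shown above:

```python
import os


def env_file_for(module_file: str, name: str = ".env") -> str:
    """Return the path of an env file located next to the given module file.

    Mirrors os.path.join(os.path.dirname(__file__), name) with the module
    path passed in explicitly so it can be tried outside main.py.
    """
    return os.path.join(os.path.dirname(module_file), name)
```

For a module at src/ContentProcessor/src/main.py this yields src/ContentProcessor/src/.env, regardless of which directory the process was started from.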
# Make sure you're in the src directory
python main.py
The ContentProcessor will start and begin polling the Azure Storage Queue for messages.
Expected behavior:
- You may see Storage Queue authorization errors if roles haven't propagated (wait 5-10 minutes)
- The processor will show continuous polling activity
- Document processing will begin when files are uploaded via the frontend
Keep this terminal open - the processor will continue running and show processing logs.
📋 Terminal Reminder: Open a third dedicated terminal window (Terminal 3) for the ContentProcessorWeb. Keep Terminals 1 (API) and 2 (Processor) running. All commands assume you start from the repository root directory.
The ContentProcessorWeb provides the React-based user interface.
# From repository root
cd src/ContentProcessorWeb
# Install dependencies with legacy peer deps flag
npm install --legacy-peer-deps
# Install additional required FluentUI packages
npm install @fluentui/react-dialog @fluentui/react-button --legacy-peer-deps
Note: Always use the --legacy-peer-deps flag for npm commands in this project to avoid dependency conflicts with @azure/msal-react.
Update the .env file in the src/ContentProcessorWeb/ directory:
REACT_APP_API_BASE_URL=http://localhost:8000
REACT_APP_AUTH_ENABLED=false
REACT_APP_CONSOLE_LOG_ENABLED=true
npm start
The ContentProcessorWeb will start at: http://localhost:3000
Keep this terminal open - the React development server will continue running with hot reload.
Before using the application, confirm all three services are running in separate terminals:
| Terminal | Service | Command | Expected Output | URL |
|---|---|---|---|---|
| Terminal 1 | ContentProcessorAPI | python -m uvicorn app.main:app --reload --port 8000 | Application startup complete | http://localhost:8000 |
| Terminal 2 | ContentProcessor | python main.py | Polling messages, no fatal errors | N/A |
| Terminal 3 | ContentProcessorWeb | npm start | Compiled successfully! | http://localhost:3000 |
- Check Backend API:
# In a new terminal (Terminal 4)
curl http://localhost:8000/health
# Expected: {"message":"I'm alive!"}
- Check Frontend:
  - Open browser to http://localhost:3000
  - Should see the Content Processing UI
  - No "Unable to connect to the server" errors
- Check Processor:
  - Look at Terminal 2 output
  - Should see processing activity or queue polling
  - No authorization errors (if roles have propagated)
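The port-based parts of the checks above can also be scripted. The sketch below only tests whether something accepts TCP connections on the two ports named in this guide; it cannot tell which program is listening, and the ContentProcessor has no port to probe. The names SERVICE_PORTS, is_listening, and service_status are hypothetical.

```python
import socket

# Ports from this guide: API on 8000, frontend development server on 3000.
SERVICE_PORTS = {"ContentProcessorAPI": 8000, "ContentProcessorWeb": 3000}


def is_listening(port: int, host: str = "localhost", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def service_status() -> dict[str, bool]:
    """Map each service name to whether its port currently accepts connections."""
    return {name: is_listening(port) for name, port in SERVICE_PORTS.items()}
```

Running service_status() from a fourth terminal should report True for both services once Terminals 1 and 3 have finished starting.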
Once all services are running (as confirmed in Step 6), you can:
- Access the Application: Open http://localhost:3000 in your browser to explore the frontend UI
- Upload Documents: Use the UI to upload documents for processing
- View API Documentation: Navigate to http://localhost:8000/docs to explore API endpoints
- Check Processing Status: Monitor Terminal 2 for document processing logs
If you see errors when installing dependencies, ensure you're using uv sync instead of pip install:
# Install uv if not already installed
pip install uv
# Use uv sync which handles dependencies better
uv sync --python 3.12
Explanation: This project uses uv as the package manager with pyproject.toml. The uv tool provides better dependency resolution and automatically uses precompiled wheels when available, avoiding compilation issues on Windows.
If you see "PyO3 modules compiled for CPython 3.8 or older may only be initialized once" or "ImportError: pydantic_core._pydantic_core":
# Uninstall and reinstall with compatible versions
pip uninstall -y pydantic pydantic-core
pip install pydantic==2.12.5 pydantic-core==2.41.5
pip install --upgrade "typing-extensions>=4.14.1"
Explanation: A version mismatch between pydantic and pydantic-core causes runtime errors. The compatible versions above work reliably together.
# Clear npm cache and reinstall with legacy peer deps
npm cache clean --force
Remove-Item -Recurse -Force node_modules -ErrorAction SilentlyContinue
Remove-Item -Force package-lock.json -ErrorAction SilentlyContinue
npm install --legacy-peer-deps
# Install missing FluentUI packages if needed
npm install @fluentui/react-dialog @fluentui/react-button --legacy-peer-deps
Explanation: The --legacy-peer-deps flag is required due to peer dependency conflicts with @azure/msal-react. Some FluentUI packages may not be included in the initial install and need to be added separately.
If you get "Forbidden" errors when accessing App Configuration or Cosmos DB:
# Check your current Azure account
az account show
# Get your principal ID for role assignments
az ad signed-in-user show --query id -o tsv
# Verify you have the correct role assignments
az role assignment list --assignee $(az ad signed-in-user show --query id -o tsv) --resource-group <resource-group-name>
# Refresh your access token
az account get-access-token --resource https://azconfig.io
If roles are missing, assign them as shown in Step 2.
Note: Role assignments can take 5-10 minutes to propagate through Azure AD. If you just assigned roles, wait a few minutes before retrying.
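Because propagation takes minutes, a fixed-interval retry is usually enough when scripting against freshly assigned roles. The sketch below is generic and makes no Azure calls itself: retry_until_allowed is a hypothetical helper, and check is any function you supply that returns True once access works. The default of 10 attempts, 60 seconds apart, is an assumption chosen to roughly cover the 5-10 minute window mentioned above.

```python
import time


def retry_until_allowed(check, attempts: int = 10, delay_seconds: float = 60.0, sleep=time.sleep):
    """Call check() until it returns True or the attempts run out.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed. Waits delay_seconds between attempts, but not after
    the final one.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay_seconds)
    return None
```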
If you see "401 Client Error: PermissionDenied" for Content Understanding service:
# Assign Cognitive Services User role
az role assignment create --role "Cognitive Services User" \
--assignee <principal-id> \
--scope /subscriptions/<sub-id>/resourceGroups/<rg-name>/providers/Microsoft.CognitiveServices/accounts/<content-understanding-name>
This error occurs when processing documents. Wait 5-10 minutes after assigning the role, then restart the ContentProcessor service.
If you see "ManagedIdentityCredential authentication unavailable" or "No managed identity endpoint found":
# Ensure your .env files have these settings:
APP_ENV=dev
AZURE_IDENTITY_EXCLUDE_MANAGED_IDENTITY_CREDENTIAL=True
Locations to check:
- src/ContentProcessorAPI/app/.env
- src/ContentProcessor/src/.env (note: must be in the src/ subdirectory)
Explanation: Managed Identity is used in Azure deployments but doesn't work locally. Setting APP_ENV=dev switches to Azure CLI credential authentication.
If the frontend loads but shows "Unable to connect to the server" error:
- Verify CORS is configured in src/ContentProcessorAPI/app/main.py:
from fastapi.middleware.cors import CORSMiddleware
app = FastAPI(redirect_slashes=False)
# Configure CORS for local development
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
- Restart the API service (Terminal 1) after adding CORS configuration
- Check browser console (F12) for CORS errors
- Verify API is running on port 8000 and frontend on port 3000
Explanation: CORS (Cross-Origin Resource Sharing) blocks requests between different origins by default. The frontend (localhost:3000) needs explicit permission to call the API (localhost:8000).
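An origin is the combination of scheme, host, and port, so two URLs on localhost still count as different origins when their ports differ. The following sketch makes that comparison explicit; origin_of and same_origin are hypothetical helpers built on the standard urllib.parse module.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Default ports applied when a URL omits the port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str) -> Tuple[str, str, Optional[int]]:
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return scheme, (parts.hostname or "").lower(), port


def same_origin(first_url: str, second_url: str) -> bool:
    """Return True when both URLs share scheme, host, and port."""
    return origin_of(first_url) == origin_of(second_url)
```

Here same_origin("http://localhost:3000", "http://localhost:8000/docs") is False, which is why the API has to list the frontend origin in allow_origins.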
- Verify .env file is in the correct directory:
  - ContentProcessorAPI: src/ContentProcessorAPI/app/.env
  - ContentProcessor: src/ContentProcessor/src/.env (must be in src/ subdirectory)
  - ContentProcessorWeb: src/ContentProcessorWeb/.env
- Check file permissions (especially on Linux/macOS)
- Ensure no extra spaces in variable assignments
- Restart the service after changing .env files
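Stray spaces in assignments are easy to miss by eye. As a rough aid, the sketch below flags lines in a .env file that have whitespace around the equals sign or at either end of the line. It is an illustration only; env_spacing_problems is a hypothetical helper and it deliberately ignores comments and blank lines.

```python
def env_spacing_problems(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for assignments with stray whitespace."""
    problems = []
    for number, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        has_outer_space = raw_line != line
        has_inner_space = key != key.rstrip() or value != value.lstrip()
        if has_outer_space or has_inner_space:
            problems.append((number, raw_line))
    return problems
```

An empty result means no spacing issues were detected by this check; it does not validate the values themselves.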
If you get "cannot be loaded because running scripts is disabled" when activating venv:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
# Check what's using the port
netstat -ano | findstr :8000 # Windows
netstat -tulpn | grep :8000 # Linux/Mac
# Kill the process using the port if needed
# Windows: taskkill /PID <PID> /F
# Linux: kill -9 <PID>
Enable detailed logging by setting these environment variables in your .env files:
APP_LOGGING_LEVEL=DEBUG
APP_LOGGING_ENABLE=True
- Deployment Guide - Production deployment instructions
- Technical Architecture - System architecture overview
- API Documentation - API endpoint details
- README - Project overview and getting started
For additional support, please submit issues to the GitHub repository.