Local Development Setup Guide

This guide provides comprehensive instructions for setting up the Content Processing Solution Accelerator for local development across Windows and Linux platforms.

Important Setup Notes

Multi-Service Architecture

This application consists of three separate services that run independently:

  1. ContentProcessorAPI - REST API server for the frontend
  2. ContentProcessor - Background processor that handles document processing from Azure Storage Queue
  3. ContentProcessorWeb - React-based user interface

⚠️ Critical: Each service must run in its own terminal/console window

  • Do NOT close terminals while services are running
  • Open 3 separate terminal windows for local development
  • Each service will occupy its terminal and show live logs

Terminal Organization:

  • Terminal 1: ContentProcessorAPI - HTTP server on port 8000
  • Terminal 2: ContentProcessor - Runs continuously, polls Azure Storage Queue
  • Terminal 3: ContentProcessorWeb - Development server on port 3000

Path Conventions

All paths in this guide are relative to the repository root directory:

content-processing-solution-accelerator/    ← Repository root (start here)
├── src/
│   ├── ContentProcessorAPI/
│   │   ├── .venv/                          ← Virtual environment
│   │   └── app/
│   │       ├── main.py                     ← API entry point
│   │       └── .env                        ← API config file
│   ├── ContentProcessor/
│   │   ├── .venv/                          ← Virtual environment
│   │   └── src/
│   │       ├── main.py                     ← Processor entry point
│   │       └── .env                        ← Processor config file
│   └── ContentProcessorWeb/
│       ├── node_modules/
│       └── .env                            ← Frontend config file
└── docs/                                   ← Documentation (you are here)

Before starting any step, ensure you are in the repository root directory:

# Verify you're in the correct location
pwd  # Linux/macOS - should show: .../content-processing-solution-accelerator
Get-Location  # Windows PowerShell - should show: ...\content-processing-solution-accelerator

# If not, navigate to repository root
cd path/to/content-processing-solution-accelerator

Configuration Files

This project uses separate .env files in each service directory with different configuration requirements:

  • ContentProcessorAPI: src/ContentProcessorAPI/app/.env - Azure App Configuration URL and local dev settings
  • ContentProcessor: src/ContentProcessor/src/.env - Azure App Configuration URL and local dev settings
  • ContentProcessorWeb: src/ContentProcessorWeb/.env - API base URL, authentication settings

When copying .env samples, always navigate to the specific service directory first.
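
One way to confirm that the three files sit where each service expects them is a small check run from the repository root. The sketch below is illustrative only: `missing_env_files` is a hypothetical helper name, and the paths are the ones listed above.

```python
from pathlib import Path

# Paths taken from the list above, relative to the repository root.
ENV_FILES = {
    "ContentProcessorAPI": "src/ContentProcessorAPI/app/.env",
    "ContentProcessor": "src/ContentProcessor/src/.env",
    "ContentProcessorWeb": "src/ContentProcessorWeb/.env",
}


def missing_env_files(repo_root: str = ".") -> dict[str, str]:
    """Return {service: path} for every expected .env file that does not exist."""
    root = Path(repo_root)
    return {
        service: rel_path
        for service, rel_path in ENV_FILES.items()
        if not (root / rel_path).is_file()
    }


if __name__ == "__main__":
    missing = missing_env_files()
    for service, rel_path in missing.items():
        print(f"{service}: missing {rel_path}")
    if not missing:
        print("All three .env files found.")
```

Run it from the repository root once Steps 3 to 5 are complete; before that, missing files are expected.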

Step 1: Prerequisites - Install Required Tools

Windows Development

# Install Python 3.12+ and Git
winget install Python.Python.3.12
winget install Git.Git

# Install Node.js for frontend
winget install OpenJS.NodeJS.LTS

# Verify installations
python --version  # Should show Python 3.12.x
node --version    # Should show v18.x or higher
npm --version

Linux Development

Ubuntu/Debian

# Install prerequisites
sudo apt update && sudo apt install python3.12 python3.12-venv python3-pip git curl nodejs npm -y

# Verify installations
python3.12 --version
node --version
npm --version

RHEL/CentOS/Fedora

# Install prerequisites
sudo dnf install python3.12 python3.12-devel git curl gcc nodejs npm -y

# Verify installations
python3.12 --version
node --version
npm --version

Clone the Repository

git clone https://github.com/microsoft/content-processing-solution-accelerator.git
cd content-processing-solution-accelerator

Step 2: Azure Authentication Setup

Before configuring services, authenticate with Azure:

# Login to Azure CLI
az login

# Set your subscription
az account set --subscription "your-subscription-id"

# Verify authentication
az account show

Get Azure Resource Information

After deploying Azure resources (using azd up or Bicep template), gather the following information:

# List resources in your resource group
az resource list -g <resource-group-name> -o table

# Get App Configuration endpoint
az appconfig show -n <appconfig-name> -g <resource-group-name> --query endpoint -o tsv

# Get Cosmos DB endpoint
az cosmosdb show -n <cosmos-name> -g <resource-group-name> --query documentEndpoint -o tsv

Example resource names from deployment:

  • App Configuration: appcs-{suffix}.azconfig.io
  • Cosmos DB: cosmos-{suffix}.documents.azure.com
  • Storage Account: st{suffix}.queue.core.windows.net
  • Content Understanding: aicu-{suffix}.cognitiveservices.azure.com

Required Azure RBAC Permissions

To run the application locally, your Azure account needs the following role assignments on the deployed resources:

Get Your Principal ID

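# Get your principal ID for role assignments
# (Bash syntax; in PowerShell, assign with: $PRINCIPAL_ID = az ad signed-in-user show --query id -o tsv)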
PRINCIPAL_ID=$(az ad signed-in-user show --query id -o tsv)
echo $PRINCIPAL_ID

# Get your subscription ID
SUBSCRIPTION_ID=$(az account show --query id -o tsv)
echo $SUBSCRIPTION_ID

Assign Required Roles

# 1. App Configuration Data Reader
az role assignment create \
  --role "App Configuration Data Reader" \
  --assignee $PRINCIPAL_ID \
  --scope "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/<resource-group>/providers/Microsoft.AppConfiguration/configurationStores/<appconfig-name>"

# 2. Cosmos DB Built-in Data Contributor
az role assignment create \
  --role "Cosmos DB Built-in Data Contributor" \
  --assignee $PRINCIPAL_ID \
  --scope "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/<resource-group>/providers/Microsoft.DocumentDB/databaseAccounts/<cosmos-name>"

# 3. Storage Blob Data Contributor (for document upload/download)
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee $PRINCIPAL_ID \
  --scope "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account-name>"

# 4. Storage Queue Data Contributor (for message processing)
az role assignment create \
  --role "Storage Queue Data Contributor" \
  --assignee $PRINCIPAL_ID \
  --scope "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account-name>"

# 5. Cognitive Services User
az role assignment create \
  --role "Cognitive Services User" \
  --assignee $PRINCIPAL_ID \
  --scope "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/<resource-group>/providers/Microsoft.CognitiveServices/accounts/<content-understanding-name>"

Note: RBAC permission changes can take 5-10 minutes to propagate. If you encounter "Forbidden" errors after assigning roles, wait a few minutes and try again.
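
The five commands above differ only in the role name and the resource path inside the scope. As a rough sketch, the scopes can be generated from a handful of names so that placeholders are filled in consistently. The function name `role_assignment_commands` and its keyword arguments are hypothetical; the role names and provider paths are copied from the commands above.

```python
# (role name, provider path template) pairs, copied from the five commands above.
ROLE_RESOURCES = [
    ("App Configuration Data Reader",
     "Microsoft.AppConfiguration/configurationStores/{appconfig}"),
    ("Cosmos DB Built-in Data Contributor",
     "Microsoft.DocumentDB/databaseAccounts/{cosmos}"),
    ("Storage Blob Data Contributor",
     "Microsoft.Storage/storageAccounts/{storage}"),
    ("Storage Queue Data Contributor",
     "Microsoft.Storage/storageAccounts/{storage}"),
    ("Cognitive Services User",
     "Microsoft.CognitiveServices/accounts/{content_understanding}"),
]


def role_assignment_commands(subscription_id, resource_group, principal_id, **names):
    """Return one `az role assignment create` command line per required role."""
    base = f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}/providers/"
    commands = []
    for role, template in ROLE_RESOURCES:
        scope = base + template.format(**names)
        commands.append(
            f'az role assignment create --role "{role}" '
            f'--assignee {principal_id} --scope "{scope}"'
        )
    return commands


if __name__ == "__main__":
    # Placeholder values; replace them with the names from your deployment.
    for command in role_assignment_commands(
        "<subscription-id>", "<resource-group>", "<principal-id>",
        appconfig="<appconfig-name>", cosmos="<cosmos-name>",
        storage="<storage-account-name>",
        content_understanding="<content-understanding-name>",
    ):
        print(command)
```

The script only prints the commands; review them before running any of them in a shell.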

Step 3: ContentProcessorAPI Setup & Run Instructions

📋 Terminal Reminder: Open a dedicated terminal window (Terminal 1) for the ContentProcessorAPI service. All commands in this section assume you start from the repository root directory.

The ContentProcessorAPI provides REST endpoints for the frontend and handles API requests.

3.1. Navigate to API Directory

# From repository root
cd src/ContentProcessorAPI

3.2. Create Virtual Environment

# Create virtual environment (on Linux, use python3.12 if the python command is not available)
python -m venv .venv

# Activate virtual environment
.venv\Scripts\Activate.ps1  # Windows PowerShell
# or
source .venv/bin/activate  # Linux/macOS

Note for PowerShell Users: If you get an error about scripts being disabled, run:

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

3.3. Install Dependencies

# Install uv package manager if not already installed
pip install uv

# Install all dependencies using uv
uv sync --python 3.12

Note: This project uses uv as the package manager with pyproject.toml. The uv sync command automatically installs all dependencies with proper version resolution.

3.4. Configure Environment Variables

Create a .env file in the src/ContentProcessorAPI/app/ directory:

cd app

# Create .env file
New-Item .env  # Windows PowerShell
# or
touch .env  # Linux/macOS

Add the following to the .env file:

# App Configuration endpoint - ALL other settings are read from App Configuration
APP_CONFIG_ENDPOINT=https://<your-appconfig-name>.azconfig.io

# Local development settings - CRITICAL for local authentication
APP_ENV=dev
APP_AUTH_ENABLED=False
AZURE_IDENTITY_EXCLUDE_MANAGED_IDENTITY_CREDENTIAL=True

# Logging settings (required)
APP_LOGGING_LEVEL=INFO
AZURE_PACKAGE_LOGGING_LEVEL=WARNING
AZURE_LOGGING_PACKAGES=azure.core,azure.storage,azure.identity

⚠️ Important:

  • Replace <your-appconfig-name> with your actual App Configuration resource name
  • APP_ENV=dev is REQUIRED for local development - it enables Azure CLI credential usage instead of Managed Identity
  • All other settings (Cosmos DB, Storage, AI endpoints) are automatically loaded from Azure App Configuration
  • Get your resource names from the Azure Portal or by running: az resource list -g <resource-group-name>
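
Before starting the API, it can help to check the file for the mistakes called out above, such as a leftover placeholder or a missing APP_ENV=dev. The sketch below is one possible check, not part of the project: `parse_env` and `check_api_env` are hypothetical helpers, and treating every key from the listing above as required is an assumption.

```python
# Keys from the .env listing above; treating all of them as required is an assumption.
REQUIRED_KEYS = [
    "APP_CONFIG_ENDPOINT",
    "APP_ENV",
    "APP_AUTH_ENABLED",
    "AZURE_IDENTITY_EXCLUDE_MANAGED_IDENTITY_CREDENTIAL",
    "APP_LOGGING_LEVEL",
    "AZURE_PACKAGE_LOGGING_LEVEL",
    "AZURE_LOGGING_PACKAGES",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_api_env(text: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means no problem was found."""
    values = parse_env(text)
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not values.get(key)]
    if "<your-appconfig-name>" in values.get("APP_CONFIG_ENDPOINT", ""):
        problems.append("APP_CONFIG_ENDPOINT still contains the placeholder")
    if values.get("APP_ENV") and values["APP_ENV"] != "dev":
        problems.append("APP_ENV should be dev for local development")
    return problems


if __name__ == "__main__":
    from pathlib import Path

    env_path = Path("src/ContentProcessorAPI/app/.env")  # relative to the repository root
    if env_path.is_file():
        for problem in check_api_env(env_path.read_text(encoding="utf-8")):
            print(problem)
    else:
        print(f"{env_path} not found")
```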

3.5. Configure CORS for Local Development

Edit src/ContentProcessorAPI/app/main.py and add the CORS middleware configuration.

Add the import at the top:

from fastapi.middleware.cors import CORSMiddleware

Then after the line app = FastAPI(redirect_slashes=False), add:

# Configure CORS for local development
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],  # Frontend URL
    allow_credentials=True,
    allow_methods=["*"],  # Allow all HTTP methods
    allow_headers=["*"],  # Allow all headers
)

Note: This CORS configuration is only needed for local development. Azure deployment handles CORS at the infrastructure level.

3.6. Run the API

# Make sure you're in the ContentProcessorAPI directory with activated venv
cd ..  # Go back to ContentProcessorAPI root if in app/

# Run with uvicorn
python -m uvicorn app.main:app --reload --port 8000

The ContentProcessorAPI will start at:

  • API: http://localhost:8000
  • API Documentation: http://localhost:8000/docs

Keep this terminal open - the API server will continue running and show request logs.

Step 4: ContentProcessor Setup & Run Instructions

📋 Terminal Reminder: Open a second dedicated terminal window (Terminal 2) for the ContentProcessor. Keep Terminal 1 (API) running. All commands assume you start from the repository root directory.

The ContentProcessor handles background document processing from Azure Storage Queue.

4.1. Navigate to Processor Directory

# From repository root
cd src/ContentProcessor

4.2. Create Virtual Environment

# Create virtual environment (on Linux, use python3.12 if the python command is not available)
python -m venv .venv

# Activate virtual environment
.venv\Scripts\Activate.ps1  # Windows PowerShell
# or
source .venv/bin/activate  # Linux/macOS

4.3. Install Dependencies

# Install uv package manager if not already installed
pip install uv

# Install all dependencies using uv
uv sync --python 3.12

Note: This project uses uv as the package manager with pyproject.toml. The uv sync command automatically installs all dependencies with proper version resolution.

4.4. Configure Environment Variables

Create a .env file in the src/ContentProcessor/src/ directory:

cd src

# Create .env file
New-Item .env  # Windows PowerShell
# or
touch .env  # Linux/macOS

Add the following to the .env file:

# App Configuration endpoint - ALL other settings are read from App Configuration
APP_CONFIG_ENDPOINT=https://<your-appconfig-name>.azconfig.io

# Local development settings
APP_ENV=dev
APP_AUTH_ENABLED=False
AZURE_IDENTITY_EXCLUDE_MANAGED_IDENTITY_CREDENTIAL=True

# Logging settings
APP_LOGGING_LEVEL=INFO
APP_LOGGING_ENABLE=True

# Azure package logging configuration (required)
AZURE_PACKAGE_LOGGING_LEVEL=WARNING
AZURE_LOGGING_PACKAGES=azure.core,azure.storage,azure.identity

4.5. Update main.py to Use .env File

The code currently uses .env.dev by default. Update it to use the standard .env file:

  1. Open src/ContentProcessor/src/main.py
  2. Find line 25 (inside the __init__ method)
  3. Change:
    env_file_path=os.path.join(os.path.dirname(__file__), ".env.dev"),
    to:
    env_file_path=os.path.join(os.path.dirname(__file__), ".env"),

⚠️ Important:

  • The .env file must be located in src/ContentProcessor/src/ directory, not in src/ContentProcessor/ root
  • After making this change, the application will look for .env file in the same directory as main.py
  • All Azure resource settings (Cosmos DB, Storage, AI endpoints) are automatically loaded from Azure App Configuration
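
To see why the file location matters, note that the expression in main.py builds the path from the directory of main.py itself, not from the terminal's current directory. The snippet below reproduces only that expression; `env_file_path` is a hypothetical wrapper added for illustration.

```python
import os


def env_file_path(main_file: str, name: str = ".env") -> str:
    """Same expression as in main.py: join the directory of main.py with the file name."""
    return os.path.join(os.path.dirname(main_file), name)


if __name__ == "__main__":
    main_py = os.path.abspath(os.path.join("src", "ContentProcessor", "src", "main.py"))
    # The result depends only on where main.py lives, not on the current directory.
    print(env_file_path(main_py))
```

Because of this, a .env file placed one level up in src/ContentProcessor/ is never read, regardless of where the processor is started from.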

4.6. Run the Processor

# Make sure you're in the src directory
python main.py

The ContentProcessor will start and begin polling the Azure Storage Queue for messages.

Expected behavior:

  • You may see Storage Queue authorization errors if roles haven't propagated (wait 5-10 minutes)
  • The processor will show continuous polling activity
  • Document processing will begin when files are uploaded via the frontend

Keep this terminal open - the processor will continue running and show processing logs.

Step 5: ContentProcessorWeb Setup & Run Instructions

📋 Terminal Reminder: Open a third dedicated terminal window (Terminal 3) for the ContentProcessorWeb. Keep Terminals 1 (API) and 2 (Processor) running. All commands assume you start from the repository root directory.

The ContentProcessorWeb provides the React-based user interface.

5.1. Navigate to Frontend Directory

# From repository root
cd src/ContentProcessorWeb

5.2. Install Dependencies

# Install dependencies with legacy peer deps flag
npm install --legacy-peer-deps

# Install additional required FluentUI packages
npm install @fluentui/react-dialog @fluentui/react-button --legacy-peer-deps

Note: Always use the --legacy-peer-deps flag for npm commands in this project to avoid dependency conflicts with @azure/msal-react.

5.3. Configure Environment Variables

Update the .env file in the src/ContentProcessorWeb/ directory:

REACT_APP_API_BASE_URL=http://localhost:8000
REACT_APP_AUTH_ENABLED=false
REACT_APP_CONSOLE_LOG_ENABLED=true

5.4. Start Development Server

npm start

The ContentProcessorWeb will start at: http://localhost:3000

Keep this terminal open - the React development server will continue running with hot reload.

Step 6: Verify All Services Are Running

Before using the application, confirm all three services are running in separate terminals:

Terminal Status Checklist

| Terminal | Service | Command | Expected Output | URL |
|---|---|---|---|---|
| Terminal 1 | ContentProcessorAPI | python -m uvicorn app.main:app --reload --port 8000 | Application startup complete | http://localhost:8000 |
| Terminal 2 | ContentProcessor | python main.py | Polling messages, no fatal errors | N/A |
| Terminal 3 | ContentProcessorWeb | npm start | Compiled successfully! | http://localhost:3000 |

Quick Verification

  1. Check Backend API:

    # In a new terminal (Terminal 4)
    curl http://localhost:8000/health
    # Expected: {"message":"I'm alive!"}
  2. Check Frontend:

    • Open browser to http://localhost:3000
    • Should see the Content Processing UI
    • No "Unable to connect to the server" errors
  3. Check Processor:

    • Look at Terminal 2 output
    • Should see processing activity or queue polling
    • No authorization errors (if roles have propagated)
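
The first two checks can also be scripted where curl is not available (for example, on a minimal Windows setup). This is a sketch under stated assumptions: `check_url` and `is_alive` are hypothetical helpers, and the only thing assumed about the health response is what the example above shows, namely a JSON object with a message field.

```python
import json
import urllib.error
import urllib.request


def is_alive(body: str) -> bool:
    """True when the body is a JSON object with a 'message' field, as in the example above."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and "message" in data


def check_url(url: str, timeout: float = 3.0):
    """Return (status_code, body), or (None, error text) if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:  # server answered with an error status
        return exc.code, ""
    except (urllib.error.URLError, OSError) as exc:  # refused, timed out, DNS failure
        return None, str(exc)


if __name__ == "__main__":
    status, body = check_url("http://localhost:8000/health")
    print("API:", "OK" if status == 200 and is_alive(body) else f"problem ({status}: {body})")
    status, _ = check_url("http://localhost:3000")
    print("Frontend:", "OK" if status == 200 else f"problem ({status})")
```

The processor has no HTTP endpoint in this guide, so Terminal 2 still needs to be checked by reading its log output.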

Step 7: Next Steps

Once all services are running (as confirmed in Step 6), you can:

  1. Access the Application: Open http://localhost:3000 in your browser to explore the frontend UI
  2. Upload Documents: Use the UI to upload documents for processing
  3. View API Documentation: Navigate to http://localhost:8000/docs to explore API endpoints
  4. Check Processing Status: Monitor Terminal 2 for document processing logs

Troubleshooting

Common Issues

Python Compilation Errors (Windows)

If you see errors when installing dependencies, ensure you're using uv sync instead of pip install:

# Install uv if not already installed
pip install uv

# Use uv sync which handles dependencies better
uv sync --python 3.12

Explanation: This project uses uv as the package manager with pyproject.toml. The uv tool provides better dependency resolution and automatically uses precompiled wheels when available, avoiding compilation issues on Windows.

pydantic_core ImportError

If you see "PyO3 modules compiled for CPython 3.8 or older may only be initialized once" or "ImportError: pydantic_core._pydantic_core":

# Uninstall and reinstall with compatible versions
pip uninstall -y pydantic pydantic-core
pip install pydantic==2.12.5 pydantic-core==2.41.5
pip install --upgrade "typing-extensions>=4.14.1"

Explanation: Version mismatch between pydantic and pydantic-core causes runtime errors. The compatible versions above work reliably together.

Node.js Dependencies Issues

# Clear npm cache and reinstall with legacy peer deps
npm cache clean --force
Remove-Item -Recurse -Force node_modules -ErrorAction SilentlyContinue  # Windows PowerShell
Remove-Item -Force package-lock.json -ErrorAction SilentlyContinue      # Windows PowerShell
# or
rm -rf node_modules package-lock.json  # Linux/macOS
npm install --legacy-peer-deps

# Install missing FluentUI packages if needed
npm install @fluentui/react-dialog @fluentui/react-button --legacy-peer-deps

Explanation: The --legacy-peer-deps flag is required due to peer dependency conflicts with @azure/msal-react. Some FluentUI packages may not be included in the initial install and need to be added separately.

Azure Authentication Issues

If you get "Forbidden" errors when accessing App Configuration or Cosmos DB:

# Check your current Azure account
az account show

# Get your principal ID for role assignments
az ad signed-in-user show --query id -o tsv

# Verify you have the correct role assignments
az role assignment list --assignee $(az ad signed-in-user show --query id -o tsv) --resource-group <resource-group-name>

# Refresh your access token
az account get-access-token --resource https://azconfig.io

If roles are missing, assign them as shown in Step 2.

Note: Role assignments can take 5-10 minutes to propagate through Azure AD. If you just assigned roles, wait a few minutes before retrying.
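
To compare the listing against the roles from Step 2 without reading it by eye, the command's JSON output can be checked with a few lines of Python. This is a sketch: `missing_roles` is a hypothetical helper, and it assumes each entry in the `-o json` output carries a `roleDefinitionName` field.

```python
import json

# Role names from Step 2 of this guide.
REQUIRED_ROLES = {
    "App Configuration Data Reader",
    "Cosmos DB Built-in Data Contributor",
    "Storage Blob Data Contributor",
    "Storage Queue Data Contributor",
    "Cognitive Services User",
}


def missing_roles(assignments_json: str) -> set[str]:
    """Return the Step 2 roles that do not appear in `az role assignment list -o json` output."""
    assigned = {item.get("roleDefinitionName") for item in json.loads(assignments_json)}
    return REQUIRED_ROLES - assigned


if __name__ == "__main__":
    import sys

    # Usage: az role assignment list ... -o json | python check_roles.py
    if not sys.stdin.isatty():
        for role in sorted(missing_roles(sys.stdin.read())):
            print(f"not listed: {role}")
```

A role reported as not listed may still be propagating, or may be assigned at a scope that the listing command does not cover, so treat the result as a hint rather than proof.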

Cognitive Services Permission Errors

If you see "401 Client Error: PermissionDenied" for Content Understanding service:

# Assign Cognitive Services User role
az role assignment create --role "Cognitive Services User" \
  --assignee <principal-id> \
  --scope /subscriptions/<sub-id>/resourceGroups/<rg-name>/providers/Microsoft.CognitiveServices/accounts/<content-understanding-name>

This error occurs when processing documents. Wait 5-10 minutes after assigning the role, then restart the ContentProcessor service.

ManagedIdentityCredential Errors

If you see "ManagedIdentityCredential authentication unavailable" or "No managed identity endpoint found":

# Ensure your .env files have these settings:
APP_ENV=dev
AZURE_IDENTITY_EXCLUDE_MANAGED_IDENTITY_CREDENTIAL=True

Locations to check:

  • src/ContentProcessorAPI/app/.env
  • src/ContentProcessor/src/.env (note: must be in the src/ subdirectory)

Explanation: Managed Identity is used in Azure deployments but doesn't work locally. Setting APP_ENV=dev switches to Azure CLI credential authentication.

CORS Issues

If the frontend loads but shows "Unable to connect to the server" error:

  1. Verify CORS is configured in src/ContentProcessorAPI/app/main.py:

    from fastapi.middleware.cors import CORSMiddleware
    
    app = FastAPI(redirect_slashes=False)
    
    # Configure CORS for local development
    app.add_middleware(
        CORSMiddleware,
        allow_origins=["http://localhost:3000"],
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )
  2. Restart the API service (Terminal 1) after adding CORS configuration

  3. Check browser console (F12) for CORS errors

  4. Verify API is running on port 8000 and frontend on port 3000

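Explanation: Browsers enforce the same-origin policy, which by default blocks scripts on one origin from reading responses from another origin. CORS (Cross-Origin Resource Sharing) is how the API opts in: the frontend (localhost:3000) can call the API (localhost:8000) only if the API returns headers that explicitly allow that origin.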
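
To test the API's CORS behavior without a browser, a preflight request can be sent by hand. The sketch below uses only the standard library; `preflight_headers` and `origin_allowed` are hypothetical helper names, and using /health as the target is an assumption based on the endpoint shown in Step 6.

```python
import urllib.error
import urllib.request


def preflight_headers(api_url, origin, method="GET", timeout=3.0):
    """Send a CORS preflight (OPTIONS) request and return the response headers, lower-cased."""
    request = urllib.request.Request(
        api_url,
        method="OPTIONS",
        headers={"Origin": origin, "Access-Control-Request-Method": method},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return {key.lower(): value for key, value in resp.headers.items()}
    except urllib.error.HTTPError as exc:
        # A rejected preflight still has headers worth inspecting.
        return {key.lower(): value for key, value in exc.headers.items()}


def origin_allowed(headers, origin):
    """True if the Access-Control-Allow-Origin header permits the given origin."""
    return headers.get("access-control-allow-origin") in (origin, "*")


# Example (requires the API from Step 3 to be running):
#   headers = preflight_headers("http://localhost:8000/health", "http://localhost:3000")
#   print(origin_allowed(headers, "http://localhost:3000"))
```

If the function raises a URLError instead of returning headers, the API is not reachable at all, which points to a port or startup problem rather than CORS.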

Environment Variables Not Loading

  • Verify .env file is in the correct directory:
    • ContentProcessorAPI: src/ContentProcessorAPI/app/.env
    • ContentProcessor: src/ContentProcessor/src/.env (must be in src/ subdirectory)
    • ContentProcessorWeb: src/ContentProcessorWeb/.env
  • Check file permissions (especially on Linux/macOS)
  • Ensure no extra spaces in variable assignments
  • Restart the service after changing .env files
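
The "no extra spaces" check is easy to miss by eye. A small linter along the following lines can flag suspicious lines; `lint_env` is a hypothetical helper, and whether a given .env loader actually tolerates such whitespace depends on the loader, so a finding is a warning rather than a confirmed error.

```python
def lint_env(text):
    """Return (line_number, message) pairs for lines that may not load as intended."""
    findings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are fine
        if "=" not in stripped:
            findings.append((number, "not a KEY=VALUE assignment"))
            continue
        key, _, value = raw.partition("=")
        # Flag leading/trailing whitespace on the line and spaces next to '='.
        if key != key.strip() or value != value.strip():
            findings.append((number, "whitespace around '=' or at line ends"))
    return findings


if __name__ == "__main__":
    sample = "APP_ENV=dev\nAPP_LOGGING_LEVEL = DEBUG\n"
    for line_number, message in lint_env(sample):
        print(f"line {line_number}: {message}")
```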

PowerShell Script Execution Policy Error

If you get "cannot be loaded because running scripts is disabled" when activating venv:

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

Port Conflicts

# Check what's using the port
netstat -ano | findstr :8000  # Windows
netstat -tulpn | grep :8000   # Linux
lsof -i :8000                 # macOS

# Kill the process using the port if needed
# Windows: taskkill /PID <PID> /F
# Linux/macOS: kill -9 <PID>
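
A platform-independent way to check both ports before starting the services is a short socket probe. The sketch below is illustrative; `port_in_use` is a hypothetical helper, and it probes IPv4 loopback only, so a server bound exclusively to IPv6 would not be detected.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in (8000, 3000):  # API and frontend ports used in this guide
        print(f"port {port}: {'in use' if port_in_use(port) else 'free'}")
```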

Debug Mode

Enable detailed logging by setting these environment variables in your .env files:

APP_LOGGING_LEVEL=DEBUG
APP_LOGGING_ENABLE=True

Related Documentation


For additional support, please submit issues to the GitHub repository.