This guide shows you how to run the Knowledge Base Processor in different Docker scenarios.
Use the provided wrapper script for easy Docker usage:
# Make script executable (first time only)
chmod +x scripts/docker-run.sh
# Show help
./scripts/docker-run.sh --help
# Initialize in current directory
./scripts/docker-run.sh init
# Process documents in specific directory
./scripts/docker-run.sh -w ~/Documents scan
# Use custom config file
./scripts/docker-run.sh -c ./my-config.json publish --watch
# Search your knowledge base
./scripts/docker-run.sh search "todo items"

For persistent services and easier management:
# Build and run interactively
docker-compose -f docker-compose.app.yml up kbp
# Run in watch mode (continuous processing)
docker-compose -f docker-compose.app.yml --profile watch up kbp-watch
# Run with SPARQL server
docker-compose -f docker-compose.app.yml up fuseki kbp

For full control over the container:
# Build the image
docker build -t knowledgebase-processor:latest .
# Run with volume mounts
docker run --rm -it \
-v "$(pwd):/workspace" \
-e KBP_WORK_DIR=/workspace \
-e KBP_HOME=/workspace/.kbp \
-w /workspace \
knowledgebase-processor:latest kb --help

The Docker setup creates this structure in your mounted directory:
your-documents/
├── .kbp/ # KBP configuration directory
│ ├── config.yaml # Configuration file
│ ├── metadata/ # Document metadata and cache
│ └── cache/ # Processing cache
├── your-files.md # Your existing documents (unchanged)
└── kbp_config.json # Optional: custom config file
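
To confirm that a run actually wrote its state into the mounted directory, a check along these lines may help. `kbp_check_layout` is a hypothetical helper for illustration, not part of the tool, and it only tests for the paths shown in the tree above.

```shell
# Hypothetical helper: reports which of the expected KBP paths exist
# under the given directory (defaults to the current directory).
kbp_check_layout() {
  dir="${1:-.}"
  missing=0
  for path in .kbp .kbp/config.yaml .kbp/metadata .kbp/cache; do
    if [ -e "$dir/$path" ]; then
      echo "ok       $path"
    else
      echo "missing  $path"
      missing=1
    fi
  done
  # Non-zero exit status when at least one path is absent.
  return "$missing"
}

kbp_check_layout "$HOME/my-docs" || echo "Run 'init' first, or check the volume mount."
```

If every path is reported missing right after a container run, the most likely cause is that a different host directory was mounted than the one being checked.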
Configure the tool's behavior with environment variables:
| Variable | Description | Default |
|---|---|---|
| KBP_WORK_DIR | Working directory path | /workspace |
| KBP_HOME | KBP configuration directory | $KBP_WORK_DIR/.kbp |
| KBP_KNOWLEDGE_BASE_PATH | Documents directory | $KBP_WORK_DIR |
| KBP_METADATA_STORE_PATH | Metadata storage path | $KBP_HOME/metadata |
| KBP_CONFIG_PATH | Custom config file path | (auto-detected) |
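
The defaults in the table chain off each other: `KBP_HOME` derives from `KBP_WORK_DIR`, and `KBP_METADATA_STORE_PATH` derives from `KBP_HOME`. The sketch below reproduces that chain with shell parameter expansion, assuming the tool resolves defaults in this order. `kbp_resolve_defaults` is a hypothetical helper, not something KBP ships.

```shell
# Hypothetical helper: fills in each unset variable with the default
# listed in the table, in dependency order.
kbp_resolve_defaults() {
  KBP_WORK_DIR="${KBP_WORK_DIR:-/workspace}"
  KBP_HOME="${KBP_HOME:-$KBP_WORK_DIR/.kbp}"
  KBP_KNOWLEDGE_BASE_PATH="${KBP_KNOWLEDGE_BASE_PATH:-$KBP_WORK_DIR}"
  KBP_METADATA_STORE_PATH="${KBP_METADATA_STORE_PATH:-$KBP_HOME/metadata}"
}

# Example: override only the working directory and see what follows from it.
# The subshell keeps the experiment from touching the current environment.
(
  unset KBP_HOME KBP_KNOWLEDGE_BASE_PATH KBP_METADATA_STORE_PATH
  KBP_WORK_DIR=/data
  kbp_resolve_defaults
  echo "KBP_HOME=$KBP_HOME"
  echo "KBP_METADATA_STORE_PATH=$KBP_METADATA_STORE_PATH"
)
```

Under that assumption, setting `KBP_WORK_DIR` alone is enough to relocate the configuration and metadata directories, which is why the `docker run` examples in this guide pass only one or two `-e` flags.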
# Using wrapper script
./scripts/docker-run.sh -w ~/my-docs init
# Using docker directly
docker run --rm -it \
-v ~/my-docs:/workspace \
-e KBP_WORK_DIR=/workspace \
-e KBP_HOME=/workspace/.kbp \
-w /workspace \
knowledgebase-processor:latest kb init

# Scan and process all documents
./scripts/docker-run.sh -w ~/my-docs scan
# With custom patterns
./scripts/docker-run.sh -w ~/my-docs scan --pattern "**/*.md" --pattern "**/*.txt"

# Watch for changes and auto-process
./scripts/docker-run.sh -w ~/my-docs publish --watch
# Or using docker-compose
docker-compose -f docker-compose.app.yml --profile watch up

# Search for content
./scripts/docker-run.sh -w ~/my-docs search "project tasks"
# Advanced search with filters
./scripts/docker-run.sh -w ~/my-docs search --type todo "deadlines"

Create a kbp_config.json file:
{
  "file_patterns": ["**/*.md", "**/*.txt"],
  "extract_frontmatter": true,
  "sparql_endpoint_url": "http://localhost:3030/ds/query",
  "sparql_update_endpoint_url": "http://localhost:3030/ds/update"
}

Then use it:
./scripts/docker-run.sh -c ./kbp_config.json -w ~/my-docs scan

- Start Fuseki with docker-compose:
docker-compose -f docker-compose.app.yml up fuseki

- Configure KBP to use Fuseki:
./scripts/docker-run.sh init --sparql-endpoint http://fuseki:3030/ds

- Publish your knowledge base:
./scripts/docker-run.sh publish --sync

Set environment variables:
docker run --rm -it \
-v "$(pwd):/workspace" \
-e KBP_WORK_DIR=/workspace \
-e SPARQL_ENDPOINT=http://your-sparql-server/query \
-e SPARQL_UPDATE_ENDPOINT=http://your-sparql-server/update \
knowledgebase-processor:latest kb publish --sync

# With wrapper script
VERBOSE=1 ./scripts/docker-run.sh -w ~/my-docs scan --verbose
# Direct docker command
docker run --rm -it \
-v "$(pwd):/workspace" \
-e KBP_WORK_DIR=/workspace \
-w /workspace \
knowledgebase-processor:latest kb scan --verbose

docker run --rm -it \
-v "$(pwd):/workspace" \
-e KBP_WORK_DIR=/workspace \
-w /workspace \
knowledgebase-processor:latest /bin/bash

# Build with custom tag
docker build -t my-kbp:latest .
# Use custom image
./scripts/docker-run.sh -i my-kbp:latest scan

If you encounter permission errors:
# Fix ownership (Linux/macOS)
sudo chown -R $(id -u):$(id -g) ./.kbp
# Or run container with your user ID
docker run --rm -it \
--user $(id -u):$(id -g) \
-v "$(pwd):/workspace" \
knowledgebase-processor:latest kb scan

Ensure you're mounting the right directory:
# Check what's mounted
./scripts/docker-run.sh ls -la
# Verify config path
./scripts/docker-run.sh config show

On Windows, use full paths:
# PowerShell
./scripts/docker-run.sh -w "C:\Users\YourName\Documents" init
# Or use WSL paths
./scripts/docker-run.sh -w "/mnt/c/Users/YourName/Documents" init

- Use volume mounts instead of copying files
- Persist the .kbp directory to avoid reprocessing
- Use watch mode for continuous updates
- Configure exclusion patterns for large directories
Example optimized setup:
# docker-compose.override.yml
version: '3.8'
services:
  kbp:
    volumes:
      - ./docs:/workspace/docs
      - ./.kbp:/workspace/.kbp
      - /node_modules # Exclude from processing
    environment:
      - KBP_EXCLUDE_PATTERNS=["**/node_modules/**", "**/.git/**"]
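
One detail worth checking with an override file: Compose only loads docker-compose.override.yml automatically when no `-f` flag is given. Because the commands in this guide pass `-f docker-compose.app.yml`, the override probably needs to be named explicitly as well. The sketch below assumes both files sit in the current directory; `kbp_compose` and the `DRY_RUN` switch are hypothetical names introduced for illustration.

```shell
# Hypothetical wrapper: always passes both Compose files, base first,
# so the override is merged on top of docker-compose.app.yml.
kbp_compose() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo docker-compose -f docker-compose.app.yml -f docker-compose.override.yml "$@"
  else
    docker-compose -f docker-compose.app.yml -f docker-compose.override.yml "$@"
  fi
}

# Show the command that would print the merged configuration.
DRY_RUN=1 kbp_compose config
```

Dropping `DRY_RUN=1` runs the command for real. The `config` subcommand prints the merged result, which is a quick way to confirm that the extra volumes and the KBP_EXCLUDE_PATTERNS entry were picked up before starting the service.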