Synapses-Scout

The Sensory Cortex of Synapses-OS. A web intelligence acquisition layer that gives AI coding agents access to current information from the internet.

Instead of hallucinating answers about the outside world, an agent asks Scout and gets structured, cached, LLM-distilled Markdown back. Fast-path extraction takes under a second; JS-heavy pages that need the browser fallback take 3-8s.

Agent → "What are the latest Python 3.13 features?"
Scout → Orchestrated multi-query search (4 angles, parallel fan-out)
      → Deduplicated, relevance-scored results
      → Fast-path extraction (httpx + trafilatura, <1s)
      → Optional LLM distillation via synapses-intelligence
      → Cached in SQLite with TTL

The agent gets clean context. Not raw HTML.

Features

Unified Scout.fetch() — One interface for search queries, web pages, and YouTube. Auto-detects content type.
Fast-Path Extraction — httpx + trafilatura for ~80% of pages (<1s). Crawl4AI browser fallback for JS-heavy SPAs (3-8s). Transparent to the caller.
Orchestrated Search — Expands queries into multiple angles, fans out in parallel, deduplicates by URL, scores by cross-query frequency + relevance.
News & Image Search — Full DuckDuckGo feature set: regions, time filters, SafeSearch, image size/color/layout.
YouTube Intelligence — yt-dlp metadata + auto-generated transcript extraction. No downloads, no Whisper needed.
Intelligence Distillation — 2-step pipeline: POST /v1/prune (0.8B strips boilerplate) → POST /v1/ingest (4B summarizes clean content). Fail-silent: works without intelligence.
SQLite Cache — TTL-based caching (search: 6h, web: 24h, YouTube: 7d). URL normalization strips tracking params.
HTTP API — REST server on localhost:11436. Same pattern as the intelligence sidecar.
Local-First — Everything runs on your machine. No cloud APIs required for core features.
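
The deduplication and scoring step of Orchestrated Search can be sketched roughly as follows. This is an illustration only, not Scout's actual code: the function name `merge_results`, the result shape (`url`, `relevance`), and the scoring formula (number of queries that returned a URL plus its best relevance) are assumptions.

```python
def merge_results(result_sets):
    """Merge per-query result lists into one ranked list.

    result_sets: one list per expanded query; each item is a dict with
    "url" and "relevance" (hypothetical shape, relevance in 0..1).
    """
    merged = {}
    for query_index, results in enumerate(result_sets):
        for item in results:
            entry = merged.setdefault(
                item["url"], {"queries": set(), "relevance": 0.0}
            )
            # Track which queries surfaced this URL (cross-query frequency).
            entry["queries"].add(query_index)
            # Keep the best relevance seen for this URL.
            entry["relevance"] = max(entry["relevance"], item["relevance"])

    ranked = [
        {"url": url, "score": len(e["queries"]) + e["relevance"]}
        for url, e in merged.items()
    ]
    ranked.sort(key=lambda r: r["score"], reverse=True)
    return ranked
```

A URL returned by several query angles outranks one that appears only once, even if the single hit has a higher relevance value.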

Quick Start

# Clone
git clone https://github.com/SynapsesOS/synapses-scout.git
cd synapses-scout

# Install
python3 -m venv .venv && source .venv/bin/activate
pip install -e .

# Use
scout fetch "python async patterns"                          # search
scout fetch "https://docs.python.org/3/library/asyncio.html" # extract web page
scout fetch "https://youtube.com/watch?v=..."                # YouTube transcript
scout deep-search "rust vs go performance 2026"              # orchestrated multi-query
scout news "Apple M4 chip" --time d                          # today's news
scout images "neural network architecture" --size Large      # image search
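
The three `scout fetch` calls above rely on content-type auto-detection. A minimal sketch of how such detection could work is shown here; the function name `detect_input_type`, the host list, and the returned labels are assumptions for illustration, not Scout's actual implementation.

```python
from urllib.parse import urlparse

# Hypothetical list of hosts treated as YouTube.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def detect_input_type(text):
    """Classify input as "youtube", "web_page", or "search"."""
    parsed = urlparse(text.strip())
    if parsed.scheme in ("http", "https") and parsed.netloc:
        # parsed.hostname is already lowercased and excludes any port.
        host = parsed.hostname or ""
        return "youtube" if host in YOUTUBE_HOSTS else "web_page"
    # Anything that is not an http(s) URL is treated as a search query.
    return "search"
```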

Architecture

Agent (Claude Code / Cursor)
    │
    ▼
Synapses (MCP Server :8766)
    │                    │
    ▼                    ▼
Intelligence (:11435)    Scout (:11436)
    │                        │
    ▼                        ├─→ DuckDuckGo (search, news, images)
  Ollama (:11434)            ├─→ httpx + trafilatura (fast extraction)
                             ├─→ Crawl4AI (JS-heavy fallback)
                             └─→ yt-dlp (YouTube)
                                     │
                                     ▼
                             Intelligence (:11435) for distillation

Scout uses a 2-step distillation pipeline with intelligence:

POST /v1/prune (0.8B Reflex) — strips navigation, ads, footers from raw web content → ~1200 chars clean signal
POST /v1/ingest (4B Specialist) — summarizes the clean technical content → prose briefing

Both steps are fail-silent. There is no duplicate Ollama setup: Scout reuses the intelligence sidecar.
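
The fail-silent behaviour of the two steps might look like the sketch below. Only the endpoint paths and the `node_type` and `code` fields come from this README. The prune request field `content`, the response fields `pruned` and `summary`, and the choice to return the pruned text when only the second step fails are all assumptions.

```python
import json
import urllib.request

INTELLIGENCE_URL = "http://localhost:11435"


def post_json(path, payload, timeout=5.0):
    """POST a JSON payload to the intelligence sidecar and decode the reply."""
    req = urllib.request.Request(
        INTELLIGENCE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def distill(raw, post=post_json):
    """Run prune then ingest; never raise, fall back to what we have."""
    try:
        # Step 1: strip boilerplate (request/response fields are assumed).
        pruned = post("/v1/prune", {"content": raw})["pruned"]
    except Exception:
        return raw  # intelligence unavailable: return raw content
    try:
        # Step 2: summarize the clean content ("summary" is assumed).
        return post("/v1/ingest", {"node_type": "web article", "code": pruned})["summary"]
    except Exception:
        return pruned  # assumed fallback when only step 2 fails
```

Passing `post` as a parameter keeps the sketch testable without a running sidecar.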

HTTP API

Start the server:

scout serve                    # default: localhost:11436
scout serve --port 8080        # custom port

Endpoints

| Method | Path | Description |
|---|---|---|
| `GET` | `/v1/health` | Status, intelligence availability, cache stats |
| `POST` | `/v1/fetch` | Unified fetch: `{"input": "url or query", "max_results": 10, "force_refresh": false}` |
| `POST` | `/v1/search` | Web search: `{"query": "...", "max_results": 5}` |
| `POST` | `/v1/deep-search` | Orchestrated multi-query: `{"query": "...", "max_results": 10}` |
| `POST` | `/v1/news` | News search: `{"query": "...", "timelimit": "d"}` |
| `POST` | `/v1/images` | Image search: `{"query": "...", "size": "Large"}` |
| `POST` | `/v1/extract` | Direct URL extraction: `{"url": "...", "force_refresh": false}` |
| `GET/DELETE` | `/v1/cache` | Cache stats / prune |

All search endpoints accept optional region (e.g., "us-en", "fr-fr") and timelimit ("d", "w", "m", "y") parameters.
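
For callers that prefer Python over the CLI, a request to `/v1/fetch` could be built like this. It is a sketch using only the standard library; the helper names are hypothetical, and the assumption that the server replies with JSON is not confirmed by this README.

```python
import json
import urllib.request


def build_fetch_request(user_input, max_results=10, force_refresh=False,
                        base_url="http://localhost:11436"):
    """Build a POST /v1/fetch request with the documented body fields."""
    body = {
        "input": user_input,
        "max_results": max_results,
        "force_refresh": force_refresh,
    }
    return urllib.request.Request(
        base_url + "/v1/fetch",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch(user_input, **kwargs):
    """Send the request; assumes `scout serve` is running and returns JSON."""
    request = build_fetch_request(user_input, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```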

CLI Reference

scout fetch <url_or_query>     Fetch a URL or search query
    --no-cache                 Bypass cache
    --no-distill               Skip intelligence distillation
    --region <code>            Search region (us-en, fr-fr, de-de, ...)
    --time <d|w|m|y>           Time filter
    --json                     Output JSON instead of Markdown

scout deep-search <query>      Orchestrated multi-query search
    -n, --max-results <N>      Max results (default: 10)
    --no-expand                Disable query expansion
    --region / --time / --json Same as fetch

scout news <query>             Search news articles
scout images <query>           Search images
    --size <Small|Medium|Large|Wallpaper>
    --layout <Square|Tall|Wide>

scout serve                    Start HTTP server
    --port <N>                 Port (default: 11436)

scout status                   Show cache stats and intelligence availability

Configuration

Config file: ~/.synapses/scout.json (optional — all fields have defaults)

{
    "port": 11436,
    "db_path": "~/.synapses/scout.db",
    "intelligence_url": "http://localhost:11435",
    "intelligence_timeout_ms": 5000,
    "search_provider": "duckduckgo",
    "tavily_api_key": null,
    "distill": true,
    "max_content_chars": 50000,
    "default_ttl_search_hours": 6,
    "default_ttl_web_hours": 24,
    "default_ttl_media_hours": 168,
    "search_region": "wt-wt",
    "search_safesearch": "moderate",
    "search_expand": true
}

Environment variable overrides: SCOUT_CONFIG, SCOUT_PORT, SCOUT_INTELLIGENCE_URL, TAVILY_API_KEY.
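
One way to picture how defaults, the config file, and environment variables combine is the sketch below. The precedence order (environment over file over defaults) and the `load_config` helper are assumptions; only a subset of the fields is shown.

```python
import json
import os
from pathlib import Path

# Subset of the documented defaults.
DEFAULTS = {
    "port": 11436,
    "intelligence_url": "http://localhost:11435",
    "search_provider": "duckduckgo",
    "tavily_api_key": None,
    "distill": True,
}


def load_config(env=None):
    """Return defaults, overlaid by the config file, overlaid by env vars."""
    env = os.environ if env is None else env
    config = dict(DEFAULTS)

    # SCOUT_CONFIG points at an alternative config file.
    path = Path(env.get("SCOUT_CONFIG", "~/.synapses/scout.json")).expanduser()
    if path.is_file():
        config.update(json.loads(path.read_text(encoding="utf-8")))

    if "SCOUT_PORT" in env:
        config["port"] = int(env["SCOUT_PORT"])
    if "SCOUT_INTELLIGENCE_URL" in env:
        config["intelligence_url"] = env["SCOUT_INTELLIGENCE_URL"]
    if "TAVILY_API_KEY" in env:
        config["tavily_api_key"] = env["TAVILY_API_KEY"]
    return config
```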

Using Tavily

Tavily provides richer, context-enriched search results compared to DuckDuckGo. To enable it:

Option 1 — environment variable (recommended):

export TAVILY_API_KEY="tvly-your-key-here"

Option 2 — config file:

{
    "search_provider": "tavily",
    "tavily_api_key": "tvly-your-key-here"
}

Get a free API key at tavily.com. Without a key, Scout falls back to DuckDuckGo automatically.

Caching Strategy

| Content Type | Default TTL | Cache Key |
|---|---|---|
| Search results | 6 hours | `sha256(normalized_query)` |
| Web pages | 24 hours | `sha256(normalized_url)` |
| YouTube | 7 days | `sha256(normalized_url)` |

URL normalization: lowercase host, strip trailing slash, remove tracking params (utm_*, fbclid, gclid), sort query params.
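
The normalization rules above can be sketched with the standard library as follows. The function names are hypothetical, and details the README does not specify (lowercasing the scheme, keeping the fragment) are assumptions.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PARAMS = {"fbclid", "gclid"}


def normalize_url(url):
    """Lowercase host, strip trailing slash, drop tracking params, sort query."""
    parts = urlsplit(url)
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith("utm_") and key not in TRACKING_PARAMS
    )
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),  # also lowercases any port/userinfo; fine for a sketch
        parts.path.rstrip("/"),
        urlencode(query),
        parts.fragment,
    ))


def cache_key(url):
    """sha256 hex digest of the normalized URL."""
    return hashlib.sha256(normalize_url(url).encode("utf-8")).hexdigest()
```

With this, `https://Example.com/docs/?utm_source=x&b=2&a=1` and `https://example.com/docs?a=1&b=2` map to the same cache key.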

Cache lives at ~/.synapses/scout.db (SQLite). Expired entries are pruned on startup.
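
A TTL cache on SQLite with pruning on startup could be structured like this. The table schema, class name, and method names are hypothetical; Scout's real schema may differ.

```python
import sqlite3
import time


class TTLCache:
    """Minimal TTL cache; each row stores an absolute expiry timestamp."""

    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL, expires_at REAL NOT NULL)"
        )
        self.prune()  # drop expired entries on startup

    def prune(self, now=None):
        """Delete expired rows and return how many were removed."""
        now = time.time() if now is None else now
        cursor = self.conn.execute("DELETE FROM cache WHERE expires_at <= ?", (now,))
        self.conn.commit()
        return cursor.rowcount

    def set(self, key, value, ttl_hours, now=None):
        now = time.time() if now is None else now
        self.conn.execute(
            "INSERT OR REPLACE INTO cache (key, value, expires_at) VALUES (?, ?, ?)",
            (key, value, now + ttl_hours * 3600),
        )
        self.conn.commit()

    def get(self, key, now=None):
        """Return the cached value, or None if missing or expired."""
        now = time.time() if now is None else now
        row = self.conn.execute(
            "SELECT value FROM cache WHERE key = ? AND expires_at > ?", (key, now)
        ).fetchone()
        return row[0] if row else None
```

The `now` parameter exists only to make expiry easy to exercise in tests.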

Integration with Synapses-OS

Scout integrates with synapses-intelligence for LLM distillation. No changes needed to intelligence — Scout maps web content to its existing IngestRequest format:

POST http://localhost:11435/v1/ingest
{
    "node_id": "scout:web_page:a1b2c3d4e5f6",
    "node_name": "Article Title",
    "node_type": "web article",
    "package": "example.com",
    "code": "first 3000 chars of content..."
}

Node types mapped per content:

| Scout content type | `node_type` sent to intelligence |
|---|---|
| Web page | `"web article"` |
| YouTube video | `"youtube video"` |
| Search results | `"search result set"` |

If intelligence is unavailable, Scout skips distillation and returns raw content. Fail-silent — same contract as the rest of the ecosystem.
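
The mapping from Scout content to the ingest payload might be assembled as in this sketch. The field names, the `node_type` values, and the 3000-character cutoff come from the example above. The helper name, the internal content-type keys for YouTube and search, and deriving the `node_id` suffix from a truncated SHA-256 of the source are assumptions.

```python
import hashlib

# Keys are hypothetical internal labels; values are the documented node types.
NODE_TYPES = {
    "web_page": "web article",
    "youtube": "youtube video",
    "search": "search result set",
}


def build_ingest_request(content_type, source, title, package, content):
    """Build the JSON body for POST /v1/ingest."""
    # Assumed: a 12-hex-char id suffix derived from the source URL or query.
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()[:12]
    return {
        "node_id": f"scout:{content_type}:{digest}",
        "node_name": title,
        "node_type": NODE_TYPES[content_type],
        "package": package,
        "code": content[:3000],  # first 3000 chars of content
    }
```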

Development

# Setup
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

# Test (116 tests)
make test

# Lint
make lint

# Format
make format

License

MIT — see LICENSE.
