diff --git a/docs/content/docs/(getting-started)/docker.mdx b/docs/content/docs/(getting-started)/docker.mdx index ece45fbae..6284d961c 100644 --- a/docs/content/docs/(getting-started)/docker.mdx +++ b/docs/content/docs/(getting-started)/docker.mdx @@ -227,7 +227,13 @@ docker stop spacebot && docker rm spacebot ### One-Click Update -Mount the Docker socket to enable updating directly from the web UI: +Mount a container runtime socket to enable updating directly from the web UI. + +- Docker: mount `/var/run/docker.sock` +- Podman (rootful): mount `/run/podman/podman.sock` +- Podman (rootless): mount `${XDG_RUNTIME_DIR}/podman/podman.sock` to `/run/podman/podman.sock` in the container + +Docker example: ```yaml services: @@ -271,10 +277,48 @@ curl http://localhost:19898/api/update/check # Force a fresh check curl -X POST http://localhost:19898/api/update/check -# Apply update (requires Docker socket) +# Apply update (requires container runtime socket) curl -X POST http://localhost:19898/api/update/apply ``` +### Podman One-Click Update + +Set `SPACEBOT_DEPLOYMENT=docker` and mount a Podman socket. + +Rootful: + +```bash +sudo systemctl enable --now podman.socket + +podman run -d \ + --name spacebot \ + -e ANTHROPIC_API_KEY="sk-ant-..." \ + -e SPACEBOT_DEPLOYMENT=docker \ + -v spacebot-data:/data \ + -v /run/podman/podman.sock:/run/podman/podman.sock \ + --security-opt label=disable \ + -p 19898:19898 \ + ghcr.io/spacedriveapp/spacebot:latest +``` + +Rootless: + +```bash +systemctl --user enable --now podman.socket + +podman run -d \ + --name spacebot \ + -e ANTHROPIC_API_KEY="sk-ant-..." \ + -e SPACEBOT_DEPLOYMENT=docker \ + -v spacebot-data:/data \ + -v ${XDG_RUNTIME_DIR}/podman/podman.sock:/run/podman/podman.sock \ + --security-opt label=disable \ + -p 19898:19898 \ + ghcr.io/spacedriveapp/spacebot:latest +``` + +On Fedora/RHEL with SELinux enforcing, keep `--security-opt label=disable` when mounting the Podman socket. 
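Before relying on the one-click flow, it can help to confirm that a runtime socket is actually visible at one of the documented paths. A minimal sketch, assuming a POSIX shell inside (or alongside) the container; the helper name `first_existing` is illustrative and not part of Spacebot:

```shell
# Print the first path from the argument list that exists, mirroring the
# mount points documented above. Sockets appear as filesystem entries, so a
# plain existence test is enough for a sanity check.
first_existing() {
  for candidate in "$@"; do
    if [ -e "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

first_existing \
  /var/run/docker.sock \
  /run/podman/podman.sock \
  "${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/podman/podman.sock" \
  || echo "no runtime socket mounted" >&2
```

If nothing is printed, no socket was mounted into the container and the web UI will fall back to manual update instructions.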
+ ## CI / Releases Images are built and pushed to `ghcr.io/spacedriveapp/spacebot` via GitHub Actions (`.github/workflows/release.yml`). diff --git a/docs/design-docs/context-inspect-tool.md b/docs/design-docs/context-inspect-tool.md new file mode 100644 index 000000000..dc0128dc0 --- /dev/null +++ b/docs/design-docs/context-inspect-tool.md @@ -0,0 +1,593 @@ +# Context Inspect Tool + +A debug tool for cortex chat that spawns a branch to analyze the complete internal context of an associated channel. This allows cortex chat to see exactly what a channel's LLM sees on its next turn, enabling effective debugging of channel behavior. + +## Problem + +When debugging channel issues via cortex chat, the admin only sees a 50-message transcript summary injected into the cortex's system prompt. This is insufficient to understand: + +- Why a channel is making certain decisions +- What the full system prompt contains (identity, bulletin, skills, capabilities) +- What tools are available with their exact schemas +- How much context is being used (compaction state) +- What the complete conversation history looks like with branch/worker results +- What the current status block shows + +Admins need to see the **exact context** that the channel's LLM sees — not a summary, but the complete system prompt + history + tools + status that would be sent to the LLM on the next turn. + +## Solution + +Add a `context_inspect` tool to cortex chat that: + +1. Spawns a branch (to avoid polluting cortex chat's history with massive channel context) +2. The branch gets a special `read_channel_context` tool +3. This tool builds the **full channel context** exactly as the channel would see it +4. 
The branch analyzes the context and returns conclusions to cortex chat
+
+### Architecture
+
+```
+Cortex Chat (opened on channel page)
+  ↓ calls context_inspect tool
+  ↓ spawns Branch with special tool server
+  ↓ Branch calls read_channel_context
+  ↓ ChannelContextInspector builds full snapshot:
+      • System prompt (identity + bulletin + skills + capabilities + status)
+      • Tool definitions with complete schemas
+      • Conversation history (full message Vec)
+      • Status block text
+      • Context statistics (token counts, usage %)
+  ↓ Branch analyzes and returns conclusion
+  ↓ Cortex chat receives analysis
+```
+
+## Components
+
+### 1. `ChannelContextInspector` Service
+
+**File:** `src/agent/channel_context.rs` (new)
+
+A reusable service that assembles the complete channel context for inspection purposes. Mirrors the logic in `Channel::build_system_prompt()` and `Channel::run_agent_turn()` but makes it accessible from outside the channel.
+
+**Core type:**
+```rust
+pub struct ChannelContextSnapshot {
+    pub channel_id: String,
+    pub channel_name: Option<String>,
+    pub system_prompt: String,
+    pub tool_definitions: Vec<ToolDefinition>,
+    pub history: Vec<Message>,
+    pub status_text: String,
+    pub stats: ContextStats,
+}
+
+pub struct ContextStats {
+    pub system_prompt_tokens: usize,
+    pub tool_defs_tokens: usize,
+    pub history_tokens: usize,
+    pub total_tokens: usize,
+    pub context_window: usize,
+    pub usage_percent: f32,
+}
+```
+
+**Key method:**
+```rust
+impl ChannelContextInspector {
+    pub async fn inspect_channel(&self, channel_id: &str)
+        -> Result<ChannelContextSnapshot>;
+}
+```
+
+**Implementation steps:**
+
+1. Look up channel state from the active channels registry
+2.
Build system prompt components:
+   - Identity context (`RuntimeConfig::identity.load().render()`)
+   - Memory bulletin (`RuntimeConfig::memory_bulletin.load()`)
+   - Skills prompt (`RuntimeConfig::skills.load().render_channel_prompt()`)
+   - Worker capabilities (`PromptEngine::render_worker_capabilities()`)
+   - Conversation context (platform, server, channel name)
+   - Status block (`ChannelState::status_block.read().await.render()`)
+   - Available channels list
+   - Org context (if configured)
+   - Link context (if applicable)
+3. Render full system prompt via `PromptEngine::render_channel_prompt_with_links()`
+4. Clone conversation history (`ChannelState::history.read().await.clone()`)
+5. Query tool server for tool definitions (`ToolServerHandle::get_tool_defs()`)
+6. Calculate token estimates using `estimate_history_tokens()`
+7. Return complete snapshot
+
+**Dependencies:**
+- Needs access to active channels (via shared registry in `AgentDeps`)
+- Needs `RuntimeConfig` for dynamic context components
+- Needs channel's `ChannelState` for history/status/tools
+
+### 2. `ReadChannelContextTool`
+
+**File:** `src/tools/read_channel_context.rs` (new)
+
+A tool that branches can use to read the full internal context of a channel.
+
+**Tool interface:**
+```rust
+pub struct ReadChannelContextArgs {
+    pub channel_id: String,
+}
+
+pub struct ReadChannelContextOutput {
+    pub channel_id: String,
+    pub channel_name: Option<String>,
+    pub system_prompt: String,
+    pub tool_list: String,         // Formatted list of tools
+    pub history_summary: String,   // Message count + preview
+    pub status_block: String,
+    pub stats: ContextStats,
+    pub context_preview: String,   // First 1000 chars
+    pub full_context_dump: String, // Complete formatted output
+}
+```
+
+**Tool description** (for LLM):
+```markdown
+Read the complete internal context of a channel exactly as it appears to the
+channel's LLM on its next turn.
Returns:
+
+- Full system prompt (identity, bulletin, skills, capabilities, status block)
+- All available tools with complete parameter schemas
+- Complete conversation history with all messages, branch results, worker results
+- Token usage statistics and compaction state
+- Context window utilization percentage
+
+Use this to understand exactly what information and capabilities a channel has
+access to. Particularly useful for debugging unexpected channel behavior.
+
+Parameters:
+- channel_id: The channel to inspect (e.g., "telegram:123456" or "discord:guild:channel")
+```
+
+**Output formatting:**
+
+The `full_context_dump` field provides a structured markdown report:
+
+```markdown
+# Channel Context: #{channel_name}
+
+**Channel ID:** {channel_id}
+**Context Usage:** {stats.total_tokens} / {stats.context_window} tokens ({stats.usage_percent}%)
+
+## System Prompt ({stats.system_prompt_tokens} tokens)
+
+{system_prompt}
+
+## Available Tools ({tool_count} tools, {stats.tool_defs_tokens} tokens)
+
+{tool_list}
+
+## Conversation History ({message_count} messages, {stats.history_tokens} tokens)
+
+{history_summary}
+
+## Status Block
+
+{status_block}
+
+## Analysis
+
+- Compaction risk: {if usage_percent > 80% then "HIGH" else "Normal"}
+- Active processes: {branch_count} branches, {worker_count} workers
+- Recent completions: {recent_completion_count}
+```
+
+### 3. `ContextInspectTool`
+
+**File:** `src/tools/context_inspect.rs` (new)
+
+The tool cortex chat invokes to spawn the inspection branch.
+
+**Tool interface:**
+```rust
+pub struct ContextInspectArgs {
+    /// Optional: specific aspect to focus analysis on
+    pub focus: Option<String>,
+}
+
+pub struct ContextInspectOutput {
+    pub branch_id: BranchId,
+    pub channel_id: String,
+    pub message: String,
+}
+```
+
+**Tool description** (for LLM):
+```markdown
+Spawn a branch to deeply inspect the complete internal context of the channel
+you're currently viewing.
+
+The inspection branch will analyze:
+- Full system prompt (identity, memory bulletin, skills, worker capabilities)
+- Complete conversation history with all messages and delegated process results
+- Status block showing active branches and workers
+- All available tools with their complete parameter schemas
+- Context usage statistics and compaction risk assessment
+
+You can optionally specify a `focus` parameter to direct the analysis toward a
+specific aspect:
+- "memory_influence" - How memories are affecting channel behavior
+- "compaction_state" - Context usage and compaction risk
+- "tool_availability" - What tools are available and properly configured
+- "recent_history" - Analysis of recent conversation patterns
+- Or any custom focus area
+
+**Requirements:** Only works when cortex chat is opened on a channel page.
+
+Returns a branch ID. The branch will analyze the context and send its
+conclusions back asynchronously.
+```
+
+**Implementation:**
+
+```rust
+impl Tool for ContextInspectTool {
+    async fn call(&self, args: Self::Args) -> Result<Self::Output, ContextInspectError> {
+        // Verify channel context is active
+        let Some(channel_id) = &self.channel_context_id else {
+            return Err(ContextInspectError(
+                "No channel context active. Open cortex chat on a channel \
+                page to use context_inspect.".into()
+            ));
+        };
+
+        // Build branch description and prompt
+        let description = format!("Inspect context of channel {}", channel_id);
+        let prompt = match args.focus {
+            Some(focus) => format!(
+                "Use the read_channel_context tool to inspect channel {}. \
+                Focus your analysis on: {}. Provide actionable insights and \
+                identify any issues or anomalies.",
+                channel_id, focus
+            ),
+            None => format!(
+                "Use the read_channel_context tool to inspect channel {}.
\
+                Provide a comprehensive analysis of the channel's current state, \
+                including: context usage, active processes, recent conversation \
+                patterns, memory influence, and any visible issues or anomalies.",
+                channel_id
+            ),
+        };
+
+        // Spawn branch with inspection tool server
+        let branch_id = self.spawn_inspection_branch(
+            channel_id,
+            &description,
+            &prompt,
+        ).await?;
+
+        Ok(ContextInspectOutput {
+            branch_id,
+            channel_id: channel_id.clone(),
+            message: format!(
+                "Spawned inspection branch {}. The branch will analyze the \
+                complete internal context of {} and return detailed findings.",
+                branch_id, channel_id
+            ),
+        })
+    }
+}
+
+impl ContextInspectTool {
+    async fn spawn_inspection_branch(
+        &self,
+        channel_id: &str,
+        description: &str,
+        prompt: &str,
+    ) -> Result<BranchId> {
+        // Build branch system prompt
+        let rc = &self.deps.runtime_config;
+        let prompt_engine = rc.prompts.load();
+        let system_prompt = prompt_engine.render_branch_prompt(
+            &rc.instance_dir.display().to_string(),
+            &rc.workspace_dir.display().to_string(),
+        )?;
+
+        // Create inspection tool server with read_channel_context
+        let tool_server = create_inspection_branch_tool_server(
+            self.deps.memory_search.clone(),
+            self.conversation_logger.clone(),
+            self.channel_store.clone(),
+            self.run_logger.clone(),
+            self.channel_inspector.clone(),
+            &self.deps.agent_id,
+        );
+
+        // Build branch with empty history (doesn't need channel history)
+        let branch = Branch::new(
+            Arc::from(channel_id),
+            description,
+            self.deps.clone(),
+            system_prompt,
+            vec![], // Empty history - branch will read context via tool
+            tool_server,
+            **rc.branch_max_turns.load(),
+        );
+
+        let branch_id = branch.id;
+
+        // Spawn branch task
+        tokio::spawn(async move {
+            if let Err(error) = branch.run(prompt).await {
+                tracing::error!(
+                    branch_id = %branch_id,
+                    %error,
+                    "context inspection branch failed"
+                );
+            }
+        });
+
+        Ok(branch_id)
+    }
+}
+```
+
+### 4.
Tool Server Factory
+
+**File:** `src/tools.rs` (modify existing)
+
+Add a new factory function for inspection branches:
+
+```rust
+/// Create a tool server for context inspection branches.
+///
+/// Inspection branches are spawned by cortex chat's context_inspect tool to
+/// analyze channel context. They get memory tools, channel_recall, worker_inspect,
+/// and the special read_channel_context tool.
+pub fn create_inspection_branch_tool_server(
+    memory_search: Arc<MemorySearch>,
+    conversation_logger: ConversationLogger,
+    channel_store: ChannelStore,
+    run_logger: ProcessRunLogger,
+    channel_inspector: Arc<ChannelContextInspector>,
+    agent_id: &str,
+) -> ToolServerHandle {
+    ToolServer::new()
+        .tool(MemorySaveTool::new(memory_search.clone()))
+        .tool(MemoryRecallTool::new(memory_search.clone()))
+        .tool(MemoryDeleteTool::new(memory_search))
+        .tool(ChannelRecallTool::new(conversation_logger, channel_store))
+        .tool(WorkerInspectTool::new(run_logger, agent_id.to_string()))
+        .tool(ReadChannelContextTool::new(channel_inspector))
+        .run()
+}
+```
+
+### 5.
Cortex Chat Integration
+
+**File:** `src/agent/cortex_chat.rs` (modify existing)
+
+Store `channel_context_id` in the session and pass to tool server:
+
+```rust
+pub struct CortexChatSession {
+    pub deps: AgentDeps,
+    pub tool_server: ToolServerHandle,
+    pub store: CortexChatStore,
+    pub channel_context_id: Option<String>, // NEW
+    send_lock: Mutex<()>,
+}
+
+impl CortexChatSession {
+    pub fn new(
+        deps: AgentDeps,
+        tool_server: ToolServerHandle,
+        store: CortexChatStore,
+        channel_context_id: Option<String>, // NEW
+    ) -> Self {
+        Self {
+            deps,
+            tool_server,
+            store,
+            channel_context_id,
+            send_lock: Mutex::new(()),
+        }
+    }
+}
+```
+
+**File:** `src/tools.rs` (modify `create_cortex_chat_tool_server`)
+
+Add `context_inspect` tool to cortex chat:
+
+```rust
+#[allow(clippy::too_many_arguments)]
+pub fn create_cortex_chat_tool_server(
+    memory_search: Arc<MemorySearch>,
+    conversation_logger: ConversationLogger,
+    channel_store: ChannelStore,
+    run_logger: ProcessRunLogger,
+    channel_inspector: Arc<ChannelContextInspector>, // NEW
+    agent_id: &str,
+    channel_context_id: Option<String>, // NEW
+    browser_config: BrowserConfig,
+    screenshot_dir: PathBuf,
+    brave_search_key: Option<String>,
+    workspace: PathBuf,
+    sandbox: Arc<Sandbox>,
+) -> ToolServerHandle {
+    let mut server = ToolServer::new()
+        .tool(MemorySaveTool::new(memory_search.clone()))
+        .tool(MemoryRecallTool::new(memory_search.clone()))
+        .tool(MemoryDeleteTool::new(memory_search))
+        .tool(ChannelRecallTool::new(conversation_logger.clone(), channel_store.clone()))
+        .tool(WorkerInspectTool::new(run_logger.clone(), agent_id.to_string()))
+        .tool(ContextInspectTool::new( // NEW
+            conversation_logger,
+            channel_store,
+            run_logger,
+            channel_inspector,
+            channel_context_id,
+        ))
+        .tool(ShellTool::new(workspace.clone(), sandbox.clone()))
+        .tool(FileTool::new(workspace.clone()))
+        .tool(ExecTool::new(workspace, sandbox));
+
+    if browser_config.enabled {
+        server = server.tool(BrowserTool::new(browser_config, screenshot_dir));
+    }
+
+    if let Some(key) = brave_search_key {
+        server
= server.tool(WebSearchTool::new(key));
+    }
+
+    server.run()
+}
+```
+
+## Data Dependencies
+
+### Active Channels Registry
+
+To enable context inspection, we need access to active channel state. Add to `AgentDeps`:
+
+```rust
+pub struct AgentDeps {
+    // ... existing fields ...
+
+    /// Registry of active channels for context inspection
+    pub active_channels: Arc<RwLock<HashMap<String, ChannelState>>>,
+}
+```
+
+**Population:** When `MessagingManager` creates a channel, register it:
+
+```rust
+// In MessagingManager::start_channel() or similar
+let channel_state = ChannelState::new(/* ... */);
+deps.active_channels.write().await.insert(channel_id.clone(), channel_state.clone());
+```
+
+**Cleanup:** When a channel is removed/closed, unregister it:
+
+```rust
+deps.active_channels.write().await.remove(&channel_id);
+```
+
+This allows `ChannelContextInspector` to look up any active channel by ID.
+
+## Prompt Templates
+
+### `prompts/en/tools/context_inspect.md.j2`
+
+```markdown
+Spawn a branch to deeply inspect the complete internal context of the channel you're currently viewing.
+
+The inspection branch will analyze the channel's full LLM context, including:
+- Complete system prompt (identity, memory bulletin, skills, worker capabilities, status block)
+- Conversation history with all messages, branch conclusions, and worker results
+- Available tools with complete parameter schemas
+- Context usage statistics and compaction risk
+
+Use this when debugging channel behavior or understanding why a channel is making certain decisions.
+
+**Optional focus parameter:** Direct the analysis toward a specific aspect:
+- "memory_influence" - How memories affect behavior
+- "compaction_state" - Context usage and overflow risk
+- "tool_availability" - Tool configuration status
+- "recent_history" - Conversation pattern analysis
+- Or specify any custom focus area
+
+**Requirements:** Only works when cortex chat is opened on a channel page.
+``` + +### `prompts/en/tools/read_channel_context.md.j2` + +```markdown +Read the complete internal context of a channel exactly as its LLM sees it on the next turn. + +Returns a comprehensive snapshot including: +- Full system prompt with all dynamic components +- Complete tool definitions with parameter schemas +- Entire conversation history +- Status block (active branches, workers, recent completions) +- Token usage statistics and compaction state + +This is the exact context that would be sent to the channel's LLM. Use it to understand what information and capabilities the channel has access to. +``` + +## Implementation Phases + +### Phase 1: Core Infrastructure +1. Create `src/agent/channel_context.rs` with `ChannelContextInspector` +2. Add `active_channels` registry to `AgentDeps` +3. Update `MessagingManager` to populate/clean registry +4. Add unit tests for context snapshot generation + +### Phase 2: Tool Implementation +1. Create `src/tools/read_channel_context.rs` +2. Create `src/tools/context_inspect.rs` +3. Add tool prompt templates +4. Add `create_inspection_branch_tool_server()` factory + +### Phase 3: Cortex Chat Integration +1. Modify `CortexChatSession` to store `channel_context_id` +2. Update `create_cortex_chat_tool_server()` to add `context_inspect` tool +3. Pass `channel_context_id` from API layer + +### Phase 4: Testing +1. Integration test: Create channel, open cortex chat, call `context_inspect` +2. Verify branch receives full context via `read_channel_context` +3. Verify cortex chat receives analysis +4. 
Test error cases (no channel context, invalid channel ID) + +## File Changes + +**New files:** +- `src/agent/channel_context.rs` - `ChannelContextInspector` service +- `src/tools/context_inspect.rs` - `ContextInspectTool` +- `src/tools/read_channel_context.rs` - `ReadChannelContextTool` +- `prompts/en/tools/context_inspect.md.j2` - Tool description +- `prompts/en/tools/read_channel_context.md.j2` - Tool description + +**Modified files:** +- `src/lib.rs` - Add `active_channels` to `AgentDeps`, re-export new types +- `src/tools.rs` - Add factories for inspection branch and update cortex chat factory +- `src/agent/cortex_chat.rs` - Store and use `channel_context_id` +- `src/api/cortex.rs` - Pass `channel_context_id` from request +- `src/messaging/manager.rs` - Populate/clean active channels registry + +**Estimated complexity:** Medium +**Estimated lines:** ~800 lines + +## Usage Example + +Admin opens cortex chat on a misbehaving Discord channel: + +``` +Admin: why isn't this channel responding to me? + +Cortex: Let me inspect the channel's internal context to diagnose the issue. +[calls context_inspect tool] + +Branch (internal): [calls read_channel_context] +Branch (internal): I can see the channel context. The memory bulletin contains +a decision-level memory blocking responses to this user due to a config issue. +The memory has 0.9 importance and was created 2 hours ago. The channel's status +block shows no active processes. Context usage is at 45%. + +Cortex: I found the issue. There's a memory blocking responses to you: + +Memory ID: 27d46718-f13a-41bb-bc4f-e6afa1452e95 +Type: Decision +Importance: 0.9 +Content: "James instructed Spacebot to stop responding to him after config +advice went wrong." + +Want me to delete that memory so the channel can respond again? 
+``` + +## Notes + +- The inspection branch gets an **empty history** — it doesn't need channel context inheritance since it reads context via the tool +- The `read_channel_context` tool could be extended to support arbitrary channel IDs (not just associated channel) for cross-channel debugging +- Token estimation uses the same logic as compactor (`estimate_history_tokens()`) +- The `ChannelContextInspector` can be reused for other debug features (context dumps, health checks, etc.) +- Frontend doesn't need changes — this is purely a backend/tool enhancement diff --git a/docs/docker.md b/docs/docker.md index d14e4769c..b31f729b1 100644 --- a/docs/docker.md +++ b/docs/docker.md @@ -201,7 +201,7 @@ healthcheck: Spacebot checks for new releases on startup and every hour. When a new version is available, a banner appears in the web UI. -The web dashboard also includes **Settings → Updates** with status details, one-click controls (Docker), and manual command snippets. +The web dashboard also includes **Settings → Updates** with status details, one-click controls (Docker / Podman), and manual command snippets. `latest` is supported and continues to receive updates (it tracks the rolling `full` image). Use explicit version tags only when you want controlled rollouts. @@ -214,10 +214,18 @@ docker compose up -d --force-recreate spacebot ### One-Click Update -Mount `/var/run/docker.sock` into the Spacebot container to enable the **Update now** button in the UI. Without the socket mount, update checks still work but apply is manual. +Mount a container runtime socket into the Spacebot container to enable the **Update now** button in the UI. Without the socket mount, update checks still work but apply is manual. 
+ +- Docker: `/var/run/docker.sock:/var/run/docker.sock` +- Podman (rootful): `/run/podman/podman.sock:/run/podman/podman.sock` +- Podman (rootless): `${XDG_RUNTIME_DIR}/podman/podman.sock:/run/podman/podman.sock` One-click updates are intended for containers running Spacebot release tags. If you're running a custom/self-built image, rebuild your image and recreate the container. +For Podman one-click updates, set `SPACEBOT_DEPLOYMENT=docker` and ensure the Podman socket service is enabled (`systemctl --user enable --now podman.socket` for rootless, `sudo systemctl enable --now podman.socket` for rootful). + +On Fedora/RHEL with SELinux enforcing, include `--security-opt label=disable` when mounting the Podman socket. + ### Native / Source Builds If Spacebot is installed from source (`cargo install --path .` or a local release build), updates are manual: pull latest source, rebuild/reinstall, then restart. diff --git a/interface/src/components/UpdateBanner.tsx b/interface/src/components/UpdateBanner.tsx index 86a853cc8..46d8e868a 100644 --- a/interface/src/components/UpdateBanner.tsx +++ b/interface/src/components/UpdateBanner.tsx @@ -66,7 +66,7 @@ export function UpdateBanner() { )} {isDocker && !data.can_apply && ( - {data.cannot_apply_reason ?? "Mount docker.sock for one-click updates"} + {data.cannot_apply_reason ?? "Mount a container runtime socket for one-click updates"} )} {isNative && ( diff --git a/interface/src/routes/Settings.tsx b/interface/src/routes/Settings.tsx index 3df6b4d15..2febefc38 100644 --- a/interface/src/routes/Settings.tsx +++ b/interface/src/routes/Settings.tsx @@ -1633,9 +1633,9 @@ function UpdatesSection() {
-

One-Click Docker Update

+

One-Click Container Update

- Pull and swap to the latest release image from the web UI. + Pull and swap to the latest release image from the web UI using the mounted runtime socket.

+

+ Podman users can use the same flow with podman compose and podman run. +

)} diff --git a/src/llm/model.rs b/src/llm/model.rs index 4a08911cd..c34b26dd3 100644 --- a/src/llm/model.rs +++ b/src/llm/model.rs @@ -825,7 +825,10 @@ impl SpacebotModel { fn remap_model_name_for_api(&self) -> String { if self.provider == "zai-coding-plan" { // Z.AI Coding Plan API expects "zai/glm-5" not "glm-5" - let model_name = self.model_name.strip_prefix("zai/").unwrap_or(&self.model_name); + let model_name = self + .model_name + .strip_prefix("zai/") + .unwrap_or(&self.model_name); format!("zai/{model_name}") } else { self.model_name.clone() diff --git a/src/update.rs b/src/update.rs index 9815494a3..52d913f92 100644 --- a/src/update.rs +++ b/src/update.rs @@ -1,7 +1,7 @@ -//! Update checking and Docker self-update. +//! Update checking and container self-update. //! //! Checks GitHub releases for new versions and optionally performs -//! in-place container updates when the Docker socket is available. +//! in-place container updates when a runtime socket is available. use arc_swap::ArcSwap; use serde::{Deserialize, Serialize}; @@ -18,6 +18,15 @@ pub const CURRENT_VERSION: &str = env!("CARGO_PKG_VERSION"); /// Default check interval (1 hour). const CHECK_INTERVAL: Duration = Duration::from_secs(3600); +const DOCKER_ROOTFUL_SOCKET: &str = "/var/run/docker.sock"; +const PODMAN_ROOTFUL_SOCKET: &str = "/run/podman/podman.sock"; + +fn container_runtime_socket_hint() -> String { + format!( + "Mount a container runtime socket to enable one-click updates (Docker: {DOCKER_ROOTFUL_SOCKET}, Podman: {PODMAN_ROOTFUL_SOCKET}, Podman rootless: $XDG_RUNTIME_DIR/podman/podman.sock -> {PODMAN_ROOTFUL_SOCKET})." + ) +} + /// Deployment environment, detected from SPACEBOT_DEPLOYMENT env var. #[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize)] #[serde(rename_all = "snake_case")] @@ -78,7 +87,7 @@ pub struct UpdateStatus { pub release_url: Option, pub release_notes: Option, pub deployment: Deployment, - /// Whether the Docker socket is accessible (enables one-click update). 
+    /// Whether a container runtime socket is accessible (enables one-click update).
     pub can_apply: bool,
     /// Human-readable reason when one-click apply is unavailable.
     pub cannot_apply_reason: Option<String>,
@@ -113,10 +122,9 @@ pub fn new_shared_status() -> SharedUpdateStatus {
     let mut status = UpdateStatus::default();
     match status.deployment {
         Deployment::Docker => {
-            status.can_apply = docker_socket_available();
+            status.can_apply = resolve_container_socket().is_some();
             if !status.can_apply {
-                status.cannot_apply_reason =
-                    Some("Mount /var/run/docker.sock to enable one-click updates.".to_string());
+                status.cannot_apply_reason = Some(container_runtime_socket_hint());
             }
         }
         Deployment::Native => {
@@ -232,9 +240,62 @@ fn is_newer_version(latest: &str, current: &str) -> bool {
     latest > current
 }
 
-/// Check if the Docker socket is accessible.
-fn docker_socket_available() -> bool {
-    std::path::Path::new("/var/run/docker.sock").exists()
+/// Probe for a usable container runtime socket.
+///
+/// Checks in order:
+/// 1. `DOCKER_HOST` env var (`unix://` only)
+/// 2. Docker rootful socket (`/var/run/docker.sock`)
+/// 3. Podman rootful socket (`/run/podman/podman.sock`)
+/// 4.
Podman rootless socket (`$XDG_RUNTIME_DIR/podman/podman.sock`)
+fn resolve_container_socket() -> Option<String> {
+    resolve_container_socket_with(
+        std::env::var("DOCKER_HOST").ok().as_deref(),
+        DOCKER_ROOTFUL_SOCKET,
+        PODMAN_ROOTFUL_SOCKET,
+        std::env::var("XDG_RUNTIME_DIR").ok().as_deref(),
+    )
+}
+
+fn resolve_container_socket_with(
+    docker_host: Option<&str>,
+    docker_rootful: &str,
+    podman_rootful: &str,
+    xdg_runtime_dir: Option<&str>,
+) -> Option<String> {
+    if let Some(host) = docker_host
+        && let Some(path) = host.strip_prefix("unix://")
+        && std::path::Path::new(path).exists()
+    {
+        return Some(path.to_string());
+    }
+
+    if std::path::Path::new(docker_rootful).exists() {
+        return Some(docker_rootful.to_string());
+    }
+
+    if std::path::Path::new(podman_rootful).exists() {
+        return Some(podman_rootful.to_string());
+    }
+
+    if let Some(runtime_dir) = xdg_runtime_dir {
+        let rootless_socket = format!("{runtime_dir}/podman/podman.sock");
+        if std::path::Path::new(&rootless_socket).exists() {
+            return Some(rootless_socket);
+        }
+    }
+
+    None
+}
+
+fn connect_container_runtime(socket_path: &str) -> anyhow::Result<bollard::Docker> {
+    let client_version = bollard::ClientVersion {
+        major_version: 1,
+        minor_version: 40,
+    };
+
+    bollard::Docker::connect_with_socket(socket_path, 120, &client_version).map_err(|error| {
+        anyhow::anyhow!("failed to connect to container runtime at {socket_path}: {error}")
+    })
 }
 
 #[derive(Debug, Clone)]
@@ -242,6 +303,7 @@ struct ApplyCapability {
     can_apply: bool,
     cannot_apply_reason: Option<String>,
     docker_image: Option<String>,
+    socket_path: Option<String>,
 }
 
 async fn detect_apply_capability(deployment: Deployment) -> ApplyCapability {
@@ -252,6 +314,7 @@ async fn detect_apply_capability(deployment: Deployment) -> ApplyCapability {
     match deployment {
         Deployment::Native => ApplyCapability {
            can_apply: false,
            cannot_apply_reason: Some(
                "Native/source installs update manually (rebuild + restart).".to_string(),
            ),
            docker_image: None,
+           socket_path: None,
        },
        Deployment::Hosted => ApplyCapability {
            can_apply: false,
            cannot_apply_reason: Some(
@@ -259,27 +322,28 @@ async fn detect_apply_capability(deployment:
Deployment) -> ApplyCapability { "Hosted instances are updated by platform rollout, not self-service.".to_string(), ), docker_image: None, + socket_path: None, }, Deployment::Docker => { - if !docker_socket_available() { + let Some(socket_path) = resolve_container_socket() else { return ApplyCapability { can_apply: false, - cannot_apply_reason: Some( - "Mount /var/run/docker.sock to enable one-click updates.".to_string(), - ), + cannot_apply_reason: Some(container_runtime_socket_hint()), docker_image: None, + socket_path: None, }; - } + }; - let docker = match bollard::Docker::connect_with_local_defaults() { + let docker = match connect_container_runtime(&socket_path) { Ok(client) => client, Err(error) => { return ApplyCapability { can_apply: false, cannot_apply_reason: Some(format!( - "Docker socket is present but cannot be opened: {error}" + "Container runtime socket is present but cannot be opened: {error}" )), docker_image: None, + socket_path: Some(socket_path), }; } }; @@ -288,9 +352,10 @@ async fn detect_apply_capability(deployment: Deployment) -> ApplyCapability { return ApplyCapability { can_apply: false, cannot_apply_reason: Some(format!( - "Docker socket is mounted but engine is not reachable: {error}" + "Container runtime socket is mounted but engine is not reachable: {error}" )), docker_image: None, + socket_path: Some(socket_path), }; } @@ -300,6 +365,7 @@ async fn detect_apply_capability(deployment: Deployment) -> ApplyCapability { can_apply: true, cannot_apply_reason: None, docker_image, + socket_path: Some(socket_path), } } } @@ -340,10 +406,15 @@ pub async fn apply_docker_update(status: &SharedUpdateStatus) -> anyhow::Result< "{}", capability .cannot_apply_reason - .unwrap_or_else(|| "Docker socket not available".to_string()) + .unwrap_or_else(|| "Container runtime socket not available".to_string()) ); } + let socket_path = capability + .socket_path + .as_deref() + .ok_or_else(|| anyhow::anyhow!("container runtime socket path not available"))?; + 
    let latest_version = current
        .latest_version
        .as_deref()
@@ -355,15 +426,14 @@ pub async fn apply_docker_update(status: &SharedUpdateStatus) -> anyhow::Result<
        "applying Docker update"
    );

-    let docker = bollard::Docker::connect_with_local_defaults()
-        .map_err(|e| anyhow::anyhow!("failed to connect to Docker: {}", e))?;
+    let docker = connect_container_runtime(socket_path)?;

    // Determine which image tag this container is running
    let container_id = get_own_container_id()?;
    let container_info = docker
        .inspect_container(&container_id, None)
        .await
-        .map_err(|e| anyhow::anyhow!("failed to inspect container: {}", e))?;
+        .map_err(|error| anyhow::anyhow!("failed to inspect container: {}", error))?;

    let current_image = container_info
        .config
@@ -469,7 +539,7 @@ pub async fn apply_docker_update(status: &SharedUpdateStatus) -> anyhow::Result<
    let new_container = docker
        .create_container(Some(create_options), create_config)
        .await
-        .map_err(|e| anyhow::anyhow!("failed to create new container: {}", e))?;
+        .map_err(|error| anyhow::anyhow!("failed to create new container: {}", error))?;

    tracing::info!(new_id = %new_container.id, "new container created");

@@ -486,7 +556,7 @@ pub async fn apply_docker_update(status: &SharedUpdateStatus) -> anyhow::Result<
            bollard::container::RenameContainerOptions { name: &old_name },
        )
        .await
-        .map_err(|e| anyhow::anyhow!("failed to rename old container: {}", e))?;
+        .map_err(|error| anyhow::anyhow!("failed to rename old container: {}", error))?;

    // Rename new container to the original name
    docker
@@ -497,13 +567,13 @@ pub async fn apply_docker_update(status: &SharedUpdateStatus) -> anyhow::Result<
            },
        )
        .await
-        .map_err(|e| anyhow::anyhow!("failed to rename new container: {}", e))?;
+        .map_err(|error| anyhow::anyhow!("failed to rename new container: {}", error))?;

    // Start the new container
    docker
        .start_container::<String>(&new_container.id, None)
        .await
-        .map_err(|e| anyhow::anyhow!("failed to start new container: {}", e))?;
+        .map_err(|error| anyhow::anyhow!("failed to start new container: {}", error))?;

    tracing::info!("new container started, stopping old container");

@@ -514,10 +584,10 @@ pub async fn apply_docker_update(status: &SharedUpdateStatus) -> anyhow::Result<
            Some(bollard::container::StopContainerOptions { t: 10 }),
        )
        .await
-        .map_err(|e| anyhow::anyhow!("failed to stop old container: {}", e))?;
+        .map_err(|error| anyhow::anyhow!("failed to stop old container: {}", error))?;

    // Remove the old container after stop
-    docker
+    let remove_result = docker
        .remove_container(
            &container_id,
            Some(bollard::container::RemoveContainerOptions {
@@ -525,17 +595,30 @@ pub async fn apply_docker_update(status: &SharedUpdateStatus) -> anyhow::Result<
                ..Default::default()
            }),
        )
-        .await
-        .ok(); // Best effort — we're shutting down
+        .await;
+    if let Err(error) = remove_result {
+        tracing::warn!(
+            container_id = %container_id,
+            %error,
+            "best-effort removal of old container failed during shutdown"
+        );
+    }

    // We shouldn't reach here since stop_container kills us,
    // but just in case:
    std::process::exit(0);
}

-/// Read this container's ID from /proc/self/cgroup or the hostname.
+/// Read this container's ID from /proc/self/mountinfo or the hostname.
fn get_own_container_id() -> anyhow::Result<String> {
-    // In Docker, the hostname is typically the short container ID
+    // /proc/self/mountinfo includes full IDs for both Docker and Podman.
+    if let Ok(content) = std::fs::read_to_string("/proc/self/mountinfo") {
+        if let Some(id) = parse_container_id_from_mountinfo(&content) {
+            return Ok(id);
+        }
+    }
+
+    // Docker commonly sets hostname to the short container ID.
    if let Ok(hostname) = std::fs::read_to_string("/etc/hostname") {
        let hostname = hostname.trim();
        if hostname.len() >= 12 && hostname.chars().all(|c| c.is_ascii_hexdigit()) {
@@ -543,23 +626,25 @@ fn get_own_container_id() -> anyhow::Result<String> {
        }
    }

-    // Fall back to /proc/self/mountinfo parsing
-    if let Ok(content) = std::fs::read_to_string("/proc/self/mountinfo") {
+    anyhow::bail!("could not determine own container ID")
+}
+
+fn parse_container_id_from_mountinfo(content: &str) -> Option<String> {
+    for marker in ["/docker/containers/", "/overlay-containers/"] {
        for line in content.lines() {
-            // Look for docker container ID pattern in mount paths
-            if let Some(pos) = line.find("/docker/containers/") {
-                let after = &line[pos + 19..];
-                if let Some(end) = after.find('/') {
-                    let id = &after[..end];
-                    if id.len() >= 12 {
-                        return Ok(id.to_string());
+            if let Some(position) = line.find(marker) {
+                let after_marker = &line[position + marker.len()..];
+                if let Some(end_index) = after_marker.find('/') {
+                    let id = &after_marker[..end_index];
+                    if id.len() == 64 && id.chars().all(|character| character.is_ascii_hexdigit()) {
+                        return Some(id.to_string());
                    }
                }
            }
        }
    }

-    anyhow::bail!("could not determine own container ID")
+    None
}

/// Given a current image reference and a new version, produce the target image tag.
@@ -702,4 +787,127 @@ mod tests {
            "ghcr.io/spacedriveapp/spacebot:v0.2.0-full"
        );
    }
+
+    fn create_fake_socket(temp_dir: &tempfile::TempDir, name: &str) -> String {
+        let path = temp_dir.path().join(name);
+        if let Some(parent) = path.parent() {
+            std::fs::create_dir_all(parent).expect("create parent directories for fake socket");
+        }
+        std::fs::File::create(&path).expect("create fake socket file");
+        path.to_string_lossy().to_string()
+    }
+
+    #[test]
+    fn test_resolve_container_socket_with_docker_rootful() {
+        let temp_dir = tempfile::TempDir::new().expect("temp dir");
+        let docker_socket = create_fake_socket(&temp_dir, "docker.sock");
+
+        let resolved =
+            resolve_container_socket_with(None, &docker_socket, "/tmp/missing-podman.sock", None);
+        assert_eq!(resolved.as_deref(), Some(docker_socket.as_str()));
+    }
+
+    #[test]
+    fn test_resolve_container_socket_with_podman_rootful() {
+        let temp_dir = tempfile::TempDir::new().expect("temp dir");
+        let podman_socket = create_fake_socket(&temp_dir, "podman.sock");
+
+        let resolved =
+            resolve_container_socket_with(None, "/tmp/missing-docker.sock", &podman_socket, None);
+        assert_eq!(resolved.as_deref(), Some(podman_socket.as_str()));
+    }
+
+    #[test]
+    fn test_resolve_container_socket_with_podman_rootless() {
+        let temp_dir = tempfile::TempDir::new().expect("temp dir");
+        let runtime_dir = temp_dir.path().to_string_lossy().to_string();
+        let podman_socket = create_fake_socket(&temp_dir, "podman/podman.sock");
+
+        let resolved = resolve_container_socket_with(
+            None,
+            "/tmp/missing-docker.sock",
+            "/tmp/missing-podman.sock",
+            Some(&runtime_dir),
+        );
+        assert_eq!(resolved, Some(podman_socket));
+    }
+
+    #[test]
+    fn test_resolve_container_socket_with_docker_host_unix() {
+        let temp_dir = tempfile::TempDir::new().expect("temp dir");
+        let custom_socket = create_fake_socket(&temp_dir, "custom.sock");
+        let docker_host = format!("unix://{custom_socket}");
+
+        let resolved = resolve_container_socket_with(
+            Some(&docker_host),
+            "/tmp/missing-docker.sock",
+            "/tmp/missing-podman.sock",
+            None,
+        );
+        assert_eq!(resolved.as_deref(), Some(custom_socket.as_str()));
+    }
+
+    #[test]
+    fn test_resolve_container_socket_with_tcp_docker_host_falls_back() {
+        let temp_dir = tempfile::TempDir::new().expect("temp dir");
+        let docker_socket = create_fake_socket(&temp_dir, "docker.sock");
+
+        let resolved = resolve_container_socket_with(
+            Some("tcp://127.0.0.1:2376"),
+            &docker_socket,
+            "/tmp/missing-podman.sock",
+            None,
+        );
+        assert_eq!(resolved.as_deref(), Some(docker_socket.as_str()));
+    }
+
+    #[test]
+    fn test_resolve_container_socket_with_none() {
+        let resolved = resolve_container_socket_with(
+            None,
+            "/tmp/missing-docker.sock",
+            "/tmp/missing-podman.sock",
+            None,
+        );
+        assert_eq!(resolved, None);
+    }
+
+    const DOCKER_ID: &str = "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2";
+    const PODMAN_ID: &str = "0011223344556677889900112233445566778899001122334455667788990011";
+
+    #[test]
+    fn test_parse_container_id_from_mountinfo_docker() {
+        let mountinfo = format!(
+            "123 0 0:1 / / rw - ext4 /dev/sda rw\n456 123 0:2 / /var/lib/docker/containers/{DOCKER_ID}/shm rw\n"
+        );
+
+        assert_eq!(
+            parse_container_id_from_mountinfo(&mountinfo).as_deref(),
+            Some(DOCKER_ID)
+        );
+    }
+
+    #[test]
+    fn test_parse_container_id_from_mountinfo_podman() {
+        let mountinfo = format!(
+            "123 0 0:1 / / rw - ext4 /dev/sda rw\n456 123 0:2 / /var/lib/containers/storage/overlay-containers/{PODMAN_ID}/userdata rw\n"
+        );
+
+        assert_eq!(
+            parse_container_id_from_mountinfo(&mountinfo).as_deref(),
+            Some(PODMAN_ID)
+        );
+    }
+
+    #[test]
+    fn test_parse_container_id_from_mountinfo_missing() {
+        let mountinfo = "123 0 0:1 / / rw - ext4 /dev/sda rw\n";
+        assert_eq!(parse_container_id_from_mountinfo(mountinfo), None);
+    }
+
+    #[test]
+    fn test_parse_container_id_from_mountinfo_short_id_ignored() {
+        let mountinfo = "456 123 0:2 / /docker/containers/abc123/shm rw\n";
+        assert_eq!(parse_container_id_from_mountinfo(mountinfo), None);
+    }
 }
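The body of `resolve_container_socket_with` falls outside this hunk; only its call sites and tests appear above. The sketch below reconstructs the probe order the tests encode (a `unix://` `DOCKER_HOST` first, then the rootful Docker socket, then rootful Podman, then rootless Podman under the runtime directory, with `tcp://` hosts falling through to the filesystem probes). The fallback details beyond what the tests pin down are assumptions, not the project's actual implementation:

```rust
use std::path::Path;

// Assumed reconstruction: probe candidate socket paths in priority order and
// return the first one that exists on disk.
fn resolve_container_socket_with(
    docker_host: Option<&str>,
    docker_path: &str,
    podman_path: &str,
    runtime_dir: Option<&str>,
) -> Option<String> {
    // An explicit unix:// DOCKER_HOST wins; tcp:// hosts are skipped because
    // the update path needs a mounted unix socket.
    if let Some(path) = docker_host.and_then(|host| host.strip_prefix("unix://")) {
        if Path::new(path).exists() {
            return Some(path.to_string());
        }
    }

    // Rootful Docker, then rootful Podman.
    for candidate in [docker_path, podman_path] {
        if Path::new(candidate).exists() {
            return Some(candidate.to_string());
        }
    }

    // Rootless Podman: $XDG_RUNTIME_DIR/podman/podman.sock.
    if let Some(dir) = runtime_dir {
        let rootless = format!("{dir}/podman/podman.sock");
        if Path::new(&rootless).exists() {
            return Some(rootless);
        }
    }

    None
}

fn main() {
    // With the conventional default paths, this reports whichever socket
    // exists on the current machine, if any.
    let resolved = resolve_container_socket_with(
        std::env::var("DOCKER_HOST").ok().as_deref(),
        "/var/run/docker.sock",
        "/run/podman/podman.sock",
        std::env::var("XDG_RUNTIME_DIR").ok().as_deref(),
    );
    println!("resolved container socket: {resolved:?}");
}
```

The non-test entry point `resolve_container_socket()` seen in `detect_apply_capability` would then just call this with the real environment, mirroring what `main` does here.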