From e4a815818e55c383d6958612bae27c1664e74bb5 Mon Sep 17 00:00:00 2001
From: Jeff Mesnil <jmesnil@redhat.com>
Date: Thu, 29 Jan 2026 10:01:26 +0100
Subject: [PATCH 1/6] chore: Add PRD to improve performance

The insights-runtime-extractor must be able to scan only a
subset of containers (requested by the Insights Operator)
instead of collecting data for *all* containers when the
Insights Operator has a hard limit on number of containers
that are gathered.

This addresses https://issues.redhat.com/browse/CCXDEV-15960.

Signed-off-by: Jeff Mesnil <jmesnil@redhat.com>
---
 docs/prd/prd-0001-container-ids-filtering.md | 401 +++++++++++++++++++
 1 file changed, 401 insertions(+)
 create mode 100644 docs/prd/prd-0001-container-ids-filtering.md
diff --git a/docs/prd/prd-0001-container-ids-filtering.md b/docs/prd/prd-0001-container-ids-filtering.md
new file mode 100644
index 00000000..076ebc32
--- /dev/null
+++ b/docs/prd/prd-0001-container-ids-filtering.md
@@ -0,0 +1,401 @@
+# PRD: Container IDs Filtering for Runtime Extraction
+
+## Overview
+
+This document describes the changes required to support targeted container scanning via a list of container IDs. Currently, the system either scans all running containers or a single container. This enhancement will allow clients to request scanning of a specific subset of containers by providing their IDs in a POST request.
+
+## Motivation
+
+- **Performance**: Scanning only specific containers reduces resource usage and response time.
+
+---
+
+## Current Architecture
+
+### Request Flow
+```
+HTTP Client → Exporter (Go) → Extractor Server (Rust) → Coordinator (Rust) → Fingerprints
+             GET /gather_runtime_info    TCP 3000          /coordinator
+```
+
+### Current Capabilities
+- **Exporter**: Only supports `GET /gather_runtime_info?hash=true|false`
+- **Extractor Server**: Receives empty TCP payload, spawns coordinator
+- **Coordinator**: Accepts optional single `--container-id <id>` argument (not used by the extractor server)
+
+---
+
+## Proposed Changes
+
+### 1. Exporter HTTP Server (Go)
+
+**File**: `exporter/cmd/exporter/main.go`
+
+#### New Endpoint
+Add a new HTTP POST endpoint alongside the existing GET endpoint:
+
+| Method | Path | Request Body | Description |
+|--------|------|--------------|-------------|
+| POST | `/gather_runtime_info` | JSON object with container IDs | Scan specific containers |
+
+#### Request Format
+```json
+{
+  "containerIds": ["abc123...", "def456...", "ghi789..."]
+}
+```
+
+#### Validation Rules
+- Request body must be valid JSON
+- `containerIds` field must be present and be an array
+- Array can be empty (equivalent to scanning all containers)
+- Each container ID must be a non-empty string
+- Container IDs should be the short or full CRI-O container ID format
+
+#### Changes Required
+
+1. **Add JSON request struct**:
+   ```go
+   type GatherRuntimeInfoRequest struct {
+       ContainerIds []string `json:"containerIds"`
+   }
+   ```
+
+2. **Update HTTP handler**:
+   - Parse request method (GET vs POST)
+   - For POST: parse JSON body and validate
+   - Pass container IDs to `triggerRuntimeInfoExtraction()`
+
+3. **Update `triggerRuntimeInfoExtraction()` function**:
+   - Add parameter: `containerIds []string`
+   - Serialize container IDs as a single string of comma-separated container IDs before sending over TCP
+   - Send string payload instead of empty payload to extractor server
+
+4. **TCP Protocol Change**:
+   - Current: Empty payload triggers extraction
+   - New: String payload with a list of comma-separated container IDs
+   ```
+   "abc123,def456,..."
+   ```
+   - Empty string means "scan all containers"
+
+#### Backward Compatibility
+- GET endpoint behavior remains unchanged (scans all containers)
+- GET internally sends an empty string `""` to maintain protocol consistency
+
+---
+
+### 2. Extractor Server (Rust)
+
+**File**: `extractor/src/bin/extractor_server.rs`
+
+#### Changes Required
+
+1. **Update `handle_trigger_extraction()` function**:
+   - Read incoming data from TCP stream (currently ignored) as a String
+   - Pas the incoming data to the CLI argument if not empty. Wrap them in a String.
+   
+2. **Update coordinator invocation**:
+   - Current:
+     ```rust
+     Command::new("/coordinator")
+         .arg("--log-level")
+         .arg(log_level)
+         .output()
+     ```
+   - New (when container IDs provided):
+     ```rust
+     Command::new("/coordinator")
+         .arg("--log-level")
+         .arg(log_level)
+         .arg(container_ids)
+         .output()
+     ```
+
+3. **Error Handling**:
+   - Invalid String: Return error response via TCP
+   - Log validation errors at warn level
+
+#### TCP Protocol
+
+| Direction | Format | Example |
+|-----------|--------|---------|
+| Request | Plain text | `abc123,def456` |
+| Response | Plain text (path) | `data/out-1234567890` |
+
+---
+
+### 3. Coordinator (Rust)
+
+**File**: `extractor/src/bin/coordinator.rs`
+
+#### Current Arguments
+```rust
+#[arg(short, long)]
+log_level: Option<String>
+
+#[arg(help = "ID of the container to scan. If absent, all containers are scanned")]
+container_id: Option<String>
+```
+
+#### New Arguments
+```rust
+#[arg(short, long)]
+log_level: Option<String>
+
+#[arg(help = "Comma-separated list of container IDs to scan")]
+container_ids: Option<String>
+```
+
+#### Changes Required
+
+1. **Add new CLI argument**:
+   - Change `container_id` to `container_ids` so that multiple containers ID can be passed (separated by commas)
+   - Parse them into `Vec<String>`
+
+2. **Update container filtering logic**:
+
+   - Current location in coordinator.rs `main()`:
+     ```rust
+      let containers = match args.container_id {
+        None => get_containers(),
+        Some(container_id) => match get_container(&container_id) {
+            Some(container) => vec![container],
+            None => vec![],
+        },
+    };
+     ```
+   - New logic:
+     ```rust
+      let containers = match args.container_ids {         
+        None => get_containers(),
+        Some(container_ids) => {
+         // split them from commas,
+         // iterate on them: for each trimmed non-empty container ID,
+         // call get_container(container_id)
+         // and collect them in a Vector
+        },
+      };
+
+     ```
+3. **Update container information retrieval**:
+
+   - Update `get_container(container_id: &String)` to invoke `crictl inspect $container_id` instead of `crictl ps`. 
+
+4. **Logging**:
+   - Log at info level: "Scanning X containers"
+
+---
+
+## Data Flow Diagram
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                              HTTP Client                                    │
+└─────────────────────────────────┬───────────────────────────────────────────┘
+                                  │
+                POST /gather_runtime_info
+                Body: {"containerIds": ["abc123", "def456"]}
+                                  │
+                                  ▼
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                        EXPORTER (Go) - Port 8000                            │
+│                                                                             │
+│  1. Parse POST body as JSON                                                 │
+│  2. Validate containerIds array                                             │
+│  3. Send comma-separated containerIds to extractor_server via TCP           │
+└─────────────────────────────────┬───────────────────────────────────────────┘
+                                  │
+                    TCP: abc123,def456
+                                  │
+                                  ▼
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                    EXTRACTOR_SERVER (Rust) - Port 3000                      │
+│                                                                             │
+│  1. Read String from TCP stream                                             │
+│  2. Execute: /coordinator --log-level info "abc123,def456"                  │
+└─────────────────────────────────┬───────────────────────────────────────────┘
+                                  │
+                       Subprocess with args
+                                  │
+                                  ▼
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                         COORDINATOR (Rust)                                  │
+│                                                                             │
+│  1. Parse executable argument                                               │
+│  2. Get all running containers via crictl                                   │
+│  4. Scan filtered containers                                                │
+│  5. Output path to stdout                                                   │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## API Specification
+
+### POST /gather_runtime_info
+
+#### Request
+
+**Headers**:
+```
+Content-Type: application/json
+```
+
+**Body**:
+```json
+{
+  "containerIds": ["container-id-1", "container-id-2"]
+}
+```
+
+**Field Descriptions**:
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| containerIds | string[] | Yes | List of container IDs to scan. Empty array scans all containers. |
+
+#### Response
+
+Same as existing GET endpoint response:
+
+```json
+{
+  "node-name": {
+    "namespace-1": {
+      "pod-name-1": {
+        "container-name-1": {
+          "os": "rhel",
+          "osVersion": "8.9",
+          "kind": "Java",
+          "kindVersion": "17.0.9",
+          "kindImplementer": "Red Hat, Inc.",
+          "runtimes": [
+            {"name": "Quarkus", "version": "3.2.0"}
+          ]
+        }
+      }
+    }
+  }
+}
+```
+
+#### Error Responses
+
+| Status | Condition | Response Body |
+|--------|-----------|---------------|
+| 400 | Invalid JSON | `{"error": "Invalid JSON in request body"}` |
+| 400 | Missing containerIds | `{"error": "containerIds field is required"}` |
+| 400 | Invalid container ID format | `{"error": "Container ID at index N is invalid"}` |
+| 500 | Extraction failed | `{"error": "Runtime extraction failed: <details>"}` |
+
+---
+
+## Testing Requirements
+
+### Unit Tests
+
+1. **Exporter**:
+   - JSON parsing of valid request body
+   - Validation of container ID formats
+   - Empty array handling
+   - Malformed JSON rejection
+
+2. **Extractor Server**:
+   - JSON deserialization from TCP stream
+   - Container ID validation
+   - Command argument construction
+
+3. **Coordinator**:
+   - Comma-separated parsing of the main argument
+   - Container filtering logic
+
+### Integration Tests
+
+1. **End-to-end with specific containers**:
+   - POST request with known container IDs
+   - Verify only those containers appear in response
+
+2. **Empty array behavior**:
+   - POST with `{"containerIds": []}` should scan all containers
+   - Result should match GET endpoint
+
+3. **Non-existent container IDs**:
+   - POST with IDs that don't match running containers
+   - Should return empty/partial results gracefully
+
+4. **Mixed valid/invalid IDs**:
+   - Some IDs exist, some don't
+   - Should scan existing ones, log warnings for missing
+
+### E2E Tests
+
+Add test cases to existing e2e test suite in `runtime-samples/`:
+- Test POST endpoint with subset of containers
+- Verify filtering works correctly in Kubernetes environment
+
+---
+
+## Implementation Phases
+
+### Phase 1: Coordinator Changes
+1. Change main argument `container_ids` to `container_id`
+2. Implement container filtering logic
+3. Add unit tests
+4. Manual testing with direct coordinator invocation
+
+### Phase 2: Extractor Server Changes
+1. Update coordinator invocation
+2. Add error handling
+3. Add unit tests
+
+### Phase 3: Exporter Changes
+1. Add POST endpoint handler
+2. Implement JSON validation
+3. Update TCP communication
+4. Add unit tests
+
+### Phase 4: Integration & E2E
+1. Integration tests
+2. E2E tests in runtime-samples
+3. Documentation updates
+
+---
+
+## Backward Compatibility
+
+| Component | Backward Compatible | Notes |
+|-----------|---------------------|-------|
+| Exporter | Yes | GET endpoint unchanged |
+| Extractor Server | Yes | Empty payload treated as empty containerIds |
+| Coordinator | Yes | passing a singular container ID still works |
+
+---
+
+## Security Considerations
+
+- No new privileges required
+- No shell injection risk (IDs passed as single argument, not executed)
+- Logging does not expose sensitive data
+
+---
+
+## Open Questions
+
+1. **Container ID format**: Should we support both short (12-char) and full (64-char) container IDs?
+   - Recommendation: Yes, support both with prefix matching
+
+2. **Maximum number of container IDs**: Should there be a limit?
+   - Recommendation: No hard limit, but document performance implications
+
+3. **Response for non-existent IDs**: Should we return an error or just skip missing containers?
+   - Recommendation: Skip missing, return results for found containers, log warnings
+
+---
+
+## Files to Modify
+
+| File | Changes |
+|------|---------|
+| `exporter/cmd/exporter/main.go` | Add POST handler, JSON parsing, update TCP payload |
+| `extractor/src/bin/extractor_server.rs` | Add JSON deserialization, update coordinator args |
+| `extractor/src/bin/coordinator.rs` | Change main argument, filtering logic |

From cefcfea6e58916362a5306b44e110127fb1cded2 Mon Sep 17 00:00:00 2001
From: Jeff Mesnil <jmesnil@redhat.com>
Date: Thu, 29 Jan 2026 11:28:46 +0100
Subject: [PATCH 2/6] chore: Bump to use ocp/4.22 base images

Signed-off-by: Jeff Mesnil <jmesnil@redhat.com>
---
 .ci-operator.yaml       | 2 +-
 Containerfile-exporter  | 4 ++--
 Containerfile-extractor | 6 +++---
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/.ci-operator.yaml b/.ci-operator.yaml
index e307e5af..284a9100 100644
--- a/.ci-operator.yaml
+++ b/.ci-operator.yaml
@@ -1,4 +1,4 @@
 build_root_image:
   name: release
   namespace: openshift
-  tag: rhel-9-release-golang-1.24-openshift-4.21
+  tag: rhel-9-release-golang-1.24-openshift-4.22
diff --git a/Containerfile-exporter b/Containerfile-exporter
index 26047258..f8e0f03d 100644
--- a/Containerfile-exporter
+++ b/Containerfile-exporter
@@ -1,4 +1,4 @@
-FROM registry.ci.openshift.org/ocp/builder:rhel-9-golang-1.24-openshift-4.21 AS go-builder
+FROM registry.ci.openshift.org/ocp/builder:rhel-9-golang-1.24-openshift-4.22 AS go-builder
 
 WORKDIR /workspace/exporter
 COPY exporter .
@@ -6,7 +6,7 @@ ARG GO_LDFLAGS=""
 ENV GOEXPERIMENT=strictfipsruntime
 RUN CGO_ENABLED=0 GOOS=linux GO111MODULE=on make build
 
-FROM registry.ci.openshift.org/ocp/4.21:base-rhel9
+FROM registry.ci.openshift.org/ocp/4.22:base-rhel9
 
 COPY --from=go-builder /workspace/exporter/bin/exporter /
 ENTRYPOINT [ "/exporter" ]
\ No newline at end of file
diff --git a/Containerfile-extractor b/Containerfile-extractor
index b186ca43..18734261 100644
--- a/Containerfile-extractor
+++ b/Containerfile-extractor
@@ -1,4 +1,4 @@
-FROM registry.ci.openshift.org/ocp/4.21:base-rhel9 AS rust-builder
+FROM registry.ci.openshift.org/ocp/4.22:base-rhel9 AS rust-builder
 ARG TARGETARCH
 
 RUN dnf update -y && dnf -y install \
@@ -8,7 +8,7 @@ WORKDIR /workspace/extractor
 COPY extractor .
 RUN make TARGETARCH=${TARGETARCH}
 
-FROM registry.ci.openshift.org/ocp/builder:rhel-9-golang-1.24-openshift-4.21 AS go-builder
+FROM registry.ci.openshift.org/ocp/builder:rhel-9-golang-1.24-openshift-4.22 AS go-builder
 
 WORKDIR /workspace/fingerprints
 COPY fingerprints .
@@ -16,7 +16,7 @@ ARG GO_LDFLAGS=""
 ENV GOEXPERIMENT=strictfipsruntime
 RUN CGO_ENABLED=0 GOOS=linux GO111MODULE=on make build
 
-FROM registry.ci.openshift.org/ocp/4.21:base-rhel9
+FROM registry.ci.openshift.org/ocp/4.22:base-rhel9
 
 RUN dnf update -y && dnf -y install \
     cri-tools

From 6498edb0f2d4e64456f76902ea514e220bdf3dee Mon Sep 17 00:00:00 2001
From: Jeff Mesnil <jmesnil@redhat.com>
Date: Fri, 30 Jan 2026 11:58:00 +0100
Subject: [PATCH 3/6] feat: Update coordinator to scan a list of containers

The coordinator executable now accepts a comma-separated list
of container ID that needs to be collected.
If this list is empty, *all* running containers are collected.

Signed-off-by: Jeff Mesnil <jmesnil@redhat.com>
---
 extractor/src/bin/coordinator.rs              | 28 +++++---
 extractor/src/insights_runtime_extractor.rs   |  2 +-
 .../insights_runtime_extractor/container.rs   | 66 ++++---------------
 extractor/src/lib.rs                          |  1 -
 4 files changed, 32 insertions(+), 65 deletions(-)

diff --git a/extractor/src/bin/coordinator.rs b/extractor/src/bin/coordinator.rs
index f82d1aac..83c77f1f 100644
--- a/extractor/src/bin/coordinator.rs
+++ b/extractor/src/bin/coordinator.rs
@@ -2,7 +2,7 @@ use clap::Parser;
 use log::info;
 use std::time::{SystemTime, UNIX_EPOCH};
 
-use insights_runtime_extractor::{config, file, get_container, get_containers, perms};
+use insights_runtime_extractor::{config, file, get_containers, perms};
 
 #[derive(Parser, Debug)]
 #[command(about, long_about = None)]
@@ -14,8 +14,10 @@ struct Args {
     )]
     log_level: Option<String>,
 
-    #[arg(help = "ID of the container to scan. If absent, all containers are scanned")]
-    container_id: Option<String>,
+    #[arg(
+        help = "Comma-separated list of container IDs to scan. If absent, all containers are scanned"
+    )]
+    container_ids: Option<String>,
 }
 
 fn main() {
@@ -43,13 +45,19 @@ fn main() {
         &exec_dir
     );
 
-    let containers = match args.container_id {
-        None => get_containers(),
-        Some(container_id) => match get_container(&container_id) {
-            Some(container) => vec![container],
-            None => vec![],
-        },
-    };
+    let container_ids: Vec<String> = args
+        .container_ids
+        .map(|ids| {
+            ids.split(',')
+                .map(|id| id.trim().to_string())
+                .filter(|id| !id.is_empty())
+                .collect()
+        })
+        .unwrap_or_default();
+
+    let containers = get_containers(container_ids);
+
+    info!("Scanning {} containers", containers.len());
 
     for container in containers {
         info!(
diff --git a/extractor/src/insights_runtime_extractor.rs b/extractor/src/insights_runtime_extractor.rs
index acd049ec..e09be686 100644
--- a/extractor/src/insights_runtime_extractor.rs
+++ b/extractor/src/insights_runtime_extractor.rs
@@ -71,7 +71,7 @@ pub fn scan_container(config: &Config, out: &String, container: &Container) {
     let leaves = process::get_process_leaves(&root_pid);
 
     // fingerprint only the first process
-    info!("🔎  Fingerprinting {} processes...", leaves.len());
+    debug!("🔎  Fingerprinting {} processes...", leaves.len());
 
     if let Some(process) = leaves.get(0) {
         // create a directory to store this process' fingerprints
diff --git a/extractor/src/insights_runtime_extractor/container.rs b/extractor/src/insights_runtime_extractor/container.rs
index c7bf0a1a..1cd31f57 100644
--- a/extractor/src/insights_runtime_extractor/container.rs
+++ b/extractor/src/insights_runtime_extractor/container.rs
@@ -18,58 +18,13 @@ pub struct Container {
     pub pid: u32,
 }
 
-pub fn get_container(container_id: &String) -> Option<Container> {
-    info!("🔎  Reading container information with crictl...");
-
-    let output = Command::new("crictl")
-        .args(["ps", "-o", "json", "-s", "RUNNING"])
-        .output()
-        .expect("List containers with crictl");
-    let json = String::from_utf8(output.stdout).unwrap();
-
-    let deserialized_containers: Value = serde_json::from_str(&json).unwrap();
-
-    let container_id = match container_id.strip_prefix("cri-o://") {
-        Some(container_id) => container_id.to_string(),
-        None => container_id.to_string(),
-    };
-
-    for c in deserialized_containers["containers"].as_array().unwrap() {
-        let id = c["id"].as_str().unwrap().to_string();
-
-        match id == container_id {
-            false => {}
-            true => {
-                let pod_namespace = c["labels"]["io.kubernetes.pod.namespace"]
-                    .as_str()
-                    .unwrap()
-                    .to_string();
-                let image_ref = c["imageRef"].as_str().unwrap().to_string();
-                let name = c["labels"]["io.kubernetes.container.name"]
-                    .as_str()
-                    .unwrap()
-                    .to_string();
-                let pod_name = c["labels"]["io.kubernetes.pod.name"]
-                    .as_str()
-                    .unwrap()
-                    .to_string();
-                let pid: u32 = get_root_pid(&id);
-                return Some(Container {
-                    id: "cri-o://".to_owned() + &id,
-                    image_ref,
-                    name,
-                    pod_name,
-                    pod_namespace,
-                    pid,
-                });
-            }
-        }
-    }
-
-    None
-}
+pub fn get_containers(container_ids: Vec<String>) -> Vec<Container> {
+    // Normalize container_ids by stripping the cri-o:// prefix
+    let ids_to_collect: Vec<String> = container_ids
+        .iter()
+        .map(|id| id.strip_prefix("cri-o://").unwrap_or(id).to_string())
+        .collect();
 
-pub fn get_containers() -> Vec<Container> {
     info!("🔎  Reading container information with crictl...");
 
     let output = Command::new("crictl")
@@ -85,12 +40,17 @@ pub fn get_containers() -> Vec<Container> {
     let mut containers: Vec<Container> = Vec::new();
 
     for c in deserialized_containers["containers"].as_array().unwrap() {
+        let id = c["id"].as_str().unwrap().to_string();
+
+        // If ids_to_collect is not empty, skip containers not in the list
+        if !ids_to_collect.is_empty() && !ids_to_collect.contains(&id) {
+            continue;
+        }
+
         let pod_namespace = c["labels"]["io.kubernetes.pod.namespace"]
             .as_str()
             .unwrap()
             .to_string();
-
-        let id = c["id"].as_str().unwrap().to_string();
         let image_ref = c["imageRef"].as_str().unwrap().to_string();
         let name = c["labels"]["io.kubernetes.container.name"]
             .as_str()
diff --git a/extractor/src/lib.rs b/extractor/src/lib.rs
index d495e98a..5396f916 100644
--- a/extractor/src/lib.rs
+++ b/extractor/src/lib.rs
@@ -1,7 +1,6 @@
 mod insights_runtime_extractor;
 
 pub use crate::insights_runtime_extractor::config;
-pub use crate::insights_runtime_extractor::container::get_container;
 pub use crate::insights_runtime_extractor::container::get_containers;
 pub use crate::insights_runtime_extractor::file;
 pub use crate::insights_runtime_extractor::perms;

From d51bacaf4e7c1ed58b71862b6ed98981fa18efd7 Mon Sep 17 00:00:00 2001
From: Jeff Mesnil <jmesnil@redhat.com>
Date: Fri, 30 Jan 2026 13:33:57 +0100
Subject: [PATCH 4/6] feat: accept list of container ids from the extractor TCP
 server

When a TCP connection is created, read a String that is expected to contain a list of container IDs.
If the string is empty (as was previously the case), all containers on the node are scanned.

Signed-off-by: Jeff Mesnil <jmesnil@redhat.com>
---
 extractor/src/bin/extractor_server.rs | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/extractor/src/bin/extractor_server.rs b/extractor/src/bin/extractor_server.rs
index c6f7c868..b3105e4b 100644
--- a/extractor/src/bin/extractor_server.rs
+++ b/extractor/src/bin/extractor_server.rs
@@ -1,7 +1,7 @@
 use clap::Parser;
-use log::{error, info, trace};
+use log::{error, info, trace, warn};
 use std::fs;
-use std::io::Write;
+use std::io::{Read, Write};
 use std::net::{Shutdown, TcpListener, TcpStream};
 use std::process::Command;
 use std::thread;
@@ -60,10 +60,22 @@ fn handle_trigger_extraction(mut stream: TcpStream, log_level: String) {
 
     let start = Instant::now();
 
+    // Read container IDs from TCP stream until EOF
+    let mut buffer = String::new();
+    if let Err(e) = stream.read_to_string(&mut buffer) {
+        warn!("Failed to read from socket; err = {:?}", e);
+        if let Err(e) = stream.shutdown(Shutdown::Both) {
+            error!("Failed to shutdown socket; err = {:?}", e);
+        }
+        return;
+    }
+    let container_ids = buffer.trim().to_string();
+
     // Execute the "extractor_coordinator" program
     let output = Command::new("/coordinator")
         .arg("--log-level")
         .arg(log_level)
+        .arg(container_ids)
         .output();
     match output {
         Ok(output) => {

From 7d0c1c3233b66f45999ba8fe5970580e1dc00169 Mon Sep 17 00:00:00 2001
From: Jeff Mesnil <jmesnil@redhat.com>
Date: Fri, 30 Jan 2026 15:24:16 +0100
Subject: [PATCH 5/6] feat: Add POST endpoint to filter containers by ID

Add POST /gather_runtime_info endpoint that accepts a JSON body with
containerIds array to scan specific containers. The container IDs are
sent as comma-separated values over TCP to the extractor server.
Uses CloseWrite() to signal EOF for proper TCP stream handling.

Signed-off-by: Jeff Mesnil <jmesnil@redhat.com>
---
 exporter/cmd/exporter/main.go | 44 +++++++++++++++++++++++++++++++----
 1 file changed, 39 insertions(+), 5 deletions(-)

diff --git a/exporter/cmd/exporter/main.go b/exporter/cmd/exporter/main.go
index 2aef5f8a..5f6b0d6f 100644
--- a/exporter/cmd/exporter/main.go
+++ b/exporter/cmd/exporter/main.go
@@ -22,10 +22,32 @@ const (
 	EXTRACTOR_ADDRESS string = "127.0.0.1:3000"
 )
 
+// GatherRuntimeInfoRequest represents the JSON body for POST requests
+type GatherRuntimeInfoRequest struct {
+	ContainerIds []string `json:"containerIds"`
+}
+
 // gatherRuntimeInfo will trigger a new extraction of runtime info
 // and reply with a JSON payload
 func gatherRuntimeInfo(w http.ResponseWriter, r *http.Request) {
-	if r.Method != "GET" {
+	var containerIds []string
+
+	switch r.Method {
+	case "GET":
+		// GET scans all containers (empty containerIds)
+		containerIds = []string{}
+	case "POST":
+		var req GatherRuntimeInfoRequest
+		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
+			http.Error(w, `{"error": "Invalid JSON in request body"}`, http.StatusBadRequest)
+			return
+		}
+		if req.ContainerIds == nil {
+			http.Error(w, `{"error": "containerIds field is required"}`, http.StatusBadRequest)
+			return
+		}
+		containerIds = req.ContainerIds
+	default:
 		http.Error(w, "Method is not supported.", http.StatusNotFound)
 		return
 	}
@@ -34,7 +56,8 @@ func gatherRuntimeInfo(w http.ResponseWriter, r *http.Request) {
 	hash := hashParam == "" || hashParam == "true"
 
 	startTime := time.Now()
-	dataPath, err := triggerRuntimeInfoExtraction()
+	dataPath, err := triggerRuntimeInfoExtraction(containerIds)
+
 	if err != nil {
 		http.Error(w, err.Error(), http.StatusInternalServerError)
 		return
@@ -63,7 +86,7 @@ func gatherRuntimeInfo(w http.ResponseWriter, r *http.Request) {
 	w.Write(response)
 }
 
-func triggerRuntimeInfoExtraction() (string, error) {
+func triggerRuntimeInfoExtraction(containerIds []string) (string, error) {
 	conn, err := net.Dial("tcp", EXTRACTOR_ADDRESS)
 	if err != nil {
 		return "", err
@@ -71,10 +94,21 @@ func triggerRuntimeInfoExtraction() (string, error) {
 	defer conn.Close()
 
 	log.Println("Requesting a new runtime extraction")
+	tcpConn, ok := conn.(*net.TCPConn)
+	if !ok {
+		return "", fmt.Errorf("failed to get TCP connection")
+	}
+
+	// Send comma-separated container IDs (empty string if no specific containers requested)
+	payload := strings.Join(containerIds, ",")
 	// write to TCP connection to trigger a runtime extraction
-	fmt.Fprintf(conn, "")
+	fmt.Fprintf(tcpConn, "%s", payload)
+	// and close the write side to signal EOF to the server
+	if err := tcpConn.CloseWrite(); err != nil {
+		return "", fmt.Errorf("failed to close write side: %w", err)
+	}
 
-	dataPath, err := bufio.NewReader(conn).ReadString('\n')
+	dataPath, err := bufio.NewReader(tcpConn).ReadString('\n')
 	if err != nil {
 		return "", err
 	}

From e18320a7988c5a0d7238af8b7ad2d8878f3ec725 Mon Sep 17 00:00:00 2001
From: Jeff Mesnil <jmesnil@redhat.com>
Date: Fri, 30 Jan 2026 15:34:39 +0100
Subject: [PATCH 6/6] chore: update PRD for performance improvement

Signed-off-by: Jeff Mesnil <jmesnil@redhat.com>
---
 docs/prd/prd-0001-container-ids-filtering.md | 303 +++++++------------
 1 file changed, 117 insertions(+), 186 deletions(-)

diff --git a/docs/prd/prd-0001-container-ids-filtering.md b/docs/prd/prd-0001-container-ids-filtering.md
index 076ebc32..c2e2bbe7 100644
--- a/docs/prd/prd-0001-container-ids-filtering.md
+++ b/docs/prd/prd-0001-container-ids-filtering.md
@@ -2,7 +2,7 @@
 
 ## Overview
 
-This document describes the changes required to support targeted container scanning via a list of container IDs. Currently, the system either scans all running containers or a single container. This enhancement will allow clients to request scanning of a specific subset of containers by providing their IDs in a POST request.
+This document describes the changes implemented to support targeted container scanning via a list of container IDs. Previously, the system only scanned all running containers. This enhancement allows clients to request scanning of a specific subset of containers by providing their IDs in a POST request.
 
 ## Motivation
 
@@ -10,35 +10,35 @@ This document describes the changes required to support targeted container scann
 
 ---
 
-## Current Architecture
+## Architecture
 
 ### Request Flow
 ```
 HTTP Client → Exporter (Go) → Extractor Server (Rust) → Coordinator (Rust) → Fingerprints
-             GET /gather_runtime_info    TCP 3000          /coordinator
+             GET/POST /gather_runtime_info    TCP 3000          /coordinator
 ```
 
-### Current Capabilities
-- **Exporter**: Only supports `GET /gather_runtime_info?hash=true|false`
-- **Extractor Server**: Receives empty TCP payload, spawns coordinator
-- **Coordinator**: Accepts optional single `--container-id <id>` argument (not used by the extractor server)
+### Capabilities
+- **Exporter**: Supports `GET /gather_runtime_info?hash=true|false` (scans all) and `POST /gather_runtime_info` (scans specific containers)
+- **Extractor Server**: Receives comma-separated container IDs via TCP, spawns coordinator with IDs as argument
+- **Coordinator**: Accepts optional `container_ids` positional argument (comma-separated list)
 
 ---
 
-## Proposed Changes
+## Implemented Changes
 
 ### 1. Exporter HTTP Server (Go)
 
 **File**: `exporter/cmd/exporter/main.go`
 
-#### New Endpoint
-Add a new HTTP POST endpoint alongside the existing GET endpoint:
+#### Endpoints
 
 | Method | Path | Request Body | Description |
 |--------|------|--------------|-------------|
+| GET | `/gather_runtime_info` | None | Scan all containers |
 | POST | `/gather_runtime_info` | JSON object with container IDs | Scan specific containers |
 
-#### Request Format
+#### Request Format (POST)
 ```json
 {
   "containerIds": ["abc123...", "def456...", "ghi789..."]
@@ -49,38 +49,35 @@ Add a new HTTP POST endpoint alongside the existing GET endpoint:
 - Request body must be valid JSON
 - `containerIds` field must be present and be an array
 - Array can be empty (equivalent to scanning all containers)
-- Each container ID must be a non-empty string
-- Container IDs should be the short or full CRI-O container ID format
+- Container IDs must be in full CRI-O format, with or without `cri-o://` prefix
 
-#### Changes Required
+#### Implementation
 
-1. **Add JSON request struct**:
+1. **JSON request struct**:
    ```go
    type GatherRuntimeInfoRequest struct {
        ContainerIds []string `json:"containerIds"`
    }
    ```
 
-2. **Update HTTP handler**:
-   - Parse request method (GET vs POST)
-   - For POST: parse JSON body and validate
-   - Pass container IDs to `triggerRuntimeInfoExtraction()`
+2. **HTTP handler** (`gatherRuntimeInfo`):
+   - For GET: uses empty `containerIds` slice
+   - For POST: decodes JSON body and validates `containerIds` field is present
+   - Passes container IDs to `triggerRuntimeInfoExtraction()`
 
-3. **Update `triggerRuntimeInfoExtraction()` function**:
-   - Add parameter: `containerIds []string`
-   - Serialize container IDs as a single string of comma-separated container IDs before sending over TCP
-   - Send string payload instead of empty payload to extractor server
+3. **`triggerRuntimeInfoExtraction()` function**:
+   - Accepts `containerIds []string` parameter
+   - Joins container IDs with commas: `strings.Join(containerIds, ",")`
+   - Sends payload via TCP
+   - **Calls `tcpConn.CloseWrite()` to signal EOF** - this is critical because the Rust server uses `read_to_string()` which reads until EOF
 
-4. **TCP Protocol Change**:
-   - Current: Empty payload triggers extraction
-   - New: String payload with a list of comma-separated container IDs
-   ```
-   "abc123,def456,..."
-   ```
-   - Empty string means "scan all containers"
+#### TCP Protocol
+- Client sends comma-separated container IDs as plain text
+- Client calls `CloseWrite()` to signal end of data
+- Server responds with path to extracted data
 
 #### Backward Compatibility
-- GET endpoint behavior remains unchanged (scans all containers)
+- GET endpoint behavior unchanged (scans all containers)
 - GET internally sends an empty string `""` to maintain protocol consistency
 
 ---
@@ -89,39 +86,28 @@ Add a new HTTP POST endpoint alongside the existing GET endpoint:
 
 **File**: `extractor/src/bin/extractor_server.rs`
 
-#### Changes Required
-
-1. **Update `handle_trigger_extraction()` function**:
-   - Read incoming data from TCP stream (currently ignored) as a String
-   - Pas the incoming data to the CLI argument if not empty. Wrap them in a String.
-   
-2. **Update coordinator invocation**:
-   - Current:
-     ```rust
-     Command::new("/coordinator")
-         .arg("--log-level")
-         .arg(log_level)
-         .output()
-     ```
-   - New (when container IDs provided):
-     ```rust
-     Command::new("/coordinator")
-         .arg("--log-level")
-         .arg(log_level)
-         .arg(container_ids)
-         .output()
-     ```
-
-3. **Error Handling**:
-   - Invalid String: Return error response via TCP
-   - Log validation errors at warn level
+#### Implementation
+
+1. **`handle_trigger_extraction()` function**:
+   - Reads incoming data from TCP stream using `read_to_string()` (reads until EOF)
+   - Trims the received string
+   - Passes container IDs as positional argument to coordinator
+
+2. **Coordinator invocation**:
+   ```rust
+   Command::new("/coordinator")
+       .arg("--log-level")
+       .arg(log_level)
+       .arg(container_ids)  // comma-separated string, passed even if empty
+       .output()
+   ```
 
 #### TCP Protocol
 
 | Direction | Format | Example |
 |-----------|--------|---------|
-| Request | Plain text | `abc123,def456` |
-| Response | Plain text (path) | `data/out-1234567890` |
+| Request | Plain text (read until EOF) | `abc123,def456` |
+| Response | Plain text (path) | `data/out-1234567890\n` |
 
 ---
 
@@ -129,61 +115,50 @@ Add a new HTTP POST endpoint alongside the existing GET endpoint:
 
 **File**: `extractor/src/bin/coordinator.rs`
 
-#### Current Arguments
+#### CLI Arguments
 ```rust
 #[arg(short, long)]
 log_level: Option<String>
 
-#[arg(help = "ID of the container to scan. If absent, all containers are scanned")]
-container_id: Option<String>
+#[arg(help = "Comma-separated list of container IDs to scan. If absent, all containers are scanned")]
+container_ids: Option<String>
 ```
 
-#### New Arguments
-```rust
-#[arg(short, long)]
-log_level: Option<String>
+#### Implementation
+
+1. **Parsing container IDs**:
+   ```rust
+   let container_ids: Vec<String> = args
+       .container_ids
+       .map(|ids| {
+           ids.split(',')
+               .map(|id| id.trim().to_string())
+               .filter(|id| !id.is_empty())
+               .collect()
+       })
+       .unwrap_or_default();
+   ```
 
-#[arg(help = "Comma-separated list of container IDs to scan")]
-container_ids: Option<String>
-```
+2. **Container retrieval**:
+   - Calls `get_containers(container_ids)` passing the parsed vector
+   - Empty vector means "scan all containers"
 
-#### Changes Required
-
-1. **Add new CLI argument**:
-   - Change `container_id` to `container_ids` so that multiple containers ID can be passed (separated by commas)
-   - Parse them into `Vec<String>`
-
-2. **Update container filtering logic**:
-
-   - Current location in coordinator.rs `main()`:
-     ```rust
-      let containers = match args.container_id {
-        None => get_containers(),
-        Some(container_id) => match get_container(&container_id) {
-            Some(container) => vec![container],
-            None => vec![],
-        },
-    };
-     ```
-   - New logic:
-     ```rust
-      let containers = match args.container_ids {         
-        None => get_containers(),
-        Some(container_ids) => {
-         // split them from commas,
-         // iterate on them: for each trimmed non-empty container ID,
-         // call get_container(container_id)
-         // and collect them in a Vector
-        },
-      };
-
-     ```
-3. **Update container information retrieval**:
-
-   - Update `get_container(container_id: &String)` to invoke `crictl inspect $container_id` instead of `crictl ps`. 
-
-4. **Logging**:
-   - Log at info level: "Scanning X containers"
+3. **Logging**:
+   - Logs "Scanning X containers" at info level
+
+---
+
+### 4. Container Module (Rust)
+
+**File**: `extractor/src/insights_runtime_extractor/container.rs`
+
+#### Implementation
+
+1. **`get_containers(container_ids: Vec<String>)` function**:
+   - Normalizes container IDs by stripping `cri-o://` prefix if present
+   - Runs `crictl ps -o json -s RUNNING` to get all running containers
+   - If `container_ids` is not empty, filters to only include matching containers
+   - Returns `Vec<Container>`
 
 ---
 
@@ -202,17 +177,18 @@ container_ids: Option<String>
 │                        EXPORTER (Go) - Port 8000                            │
 │                                                                             │
 │  1. Parse POST body as JSON                                                 │
-│  2. Validate containerIds array                                             │
+│  2. Validate containerIds field is present                                  │
 │  3. Send comma-separated containerIds to extractor_server via TCP           │
+│  4. Call CloseWrite() to signal EOF                                         │
 └─────────────────────────────────┬───────────────────────────────────────────┘
                                   │
-                    TCP: abc123,def456
+                    TCP: "abc123,def456" + EOF
                                   │
                                   ▼
 ┌─────────────────────────────────────────────────────────────────────────────┐
 │                    EXTRACTOR_SERVER (Rust) - Port 3000                      │
 │                                                                             │
-│  1. Read String from TCP stream                                             │
+│  1. Read String from TCP stream until EOF (read_to_string)                  │
 │  2. Execute: /coordinator --log-level info "abc123,def456"                  │
 └─────────────────────────────────┬───────────────────────────────────────────┘
                                   │
@@ -222,8 +198,9 @@ container_ids: Option<String>
 ┌─────────────────────────────────────────────────────────────────────────────┐
 │                         COORDINATOR (Rust)                                  │
 │                                                                             │
-│  1. Parse executable argument                                               │
-│  2. Get all running containers via crictl                                   │
+│  1. Parse comma-separated container IDs from argument                       │
+│  2. Get all running containers via crictl ps                                │
+│  3. Filter to only requested containers (if any specified)                  │
 │  4. Scan filtered containers                                                │
 │  5. Output path to stdout                                                   │
 └─────────────────────────────────────────────────────────────────────────────┘
@@ -252,7 +229,7 @@ Content-Type: application/json
 **Field Descriptions**:
 | Field | Type | Required | Description |
 |-------|------|----------|-------------|
-| containerIds | string[] | Yes | List of container IDs to scan. Empty array scans all containers. |
+| containerIds | string[] | Yes | List of container IDs to scan. Empty array scans all containers. Supports full CRI-O IDs, with or without `cri-o://` prefix. |
 
 #### Response
 
@@ -260,19 +237,17 @@ Same as existing GET endpoint response:
 
 ```json
 {
-  "node-name": {
-    "namespace-1": {
-      "pod-name-1": {
-        "container-name-1": {
-          "os": "rhel",
-          "osVersion": "8.9",
-          "kind": "Java",
-          "kindVersion": "17.0.9",
-          "kindImplementer": "Red Hat, Inc.",
-          "runtimes": [
-            {"name": "Quarkus", "version": "3.2.0"}
-          ]
-        }
+  "namespace-1": {
+    "pod-name-1": {
+      "container-id-1": {
+        "os": "rhel",
+        "osVersion": "8.9",
+        "kind": "Java",
+        "kindVersion": "17.0.9",
+        "kindImplementer": "Red Hat, Inc.",
+        "runtimes": [
+          {"name": "Quarkus", "version": "3.2.0"}
+        ]
       }
     }
   }
@@ -285,30 +260,10 @@ Same as existing GET endpoint response:
 |--------|-----------|---------------|
 | 400 | Invalid JSON | `{"error": "Invalid JSON in request body"}` |
 | 400 | Missing containerIds | `{"error": "containerIds field is required"}` |
-| 400 | Invalid container ID format | `{"error": "Container ID at index N is invalid"}` |
-| 500 | Extraction failed | `{"error": "Runtime extraction failed: <details>"}` |
+| 500 | Extraction failed | Error message from extraction process |
 
 ---
 
-## Testing Requirements
-
-### Unit Tests
-
-1. **Exporter**:
-   - JSON parsing of valid request body
-   - Validation of container ID formats
-   - Empty array handling
-   - Malformed JSON rejection
-
-2. **Extractor Server**:
-   - JSON deserialization from TCP stream
-   - Container ID validation
-   - Command argument construction
-
-3. **Coordinator**:
-   - Comma-separated parsing of the main argument
-   - Container filtering logic
-
 ### Integration Tests
 
 1. **End-to-end with specific containers**:
@@ -325,7 +280,7 @@ Same as existing GET endpoint response:
 
 4. **Mixed valid/invalid IDs**:
    - Some IDs exist, some don't
-   - Should scan existing ones, log warnings for missing
+   - Should scan existing ones only
 
 ### E2E Tests
 
@@ -335,39 +290,13 @@ Add test cases to existing e2e test suite in `runtime-samples/`:
 
 ---
 
-## Implementation Phases
-
-### Phase 1: Coordinator Changes
-1. Change main argument `container_ids` to `container_id`
-2. Implement container filtering logic
-3. Add unit tests
-4. Manual testing with direct coordinator invocation
-
-### Phase 2: Extractor Server Changes
-1. Update coordinator invocation
-2. Add error handling
-3. Add unit tests
-
-### Phase 3: Exporter Changes
-1. Add POST endpoint handler
-2. Implement JSON validation
-3. Update TCP communication
-4. Add unit tests
-
-### Phase 4: Integration & E2E
-1. Integration tests
-2. E2E tests in runtime-samples
-3. Documentation updates
-
----
-
 ## Backward Compatibility
 
 | Component | Backward Compatible | Notes |
 |-----------|---------------------|-------|
 | Exporter | Yes | GET endpoint unchanged |
-| Extractor Server | Yes | Empty payload treated as empty containerIds |
-| Coordinator | Yes | passing a singular container ID still works |
+| Extractor Server | Yes | Empty string treated as "scan all" |
+| Coordinator | Yes | Empty/missing argument scans all containers |
 
 ---
 
@@ -379,23 +308,25 @@ Add test cases to existing e2e test suite in `runtime-samples/`:
 
 ---
 
-## Open Questions
+## Design Decisions
+
+1. **Container ID format**: Full (64-char) container IDs are supported. The `cri-o://` prefix is automatically stripped if present.
+
+2. **Maximum number of container IDs**: No hard limit imposed.
 
-1. **Container ID format**: Should we support both short (12-char) and full (64-char) container IDs?
-   - Recommendation: Yes, support both with prefix matching
+3. **Response for non-existent IDs**: Missing containers are silently skipped. Results are returned for found containers only.
 
-2. **Maximum number of container IDs**: Should there be a limit?
-   - Recommendation: No hard limit, but document performance implications
+4. **TCP EOF signaling**: The Go exporter uses `CloseWrite()` to signal EOF because the Rust server uses `read_to_string()` which reads until EOF.
 
-3. **Response for non-existent IDs**: Should we return an error or just skip missing containers?
-   - Recommendation: Skip missing, return results for found containers, log warnings
+5. **Filtering approach**: Instead of calling `crictl inspect` for each container ID, the implementation fetches all running containers with `crictl ps` and filters in memory. This is simpler and avoids multiple subprocess calls.
 
 ---
 
-## Files to Modify
+## Files Modified
 
 | File | Changes |
 |------|---------|
-| `exporter/cmd/exporter/main.go` | Add POST handler, JSON parsing, update TCP payload |
-| `extractor/src/bin/extractor_server.rs` | Add JSON deserialization, update coordinator args |
-| `extractor/src/bin/coordinator.rs` | Change main argument, filtering logic |
+| `exporter/cmd/exporter/main.go` | Add POST handler, JSON parsing, TCP EOF signaling with CloseWrite() |
+| `extractor/src/bin/extractor_server.rs` | Read container IDs from TCP stream, pass to coordinator |
+| `extractor/src/bin/coordinator.rs` | Accept container_ids positional argument, parse and pass to get_containers |
+| `extractor/src/insights_runtime_extractor/container.rs` | Update get_containers to accept Vec<String> and filter containers |