Merged
55 changes: 55 additions & 0 deletions README.md
@@ -327,6 +327,61 @@ that ran and statistics about them (e.g. pass/fail, duration). These XML files w
used by external applications to present metrics and data for others to see into. An example of
this is they are used to present data in [TestGrid Dashboards][TestGrid Dashboard].

## Slack Notifications

OSDe2e can send AI-powered failure analysis to Slack when tests fail. Each test suite can notify a different Slack channel with failure details, analysis, and logs.

### Setup

**1. Add Workflow to Your Slack Channel**

Each team adds the shared E2E Test Notifications workflow to their channel:

1. Open the workflow: https://slack.com/shortcuts/Ft09RL7M2AMV/60f07b46919da20d103806a8f5bba094
2. Click **Add to Slack**
3. Select your destination channel
4. Copy the webhook URL (starts with `https://hooks.slack.com/workflows/...`)

**2. Get Your Channel ID**

Right-click your channel → **View channel details** → copy the channel ID (starts with `C`, e.g., `C06HQR8HN0L`)

**3. Configure Test Suites**

Set `TEST_SUITES_YAML` with your test images, webhook URLs, and Slack channel IDs:

```bash
export TEST_SUITES_YAML='
- image: quay.io/openshift/osde2e-tests:latest
  slackWebhook: https://hooks.slack.com/workflows/T.../A.../...
  slackChannel: C06HQR8HN0L
- image: quay.io/openshift/custom-tests:v1.0
  slackWebhook: https://hooks.slack.com/workflows/T.../B.../...
  slackChannel: C07ABC123XY
'
```

**4. Enable Notifications**

Enable Slack notifications in your config:

```yaml
tests:
  enableSlackNotify: true
logAnalysis:
  enableAnalysis: true
```

### What You'll Receive

When tests fail, you'll get a threaded Slack message with:
1. **Main message**: Test suite info (what failed)
2. **Reply 1**: AI analysis (why it failed)
3. **Reply 2**: Test failure logs (evidence)
4. **Reply 3**: Cluster details (for debugging)

For implementation details, see [internal/reporter/README.md](internal/reporter/README.md).

## CI Jobs

Periodic jobs are run daily validating Managed OpenShift clusters, using
4 changes: 2 additions & 2 deletions internal/aggregator/aggregator.go
@@ -13,6 +13,7 @@ import (
 	"github.com/go-logr/logr"
 	"github.com/joshdk/go-junit"
 	"github.com/openshift/osde2e/internal/sanitizer"
+	"github.com/openshift/osde2e/pkg/common/util"
)

type Aggregator struct {
@@ -277,8 +278,7 @@ func extractErrorsFromLogFile(logFile string) (string, error) {
 	// use string builder to collect errors
 	var errors strings.Builder
 	for _, line := range lines {
-		// Check all case variants directly - fastest approach
-		if strings.Contains(line, "error") || strings.Contains(line, "Error") || strings.Contains(line, "ERROR") {
+		if util.ContainsErrorMarker(line) {
 			errors.WriteString(line)
 			errors.WriteString("\n") // Add newline separator
 		}
126 changes: 124 additions & 2 deletions internal/reporter/README.md
@@ -1,6 +1,8 @@
-# Reporter System
+# Reporter System (Developer Documentation)

-The reporter system handles notification delivery after LLM analysis completion, providing a flexible and extensible way to send analysis results to external systems.
+The reporter system handles notification delivery after LLM analysis completion. This document covers the internal architecture and implementation details for developers working on the reporter system.

**For user setup instructions, see the [root README](../../README.md#slack-notifications).**

## Architecture Overview

@@ -69,3 +71,123 @@ if len(reporters) > 0 {
}
}
```

## Slack Workflow Integration

The Slack reporter sends test failure notifications using a **Slack Workflow** that creates threaded messages. This allows teams to add the shared workflow to their channels and receive structured failure notifications.

### How It Works

The workflow creates four messages in a thread:

1. **Initial Message** - Test suite information (what failed)
2. **First Reply** - AI-powered analysis with root cause and recommendations (briefly why)
3. **Second Reply** - Extracted test failure logs (evidence - only failure blocks, not full stdout)
4. **Third Reply** - Cluster information for debugging (least important - cluster is ephemeral)

**Note:** When data for a message is unavailable, the code sends a fallback message instead (e.g., "Test output logs not available"). This keeps the workflow resilient to version drift between the code and the workflow definition.

### Message Format

**Summary (Initial Message - What Failed):**
```
:failed: Pipeline Failed at E2E Test

====== 🧪 Test Suite Information ======
• Image: `quay.io/openshift/osde2e-tests`
• Commit: `abc123`
• Environment: `stage`
```

**Analysis (First Reply - Briefly Why):**
```
====== 🔍 Possible Cause ======
<AI-generated root cause analysis>

====== 💡 Recommendations ======
1. <recommendation 1>
2. <recommendation 2>
```

**Extended Logs (Second Reply - Evidence):**
```
Found 3 test failure(s):

[FAILED] test description
<failure context lines>
...
```

**Cluster Details (Third Reply - For Debugging):**
```
====== ☸️ Cluster Information ======
• Cluster ID: `abc-123`
• Name: `my-cluster`
• Version: `4.20`
• Provider: `aws`
• Expiration: `2026-01-28T10:00:00Z`
```

### Testing

#### Unit Tests
```bash
# Run all reporter tests
go test -v github.com/openshift/osde2e/internal/reporter

# Run specific workflow tests
go test -v -run TestSlackReporter_buildWorkflowPayload github.com/openshift/osde2e/internal/reporter
go test -v -run TestSlackReporter_extractFailureBlocks github.com/openshift/osde2e/internal/reporter
```

#### Integration Test (with real Slack)
```bash
# Set environment variables
export LOG_ANALYSIS_SLACK_WEBHOOK="https://hooks.slack.com/workflows/..."
export LOG_ANALYSIS_SLACK_CHANNEL="C06HQR8HN0L"

# Run integration test
go test -v -run TestSlackReporter_Integration github.com/openshift/osde2e/pkg/e2e
```

**Note:** The integration test is skipped automatically if these environment variables are not set.

### Workflow Payload Structure

The reporter sends this JSON payload to the Slack Workflow:

```json
{
  "channel": "C06HQR8HN0L",
  "summary": "Pipeline Failed at E2E Test\n\n# Test Suite Info...",
  "analysis": "# Possible Cause\n...",
  "extended_logs": "Found 3 test failure(s):\n...",
  "cluster_details": "# Cluster Information\nCluster ID: abc-123\n...",
  "image": "quay.io/openshift/osde2e:abc123",
  "env": "stage",
  "commit": "abc123"
}
```
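
As a rough illustration of how such a payload could be built and posted, here is a minimal sketch. The names `workflowPayload`, `encodePayload`, and `postPayload` are hypothetical and are not taken from the reporter code; only the JSON keys come from the example above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// workflowPayload mirrors the JSON keys shown above. The struct itself is a
// hypothetical illustration, not the reporter's actual type.
type workflowPayload struct {
	Channel        string `json:"channel"`
	Summary        string `json:"summary"`
	Analysis       string `json:"analysis"`
	ExtendedLogs   string `json:"extended_logs"`
	ClusterDetails string `json:"cluster_details"`
	Image          string `json:"image"`
	Env            string `json:"env"`
	Commit         string `json:"commit"`
}

// encodePayload serializes the payload to JSON. Empty fields are kept so the
// workflow always receives every key.
func encodePayload(p workflowPayload) ([]byte, error) {
	return json.Marshal(p)
}

// postPayload sends the encoded payload to the webhook URL and treats any
// non-2xx response as an error.
func postPayload(client *http.Client, webhookURL string, body []byte) error {
	resp, err := client.Post(webhookURL, "application/json", bytes.NewReader(body))
	if err != nil {
		return fmt.Errorf("posting to workflow webhook: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("workflow webhook returned status %s", resp.Status)
	}
	return nil
}

func main() {
	body, err := encodePayload(workflowPayload{
		Channel:  "C06HQR8HN0L",
		Summary:  "Pipeline Failed at E2E Test",
		Analysis: "# Possible Cause\n...",
		Env:      "stage",
	})
	if err != nil {
		panic(err)
	}
	// Printing only; pass body to postPayload with a real webhook URL to deliver it.
	fmt.Println(string(body))
}
```

Keeping encoding separate from delivery makes the payload shape easy to check in a unit test without contacting Slack.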

## Implementation Notes

**Workflow vs Legacy Webhooks:**
- Workflow webhooks use `/workflows/` in the URL path
- Legacy incoming webhooks use `/services/` instead
- The code uses workflow webhooks to support threaded messages
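
The path-based distinction above can be checked before a webhook is used. The following is a minimal sketch; `webhookKind` is a hypothetical helper, not a function from the reporter package, and the host check is an added assumption based on the URLs shown in this document.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// webhookKind classifies a Slack webhook URL as "workflow" or "legacy" based
// on its path prefix. Hypothetical helper for illustration only.
func webhookKind(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", fmt.Errorf("parsing webhook URL: %w", err)
	}
	// Assumption: all Slack webhooks are served over HTTPS from hooks.slack.com.
	if u.Scheme != "https" || u.Host != "hooks.slack.com" {
		return "", fmt.Errorf("unexpected webhook location %q", u.Scheme+"://"+u.Host)
	}
	switch {
	case strings.HasPrefix(u.Path, "/workflows/"):
		return "workflow", nil
	case strings.HasPrefix(u.Path, "/services/"):
		return "legacy", nil
	default:
		return "", fmt.Errorf("unrecognized webhook path %q", u.Path)
	}
}

func main() {
	// Placeholder URLs; the path segments are not real identifiers.
	for _, raw := range []string{
		"https://hooks.slack.com/workflows/T000/A000/123/abc",
		"https://hooks.slack.com/services/T000/B000/abc",
	} {
		kind, err := webhookKind(raw)
		fmt.Println(kind, err)
	}
}
```

A check like this lets a misconfigured legacy URL fail early with a clear message instead of producing an unthreaded or rejected post.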

**Payload Limits:**
- Maximum field length: 30 KB per field (enforced by the `maxWorkflowFieldLength` constant)
- Content exceeding limits is truncated with a notice
- Slack workflows handle much larger payloads than legacy webhooks

**Fallback Behavior:**
- All optional fields provide fallback messages when data is unavailable
- This ensures resilience to version drift between code and workflow changes
- Required fields: `channel`, `summary`, `analysis`
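
One way to express this split between required and optional fields is sketched below. Both helpers, `orFallback` and `checkRequired`, are hypothetical. The fallback text "Test output logs not available" is quoted from this document; everything else is an illustrative assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// orFallback returns value unless it is empty or whitespace-only, in which
// case it returns the fallback text. Hypothetical helper.
func orFallback(value, fallback string) string {
	if strings.TrimSpace(value) == "" {
		return fallback
	}
	return value
}

// checkRequired reports which of the required fields are empty.
// Hypothetical helper; the field list follows the bullet above.
func checkRequired(channel, summary, analysis string) error {
	fields := []struct{ name, value string }{
		{"channel", channel},
		{"summary", summary},
		{"analysis", analysis},
	}
	var missing []string
	for _, f := range fields {
		if strings.TrimSpace(f.value) == "" {
			missing = append(missing, f.name)
		}
	}
	if len(missing) > 0 {
		return fmt.Errorf("missing required fields: %s", strings.Join(missing, ", "))
	}
	return nil
}

func main() {
	// Optional field with no data falls back to a placeholder message.
	fmt.Println(orFallback("", "Test output logs not available"))
	// Required fields produce an error rather than a placeholder.
	fmt.Println(checkRequired("C06HQR8HN0L", "", "# Possible Cause"))
}
```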

**Log Extraction Strategy:**
- For logs ≤250 lines: return full content
- For logs >250 lines: extract up to 3 failure blocks (max 30 lines each)
- Failure detection: `[FAILED]` markers and `ERROR`/`Error`/`error` strings
- Block deduplication: skip-ahead logic prevents overlapping extractions
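
The strategy above can be sketched as follows. This is an illustration under stated assumptions, not the reporter's `extractFailureBlocks` implementation: it assumes each block starts at the matching line and runs forward, and the constant names and the `...` separator between blocks are made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Thresholds taken from the bullets above; the constant names are hypothetical.
const (
	fullLogThreshold = 250
	maxBlocks        = 3
	maxBlockLines    = 30
)

// isFailureLine reports whether a line carries one of the failure markers.
func isFailureLine(line string) bool {
	for _, marker := range []string{"[FAILED]", "ERROR", "Error", "error"} {
		if strings.Contains(line, marker) {
			return true
		}
	}
	return false
}

// extractBlocks returns short logs unchanged and, for longer logs, up to
// maxBlocks blocks of at most maxBlockLines lines each. It returns an empty
// string when a long log contains no failure markers.
func extractBlocks(log string) string {
	lines := strings.Split(log, "\n")
	if len(lines) <= fullLogThreshold {
		return log
	}
	var blocks []string
	for i := 0; i < len(lines) && len(blocks) < maxBlocks; i++ {
		if !isFailureLine(lines[i]) {
			continue
		}
		end := i + maxBlockLines
		if end > len(lines) {
			end = len(lines)
		}
		blocks = append(blocks, strings.Join(lines[i:end], "\n"))
		i = end - 1 // skip ahead so the next block cannot overlap this one
	}
	if len(blocks) == 0 {
		return ""
	}
	return fmt.Sprintf("Found %d test failure(s):\n\n%s",
		len(blocks), strings.Join(blocks, "\n...\n"))
}

func main() {
	var sb strings.Builder
	for i := 0; i < 300; i++ {
		if i%90 == 10 {
			sb.WriteString("[FAILED] example test\n")
		} else {
			sb.WriteString("ok\n")
		}
	}
	fmt.Println(extractBlocks(sb.String()))
}
```

Because of the skip-ahead step, a second marker that falls inside an already extracted block is reported as part of that block rather than starting a new one.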