Conversation
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…andards-risbfy Modernizing the project to 2026 standards
Added Go Reference badge and updated project description.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9bbfff38bb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| cache: true | ||
|
|
||
| - name: Verify module graph | ||
| if: matrix.go-version == '1.23.x' |
There was a problem hiding this comment.
Align CI conditionals with Go version matrix
In .github/workflows/ci.yml, the job matrix only runs 1.24.x and 1.25.x, but this step (and the Staticcheck step below) is guarded by if: matrix.go-version == '1.23.x', which can never match. As a result, module graph verification and static analysis are silently skipped on every CI run, so dependency drift and staticcheck-detectable regressions can merge without detection.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
This PR modernizes web-octopus for a module-based Go workflow, adds CI/CD automation via GitHub Actions, expands documentation, and increases unit test coverage around core pipeline behaviors and adapters.
Changes:
- Add Go modules (
go.mod,go.sum) and introduce unit tests for core setup, pipeline helpers, HTML parsing, and adapters. - Replace legacy Travis CI with GitHub Actions CI + publish workflows, plus add release/version documentation (
VERSION,CHANGELOG.md, README updates). - Improve safety/robustness in a few pipeline and adapter paths (e.g., remove
log.Fatal, add nil-guards, simplify channel reads).
Reviewed changes
Copilot reviewed 21 out of 23 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
octopus/setup_test.go |
Adds tests for rate-limit validation and invalid TimeToQuit setup behavior. |
octopus/setup.go |
Changes invalid TimeToQuit handling from process exit to panic; includes rate-limit validation helper. |
octopus/pipes_test.go |
Adds tests for link absolution, duplicate filtering, and timeout propagation. |
octopus/pipe_spl_ingest.go |
Refactors ingest channel connector to range over channel. |
octopus/pipe_spl_distributor.go |
Cleans up distributor pipe signature and quit handling logic. |
octopus/pipe_process_htmlparsing.go |
Simplifies import style and adds nil-guard in HTML parsing stage. |
octopus/pipe_ctrl_ratelimit.go |
Simplifies import formatting. |
octopus/pipe_augment_linkabsolution.go |
Removes log.Fatal on nil input and simplifies import formatting. |
octopus/models.go |
Removes unused fields and reflows CrawlOptions doc comments. |
octopus/modelfactory_test.go |
Adds tests for default options and default channel set factory. |
octopus/htmlparse_test.go |
Adds unit test ensuring anchor links are extracted from HTML bodies. |
octopus/doc.go |
Converts package docs to Markdown-style headings and re-indents lists. |
go.sum |
Adds module checksums for golang.org/x/net and golang.org/x/time. |
go.mod |
Introduces Go module definition and required dependencies. |
adapter/doc.go |
Fixes doc comment formatting/indentation. |
adapter/basicadapters_test.go |
Adds unit test for FileWriterAdapter writing output to disk. |
adapter/basicadapters.go |
Improves file adapter error handling and file open flags; adjusts doc formatting. |
VERSION |
Adds project version 1.3.0. |
README.md |
Large documentation refresh: installation, architecture, testing, CI/CD, release flow. |
CHANGELOG.md |
Adds changelog entries for modernization and CI/CD changes. |
.travis.yml |
Removes Travis CI configuration. |
.github/workflows/publish.yml |
Adds release-driven publish workflow (proxy/pkg.go.dev refresh). |
.github/workflows/ci.yml |
Adds CI workflow for formatting, vet, tests, race, and optional module/static checks. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| - name: Verify release tag is on main | ||
| run: | | ||
| set -euo pipefail | ||
| git fetch origin main:refs/remotes/origin/main | ||
| MAIN_SHA=$(git rev-parse origin/main) | ||
| if ! test "${GITHUB_SHA}" = "${MAIN_SHA}"; then | ||
| echo "Release tag is not at main HEAD; refusing to publish" | ||
| echo " tag: ${{ github.event.release.tag_name }}" | ||
| echo " tag sha: ${GITHUB_SHA}" | ||
| echo " main head: ${MAIN_SHA}" |
There was a problem hiding this comment.
This publish check hard-codes main. Since the repo still references master elsewhere (and CI runs on both), this can fail on repos where the default branch is master. Consider using ${{ github.event.repository.default_branch }} (or checking both branches) when fetching/comparing the default branch head.
| - name: Verify release tag is on main | |
| run: | | |
| set -euo pipefail | |
| git fetch origin main:refs/remotes/origin/main | |
| MAIN_SHA=$(git rev-parse origin/main) | |
| if ! test "${GITHUB_SHA}" = "${MAIN_SHA}"; then | |
| echo "Release tag is not at main HEAD; refusing to publish" | |
| echo " tag: ${{ github.event.release.tag_name }}" | |
| echo " tag sha: ${GITHUB_SHA}" | |
| echo " main head: ${MAIN_SHA}" | |
| - name: Verify release tag is on default branch | |
| env: | |
| DEFAULT_BRANCH: ${{ github.event.repository.default_branch }} | |
| run: | | |
| set -euo pipefail | |
| git fetch origin "${DEFAULT_BRANCH}:refs/remotes/origin/${DEFAULT_BRANCH}" | |
| DEFAULT_BRANCH_SHA=$(git rev-parse "origin/${DEFAULT_BRANCH}") | |
| if ! test "${GITHUB_SHA}" = "${DEFAULT_BRANCH_SHA}"; then | |
| echo "Release tag is not at default branch HEAD; refusing to publish" | |
| echo " tag: ${{ github.event.release.tag_name }}" | |
| echo " tag sha: ${GITHUB_SHA}" | |
| echo " default branch: ${DEFAULT_BRANCH}" | |
| echo " default branch head: ${DEFAULT_BRANCH_SHA}" |
| for node := range inChSet.NodeCh { | ||
| opChSet.NodeCh <- node |
There was a problem hiding this comment.
channelConnector now ranges on inChSet.NodeCh, but inChSet.NodeCh is never closed anywhere in the codebase, so this goroutine has no shutdown path and will leak. Consider selecting on inChSet.QuitCh/masterQuitCh as well, or explicitly closing inChSet.NodeCh as part of quit propagation.
| for node := range inChSet.NodeCh { | |
| opChSet.NodeCh <- node | |
| for { | |
| select { | |
| case node, ok := <-inChSet.NodeCh: | |
| if !ok { | |
| return | |
| } | |
| opChSet.NodeCh <- node | |
| case <-inChSet.QuitCh: | |
| return | |
| } |
| case <-listenQuitCh: | ||
| { | ||
| for _, outChSet := range outChSetList { | ||
| if outChSet != nil { | ||
| outChSet.QuitCh <- 1 | ||
| } | ||
| for _, outChSet := range outChSetList { | ||
| if outChSet != nil { | ||
| outChSet.QuitCh <- 1 | ||
| } | ||
| } | ||
| return | ||
| } |
There was a problem hiding this comment.
After receiving listenQuitCh, distribute now returns immediately. Any in-flight producers that still try to send to listenCh (e.g., goroutines spawned by upstream stages) will block forever because there is no longer a receiver, potentially deadlocking shutdown. Consider continuing to drain listenCh after quit (discarding nodes) until upstream stops, or otherwise coordinating closure/cancellation so senders can exit safely.
|
|
||
| 1. `adapter.StdOpAdapter` : Writes the crawled output (only links, not body) to the standard output. | ||
| 1. `adapter.FileWriterAdapter` : Writes the crawled output (only links, not body) to a supplied file. | ||
| - **CI workflow** (`.github/workflows/ci.yml`) runs automatically on PR open/sync/reopen and on pushes to the default branch. It validates module tidiness, formatting, vet/staticcheck, and test suites (including race detection). |
There was a problem hiding this comment.
The README claims the CI workflow validates module tidiness and runs staticcheck, but the current .github/workflows/ci.yml gates those steps on a Go version that isn’t in the matrix, so they won’t actually run. Either update the workflow to run those checks or adjust this documentation to match reality.
| - **CI workflow** (`.github/workflows/ci.yml`) runs automatically on PR open/sync/reopen and on pushes to the default branch. It validates module tidiness, formatting, vet/staticcheck, and test suites (including race detection). | |
| - **CI workflow** (`.github/workflows/ci.yml`) runs automatically on PR open/sync/reopen and on pushes to the default branch. It runs formatting, vet, and test suites (including race detection). |
|
|
||
| ### Changed | ||
| - Removed legacy Travis CI configuration in favor of GitHub Actions. | ||
| - Hardened CI reliability by adding workflow concurrency cancellation, running module/static checks on the primary Go version, and pinning staticcheck. |
There was a problem hiding this comment.
Changelog states CI runs module/static checks on the primary Go version and pins staticcheck, but the current CI workflow conditions prevent the module/staticcheck steps from running at all. Once CI is fixed, this will be accurate; otherwise consider adjusting this entry to avoid overstating CI coverage.
| - Hardened CI reliability by adding workflow concurrency cancellation, running module/static checks on the primary Go version, and pinning staticcheck. | |
| - Hardened CI reliability by adding workflow concurrency cancellation and initial configuration for module/static checks on the primary Go version with a pinned staticcheck version. |
| - name: Verify module graph | ||
| if: matrix.go-version == '1.23.x' | ||
| run: | | ||
| go mod tidy | ||
| git diff --exit-code -- go.mod go.sum | ||
|
|
There was a problem hiding this comment.
In this workflow matrix the Go versions are 1.24.x/1.25.x, but this step is gated on matrix.go-version == '1.23.x', so go mod tidy/diff never runs. Update the condition to match one of the matrix versions (or remove the condition) so module tidiness is actually validated.
| - name: Staticcheck | ||
| if: matrix.go-version == '1.23.x' | ||
| run: | | ||
| go install honnef.co/go/tools/cmd/staticcheck@v0.6.1 | ||
| TOOLBIN="$(go env GOBIN)" | ||
| if [ -z "$TOOLBIN" ]; then | ||
| TOOLBIN="$(go env GOPATH)/bin" | ||
| fi | ||
| "$TOOLBIN/staticcheck" ./... | ||
|
|
There was a problem hiding this comment.
Staticcheck is also gated on matrix.go-version == '1.23.x', which is not present in the matrix. As written, staticcheck will never execute in CI; align the condition with the matrix (or run it unconditionally).
No description provided.