Skip to content

Replaced cheerio with lightweight html-utils#26819

Open
kevinansfield wants to merge 1 commit intomainfrom
no-cheerio
Open

Replaced cheerio with lightweight html-utils#26819
kevinansfield wants to merge 1 commit intomainfrom
no-cheerio

Conversation

@kevinansfield
Copy link
Member

Summary

  • Replaced the direct cheerio (0.22) dependency with a thin html-utils utility built on htmlparser2, css-select, domutils, and dom-serializer — the same underlying packages cheerio wraps
  • The cheerio 0.22 dependency causes yarn hoisting conflicts with the cheerio ^1.0 that juice requires, leading to version resolution issues across the monorepo
  • Companion to Replaced cheerio with htmlparser2+domutils in url-utils SDK#821 which does the same for @tryghost/url-utils

Changes

  • New: ghost/core/core/server/lib/html-utils.js — lightweight HTML manipulation utility providing load() and select() functions with a jQuery-like API (attr, remove, addClass, each, before, text, html, etc.)
  • 6 production files updated to use html-utils instead of cheerio: email-renderer, mention-sending-service, mention-discovery-service, mention.js, oembed-service, generate-feed
  • 14 test files updated to use html-utils
  • package.json: Removed cheerio: 0.22.0, added css-select, dom-serializer, domhandler, domutils, htmlparser2
  • Snapshot updates: Entity encoding changed from hex format (cheerio 0.22 artifact) to the original format used in source code — e.g.   , ­­. Same characters, different (now correct) encoding. Additionally, juice now correctly resolves to cheerio 1.0 (its declared dep) instead of accidentally picking up 0.22 via hoisting.

Benchmark results

Tested cheerio 0.22 vs html-utils across all Ghost usage patterns:

Scenario                          Speedup
OEmbed link discovery              2.3x
Extract links (mentions)           5.8x
Mention verification               6.8x
Segment extraction                 5.9x
Full email post-processing         2.1x
Segment filtering + serialize      2.4x
RSS feed generation               18.8x

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 14, 2026

Walkthrough

A new HTML utility module was added at ghost/core/core/server/lib/html-utils.js, exporting load(html) and select(selector, html). References to Cheerio were replaced across core services (RSS, email renderer, mentions, oEmbed) and many tests to use this utility. package.json was updated to remove Cheerio and add htmlparser2, css-select, domutils, dom-serializer and related packages. A comprehensive unit test suite for the html-utils module and multiple test updates were included.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: replacing cheerio with a lightweight html-utils solution, which is the core objective of this PR.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, detailing the motivation, changes made, and performance improvements of replacing cheerio with html-utils.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch no-cheerio
📝 Coding Plan
  • Generate coding plan for human review comments

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ast-grep (0.41.1)
ghost/core/test/unit/server/services/email-service/email-renderer.test.js

Comment @coderabbitai help to get the list of available commands and usage tips.

@kevinansfield kevinansfield force-pushed the no-cheerio branch 2 times, most recently from 268497a to bffdb5b Compare March 14, 2026 21:15
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
ghost/core/test/integration/services/email-service/cards.test.js (1)

38-39: Update stale inline comments referencing Cheerio.

These comments now describe outdated implementation details and should mention html-utils/htmlLoad instead.

✏️ Suggested comment-only cleanup
-    // Remove the preheader span from the email using cheerio
+    // Remove the preheader span from the email using html-utils
-        // Remove the preheader span from the email using cheerio
+        // Remove the preheader span from the email using html-utils

Also applies to: 131-132

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ghost/core/test/integration/services/email-service/cards.test.js` around
lines 38 - 39, Update the stale inline comments that mention "Cheerio" to
reference the current HTML utility by name (html-utils/htmlLoad) so they match
the implementation; locate occurrences around the htmlLoad(...) call in
cards.test.js (e.g., the comment above "const $ = htmlLoad(data.html);" and the
similar comment at the later occurrence around lines 131-132) and change the
wording to say that the preheader span is removed using html-utils/htmlLoad
rather than Cheerio.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ghost/core/core/server/lib/html-utils.js`:
- Around line 143-152: The before() insertion loop wires every newNodes[i].next
to el and computes prev from pre-insert children which breaks chains for
multi-node inserts; update wiring so each new node's prev points to the previous
new node (or parent.children[idx-1] if i==0) and each new node's next points to
the next new node (or el if i is last), then set parent.children[idx-1].next to
newNodes[0] if that sibling exists and set el.prev to
newNodes[newNodes.length-1]; adjust the loop in html-utils.js that iterates
newNodes and update assignments to prev/next using parent, el, newNodes and idx
accordingly.
- Line 41: WrappedSet.each currently uses an expression-body arrow in the
forEach callback which implicitly returns fn(i, el); update the callback to a
block-body and call fn without returning its value to satisfy the lint rule.
Locate WrappedSet.each and change the forEach invocation from using (el, i) =>
fn(i, el) to a block-style callback that calls fn(i, el); (e.g., (el, i) => {
fn(i, el); }) so no value is returned from the callback.

In `@ghost/core/test/unit/server/services/mentions/mentions-api.test.js`:
- Line 336: The test stub for htmlUtils.load doesn't take effect because
mention.js destructures load at import time; change mention.js to require the
module object (e.g., const htmlUtils = require('../../lib/html-utils')) and
update all calls that used the destructured load to call htmlUtils.load so tests
can stub htmlUtils.load at runtime and Mention.verify will exercise the error
path; alternatively, if you prefer not to change mention.js, update the test to
inject/override the dependency at module load (e.g., using proxyquire) so the
stubbed load is used when Mention.verify is imported.

---

Nitpick comments:
In `@ghost/core/test/integration/services/email-service/cards.test.js`:
- Around line 38-39: Update the stale inline comments that mention "Cheerio" to
reference the current HTML utility by name (html-utils/htmlLoad) so they match
the implementation; locate occurrences around the htmlLoad(...) call in
cards.test.js (e.g., the comment above "const $ = htmlLoad(data.html);" and the
similar comment at the later occurrence around lines 131-132) and change the
wording to say that the preheader span is removed using html-utils/htmlLoad
rather than Cheerio.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a9c83de5-252b-43f7-97d8-153e21199c8d

📥 Commits

Reviewing files that changed from the base of the PR and between 88005c0 and bffdb5b.

⛔ Files ignored due to path filters (2)
  • ghost/core/test/integration/services/email-service/__snapshots__/cards.test.js.snap is excluded by !**/*.snap
  • yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (22)
  • ghost/core/core/frontend/services/rss/generate-feed.js
  • ghost/core/core/server/lib/html-utils.js
  • ghost/core/core/server/services/email-service/email-renderer.js
  • ghost/core/core/server/services/mentions/mention-discovery-service.js
  • ghost/core/core/server/services/mentions/mention-sending-service.js
  • ghost/core/core/server/services/mentions/mention.js
  • ghost/core/core/server/services/oembed/oembed-service.js
  • ghost/core/package.json
  • ghost/core/test/e2e-api/content/posts.test.js
  • ghost/core/test/e2e-api/members/send-magic-link.test.js
  • ghost/core/test/e2e-frontend/default-routes.test.js
  • ghost/core/test/e2e-frontend/email-routes.test.js
  • ghost/core/test/e2e-frontend/preview-routes.test.js
  • ghost/core/test/e2e-frontend/theme-i18n.test.js
  • ghost/core/test/integration/services/email-service/batch-sending.test.js
  • ghost/core/test/integration/services/email-service/cards.test.js
  • ghost/core/test/legacy/mock-express-style/api-vs-frontend.test.js
  • ghost/core/test/legacy/site/dynamic-routing.test.js
  • ghost/core/test/legacy/site/frontend.test.js
  • ghost/core/test/unit/server/services/email-service/email-renderer.test.js
  • ghost/core/test/unit/server/services/mentions/mention.test.js
  • ghost/core/test/unit/server/services/mentions/mentions-api.test.js

@ErisDS
Copy link
Member

ErisDS commented Mar 14, 2026

🤖 Velo CI Failure Analysis

Classification: 🟠 SOFT FAIL

  • Workflow: CI
  • Failed Step: E2E tests
  • Run: View failed run
    What failed: Code coverage below required threshold
    Why: The failure is caused by a code issue that needs to be fixed in this PR.
    Action:
    Check the error message and fix the issue in your code.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ghost/core/core/server/lib/html-utils.js`:
- Around line 18-204: The new WrappedSet utility (functions/classes: WrappedSet,
load, select) lacks tests for empty/multi-node and mutation behavior; add unit
tests that cover: attr getter when element/attribs is absent and setter across
multiple elements (attr/removeAttr/addClass), html() getter on empty and
non-empty elements and html(content) setter replacing children correctly,
before(htmlStr) inserting multiple nodes and rebuilding prev/next/parent links,
find/select returning correct nodes for multiple roots, and
text()/first()/toArray()/each()/map() behaviors; assert DOM tree structure,
attributes, class string deduplication, and prev/next linkage after mutations to
close the coverage gap.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8b77184e-082f-4a2d-9881-3fd3f077c1ab

📥 Commits

Reviewing files that changed from the base of the PR and between bffdb5b and eaf91a6.

⛔ Files ignored due to path filters (3)
  • ghost/core/test/e2e-api/admin/__snapshots__/email-previews.test.js.snap is excluded by !**/*.snap
  • ghost/core/test/integration/services/email-service/__snapshots__/cards.test.js.snap is excluded by !**/*.snap
  • yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (23)
  • ghost/core/core/frontend/services/rss/generate-feed.js
  • ghost/core/core/server/lib/html-utils.js
  • ghost/core/core/server/services/email-service/email-renderer.js
  • ghost/core/core/server/services/mentions/mention-discovery-service.js
  • ghost/core/core/server/services/mentions/mention-sending-service.js
  • ghost/core/core/server/services/mentions/mention.js
  • ghost/core/core/server/services/oembed/oembed-service.js
  • ghost/core/package.json
  • ghost/core/test/e2e-api/admin/email-previews.test.js
  • ghost/core/test/e2e-api/content/posts.test.js
  • ghost/core/test/e2e-api/members/send-magic-link.test.js
  • ghost/core/test/e2e-frontend/default-routes.test.js
  • ghost/core/test/e2e-frontend/email-routes.test.js
  • ghost/core/test/e2e-frontend/preview-routes.test.js
  • ghost/core/test/e2e-frontend/theme-i18n.test.js
  • ghost/core/test/integration/services/email-service/batch-sending.test.js
  • ghost/core/test/integration/services/email-service/cards.test.js
  • ghost/core/test/legacy/mock-express-style/api-vs-frontend.test.js
  • ghost/core/test/legacy/site/dynamic-routing.test.js
  • ghost/core/test/legacy/site/frontend.test.js
  • ghost/core/test/unit/server/services/email-service/email-renderer.test.js
  • ghost/core/test/unit/server/services/mentions/mention.test.js
  • ghost/core/test/unit/server/services/mentions/mentions-api.test.js
🚧 Files skipped from review as they are similar to previous changes (12)
  • ghost/core/test/legacy/mock-express-style/api-vs-frontend.test.js
  • ghost/core/test/legacy/site/dynamic-routing.test.js
  • ghost/core/test/e2e-frontend/theme-i18n.test.js
  • ghost/core/test/e2e-api/content/posts.test.js
  • ghost/core/test/legacy/site/frontend.test.js
  • ghost/core/test/e2e-api/members/send-magic-link.test.js
  • ghost/core/package.json
  • ghost/core/test/unit/server/services/mentions/mentions-api.test.js
  • ghost/core/test/integration/services/email-service/cards.test.js
  • ghost/core/core/server/services/mentions/mention-discovery-service.js
  • ghost/core/test/integration/services/email-service/batch-sending.test.js
  • ghost/core/test/unit/server/services/email-service/email-renderer.test.js

@ErisDS
Copy link
Member

ErisDS commented Mar 15, 2026

🤖 Velo CI Failure Analysis

Classification: 🟠 SOFT FAIL

  • Workflow: CI
  • Failed Step: E2E tests
  • Run: View failed run
    What failed: Type error in code
    Why: The failure is caused by a code issue that needs to be fixed in this PR.
    Action:
    Check the error message and fix the issue in your code.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ghost/core/core/server/lib/html-utils.js`:
- Around line 178-185: The $() wrapper currently builds a WrappedSet around any
input including null/undefined or arrays, producing a bogus length of 1; change
the function so that when selectorOrElement is nullish (selectorOrElement ==
null) or is an Array (Array.isArray(selectorOrElement)), it returns new
WrappedSet([], root) instead of wrapping the raw value; keep the existing path
that calls selectAll(selectorOrElement, root) and the fallback raw-element
wrapping for real single elements, but short-circuit these nullish/array cases
to prevent invalid .length and runtime errors in WrappedSet methods like
remove(), text(), and html().
- Around line 103-108: WrappedSet.text currently returns only the first
element's text (this._elements[0]) but should concatenate text from all matched
elements like Cheerio; update the WrappedSet.text method to iterate over
this._elements, call textContent for each element, concatenate the results into
a single string (preserving ordering), and return that string (still returning
'' when there are no elements) so the behavior of textContent and
WrappedSet.text aligns with multi-element aggregation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e147962e-9af1-40fe-a562-f0d2971bf3c7

📥 Commits

Reviewing files that changed from the base of the PR and between eaf91a6 and 74e1f44.

⛔ Files ignored due to path filters (3)
  • ghost/core/test/e2e-api/admin/__snapshots__/email-previews.test.js.snap is excluded by !**/*.snap
  • ghost/core/test/integration/services/email-service/__snapshots__/cards.test.js.snap is excluded by !**/*.snap
  • yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (24)
  • ghost/core/core/frontend/services/rss/generate-feed.js
  • ghost/core/core/server/lib/html-utils.js
  • ghost/core/core/server/services/email-service/email-renderer.js
  • ghost/core/core/server/services/mentions/mention-discovery-service.js
  • ghost/core/core/server/services/mentions/mention-sending-service.js
  • ghost/core/core/server/services/mentions/mention.js
  • ghost/core/core/server/services/oembed/oembed-service.js
  • ghost/core/package.json
  • ghost/core/test/e2e-api/admin/email-previews.test.js
  • ghost/core/test/e2e-api/content/posts.test.js
  • ghost/core/test/e2e-api/members/send-magic-link.test.js
  • ghost/core/test/e2e-frontend/default-routes.test.js
  • ghost/core/test/e2e-frontend/email-routes.test.js
  • ghost/core/test/e2e-frontend/preview-routes.test.js
  • ghost/core/test/e2e-frontend/theme-i18n.test.js
  • ghost/core/test/integration/services/email-service/batch-sending.test.js
  • ghost/core/test/integration/services/email-service/cards.test.js
  • ghost/core/test/legacy/mock-express-style/api-vs-frontend.test.js
  • ghost/core/test/legacy/site/dynamic-routing.test.js
  • ghost/core/test/legacy/site/frontend.test.js
  • ghost/core/test/unit/server/lib/html-utils.test.js
  • ghost/core/test/unit/server/services/email-service/email-renderer.test.js
  • ghost/core/test/unit/server/services/mentions/mention.test.js
  • ghost/core/test/unit/server/services/mentions/mentions-api.test.js
🚧 Files skipped from review as they are similar to previous changes (16)
  • ghost/core/test/integration/services/email-service/cards.test.js
  • ghost/core/test/unit/server/services/mentions/mentions-api.test.js
  • ghost/core/test/unit/server/services/email-service/email-renderer.test.js
  • ghost/core/test/e2e-api/admin/email-previews.test.js
  • ghost/core/core/server/services/mentions/mention-sending-service.js
  • ghost/core/test/e2e-frontend/email-routes.test.js
  • ghost/core/test/e2e-api/content/posts.test.js
  • ghost/core/core/server/services/mentions/mention.js
  • ghost/core/core/server/services/oembed/oembed-service.js
  • ghost/core/test/unit/server/services/mentions/mention.test.js
  • ghost/core/core/frontend/services/rss/generate-feed.js
  • ghost/core/test/legacy/site/frontend.test.js
  • ghost/core/core/server/services/email-service/email-renderer.js
  • ghost/core/package.json
  • ghost/core/test/e2e-frontend/theme-i18n.test.js
  • ghost/core/test/e2e-frontend/default-routes.test.js

Comment on lines +178 to +185
function $(selectorOrElement) {
if (typeof selectorOrElement === 'string') {
const elements = selectAll(selectorOrElement, root);
return new WrappedSet(elements, root);
}
// Wrap a raw element
return new WrappedSet(selectorOrElement, root);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find places where optional element extraction might be wrapped back into $()
rg -nP '\.get\(\d+\)' ghost/core -g '!**/node_modules/**'
rg -nP '\$\(\s*[A-Za-z_$][A-Za-z0-9_$]*\s*\)' ghost/core -g '!**/node_modules/**'

Repository: TryGhost/Ghost

Length of output: 50371


🏁 Script executed:

# Read the html-utils.js file around the $ function and WrappedSet class
head -n 250 ghost/core/core/server/lib/html-utils.js | tail -n +100

Repository: TryGhost/Ghost

Length of output: 3297


🏁 Script executed:

# Check the WrappedSet constructor
grep -n "class WrappedSet\|function WrappedSet" ghost/core/core/server/lib/html-utils.js -A 20

Repository: TryGhost/Ghost

Length of output: 691


🏁 Script executed:

# Look at the test file to understand expected behavior
sed -n '1,100p' ghost/core/test/unit/server/lib/html-utils.test.js

Repository: TryGhost/Ghost

Length of output: 4030


🏁 Script executed:

# Check test cases around line 76 and surrounding tests
sed -n '70,100p' ghost/core/test/unit/server/lib/html-utils.test.js

Repository: TryGhost/Ghost

Length of output: 1233


🏁 Script executed:

# Look for any usages where $() might receive null/undefined
grep -n "$(null\|$(undefined" ghost/core/core/server -r -g '!**/node_modules/**' 2>/dev/null || echo "No direct patterns found"

Repository: TryGhost/Ghost

Length of output: 126


🏁 Script executed:

# Look at the complete html-utils.js file
wc -l ghost/core/core/server/lib/html-utils.js

Repository: TryGhost/Ghost

Length of output: 102


🏁 Script executed:

# Read the WrappedSet class methods to see if they assume valid elements
sed -n '18,150p' ghost/core/core/server/lib/html-utils.js

Repository: TryGhost/Ghost

Length of output: 3580


🏁 Script executed:

# Check email-renderer.js usage around line 409
sed -n '400,420p' ghost/core/core/server/services/email-service/email-renderer.js

Repository: TryGhost/Ghost

Length of output: 902


🏁 Script executed:

# Check mention-sending-service.js usage around line 191
sed -n '185,200p' ghost/core/core/server/services/mentions/mention-sending-service.js

Repository: TryGhost/Ghost

Length of output: 603


🏁 Script executed:

# Check the removeElement function
grep -n "function removeElement\|const removeElement" ghost/core/core/server/lib/html-utils.js -A 10

Repository: TryGhost/Ghost

Length of output: 40


🏁 Script executed:

# Check if there are any tests that call $() with null/undefined
grep -n "null\|undefined" ghost/core/test/unit/server/lib/html-utils.test.js | head -20

Repository: TryGhost/Ghost

Length of output: 532


🏁 Script executed:

# Look at the complete test file to see all tests
sed -n '1,250p' ghost/core/test/unit/server/lib/html-utils.test.js | tail -100

Repository: TryGhost/Ghost

Length of output: 4100


🏁 Script executed:

# Check the removeElement function to see if it handles null/undefined safely
sed -n '1,212p' ghost/core/core/server/lib/html-utils.js | grep -B5 -A15 "removeElement"

Repository: TryGhost/Ghost

Length of output: 1270


🏁 Script executed:

# Check textContent and getChildren functions
sed -n '1,212p' ghost/core/core/server/lib/html-utils.js | head -30

Repository: TryGhost/Ghost

Length of output: 1093


Handle nullish and array inputs in the $() function to return empty sets and prevent incorrect .length values.

When selectorOrElement is null, undefined, or an array, the function should treat these as empty sets rather than wrapping them as single invalid elements. Currently, passing null/undefined creates a WrappedSet with .length = 1 and an invalid element at index 0, causing incorrect behavior in .length checks and runtime failures in methods like .remove(), .text(), and .html().

🔧 Proposed fix
     function $(selectorOrElement) {
         if (typeof selectorOrElement === 'string') {
             const elements = selectAll(selectorOrElement, root);
             return new WrappedSet(elements, root);
         }
+        if (selectorOrElement === null || selectorOrElement === undefined) {
+            return new WrappedSet([], root);
+        }
+        if (Array.isArray(selectorOrElement)) {
+            return new WrappedSet(selectorOrElement, root);
+        }
         // Wrap a raw element
         return new WrappedSet(selectorOrElement, root);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ghost/core/core/server/lib/html-utils.js` around lines 178 - 185, The $()
wrapper currently builds a WrappedSet around any input including null/undefined
or arrays, producing a bogus length of 1; change the function so that when
selectorOrElement is nullish (selectorOrElement == null) or is an Array
(Array.isArray(selectorOrElement)), it returns new WrappedSet([], root) instead
of wrapping the raw value; keep the existing path that calls
selectAll(selectorOrElement, root) and the fallback raw-element wrapping for
real single elements, but short-circuit these nullish/array cases to prevent
invalid .length and runtime errors in WrappedSet methods like remove(), text(),
and html().

ref TryGhost/SDK#821
The direct cheerio 0.22 dependency causes yarn hoisting conflicts with
the cheerio ^1.0 that juice requires, leading to version resolution
issues across the monorepo. This replaces all cheerio usage with a thin
utility built on htmlparser2, css-select, domutils, and dom-serializer
— the same packages cheerio wraps — avoiding the conflict entirely.

Benchmark results (cheerio 0.22 → html-utils):

  Scenario                          Speedup
  OEmbed link discovery              2.3x
  Extract links (mentions)           5.8x
  Mention verification               6.8x
  Segment extraction                 5.9x
  Full email post-processing         2.1x
  Segment filtering + serialize      2.4x
  RSS feed generation               18.8x
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
ghost/core/core/frontend/services/rss/generate-feed.js (1)

20-20: Consider hoisting the require to module level.

The require inside generateItem is called on every post iteration. While Node.js caches modules, hoisting it to the top of the file (alongside other requires) would be more consistent with the codebase style and slightly more efficient.

Suggested change

Move the require to the top of the file:

 const downsize = require('downsize');
 const RSS = require('rss');
 const urlUtils = require('../../../shared/url-utils');
 const {routerManager} = require('../routing');
+const {load} = require('../../../server/lib/html-utils');

Then simplify the function:

 const generateItem = function generateItem(post) {
-    const {load} = require('../../../server/lib/html-utils');
-
     const itemUrl = routerManager.getUrlByResourceId(post.id, {absolute: true});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ghost/core/core/frontend/services/rss/generate-feed.js` at line 20, The
require for '../../../server/lib/html-utils' is currently inside generateItem
and runs per post; hoist it to the module top with the other requires (importing
the load symbol) and remove the in-function require. Update generateItem to
reference the module-level load directly (and clean up any now-unnecessary local
variable) so the module is required once and the function is simpler.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@ghost/core/core/frontend/services/rss/generate-feed.js`:
- Line 20: The require for '../../../server/lib/html-utils' is currently inside
generateItem and runs per post; hoist it to the module top with the other
requires (importing the load symbol) and remove the in-function require. Update
generateItem to reference the module-level load directly (and clean up any
now-unnecessary local variable) so the module is required once and the function
is simpler.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2228c32a-69d9-456f-b8ff-48a56ca31072

📥 Commits

Reviewing files that changed from the base of the PR and between 74e1f44 and 7795890.

⛔ Files ignored due to path filters (3)
  • ghost/core/test/e2e-api/admin/__snapshots__/email-previews.test.js.snap is excluded by !**/*.snap
  • ghost/core/test/integration/services/email-service/__snapshots__/cards.test.js.snap is excluded by !**/*.snap
  • yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (24)
  • ghost/core/core/frontend/services/rss/generate-feed.js
  • ghost/core/core/server/lib/html-utils.js
  • ghost/core/core/server/services/email-service/email-renderer.js
  • ghost/core/core/server/services/mentions/mention-discovery-service.js
  • ghost/core/core/server/services/mentions/mention-sending-service.js
  • ghost/core/core/server/services/mentions/mention.js
  • ghost/core/core/server/services/oembed/oembed-service.js
  • ghost/core/package.json
  • ghost/core/test/e2e-api/admin/email-previews.test.js
  • ghost/core/test/e2e-api/content/posts.test.js
  • ghost/core/test/e2e-api/members/send-magic-link.test.js
  • ghost/core/test/e2e-frontend/default-routes.test.js
  • ghost/core/test/e2e-frontend/email-routes.test.js
  • ghost/core/test/e2e-frontend/preview-routes.test.js
  • ghost/core/test/e2e-frontend/theme-i18n.test.js
  • ghost/core/test/integration/services/email-service/batch-sending.test.js
  • ghost/core/test/integration/services/email-service/cards.test.js
  • ghost/core/test/legacy/mock-express-style/api-vs-frontend.test.js
  • ghost/core/test/legacy/site/dynamic-routing.test.js
  • ghost/core/test/legacy/site/frontend.test.js
  • ghost/core/test/unit/server/lib/html-utils.test.js
  • ghost/core/test/unit/server/services/email-service/email-renderer.test.js
  • ghost/core/test/unit/server/services/mentions/mention.test.js
  • ghost/core/test/unit/server/services/mentions/mentions-api.test.js
✅ Files skipped from review due to trivial changes (2)
  • ghost/core/core/server/services/mentions/mention.js
  • ghost/core/test/unit/server/lib/html-utils.test.js
🚧 Files skipped from review as they are similar to previous changes (10)
  • ghost/core/test/e2e-frontend/email-routes.test.js
  • ghost/core/test/legacy/site/frontend.test.js
  • ghost/core/test/legacy/site/dynamic-routing.test.js
  • ghost/core/test/e2e-frontend/theme-i18n.test.js
  • ghost/core/test/unit/server/services/mentions/mentions-api.test.js
  • ghost/core/test/e2e-frontend/default-routes.test.js
  • ghost/core/test/unit/server/services/email-service/email-renderer.test.js
  • ghost/core/core/server/lib/html-utils.js
  • ghost/core/package.json
  • ghost/core/core/server/services/oembed/oembed-service.js

@codecov
Copy link

codecov bot commented Mar 15, 2026

Codecov Report

❌ Patch coverage is 81.62393% with 43 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.19%. Comparing base (e67e006) to head (7795890).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
ghost/core/core/server/lib/html-utils.js 80.82% 40 Missing and 2 partials ⚠️
...t/core/core/frontend/services/rss/generate-feed.js 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #26819      +/-   ##
==========================================
+ Coverage   73.18%   73.19%   +0.01%     
==========================================
  Files        1534     1535       +1     
  Lines      121070   121285     +215     
  Branches    14640    14682      +42     
==========================================
+ Hits        88610    88780     +170     
- Misses      31429    31471      +42     
- Partials     1031     1034       +3     
Flag Coverage Δ
admin-tests 54.31% <ø> (-0.03%) ⬇️
e2e-tests 73.19% <81.62%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@kevinansfield kevinansfield requested a review from sagzy March 16, 2026 11:46
Copy link
Contributor

@EvanHahn EvanHahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review. Will read more soon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this file change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The email output switched over to the same character encoding we used in the source files and matches the expected juice behaviour when it hasn't had its cheerio dependency hoisted from 1.0.0 back to 0.22.

assert(response.html.includes('some text for both'));
assert(!response.html.includes('finishing part only for members'));
assert(response.html.includes('Devenez un(e) abonn&#xE9;(e) payant de Cathy&#39;s Blog pour acc&#xE9;der &#xE0; du contenu exclusif'));
assert(response.html.includes('Devenez un(e) abonné(e) payant de Cathy&#39;s Blog pour accéder à du contenu exclusif'));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little concerned about this behavior change. I admit I know little about the email renderer, but I think escaped HTML entities are probably more reliable than full Unicode characters. Maybe this is fine, but wanted to flag.

const elements = $('p').get();
const wrapped = $(elements);
assert.equal(wrapped.length, 2);
assert.equal(wrapped.attr('href'), undefined);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we need this assertion?

Suggested change
assert.equal(wrapped.attr('href'), undefined);

Comment on lines +20 to +21
this._elements = Array.isArray(elements) ? elements : [elements];
this._root = root;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we use proper JavaScript private members (this.#elements and this.#root) instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love to see JSDoc types for this file. Happy to help with that.

Comment on lines +162 to +164
node.parent = parent;
node.prev = i > 0 ? parent.children[i - 1] : null;
node.next = i < parent.children.length - 1 ? parent.children[i + 1] : null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I wonder if this could be DRYed out with what's above in html().

it('returns all elements when called without index', function () {
const $ = load('<p>1</p><p>2</p>');
const elements = $('p').get();
assert.equal(elements.length, 2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could we also assert on what's returned here, beyond the length? A check for tagName like you've done below is probably enough.

const $ = load('<p>1</p><p>2</p>');
const arr = $('p').toArray();
assert.equal(arr.length, 2);
assert.notEqual(arr, $('p').get());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: these will never be equal because notEqual checks reference equality.

const $ = load('<a href="http://example.com">link</a>');
assert.equal($('a').attr('href'), 'http://example.com');
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd love to see tests for:

  • Getting an empty attribute value (e.g., <a href="">link</a>)
  • What happens when the same attribute is duplicated (e.g., <a href="x" href="y">link</a>)
  • If capitalization is affected (e.g., <a hREF="#foo">link</a>)

addClass(className) {
for (const el of this._elements) {
if (el.attribs) {
const existing = el.attribs.class || '';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the ternary on the line below makes this || '' unnecessary.

@@ -0,0 +1,219 @@
const {parseDocument} = require('htmlparser2');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my reading of the Cheerio source, it uses Parse5 unless you're in XML mode. Why use htmlparser2 here?

$('p').before('<span>A</span><span>B</span>');
assert.equal($.html(), '<span>A</span><span>B</span><p>target</p>');
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could you add a test like this?

it('inserts before each match', function () {
  const $ = load('<p>a</p><p>b</p>');
  $('p').before('<span>x</span>');
  assert.equal($.html(), '<span>x</span><p>a</p><span>x</span><p>b</p>');
});

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants