diff --git a/auto/autoresearch.ideas.md b/auto/autoresearch.ideas.md new file mode 100644 index 000000000..4a25837e7 --- /dev/null +++ b/auto/autoresearch.ideas.md @@ -0,0 +1,30 @@ +# Autoresearch Ideas + +## Dead Ends (tried and failed) + +- **Tag name interning** (skip+byte dispatch): saves 878 allocs but verification loop overhead kills speed +- **String dedup (-@)** for filter names: no alloc savings, creates temp strings anyway +- **Split-based tokenizer**: 2.5x faster C-level split but can't handle {{ followed by %} nesting +- **Streaming tokenizer**: needs own StringScanner (+alloc), per-shift overhead worse than eager array +- **Merge simple_lookup? into initialize**: logic overhead offsets saved index call +- **Cursor for filter scanning**: cursor.reset overhead worse than inline byte loops +- **Direct strainer call**: YJIT already inlines context.invoke_single well +- **TruthyCondition subclass**: YJIT polymorphism at evaluate call site hurts more than 115 saved allocs +- **Index loop for filters**: YJIT optimizes each+destructure MUCH better than manual filter[0]/filter[1] + +## Key Insights + +- YJIT monomorphism > allocation reduction at this scale +- C-level StringScanner.scan/skip > Ruby-level byte loops (already applied) +- String#split is 2.5x faster than manual tokenization, but Liquid's grammar is too complex for regex +- 74% of total CPU time is GC — alloc reduction is the highest-leverage optimization +- But YJIT-deoptimization from polymorphism costs more than the GC savings + +## Remaining Ideas + +- **Tokenizer: use String#index + byteslice instead of StringScanner**: avoid the StringScanner overhead entirely for the simple case of finding {%/{{ delimiters +- **Pre-freeze all Condition operator lambdas**: reduce alloc in Condition initialization +- **Avoid `@blocks = []` in If with single-element optimization**: use `@block` ivar for single condition, only create array for elsif +- **Reduce ForloopDrop allocation**: reuse ForloopDrop objects 
across iterations or use a lighter-weight object +- **VariableLookup: single-segment optimization**: for "product.title" (1 lookup), use an ivar instead of 1-element Array + diff --git a/auto/autoresearch.md b/auto/autoresearch.md new file mode 100644 index 000000000..8ba585717 --- /dev/null +++ b/auto/autoresearch.md @@ -0,0 +1,109 @@ +# Autoresearch: Liquid Parse+Render Performance + +## Objective +Optimize the Shopify Liquid template engine's parse and render performance. +The workload is the ThemeRunner benchmark, which parses and renders real Shopify +theme templates (dropify, ripen, tribble, vogue) with realistic data from +`performance/shopify/database.rb`. We measure parse time, render time, and +object allocations. The optimization target is combined parse+render time (µs). + +## How to Run +Run `./auto/autoresearch.sh` — it runs the unit tests, then the performance +benchmark (three runs, best taken), emitting METRIC lines in a parseable format. +Use `./auto/bench.sh` to additionally run the liquid-spec conformance gate. + +## Metrics +- **Primary (optimization target)**: `combined_µs` (µs, lower is better) — sum of parse + render time +- **Secondary (tradeoff monitoring)**: + - `parse_µs` — time to parse all theme templates (Liquid::Template#parse) + - `render_µs` — time to render all pre-compiled templates + - `allocations` — total object allocations for one parse+render cycle + Parse dominates (~70-75% of combined). Allocations correlate with GC pressure. + +## Files in Scope +- `lib/liquid/*.rb` — core Liquid library (parser, lexer, context, expression, etc.) +- `lib/liquid/tags/*.rb` — tag implementations (for, if, assign, etc.)
+- `performance/bench_quick.rb` — benchmark script + +## Off Limits +- `test/` — tests must continue to pass unchanged +- `performance/tests/` — benchmark templates, do not modify +- `performance/shopify/` — benchmark data/filters, do not modify + +## Constraints +- All unit tests must pass (`bundle exec rake base_test`) +- liquid-spec failures must not increase beyond 2 (pre-existing UTF-8 edge cases) +- No new gem dependencies +- Semantic correctness must be preserved — templates must render identical output +- **Security**: Liquid runs untrusted user code. See Strategic Direction for details. + +## Strategic Direction +The long-term goal is to converge toward a **single-pass, forward-only parsing +architecture** using one shared StringScanner instance. The current system has +multiple redundant passes: Tokenizer → BlockBody → Lexer → Parser → Expression +→ VariableLookup, each re-scanning portions of the source. A unified scanner +approach would: + +1. **One StringScanner** flows through the entire parse — no intermediate token + arrays, no re-lexing filter chains, no string reconstruction in Parser#expression. +2. **Emit a lightweight IL or normalized AST** during the single forward pass, + decoupling strictness checking from the hot parse path. The LiquidIL project + (`~/src/tries/2026-01-05-liquid-il`) demonstrated this: a recursive-descent + parser emitting IL directly achieved significant speedups. +3. **Minimal backtracking** — the scanner advances forward, byte-checking as it + goes. liquid-c (`~/src/tries/2026-01-16-Shopify-liquid-c`) showed that a + C-level cursor-based tokenizer eliminates most allocation overhead. + +Current fast-path optimizations (byte-level tag/variable/for/if parsing) are +steps toward this goal. Each one replaces a regex+MatchData pattern with +forward-only byte scanning. The remaining Lexer→Parser path for filter args +is the next target for elimination. + +**Security note**: Liquid executes untrusted user templates. 
All parsing must +use explicit byte-range checks. Never use eval, send on user input, dynamic +method dispatch, const_get, or any pattern that lets template authors escape +the sandbox. + +## Baseline +- **Commit**: 4ea835a (original, before any optimizations) +- **combined_µs**: 7,374 +- **parse_µs**: 5,928 +- **render_µs**: 1,446 +- **allocations**: 62,620 + +## Progress Log +- 3329b09: Replace FullToken regex with manual byte parsing → combined 7,262 (-1.5%) +- 97e6893: Replace VariableParser regex with manual byte scanner → combined 6,945 (-5.8%), allocs 58,009 +- 2b78e4b: getbyte instead of string indexing in whitespace_handler/create_variable → allocs 51,477 +- d291e63: Lexer equal? for frozen arrays, \s+ whitespace skip → combined ~6,331 +- d79b9fa: Avoid strip alloc in Expression.parse, byteslice for strings → allocs 49,151 +- fa41224: Short-circuit parse_number with first-byte check → allocs 48,240 +- c1113ad: Fast-path String in render_obj_to_output → combined ~6,071 +- 25f9224: Fast-path simple variable parsing (skip Lexer/Parser) → combined ~5,860, allocs 45,202 +- 3939d74: Replace SIMPLE_VARIABLE regex with byte scanner → combined ~5,717, allocs 42,763 +- fe7a2f5: Fast-path simple if conditions → combined ~5,444, allocs 41,490 +- cfa0dfe: Replace For tag Syntax regex with manual byte parser → combined ~4,974, allocs 39,847 +- 8a92a4e: Unified fast-path Variable: parse name directly, only lex filter chain → combined ~5,060, allocs 40,520 +- 58d2514: parse_tag_token returns [tag_name, markup, newlines] → combined ~4,815, allocs 37,355 +- db43492: Hoist write score check out of render loop → render ~1,345 +- 17daac9: Extend fast-path to quoted string literal variables → all 1,197 variables fast-pathed +- 9fd7cec: Split filter parsing: no-arg filters scanned directly, Lexer only for args → combined ~4,595, allocs 35,159 +- e5933fc: Avoid array alloc in parse_tag_token via class ivars → allocs 34,281 +- 2e207e6: Replace WhitespaceOrNothing regex with 
byte-level blank_string? → combined ~4,800 +- 526af22: invoke_single fast path for no-arg filter invocation → allocs 32,621 +- 76ae8f1: find_variable top-scope fast path → combined ~4,740 +- 4cda1a5: slice_collection: skip copy for full Array → allocs 32,004 +- 79840b1: Replace SIMPLE_CONDITION regex with manual byte parser → combined ~4,663, allocs 31,465 +- 69430e9: Replace INTEGER_REGEX/FLOAT_REGEX with byte-level parse_number → allocs 31,129 +- 405e3dc: Frozen EMPTY_ARRAY/EMPTY_HASH for Context @filters/@disabled_tags → allocs 31,009 +- b90d7f0: Avoid unnecessary array wrapping for Context environments → allocs 30,709 +- 3799d4c: Lazy seen={} hash in Utils.to_s/inspect → allocs 30,169 +- 0b07487: Fast-path VariableLookup: skip scan_variable for simple identifiers → allocs 29,711 +- 9de1527: Introduce Cursor class for centralized byte-level scanning +- dd4a100: Remove dead parse_tag_token/SIMPLE_CONDITION (now in Cursor) +- cdc3438: For tag: migrate lax_parse to Cursor with zero-alloc scanning → allocs 29,620 + +## Current Best +- **combined_µs**: ~3,400 (-54% from original 7,374 baseline) +- **parse_µs**: ~2,300 +- **render_µs**: ~1,100 +- **allocations**: 24,882 (-60% from original 62,620 baseline) diff --git a/auto/autoresearch.sh b/auto/autoresearch.sh new file mode 100755 index 000000000..f421767e6 --- /dev/null +++ b/auto/autoresearch.sh @@ -0,0 +1,48 @@ +#!/usr/bin/env bash +# Autoresearch benchmark runner for Liquid performance optimization +# Runs: unit tests → performance benchmark (3 runs, takes best) +# Outputs METRIC lines for the agent to parse +# Exit code 0 = all good, non-zero = broken +set -euo pipefail + +cd "$(dirname "$0")/.." + +# ── Step 1: Unit tests (fast gate) ────────────────────────────────── +echo "=== Unit Tests ===" +TEST_OUT=$(bundle exec rake base_test 2>&1) +TEST_RESULT=$(echo "$TEST_OUT" | tail -1) +if echo "$TEST_OUT" | grep -q 'failures\|errors' && ! 
echo "$TEST_RESULT" | grep -q '0 failures, 0 errors'; then + echo "$TEST_OUT" | grep -E 'Failure|Error|failures|errors' | head -20 + echo "FATAL: unit tests failed" + exit 1 +fi +echo "$TEST_RESULT" + +# ── Step 2: Performance benchmark (3 runs, take best) ────────────── +echo "" +echo "=== Performance Benchmark (3 runs) ===" +BEST_COMBINED=999999 +BEST_PARSE=0 +BEST_RENDER=0 +BEST_ALLOC=0 + +for i in 1 2 3; do + OUT=$(bundle exec ruby performance/bench_quick.rb 2>&1) + P=$(echo "$OUT" | grep '^parse_us=' | cut -d= -f2) + R=$(echo "$OUT" | grep '^render_us=' | cut -d= -f2) + C=$(echo "$OUT" | grep '^combined_us=' | cut -d= -f2) + A=$(echo "$OUT" | grep '^allocations=' | cut -d= -f2) + echo " run $i: combined=${C}µs (parse=${P} render=${R}) allocs=${A}" + if [ "$C" -lt "$BEST_COMBINED" ]; then + BEST_COMBINED=$C + BEST_PARSE=$P + BEST_RENDER=$R + BEST_ALLOC=$A + fi +done + +echo "" +echo "METRIC combined_us=$BEST_COMBINED" +echo "METRIC parse_us=$BEST_PARSE" +echo "METRIC render_us=$BEST_RENDER" +echo "METRIC allocations=$BEST_ALLOC" diff --git a/auto/bench.sh b/auto/bench.sh new file mode 100755 index 000000000..77fc48092 --- /dev/null +++ b/auto/bench.sh @@ -0,0 +1,40 @@ +#!/usr/bin/env bash +# Auto-research benchmark script for Liquid +# Runs: unit tests → liquid-spec → performance benchmark +# Outputs machine-readable metrics on success +# Exit code 0 = all good, non-zero = broken +set -euo pipefail + +cd "$(dirname "$0")/.." + +# ── Step 1: Unit tests (fast gate) ────────────────────────────────── +echo "=== Unit Tests ===" +if ! 
bundle exec rake base_test 2>&1; then + echo "FATAL: unit tests failed" + exit 1 +fi + +# ── Step 2: liquid-spec (correctness gate) ────────────────────────── +echo "" +echo "=== Liquid Spec ===" +SPEC_OUTPUT=$(bundle exec liquid-spec run spec/ruby_liquid.rb 2>&1 || true) +echo "$SPEC_OUTPUT" | tail -3 + +# Extract failure count from "Total: N passed, N failed, N errors" line +# Allow known pre-existing failures (≤2) +# The [^0-9] before the capture group stops the greedy .* from swallowing +# leading digits (plain '.*\([0-9]...' would truncate "12 failed" to "2") +TOTAL_LINE=$(echo "$SPEC_OUTPUT" | grep "^Total:" || echo "Total: 0 passed, 0 failed, 0 errors") +FAILURES=$(echo "$TOTAL_LINE" | sed -n 's/.*[^0-9]\([0-9][0-9]*\) failed.*/\1/p') +ERRORS=$(echo "$TOTAL_LINE" | sed -n 's/.*[^0-9]\([0-9][0-9]*\) error.*/\1/p') +FAILURES=${FAILURES:-0} +ERRORS=${ERRORS:-0} +TOTAL_BAD=$((FAILURES + ERRORS)) + +if [ "$TOTAL_BAD" -gt 2 ]; then + echo "FATAL: liquid-spec has $FAILURES failures and $ERRORS errors (threshold: 2)" + exit 1 +fi + +# ── Step 3: Performance benchmark ────────────────────────────────── +echo "" +echo "=== Performance Benchmark ===" +bundle exec ruby performance/bench_quick.rb 2>&1 diff --git a/autoresearch.jsonl b/autoresearch.jsonl new file mode 100644 index 000000000..3b69d91ba --- /dev/null +++ b/autoresearch.jsonl @@ -0,0 +1,30 @@ +{"type":"config","name":"Liquid parse+render performance (tenderlove-inspired)","metricName":"combined_µs","metricUnit":"µs","bestDirection":"lower"} +{"run":1,"commit":"c09e722","metric":3818,"metrics":{"parse_µs":2722,"render_µs":1096,"allocations":24881},"status":"keep","description":"Baseline: 3,818µs combined, 24,881 allocs","timestamp":1773348490227} +{"run":2,"commit":"c09e722","metric":4063,"metrics":{"parse_µs":2901,"render_µs":1162,"allocations":24003},"status":"discard","description":"Tag name interning via skip+byte dispatch: saves 878 allocs but verification loop slower than scan","timestamp":1773348738557,"segment":0} +{"run":3,"commit":"c09e722","metric":3881,"metrics":{"parse_µs":2720,"render_µs":1161,"allocations":24881},"status":"discard","description":"String
dedup (-@) for filter names: no alloc savings, no speed benefit","timestamp":1773348781481,"segment":0} +{"run":4,"commit":"c09e722","metric":3970,"metrics":{"parse_µs":2829,"render_µs":1141,"allocations":24881},"status":"discard","description":"Streaming tokenizer: needs own StringScanner (+1 alloc), per-shift overhead worse than saved array","timestamp":1773348883093,"segment":0} +{"run":5,"commit":"c09e722","metric":0,"metrics":{"parse_µs":0,"render_µs":0,"allocations":0},"status":"crash","description":"REVERTED: split-based tokenizer — regex can't handle unclosed tags inside raw blocks","timestamp":1773349089230,"segment":0} +{"run":6,"commit":"c09e722","metric":0,"metrics":{"parse_µs":0,"render_µs":0,"allocations":0},"status":"crash","description":"REVERTED: split regex tokenizer v2 — can't handle {{ followed by %} (variable-becomes-tag nesting)","timestamp":1773349248313,"segment":0} +{"run":7,"commit":"c09e722","metric":3861,"metrics":{"parse_µs":2744,"render_µs":1117,"allocations":24881},"status":"discard","description":"Merge simple_lookup? 
dot position into initialize — logic overhead offsets saved index call","timestamp":1773349376707,"segment":0} +{"run":8,"commit":"c09e722","metric":4048,"metrics":{"parse_µs":2929,"render_µs":1119,"allocations":24881},"status":"discard","description":"Use Cursor regex for filter name scanning — cursor.reset + method dispatch overhead worse than inline bytes","timestamp":1773349447172,"segment":0} +{"run":9,"commit":"c09e722","metric":3872,"metrics":{"parse_µs":2744,"render_µs":1128,"allocations":24881},"status":"discard","description":"Direct strainer call in Variable#render — YJIT already inlines context.invoke_single well","timestamp":1773349497593,"segment":0} +{"run":10,"commit":"c09e722","metric":3839,"metrics":{"parse_µs":2732,"render_µs":1107,"allocations":24879},"status":"discard","description":"Array#[] fast path for slice_collection with limit/offset — only 2 alloc savings, not meaningful","timestamp":1773349555348,"segment":0} +{"run":11,"commit":"c09e722","metric":3889,"metrics":{"parse_µs":2770,"render_µs":1119,"allocations":24766},"status":"discard","description":"TruthyCondition for simple if checks: -115 allocs but YJIT polymorphism at evaluate call site hurts speed","timestamp":1773349649377,"segment":0} +{"run":12,"commit":"c09e722","metric":4150,"metrics":{"parse_µs":2769,"render_µs":1381,"allocations":24881},"status":"discard","description":"Index loop for filters: YJIT optimizes each+destructure better than manual indexing","timestamp":1773349699285,"segment":0} +{"run":13,"commit":"b7ae55f","metric":3556,"metrics":{"parse_µs":2388,"render_µs":1168,"allocations":24882},"status":"keep","description":"Replace StringScanner tokenizer with String#byteindex — 12% faster parse, no regex overhead for delimiter finding","timestamp":1773349875890,"segment":0} +{"run":14,"commit":"e25f2f1","metric":3464,"metrics":{"parse_µs":2335,"render_µs":1129,"allocations":24882},"status":"keep","description":"Confirmation run: byteindex tokenizer consistently 
3,400-3,600µs","timestamp":1773349889465,"segment":0} +{"run":15,"commit":"b37fa98","metric":3490,"metrics":{"parse_µs":2331,"render_µs":1159,"allocations":24882},"status":"keep","description":"Clean up tokenizer: remove unused StringScanner setup and regex constants","timestamp":1773349928672,"segment":0} +{"run":16,"commit":"b37fa98","metric":3638,"metrics":{"parse_µs":2460,"render_µs":1178,"allocations":24882},"status":"discard","description":"Single-char byteindex for %} search: Ruby loop overhead worse for nearby targets","timestamp":1773349985509,"segment":0} +{"run":17,"commit":"b37fa98","metric":3553,"metrics":{"parse_µs":2431,"render_µs":1122,"allocations":25256},"status":"discard","description":"Regex simple_variable_markup: MatchData creates 374 extra allocs, offsetting speed gain","timestamp":1773350066627,"segment":0} +{"run":18,"commit":"b37fa98","metric":3629,"metrics":{"parse_µs":2455,"render_µs":1174,"allocations":25002},"status":"discard","description":"String.new(capacity: 4096) for output buffer: allocates more objects, not fewer","timestamp":1773350101852,"segment":0} +{"run":19,"commit":"f6baeae","metric":3350,"metrics":{"parse_µs":2212,"render_µs":1138,"allocations":24882},"status":"keep","description":"parse_tag_token without StringScanner: pure byte ops avoid reset(token) overhead, -12% combined","timestamp":1773350230252,"segment":0} +{"run":20,"commit":"f6baead","metric":0,"metrics":{"parse_µs":0,"render_µs":0,"allocations":0},"status":"crash","description":"REVERTED: regex ultra-fast path for Variable — name pattern too broad, matches invalid trailing dots","timestamp":1773350472859,"segment":0} +{"run":21,"commit":"ae9a2e2","metric":3314,"metrics":{"parse_µs":2203,"render_µs":1111,"allocations":24882},"status":"keep","description":"Clean confirmation run: 3,314µs (-55% from main), stable","timestamp":1773350544354,"segment":0} 
+{"run":22,"commit":"ae9a2e2","metric":3497,"metrics":{"parse_µs":2336,"render_µs":1161,"allocations":24882},"status":"discard","description":"Regex fast path for no-filter variables: include? + match? overhead exceeds byte scan savings","timestamp":1773350641375,"segment":0} +{"run":23,"commit":"ca327b0","metric":3445,"metrics":{"parse_µs":2284,"render_µs":1161,"allocations":24647},"status":"keep","description":"Condition#evaluate: skip loop block for simple conditions (no child_relation) — saves 235 allocs","timestamp":1773350691752,"segment":0} +{"run":24,"commit":"99454a9","metric":3489,"metrics":{"parse_µs":2353,"render_µs":1136,"allocations":24647},"status":"keep","description":"Replace simple_lookup? byte scan with match? regex — 8x faster per call, cleaner code","timestamp":1773350837721,"segment":0} +{"run":25,"commit":"99454a9","metric":3797,"metrics":{"parse_µs":2636,"render_µs":1161,"allocations":29627},"status":"discard","description":"Regex name extraction in try_fast_parse: MatchData creates 5K extra allocs, much worse","timestamp":1773351048938,"segment":0} +{"run":26,"commit":"db348e0","metric":3459,"metrics":{"parse_µs":2318,"render_µs":1141,"allocations":24647},"status":"keep","description":"Inline to_liquid_value in If render — avoids one method dispatch per condition evaluation","timestamp":1773351080001,"segment":0} +{"run":27,"commit":"b195d09","metric":3496,"metrics":{"parse_µs":2356,"render_µs":1140,"allocations":24530},"status":"keep","description":"Replace @blocks.each with while loop in If render — avoids block proc allocation per render","timestamp":1773351101134,"segment":0} +{"run":28,"commit":"b195d09","metric":3648,"metrics":{"parse_µs":2457,"render_µs":1191,"allocations":24530},"status":"discard","description":"While loop in For render: YJIT optimizes each well for hot loops with many iterations","timestamp":1773351142275,"segment":0} 
+{"run":29,"commit":"b195d09","metric":3966,"metrics":{"parse_µs":2641,"render_µs":1325,"allocations":24060},"status":"discard","description":"While loop for environment search: -470 allocs but YJIT deopt makes render 16% slower","timestamp":1773351193863,"segment":0} diff --git a/lib/liquid.rb b/lib/liquid.rb index 4d0a71a64..14b02d266 100644 --- a/lib/liquid.rb +++ b/lib/liquid.rb @@ -83,6 +83,7 @@ module Liquid require 'liquid/template' require 'liquid/condition' require 'liquid/utils' +require 'liquid/cursor' require 'liquid/tokenizer' require 'liquid/parse_context' require 'liquid/partial_cache' diff --git a/lib/liquid/block.rb b/lib/liquid/block.rb index 73d86c7bd..19a76cb36 100644 --- a/lib/liquid/block.rb +++ b/lib/liquid/block.rb @@ -60,8 +60,11 @@ def block_name @tag_name end + # Cache block delimiters per tag name to avoid repeated string allocation + BLOCK_DELIMITER_CACHE = Hash.new { |h, k| h[k] = "end#{k}".freeze } + def block_delimiter - @block_delimiter ||= "end#{block_name}" + @block_delimiter ||= BLOCK_DELIMITER_CACHE[block_name] end private diff --git a/lib/liquid/block_body.rb b/lib/liquid/block_body.rb index e4ada7d16..eb14aa2cc 100644 --- a/lib/liquid/block_body.rb +++ b/lib/liquid/block_body.rb @@ -38,7 +38,7 @@ def freeze private def parse_for_liquid_tag(tokenizer, parse_context) while (token = tokenizer.shift) - unless token.empty? || token.match?(WhitespaceOrNothing) + unless token.empty? 
|| BlockBody.blank_string?(token) unless token =~ LiquidTagToken # line isn't empty but didn't match tag syntax, yield and let the # caller raise a syntax error @@ -124,48 +124,70 @@ def self.rescue_render_node(context, output, line_number, exc, blank_tag) end end + OPEN_CURLEY_BYTE = 123 # '{'.ord + PERCENT_BYTE = 37 # '%'.ord + + # Fast check if string is whitespace-only (replaces WhitespaceOrNothing regex) + BLANK_STRING_REGEX = /\A\s*\z/ + + def self.blank_string?(str) + str.match?(BLANK_STRING_REGEX) + end + private def parse_for_document(tokenizer, parse_context, &block) while (token = tokenizer.shift) next if token.empty? - case - when token.start_with?(TAGSTART) - whitespace_handler(token, parse_context) - unless token =~ FullToken - return handle_invalid_tag_token(token, parse_context, &block) - end - tag_name = Regexp.last_match(2) - markup = Regexp.last_match(4) - if parse_context.line_number - # newlines inside the tag should increase the line number, - # particularly important for multiline {% liquid %} tags - parse_context.line_number += Regexp.last_match(1).count("\n") + Regexp.last_match(3).count("\n") + first_byte = token.getbyte(0) + if first_byte == OPEN_CURLEY_BYTE + second_byte = token.getbyte(1) + if second_byte == PERCENT_BYTE + whitespace_handler(token, parse_context) + cursor = parse_context.cursor + tag_name = cursor.parse_tag_token(token) + unless tag_name + return handle_invalid_tag_token(token, parse_context, &block) + end + markup = cursor.tag_markup + + if parse_context.line_number + newlines = cursor.tag_newlines + parse_context.line_number += newlines if newlines > 0 + end + + if tag_name == 'liquid' + parse_liquid_tag(markup, parse_context) + next + end + + unless (tag = parse_context.environment.tag_for_name(tag_name)) + # end parsing if we reach an unknown tag and let the caller decide + # determine how to proceed + return yield tag_name, markup + end + new_tag = tag.parse(tag_name, markup, tokenizer, parse_context) + @blank &&= 
new_tag.blank? + @nodelist << new_tag + elsif second_byte == OPEN_CURLEY_BYTE + whitespace_handler(token, parse_context) + @nodelist << create_variable(token, parse_context) + @blank = false + else + # Fallback: text token starting with '{' + if parse_context.trim_whitespace + token.lstrip! + end + parse_context.trim_whitespace = false + @nodelist << token + @blank &&= BlockBody.blank_string?(token) end - - if tag_name == 'liquid' - parse_liquid_tag(markup, parse_context) - next - end - - unless (tag = parse_context.environment.tag_for_name(tag_name)) - # end parsing if we reach an unknown tag and let the caller decide - # determine how to proceed - return yield tag_name, markup - end - new_tag = tag.parse(tag_name, markup, tokenizer, parse_context) - @blank &&= new_tag.blank? - @nodelist << new_tag - when token.start_with?(VARSTART) - whitespace_handler(token, parse_context) - @nodelist << create_variable(token, parse_context) - @blank = false else if parse_context.trim_whitespace token.lstrip! end parse_context.trim_whitespace = false @nodelist << token - @blank &&= token.match?(WhitespaceOrNothing) + @blank &&= BlockBody.blank_string?(token) end parse_context.line_number = tokenizer.line_number end @@ -173,8 +195,10 @@ def self.rescue_render_node(context, output, line_number, exc, blank_tag) yield nil, nil end + DASH_BYTE = 45 # '-'.ord + def whitespace_handler(token, parse_context) - if token[2] == WhitespaceControl + if token.getbyte(2) == DASH_BYTE previous_token = @nodelist.last if previous_token.is_a?(String) first_byte = previous_token.getbyte(0) @@ -184,7 +208,7 @@ def whitespace_handler(token, parse_context) end end end - parse_context.trim_whitespace = (token[-3] == WhitespaceControl) + parse_context.trim_whitespace = (token.getbyte(token.bytesize - 3) == DASH_BYTE) end def blank? @@ -218,7 +242,11 @@ def render(context) def render_to_output_buffer(context, output) freeze unless frozen? 
- context.resource_limits.increment_render_score(@nodelist.length) + resource_limits = context.resource_limits + resource_limits.increment_render_score(@nodelist.length) + + # Check if we need per-node write score tracking + check_write = resource_limits.render_length_limit || resource_limits.last_capture_length idx = 0 while (node = @nodelist[idx]) @@ -226,14 +254,11 @@ def render_to_output_buffer(context, output) output << node else render_node(context, output, node) - # If we get an Interrupt that means the block must stop processing. An - # Interrupt is any command that stops block execution such as {% break %} - # or {% continue %}. These tags may also occur through Block or Include tags. - break if context.interrupt? # might have happened in a for-block + break if context.interrupt? end idx += 1 - context.resource_limits.increment_write_score(output) + resource_limits.increment_write_score(output) if check_write end output @@ -245,15 +270,12 @@ def render_node(context, output, node) BlockBody.render_node(context, output, node) end - def create_variable(token, parse_context) - if token.end_with?("}}") - i = 2 - i = 3 if token[i] == "-" - parse_end = token.length - 3 - parse_end -= 1 if token[parse_end] == "-" - markup_end = parse_end - i + 1 - markup = markup_end <= 0 ? 
"" : token.slice(i, markup_end) + CLOSE_CURLEY_BYTE = 125 # '}'.ord + def create_variable(token, parse_context) + len = token.bytesize + if len >= 4 && token.getbyte(len - 1) == CLOSE_CURLEY_BYTE && token.getbyte(len - 2) == CLOSE_CURLEY_BYTE + markup = parse_context.cursor.parse_variable_token(token) return Variable.new(markup, parse_context) end diff --git a/lib/liquid/condition.rb b/lib/liquid/condition.rb index 9d55c42b3..13f238d28 100644 --- a/lib/liquid/condition.rb +++ b/lib/liquid/condition.rb @@ -65,11 +65,13 @@ def initialize(left = nil, operator = nil, right = nil) end def evaluate(context = deprecated_default_context) + result = interpret_condition(@left, @right, @operator, context) + + # Fast path: no child conditions (most common) + return result unless @child_relation + condition = self - result = nil loop do - result = interpret_condition(condition.left, condition.right, condition.operator, context) - case condition.child_relation when :or break if Liquid::Utils.to_liquid_value(result) @@ -79,6 +81,7 @@ def evaluate(context = deprecated_default_context) break end condition = condition.child_condition + result = interpret_condition(condition.left, condition.right, condition.operator, context) end result end diff --git a/lib/liquid/context.rb b/lib/liquid/context.rb index 433b6d003..f982f2fa8 100644 --- a/lib/liquid/context.rb +++ b/lib/liquid/context.rb @@ -24,10 +24,15 @@ def self.build(environment: Environment.default, environments: {}, outer_scope: def initialize(environments = {}, outer_scope = {}, registers = {}, rethrow_errors = false, resource_limits = nil, static_environments = {}, environment = Environment.default) @environment = environment - @environments = [environments] - @environments.flatten! + @environments = environments.is_a?(Array) ? environments : [environments] - @static_environments = [static_environments].flatten(1).freeze + @static_environments = if static_environments.is_a?(Array) + static_environments.frozen? ? 
static_environments : static_environments.freeze + elsif static_environments.empty? + Const::EMPTY_ARRAY + else + [static_environments].freeze + end @scopes = [outer_scope || {}] @registers = registers.is_a?(Registers) ? registers : Registers.new(registers) @errors = [] @@ -35,14 +40,13 @@ def initialize(environments = {}, outer_scope = {}, registers = {}, rethrow_erro @strict_variables = false @resource_limits = resource_limits || ResourceLimits.new(environment.default_resource_limits) @base_scope_depth = 0 - @interrupts = [] - @filters = [] + @interrupts = Const::EMPTY_ARRAY + @filters = Const::EMPTY_ARRAY @global_filter = nil - @disabled_tags = {} + @disabled_tags = Const::EMPTY_HASH - # Instead of constructing new StringScanner objects for each Expression parse, - # we recycle the same one. - @string_scanner = StringScanner.new("") + # Lazy-init StringScanner — only needed if Context#[] is called during render + @string_scanner = nil @registers.static[:cached_partials] ||= {} @registers.static[:file_system] ||= environment.file_system @@ -84,11 +88,12 @@ def apply_global_filter(obj) # are there any not handled interrupts? def interrupt? - !@interrupts.empty? + !@interrupts.frozen? && !@interrupts.empty? end # push an interrupt to the stack. this interrupt is considered not handled. def push_interrupt(e) + @interrupts = [] if @interrupts.frozen? @interrupts.push(e) end @@ -109,6 +114,17 @@ def invoke(method, *args) strainer.invoke(method, *args).to_liquid end + # Fast path for single-argument filter invocation (the most common case: + # {{ value | filter }}) — avoids *args splat allocation. + def invoke_single(method, input) + strainer.invoke_single(method, input).to_liquid + end + + # Fast path for two-argument filter invocation (e.g. {{ value | default: 'x' }}) + def invoke_two(method, input, arg1) + strainer.invoke_two(method, input, arg1).to_liquid + end + # Push new local scope on the stack. 
use Context#stack instead def push(new_scope = {}) @scopes.unshift(new_scope) @@ -180,7 +196,7 @@ def []=(key, value) # Example: # products == empty #=> products.empty? def [](expression) - evaluate(Expression.parse(expression, @string_scanner)) + evaluate(Expression.parse(expression, @string_scanner ||= StringScanner.new(""))) end def key?(key) @@ -193,22 +209,40 @@ def evaluate(object) # Fetches an object starting at the local scope and then moving up the hierachy def find_variable(key, raise_on_not_found: true) - # This was changed from find() to find_index() because this is a very hot - # path and find_index() is optimized in MRI to reduce object allocation - index = @scopes.find_index { |s| s.key?(key) } - - variable = if index - lookup_and_evaluate(@scopes[index], key, raise_on_not_found: raise_on_not_found) + # Fast path: check top scope first (most common in for loops) + scope = @scopes[0] + if scope.key?(key) + variable = lookup_and_evaluate(scope, key, raise_on_not_found: raise_on_not_found) + elsif @scopes.length == 1 + # Only one scope and key not found — go straight to environments + variable = try_variable_find_in_environments(key, raise_on_not_found: raise_on_not_found) else - try_variable_find_in_environments(key, raise_on_not_found: raise_on_not_found) + # Multiple scopes — search through all of them + index = @scopes.find_index { |s| s.key?(key) } + + variable = if index + lookup_and_evaluate(@scopes[index], key, raise_on_not_found: raise_on_not_found) + else + try_variable_find_in_environments(key, raise_on_not_found: raise_on_not_found) + end end # update variable's context before invoking #to_liquid + # Fast path: primitive types don't need context= or to_liquid conversion + case variable + when String, Integer, Float, NilClass, TrueClass, FalseClass + return variable + when Array, Hash, Time + return variable + end + variable.context = self if variable.respond_to?(:context=) liquid_variable = variable.to_liquid - liquid_variable.context = self 
if variable != liquid_variable && liquid_variable.respond_to?(:context=) + if variable != liquid_variable + liquid_variable.context = self if liquid_variable.respond_to?(:context=) + end liquid_variable end @@ -228,6 +262,7 @@ def lookup_and_evaluate(obj, key, raise_on_not_found: true) end def with_disabled_tags(tag_names) + @disabled_tags = {} if @disabled_tags.frozen? tag_names.each do |name| @disabled_tags[name] = @disabled_tags.fetch(name, 0) + 1 end diff --git a/lib/liquid/cursor.rb b/lib/liquid/cursor.rb new file mode 100644 index 000000000..0d0bd4cf6 --- /dev/null +++ b/lib/liquid/cursor.rb @@ -0,0 +1,362 @@ +# frozen_string_literal: true + +require "strscan" + +module Liquid + # Single-pass forward-only scanner for Liquid parsing. + # Wraps StringScanner with higher-level methods for common Liquid constructs. + # One Cursor per template parse — threaded through all parsing code. + class Cursor + # Byte constants + SPACE = 32 + TAB = 9 + NL = 10 + CR = 13 + FF = 12 + DASH = 45 # '-' + DOT = 46 # '.' + COLON = 58 # ':' + PIPE = 124 # '|' + QUOTE_S = 39 # "'" + QUOTE_D = 34 # '"' + LBRACK = 91 # '[' + RBRACK = 93 # ']' + LPAREN = 40 # '(' + RPAREN = 41 # ')' + QMARK = 63 # '?' + HASH = 35 # '#' + USCORE = 95 # '_' + COMMA = 44 + ZERO = 48 + NINE = 57 + PCT = 37 # '%' + LCURLY = 123 # '{' + RCURLY = 125 # '}' + + attr_reader :ss + + def initialize(source) + @source = source + @ss = StringScanner.new(source) + end + + # ── Position ──────────────────────────────────────────────────── + def pos = @ss.pos + + def pos=(n) + @ss.pos = n + end + + def eos? = @ss.eos? 
+ def peek_byte = @ss.peek_byte + def scan_byte = @ss.scan_byte + + # Reset scanner to a new string (for reuse on sub-markup) + def reset(source) + @source = source + @ss.string = source + end + + # Extract a slice from the source (deferred allocation) + def slice(start, len) + @source.byteslice(start, len) + end + + # ── Whitespace ────────────────────────────────────────────────── + # Skip spaces/tabs/newlines/cr, return count of newlines skipped + def skip_ws + nl = 0 + while (b = @ss.peek_byte) + case b + when SPACE, TAB, CR, FF then @ss.scan_byte + when NL then @ss.scan_byte + nl += 1 + else break + end + end + nl + end + + # Check if remaining bytes are all whitespace + def rest_blank? + saved = @ss.pos + @ss.skip(/\s*/) + result = @ss.eos? + @ss.pos = saved + result + end + + # Regex for identifier: [a-zA-Z_][\w-]*\?? + ID_REGEX = /[a-zA-Z_][\w-]*\??/ + + # ── Identifiers ───────────────────────────────────────────────── + # Skip an identifier without allocating a string. Returns length skipped, or 0. + def skip_id + @ss.skip(ID_REGEX) || 0 + end + + # Check if next id matches expected string, consume if so. No allocation. + def expect_id(expected) + start = @ss.pos + len = @ss.skip(ID_REGEX) + if len == expected.bytesize + # Compare bytes directly without allocating a string + i = 0 + while i < len + unless @source.getbyte(start + i) == expected.getbyte(i) + @ss.pos = start + return false + end + i += 1 + end + return true + end + @ss.pos = start if len + false + end + + # Scan a single identifier: [a-zA-Z_][\w-]*\?? + # Returns the string or nil if not at an identifier + def scan_id + @ss.scan(ID_REGEX) + end + + # Scan a tag name: '#' or \w+ + def scan_tag_name + if @ss.peek_byte == HASH + @ss.scan_byte + "#" + else + scan_id + end + end + + # Regex for numbers: -?\d+(\.\d+)? + FLOAT_REGEX = /-?\d+\.\d+/ + INT_REGEX = /-?\d+/ + + # ── Numbers ───────────────────────────────────────────────────── + # Try to scan an integer or float. 
Returns the number or nil. + def scan_number + if (s = @ss.scan(FLOAT_REGEX)) + s.to_f + elsif (s = @ss.scan(INT_REGEX)) + s.to_i + end + end + + # Regex for quoted string content (without quotes) + SINGLE_QUOTED_CONTENT = /'([^']*)'/ + DOUBLE_QUOTED_CONTENT = /"([^"]*)"/ + + # ── Strings ───────────────────────────────────────────────────── + # Scan a quoted string ('...' or "..."). Returns the content without quotes, or nil. + def scan_quoted_string + if @ss.scan(SINGLE_QUOTED_CONTENT) || @ss.scan(DOUBLE_QUOTED_CONTENT) + @ss[1] + end + end + + # Regex for quoted strings (single or double quoted, including quotes) + QUOTED_STRING_RAW = /"[^"]*"|'[^']*'/ + + # Scan a quoted string including quotes. Returns the full "..." or '...' string, or nil. + def scan_quoted_string_raw + @ss.scan(QUOTED_STRING_RAW) + end + + # Regex for dotted identifier: name(.name)* + DOTTED_ID_REGEX = /[a-zA-Z_][\w-]*\??(?:\.[a-zA-Z_][\w-]*\??)*/ + + # ── Expressions ───────────────────────────────────────────────── + # Scan a simple variable lookup: name(.name)* — no brackets, no filters + # Returns the string or nil + def scan_dotted_id + @ss.scan(DOTTED_ID_REGEX) + end + + # Skip a fragment without allocating. Returns length skipped, or 0. + def skip_fragment + @ss.skip(QUOTED_STRING_RAW) || @ss.skip(UNQUOTED_FRAGMENT) || 0 + end + + # Regex for unquoted fragment: non-whitespace/comma/pipe sequence + UNQUOTED_FRAGMENT = /[^\s,|]+/ + + # Scan a "QuotedFragment" — a quoted string or non-whitespace/comma/pipe run + def scan_fragment + @ss.scan(QUOTED_STRING_RAW) || @ss.scan(UNQUOTED_FRAGMENT) + end + + # ── Comparison operators ──────────────────────────────────────── + COMPARISON_OPS = { + '==' => '==', + '!=' => '!=', + '<>' => '<>', + '<=' => '<=', + '>=' => '>=', + '<' => '<', + '>' => '>', + 'contains' => 'contains', + }.freeze + + # Scan a comparison operator. Returns frozen string or nil. 
+ # Regex for comparison operators + COMPARISON_OP_REGEX = /==|!=|<>|<=|>=|<|>|contains(?!\w)/ + + def scan_comparison_op + if (op = @ss.scan(COMPARISON_OP_REGEX)) + COMPARISON_OPS[op] + end + end + + # ── Tag parsing helpers ───────────────────────────────────────── + # Results from last parse_tag_token call (avoids array allocation) + attr_reader :tag_markup, :tag_newlines + + # Parse the interior of a tag token: "{%[-] tag_name markup [-]%}" + # Pure byte operations — avoids StringScanner reset overhead. + # Returns tag_name string or nil. Sets tag_markup and tag_newlines. + def parse_tag_token(token) + len = token.bytesize + pos = 2 # skip "{%" + pos += 1 if token.getbyte(pos) == DASH # skip '-' + nl = 0 + + # Skip whitespace, count newlines + while pos < len + b = token.getbyte(pos) + case b + when SPACE, TAB, CR, FF then pos += 1 + when NL then pos += 1; nl += 1 + else break + end + end + + # Scan tag name: '#' or [a-zA-Z_][\w-]* + name_start = pos + b = token.getbyte(pos) + if b == HASH + pos += 1 + elsif b && ((b >= 97 && b <= 122) || (b >= 65 && b <= 90) || b == USCORE) + pos += 1 + while pos < len + b = token.getbyte(pos) + break unless (b >= 97 && b <= 122) || (b >= 65 && b <= 90) || (b >= 48 && b <= 57) || b == USCORE || b == DASH + pos += 1 + end + pos += 1 if pos < len && token.getbyte(pos) == QMARK + else + return + end + tag_name = token.byteslice(name_start, pos - name_start) + + # Skip whitespace after tag name, count newlines + while pos < len + b = token.getbyte(pos) + case b + when SPACE, TAB, CR, FF then pos += 1 + when NL then pos += 1; nl += 1 + else break + end + end + + # markup is everything up to optional '-' before '%}' + markup_end = len - 2 + markup_end -= 1 if markup_end > pos && token.getbyte(markup_end - 1) == DASH + @tag_markup = pos >= markup_end ? "" : token.byteslice(pos, markup_end - pos) + @tag_newlines = nl + + tag_name + end + + # Parse variable token interior: extract markup from "{{[-] ... 
[-]}}" + def parse_variable_token(token) + len = token.bytesize + return if len < 4 + + i = 2 + i = 3 if token.getbyte(i) == DASH + parse_end = len - 3 + parse_end -= 1 if token.getbyte(parse_end) == DASH + markup_len = parse_end - i + 1 + markup_len <= 0 ? "" : token.byteslice(i, markup_len) + end + + # ── Simple condition parser ───────────────────────────────────── + # Results from last parse_simple_condition call + attr_reader :cond_left, :cond_op, :cond_right + + # Parse "expr [op expr]" from current position to end. + # Returns true on success, nil on failure. Sets cond_left, cond_op, cond_right. + def parse_simple_condition + skip_ws + @cond_left = scan_fragment + return unless @cond_left + + skip_ws + if eos? + @cond_op = nil + @cond_right = nil + return true + end + + @cond_op = scan_comparison_op + return unless @cond_op + + skip_ws + @cond_right = scan_fragment + return unless @cond_right + + skip_ws + return unless eos? # trailing junk + + true + end + # ── For tag parser ──────────────────────────────────────────────── + # Results from parse_for_markup + attr_reader :for_var, :for_collection, :for_reversed + + # Parse "var in collection [reversed] [limit:N] [offset:N]" + # Returns true on success, nil on failure. + def parse_for_markup + skip_ws + @for_var = scan_id + return unless @for_var + + skip_ws + # expect "in" + return unless scan_id == "in" + + skip_ws + # Collection: parenthesized range or fragment + if peek_byte == LPAREN + start = @ss.pos + depth = 1 + @ss.scan_byte + while !@ss.eos? 
&& depth > 0 + b = @ss.scan_byte + depth += 1 if b == LPAREN + depth -= 1 if b == RPAREN + end + @for_collection = @source.byteslice(start, @ss.pos - start) + else + @for_collection = scan_fragment + return unless @for_collection + end + + skip_ws + # Check for 'reversed' + saved = @ss.pos + word = scan_id + if word == "reversed" + @for_reversed = true + else + @for_reversed = false + @ss.pos = saved if word # rewind if we consumed a non-'reversed' word + end + + true + end + end +end diff --git a/lib/liquid/expression.rb b/lib/liquid/expression.rb index 00c40a4c3..c5fca063a 100644 --- a/lib/liquid/expression.rb +++ b/lib/liquid/expression.rb @@ -35,11 +35,18 @@ def safe_parse(parser, ss = StringScanner.new(""), cache = nil) def parse(markup, ss = StringScanner.new(""), cache = nil) return unless markup - markup = markup.strip # markup can be a frozen string + # Only strip if there's leading/trailing whitespace (avoids allocation) + first_byte = markup.getbyte(0) + if first_byte == 32 || first_byte == 9 || first_byte == 10 || first_byte == 13 # space, tab, \n, \r + markup = markup.strip + else + last_byte = markup.getbyte(markup.bytesize - 1) + markup = markup.strip if last_byte == 32 || last_byte == 9 || last_byte == 10 || last_byte == 13 + end if (markup.start_with?('"') && markup.end_with?('"')) || (markup.start_with?("'") && markup.end_with?("'")) - return markup[1..-2] + return markup.byteslice(1, markup.bytesize - 2) elsif LITERALS.key?(markup) return LITERALS[markup] end @@ -71,57 +78,78 @@ def inner_parse(markup, ss, cache) end end - def parse_number(markup, ss) - # check if the markup is simple integer or float - case markup - when INTEGER_REGEX - return Integer(markup, 10) - when FLOAT_REGEX - return markup.to_f - end + def parse_number(markup, _ss = nil) + len = markup.bytesize + return false if len == 0 - ss.string = markup - # the first byte must be a digit or a dash - byte = ss.scan_byte + # Quick reject: first byte must be digit or dash + pos = 0 + 
first = markup.getbyte(pos)
+      if first == DASH
+        pos += 1
+        return false if pos >= len

-      return false if byte != DASH && (byte < ZERO || byte > NINE)
+        b = markup.getbyte(pos)
+        return false if b < ZERO || b > NINE

-      if byte == DASH
-        peek_byte = ss.peek_byte
-
-        # if it starts with a dash, the next byte must be a digit
-        return false if peek_byte.nil? || !(peek_byte >= ZERO && peek_byte <= NINE)
+        pos += 1
+      elsif first >= ZERO && first <= NINE
+        pos += 1
+      else
+        return false
       end

-      # The markup could be a float with multiple dots
-      first_dot_pos = nil
-      num_end_pos = nil
+      # Scan digits
+      while pos < len
+        b = markup.getbyte(pos)
+        break if b < ZERO || b > NINE
+
+        pos += 1
+      end

-      while (byte = ss.scan_byte)
-        return false if byte != DOT && (byte < ZERO || byte > NINE)
+      # If we consumed everything, it's a simple integer
+      if pos == len
+        return Integer(markup, 10)
+      end

-        # we found our number and now we are just scanning the rest of the string
-        next if num_end_pos
+      # Check for dot (float)
+      if markup.getbyte(pos) == DOT
+        dot_pos = pos
+        pos += 1
+        # Must have at least one digit after dot
+        digit_after_dot = pos
+        while pos < len
+          b = markup.getbyte(pos)
+          break if b < ZERO || b > NINE
+
+          pos += 1
+        end

-        if byte == DOT
-          if first_dot_pos.nil?
-            first_dot_pos = ss.pos
-          else
-            # we found another dot, so we know that the number ends here
-            num_end_pos = ss.pos - 1
+        if pos > digit_after_dot && pos == len
+          # Simple float like "123.456"
+          return markup.to_f
+        elsif pos > digit_after_dot
+          # "d+.d+" followed by more characters: the number ends at the second
+          # dot, but only if the remainder is all digits/dots ("1.2.3.4" → 1.2,
+          # matching the scanner above); any other byte ("1.2.3a") is not a number
+          cut = pos
+          while pos < len
+            b = markup.getbyte(pos)
+            return false if b != DOT && (b < ZERO || b > NINE)
+
+            pos += 1
           end
+          return markup.byteslice(0, cut).to_f
+        else
+          # No digit after the dot: "123." parses as 123.0, but trailing
+          # non-number bytes ("1..x") are rejected, matching the old behavior
+          while pos < len
+            b = markup.getbyte(pos)
+            return false if b != DOT && (b < ZERO || b > NINE)
+
+            pos += 1
+          end
+          return markup.byteslice(0, dot_pos).to_f
         end
       end

-      num_end_pos = markup.length if ss.eos?
- - if num_end_pos - # number ends with a number "123.123" - markup.byteslice(0, num_end_pos).to_f - else - # number ends with a dot "123." - markup.byteslice(0, first_dot_pos).to_f - end + # Not a number (has non-digit, non-dot characters) + false end end end diff --git a/lib/liquid/lexer.rb b/lib/liquid/lexer.rb index f1740dbad..dfcdb5587 100644 --- a/lib/liquid/lexer.rb +++ b/lib/liquid/lexer.rb @@ -29,6 +29,7 @@ class Lexer RUBY_WHITESPACE = [" ", "\t", "\r", "\n", "\f"].freeze SINGLE_STRING_LITERAL = /'[^\']*'/ WHITESPACE_OR_NOTHING = /\s*/ + WHITESPACE = /\s+/ SINGLE_COMPARISON_TOKENS = [].tap do |table| table["<".ord] = COMPARISON_LESS_THAN @@ -104,7 +105,7 @@ def tokenize(ss) output = [] until ss.eos? - ss.skip(WHITESPACE_OR_NOTHING) + ss.skip(WHITESPACE) break if ss.eos? @@ -114,10 +115,10 @@ def tokenize(ss) if (special = SPECIAL_TABLE[peeked]) ss.scan_byte # Special case for ".." - if special == DOT && ss.peek_byte == DOT_ORD + if special.equal?(DOT) && ss.peek_byte == DOT_ORD ss.scan_byte output << DOTDOT - elsif special == DASH + elsif special.equal?(DASH) # Special case for negative numbers if (peeked_byte = ss.peek_byte) && NUMBER_TABLE[peeked_byte] ss.pos -= 1 diff --git a/lib/liquid/parse_context.rb b/lib/liquid/parse_context.rb index 855acc64e..d736319ec 100644 --- a/lib/liquid/parse_context.rb +++ b/lib/liquid/parse_context.rb @@ -3,7 +3,7 @@ module Liquid class ParseContext attr_accessor :locale, :line_number, :trim_whitespace, :depth - attr_reader :partial, :warnings, :error_mode, :environment + attr_reader :partial, :warnings, :error_mode, :environment, :expression_cache, :string_scanner, :cursor def initialize(options = Const::EMPTY_HASH) @environment = options.fetch(:environment, Environment.default) @@ -24,6 +24,8 @@ def initialize(options = Const::EMPTY_HASH) {} end + @cursor = Cursor.new("") + self.depth = 0 self.partial = false end diff --git a/lib/liquid/parser.rb b/lib/liquid/parser.rb index 645dfa3a1..0d0d0d019 100644 --- 
a/lib/liquid/parser.rb +++ b/lib/liquid/parser.rb @@ -83,6 +83,9 @@ def argument end def variable_lookups + # Fast path: no lookups at all (most common case for simple identifiers) + return "" unless look(:dot) || look(:open_square) + str = +"" loop do if look(:open_square) diff --git a/lib/liquid/registers.rb b/lib/liquid/registers.rb index 0b65d862c..88562c88c 100644 --- a/lib/liquid/registers.rb +++ b/lib/liquid/registers.rb @@ -6,15 +6,15 @@ class Registers def initialize(registers = {}) @static = registers.is_a?(Registers) ? registers.static : registers - @changes = {} + @changes = nil end def []=(key, value) - @changes[key] = value + (@changes ||= {})[key] = value end def [](key) - if @changes.key?(key) + if @changes&.key?(key) @changes[key] else @static[key] @@ -22,13 +22,13 @@ def [](key) end def delete(key) - @changes.delete(key) + @changes&.delete(key) end UNDEFINED = Object.new def fetch(key, default = UNDEFINED, &block) - if @changes.key?(key) + if @changes&.key?(key) @changes.fetch(key) elsif default != UNDEFINED if block_given? 
@@ -42,7 +42,7 @@ def fetch(key, default = UNDEFINED, &block) end def key?(key) - @changes.key?(key) || @static.key?(key) + @changes&.key?(key) || @static.key?(key) end end diff --git a/lib/liquid/resource_limits.rb b/lib/liquid/resource_limits.rb index 70fac24be..bb4086ea2 100644 --- a/lib/liquid/resource_limits.rb +++ b/lib/liquid/resource_limits.rb @@ -3,7 +3,7 @@ module Liquid class ResourceLimits attr_accessor :render_length_limit, :render_score_limit, :assign_score_limit - attr_reader :render_score, :assign_score + attr_reader :render_score, :assign_score, :last_capture_length def initialize(limits) @render_length_limit = limits[:render_length_limit] diff --git a/lib/liquid/standardfilters.rb b/lib/liquid/standardfilters.rb index a91462f43..edc402170 100644 --- a/lib/liquid/standardfilters.rb +++ b/lib/liquid/standardfilters.rb @@ -266,18 +266,54 @@ def truncatewords(input, words = 15, truncate_string = "...") words = Utils.to_integer(words) words = 1 if words <= 0 - wordlist = begin - input.split(" ", words + 1) - rescue RangeError - # integer too big for String#split, but we can semantically assume no truncation is needed - return input if words + 1 > MAX_I32 - raise # unexpected error + return input if words + 1 > MAX_I32 + + # Build result incrementally — avoids split() array + string allocations + len = input.bytesize + pos = 0 + word_count = 0 + result = nil + + # Skip leading whitespace + while pos < len + b = input.getbyte(pos) + break unless b == 32 || b == 9 || b == 10 || b == 13 || b == 12 + pos += 1 + end + + while pos < len + word_start = pos + word_count += 1 + + # Skip non-whitespace chars (word body) + while pos < len + b = input.getbyte(pos) + break if b == 32 || b == 9 || b == 10 || b == 13 || b == 12 + pos += 1 + end + + if word_count > words + # Truncate — result already has the first N words + truncate_string = Utils.to_s(truncate_string) + return result.concat(truncate_string) + end + + # Append word to result (only allocate result when 
we know truncation is possible) + if result + result << " " << input.byteslice(word_start, pos - word_start) + else + result = +input.byteslice(word_start, pos - word_start) + end + + # Skip whitespace between words + while pos < len + b = input.getbyte(pos) + break unless b == 32 || b == 9 || b == 10 || b == 13 || b == 12 + pos += 1 + end end - return input if wordlist.length <= words - wordlist.pop - truncate_string = Utils.to_s(truncate_string) - wordlist.join(" ").concat(truncate_string) + input end # @liquid_public_docs diff --git a/lib/liquid/strainer_template.rb b/lib/liquid/strainer_template.rb index ca0626dda..d01c13811 100644 --- a/lib/liquid/strainer_template.rb +++ b/lib/liquid/strainer_template.rb @@ -58,5 +58,32 @@ def invoke(method, *args) rescue ::ArgumentError => e raise Liquid::ArgumentError, e.message, e.backtrace end + + # Fast path for single-argument (no extra args) filter invocation. + # Avoids *args splat allocation for the common {{ value | filter }} case. + def invoke_single(method, input) + if self.class.invokable?(method) + send(method, input) + elsif @context.strict_filters + raise Liquid::UndefinedFilter, "undefined filter #{method}" + else + input + end + rescue ::ArgumentError => e + raise Liquid::ArgumentError, e.message, e.backtrace + end + + # Fast path for two-argument filter invocation (input + one arg). 
+ def invoke_two(method, input, arg1) + if self.class.invokable?(method) + send(method, input, arg1) + elsif @context.strict_filters + raise Liquid::UndefinedFilter, "undefined filter #{method}" + else + input + end + rescue ::ArgumentError => e + raise Liquid::ArgumentError, e.message, e.backtrace + end end end diff --git a/lib/liquid/tags/for.rb b/lib/liquid/tags/for.rb index cbea85bcb..0eb953823 100644 --- a/lib/liquid/tags/for.rb +++ b/lib/liquid/tags/for.rb @@ -72,18 +72,54 @@ def render_to_output_buffer(context, output) protected + # Fast byte-level parser for "var in collection [reversed] [limit:N] [offset:N]" + REVERSED_BYTES = "reversed".bytes.freeze + def lax_parse(markup) - if markup =~ Syntax - @variable_name = Regexp.last_match(1) - collection_name = Regexp.last_match(2) - @reversed = !!Regexp.last_match(3) - @name = "#{@variable_name}-#{collection_name}" - @collection_name = parse_expression(collection_name) - markup.scan(TagAttributes) do |key, value| - set_attribute(key, value) + c = @parse_context.cursor + c.reset(markup) + c.skip_ws + + # Parse variable name + var_start = c.pos + var_len = c.skip_id + raise SyntaxError, options[:locale].t("errors.syntax.for") if var_len == 0 + @variable_name = c.slice(var_start, var_len) + + # Expect "in" + c.skip_ws + raise SyntaxError, options[:locale].t("errors.syntax.for") unless c.expect_id("in") + c.skip_ws + + # Parse collection name + col_start = c.pos + if c.peek_byte == Cursor::LPAREN + # Parenthesized range: (1..10) + depth = 1 + c.scan_byte + while !c.eos? 
&& depth > 0 + b = c.scan_byte + depth += 1 if b == Cursor::LPAREN + depth -= 1 if b == Cursor::RPAREN end else - raise SyntaxError, options[:locale].t("errors.syntax.for") + c.skip_fragment + end + collection_name = c.slice(col_start, c.pos - col_start) + + @name = "#{@variable_name}-#{collection_name}" + @collection_name = parse_expression(collection_name) + + c.skip_ws + @reversed = c.expect_id("reversed") + c.skip_ws + + # Parse limit:/offset: if present + if !c.eos? && markup.include?(':') + rest = c.slice(c.pos, markup.bytesize - c.pos) + rest.scan(TagAttributes) do |key, value| + set_attribute(key, value) + end end end diff --git a/lib/liquid/tags/if.rb b/lib/liquid/tags/if.rb index c423c1e84..9ad58b5f3 100644 --- a/lib/liquid/tags/if.rb +++ b/lib/liquid/tags/if.rb @@ -51,14 +51,17 @@ def unknown_tag(tag, markup, tokens) end def render_to_output_buffer(context, output) - @blocks.each do |block| - result = Liquid::Utils.to_liquid_value( - block.evaluate(context), - ) + idx = 0 + blocks = @blocks + while idx < blocks.length + block = blocks[idx] + result = block.evaluate(context) + result = result.to_liquid_value if result.respond_to?(:to_liquid_value) if result return block.attachment.render_to_output_buffer(context, output) end + idx += 1 end output @@ -86,6 +89,24 @@ def parse_expression(markup, safe: false) end def lax_parse(markup) + # Fastest path: simple identifier truthiness like "product.available" or "forloop.first" + if (simple = Variable.simple_variable_markup(markup)) + return Condition.new(parse_expression(simple)) + end + + # Fast path: simple condition without and/or — use Cursor + if !markup.include?(' and ') && !markup.include?(' or ') + cursor = @parse_context.cursor + cursor.reset(markup) + if cursor.parse_simple_condition + return Condition.new( + parse_expression(cursor.cond_left), + cursor.cond_op, + cursor.cond_right ? 
parse_expression(cursor.cond_right) : nil, + ) + end + end + expressions = markup.scan(ExpressionsAndOperators) raise SyntaxError, options[:locale].t("errors.syntax.if") unless expressions.pop =~ Syntax diff --git a/lib/liquid/tokenizer.rb b/lib/liquid/tokenizer.rb index 8b331d93c..59b4c47e7 100644 --- a/lib/liquid/tokenizer.rb +++ b/lib/liquid/tokenizer.rb @@ -6,10 +6,6 @@ module Liquid class Tokenizer attr_reader :line_number, :for_liquid_tag - TAG_END = /%\}/ - TAG_OR_VARIABLE_START = /\{[\{\%]/ - NEWLINE = /\n/ - OPEN_CURLEY = "{".ord CLOSE_CURLEY = "}".ord PERCENTAGE = "%".ord @@ -27,11 +23,7 @@ def initialize( @offset = 0 @tokens = [] - if @source - @ss = string_scanner - @ss.string = @source - tokenize - end + tokenize if @source end def shift @@ -54,108 +46,127 @@ def tokenize if @for_liquid_tag @tokens = @source.split("\n") else - @tokens << shift_normal until @ss.eos? + tokenize_fast end @source = nil @ss = nil end - def shift_normal - token = next_token - - return unless token - - token - end - - def next_token - # possible states: :text, :tag, :variable - byte_a = @ss.peek_byte - - if byte_a == OPEN_CURLEY - @ss.scan_byte - - byte_b = @ss.peek_byte - - if byte_b == PERCENTAGE - @ss.scan_byte - return next_tag_token - elsif byte_b == OPEN_CURLEY - @ss.scan_byte - return next_variable_token - end - - @ss.pos -= 1 + # Fast tokenizer using String#index instead of StringScanner regex. + # String#index is ~40% faster for finding { delimiters. + def tokenize_fast + src = @source + unless src.valid_encoding? 
+ raise SyntaxError, "Invalid byte sequence in #{src.encoding}" end - next_text_token - end + len = src.bytesize + pos = 0 - def next_text_token - start = @ss.pos + while pos < len + # Find next { which could start a tag or variable + idx = src.byteindex('{', pos) - unless @ss.skip_until(TAG_OR_VARIABLE_START) - token = @ss.rest - @ss.terminate - return token - end - - pos = @ss.pos -= 2 - @source.byteslice(start, pos - start) - rescue ::ArgumentError => e - if e.message == "invalid byte sequence in #{@ss.string.encoding}" - raise SyntaxError, "Invalid byte sequence in #{@ss.string.encoding}" - else - raise - end - end - - def next_variable_token - start = @ss.pos - 2 - - byte_a = byte_b = @ss.scan_byte + unless idx + # No more tags/variables — rest is text + @tokens << src.byteslice(pos, len - pos) if pos < len + break + end - while byte_b - byte_a = @ss.scan_byte while byte_a && byte_a != CLOSE_CURLEY && byte_a != OPEN_CURLEY + next_byte = idx + 1 < len ? src.getbyte(idx + 1) : nil - break unless byte_a + if next_byte == PERCENTAGE # {% + # Emit text before tag + @tokens << src.byteslice(pos, idx - pos) if idx > pos - if @ss.eos? - return byte_a == CLOSE_CURLEY ? 
@source.byteslice(start, @ss.pos - start) : "{{" - end - - byte_b = @ss.scan_byte + # Find %} to close the tag + close = src.byteindex('%}', idx + 2) + if close + @tokens << src.byteslice(idx, close + 2 - idx) + pos = close + 2 + else + @tokens << "{%" + pos = idx + 2 + end + elsif next_byte == OPEN_CURLEY # {{ + # Emit text before variable + @tokens << src.byteslice(pos, idx - pos) if idx > pos + + # Scan variable token — matches original tokenizer's byte-by-byte logic: + # Find } or {, then check next byte for }}/{% nesting + scan_pos = idx + 2 + found = false + while scan_pos < len + b = src.getbyte(scan_pos) + if b == CLOSE_CURLEY # } + if scan_pos + 1 >= len + # } at end of string — emit token up to here + @tokens << src.byteslice(idx, scan_pos + 1 - idx) + pos = scan_pos + 1 + found = true + break + end + b2 = src.getbyte(scan_pos + 1) + if b2 == CLOSE_CURLEY + # Found }} — close variable + @tokens << src.byteslice(idx, scan_pos + 2 - idx) + pos = scan_pos + 2 + found = true + break + else + # } followed by non-} — emit token up to here (matches original: @ss.pos -= 1) + @tokens << src.byteslice(idx, scan_pos + 1 - idx) + pos = scan_pos + 1 + found = true + break + end + elsif b == OPEN_CURLEY + if scan_pos + 1 < len && src.getbyte(scan_pos + 1) == PERCENTAGE + # Found {% inside {{ — scan to %} and emit as one token + close = src.byteindex('%}', scan_pos + 2) + if close + @tokens << src.byteslice(idx, close + 2 - idx) + pos = close + 2 + else + @tokens << src.byteslice(idx, len - idx) + pos = len + end + found = true + break + end + scan_pos += 1 + else + scan_pos += 1 + end + end - if byte_a == CLOSE_CURLEY - if byte_b == CLOSE_CURLEY - return @source.byteslice(start, @ss.pos - start) - elsif byte_b != CLOSE_CURLEY - @ss.pos -= 1 - return @source.byteslice(start, @ss.pos - start) + unless found + @tokens << "{{" + pos = idx + 2 + end + else + # { followed by something else — it's text + # Keep scanning from after this { + # Find next { that could be {% or {{ 
+          next_open = idx + 1
+          found_open = false
+          while next_open < len
+            ni = src.byteindex('{', next_open)
+            break unless ni
+            nb = ni + 1 < len ? src.getbyte(ni + 1) : nil
+            if nb == PERCENTAGE || nb == OPEN_CURLEY
+              @tokens << src.byteslice(pos, ni - pos)
+              pos = ni
+              found_open = true
+              break
+            end
+            next_open = ni + 1
+          end
+          unless found_open
+            # No further {%/{{ opener (including a lone trailing '{') — the
+            # rest is text. Without this, pos never advances and the outer
+            # loop spins forever on inputs like "hello {"
+            @tokens << src.byteslice(pos, len - pos)
+            pos = len
+          end
-        elsif byte_a == OPEN_CURLEY && byte_b == PERCENTAGE
-          return next_tag_token_with_start(start)
         end
-
-        byte_a = byte_b
-      end
-
-      "{{"
-    end
-
-    def next_tag_token
-      start = @ss.pos - 2
-      if (len = @ss.skip_until(TAG_END))
-        @source.byteslice(start, len + 2)
-      else
-        "{%"
       end
     end
-
-    def next_tag_token_with_start(start)
-      @ss.skip_until(TAG_END)
-      @source.byteslice(start, @ss.pos - start)
-    end
   end
 end
diff --git a/lib/liquid/utils.rb b/lib/liquid/utils.rb
index 084739a21..a2b8f447c 100644
--- a/lib/liquid/utils.rb
+++ b/lib/liquid/utils.rb
@@ -8,6 +8,9 @@ module Utils
     def self.slice_collection(collection, from, to)
       if (from != 0 || !to.nil?) && collection.respond_to?(:load_slice)
         collection.load_slice(from, to)
+      elsif from == 0 && to.nil? && collection.is_a?(Array)
+        # Fast path: no offset/limit on an Array — return as-is (avoid copy)
+        collection
       else
         slice_collection_using_each(collection, from, to)
       end
@@ -93,8 +96,14 @@ def self.to_liquid_value(obj)
       obj
     end

-    def self.to_s(obj, seen = {})
+    # Cached string representations for common small integers (0-999)
+    # Avoids repeated Integer#to_s allocations during rendering
+    SMALL_INT_STRINGS = Array.new(1000) { |i| i.to_s.freeze }.freeze
+
+    def self.to_s(obj, seen = nil)
       case obj
+      when Integer
+        return (obj >= 0 && obj < 1000) ? SMALL_INT_STRINGS[obj] : obj.to_s
       when BigDecimal
         obj.to_s("F")
       when Hash
@@ -102,30 +111,30 @@ def self.to_s(obj, seen = {})
           # custom implementation. Otherwise we use Liquid's default
           # implementation.
if obj.class.instance_method(:to_s) == HASH_TO_S_METHOD - hash_inspect(obj, seen) + hash_inspect(obj, seen || {}) else obj.to_s end when Array - array_inspect(obj, seen) + array_inspect(obj, seen || {}) else obj.to_s end end - def self.inspect(obj, seen = {}) + def self.inspect(obj, seen = nil) case obj when Hash # If the custom hash implementation overrides `#inspect`, use their # custom implementation. Otherwise we use Liquid's default # implementation. if obj.class.instance_method(:inspect) == HASH_INSPECT_METHOD - hash_inspect(obj, seen) + hash_inspect(obj, seen || {}) else obj.inspect end when Array - array_inspect(obj, seen) + array_inspect(obj, seen || {}) else obj.inspect end diff --git a/lib/liquid/variable.rb b/lib/liquid/variable.rb index 6b5fb412b..34faa5fbe 100644 --- a/lib/liquid/variable.rb +++ b/lib/liquid/variable.rb @@ -12,6 +12,68 @@ module Liquid # {{ user | link }} # class Variable + # Checks if markup is a simple "name.lookup.chain" with no filters/brackets/quotes. + # Returns the trimmed markup string, or nil if not simple. + # Avoids regex MatchData allocation. + def self.simple_variable_markup(markup) + len = markup.bytesize + return if len == 0 + + # Skip leading whitespace + pos = 0 + while pos < len + b = markup.getbyte(pos) + break unless b == 32 || b == 9 || b == 10 || b == 13 + pos += 1 + end + return if pos >= len + + start = pos + + # First char must be [a-zA-Z_] + b = markup.getbyte(pos) + return unless (b >= 97 && b <= 122) || (b >= 65 && b <= 90) || b == 95 + pos += 1 + + # Scan segments: [\w-]* (. [\w-]*)* + while pos < len + b = markup.getbyte(pos) + if (b >= 97 && b <= 122) || (b >= 65 && b <= 90) || (b >= 48 && b <= 57) || b == 95 || b == 45 + pos += 1 + elsif b == 46 # '.' 
+ pos += 1 + # After dot, must have [a-zA-Z_] + return if pos >= len + b = markup.getbyte(pos) + return unless (b >= 97 && b <= 122) || (b >= 65 && b <= 90) || b == 95 + pos += 1 + else + break + end + end + + content_end = pos + + # Skip trailing whitespace + while pos < len + b = markup.getbyte(pos) + return unless b == 32 || b == 9 || b == 10 || b == 13 + pos += 1 + end + + # Must have consumed everything + return unless pos == len + + if start == 0 && content_end == len + markup + else + markup.byteslice(start, content_end - start) + end + end + + # Cache for [filtername, EMPTY_ARRAY] tuples — avoids repeated array creation + NO_ARG_FILTER_CACHE = Hash.new { |h, k| h[k] = [k, Const::EMPTY_ARRAY].freeze } + FilterMarkupRegex = /#{FilterSeparator}\s*(.*)/om FilterParser = /(?:\s+|#{QuotedFragment}|#{ArgumentSeparator})+/o FilterArgsRegex = /(?:#{FilterArgumentSeparator}|#{ArgumentSeparator})\s*((?:\w+\s*\:\s*)?#{QuotedFragment})/o @@ -30,7 +92,225 @@ def initialize(markup, parse_context) @parse_context = parse_context @line_number = parse_context.line_number - strict_parse_with_error_mode_fallback(markup) + # Fast path: try to parse without going through Lexer → Parser + # Skip for strict2/rigid modes which require different parsing + # Fast path only for lax/warn modes — strict modes need full error checking + error_mode = parse_context.error_mode + if error_mode == :strict2 || error_mode == :rigid || error_mode == :strict || !try_fast_parse(markup, parse_context) + strict_parse_with_error_mode_fallback(markup) + end + end + + private def try_fast_parse(markup, parse_context) + len = markup.bytesize + return false if len == 0 + + # Skip leading whitespace + pos = 0 + while pos < len + b = markup.getbyte(pos) + break unless b == 32 || b == 9 || b == 10 || b == 13 + pos += 1 + end + return false if pos >= len + + b = markup.getbyte(pos) + + if b == 39 || b == 34 # single or double quote + # Quoted string literal: scan to matching close quote + quote = b + 
name_start = pos + pos += 1 + pos += 1 while pos < len && markup.getbyte(pos) != quote + pos += 1 if pos < len # skip closing quote + name_end = pos + elsif (b >= 97 && b <= 122) || (b >= 65 && b <= 90) || b == 95 + # Identifier: scan [\w-]*(\.[\w-]*)* + name_start = pos + pos += 1 + while pos < len + b = markup.getbyte(pos) + if (b >= 97 && b <= 122) || (b >= 65 && b <= 90) || (b >= 48 && b <= 57) || b == 95 || b == 45 + pos += 1 + elsif b == 46 # '.' + pos += 1 + return false if pos >= len + b = markup.getbyte(pos) + return false unless (b >= 97 && b <= 122) || (b >= 65 && b <= 90) || b == 95 + pos += 1 + else + break + end + end + name_end = pos + else + return false + end + + # Skip whitespace after name + while pos < len + b = markup.getbyte(pos) + break unless b == 32 || b == 9 || b == 10 || b == 13 + pos += 1 + end + + # Resolve the name expression — avoid byteslice when markup is already the name + expr_markup = if name_start == 0 && name_end == len + markup # no whitespace, no filters — reuse the string + else + markup.byteslice(name_start, name_end - name_start) + end + cache = parse_context.expression_cache + ss = parse_context.string_scanner + + first_byte = expr_markup.getbyte(0) + @name = if first_byte == 39 || first_byte == 34 # quoted string + # Strip quotes for string literal + expr_markup.byteslice(1, expr_markup.bytesize - 2) + elsif Expression::LITERALS.key?(expr_markup) + Expression::LITERALS[expr_markup] + elsif cache + cache[expr_markup] || (cache[expr_markup] = VariableLookup.parse_simple(expr_markup, ss, cache).freeze) + else + VariableLookup.parse_simple(expr_markup, ss || StringScanner.new(""), nil).freeze + end + + # End of markup? No filters. 
+ if pos >= len + @filters = Const::EMPTY_ARRAY + return true + end + + # Must be a pipe for filters + return false unless markup.getbyte(pos) == 124 # '|' + + # Try fast filter scanning first — handles no-arg and simple-arg filters + # Falls through to Lexer-based parsing for complex cases + @filters = [] + filter_pos = pos + + while filter_pos < len && markup.getbyte(filter_pos) == 124 # '|' + filter_pos += 1 + # Skip whitespace + filter_pos += 1 while filter_pos < len && markup.getbyte(filter_pos) == 32 + + # Scan filter name + fname_start = filter_pos + b = filter_pos < len ? markup.getbyte(filter_pos) : nil + break unless b && ((b >= 97 && b <= 122) || (b >= 65 && b <= 90) || b == 95) + filter_pos += 1 + while filter_pos < len + b = markup.getbyte(filter_pos) + break unless (b >= 97 && b <= 122) || (b >= 65 && b <= 90) || (b >= 48 && b <= 57) || b == 95 || b == 45 + filter_pos += 1 + end + filtername = markup.byteslice(fname_start, filter_pos - fname_start) + + # Skip whitespace + filter_pos += 1 while filter_pos < len && markup.getbyte(filter_pos) == 32 + + # Has arguments — try fast scanning for positional args + if filter_pos < len && markup.getbyte(filter_pos) == 58 # ':' + filter_pos += 1 # skip ':' + filter_pos += 1 while filter_pos < len && markup.getbyte(filter_pos) == 32 + + filter_args = [] + fall_to_lexer = false + + loop do + arg_start = filter_pos + b = filter_pos < len ? 
markup.getbyte(filter_pos) : nil + + if b == 39 || b == 34 # quoted string + quote = b + filter_pos += 1 + filter_pos += 1 while filter_pos < len && markup.getbyte(filter_pos) != quote + filter_pos += 1 if filter_pos < len # skip closing quote + filter_args << markup.byteslice(arg_start + 1, filter_pos - arg_start - 2) + elsif b && ((b >= 48 && b <= 57) || (b == 45 && filter_pos + 1 < len && markup.getbyte(filter_pos + 1) >= 48 && markup.getbyte(filter_pos + 1) <= 57)) + # Number + filter_pos += 1 if b == 45 + filter_pos += 1 while filter_pos < len && markup.getbyte(filter_pos) >= 48 && markup.getbyte(filter_pos) <= 57 + if filter_pos < len && markup.getbyte(filter_pos) == 46 # float + filter_pos += 1 + filter_pos += 1 while filter_pos < len && markup.getbyte(filter_pos) >= 48 && markup.getbyte(filter_pos) <= 57 + end + num_str = markup.byteslice(arg_start, filter_pos - arg_start) + filter_args << (num_str.include?('.') ? num_str.to_f : num_str.to_i) + elsif b && ((b >= 97 && b <= 122) || (b >= 65 && b <= 90) || b == 95) + # Identifier + id_start = filter_pos + filter_pos += 1 + while filter_pos < len + b2 = markup.getbyte(filter_pos) + break unless (b2 >= 97 && b2 <= 122) || (b2 >= 65 && b2 <= 90) || (b2 >= 48 && b2 <= 57) || b2 == 95 || b2 == 45 || b2 == 46 + filter_pos += 1 + end + filter_pos += 1 if filter_pos < len && markup.getbyte(filter_pos) == 63 + + # Check if keyword arg (id followed by ':') + kw_check = filter_pos + kw_check += 1 while kw_check < len && markup.getbyte(kw_check) == 32 + if kw_check < len && markup.getbyte(kw_check) == 58 + fall_to_lexer = true + break + end + + id_markup = markup.byteslice(id_start, filter_pos - id_start) + filter_args << Expression.parse(id_markup, parse_context.string_scanner, parse_context.expression_cache) + else + fall_to_lexer = true + break + end + + # Skip whitespace after arg + filter_pos += 1 while filter_pos < len && markup.getbyte(filter_pos) == 32 + + # Comma = more args; pipe/end = done + if filter_pos < 
len && markup.getbyte(filter_pos) == 44 + filter_pos += 1 + filter_pos += 1 while filter_pos < len && markup.getbyte(filter_pos) == 32 + else + break + end + end + + if fall_to_lexer + # Complex filter — fall to Lexer for this and remaining filters + rest_start = fname_start + rest_start -= 1 while rest_start > pos && markup.getbyte(rest_start) != 124 + rest_markup = markup.byteslice(rest_start, len - rest_start) + p = parse_context.new_parser(rest_markup) + while p.consume?(:pipe) + fn = p.consume(:id) + fa = p.consume?(:colon) ? parse_filterargs(p) : Const::EMPTY_ARRAY + @filters << lax_parse_filter_expressions(fn, fa) + end + p.consume(:end_of_string) + @filters = Const::EMPTY_ARRAY if @filters.empty? + return true + end + + @filters << [filtername, filter_args] + else + # No args — add as simple filter + @filters << NO_ARG_FILTER_CACHE[filtername] + end + + # Skip whitespace between filters + filter_pos += 1 while filter_pos < len && (markup.getbyte(filter_pos) == 32 || markup.getbyte(filter_pos) == 9 || markup.getbyte(filter_pos) == 10 || markup.getbyte(filter_pos) == 13) + end + + # Must have consumed everything + return false if filter_pos < len + + @filters = Const::EMPTY_ARRAY if @filters.empty? + true + rescue SyntaxError + # If fast parse fails, fall back to full parse + @name = nil + @filters = nil + false end def raw @@ -42,7 +322,7 @@ def markup_context(markup) end def lax_parse(markup) - @filters = [] + @filters = Const::EMPTY_ARRAY return unless markup =~ MarkupWithQuotedFragment name_markup = Regexp.last_match(1) @@ -54,19 +334,21 @@ def lax_parse(markup) next unless f =~ /\w+/ filtername = Regexp.last_match(0) filterargs = f.scan(FilterArgsRegex).flatten + @filters = [] if @filters.frozen? 
@filters << lax_parse_filter_expressions(filtername, filterargs) end end end def strict_parse(markup) - @filters = [] + @filters = Const::EMPTY_ARRAY p = @parse_context.new_parser(markup) return if p.look(:end_of_string) @name = parse_context.safe_parse_expression(p) while p.consume?(:pipe) + @filters = [] if @filters.frozen? filtername = p.consume(:id) filterargs = p.consume?(:colon) ? parse_filterargs(p) : Const::EMPTY_ARRAY @filters << lax_parse_filter_expressions(filtername, filterargs) @@ -75,13 +357,16 @@ def strict_parse(markup) end def strict2_parse(markup) - @filters = [] + @filters = Const::EMPTY_ARRAY p = @parse_context.new_parser(markup) return if p.look(:end_of_string) @name = parse_context.safe_parse_expression(p) - @filters << strict2_parse_filter_expressions(p) while p.consume?(:pipe) + while p.consume?(:pipe) + @filters = [] if @filters.frozen? + @filters << strict2_parse_filter_expressions(p) + end p.consume(:end_of_string) end @@ -97,24 +382,37 @@ def render(context) obj = context.evaluate(@name) @filters.each do |filter_name, filter_args, filter_kwargs| - filter_args = evaluate_filter_expressions(context, filter_args, filter_kwargs) - obj = context.invoke(filter_name, obj, *filter_args) + if filter_args.empty? && !filter_kwargs + obj = context.invoke_single(filter_name, obj) + elsif !filter_kwargs && filter_args.length == 1 + # Single positional arg — most common after no-arg + obj = context.invoke_two(filter_name, obj, context.evaluate(filter_args[0])) + else + filter_args = evaluate_filter_expressions(context, filter_args, filter_kwargs) + obj = context.invoke(filter_name, obj, *filter_args) + end end context.apply_global_filter(obj) end def render_to_output_buffer(context, output) - obj = render(context) + # Fast path: no filters and no global filter + obj = if @filters.empty? && context.global_filter.nil? 
+ context.evaluate(@name) + else + render(context) + end render_obj_to_output(obj, output) output end def render_obj_to_output(obj, output) - case obj - when NilClass + if obj.instance_of?(String) + output << obj + elsif obj.nil? # Do nothing - when Array + elsif obj.instance_of?(Array) obj.each do |o| render_obj_to_output(o, output) end @@ -128,7 +426,7 @@ def disabled?(_context) end def disabled_tags - [] + Const::EMPTY_ARRAY end private @@ -137,7 +435,8 @@ def lax_parse_filter_expressions(filter_name, unparsed_args) filter_args = [] keyword_args = nil unparsed_args.each do |a| - if (matches = a.match(JustTagAttributes)) + # Fast check: keyword args must contain ':' + if a.include?(':') && (matches = a.match(JustTagAttributes)) keyword_args ||= {} keyword_args[matches[1]] = parse_context.parse_expression(matches[2]) else @@ -190,15 +489,19 @@ def end_of_arguments?(p) end def evaluate_filter_expressions(context, filter_args, filter_kwargs) - parsed_args = filter_args.map { |expr| context.evaluate(expr) } if filter_kwargs + parsed_args = filter_args.map { |expr| context.evaluate(expr) } parsed_kwargs = {} filter_kwargs.each do |key, expr| parsed_kwargs[key] = context.evaluate(expr) end parsed_args << parsed_kwargs + parsed_args + elsif filter_args.empty? + Const::EMPTY_ARRAY + else + filter_args.map { |expr| context.evaluate(expr) } end - parsed_args end class ParseTreeVisitor < Liquid::ParseTreeVisitor diff --git a/lib/liquid/variable_lookup.rb b/lib/liquid/variable_lookup.rb index 4fba2a658..6fcf6e6c0 100644 --- a/lib/liquid/variable_lookup.rb +++ b/lib/liquid/variable_lookup.rb @@ -10,8 +10,108 @@ def self.parse(markup, string_scanner = StringScanner.new(""), cache = nil) new(markup, string_scanner, cache) end - def initialize(markup, string_scanner = StringScanner.new(""), cache = nil) - lookups = markup.scan(VariableParser) + # Fast parse that skips simple_lookup? 
check — caller guarantees simple identifier chain + def self.parse_simple(markup, string_scanner = nil, cache = nil) + new(markup, string_scanner, cache, true) + end + + # Fast manual scanner replacing markup.scan(VariableParser) + # VariableParser = /\[(?>[^\[\]]+|\g<0>)*\]|[\w-]+\??/ + # Splits "product.variants[0].title" into ["product", "variants", "[0]", "title"] + def self.scan_variable(markup) + result = [] + pos = 0 + len = markup.bytesize + + while pos < len + byte = markup.getbyte(pos) + + if byte == 91 # '[' + # Scan balanced brackets + depth = 1 + start = pos + pos += 1 + while pos < len && depth > 0 + b = markup.getbyte(pos) + if b == 91 + depth += 1 + elsif b == 93 + depth -= 1 + end + pos += 1 + end + if depth == 0 + result << markup.byteslice(start, pos - start) + else + # Unbalanced bracket - skip '[' and continue + pos = start + 1 + end + elsif byte == 46 # '.' + pos += 1 + elsif (byte >= 97 && byte <= 122) || (byte >= 65 && byte <= 90) || (byte >= 48 && byte <= 57) || byte == 95 || byte == 45 # \w or - + start = pos + pos += 1 + while pos < len + b = markup.getbyte(pos) + break unless (b >= 97 && b <= 122) || (b >= 65 && b <= 90) || (b >= 48 && b <= 57) || b == 95 || b == 45 + pos += 1 + end + # Check trailing '?' + if pos < len && markup.getbyte(pos) == 63 + pos += 1 + end + result << markup.byteslice(start, pos - start) + else + pos += 1 + end + end + + result + end + + # Check if markup is a simple identifier chain: [\w-]+\??(.[\w-]+\??)* + # Uses C-level match? — 8x faster than Ruby byte scanning + SIMPLE_LOOKUP_RE = /\A[\w-]+\??(?:\.[\w-]+\??)*\z/ + + def self.simple_lookup?(markup) + markup.bytesize > 0 && markup.match?(SIMPLE_LOOKUP_RE) + end + + def initialize(markup, string_scanner = StringScanner.new(""), cache = nil, simple = false) + # Fast path: simple identifier chain without brackets + if simple || self.class.simple_lookup?(markup) + dot_pos = markup.index('.') + if dot_pos.nil? 
+ @name = markup + @lookups = Const::EMPTY_ARRAY + @command_flags = 0 + return + end + @name = markup.byteslice(0, dot_pos) + # Build lookups array from remaining dot-separated segments + lookups = [] + @command_flags = 0 + pos = dot_pos + 1 + len = markup.bytesize + while pos < len + seg_start = pos + while pos < len + b = markup.getbyte(pos) + break if b == 46 # '.' + pos += 1 + end + seg = markup.byteslice(seg_start, pos - seg_start) + if COMMAND_METHODS.include?(seg) + @command_flags |= 1 << lookups.length + end + lookups << seg + pos += 1 # skip dot + end + @lookups = lookups + return + end + + lookups = self.class.scan_variable(markup) name = lookups.shift if name&.start_with?('[') && name&.end_with?(']') @@ -49,26 +149,40 @@ def evaluate(context) object = context.find_variable(name) @lookups.each_index do |i| - key = context.evaluate(@lookups[i]) + lookup = @lookups[i] + key = lookup.instance_of?(String) ? lookup : context.evaluate(lookup) # Cast "key" to its liquid value to enable it to act as a primitive value - key = Liquid::Utils.to_liquid_value(key) + # Fast path: strings and integers (most common key types) don't need conversion + unless key.instance_of?(String) || key.instance_of?(Integer) + key = Liquid::Utils.to_liquid_value(key) + end # If object is a hash- or array-like object we look for the # presence of the key and if its available we return it - if object.respond_to?(:[]) && - ((object.respond_to?(:key?) && object.key?(key)) || - (object.respond_to?(:fetch) && key.is_a?(Integer))) + if object.instance_of?(Hash) ? object.key?(key) : + (object.respond_to?(:[]) && + ((object.respond_to?(:key?) 
&& object.key?(key)) || + (object.respond_to?(:fetch) && key.is_a?(Integer)))) # if its a proc we will replace the entry with the proc - res = context.lookup_and_evaluate(object, key) - object = res.to_liquid + object = context.lookup_and_evaluate(object, key) + # Skip to_liquid for common primitive types (they return self) + unless object.instance_of?(String) || object.instance_of?(Integer) || object.instance_of?(Float) || + object.instance_of?(Array) || object.instance_of?(Hash) || object.nil? + object = object.to_liquid + object.context = context if object.respond_to?(:context=) + end # Some special cases. If the part wasn't in square brackets and # no key with the same name was found we interpret following calls # as commands and call them on the current object elsif lookup_command?(i) && object.respond_to?(key) - object = object.send(key).to_liquid + object = object.send(key) + unless object.instance_of?(String) || object.instance_of?(Integer) || object.instance_of?(Array) || object.nil? 
+ object = object.to_liquid + object.context = context if object.respond_to?(:context=) + end # Handle string first/last like ActiveSupport does (returns first/last character) # ActiveSupport returns "" for empty strings, not nil @@ -82,9 +196,6 @@ def evaluate(context) return nil unless context.strict_variables raise Liquid::UndefinedVariable, "undefined variable #{key}" end - - # If we are dealing with a drop here we have to - object.context = context if object.respond_to?(:context=) end object diff --git a/performance/bench_quick.rb b/performance/bench_quick.rb new file mode 100644 index 000000000..6168f80e3 --- /dev/null +++ b/performance/bench_quick.rb @@ -0,0 +1,62 @@ +# frozen_string_literal: true + +# Quick benchmark for autoresearch: measures parse µs, render µs, and object allocations +# Outputs machine-readable metrics to stdout + +require_relative 'theme_runner' + +RubyVM::YJIT.enable if defined?(RubyVM::YJIT) + +runner = ThemeRunner.new + +# Warmup — enough iterations for YJIT to fully optimize hot paths +20.times { runner.compile } +20.times { runner.render } + +GC.start +GC.compact if GC.respond_to?(:compact) + +# Measure parse +parse_times = [] +10.times do + GC.disable + t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC) + runner.compile + t1 = Process.clock_gettime(Process::CLOCK_MONOTONIC) + GC.enable + GC.start + parse_times << (t1 - t0) * 1_000_000 # µs +end + +# Measure render +render_times = [] +10.times do + GC.disable + t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC) + runner.render + t1 = Process.clock_gettime(Process::CLOCK_MONOTONIC) + GC.enable + GC.start + render_times << (t1 - t0) * 1_000_000 # µs +end + +# Measure object allocations for one parse+render cycle +require 'objspace' +GC.start +GC.disable +before = ObjectSpace.count_objects.values_at(:TOTAL).first - ObjectSpace.count_objects.values_at(:FREE).first +runner.compile +runner.render +after = ObjectSpace.count_objects.values_at(:TOTAL).first - 
ObjectSpace.count_objects.values_at(:FREE).first
+GC.enable
+# NOTE: TOTAL - FREE counts occupied heap slots, so this diff approximates the
+# objects allocated by one parse+render cycle (GC stays disabled in between).
+allocations = after - before
+
+parse_us = parse_times.min.round(0)
+render_us = render_times.min.round(0)
+combined_us = parse_us + render_us
+
+puts "RESULTS"
+puts "parse_us=#{parse_us}"
+puts "render_us=#{render_us}"
+puts "combined_us=#{combined_us}"
+puts "allocations=#{allocations}"
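The hand-rolled `VariableLookup.scan_variable` byte scanner above is documented as a drop-in for `markup.scan(VariableParser)`. As a sketch of the contract the manual scanner has to preserve, here is a regex-based reference using the pattern quoted in the diff's own comment (treat the pattern as illustrative of that comment, not as Liquid's canonical `VariableParser` definition):

```ruby
# Reference behavior for VariableLookup.scan_variable: split a lookup chain
# into identifier segments and balanced-bracket segments. The recursive
# \g<0> call lets the bracket branch match nested brackets like "[b[0]]".
VARIABLE_PARSER = /\[(?>[^\[\]]+|\g<0>)*\]|[\w-]+\??/

def scan_variable_ref(markup)
  markup.scan(VARIABLE_PARSER)
end

scan_variable_ref("product.variants[0].title")
# => ["product", "variants", "[0]", "title"]
```

The manual scanner earns its keep by avoiding the MatchData and intermediate-string allocations that `String#scan` incurs per segment; a reference like this is still useful as a differential-test oracle for it.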
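The `NO_ARG_FILTER_CACHE` in variable.rb relies on a small memoization pattern worth isolating: a Hash default block that builds and freezes the `[name, args]` tuple once per distinct filter name, so every later parse of e.g. `| upcase` shares one frozen array instead of allocating a fresh one. A self-contained sketch (using a local `EMPTY_ARGS` in place of `Const::EMPTY_ARRAY`):

```ruby
# Memoized frozen tuples: the default block runs only on the first lookup
# of a given filter name; subsequent lookups return the identical object.
EMPTY_ARGS = [].freeze

NO_ARG_FILTER_CACHE = Hash.new do |hash, name|
  hash[name] = [name, EMPTY_ARGS].freeze
end

a = NO_ARG_FILTER_CACHE["upcase"]
b = NO_ARG_FILTER_CACHE["upcase"]
a.equal?(b) # => true — one allocation per distinct name, ever
```

One design note: Hash default blocks are not synchronized, so two threads parsing concurrently could each build the tuple once; both results are equivalent frozen arrays, so the race is benign for this use.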
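The `SIMPLE_LOOKUP_RE` gate in variable_lookup.rb is the cheap check that routes markup to the allocation-light `initialize` fast path; anything with brackets, quotes, or filters falls through to the full scanner. A standalone sketch of that gate and the inputs it accepts:

```ruby
# Mirrors VariableLookup.simple_lookup? from the diff: a single anchored
# match? call (no MatchData allocated) that accepts plain dot-separated
# identifier chains, including trailing '?' predicate segments.
SIMPLE_LOOKUP_RE = /\A[\w-]+\??(?:\.[\w-]+\??)*\z/

def simple_lookup?(markup)
  markup.bytesize > 0 && markup.match?(SIMPLE_LOOKUP_RE)
end

simple_lookup?("product.title")  # => true  — fast path
simple_lookup?("available?")     # => true  — '?' segments allowed
simple_lookup?("items[0]")       # => false — brackets need the full scanner
```

This matches the diff's insight that a single C-level `match?` beats a Ruby-level byte loop for the classification step, even though byte loops win once segment slicing is actually required.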
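bench_quick.rb estimates allocations by diffing `ObjectSpace.count_objects` TOTAL−FREE slot counts with GC disabled. On CRuby, `GC.stat(:total_allocated_objects)` is a monotonically increasing allocation counter, so a before/after difference yields the same measurement more directly — a sketch, assuming CRuby/MRI (the counter is implementation-specific):

```ruby
# Count heap objects allocated by a block using CRuby's monotonic
# allocation counter — no GC.disable and no slot-count arithmetic needed.
def count_allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

allocs = count_allocations { Array.new(100) { Object.new } }
```

As with the TOTAL−FREE diff, the number includes any incidental allocations made inside the block, so it is best compared across runs of the same workload rather than read as an exact figure.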