Skip to content

Conversation

@roylou
Copy link

@roylou roylou commented Dec 30, 2025

Summary

  • Added comprehensive tooling for deriving Scroll Galileo gas fee parameters
  • Includes data collection from blockchain and parameter analysis capabilities
  • Supports both real-time collection and cached data loading modes

Changes

  • Added derive_galileo_gas_parameter.py: Main script for collecting batch data and deriving optimal gas parameters
  • Added show_batches.py: Utility to display batch information
  • Added Python environment setup with Pipfile for dependency management
  • Added CLAUDE.md with project documentation and usage guidelines

Summary by CodeRabbit

  • New Features

    • Added a utility script to compute and analyze gas parameters from on-chain transaction and batch data with support for collecting new data or loading cached data.
    • Added a companion script to display and inspect batch and transaction summaries from generated data files.
  • Documentation

    • Added comprehensive setup and usage documentation including environment configuration, CLI options, and operational examples.

✏️ Tip: You can customize this high-level summary in your review settings.

Add scripts and environment setup for deriving Scroll gas fee parameters
using data collected from L1 batches. Includes data collection, analysis,
and batch visualization utilities.
@coderabbitai
Copy link

coderabbitai bot commented Dec 30, 2025

📝 Walkthrough

Walkthrough

This PR introduces a new script directory scripts/derive_galileo_gas_parameter/ containing tools to compute Galileo L1 gas parameters from on-chain data through a two-stage derivation process. The directory includes a main analysis script, a data inspection utility, configuration files, and documentation.

Changes

Cohort / File(s) Summary
Configuration & Documentation
scripts/derive_galileo_gas_parameter/Pipfile, scripts/derive_galileo_gas_parameter/claude.md
Adds Pipenv configuration with dependencies (web3, pandas, numpy, rlp, zstandard, requests, async_timeout) for Python 3.10 and comprehensive CLI documentation covering environment setup, usage modes, options, and monitoring.
Main Analysis Script
scripts/derive_galileo_gas_parameter/derive_galileo_gas_parameter.py
Introduces comprehensive script implementing two-stage gas parameter derivation: Stage 1 derives penalty_multiplier from P95 transaction size distribution; Stage 2 solves for commit_scalar and blob_scalar from aggregated batch costs. Includes Web3 integration (mainnet/Scroll/beacon), transaction/blob parsing, parallelized data fetching, DataFrame-based data pipeline, and analysis functions for fit metrics and penalty calculations.
Data Inspection Utility
scripts/derive_galileo_gas_parameter/show_batches.py
Adds utility script to load and display cached batch/transaction data from pickle files, printing human-readable summaries with per-batch costs, associated transactions, and metadata conversion to gwei.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Script as derive_galileo_gas_parameter.py
    participant Blockchain as Blockchain/Beacon RPC
    participant FileSystem as File System

    User->>Script: Run with collect mode
    Script->>Blockchain: Fetch latest finalized blocks
    Blockchain-->>Script: Block data + beacon slots
    Script->>Blockchain: Fetch blob data and transaction receipts (parallelized)
    Blockchain-->>Script: Raw transaction & blob data
    Script->>Script: Parse blobs & compress transactions
    Script->>Script: Build batch_df & tx_df DataFrames
    Script->>FileSystem: Save data (pickle)
    Script->>Script: Calculate penalty_multiplier (P95 tx size)
    Script->>Script: Aggregate batch costs (commit/blob/finalize)
    Script->>Script: Solve for commit_scalar & blob_scalar
    Script->>Script: Analyze results (RMSE, MAE, recovery ratio)
    Script-->>User: Output: parameters & metrics
    User->>FileSystem: Inspect cached data
    FileSystem-->>User: show_batches.py displays summary
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

The main script introduces substantial logic across ~25 public functions handling Web3 integration, blockchain data parsing (blobs, transactions, receipts), algebraic solving, and statistical analysis. The supporting scripts are simpler, but the heterogeneity of operations across transaction encoding, batch aggregation, and parameter solving requires careful review of data flow integrity and mathematical correctness.

Poem

🐰 A rabbit hops through chains of blocks,
Parsing blobs and transaction locks,
Two stages solve the gas so fine—
Parameters align divine!

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Description check ❓ Inconclusive The PR description covers the purpose, changes, and functionality, but lacks the required conventional commits checklist and deployment/breaking change labels specified in the template. Add the PR title checklist confirming conventional commits compliance and include deployment tag versioning and breaking change label checkboxes as required by the repository template.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding Galileo gas parameter derivation tooling, which matches the core purpose of this PR.
Docstring Coverage ✅ Passed Docstring coverage is 96.15% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (6)
scripts/derive_galileo_gas_parameter/Pipfile (1)

6-13: Consider pinning dependency versions for reproducibility.

Using wildcard "*" for most dependencies can lead to non-reproducible builds and potential breaking changes when new versions are released. For production tooling, consider pinning to specific versions or at least setting upper bounds.

🔎 Suggested improvement
 [packages]
 web3 = "<7,>=6"
-pandas = "*"
-numpy = "*"
-rlp = "*"
-zstandard = "*"
-requests = "*"
-async_timeout = "*"
+pandas = ">=2.0,<3"
+numpy = ">=1.24,<2"
+rlp = ">=3.0,<4"
+zstandard = ">=0.21,<1"
+requests = ">=2.28,<3"
+async_timeout = ">=4.0,<5"

Alternatively, run pipenv lock to generate a Pipfile.lock that captures exact versions for reproducibility.

scripts/derive_galileo_gas_parameter/show_batches.py (2)

31-31: Remove extraneous f-prefix from string literal.

This f-string has no placeholders.

🔎 Proposed fix
-        print(f"\nBatch IDs:")
+        print("\nBatch IDs:")

72-73: Consider logging the exception type for better debugging.

While catching broad Exception is acceptable for a CLI utility, including the exception type would help with debugging.

🔎 Proposed fix
     except Exception as e:
-        print(f"Error reading file: {e}")
+        print(f"Error reading file: {type(e).__name__}: {e}")
scripts/derive_galileo_gas_parameter/derive_galileo_gas_parameter.py (3)

373-374: Global mutable state for caching.

Using a global dictionary for caching works for single-threaded execution but could cause issues if this module is ever used in a multi-threaded context. Consider using a class-based approach or passing the cache explicitly if thread-safety becomes a concern.


426-426: Unused variable version.

This variable is assigned but never used. Consider either removing it or using it for validation (e.g., asserting expected version).

🔎 Proposed fix
-    version = int(batch_data[0])
+    version = int(batch_data[0])  # Currently unused, but parsed from batch header
+    # TODO: Add version validation if needed

Or remove entirely if not needed:

-    version = int(batch_data[0])
+    # batch_data[0] contains version byte (unused)

1107-1112: DataFrame mutation side effect.

This function modifies tx_df directly by adding new columns. Since tx_df is passed in and also used later in analyze_results (which also modifies it), this side effect is likely intentional. However, consider either:

  1. Documenting this side effect in the docstring
  2. Using tx_df = tx_df.copy() if you want to avoid mutation
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7de388e and 8fc6d65.

⛔ Files ignored due to path filters (1)
  • scripts/derive_galileo_gas_parameter/Pipfile.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • scripts/derive_galileo_gas_parameter/Pipfile
  • scripts/derive_galileo_gas_parameter/claude.md
  • scripts/derive_galileo_gas_parameter/derive_galileo_gas_parameter.py
  • scripts/derive_galileo_gas_parameter/show_batches.py
🧰 Additional context used
🪛 Ruff (0.14.10)
scripts/derive_galileo_gas_parameter/show_batches.py

19-19: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)


31-31: f-string without any placeholders

Remove extraneous f prefix

(F541)


72-72: Do not catch blind exception: Exception

(BLE001)

scripts/derive_galileo_gas_parameter/derive_galileo_gas_parameter.py

38-38: Avoid specifying long messages outside the exception class

(TRY003)


40-40: Avoid specifying long messages outside the exception class

(TRY003)


146-146: Probable use of requests call without timeout

(S113)


186-186: Avoid specifying long messages outside the exception class

(TRY003)


331-331: Avoid specifying long messages outside the exception class

(TRY003)


398-398: Do not catch blind exception: Exception

(BLE001)


426-426: Local variable version is assigned to but never used

Remove assignment to unused variable version

(F841)


476-476: f-string without any placeholders

Remove extraneous f prefix

(F541)


518-518: Avoid specifying long messages outside the exception class

(TRY003)


522-522: f-string without any placeholders

Remove extraneous f prefix

(F541)


566-566: f-string without any placeholders

Remove extraneous f prefix

(F541)


607-607: f-string without any placeholders

Remove extraneous f prefix

(F541)


678-678: f-string without any placeholders

Remove extraneous f prefix

(F541)


744-744: f-string without any placeholders

Remove extraneous f prefix

(F541)


795-795: f-string without any placeholders

Remove extraneous f prefix

(F541)


809-809: f-string without any placeholders

Remove extraneous f prefix

(F541)


846-846: f-string without any placeholders

Remove extraneous f prefix

(F541)


884-884: f-string without any placeholders

Remove extraneous f prefix

(F541)


948-948: f-string without any placeholders

Remove extraneous f prefix

(F541)


959-959: f-string without any placeholders

Remove extraneous f prefix

(F541)


968-968: f-string without any placeholders

Remove extraneous f prefix

(F541)


1022-1022: Avoid specifying long messages outside the exception class

(TRY003)


1025-1025: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)


1067-1067: f-string without any placeholders

Remove extraneous f prefix

(F541)


1096-1096: Docstring contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF002)


1097-1097: Docstring contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF002)


1127-1127: f-string without any placeholders

Remove extraneous f prefix

(F541)


1168-1168: f-string without any placeholders

Remove extraneous f prefix

(F541)


1170-1170: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF001)


1173-1173: f-string without any placeholders

Remove extraneous f prefix

(F541)


1175-1175: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF001)


1228-1228: f-string without any placeholders

Remove extraneous f prefix

(F541)


1240-1240: f-string without any placeholders

Remove extraneous f prefix

(F541)


1297-1297: f-string without any placeholders

Remove extraneous f prefix

(F541)


1302-1302: f-string without any placeholders

Remove extraneous f prefix

(F541)


1307-1307: f-string without any placeholders

Remove extraneous f prefix

(F541)


1330-1330: f-string without any placeholders

Remove extraneous f prefix

(F541)


1335-1335: f-string without any placeholders

Remove extraneous f prefix

(F541)


1340-1340: f-string without any placeholders

Remove extraneous f prefix

(F541)


1409-1409: Avoid specifying long messages outside the exception class

(TRY003)


1440-1440: f-string without any placeholders

Remove extraneous f prefix

(F541)


1444-1444: f-string without any placeholders

Remove extraneous f prefix

(F541)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: tests
🔇 Additional comments (5)
scripts/derive_galileo_gas_parameter/claude.md (1)

1-70: Documentation looks good overall.

The documentation clearly explains the CLI options, modes, and monitoring approach. A couple of minor suggestions:

  1. The hardcoded /tmp/ path won't work on Windows - consider noting this is Linux/macOS specific or using a cross-platform alternative.
  2. Consider renaming to CLAUDE.md (uppercase) or README.md for better visibility in the directory listing.
scripts/derive_galileo_gas_parameter/show_batches.py (1)

17-19: Pickle deserialization - acceptable for local tooling.

The static analysis flags pickle.load() as a potential security issue (S301). Since this script only loads locally-generated pickle files from the companion derive_galileo_gas_parameter.py script, this is acceptable. However, adding a brief comment noting this assumption would be helpful for future maintainers.

scripts/derive_galileo_gas_parameter/derive_galileo_gas_parameter.py (3)

740-741: Verify block range calculation - potential off-by-one issue.

The last_block is calculated as max(initial_L2_block_number + num_blocks), and then range(first_block, last_block) is used. Since Python's range() excludes the end value, the last block in the range won't be fetched.

If num_blocks represents the count of blocks starting from initial_L2_block_number, the calculation should be:

  • Last block index = initial_L2_block_number + num_blocks - 1
  • Range should be range(first_block, last_block + 1) to include it

Please verify the intended behavior:

  1. Does num_blocks include or exclude the initial block?
  2. Should the last block be fetched/processed?

Also applies to: 775-775, 813-813


198-331: Well-structured transaction serialization with good type coverage.

The get_raw_transaction_from_structured function handles multiple transaction types comprehensively (Legacy, EIP-2930, EIP-1559, L1 message, and EIP-7702). The error handling for unsupported types is appropriate.


1362-1458: Well-structured main function with clear execution flow.

The main function provides a clear step-by-step execution flow with good progress reporting. The argument handling is appropriate, and returning a results dictionary enables programmatic use of the script.

Comment on lines +144 to +147
url = f"{mainnet_beacon_url}/eth/v1/beacon/headers/head"
headers = {'accept': 'application/json'}
response = requests.get(url, headers=headers)
return int(response.json()['data']['header']['message']['slot']) - (latest_block_number - l1_head)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add timeout to prevent indefinite hangs.

This requests.get() call lacks a timeout, which could cause the script to hang indefinitely if the beacon API is unresponsive.

🔎 Proposed fix
     url = f"{mainnet_beacon_url}/eth/v1/beacon/headers/head"
     headers = {'accept': 'application/json'}
-    response = requests.get(url, headers=headers)
+    response = requests.get(url, headers=headers, timeout=10)
     return int(response.json()['data']['header']['message']['slot']) - (latest_block_number - l1_head)
🧰 Tools
🪛 Ruff (0.14.10)

146-146: Probable use of requests call without timeout

(S113)

🤖 Prompt for AI Agents
In scripts/derive_galileo_gas_parameter/derive_galileo_gas_parameter.py around
lines 144 to 147, the requests.get call has no timeout and can hang
indefinitely; modify the call to include a reasonable timeout (e.g., timeout=10)
and handle potential requests.exceptions.Timeout and other RequestException
errors by raising or returning a clear error so the script doesn't block
forever.

print("=" * 60)

# Get L1 head
mainnet_beacon_url = "https://eth-mainnetbeacon.g.alchemy.com/v2/Lzj9QLupql91nuRFYAosy"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Hardcoded API key must be moved to environment variable.

The Alchemy API key is hardcoded in the source code. This is a security risk as the key will be exposed in version control and could be abused.

🔎 Proposed fix
-    mainnet_beacon_url = "https://eth-mainnetbeacon.g.alchemy.com/v2/Lzj9QLupql91nuRFYAosy"
+    mainnet_beacon_url = os.getenv('MAINNET_BEACON_URL', '')
+    if not mainnet_beacon_url:
+        raise ValueError("MAINNET_BEACON_URL environment variable is not set. Please set it to your Ethereum beacon chain RPC endpoint.")

Also update the documentation in claude.md to mention this required environment variable.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
mainnet_beacon_url = "https://eth-mainnetbeacon.g.alchemy.com/v2/Lzj9QLupql91nuRFYAosy"
mainnet_beacon_url = os.getenv('MAINNET_BEACON_URL', '')
if not mainnet_beacon_url:
raise ValueError("MAINNET_BEACON_URL environment variable is not set. Please set it to your Ethereum beacon chain RPC endpoint.")
🤖 Prompt for AI Agents
In scripts/derive_galileo_gas_parameter/derive_galileo_gas_parameter.py around
line 468 the Alchemy API key is hardcoded in mainnet_beacon_url; remove the
literal key and build the URL using an environment variable (e.g., read
ALCHEMY_API_KEY from os.environ), validate that the variable exists and raise a
clear error if missing, and ensure any tests/launch scripts use the env var;
also update claude.md to document the new ALCHEMY_API_KEY environment variable
and how to set it.

Comment on lines +800 to +806
while l1_base_fee == 0:
block = scroll_w3.eth.get_block(cur_block_num, full_transactions=True)
for tx in reversed(block['transactions']):
if tx.to == '0x5300000000000000000000000000000000000002' and tx.input.hex()[:10] == '0x39455d3a':
l1_base_fee = int.from_bytes(bytes.fromhex(tx.input.hex()[2:])[-64:-32], 'big')
l1_blob_base_fee = int.from_bytes(bytes.fromhex(tx.input.hex()[2:])[-32:], 'big')
cur_block_num -= 1
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Potential infinite loop or crash if L1 fee oracle transaction not found.

This while loop decrements cur_block_num without a bound check. If the fee oracle transaction is never found in the blocks being searched, this will either loop indefinitely or eventually crash when trying to fetch block 0 or negative block numbers.

🔎 Proposed fix
+    max_blocks_to_search = 1000  # Safety limit
+    blocks_searched = 0
     while l1_base_fee == 0:
+        if blocks_searched >= max_blocks_to_search:
+            raise RuntimeError(f"Could not find L1 fee oracle transaction in {max_blocks_to_search} blocks starting from {first_block}")
         block = scroll_w3.eth.get_block(cur_block_num, full_transactions=True)
         for tx in reversed(block['transactions']):
             if tx.to == '0x5300000000000000000000000000000000000002' and tx.input.hex()[:10] == '0x39455d3a':
                 l1_base_fee = int.from_bytes(bytes.fromhex(tx.input.hex()[2:])[-64:-32], 'big')
                 l1_blob_base_fee = int.from_bytes(bytes.fromhex(tx.input.hex()[2:])[-32:], 'big')
         cur_block_num -= 1
+        blocks_searched += 1
🤖 Prompt for AI Agents
In scripts/derive_galileo_gas_parameter/derive_galileo_gas_parameter.py around
lines 800 to 806, the while loop that decrements cur_block_num can run forever
or underflow if the target L1 fee oracle transaction is never found; add a
bounded search by introducing a minimum block number or max_lookback counter and
check it each iteration, bail out when reached (raise a clear exception or log
and exit) and wrap the block fetch in try/except to handle out-of-range
requests; optionally add a configurable parameter (e.g., MAX_LOOKBACK or
min_block_num) and use it to stop the loop gracefully instead of looping until
negative block numbers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants