Skip to content

Conversation

@harumiWeb
Copy link
Owner

@harumiWeb harumiWeb commented Jan 22, 2026

#34

Summary by CodeRabbit

  • New Features

    • Verbose output now includes formulas_map; Sheet data exposes formulas_map and extraction can be enabled via an option.
  • Documentation

    • README, data model, feature spec, and examples updated; new sample workbook and human-readable demo added.
  • Bug Fixes

    • More robust PDF/image export for multiple print ranges with improved page naming and restore handling.
  • Tests

    • Expanded coverage for formulas_map extraction, backend fallbacks, rendering, and pipeline behavior.

✏️ Tip: You can customize this high-level summary in your review settings.

- Introduced `include_formulas_map` option in `StructOptions` to control formula extraction.
- Updated `ExStructEngine` to handle formulas map during extraction.
- Enhanced `SheetData` model to include `formulas_map` for storing formula locations.
- Modified tests across various modules to validate the new formula extraction feature.
- Added sample Excel file with formulas for testing purposes.
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 22, 2026

Warning

Rate limit exceeded

@harumiWeb has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 6 minutes and 30 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Adds a new formulas_map feature: extract per-sheet formula strings with cell coordinates via openpyxl or COM, thread include_formulas_map through engine, pipeline, backends, and models; update docs, samples, tests, and CLI/docs descriptions. No other behavioral changes beyond exposing and propagating formulas_map and related flags.

Changes

Cohort / File(s) Summary
Documentation
docs/README.en.md, docs/README.ja.md, docs/agents/DATA_MODEL.md, docs/agents/FEATURE_SPEC.md, docs/agents/TASKS.md
Document formulas_map, bump data model to 0.16, add feature spec and tasks, update verbose output descriptions and README text.
Samples / Examples
sample/formula/formula.json, sample/formula/formula_json_for_llm.md
Add sample workbook JSON and human-readable example demonstrating formulas_map, colors_map, merged regions, and formulas.
Core cells / extraction
src/exstruct/core/cells.py
New dataclasses SheetFormulasMap, WorkbookFormulasMap; implement extract_sheet_formulas_map (openpyxl) and extract_sheet_formulas_map_com (COM) plus normalization helpers.
Backends
src/exstruct/core/backends/base.py, src/exstruct/core/backends/openpyxl_backend.py, src/exstruct/core/backends/com_backend.py
Add Backend.extract_formulas_map protocol method; implement guarded extract_formulas_map in Openpyxl and COM backends with warning-on-failure behavior.
Pipeline / Integration
src/exstruct/core/pipeline.py, src/exstruct/core/integrate.py
Add include_formulas_map and use_com_for_formulas handling; new pipeline steps (step_extract_formulas_map_openpyxl, step_extract_formulas_map_com); thread formulas_map_data through artifacts and sheet collection; update resolve/build logic.
Modeling / Engine / API
src/exstruct/core/modeling.py, src/exstruct/engine.py, src/exstruct/__init__.py, src/exstruct/models/__init__.py
Add formulas_map to SheetRawData/SheetData; add include_formulas_map to StructOptions; propagate option through extract API and engine.
Rendering
src/exstruct/render/__init__.py
Major internal refactor of export orchestration (print-area parsing, multi-area plans, subprocess/in-process rendering); public APIs unchanged.
Tests - core & pipeline
tests/core/*, tests/integration/test_integrate_raw_data.py, tests/models/*
Add/update tests for formula normalization, openpyxl/COM extraction, pipeline flag behavior, raw-data propagation, and expanded pipeline unit tests.
Tests - backends & rendering
tests/backends/*, tests/com/test_render_smoke.py, tests/render/test_render_init.py
Add tests for backend formulas extraction success/failure, print-area parsing, adjust rendering assertions, and broaden rendering workflow tests.
Tooling / Misc
pyproject.toml, scripts/codacy_issues.py, src/exstruct/core/workbook.py, tests/utils.py, other tests scaffolding
Lint config changes, new codacy helper CLI script, __all__ exported in workbook module, typing/test utility refinements and signature updates in test helpers.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Engine as ExStructEngine
    participant Pipeline as Pipeline
    participant Backend as Backend (Openpyxl/COM)
    participant Cells as CellsModule
    participant Models as Models

    Client->>Engine: extract(file, mode="verbose")
    Engine->>Pipeline: extract_workbook(..., include_formulas_map=True)
    Pipeline->>Pipeline: resolve_extraction_inputs(include_formulas_map=True)
    alt openpyxl path
        Pipeline->>Backend: step_extract_formulas_map_openpyxl(file_path)
        Backend->>Cells: extract_sheet_formulas_map(file_path)
    else COM path (.xls)
        Pipeline->>Backend: step_extract_formulas_map_com(workbook)
        Backend->>Cells: extract_sheet_formulas_map_com(workbook)
    end
    Cells-->>Backend: WorkbookFormulasMap
    Backend-->>Pipeline: artifacts.formulas_map_data
    Pipeline->>Pipeline: collect_sheet_raw_data(..., formulas_map_data)
    Pipeline->>Models: build SheetRawData including formulas_map
    Models-->>Engine: WorkbookData (includes SheetData.formulas_map)
    Engine-->>Client: return WorkbookData
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐇 I hopped through sheets both wide and deep,

Found formulas where quiet numbers sleep.
openpyxl and COM hummed the tune,
Pipelines carried each formula rune.
Maps now leap — hop-hip-hooray! 🎶

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 71.84% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: adding a formula acquisition option function, which aligns with the PR's core objective of implementing formulas_map support.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov-commenter
Copy link

codecov-commenter commented Jan 22, 2026

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 96.84211% with 9 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/exstruct/render/__init__.py 96.20% 6 Missing ⚠️
src/exstruct/core/cells.py 96.00% 3 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds formula acquisition functionality to ExStruct, enabling extraction of Excel formulas alongside cell values, colors, and other data. The PR also includes a refactoring of the rendering module to better handle print areas and multi-page exports.

Changes:

  • Added formulas_map field to SheetData model that maps formula strings to cell coordinates
  • Implemented formula extraction via both openpyxl (for .xlsx/.xlsm) and COM (for .xls files) backends
  • Formulas are extracted in "verbose" mode by default, configurable via include_formulas_map option
  • Refactored render module to handle print areas and multi-page exports with sequential numbering

Reviewed changes

Copilot reviewed 27 out of 28 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/exstruct/models/init.py Added formulas_map field to SheetData model
src/exstruct/core/cells.py Implemented formula extraction for openpyxl and COM backends
src/exstruct/core/backends/openpyxl_backend.py Added extract_formulas_map method to OpenpyxlBackend
src/exstruct/core/backends/com_backend.py Added extract_formulas_map method to ComBackend
src/exstruct/core/backends/base.py Added extract_formulas_map to Protocol
src/exstruct/core/pipeline.py Integrated formulas_map extraction into pipeline with .xls COM fallback
src/exstruct/core/modeling.py Updated SheetRawData to include formulas_map
src/exstruct/core/integrate.py Added include_formulas_map parameter
src/exstruct/engine.py Added include_formulas_map option to StructOptions
src/exstruct/init.py Enabled formulas_map in verbose mode
src/exstruct/render/init.py Refactored sheet image export to handle print areas and use sequential numbering
tests/**/test_*.py Updated tests with new parameter and corrected expectations
docs/README.*.md Updated documentation to mention formulas_map in verbose mode
docs/agents/TASKS.md Added task checklist for formula feature
docs/agents/FEATURE_SPEC.md Added specification for formula acquisition
docs/agents/DATA_MODEL.md Updated data model version to 0.16 with formulas_map
sample/formula/* Added sample Excel file with formulas and documentation

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f2a9e18926

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@docs/README.ja.md`:
- Line 134: Translate the English list for the "**verbose**" option into
Japanese while preserving formatting and inline code elements (`colors_map`,
`formulas_map`); update the line "- **verbose**: all shapes, charts,
table_candidates, hyperlinks, `colors_map`, and `formulas_map`." to a Japanese
sentence that conveys the same items (shapes, charts, table_candidates,
hyperlinks, `colors_map`, `formulas_map`) and keep the bold "**verbose**" marker
and punctuation unchanged.

In `@src/exstruct/core/pipeline.py`:
- Around line 513-524: The step_extract_formulas_map_openpyxl function claims it
“logs and skips on failure” but currently lets exceptions propagate; wrap the
call to OpenpyxlBackend(...).extract_formulas_map() in a try/except that catches
Exception, logs the error (use the module logger or existing logging helper) and
leaves artifacts.formulas_map_data as None (or an empty safe value) instead of
raising; apply the same guarded pattern to the corresponding
step_extract_formulas_map_pandas function so both behaviors match their
docstrings (or alternatively, remove the “logs and skips on failure” text from
both docstrings if you prefer to keep exceptions propagating).

In `@src/exstruct/render/__init__.py`:
- Around line 238-274: In _export_sheet_pdf, the finally block currently does
"return" inside the cleanup exception handler which can swallow exceptions from
sheet_api.ExportAsFixedFormat; remove that return so cleanup won't suppress the
original error, and instead catch exceptions when restoring page_setup.PrintArea
and log them (e.g., logger.exception or logger.error) without re-raising there;
ensure the restoration code references page_setup and original_print_area (the
variables set earlier) and that any failure to reset PrintArea does not prevent
the original ExportAsFixedFormat exception from propagating.
🧹 Nitpick comments (5)
tests/models/test_modeling.py (1)

28-28: Consider adding an assertion for formulas_map propagation.

The test data includes formulas_map, but there's no corresponding assertion to verify it's properly transferred to the resulting SheetData. Adding an assertion would ensure the new field is correctly propagated.

♻️ Suggested addition after line 43
    assert sheet.formulas_map == {"=A1": [(1, 0)]}
docs/agents/TASKS.md (1)

5-14: Consider marking completed checklist items.
If these tasks were implemented in this PR, updating the checkboxes will keep the task list accurate.

tests/core/test_pipeline.py (1)

164-171: Consider adding a test for include_formulas_map=True.

The tests verify that include_formulas_map=None resolves to False for standard mode. Consider adding a test case that verifies:

  1. include_formulas_map=True results in the formulas extraction step being included in the pipeline
  2. The resolution behavior for verbose mode (where it should default to True)

This would provide coverage symmetry with the existing include_colors_map tests.

src/exstruct/core/cells.py (1)

70-94: Align formulas map models with Pydantic BaseModel.

These new dataclasses are structured data returned across the API surface; the coding guidelines prefer Pydantic models for validation and consistency. Consider migrating SheetFormulasMap/WorkbookFormulasMap (and optionally the existing colors map counterparts) to BaseModel.

♻️ Suggested refactor sketch
+from pydantic import BaseModel
+
-@dataclass(frozen=True)
-class SheetFormulasMap:
-    """Formula map for a single worksheet."""
-    sheet_name: str
-    formulas_map: dict[str, list[tuple[int, int]]]
+class SheetFormulasMap(BaseModel):
+    """Formula map for a single worksheet."""
+    sheet_name: str
+    formulas_map: dict[str, list[tuple[int, int]]]
 
-@dataclass(frozen=True)
-class WorkbookFormulasMap:
-    """Formula maps for all worksheets in a workbook."""
-    sheets: dict[str, SheetFormulasMap]
+class WorkbookFormulasMap(BaseModel):
+    """Formula maps for all worksheets in a workbook."""
+    sheets: dict[str, SheetFormulasMap]
 
     def get_sheet(self, sheet_name: str) -> SheetFormulasMap | None:
         ...

As per coding guidelines, please confirm whether these models should be Pydantic-based.

src/exstruct/render/__init__.py (1)

10-10: Add docstrings to the new Protocol helpers.

The guidelines call for Google-style docstrings on all classes; these Protocols are currently undocumented.

📚 Docstring addition
 class _PageSetupProtocol(Protocol):
+    """Minimal COM PageSetup surface used for PrintArea handling."""
     PrintArea: object
 
 
 class _SheetApiProtocol(Protocol):
+    """Minimal COM sheet API surface used for export and PageSetup."""
     PageSetup: _PageSetupProtocol

As per coding guidelines, please add docstrings for these Protocols.

Also applies to: 112-121

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@scripts/codacy_issues.py`:
- Around line 22-27: Move the module-level token validation out of top-level
execution: remove the TOKEN/TOKEN_STR assignment and the sys.exit(1) at import
time and instead implement a get_token() function that reads CODACY_API_TOKEN
from the environment and raises a clear exception if missing; update any callers
(e.g., fetch_json) to call get_token() when they need the token (or validate in
main() before doing work), and ensure tests can import the module without
triggering sys.exit by centralizing validation in get_token() or main().

In `@tests/backends/test_print_areas_openpyxl.py`:
- Around line 55-103: Change the local variable type annotation in the
test_append_print_areas_skips_invalid_ranges test from dict[str, list[object]]
to PrintAreaData so it matches the expected dict[str, list[PrintArea]] shape
used by _append_print_areas; also add brief Google-style docstrings to each of
the five test functions
(test_extract_print_areas_from_defined_names_filters_unknown_sheets,
test_extract_print_areas_from_defined_names_without_defined_names,
test_extract_print_areas_from_sheet_props_skips_empty,
test_parse_print_area_range_invalid,
test_append_print_areas_skips_invalid_ranges) describing the purpose of the test
and expected behavior to satisfy the repository docstring convention.

In `@tests/core/test_pipeline.py`:
- Around line 594-600: The test
test_filter_rows_excluding_merged_values_returns_rows_when_intervals_empty uses
an invalid MergedCellRange where r1 > r2 (MergedCellRange(r1=2, c1=0, r2=1,
c2=1, v="A")), which is confusing; update the fixture to a valid,
non-overlapping range (e.g., set r2 >= r1 such as MergedCellRange(r1=2, c1=0,
r2=2, c2=1, v="A") or any range that does not include row 1) so the test still
exercises empty intervals but uses a well-formed MergedCellRange for
_filter_rows_excluding_merged_values and CellRow.
♻️ Duplicate comments (1)
src/exstruct/core/pipeline.py (1)

513-530: Previous review feedback has been addressed.

The try/except block now correctly wraps the formulas_map extraction, matching the docstring's "logs and skips on failure" behavior.

🧹 Nitpick comments (14)
tests/models/test_models_export.py (1)

73-74: Comment is misleading given the cast() usage.

The comment states "pytest.skipif is typed; no ignore needed" but a cast() is still applied to satisfy the type checker. Consider updating or removing this comment to accurately reflect why the cast is necessary despite pytest having type annotations.

Suggested fix
-# pytest.skipif is typed; no ignore needed
+# cast to _SkipIf satisfies mypy strict mode for decorator typing
 `@cast`(_SkipIf, pytest.mark.skipif(not HAS_PYYAML, reason="pyyaml not installed"))
scripts/codacy_issues.py (5)

111-116: Broad exception handling obscures specific errors.

Catching bare Exception can mask unexpected issues. Consider catching subprocess.SubprocessError and OSError specifically. The static analysis warning about untrusted input is mitigated since all callers pass hardcoded git commands.

Suggested refinement
 def run_git(cmd: list[str]) -> str | None:
     try:
         out = subprocess.check_output(cmd, stderr=subprocess.DEVNULL)
         return out.decode("utf-8", errors="replace").strip()
-    except Exception:
+    except (subprocess.SubprocessError, OSError):
         return None

182-194: Dead code: status check inside with block is unreachable.

urllib.request.urlopen raises HTTPError for non-2xx responses before entering the with block, so lines 186-187 never execute. The HTTPError handler at line 195 correctly handles these cases.

Remove dead code
     try:
         with urllib.request.urlopen(req, timeout=60) as res:
             raw = res.read().decode("utf-8", errors="replace")
-            status = getattr(res, "status", 0) or 0
-            if status < 200 or status >= 300:
-                raise RuntimeError(f"HTTP {status}: {raw}")
             try:
                 parsed = json.loads(raw)
             except json.JSONDecodeError as exc:

Regarding the static analysis hint about URL schemes: the assert_codacy_url function validates that URLs match the expected https://api.codacy.com origin, which mitigates the scheme injection concern.


224-231: Calling sys.exit(1) inside a formatting function is unexpected.

format_for_ai is a data transformation function, but it calls sys.exit(1) on invalid input. This couples it tightly to CLI usage and prevents reuse. Consider raising ValueError instead.

Suggested change
 def format_for_ai(raw_issues: list[dict[str, Any]], min_level: str) -> list[str]:
     min_priority = get_level_priority(min_level)
     if min_priority is None:
-        print(
-            f"Invalid --min-level: {min_level}. Valid values: {', '.join(LEVELS)}",
-            file=sys.stderr,
-        )
-        sys.exit(1)
+        raise ValueError(
+            f"Invalid min_level: {min_level}. Valid values: {', '.join(LEVELS)}"
+        )

308-308: Pointless validation of hardcoded value.

assert_valid_choice("status", "all", ...) validates a hardcoded string literal. Either expose status as a CLI argument or remove this validation.

Remove unnecessary validation
-    status = assert_valid_choice("status", "all", ["all", "open", "closed"])
+    status = "all"

Or expose it as a CLI argument if filtering by status is intended.


36-46: Missing docstrings throughout the file.

Per coding guidelines, Google-style docstrings are required for all functions and classes. Consider adding brief docstrings to at least the public API functions (fetch_repo_issues, fetch_pr_issues, format_for_ai, main).

pyproject.toml (1)

60-63: Reconsider the fix = true default.

With fix = true in the [tool.ruff] config, ruff check will auto-apply fixes by default. Since you have a dedicated ruff-fix task, consider disabling auto-fix here to require explicit --fix to apply changes, making the workflow intent clearer.

♻️ Suggested adjustment
-fix = true
+# Prefer explicit fixing via `ruff check --fix` (task: ruff-fix)
+fix = false
tests/utils.py (1)

11-30: Add a Google‑style docstring for the typed parametrize wrapper.

Consider expanding the docstring to include Args/Returns so it aligns with the project’s docstring standard. As per coding guidelines, please use Google‑style docstrings.

✍️ Suggested docstring
-    """Type-safe wrapper around pytest.mark.parametrize."""
+    """Type-safe wrapper around pytest.mark.parametrize.
+
+    Args:
+        argnames: Parameter names (comma-separated string or sequence).
+        argvalues: Parameter values for each test case.
+        indirect: Whether to treat parameters as fixtures.
+        ids: Optional case IDs or an ID factory.
+        scope: Optional fixture scope for parametrization.
+
+    Returns:
+        Decorator preserving the wrapped callable signature.
+    """
tests/render/test_render_init.py (1)

546-785: Add Google‑style docstrings for the new helper classes/tests in this block.

Several newly introduced helper classes and tests lack docstrings; please add concise Google‑style docstrings for consistency. As per coding guidelines, please add Google‑style docstrings.

src/exstruct/render/__init__.py (1)

112-121: Add docstrings for the new Protocol types.

Please add short Google‑style docstrings to _PageSetupProtocol and _SheetApiProtocol to match the project’s docstring standard. As per coding guidelines, please add Google‑style docstrings.

✍️ Suggested docstrings
 class _PageSetupProtocol(Protocol):
+    """Protocol for Excel PageSetup objects exposing PrintArea."""
     PrintArea: object
@@
 class _SheetApiProtocol(Protocol):
+    """Protocol for Excel sheet COM APIs used by render helpers."""
     PageSetup: _PageSetupProtocol
src/exstruct/core/workbook.py (1)

15-16: Reconsider exporting private function and external library in __all__.

Exporting _find_open_workbook (underscore-prefixed private function) and xw (external xlwings module) in __all__ is unconventional:

  1. The underscore prefix conventionally signals internal use; if it needs to be public, consider renaming to find_open_workbook.
  2. Re-exporting external libraries (xw) couples consumers to xlwings directly and expands the public API surface unnecessarily.

If these exports are needed for tests or specific internal modules, consider whether they should remain private or be accessed differently.

tests/backends/test_backends.py (2)

291-293: Mutable class attribute creates shared state across tests.

_DummyWorkbook.sheets is defined as a class attribute containing a mutable list with _Sheet() instances. Since _Sheet() contains _SheetApi() which has mutable state (DisplayPageBreaks), tests may interfere with each other.

♻️ Proposed fix: use instance attribute via __init__
 class _DummyWorkbook:
-    sheets = [_Sheet()]
+    def __init__(self) -> None:
+        self.sheets = [_Sheet()]

And update the test:

 def test_com_backend_extract_auto_page_breaks_success() -> None:
-    backend = ComBackend(_DummyWorkbook())
+    backend = ComBackend(_DummyWorkbook())

331-333: Same mutable class attribute issue as _DummyWorkbook.

_RestoreErrorWorkbook.sheets has the same shared mutable state problem.

♻️ Proposed fix
 class _RestoreErrorWorkbook:
-    sheets = [_RestoreErrorSheet()]
+    def __init__(self) -> None:
+        self.sheets = [_RestoreErrorSheet()]
src/exstruct/core/pipeline.py (1)

311-316: Formulas_map step configuration duplicated across modes.

The StepConfig for formulas_map_openpyxl is identical in all three mode configurations (light/standard/verbose). While the table structure allows per-mode customization, consider extracting this as a shared constant if the configuration is unlikely to diverge.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@scripts/codacy_issues.py`:
- Line 316: The line setting status uses assert_valid_choice("status", "all",
["all", "open", "closed"]) to validate a hardcoded constant which is
unnecessary; either remove the validation and simply set status = "all", or make
status configurable by adding a CLI option (e.g., a --status argument parsed in
the same place as other args) and then call assert_valid_choice("status",
status, ["all","open","closed"]) to validate the user-provided value; update any
references to status accordingly.
- Around line 232-239: The function format_for_ai currently mixes validation and
transformation by calling sys.exit when get_level_priority(min_level) returns
None; remove the sys.exit call and instead raise a ValueError (e.g., raise
ValueError(f"Invalid --min-level: {min_level}. Valid values: {',
'.join(LEVELS)}")) or let the caller validate before calling format_for_ai;
update callers to perform the check using get_level_priority/LEVELS or handle
the raised exception accordingly so format_for_ai only transforms data.
- Around line 193-195: The check for non-2xx status codes using status =
getattr(res, "status", 0) or 0 and if status < 200 or status >= 300 is
unreachable because urllib.request.urlopen raises urllib.error.HTTPError for
4xx/5xx before this code runs; fix by catching urllib.error.HTTPError around the
urlopen call, extract its .code and .read() into the existing res/raw/status
variables (or re-raise a RuntimeError with HTTPError details), so the error path
produces the same RuntimeError message; alternatively remove the dead status
check if you prefer relying solely on the HTTPError exception.
♻️ Duplicate comments (1)
scripts/codacy_issues.py (1)

22-27: Module-level sys.exit(1) breaks importability and testability.

This issue was previously flagged. Moving token validation into a get_token() function or into main() would allow the module to be imported for testing without triggering an exit.

🧹 Nitpick comments (5)
scripts/codacy_issues.py (5)

36-70: Missing Google-style docstrings on utility functions.

Per coding guidelines, all functions require Google-style docstrings. Functions get_level_priority, normalize_provider, assert_valid_segment, assert_valid_choice, and encode_segment lack documentation.

Example docstring for get_level_priority
def get_level_priority(level: str | None) -> int | None:
    """Convert a severity level string to a numeric priority.

    Args:
        level: The severity level name (Error, High, Warning, Info).

    Returns:
        The priority as an integer (4 for Error, 1 for Info), or None if invalid.
    """

52-66: Validation helpers call sys.exit(1) directly, reducing testability.

Consider raising ValueError instead and handling the exit in main(). This allows unit testing these functions without mocking sys.exit.

Proposed refactor
 def assert_valid_segment(name: str, value: str, pattern: re.Pattern[str]) -> str:
+    """Validate that a value matches a pattern.
+
+    Raises:
+        ValueError: If the value is empty or doesn't match the pattern.
+    """
     if (not value) or (pattern.match(value) is None):
-        print(f"Invalid {name}: {value}", file=sys.stderr)
-        sys.exit(1)
+        raise ValueError(f"Invalid {name}: {value}")
     return value

131-132: Bare except Exception silently swallows all errors.

While returning None for git failures is reasonable, catching all exceptions could mask unexpected issues (e.g., PermissionError, KeyboardInterrupt). Consider catching more specific exceptions.

Proposed change
-    except Exception:
+    except (OSError, subprocess.SubprocessError):
         return None

135-139: Consider using Pydantic BaseModel instead of dataclass.

Per coding guidelines, structured data should use Pydantic BaseModel instances. This would provide built-in validation and serialization.

Proposed change
-from dataclasses import dataclass
+from pydantic import BaseModel

-@dataclass
-class GitRemoteInfo:
-    provider: str
-    org: str
-    repo: str
+class GitRemoteInfo(BaseModel):
+    """Git remote origin information."""
+    provider: str
+    org: str
+    repo: str

330-338: Consider using a Pydantic model for the output payload.

Per coding guidelines, structured data should return Pydantic BaseModel instances. This would provide schema validation and easier serialization.

Example model
class CodacyIssuesOutput(BaseModel):
    """Output payload for Codacy issues."""
    scope: str
    organization: str
    repository: str
    pullRequest: str | None
    minLevel: str
    total: int
    issues: list[str]

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@scripts/codacy_issues.py`:
- Around line 233-343: The build_payload() function currently returns a plain
dict; change it to return a Pydantic BaseModel (e.g., BuildPayloadModel)
instance instead: define a Pydantic model representing the payload fields used
in build_payload(), construct and return that model from build_payload(), and
update any callers that expect a dict to accept the model (or call .model_dump()
/ .model_dump_json() for serialization when a JSON string is needed—use
.model_dump_json() per Pydantic v2.12.5). Ensure fetch_json/format_for_ai or
other consumers that relied on dict semantics either access model attributes or
use .model_dump() to retain compatibility. Keep function name build_payload()
and update its type hints to return the new BaseModel type.

In `@src/exstruct/render/__init__.py`:
- Around line 112-210: The docstrings for the new protocol classes and helper
functions (_PageSetupProtocol, _SheetApiProtocol, _iter_sheet_apis,
_build_sheet_export_plan, _extract_print_areas, _split_csv_respecting_quotes)
need to be converted to Google-style docstrings: for each Protocol add a short
summary plus an Attributes section describing fields (e.g., PrintArea on
_PageSetupProtocol; PageSetup and ExportAsFixedFormat on _SheetApiProtocol), and
for each function include a short summary plus Args and Returns sections
describing parameter types and returned values (e.g., wb: xw.Book ->
list[tuple[int,str,_SheetApiProtocol]] for _iter_sheet_apis, and explain the
tuple structure for _build_sheet_export_plan and list[str] for
_split_csv_respecting_quotes/_extract_print_areas); keep wording concise and
consistent with existing project docstring style.
- Around line 245-282: In _export_sheet_pdf: when the COM ExportAsFixedFormat
fallback (the TypeError branch) is used, the previously set page_setup.PrintArea
can prevent ignoring print areas on older Excel; if ignore_print_areas is True
and page_setup is not None, clear page_setup.PrintArea (set to an empty string
or None) before calling the fallback ExportAsFixedFormat(0, str(pdf_path)), then
restore original_print_area in the existing finally block (page_setup.PrintArea
= original_print_area) as currently implemented; ensure this only happens for
the fallback path so the primary call can still pass IgnorePrintAreas when
supported.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 23, 2026

✅ Actions performed

Initiated docstring generation; will generate only if new commits exist.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 23, 2026

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #46

Docstrings generation was requested by @harumiWeb.

* #44 (comment)

The following files were modified:

* `scripts/codacy_issues.py`
* `src/exstruct/__init__.py`
* `src/exstruct/core/backends/base.py`
* `src/exstruct/core/backends/com_backend.py`
* `src/exstruct/core/backends/openpyxl_backend.py`
* `src/exstruct/core/cells.py`
* `src/exstruct/core/integrate.py`
* `src/exstruct/core/pipeline.py`
* `src/exstruct/core/workbook.py`
* `src/exstruct/engine.py`
* `src/exstruct/render/__init__.py`
* `tests/backends/test_auto_page_breaks.py`
* `tests/backends/test_backends.py`
* `tests/backends/test_print_areas_openpyxl.py`
* `tests/com/test_render_smoke.py`
* `tests/core/test_cells_utils.py`
* `tests/core/test_mode_output.py`
* `tests/core/test_pipeline.py`
* `tests/core/test_pipeline_fallbacks.py`
* `tests/engine/test_engine.py`
* `tests/models/test_models_export.py`
* `tests/render/test_render_init.py`
* `tests/utils.py`
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🤖 Fix all issues with AI agents
In `@scripts/codacy_issues.py`:
- Around line 65-75: The docstrings in functions such as normalize_provider
contain blank lines that consist only of whitespace (causing Ruff W293); remove
trailing spaces from those blank lines or delete the blank lines so each
docstring has no whitespace-only lines. Update normalize_provider's docstring
and the other affected docstrings (around the segments referenced, e.g., the
blocks at ~117-136 and ~145-153) to strip trailing spaces on empty lines or
eliminate them entirely so no docstring lines are only whitespace.

In `@src/exstruct/core/backends/base.py`:
- Around line 44-50: The docstring for extract_formulas_map contains a trailing
whitespace-only blank line that triggers Ruff W293; edit the
extract_formulas_map function's docstring to remove the empty whitespace line
(ensure the closing triple quotes follow the last line of text with no extra
spaces or blank line) so the docstring ends cleanly without trailing whitespace.

In `@src/exstruct/core/cells.py`:
- Around line 151-184: The code currently computes col_index as start_col +
c_offset - 1 in extract_sheet_formulas_map_com which yields zero-based column
indices but the function docstring promises Excel 1-based indices; fix by making
the column index 1-based (remove the "- 1" adjustment) so formulas_map stores
(row, column) tuples using Excel 1-based indices (ensure this change is applied
where col_index is assigned and that SheetFormulasMap usage remains consistent);
alternatively, if you prefer zero-based indices, update the docstring to state
zero-based columns—pick one approach and make both the col_index assignment and
the docstring consistent.

In `@src/exstruct/render/__init__.py`:
- Around line 82-90: The docstring that starts "Export each worksheet in the
given Excel workbook to PNG files and return the image paths in workbook order."
contains tab characters in the Returns: and Raises: blocks causing Ruff E101;
open the module-level/function/class docstring in
src/exstruct/render/__init__.py that contains that text and replace the tab
characters with spaces in the indented "Returns:" and "Raises:" lines (and do
the same for the other docstring instance with the same tabbed blocks later in
the file) so all docstring indentation uses spaces only.
- Around line 142-153: The docstring for ExportAsFixedFormat is incorrectly
placed on the same line as the signature causing a parsing/IndentationError;
open the docstring on its own indented line directly below the def line and
indent the triple-quoted string to match the function body, moving the existing
docstring content into that properly formatted block so ExportAsFixedFormat has
a normal multi-line docstring.

In `@tests/backends/test_backends.py`:
- Around line 78-96: Remove the trailing whitespace on the blank line inside the
fake_formulas_map docstring in the
test_openpyxl_backend_extract_formulas_map_returns_none_on_failure test; open
the fake_formulas_map function (used to monkeypatch extract_sheet_formulas_map)
and delete any spaces at the end of the empty docstring line so the docstring
has no trailing whitespace, then run the linter to verify W293 is resolved.

In `@tests/core/test_cells_utils.py`:
- Around line 150-161: The docstring in the method range of the dummy class
contains mixed tabs and spaces and trailing whitespace on blank lines; update
the range(self, _start: object, _end: object) -> _DummyRange docstring to use
consistent spaces for indentation (no tabs) and remove any trailing whitespace
on blank lines so Ruff W293 and E101 are resolved, keeping the same docstring
content and returning _DummyRange() unchanged.

In `@tests/core/test_pipeline.py`:
- Around line 916-928: The docstring in the _step function contains mixed tabs
and spaces causing Ruff E101; fix by replacing the tabs used for indentation
inside the docstring (lines describing Parameters and their types) with
consistent spaces so the docstring uses only spaces and aligns with the
surrounding indentation, ensuring the triple-quoted string and its internal
lines are consistently indented in the _step function definition.
♻️ Duplicate comments (2)
src/exstruct/render/__init__.py (1)

323-356: Fallback ExportAsFixedFormat still honors PrintArea when IgnorePrintAreas isn’t supported.

This is the same concern previously raised: if the COM call doesn’t accept IgnorePrintAreas, the fallback should clear PrintArea when ignore_print_areas=True to avoid exporting only the print area.

scripts/codacy_issues.py (1)

499-530: Return a Pydantic model for the payload instead of a dict.

build_payload produces structured output; per project guidelines it should return a Pydantic BaseModel and use .model_dump_json() for serialization.

As per coding guidelines, return Pydantic BaseModel instances for structured data.

🧹 Nitpick comments (6)
src/exstruct/render/__init__.py (2)

116-128: Consider reusing the existing sheet-name sanitizer to avoid drift.

There’s already a _sanitize_sheet_filename in src/exstruct/io/__init__.py. Reusing it (or moving to a shared util) would prevent future divergence.


206-267: Avoid duplicate CSV-splitting logic; centralize helper.

_split_csv_respecting_quotes already exists in src/exstruct/core/backends/com_backend.py. Consider a shared utility to avoid subtle parsing drift over time.

src/exstruct/core/backends/openpyxl_backend.py (1)

119-132: Minor docstring style inconsistency.

The docstring uses Parameters: style while other methods in this file use Args: style (e.g., extract_cells, extract_colors_map). Consider aligning with the existing convention for consistency.

Suggested fix
     def detect_tables(self, sheet_name: str) -> list[str]:
         """
         Detects table candidate ranges within the specified worksheet.
-        
-        Parameters:
-            sheet_name (str): Name of the worksheet to analyze for table candidates.
-        
+
+        Args:
+            sheet_name: Name of the worksheet to analyze for table candidates.
+
         Returns:
-            list[str]: Detected table candidate ranges as A1-style range strings; empty list if none are found or detection fails.
+            Detected table candidate ranges as A1-style range strings;
+            empty list if none are found or detection fails.
         """
src/exstruct/core/workbook.py (2)

15-16: Consider removing private function from __all__.

Exporting _find_open_workbook in __all__ exposes an internal helper as part of the public API. By Python convention, the leading underscore indicates it's private. If external consumers need this functionality, consider renaming it without the underscore or keeping it private.

Suggested fix
-__all__ = ["openpyxl_workbook", "xlwings_workbook", "_find_open_workbook", "xw"]
+__all__ = ["openpyxl_workbook", "xlwings_workbook", "xw"]

22-32: Docstring style inconsistency with xlwings_workbook.

This docstring uses Parameters: style while xlwings_workbook (line 64-71) uses Args: style. Consider aligning for consistency within the module.

src/exstruct/core/pipeline.py (1)

624-642: Incomplete docstring: missing parameter documentation.

The docstring's Parameters section only documents workbook but omits inputs and artifacts. Per coding guidelines, Google-style docstrings should document all parameters.

📝 Suggested docstring update
 def step_extract_formulas_map_com(
     inputs: ExtractionInputs, artifacts: ExtractionArtifacts, workbook: xw.Book
 ) -> None:
     """
     Extract the workbook's formulas map via COM and store it into the artifacts.
     
     On success assigns the extracted WorkbookFormulasMap to artifacts.formulas_map_data.
     On failure leaves artifacts.formulas_map_data unchanged and logs a warning.
     
     Parameters:
+        inputs (ExtractionInputs): Pipeline inputs (unused but required by step signature).
+        artifacts (ExtractionArtifacts): Mutable container to receive the extracted formulas map.
         workbook (xlwings.Book): COM workbook to extract formulas from.
     """

…ultiple modules

- Updated docstrings in `src/exstruct/render/__init__.py` to improve formatting and clarity.
- Enhanced comments in test files to provide better context and consistency in style.
- Removed unnecessary whitespace in various docstrings and comments for cleaner code.
- Ensured all docstrings follow a consistent format, including parameter and return descriptions.
- Improved readability of test cases by clarifying the purpose and expected outcomes.
@harumiWeb harumiWeb added the enhancement New feature or request label Jan 23, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@scripts/codacy_issues.py`:
- Around line 299-393: Replace the untyped dicts with Pydantic models: define
models (e.g., PatternInfo, Issue, and IssuesResponse or PaginatedIssues) that
include fields used by format_for_ai (filePath, lineNumber, message, and nested
patternInfo properties such as id/name/metadata), then update fetch_json to
return a parsed Pydantic model (or make it generic to return raw dict only
internally), change fetch_repo_issues to return IssuesResponse (call fetch_json
and pass the result into IssuesResponse.parse_obj/parse_raw) and change
fetch_pr_issues similarly; ensure function signatures and return types reflect
the concrete models and that JSON decoding errors still raise RuntimeError as
before.

In `@src/exstruct/render/__init__.py`:
- Around line 352-364: The fallback currently only clears PrintArea when
page_setup is None, which misses cases where print_area was set earlier; update
the except TypeError block so that if ignore_print_areas is True you obtain
page_setup = page_setup or getattr(sheet_api, "PageSetup", None) and then clear
page_setup.PrintArea = "" when page_setup is not None (inside the existing
try/except), before calling sheet_api.ExportAsFixedFormat(0, str(pdf_path));
ensure you still catch and log exceptions via logger.debug as currently done.
♻️ Duplicate comments (3)
scripts/codacy_issues.py (1)

499-592: Build payload as a Pydantic BaseModel (not a dict).

build_payload returns a dict, but structured outputs should be BaseModels. Update the payload to a Pydantic model and serialize with model_dump_json() in main(). Based on learnings and coding guidelines, please align payload construction with BaseModel usage.

♻️ Proposed fix
@@
-import urllib.parse
-import urllib.request
+import urllib.parse
+import urllib.request
+
+from pydantic import BaseModel
@@
+class CodacyPayload(BaseModel):
+    scope: str
+    organization: str
+    repository: str
+    pullRequest: str | None
+    minLevel: str
+    total: int
+    issues: list[str]
+
 def build_payload(
@@
-) -> dict[str, object]:
+) -> CodacyPayload:
@@
-    return {
-        "scope": "pull_request" if pr else "repository",
-        "organization": org,
-        "repository": repo,
-        "pullRequest": pr if pr else None,
-        "minLevel": min_level,
-        "total": len(issues),
-        "issues": issues,
-    }
+    return CodacyPayload(
+        scope="pull_request" if pr else "repository",
+        organization=org,
+        repository=repo,
+        pullRequest=pr if pr else None,
+        minLevel=min_level,
+        total=len(issues),
+        issues=issues,
+    )
@@
-    sys.stdout.write(json.dumps(payload, ensure_ascii=False, indent=2) + "\n")
+    sys.stdout.write(payload.model_dump_json(ensure_ascii=False, indent=2) + "\n")
src/exstruct/core/cells.py (1)

151-183: Docstring still mismatches column index base.
The COM path computes zero-based columns (start_col + c_offset - 1) but the docstring says 1-based. Please align the docstring (or the calculation) to avoid misleading API users.

src/exstruct/render/__init__.py (1)

82-90: Align new render docstrings with Google-style sections.

Several new docstrings (e.g., Line 82, Line 117, Line 131, Line 156, Line 206, Line 270, Line 330, Line 373, Line 400, Line 480, Line 509, Line 535) use Parameters: blocks and return variable names. Please switch to Google-style Args/Returns, and add Attributes for the Protocols (Line 131–153). As per coding guidelines, please keep docstrings consistent.

Also applies to: 116-127, 131-153, 156-242, 270-310, 330-385, 400-413, 480-488, 515-542

🧹 Nitpick comments (7)
scripts/codacy_issues.py (1)

42-114: Switch docstrings to Google style (Args/Returns/Raises).

Several functions in this block (and elsewhere in this file) use “Parameters/Returns” sections. Please update to Google-style consistently. As per coding guidelines, please align docstrings across the file.

src/exstruct/core/workbook.py (1)

15-16: Consider reviewing the __all__ export list contents.

The __all__ list includes _find_open_workbook (a private function by naming convention) and xw (the xlwings module itself). Typically, __all__ contains only public API symbols defined by the module. Including a private function suggests it may be needed for testing—if so, consider either renaming it to remove the underscore or documenting why it's exported. Re-exporting xw may cause confusion about whether this module is the canonical import location for xlwings.

src/exstruct/render/__init__.py (3)

116-128: Avoid duplicating sheet-name sanitization logic.

Line 116 introduces _sanitize_sheet_filename, which duplicates the helper already in src/exstruct/io/__init__.py. Reusing the shared utility (or moving it into a shared module) will prevent drift. As per coding guidelines, prefer DRY helpers.


230-267: Reuse the existing _split_csv_respecting_quotes helper.

Line 230 duplicates the implementation in src/exstruct/core/backends/com_backend.py. Centralizing this logic will prevent subtle parsing divergence. As per coding guidelines, avoid duplicate implementations.


156-203: Prefer a Pydantic model for sheet export plan entries.

Lines 156–203 return tuples for structured plan entries. The guidelines prefer Pydantic BaseModel instances for structured data to reduce tuple-index errors and improve typing. Consider introducing a small model (e.g., SheetExportPlanItem) for (index, name, api, print_area). As per coding guidelines, prefer BaseModel return types for structured data.

tests/core/test_pipeline.py (1)

225-233: Align helper docstrings to Google style where present.

Several nested helpers/classes (e.g., Line 225, Line 403, Line 447, Line 503, Line 544, Line 597, Line 633, Line 701, Line 739, Line 788, Line 821, Line 861, Line 916, Line 955, Line 991, Line 1004, Line 1015, Line 1026) use Parameters: blocks. Please switch to Google-style Args/Returns to match repo standards. As per coding guidelines, keep docstrings consistent.

Also applies to: 403-413, 447-472, 503-508, 544-552, 597-602, 633-638, 701-710, 739-748, 788-794, 821-826, 861-869, 916-926, 955-985, 991-999, 1004-1013, 1015-1023, 1026-1051

src/exstruct/core/pipeline.py (1)

275-283: Use Google-style Args/Returns in updated docstrings.

New/updated docstrings (e.g., Line 275, Line 501, Line 515, Line 610, Line 624, Line 671, Line 692, Line 713, Line 833, Line 881, Line 962) use Parameters: blocks. Please convert to Google-style Args/Returns and ensure all parameters are documented. As per coding guidelines, keep docstrings consistent.

Also applies to: 501-510, 515-526, 610-620, 624-635, 671-683, 692-704, 713-729, 833-853, 881-889, 962-972

…ックエンドがエンドツーエンドで `formulas_map` を伝播。印刷エリアのエクスポートの堅牢性を向上。
@harumiWeb harumiWeb merged commit 54e1ab0 into main Jan 23, 2026
9 checks passed
@harumiWeb harumiWeb deleted the dev/formula branch January 23, 2026 03:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants