-
Notifications
You must be signed in to change notification settings - Fork 15
Added formula acquisition option function #44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Introduced `include_formulas_map` option in `StructOptions` to control formula extraction. - Updated `ExStructEngine` to handle formulas map during extraction. - Enhanced `SheetData` model to include `formulas_map` for storing formula locations. - Modified tests across various modules to validate the new formula extraction feature. - Added sample Excel file with formulas for testing purposes.
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughAdds a new formulas_map feature: extract per-sheet formula strings with cell coordinates via openpyxl or COM, thread include_formulas_map through engine, pipeline, backends, and models; update docs, samples, tests, and CLI/docs descriptions. No other behavioral changes beyond exposing and propagating formulas_map and related flags. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client
participant Engine as ExStructEngine
participant Pipeline as Pipeline
participant Backend as Backend (Openpyxl/COM)
participant Cells as CellsModule
participant Models as Models
Client->>Engine: extract(file, mode="verbose")
Engine->>Pipeline: extract_workbook(..., include_formulas_map=True)
Pipeline->>Pipeline: resolve_extraction_inputs(include_formulas_map=True)
alt openpyxl path
Pipeline->>Backend: step_extract_formulas_map_openpyxl(file_path)
Backend->>Cells: extract_sheet_formulas_map(file_path)
else COM path (.xls)
Pipeline->>Backend: step_extract_formulas_map_com(workbook)
Backend->>Cells: extract_sheet_formulas_map_com(workbook)
end
Cells-->>Backend: WorkbookFormulasMap
Backend-->>Pipeline: artifacts.formulas_map_data
Pipeline->>Pipeline: collect_sheet_raw_data(..., formulas_map_data)
Pipeline->>Models: build SheetRawData including formulas_map
Models-->>Engine: WorkbookData (includes SheetData.formulas_map)
Engine-->>Client: return WorkbookData
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request adds formula acquisition functionality to ExStruct, enabling extraction of Excel formulas alongside cell values, colors, and other data. The PR also includes a refactoring of the rendering module to better handle print areas and multi-page exports.
Changes:
- Added
formulas_mapfield toSheetDatamodel that maps formula strings to cell coordinates - Implemented formula extraction via both openpyxl (for .xlsx/.xlsm) and COM (for .xls files) backends
- Formulas are extracted in "verbose" mode by default, configurable via
include_formulas_mapoption - Refactored render module to handle print areas and multi-page exports with sequential numbering
Reviewed changes
Copilot reviewed 27 out of 28 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| src/exstruct/models/init.py | Added formulas_map field to SheetData model |
| src/exstruct/core/cells.py | Implemented formula extraction for openpyxl and COM backends |
| src/exstruct/core/backends/openpyxl_backend.py | Added extract_formulas_map method to OpenpyxlBackend |
| src/exstruct/core/backends/com_backend.py | Added extract_formulas_map method to ComBackend |
| src/exstruct/core/backends/base.py | Added extract_formulas_map to Protocol |
| src/exstruct/core/pipeline.py | Integrated formulas_map extraction into pipeline with .xls COM fallback |
| src/exstruct/core/modeling.py | Updated SheetRawData to include formulas_map |
| src/exstruct/core/integrate.py | Added include_formulas_map parameter |
| src/exstruct/engine.py | Added include_formulas_map option to StructOptions |
| src/exstruct/init.py | Enabled formulas_map in verbose mode |
| src/exstruct/render/init.py | Refactored sheet image export to handle print areas and use sequential numbering |
| tests/**/test_*.py | Updated tests with new parameter and corrected expectations |
| docs/README.*.md | Updated documentation to mention formulas_map in verbose mode |
| docs/agents/TASKS.md | Added task checklist for formula feature |
| docs/agents/FEATURE_SPEC.md | Added specification for formula acquisition |
| docs/agents/DATA_MODEL.md | Updated data model version to 0.16 with formulas_map |
| sample/formula/* | Added sample Excel file with formulas and documentation |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f2a9e18926
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@docs/README.ja.md`:
- Line 134: Translate the English list for the "**verbose**" option into
Japanese while preserving formatting and inline code elements (`colors_map`,
`formulas_map`); update the line "- **verbose**: all shapes, charts,
table_candidates, hyperlinks, `colors_map`, and `formulas_map`." to a Japanese
sentence that conveys the same items (shapes, charts, table_candidates,
hyperlinks, `colors_map`, `formulas_map`) and keep the bold "**verbose**" marker
and punctuation unchanged.
In `@src/exstruct/core/pipeline.py`:
- Around line 513-524: The step_extract_formulas_map_openpyxl function claims it
“logs and skips on failure” but currently lets exceptions propagate; wrap the
call to OpenpyxlBackend(...).extract_formulas_map() in a try/except that catches
Exception, logs the error (use the module logger or existing logging helper) and
leaves artifacts.formulas_map_data as None (or an empty safe value) instead of
raising; apply the same guarded pattern to the corresponding
step_extract_formulas_map_pandas function so both behaviors match their
docstrings (or alternatively, remove the “logs and skips on failure” text from
both docstrings if you prefer to keep exceptions propagating).
In `@src/exstruct/render/__init__.py`:
- Around line 238-274: In _export_sheet_pdf, the finally block currently does
"return" inside the cleanup exception handler which can swallow exceptions from
sheet_api.ExportAsFixedFormat; remove that return so cleanup won't suppress the
original error, and instead catch exceptions when restoring page_setup.PrintArea
and log them (e.g., logger.exception or logger.error) without re-raising there;
ensure the restoration code references page_setup and original_print_area (the
variables set earlier) and that any failure to reset PrintArea does not prevent
the original ExportAsFixedFormat exception from propagating.
🧹 Nitpick comments (5)
tests/models/test_modeling.py (1)
28-28: Consider adding an assertion forformulas_mappropagation.The test data includes
formulas_map, but there's no corresponding assertion to verify it's properly transferred to the resultingSheetData. Adding an assertion would ensure the new field is correctly propagated.♻️ Suggested addition after line 43
assert sheet.formulas_map == {"=A1": [(1, 0)]}docs/agents/TASKS.md (1)
5-14: Consider marking completed checklist items.
If these tasks were implemented in this PR, updating the checkboxes will keep the task list accurate.tests/core/test_pipeline.py (1)
164-171: Consider adding a test forinclude_formulas_map=True.The tests verify that
include_formulas_map=Noneresolves toFalsefor standard mode. Consider adding a test case that verifies:
include_formulas_map=Trueresults in the formulas extraction step being included in the pipeline- The resolution behavior for verbose mode (where it should default to
True)This would provide coverage symmetry with the existing
include_colors_maptests.src/exstruct/core/cells.py (1)
70-94: Align formulas map models with Pydantic BaseModel.These new dataclasses are structured data returned across the API surface; the coding guidelines prefer Pydantic models for validation and consistency. Consider migrating SheetFormulasMap/WorkbookFormulasMap (and optionally the existing colors map counterparts) to BaseModel.
♻️ Suggested refactor sketch
+from pydantic import BaseModel + -@dataclass(frozen=True) -class SheetFormulasMap: - """Formula map for a single worksheet.""" - sheet_name: str - formulas_map: dict[str, list[tuple[int, int]]] +class SheetFormulasMap(BaseModel): + """Formula map for a single worksheet.""" + sheet_name: str + formulas_map: dict[str, list[tuple[int, int]]] -@dataclass(frozen=True) -class WorkbookFormulasMap: - """Formula maps for all worksheets in a workbook.""" - sheets: dict[str, SheetFormulasMap] +class WorkbookFormulasMap(BaseModel): + """Formula maps for all worksheets in a workbook.""" + sheets: dict[str, SheetFormulasMap] def get_sheet(self, sheet_name: str) -> SheetFormulasMap | None: ...As per coding guidelines, please confirm whether these models should be Pydantic-based.
src/exstruct/render/__init__.py (1)
10-10: Add docstrings to the new Protocol helpers.The guidelines call for Google-style docstrings on all classes; these Protocols are currently undocumented.
📚 Docstring addition
class _PageSetupProtocol(Protocol): + """Minimal COM PageSetup surface used for PrintArea handling.""" PrintArea: object class _SheetApiProtocol(Protocol): + """Minimal COM sheet API surface used for export and PageSetup.""" PageSetup: _PageSetupProtocolAs per coding guidelines, please add docstrings for these Protocols.
Also applies to: 112-121
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@scripts/codacy_issues.py`:
- Around line 22-27: Move the module-level token validation out of top-level
execution: remove the TOKEN/TOKEN_STR assignment and the sys.exit(1) at import
time and instead implement a get_token() function that reads CODACY_API_TOKEN
from the environment and raises a clear exception if missing; update any callers
(e.g., fetch_json) to call get_token() when they need the token (or validate in
main() before doing work), and ensure tests can import the module without
triggering sys.exit by centralizing validation in get_token() or main().
In `@tests/backends/test_print_areas_openpyxl.py`:
- Around line 55-103: Change the local variable type annotation in the
test_append_print_areas_skips_invalid_ranges test from dict[str, list[object]]
to PrintAreaData so it matches the expected dict[str, list[PrintArea]] shape
used by _append_print_areas; also add brief Google-style docstrings to each of
the five test functions
(test_extract_print_areas_from_defined_names_filters_unknown_sheets,
test_extract_print_areas_from_defined_names_without_defined_names,
test_extract_print_areas_from_sheet_props_skips_empty,
test_parse_print_area_range_invalid,
test_append_print_areas_skips_invalid_ranges) describing the purpose of the test
and expected behavior to satisfy the repository docstring convention.
In `@tests/core/test_pipeline.py`:
- Around line 594-600: The test
test_filter_rows_excluding_merged_values_returns_rows_when_intervals_empty uses
an invalid MergedCellRange where r1 > r2 (MergedCellRange(r1=2, c1=0, r2=1,
c2=1, v="A")), which is confusing; update the fixture to a valid,
non-overlapping range (e.g., set r2 >= r1 such as MergedCellRange(r1=2, c1=0,
r2=2, c2=1, v="A") or any range that does not include row 1) so the test still
exercises empty intervals but uses a well-formed MergedCellRange for
_filter_rows_excluding_merged_values and CellRow.
♻️ Duplicate comments (1)
src/exstruct/core/pipeline.py (1)
513-530: Previous review feedback has been addressed.The try/except block now correctly wraps the formulas_map extraction, matching the docstring's "logs and skips on failure" behavior.
🧹 Nitpick comments (14)
tests/models/test_models_export.py (1)
73-74: Comment is misleading given thecast()usage.The comment states "pytest.skipif is typed; no ignore needed" but a
cast()is still applied to satisfy the type checker. Consider updating or removing this comment to accurately reflect why the cast is necessary despite pytest having type annotations.Suggested fix
-# pytest.skipif is typed; no ignore needed +# cast to _SkipIf satisfies mypy strict mode for decorator typing `@cast`(_SkipIf, pytest.mark.skipif(not HAS_PYYAML, reason="pyyaml not installed"))scripts/codacy_issues.py (5)
111-116: Broad exception handling obscures specific errors.Catching bare
Exceptioncan mask unexpected issues. Consider catchingsubprocess.SubprocessErrorandOSErrorspecifically. The static analysis warning about untrusted input is mitigated since all callers pass hardcoded git commands.Suggested refinement
def run_git(cmd: list[str]) -> str | None: try: out = subprocess.check_output(cmd, stderr=subprocess.DEVNULL) return out.decode("utf-8", errors="replace").strip() - except Exception: + except (subprocess.SubprocessError, OSError): return None
182-194: Dead code: status check insidewithblock is unreachable.
urllib.request.urlopenraisesHTTPErrorfor non-2xx responses before entering thewithblock, so lines 186-187 never execute. TheHTTPErrorhandler at line 195 correctly handles these cases.Remove dead code
try: with urllib.request.urlopen(req, timeout=60) as res: raw = res.read().decode("utf-8", errors="replace") - status = getattr(res, "status", 0) or 0 - if status < 200 or status >= 300: - raise RuntimeError(f"HTTP {status}: {raw}") try: parsed = json.loads(raw) except json.JSONDecodeError as exc:Regarding the static analysis hint about URL schemes: the
assert_codacy_urlfunction validates that URLs match the expectedhttps://api.codacy.comorigin, which mitigates the scheme injection concern.
224-231: Callingsys.exit(1)inside a formatting function is unexpected.
format_for_aiis a data transformation function, but it callssys.exit(1)on invalid input. This couples it tightly to CLI usage and prevents reuse. Consider raisingValueErrorinstead.Suggested change
def format_for_ai(raw_issues: list[dict[str, Any]], min_level: str) -> list[str]: min_priority = get_level_priority(min_level) if min_priority is None: - print( - f"Invalid --min-level: {min_level}. Valid values: {', '.join(LEVELS)}", - file=sys.stderr, - ) - sys.exit(1) + raise ValueError( + f"Invalid min_level: {min_level}. Valid values: {', '.join(LEVELS)}" + )
308-308: Pointless validation of hardcoded value.
assert_valid_choice("status", "all", ...)validates a hardcoded string literal. Either exposestatusas a CLI argument or remove this validation.Remove unnecessary validation
- status = assert_valid_choice("status", "all", ["all", "open", "closed"]) + status = "all"Or expose it as a CLI argument if filtering by status is intended.
36-46: Missing docstrings throughout the file.Per coding guidelines, Google-style docstrings are required for all functions and classes. Consider adding brief docstrings to at least the public API functions (
fetch_repo_issues,fetch_pr_issues,format_for_ai,main).pyproject.toml (1)
60-63: Reconsider thefix = truedefault.With
fix = truein the [tool.ruff] config,ruff checkwill auto-apply fixes by default. Since you have a dedicatedruff-fixtask, consider disabling auto-fix here to require explicit--fixto apply changes, making the workflow intent clearer.♻️ Suggested adjustment
-fix = true +# Prefer explicit fixing via `ruff check --fix` (task: ruff-fix) +fix = falsetests/utils.py (1)
11-30: Add a Google‑style docstring for the typedparametrizewrapper.Consider expanding the docstring to include
Args/Returnsso it aligns with the project’s docstring standard. As per coding guidelines, please use Google‑style docstrings.✍️ Suggested docstring
- """Type-safe wrapper around pytest.mark.parametrize.""" + """Type-safe wrapper around pytest.mark.parametrize. + + Args: + argnames: Parameter names (comma-separated string or sequence). + argvalues: Parameter values for each test case. + indirect: Whether to treat parameters as fixtures. + ids: Optional case IDs or an ID factory. + scope: Optional fixture scope for parametrization. + + Returns: + Decorator preserving the wrapped callable signature. + """tests/render/test_render_init.py (1)
546-785: Add Google‑style docstrings for the new helper classes/tests in this block.Several newly introduced helper classes and tests lack docstrings; please add concise Google‑style docstrings for consistency. As per coding guidelines, please add Google‑style docstrings.
src/exstruct/render/__init__.py (1)
112-121: Add docstrings for the new Protocol types.Please add short Google‑style docstrings to
_PageSetupProtocoland_SheetApiProtocolto match the project’s docstring standard. As per coding guidelines, please add Google‑style docstrings.✍️ Suggested docstrings
class _PageSetupProtocol(Protocol): + """Protocol for Excel PageSetup objects exposing PrintArea.""" PrintArea: object @@ class _SheetApiProtocol(Protocol): + """Protocol for Excel sheet COM APIs used by render helpers.""" PageSetup: _PageSetupProtocolsrc/exstruct/core/workbook.py (1)
15-16: Reconsider exporting private function and external library in__all__.Exporting
_find_open_workbook(underscore-prefixed private function) andxw(external xlwings module) in__all__is unconventional:
- The underscore prefix conventionally signals internal use; if it needs to be public, consider renaming to
find_open_workbook.- Re-exporting external libraries (
xw) couples consumers to xlwings directly and expands the public API surface unnecessarily.If these exports are needed for tests or specific internal modules, consider whether they should remain private or be accessed differently.
tests/backends/test_backends.py (2)
291-293: Mutable class attribute creates shared state across tests.
_DummyWorkbook.sheetsis defined as a class attribute containing a mutable list with_Sheet()instances. Since_Sheet()contains_SheetApi()which has mutable state (DisplayPageBreaks), tests may interfere with each other.♻️ Proposed fix: use instance attribute via __init__
class _DummyWorkbook: - sheets = [_Sheet()] + def __init__(self) -> None: + self.sheets = [_Sheet()]And update the test:
def test_com_backend_extract_auto_page_breaks_success() -> None: - backend = ComBackend(_DummyWorkbook()) + backend = ComBackend(_DummyWorkbook())
331-333: Same mutable class attribute issue as_DummyWorkbook.
_RestoreErrorWorkbook.sheetshas the same shared mutable state problem.♻️ Proposed fix
class _RestoreErrorWorkbook: - sheets = [_RestoreErrorSheet()] + def __init__(self) -> None: + self.sheets = [_RestoreErrorSheet()]src/exstruct/core/pipeline.py (1)
311-316: Formulas_map step configuration duplicated across modes.The
StepConfigforformulas_map_openpyxlis identical in all three mode configurations (light/standard/verbose). While the table structure allows per-mode customization, consider extracting this as a shared constant if the configuration is unlikely to diverge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@scripts/codacy_issues.py`:
- Line 316: The line setting status uses assert_valid_choice("status", "all",
["all", "open", "closed"]) to validate a hardcoded constant which is
unnecessary; either remove the validation and simply set status = "all", or make
status configurable by adding a CLI option (e.g., a --status argument parsed in
the same place as other args) and then call assert_valid_choice("status",
status, ["all","open","closed"]) to validate the user-provided value; update any
references to status accordingly.
- Around line 232-239: The function format_for_ai currently mixes validation and
transformation by calling sys.exit when get_level_priority(min_level) returns
None; remove the sys.exit call and instead raise a ValueError (e.g., raise
ValueError(f"Invalid --min-level: {min_level}. Valid values: {',
'.join(LEVELS)}")) or let the caller validate before calling format_for_ai;
update callers to perform the check using get_level_priority/LEVELS or handle
the raised exception accordingly so format_for_ai only transforms data.
- Around line 193-195: The check for non-2xx status codes using status =
getattr(res, "status", 0) or 0 and if status < 200 or status >= 300 is
unreachable because urllib.request.urlopen raises urllib.error.HTTPError for
4xx/5xx before this code runs; fix by catching urllib.error.HTTPError around the
urlopen call, extract its .code and .read() into the existing res/raw/status
variables (or re-raise a RuntimeError with HTTPError details), so the error path
produces the same RuntimeError message; alternatively remove the dead status
check if you prefer relying solely on the HTTPError exception.
♻️ Duplicate comments (1)
scripts/codacy_issues.py (1)
22-27: Module-levelsys.exit(1)breaks importability and testability.This issue was previously flagged. Moving token validation into a
get_token()function or intomain()would allow the module to be imported for testing without triggering an exit.
🧹 Nitpick comments (5)
scripts/codacy_issues.py (5)
36-70: Missing Google-style docstrings on utility functions.Per coding guidelines, all functions require Google-style docstrings. Functions
get_level_priority,normalize_provider,assert_valid_segment,assert_valid_choice, andencode_segmentlack documentation.Example docstring for get_level_priority
def get_level_priority(level: str | None) -> int | None: """Convert a severity level string to a numeric priority. Args: level: The severity level name (Error, High, Warning, Info). Returns: The priority as an integer (4 for Error, 1 for Info), or None if invalid. """
52-66: Validation helpers callsys.exit(1)directly, reducing testability.Consider raising
ValueErrorinstead and handling the exit inmain(). This allows unit testing these functions without mockingsys.exit.Proposed refactor
def assert_valid_segment(name: str, value: str, pattern: re.Pattern[str]) -> str: + """Validate that a value matches a pattern. + + Raises: + ValueError: If the value is empty or doesn't match the pattern. + """ if (not value) or (pattern.match(value) is None): - print(f"Invalid {name}: {value}", file=sys.stderr) - sys.exit(1) + raise ValueError(f"Invalid {name}: {value}") return value
131-132: Bareexcept Exceptionsilently swallows all errors.While returning
Nonefor git failures is reasonable, catching all exceptions could mask unexpected issues (e.g.,PermissionError,KeyboardInterrupt). Consider catching more specific exceptions.Proposed change
- except Exception: + except (OSError, subprocess.SubprocessError): return None
135-139: Consider using PydanticBaseModelinstead ofdataclass.Per coding guidelines, structured data should use Pydantic
BaseModelinstances. This would provide built-in validation and serialization.Proposed change
-from dataclasses import dataclass +from pydantic import BaseModel -@dataclass -class GitRemoteInfo: - provider: str - org: str - repo: str +class GitRemoteInfo(BaseModel): + """Git remote origin information.""" + provider: str + org: str + repo: str
330-338: Consider using a Pydantic model for the output payload.Per coding guidelines, structured data should return Pydantic
BaseModelinstances. This would provide schema validation and easier serialization.Example model
class CodacyIssuesOutput(BaseModel): """Output payload for Codacy issues.""" scope: str organization: str repository: str pullRequest: str | None minLevel: str total: int issues: list[str]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@scripts/codacy_issues.py`:
- Around line 233-343: The build_payload() function currently returns a plain
dict; change it to return a Pydantic BaseModel (e.g., BuildPayloadModel)
instance instead: define a Pydantic model representing the payload fields used
in build_payload(), construct and return that model from build_payload(), and
update any callers that expect a dict to accept the model (or call .model_dump()
/ .model_dump_json() for serialization when a JSON string is needed—use
.model_dump_json() per Pydantic v2.12.5). Ensure fetch_json/format_for_ai or
other consumers that relied on dict semantics either access model attributes or
use .model_dump() to retain compatibility. Keep function name build_payload()
and update its type hints to return the new BaseModel type.
In `@src/exstruct/render/__init__.py`:
- Around line 112-210: The docstrings for the new protocol classes and helper
functions (_PageSetupProtocol, _SheetApiProtocol, _iter_sheet_apis,
_build_sheet_export_plan, _extract_print_areas, _split_csv_respecting_quotes)
need to be converted to Google-style docstrings: for each Protocol add a short
summary plus an Attributes section describing fields (e.g., PrintArea on
_PageSetupProtocol; PageSetup and ExportAsFixedFormat on _SheetApiProtocol), and
for each function include a short summary plus Args and Returns sections
describing parameter types and returned values (e.g., wb: xw.Book ->
list[tuple[int,str,_SheetApiProtocol]] for _iter_sheet_apis, and explain the
tuple structure for _build_sheet_export_plan and list[str] for
_split_csv_respecting_quotes/_extract_print_areas); keep wording concise and
consistent with existing project docstring style.
- Around line 245-282: In _export_sheet_pdf: when the COM ExportAsFixedFormat
fallback (the TypeError branch) is used, the previously set page_setup.PrintArea
can prevent ignoring print areas on older Excel; if ignore_print_areas is True
and page_setup is not None, clear page_setup.PrintArea (set to an empty string
or None) before calling the fallback ExportAsFixedFormat(0, str(pdf_path)), then
restore original_print_area in the existing finally block (page_setup.PrintArea
= original_print_area) as currently implemented; ensure this only happens for
the fallback path so the primary call can still pass IgnorePrintAreas when
supported.
✅ Actions performedInitiated docstring generation; will generate only if new commits exist. |
|
Note Docstrings generation - SUCCESS |
Docstrings generation was requested by @harumiWeb. * #44 (comment) The following files were modified: * `scripts/codacy_issues.py` * `src/exstruct/__init__.py` * `src/exstruct/core/backends/base.py` * `src/exstruct/core/backends/com_backend.py` * `src/exstruct/core/backends/openpyxl_backend.py` * `src/exstruct/core/cells.py` * `src/exstruct/core/integrate.py` * `src/exstruct/core/pipeline.py` * `src/exstruct/core/workbook.py` * `src/exstruct/engine.py` * `src/exstruct/render/__init__.py` * `tests/backends/test_auto_page_breaks.py` * `tests/backends/test_backends.py` * `tests/backends/test_print_areas_openpyxl.py` * `tests/com/test_render_smoke.py` * `tests/core/test_cells_utils.py` * `tests/core/test_mode_output.py` * `tests/core/test_pipeline.py` * `tests/core/test_pipeline_fallbacks.py` * `tests/engine/test_engine.py` * `tests/models/test_models_export.py` * `tests/render/test_render_init.py` * `tests/utils.py`
📝 Add docstrings to `dev/formula`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
🤖 Fix all issues with AI agents
In `@scripts/codacy_issues.py`:
- Around line 65-75: The docstrings in functions such as normalize_provider
contain blank lines that consist only of whitespace (causing Ruff W293); remove
trailing spaces from those blank lines or delete the blank lines so each
docstring has no whitespace-only lines. Update normalize_provider's docstring
and the other affected docstrings (around the segments referenced, e.g., the
blocks at ~117-136 and ~145-153) to strip trailing spaces on empty lines or
eliminate them entirely so no docstring lines are only whitespace.
In `@src/exstruct/core/backends/base.py`:
- Around line 44-50: The docstring for extract_formulas_map contains a trailing
whitespace-only blank line that triggers Ruff W293; edit the
extract_formulas_map function's docstring to remove the empty whitespace line
(ensure the closing triple quotes follow the last line of text with no extra
spaces or blank line) so the docstring ends cleanly without trailing whitespace.
In `@src/exstruct/core/cells.py`:
- Around line 151-184: The code currently computes col_index as start_col +
c_offset - 1 in extract_sheet_formulas_map_com which yields zero-based column
indices but the function docstring promises Excel 1-based indices; fix by making
the column index 1-based (remove the "- 1" adjustment) so formulas_map stores
(row, column) tuples using Excel 1-based indices (ensure this change is applied
where col_index is assigned and that SheetFormulasMap usage remains consistent);
alternatively, if you prefer zero-based indices, update the docstring to state
zero-based columns—pick one approach and make both the col_index assignment and
the docstring consistent.
In `@src/exstruct/render/__init__.py`:
- Around line 82-90: The docstring that starts "Export each worksheet in the
given Excel workbook to PNG files and return the image paths in workbook order."
contains tab characters in the Returns: and Raises: blocks causing Ruff E101;
open the module-level/function/class docstring in
src/exstruct/render/__init__.py that contains that text and replace the tab
characters with spaces in the indented "Returns:" and "Raises:" lines (and do
the same for the other docstring instance with the same tabbed blocks later in
the file) so all docstring indentation uses spaces only.
- Around line 142-153: The docstring for ExportAsFixedFormat is incorrectly
placed on the same line as the signature causing a parsing/IndentationError;
open the docstring on its own indented line directly below the def line and
indent the triple-quoted string to match the function body, moving the existing
docstring content into that properly formatted block so ExportAsFixedFormat has
a normal multi-line docstring.
In `@tests/backends/test_backends.py`:
- Around line 78-96: Remove the trailing whitespace on the blank line inside the
fake_formulas_map docstring in the
test_openpyxl_backend_extract_formulas_map_returns_none_on_failure test; open
the fake_formulas_map function (used to monkeypatch extract_sheet_formulas_map)
and delete any spaces at the end of the empty docstring line so the docstring
has no trailing whitespace, then run the linter to verify W293 is resolved.
In `@tests/core/test_cells_utils.py`:
- Around line 150-161: The docstring in the method range of the dummy class
contains mixed tabs and spaces and trailing whitespace on blank lines; update
the range(self, _start: object, _end: object) -> _DummyRange docstring to use
consistent spaces for indentation (no tabs) and remove any trailing whitespace
on blank lines so Ruff W293 and E101 are resolved, keeping the same docstring
content and returning _DummyRange() unchanged.
In `@tests/core/test_pipeline.py`:
- Around line 916-928: The docstring in the _step function contains mixed tabs
and spaces causing Ruff E101; fix by replacing the tabs used for indentation
inside the docstring (lines describing Parameters and their types) with
consistent spaces so the docstring uses only spaces and aligns with the
surrounding indentation, ensuring the triple-quoted string and its internal
lines are consistently indented in the _step function definition.
♻️ Duplicate comments (2)
src/exstruct/render/__init__.py (1)
323-356: Fallback ExportAsFixedFormat still honors PrintArea when IgnorePrintAreas isn’t supported.This is the same concern previously raised: if the COM call doesn’t accept
IgnorePrintAreas, the fallback should clearPrintAreawhenignore_print_areas=Trueto avoid exporting only the print area.scripts/codacy_issues.py (1)
499-530: Return a Pydantic model for the payload instead of a dict.
build_payloadproduces structured output; per project guidelines it should return a Pydantic BaseModel and use.model_dump_json()for serialization.As per coding guidelines, return Pydantic BaseModel instances for structured data.
🧹 Nitpick comments (6)
src/exstruct/render/__init__.py (2)
116-128: Consider reusing the existing sheet-name sanitizer to avoid drift.There’s already a
_sanitize_sheet_filenameinsrc/exstruct/io/__init__.py. Reusing it (or moving to a shared util) would prevent future divergence.
206-267: Avoid duplicate CSV-splitting logic; centralize helper.
_split_csv_respecting_quotesalready exists insrc/exstruct/core/backends/com_backend.py. Consider a shared utility to avoid subtle parsing drift over time.src/exstruct/core/backends/openpyxl_backend.py (1)
119-132: Minor docstring style inconsistency.The docstring uses
Parameters:style while other methods in this file useArgs:style (e.g.,extract_cells,extract_colors_map). Consider aligning with the existing convention for consistency.Suggested fix
def detect_tables(self, sheet_name: str) -> list[str]: """ Detects table candidate ranges within the specified worksheet. - - Parameters: - sheet_name (str): Name of the worksheet to analyze for table candidates. - + + Args: + sheet_name: Name of the worksheet to analyze for table candidates. + Returns: - list[str]: Detected table candidate ranges as A1-style range strings; empty list if none are found or detection fails. + Detected table candidate ranges as A1-style range strings; + empty list if none are found or detection fails. """src/exstruct/core/workbook.py (2)
15-16: Consider removing private function from__all__.Exporting
_find_open_workbookin__all__exposes an internal helper as part of the public API. By Python convention, the leading underscore indicates it's private. If external consumers need this functionality, consider renaming it without the underscore or keeping it private.Suggested fix
-__all__ = ["openpyxl_workbook", "xlwings_workbook", "_find_open_workbook", "xw"] +__all__ = ["openpyxl_workbook", "xlwings_workbook", "xw"]
22-32: Docstring style inconsistency withxlwings_workbook.This docstring uses
Parameters:style whilexlwings_workbook(line 64-71) usesArgs:style. Consider aligning for consistency within the module.src/exstruct/core/pipeline.py (1)
624-642: Incomplete docstring: missing parameter documentation.The docstring's Parameters section only documents
workbookbut omitsinputsandartifacts. Per coding guidelines, Google-style docstrings should document all parameters.📝 Suggested docstring update
def step_extract_formulas_map_com( inputs: ExtractionInputs, artifacts: ExtractionArtifacts, workbook: xw.Book ) -> None: """ Extract the workbook's formulas map via COM and store it into the artifacts. On success assigns the extracted WorkbookFormulasMap to artifacts.formulas_map_data. On failure leaves artifacts.formulas_map_data unchanged and logs a warning. Parameters: + inputs (ExtractionInputs): Pipeline inputs (unused but required by step signature). + artifacts (ExtractionArtifacts): Mutable container to receive the extracted formulas map. workbook (xlwings.Book): COM workbook to extract formulas from. """
…ultiple modules - Updated docstrings in `src/exstruct/render/__init__.py` to improve formatting and clarity. - Enhanced comments in test files to provide better context and consistency in style. - Removed unnecessary whitespace in various docstrings and comments for cleaner code. - Ensured all docstrings follow a consistent format, including parameter and return descriptions. - Improved readability of test cases by clarifying the purpose and expected outcomes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@scripts/codacy_issues.py`:
- Around line 299-393: Replace the untyped dicts with Pydantic models: define
models (e.g., PatternInfo, Issue, and IssuesResponse or PaginatedIssues) that
include fields used by format_for_ai (filePath, lineNumber, message, and nested
patternInfo properties such as id/name/metadata), then update fetch_json to
return a parsed Pydantic model (or make it generic to return raw dict only
internally), change fetch_repo_issues to return IssuesResponse (call fetch_json
and pass the result into IssuesResponse.parse_obj/parse_raw) and change
fetch_pr_issues similarly; ensure function signatures and return types reflect
the concrete models and that JSON decoding errors still raise RuntimeError as
before.
In `@src/exstruct/render/__init__.py`:
- Around line 352-364: The fallback currently only clears PrintArea when
page_setup is None, which misses cases where print_area was set earlier; update
the except TypeError block so that if ignore_print_areas is True you obtain
page_setup = page_setup or getattr(sheet_api, "PageSetup", None) and then clear
page_setup.PrintArea = "" when page_setup is not None (inside the existing
try/except), before calling sheet_api.ExportAsFixedFormat(0, str(pdf_path));
ensure you still catch and log exceptions via logger.debug as currently done.
♻️ Duplicate comments (3)
scripts/codacy_issues.py (1)
499-592: Build payload as a Pydantic BaseModel (not a dict).
build_payloadreturns a dict, but structured outputs should be BaseModels. Update the payload to a Pydantic model and serialize withmodel_dump_json()inmain(). Based on learnings and coding guidelines, please align payload construction with BaseModel usage.♻️ Proposed fix
@@ -import urllib.parse -import urllib.request +import urllib.parse +import urllib.request + +from pydantic import BaseModel @@ +class CodacyPayload(BaseModel): + scope: str + organization: str + repository: str + pullRequest: str | None + minLevel: str + total: int + issues: list[str] + def build_payload( @@ -) -> dict[str, object]: +) -> CodacyPayload: @@ - return { - "scope": "pull_request" if pr else "repository", - "organization": org, - "repository": repo, - "pullRequest": pr if pr else None, - "minLevel": min_level, - "total": len(issues), - "issues": issues, - } + return CodacyPayload( + scope="pull_request" if pr else "repository", + organization=org, + repository=repo, + pullRequest=pr if pr else None, + minLevel=min_level, + total=len(issues), + issues=issues, + ) @@ - sys.stdout.write(json.dumps(payload, ensure_ascii=False, indent=2) + "\n") + sys.stdout.write(payload.model_dump_json(ensure_ascii=False, indent=2) + "\n")src/exstruct/core/cells.py (1)
151-183: Docstring still mismatches column index base.
The COM path computes zero-based columns (start_col + c_offset - 1) but the docstring says 1-based. Please align the docstring (or the calculation) to avoid misleading API users.src/exstruct/render/__init__.py (1)
82-90: Align new render docstrings with Google-style sections.Several new docstrings (e.g., Line 82, Line 117, Line 131, Line 156, Line 206, Line 270, Line 330, Line 373, Line 400, Line 480, Line 509, Line 535) use
Parameters:blocks and return variable names. Please switch to Google-styleArgs/Returns, and addAttributesfor the Protocols (Line 131–153). As per coding guidelines, please keep docstrings consistent.Also applies to: 116-127, 131-153, 156-242, 270-310, 330-385, 400-413, 480-488, 515-542
🧹 Nitpick comments (7)
scripts/codacy_issues.py (1)
42-114: Switch docstrings to Google style (Args/Returns/Raises).Several functions in this block (and elsewhere in this file) use “Parameters/Returns” sections. Please update to Google-style consistently. As per coding guidelines, please align docstrings across the file.
src/exstruct/core/workbook.py (1)
15-16: Consider reviewing the__all__export list contents.The
__all__list includes_find_open_workbook(a private function by naming convention) andxw(the xlwings module itself). Typically,__all__contains only public API symbols defined by the module. Including a private function suggests it may be needed for testing—if so, consider either renaming it to remove the underscore or documenting why it's exported. Re-exportingxwmay cause confusion about whether this module is the canonical import location for xlwings.src/exstruct/render/__init__.py (3)
116-128: Avoid duplicating sheet-name sanitization logic.Line 116 introduces
_sanitize_sheet_filename, which duplicates the helper already insrc/exstruct/io/__init__.py. Reusing the shared utility (or moving it into a shared module) will prevent drift. As per coding guidelines, prefer DRY helpers.
230-267: Reuse the existing_split_csv_respecting_quoteshelper.Line 230 duplicates the implementation in
src/exstruct/core/backends/com_backend.py. Centralizing this logic will prevent subtle parsing divergence. As per coding guidelines, avoid duplicate implementations.
156-203: Prefer a Pydantic model for sheet export plan entries.Lines 156–203 return tuples for structured plan entries. The guidelines prefer Pydantic BaseModel instances for structured data to reduce tuple-index errors and improve typing. Consider introducing a small model (e.g.,
SheetExportPlanItem) for(index, name, api, print_area). As per coding guidelines, prefer BaseModel return types for structured data.tests/core/test_pipeline.py (1)
225-233: Align helper docstrings to Google style where present.Several nested helpers/classes (e.g., Line 225, Line 403, Line 447, Line 503, Line 544, Line 597, Line 633, Line 701, Line 739, Line 788, Line 821, Line 861, Line 916, Line 955, Line 991, Line 1004, Line 1015, Line 1026) use
Parameters:blocks. Please switch to Google-styleArgs/Returnsto match repo standards. As per coding guidelines, keep docstrings consistent.Also applies to: 403-413, 447-472, 503-508, 544-552, 597-602, 633-638, 701-710, 739-748, 788-794, 821-826, 861-869, 916-926, 955-985, 991-999, 1004-1013, 1015-1023, 1026-1051
src/exstruct/core/pipeline.py (1)
275-283: Use Google-style Args/Returns in updated docstrings.New/updated docstrings (e.g., Line 275, Line 501, Line 515, Line 610, Line 624, Line 671, Line 692, Line 713, Line 833, Line 881, Line 962) use
Parameters:blocks. Please convert to Google-styleArgs/Returnsand ensure all parameters are documented. As per coding guidelines, keep docstrings consistent.Also applies to: 501-510, 515-526, 610-620, 624-635, 671-683, 692-704, 713-729, 833-853, 881-889, 962-972
…ックエンドがエンドツーエンドで `formulas_map` を伝播。印刷エリアのエクスポートの堅牢性を向上。
#34
Summary by CodeRabbit
New Features
Documentation
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.