Skip to content
This repository was archived by the owner on Dec 15, 2025. It is now read-only.

fix async extract table result#84

Merged
boqiny merged 1 commit intomainfrom
0419
Apr 20, 2025
Merged

fix async extract table result#84
boqiny merged 1 commit intomainfrom
0419

Conversation

@boqiny
Copy link
Contributor

@boqiny boqiny commented Apr 20, 2025

User description

Description

Previously:
image

After fix:
image

Related Issue

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Code refactoring
  • Performance improvement

How Has This Been Tested?

Screenshots (if applicable)

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Additional Notes


PR Type

Bug fix


Description

  • Support result key in async parser

  • Bump package version to 0.0.23

  • Move Levenshtein to optional dependencies

  • Update example notebook usage environment


Changes walkthrough 📝

Relevant files
Configuration changes
__init__.py
Version bump to 0.0.23                                                                     

any_parser/init.py

  • Bumped __version__ from 0.0.22 to 0.0.23
+1/-1     
pyproject.toml
Bump version and update dependencies                                         

pyproject.toml

  • Bumped version to 0.0.23
  • Moved Levenshtein under optional dependencies
  • +7/-5     
    Bug fix
    async_parser.py
    Add `result` key support in parser                                             

    any_parser/async_parser.py

  • Added fallback for "result" in JSON response
  • Returns json_response["result"] if present
  • +2/-0     
    Documentation
    async_extract_tables.ipynb
    Update async extract tables example                                           

    examples/async_extract_tables.ipynb

  • Updated execution counts and outputs
  • Added import os and env var API key
  • Changed sample file path to sample.pdf
  • Adjusted Markdown display index
  • +92/-20 

    Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.
  • Copilot AI review requested due to automatic review settings April 20, 2025 05:27
    Copy link

    Copilot AI left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Pull Request Overview

    This PR fixes issues related to asynchronous extraction of table results by updating version information, reorganizing dependency groups, and improving example usage in the notebook.

    • Bump project version from 0.0.22 to 0.0.23 and adjust dependency grouping in pyproject.toml.
    • Update the asynchronous extraction example to use an environment variable for the API key and adjust execution/order of notebook cells.
    • Enhance the async parser by returning the "result" key if present in the JSON response.

    Reviewed Changes

    Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

    File Description
    pyproject.toml Bumped version and moved Levenshtein dependency to optional.
    examples/async_extract_tables.ipynb Updated API key initialization, file path, and markdown display.
    any_parser/async_parser.py Added support for returning "result" from the JSON response.
    any_parser/init.py Updated version information.

    ],
    "source": [
    "display(Markdown(markdown_output))"
    "display(Markdown(markdown_output[4]))"
    Copy link

    Copilot AI Apr 20, 2025

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Ensure that 'markdown_output' is a list with a valid index 4 in all cases; otherwise, this may result in an IndexError at runtime.

    Suggested change
    "display(Markdown(markdown_output[4]))"
    "if len(markdown_output) > 4:\n",
    " display(Markdown(markdown_output[4]))\n",
    "else:\n",
    " print('Error: markdown_output does not have enough elements to access index 4.')"

    Copilot uses AI. Check for mistakes.
    @github-actions
    Copy link

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 1 🔵⚪⚪⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Return Type

    The new branch for "result" assumes json_response["result"] is always a string. Validate the type and encoding of this field to avoid type errors or unexpected data formats.

    if "result" in json_response:
        return json_response["result"]
    Optional Dependencies

    Confirm the use of [tool.poetry.group.optional.dependencies] correctly marks Levenshtein as optional and aligns with installation workflows or consider using extras for optional features.

    [tool.poetry.group.optional.dependencies]
    Levenshtein = [
        { version = "0.25.1", python = "<3.9" },
        { version = "0.26.0", python = ">=3.9" }
    ]
    
    Environment Setup

    The notebook now references CAMBIO_API_KEY without guidance. Ensure that setup instructions or environment variable documentation are provided for reproducible demos.

      "import os"
     ]
    },
    {
     "cell_type": "code",
     "execution_count": 3,
     "metadata": {},
     "outputs": [],
     "source": [
      "ap = AnyParser(api_key=os.getenv(\"CAMBIO_API_KEY\"))"
    

    @github-actions
    Copy link

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Impact
    Possible issue
    Convert list results to string

    Handle cases where result is a list by joining elements into a single string.

    any_parser/async_parser.py [30-31]

     if "result" in json_response:
    -    return json_response["result"]
    +    result = json_response["result"]
    +    if isinstance(result, list):
    +        return "\n".join(result)
    +    return result
    Suggestion importance[1-10]: 7

    __

    Why: Ensures the post-processor returns a string even when result is a list and aligns with the method signature.

    Medium
    General
    Display all parsed tables

    Iterate over all items in markdown_output instead of using a fixed index to avoid
    out‑of‑range errors.

    examples/async_extract_tables.ipynb [165]

    -display(Markdown(markdown_output[4]))
    +for table in markdown_output:
    +    display(Markdown(table))
    Suggestion importance[1-10]: 5

    __

    Why: Iterating over the entire markdown_output avoids fixed-index errors and ensures all tables are shown.

    Low
    Ensure API key is provided

    Validate that the CAMBIO_API_KEY environment variable is set before passing it to
    AnyParser.

    examples/async_extract_tables.ipynb [31]

    -ap = AnyParser(api_key=os.getenv("CAMBIO_API_KEY"))
    +key = os.getenv("CAMBIO_API_KEY")
    +if not key:
    +    raise EnvironmentError("CAMBIO_API_KEY not set")
    +ap = AnyParser(api_key=key)
    Suggestion importance[1-10]: 4

    __

    Why: Validates that the environment variable is set before use, preventing silent failures in the example notebook.

    Low

    Copy link
    Member

    @goldmermaid goldmermaid left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    LGTM

    @boqiny boqiny merged commit feeb5a3 into main Apr 20, 2025
    2 checks passed
    Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    2 participants