Enhance begin_analyze_binary to support in-memory byte input (file path + bytes support) #134

Open
deril2605 wants to merge 2 commits into Azure-Samples:main from deril2605:patch-1

Conversation

@deril2605

Enhance begin_analyze_binary to support in-memory byte input (file path + bytes support)

Purpose

  • Adds support for analyzing in-memory binary data (bytes) in begin_analyze_binary.
  • Maintains backward compatibility with existing file path–based usage.
  • Enables scenarios where files are already loaded in memory (e.g., blob storage downloads, web uploads, API pipelines).
  • Improves flexibility for sample integrations without requiring temporary file persistence.

The method now accepts either:

  • file_location (existing behavior), or
  • data (in-memory bytes)

Exactly one must be provided.
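The "exactly one" rule can be sketched as a small input-resolution step. This is a minimal illustration and not the code in this PR: `resolve_binary_input` is a hypothetical helper name, and raising ValueError for a missing file is an assumption taken from the negative test cases in this description.

```python
from pathlib import Path
from typing import Optional


def resolve_binary_input(
    file_location: Optional[str] = None,
    data: Optional[bytes] = None,
) -> bytes:
    """Hypothetical helper: return the bytes to send from exactly one source."""
    # Both None or both set means the caller did not pick exactly one source.
    if (file_location is None) == (data is None):
        raise ValueError("Provide exactly one of file_location or data.")

    if data is not None:
        # In-memory path: normalize bytearray/bytes to immutable bytes.
        return bytes(data)

    path = Path(file_location)
    if not path.is_file():
        # Assumption: a bad path is reported as ValueError, as in the
        # negative test cases of this description.
        raise ValueError(f"File not found: {file_location}")
    return path.read_bytes()
```

Comparing the two `is None` checks for equality covers both failure modes (both given, neither given) in a single condition.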


Does this introduce a breaking change?

[ ] Yes
[x] No

This change is backward compatible. Existing file path usage continues to work unchanged.


Pull Request Type

What kind of change does this Pull Request introduce?

[ ] Bugfix
[x] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

How to Test

Get the code

git clone [repo-address]
cd [repo-name]
git checkout [branch-name]

Test existing behavior (file path)

response = client.begin_analyze_binary(
    analyzer_id="your-analyzer-id",
    file_location="path/to/sample.pdf"
)

Expected:

  • Request succeeds
  • No behavioral change from previous implementation

Test new behavior (in-memory bytes)

with open("path/to/sample.pdf", "rb") as f:
    pdf_bytes = f.read()

response = client.begin_analyze_binary(
    analyzer_id="your-analyzer-id",
    data=pdf_bytes
)

Expected:

  • Request succeeds
  • Same analysis result as file-based approach

Negative test cases

1. Provide both arguments (should fail)

client.begin_analyze_binary(
    analyzer_id="id",
    file_location="file.pdf",
    data=b"123"
)

→ Raises ValueError

2. Provide neither argument (should fail)

client.begin_analyze_binary(analyzer_id="id")

→ Raises ValueError

3. Invalid file path (should fail)

client.begin_analyze_binary(
    analyzer_id="id",
    file_location="nonexistent.pdf"
)

→ Raises ValueError
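The three negative cases above could be run in one pass with a small harness like the following. This is a sketch only: `check_negative_cases` is a hypothetical name, and it assumes the method accepts `analyzer_id`, `file_location`, and `data` as keyword arguments, as in the calls shown in this description.

```python
from typing import Callable, Dict


def check_negative_cases(analyze: Callable[..., object]) -> Dict[str, bool]:
    """Return, for each negative case, whether `analyze` raised ValueError."""
    cases = {
        "both": {"file_location": "file.pdf", "data": b"123"},
        "neither": {},
        "missing_file": {"file_location": "nonexistent.pdf"},
    }
    results: Dict[str, bool] = {}
    for name, kwargs in cases.items():
        try:
            analyze(analyzer_id="id", **kwargs)
        except ValueError:
            results[name] = True
        else:
            # The call returned normally, so the expected error was not raised.
            results[name] = False
    return results
```

Calling `check_negative_cases(client.begin_analyze_binary)` should then map every case to True if the error handling behaves as described. Any exception other than ValueError propagates, so an unexpected error type is not silently counted as a pass.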


What to Check

Verify that the following are valid:

  • Existing file-based usage remains unchanged.
  • In-memory bytes input works correctly.
  • Only one input source is accepted.
  • Error handling behaves as documented.
  • No breaking API changes introduced.
  • Code follows repository style and structure.

Other Information

This change supports common production workflows where documents are retrieved from:

  • Azure Blob Storage
  • Web uploads
  • In-memory streams
  • API gateways

It avoids the need for writing temporary files before analysis and aligns with modern server-side processing patterns.
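As an illustration of the no-temporary-file workflow, the sketch below drains a binary stream into bytes that can be passed as `data`. The `read_all_bytes` helper and the `io.BytesIO` stand-in for an upload or download stream are assumptions for the example, not part of this PR.

```python
import io
from typing import BinaryIO


def read_all_bytes(stream: BinaryIO, chunk_size: int = 64 * 1024) -> bytes:
    """Hypothetical helper: read a binary stream to the end in fixed-size chunks."""
    buffer = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # An empty read signals end of stream.
            break
        buffer.extend(chunk)
    return bytes(buffer)


# Stand-in for an uploaded file or a downloaded blob held in memory.
upload = io.BytesIO(b"%PDF-1.4 example")
pdf_bytes = read_all_bytes(upload)

# The bytes can then be passed directly, with no temporary file:
# response = client.begin_analyze_binary(
#     analyzer_id="your-analyzer-id",
#     data=pdf_bytes,
# )
```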

This is a small, self-contained enhancement. The only change to the API contract is the additional data input option.

Refactor begin_analyze_binary to accept in-memory data or file path. Improve error handling for input parameters.
@deril2605 changed the title from "Patch 1" to "Enhance begin_analyze_binary to support in-memory byte input (file path + bytes support)" on Feb 11, 2026
@deril2605

@microsoft-github-policy-service agree
