New SkVisionStrictFormatCleanPdfDocument Connector #150

Merged
vmunoz96 merged 4 commits into Encamina:main from
MarioRamosEs:@mramos/SkVisionStrictFormatCleanPdfDocumentConnector
Jul 3, 2025

Conversation

@MarioRamosEs
Contributor

  • Added a new class: SkVisionStrictFormatCleanPdfDocumentConnector, which extends StrictFormatCleanPdfDocumentConnector:
    • Combines strict-format PDF text extraction with image analysis and interpretation by a vision model.
    • Detects embedded images in PDF files, processes them using vision, and appends their descriptions as additional text blocks.
  • Introduced a new virtual extension point ProcessPageExtensions(Page page) in StrictFormatCleanPdfDocumentConnector, allowing derived classes to append custom content blocks to the extracted page content.
  • Added a new sample project: Encamina.Enmarcha.Samples.SemanticKernel.DocumentContentExtractor, demonstrating how to extract content from documents using Semantic Kernel connectors and vision capabilities.
    • Interactive console example allowing content extraction from .docx, .pptx, .txt, .md, .jpg, .jpeg, .png, and .pdf files.
  • Introduced the SkVisionImageExtractor class, which centralizes image processing logic using vision models (via Semantic Kernel). This enhances code reuse and promotes separation of concerns.
  • Refactored SkVisionImageDocumentConnector to inherit from SkVisionImageExtractor, simplifying the implementation and removing redundant logic.
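
For readers unfamiliar with the extension-point pattern described above, here is a minimal, self-contained sketch of how a virtual `ProcessPageExtensions` hook could let a derived connector append image descriptions to the extracted page text. It is not the code from this PR: the `Page` record, both `...Sketch` classes, the `IEnumerable<string>` return type, and the `describeImage` delegate are hypothetical stand-ins, and the real connectors rely on a PDF library and on Semantic Kernel rather than on a plain delegate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a PDF page (assumption: the real Page type comes from the PDF library).
public sealed record Page(int Number, string Text, IReadOnlyList<byte[]> Images);

// Simplified stand-in for StrictFormatCleanPdfDocumentConnector.
public class StrictFormatPdfConnectorSketch
{
    // Builds the page content: extracted text first, then whatever the extension point adds.
    public string ReadPage(Page page)
    {
        var blocks = new List<string> { page.Text };
        blocks.AddRange(ProcessPageExtensions(page));
        return string.Join(Environment.NewLine, blocks.Where(b => !string.IsNullOrWhiteSpace(b)));
    }

    // Extension point: the base implementation contributes no extra blocks.
    // The return type is an assumption made for this sketch.
    protected virtual IEnumerable<string> ProcessPageExtensions(Page page) => Enumerable.Empty<string>();
}

// Simplified stand-in for SkVisionStrictFormatCleanPdfDocumentConnector.
public class VisionPdfConnectorSketch : StrictFormatPdfConnectorSketch
{
    // Hypothetical delegate standing in for the vision-model call.
    private readonly Func<byte[], string> describeImage;

    public VisionPdfConnectorSketch(Func<byte[], string> describeImage) => this.describeImage = describeImage;

    // Appends one description block per embedded image.
    protected override IEnumerable<string> ProcessPageExtensions(Page page) => page.Images.Select(describeImage);
}

public static class Demo
{
    public static void Main()
    {
        // A fake "vision model" that only reports the image size.
        var connector = new VisionPdfConnectorSketch(image => $"[image: {image.Length} bytes]");
        var page = new Page(1, "Quarterly report", new[] { new byte[] { 1, 2, 3 } });
        Console.WriteLine(connector.ReadPage(page));
    }
}
```

Keeping the hook virtual with an empty default means the base connector behaves as before, while derived classes opt in to extra content without re-implementing the text extraction.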

EXT97937 previously approved these changes Apr 8, 2025
@adiazcan adiazcan requested a review from Copilot April 10, 2025 10:56
Contributor

Copilot AI left a comment

Copilot reviewed 9 out of 13 changed files in this pull request and generated 3 comments.

Files not reviewed (4)
  • Directory.Build.props: Language not supported
  • Enmarcha.sln: Language not supported
  • samples/SemanticKernel/Encamina.Enmarcha.Samples.SemanticKernel.DocumentContentExtractor/Encamina.Enmarcha.Samples.SemanticKernel.DocumentContentExtractor.csproj: Language not supported
  • samples/SemanticKernel/Encamina.Enmarcha.Samples.SemanticKernel.DocumentContentExtractor/appsettings.json: Language not supported

@albertodiazencamina albertodiazencamina requested review from EXT97937 and removed request for EXT97937 April 10, 2025 11:05
@albertodiazencamina albertodiazencamina dismissed EXT97937’s stale review April 10, 2025 11:06

He's not a valid reviewer

Member

@rliberoff rliberoff left a comment

LGTM

@vmunoz96 vmunoz96 merged commit 970ad6e into Encamina:main Jul 3, 2025
4 checks passed

6 participants