Implement Multimodal Request Support for Gemini API #2

@simon-stratagile

Objective:
Enhance the ChatAIze.GenerativeCS library to support multimodal interactions with the Google Gemini API. Users should be able to send requests that combine text prompts with files of various types (such as PDF, DOC, TXT, images, audio, and video), so the library exposes Gemini's multimodal capabilities rather than text-only chat.

Summary of Requirements:

  1. File Upload Functionality for Gemini:

    • Implement a dedicated service within the Gemini provider that allows users to upload supported file types.
    • This service should manage the communication with the Gemini Files API for uploading files and return a reference (e.g., a URI) that can be used in subsequent API calls.
  2. Integration with Chat Messages:

    • Update the library's chat message structure (ChatMessage and related components like IChatContentPart) to robustly support references to uploaded files.
    • Ensure the Gemini chat completion logic (ChatCompletion.cs) can correctly serialize these file references as part of the multimodal request to the Gemini API.
  3. User-Friendly Client Access:

    • Provide convenient methods, likely within GeminiClient.cs, for users to:
      • Easily upload files.
      • Easily create chat messages that include both text and references to one or more uploaded files.
    • Update the relevant dependency injection (DI) extensions so that any new services are properly registered and accessible.
  4. Documentation:

    • Update README.md and any other relevant documentation to clearly explain how to use the new multimodal features, including code examples for file uploading and sending multimodal prompts.
  5. Adherence to Existing Standards:

    • The implementation should align with the established coding patterns, architectural style, and naming conventions already present in the ChatAIze.GenerativeCS library.
    • Ensure all existing functionality remains intact and overall code quality is maintained.
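
To make requirement 2 more concrete, here is a minimal sketch of how a message with text and file references might be serialized for the Gemini API. The `FileReference` record and the `GeminiRequestSketch` class are hypothetical names chosen for illustration; they are not existing types in ChatAIze.GenerativeCS, and the real implementation would presumably build on `ChatMessage` and `IChatContentPart`. The payload shape (`contents`, `parts`, `file_data`, `mime_type`, `file_uri`) follows the REST examples in the Gemini documentation linked under Reference.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical type for illustration only; not part of the library's API.
// Holds the URI returned by the Files API and the MIME type of the file.
public sealed record FileReference(string Uri, string MimeType);

public static class GeminiRequestSketch
{
    // Builds a request body containing one text part followed by
    // one file_data part per referenced file.
    public static string BuildRequestJson(string prompt, IEnumerable<FileReference> files)
    {
        // Parts are stored as object so that System.Text.Json serializes
        // each anonymous type using its runtime shape.
        var parts = new List<object> { new { text = prompt } };

        foreach (var file in files)
        {
            parts.Add(new
            {
                file_data = new { mime_type = file.MimeType, file_uri = file.Uri }
            });
        }

        var request = new
        {
            contents = new[] { new { role = "user", parts } }
        };

        return JsonSerializer.Serialize(request);
    }
}
```

Calling `GeminiRequestSketch.BuildRequestJson("Summarize this document.", new[] { new FileReference(uri, "application/pdf") })` would produce a single user message whose `parts` array holds the text first and the file reference second. Keeping file references as separate parts is one possible design; it would let a message carry any number of uploaded files alongside the prompt.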

Expected Outcome:
Users of the ChatAIze.GenerativeCS library will be able to use Gemini's multimodal features directly through the library. This enables richer interactions, such as analyzing documents, describing images, or processing audio and video content together with text prompts.

Reference:
This feature aligns with the capabilities described in the official Gemini API documentation for file handling: https://ai.google.dev/gemini-api/docs/files
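
For the upload service in requirement 1, the part that returns "a reference (e.g., a URI)" could look roughly like the sketch below. It only covers reading the metadata that the Files API returns after an upload, not the upload request itself. `UploadedFileInfo` and `GeminiFileResponseSketch` are hypothetical names, and the assumed response shape (a top-level `file` object with `name`, `uri`, and `mimeType` fields) is based on the REST examples in the documentation linked above; it should be checked against the current API reference before being relied on.

```csharp
using System.Text.Json;

// Hypothetical type for illustration only; not part of the library's API.
public sealed record UploadedFileInfo(string Name, string Uri, string MimeType);

public static class GeminiFileResponseSketch
{
    // Extracts the file metadata from an upload response body.
    // Assumes a top-level "file" object; GetProperty throws
    // KeyNotFoundException if an expected field is absent.
    public static UploadedFileInfo Parse(string responseJson)
    {
        using var document = JsonDocument.Parse(responseJson);
        var file = document.RootElement.GetProperty("file");

        return new UploadedFileInfo(
            file.GetProperty("name").GetString() ?? throw new JsonException("File name is null."),
            file.GetProperty("uri").GetString() ?? throw new JsonException("File URI is null."),
            file.GetProperty("mimeType").GetString() ?? throw new JsonException("MIME type is null."));
    }
}
```

The returned `Uri` and `MimeType` are the two values a subsequent chat request needs in order to reference the uploaded file.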
