Description
Objective:
To enhance the ChatAIze.GenerativeCS library by enabling multimodal interactions with the Google Gemini API. This will allow users to send requests that combine text prompts with various file types (such as PDF, DOC, TXT, images, audio, and video), leveraging the full multimodal capabilities of Gemini.
Summary of Requirements:
- File Upload Functionality for Gemini:
  - Implement a dedicated service within the Gemini provider that allows users to upload supported file types.
  - This service should manage the communication with the Gemini Files API for uploading files and return a reference (e.g., a URI) that can be used in subsequent API calls.
- Integration with Chat Messages:
  - Update the library's chat message structure (`ChatMessage` and related components like `IChatContentPart`) to robustly support references to uploaded files.
  - Ensure the Gemini chat completion logic (`ChatCompletion.cs`) can correctly serialize these file references as part of the multimodal request to the Gemini API.
- User-Friendly Client Access:
  - Provide convenient methods, likely within `GeminiClient.cs`, for users to:
    - Easily upload files.
    - Easily create chat messages that include both text and references to one or more uploaded files.
  - Update relevant DI (Dependency Injection) extensions to ensure any new services are properly registered and accessible.
- Documentation:
  - Update `README.md` and any other relevant documentation to clearly explain how to use the new multimodal features, including code examples for file uploading and sending multimodal prompts.
- Adherence to Existing Standards:
  - The implementation should align with the established coding patterns, architectural style, and naming conventions already present in the `ChatAIze.GenerativeCS` library.
  - Ensure all existing functionalities remain intact and the overall code quality is maintained.
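To make the chat-message requirement concrete, here is a minimal sketch of how a message mixing text and an uploaded-file reference might be serialized. All type and method names (`IContentPart`, `TextPart`, `FilePart`, `GeminiRequestSketch`) are hypothetical and are not taken from the library; the real `ChatMessage` and `IChatContentPart` may be shaped differently. The JSON field names (`contents`, `parts`, `text`, `fileData`, `mimeType`, `fileUri`) follow the Gemini REST API, and the file URI is a placeholder standing in for the value returned by the Files API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical content-part abstraction; the library's IChatContentPart may differ.
public interface IContentPart
{
    // Returns an object shaped like one entry of the Gemini "parts" array.
    object ToGeminiPart();
}

// Plain text portion of a message.
public sealed record TextPart(string Text) : IContentPart
{
    public object ToGeminiPart() => new { text = Text };
}

// Reference to a file that was already uploaded through the Files API.
public sealed record FilePart(string FileUri, string MimeType) : IContentPart
{
    public object ToGeminiPart() =>
        new { fileData = new { mimeType = MimeType, fileUri = FileUri } };
}

public static class GeminiRequestSketch
{
    // Builds the JSON body for a single-message generateContent request.
    public static string BuildRequestJson(string role, IEnumerable<IContentPart> parts)
    {
        var body = new
        {
            contents = new[]
            {
                new { role, parts = parts.Select(p => p.ToGeminiPart()).ToArray() }
            }
        };

        return JsonSerializer.Serialize(body);
    }
}
```

A caller could then combine a prompt with one or more file references, for example `GeminiRequestSketch.BuildRequestJson("user", new IContentPart[] { new TextPart("Summarize this document."), new FilePart(uploadedFileUri, "application/pdf") })`, where `uploadedFileUri` is the reference returned by the upload service.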
Expected Outcome:
Users of the ChatAIze.GenerativeCS library will be able to seamlessly utilize Gemini's multimodal features. This will enable richer interactions, such as analyzing documents, describing images, or processing audio/video content in conjunction with text prompts, directly through the library.
Reference:
This feature aligns with the capabilities described in the official Gemini API documentation for file handling: https://ai.google.dev/gemini-api/docs/files