This Streamlit application lets users upload PDF files and ask questions about their content. It uses the Google Generative AI API to embed the extracted text and to answer questions.
- Python 3.x
- Required Python packages (pip names): streamlit, PyPDF2, langchain, google-generativeai, python-dotenv, plus a FAISS build such as faiss-cpu
- Google API Key (with access to the Generative AI API)
- Clone the repository or download the source code.
- Install the required Python packages by running the following command:
  `pip install -r requirements.txt`
- Create a `.env` file in the project directory and add your Google API Key:
  `GOOGLE_API_KEY=your_google_api_key`
- Run the Streamlit application by executing the following command:
  `streamlit run app.py`
- The application will open in your default web browser.
- Upload one or more PDF files using the "Upload your PDF Files and Click on the Submit & Process Button" file uploader in the sidebar.
- Click the "Submit & Process" button to process the uploaded PDF files.
- Once the processing is complete, you can enter your question in the text input field and press Enter.
- The application will search for relevant information in the PDF files and provide an answer based on the context.
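The search described in the last step can be pictured as a nearest-neighbor lookup over embedded text chunks. The sketch below is a simplified illustration, not the app's code: it swaps Google's embeddings for a toy word-count vector and FAISS for a brute-force cosine comparison. The names `embed`, `cosine`, and `top_k` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(question, chunks, k=2):
    """Return the k chunks most similar to the question, best first."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Illustrative data only; in the app the chunks would come from the uploaded PDFs.
chunks = [
    "The invoice total is 420 dollars, due on March 1.",
    "Our office is closed on public holidays.",
    "Payment of the invoice can be made by bank transfer.",
]
context = top_k("When is the invoice due?", chunks, k=2)
```

In a setup like the one this README describes, the selected chunks would then be passed to the chat model as context for the answer.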
- `get_pdf_text(pdf_docs)`: Extracts text from the provided PDF files.
- `get_text_chunks(text)`: Splits the text into smaller chunks for efficient embedding and retrieval.
- `get_vector_store(text_chunks)`: Creates a vector store (FAISS index) from the text chunks using Google Generative AI Embeddings.
- `get_conversational_chain()`: Initializes the conversational chain for question answering using the Google Generative AI Chat model.
- `user_input(user_question)`: Processes the user's question, retrieves relevant documents from the vector store, and generates an answer using the conversational chain.
- `main()`: The main function that sets up the Streamlit application and handles user interactions.
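As one way to picture the chunking step, the following sketch splits text into fixed-size character windows that overlap, so a sentence cut at a boundary still appears whole in a neighboring chunk. This README does not state which splitter or settings the app uses, so `chunk_text`, `chunk_size`, and `overlap` are illustrative assumptions.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters
pieces = chunk_text(sample, chunk_size=12, overlap=4)
```

Larger overlaps reduce the chance of splitting a relevant passage across chunks, at the cost of embedding more duplicated text.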
- Streamlit: A Python library for building interactive web applications.
- PyPDF2: A pure Python library for reading and writing PDF files.
- LangChain: A framework for building applications with large language models.
- Google Generative AI: Google's Generative AI API for text embedding and generation.
- FAISS: A library for efficient similarity search and clustering of dense vectors.
- python-dotenv: A Python library for reading key-value pairs from a `.env` file.
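To illustrate how a key stored in `.env` typically reaches the code, here is a minimal sketch. `load_dotenv` is python-dotenv's loader. The helper `load_api_key` and its error message are hypothetical and not taken from the app.

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:
    load_dotenv = None  # keeps the sketch runnable without the package


def load_api_key(name="GOOGLE_API_KEY"):
    """Return the API key from the environment, reading .env first if possible."""
    if load_dotenv is not None:
        # Copies entries from a .env file into os.environ;
        # variables that are already set are left unchanged by default.
        load_dotenv()
    key = os.environ.get(name, "").strip()
    if not key or key == "your_google_api_key":
        raise RuntimeError(f"{name} is missing; add it to your .env file")
    return key
```

Calling `load_api_key()` once at startup fails early with a clear message when the placeholder value was never replaced.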
Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.
This project is licensed under the MIT License.
For any questions or inquiries, please contact Aritro Saha.