This project implements a Retrieval-Augmented Generation (RAG) chatbot system enhanced with dialog summarization using a fine-tuned BART model. The summarization improves response efficiency and context handling in customer-service interactions. The system supports document uploads for Q&A, enhanced retrieval methods, and performance comparison with/without dialog summaries.
```
Dialog-Summarization-System
├── chatbot
│   ├── chatbotservice/                   # Backend service logic for the chatbot
│   │   ├── __init__.py                   # Package initialization
│   │   ├── chatbot_service.py            # Core chatbot service
│   │   ├── chatbotlogsummary_service.py  # Service to summarize chatbot logs
│   │   ├── chatlog_service.py            # Service for chat log operations
│   │   ├── texretriever_service.py       # Text retrieval services
│   │   └── vectorHandlingService.py      # Services for vector operations
│   ├── controller/                       # Handles API requests and routing
│   │   ├── __init__.py                   # Package initialization
│   │   └── chatbotcontroller.py          # Main controller for chatbot APIs
│   ├── frontend/                         # Frontend for the chatbot (Streamlit UI)
│   │   ├── __init__.py                   # Package initialization
│   │   └── chatbotui.py                  # Streamlit-based chatbot UI
│   ├── model/                            # Data models for the chatbot system
│   │   ├── __init__.py                   # Package initialization
│   │   ├── chatlog.py                    # Model for chat log data
│   │   ├── chatmessage.py                # Model for individual chat messages
│   │   ├── chatsummarizationrequest.py   # Model for summarization requests
│   │   ├── chatsummarysession.py         # Model for chat session summaries
│   │   └── queryresponse.py              # Model for query responses
│   ├── utils/                            # Utility functions for various operations
│   │   ├── __init__.py                   # Package initialization
│   │   ├── database_utils.py             # Database operation utilities
│   │   ├── embedding_utils.py            # Embedding handling utilities
│   │   ├── llmmodel_utils.py             # Language model utilities
│   │   ├── semanticembedding_utils.py    # Semantic embedding utilities
│   │   └── weightrepriorcalc_utils.py    # Utility for weighted recalculations
│   ├── .env                              # Environment variables configuration file
│   ├── __init__.py                       # Package initialization
│   └── main.py                           # Entry point for backend (Uvicorn with FastAPI)
├── dataset                               # Datasets for the project
├── docs                                  # Project documentation
├── image                                 # Images and assets for the project
├── model_trainning                       # Training scripts and related files for the BART-based model
└── reference                             # Reference materials and documents
```
To set up the environment for running this project, follow these steps.

Ensure you have the following installed on your system:
- Python 3.10 or higher
- pip (Python package manager)
- Git (optional, for cloning the repository)
- Docker (optional, for a containerized setup)
Clone the Repository

Clone this repository to your local machine:

```bash
git clone git@github.com:husthunterpy01/Dialog-Summarization-System.git
cd Dialog-Summarization-System
```

Before continuing, here are some notes about setting up the .env file inside the chatbot folder:
```
# MongoDB Atlas configuration
MONGO_URI=xxxxxxxxxxxxxxxxxxxxxxxxx
VECTOR_DB=xxxxxxxxxxxx
VECTOR_DOCUMENT=xxxxxxxxxxxxxxx
VECTOR_CONVERSATION_DOCUMENT=xxxxxxxxxxxx

# LLM model configuration
LLM_MODEL="./Dialog-Summarization-System/LLM_Model/granite-3.1-3b-a800m-instruct-Q6_K.gguf"
FINE_TUNE_MODEL="./Dialog-Summarization-System/finetunedmodel/fine-tuned-model/checkpoint-1100"

# FastAPI endpoint
CHAT_ENDPOINT=http://127.0.0.1:8000/
```

- Create the LLM_Model and fine-tuned-model folders
- For LLM_MODEL, visit Hugging Face and download the .gguf Granite model: Granite model
- For FINE_TUNE_MODEL, download the checkpoint from BART_SamSUM_TweetSUM into the folder and point the variable to the checkpoint you want
- For the MONGO_URI configuration, follow this tutorial: mongodburi_video
- For the other configurations, you can name the database and documents as you like
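For reference, a minimal sketch of how these variables can be read from the `.env` file. In practice the project would typically use a library such as `python-dotenv`; the tiny parser below (and the `example.env` file it writes) is only illustrative:

```python
def load_env(path: str) -> dict:
    """Minimal .env parser: KEY=VALUE lines, '#' comments, optional quotes."""
    values = {}
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values

# Hypothetical example file; real values belong in chatbot/.env
with open("example.env", "w") as fh:
    fh.write("# MongoDB Atlas configuration\n")
    fh.write("MONGO_URI=mongodb+srv://user:pass@cluster\n")
    fh.write('CHAT_ENDPOINT="http://127.0.0.1:8000/"\n')

cfg = load_env("example.env")
print(cfg["CHAT_ENDPOINT"])
```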
Application setup with Docker (you can skip step 3 if you use this approach)

If you prefer running with Docker, execute:

```bash
docker-compose up --build
```

Once it is running, the application will open in your browser. For further testing, you can access the services as follows:

- Backend (FastAPI): http://localhost:8000
- Frontend (Streamlit): http://localhost:8501
Application setup without Docker

If you run into issues with the Docker setup, you can still set up this project as follows.

Set up the virtual environment:

```bash
python -m venv venv
source venv/bin/activate   # For Linux/macOS
venv\Scripts\activate      # For Windows
```

Start the backend web server:

```bash
cd chatbot
uvicorn main:app --host 0.0.0.0 --port 8000
```

Start the frontend:

```bash
cd chatbot/frontend
streamlit run chatbotui.py --server.port 8501 --server.address 0.0.0.0
```

Once it is running, the application will open in your browser. For further testing, you can access the services as follows:

- Backend (FastAPI): http://localhost:8000
- Frontend (Streamlit): http://localhost:8501
Dataset
This project uses two public datasets, SamSUM (2019) and TweetSUM (2021): the first for pre-training and the second for fine-tuning. I have already uploaded both datasets to this repository. If you are interested in the original datasets, please see the link below each one.

- SamSUM dataset: SamSUM is a dataset of messenger-like conversations with summaries, with diversified style and register.
  Dataset link: Dataset/SamSUM. For the original, please visit this site: SamSUM
- TweetSUM dataset: TweetSUM is a dataset focused on dialog summarization, representing the rich domain of Twitter customer-care conversations.
  Dataset link: Dataset/TweetSUM. For the original, please visit this site: TweetSUM
Both datasets are pre-processed by this script before being used to train the BART-based model:
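As a rough illustration of the kind of cleaning such a pre-processing step typically performs on SamSUM/TweetSUM-style records (the function names and filtering rules below are hypothetical, not the project's actual script):

```python
import re

def clean_dialogue(text: str) -> str:
    """Normalize whitespace and drop empty turns from a raw dialogue."""
    lines = [re.sub(r"\s+", " ", ln).strip() for ln in text.splitlines()]
    return "\n".join(ln for ln in lines if ln)

def build_examples(records, min_summary_words=3):
    """Turn raw (dialogue, summary) records into model-ready input/target pairs."""
    examples = []
    for rec in records:
        dialogue = clean_dialogue(rec["dialogue"])
        summary = rec["summary"].strip()
        if len(summary.split()) < min_summary_words:
            continue  # skip degenerate summaries
        examples.append({"input": dialogue, "target": summary})
    return examples

records = [
    {"dialogue": "Amanda:  hi!\n\nJerry: hey,   what's up?", "summary": "Amanda greets Jerry."},
    {"dialogue": "A: ok", "summary": "ok"},  # dropped: summary too short
]
print(build_examples(records))
```

The cleaned `input` text would then be tokenized with the BART tokenizer before training.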

Pre-training with BART-base
BART-base is first pre-trained on the SamSUM dataset in order to gain a better understanding of general chat formats, using the following configuration. After training, here are the results in terms of ROUGE score for the pre-trained model:
Final ROUGE score:
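For intuition about what the reported metric measures, ROUGE-1 F1 (the unigram variant) can be computed roughly as below. This is a simplified sketch; real evaluations typically use the `rouge_score` or `evaluate` packages, which also apply stemming:

```python
from collections import Counter

def rouge1_f(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap between candidate and reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f("amanda greets jerry", "amanda says hi to jerry"), 3))  # → 0.5
```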
Fine-tuning BART-SamSUM
After pre-training BART-base, it is fine-tuned on TweetSUM for customer-service summary understanding. After training, here are the results in terms of ROUGE score for the fine-tuned model:
Final ROUGE score:
The fine-tuned checkpoint has already been uploaded to Hugging Face; please visit this site to get the model: BART_SamSUM_TweetSUM
For the full demo, please download the video at demo_video or visit this site Dialog Summarization System Demo
In this demo, I use the iPhone User Guide as the document for this RAG chatbot, referring to Customer-Service-Handbook-English.pdf. I have already uploaded some other documents in the Docs folder for testing, or you can use other types of documents to test the chatbot. Uploaded documents are saved in the vector database; here is a screenshot of a document I uploaded:
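The upload flow described above (chunk the document, embed each chunk, store it for retrieval) can be sketched with a toy in-memory store. The real system persists embeddings in MongoDB Atlas and uses a semantic embedding model; the bag-of-words "embedding" below is only a stand-in:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (the real system uses a semantic model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyVectorStore:
    def __init__(self):
        self.docs = []  # list of (chunk_text, embedding)

    def add_document(self, text, chunk_size=8):
        """Split the document into fixed-size word chunks and embed each one."""
        words = text.split()
        for i in range(0, len(words), chunk_size):
            chunk = " ".join(words[i:i + chunk_size])
            self.docs.append((chunk, embed(chunk)))

    def query(self, question, k=1):
        """Return the k chunks most similar to the question."""
        q = embed(question)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]

store = ToyVectorStore()
store.add_document("To reset the device hold the power button for ten seconds. "
                   "Battery life depends on screen brightness and background apps.")
print(store.query("how do I reset the device"))
```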
As the base RAG architecture does not work well for document retrieval in some cases, I have implemented some methods to improve retrieval performance.
Hybrid search
Hybrid search combines the strengths of vector search (contextual search) and keyword search, which is useful in cases where you need to search for a keyword or a person's name that vector search alone cannot handle properly.
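As an illustrative sketch of the hybrid idea (not the project's actual implementation), the final score can be a weighted sum of a keyword-overlap score and a vector-similarity score; the bag-of-words cosine here stands in for real semantic similarity:

```python
import math
from collections import Counter

def keyword_score(query, doc):
    """Fraction of query terms that literally appear in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms)

def vector_score(query, doc):
    """Stand-in 'semantic' similarity: cosine over bag-of-words vectors."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    dot = sum(q[t] * d[t] for t in q)
    nq = math.sqrt(sum(v * v for v in q.values()))
    nd = math.sqrt(sum(v * v for v in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0

def hybrid_search(query, docs, alpha=0.5):
    """Rank documents by alpha * vector score + (1 - alpha) * keyword score."""
    scored = [(alpha * vector_score(query, d) + (1 - alpha) * keyword_score(query, d), d)
              for d in docs]
    return [d for _, d in sorted(scored, reverse=True)]

docs = ["John Smith opened a support ticket yesterday",
        "general warranty policy for all customers"]
print(hybrid_search("John Smith warranty", docs)[0])
```

The exact name "John Smith" boosts the first document via the keyword term, even though both documents have some semantic overlap with the query.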
Semantic chunking

Instead of chunking at a fixed size, semantic chunking separates the text into meaningful chunks, which is conducive to later content retrieval.

This chatbot is built for customer-service Q&A as part of a larger system. It is built on the Granite LLM by IBM; for more information, please visit the site to download the model: Granite_LLM. Here I use the Granite version Granite-3.1-3b-a800m-instruct-Q6_K.gguf as the LLM model, because my PC only has a CPU for execution; you may notice that the chat responses in the demo video are a little slow for this reason. Here are some screenshots of this RAG chatbot system:
- Main screen:
- Chat dialog sample:
This chatbot allows users to ask questions based on documents uploaded via the system, plus knowledge acquired from the LLM model itself. Within a chat, the dialog can be summarized using BART-SamTweetSUM; the summary is saved every time we summarize, along with the whole chat-log session:
Note that after each summary, the summarized conversation is saved and used as the context for subsequent user prompts. For the first turn, before any summary exists, the whole chat session is loaded as context.
The system uses the latest summary as the context for the user's prompt so as to minimize the bot's response time by reducing the input tokens. To show the impact of using a summary as context, I have developed a function that compares the chatbot's responses with and without the latest summary as context.
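A rough sketch of why the summary shrinks the prompt (the prompt template and whitespace token count below are simplifications; a real system counts model tokens with the tokenizer):

```python
def build_prompt(question, history, latest_summary=None):
    """Use the latest summary as context when available; otherwise the full history."""
    context = latest_summary if latest_summary else "\n".join(history)
    return f"Context:\n{context}\n\nQuestion: {question}"

def token_count(text):
    """Naive token count: whitespace-split words."""
    return len(text.split())

history = [
    "user: How do I reset my phone?",
    "bot: Hold the power button for ten seconds, then confirm the reset prompt.",
    "user: Will that erase my data?",
    "bot: A soft reset keeps your data; a factory reset erases everything.",
]
summary = ("User asked how to reset the phone and whether data is erased; "
           "bot explained soft vs factory reset.")

with_summary = build_prompt("How long does a factory reset take?", history, summary)
without_summary = build_prompt("How long does a factory reset take?", history)
print(token_count(with_summary), "vs", token_count(without_summary))
```

The summary-based prompt carries fewer input tokens than the full-history prompt, which is exactly what the comparison function measures.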
Here is a sample question to compare:
Enable the comparison so the system calculates the execution time and the input/output tokens for each response. After execution, here is the result for the above question:

As you can see, with the summary as context the response is much faster than without it.
- The demo shows that summarization works well for short dialogs, but it still needs improvement to cover more content from previous dialog turns
- Response speed needs improvement
- Hallucination cases should also be handled
- I will later work on integrated web/app development to provide a friendlier chat interface oriented to the purpose of the application.
