feature(api): diarized transcription api #4
base: main
Conversation
Pull request overview
This PR adds a diarized transcription API that persists transcriptions with vector embeddings to Postgres/pgvector, with optional AI-powered cleaning, summarization, and sentiment analysis.
- Added Postgres/pgvector integration for storing transcriptions and embeddings
- Implemented AI enrichment pipeline with multi-chunk processing for long transcriptions
- Enhanced REST API to support persistence, diarization, and AI processing parameters
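
The multi-chunk processing mentioned above can be illustrated with a minimal sketch. The PR itself uses langchain-text-splitters; the `chunk_text` helper, chunk size, and overlap below are illustrative, not the PR's code:

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Split a long transcription into overlapping chunks.

    Overlap preserves context across chunk boundaries so each chunk
    can be cleaned/summarized independently and the results joined.
    """
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
        start += size - overlap  # step forward, keeping `overlap` chars of context
    return chunks

# A 2500-char transcript with size=1000/overlap=100 yields 3 chunks
parts = chunk_text("x" * 2500)
```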
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 17 comments.
Summary per file:
| File | Description |
|---|---|
| db.py | New database layer implementing Postgres schema management, transcript persistence, and vector embedding storage |
| call_processor.py | New subprocess worker for AI enrichment processing (cleaning, summarization, sentiment) |
| tests/test_db.py | Comprehensive test suite for database operations including schema initialization, validation, and embedding replacement |
| api.py | Enhanced transcription endpoint with persistence, AI processing, channel naming, and improved error handling |
| ai.py | Refactored AI processing with chunking support, multi-stage pipeline (clean → summarize → sentiment), and detailed logging |
| tests/test_api.py | Updated API tests to validate persistence, multichannel diarization, timeout handling, and error cases |
| tests/conftest.py | Updated mock response to include paragraphs transcript structure for diarization support |
| requirements.txt | Added dependencies: langchain-text-splitters, pgvector, psycopg[binary] |
| deepgram_connector.py | Removed deprecated AI summary processing code |
| README.md | Added documentation for Postgres configuration, REST API parameters, and usage examples |
db.py

```python
raise RuntimeError(
    f"Postgres vectorstore is not configured; missing env vars: {', '.join(missing)}"
)

# psycopg conninfo format
return f"host={host} port={port} user={user} password={password} dbname={dbname}"
```
Copilot AI · Dec 23, 2025
Potential security issue: database password is included in the connection string without escaping. If the password contains special characters (spaces, quotes, etc.), the conninfo string could be malformed or allow injection attacks. Consider using psycopg's dictionary-style connection parameters or the make_conninfo() function which properly escapes values, rather than string formatting.
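
In psycopg 3 this is handled by `psycopg.conninfo.make_conninfo()`, which escapes each value. As a self-contained illustration of the quoting libpq expects, here is a hypothetical helper (not from the PR; prefer `make_conninfo()` in real code):

```python
def quote_conninfo_value(value: str) -> str:
    """Quote a libpq conninfo value.

    Per libpq rules: backslash-escape backslashes and single quotes,
    then wrap the value in single quotes so spaces cannot split it
    into extra keyword=value pairs.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"

def build_conninfo(**params: str) -> str:
    # Quote every value so special characters cannot malform the string
    # or inject additional connection keywords
    return " ".join(f"{key}={quote_conninfo_value(val)}" for key, val in params.items())

conninfo = build_conninfo(host="db", port="5432", user="app",
                          password="p@ss 'word'", dbname="calls")
```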
tests/test_api.py (outdated)
```python
def test_missing_uniqueid(self, client, valid_wav_content):
    """Test that missing uniqueid is rejected when persistence is requested."""
    response = client.post(
        "/api/get_transcription",
        files={"file": ("test.wav", valid_wav_content, "audio/wav")},
        data={"persist": "true"},
    )

    assert response.status_code == 400
    assert "uniqueid" in response.json()["detail"]
```
Copilot AI · Dec 23, 2025
The test doesn't mock db.is_configured() to return False, so the test depends on the actual environment configuration. The test should explicitly control whether the database is considered configured to avoid flaky tests depending on environment state. Consider adding 'patch("api.db.is_configured", return_value=False)' to isolate the test from environment variables.
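
A minimal sketch of the isolation pattern the comment recommends. The `is_configured` below is a stand-in for `api.db.is_configured`, not the PR's code; the point is that `unittest.mock.patch` pins the return value so the test no longer depends on whatever environment variables happen to be set:

```python
import os
from unittest import mock

def is_configured() -> bool:
    # Stand-in: the real function checks Postgres env vars, so its
    # result varies by machine — exactly what makes the test flaky
    return "POSTGRES_HOST" in os.environ

# Patching pins the value for the duration of the test, regardless
# of the caller's environment
with mock.patch(f"{__name__}.is_configured", return_value=False):
    assert is_configured() is False

with mock.patch(f"{__name__}.is_configured", return_value=True):
    assert is_configured() is True
```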
@copilot open a new pull request to apply changes based on this feedback
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…nment (#23)

* Initial plan
* Mock db.is_configured in test_missing_uniqueid to isolate from environment
  Co-authored-by: Stell0 <4547897+Stell0@users.noreply.github.com>
* Add .gitignore and remove build artifacts from repository
  Co-authored-by: Stell0 <4547897+Stell0@users.noreply.github.com>

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Stell0 <4547897+Stell0@users.noreply.github.com>
NethServer/dev#7143