Stell0 (Collaborator) commented Sep 19, 2025

Stell0 force-pushed the statictrans branch 3 times, most recently from 9130add to 91cbb87, on September 19, 2025 14:39

Copilot AI left a comment

Pull request overview

This PR adds a diarized transcription API feature that enables persistence of transcriptions with vector embeddings to Postgres/pgvector, along with optional AI-powered cleaning, summarization, and sentiment analysis.

  • Added Postgres/pgvector integration for storing transcriptions and embeddings
  • Implemented AI enrichment pipeline with multi-chunk processing for long transcriptions
  • Enhanced REST API to support persistence, diarization, and AI processing parameters
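
As a rough illustration of the multi-chunk idea mentioned above, here is a minimal stdlib-only sketch. It assumes a plain overlapping character window and a summary-of-summaries step; the PR itself depends on langchain-text-splitters, whose splitting logic is not reproduced here. The names `split_into_chunks`, `summarize_long_text`, and the `summarize` callable are hypothetical stand-ins, not code from this PR.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into fixed-size windows that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def summarize_long_text(text: str, summarize: Callable[[str], str], chunk_size: int = 1000) -> str:
    """Summarize each chunk, then summarize the joined partial summaries."""
    chunks = split_into_chunks(text, chunk_size=chunk_size, overlap=0)
    if len(chunks) <= 1:
        return summarize(text)
    partials = [summarize(chunk) for chunk in chunks]
    return summarize("\n".join(partials))


# Hypothetical stand-in for a model call: keep the first 20 characters.
def fake_summarize(chunk: str) -> str:
    return chunk[:20]


print(split_into_chunks("abcdefghij", chunk_size=4, overlap=1))
print(summarize_long_text("x" * 2500, fake_summarize))
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of some duplicated input.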

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 17 comments.

| File | Description |
| --- | --- |
| db.py | New database layer implementing Postgres schema management, transcript persistence, and vector embedding storage |
| call_processor.py | New subprocess worker for AI enrichment processing (cleaning, summarization, sentiment) |
| tests/test_db.py | Comprehensive test suite for database operations including schema initialization, validation, and embedding replacement |
| api.py | Enhanced transcription endpoint with persistence, AI processing, channel naming, and improved error handling |
| ai.py | Refactored AI processing with chunking support, multi-stage pipeline (clean → summarize → sentiment), and detailed logging |
| tests/test_api.py | Updated API tests to validate persistence, multichannel diarization, timeout handling, and error cases |
| tests/conftest.py | Updated mock response to include paragraphs transcript structure for diarization support |
| requirements.txt | Added dependencies: langchain-text-splitters, pgvector, psycopg[binary] |
| deepgram_connector.py | Removed deprecated AI summary processing code |
| README.md | Added documentation for Postgres configuration, REST API parameters, and usage examples |

        raise RuntimeError(f"Postgres vectorstore is not configured; missing env vars: {', '.join(missing)}")

    # psycopg conninfo format
    return f"host={host} port={port} user={user} password={password} dbname={dbname}"

Copilot AI Dec 23, 2025

Potential security issue: database password is included in the connection string without escaping. If the password contains special characters (spaces, quotes, etc.), the conninfo string could be malformed or allow injection attacks. Consider using psycopg's dictionary-style connection parameters or the make_conninfo() function which properly escapes values, rather than string formatting.
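
To make the concern concrete, here is a minimal stdlib-only sketch of the quoting rules that libpq documents for keyword/value connection strings: a value that is empty or contains whitespace is wrapped in single quotes, and single quotes and backslashes inside it are escaped with a backslash. The helpers `quote_conninfo_value` and `build_conninfo` are hypothetical and not part of this PR, and the host, user, and database names are made up. In real code, psycopg's own conninfo helper suggested above is likely the better choice.

```python
def quote_conninfo_value(value: object) -> str:
    """Quote one value for a libpq keyword/value string (hypothetical helper)."""
    text = str(value)
    # Quote conservatively: empty values, whitespace, quotes, and backslashes.
    needs_quotes = text == "" or any(ch.isspace() or ch in "'\\" for ch in text)
    if not needs_quotes:
        return text
    # Escape backslashes first so the ones added for quotes are not doubled.
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_conninfo(**params: object) -> str:
    """Join keyword=value pairs, skipping parameters that are None."""
    return " ".join(
        f"{key}={quote_conninfo_value(val)}"
        for key, val in params.items()
        if val is not None
    )


# A password containing a space followed by what looks like another keyword.
password = "s3cret dbname=other"

naive = f"host=db user=app password={password} dbname=calls"
safe = build_conninfo(host="db", user="app", password=password, dbname="calls")

print(naive)  # the tail of the password reads as a second dbname keyword
print(safe)   # the whole password stays inside one quoted value
```

With plain string formatting, the text after the space is no longer part of the password value; with quoting, the parser sees a single password and a single dbname.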

Comment on lines 41 to 50
def test_missing_uniqueid(self, client, valid_wav_content):
    """Test that missing uniqueid is rejected when persistence is requested."""
    response = client.post(
        "/api/get_transcription",
        files={"file": ("test.wav", valid_wav_content, "audio/wav")},
        data={"persist": "true"},
    )

    assert response.status_code == 400
    assert "uniqueid" in response.json()["detail"]

Copilot AI Dec 23, 2025

The test doesn't mock db.is_configured() to return False, so the test depends on the actual environment configuration. The test should explicitly control whether the database is considered configured to avoid flaky tests depending on environment state. Consider adding 'patch("api.db.is_configured", return_value=False)' to isolate the test from environment variables.
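
The isolation technique described above can be sketched in a self-contained way. Everything here is a stand-in: the `db` namespace, the `PGVECTOR_HOST` variable name, and `handle_request` with its status codes are hypothetical and only mimic the shape of the code under review. Against the real module, the patch target would be the string `"api.db.is_configured"` quoted in the comment.

```python
import os
from types import SimpleNamespace
from unittest.mock import patch

# Stand-in for the PR's db module; PGVECTOR_HOST is a made-up variable name.
db = SimpleNamespace(
    is_configured=lambda: bool(os.environ.get("PGVECTOR_HOST"))
)


def handle_request(persist: bool, uniqueid):
    """Hypothetical handler that consults db.is_configured() on every call."""
    configured = db.is_configured()
    if persist and not uniqueid:
        return 400, "uniqueid is required when persist is true"
    if persist and not configured:
        return 500, "persistence requested but Postgres is not configured"
    return 200, "ok"


def test_missing_uniqueid():
    # Pin the configuration check so the environment cannot change the outcome.
    with patch.object(db, "is_configured", return_value=False) as is_configured:
        status, detail = handle_request(persist=True, uniqueid=None)

    assert status == 400
    assert "uniqueid" in detail
    # The handler used the mock, not the real environment lookup.
    is_configured.assert_called_once_with()


test_missing_uniqueid()
```

Because the patch is applied as a context manager, the original function is restored when the block exits, so other tests still see the unpatched behavior.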

Collaborator Author

@copilot open a new pull request to apply changes based on this feedback

Copilot AI commented Dec 23, 2025

@Stell0 I've opened a new pull request, #23, to work on those changes. Once the pull request is ready, I'll request review from you.

Stell0 and others added 12 commits December 23, 2025 10:52
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…nment (#23)

* Initial plan

* Mock db.is_configured in test_missing_uniqueid to isolate from environment

Co-authored-by: Stell0 <4547897+Stell0@users.noreply.github.com>

* Add .gitignore and remove build artifacts from repository

Co-authored-by: Stell0 <4547897+Stell0@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Stell0 <4547897+Stell0@users.noreply.github.com>
Stell0 requested review from Amygos and edospadoni on December 23, 2025 16:53