Skip to content

Conversation

@bashandbone
Copy link
Contributor

In attempting to fix failing tests, it became increasingly clear that CodeWeaver's existing architecture made proper test isolation all but impossible. This led to a need to accelerate the DI implementation plan.

Bottom line: We couldn't authoritatively validate CodeWeaver until we completed phase 2. We can do that now.

closes #117
closes #118
closes #83 (indirectly)

Apologies for the size of this PR; it was a natural consequence of the major reorganization we needed to do to get the repo's structure right.

… of NamedTuple types (being accessed as a BaseEnum member), updated string conversion patterns for consistency with BaseEnum classes across the codebase.
…latten imports

- Removed obsolete license file for nodes_to_langs.json.
- Updated imports in progress.py, state.py, tools.py, user_agent.py, and other files to use the correct modules for DataclassSerializationMixin and DATACLASS_CONFIG.
- Removed exclude_args from ToolRegistrationDict and adjusted related code in tools.py.
- Modified find_code_tool function to receive fastmcp context.
- Enhanced embedding types with field title generation for better schema documentation.
- Updated AgentTask classifications to include additional synonyms for tasks.
- Improved code readability and maintainability by applying consistent formatting and annotations across various modules.
…mendations

- Introduced `test-infrastructure-summary.md` detailing current test coverage, risks, and a 4-tier CI strategy to enhance testing efficiency.
- Added `test_skip_xfail_analysis.md` to analyze skipped and xfail tests, identifying gaps in coverage and proposing immediate actions for improvement.
- Updated `dataclasses.py` to improve serialization logic by using fields directly instead of pydantic fields.
- Enhanced `tools.py` documentation to reflect the increase in supported programming languages from 160 to 166.
- Modified `user_agent.py` to allow optional context parameter for the `find_code_tool` function.
- Fixed health monitoring tests to align with the BaseEnum interface by replacing `.value` with `.variable`.
- Added tests in `test_selector.py` to verify fallback configurations for chunkers.
- Implemented idempotency test in `test_failover_tracker.py` to ensure unchanged files do not affect pending changes during re-indexing.
…s and initialization; add mock_only marker to config tests
…──────────────────────�[0m

     �[38;5;238m│ �[0m�[1mSTDIN�[0m
�[38;5;238m─────┼──────────────────────────────────────────────────────────────────────────�[0m
�[38;5;238m   1�[0m �[38;5;238m│�[0m �[38;5;231mrefactor(logging): Rename logging modules to _logging to fix namespace conflicts�[0m
�[38;5;238m   2�[0m �[38;5;238m│�[0m
�[38;5;238m   3�[0m �[38;5;238m│�[0m �[38;5;231mRenamed logging.py → _logging.py across multiple modules to resolve�[0m
�[38;5;238m   4�[0m �[38;5;238m│�[0m �[38;5;231m`import logging` namespace conflicts that were causing issues.�[0m
�[38;5;238m   5�[0m �[38;5;238m│�[0m
�[38;5;238m   6�[0m �[38;5;238m│�[0m �[38;5;231mChanges:�[0m
�[38;5;238m   7�[0m �[38;5;238m│�[0m �[38;5;231m- Renamed logging modules in common, config, chunker, watcher, server�[0m
�[38;5;238m   8�[0m �[38;5;238m│�[0m �[38;5;231m- Updated all imports to use _logging naming convention�[0m
�[38;5;238m   9�[0m �[38;5;238m│�[0m �[38;5;231m- Added test infrastructure for nightly and weekly test runs�[0m
�[38;5;238m  10�[0m �[38;5;238m│�[0m �[38;5;231m- Enhanced test documentation and coverage analysis�[0m
�[38;5;238m  11�[0m �[38;5;238m│�[0m �[38;5;231m- Updated lazy import validation�[0m
�[38;5;238m  12�[0m �[38;5;238m│�[0m �[38;5;231m- Cleaned up unused variables in test_publish_validation�[0m
�[38;5;238m  13�[0m �[38;5;238m│�[0m
�[38;5;238m  14�[0m �[38;5;238m│�[0m �[38;5;231mThis refactor maintains all existing functionality while fixing the�[0m
�[38;5;238m  15�[0m �[38;5;238m│�[0m �[38;5;231mnamespace collision issue that prevented proper access to Python's�[0m
�[38;5;238m  16�[0m �[38;5;238m│�[0m �[38;5;231mstandard logging module.�[0m
�[38;5;238m  17�[0m �[38;5;238m│�[0m
�[38;5;238m  18�[0m �[38;5;238m│�[0m �[38;5;231m🤖 Generated with [Claude Code](https://claude.com/claude-code)�[0m
�[38;5;238m  19�[0m �[38;5;238m│�[0m
�[38;5;238m  20�[0m �[38;5;238m│�[0m �[38;5;231mCo-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>�[0m
�[38;5;238m─────┴──────────────────────────────────────────────────────────────────────────�[0m
… enhanced -- identified unused/implemented structured logging system and implemented it
- Updated package name references from \codeweaver\ to \code-weaver\ in integration and smoke tests.
- Relaxed performance test thresholds for memory persistence to accommodate WSL I/O overhead.
- Changed Qdrant Docker image version from \latest\ to \v1.16.1\ for consistency and reliability.
- Enhanced documentation in unit tests for embedding reconciliation, clarifying test organization and rationale.
- Removed outdated integration tests for reconciliation exception handling, consolidating testing strategy.
…aths for SearchStrategy and StrategizedQuery

- Updated import statements in multiple test files to reflect the new module structure for SearchStrategy and StrategizedQuery, moving from `codeweaver.agent_api.find_code.types` to `codeweaver.core.search_types`.
- Ensured consistency across integration and unit tests by modifying the relevant import paths.
- Added new unit tests for the DI container, including basic resolution, singleton behavior, nested resolution, overrides, lifespan management, and type hint resolution.
- Updated the `uv.lock` file to include new packages: `code-weaver-daemon` and `code-weaver-tokenizers`, along with their dependencies and test configurations.
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, we are unable to review this pull request

The GitHub API does not allow us to fetch diffs exceeding 300 files, and this pull request has 331

@github-actions
Copy link
Contributor

🤖 Hi @bashandbone, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

@github-actions
Copy link
Contributor

github-actions bot commented Dec 25, 2025

👋 Hey @bashandbone,

Thanks for your contribution to codeweaver! 🧵

You need to agree to the CLA first... 🖊️

Before we can accept your contribution, you need to agree to our Contributor License Agreement (CLA).

To agree to the CLA, please comment:

I read the contributors license agreement and I agree to it.

Those exact words are important1, so please don't change them. 😉

You can read the full CLA here: Contributor License Agreement


@bashandbone has signed the CLA.


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

Footnotes

  1. Our bot needs those exact words to recognize that you agree to the CLA.

@socket-security
Copy link

socket-security bot commented Dec 25, 2025

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Updatedpypi/​black@​25.11.0 ⏵ 25.12.086 +1100100100100
Addedpypi/​blake3@​1.0.810010098100100
Updatedpypi/​ast-grep-py@​0.40.0 ⏵ 0.40.1100100100100100

View full report

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements Phases 1 and 2 of the Dependency Injection (DI) and Monorepo architecture plan to fix failing tests by enabling proper test isolation. The changes establish transport-agnostic dependency injection, reorganize core types to break circular dependencies, and extract tokenizers/daemon into separate workspace packages.

Key Changes:

  • Established DI container with provider factories for embedding, vector store, and reranking services
  • Moved Provider/ProviderKind enums and search types to core/types to eliminate circular dependencies
  • Extracted codeweaver-tokenizers and codeweaver-daemon as separate workspace packages
  • Reorganized type system by splitting utilities, dataclasses, and domain types into focused modules
  • Added transport-agnostic design ensuring DI works in CLI, daemon, and MCP server modes

Reviewed changes

Copilot reviewed 143 out of 331 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/codeweaver/core/types/search.py New file defining search types (SearchStrategy, StrategizedQuery, SearchResult) moved from agent_api
src/codeweaver/core/types/provider.py New file with Provider/ProviderKind enums and environment variable metadata moved from providers
src/codeweaver/core/types/delimiter.py New file with delimiter types moved from engine.chunker to break circular deps
src/codeweaver/core/types/dataclasses.py New file extracting dataclass serialization mixins and BaseEnumData from models.py
src/codeweaver/core/types/env.py New file with environment variable type definitions
src/codeweaver/core/types/utils.py New file with utility functions (generate_title, clean_sentinel_from_schema)
src/codeweaver/common/utils/lazy_getter.py New file extracting create_lazy_getattr for reuse across packages
src/codeweaver/config/providers.py Updated default agent model and sparse embedding fallback to BM25
src/codeweaver/agent_api/find_code/pipeline.py Refactored to use DI with injected providers instead of global registry
pyproject.toml Added workspace configuration and new package dependencies

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@github-actions
Copy link
Contributor

🤖 I'm sorry @bashandbone, but I was unable to process your request. Please see the logs for more details.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
Copilot AI review requested due to automatic review settings December 25, 2025 17:30
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 143 out of 331 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings December 26, 2025 05:09
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 145 out of 365 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

"""
if self.is_semantic_search_language:
return SemanticSearchLanguage.from_string(self.value)
return SemanticSearchLanguage.from_string(self.variable)
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The property name variable is ambiguous in this context. Based on the enum pattern, this should be value (the standard enum attribute) or a more descriptive name like name or language_name.

Suggested change
return SemanticSearchLanguage.from_string(self.variable)
return SemanticSearchLanguage.from_string(self.value)

Copilot uses AI. Check for mistakes.
chunks_in_failover: NonNegativeInt = 0

def _telemetry_keys(self) -> dict[FilteredKeyT, AnonymityConversion]:
def _telemetry_keys(self) -> None:
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type changed from dict[FilteredKeyT, AnonymityConversion] to None, but the method name _telemetry_keys suggests it should return keys. Either rename the method to indicate it's a no-op (e.g., _skip_telemetry_keys) or maintain the dict return type with an empty dict.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings December 26, 2025 05:36
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 145 out of 365 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

src/codeweaver/config/profiles.py:1

  • The comment on line 414-415 states 'qdrant_client has built-in BM25 support' and 'if FastEmbed isn't available, it will use that automatically', but the code unconditionally returns Provider.FASTEMBED. This is inconsistent - if FastEmbed isn't available, this will fail. Consider adding a check for FastEmbed availability.
# sourcery skip: lambdas-should-be-short

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

_unset_fields: Annotated[
set[str], Field(description="Set of fields that were unset", exclude=True)
] = set()
_unset_fields: Annotated[set[str], PrivateAttr(default_factory=set)] = set()
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value set() is redundant when using PrivateAttr(default_factory=set). Remove the = set() assignment as the default_factory already handles initialization.

Suggested change
_unset_fields: Annotated[set[str], PrivateAttr(default_factory=set)] = set()
_unset_fields: Annotated[set[str], PrivateAttr(default_factory=set)]

Copilot uses AI. Check for mistakes.

# Create unique client name based on provider and kind
client_name = f"{provider.value.lower()}_{provider_kind.value.lower()}"
client_name = f"{provider.variable}_{provider_kind.variable}_client"
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using .variable instead of .value or .name for enum access is unconventional and unclear. If this is a custom enum attribute, add a comment explaining its purpose, or use the standard .value attribute.

Suggested change
client_name = f"{provider.variable}_{provider_kind.variable}_client"
client_name = f"{provider.value}_{provider_kind.value}_client"

Copilot uses AI. Check for mistakes.
management_port=management_port,
mcp_host=mcp_host,
mcp_port=mcp_port,
mcp_port=mcp_port or 4329,
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded default port 4329 should be defined as a named constant for better maintainability and consistency with other default port configurations in the codebase.

Copilot uses AI. Check for mistakes.
]
dependencies = [
# Core functionality
"aenum>=3.1.16",
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version constraint was changed from >=0.40.0 to >=0.40.1. Document why this specific patch version is required in a comment or changelog to help with future dependency management.

Suggested change
"aenum>=3.1.16",
"aenum>=3.1.16",
# ast-grep-py>=0.40.1 includes fixes required by code-weaver's AST search/matching pipeline; earlier 0.40.x releases are not sufficient.

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Adam Poulemanos <89049923+bashandbone@users.noreply.github.com>
Copilot AI review requested due to automatic review settings December 26, 2025 05:43
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 145 out of 365 changed files in this pull request and generated 12 comments.

Comments suppressed due to low confidence (1)

src/codeweaver/config/profiles.py:1

  • The model name 'claude-haiku-4.5-latest' references a version (4.5) that may not exist yet. Verify this model name is correct and available from Anthropic.
# sourcery skip: lambdas-should-be-short

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

"""
return (
ConfigLanguage.from_string(self.value)
ConfigLanguage.from_string(self.variable)
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method as_config_language is using .variable instead of .value. This should be .value to properly access the enum's string value.

Suggested change
ConfigLanguage.from_string(self.variable)
ConfigLanguage.from_string(self.value)

Copilot uses AI. Check for mistakes.
"""Create a DiscoveredFile from a file path."""
branch = get_git_branch(path if path.is_dir() else path.parent) or "main"
if ext_kind := (ext_kind := ExtKind.from_file(path)):
if ext_kind := ExtKind.from_file(path):
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The walrus operator assignment ext_kind := ExtKind.from_file(path) is redundant here since ext_kind was already computed in the condition on line 127. This creates unnecessary duplicate calls to ExtKind.from_file(path).

Suggested change
if ext_kind := ExtKind.from_file(path):
ext_kind = ExtKind.from_file(path)
if ext_kind:

Copilot uses AI. Check for mistakes.
if not isinstance(env_path, Unset):
self.project_path = env_path
else:
self.project_path = lazy_import("codeweaver.core", "get_project_path")()
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The import path for get_project_path has changed from codeweaver.common.utils to codeweaver.core. Ensure this function is properly exported from codeweaver.core.__init__.py and that the change is consistently applied throughout the codebase.

Copilot uses AI. Check for mistakes.
Comment on lines +414 to +416
# qdrant_client has built-in BM25 support
# if FastEmbed isn't available, it will use that automatically
return DeterminedDefaults(provider=Provider.FASTEMBED, model="qdrant/bm25", enabled=True)
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fallback for sparse embedding now always returns FASTEMBED with qdrant/bm25 instead of NOT_SET. This changes the behavior when no sparse embedding libraries are available. Consider whether this is the intended behavior or if there should still be a check for FastEmbed availability.

Suggested change
# qdrant_client has built-in BM25 support
# if FastEmbed isn't available, it will use that automatically
return DeterminedDefaults(provider=Provider.FASTEMBED, model="qdrant/bm25", enabled=True)
logger.warning(
"No default sparse embedding provider libraries found. Sparse embedding functionality "
"will be disabled unless explicitly set in your config or environment variables."
)
return DeterminedDefaults(provider=Provider.NOT_SET, model="NONE", enabled=False)

Copilot uses AI. Check for mistakes.
Comment on lines +270 to +271
"intent": self._intent_type.variable
if hasattr(self._intent_type, "variable")
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed from .value to .variable for enum access. However, standard Python enums use .value, not .variable. This will cause AttributeError unless the enum has been customized to have a variable attribute.

Copilot uses AI. Check for mistakes.

# Create unique client name based on provider and kind
client_name = f"{provider.value.lower()}_{provider_kind.value.lower()}"
client_name = f"{provider.variable}_{provider_kind.variable}_client"
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using .variable instead of standard enum .value attribute. Python enums use .value to access the enum's value, not .variable.

Suggested change
client_name = f"{provider.variable}_{provider_kind.variable}_client"
client_name = f"{provider.value}_{provider_kind.value}_client"

Copilot uses AI. Check for mistakes.
"Failed to resolve %s provider %s: %s",
provider_kind.name,
provider.value,
provider.as_title,
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed from provider.value to provider.as_title. Verify that the Provider enum has an as_title property/method and that it returns the appropriate formatted string for logging.

Copilot uses AI. Check for mistakes.

except ImportError as e:
display.print_warning(f"Cannot list models for {provider.value}: {e}")
display.print_warning(f"Cannot list models for {provider.as_title}: {e}")
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using provider.as_title instead of provider.value. Confirm the Provider enum has this property and it provides appropriate formatting for user-facing messages.

Copilot uses AI. Check for mistakes.
summary = (
f"Found {len(code_matches)} relevant matches "
f"for {intent_type.value} query. "
f"for {intent_type.variable} query. "
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using .variable instead of .value on the enum. Standard Python enums use .value to access their value, not .variable.

Suggested change
f"for {intent_type.variable} query. "
f"for {intent_type.value} query. "

Copilot uses AI. Check for mistakes.
"extra": {
"phase": "vector_search",
"search_strategy": query_vector.strategy.value,
"search_strategy": query_vector.strategy.variable,
Copy link

Copilot AI Dec 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accessing .variable on enum instead of .value. This will fail unless the SearchStrategy enum has been modified to have a custom variable attribute.

Suggested change
"search_strategy": query_vector.strategy.variable,
"search_strategy": query_vector.strategy.value,

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Phase 1 & 2 DI/Monorepo Implementation

Summary

This is an ambitious and architecturally significant PR implementing dependency injection and beginning the monorepo migration for CodeWeaver. The scope is necessarily large (375 files, +28K/-36K lines) due to the fundamental architectural changes required. However, the PR currently has critical test failures that must be resolved before merging.

Critical Issues Blocking Merge

1. Import Error: TypeError: 'module' object is not callable

Severity: Critical

The main test failure across all Python versions (3.12, 3.13, 3.14) is caused by an issue with lazy imports and the DI system:

# Error trace (from chunking_service.py:75)
governor: GovernorDep = depends(get_chunk_governor),
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'module' object is not callable

Root Cause:
The depends function is being imported via the lazy import mechanism in codeweaver.di.__init__.py, but it's being called at class definition time (as a default parameter value). At this point, the lazy import hasn't been resolved yet, so depends is still a module reference rather than the actual function.

Affected Files:

  • src/codeweaver/engine/chunking_service.py:75
  • src/codeweaver/server/health/health_service.py:58
  • src/codeweaver/engine/indexer/indexer.py (multiple locations)
  • And likely many more throughout the codebase

Recommended Fix:
You have two options:

  1. Import directly from the submodule (immediate fix):

    # Instead of:
    from codeweaver.di import depends
    
    # Use:
    from codeweaver.di.depends import depends
  2. Make lazy imports eager for class-time usage (better long-term):
    Modify create_lazy_getattr in lazy_import.py to immediately resolve imports that will be used at class definition time, or don't use lazy loading for the depends function since it's used in so many places.

2. Missing REUSE License Tool

Severity: High

All test runs fail the license check:

✗ reuse – ERROR
sh: 1: reuse: not found

Fix: Add reuse to the CI environment setup or the dev dependencies. The tool is available via pip:

pip install reuse

Update .github/workflows/_reusable-test.yml to install it.

3. Python 3.14 Pydantic Compatibility Issues ⚠️

Severity: Medium

Python 3.14 tests show additional failures:

ValueError: On field "content" the following field constraints are set but not enforced: min_items

This appears to be a Pydantic v2 compatibility issue with Python 3.14. Since 3.14 is still experimental, this could be addressed in a follow-up, but it's worth noting.

Architectural Review

Strengths ✅

  1. Well-Documented Strategy: The claudedocs/ directory contains excellent documentation of the DI/monorepo strategy, implementation phases, and impact analysis.

  2. Monorepo Structure: The creation of separate packages (codeweaver-daemon, codeweaver-tokenizers) is a good start for modularization.

  3. DI Implementation: The DI container implementation looks solid, with:

    • Clean separation of concerns
    • FastAPI-inspired Depends pattern
    • Proper lifecycle management
    • Type-safe dependency declarations
  4. Lazy Loading: The lazy import mechanism is sophisticated and should help with import performance.

Concerns & Recommendations ⚠️

  1. PR Size: 375 files is extremely large for review. While I understand this was necessary due to the architectural changes, future work should try to break this into smaller, more reviewable chunks. Consider:

    • Phase 1 (DI foundation) as one PR
    • Phase 2 (service migration) as incremental PRs
    • Monorepo structure as a separate PR
  2. Test Coverage: The PR description mentions "fix failing tests" but the tests are still failing. Before merging:

    • All test failures must be resolved
    • CI must be green
    • Consider adding integration tests specifically for the DI container
  3. Migration Completeness: According to DI_MIGRATION_STATUS_PHASE_2.md, this should eliminate 75% of circular dependencies (120-130 violations). It would be valuable to:

    • Include before/after dependency analysis results
    • Document which services have been fully migrated vs. partially migrated
    • Provide a checklist of remaining work
  4. Breaking Changes: This appears to be a breaking change for anyone using CodeWeaver as a library. Consider:

    • Documenting all breaking changes in CHANGELOG.md
    • Providing migration guide for library users
    • Semantic versioning bump (major version)
  5. Lazy Import Timing: The lazy import mechanism is clever but causes issues when functions are needed at class definition time. Consider:

    • Being more selective about what gets lazy-loaded
    • Documenting which imports should be direct vs. lazy
    • Adding runtime checks or better error messages when lazy imports fail

Security & Performance

Security: No immediate security concerns identified. The code appears to follow defensive security principles.

Performance:

  • The lazy import mechanism should improve startup time
  • The DI container overhead appears minimal with caching
  • No obvious performance regressions in the code reviewed

Code Quality

Positives:

  • Consistent licensing headers (SPDX)
  • Good type hints throughout
  • Comprehensive docstrings
  • Well-structured modules

Areas for Improvement:

  • Some complex functions could benefit from further decomposition
  • Error handling could be more explicit in DI resolution paths
  • Add more inline comments explaining the lazy import resolution flow

Recommendations

Before Merging (Required)

  1. Fix the depends import error - This is blocking all tests
  2. Add reuse tool to CI - Required for license checking
  3. Achieve green CI - All non-experimental tests must pass
  4. Update CHANGELOG.md - Document all breaking changes
  5. Version bump - This should be a major version increment

Follow-up Work (Recommended)

  1. 📋 Create issues for Python 3.14 compatibility
  2. 📋 Document DI migration guide for users
  3. 📋 Add dependency analysis report showing violation reduction
  4. 📋 Create integration tests for DI container edge cases
  5. 📋 Consider creating a migration script/tool for users

Conclusion

This PR represents significant and valuable architectural improvements to CodeWeaver. The DI implementation is well-designed and the monorepo structure is a good foundation. However, the critical import errors must be resolved before this can be merged. The failures are systematic and affect core functionality.

Recommendation: REQUEST CHANGES

Once the critical issues are resolved and tests are passing, this should be good to merge. The architectural direction is sound and the implementation quality is high.


Testing Checklist:

  • All Python 3.12 tests passing
  • All Python 3.13 tests passing
  • All Python 3.14 tests passing (or marked experimental)
  • License checks passing
  • Linting passing
  • Documentation updated
  • CHANGELOG updated

Great work on tackling this major refactoring! Let me know if you'd like help debugging the import issues.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Phase 1 & 2 DI/Monorepo Implementation

I've completed a comprehensive review of this substantial PR implementing Dependency Injection and Monorepo architecture for CodeWeaver. This is an ambitious refactoring that addresses fundamental architectural issues.

Critical Issue: Test Failures 🔴

Root Cause Identified: src/codeweaver/core/utils/lazy_import.py:174

The lazy import mechanism has a critical bug causing all test failures:

# Current (BROKEN):
module = __import__(f"{parent_module}.{submodule_name}", fromlist=[""])

When fromlist is an empty list or contains an empty string, __import__ returns the top-level package instead of the target submodule. This is why you're seeing TypeError: 'module' object is not callable - it's returning the package when it should return the actual function/class.

Fix Required:

# Should be one of these:
module = __import__(f"{parent_module}.{submodule_name}", fromlist=[name])
# OR
module = __import__(f"{parent_module}.{submodule_name}", fromlist=["__name__"])

This affects:

  • src/codeweaver/core/utils/lazy_import.py:174 in create_lazy_getattr()
  • All lazy-loaded modules using this function

Failing Tests:

  • tests/integration/test_health_monitoring.py
  • tests/integration/test_phase4_status_flow.py
  • tests/integration/test_reconciliation_integration.py
  • tests/integration/test_server_indexing.py
  • tests/unit/test_indexer_reconciliation.py
  • tests/unit/test_indexer_remove_path.py
  • tests/unit/test_indexer_stale_point_removal.py

All 7 errors trace back to this single bug in the lazy import mechanism.


Architecture & Design Review ✅

Strengths:

  1. Well-Documented Strategy - The claudedocs/ directory provides excellent context:

    • Clear progression from problem identification to solution
    • Quantified benefits (75% reduction in circular dependencies)
    • Week-by-week implementation plan
  2. Sound DI Architecture - The dependency injection design is solid:

    • FastAPI-inspired Depends() pattern is familiar and ergonomic
    • Provider factories properly abstract transport concerns
    • Singleton/transient lifecycle management
  3. Clean Monorepo Structure - Package separation makes sense:

    • code-weaver-tokenizers - Self-contained tokenization logic
    • code-weaver-daemon - Daemon-specific utilities
    • Main code-weaver package properly depends on workspace packages
  4. Lazy Loading Strategy - Using __getattr__ for lazy imports is appropriate for reducing startup time

Concerns:

  1. PR Size - 365 files changed is extremely difficult to review comprehensively:

    • Consider breaking into smaller PRs for phase 3
    • Current approach makes it hard to identify subtle bugs (like the lazy import issue)
  2. Logging Module Rename - The logging.py → _logging.py rename:

    • Good: Avoids namespace collision with stdlib logging
    • Concern: Commit ff67dcd has a corrupted message with ANSI escape codes
    • The rename itself is correct, but the presentation in git history is messy
  3. Test Coverage - While fixing failing tests:

    • Need to ensure DI container tests cover edge cases
    • Integration tests for workspace package boundaries
    • Verify lazy imports work correctly after fix

Code Quality Observations

Good Practices:

  • Consistent use of type hints throughout
  • Proper __all__ exports in __init__.py files
  • SPDX license headers maintained
  • Good separation of concerns

Minor Issues:

  1. Workspace Dependencies (pyproject.toml:109-110):

    dependencies = [
      "code-weaver-daemon",
      "code-weaver-tokenizers",

    These should use workspace references in [tool.uv.sources] (which they do), so this is correct.

  2. Model Name (src/codeweaver/config/profiles.py):

    • Copilot flagged claude-haiku-4.5-latest as potentially non-existent
    • Verify this model identifier is correct
  3. FastEmbed Fallback Logic:

    • Comment says "if FastEmbed isn't available, it will use that automatically"
    • But code unconditionally returns Provider.FASTEMBED
    • May want to add availability check

Security & Performance

Security: ✅ No security concerns identified

  • No credential handling issues
  • Proper separation of concerns reduces attack surface
  • DI pattern makes security auditing easier

Performance: ⚠️ Monitor carefully

  • Lazy loading should improve startup time
  • DI container overhead should be negligible for this use case
  • Watch for circular dependency resolution performance in large codebases

Recommendations

Immediate (Before Merge):

  1. Fix the lazy import bug in lazy_import.py:174
  2. Run full test suite to verify fix
  3. Clean up commit ff67dcd with corrupted message (optional, but recommended)

Short Term (Phase 3):

  1. Add comprehensive DI container tests
  2. Document migration guide for external users
  3. Add performance benchmarks comparing pre/post DI
  4. Consider splitting Phase 3 into smaller PRs

Long Term:

  1. Document the lazy loading strategy for contributors
  2. Add tooling to detect circular dependencies early
  3. Consider extracting DI framework as standalone package if it proves valuable

Verdict

Cannot approve until critical bug is fixed.

Once the lazy import bug in src/codeweaver/core/utils/lazy_import.py:174 is corrected and tests pass, this PR represents solid architectural improvement that addresses real technical debt.

The DI/Monorepo strategy is sound and well-executed. The primary issue is a single-line bug in the lazy import mechanism that's causing cascade failures.

Estimated fix time: < 30 minutes
Re-review needed: Yes, after fix is applied


Additional Notes

Great work on the documentation and planning. The claudedocs/INTEGRATED_DI_MONOREPO_STRATEGY.md is particularly well done and should serve as a model for future major refactorings.

The scope of this PR is ambitious but the execution shows careful thought and planning. Looking forward to seeing this merged once the import bug is resolved!

@github-actions
Copy link
Contributor

PR Review - Comment #2648872383 Response

I've reviewed the bot comment about the "non-callable" issue in chunking_service.py:77. The bot's analysis appears to be incorrect.

Analysis

  1. get_settings is a function, not a module

    • In src/codeweaver/di/providers.py:116-120, get_settings is defined as an async function
    • It's correctly imported in chunking_service.py:26 from codeweaver.di.providers
  2. The usage pattern is correct and consistent

    • The pattern settings: SettingsDep = depends(get_settings) matches the established pattern used throughout the codebase
    • See health_service.py:55-63 for the same pattern:
      def __init__(
          self,
          *,
          provider_registry: ProviderRegistryDep = depends(get_provider_registry),
          statistics: StatisticsDep = depends(get_statistics),
          ...
      )
  3. No shadowing issue

    • While there is a local import of get_settings from config.settings on line 163, it's inside the _chunk_sequential method scope
    • This doesn't affect the module-level import on line 26 or the default parameter on line 77

Actual Issue

The tests are failing, but the "non-callable" diagnosis appears to be a false positive from static analysis. The code follows the correct DI pattern.

Recommendation: Do not apply the suggested fix (depends(get_settings.get_settings)). This would be incorrect and break the DI pattern.

Test Failures

All Python test jobs (3.12, 3.13, 3.13t, 3.14, 3.14t) are failing. The real issue needs investigation through actual test logs rather than static analysis suggestions. Would recommend:

  1. Looking at actual test error messages
  2. Checking if the DI container is properly initialized in tests
  3. Verifying async function resolution in the container

Note: This is a massive PR (375 files, +28k/-36k lines) implementing Phase 1 & 2 of the DI/Monorepo plan. Given the scope, test failures are expected and should be addressed systematically.

@github-actions
Copy link
Contributor

Code Review - PR #205

Summary

This PR implements Phases 1 and 2 of the DI/Monorepo plan, introducing dependency injection architecture and reorganizing the codebase into packages. While the architectural vision is sound and well-documented, the PR has critical bugs that prevent tests from running.

🚨 Critical Issues (Must Fix Before Merge)

1. Module Export Bug in di/providers.py

Location: src/codeweaver/di/providers.py:268-285

The __all__ export list is missing the factory functions (get_chunk_governor, get_settings, get_tokenizer, etc.) that are being imported in other modules.

Evidence:

  • src/codeweaver/engine/chunking_service.py:20-26 imports these functions
  • src/codeweaver/server/health/health_service.py:58 uses depends(get_provider_registry)
  • Test failures show: TypeError: 'module' object is not callable

Fix Required:

__all__ = (
    # Type aliases
    "ChunkingServiceDep",
    "EmbeddingDep",
    # ... existing type aliases ...
    
    # Factory functions (MISSING - ADD THESE)
    "get_chunk_governor",
    "get_chunking_service",
    "get_embedding_provider",
    "get_failover_manager",
    "get_file_watcher",
    "get_health_service",
    "get_ignore_filter",
    "get_indexer",
    "get_model_registry",
    "get_provider_registry",
    "get_reranking_provider",
    "get_services_registry",
    "get_settings",
    "get_sparse_embedding_provider",
    "get_statistics",
    "get_tokenizer",
    "get_vector_store",
)

2. CI Test Failures

All Python versions (3.12, 3.13, 3.13t, 3.14, 3.14t) are failing due to the above import issue. Tests cannot even collect:

ERROR collecting tests/integration/test_health_monitoring.py
ERROR collecting tests/integration/test_reconciliation_integration.py
ERROR collecting tests/unit/test_indexer_reconciliation.py
# ... and more

Impact: Zero test coverage verification for this massive refactor.

⚠️ High-Priority Concerns

3. Lazy Import Implementation Risk

The PR moves from standard imports to a custom lazy import system across the codebase. While this can improve startup time:

Risks:

  • Import errors are deferred to runtime (as we're seeing in tests)
  • Harder to debug circular dependencies
  • Type checkers may struggle with dynamic __getattr__

Recommendation: Ensure comprehensive test coverage before relying on lazy imports for critical paths.

4. Massive Scope (300+ files)

  • 28,225 additions, 36,476 deletions
  • 79 new files, 14 deletions
  • Combines architectural refactor + monorepo reorganization + DI implementation

Risk: Difficult to review thoroughly and high chance of subtle bugs.

Recommendation: Consider splitting into smaller PRs:

  1. DI foundation (Phase 1)
  2. Service migration to DI (Phase 2)
  3. Monorepo package structure (Phase 3)

📋 Medium-Priority Issues

5. Type Safety Degradation

Multiple instances of Any types in DI annotations:

# src/codeweaver/di/providers.py
EmbeddingDep = Annotated[Any, Depends(get_embedding_provider)]
GovernorDep = Annotated[Any, Depends(get_chunk_governor)]

Issue: Loses type checking benefits. Should use concrete types from TYPE_CHECKING imports.

Fix:

if TYPE_CHECKING:
    from codeweaver.engine.chunker import ChunkGovernor

GovernorDep = Annotated[ChunkGovernor, Depends(get_chunk_governor)]

6. Coexistence Pattern Concerns

The PR aims to support both old and new DI patterns during migration:

# Old pattern still works (claimed)
indexer_old = Indexer()

# New pattern
container = get_container()
indexer_new = container.resolve(Indexer)

Issue: The new ChunkingService.__init__ uses DI by default:

def __init__(
    self,
    governor: GovernorDep = depends(get_chunk_governor),  # DI required\!
    ...
)

This breaks backward compatibility - you can't instantiate without the container.

Recommendation: Provide factory methods or make DI optional during transition:

def __init__(
    self,
    governor: ChunkGovernor | GovernorDep = None,
    ...
):
    if governor is None or isinstance(governor, Depends):
        # Resolve from container
        governor = container.resolve(ChunkGovernor)
    self.governor = governor

7. Test Migration Incomplete

Per claudedocs/RECONCILIATION_FIX_FINAL_STATUS.md:

  • Only 60% of reconciliation tests passing (6/10)
  • Integration tests through prime_index() failing
  • Root cause: Manifest state and model coordination issues

Recommendation: Complete test migration before merging to ensure no regressions.

✅ Positive Aspects

  1. Excellent Documentation

    • Clear migration strategy in claudedocs/
    • Visual diagrams showing architectural improvements
    • Well-defined phases and acceptance criteria
  2. Sound Architectural Vision

    • DI reduces circular dependencies by ~75% (164 → ~40 violations)
    • Enables clean package separation
    • FastAPI-familiar patterns
  3. Proper Licensing

    • REUSE-compliant headers added
    • Dual MIT/Apache-2.0 licensing
  4. Monorepo Structure

    • Clean package separation in packages/
    • codeweaver-daemon and codeweaver-tokenizers properly isolated

🔒 Security Assessment

No Critical Security Issues Identified

  • No authentication/credential handling changes
  • No external API exposure changes
  • Standard dependency management practices
  • Security.md properly references GitHub Security Advisories

Note: DI containers can introduce security risks if not carefully validated (arbitrary code execution via malicious providers), but the current implementation uses static factory functions, which is safe.

📊 Recommendations

Before Merge:

  1. Fix the __all__ export bug in di/providers.py (critical)
  2. Verify all tests pass on all Python versions
  3. Complete test migration to achieve >90% passing rate
  4. Replace Any with concrete types in DI annotations
  5. Verify backward compatibility or document breaking changes

Consider for Follow-up:

  1. Split into smaller, reviewable PRs (if possible at this stage)
  2. Add integration tests specifically for DI container lifecycle
  3. Document migration guide for external contributors
  4. Add performance benchmarks vs. baseline (per Phase 2 spec)

🎯 Verdict

CHANGES REQUESTED

The architectural direction is excellent and well-planned, but the implementation has a critical bug preventing any validation. The PR cannot be merged until:

  1. All tests pass on CI
  2. The export bug is fixed
  3. Backward compatibility is verified or breaking changes are documented

Estimated Fix Time: 2-4 hours for the critical issues.


Review completed by: Claude (Automated Code Review)
Branch: feat/di_monorepo
Commit: 9f0021f

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: DI/Monorepo Implementation (Phase 1 & 2)

Summary

This is an ambitious and well-documented PR implementing phases 1 and 2 of the DI/Monorepo plan. The architectural vision is sound, and the documentation is excellent. However, there are critical test failures that must be resolved before merging.


Critical Issues 🚨

1. Lazy Import System Bug with depends function

Status: Blocking - All Python 3.12, 3.13, 3.13t, 3.14, and 3.14t test runs failing

Error:

TypeError: 'module' object is not callable
  at src/codeweaver/engine/chunking_service.py:75
  governor: GovernorDep = depends(get_chunk_governor),
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^

Root Cause:
The lazy import system in src/codeweaver/di/__init__.py is incorrectly resolving the depends function. When code imports from codeweaver.di import depends, the lazy loader is returning the codeweaver.di.depends module instead of the depends function from within that module.

Affected Files:

  • src/codeweaver/engine/chunking_service.py:75 (depends(get_chunk_governor))
  • src/codeweaver/server/health/health_service.py:58 (depends(get_provider_registry))
  • All files using depends() pattern

Investigation Needed:
The lazy import pattern in src/codeweaver/di/__init__.py uses:

_dynamic_imports = MappingProxyType({
    "depends": (__spec__.parent, "depends"),  # Maps to ("codeweaver.di", "depends")
})

And in create_lazy_getattr (line 174 of lazy_import.py):

module = __import__(f"{parent_module}.{submodule_name}", fromlist=[""])
result = getattr(module, name)  # Getting 'depends' from module

The getattr(module, "depends") call should retrieve the depends function from the codeweaver.di.depends module, but it appears to be returning the module itself instead. This suggests either:

  1. A naming conflict where the module name shadows the function name
  2. An issue with how __getattr__ is resolving in the depends.py module
  3. A bug in the lazy import caching mechanism

Recommended Fix:
Debug the lazy import resolution for the depends function specifically. Consider:

  • Adding explicit exports in depends.py
  • Checking if there's a circular import occurring
  • Verifying __all__ is properly defined in depends.py (it is: line 69)
  • Testing if direct import works: from codeweaver.di.depends import depends

2. Missing reuse Tool in CI

Error:

sh: 1: reuse: not found
✗ reuse – ERROR
hk ERROR To fix, run: ./scripts/code-quality/update-licenses.py add --interactive

Impact: Test runs fail during quality checks

Fix: Ensure reuse is installed in the CI environment or skip the check if it's not critical for this PR.


Architecture & Design ✅

Strengths:

  1. Excellent Documentation: The claudedocs/ directory provides comprehensive analysis:

    • INTEGRATED_DI_MONOREPO_STRATEGY.md clearly explains the vision
    • DI_IMPACT_VISUALIZATION.md shows the migration path
    • DI_MIGRATION_STATUS_PHASE_2.md tracks progress
  2. Sound Architectural Approach: The DI system is well-designed:

    • FastAPI-inspired dependency injection pattern
    • Clear separation between container, providers, and types
    • Type-safe dependency declarations with Annotated
  3. Proper Type Annotations: Good use of TYPE_CHECKING blocks to avoid circular imports at runtime

  4. Test Infrastructure: Added comprehensive test categorization (unit, integration, nightly, weekly)


Code Quality 📊

Positive:

  • Consistent use of SPDX license headers
  • Good docstring coverage
  • Clear separation of concerns in the DI module

Concerns:

  • Massive PR Size: 375 files changed, 28k additions, 36k deletions makes thorough review difficult

    • Consider breaking future changes into smaller, incremental PRs
    • The monorepo structure should help with this going forward
  • Lazy Import Complexity: The lazy import system adds significant indirection

    • Current bug suggests it may be fragile
    • Consider if the performance benefits justify the complexity

Test Coverage 🧪

Issues:

  • All test runs are failing due to the import bug
  • Cannot validate that the DI system works correctly until tests pass
  • The PR description mentions fixing failing tests, but new failures have been introduced

Recommendation:

  • Fix the depends import issue
  • Ensure all existing tests pass
  • Add specific tests for the DI container (I see tests/unit/di/test_providers.py was added - good!)

Security & Performance 🔒

No major concerns identified, but:

  • The lazy import system adds runtime overhead for first access
  • DI resolution adds indirection - ensure this doesn't impact hot paths
  • No obvious security issues in the DI implementation

Recommendations

  1. Immediate Action Required:

    • Fix the depends lazy import bug (blocking issue)
    • Fix the reuse tool availability in CI
    • Get all tests passing
  2. Before Merge:

    • Verify test coverage hasn't decreased
    • Ensure no performance regressions in hot paths
    • Consider adding integration tests for the DI system
  3. Post-Merge:

    • Monitor for any runtime issues with lazy imports
    • Document migration guide for contributors
    • Plan smaller PRs for phase 3

Verdict

Request Changes ⚠️

The architectural direction is excellent and well-documented. However, the critical test failures (especially the depends import bug affecting all test runs) must be resolved before this can be merged.

Once the import issue is fixed and tests are passing, this PR will be in good shape to merge.


Questions for the Author

  1. Have you been able to reproduce the depends import issue locally?
  2. What Python version(s) did you test with during development?
  3. Is the lazy import pattern used elsewhere in the codebase working correctly, or is this specific to the DI module?
  4. Are there any known breaking changes for downstream consumers of this library?

Files Changed: 375 (100 shown in PR view)
Lines Added: 28,225
Lines Deleted: 36,476
Test Status: ❌ Failing (Python 3.12, 3.13, 3.13t, 3.14, 3.14t)
CI Checks: 5 failing, 8 passing, 9 skipping

- Removed old telemetry test files and added new structured tests for privacy serialization.
- Implemented a mock model for testing privacy filters and ensured critical fields are handled correctly.
- Updated the test container setup to use a context manager for dependency injection.
- Increased the maximum chunks per file in performance settings for chunking tests.
- Enhanced semantic deduplication tests to verify unique AST nodes and identifiers.
- Fixed issues in indexer reconciliation tests by ensuring proper dependency injection.
- Added a failures log for integration tests to track ongoing issues.
Copilot AI review requested due to automatic review settings December 28, 2025 05:22
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 145 out of 393 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

self.batch_size = 10 # default batch size
if isinstance(self.batch_interval_seconds, Unset):
self.batch_interval_seconds = 60 # default batch interval
if self.tools_over_privacy and importlib.util.find_spec("code-weaver") is None:
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Package name check uses wrong format. The package is named code-weaver (with hyphen) but Python imports use underscores. This should check for codeweaver (no hyphen or underscore) or the actual distribution name if different. The hyphenated form code-weaver will never be found by find_spec.

Suggested change
if self.tools_over_privacy and importlib.util.find_spec("code-weaver") is None:
if self.tools_over_privacy and importlib.util.find_spec("codeweaver") is None:

Copilot uses AI. Check for mistakes.
_unset_fields: Annotated[
set[str], Field(description="Set of fields that were unset", exclude=True)
] = set()
_unset_fields: Annotated[set[str], PrivateAttr(default_factory=set)] = set()
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field has both a PrivateAttr(default_factory=set) annotation and a = set() default value. Only the PrivateAttr default_factory is needed for Pydantic private attributes. Remove the redundant = set().

Suggested change
_unset_fields: Annotated[set[str], PrivateAttr(default_factory=set)] = set()
_unset_fields: Annotated[set[str], PrivateAttr(default_factory=set)]

Copilot uses AI. Check for mistakes.
from codeweaver.config.telemetry import TelemetrySettings

settings = get_settings().telemetry
settings = get_settings().telemetry # ty:ignore[invalid-assignment]
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type ignore comment uses wrong prefix. Should be type:ignore not ty:ignore.

Suggested change
settings = get_settings().telemetry # ty:ignore[invalid-assignment]
settings = get_settings().telemetry # type: ignore[invalid-assignment]

Copilot uses AI. Check for mistakes.
Comment on lines +969 to 972
def _telemetry_keys(self) -> None:
"""Define telemetry anonymization for failover statistics."""
# Most failover stats are safe to send as-is (counts, states)
# No identifying information in failover statistics
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed return type from dict[FilteredKeyT, AnonymityConversion] to None, but the method is supposed to return telemetry key mappings per the pattern used in other classes. If there are no keys to return, should return empty dict {} not None.

Suggested change
def _telemetry_keys(self) -> None:
"""Define telemetry anonymization for failover statistics."""
# Most failover stats are safe to send as-is (counts, states)
# No identifying information in failover statistics
def _telemetry_keys(self) -> dict[FilteredKeyT, AnonymityConversion]:
"""Define telemetry anonymization for failover statistics."""
# Most failover stats are safe to send as-is (counts, states)
# No identifying information in failover statistics
return {}

Copilot uses AI. Check for mistakes.
Comment on lines +1434 to +1439
def _log_provider_instance_creation(self, provider, provider_kind, arg2, instance):
logger.debug(
"get_provider_instance(%r, %r, %s, %r (id=%d))",
provider,
provider_kind,
arg2,
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameter named arg2 is ambiguous and unclear. Should be renamed to something descriptive like operation or action_description.

Suggested change
def _log_provider_instance_creation(self, provider, provider_kind, arg2, instance):
logger.debug(
"get_provider_instance(%r, %r, %s, %r (id=%d))",
provider,
provider_kind,
arg2,
def _log_provider_instance_creation(
self,
provider,
provider_kind,
action_description,
instance,
):
logger.debug(
"get_provider_instance(%r, %r, %s, %r (id=%d))",
provider,
provider_kind,
action_description,

Copilot uses AI. Check for mistakes.
"extra": {
"phase": "vector_search",
"search_strategy": query_vector.strategy.value,
"search_strategy": query_vector.strategy.variable,
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using .variable instead of .value on enum. Should use standard enum attribute.

Suggested change
"search_strategy": query_vector.strategy.variable,
"search_strategy": query_vector.strategy.value,

Copilot uses AI. Check for mistakes.
- Removed redundant network markers from integration tests.
- Updated test functions to include `di_overrides` for better dependency management.
- Simplified mock indexer setup in unit tests by leveraging existing fixtures.
- Enhanced performance test threshold for CI stability.
- Streamlined mock handling in `TestAddMissingEmbeddings`, `TestRemovePathWithDeletedFiles`, and `TestStalePointRemovalInIndexFile` classes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

2 participants