Refactored to reduce coupling and redundency to maintain DRY more robustly.#10
Refactored to reduce coupling and redundency to maintain DRY more robustly.#10
Conversation
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
| created_at: datetime | ||
| updated_at: datetime | ||
| is_active: bool = True | ||
|
|
||
|
|
||
| class BotConfigurationCreate(BaseModel): | ||
| """Parameters for creating a bot configuration. | ||
|
|
||
| Replaces the long parameter list in create_bot_configuration() with a | ||
| single type-safe parameter object. |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Pull request overview
Refactors core scraping and message-handling logic into smaller, injectable services (scraper components + message processing), while adding security middleware (rate limiting, input validation/sanitization, prompt-injection detection), caching, and expanded test coverage.
Changes:
- Introduced
WebsiteScraperv2 with separated fetch/parse/chunk responsibilities and updated existing scraper timeouts/constants. - Added message-processing architecture (
MessageProcessor, messaging protocol abstraction) and layered webhook security checks. - Added shared constants/settings, repository caching + parameter-object models, and extensive new unit tests.
Reviewed changes
Copilot reviewed 40 out of 40 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/test_website_scraper.py | New unit tests for refactored WebsiteScraper components. |
| tests/unit/test_webhook.py | Updated webhook tests for security layers + delegation to MessageProcessor. |
| tests/unit/test_scraper.py | Updated scraper tests for settings-based timeouts. |
| tests/unit/test_repository.py | Updated repository tests for cache + updated return types. |
| tests/unit/test_rate_limiter.py | New tests for RateLimiter behavior and singleton handling. |
| tests/unit/test_query_executor.py | New tests for query timing/logging utilities. |
| tests/unit/test_prompt_guard.py | New tests for prompt-injection detection logic. |
| tests/unit/test_models.py | Adjusted model property tests after import cleanup. |
| tests/unit/test_message_processor.py | New unit tests for MessageProcessor orchestration. |
| tests/unit/test_logging.py | Updated logging tests to account for cache reset + import cleanup. |
| tests/unit/test_input_sanitizer.py | New tests for message sanitization and validation behavior. |
| tests/unit/test_hypothesis.py | Minor cleanup of redundant imports. |
| tests/unit/test_db_client.py | Minor cleanup of redundant imports. |
| tests/unit/test_agent_service.py | Minor cleanup of unused imports. |
| tests/stateful/test_bot_configuration.py | Minor cleanup of unused imports. |
| tests/stateful/test_agent_conversation.py | Minor cleanup of unused imports. |
| tests/integration/test_repository_db.py | Minor cleanup of unused imports. |
| tests/e2e/test_webhook_verification.py | E2E tests updated to rely on shared test_client fixture. |
| tests/e2e/test_webhook_message_flow.py | E2E tests updated to rely on shared test_client fixture. |
| tests/e2e/test_main.py | E2E test import cleanup; continues to validate app wiring. |
| tests/e2e/test_health_endpoint.py | Minor cleanup of unused imports. |
| tests/conftest.py | Centralized fixtures and expanded mocks (processor/rate limiter/prompt guard/settings). |
| src/services/website_scraper.py | New refactored scraping module with injected components + v2 wrapper. |
| src/services/scraper.py | Refactor to use settings-based timeouts + shared constants/chunking. |
| src/services/prompt_guard.py | New prompt-injection detection service with risk levels and singleton accessors. |
| src/services/messaging_protocol.py | New messaging protocol abstraction + Facebook adapter + mock implementation. |
| src/services/message_processor.py | New orchestration service extracting webhook business logic. |
| src/services/input_sanitizer.py | New sanitization/validation utilities and error mapping. |
| src/services/facebook_service.py | Moved FB API version/timeout to constants/settings. |
| src/models/scraper_models.py | Added ScrapedPageCreate parameter object model. |
| src/models/message_models.py | Added parameter object models for message history + test messages. |
| src/models/config_models.py | Added BotConfigurationCreate parameter object model. |
| src/middleware/rate_limiter.py | New in-memory, thread-safe rate limiter with singleton helpers. |
| src/main.py | Added graceful shutdown infrastructure + signal handlers. |
| src/db/repository.py | Added bot-config cache + parameter-object APIs + updated upsert return type. |
| src/db/query_executor.py | New reusable query timing/logging utilities. |
| src/db/init.py | Exported query timing + cache utilities from db package. |
| src/constants.py | New centralized constants for scraping, timeouts, rate limiting, etc. |
| src/config.py | Settings now default from constants (timeouts, embedding/search, rate limits). |
| src/api/webhook.py | Added layered security checks + DI-friendly process_message delegating to MessageProcessor. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # ====================================================================== | ||
| # Security Layer 3: Prompt Injection Detection | ||
| # ====================================================================== | ||
| _prompt_guard = prompt_guard or get_prompt_guard() | ||
| injection_result = _prompt_guard.check(message_text) | ||
|
|
||
| agent_service = MessengerAgentService() | ||
| response = await agent_service.respond(context, message_text) | ||
| if injection_result.is_suspicious and injection_result.risk_level == "high": |
There was a problem hiding this comment.
Prompt injection detection runs on the raw message_text (before sanitization). Since sanitize_user_input removes control characters and normalizes whitespace, doing the guard check before sanitization can allow trivial bypasses (e.g., inserting control chars into "ignore\x00 previous\x00 instructions"). Consider sanitizing first and running prompt_guard.check() on the sanitized text (and passing the sanitized text into the processor).
| ), | ||
| (r"disregard\s+all\s+(?:instructions|prompts|rules)", "disregard_instructions"), | ||
| (r"forget\s+(?:everything|all|previous|your\s+instructions)", "memory_wipe"), | ||
| (r"system\s*:\s*", "system_prefix"), |
There was a problem hiding this comment.
The high-risk pattern system\s*:\s* is extremely broad (it will match benign user text like "my system: Windows" or even substrings like "ecosystem:"). This likely causes false-positive blocking. Consider anchoring it (e.g., start-of-message) and/or adding word boundaries so it only matches an actual system-role prefix.
| (r"system\s*:\s*", "system_prefix"), | |
| (r"(?m)^\s*system\s*:\s*", "system_prefix"), |
| def sanitize_user_input(text: str) -> str: | ||
| """Sanitize user input to prevent injection and ensure valid text. | ||
|
|
||
| Performs the following sanitization: | ||
| - Removes control characters (except newlines, carriage returns, tabs) | ||
| - Normalizes whitespace (collapses multiple spaces/newlines) | ||
| - Strips leading/trailing whitespace | ||
| - Truncates to maximum allowed length | ||
|
|
||
| Args: | ||
| text: The raw user input text. | ||
|
|
||
| Returns: | ||
| Sanitized text safe for processing. | ||
| """ | ||
| if not text: | ||
| return "" |
There was a problem hiding this comment.
sanitize_user_input() is annotated as taking text: str, but it explicitly handles None/falsey values (if not text:) and tests call it with None. Consider changing the signature to text: str | None (and updating the docstring) so callers don't need # type: ignore and type checkers reflect actual behavior.
| def validate_message(text: str) -> ValidationResult: | ||
| """Validate user message text. | ||
|
|
||
| Checks: | ||
| - Not empty or whitespace-only | ||
| - Not too long (before sanitization) | ||
| - Contains at least some alphanumeric content | ||
|
|
||
| Args: | ||
| text: The raw user input text to validate. |
There was a problem hiding this comment.
validate_message() is annotated as text: str, but it has an explicit if text is None: branch. Also, the docstring says it checks for alphanumeric content, but the implementation only logs and still returns valid. Consider updating the type to str | None and adjusting the docstring to match the non-rejecting behavior.
| def validate_message(text: str) -> ValidationResult: | |
| """Validate user message text. | |
| Checks: | |
| - Not empty or whitespace-only | |
| - Not too long (before sanitization) | |
| - Contains at least some alphanumeric content | |
| Args: | |
| text: The raw user input text to validate. | |
| def validate_message(text: str | None) -> ValidationResult: | |
| """Validate user message text. | |
| Checks: | |
| - Not empty or whitespace-only | |
| - Not too long (before sanitization) | |
| - Logs when the message contains no alphanumeric characters (but does not reject) | |
| Args: | |
| text: The raw user input text to validate. May be None. |
| if user_name and response.confidence > 0.8: | ||
| if not text.startswith(user_name) and random.random() < 0.2: | ||
| text = f"Hi {user_name}! {text}" | ||
|
|
There was a problem hiding this comment.
The personalization guard only checks not text.startswith(user_name). If the agent already starts with a greeting like "Hi Jane! ...", this condition is still true and can produce duplicate greetings ("Hi Jane! Hi Jane! ..."). Consider checking for common greeting prefixes (e.g., Hi {user_name}, Hello {user_name}, or {user_name}) before prepending.
| except ValueError: | ||
| pass |
There was a problem hiding this comment.
'except' clause does nothing but pass and there is no explanatory comment.
| except ValueError: | |
| pass | |
| except ValueError as exc: | |
| # Ignore invalid CHROME_VERSION_MAIN value but log for observability. | |
| logfire.warning( | |
| "Invalid CHROME_VERSION_MAIN value; falling back to default ChromeDriver version", | |
| raw_value=version_main, | |
| error=str(exc), | |
| ) |
| raise ValueError("test error") | ||
|
|
||
| # Should have logged start | ||
| assert mock_logfire.info.call_count == 1 |
There was a problem hiding this comment.
This statement is unreachable.
There was a problem hiding this comment.
Actionable comments posted: 22
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/unit/test_models.py (1)
257-257: Consider using timezone-aware datetime for consistency.Similar to the production code,
datetime.utcnow()is deprecated. For consistency across the codebase, consider usingdatetime.now(timezone.utc).♻️ Proposed fix
+from datetime import datetime, timezone -from datetime import datetimeThen replace usages:
- now = datetime.utcnow() + now = datetime.now(timezone.utc)This affects lines 257 and 283.
tests/unit/test_scraper.py (1)
12-28: Silence Ruff ARG001 formock_settings.
mock_settingsis only used for fixture dependency injection (to ensure settings are mocked before the patch applies); Ruff flags it as unused. Add explicit reference to avoid lint noise.🛠️ Proposed fix
def mock_browser_fetch(mock_settings): """Mock _fetch_with_browser_sync so 'first page little text' refetch does not launch Chrome. @@ since scraper functions now use configurable timeouts from settings. """ + _ = mock_settings # keep fixture dependency explicit (silences ARG001) def _fake_browser_fetch(url: str, timeout_seconds: float | None = None) -> str:
🤖 Fix all issues with AI agents
In `@src/config.py`:
- Around line 101-135: Add validation constraints to ensure timeouts and
rate-limit values are positive by adding gt=0 to the Field declarations for
scraper_timeout_seconds, facebook_api_timeout_seconds,
browser_page_load_timeout_seconds, browser_js_refetch_timeout_seconds (float
fields) and for rate_limit_max_messages and rate_limit_window_seconds (int
fields); update the Field(...) calls to include gt=0 so environment overrides
that are zero or negative will be rejected by the model validation.
In `@src/db/__init__.py`:
- Around line 10-16: The __all__ export list is not alphabetically ordered;
update the __all__ variable to be sorted alphabetically (case-insensitive) so
exports like "BotConfigCache", "QueryTimer", "timed_query",
"get_bot_config_cache", and "reset_bot_config_cache" appear in a consistent
alphabetical order; modify the __all__ assignment (the list containing
QueryTimer, timed_query, BotConfigCache, get_bot_config_cache,
reset_bot_config_cache) to the new sorted order.
In `@src/db/query_executor.py`:
- Line 9: Update the import statement to use Generator from collections.abc for
Python 3.9+ compatibility: replace the current "from typing import Any,
Generator" with an import that keeps Any from typing but imports Generator from
collections.abc (so code references like Generator in this module continue to
work under Python 3.9+ and satisfy Ruff UP035).
In `@src/db/repository.py`:
- Around line 19-119: Add opportunistic eviction to BotConfigCache to avoid
unbounded growth by implementing a private _purge_expired() helper that, under
self._lock, scans self._cache and removes entries whose timestamp is older than
self._ttl (use datetime.utcnow() - timestamp). Call this helper at the start of
set(self, page_id, config) and size(self) (and optionally in invalidate/clear if
desired) so entries are purged on writes and size queries; keep existing get()
behavior intact and reuse the same TTL and lock semantics.
In `@src/main.py`:
- Around line 22-24: Reset the global shutdown state at startup by clearing the
shutdown_event and any pending shutdown task collection so in-process restarts
don't inherit stale flags: call shutdown_event.clear() and
pending_shutdown_tasks.clear() (or assign an empty set/list as appropriate)
before any logic that checks is_shutting_down() or registers shutdown handlers;
update the initialization in the module where shutdown_event,
is_shutting_down(), and pending_shutdown_tasks are defined so those symbols are
explicitly reset on import/startup (and apply the same reset in any places that
reinitialize shutdown state during tests/dev reloads).
- Around line 81-107: The signal handler setup currently only catches
NotImplementedError and defines signal_handler without a return annotation;
update signal_handler to have a return type annotation (def
signal_handler(sig_name: str) -> None) to satisfy ANN202, and broaden the
exception handling around loop.add_signal_handler to also catch RuntimeError
(which occurs when called off the main thread); in that except block log the
RuntimeError with processLogger/logfire including the exception details and
context (e.g., which signal registration failed) and ensure graceful fallback
(leave shutdown_event handling as-is) so startup won’t crash in test runners or
background threads.
In `@src/middleware/rate_limiter.py`:
- Around line 50-54: The current sliding-window cleanup filters old timestamps
but leaves empty user keys in self._requests, causing unbounded memory growth;
fix by deleting empty entries after the filter in the rate-checking method
(inside the with self._lock block) — i.e., after updating
self._requests[user_id] remove the key if the list is empty — and (optional) add
a periodic background cleanup in __init__ (e.g., a _cleanup_stale_users method
run by a daemon thread) that acquires self._lock and purges keys whose last
timestamp is older than the window to reclaim memory for long-running
deployments.
- Around line 47-48: Replace naive UTC timestamps created with datetime.utcnow()
with timezone-aware datetimes by using datetime.now(timezone.utc) (and add the
timezone import if missing). Update the assignments where now is computed (e.g.,
in the RateLimiter method(s) that set now and compute cutoff = now -
self._window) so now is timezone-aware and all downstream comparisons/keys use
that same tz-aware value; apply the same change to the other occurrences of
datetime.utcnow() in this module and ensure imports include from datetime import
datetime, timezone or use datetime.timezone.
In `@src/services/input_sanitizer.py`:
- Around line 157-161: The user-facing error in the error_messages dict
currently hardcodes "1000"; change it to reference the MAX_MESSAGE_LENGTH_CHARS
constant so the message stays in sync. Update the "message_too_long" value in
error_messages to build the string using MAX_MESSAGE_LENGTH_CHARS (e.g.,
formatted or f-string) and ensure MAX_MESSAGE_LENGTH_CHARS is in scope where
error_messages is defined (in src/services/input_sanitizer.py) so no NameError
occurs; keep the other keys unchanged.
- Around line 117-119: Move the inline "import re" (currently inside the body
near the comment "Check for at least some meaningful content") to the
module-level imports at the top of src/services/input_sanitizer.py, remove the
local import so only the top-level import remains, and run lint/format to ensure
no unused-import warnings; update any nearby function (the block where the
inline import appears) to reference "re" from the module scope.
- Around line 50-52: Move the inline imports for unicodedata and re to the
module level: add "import re" and "import unicodedata" with the other
top-of-file imports, then remove the local imports inside the function where
"sanitized = unicodedata.normalize('NFC', sanitized)" (and the later re usage)
appears; ensure the function that uses the sanitized variable (the sanitizer
function) still references the now-module-level names and that there are no
local name collisions.
In `@src/services/message_processor.py`:
- Around line 276-301: Replace the hardcoded personalization magic numbers in
_personalize_response: instead of using 0.8 and 0.2 inline, import and use the
centralized constants (e.g., PERSONALIZATION_CONFIDENCE_THRESHOLD and
PERSONALIZATION_PROBABILITY) from the constants module, then update the
condition to use response.confidence > PERSONALIZATION_CONFIDENCE_THRESHOLD and
random.random() < PERSONALIZATION_PROBABILITY; ensure the function name
_personalize_response and the user_name/text checks remain unchanged and add the
new import at top of the file.
- Around line 177-194: The code currently always calls save_message_history
after calling messaging_service.send_message, which can persist messages even
when send_message fails (it returns a boolean). Update the flow in the method
containing personalized/send_message to capture the return value (e.g., sent =
await messaging_service.send_message(...)) and only construct
MessageHistoryCreate and call save_message_history when sent is truthy; on
failure, log the failure or raise an error instead of persisting. Ensure you
reference the existing symbols personalized, messaging_service.send_message,
MessageHistoryCreate, and save_message_history when making the change.
In `@src/services/messaging_protocol.py`:
- Around line 139-143: The __init__ constructor currently accepts
should_fail_send as a positional parameter which can lead to ambiguous calls;
make it keyword-only by adding a bare * before should_fail_send in the __init__
signature (e.g., change def __init__(self, user_info: FacebookUserInfo | None =
None, *, should_fail_send: bool = False):) and update any calls to the class
that pass should_fail_send positionally to use the keyword form; adjust tests or
usages referencing the constructor accordingly.
In `@src/services/prompt_guard.py`:
- Around line 39-96: HIGH_RISK_PATTERNS and MEDIUM_RISK_PATTERNS are mutable
class-level attributes and should be annotated with typing.ClassVar to satisfy
RUF012; import ClassVar from typing and change their annotations to
ClassVar[list[tuple[str, str]]] (e.g., HIGH_RISK_PATTERNS:
ClassVar[list[tuple[str, str]]]) so type checkers know these are class variables
rather than instance attributes while leaving the values unchanged.
In `@tests/unit/test_input_sanitizer.py`:
- Around line 187-194: The test hardcodes "1000" but the displayed limit is
defined by MAX_MESSAGE_LENGTH_CHARS; update
test_message_too_long_has_friendly_message to import/use
MAX_MESSAGE_LENGTH_CHARS (from the module that defines it) and assert that
str(MAX_MESSAGE_LENGTH_CHARS) is in the returned message from
get_user_friendly_error, keeping the existing checks for non-empty and "too
long" text.
In `@tests/unit/test_message_processor.py`:
- Around line 213-265: Make the test deterministic by patching the RNG used by
the MessageProcessor personalization path: in
test_personalize_response_high_confidence, patch random.random in the module
under test (src.services.message_processor.random.random) to return a value <
0.2 (e.g., 0.1) so the personalized branch is always taken, then call
MessageProcessor.process (or directly call processor.process) once and assert
the sent message starts with "Hi Jane!"; reference the test function
test_personalize_response_high_confidence and the MessageProcessor.process path
so the patch targets the correct module.
In `@tests/unit/test_prompt_guard.py`:
- Around line 26-27: Replace the first argument of the pytest parametrize
decorators from a comma-separated string to a tuple of strings; for each
`@pytest.mark.parametrize` instance (e.g., the decorator above the test function
that uses parameters "text,expected_pattern" and the other parametrize at the
later test), change "text,expected_pattern" to ("text", "expected_pattern") so
the decorator uses a tuple of names rather than a single string as required by
Ruff PT006 and pytest conventions.
- Around line 147-154: The test test_none_safe_handling calls
detector.check(None) but the check() signature is typed as text: str; update the
API to reflect actual behavior by changing the check() type hint to accept str |
None (i.e., text: str | None) and update its docstring to state that None and
empty strings are handled and return a safe/low-risk result, then remove the
now-unnecessary "# type: ignore" from test_none_safe_handling; alternatively, if
you prefer not to change the signature, document this behavior in check()'s
docstring and keep the signature but still remove the type ignore only after
ensuring static types are satisfied.
In `@tests/unit/test_repository.py`:
- Around line 625-627: Replace naive UTC timestamps created with
datetime.utcnow() in test fixtures with timezone-aware timestamps using
datetime.now(timezone.utc); update occurrences in tests/unit/test_repository.py
where created_at and updated_at are assigned and any other fixtures (including
BotConfigCache) to use timezone-aware datetimes, and add the necessary import
for timezone from the datetime module so all timestamps are consistent and
Python 3.12+ compatible.
In `@tests/unit/test_webhook.py`:
- Around line 337-339: The test function test_process_location_lng_alias
currently declares a mock parameter named mock_send that is unused; rename that
parameter to _mock_send in the test signature to suppress unused-argument
warnings and adjust any references (there should be none) accordingly so the
test still runs unchanged.
In `@tests/unit/test_website_scraper.py`:
- Line 36: The test unpacks parser.parse into text, links, title but never uses
links; update the unpacking in the test_website_scraper unit test so the unused
variable is prefixed with an underscore (e.g., use _links or _links_var) to
signal it is intentionally ignored when calling parser.parse in that test.
| # ========================================================================== | ||
| # Timeout Configuration | ||
| # ========================================================================== | ||
| # All timeouts can be overridden via environment variables. | ||
| # Defaults are sourced from src/constants.py. | ||
|
|
||
| scraper_timeout_seconds: float = Field( | ||
| default=DEFAULT_HTTP_TIMEOUT_SECONDS, | ||
| description="HTTP timeout for scraper requests (seconds)", | ||
| ) | ||
| facebook_api_timeout_seconds: float = Field( | ||
| default=FACEBOOK_API_TIMEOUT_SECONDS, | ||
| description="Timeout for Facebook Graph API calls (seconds)", | ||
| ) | ||
| browser_page_load_timeout_seconds: float = Field( | ||
| default=BROWSER_PAGE_LOAD_TIMEOUT_SECONDS, | ||
| description="Timeout for browser page loads (seconds)", | ||
| ) | ||
| browser_js_refetch_timeout_seconds: float = Field( | ||
| default=BROWSER_JS_REFETCH_TIMEOUT_SECONDS, | ||
| description="Extended timeout for JS-rendered page refetch (seconds)", | ||
| ) | ||
|
|
||
| # ========================================================================== | ||
| # Rate Limiting Configuration (from GUARDRAILS.md) | ||
| # ========================================================================== | ||
|
|
||
| rate_limit_max_messages: int = Field( | ||
| default=MAX_MESSAGES_PER_USER_PER_MINUTE, | ||
| description="Max messages per user per rate limit window", | ||
| ) | ||
| rate_limit_window_seconds: int = Field( | ||
| default=RATE_LIMIT_WINDOW_SECONDS, | ||
| description="Rate limit sliding window duration in seconds", | ||
| ) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Validate timeout and rate‑limit values are positive.
Environment overrides could set zero/negative values and break timeouts or the limiter. Consider adding gt=0 constraints.
♻️ Add positive-value constraints
scraper_timeout_seconds: float = Field(
default=DEFAULT_HTTP_TIMEOUT_SECONDS,
+ gt=0,
description="HTTP timeout for scraper requests (seconds)",
)
facebook_api_timeout_seconds: float = Field(
default=FACEBOOK_API_TIMEOUT_SECONDS,
+ gt=0,
description="Timeout for Facebook Graph API calls (seconds)",
)
browser_page_load_timeout_seconds: float = Field(
default=BROWSER_PAGE_LOAD_TIMEOUT_SECONDS,
+ gt=0,
description="Timeout for browser page loads (seconds)",
)
browser_js_refetch_timeout_seconds: float = Field(
default=BROWSER_JS_REFETCH_TIMEOUT_SECONDS,
+ gt=0,
description="Extended timeout for JS-rendered page refetch (seconds)",
)
@@
rate_limit_max_messages: int = Field(
default=MAX_MESSAGES_PER_USER_PER_MINUTE,
+ gt=0,
description="Max messages per user per rate limit window",
)
rate_limit_window_seconds: int = Field(
default=RATE_LIMIT_WINDOW_SECONDS,
+ gt=0,
description="Rate limit sliding window duration in seconds",
)🤖 Prompt for AI Agents
In `@src/config.py` around lines 101 - 135, Add validation constraints to ensure
timeouts and rate-limit values are positive by adding gt=0 to the Field
declarations for scraper_timeout_seconds, facebook_api_timeout_seconds,
browser_page_load_timeout_seconds, browser_js_refetch_timeout_seconds (float
fields) and for rate_limit_max_messages and rate_limit_window_seconds (int
fields); update the Field(...) calls to include gt=0 so environment overrides
that are zero or negative will be rejected by the model validation.
| __all__ = [ | ||
| "QueryTimer", | ||
| "timed_query", | ||
| "BotConfigCache", | ||
| "get_bot_config_cache", | ||
| "reset_bot_config_cache", | ||
| ] |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider sorting __all__ alphabetically.
The static analysis tool flags that __all__ is not sorted. While functional, sorting exports alphabetically improves readability and maintainability.
♻️ Proposed fix
__all__ = [
+ "BotConfigCache",
"QueryTimer",
- "timed_query",
- "BotConfigCache",
"get_bot_config_cache",
"reset_bot_config_cache",
+ "timed_query",
]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| __all__ = [ | |
| "QueryTimer", | |
| "timed_query", | |
| "BotConfigCache", | |
| "get_bot_config_cache", | |
| "reset_bot_config_cache", | |
| ] | |
| __all__ = [ | |
| "BotConfigCache", | |
| "QueryTimer", | |
| "get_bot_config_cache", | |
| "reset_bot_config_cache", | |
| "timed_query", | |
| ] |
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 10-16: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
🤖 Prompt for AI Agents
In `@src/db/__init__.py` around lines 10 - 16, The __all__ export list is not
alphabetically ordered; update the __all__ variable to be sorted alphabetically
(case-insensitive) so exports like "BotConfigCache", "QueryTimer",
"timed_query", "get_bot_config_cache", and "reset_bot_config_cache" appear in a
consistent alphabetical order; modify the __all__ assignment (the list
containing QueryTimer, timed_query, BotConfigCache, get_bot_config_cache,
reset_bot_config_cache) to the new sorted order.
|
|
||
| import time | ||
| from contextlib import contextmanager | ||
| from typing import Any, Generator |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Modernize import for Python 3.9+ compatibility.
Per Ruff UP035, prefer importing Generator from collections.abc rather than typing for Python 3.9+.
♻️ Optional fix
-from typing import Any, Generator
+from collections.abc import Generator
+from typing import Any📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| from typing import Any, Generator | |
| from collections.abc import Generator | |
| from typing import Any |
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 9-9: Import from collections.abc instead: Generator
Import from collections.abc
(UP035)
🤖 Prompt for AI Agents
In `@src/db/query_executor.py` at line 9, Update the import statement to use
Generator from collections.abc for Python 3.9+ compatibility: replace the
current "from typing import Any, Generator" with an import that keeps Any from
typing but imports Generator from collections.abc (so code references like
Generator in this module continue to work under Python 3.9+ and satisfy Ruff
UP035).
| # ============================================================================= | ||
| # Bot Configuration Cache | ||
| # ============================================================================= | ||
|
|
||
|
|
||
| class BotConfigCache: | ||
| """ | ||
| Thread-safe in-memory cache for bot configurations. | ||
|
|
||
| Caches bot configurations by page_id with configurable TTL to reduce | ||
| database queries on every incoming message. The cache is thread-safe | ||
| and automatically expires entries after the TTL. | ||
| """ | ||
|
|
||
| def __init__(self, ttl_seconds: int = BOT_CONFIG_CACHE_TTL_SECONDS): | ||
| """ | ||
| Initialize the cache. | ||
|
|
||
| Args: | ||
| ttl_seconds: Time-to-live for cache entries in seconds (default: 300) | ||
| """ | ||
| self._cache: dict[str, tuple[BotConfiguration, datetime]] = {} | ||
| self._ttl = timedelta(seconds=ttl_seconds) | ||
| self._lock = Lock() | ||
|
|
||
| def get(self, page_id: str) -> BotConfiguration | None: | ||
| """ | ||
| Get cached configuration if not expired. | ||
|
|
||
| Args: | ||
| page_id: Facebook Page ID | ||
|
|
||
| Returns: | ||
| Cached BotConfiguration if valid, None if not found or expired | ||
| """ | ||
| with self._lock: | ||
| if page_id in self._cache: | ||
| config, timestamp = self._cache[page_id] | ||
| if datetime.utcnow() - timestamp < self._ttl: | ||
| logfire.debug( | ||
| "Bot config cache hit", | ||
| page_id=page_id, | ||
| cache_age_seconds=( | ||
| datetime.utcnow() - timestamp | ||
| ).total_seconds(), | ||
| ) | ||
| return config | ||
| # Expired - remove from cache | ||
| del self._cache[page_id] | ||
| logfire.debug( | ||
| "Bot config cache expired", | ||
| page_id=page_id, | ||
| ) | ||
| return None | ||
|
|
||
| def set(self, page_id: str, config: BotConfiguration) -> None: | ||
| """ | ||
| Cache a configuration. | ||
|
|
||
| Args: | ||
| page_id: Facebook Page ID | ||
| config: Bot configuration to cache | ||
| """ | ||
| with self._lock: | ||
| self._cache[page_id] = (config, datetime.utcnow()) | ||
| logfire.debug( | ||
| "Bot config cached", | ||
| page_id=page_id, | ||
| bot_id=config.id, | ||
| ) | ||
|
|
||
| def invalidate(self, page_id: str) -> None: | ||
| """ | ||
| Remove configuration from cache. | ||
|
|
||
| Args: | ||
| page_id: Facebook Page ID to invalidate | ||
| """ | ||
| with self._lock: | ||
| if page_id in self._cache: | ||
| del self._cache[page_id] | ||
| logfire.debug( | ||
| "Bot config cache invalidated", | ||
| page_id=page_id, | ||
| ) | ||
|
|
||
| def clear(self) -> None: | ||
| """Clear all cached configurations.""" | ||
| with self._lock: | ||
| count = len(self._cache) | ||
| self._cache.clear() | ||
| logfire.debug( | ||
| "Bot config cache cleared", | ||
| entries_cleared=count, | ||
| ) | ||
|
|
||
| @property | ||
| def size(self) -> int: | ||
| """Get number of cached entries.""" | ||
| with self._lock: | ||
| return len(self._cache) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider opportunistic eviction of expired cache entries to avoid unbounded growth.
Expired entries are only removed on lookup; a small purge on set/size keeps memory bounded if many page_ids are cached but never re-requested.
♻️ Suggested change
class BotConfigCache:
@@
def __init__(self, ttl_seconds: int = BOT_CONFIG_CACHE_TTL_SECONDS):
@@
self._cache: dict[str, tuple[BotConfiguration, datetime]] = {}
self._ttl = timedelta(seconds=ttl_seconds)
self._lock = Lock()
+
+ def _purge_expired(self) -> None:
+ """Remove expired entries to prevent unbounded growth."""
+ now = datetime.utcnow()
+ expired_keys = [
+ key for key, (_, ts) in self._cache.items() if now - ts >= self._ttl
+ ]
+ for key in expired_keys:
+ del self._cache[key]
@@
def set(self, page_id: str, config: BotConfiguration) -> None:
@@
with self._lock:
+ self._purge_expired()
self._cache[page_id] = (config, datetime.utcnow())
@@
def size(self) -> int:
"""Get number of cached entries."""
with self._lock:
+ self._purge_expired()
return len(self._cache)🧰 Tools
🪛 Ruff (0.14.14)
[warning] 33-33: Missing return type annotation for special method __init__
Add return type annotation: None
(ANN204)
[warning] 57-57: datetime.datetime.utcnow() used
(DTZ003)
[warning] 62-62: datetime.datetime.utcnow() used
(DTZ003)
[warning] 83-83: datetime.datetime.utcnow() used
(DTZ003)
🤖 Prompt for AI Agents
In `@src/db/repository.py` around lines 19 - 119, Add opportunistic eviction to
BotConfigCache to avoid unbounded growth by implementing a private
_purge_expired() helper that, under self._lock, scans self._cache and removes
entries whose timestamp is older than self._ttl (use datetime.utcnow() -
timestamp). Call this helper at the start of set(self, page_id, config) and
size(self) (and optionally in invalidate/clear if desired) so entries are purged
on writes and size queries; keep existing get() behavior intact and reuse the
same TTL and lock semantics.
| # ============================================================================= | ||
| # Graceful Shutdown Infrastructure | ||
| # ============================================================================= |
There was a problem hiding this comment.
Reset shutdown state on startup to avoid stale shutdown flags.
Because shutdown_event is global and is set during shutdown, in-process restarts (tests, dev reload) can leave it set, and is_shutting_down() will incorrectly return True. Clearing the event and pending set at startup avoids that edge-case.
🛠️ Suggested fix
async def lifespan(app: FastAPI):
"""Application lifespan manager with graceful shutdown support."""
# Startup
settings = get_settings()
+ # Reset shutdown state for in-process restarts/tests
+ shutdown_event.clear()
+ _pending_tasks.clear()Also applies to: 26-33, 51-53, 56-61
🤖 Prompt for AI Agents
In `@src/main.py` around lines 22 - 24, Reset the global shutdown state at startup
by clearing the shutdown_event and any pending shutdown task collection so
in-process restarts don't inherit stale flags: call shutdown_event.clear() and
pending_shutdown_tasks.clear() (or assign an empty set/list as appropriate)
before any logic that checks is_shutting_down() or registers shutdown handlers;
update the initialization in the module where shutdown_event,
is_shutting_down(), and pending_shutdown_tasks are defined so those symbols are
explicitly reset on import/startup (and apply the same reset in any places that
reinitialize shutdown state during tests/dev reloads).
| @pytest.mark.parametrize( | ||
| "text,expected_pattern", |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Use tuple format for parametrize argument names.
Per Ruff PT006, prefer using a tuple for the first argument of pytest.mark.parametrize for consistency with pytest best practices.
♻️ Proposed fix (applies to lines 27 and 78)
`@pytest.mark.parametrize`(
- "text,expected_pattern",
+ ("text", "expected_pattern"),
[🧰 Tools
🪛 Ruff (0.14.14)
[warning] 27-27: Wrong type passed to first argument of pytest.mark.parametrize; expected tuple
Use a tuple for the first argument
(PT006)
🤖 Prompt for AI Agents
In `@tests/unit/test_prompt_guard.py` around lines 26 - 27, Replace the first
argument of the pytest parametrize decorators from a comma-separated string to a
tuple of strings; for each `@pytest.mark.parametrize` instance (e.g., the
decorator above the test function that uses parameters "text,expected_pattern"
and the other parametrize at the later test), change "text,expected_pattern" to
("text", "expected_pattern") so the decorator uses a tuple of names rather than
a single string as required by Ruff PT006 and pytest conventions.
| def test_none_safe_handling(self): | ||
| """None input should be handled gracefully.""" | ||
| # Note: Type hint says str, but we should handle None gracefully | ||
| result = self.detector.check(None) # type: ignore | ||
|
|
||
| assert result.is_suspicious is False | ||
| assert result.risk_level == "low" | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
rg -n "def check\(self, text" src/services/prompt_guard.py -A 10Repository: Abstract-Data/go-crea-fb-msg-fast-api
Length of output: 442
🏁 Script executed:
rg -n "def check\(self, text" src/services/prompt_guard.py -A 20Repository: Abstract-Data/go-crea-fb-msg-fast-api
Length of output: 872
Update the type hint to str | None or document None handling explicitly.
The check() method type hint declares text: str, but the implementation implicitly handles None via the if not text: guard on line 118. Update either the type hint to str | None to reflect actual behavior, or add a note to the docstring documenting that None and empty strings return a safe result. Currently, the test relies on # type: ignore to suppress a type violation.
🤖 Prompt for AI Agents
In `@tests/unit/test_prompt_guard.py` around lines 147 - 154, The test
test_none_safe_handling calls detector.check(None) but the check() signature is
typed as text: str; update the API to reflect actual behavior by changing the
check() type hint to accept str | None (i.e., text: str | None) and update its
docstring to state that None and empty strings are handled and return a
safe/low-risk result, then remove the now-unnecessary "# type: ignore" from
test_none_safe_handling; alternatively, if you prefer not to change the
signature, document this behavior in check()'s docstring and keep the signature
but still remove the type ignore only after ensuring static types are satisfied.
| created_at=datetime.utcnow(), | ||
| updated_at=datetime.utcnow(), | ||
| is_active=True, |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider using timezone-aware datetime.
datetime.utcnow() is deprecated in Python 3.12+. Consider using datetime.now(timezone.utc) for consistency with modern Python practices. Note: This should be addressed consistently across the codebase, including in BotConfigCache which also uses utcnow().
♻️ Proposed fix for test fixtures
+from datetime import datetime, timezone
+
config = BotConfiguration(
id="bot-123",
page_id="page-123",
website_url="https://example.com",
reference_doc_id="doc-123",
tone="friendly",
facebook_page_access_token="token",
facebook_verify_token="verify",
- created_at=datetime.utcnow(),
- updated_at=datetime.utcnow(),
+ created_at=datetime.now(timezone.utc),
+ updated_at=datetime.now(timezone.utc),
is_active=True,
)🧰 Tools
🪛 Ruff (0.14.14)
[warning] 625-625: datetime.datetime.utcnow() used
(DTZ003)
[warning] 626-626: datetime.datetime.utcnow() used
(DTZ003)
🤖 Prompt for AI Agents
In `@tests/unit/test_repository.py` around lines 625 - 627, Replace naive UTC
timestamps created with datetime.utcnow() in test fixtures with timezone-aware
timestamps using datetime.now(timezone.utc); update occurrences in
tests/unit/test_repository.py where created_at and updated_at are assigned and
any other fixtures (including BotConfigCache) to use timezone-aware datetimes,
and add the necessary import for timezone from the datetime module so all
timestamps are consistent and Python 3.12+ compatible.
| async def test_process_location_lng_alias( | ||
| self, mock_update, mock_get_bot, mock_send | ||
| ): |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n tests/unit/test_webhook.py | sed -n '330,360p'Repository: Abstract-Data/go-crea-fb-msg-fast-api
Length of output: 1396
Unused mock parameter in the test signature.
Rename to _mock_send to suppress the unused-argument warning.
Suggested fix
- async def test_process_location_lng_alias(
- self, mock_update, mock_get_bot, mock_send
- ):
+ async def test_process_location_lng_alias(
+ self, mock_update, mock_get_bot, _mock_send
+ ):📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async def test_process_location_lng_alias( | |
| self, mock_update, mock_get_bot, mock_send | |
| ): | |
| async def test_process_location_lng_alias( | |
| self, mock_update, mock_get_bot, _mock_send | |
| ): |
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 338-338: Unused method argument: mock_send
(ARG002)
🤖 Prompt for AI Agents
In `@tests/unit/test_webhook.py` around lines 337 - 339, The test function
test_process_location_lng_alias currently declares a mock parameter named
mock_send that is unused; rename that parameter to _mock_send in the test
signature to suppress unused-argument warnings and adjust any references (there
should be none) accordingly so the test still runs unchanged.
| </html> | ||
| """ | ||
| parser = PageParser() | ||
| text, links, title = parser.parse(html, "https://example.com") |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Prefix unused variable with underscore.
The links variable is unpacked but never used in this test. Prefix it with an underscore to indicate it's intentionally ignored.
♻️ Proposed fix
- text, links, title = parser.parse(html, "https://example.com")
+ text, _links, title = parser.parse(html, "https://example.com")🧰 Tools
🪛 Ruff (0.14.14)
[warning] 36-36: Unpacked variable links is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
In `@tests/unit/test_website_scraper.py` at line 36, The test unpacks parser.parse
into text, links, title but never uses links; update the unpacking in the
test_website_scraper unit test so the unused variable is prefixed with an
underscore (e.g., use _links or _links_var) to signal it is intentionally
ignored when calling parser.parse in that test.
2db2d4d to
9a9a50f
Compare
| error_messages = { | ||
| "empty_message": None, # Don't respond to empty messages | ||
| "null_message": None, | ||
| "message_too_long": "Your message is too long. Please send a shorter message (max 1000 characters).", |
There was a problem hiding this comment.
Bug: The error message in get_user_friendly_error hardcodes the 1000 character limit instead of using the MAX_MESSAGE_LENGTH_CHARS constant, creating a potential inconsistency.
Severity: MEDIUM
Suggested Fix
Use an f-string to dynamically insert the value from the MAX_MESSAGE_LENGTH_CHARS constant into the user-facing error message, ensuring it stays in sync with the validation logic. For example: f"Your message is too long. Please send a shorter message (max {MAX_MESSAGE_LENGTH_CHARS} characters).".
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/services/input_sanitizer.py#L160
Potential issue: The function `get_user_friendly_error` constructs an error message for
oversized messages with a hardcoded character limit of "1000". This value is also
defined as the constant `MAX_MESSAGE_LENGTH_CHARS` in `src/constants.py`, which is used
for the actual validation logic. If the `MAX_MESSAGE_LENGTH_CHARS` constant is ever
changed, the validation will correctly use the new value, but the error message will
still incorrectly state the limit is 1000. This will lead to user confusion, as they
will be told an incorrect reason for their message being rejected, violating the DRY
principle.

Summary by CodeRabbit
Release Notes
New Features
Improvements
✏️ Tip: You can customize this high-level summary in your review settings.