Skip to content

perf: use GROUP_CONCAT for efficient tag aggregation in db_get_all_im…#1115

Open
moksha-hub wants to merge 2 commits intoAOSSIE-Org:mainfrom
moksha-hub:fix/1109-use-group-concat-for-tags
Open

perf: use GROUP_CONCAT for efficient tag aggregation in db_get_all_im…#1115
moksha-hub wants to merge 2 commits intoAOSSIE-Org:mainfrom
moksha-hub:fix/1109-use-group-concat-for-tags

Conversation

@moksha-hub
Copy link

@moksha-hub moksha-hub commented Jan 31, 2026

…ages

Fixes #1109

Refactored db_get_all_images() to use SQL GROUP_CONCAT to aggregate tags at the database layer instead of iterating in Python.

Changes:

Performance: SQL-level aggregation via C-based SQLite is significantly faster than Python iteration when processing 100k+ image rows.

Summary by CodeRabbit

  • Refactor
    • Improved image and tag retrieval for faster, more efficient loading.
    • Tags now appear in a consistent, deterministic order.
    • Metadata is parsed immediately for each image, ensuring up-to-date display.
    • Image results are returned as a flattened list with stable fields for simpler consumption.

✏️ Tip: You can customize this high-level summary in your review settings.

…ages

Fixes AOSSIE-Org#1109

Refactored db_get_all_images() to use SQL GROUP_CONCAT to aggregate
tags at the database layer instead of iterating in Python.

Changes:
- SQL query now uses GROUP_CONCAT(DISTINCT m.name) to aggregate tags
- Added GROUP BY i.id to return one row per image
- Simplified Python code to directly build image list (no more images_dict)
- Also moved import to global scope (addresses AOSSIE-Org#1108)

Performance: SQL-level aggregation via C-based SQLite is significantly
faster than Python iteration when processing 100k+ image rows.
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 31, 2026

📝 Walkthrough

Walkthrough

Replaced Python-side per-row tag grouping with SQL aggregation using GROUP_CONCAT(DISTINCT m.name, '\x1f'), returning one row per image. Moved image_util_parse_metadata to module-level import and parse metadata/tags per row; db_get_untagged_images updated similarly.

Changes

Cohort / File(s) Summary
Image Query & Parsing
backend/app/database/images.py
Query now uses GROUP_CONCAT(DISTINCT m.name, '\x1f') and GROUP BY i.id, producing a single row per image. Removed in-Python tag grouping; tags are split on \x1f, sorted, and set to None when empty. image_util_parse_metadata imported at module level and used to parse metadata for each row. Result construction emits flat image dicts with fields id, path, folder_id, thumbnailPath, metadata, isTagged, isFavourite, tags.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 I munched on rows and found the pace slow,

So I taught SQL to bundle tags in a row,
No more Python loops that hop and stagger,
Now queries sprint—my whiskers wag with swagger,
Hooray for tidy results and a lighter load!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main change: implementing GROUP_CONCAT for efficient tag aggregation in the db_get_all_images function.
Linked Issues check ✅ Passed The changes directly address issue #1109: GROUP_CONCAT aggregates tags in SQL, returns one row per image, uses DISTINCT for uniqueness, and improves performance for large datasets.
Out of Scope Changes check ✅ Passed All changes are focused on the tag aggregation refactoring and the import optimization mentioned in the PR objectives; no unrelated modifications are present.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/database/images.py (1)

141-183: ⚠️ Potential issue | 🟠 Major

Address tag splitting and ordering issues in GROUP_CONCAT aggregation.

Tag names are not constrained to exclude commas, making the current split(",") unsafe. Additionally, GROUP_CONCAT without ORDER BY produces non-deterministic ordering.

Consider using a safe separator and explicit ordering:

Proposed fix (safe separator + stable ordering)
+# Separator unlikely to appear in tag names
+TAG_SEPARATOR = "\x1f"

-        query = """
+        query = f"""
             SELECT 
                 i.id, 
                 i.path, 
                 i.folder_id, 
                 i.thumbnailPath, 
                 i.metadata, 
                 i.isTagged,
                 i.isFavourite,
-                GROUP_CONCAT(DISTINCT m.name) as tags
+                GROUP_CONCAT(DISTINCT m.name, '{TAG_SEPARATOR}') as tags
             FROM images i
             LEFT JOIN image_classes ic ON i.id = ic.image_id
             LEFT JOIN mappings m ON ic.class_id = m.class_id
         """
@@
-            tags = tags_str.split(",") if tags_str else None
+            tags = tags_str.split(TAG_SEPARATOR) if tags_str else []
+            tags.sort()

SQLite supports both GROUP_CONCAT(expr, separator) and GROUP_CONCAT(expr, sep ORDER BY col) syntax natively.

🤖 Fix all issues with AI agents
In `@backend/app/database/images.py`:
- Around line 187-190: The code is stringifying folder_id unconditionally which
produces "None" for NULL DB values; update the image dictionary construction
(the block where "folder_id": str(folder_id)) to guard against NULL by returning
None when folder_id is None (e.g., use a conditional that yields None if
folder_id is None otherwise str(folder_id)) so behavior matches
db_get_untagged_images; locate this change in the function building the image
dict (the same scope that references path, thumbnail_path, metadata_dict) and
replace the unconditional str(folder_id) with the guarded expression.

@aniket866
Copy link

@moksha-hub first join the discord channel of this repo there you will come to know PRs to those issues which is not yet assigned will not be merged , see recent messages here Pictopy discord

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: Ineffecient way of getting images

2 participants