Conversation

Contributor

@arreyder arreyder commented Jan 6, 2026

Summary

  • Add rowIterator[T] type for streaming access to query results without materializing the entire result slice
  • Add streamConnectorObjects() function that returns the iterator
  • Refactor listConnectorObjects() to use the streaming API internally, reducing code duplication
  • Pre-allocate result slices in ListSyncRuns() based on pageSize

The iterator provides:

  • Next()/Value()/Err() for standard iteration pattern
  • Collect() to materialize remaining results when a slice is needed
  • Close() to release database resources
  • NextPageToken() for pagination

This is useful for:

  • Memory-efficient processing of large result sets
  • Early termination when a specific item is found
  • Chained processing pipelines
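
The early-termination case can be sketched with a toy iterator that honors the same Next()/Value()/Err()/Close() contract. fakeIterator and findFirst are illustrative names only; the real rowIterator[T] is unexported in pkg/dotc1z and streams rows from SQLite rather than from a slice:

```go
package main

import "fmt"

// fakeIterator mimics the Next/Value/Err/Close contract listed above.
// It is a stand-in for the real iterator, which wraps database rows.
type fakeIterator struct {
	items []string
	idx   int
	cur   string
}

func (it *fakeIterator) Next() bool {
	if it.idx >= len(it.items) {
		return false
	}
	it.cur = it.items[it.idx]
	it.idx++
	return true
}

func (it *fakeIterator) Value() string { return it.cur }
func (it *fakeIterator) Err() error    { return nil }
func (it *fakeIterator) Close() error  { return nil }

// findFirst shows the early-termination benefit: iteration stops at the
// first match, so later rows are never fetched or unmarshalled.
func findFirst(it *fakeIterator, want string) (string, bool) {
	defer it.Close()
	for it.Next() {
		if it.Value() == want {
			return it.Value(), true
		}
	}
	return "", false
}

func main() {
	it := &fakeIterator{items: []string{"a", "b", "c", "d"}}
	v, ok := findFirst(it, "b")
	fmt.Println(v, ok, it.idx) // b true 2 -- stopped after two of four rows
}
```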

Context

Profile analysis of temporal_sync showed go-sqlite.(*conn).columnText allocating ~914 MB/sec. While the root cause is in the driver, application-level optimizations provide defense in depth and reduce peak memory usage.

Test plan

  • Existing tests pass
  • Verified listConnectorObjects behavior unchanged (uses iterator internally)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Refactor
    • Query results can now be streamed, reducing peak memory use when processing large result sets.
    • Result-collection slices are preallocated, avoiding repeated reallocations during result collection.


Add RowIterator[T] type that provides streaming access to query results
without materializing the entire result slice upfront. This is useful for:
- Memory-efficient processing of large result sets
- Early termination when a specific item is found
- Chained processing pipelines

The iterator includes:
- Next()/Value()/Err() for standard iteration
- Collect() to materialize remaining results when needed
- Close() to release database resources
- NextPageToken() for pagination

Refactor listConnectorObjects to use the streaming API internally,
reducing code duplication and ensuring consistent behavior.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Jan 6, 2026

Walkthrough

The changes introduce a generic rowIterator[T] type for streaming database query results instead of full materialization. A new streamConnectorObjects() function returns the iterator, while listConnectorObjects() is refactored to use it for result collection. Additionally, slice pre-allocation is optimized in ListSyncRuns.

Changes

  • Streaming Iterator Pattern (pkg/dotc1z/sql_helpers.go): Introduced generic rowIterator[T proto.Message] with lifecycle methods (Next, Value, Err, NextPageToken, Close, Collect). Added streamConnectorObjects[T proto.Message]() to return an iterator instead of a materialized slice. Refactored listConnectorObjects[T proto.Message]() to use the streaming path: it obtains an iterator via streamConnectorObjects, defers Close(), and collects results with Collect(). The inline row-iteration logic for page boundaries, unmarshalling, and error handling is replaced by the iterator-based approach.
  • Memory Allocation Optimization (pkg/dotc1z/sync_runs.go): Changed the slice initialization in ListSyncRuns from a nil slice to one preallocated with capacity pageSize for improved memory efficiency. No impact on function signatures or control flow.
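
The delegation described above can be sketched with in-memory stand-ins. The real functions are unexported, take a table descriptor, request options, and a proto.Message factory, and wrap *sql.Rows; everything here is illustrative:

```go
package main

import "fmt"

// rowIterator is a minimal stand-in for the unexported iterator type.
type rowIterator[T any] struct {
	items         []T
	idx           int
	nextPageToken string
}

func (it *rowIterator[T]) Next() bool {
	if it.idx >= len(it.items) {
		return false
	}
	it.idx++
	return true
}

func (it *rowIterator[T]) Value() T              { return it.items[it.idx-1] }
func (it *rowIterator[T]) Err() error            { return nil }
func (it *rowIterator[T]) Close() error          { return nil }
func (it *rowIterator[T]) NextPageToken() string { return it.nextPageToken }

// Collect materializes the remaining values into a preallocated slice.
func (it *rowIterator[T]) Collect() ([]T, error) {
	out := make([]T, 0, len(it.items)-it.idx)
	for it.Next() {
		out = append(out, it.Value())
	}
	return out, it.Err()
}

// streamConnectorObjects would normally run the query and wrap the rows;
// here it just seeds the iterator so the control flow is visible.
func streamConnectorObjects[T any](items []T, token string) (*rowIterator[T], error) {
	return &rowIterator[T]{items: items, nextPageToken: token}, nil
}

// listConnectorObjects mirrors the refactor: obtain the iterator,
// defer Close, Collect, then return results plus the page token.
func listConnectorObjects[T any](items []T, token string) ([]T, string, error) {
	it, err := streamConnectorObjects(items, token)
	if err != nil {
		return nil, "", err
	}
	defer it.Close() // resources are released even if Collect fails

	objs, err := it.Collect()
	if err != nil {
		return nil, "", err
	}
	return objs, it.NextPageToken(), nil
}

func main() {
	objs, token, _ := listConnectorObjects([]string{"r1", "r2"}, "3")
	fmt.Println(objs, token) // [r1 r2] 3
}
```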

Sequence Diagram

sequenceDiagram
    participant Caller
    participant listConnectorObjects
    participant streamConnectorObjects
    participant rowIterator
    participant Database

    Caller->>listConnectorObjects: Call with request
    listConnectorObjects->>streamConnectorObjects: Call with context, table, factory
    streamConnectorObjects->>Database: Execute query
    streamConnectorObjects-->>listConnectorObjects: Return *rowIterator[T]
    
    Note over listConnectorObjects: Defer Close()
    listConnectorObjects->>rowIterator: Collect()
    
    loop For each row
        rowIterator->>Database: Fetch next row
        rowIterator->>rowIterator: Unmarshal & check errors
        Note over rowIterator: Next() returns true,<br/>Value() returns T
    end
    
    rowIterator-->>listConnectorObjects: Return ([]T, error)
    listConnectorObjects->>rowIterator: Close()
    listConnectorObjects-->>Caller: Return ([]T, nextPageToken, error)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

Streams now flow where lists once crowded the heap,
Iterators hop through data so deep,
With Collect() and Close() dancing with grace,
Our queries now stream at a sensible pace! 🐰💨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately summarizes the main change: introducing a streaming iterator API for memory-efficient result processing, which matches the primary additions of rowIterator[T] and streamConnectorObjects.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.

📜 Recent review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd227b2 and 2f3c010.

📒 Files selected for processing (2)
  • pkg/dotc1z/sql_helpers.go
  • pkg/dotc1z/sync_runs.go
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/dotc1z/sql_helpers.go (2)
pkg/dotc1z/c1file.go (1)
  • C1File (37-60)
pkg/annotations/annotations.go (1)
  • GetSyncIdFromAnnotations (128-139)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: go-test (1.25.2, ubuntu-latest)
  • GitHub Check: go-test (1.25.2, windows-latest)
🔇 Additional comments (7)
pkg/dotc1z/sql_helpers.go (6)

29-42: Well-structured iterator type.

The generic iterator with proper proto.Message constraint and clear field separation for state management looks good. The design correctly separates pagination logic (count, pageSize, lastRow, nextPageToken) from iteration state (done, err, current).


44-80: Pagination boundary handling is correct.

The logic correctly detects the "extra row" (row pageSize+1) without scanning it, using the last successfully scanned row ID to compute the next page token. Error propagation through it.err and it.done is handled properly.
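
The extra-row detection can be sketched as follows. The field names follow the comment above, but the in-memory row source and the exact token encoding (last scanned ID plus one) are assumptions; the actual iterator wraps *sql.Rows:

```go
package main

import (
	"fmt"
	"strconv"
)

// row stands in for a scanned record; only the monotonically
// increasing ID matters for the page-token computation.
type row struct {
	ID int64
}

// pageIterator sketches the "extra row" technique: the underlying query
// requests pageSize+1 rows, so reaching row pageSize+1 proves another
// page exists without that row ever being scanned.
type pageIterator struct {
	rows     []row
	idx      int
	count    int
	pageSize int
	lastRow  int64
	token    string
	done     bool
	current  row
}

func (it *pageIterator) Next() bool {
	if it.done || it.idx >= len(it.rows) {
		it.done = true
		return false
	}
	if it.count >= it.pageSize {
		// The extra row exists but is never scanned; the next page token
		// is derived from the last successfully scanned row's ID.
		it.token = strconv.FormatInt(it.lastRow+1, 10)
		it.done = true
		return false
	}
	it.current = it.rows[it.idx]
	it.idx++
	it.count++
	it.lastRow = it.current.ID
	return true
}

func (it *pageIterator) Value() row            { return it.current }
func (it *pageIterator) NextPageToken() string { return it.token }

func main() {
	// Three rows fetched for pageSize=2: the third triggers the token.
	it := &pageIterator{rows: []row{{ID: 10}, {ID: 11}, {ID: 12}}, pageSize: 2}
	for it.Next() {
	}
	fmt.Println(it.NextPageToken()) // derived from lastRow (11) + 1
}
```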


82-101: LGTM!

Simple accessor methods with clear contracts. The documentation correctly notes when Value() and NextPageToken() are valid.


103-122: Correct implementation with one consideration.

The pre-allocation and iteration logic are correct. Note that if a caller calls Next() to peek at the first value, then calls Collect(), that first value won't be included in the returned slice (it's accessible only via Value()). The current usage in listConnectorObjects doesn't have this issue since it calls Collect() immediately.
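
The peek caveat can be demonstrated with a toy iterator; sliceIterator is a hypothetical stand-in, since the real Collect() operates on database rows:

```go
package main

import "fmt"

// sliceIterator carries just enough state to show the Collect() caveat:
// a value already consumed via Next()/Value() is not part of the slice
// that Collect returns.
type sliceIterator struct {
	items []string
	idx   int
	cur   string
}

func (it *sliceIterator) Next() bool {
	if it.idx >= len(it.items) {
		return false
	}
	it.cur = it.items[it.idx]
	it.idx++
	return true
}

func (it *sliceIterator) Value() string { return it.cur }
func (it *sliceIterator) Err() error    { return nil }

// Collect materializes only the *remaining* values.
func (it *sliceIterator) Collect() ([]string, error) {
	out := make([]string, 0, len(it.items)-it.idx) // pre-allocate remaining
	for it.Next() {
		out = append(out, it.Value())
	}
	return out, it.Err()
}

func main() {
	it := &sliceIterator{items: []string{"a", "b", "c"}}
	it.Next() // peek: Value() now holds "a"
	peeked := it.Value()
	rest, _ := it.Collect()
	fmt.Println(peeked, rest) // a [b c] -- "a" is absent from the slice
}
```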


203-349: Well-designed streaming function with proper resource lifecycle.

The function correctly handles all error cases before rows is created, preventing resource leaks. The query building logic is appropriately extracted from the original listConnectorObjects. The iterator is returned with correct zero-value initialization for state fields.


351-366: Clean refactoring with proper resource cleanup.

The deferred Close() ensures database resources are released even if Collect() fails. The function now correctly delegates to the streaming API while maintaining the same external behavior.

pkg/dotc1z/sync_runs.go (1)

243-243: Good memory optimization.

Pre-allocating with capacity pageSize eliminates slice reallocations during the append loop, since we know exactly how many elements (at most pageSize) will be added.
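
A minimal illustration of the pattern; syncRun and appendRuns are hypothetical stand-ins for the actual types in pkg/dotc1z/sync_runs.go:

```go
package main

import "fmt"

type syncRun struct{ ID int }

// appendRuns mirrors the ListSyncRuns change: allocating length 0 with
// capacity pageSize up front means the append loop never reallocates,
// since at most pageSize elements are added.
func appendRuns(pageSize int) []syncRun {
	runs := make([]syncRun, 0, pageSize)
	for i := 0; i < pageSize; i++ {
		runs = append(runs, syncRun{ID: i})
	}
	return runs
}

func main() {
	runs := appendRuns(100)
	fmt.Println(len(runs), cap(runs)) // capacity unchanged: no reallocation
}
```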


