Set progress counts when resuming a sync. #328

ggreer · 2025-03-14T00:36:30Z

Summary by CodeRabbit

New Features
- Introduced a dedicated progress tracking system that gathers and logs synchronization metrics for resources, entitlements, and grants.
Refactor
- Updated the synchronization process to integrate with the new progress tracker, enhancing logging and improving recovery for resumed operations.
Bug Fixes
- Improved error handling for cases where resource counts exceed expectations during synchronization.

coderabbitai · 2025-03-14T00:36:37Z

Walkthrough

This pull request introduces a new progress-tracking functionality. A new method ProgressCounts is added to the C1File struct to validate a sync ID and retrieve resource, entitlement, and grant counts via database queries. A new package, progress, is created that defines a ProgressCounts struct along with several logging methods to monitor various progress metrics during synchronization. The syncer is updated to remove legacy progress tracking and now integrates the new progress mechanism for resumed syncs.

Changes

File(s)	Change Summary
`pkg/dotc1z/c1file.go`	Added the `ProgressCounts(ctx, syncID)` method to `C1File` for validating sync IDs and retrieving progress counts from the database, using error handling and math checks.
`pkg/progress/progress.go`	Introduced the new `progress` package with the `ProgressCounts` struct, a constructor `NewProgressCounts`, and several logging methods (`LogResourceTypesProgress`, `LogResourcesProgress`, `LogEntitlementsProgress`, `LogGrantsProgress`, `LogExpandProgress`).
`pkg/sync/syncer.go`	Removed the old progress tracking implementation; updated the `syncer` struct to use `*progress.ProgressCounts` instead. Modified the sync flow to retrieve and update progress counts via the new `ProgressCounts` method when resuming a sync, integrating the new progress package.

Sequence Diagram(s)

sequenceDiagram
    participant Syncer
    participant C1File
    participant DB
    participant ProgressPkg

    Syncer->>C1File: Call ProgressCounts(ctx, syncID)
    C1File->>DB: Validate syncID using getSync()
    DB-->>C1File: Return validation result
    C1File->>DB: List resource types & query counts (loop)
    DB-->>C1File: Return resource and progress counts
    C1File->>ProgressPkg: Populate ProgressCounts struct
    ProgressPkg-->>C1File: Return populated ProgressCounts
    C1File-->>Syncer: Return ProgressCounts or error

Suggested reviewers

jirwin

Poem

I'm a rabbit hopping through the code,
Crunching carrots in a happy mode,
Progress counts and logs all bright,
Syncer whispers secrets in the night,
With burrows deep where data's stored,
I celebrate changes—hopping forward!

✨ Finishing Touches

📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai plan to trigger planning for file edits and PR creation.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

pkg/progress/progress.go (2)

3-5: Import statements for logging and expansions.
These imports set the stage for structured logging and expansion logic.

Consider grouping related packages for clarity if referencing local conventions.

Also applies to: 7-10

91-133: Conditional logging for grants.
Mirrors the entitlements logic closely, maintaining consistent patterns. Consider a helper function to DRY up repeated percentage calculations if it suits your design.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b832d95 and 0ba7742.

📒 Files selected for processing (3)

pkg/dotc1z/c1file.go (3 hunks)
pkg/progress/progress.go (1 hunks)
pkg/sync/syncer.go (5 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: go-test (1.23.x, windows-latest)

🔇 Additional comments (17)

pkg/dotc1z/c1file.go (1)

6-6: Imports added for error handling, math ops, and progress tracking.
These additions look appropriate and form the basis for your new functionality.

Also applies to: 8-8, 26-26

pkg/progress/progress.go (8)

1-2: New package declaration.
Creating a dedicated package for progress tracking is a clean way to separate concerns.

12-20: ProgressCounts struct design.
The fields cover key metrics: resource types, resource counts, entitlements, grants, plus last log timestamps. This is well-structured for your logging needs.

22-24: Constant for rate-limiting logs.
Using a constant maxLogFrequency for controlling log frequency is a good approach.

25-34: Constructor for ProgressCounts.
Initializing maps and time fields in one place is clear and maintainable.

36-40: Logging resource type progress.
Straightforward informational log. Approved for clarity and consistency in your logging system.

41-45: Logging resource progress.
This method parallels the resource types logging and is useful for incremental monitoring.

47-89: Conditional logging for entitlements.
This handles zero-resource scenarios gracefully by skipping percentage calculations. Ensuring you log at a bounded frequency is a best practice.

135-144: LogExpandProgress for grant expansions.
Logging the number of remaining actions helps with diagnosing performance or bottlenecks. Good addition.

pkg/sync/syncer.go (8)

18-18: New import references for the c1z file store and progress package.
These additions integrate your updated store mechanics and progress tracking seamlessly.

Also applies to: 33-33

64-65: Introducing a ProgressCounts field in the syncer struct.
This centralizes syncer-level tracking, ensuring all sync steps can update and reference the same counters.

218-229: Fetching and logging progress counts when resuming sync.
This ensures the syncer has up-to-date stats for resource types, entitlements, grants, etc., upon resuming. Approved for clarity and reusability.

451-455: Incrementing and logging resource types count.
Updating the progress structure and logging at the end of pagination is a clear approach to track overall progress.

555-560: Logging resources progress for each resource type.
Informative step for progress updates in a multi-type scenario. The placement before finishing the action is sensible.

766-768: Per-resource-type entitlement progress increments.
Counting entitlements at a resource-type granularity helps measure coverage. No issues spotted.

1328-1330: Maintaining grants progress.
Analogous to entitlements logic. Keeps you updated on how many resources have been fully processed.

1720-1720: Initializing the syncer’s ProgressCounts.
Instantiating a new ProgressCounts ensures tracking is ready from the start.

coderabbitai · 2025-03-14T00:42:52Z

pkg/dotc1z/c1file.go

+func (c *C1File) ProgressCounts(ctx context.Context, syncID string) (*progress.ProgressCounts, error) {
+	ctx, span := tracer.Start(ctx, "C1File.ProgressCounts")
+	defer span.End()
+
+	// Check if sync ID exists
+	_, err := c.getSync(ctx, syncID)
+	if err != nil {
+		return nil, err
+	}
+
+	counts := progress.NewProgressCounts()
+
+	var rtStats []*v2.ResourceType
+	pageToken := ""
+	for {
+		resp, err := c.ListResourceTypes(ctx, &v2.ResourceTypesServiceListResourceTypesRequest{PageToken: pageToken})
+		if err != nil {
+			return nil, err
+		}
+
+		rtStats = append(rtStats, resp.List...)
+
+		if resp.NextPageToken == "" {
+			break
+		}
+
+		pageToken = resp.NextPageToken
+	}
+	counts.ResourceTypes = len(rtStats)
+	for _, rt := range rtStats {
+		resourceCount, err := c.db.From(resources.Name()).
+			Where(goqu.C("resource_type_id").Eq(rt.Id)).
+			Where(goqu.C("sync_id").Eq(syncID)).
+			CountContext(ctx)
+		if err != nil {
+			return nil, err
+		}
+		if resourceCount > math.MaxInt {
+			return nil, fmt.Errorf("resource count for %s is too large", rt.Id)
+		}
+		counts.Resources[rt.Id] = int(resourceCount)
+
+		query, args, err := c.db.From(entitlements.Name()).
+			Select(goqu.COUNT(goqu.DISTINCT(goqu.C("resource_id")))).
+			Where(goqu.C("resource_type_id").Eq(rt.Id)).
+			Where(goqu.C("sync_id").Eq(syncID)).
+			ToSQL()
+		if err != nil {
+			return nil, err
+		}
+
+		row := c.db.QueryRowContext(ctx, query, args...)
+		if row == nil {
+			continue
+		}
+		var entitlementsCount int64
+		err = row.Scan(&entitlementsCount)
+		if err != nil {
+			if errors.Is(err, sql.ErrNoRows) {
+				continue
+			}
+			return nil, err
+		}
+		counts.EntitlementsProgress[rt.Id] = int(entitlementsCount)
+
+		query, args, err = c.db.From(grants.Name()).
+			Select(goqu.COUNT(goqu.DISTINCT(goqu.C("resource_id")))).
+			Where(goqu.C("resource_type_id").Eq(rt.Id)).
+			Where(goqu.C("sync_id").Eq(syncID)).
+			ToSQL()
+		if err != nil {
+			return nil, err
+		}
+
+		row = c.db.QueryRowContext(ctx, query, args...)
+		if row == nil {
+			continue
+		}
+		var grantsCount int64
+		err = row.Scan(&grantsCount)
+		if err != nil {
+			if errors.Is(err, sql.ErrNoRows) {
+				continue
+			}
+			return nil, err
+		}
+
+		if grantsCount > math.MaxInt {
+			return nil, fmt.Errorf("grants count for %s is too large", rt.Id)
+		}
+		counts.GrantsProgress[rt.Id] = int(grantsCount)
+	}
+
+	return counts, nil
+}


💡 Verification agent

🧩 Analysis chain

Potential overflow and repeated queries.

Storing large DB counts (int64) in an int field can lead to overflow on 32-bit systems if counts exceed 2,147,483,647. Consider refactoring the ProgressCounts struct to use int64 instead of int where feasible.

The logic for counting entitlements and grants repeats similar queries. You could optimize by consolidating them (for example, in a single grouped query) or by modularizing the repeated code to reduce duplication.

- counts.Resources[rt.Id] = int(resourceCount) - ... - counts.GrantsProgress[rt.Id] = int(grantsCount) + counts.Resources[rt.Id] = resourceCount + ... + counts.GrantsProgress[rt.Id] = grantsCount

I'll begin by gathering additional context regarding the definition of the ProgressCounts struct and its initialization via the NewProgressCounts function to verify the potential overflow issue mentioned. This will help us confirm whether the struct fields are defined as int or int64, and if the repeated query logic can be optimized.

Below is a shell script to search for these definitions in the codebase:

🏁 Script executed:

#!/bin/bash # Search for the definition of ProgressCounts struct echo "Searching for ProgressCounts declaration:" rg "type ProgressCounts" -n . # Search for the NewProgressCounts function implementation echo "Searching for NewProgressCounts function:" rg "func NewProgressCounts" -n .

Length of output: 384

I'll now gather more details about the field types within the ProgressCounts struct to verify if the overflow concern is valid. Specifically, we need to check whether fields like Resources, GrantsProgress, and EntitlementsProgress are declared as type int (which may cause overflows on 32-bit systems) or as int64.

Below is a shell script to inspect the definition of ProgressCounts in the pkg/progress/progress.go file:

🏁 Script executed:

#!/bin/bash echo "Inspecting ProgressCounts struct details:" rg -C10 "type ProgressCounts struct" -n pkg/progress/progress.go

Length of output: 887

Action Required: Refactor Integer Types and Consolidate Query Logic

The ProgressCounts struct (in pkg/progress/progress.go) defines count fields as int. Given that the database queries return int64, casting with int(...) risks overflow on 32-bit systems—even though there’s a check against math.MaxInt. Consider refactoring these fields (and the corresponding initialization in NewProgressCounts) to use int64 to safely handle larger numbers.

The SQL query logic for counting entitlements and grants is duplicated. Consolidating these similar queries into a common helper or grouped query would improve maintainability and reduce redundancy.

- counts.Resources[rt.Id] = int(resourceCount) - ... - counts.GrantsProgress[rt.Id] = int(grantsCount) + counts.Resources[rt.Id] = resourceCount + ... + counts.GrantsProgress[rt.Id] = grantsCount

Please address these concerns to ensure robustness and cleaner code.

Committable suggestion skipped: line range outside the PR's diff.

coderabbitai

Actionable comments posted: 1

🔭 Outside diff range comments (2)

pkg/progress/progress.go (1)

47-145: ⚠️ Potential issue

Guard concurrent reads and writes to log-tracking maps.

LogEntitlementsProgress, LogGrantsProgress, and LogExpandProgress read and write time-stamped map entries (LastEntitlementLog, LastGrantLog, and LastActionLog) without synchronization. This can cause data races if called concurrently. Consider wrapping these accesses with a mutex or another concurrency mechanism.

pkg/sync/syncer.go (1)

2078-2078: 🛠️ Refactor suggestion

Leverage concurrency-safe logic when logging expand progress.

LogExpandProgress also uses p.LastActionLog without synchronization. If sync expansions happen in parallel, a race condition is possible here as well. Same concurrency guard recommended as for entitlements and grants logs.

🧹 Nitpick comments (2)

pkg/progress/progress.go (1)

24-24: Consider a concurrency-safe data structure.

The // TODO: use a mutex or a syncmap note indicates a plan to handle parallelism in the future. Accesses to shared mutable state (e.g., maps in this struct) should be guarded with synchronization to avoid race conditions once the code becomes concurrently invoked.

Do you want me to open a new issue or provide a variant with a sync.RWMutex for concurrency safety?

pkg/sync/syncer.go (1)

225-235: Validate the overhead of resuming sync for large data sets.

When resuming a sync, calling ProgressCounts queries all resource types and counts, potentially expensive for large databases. If performance issues arise, consider caching partial results or deferring this call until necessary.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ba7742 and d3e3504.

📒 Files selected for processing (3)

pkg/dotc1z/c1file.go (3 hunks)
pkg/progress/progress.go (1 hunks)
pkg/sync/syncer.go (4 hunks)

🧰 Additional context used

🧬 Code Definitions (1)

pkg/dotc1z/c1file.go (1)

pkg/progress/progress.go (2)

ProgressCounts (12-20)

NewProgressCounts (25-34)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: go-test (1.23.x, windows-latest)

🔇 Additional comments (5)

pkg/dotc1z/c1file.go (1)

215-309: Address integer overflow and repeated query logic [duplicate of a previous comment].

The code uses int(resourceCount) and int(entitlementsCount)/int(grantsCount) to store database query results, risking overflow on 32-bit builds. It may be safer to use int64 or to store the raw int64 directly in ProgressCounts. Additionally, the repeated query pattern for counting entitlements and grants could be consolidated into helper functions to reduce duplication.
pkg/sync/syncer.go (4)

70-70: Progress counts integration looks good.

Initializing a *progress.ProgressCounts field helps centralize synchronization metrics. No issues noted here.

572-575: Confirm that incrementing by 1 for each resource is intentional.

The code increments entitlements or grants counters by 1 per resource finishing that step, rather than by the number of entitlements/grants processed. If the aim is only to track resource-level progress, this is fine. Otherwise, consider incrementing by the actual counts from each resource.

Also applies to: 782-783, 1347-1348

2277-2277: Progress initialization is clean and readable.

Using progress.NewProgressCounts() here neatly integrates the new progress-tracking logic for the syncer. This approach is consistent with the rest of the design.

467-467:
Details

❓ Verification inconclusive

Ensure no double counting of resource types.

ResourceTypes += len(resp.List) assumes each page of resource types represents distinct types. Confirm the connector never returns duplicates across multiple pages. Otherwise, consider deduplicating or verifying uniqueness before incrementing this count.

🏁 Script executed:
#!/bin/bash
# Searching for repeats of resource types returned by the connector
# This script uses the store directly to gather & compare the identifiers.
# Adjust or remove the 'limit' field if the store implementation has no limit concept.
echo "Listing resource types from the connector for potential duplicates:"
# Pseudocode approach: you would add real commands if the store is accessible here.
# For example:
# ast-grep or rg for verifying uniqueness in code or logs if duplicates are possible
Length of output: 144
Double Counting of Resource Types: Ensure Connector Uniqueness

In pkg/sync/syncer.go at line 467, the update
s.counts.ResourceTypes += len(resp.List)
implicitly assumes that every page returned by the connector contains a unique set of resource types. However, if there's any possibility that the connector might return duplicate resource types across pages, this approach could lead to an overcount.

Verify Connector Behavior: Confirm whether the connector guarantees that each page contains distinct resource types.

Consider Deduplication: If duplicates are possible, implement a deduplication step (e.g., by using a set or comparing against a stored list) to ensure that each resource type is counted only once.

Testing: Add tests or log inspections to verify that duplicate resource types are not being counted across pages.

coderabbitai · 2025-03-28T20:22:58Z

pkg/progress/progress.go

+type ProgressCounts struct {
+	ResourceTypes        int            // count of resource types
+	Resources            map[string]int // map of resource type id to resource count
+	EntitlementsProgress map[string]int // map of resource type id to entitlement count
+	LastEntitlementLog   map[string]time.Time
+	GrantsProgress       map[string]int // map of resource type id to grant count
+	LastGrantLog         map[string]time.Time
+	LastActionLog        time.Time
+}


🛠️ Refactor suggestion

Consider using int64 for counts.

Storing counts in the int type can risk overflow on 32-bit builds. Using int64 for fields like ResourceTypes, Resources, EntitlementsProgress, and GrantsProgress is safer if large values are possible.

- type ProgressCounts struct { - ResourceTypes int - Resources map[string]int - EntitlementsProgress map[string]int - ... - } + type ProgressCounts struct { + ResourceTypes int64 + Resources map[string]int64 + EntitlementsProgress map[string]int64 + ... + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

type ProgressCounts struct {

ResourceTypes int // count of resource types

Resources map[string]int // map of resource type id to resource count

EntitlementsProgress map[string]int // map of resource type id to entitlement count

LastEntitlementLog map[string]time.Time

GrantsProgress map[string]int // map of resource type id to grant count

LastGrantLog map[string]time.Time

LastActionLog time.Time

}

type ProgressCounts struct {

ResourceTypes int64 // count of resource types

Resources map[string]int64 // map of resource type id to resource count

EntitlementsProgress map[string]int64 // map of resource type id to entitlement count

LastEntitlementLog map[string]time.Time

GrantsProgress map[string]int64 // map of resource type id to grant count

LastGrantLog map[string]time.Time

LastActionLog time.Time

}

ggreer requested a review from jirwin as a code owner March 14, 2025 00:36

coderabbitai bot reviewed Mar 14, 2025

View reviewed changes

Set progress counts when resuming a sync.

d3e3504

ggreer force-pushed the ggreer/resume-stats branch from 0ba7742 to d3e3504 Compare March 28, 2025 20:16

coderabbitai bot reviewed Mar 28, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set progress counts when resuming a sync. #328

Set progress counts when resuming a sync. #328

Uh oh!

ggreer commented Mar 14, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Mar 14, 2025 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 14, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 28, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Set progress counts when resuming a sync. #328

Are you sure you want to change the base?

Set progress counts when resuming a sync. #328

Uh oh!

Conversation

ggreer commented Mar 14, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Suggested reviewers

Poem

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 14, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 28, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ggreer commented Mar 14, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 14, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)