-
Notifications
You must be signed in to change notification settings - Fork 4
Set progress counts when resuming a sync. #328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThis pull request introduces a new progress-tracking functionality. A new method Changes
Sequence Diagram(s)sequenceDiagram
participant Syncer
participant C1File
participant DB
participant ProgressPkg
Syncer->>C1File: Call ProgressCounts(ctx, syncID)
C1File->>DB: Validate syncID using getSync()
DB-->>C1File: Return validation result
C1File->>DB: List resource types & query counts (loop)
DB-->>C1File: Return resource and progress counts
C1File->>ProgressPkg: Populate ProgressCounts struct
ProgressPkg-->>C1File: Return populated ProgressCounts
C1File-->>Syncer: Return ProgressCounts or error
Suggested reviewers
Poem
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
pkg/progress/progress.go (2)
3-5: Import statements for logging and expansions.
These imports set the stage for structured logging and expansion logic.Consider grouping related packages for clarity if referencing local conventions.
Also applies to: 7-10
91-133: Conditional logging for grants.
Mirrors the entitlements logic closely, maintaining consistent patterns. Consider a helper function to DRY up repeated percentage calculations if it suits your design.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
pkg/dotc1z/c1file.go(3 hunks)pkg/progress/progress.go(1 hunks)pkg/sync/syncer.go(5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: go-test (1.23.x, windows-latest)
🔇 Additional comments (17)
pkg/dotc1z/c1file.go (1)
6-6: Imports added for error handling, math ops, and progress tracking.
These additions look appropriate and form the basis for your new functionality.Also applies to: 8-8, 26-26
pkg/progress/progress.go (8)
1-2: New package declaration.
Creating a dedicated package for progress tracking is a clean way to separate concerns.
12-20: ProgressCounts struct design.
The fields cover key metrics: resource types, resource counts, entitlements, grants, plus last log timestamps. This is well-structured for your logging needs.
22-24: Constant for rate-limiting logs.
Using a constantmaxLogFrequencyfor controlling log frequency is a good approach.
25-34: Constructor for ProgressCounts.
Initializing maps and time fields in one place is clear and maintainable.
36-40: Logging resource type progress.
Straightforward informational log. Approved for clarity and consistency in your logging system.
41-45: Logging resource progress.
This method parallels the resource types logging and is useful for incremental monitoring.
47-89: Conditional logging for entitlements.
This handles zero-resource scenarios gracefully by skipping percentage calculations. Ensuring you log at a bounded frequency is a best practice.
135-144: LogExpandProgress for grant expansions.
Logging the number of remaining actions helps with diagnosing performance or bottlenecks. Good addition.pkg/sync/syncer.go (8)
18-18: New import references for the c1z file store and progress package.
These additions integrate your updated store mechanics and progress tracking seamlessly.Also applies to: 33-33
64-65: Introducing a ProgressCounts field in the syncer struct.
This centralizes syncer-level tracking, ensuring all sync steps can update and reference the same counters.
218-229: Fetching and logging progress counts when resuming sync.
This ensures the syncer has up-to-date stats for resource types, entitlements, grants, etc., upon resuming. Approved for clarity and reusability.
451-455: Incrementing and logging resource types count.
Updating the progress structure and logging at the end of pagination is a clear approach to track overall progress.
555-560: Logging resources progress for each resource type.
Informative step for progress updates in a multi-type scenario. The placement before finishing the action is sensible.
766-768: Per-resource-type entitlement progress increments.
Counting entitlements at a resource-type granularity helps measure coverage. No issues spotted.
1328-1330: Maintaining grants progress.
Analogous to entitlements logic. Keeps you updated on how many resources have been fully processed.
1720-1720: Initializing the syncer’s ProgressCounts.
Instantiating a new ProgressCounts ensures tracking is ready from the start.
| func (c *C1File) ProgressCounts(ctx context.Context, syncID string) (*progress.ProgressCounts, error) { | ||
| ctx, span := tracer.Start(ctx, "C1File.ProgressCounts") | ||
| defer span.End() | ||
|
|
||
| // Check if sync ID exists | ||
| _, err := c.getSync(ctx, syncID) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
|
|
||
| counts := progress.NewProgressCounts() | ||
|
|
||
| var rtStats []*v2.ResourceType | ||
| pageToken := "" | ||
| for { | ||
| resp, err := c.ListResourceTypes(ctx, &v2.ResourceTypesServiceListResourceTypesRequest{PageToken: pageToken}) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
|
|
||
| rtStats = append(rtStats, resp.List...) | ||
|
|
||
| if resp.NextPageToken == "" { | ||
| break | ||
| } | ||
|
|
||
| pageToken = resp.NextPageToken | ||
| } | ||
| counts.ResourceTypes = len(rtStats) | ||
| for _, rt := range rtStats { | ||
| resourceCount, err := c.db.From(resources.Name()). | ||
| Where(goqu.C("resource_type_id").Eq(rt.Id)). | ||
| Where(goqu.C("sync_id").Eq(syncID)). | ||
| CountContext(ctx) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| if resourceCount > math.MaxInt { | ||
| return nil, fmt.Errorf("resource count for %s is too large", rt.Id) | ||
| } | ||
| counts.Resources[rt.Id] = int(resourceCount) | ||
|
|
||
| query, args, err := c.db.From(entitlements.Name()). | ||
| Select(goqu.COUNT(goqu.DISTINCT(goqu.C("resource_id")))). | ||
| Where(goqu.C("resource_type_id").Eq(rt.Id)). | ||
| Where(goqu.C("sync_id").Eq(syncID)). | ||
| ToSQL() | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
|
|
||
| row := c.db.QueryRowContext(ctx, query, args...) | ||
| if row == nil { | ||
| continue | ||
| } | ||
| var entitlementsCount int64 | ||
| err = row.Scan(&entitlementsCount) | ||
| if err != nil { | ||
| if errors.Is(err, sql.ErrNoRows) { | ||
| continue | ||
| } | ||
| return nil, err | ||
| } | ||
| counts.EntitlementsProgress[rt.Id] = int(entitlementsCount) | ||
|
|
||
| query, args, err = c.db.From(grants.Name()). | ||
| Select(goqu.COUNT(goqu.DISTINCT(goqu.C("resource_id")))). | ||
| Where(goqu.C("resource_type_id").Eq(rt.Id)). | ||
| Where(goqu.C("sync_id").Eq(syncID)). | ||
| ToSQL() | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
|
|
||
| row = c.db.QueryRowContext(ctx, query, args...) | ||
| if row == nil { | ||
| continue | ||
| } | ||
| var grantsCount int64 | ||
| err = row.Scan(&grantsCount) | ||
| if err != nil { | ||
| if errors.Is(err, sql.ErrNoRows) { | ||
| continue | ||
| } | ||
| return nil, err | ||
| } | ||
|
|
||
| if grantsCount > math.MaxInt { | ||
| return nil, fmt.Errorf("grants count for %s is too large", rt.Id) | ||
| } | ||
| counts.GrantsProgress[rt.Id] = int(grantsCount) | ||
| } | ||
|
|
||
| return counts, nil | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Potential overflow and repeated queries.
- Storing large DB counts (
int64) in anintfield can lead to overflow on 32-bit systems if counts exceed 2,147,483,647. Consider refactoring theProgressCountsstruct to useint64instead ofintwhere feasible. - The logic for counting entitlements and grants repeats similar queries. You could optimize by consolidating them (for example, in a single grouped query) or by modularizing the repeated code to reduce duplication.
- counts.Resources[rt.Id] = int(resourceCount)
- ...
- counts.GrantsProgress[rt.Id] = int(grantsCount)
+ counts.Resources[rt.Id] = resourceCount
+ ...
+ counts.GrantsProgress[rt.Id] = grantsCountI'll begin by gathering additional context regarding the definition of the ProgressCounts struct and its initialization via the NewProgressCounts function to verify the potential overflow issue mentioned. This will help us confirm whether the struct fields are defined as int or int64, and if the repeated query logic can be optimized.
Below is a shell script to search for these definitions in the codebase:
🏁 Script executed:
#!/bin/bash
# Search for the definition of ProgressCounts struct
echo "Searching for ProgressCounts declaration:"
rg "type ProgressCounts" -n .
# Search for the NewProgressCounts function implementation
echo "Searching for NewProgressCounts function:"
rg "func NewProgressCounts" -n .Length of output: 384
I'll now gather more details about the field types within the ProgressCounts struct to verify if the overflow concern is valid. Specifically, we need to check whether fields like Resources, GrantsProgress, and EntitlementsProgress are declared as type int (which may cause overflows on 32-bit systems) or as int64.
Below is a shell script to inspect the definition of ProgressCounts in the pkg/progress/progress.go file:
🏁 Script executed:
#!/bin/bash
echo "Inspecting ProgressCounts struct details:"
rg -C10 "type ProgressCounts struct" -n pkg/progress/progress.goLength of output: 887
Action Required: Refactor Integer Types and Consolidate Query Logic
- The
ProgressCountsstruct (inpkg/progress/progress.go) defines count fields asint. Given that the database queries returnint64, casting withint(...)risks overflow on 32-bit systems—even though there’s a check againstmath.MaxInt. Consider refactoring these fields (and the corresponding initialization inNewProgressCounts) to useint64to safely handle larger numbers. - The SQL query logic for counting entitlements and grants is duplicated. Consolidating these similar queries into a common helper or grouped query would improve maintainability and reduce redundancy.
- counts.Resources[rt.Id] = int(resourceCount)
- ...
- counts.GrantsProgress[rt.Id] = int(grantsCount)
+ counts.Resources[rt.Id] = resourceCount
+ ...
+ counts.GrantsProgress[rt.Id] = grantsCountPlease address these concerns to ensure robustness and cleaner code.
Committable suggestion skipped: line range outside the PR's diff.
0ba7742 to
d3e3504
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (2)
pkg/progress/progress.go (1)
47-145:⚠️ Potential issueGuard concurrent reads and writes to log-tracking maps.
LogEntitlementsProgress,LogGrantsProgress, andLogExpandProgressread and write time-stamped map entries (LastEntitlementLog,LastGrantLog, andLastActionLog) without synchronization. This can cause data races if called concurrently. Consider wrapping these accesses with a mutex or another concurrency mechanism.pkg/sync/syncer.go (1)
2078-2078: 🛠️ Refactor suggestionLeverage concurrency-safe logic when logging expand progress.
LogExpandProgressalso usesp.LastActionLogwithout synchronization. If sync expansions happen in parallel, a race condition is possible here as well. Same concurrency guard recommended as for entitlements and grants logs.
🧹 Nitpick comments (2)
pkg/progress/progress.go (1)
24-24: Consider a concurrency-safe data structure.The
// TODO: use a mutex or a syncmapnote indicates a plan to handle parallelism in the future. Accesses to shared mutable state (e.g., maps in this struct) should be guarded with synchronization to avoid race conditions once the code becomes concurrently invoked.Do you want me to open a new issue or provide a variant with a sync.RWMutex for concurrency safety?
pkg/sync/syncer.go (1)
225-235: Validate the overhead of resuming sync for large data sets.When resuming a sync, calling
ProgressCountsqueries all resource types and counts, potentially expensive for large databases. If performance issues arise, consider caching partial results or deferring this call until necessary.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
pkg/dotc1z/c1file.go(3 hunks)pkg/progress/progress.go(1 hunks)pkg/sync/syncer.go(4 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
pkg/dotc1z/c1file.go (1)
pkg/progress/progress.go (2)
ProgressCounts(12-20)NewProgressCounts(25-34)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: go-test (1.23.x, windows-latest)
🔇 Additional comments (5)
pkg/dotc1z/c1file.go (1)
215-309: Address integer overflow and repeated query logic [duplicate of a previous comment].The code uses
int(resourceCount)andint(entitlementsCount)/int(grantsCount)to store database query results, risking overflow on 32-bit builds. It may be safer to useint64or to store the rawint64directly inProgressCounts. Additionally, the repeated query pattern for counting entitlements and grants could be consolidated into helper functions to reduce duplication.pkg/sync/syncer.go (4)
70-70: Progress counts integration looks good.Initializing a
*progress.ProgressCountsfield helps centralize synchronization metrics. No issues noted here.
572-575: Confirm that incrementing by 1 for each resource is intentional.The code increments entitlements or grants counters by 1 per resource finishing that step, rather than by the number of entitlements/grants processed. If the aim is only to track resource-level progress, this is fine. Otherwise, consider incrementing by the actual counts from each resource.
Also applies to: 782-783, 1347-1348
2277-2277: Progress initialization is clean and readable.Using
progress.NewProgressCounts()here neatly integrates the new progress-tracking logic for the syncer. This approach is consistent with the rest of the design.
467-467:Details
❓ Verification inconclusive
Ensure no double counting of resource types.
ResourceTypes += len(resp.List)assumes each page of resource types represents distinct types. Confirm the connector never returns duplicates across multiple pages. Otherwise, consider deduplicating or verifying uniqueness before incrementing this count.
🏁 Script executed:
#!/bin/bash # Searching for repeats of resource types returned by the connector # This script uses the store directly to gather & compare the identifiers. # Adjust or remove the 'limit' field if the store implementation has no limit concept. echo "Listing resource types from the connector for potential duplicates:" # Pseudocode approach: you would add real commands if the store is accessible here. # For example: # ast-grep or rg for verifying uniqueness in code or logs if duplicates are possibleLength of output: 144
Double Counting of Resource Types: Ensure Connector Uniqueness
In pkg/sync/syncer.go at line 467, the update
s.counts.ResourceTypes += len(resp.List)implicitly assumes that every page returned by the connector contains a unique set of resource types. However, if there's any possibility that the connector might return duplicate resource types across pages, this approach could lead to an overcount.
- Verify Connector Behavior: Confirm whether the connector guarantees that each page contains distinct resource types.
- Consider Deduplication: If duplicates are possible, implement a deduplication step (e.g., by using a set or comparing against a stored list) to ensure that each resource type is counted only once.
- Testing: Add tests or log inspections to verify that duplicate resource types are not being counted across pages.
| type ProgressCounts struct { | ||
| ResourceTypes int // count of resource types | ||
| Resources map[string]int // map of resource type id to resource count | ||
| EntitlementsProgress map[string]int // map of resource type id to entitlement count | ||
| LastEntitlementLog map[string]time.Time | ||
| GrantsProgress map[string]int // map of resource type id to grant count | ||
| LastGrantLog map[string]time.Time | ||
| LastActionLog time.Time | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider using int64 for counts.
Storing counts in the int type can risk overflow on 32-bit builds. Using int64 for fields like ResourceTypes, Resources, EntitlementsProgress, and GrantsProgress is safer if large values are possible.
- type ProgressCounts struct {
- ResourceTypes int
- Resources map[string]int
- EntitlementsProgress map[string]int
- ...
- }
+ type ProgressCounts struct {
+ ResourceTypes int64
+ Resources map[string]int64
+ EntitlementsProgress map[string]int64
+ ...
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| type ProgressCounts struct { | |
| ResourceTypes int // count of resource types | |
| Resources map[string]int // map of resource type id to resource count | |
| EntitlementsProgress map[string]int // map of resource type id to entitlement count | |
| LastEntitlementLog map[string]time.Time | |
| GrantsProgress map[string]int // map of resource type id to grant count | |
| LastGrantLog map[string]time.Time | |
| LastActionLog time.Time | |
| } | |
| type ProgressCounts struct { | |
| ResourceTypes int64 // count of resource types | |
| Resources map[string]int64 // map of resource type id to resource count | |
| EntitlementsProgress map[string]int64 // map of resource type id to entitlement count | |
| LastEntitlementLog map[string]time.Time | |
| GrantsProgress map[string]int64 // map of resource type id to grant count | |
| LastGrantLog map[string]time.Time | |
| LastActionLog time.Time | |
| } |
Summary by CodeRabbit