Skip to content

Conversation

@kans
Copy link
Contributor

@kans kans commented Dec 16, 2025

With tests and future proofing for the addition of new fields.

Summary by CodeRabbit

  • New Features

    • Deterministic, salt-configurable anonymization for resources, resource types, entitlements, grants, external IDs, emails, names, logins and URLs.
    • End-to-end C1Z anonymization: produces an anonymized copy, reports per-type stats, clears assets/sessions, and resets sync timestamps.
    • Public anonymizer config and helpers for consistent timestamps and hashing.
  • Bug Fixes / Improvements

    • Annotation handling: preserve known marker annotations, anonymize known fields, drop unknown annotation types.
  • Tests

    • Extensive unit, field/table-coverage, and end-to-end tests validating determinism and coverage.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Dec 16, 2025

Walkthrough

Adds a deterministic, salt‑based anonymizer (Hasher + Anonymizer), applies it across ResourceTypes, Resources, Entitlements, Grants and annotations, provides a C1Z file processor to run anonymization (clearing assets/sessions and sync timestamps), and adds extensive unit and integration tests and DB helpers.

Changes

Cohort / File(s) Summary
Core anonymization
pkg/anonymize/anonymize.go, pkg/anonymize/anonymize_test.go
New Config, Anonymizer and Hasher (HMAC‑SHA256 salt-based) with many Anonymize* helpers and tests for determinism, truncation, and variants.
Resource anonymization
pkg/anonymize/resource.go, pkg/anonymize/resource_test.go
In‑place AnonymizeResource and helpers for IDs, ExternalId, DisplayName, Description, ParentId and trait‑specific anonymization (User/Group/Role/App/Secret); tests added.
ResourceType & misc
pkg/anonymize/misc.go, pkg/anonymize/misc_test.go
Adds AnonymizeResourceType, ShouldDeleteAssets, ShouldClearSessionStore and related tests.
Entitlement anonymization
pkg/anonymize/entitlement.go, pkg/anonymize/entitlement_test.go
Adds AnonymizeEntitlement (DisplayName, Description, Slug, Id), recurses into embedded Resource and GrantableTo, with tests.
Grant anonymization
pkg/anonymize/grant.go, pkg/anonymize/grant_test.go
Adds AnonymizeGrant, anonymizes nested entitlement/principal, remaps GrantSources keys (anonymized resource IDs), with tests.
Annotation processors
pkg/anonymize/annotations.go
New processors for ResourceType, Grant, and Entitlement annotations; type‑specific unmarshal, field anonymization/hashing, ExternalLink URL anonymization, unknown types dropped, nil‑safe, errors propagated.
C1Z processing / orchestration
pkg/anonymize/processor.go, pkg/anonymize/processor_test.go
ProcessorStats and AnonymizeC1ZFile: open input/output C1Z, paginated processing of resource_types/resources/entitlements/grants, write output, clear assets/sessions, clear sync timestamps; end‑to‑end tests.
Field & table coverage tests
pkg/anonymize/field_coverage_test.go, pkg/anonymize/table_coverage_test.go
Large test suites enforcing per‑field anonymization policies and asserting DB table handler coverage; validators for nested types and policy completeness.
C1Z DB helpers & SQL utilities
pkg/dotc1z/anonymize_helpers.go, pkg/dotc1z/sql_helpers.go
Adds C1File methods ClearAssets, ClearAllSessions, ClearSyncRunTimestamps and AllTableNames(); tableDescriptor interface surface reduced.
Misc tests
pkg/anonymize/*_test.go
Additional unit/integration tests covering determinism, salt variance, and processor behavior.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as Anonymizer (AnonymizeC1ZFile)
    participant FS as FileSystem
    participant InC1 as Input C1File
    participant OutC1 as Output C1File
    participant DB as DB (C1File tables)
    participant Anon as Anonymizer/Hasher

    CLI->>FS: open inputPath
    CLI->>FS: create outputPath
    CLI->>InC1: Open(inputPath)
    CLI->>OutC1: Create(outputPath)
    CLI->>Anon: New(config)
    CLI->>InC1: processC1File(start)
    loop per section (paged)
        InC1->>DB: List rows (ResourceTypes/Resources/Entitlements/Grants)
        DB-->>InC1: rows
        InC1->>Anon: AnonymizeX(item)
        Anon->>Anon: hash/anonymize fields & annotations
        Anon-->>InC1: anonymized item
        InC1->>OutC1: Put anonymized item
    end
    alt ShouldDeleteAssets == true
        OutC1->>DB: ClearAssets()
        DB-->>OutC1: delete result
    end
    alt ShouldClearSessionStore == true
        OutC1->>DB: ClearAllSessions()
        DB-->>OutC1: delete result
    end
    OutC1->>DB: ClearSyncRunTimestamps()
    DB-->>OutC1: update result
    CLI->>OutC1: Close & Save
    CLI-->>User: return ProcessorStats
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Extra attention:
    • pkg/anonymize/resource.go — extensive trait‑specific logic and edge cases
    • pkg/anonymize/processor.go — pagination, C1Z file handling, and error propagation
    • pkg/anonymize/annotations.go — Any unmarshalling, unknown type handling, and error paths
    • pkg/anonymize/field_coverage_test.go — policy completeness and nested type lists
    • pkg/dotc1z/anonymize_helpers.go — SQL update/delete correctness and dbUpdated semantics

Suggested reviewers

  • jirwin
  • mj-palanker

Poem

🐰 I salted names and hid the links,
IDs hopped off to secret nooks.
Grants and traits now wear a mask,
Sessions cleared — a tidy task.
A little rabbit danced and fixed the books.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 56.98% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title 'anonymize a c1z' is vague and generic, using minimal words without conveying specific details about the substantial anonymization implementation added. Use a more specific title such as 'Add anonymization module for C1Z files' or 'Implement C1Z file anonymization with deterministic hashing' to better describe the significant changes introduced.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch kans/anonymize-c1zs

📜 Recent review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 480356d and 40b4e9a.

📒 Files selected for processing (1)
  • pkg/anonymize/processor.go (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: go-lint
  • GitHub Check: go-test (1.25.2, ubuntu-latest)
  • GitHub Check: go-test (1.25.2, windows-latest)
🔇 Additional comments (1)
pkg/anonymize/processor.go (1)

138-284: Code duplication across process methods is acceptable.

The four process methods (processResourceTypes, processResources, processEntitlements, processGrants) follow an identical pagination pattern with only type-specific differences. While this creates duplication, attempting to abstract it would likely harm readability without significant benefit, especially given Go's type system constraints.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (11)
pkg/anonymize/field_coverage_test.go (1)

572-595: Consider using slices.Contains for cleaner assertion.

The nested loop to check if a required field exists in the anonymized fields list could be simplified.

+import "slices"
+
 // TestFieldPolicy_UserTraitHasPIIFields verifies that UserTrait has expected PII fields marked.
 func TestFieldPolicy_UserTraitHasPIIFields(t *testing.T) {
 	anonymizedFields := getAnonymizedFields(userTraitFieldPolicies)
 
 	// These fields MUST be anonymized as they contain PII
 	requiredAnonymizedFields := []string{
 		"emails",
 		"login",
 		"login_aliases",
 		"employee_ids",
 		"structured_name",
 	}
 
 	for _, required := range requiredAnonymizedFields {
-		found := false
-		for _, actual := range anonymizedFields {
-			if actual == required {
-				found = true
-				break
-			}
-		}
-		require.True(t, found, "UserTrait field %q must be marked for anonymization", required)
+		require.True(t, slices.Contains(anonymizedFields, required),
+			"UserTrait field %q must be marked for anonymization", required)
 	}
 }
pkg/anonymize/anonymize.go (1)

39-46: Consider documenting that the default salt should be replaced for production use.

The hardcoded default salt "baton-anonymize-default-salt" is fine for testing/development, but using the same salt across different anonymization runs could allow correlation of data between anonymized files. Consider adding a comment or log warning when the default salt is used.

 // DefaultConfig returns a Config with sensible default values.
+// Note: The default salt is provided for convenience but should be replaced
+// with a unique, secret salt for production use to prevent data correlation
+// across anonymized files.
 func DefaultConfig() Config {
 	return Config{
 		Salt:                  "baton-anonymize-default-salt",
 		AssetHandling:         AssetHandlingRemove,
 		AnonymizeDescriptions: true,
 	}
 }
pkg/anonymize/entitlement_test.go (1)

10-111: Good test coverage, consider adding a test for grantable_to anonymization.

The tests cover the main scenarios well. For completeness, consider adding a test case that includes GrantableTo resource types to verify they are anonymized correctly along with the entitlement.

pkg/anonymize/misc.go (1)

81-86: Consider making session store clearing configurable.

ShouldClearSessionStore() unconditionally returns true. If there are scenarios where preserving session data is desirable (e.g., debugging), consider adding a config flag similar to AnonymizeDescriptions. However, if session data always contains sensitive information, the current approach is appropriate.

pkg/anonymize/processor.go (1)

244-256: AssetsRemoved and AssetsReplaced stats are misleading.

These counters are incremented by 1 regardless of how many assets are affected. They behave more like boolean flags than counts. Consider either renaming them (e.g., AssetsWereRemoved bool) or tracking the actual count of affected rows.

 	if assetAnon.ShouldRemove {
 		// Clear all assets from the database
 		if err := c1f.ClearAssets(ctx); err != nil {
 			return err
 		}
-		stats.AssetsRemoved++
+		stats.AssetsRemoved = true  // If changing to bool
 	} else if assetAnon.ReplacementData != nil {

Alternatively, modify ClearAssets and ReplaceAllAssets to return the number of affected rows for accurate stats.

pkg/dotc1z/anonymize_helpers.go (1)

46-51: Consider using UTC for discovered_at timestamp as a best practice.

The codebase currently uses time.Now() uniformly across all discovered_at assignments (sql_helpers.go, assets.go, anonymize_helpers.go). While this is consistent and functional, using time.Now().UTC() is recommended for database portability and to avoid potential issues if the server's local timezone changes. Since discovered_at is stored as text in SQLite and used for comparisons, UTC ensures consistency regardless of deployment environment.

 	q = q.Set(goqu.Record{
 		"content_type":  contentType,
 		"data":          data,
-		"discovered_at": time.Now().Format("2006-01-02 15:04:05.999999999"),
+		"discovered_at": time.Now().UTC().Format("2006-01-02 15:04:05.999999999"),
 	})
pkg/anonymize/anonymize_test.go (2)

32-43: Consider adding edge case test for HashN when n exceeds hash length.

The test verifies truncation behavior and the prefix relationship, which is good. However, it would be valuable to test the behavior when n exceeds the full hash length to ensure the implementation handles this gracefully (e.g., returns the full hash without panicking).

+func TestHasher_HashN_ExceedsLength(t *testing.T) {
+	h := NewHasher("test-salt")
+
+	// Request more characters than the hash can provide
+	hashLarge := h.HashN("test-input", 1000)
+	require.NotEmpty(t, hashLarge, "HashN should handle large n values gracefully")
+}

127-141: Add assertion to verify the default salt is actually applied.

The test verifies the anonymizer is created but doesn't confirm the default salt was used. Consider adding an assertion that the hasher produces consistent output with another NewWithDefaults() instance.

 func TestNew_WithEmptySalt(t *testing.T) {
 	config := Config{
 		Salt: "",
 	}
 	a := New(config)
 	require.NotNil(t, a, "New should return an Anonymizer even with empty salt")
-	// Should use default salt
 	require.NotNil(t, a.hasher, "Anonymizer should have a hasher")
+
+	// Verify it uses the default salt by comparing output with NewWithDefaults
+	aDefault := NewWithDefaults()
+	require.Equal(t, a.hasher.Hash("test"), aDefault.hasher.Hash("test"), "Empty salt should use default salt")
 }
pkg/anonymize/resource_test.go (2)

115-150: Add test coverage for RoleTrait and SecretTrait.

The implementation includes anonymizeRoleTrait and anonymizeSecretTrait, but there are no corresponding tests. Consider adding tests to ensure these traits are properly anonymized.

func TestAnonymizeResource_WithRoleTrait(t *testing.T) {
	a := NewWithDefaults()

	roleTrait := &v2.RoleTrait{
		Profile: func() *structpb.Struct {
			s, _ := structpb.NewStruct(map[string]interface{}{
				"level": "admin",
			})
			return s
		}(),
	}

	annos := annotations.New(roleTrait)
	r := &v2.Resource{
		Id: &v2.ResourceId{
			ResourceType: "role",
			Resource:     "admin-role",
		},
		DisplayName: "Admin Role",
		Annotations: annos,
	}

	err := a.AnonymizeResource(r)
	require.NoError(t, err)

	rt := &v2.RoleTrait{}
	updatedAnnos := annotations.Annotations(r.GetAnnotations())
	found, err := updatedAnnos.Pick(rt)
	require.NoError(t, err)
	require.True(t, found)
	require.False(t, rt.HasProfile())

	require.Contains(t, r.GetDisplayName(), "Role ")
}

func TestAnonymizeResource_WithSecretTrait(t *testing.T) {
	a := NewWithDefaults()

	secretTrait := &v2.SecretTrait{
		CreatedById: &v2.ResourceId{
			ResourceType: "user",
			Resource:     "creator-123",
		},
	}

	annos := annotations.New(secretTrait)
	r := &v2.Resource{
		Id: &v2.ResourceId{
			ResourceType: "secret",
			Resource:     "api-key-123",
		},
		DisplayName: "API Key",
		Annotations: annos,
	}

	err := a.AnonymizeResource(r)
	require.NoError(t, err)

	st := &v2.SecretTrait{}
	updatedAnnos := annotations.Annotations(r.GetAnnotations())
	found, err := updatedAnnos.Pick(st)
	require.NoError(t, err)
	require.True(t, found)
	require.Contains(t, st.GetCreatedById().GetResourceType(), "type_")
}

248-276: Consider also asserting ResourceType determinism.

The test verifies DisplayName and Resource are deterministic, but the implementation also anonymizes ResourceType. Consider adding an assertion for completeness.

 	// Same input should produce same output
 	require.Equal(t, r1.GetDisplayName(), r2.GetDisplayName())
 	require.Equal(t, r1.GetId().GetResource(), r2.GetId().GetResource())
+	require.Equal(t, r1.GetId().GetResourceType(), r2.GetId().GetResourceType())
 }
pkg/anonymize/resource.go (1)

54-94: Silently ignoring errors from annos.Pick() may hide issues.

In getDisplayNamePrefix, errors from annos.Pick() are silently ignored. While the fallback behavior is reasonable, logging or tracking these errors could help with debugging malformed annotations.

Consider whether error visibility is important here:

 // Check for UserTrait
 ut := &v2.UserTrait{}
-if ok, _ := annos.Pick(ut); ok {
+if ok, err := annos.Pick(ut); err != nil {
+	// Could log error for debugging
+} else if ok {
 	return "User"
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 48199c2 and 202a488.

📒 Files selected for processing (14)
  • pkg/anonymize/annotations.go (1 hunks)
  • pkg/anonymize/anonymize.go (1 hunks)
  • pkg/anonymize/anonymize_test.go (1 hunks)
  • pkg/anonymize/entitlement.go (1 hunks)
  • pkg/anonymize/entitlement_test.go (1 hunks)
  • pkg/anonymize/field_coverage_test.go (1 hunks)
  • pkg/anonymize/grant.go (1 hunks)
  • pkg/anonymize/grant_test.go (1 hunks)
  • pkg/anonymize/misc.go (1 hunks)
  • pkg/anonymize/misc_test.go (1 hunks)
  • pkg/anonymize/processor.go (1 hunks)
  • pkg/anonymize/resource.go (1 hunks)
  • pkg/anonymize/resource_test.go (1 hunks)
  • pkg/dotc1z/anonymize_helpers.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
pkg/anonymize/entitlement.go (1)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (49-52)
pkg/anonymize/grant_test.go (1)
pkg/anonymize/anonymize.go (1)
  • NewWithDefaults (66-68)
pkg/anonymize/processor.go (2)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (49-52)
pkg/dotc1z/c1file.go (2)
  • NewC1ZFile (174-210)
  • C1File (36-54)
pkg/anonymize/anonymize_test.go (1)
pkg/anonymize/anonymize.go (9)
  • NewHasher (77-81)
  • DefaultConfig (40-46)
  • AssetHandlingRemove (17-17)
  • AssetHandling (13-13)
  • NewWithDefaults (66-68)
  • Config (25-37)
  • New (55-63)
  • AssetHandlingReplace (21-21)
  • AssetHandlingKeep (19-19)
pkg/anonymize/misc_test.go (2)
pkg/anonymize/anonymize.go (1)
  • NewWithDefaults (66-68)
pkg/anonymize/misc.go (2)
  • PlaceholderAssetData (41-48)
  • PlaceholderContentType (51-51)
pkg/anonymize/annotations.go (3)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (49-52)
pkg/annotations/annotations.go (1)
  • Annotations (12-12)
vendor/github.com/doug-martin/goqu/v9/update_dataset.go (1)
  • Update (29-31)
pkg/dotc1z/anonymize_helpers.go (3)
pkg/dotc1z/c1file.go (1)
  • C1File (36-54)
vendor/github.com/doug-martin/goqu/v9/delete_dataset.go (1)
  • Delete (31-33)
vendor/github.com/doug-martin/goqu/v9/update_dataset.go (1)
  • Update (29-31)
pkg/anonymize/grant.go (1)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (49-52)
pkg/anonymize/misc.go (1)
pkg/anonymize/anonymize.go (5)
  • Anonymizer (49-52)
  • AssetHandling (13-13)
  • AssetHandlingRemove (17-17)
  • AssetHandlingReplace (21-21)
  • AssetHandlingKeep (19-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: go-test (1.25.2, windows-latest)
  • GitHub Check: go-test (1.25.2, ubuntu-latest)
🔇 Additional comments (27)
pkg/anonymize/annotations.go (3)

1-43: LGTM! Consistent annotation anonymization pattern for ResourceType.

The implementation correctly handles nil guards, uses the annotations wrapper pattern, and properly anonymizes ChildResourceType IDs and ExternalLink URLs. The comment on line 39 documenting that marker annotations don't contain PII is helpful.


45-114: LGTM! Comprehensive grant annotation handling.

The method properly handles all grant-related annotation types:

  • GrantExpandable - anonymizes entitlement and resource type IDs
  • GrantMetadata - clears arbitrary metadata
  • GrantImmutable - hashes source_id and clears metadata
  • ExternalLink - anonymizes URLs

Error propagation is correct throughout.


116-150: LGTM! Entitlement annotation handling follows the established pattern.

Properly clears EntitlementImmutable metadata and anonymizes the source_id and external links.

pkg/anonymize/field_coverage_test.go (3)

13-25: Well-designed field policy enum.

The four-value enum provides clear semantics for how each field should be handled during anonymization. This is an excellent approach for making anonymization policies explicit and maintainable.


215-272: Solid validation helpers for field coverage.

validateFieldCoverage ensures no fields are missed, while validateNoPolicyForMissingFields catches stale policies for removed/renamed fields. This bi-directional validation is a robust approach to maintain policy-field synchronization.


619-685: Excellent defensive test for schema evolution.

The TestNestedTypeCoverage test combined with knownNestedMessageTypes is a clever mechanism to catch new nested message types added to protobufs. This ensures that new types can't slip through without explicit anonymization handling.

pkg/anonymize/anonymize.go (2)

70-98: LGTM! Sound cryptographic approach for deterministic anonymization.

Using HMAC-SHA256 with a salt is appropriate for generating deterministic but unpredictable anonymized values. The HashN truncation is safe since it checks bounds before slicing.


100-154: LGTM! Consistent and human-readable anonymization outputs.

The various Anonymize* methods provide:

  • Deterministic outputs (same input → same output)
  • Human-readable prefixes (e.g., user_, login_, emp_)
  • Appropriate truncation lengths for readability while maintaining uniqueness
pkg/anonymize/grant_test.go (3)

10-51: Good test coverage for basic grant anonymization.

The test validates that Grant ID, Entitlement, and Principal fields are properly anonymized. The assertions correctly verify that original values are not present in the output.


53-75: Good test for source key anonymization.

This test verifies that the map keys in GrantSources are anonymized. The length check ensures keys aren't lost during remapping.


127-162: Deterministic test is well-structured.

This test correctly validates the deterministic property by comparing two identical grants after anonymization. This is crucial for maintaining referential integrity across anonymized data.

pkg/anonymize/grant.go (1)

7-43: LGTM! Clean implementation following consistent patterns.

The AnonymizeGrant method properly:

  • Guards against nil input
  • Anonymizes all relevant fields
  • Delegates to appropriate sub-anonymizers
  • Propagates errors correctly
pkg/anonymize/entitlement.go (1)

8-55: LGTM! Well-structured anonymization with proper error handling.

The implementation correctly handles nil inputs, conditionally anonymizes descriptions based on configuration, and properly propagates errors from nested anonymization calls. The comment about Purpose being an enum (not PII) is helpful documentation.

pkg/anonymize/misc_test.go (1)

10-83: LGTM! Good test coverage for ResourceType anonymization and placeholder assets.

The tests appropriately validate basic anonymization behavior, nil handling, deterministic output, and PNG magic number verification. The determinism test is particularly important for ensuring consistent anonymization across runs.

pkg/anonymize/misc.go (1)

8-37: LGTM! Consistent anonymization pattern.

The implementation follows the same pattern as other anonymizers with proper nil handling, conditional description anonymization, and error propagation from annotation processing.

pkg/dotc1z/anonymize_helpers.go (1)

11-33: LGTM! Clean implementation with proper tracing and validation.

The method follows consistent patterns: tracing, validation, query building, execution, and state update. The unconditional DELETE is appropriate for the "clear all assets" use case.

pkg/anonymize/processor.go (1)

26-59: LGTM! Solid entry point with proper cleanup on error.

The approach of copying first then modifying in-place is sound. Error handling correctly cleans up the output file on failure, and the file is properly closed to persist changes on success.

pkg/anonymize/anonymize_test.go (2)

9-20: LGTM!

Good test coverage for deterministic hashing behavior - verifies same input produces same output, and different inputs produce different outputs.


143-184: LGTM!

Comprehensive table-driven test covering all three AssetHandling modes with appropriate assertions for ShouldRemove, ReplacementData, and ReplacementContentType.

pkg/anonymize/resource_test.go (2)

12-38: LGTM!

Good basic test covering display name, description, and resource ID anonymization with appropriate assertions.


40-113: LGTM!

Thorough test for UserTrait anonymization covering emails, login, aliases, employee IDs, profile clearing, and structured name fields.

pkg/anonymize/resource.go (6)

9-52: LGTM!

Well-structured main anonymization function with proper nil handling, conditional description anonymization based on config, and appropriate error propagation from annotation processing.


96-108: LGTM!

Proper nil check and consistent anonymization of both resource identifier and resource type fields.


126-177: LGTM!

Comprehensive trait annotation handling with proper error checking and consistent pattern across all trait types.


179-253: LGTM!

Thorough UserTrait anonymization covering all sensitive fields (emails, login, aliases, employee IDs, structured name, profile, icon, status details, timestamps). Good documentation on what's kept and why.


255-297: LGTM!

Consistent implementation for GroupTrait, RoleTrait, and AppTrait anonymization with appropriate field handling.


299-322: LGTM!

Good handling of SecretTrait including clearing profile, anonymizing associated resource IDs, and removing timestamps.

@kans kans force-pushed the kans/anonymize-c1zs branch from 202a488 to 1deff93 Compare December 16, 2025 16:15
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
pkg/anonymize/grant.go (1)

45-66: Potential map key collision (acknowledged).

The collision concern from the past review is valid but extremely unlikely with 12 hex characters (48 bits of entropy). For practical source counts, this is negligible. The existing implementation is acceptable.

pkg/anonymize/processor.go (1)

273-293: copyFile doesn't clean up destination on copy failure.

This was flagged in a previous review. If io.Copy or Sync fails, a partial destination file remains. Apply the suggested fix from the past review to ensure cleanup.

🧹 Nitpick comments (1)
pkg/anonymize/anonymize.go (1)

100-154: Length-preserving anonymization methods are well-designed.

All anonymization methods correctly preserve the original length using HashN, which maintains data structure consistency. The approach is appropriate for the use case.

Minor note: AnonymizeDisplayName has an unused prefix parameter (Line 106). If this is for future extensibility or API compatibility, consider adding a comment explaining the design choice.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 202a488 and 1deff93.

📒 Files selected for processing (14)
  • pkg/anonymize/annotations.go (1 hunks)
  • pkg/anonymize/anonymize.go (1 hunks)
  • pkg/anonymize/anonymize_test.go (1 hunks)
  • pkg/anonymize/entitlement.go (1 hunks)
  • pkg/anonymize/entitlement_test.go (1 hunks)
  • pkg/anonymize/field_coverage_test.go (1 hunks)
  • pkg/anonymize/grant.go (1 hunks)
  • pkg/anonymize/grant_test.go (1 hunks)
  • pkg/anonymize/misc.go (1 hunks)
  • pkg/anonymize/misc_test.go (1 hunks)
  • pkg/anonymize/processor.go (1 hunks)
  • pkg/anonymize/resource.go (1 hunks)
  • pkg/anonymize/resource_test.go (1 hunks)
  • pkg/dotc1z/anonymize_helpers.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • pkg/anonymize/entitlement.go
  • pkg/anonymize/misc.go
  • pkg/anonymize/grant_test.go
🧰 Additional context used
🧬 Code graph analysis (7)
pkg/anonymize/entitlement_test.go (1)
pkg/anonymize/anonymize.go (3)
  • NewWithDefaults (66-68)
  • DefaultConfig (40-46)
  • New (55-63)
pkg/anonymize/grant.go (1)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (49-52)
pkg/anonymize/anonymize_test.go (2)
vendor/github.com/doug-martin/goqu/v9/expressions.go (1)
  • T (232-234)
pkg/anonymize/anonymize.go (9)
  • NewHasher (77-81)
  • DefaultConfig (40-46)
  • AssetHandlingRemove (17-17)
  • AssetHandling (13-13)
  • NewWithDefaults (66-68)
  • Config (25-37)
  • New (55-63)
  • AssetHandlingReplace (21-21)
  • AssetHandlingKeep (19-19)
pkg/anonymize/processor.go (2)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (49-52)
pkg/dotc1z/c1file.go (2)
  • NewC1ZFile (174-210)
  • C1File (36-54)
pkg/anonymize/misc_test.go (2)
pkg/anonymize/anonymize.go (1)
  • NewWithDefaults (66-68)
pkg/anonymize/misc.go (2)
  • PlaceholderAssetData (41-48)
  • PlaceholderContentType (51-51)
pkg/dotc1z/anonymize_helpers.go (3)
pkg/dotc1z/c1file.go (1)
  • C1File (36-54)
vendor/github.com/doug-martin/goqu/v9/delete_dataset.go (1)
  • Delete (31-33)
vendor/github.com/doug-martin/goqu/v9/update_dataset.go (1)
  • Update (29-31)
pkg/anonymize/resource.go (3)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (49-52)
pkg/types/tasks/tasks.go (1)
  • GetResourceType (92-92)
pkg/annotations/annotations.go (1)
  • Annotations (12-12)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: go-test (1.25.2, ubuntu-latest)
  • GitHub Check: go-test (1.25.2, windows-latest)
🔇 Additional comments (38)
pkg/dotc1z/anonymize_helpers.go (3)

10-33: LGTM!

The ClearAssets method correctly validates the DB, builds a DELETE query using goqu, executes it, and marks the DB as updated. The tracing span and error handling are appropriate.


35-65: LGTM!

The ReplaceAllAssets method correctly updates all assets with placeholder data. The timestamp formatting follows the SQLite datetime format pattern.


67-90: LGTM!

The ClearAllSessions method follows the same consistent pattern as ClearAssets for deleting session store data.

pkg/anonymize/grant.go (1)

8-43: LGTM!

The AnonymizeGrant method correctly handles nil input and systematically anonymizes all grant components: ID, embedded entitlement, principal resource, sources, and annotations. Error propagation is properly implemented.

pkg/anonymize/misc_test.go (4)

10-42: LGTM!

Good coverage testing that anonymization preserves length for ID and DisplayName, replaces Description with "[ANONYMIZED]", and preserves Traits. The assertions are appropriate.


44-49: LGTM!

Proper nil-input handling test.


51-73: LGTM!

Good test for deterministic behavior - same input produces same output.


75-85: LGTM!

Good validation of placeholder PNG asset data including magic number verification.

pkg/anonymize/entitlement_test.go (4)

10-52: LGTM!

Comprehensive test covering ID, DisplayName, Description, Slug, and embedded Resource anonymization with proper length preservation assertions.


54-78: LGTM!

Good coverage for nil entitlement and nil resource edge cases.


80-102: LGTM!

Determinism test ensures consistent output for identical inputs.


104-120: LGTM!

Good test for the AnonymizeDescriptions configuration option.

pkg/anonymize/resource_test.go (6)

12-43: LGTM!

Good basic test covering DisplayName, Description, and ResourceId anonymization with length preservation.


45-137: LGTM!

Excellent comprehensive test for UserTrait anonymization covering emails, login, aliases, employee IDs, profile clearing, and structured name fields. All assertions properly verify length preservation and value changes.


139-177: LGTM!

Good test for GroupTrait ensuring profile is cleared and DisplayName is anonymized with length preservation.


179-217: LGTM!

Good test for AppTrait covering HelpUrl anonymization and profile clearing.


219-273: LGTM!

Good coverage for ExternalId and ParentResourceId anonymization with proper assertions.


275-331: LGTM!

Good coverage for nil handling, determinism, and the PreserveDescriptions config option.

pkg/anonymize/annotations.go (3)

8-43: LGTM!

The anonymizeResourceTypeAnnotations method correctly processes ChildResourceType and ExternalLink annotations with proper nil guards and error handling.


45-114: LGTM!

The anonymizeGrantAnnotations method thoroughly handles GrantExpandable IDs, clears GrantMetadata and GrantImmutable metadata, hashes source IDs, and anonymizes ExternalLink URLs. Error propagation is correct.


116-150: LGTM!

The anonymizeEntitlementAnnotations method properly clears EntitlementImmutable metadata, hashes source IDs, and anonymizes ExternalLink URLs.

pkg/anonymize/processor.go (8)

13-22: LGTM!

Good stats struct for tracking anonymization metrics.


24-59: LGTM!

The AnonymizeC1ZFile function correctly handles the workflow: copies input, processes, and cleans up output on error. Proper error wrapping with %w.


61-94: LGTM!

Clear orchestration of the processing stages with proper error wrapping for each stage.


96-130: LGTM!

Correct pagination pattern for processing resource types.


132-166: LGTM!

Consistent pagination pattern for resources.


168-202: LGTM!

Consistent pagination pattern for entitlements.


204-238: LGTM!

Consistent pagination pattern for grants.


240-271: LGTM!

Asset and session processing correctly use the configuration-based handlers.

pkg/anonymize/anonymize_test.go (1)

1-209: Excellent test coverage for the anonymization framework!

The test suite is comprehensive and well-structured:

  • Deterministic hashing behavior is properly validated across different inputs and salts
  • Length preservation is tested for all anonymization methods
  • Asset handling modes and configuration defaults are thoroughly covered
  • The tests will catch regressions in core anonymization behavior
pkg/anonymize/anonymize.go (1)

83-98: Solid hashing implementation using HMAC-SHA256.

The use of HMAC-SHA256 provides deterministic, cryptographically-sound hashing for anonymization. The HashN truncation logic correctly handles cases where the requested length might exceed the hash length.

pkg/anonymize/resource.go (4)

8-52: Well-structured resource anonymization with proper nil handling.

The AnonymizeResource method follows a clear, systematic approach:

  • Nil guard prevents panics
  • Context-aware display name anonymization using type-based prefixes
  • Comprehensive coverage of all ID fields
  • Proper error propagation from trait annotation processing

179-253: Excellent comprehensive PII scrubbing for UserTrait.

The UserTrait anonymization demonstrates strong privacy engineering:

  • All PII fields are properly anonymized (emails, logins, employee IDs, structured names)
  • Potentially identifying metadata is cleared (prefix/suffix like "Dr.", "Jr.")
  • Profile pictures and arbitrary profile data are removed
  • Timestamps are cleared to prevent fingerprinting
  • Analytical value is preserved by keeping non-identifying enum values

This thorough approach significantly reduces re-identification risk.


255-297: Consistent trait anonymization across Group, Role, and App types.

The anonymization approach is appropriately tailored to each trait type:

  • Profile fields are consistently cleared to remove arbitrary PII
  • Visual identifiers (icons, logos) are removed as they can be identifying
  • URLs are anonymized while preserving structure
  • Non-identifying metadata (flags) is preserved for analytical value

299-322: Appropriate SecretTrait anonymization with timestamp clearing.

The SecretTrait anonymization correctly:

  • Clears timestamps (created, expires, last used) to prevent temporal fingerprinting
  • Anonymizes resource IDs that could identify users (CreatedById, IdentityId)
  • Removes arbitrary profile data

The comment explaining that timestamps are "not used by SDK" provides helpful context for the design decision.

pkg/anonymize/field_coverage_test.go (3)

13-272: Excellent self-documenting field policy maps for transparency.

The field policy maps provide clear documentation of the anonymization strategy:

  • Each field's handling is explicitly categorized (Anonymize, Safe, Recurse, Clear)
  • Comments explain the privacy rationale for each decision
  • The validation helpers ensure completeness and catch typos
  • This makes the anonymization strategy transparent and auditable

This is exemplary documentation for privacy-sensitive code.


274-617: Robust multi-layered test structure for field coverage validation.

The testing approach provides excellent coverage:

  • Individual tests enable granular failure diagnosis
  • Consolidated TestFieldCoverage_AllTypes provides CI visibility
  • Invariant tests (PII fields, profile clearing) validate critical requirements
  • Tests will immediately catch when new fields are added without anonymization policies

This layered approach ensures the anonymization framework remains complete as the codebase evolves.


619-685: Outstanding future-proofing mechanism for nested message types!

The TestNestedTypeCoverage test is an excellent engineering practice:

  • Automatically detects when new nested message types are added to protobufs
  • Forces explicit handling of new types before code can merge
  • Prevents silent anonymization gaps that could leak PII
  • The whitelist approach (knownNestedMessageTypes) makes the coverage explicit

This proactive approach ensures the anonymization framework remains comprehensive as the protobuf schema evolves. This pattern should be considered for other privacy-critical features.

@kans kans force-pushed the kans/anonymize-c1zs branch from 1deff93 to cd2da51 Compare December 16, 2025 16:57
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
pkg/dotc1z/anonymize_helpers.go (1)

45-51: Consider using a deterministic timestamp for anonymization.

The use of time.Now() on line 50 means that anonymizing the same file multiple times will produce different timestamps, breaking determinism. For anonymization workflows, consider using a fixed timestamp or a deterministically derived value.

Apply this diff to use a fixed timestamp:

 	q = q.Set(goqu.Record{
 		"content_type":  contentType,
 		"data":          data,
-		"discovered_at": time.Now().Format("2006-01-02 15:04:05.999999999"),
+		"discovered_at": "2000-01-01 00:00:00.000000000",
 	})
pkg/anonymize/processor.go (1)

239-247: Assets are unconditionally deleted without configuration check.

Unlike processSessions (which checks ShouldClearSessionStore()), this method unconditionally deletes all assets. Consider whether assets should also be controlled by a configuration flag like ShouldDeleteAssets() for consistency.

pkg/anonymize/anonymize.go (2)

79-82: Length preservation may not be necessary and reduces anonymization strength.

Preserving the original length leaks information about the input. For short email addresses, combined with the HashN collision risk, this weakens anonymization. Unless there's a specific requirement to preserve length (e.g., database column constraints), consider using a fixed-length hash output instead.

If length preservation isn't required:

-func (h *Hasher) AnonymizeEmail(email string) string {
-	return h.HashN(email, len(email))
-}
+func (h *Hasher) AnonymizeEmail(email string) string {
+	return h.HashN(email, 16) // Fixed length, reduces information leakage
+}

84-87: Remove unused parameter or document its purpose.

The second parameter is explicitly ignored via _. If this parameter is reserved for future use (e.g., to use the prefix in the anonymization), document this intent. Otherwise, remove it to simplify the API.

If not needed:

-func (h *Hasher) AnonymizeDisplayName(name string, _ string) string {
+func (h *Hasher) AnonymizeDisplayName(name string) string {
 	return h.HashN(name, len(name))
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1deff93 and cd2da51.

📒 Files selected for processing (14)
  • pkg/anonymize/annotations.go (1 hunks)
  • pkg/anonymize/anonymize.go (1 hunks)
  • pkg/anonymize/anonymize_test.go (1 hunks)
  • pkg/anonymize/entitlement.go (1 hunks)
  • pkg/anonymize/entitlement_test.go (1 hunks)
  • pkg/anonymize/field_coverage_test.go (1 hunks)
  • pkg/anonymize/grant.go (1 hunks)
  • pkg/anonymize/grant_test.go (1 hunks)
  • pkg/anonymize/misc.go (1 hunks)
  • pkg/anonymize/misc_test.go (1 hunks)
  • pkg/anonymize/processor.go (1 hunks)
  • pkg/anonymize/resource.go (1 hunks)
  • pkg/anonymize/resource_test.go (1 hunks)
  • pkg/dotc1z/anonymize_helpers.go (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • pkg/anonymize/anonymize_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/anonymize/grant.go
  • pkg/anonymize/misc.go
🧰 Additional context used
🧬 Code graph analysis (6)
pkg/anonymize/processor.go (2)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (28-31)
pkg/dotc1z/c1file.go (2)
  • NewC1ZFile (174-210)
  • C1File (36-54)
pkg/dotc1z/anonymize_helpers.go (3)
pkg/dotc1z/c1file.go (1)
  • C1File (36-54)
vendor/github.com/doug-martin/goqu/v9/delete_dataset.go (1)
  • Delete (31-33)
vendor/github.com/doug-martin/goqu/v9/update_dataset.go (1)
  • Update (29-31)
pkg/anonymize/resource_test.go (3)
pkg/types/tasks/tasks.go (1)
  • GetResourceType (92-92)
pkg/anonymize/anonymize.go (1)
  • New (34-42)
pkg/annotations/annotations.go (1)
  • Annotations (12-12)
pkg/anonymize/resource.go (2)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (28-31)
pkg/annotations/annotations.go (1)
  • Annotations (12-12)
pkg/anonymize/entitlement.go (1)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (28-31)
pkg/anonymize/field_coverage_test.go (1)
vendor/google.golang.org/protobuf/reflect/protoreflect/type.go (1)
  • Descriptor (21-112)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: go-test (1.25.2, ubuntu-latest)
  • GitHub Check: go-test (1.25.2, windows-latest)
🔇 Additional comments (22)
pkg/dotc1z/anonymize_helpers.go (2)

10-33: LGTM!

The ClearAssets method follows the established patterns for database operations with proper tracing, validation, and error handling.


67-90: LGTM!

The ClearAllSessions method follows the same correct pattern as ClearAssets with proper error handling and state management.

pkg/anonymize/grant_test.go (1)

1-179: LGTM!

The test suite provides comprehensive coverage of grant anonymization including:

  • Basic field anonymization with length preservation
  • Nested resource and entitlement handling
  • Sources map key anonymization
  • Nil input handling at multiple levels
  • Deterministic behavior verification
pkg/anonymize/misc_test.go (1)

1-73: LGTM!

The ResourceType anonymization tests provide good coverage of:

  • Field anonymization with length preservation
  • Description replacement with "[ANONYMIZED]"
  • Trait preservation (non-PII enum values)
  • Nil input handling
  • Deterministic behavior
pkg/anonymize/entitlement.go (1)

7-55: LGTM!

The AnonymizeEntitlement method correctly handles:

  • Nil input guard
  • In-place field anonymization with appropriate methods
  • Recursive anonymization of embedded Resource
  • Loop-based anonymization of GrantableTo ResourceTypes
  • Annotation handling with error propagation
  • Preservation of Purpose enum (non-PII)
pkg/anonymize/entitlement_test.go (1)

1-102: LGTM!

Comprehensive test suite covering:

  • Full field anonymization (ID, DisplayName, Description, Slug)
  • Embedded resource anonymization
  • Nil handling at multiple levels
  • Length preservation verification
  • Deterministic behavior
pkg/anonymize/resource_test.go (1)

1-310: LGTM!

Excellent comprehensive test coverage including:

  • Basic resource field anonymization
  • Detailed trait-specific anonymization (UserTrait, GroupTrait, AppTrait)
  • Complex nested structures (emails, login aliases, employee IDs, structured names)
  • Profile clearing behavior
  • ExternalID and ParentResourceID handling
  • URL anonymization patterns
  • Nil handling and deterministic behavior
pkg/anonymize/processor.go (5)

13-58: LGTM!

The AnonymizeC1ZFile entry point follows good patterns:

  • Proper error handling with context
  • Cleanup on error (removes partial output file)
  • Default output path generation
  • Clear workflow: copy → open → process → close

60-93: LGTM!

The processC1File coordinator properly sequences all processing steps with clear error messages for each stage.


95-237: LGTM!

The entity processing methods follow a consistent, correct pattern:

  • Pagination with 1000-item pages
  • In-place anonymization of each entity
  • Batch write-back for efficiency
  • Proper statistics tracking
  • Error propagation

260-287: LGTM!

The copyFile implementation properly handles error cases with cleanup, addressing the concern from the previous review. The function now:

  • Closes and removes the destination file on copy failure
  • Closes and removes the destination file on sync failure
  • Properly closes the file on success

249-258: No action needed. The ShouldClearSessionStore() method is properly implemented on the Anonymizer type in pkg/anonymize/misc.go and has corresponding test coverage in pkg/anonymize/anonymize_test.go.

Likely an incorrect or invalid review comment.

pkg/anonymize/annotations.go (3)

8-43: LGTM!

The anonymizeResourceTypeAnnotations method correctly:

  • Guards against nil input
  • Uses the pick-modify-update pattern for annotations
  • Anonymizes ChildResourceType IDs
  • Anonymizes ExternalLink URLs
  • Properly documents non-PII marker annotations

45-114: LGTM!

The anonymizeGrantAnnotations method comprehensively handles:

  • GrantExpandable with entitlement and resource type ID anonymization
  • GrantMetadata clearing
  • GrantImmutable with source ID hashing and metadata clearing
  • ExternalLink URL anonymization
  • Proper error handling throughout

116-150: LGTM!

The anonymizeEntitlementAnnotations method correctly:

  • Guards against nil input
  • Hashes EntitlementImmutable source IDs
  • Clears metadata that might contain PII
  • Anonymizes ExternalLink URLs
  • Follows the established annotation handling pattern
pkg/anonymize/anonymize.go (1)

109-112: LGTM!

The fixed anonymized domain with an 8-character hash suffix provides a good balance between brevity and collision resistance for typical datasets.

pkg/anonymize/resource.go (3)

8-52: LGTM!

The resource anonymization logic is comprehensive and handles all relevant fields. The in-place modification approach is efficient, and the nil-safety checks are appropriate.


179-253: Excellent privacy-focused implementation.

The user trait anonymization is thorough and well-considered. Particularly good decisions:

  • Clearing prefix/suffix from names (line 231-233) to prevent identification
  • Clearing timestamps to avoid fingerprinting (line 247-249)
  • Clear comments explaining what's kept and why (line 251-252)

255-322: LGTM!

The trait-specific anonymization methods are well-implemented with appropriate handling of profile data (cleared) vs. identifiers (anonymized). The consistency across traits makes the code maintainable.

pkg/anonymize/field_coverage_test.go (3)

13-213: Excellent comprehensive field policy documentation.

The field policy maps are thorough and well-documented. The comments explaining why each field receives its policy (e.g., lines 86-87 noting status details may contain PII) are particularly valuable for maintainability.


215-272: Excellent bidirectional validation approach.

The combination of validateFieldCoverage and validateNoPolicyForMissingFields ensures both that no fields are missed AND that policy maps don't reference removed fields. This will catch issues immediately when protobufs are updated, preventing silent failures.


656-685: Outstanding future-proofing mechanism.

This test ensures that when new nested message types are added to the protobufs, they can't silently slip through without explicit anonymization policies. The test will fail with a clear message directing developers to add the new type to knownNestedMessageTypes and create a policy map. This is exactly the kind of defensive programming that prevents security/privacy issues.

@kans kans force-pushed the kans/anonymize-c1zs branch from cd2da51 to f900cc5 Compare December 16, 2025 17:02
@kans kans changed the title DRAFT anonymize a c1z anonymize a c1z Dec 16, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (4)
pkg/anonymize/grant.go (1)

45-66: Potential hash collision acknowledged in past review.

The map key remapping could theoretically cause collisions if two different keys hash to the same value. This was already flagged in a previous review. Given the 12-character hex output (48 bits of entropy), collision probability is negligible for practical source counts.

pkg/anonymize/processor.go (1)

276-302: Double-close of destFile on error paths.

The defer destFile.Close() on line 287 combined with explicit destFile.Close() calls on lines 291, 297, and 301 will cause a double-close. While this typically doesn't cause issues in Go (second close returns an error that's ignored), it's not clean.

 func copyFile(src, dst string) error {
 	sourceFile, err := os.Open(src)
 	if err != nil {
 		return err
 	}
 	defer sourceFile.Close()

 	destFile, err := os.Create(dst)
 	if err != nil {
 		return err
 	}
-	defer destFile.Close()

 	_, err = io.Copy(destFile, sourceFile)
 	if err != nil {
 		destFile.Close()
 		os.Remove(dst)
 		return err
 	}

 	if err := destFile.Sync(); err != nil {
 		destFile.Close()
 		os.Remove(dst)
 		return err
 	}
 	return destFile.Close()
 }
pkg/anonymize/anonymize.go (2)

21-24: Security concern: hardcoded default salt is publicly visible.

The default salt is a known value in public code, which means anonymization using the default can be reversed by anyone with the codebase and knowledge of the input domain. This was previously flagged in earlier reviews.


71-77: Collision risk with short truncated hashes.

When n is very small (e.g., 2-3 characters), truncating the hash dramatically increases collision probability. This was previously flagged in earlier reviews.

🧹 Nitpick comments (6)
pkg/anonymize/field_coverage_test.go (1)

666-678: Consider using protoreflect.MessageKind constant.

The string comparison works but using the constant is more idiomatic and refactor-safe.

+	"google.golang.org/protobuf/reflect/protoreflect"
-			if field.Kind().String() == "message" {
+			if field.Kind() == protoreflect.MessageKind {
pkg/anonymize/table_coverage_test.go (1)

22-33: Consider adding automated verification against sql_helpers.go.

The comment states this list must match allTableDescriptors in pkg/dotc1z/sql_helpers.go, but this relies on manual synchronization. If someone updates sql_helpers.go without updating this file, the tests will fail with a helpful message, which is good. However, you could strengthen this by programmatically importing or referencing the actual table list if it's exported.

#!/bin/bash
# Check if allTableDescriptors is exported in sql_helpers.go
rg -n "allTableDescriptors" pkg/dotc1z/sql_helpers.go
pkg/anonymize/anonymize_test.go (1)

60-71: Clarify test comment: the prefix parameter is unused for determinism.

The comment "prefix ignored" on line 70 is slightly confusing. It appears the AnonymizeDisplayName method accepts a prefix parameter (e.g., "User", "Group") but doesn't use it for the hash computation, only the original value matters. Consider renaming the test or updating the comment to clarify why different prefixes produce the same result.

-	// Second call should be deterministic
-	name2 := h.AnonymizeDisplayName(original, "Group")
-	require.Equal(t, name, name2, "Display name anonymization should be deterministic (prefix ignored)")
+	// Second call with different prefix should produce same result (prefix is not part of hash input)
+	name2 := h.AnonymizeDisplayName(original, "Group")
+	require.Equal(t, name, name2, "Display name anonymization depends only on input value, not prefix")
pkg/anonymize/processor.go (1)

101-135: Consider checking context cancellation in pagination loops.

For large c1z files, the pagination loops could run for a while. If the context is cancelled (e.g., timeout or user cancellation), the loop would continue until the next List call fails. Consider adding an explicit context check at the start of each iteration.

 func (a *Anonymizer) processResourceTypes(ctx context.Context, c1f *dotc1z.C1File, stats *ProcessorStats) error {
 	pageToken := ""
 	for {
+		if err := ctx.Err(); err != nil {
+			return err
+		}
+
 		req := v2.ResourceTypesServiceListResourceTypesRequest_builder{
 			PageSize:  1000,
 			PageToken: pageToken,
 		}.Build()

The same pattern could be applied to processResources, processEntitlements, and processGrants.

pkg/anonymize/anonymize.go (2)

34-42: Consider returning an error instead of panicking.

The constructor panics when config.Salt is empty (line 36). Library code typically returns errors rather than panicking, which gives callers better control over error handling and recovery.

Apply this diff to return an error:

-// New creates a new Anonymizer with the given configuration.
-func New(config Config) *Anonymizer {
+// New creates a new Anonymizer with the given configuration.
+// Returns an error if the configuration is invalid.
+func New(config Config) (*Anonymizer, error) {
 	if config.Salt == "" {
-		panic("salt is required")
+		return nil, fmt.Errorf("salt is required")
 	}
 	return &Anonymizer{
 		config: config,
 		hasher: NewHasher(config.Salt),
-	}
+	}, nil
 }

Note: This would require updating all callers and importing fmt.


85-86: Unused parameter in AnonymizeDisplayName.

The second parameter is unused (indicated by _). This may be for future extensibility (e.g., using a prefix), but it creates an inconsistent API signature compared to other methods.

If the parameter isn't needed yet, consider removing it for consistency:

-func (h *Hasher) AnonymizeDisplayName(name string, _ string) string {
+func (h *Hasher) AnonymizeDisplayName(name string) string {
 	return h.HashN(name, len(name))
 }

Or add a comment explaining its purpose:

// AnonymizeDisplayName generates an anonymized display name, preserving the original length.
// The prefix parameter is reserved for future use to provide context-aware anonymization.
func (h *Hasher) AnonymizeDisplayName(name string, prefix string) string {
	// TODO: Use prefix for context-aware anonymization
	return h.HashN(name, len(name))
}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd2da51 and f900cc5.

📒 Files selected for processing (15)
  • pkg/anonymize/annotations.go (1 hunks)
  • pkg/anonymize/anonymize.go (1 hunks)
  • pkg/anonymize/anonymize_test.go (1 hunks)
  • pkg/anonymize/entitlement.go (1 hunks)
  • pkg/anonymize/entitlement_test.go (1 hunks)
  • pkg/anonymize/field_coverage_test.go (1 hunks)
  • pkg/anonymize/grant.go (1 hunks)
  • pkg/anonymize/grant_test.go (1 hunks)
  • pkg/anonymize/misc.go (1 hunks)
  • pkg/anonymize/misc_test.go (1 hunks)
  • pkg/anonymize/processor.go (1 hunks)
  • pkg/anonymize/resource.go (1 hunks)
  • pkg/anonymize/resource_test.go (1 hunks)
  • pkg/anonymize/table_coverage_test.go (1 hunks)
  • pkg/dotc1z/anonymize_helpers.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • pkg/anonymize/entitlement.go
  • pkg/anonymize/resource_test.go
  • pkg/anonymize/misc.go
  • pkg/anonymize/entitlement_test.go
  • pkg/anonymize/grant_test.go
🧰 Additional context used
🧬 Code graph analysis (5)
pkg/anonymize/processor.go (2)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (28-31)
pkg/dotc1z/c1file.go (2)
  • NewC1ZFile (174-210)
  • C1File (36-54)
pkg/anonymize/anonymize_test.go (1)
pkg/anonymize/anonymize.go (3)
  • NewHasher (56-60)
  • New (34-42)
  • Config (13-18)
pkg/dotc1z/anonymize_helpers.go (3)
pkg/dotc1z/c1file.go (1)
  • C1File (36-54)
vendor/github.com/doug-martin/goqu/v9/delete_dataset.go (1)
  • Delete (31-33)
vendor/github.com/doug-martin/goqu/v9/update_dataset.go (1)
  • Update (29-31)
pkg/anonymize/grant.go (1)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (28-31)
pkg/anonymize/field_coverage_test.go (1)
vendor/google.golang.org/protobuf/reflect/protoreflect/type.go (1)
  • Descriptor (21-112)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: go-test (1.25.2, windows-latest)
  • GitHub Check: go-test (1.25.2, ubuntu-latest)
🔇 Additional comments (31)
pkg/anonymize/field_coverage_test.go (9)

1-11: LGTM!

Clean package declaration and imports appropriate for protobuf reflection-based testing.


13-25: LGTM!

Well-documented enum with clear semantics for each anonymization policy type.


27-213: LGTM!

Comprehensive policy maps with clear documentation explaining the rationale for each field's anonymization treatment. The empty grantSourcesGrantSourceFieldPolicies map for future-proofing is a good defensive practice.


215-272: LGTM!

Excellent bidirectional validation: validateFieldCoverage catches missing policies while validateNoPolicyForMissingFields catches stale/typo entries. Error messages are actionable with clear guidance.


274-497: LGTM!

Consistent test pattern across all types. Having both individual tests and the aggregate test (below) provides good CI visibility and failure isolation.


499-543: LGTM!

Well-structured table-driven test. The aggregate test complements individual tests by providing a single test for CI visibility.

Consider: The individual tests and this aggregate test must be kept in sync manually. If a new type is added, it's easy to add the individual test but forget the aggregate. This is low-risk since the individual tests would still catch issues.


545-570: LGTM!

Helper functions are appropriate for test usage. The nondeterministic order from getAnonymizedFields is acceptable since consumers only check membership.


572-617: LGTM!

Good semantic validation beyond mere coverage. These tests ensure critical PII fields remain marked for anonymization and profile fields are consistently cleared, preventing accidental policy regressions.


619-685: LGTM!

Excellent defensive test to catch new nested message types that lack anonymization policies. The error message format clearly shows the parent-child relationship for easy debugging.

pkg/anonymize/table_coverage_test.go (1)

40-63: LGTM!

The bidirectional coverage tests are well-designed. They ensure both that all tables have handlers and that no orphaned handlers exist. The error messages are clear and actionable.

pkg/anonymize/misc_test.go (2)

10-42: LGTM!

The test covers key anonymization behaviors: length preservation for ID and DisplayName, description replacement, and trait preservation. The assertions are appropriate and the test structure is clear.


44-73: LGTM!

Good coverage of edge cases and determinism. The deterministic test is particularly important for ensuring consistent anonymization across multiple runs with the same salt.

pkg/anonymize/grant.go (1)

8-43: LGTM!

The method follows a clean pattern: nil guard, ID anonymization, recursive anonymization of nested structures, and annotation handling. Error propagation is consistent.

pkg/anonymize/anonymize_test.go (4)

11-32: LGTM!

Good tests for deterministic behavior and salt differentiation. These are fundamental properties of the hashing system.


34-45: LGTM!

The prefix relationship test (line 44) is a clever verification that HashN truncates consistently from the same underlying hash.


73-137: LGTM!

Comprehensive coverage of all anonymization methods with consistent test patterns: length preservation, value change, and determinism where applicable.


139-167: LGTM!

Good coverage of configuration validation and construction paths, including the panic test for empty salt.

pkg/anonymize/processor.go (2)

14-22: LGTM!

ProcessorStats provides useful metrics for tracking the anonymization process. The boolean flags for assets, sessions, and sync runs are appropriate since those are all-or-nothing operations.


24-59: LGTM!

Good error handling with cleanup of the output file on failure. The copy-then-modify approach ensures the original file is never touched.

pkg/anonymize/annotations.go (3)

10-43: LGTM!

Clean implementation following the pick-modify-update pattern. The nil guard and error handling are appropriate.


45-114: LGTM!

Comprehensive grant annotation processing. The handling of GrantExpandable with nested ID arrays, clearing of metadata, and source ID hashing all look correct.


116-150: LGTM!

Consistent pattern with entitlement annotations. The source ID hashing and metadata clearing align with the grant annotation handling.

pkg/dotc1z/anonymize_helpers.go (3)

11-33: LGTM: Full-table deletion is appropriate for anonymization.

The method correctly deletes all assets from the database. The lack of a WHERE clause is intentional for anonymization purposes.


68-90: LGTM: Session clearing is correctly implemented.

The method properly deletes all session data from the database as part of the anonymization workflow.


94-123: LGTM: Correctly uses fixed epoch for deterministic anonymization.

The method appropriately uses time.Unix(0, 0) to ensure deterministic output, which aligns with the anonymization framework's goals.

pkg/anonymize/resource.go (6)

97-108: LGTM: ResourceId anonymization is correctly implemented.

The method appropriately anonymizes both the resource and resourceType fields in place, maintaining consistency with the ResourceType.Id hashing approach.


111-124: LGTM: ExternalId anonymization is correctly implemented.

The method properly anonymizes all sensitive fields of an ExternalId with appropriate transformations.


127-177: LGTM: Trait annotation orchestration is well-structured.

The method systematically anonymizes all trait types with proper error handling and updates. The consistent pattern makes the code maintainable and easy to extend for new trait types.


180-253: LGTM: Comprehensive and privacy-conscious user trait anonymization.

The method thoroughly anonymizes all user PII while preserving useful non-identifying metadata (status enums, account types). The comment on lines 251-252 helpfully documents what is intentionally kept, making the privacy trade-offs explicit.


256-297: LGTM: Group, Role, and App trait anonymization is appropriate.

Each trait handler clears/anonymizes the right fields for its type. The AppTrait comment (line 296) helpfully notes that flags are intentionally preserved.


300-322: LGTM: Secret trait anonymization properly handles resource references.

The method correctly anonymizes resource ID references (CreatedById, IdentityId) and clears timestamps, with a helpful comment explaining the privacy rationale.

@kans kans force-pushed the kans/anonymize-c1zs branch from f900cc5 to 714fa19 Compare December 16, 2025 17:17
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (6)
pkg/dotc1z/anonymize_helpers.go (1)

45-51: Use a fixed timestamp for deterministic anonymization.

Line 50 uses time.Now() which makes anonymization non-deterministic. For consistent re-anonymization of the same data, use the fixed epoch pattern from ClearSyncRunTimestamps (line 104).

Apply this diff:

 	q := c.db.Update(assets.Name())
 	q = q.Set(goqu.Record{
 		"content_type":  contentType,
 		"data":          data,
-		"discovered_at": time.Now().Format("2006-01-02 15:04:05.999999999"),
+		"discovered_at": time.Unix(0, 0).Format("2006-01-02 15:04:05.999999999"),
 	})
pkg/anonymize/grant.go (1)

45-66: Potential key collision when anonymizing map keys.

If two different source keys hash to the same anonymized key, one entry would be silently dropped. While the probability is low with 12 hex characters (48 bits), consider adding collision detection for strict data preservation if this is a concern.

pkg/anonymize/annotations.go (3)

10-43: Pick only processes the first annotation of each type.

The Pick method at lines 19 and 30 returns only the first matching annotation. If multiple ChildResourceType or ExternalLink annotations exist, only the first of each type will be anonymized. Consider iterating over all matching annotations if this is a data integrity concern.


47-114: Pick only processes the first annotation of each type.

Lines 56, 82, 91, and 103 use Pick, which processes only the first matching annotation of each type. Multiple annotations of the same type (e.g., multiple ExternalLink instances) would not be fully anonymized.


118-150: Pick only processes the first annotation of each type.

Lines 127 and 139 use Pick, limiting anonymization to the first EntitlementImmutable and first ExternalLink. If multiple instances exist, subsequent annotations remain unanonymized.

pkg/anonymize/resource.go (1)

56-94: Errors from annos.Pick are still ignored.

This was flagged in a previous review. The error return from annos.Pick() is discarded on lines 61, 67, 73, 79, and 85. While annotation parsing errors may be rare, silently ignoring them could mask issues. Consider adding a brief comment explaining why errors are safe to ignore here, or log them at debug level.

🧹 Nitpick comments (3)
pkg/anonymize/processor.go (2)

101-135: Consider checking context cancellation within the pagination loop.

The context is passed but not checked for cancellation. For large datasets, long-running anonymization could benefit from respecting context cancellation between pages.

 func (a *Anonymizer) processResourceTypes(ctx context.Context, c1f *dotc1z.C1File, stats *ProcessorStats) error {
 	pageToken := ""
 	for {
+		select {
+		case <-ctx.Done():
+			return ctx.Err()
+		default:
+		}
+
 		req := v2.ResourceTypesServiceListResourceTypesRequest_builder{
 			PageSize:  1000,
 			PageToken: pageToken,
 		}.Build()

275-302: Potential double-close on the destination file.

The defer destFile.Close() on line 287 will execute even when the error paths (lines 291, 297) explicitly call destFile.Close(). This causes a double-close, though harmless in practice for os.File. Consider removing the defer entirely since you're now managing the close explicitly.

 func copyFile(src, dst string) error {
 	sourceFile, err := os.Open(src)
 	if err != nil {
 		return err
 	}
 	defer sourceFile.Close()

 	destFile, err := os.Create(dst)
 	if err != nil {
 		return err
 	}
-	defer destFile.Close()

 	_, err = io.Copy(destFile, sourceFile)
 	if err != nil {
 		destFile.Close()
 		os.Remove(dst)
 		return err
 	}

 	if err := destFile.Sync(); err != nil {
 		destFile.Close()
 		os.Remove(dst)
 		return err
 	}
 	return destFile.Close()
 }
pkg/anonymize/field_coverage_test.go (1)

642-654: Consider adding GrantSources to parentTypesWithNestedMessages.

GrantSources contains GrantSources_GrantSource as a nested message type. While currently empty, if fields are added to GrantSources_GrantSource in the future, they might not be detected by TestNestedTypeCoverage since GrantSources isn't in the parent list.

 var parentTypesWithNestedMessages = []proto.Message{
 	&v2.Resource{},
 	&v2.ResourceType{},
 	&v2.Entitlement{},
 	&v2.Grant{},
 	&v2.UserTrait{},
 	&v2.GroupTrait{},
 	&v2.RoleTrait{},
 	&v2.AppTrait{},
 	&v2.SecretTrait{},
+	&v2.GrantSources{},
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f900cc5 and 714fa19.

📒 Files selected for processing (15)
  • pkg/anonymize/annotations.go (1 hunks)
  • pkg/anonymize/anonymize.go (1 hunks)
  • pkg/anonymize/anonymize_test.go (1 hunks)
  • pkg/anonymize/entitlement.go (1 hunks)
  • pkg/anonymize/entitlement_test.go (1 hunks)
  • pkg/anonymize/field_coverage_test.go (1 hunks)
  • pkg/anonymize/grant.go (1 hunks)
  • pkg/anonymize/grant_test.go (1 hunks)
  • pkg/anonymize/misc.go (1 hunks)
  • pkg/anonymize/misc_test.go (1 hunks)
  • pkg/anonymize/processor.go (1 hunks)
  • pkg/anonymize/resource.go (1 hunks)
  • pkg/anonymize/resource_test.go (1 hunks)
  • pkg/anonymize/table_coverage_test.go (1 hunks)
  • pkg/dotc1z/anonymize_helpers.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • pkg/anonymize/entitlement_test.go
  • pkg/anonymize/anonymize_test.go
  • pkg/anonymize/table_coverage_test.go
  • pkg/anonymize/anonymize.go
🧰 Additional context used
🧬 Code graph analysis (8)
pkg/anonymize/entitlement.go (1)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (28-31)
pkg/anonymize/annotations.go (2)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (28-31)
pkg/annotations/annotations.go (1)
  • Annotations (12-12)
pkg/anonymize/resource.go (2)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (28-31)
pkg/annotations/annotations.go (1)
  • Annotations (12-12)
pkg/dotc1z/anonymize_helpers.go (3)
pkg/dotc1z/c1file.go (1)
  • C1File (36-54)
vendor/github.com/doug-martin/goqu/v9/delete_dataset.go (1)
  • Delete (31-33)
vendor/github.com/doug-martin/goqu/v9/update_dataset.go (1)
  • Update (29-31)
pkg/anonymize/misc.go (1)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (28-31)
pkg/anonymize/grant.go (1)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (28-31)
pkg/anonymize/processor.go (2)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (28-31)
pkg/dotc1z/c1file.go (2)
  • NewC1ZFile (174-210)
  • C1File (36-54)
pkg/anonymize/field_coverage_test.go (1)
vendor/google.golang.org/protobuf/reflect/protoreflect/type.go (1)
  • Descriptor (21-112)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: go-test (1.25.2, windows-latest)
  • GitHub Check: go-test (1.25.2, ubuntu-latest)
🔇 Additional comments (23)
pkg/dotc1z/anonymize_helpers.go (3)

10-33: LGTM! Clean deletion pattern.

The asset deletion logic is straightforward and follows the established pattern: validate, build query, execute, mark updated.


67-90: LGTM! Clean session deletion.

Session clearing follows the same validated pattern with proper error handling and state management.


92-123: LGTM! Deterministic timestamp clearing.

Using a fixed epoch timestamp (line 104) ensures deterministic anonymization. This is the correct pattern that should be applied elsewhere.

pkg/anonymize/misc_test.go (1)

1-73: LGTM! Comprehensive ResourceType test coverage.

The tests properly validate:

  • Length preservation and anonymization patterns (64-char hashes for ID/DisplayName)
  • Description replacement with "[ANONYMIZED]"
  • Trait preservation (enums remain unchanged)
  • Nil handling without errors
  • Deterministic behavior for identical inputs
pkg/anonymize/grant_test.go (1)

1-179: LGTM! Thorough Grant anonymization test suite.

Excellent test coverage including:

  • Basic grant anonymization with nested structures (entitlement, principal)
  • Source key anonymization verification
  • Graceful nil handling for grant, entitlement, and principal
  • Length preservation for all IDs
  • Deterministic output validation

The tests properly exercise the recursive anonymization behavior and edge cases.

pkg/anonymize/grant.go (1)

7-43: LGTM! Well-structured grant anonymization.

The method properly:

  • Guards against nil input
  • Anonymizes the grant ID preserving length
  • Recursively anonymizes nested entitlement and principal with error propagation
  • Delegates source and annotation anonymization to helpers
  • Maintains consistent error handling throughout
pkg/anonymize/entitlement.go (1)

7-55: LGTM! Clean entitlement anonymization implementation.

The method exhibits solid design:

  • Nil-safe with early return
  • Consistent anonymization strategy across all PII fields (DisplayName, Description, Slug, ID)
  • Proper recursive anonymization of embedded Resource and GrantableTo ResourceTypes
  • Correct error propagation from nested calls
  • Appropriate comment noting Purpose enum is preserved
pkg/anonymize/resource_test.go (1)

1-310: LGTM! Exceptionally comprehensive Resource test suite.

Outstanding test coverage across:

  • Basic resource fields (DisplayName, Description, ResourceId) with length validation
  • Detailed UserTrait testing (emails, login, aliases, employee IDs, profile clearing, structured names)
  • GroupTrait profile handling
  • AppTrait with HelpUrl anonymization
  • ExternalId and ParentResourceId anonymization
  • Nil handling and deterministic behavior

The tests properly validate both the anonymization transformations and the preservation of critical properties (length, structure).

pkg/anonymize/misc.go (2)

7-37: LGTM! Consistent ResourceType anonymization.

The method follows the established anonymization pattern:

  • Nil-safe with early return
  • ID anonymization using AnonymizeResourceType for consistency with ResourceId.resource_type
  • DisplayName with appropriate "Type" prefix
  • Description replaced with "[ANONYMIZED]"
  • Proper annotation handling with error propagation
  • Correct comments noting enum values (Traits) and booleans are preserved

39-50: LGTM! Clear control flags for anonymization workflow.

Both flag methods are straightforward with clear documentation explaining why assets and session stores must be cleared during anonymization (identifying information and sensitive cached data).

pkg/anonymize/processor.go (4)

14-22: LGTM!

The ProcessorStats struct clearly captures all processing outcomes with appropriate types - counts for iterable items and booleans for bulk operations.


26-59: LGTM! Clean error handling with proper cleanup.

The method properly cleans up the output file on both processing errors and close errors. The pattern of copying first, then modifying in-place is a safe approach that preserves the original file.


62-99: LGTM! Clear sequential processing pipeline.

The orchestration is well-structured with clear error wrapping at each stage. The sequential nature ensures dependent data (e.g., resource types before resources) is processed in order.


245-253: Consider adding flag check to match processSessions pattern, or document why unconditional deletion is intentional.

ShouldDeleteAssets() exists (always returns true) but processAssets() doesn't check it before clearing, while processSessions() respects ShouldClearSessionStore() with a conditional. Both methods currently always clear data, but the implementation patterns are inconsistent. Clarify whether unconditional deletion is intentional or should follow the same conditional pattern as sessions.

pkg/anonymize/resource.go (4)

9-52: LGTM! Comprehensive resource anonymization with clear structure.

The method handles all resource components systematically: display name (with type-aware prefix), description, IDs, external IDs, and annotations. The nil check at the start is appropriate.


126-177: Good improvement: anonymizeResourceAnnotations properly handles errors.

Unlike getDisplayNamePrefix, this method correctly checks and propagates errors from annos.Pick(). The pattern is consistent and safe.


179-253: Thorough user trait anonymization with good documentation.

The method covers all PII fields comprehensively: emails, login, aliases, employee IDs, structured names, profile, icon, status details, and timestamps. The comments explaining which fields are kept (line 251-252) are helpful for future maintainers.


299-322: LGTM! SecretTrait anonymization properly handles resource IDs.

The method correctly anonymizes CreatedById and IdentityId using anonymizeResourceID, maintaining consistency with how resource IDs are handled elsewhere.

pkg/anonymize/field_coverage_test.go (5)

13-25: LGTM! Clean policy enum design.

The four-policy classification (ANONYMIZE, SAFE, RECURSE, CLEAR) provides clear semantics and covers all anonymization scenarios appropriately.


27-52: LGTM! Policy maps serve as both configuration and documentation.

These policy maps clearly document the anonymization intent for each field. The inline comments explaining why each policy was chosen (e.g., "User/group/app names", "Enum value") are valuable for future maintainers.


215-272: Excellent validation helpers for ensuring field coverage.

These two functions together ensure:

  1. All proto fields have explicit policies (no silent omissions)
  2. No stale policies exist for removed/renamed fields

This is a robust approach to "future-proofing" against proto schema changes.


499-543: LGTM! Aggregate test provides good CI visibility.

Running all coverage checks as subtests in a single test function makes CI output cleaner and easier to scan.


656-685: Smart approach to detect new nested message types.

This test automatically fails when new nested message types are added to the protos without corresponding policy maps. The error message clearly guides developers on what to add.

One consideration: if a field uses a repeated message type, this detection still works since field.Message() returns the element type.

@kans kans force-pushed the kans/anonymize-c1zs branch from 714fa19 to 6a4f99d Compare December 16, 2025 17:45
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
pkg/anonymize/table_coverage_test.go (1)

22-33: Add a test to verify allDatabaseTables alignment with allTableDescriptors.

The comment states these lists "MUST match," but there's no programmatic validation. While currently aligned (both contain 7 tables with matching logical names), if someone adds a table to allTableDescriptors in sql_helpers.go, the test won't detect a missing entry in allDatabaseTables unless you verify the counts match. Consider adding a check like TestTableCountAlignment that asserts len(allDatabaseTables) == len(allTableDescriptors) to catch future drift.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 714fa19 and 6a4f99d.

📒 Files selected for processing (15)
  • pkg/anonymize/annotations.go (1 hunks)
  • pkg/anonymize/anonymize.go (1 hunks)
  • pkg/anonymize/anonymize_test.go (1 hunks)
  • pkg/anonymize/entitlement.go (1 hunks)
  • pkg/anonymize/entitlement_test.go (1 hunks)
  • pkg/anonymize/field_coverage_test.go (1 hunks)
  • pkg/anonymize/grant.go (1 hunks)
  • pkg/anonymize/grant_test.go (1 hunks)
  • pkg/anonymize/misc.go (1 hunks)
  • pkg/anonymize/misc_test.go (1 hunks)
  • pkg/anonymize/processor.go (1 hunks)
  • pkg/anonymize/resource.go (1 hunks)
  • pkg/anonymize/resource_test.go (1 hunks)
  • pkg/anonymize/table_coverage_test.go (1 hunks)
  • pkg/dotc1z/anonymize_helpers.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/anonymize/resource_test.go
  • pkg/anonymize/entitlement.go
🧰 Additional context used
🧬 Code graph analysis (6)
pkg/anonymize/misc.go (1)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (28-31)
pkg/dotc1z/anonymize_helpers.go (3)
pkg/dotc1z/c1file.go (1)
  • C1File (36-54)
vendor/github.com/doug-martin/goqu/v9/delete_dataset.go (1)
  • Delete (31-33)
vendor/github.com/doug-martin/goqu/v9/update_dataset.go (1)
  • Update (29-31)
pkg/anonymize/grant.go (1)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (28-31)
pkg/anonymize/processor.go (2)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (28-31)
pkg/dotc1z/c1file.go (2)
  • NewC1ZFile (174-210)
  • C1File (36-54)
pkg/anonymize/annotations.go (1)
pkg/anonymize/anonymize.go (2)
  • Anonymizer (28-31)
  • New (34-42)
pkg/anonymize/resource.go (1)
pkg/anonymize/anonymize.go (2)
  • Anonymizer (28-31)
  • New (34-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: go-test (1.25.2, windows-latest)
  • GitHub Check: go-test (1.25.2, ubuntu-latest)
🔇 Additional comments (36)
pkg/anonymize/table_coverage_test.go (1)

40-63: LGTM!

Both tests provide excellent safeguards: TestTableCoverage ensures every table has a handler, and TestNoOrphanedTableHandlers prevents stale references. The error messages are clear and actionable for developers.

pkg/dotc1z/anonymize_helpers.go (3)

11-33: LGTM!

The implementation correctly deletes all assets using the query builder, with proper error handling and state management.


36-58: LGTM!

The implementation correctly deletes all session data with proper error handling and follows the same pattern as ClearAssets.


62-91: LGTM!

The implementation correctly uses a fixed epoch timestamp (time.Unix(0, 0)) for deterministic anonymization, ensuring re-anonymizing produces identical results.

pkg/anonymize/anonymize.go (4)

12-47: LGTM!

The initialization logic is clean, with appropriate validation (panics on empty salt) and clear separation between default and custom configurations.


49-77: LGTM!

The HMAC-SHA256 implementation provides deterministic, salted hashing. The truncation logic in HashN is straightforward and correct.


85-108: LGTM!

The identity anonymization methods correctly use full hashes or length-preserving hashes for deterministic anonymization.


110-134: LGTM!

The URL anonymization uses a 16-character hash (addressing collision concerns), and the name anonymization methods consistently use full hashes for maximum entropy.

pkg/anonymize/entitlement_test.go (1)

10-102: LGTM!

The test suite provides excellent coverage: basic functionality, nil safety, nested resource handling, and deterministic behavior. The assertions are appropriate and verify all key anonymization requirements.

pkg/anonymize/grant_test.go (1)

12-179: LGTM!

Comprehensive test coverage of grant anonymization including nested resources (entitlement and principal), source key remapping, nil safety, and deterministic behavior. All assertions correctly verify the anonymization requirements.

pkg/anonymize/misc_test.go (1)

10-73: LGTM!

The test suite correctly verifies resource type anonymization including the important behavior of preserving traits (line 40-41) while anonymizing identifiers and descriptions. Good coverage of nil safety and determinism.

pkg/anonymize/grant.go (2)

8-43: LGTM!

The method provides thorough anonymization of all grant fields with proper nil checking and error propagation. The structure is clean and follows the same pattern as other anonymization methods in the package.


45-66: LGTM!

The implementation correctly anonymizes source keys using AnonymizeResourceID, which preserves the original key length. The collision risk is negligible for typical key lengths (10+ characters provide sufficient entropy).

pkg/anonymize/anonymize_test.go (3)

11-45: LGTM!

The hasher tests thoroughly verify deterministic behavior, salt isolation, and truncation correctness. The prefix verification (line 44) is a nice touch that ensures HashN correctly truncates rather than generating a different hash.


47-138: LGTM!

Comprehensive test coverage of all anonymization methods, verifying both deterministic behavior and output format requirements (e.g., email contains @, URL uses correct domain).


140-168: LGTM!

The configuration and construction tests provide good coverage, including proper validation of panic behavior on empty salt and verification of the public helper methods.

pkg/anonymize/annotations.go (4)

11-28: LGTM - Annotation processing loop is correct.

The function properly iterates through all annotations, processes each one, handles errors, and drops unknown types as intended. The nil check and error propagation are appropriate.


32-70: LGTM - ResourceType annotation processing is comprehensive.

The function correctly handles all known ResourceType annotation types (ChildResourceType, ExternalLink, and marker types), anonymizes appropriately, and drops unknown types. Error handling from UnmarshalTo is properly propagated.


74-156: LGTM - Grant annotation processing handles all known types.

The Grant annotation processor correctly handles:

  • GrantExpandable: anonymizes all entitlement and resource type IDs in slices
  • GrantMetadata: clears sensitive metadata
  • GrantImmutable: hashes source_id and clears metadata
  • ExternalLink: anonymizes URLs

Error handling and unknown type dropping are consistent with the pattern.


160-208: LGTM - Entitlement annotation processing is correct.

The Entitlement annotation processor correctly handles EntitlementImmutable and ExternalLink annotations, applying appropriate anonymization and dropping unknown types.

pkg/anonymize/misc.go (2)

8-37: LGTM - ResourceType anonymization is comprehensive.

The function properly anonymizes all PII fields (ID, DisplayName, Description) and delegates annotation processing. The comments correctly note that Traits (enums) and SourcedExternally (boolean) don't contain PII.


39-50: LGTM - Helper functions are clear and well-documented.

Both functions have clear documentation explaining why assets should be deleted and session stores cleared. The implementation is straightforward.

pkg/anonymize/resource.go (5)

9-45: LGTM - Resource anonymization orchestration is correct.

The function properly coordinates anonymization of all Resource fields including display name, description, IDs, and trait annotations. Error handling is appropriate.


47-75: LGTM - ID and ExternalID anonymization is correct.

Both helper functions properly handle nil checks and anonymize all relevant fields using appropriate hasher methods.


77-160: LGTM - Resource annotation processing covers all trait types.

The function properly handles all resource trait annotations (UserTrait, GroupTrait, RoleTrait, AppTrait, SecretTrait) plus ExternalLink, delegating to specialized anonymization functions. Unknown annotation types are correctly dropped.


162-236: LGTM - UserTrait anonymization is comprehensive and well-documented.

The function handles all UserTrait PII fields appropriately:

  • Email addresses, logins, and employee IDs are anonymized
  • Structured names (given/family/middle) are anonymized while prefix/suffix are cleared
  • Profile and icon data are cleared
  • Timestamps are cleared to prevent fingerprinting
  • Enum values are preserved for analysis utility

The comments clearly explain the rationale for each decision.


238-305: LGTM - Trait anonymization functions are correct and consistent.

All trait-specific anonymization functions properly:

  • Clear profile data that may contain arbitrary PII
  • Clear icons/logos that are identifying
  • Anonymize URLs and resource IDs
  • Clear timestamps to prevent fingerprinting
  • Preserve non-PII flags and enums

The implementation is consistent across all trait types.

pkg/anonymize/processor.go (5)

13-59: LGTM - Main entry point has proper error handling and cleanup.

The function correctly:

  • Generates a default output path if not provided
  • Copies the input file to avoid modifying the original
  • Cleans up the output file on any error
  • Tracks processing statistics

Error handling ensures partial/failed anonymization files are removed.


61-99: LGTM - Orchestration function processes all data types in correct order.

The function correctly sequences processing (resource types before resources, etc.) and provides clear error context for each stage. All major C1Z data types are covered.


101-243: LGTM - Entity processing functions use correct pagination pattern.

All four processing functions (ResourceTypes, Resources, Entitlements, Grants) correctly:

  • Use pagination with page size of 1000
  • Iterate through all pages using NextPageToken
  • Apply appropriate anonymization to each item
  • Write back results only when non-empty
  • Increment statistics
  • Propagate errors

The implementation is consistent and correct.


245-273: LGTM - Asset, session, and sync run processing is correct.

The three cleanup functions properly:

  • Clear assets unconditionally (contains identifying info)
  • Clear sessions conditionally based on config
  • Clear sync run timestamps (fingerprinting data)
  • Set appropriate statistics flags
  • Handle errors correctly

275-302: LGTM - File copy with proper error handling and cleanup.

The function correctly handles all error cases:

  • Closes and removes destination file if copy fails
  • Closes and removes destination file if sync fails
  • Uses deferred close as a safety net
  • Multiple Close calls are safe (idempotent operation)

The cleanup logic ensures no partial files are left on error, as addressed in previous review.

pkg/anonymize/field_coverage_test.go (4)

13-214: LGTM - Field policy maps provide comprehensive coverage documentation.

The field policy definitions and maps serve as both:

  1. Validation that all fields are handled
  2. Documentation of anonymization strategy for each field

The policies are clear (ANONYMIZE, SAFE, RECURSE, CLEAR) and the comments explain the rationale for each field's treatment.


217-272: LGTM - Validation functions provide robust field coverage checking.

The two validation functions work together to ensure:

  1. All actual fields have anonymization policies (prevents omissions)
  2. No policies exist for non-existent fields (catches typos/renames)

The use of protobuf reflection makes this robust against future changes.


274-497: LGTM - Individual type tests provide clear failure isolation.

Each test function validates a single type, making it easy to identify which type has missing or incorrect policies when a test fails. The consistent pattern and clear assertions are excellent.


499-685: LGTM - Aggregate tests and nested type coverage provide excellent future-proofing.

These tests ensure:

  • All types are validated in a single run for CI visibility
  • Critical PII fields (emails, logins, etc.) are always anonymized
  • Profile fields are consistently cleared across all traits
  • New nested message types can't be added without explicit handling

The TestNestedTypeCoverage is particularly valuable - it will fail if a new nested type is added to the protobufs without being added to the policy maps, preventing silent data leakage.

@kans kans force-pushed the kans/anonymize-c1zs branch from 6a4f99d to 1391650 Compare December 16, 2025 18:36
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
pkg/anonymize/grant.go (1)

45-66: Consider collision detection for map key anonymization.

The past review comment raises a valid point about potential hash collisions when anonymizing GrantSources map keys. While the probability is very low with 12-character hex hashes (48 bits), adding collision detection would make the code more robust:

newSources := make(map[string]*v2.GrantSources_GrantSource)
for key, source := range sources {
    newKey := a.hasher.AnonymizeResourceID(key)
    if _, exists := newSources[newKey]; exists {
        // Log warning or append deterministic suffix
    }
    newSources[newKey] = source
}

Given the "Chill" review mode and extremely low collision probability for typical grant source counts, this is an optional enhancement that could be deferred.

🧹 Nitpick comments (1)
pkg/anonymize/anonymize.go (1)

33-42: Consider returning an error instead of panicking.

The panic on line 36 for an empty salt is fail-fast but inconsistent with Go conventions for library code. Most Go libraries return errors for invalid configuration, allowing callers to handle the issue gracefully.

Consider this alternative:

-// New creates a new Anonymizer with the given configuration.
-func New(config Config) *Anonymizer {
+// New creates a new Anonymizer with the given configuration.
+// Returns an error if the configuration is invalid.
+func New(config Config) (*Anonymizer, error) {
 	if config.Salt == "" {
-		panic("salt is required")
+		return nil, fmt.Errorf("salt is required")
 	}
 	return &Anonymizer{
 		config: config,
 		hasher: NewHasher(config.Salt),
-	}
+	}, nil
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6a4f99d and 1391650.

📒 Files selected for processing (16)
  • pkg/anonymize/annotations.go (1 hunks)
  • pkg/anonymize/anonymize.go (1 hunks)
  • pkg/anonymize/anonymize_test.go (1 hunks)
  • pkg/anonymize/entitlement.go (1 hunks)
  • pkg/anonymize/entitlement_test.go (1 hunks)
  • pkg/anonymize/field_coverage_test.go (1 hunks)
  • pkg/anonymize/grant.go (1 hunks)
  • pkg/anonymize/grant_test.go (1 hunks)
  • pkg/anonymize/misc.go (1 hunks)
  • pkg/anonymize/misc_test.go (1 hunks)
  • pkg/anonymize/processor.go (1 hunks)
  • pkg/anonymize/resource.go (1 hunks)
  • pkg/anonymize/resource_test.go (1 hunks)
  • pkg/anonymize/table_coverage_test.go (1 hunks)
  • pkg/dotc1z/anonymize_helpers.go (1 hunks)
  • pkg/dotc1z/sql_helpers.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • pkg/dotc1z/anonymize_helpers.go
  • pkg/anonymize/anonymize_test.go
  • pkg/anonymize/processor.go
  • pkg/anonymize/annotations.go
  • pkg/anonymize/table_coverage_test.go
  • pkg/anonymize/grant_test.go
  • pkg/anonymize/entitlement_test.go
🧰 Additional context used
🧬 Code graph analysis (6)
pkg/anonymize/misc.go (1)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (28-31)
pkg/anonymize/entitlement.go (1)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (28-31)
pkg/anonymize/resource_test.go (2)
pkg/anonymize/anonymize.go (1)
  • New (34-42)
pkg/annotations/annotations.go (1)
  • Annotations (12-12)
pkg/anonymize/grant.go (1)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (28-31)
pkg/anonymize/resource.go (2)
pkg/anonymize/anonymize.go (2)
  • Anonymizer (28-31)
  • New (34-42)
pkg/types/tasks/tasks.go (1)
  • GetResourceType (92-92)
pkg/anonymize/field_coverage_test.go (1)
vendor/google.golang.org/protobuf/reflect/protoreflect/type.go (1)
  • Descriptor (21-112)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: go-test (1.25.2, windows-latest)
  • GitHub Check: go-test (1.25.2, ubuntu-latest)
🔇 Additional comments (20)
pkg/dotc1z/sql_helpers.go (1)

42-51: LGTM - Clean implementation of table name aggregation.

The function correctly iterates over allTableDescriptors and provides a single source of truth for table names. The documentation clearly explains its purpose for test coverage.

pkg/anonymize/field_coverage_test.go (5)

13-25: LGTM - Well-defined field policy enum.

The four policy types (ANONYMIZE, SAFE, RECURSE, CLEAR) clearly cover all anonymization scenarios. Good documentation for each constant.


27-213: Excellent field policy coverage with clear rationale.

The comprehensive field policy maps cover all message types and nested structures with clear inline comments explaining the rationale for each policy decision. This provides strong documentation and maintainability.


215-272: LGTM - Robust validation approach using protoreflect.

The two validation functions provide bidirectional coverage: ensuring all fields are handled and no stale policies exist. This is excellent future-proofing that will catch schema changes at test time.


274-543: LGTM - Comprehensive test coverage for all message types.

The individual tests and the aggregated TestFieldCoverage_AllTypes provide excellent coverage. The table-driven approach in the AllTypes test is particularly clean and maintainable.


619-685: Excellent future-proofing with nested type coverage test.

The TestNestedTypeCoverage test is a clever guard against schema evolution. It will fail when new nested message types are added, forcing developers to explicitly add field policies. This aligns perfectly with the PR's goal of future-proofing.

pkg/anonymize/misc_test.go (1)

1-73: LGTM - Solid test coverage for ResourceType anonymization.

The three tests cover the essential scenarios: basic anonymization with hash length verification, nil input handling, and deterministic behavior. The assertions are clear and comprehensive.

pkg/anonymize/resource_test.go (2)

45-137: Excellent comprehensive coverage of UserTrait anonymization.

The TestAnonymizeResource_WithUserTrait test is particularly thorough, covering emails, login, aliases, employee IDs, profile clearing, and structured names. The hash length assertions (65 chars for email with @, 64 chars for others) demonstrate careful attention to the anonymization format.


282-310: LGTM - Deterministic behavior properly validated.

The deterministic test ensures identical inputs produce identical outputs, which is critical for maintaining referential integrity across anonymized resources.

pkg/anonymize/entitlement.go (1)

7-55: LGTM - Clean entitlement anonymization with proper recursion.

The implementation correctly anonymizes all PII fields while preserving structural information (Purpose enum). The recursive calls to AnonymizeResource and AnonymizeResourceType maintain consistency across the anonymization surface. Error propagation is handled properly.

pkg/anonymize/grant.go (1)

7-43: LGTM - Grant anonymization properly handles nested structures.

The implementation correctly anonymizes the grant ID and recursively processes the embedded entitlement and principal. Error propagation and nil handling are appropriate.

pkg/anonymize/misc.go (2)

7-37: LGTM - ResourceType anonymization with consistent hashing.

The implementation uses hasher.AnonymizeResourceType for the ID field, maintaining consistency with ResourceId.resource_type hashing as documented in the comment (line 13). The error propagation for annotation anonymization is correct.


39-50: LGTM - Clear helper methods with good documentation.

The boolean helper methods have clear documentation explaining why assets are deleted and session stores are cleared during anonymization.

pkg/anonymize/resource.go (4)

8-45: LGTM - Clean resource anonymization with proper field handling.

The AnonymizeResource method systematically anonymizes all PII fields while preserving structural integrity. The nil checks and error propagation from nested calls are appropriate.


77-160: Good annotation processing with clean separation.

The two-function approach (anonymizeResourceAnnotations and processResourceAnnotation) cleanly separates concerns: iteration/rebuilding vs. type-specific processing. Unknown annotation types are appropriately dropped (line 159), which aligns with the defensive anonymization strategy.


162-236: Comprehensive UserTrait anonymization covering all PII fields.

The implementation thoroughly handles all UserTrait PII fields:

  • Email addresses with appropriate formatting
  • Login and aliases
  • Employee IDs
  • Structured names (given, family, middle)
  • Profile and icon clearing
  • Status details anonymization
  • Timestamp clearing for fingerprinting prevention

The comments explaining rationale (e.g., lines 214-215, 230-231) are particularly helpful.


238-305: LGTM - Trait anonymization methods follow consistent patterns.

The four trait anonymization methods (anonymizeGroupTrait, anonymizeRoleTrait, anonymizeAppTrait, anonymizeSecretTrait) consistently handle profile clearing, asset clearing, URL anonymization, and timestamp clearing where applicable. The inline comments (e.g., lines 279, 301-303) clearly explain which fields are kept and why.

pkg/anonymize/anonymize.go (3)

1-10: LGTM!

The package documentation clearly describes the purpose, and the imports are appropriate for the HMAC-SHA256-based anonymization functionality.


49-77: LGTM!

The Hasher implementation correctly uses HMAC-SHA256 for deterministic hashing, and HashN properly truncates the output. The approach is sound for the anonymization use case.


100-134: LGTM!

These anonymization methods are implemented correctly:

  • AnonymizeResourceID and AnonymizeExternalID properly preserve original lengths using HashN
  • AnonymizeURL uses 16 characters (addressing the previous collision concern)
  • Name and resource type methods appropriately use the full hash digest

Comment on lines 79 to 83
// AnonymizeEmail generates an anonymized email address, preserving the original length.
func (h *Hasher) AnonymizeEmail(email string) string {
e := h.Hash(email)
return e[:len(e)/2] + "@" + e[len(e)/2:]
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix incorrect documentation - length is not preserved.

The comment claims this method preserves the original email length, but the implementation always returns 65 characters (32 + "@" + 32) regardless of input length, since Hash() always produces a 64-character hex string.

Apply this diff to correct the documentation:

-// AnonymizeEmail generates an anonymized email address, preserving the original length.
+// AnonymizeEmail generates an anonymized email address with a valid email structure.
+// The output is always 65 characters (32@32 hex format).
 func (h *Hasher) AnonymizeEmail(email string) string {
 	e := h.Hash(email)
 	return e[:len(e)/2] + "@" + e[len(e)/2:]
 }
🤖 Prompt for AI Agents
In pkg/anonymize/anonymize.go around lines 79 to 83, the function comment
incorrectly claims the anonymized email preserves the original length; update
the comment to accurately state that AnonymizeEmail returns a fixed-length
anonymized email built from the hex hash (64 chars) split with an "@" in the
middle (resulting in 65 characters total), and remove any mention of preserving
input length; keep the implementation as-is.

@kans kans force-pushed the kans/anonymize-c1zs branch from 1391650 to 95f131b Compare December 16, 2025 19:12
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
pkg/anonymize/grant.go (1)

45-66: Hash collision handling remains unaddressed.

The potential for hash collisions when anonymizing GrantSources map keys was previously flagged. While the probability is negligible with the current hash length, adding collision detection would improve robustness:

 func (a *Anonymizer) anonymizeGrantSources(gs *v2.GrantSources) {
 	if gs == nil {
 		return
 	}
 
 	sources := gs.GetSources()
 	if sources == nil {
 		return
 	}
 
 	newSources := make(map[string]*v2.GrantSources_GrantSource)
 	for key, source := range sources {
 		newKey := a.hasher.AnonymizeResourceID(key)
+		if _, exists := newSources[newKey]; exists {
+			// Collision detected - append deterministic suffix
+			suffix := 1
+			for {
+				uniqueKey := a.hasher.HashN(fmt.Sprintf("%s_%d", key, suffix), len(key))
+				if _, exists := newSources[uniqueKey]; !exists {
+					newKey = uniqueKey
+					break
+				}
+				suffix++
+			}
+		}
 		newSources[newKey] = source
 	}
 	gs.SetSources(newSources)
 }

This issue was flagged in a previous review but not yet addressed.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1391650 and 95f131b.

📒 Files selected for processing (16)
  • pkg/anonymize/annotations.go (1 hunks)
  • pkg/anonymize/anonymize.go (1 hunks)
  • pkg/anonymize/anonymize_test.go (1 hunks)
  • pkg/anonymize/entitlement.go (1 hunks)
  • pkg/anonymize/entitlement_test.go (1 hunks)
  • pkg/anonymize/field_coverage_test.go (1 hunks)
  • pkg/anonymize/grant.go (1 hunks)
  • pkg/anonymize/grant_test.go (1 hunks)
  • pkg/anonymize/misc.go (1 hunks)
  • pkg/anonymize/misc_test.go (1 hunks)
  • pkg/anonymize/processor.go (1 hunks)
  • pkg/anonymize/resource.go (1 hunks)
  • pkg/anonymize/resource_test.go (1 hunks)
  • pkg/anonymize/table_coverage_test.go (1 hunks)
  • pkg/dotc1z/anonymize_helpers.go (1 hunks)
  • pkg/dotc1z/sql_helpers.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • pkg/dotc1z/anonymize_helpers.go
  • pkg/anonymize/resource_test.go
  • pkg/dotc1z/sql_helpers.go
  • pkg/anonymize/processor.go
  • pkg/anonymize/misc_test.go
  • pkg/anonymize/table_coverage_test.go
  • pkg/anonymize/anonymize.go
🧰 Additional context used
🧬 Code graph analysis (7)
pkg/anonymize/entitlement.go (1)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (28-31)
pkg/anonymize/anonymize_test.go (1)
pkg/anonymize/anonymize.go (3)
  • NewHasher (56-60)
  • New (34-42)
  • Config (13-18)
pkg/anonymize/misc.go (1)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (28-31)
pkg/anonymize/annotations.go (1)
pkg/anonymize/anonymize.go (2)
  • Anonymizer (28-31)
  • New (34-42)
pkg/anonymize/field_coverage_test.go (1)
vendor/google.golang.org/protobuf/reflect/protoreflect/type.go (1)
  • Descriptor (21-112)
pkg/anonymize/resource.go (1)
pkg/anonymize/anonymize.go (2)
  • Anonymizer (28-31)
  • New (34-42)
pkg/anonymize/grant.go (1)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (28-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: go-test (1.25.2, ubuntu-latest)
  • GitHub Check: go-test (1.25.2, windows-latest)
🔇 Additional comments (12)
pkg/anonymize/field_coverage_test.go (2)

215-272: LGTM! Excellent validation helpers.

The validation functions provide robust coverage checking and will catch issues when protobuf schemas evolve. The error messages are clear and actionable.


499-543: Good aggregation pattern for CI visibility.

The TestFieldCoverage_AllTypes test provides a single entry point to run all coverage checks, which improves test discoverability and CI reporting.

pkg/anonymize/entitlement_test.go (1)

10-102: LGTM! Comprehensive entitlement anonymization tests.

The test suite covers:

  • Basic anonymization with length preservation for IDs and slugs
  • Nil safety for both entitlement and embedded resource
  • Deterministic behavior for consistent anonymization
  • Proper anonymization of nested resource fields
pkg/anonymize/anonymize_test.go (1)

11-168: LGTM! Thorough test coverage of core anonymization functionality.

The test suite validates:

  • Deterministic hashing with consistent salt
  • Different salts producing different outputs
  • HashN truncation behavior and prefix relationships
  • All anonymization methods (email, display name, login, employee ID, resource ID, external ID, URL, structured names)
  • Configuration and construction, including panic on empty salt
  • Asset deletion and session store clearing behavior
pkg/anonymize/entitlement.go (1)

7-55: LGTM! Clean entitlement anonymization implementation.

The method properly:

  • Handles nil inputs
  • Anonymizes all PII fields (display name, description, slug, ID)
  • Recursively anonymizes embedded resource and grantable_to ResourceTypes
  • Delegates annotation processing with error propagation
  • Preserves non-PII fields (purpose enum)
pkg/anonymize/grant_test.go (1)

12-179: LGTM! Comprehensive grant anonymization tests.

The test suite provides excellent coverage:

  • Basic anonymization with nested structures (entitlement, principal)
  • Grant sources with map key anonymization
  • Nil safety for all optional fields
  • Deterministic behavior for consistent results
  • Proper length preservation for IDs
pkg/anonymize/misc.go (1)

7-50: LGTM! Clear implementation of resource type and lifecycle methods.

The methods are well-implemented:

  • AnonymizeResourceType properly handles all fields with consistent hashing for IDs
  • ShouldDeleteAssets correctly returns true to remove identifying assets
  • ShouldClearSessionStore correctly returns true to clear cached sensitive data

The comments clearly explain why these behaviors are necessary.

pkg/anonymize/grant.go (1)

7-43: LGTM! Clean grant anonymization implementation.

The method properly:

  • Handles nil inputs
  • Anonymizes the grant ID with length preservation
  • Recursively anonymizes embedded entitlement and principal
  • Processes grant sources and annotations with error propagation
pkg/anonymize/resource.go (3)

8-91: LGTM! Well-structured resource anonymization.

The implementation provides comprehensive anonymization:

  • All string fields are properly anonymized or cleared
  • ResourceIDs use consistent hashing for type and resource fields
  • ExternalIDs handle URLs, IDs, and descriptions appropriately
  • Annotation processing correctly rebuilds the annotation list
  • Error propagation is handled properly throughout

93-160: LGTM! Comprehensive trait processing.

The processResourceAnnotation method:

  • Handles all trait types (UserTrait, GroupTrait, RoleTrait, AppTrait, SecretTrait)
  • Processes ExternalLink annotations
  • Correctly drops unknown annotation types (line 159) with clear documentation
  • Properly unmarshals, anonymizes, and re-marshals each type
  • Error handling is appropriate

162-305: LGTM! Thorough trait anonymization implementations.

The trait-specific anonymization methods are well-designed:

  • UserTrait: Comprehensive PII handling (emails, logins, employee IDs, names, status), profile/icon clearing, timestamp removal to prevent fingerprinting
  • GroupTrait/RoleTrait: Appropriate profile and icon clearing
  • AppTrait: URL anonymization, asset clearing, flag preservation
  • SecretTrait: ResourceID anonymization, timestamp clearing, profile removal

All implementations align with the field policies defined in the test suite.

pkg/anonymize/annotations.go (1)

11-28: LGTM!

The function properly iterates through all annotations and handles errors correctly. The nil check and result building logic are sound.

Comment on lines +32 to +70
func (a *Anonymizer) processResourceTypeAnnotation(ann *anypb.Any) (*anypb.Any, error) {
// ChildResourceType - anonymize resource type ID
childRT := &v2.ChildResourceType{}
if ann.MessageIs(childRT) {
if err := ann.UnmarshalTo(childRT); err != nil {
return nil, err
}
if childRT.GetResourceTypeId() != "" {
childRT.SetResourceTypeId(a.hasher.AnonymizeResourceType(childRT.GetResourceTypeId()))
}
return anypb.New(childRT)
}

// ExternalLink - anonymize URL
externalLink := &v2.ExternalLink{}
if ann.MessageIs(externalLink) {
if err := ann.UnmarshalTo(externalLink); err != nil {
return nil, err
}
if externalLink.GetUrl() != "" {
externalLink.SetUrl(a.hasher.AnonymizeURL(externalLink.GetUrl()))
}
return anypb.New(externalLink)
}

// Empty marker types - preserve as-is (no PII)
for _, marker := range []proto.Message{
&v2.SkipEntitlementsAndGrants{},
&v2.SkipGrants{},
&v2.SkipEntitlements{},
} {
if ann.MessageIs(marker) {
return ann, nil
}
}

// Unknown type - drop it
return nil, nil
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle errors from anypb.New().

The anypb.New() function returns (*anypb.Any, error), but the error is not checked at lines 42 and 54. Marshaling failures could be silently ignored, leading to data loss or incorrect anonymization.

Apply this diff to handle errors properly:

 		if childRT.GetResourceTypeId() != "" {
 			childRT.SetResourceTypeId(a.hasher.AnonymizeResourceType(childRT.GetResourceTypeId()))
 		}
-		return anypb.New(childRT)
+		anyMsg, err := anypb.New(childRT)
+		if err != nil {
+			return nil, err
+		}
+		return anyMsg, nil
 	}
 
 	// ExternalLink - anonymize URL
 	externalLink := &v2.ExternalLink{}
 	if ann.MessageIs(externalLink) {
 		if err := ann.UnmarshalTo(externalLink); err != nil {
 			return nil, err
 		}
 		if externalLink.GetUrl() != "" {
 			externalLink.SetUrl(a.hasher.AnonymizeURL(externalLink.GetUrl()))
 		}
-		return anypb.New(externalLink)
+		anyMsg, err := anypb.New(externalLink)
+		if err != nil {
+			return nil, err
+		}
+		return anyMsg, nil
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (a *Anonymizer) processResourceTypeAnnotation(ann *anypb.Any) (*anypb.Any, error) {
// ChildResourceType - anonymize resource type ID
childRT := &v2.ChildResourceType{}
if ann.MessageIs(childRT) {
if err := ann.UnmarshalTo(childRT); err != nil {
return nil, err
}
if childRT.GetResourceTypeId() != "" {
childRT.SetResourceTypeId(a.hasher.AnonymizeResourceType(childRT.GetResourceTypeId()))
}
return anypb.New(childRT)
}
// ExternalLink - anonymize URL
externalLink := &v2.ExternalLink{}
if ann.MessageIs(externalLink) {
if err := ann.UnmarshalTo(externalLink); err != nil {
return nil, err
}
if externalLink.GetUrl() != "" {
externalLink.SetUrl(a.hasher.AnonymizeURL(externalLink.GetUrl()))
}
return anypb.New(externalLink)
}
// Empty marker types - preserve as-is (no PII)
for _, marker := range []proto.Message{
&v2.SkipEntitlementsAndGrants{},
&v2.SkipGrants{},
&v2.SkipEntitlements{},
} {
if ann.MessageIs(marker) {
return ann, nil
}
}
// Unknown type - drop it
return nil, nil
}
func (a *Anonymizer) processResourceTypeAnnotation(ann *anypb.Any) (*anypb.Any, error) {
// ChildResourceType - anonymize resource type ID
childRT := &v2.ChildResourceType{}
if ann.MessageIs(childRT) {
if err := ann.UnmarshalTo(childRT); err != nil {
return nil, err
}
if childRT.GetResourceTypeId() != "" {
childRT.SetResourceTypeId(a.hasher.AnonymizeResourceType(childRT.GetResourceTypeId()))
}
anyMsg, err := anypb.New(childRT)
if err != nil {
return nil, err
}
return anyMsg, nil
}
// ExternalLink - anonymize URL
externalLink := &v2.ExternalLink{}
if ann.MessageIs(externalLink) {
if err := ann.UnmarshalTo(externalLink); err != nil {
return nil, err
}
if externalLink.GetUrl() != "" {
externalLink.SetUrl(a.hasher.AnonymizeURL(externalLink.GetUrl()))
}
anyMsg, err := anypb.New(externalLink)
if err != nil {
return nil, err
}
return anyMsg, nil
}
// Empty marker types - preserve as-is (no PII)
for _, marker := range []proto.Message{
&v2.SkipEntitlementsAndGrants{},
&v2.SkipGrants{},
&v2.SkipEntitlements{},
} {
if ann.MessageIs(marker) {
return ann, nil
}
}
// Unknown type - drop it
return nil, nil
}
🤖 Prompt for AI Agents
In pkg/anonymize/annotations.go around lines 32 to 70, the calls to
anypb.New(childRT) and anypb.New(externalLink) ignore the returned error; update
both calls to capture (out, err := anypb.New(...)) and if err != nil return nil,
err, otherwise return out, nil so marshaling failures are propagated instead of
dropped.

Comment on lines +74 to +156
func (a *Anonymizer) anonymizeGrantAnnotations(g *v2.Grant) error {
if g == nil {
return nil
}

var result []*anypb.Any
for _, ann := range g.GetAnnotations() {
if processed, err := a.processGrantAnnotation(ann); err != nil {
return err
} else if processed != nil {
result = append(result, processed)
}
}

g.SetAnnotations(result)
return nil
}

// processGrantAnnotation processes a single annotation from a Grant.
// Returns nil if the annotation should be dropped.
func (a *Anonymizer) processGrantAnnotation(ann *anypb.Any) (*anypb.Any, error) {
// GrantExpandable - anonymize entitlement and resource type IDs
expandable := &v2.GrantExpandable{}
if ann.MessageIs(expandable) {
if err := ann.UnmarshalTo(expandable); err != nil {
return nil, err
}
entitlementIDs := expandable.GetEntitlementIds()
for i, eid := range entitlementIDs {
if eid != "" {
entitlementIDs[i] = a.hasher.AnonymizeExternalID(eid)
}
}
expandable.SetEntitlementIds(entitlementIDs)

resourceTypeIDs := expandable.GetResourceTypeIds()
for i, rtid := range resourceTypeIDs {
if rtid != "" {
resourceTypeIDs[i] = a.hasher.AnonymizeResourceType(rtid)
}
}
expandable.SetResourceTypeIds(resourceTypeIDs)
return anypb.New(expandable)
}

// GrantMetadata - clear metadata
grantMetadata := &v2.GrantMetadata{}
if ann.MessageIs(grantMetadata) {
if err := ann.UnmarshalTo(grantMetadata); err != nil {
return nil, err
}
grantMetadata.ClearMetadata()
return anypb.New(grantMetadata)
}

// GrantImmutable - anonymize source_id, clear metadata
grantImmutable := &v2.GrantImmutable{}
if ann.MessageIs(grantImmutable) {
if err := ann.UnmarshalTo(grantImmutable); err != nil {
return nil, err
}
if grantImmutable.GetSourceId() != "" {
grantImmutable.SetSourceId(a.hasher.Hash(grantImmutable.GetSourceId()))
}
grantImmutable.ClearMetadata()
return anypb.New(grantImmutable)
}

// ExternalLink - anonymize URL
externalLink := &v2.ExternalLink{}
if ann.MessageIs(externalLink) {
if err := ann.UnmarshalTo(externalLink); err != nil {
return nil, err
}
if externalLink.GetUrl() != "" {
externalLink.SetUrl(a.hasher.AnonymizeURL(externalLink.GetUrl()))
}
return anypb.New(externalLink)
}

// Unknown type - drop it
return nil, nil
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle errors from anypb.New().

The anypb.New() function returns (*anypb.Any, error), but errors are not checked at lines 116, 126, 139, and 151. This could lead to silent failures during marshaling.

Apply this diff to handle errors properly:

 		expandable.SetResourceTypeIds(resourceTypeIDs)
-		return anypb.New(expandable)
+		anyMsg, err := anypb.New(expandable)
+		if err != nil {
+			return nil, err
+		}
+		return anyMsg, nil
 	}
 
 	// GrantMetadata - clear metadata
 	grantMetadata := &v2.GrantMetadata{}
 	if ann.MessageIs(grantMetadata) {
 		if err := ann.UnmarshalTo(grantMetadata); err != nil {
 			return nil, err
 		}
 		grantMetadata.ClearMetadata()
-		return anypb.New(grantMetadata)
+		anyMsg, err := anypb.New(grantMetadata)
+		if err != nil {
+			return nil, err
+		}
+		return anyMsg, nil
 	}
 
 	// GrantImmutable - anonymize source_id, clear metadata
 	grantImmutable := &v2.GrantImmutable{}
 	if ann.MessageIs(grantImmutable) {
 		if err := ann.UnmarshalTo(grantImmutable); err != nil {
 			return nil, err
 		}
 		if grantImmutable.GetSourceId() != "" {
 			grantImmutable.SetSourceId(a.hasher.Hash(grantImmutable.GetSourceId()))
 		}
 		grantImmutable.ClearMetadata()
-		return anypb.New(grantImmutable)
+		anyMsg, err := anypb.New(grantImmutable)
+		if err != nil {
+			return nil, err
+		}
+		return anyMsg, nil
 	}
 
 	// ExternalLink - anonymize URL
 	externalLink := &v2.ExternalLink{}
 	if ann.MessageIs(externalLink) {
 		if err := ann.UnmarshalTo(externalLink); err != nil {
 			return nil, err
 		}
 		if externalLink.GetUrl() != "" {
 			externalLink.SetUrl(a.hasher.AnonymizeURL(externalLink.GetUrl()))
 		}
-		return anypb.New(externalLink)
+		anyMsg, err := anypb.New(externalLink)
+		if err != nil {
+			return nil, err
+		}
+		return anyMsg, nil
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (a *Anonymizer) anonymizeGrantAnnotations(g *v2.Grant) error {
if g == nil {
return nil
}
var result []*anypb.Any
for _, ann := range g.GetAnnotations() {
if processed, err := a.processGrantAnnotation(ann); err != nil {
return err
} else if processed != nil {
result = append(result, processed)
}
}
g.SetAnnotations(result)
return nil
}
// processGrantAnnotation processes a single annotation from a Grant.
// Returns nil if the annotation should be dropped.
func (a *Anonymizer) processGrantAnnotation(ann *anypb.Any) (*anypb.Any, error) {
// GrantExpandable - anonymize entitlement and resource type IDs
expandable := &v2.GrantExpandable{}
if ann.MessageIs(expandable) {
if err := ann.UnmarshalTo(expandable); err != nil {
return nil, err
}
entitlementIDs := expandable.GetEntitlementIds()
for i, eid := range entitlementIDs {
if eid != "" {
entitlementIDs[i] = a.hasher.AnonymizeExternalID(eid)
}
}
expandable.SetEntitlementIds(entitlementIDs)
resourceTypeIDs := expandable.GetResourceTypeIds()
for i, rtid := range resourceTypeIDs {
if rtid != "" {
resourceTypeIDs[i] = a.hasher.AnonymizeResourceType(rtid)
}
}
expandable.SetResourceTypeIds(resourceTypeIDs)
return anypb.New(expandable)
}
// GrantMetadata - clear metadata
grantMetadata := &v2.GrantMetadata{}
if ann.MessageIs(grantMetadata) {
if err := ann.UnmarshalTo(grantMetadata); err != nil {
return nil, err
}
grantMetadata.ClearMetadata()
return anypb.New(grantMetadata)
}
// GrantImmutable - anonymize source_id, clear metadata
grantImmutable := &v2.GrantImmutable{}
if ann.MessageIs(grantImmutable) {
if err := ann.UnmarshalTo(grantImmutable); err != nil {
return nil, err
}
if grantImmutable.GetSourceId() != "" {
grantImmutable.SetSourceId(a.hasher.Hash(grantImmutable.GetSourceId()))
}
grantImmutable.ClearMetadata()
return anypb.New(grantImmutable)
}
// ExternalLink - anonymize URL
externalLink := &v2.ExternalLink{}
if ann.MessageIs(externalLink) {
if err := ann.UnmarshalTo(externalLink); err != nil {
return nil, err
}
if externalLink.GetUrl() != "" {
externalLink.SetUrl(a.hasher.AnonymizeURL(externalLink.GetUrl()))
}
return anypb.New(externalLink)
}
// Unknown type - drop it
return nil, nil
}
func (a *Anonymizer) anonymizeGrantAnnotations(g *v2.Grant) error {
if g == nil {
return nil
}
var result []*anypb.Any
for _, ann := range g.GetAnnotations() {
if processed, err := a.processGrantAnnotation(ann); err != nil {
return err
} else if processed != nil {
result = append(result, processed)
}
}
g.SetAnnotations(result)
return nil
}
// processGrantAnnotation processes a single annotation from a Grant.
// Returns nil if the annotation should be dropped.
func (a *Anonymizer) processGrantAnnotation(ann *anypb.Any) (*anypb.Any, error) {
// GrantExpandable - anonymize entitlement and resource type IDs
expandable := &v2.GrantExpandable{}
if ann.MessageIs(expandable) {
if err := ann.UnmarshalTo(expandable); err != nil {
return nil, err
}
entitlementIDs := expandable.GetEntitlementIds()
for i, eid := range entitlementIDs {
if eid != "" {
entitlementIDs[i] = a.hasher.AnonymizeExternalID(eid)
}
}
expandable.SetEntitlementIds(entitlementIDs)
resourceTypeIDs := expandable.GetResourceTypeIds()
for i, rtid := range resourceTypeIDs {
if rtid != "" {
resourceTypeIDs[i] = a.hasher.AnonymizeResourceType(rtid)
}
}
expandable.SetResourceTypeIds(resourceTypeIDs)
anyMsg, err := anypb.New(expandable)
if err != nil {
return nil, err
}
return anyMsg, nil
}
// GrantMetadata - clear metadata
grantMetadata := &v2.GrantMetadata{}
if ann.MessageIs(grantMetadata) {
if err := ann.UnmarshalTo(grantMetadata); err != nil {
return nil, err
}
grantMetadata.ClearMetadata()
anyMsg, err := anypb.New(grantMetadata)
if err != nil {
return nil, err
}
return anyMsg, nil
}
// GrantImmutable - anonymize source_id, clear metadata
grantImmutable := &v2.GrantImmutable{}
if ann.MessageIs(grantImmutable) {
if err := ann.UnmarshalTo(grantImmutable); err != nil {
return nil, err
}
if grantImmutable.GetSourceId() != "" {
grantImmutable.SetSourceId(a.hasher.Hash(grantImmutable.GetSourceId()))
}
grantImmutable.ClearMetadata()
anyMsg, err := anypb.New(grantImmutable)
if err != nil {
return nil, err
}
return anyMsg, nil
}
// ExternalLink - anonymize URL
externalLink := &v2.ExternalLink{}
if ann.MessageIs(externalLink) {
if err := ann.UnmarshalTo(externalLink); err != nil {
return nil, err
}
if externalLink.GetUrl() != "" {
externalLink.SetUrl(a.hasher.AnonymizeURL(externalLink.GetUrl()))
}
anyMsg, err := anypb.New(externalLink)
if err != nil {
return nil, err
}
return anyMsg, nil
}
// Unknown type - drop it
return nil, nil
}
🤖 Prompt for AI Agents
In pkg/anonymize/annotations.go around lines 74 to 156, the calls to
anypb.New(...) at the end of each branch return (any, error) but the error is
ignored; update each of the four places (the GrantExpandable, GrantMetadata,
GrantImmutable and ExternalLink branches around lines ~116, ~126, ~139, ~151) to
capture the returned values, check the error and return it on failure, e.g. out,
err := anypb.New(...); if err != nil { return nil, err }; return out, nil.

Comment on lines +160 to +208
func (a *Anonymizer) anonymizeEntitlementAnnotations(e *v2.Entitlement) error {
if e == nil {
return nil
}

var result []*anypb.Any
for _, ann := range e.GetAnnotations() {
if processed, err := a.processEntitlementAnnotation(ann); err != nil {
return err
} else if processed != nil {
result = append(result, processed)
}
}

e.SetAnnotations(result)
return nil
}

// processEntitlementAnnotation processes a single annotation from an Entitlement.
// Returns nil if the annotation should be dropped.
func (a *Anonymizer) processEntitlementAnnotation(ann *anypb.Any) (*anypb.Any, error) {
// EntitlementImmutable - anonymize source_id, clear metadata
entitlementImmutable := &v2.EntitlementImmutable{}
if ann.MessageIs(entitlementImmutable) {
if err := ann.UnmarshalTo(entitlementImmutable); err != nil {
return nil, err
}
if entitlementImmutable.GetSourceId() != "" {
entitlementImmutable.SetSourceId(a.hasher.Hash(entitlementImmutable.GetSourceId()))
}
entitlementImmutable.ClearMetadata()
return anypb.New(entitlementImmutable)
}

// ExternalLink - anonymize URL
externalLink := &v2.ExternalLink{}
if ann.MessageIs(externalLink) {
if err := ann.UnmarshalTo(externalLink); err != nil {
return nil, err
}
if externalLink.GetUrl() != "" {
externalLink.SetUrl(a.hasher.AnonymizeURL(externalLink.GetUrl()))
}
return anypb.New(externalLink)
}

// Unknown type - drop it
return nil, nil
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle errors from anypb.New().

The anypb.New() function returns (*anypb.Any, error), but errors are not checked at lines 191 and 203. This could lead to silent failures during marshaling.

Apply this diff to handle errors properly:

 		entitlementImmutable.ClearMetadata()
-		return anypb.New(entitlementImmutable)
+		anyMsg, err := anypb.New(entitlementImmutable)
+		if err != nil {
+			return nil, err
+		}
+		return anyMsg, nil
 	}
 
 	// ExternalLink - anonymize URL
 	externalLink := &v2.ExternalLink{}
 	if ann.MessageIs(externalLink) {
 		if err := ann.UnmarshalTo(externalLink); err != nil {
 			return nil, err
 		}
 		if externalLink.GetUrl() != "" {
 			externalLink.SetUrl(a.hasher.AnonymizeURL(externalLink.GetUrl()))
 		}
-		return anypb.New(externalLink)
+		anyMsg, err := anypb.New(externalLink)
+		if err != nil {
+			return nil, err
+		}
+		return anyMsg, nil
 	}
🤖 Prompt for AI Agents
In pkg/anonymize/annotations.go around lines 160-208, the calls to
anypb.New(entitlementImmutable) and anypb.New(externalLink) ignore the returned
error; update both sites to capture the returned (*anypb.Any, error), check the
error and return nil, err on failure, otherwise return the created *anypb.Any
and nil. Ensure you propagate any Unmarshal errors as before and do not change
the function signature.

Comment on lines +656 to +685
// TestNestedTypeCoverage ensures all nested message types have field coverage.
// This test will fail if a new nested message type is added to the protobufs
// without being added to knownNestedMessageTypes and having a policy map created.
func TestNestedTypeCoverage(t *testing.T) {
var unknownTypes []string

for _, parent := range parentTypesWithNestedMessages {
md := parent.ProtoReflect().Descriptor()
fields := md.Fields()

for i := 0; i < fields.Len(); i++ {
field := fields.Get(i)

// Check if this field is a message type
if field.Kind().String() == "message" {
msgName := string(field.Message().FullName())

if !knownNestedMessageTypes[msgName] {
unknownTypes = append(unknownTypes,
fmt.Sprintf("%s.%s -> %s", md.FullName(), field.Name(), msgName))
}
}
}
}

if len(unknownTypes) > 0 {
t.Errorf("Unknown nested message types found - add them to knownNestedMessageTypes AND create field policy maps:\n %s",
strings.Join(unknownTypes, "\n "))
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Use typed comparison instead of string comparison for field kind.

Line 670 compares field.Kind().String() == "message" which is fragile. Use the typed constant instead:

-			if field.Kind().String() == "message" {
+			if field.Kind() == protoreflect.MessageKind {

This is more robust and doesn't rely on string representation.

You'll need to import "google.golang.org/protobuf/reflect/protoreflect" for the constant.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In pkg/anonymize/field_coverage_test.go around lines 656 to 685, replace the
fragile string comparison field.Kind().String() == "message" with a typed
comparison using the protoreflect.KindMessage constant (i.e. field.Kind() ==
protoreflect.KindMessage) and add the import
"google.golang.org/protobuf/reflect/protoreflect" to the test file; this makes
the check robust to string changes and avoids relying on the string
representation.

@kans kans force-pushed the kans/anonymize-c1zs branch from 95f131b to decf8f6 Compare December 16, 2025 23:58
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (5)
pkg/anonymize/field_coverage_test.go (1)

669-678: Use typed comparison instead of string comparison for field kind.

This uses field.Kind().String() == "message" which is fragile. Use the typed constant protoreflect.MessageKind instead for robustness.

+import "google.golang.org/protobuf/reflect/protoreflect"
+
 // In TestNestedTypeCoverage:
-			if field.Kind().String() == "message" {
+			if field.Kind() == protoreflect.MessageKind {
pkg/anonymize/annotations.go (3)

32-70: Handle errors from anypb.New().

The anypb.New() calls at lines 42 and 54 return (*anypb.Any, error), but the error is not checked. Marshaling failures could be silently ignored, leading to data corruption or loss.

 		if childRT.GetResourceTypeId() != "" {
 			childRT.SetResourceTypeId(a.hasher.AnonymizeResourceType(childRT.GetResourceTypeId()))
 		}
-		return anypb.New(childRT)
+		anyMsg, err := anypb.New(childRT)
+		if err != nil {
+			return nil, err
+		}
+		return anyMsg, nil
 	}
 
 	// ExternalLink - anonymize URL
 	externalLink := &v2.ExternalLink{}
 	if ann.MessageIs(externalLink) {
 		if err := ann.UnmarshalTo(externalLink); err != nil {
 			return nil, err
 		}
 		if externalLink.GetUrl() != "" {
 			externalLink.SetUrl(a.hasher.AnonymizeURL(externalLink.GetUrl()))
 		}
-		return anypb.New(externalLink)
+		anyMsg, err := anypb.New(externalLink)
+		if err != nil {
+			return nil, err
+		}
+		return anyMsg, nil
 	}

94-156: Handle errors from anypb.New() in grant annotation processing.

Lines 116, 126, 139, and 151 ignore the error return from anypb.New(). Apply error handling similar to the pattern in processResourceTypeAnnotation.


180-208: Handle errors from anypb.New() in entitlement annotation processing.

Lines 191 and 203 ignore the error return from anypb.New().

pkg/anonymize/grant.go (1)

45-66: Potential map key collision already noted in prior review.

The concern about hash collisions when remapping map keys was previously flagged. While the probability is negligible for practical source counts (48 bits from 12 hex chars), consider adding a brief comment acknowledging this or implementing collision detection if strict data preservation is required.

🧹 Nitpick comments (7)
pkg/anonymize/field_coverage_test.go (2)

53-68: Consider whether created_at and last_login should preserve relative ordering.

These timestamps are marked fieldPolicyAnonymize and set to a single anonymized timestamp. This approach is valid for privacy, but be aware it removes the ability to analyze temporal patterns (e.g., "users created before X" or "inactive users"). If such analysis is needed, consider preserving relative order by adding a random offset from a base date instead.


177-179: Document why the empty map is needed.

The comment explains it's currently empty, but consider adding a note that this policy map must exist to satisfy validateNoPolicyForMissingFields even when the message has no fields.

 // grantSourcesGrantSourceFieldPolicies defines the anonymization policy for GrantSources.GrantSource fields.
-// Note: This message currently has no fields, but coverage ensures future fields are handled.
+// Note: This message currently has no fields, but this empty map must exist to:
+// 1. Ensure coverage validation passes (all types need explicit policy maps)
+// 2. Force developers to update this when fields are added to GrantSources.GrantSource
 var grantSourcesGrantSourceFieldPolicies = map[string]fieldPolicy{
 	// Currently empty - no fields in GrantSources_GrantSource
 }
pkg/anonymize/anonymize.go (1)

15-28: Consider adding documentation about default salt security.

The hardcoded default salt is visible in the codebase. While past review acknowledged this, adding a brief doc comment warning that the default salt is only suitable for development/testing would help prevent misuse.

+// defaultConfig returns a Config with sensible default values.
+// WARNING: The default salt is publicly known and should only be used for
+// development/testing. For production use cases requiring strong anonymization,
+// provide a unique, random salt via Config.Salt.
 func defaultConfig() Config {
 	return Config{
 		Salt: "baton-anonymize-default-salt",
 	}
 }
pkg/anonymize/processor_test.go (1)

101-146: Deterministic test reuses the same Anonymizer instance for both runs.

Both anonymization calls on lines 118 and 121 use the same anonymizer instance created on line 116. Since the Anonymizer struct captures timestamp: time.Now() at construction time, both runs inherently share the same timestamp, which is correct for this test. However, to more rigorously test determinism across separate instantiations, consider creating two separate Anonymizer instances with the same salt.

 	// Anonymize twice with the same salt
 	salt := "deterministic-salt"
-	anonymizer := New(Config{Salt: salt})
+	anonymizer1 := New(Config{Salt: salt})
+	anonymizer2 := New(Config{Salt: salt})
 
-	_, err := anonymizer.AnonymizeC1ZFile(ctx, inputPath, outputPath1)
+	_, err := anonymizer1.AnonymizeC1ZFile(ctx, inputPath, outputPath1)
 	require.NoError(t, err)
 
-	_, err = anonymizer.AnonymizeC1ZFile(ctx, inputPath, outputPath2)
+	_, err = anonymizer2.AnonymizeC1ZFile(ctx, inputPath, outputPath2)
 	require.NoError(t, err)

Note: This would cause timestamp fields to differ between runs since time.Now() is called at construction. If timestamp determinism is required for testing, the Anonymizer may need a way to inject a fixed timestamp.

pkg/anonymize/resource.go (1)

93-160: Unknown annotation types are silently dropped.

Line 159 returns nil, nil for unrecognized annotation types, which silently removes them from the resource. This is likely intentional for safety (unknown types might contain PII), but could result in data loss if new annotation types are added to the SDK. Consider logging a warning when dropping unknown annotations for observability.

+	// Unknown type - drop it but consider logging for observability
+	// log.Debugf("Dropping unknown annotation type during anonymization: %s", ann.GetTypeUrl())
 	return nil, nil
pkg/anonymize/processor.go (2)

71-76: Remove commented-out code and fix documentation mismatch.

The comment on line 72 states "It iterates through each sync run" but the implementation only processes the latest finished sync. Additionally, lines 75-76 contain dead code that should be removed.

Apply this diff:

 // processC1File processes all data from the input file and writes anonymized data to the output file.
-// It iterates through each sync run in the input and creates a corresponding sync in the output.
+// It processes the latest finished full sync from the input and creates a corresponding sync in the output.
 func (a *Anonymizer) processC1File(ctx context.Context, input *dotc1z.C1File, output *dotc1z.C1File, stats *ProcessorStats) error {
-	// List all sync runs from the input file
-	// pageToken := ""
-	// for {
 	latestSync, err := input.GetLatestFinishedSync(ctx, reader_v2.SyncsReaderServiceGetLatestFinishedSyncRequest_builder{

77-93: Consider validating sync existence before use.

If GetLatestFinishedSync returns successfully but with no sync data (e.g., empty database), the nil-safe protobuf getters would return empty strings, potentially causing unexpected behavior downstream.

 	latestSync, err := input.GetLatestFinishedSync(ctx, reader_v2.SyncsReaderServiceGetLatestFinishedSyncRequest_builder{
 		SyncType: string(connectorstore.SyncTypeFull),
 	}.Build())
 	if err != nil {
 		return fmt.Errorf("failed to get latest finished sync: %w", err)
 	}
+	if latestSync.GetSync() == nil {
+		return fmt.Errorf("no finished full sync found in input file")
+	}
 
 	// Set the input file to view this specific sync
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 95f131b and decf8f6.

📒 Files selected for processing (17)
  • pkg/anonymize/annotations.go (1 hunks)
  • pkg/anonymize/anonymize.go (1 hunks)
  • pkg/anonymize/anonymize_test.go (1 hunks)
  • pkg/anonymize/entitlement.go (1 hunks)
  • pkg/anonymize/entitlement_test.go (1 hunks)
  • pkg/anonymize/field_coverage_test.go (1 hunks)
  • pkg/anonymize/grant.go (1 hunks)
  • pkg/anonymize/grant_test.go (1 hunks)
  • pkg/anonymize/misc.go (1 hunks)
  • pkg/anonymize/misc_test.go (1 hunks)
  • pkg/anonymize/processor.go (1 hunks)
  • pkg/anonymize/processor_test.go (1 hunks)
  • pkg/anonymize/resource.go (1 hunks)
  • pkg/anonymize/resource_test.go (1 hunks)
  • pkg/anonymize/table_coverage_test.go (1 hunks)
  • pkg/dotc1z/anonymize_helpers.go (1 hunks)
  • pkg/dotc1z/sql_helpers.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • pkg/anonymize/misc_test.go
  • pkg/anonymize/resource_test.go
  • pkg/anonymize/grant_test.go
  • pkg/dotc1z/anonymize_helpers.go
  • pkg/dotc1z/sql_helpers.go
  • pkg/anonymize/anonymize_test.go
🧰 Additional context used
🧬 Code graph analysis (7)
pkg/anonymize/table_coverage_test.go (1)
pkg/dotc1z/sql_helpers.go (1)
  • AllTableNames (45-51)
pkg/anonymize/processor_test.go (2)
pkg/anonymize/anonymize.go (2)
  • New (38-47)
  • Config (16-21)
pkg/dotc1z/c1file.go (2)
  • NewC1ZFile (174-210)
  • WithReadOnly (158-162)
pkg/anonymize/resource.go (1)
pkg/anonymize/anonymize.go (2)
  • Anonymizer (31-35)
  • New (38-47)
pkg/anonymize/misc.go (1)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (31-35)
pkg/anonymize/entitlement.go (1)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (31-35)
pkg/anonymize/field_coverage_test.go (1)
vendor/google.golang.org/protobuf/reflect/protoreflect/type.go (1)
  • Descriptor (21-112)
pkg/anonymize/processor.go (3)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (31-35)
pkg/dotc1z/c1file.go (2)
  • NewC1ZFile (174-210)
  • C1File (36-54)
pkg/connectorstore/connectorstore.go (2)
  • SyncType (11-11)
  • SyncTypeFull (14-14)
🪛 GitHub Check: go-lint
pkg/anonymize/processor_test.go

[failure] 69-69:
use of fmt.Printf forbidden by pattern ^(fmt\.Print(|f|ln)|print|println)$ (forbidigo)


[failure] 56-56:
use of fmt.Printf forbidden by pattern ^(fmt\.Print(|f|ln)|print|println)$ (forbidigo)


[failure] 49-49:
use of fmt.Printf forbidden by pattern ^(fmt\.Print(|f|ln)|print|println)$ (forbidigo)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: go-test (1.25.2, windows-latest)
  • GitHub Check: go-test (1.25.2, ubuntu-latest)
🔇 Additional comments (25)
pkg/anonymize/table_coverage_test.go (3)

10-22: Well-designed table coverage tracking.

The knownAnonymizedTables map with descriptive comments provides clear documentation for developers adding new tables. The bidirectional validation (table → handler and handler → table) ensures the anonymization logic stays in sync with schema changes.


28-41: LGTM - solid coverage test.

The test correctly validates that every database table has an explicit anonymization handler, failing fast with actionable error messages that guide developers on next steps.


43-57: Good orphan detection logic.

Building a set from AllTableNames() and checking each handler against it catches stale entries that reference removed tables.

pkg/anonymize/field_coverage_test.go (3)

13-25: Clear policy enum design.

The four-value fieldPolicy enum cleanly categorizes anonymization behavior. The comments explaining each policy's purpose will help maintainers understand the intent when adding new fields.


215-241: Solid field coverage validation helper.

The function correctly uses proto reflection to enumerate all fields and checks each against the policy map. The error message is actionable, listing the specific unhandled fields.


499-543: TestFieldCoverage_AllTypes provides good CI visibility.

The table-driven test aggregates all type coverage checks, making it easy to see the full scope in CI output. The structure is clean and extensible.

pkg/anonymize/annotations.go (1)

11-28: Good nil-safety and iteration pattern.

The function correctly handles nil input and iterates all annotations, allowing multiple annotations of the same type to be processed. The comment about unknown types being dropped is helpful.

pkg/anonymize/anonymize.go (3)

37-47: Panic on invalid config is appropriate here.

Using panic for missing salt is reasonable since this is a programming error that should be caught during development, not a runtime condition. The fail-fast behavior prevents silent corruption.


72-87: LGTM - core hashing implementation.

HMAC-SHA256 with hex encoding provides deterministic, cryptographically sound anonymization. The HashN truncation handles edge cases correctly.


120-123: Good improvement on URL hash length.

Using 16 hex characters provides ~18 quintillion unique values, which is adequate for URL collision resistance.

pkg/anonymize/entitlement_test.go (3)

10-52: Comprehensive basic anonymization test.

The test verifies key behaviors: ID length preservation, display name full hash (64 chars), description replacement with sentinel, slug length preservation, and recursive anonymization of embedded resources.


54-78: Good nil handling tests.

Testing both nil entitlement and nil embedded resource ensures the anonymization code handles edge cases gracefully without panicking.


80-102: Critical determinism verification.

This test ensures that the same input always produces the same output, which is essential for maintaining referential integrity across anonymized data (e.g., grants referencing entitlements).

pkg/anonymize/entitlement.go (1)

7-55: LGTM!

The entitlement anonymization follows the established pattern with proper nil-safety, appropriate field handling (display name hashed, description replaced with placeholder, slug length-preserved), and correct error propagation for nested structures.

pkg/anonymize/misc.go (1)

7-50: LGTM!

The ResourceType anonymization follows the established pattern. The ShouldDeleteAssets and ShouldClearSessionStore methods are appropriately conservative by always returning true for anonymization scenarios, with clear documentation explaining the rationale.

pkg/anonymize/grant.go (1)

7-43: LGTM!

The grant anonymization correctly handles the ID, embedded entitlement, principal resource, sources, and annotations with proper nil-safety and error propagation.

pkg/anonymize/processor_test.go (1)

204-282: Good comprehensive test data setup.

The helper creates a realistic C1Z file with resource types, resources, entitlements, and grants with proper relationships, enabling thorough end-to-end verification.

pkg/anonymize/resource.go (3)

9-45: LGTM!

The resource anonymization is well-structured with proper nil-safety, handling of display name, description, IDs (resource, parent, external), and annotation processing with error propagation.


162-236: Comprehensive UserTrait anonymization.

Thorough handling of PII fields including emails, login credentials, structured names, and timestamps. Clearing profile and icon prevents image-based identification. Setting timestamps to a single anonymized value prevents temporal fingerprinting.


282-305: LGTM!

SecretTrait anonymization properly clears profile, anonymizes resource IDs for CreatedById and IdentityId, and normalizes timestamps to prevent fingerprinting.

pkg/anonymize/processor.go (5)

14-23: LGTM!

The ProcessorStats struct is well-designed with clear field names for tracking both counts and boolean flags for the various anonymization operations.


27-69: LGTM!

The error handling and cleanup logic is thorough—properly removing the output file on any failure after creation, while using defer for the read-only input file.


113-136: LGTM!

Clean orchestration with proper error wrapping and sequential processing of all data types.


138-210: LGTM!

The pagination pattern is well-implemented with proper page token handling, in-place anonymization of protobuf messages, and conditional batch writes.


212-284: LGTM!

The processEntitlements and processGrants methods follow the same consistent pagination and batch-write pattern as the other process methods. The repetition is acceptable here given the different types involved.

Comment on lines 110 to 118
// AnonymizeResourceID generates an anonymized resource ID, preserving the original length.
func (h *Hasher) AnonymizeResourceID(resourceID string) string {
return h.HashN(resourceID, len(resourceID))
}

// AnonymizeExternalID generates an anonymized external ID, preserving the original length.
func (h *Hasher) AnonymizeExternalID(externalID string) string {
return h.HashN(externalID, len(externalID))
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Documentation inaccuracy: length preservation has a 64-char ceiling.

The comments for AnonymizeResourceID and AnonymizeExternalID claim to preserve original length, but HashN is capped at 64 hex characters (the HMAC-SHA256 output). Inputs longer than 64 characters will have their anonymized output truncated to 64 chars.

-// AnonymizeResourceID generates an anonymized resource ID, preserving the original length.
+// AnonymizeResourceID generates an anonymized resource ID.
+// Preserves the original length for inputs up to 64 characters; longer inputs are truncated to 64.
 func (h *Hasher) AnonymizeResourceID(resourceID string) string {
 	return h.HashN(resourceID, len(resourceID))
 }

-// AnonymizeExternalID generates an anonymized external ID, preserving the original length.
+// AnonymizeExternalID generates an anonymized external ID.
+// Preserves the original length for inputs up to 64 characters; longer inputs are truncated to 64.
 func (h *Hasher) AnonymizeExternalID(externalID string) string {
 	return h.HashN(externalID, len(externalID))
 }
🤖 Prompt for AI Agents
In pkg/anonymize/anonymize.go around lines 110 to 118, the comments for
AnonymizeResourceID and AnonymizeExternalID incorrectly state they preserve the
original length; update both comments to accurately document that the anonymized
output is produced by HashN which is capped at 64 hex characters (HMAC-SHA256
output), so inputs longer than 64 characters will be truncated to 64 characters;
alternatively, if true length preservation is required, change the
implementation to derive/extend the hash output (e.g., repeated HMAC or HKDF
expansion) to match the input length and document that behavior instead — choose
and implement one clear approach and make the comments match the actual
behavior.

// Verify the input file exists
statsFile, err := os.Stat(inputPath)
require.NoError(t, err, "Input c1z file should exist")
fmt.Printf("Input file size: %d\n", statsFile.Size())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Replace fmt.Printf with t.Logf to satisfy linter.

Static analysis reports fmt.Printf is forbidden. Use t.Logf for test output instead.

-	fmt.Printf("Input file size: %d\n", statsFile.Size())
+	t.Logf("Input file size: %d", statsFile.Size())
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fmt.Printf("Input file size: %d\n", statsFile.Size())
t.Logf("Input file size: %d", statsFile.Size())
🧰 Tools
🪛 GitHub Check: go-lint

[failure] 49-49:
use of fmt.Printf forbidden by pattern ^(fmt\.Print(|f|ln)|print|println)$ (forbidigo)

🤖 Prompt for AI Agents
In pkg/anonymize/processor_test.go around line 49, the test uses
fmt.Printf("Input file size: %d\n", statsFile.Size()) which the linter forbids;
replace the fmt.Printf call with t.Logf("Input file size: %d", statsFile.Size())
so test output uses the testing logger rather than fmt, and remove the trailing
newline since t.Logf adds one.

anonymizer := New(Config{Salt: "test-salt-for-e2e"})
stats, err := anonymizer.AnonymizeC1ZFile(ctx, inputPath, outputPath)
require.NoError(t, err, "AnonymizeC1ZFile should succeed")
fmt.Printf("Stats: %+v\n", stats)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Replace fmt.Printf with t.Logf to satisfy linter.

-	fmt.Printf("Stats: %+v\n", stats)
+	t.Logf("Stats: %+v", stats)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fmt.Printf("Stats: %+v\n", stats)
t.Logf("Stats: %+v", stats)
🧰 Tools
🪛 GitHub Check: go-lint

[failure] 56-56:
use of fmt.Printf forbidden by pattern ^(fmt\.Print(|f|ln)|print|println)$ (forbidigo)

🤖 Prompt for AI Agents
In pkg/anonymize/processor_test.go around line 56, the test is using
fmt.Printf("Stats: %+v\n", stats) which fails the linter; replace that call with
t.Logf("Stats: %+v", stats) (drop the trailing newline), update imports to
remove "fmt" if it becomes unused, and run the tests/linter to verify the
warning is resolved.

// Step 3: Verify the output file exists
statsFile, err = os.Stat(outputPath)
require.NoError(t, err, "Output c1z file should exist")
fmt.Printf("Output file size: %d\n", statsFile.Size())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Replace fmt.Printf with t.Logf to satisfy linter.

-	fmt.Printf("Output file size: %d\n", statsFile.Size())
+	t.Logf("Output file size: %d", statsFile.Size())
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fmt.Printf("Output file size: %d\n", statsFile.Size())
t.Logf("Output file size: %d", statsFile.Size())
🧰 Tools
🪛 GitHub Check: go-lint

[failure] 69-69:
use of fmt.Printf forbidden by pattern ^(fmt\.Print(|f|ln)|print|println)$ (forbidigo)

🤖 Prompt for AI Agents
In pkg/anonymize/processor_test.go around line 69, the test uses
fmt.Printf("Output file size: %d\n", statsFile.Size()); replace that call with
t.Logf("Output file size: %d", statsFile.Size()) so the test logs via testing.T;
also remove the now-unused fmt import (or adjust imports) to satisfy the linter.

@kans kans force-pushed the kans/anonymize-c1zs branch from decf8f6 to 480356d Compare December 17, 2025 00:13
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (6)
pkg/anonymize/grant.go (1)

45-66: Note: Potential key collision in map anonymization.

As previously flagged, if two different source keys hash to the same anonymized value, one entry will be silently dropped. With 12-character hex (48 bits), collision probability is negligible for practical source counts, but this is a theoretical concern.

This is an acceptable tradeoff for the anonymization use case, as preserving exact entry counts is less critical than ensuring consistent anonymization.

Based on past review comments, this was previously noted as a minor issue.

pkg/anonymize/field_coverage_test.go (1)

670-670: Use typed comparison instead of string comparison for field kind.

Line 670 compares field.Kind().String() == "message" which is fragile and relies on string representation. Use the typed constant protoreflect.MessageKind instead for robustness.

Apply this diff:

-			if field.Kind().String() == "message" {
+			if field.Kind() == protoreflect.MessageKind {

You'll need to add the import:

 import (
 	"fmt"
 	"strings"
 	"testing"
 
 	v2 "github.com/conductorone/baton-sdk/pb/c1/connector/v2"
 	"github.com/stretchr/testify/require"
 	"google.golang.org/protobuf/proto"
+	"google.golang.org/protobuf/reflect/protoreflect"
 )

Based on past review comments, this was previously flagged.

pkg/anonymize/annotations.go (3)

180-208: Handle errors from anypb.New() to prevent silent data loss.

Lines 191 and 203 call anypb.New() but ignore the error return value. Marshaling failures will be silently dropped, potentially corrupting the anonymization output.

Apply this diff to handle errors:

 		entitlementImmutable.ClearMetadata()
-		return anypb.New(entitlementImmutable)
+		anyMsg, err := anypb.New(entitlementImmutable)
+		if err != nil {
+			return nil, err
+		}
+		return anyMsg, nil
 	}
 
 	// ExternalLink - anonymize URL
 	externalLink := &v2.ExternalLink{}
 	if ann.MessageIs(externalLink) {
 		if err := ann.UnmarshalTo(externalLink); err != nil {
 			return nil, err
 		}
 		if externalLink.GetUrl() != "" {
 			externalLink.SetUrl(a.hasher.AnonymizeURL(externalLink.GetUrl()))
 		}
-		return anypb.New(externalLink)
+		anyMsg, err := anypb.New(externalLink)
+		if err != nil {
+			return nil, err
+		}
+		return anyMsg, nil
 	}

94-156: Handle errors from anypb.New() to prevent silent data loss.

Lines 116, 126, 139, and 151 call anypb.New() but ignore the error return value. Marshaling failures will be silently dropped, potentially corrupting the anonymization output.

Apply this diff to handle errors:

 		expandable.SetResourceTypeIds(resourceTypeIDs)
-		return anypb.New(expandable)
+		anyMsg, err := anypb.New(expandable)
+		if err != nil {
+			return nil, err
+		}
+		return anyMsg, nil
 	}
 
 	// GrantMetadata - clear metadata
 	grantMetadata := &v2.GrantMetadata{}
 	if ann.MessageIs(grantMetadata) {
 		if err := ann.UnmarshalTo(grantMetadata); err != nil {
 			return nil, err
 		}
 		grantMetadata.ClearMetadata()
-		return anypb.New(grantMetadata)
+		anyMsg, err := anypb.New(grantMetadata)
+		if err != nil {
+			return nil, err
+		}
+		return anyMsg, nil
 	}
 
 	// GrantImmutable - anonymize source_id, clear metadata
 	grantImmutable := &v2.GrantImmutable{}
 	if ann.MessageIs(grantImmutable) {
 		if err := ann.UnmarshalTo(grantImmutable); err != nil {
 			return nil, err
 		}
 		if grantImmutable.GetSourceId() != "" {
 			grantImmutable.SetSourceId(a.hasher.Hash(grantImmutable.GetSourceId()))
 		}
 		grantImmutable.ClearMetadata()
-		return anypb.New(grantImmutable)
+		anyMsg, err := anypb.New(grantImmutable)
+		if err != nil {
+			return nil, err
+		}
+		return anyMsg, nil
 	}
 
 	// ExternalLink - anonymize URL
 	externalLink := &v2.ExternalLink{}
 	if ann.MessageIs(externalLink) {
 		if err := ann.UnmarshalTo(externalLink); err != nil {
 			return nil, err
 		}
 		if externalLink.GetUrl() != "" {
 			externalLink.SetUrl(a.hasher.AnonymizeURL(externalLink.GetUrl()))
 		}
-		return anypb.New(externalLink)
+		anyMsg, err := anypb.New(externalLink)
+		if err != nil {
+			return nil, err
+		}
+		return anyMsg, nil
 	}

32-70: Handle errors from anypb.New() to prevent silent data loss.

Lines 42 and 54 call anypb.New() but ignore the error return value. Marshaling failures will be silently dropped, potentially corrupting the anonymization output or losing annotations.

Apply this diff to handle errors:

 		if childRT.GetResourceTypeId() != "" {
 			childRT.SetResourceTypeId(a.hasher.AnonymizeResourceType(childRT.GetResourceTypeId()))
 		}
-		return anypb.New(childRT)
+		anyMsg, err := anypb.New(childRT)
+		if err != nil {
+			return nil, err
+		}
+		return anyMsg, nil
 	}
 
 	// ExternalLink - anonymize URL
 	externalLink := &v2.ExternalLink{}
 	if ann.MessageIs(externalLink) {
 		if err := ann.UnmarshalTo(externalLink); err != nil {
 			return nil, err
 		}
 		if externalLink.GetUrl() != "" {
 			externalLink.SetUrl(a.hasher.AnonymizeURL(externalLink.GetUrl()))
 		}
-		return anypb.New(externalLink)
+		anyMsg, err := anypb.New(externalLink)
+		if err != nil {
+			return nil, err
+		}
+		return anyMsg, nil
 	}
pkg/anonymize/anonymize.go (1)

24-28: Document security implications of the default salt.

The hardcoded default salt "baton-anonymize-default-salt" is publicly visible in the repository. Anyone with knowledge of the codebase can reverse the anonymization by recomputing hashes. This severely undermines anonymization for production use cases.

Add documentation warning users about this limitation:

-// defaultConfig returns a Config with sensible default values.
+// defaultConfig returns a Config with default values suitable for development/testing only.
+// WARNING: The default salt is publicly visible in the codebase and should NOT be used
+// for production or sensitive data. Generate a unique, cryptographically random salt
+// (e.g., using crypto/rand) for each anonymization context and store it securely.
 func defaultConfig() Config {
 	return Config{
-		Salt: "baton-anonymize-default-salt",
+		Salt: "baton-anonymize-default-salt", // DO NOT USE IN PRODUCTION
 	}
 }

Consider adding a package-level comment or README with guidance on generating secure salts.

🧹 Nitpick comments (3)
pkg/anonymize/anonymize.go (3)

89-93: Fix documentation to match implementation.

The comment says "using the full hash digest" but the implementation calls HashN(email, 32), truncating to 32 characters, not the full 64-character digest.

-// AnonymizeEmail generates an anonymized email address using the full hash digest, split around an @ symbol.
+// AnonymizeEmail generates an anonymized email address using a 32-character hash, split around an @ symbol.
+// Output format: 16 hex chars + "@" + 16 hex chars (total 33 characters).
 func (h *Hasher) AnonymizeEmail(email string) string {
 	e := h.HashN(email, 32)
 	return e[:len(e)/2] + "@" + e[len(e)/2:]
 }

100-108: Fix documentation to match implementation.

The comments say "using the full hash digest" but both methods use HashN with max(32, len(input)), which is not the full 64-character digest.

-// AnonymizeLogin generates an anonymized login/username using the full hash digest.
+// AnonymizeLogin generates an anonymized login/username with a minimum length of 32 characters.
+// Uses the input length if greater than 32, otherwise uses 32.
 func (h *Hasher) AnonymizeLogin(login string) string {
 	return h.HashN(login, max(32, len(login)))
 }
 
-// AnonymizeEmployeeID generates an anonymized employee ID using the full hash digest.
+// AnonymizeEmployeeID generates an anonymized employee ID with a minimum length of 32 characters.
+// Uses the input length if greater than 32, otherwise uses 32.
 func (h *Hasher) AnonymizeEmployeeID(empID string) string {
 	return h.HashN(empID, max(32, len(empID)))
 }

110-118: Fix documentation to accurately describe length preservation.

The comments claim "preserving the original length" but the implementation uses max(32, len(input)), which means inputs shorter than 32 characters will be padded to 32, not preserved at their original length.

-// AnonymizeResourceID generates an anonymized resource ID, preserving the original length.
+// AnonymizeResourceID generates an anonymized resource ID with a minimum length of 32 characters.
+// Uses the input length if greater than 32, otherwise uses 32.
 func (h *Hasher) AnonymizeResourceID(resourceID string) string {
 	return h.HashN(resourceID, max(32, len(resourceID)))
 }
 
-// AnonymizeExternalID generates an anonymized external ID, preserving the original length.
+// AnonymizeExternalID generates an anonymized external ID with a minimum length of 32 characters.
+// Uses the input length if greater than 32, otherwise uses 32.
 func (h *Hasher) AnonymizeExternalID(externalID string) string {
 	return h.HashN(externalID, max(32, len(externalID)))
 }
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between decf8f6 and 480356d.

📒 Files selected for processing (17)
  • pkg/anonymize/annotations.go (1 hunks)
  • pkg/anonymize/anonymize.go (1 hunks)
  • pkg/anonymize/anonymize_test.go (1 hunks)
  • pkg/anonymize/entitlement.go (1 hunks)
  • pkg/anonymize/entitlement_test.go (1 hunks)
  • pkg/anonymize/field_coverage_test.go (1 hunks)
  • pkg/anonymize/grant.go (1 hunks)
  • pkg/anonymize/grant_test.go (1 hunks)
  • pkg/anonymize/misc.go (1 hunks)
  • pkg/anonymize/misc_test.go (1 hunks)
  • pkg/anonymize/processor.go (1 hunks)
  • pkg/anonymize/processor_test.go (1 hunks)
  • pkg/anonymize/resource.go (1 hunks)
  • pkg/anonymize/resource_test.go (1 hunks)
  • pkg/anonymize/table_coverage_test.go (1 hunks)
  • pkg/dotc1z/anonymize_helpers.go (1 hunks)
  • pkg/dotc1z/sql_helpers.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • pkg/anonymize/resource_test.go
  • pkg/anonymize/resource.go
  • pkg/anonymize/misc_test.go
  • pkg/anonymize/anonymize_test.go
  • pkg/anonymize/entitlement_test.go
🧰 Additional context used
🧬 Code graph analysis (7)
pkg/anonymize/processor_test.go (2)
pkg/anonymize/anonymize.go (2)
  • New (38-47)
  • Config (16-21)
pkg/dotc1z/c1file.go (2)
  • NewC1ZFile (174-210)
  • WithReadOnly (158-162)
pkg/anonymize/grant.go (1)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (31-35)
pkg/anonymize/misc.go (1)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (31-35)
pkg/anonymize/processor.go (3)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (31-35)
pkg/dotc1z/c1file.go (3)
  • NewC1ZFile (174-210)
  • WithReadOnly (158-162)
  • C1File (36-54)
pkg/connectorstore/connectorstore.go (2)
  • SyncType (11-11)
  • SyncTypeFull (14-14)
pkg/anonymize/table_coverage_test.go (1)
pkg/dotc1z/sql_helpers.go (1)
  • AllTableNames (45-51)
pkg/anonymize/entitlement.go (1)
pkg/anonymize/anonymize.go (1)
  • Anonymizer (31-35)
pkg/anonymize/annotations.go (1)
pkg/anonymize/anonymize.go (2)
  • Anonymizer (31-35)
  • New (38-47)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: go-test (1.25.2, windows-latest)
🔇 Additional comments (13)
pkg/anonymize/grant_test.go (1)

1-179: LGTM! Comprehensive test coverage for grant anonymization.

The test suite properly covers:

  • Basic field anonymization with length validations
  • GrantSources key anonymization
  • Nil input handling (grant, entitlement, principal)
  • Deterministic behavior across identical inputs

The tests validate both top-level and nested field anonymization, ensuring no PII leakage.

pkg/anonymize/entitlement.go (1)

7-55: LGTM! Well-structured entitlement anonymization.

The implementation correctly:

  • Performs nil checks before processing
  • Anonymizes display name, description, slug, and ID appropriately
  • Recursively anonymizes embedded resource and grantable_to resource types
  • Propagates errors from recursive calls
  • Leaves enum values (purpose) unchanged as expected

The in-place modification pattern is consistent with other anonymization methods in the package.

pkg/dotc1z/sql_helpers.go (1)

42-51: LGTM! Clean implementation of table name accessor.

This provides a single source of truth for table names, used by tests to ensure all tables have anonymization coverage. The implementation is straightforward and serves its purpose well.

pkg/anonymize/processor_test.go (1)

1-360: LGTM! Comprehensive end-to-end test suite.

The test suite provides excellent coverage of the anonymization workflow:

  • End-to-end verification with complete data validation
  • Default output path behavior
  • Deterministic anonymization with consistent salts
  • Salt variance validation
  • Error handling for invalid inputs

The helper functions properly use t.Helper() and create appropriate test fixtures. The verification logic checks anonymization at multiple levels (resource types, resources, entitlements, grants) and validates both field transformations and length constraints.

Note: Past review comments mentioned fmt.Printf violations, but these appear to be resolved as no such calls are present in the current code.

pkg/anonymize/misc.go (2)

7-37: LGTM! Resource type anonymization is correctly implemented.

The method properly:

  • Performs nil checks
  • Anonymizes ID using the same hashing as ResourceId.resource_type for consistency
  • Anonymizes display name and description appropriately
  • Delegates annotation processing with error propagation
  • Leaves traits (enum) and sourced_externally (boolean) unchanged as documented

39-50: LGTM! Clear helper methods with appropriate defaults.

Both ShouldDeleteAssets and ShouldClearSessionStore return true with clear documentation explaining why assets and session data should be cleared during anonymization.

pkg/anonymize/grant.go (1)

7-43: LGTM! Grant anonymization properly handles all components.

The implementation correctly:

  • Performs nil checks before processing
  • Anonymizes grant ID
  • Recursively anonymizes embedded entitlement and principal resources
  • Delegates to helper for sources anonymization
  • Processes grant annotations with error propagation

The in-place modification pattern is consistent with other anonymization methods.

pkg/dotc1z/anonymize_helpers.go (1)

1-91: LGTM! Well-implemented data clearing helpers.

All three methods follow a consistent pattern:

  • Proper tracing with span creation
  • Database validation before operations
  • Clean SQL construction and execution
  • Correct error propagation
  • Setting dbUpdated flag after successful operations

The ClearSyncRunTimestamps method correctly uses a fixed epoch (time.Unix(0, 0)) for deterministic anonymization, ensuring repeated anonymization produces identical results.

pkg/anonymize/table_coverage_test.go (1)

1-57: LGTM! Excellent future-proofing with table coverage tests.

The bidirectional validation (all tables covered + no orphaned handlers) is a solid pattern that will catch schema additions early. The descriptive map entries provide clear documentation of how each table is handled.

pkg/anonymize/processor.go (2)

27-69: LGTM! Clean error handling and resource cleanup.

The method properly handles file lifecycle, cleanup on errors (lines 49-51, 56-58, 64-65), and returns comprehensive statistics. The output path defaulting logic is a nice convenience.


138-284: LGTM! Consistent pagination pattern across all entity types.

The four process* methods follow a clean, consistent pattern with proper pagination handling, error propagation, and batched writes. The 1000-item page size is reasonable for memory efficiency.

pkg/anonymize/anonymize.go (2)

37-47: LGTM! Clean constructor with salt validation.

The panic on empty salt is appropriate for a required configuration parameter, preventing silent failures. Capturing a single timestamp at initialization ensures consistency across all anonymized records.


72-87: LGTM! Deterministic HMAC-based hashing implementation.

The use of HMAC-SHA256 provides cryptographic strength while maintaining determinism. The HashN truncation logic correctly handles edge cases where the requested length exceeds the hash length.

Comment on lines 73 to 111
func (a *Anonymizer) processC1File(ctx context.Context, input *dotc1z.C1File, output *dotc1z.C1File, stats *ProcessorStats) error {
// List all sync runs from the input file
// pageToken := ""
// for {
latestSync, err := input.GetLatestFinishedSync(ctx, reader_v2.SyncsReaderServiceGetLatestFinishedSyncRequest_builder{
SyncType: string(connectorstore.SyncTypeFull),
}.Build())
if err != nil {
return fmt.Errorf("failed to get latest finished sync: %w", err)
}

// Set the input file to view this specific sync
if err := input.ViewSync(ctx, latestSync.GetSync().GetId()); err != nil {
return fmt.Errorf("failed to set view sync: %w", err)
}

// Start a new sync in the output file with the same type
syncType := connectorstore.SyncType(latestSync.GetSync().GetSyncType())
if _, err := output.StartNewSync(ctx, syncType, ""); err != nil {
return fmt.Errorf("failed to start sync in output file: %w", err)
}

err = a.processSyncData(ctx, input, output, stats)
if err != nil {
return fmt.Errorf("failed to process sync data: %w", err)
}

// End the sync in the output file
if err := output.EndSync(ctx); err != nil {
return fmt.Errorf("failed to end sync in output file: %w", err)
}

// Note: Assets and sessions are NOT copied to the output file.
// The output file starts fresh without these potentially identifying data.
stats.AssetsDeleted = true
stats.SessionsCleared = true

return nil
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Remove commented-out code or clarify the single-sync design.

Lines 75-76 contain commented-out code suggesting iteration over multiple syncs, but the implementation only processes the latest finished sync. This creates ambiguity about the intended design.

If processing only the latest sync is intentional, remove the commented code and update the function documentation to clarify this behavior:

 // processC1File processes all data from the input file and writes anonymized data to the output file.
-// It iterates through each sync run in the input and creates a corresponding sync in the output.
+// It processes the latest finished full sync from the input and creates a corresponding sync in the output.
 func (a *Anonymizer) processC1File(ctx context.Context, input *dotc1z.C1File, output *dotc1z.C1File, stats *ProcessorStats) error {
-	// List all sync runs from the input file
-	// pageToken := ""
-	// for {
 	latestSync, err := input.GetLatestFinishedSync(ctx, reader_v2.SyncsReaderServiceGetLatestFinishedSyncRequest_builder{

If multiple syncs should be processed, implement the loop properly.

Committable suggestion skipped: line range outside the PR's diff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants