I'm posting this as a discussion rather than an issue because I haven't been able to reproduce this bug reliably.
Summary
When many records are created quickly on one device, handleFetchedRecordZoneChanges on a receiving device can process child records before their parent records have been upserted. Child records that hit an FK constraint failure are correctly added to unsyncedRecordIDs for retry. However, when a parent record fails to upsert, withErrorReporting swallows the error and the parent is never queued for retry. The result is a permanent deadlock: the child records are retried forever, but their parents never exist locally.
Steps to Reproduce
1. Set up two devices that sync via CKSyncEngine.
2. On Device A, rapidly create many parent records, each with child records that reference the parent through a foreign key.
3. Observe Device B.
Expected Behavior
All records appear on Device B.
Actual Behavior
Some child records, along with their own dependents, remain permanently stuck in unsyncedRecordIDs. The parent records they reference exist in sqlitedata_icloud_metadata but not in the user database. Restarting the app does not recover the stuck records.
Root Cause Analysis
The issue involves three related failure modes reached from handleFetchedRecordZoneChanges:
1. Parent records fail silently
In upsertFromServerRecord(_:db:) (line 1850), the entire body is wrapped in withErrorReporting, which reports errors but does not rethrow them. When a parent record fails to upsert, it is therefore never added to unsyncedRecordIDs. It exists in the sync metadata but not in the user database, so it is effectively lost.
2. Child records are added to the retry queue but can never succeed
When a child record hits an FK constraint violation (lines 1918-1930), it is correctly added to unsyncedRecordIDs. On subsequent sync cycles, however, the retry logic (lines 1473-1522) re-fetches the child from CloudKit and tries to upsert it again. The upsert fails again because the parent still doesn't exist locally, and the parent was never queued for retry.
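The interaction between points 1 and 2 can be reproduced in a small, self-contained model. Everything in it is hypothetical: IncomingRecord, LocalStore, simulate, and the injected one-time parent failure only mimic the control flow described above and are not SQLiteData types. The queueAllFailures flag models queueing every failed upsert for retry, not just FK violations.

```swift
// Hypothetical model of the deadlock; none of these types exist in SQLiteData.

struct IncomingRecord {
  let id: String
  let parentID: String?  // foreign key to another record, if any
}

enum UpsertError: Error {
  case foreignKeyViolation
  case other  // stands in for whatever made the parent's upsert fail
}

final class LocalStore {
  var rows: Set<String> = []               // records present in the user database
  var unsyncedRecordIDs: Set<String> = []  // records queued for retry
  var failOnceIDs: Set<String> = []        // records whose next upsert throws `.other`

  func upsert(_ record: IncomingRecord) throws {
    if failOnceIDs.remove(record.id) != nil { throw UpsertError.other }
    if let parentID = record.parentID, !rows.contains(parentID) {
      throw UpsertError.foreignKeyViolation
    }
    rows.insert(record.id)
  }

  // FK violations are always queued. Other errors are swallowed, as with
  // withErrorReporting, unless `queueAllFailures` is true.
  func process(_ record: IncomingRecord, queueAllFailures: Bool) {
    do {
      try upsert(record)
      unsyncedRecordIDs.remove(record.id)
    } catch UpsertError.foreignKeyViolation {
      unsyncedRecordIDs.insert(record.id)
    } catch {
      if queueAllFailures { unsyncedRecordIDs.insert(record.id) }
    }
  }
}

func simulate(queueAllFailures: Bool, cycles: Int) -> LocalStore {
  let parent = IncomingRecord(id: "parent", parentID: nil)
  let child = IncomingRecord(id: "child", parentID: "parent")
  let serverRecords = [parent.id: parent, child.id: child]  // what a re-fetch returns
  let store = LocalStore()
  store.failOnceIDs = [parent.id]

  // The initial batch arrives child-first.
  for record in [child, parent] {
    store.process(record, queueAllFailures: queueAllFailures)
  }
  // Each later sync cycle retries only what is in unsyncedRecordIDs.
  for _ in 0..<cycles {
    for id in store.unsyncedRecordIDs.sorted() {
      if let record = serverRecords[id] {
        store.process(record, queueAllFailures: queueAllFailures)
      }
    }
  }
  return store
}
```

With queueAllFailures set to false, the child stays in unsyncedRecordIDs no matter how many cycles run and the parent never reaches the user database. With it set to true, the parent is retried on the next cycle and the child succeeds on the cycle after that.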
3. The retry path also has a silent failure mode
If the retry fetch from CloudKit fails, the record is silently skipped:
    case .failure:
      continue  // lines 1515-1516
The record stays in unsyncedRecordIDs, but the failure is never reported and the record is never successfully retried.
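A minimal sketch of a retry loop that keeps a reason for every record it could not retry, instead of a bare continue. The fetch and upsert closures are hypothetical stand-ins for the CloudKit re-fetch and the local upsert, records are modeled as plain strings, and the real retry path in SQLiteData may be structured quite differently.

```swift
// Hypothetical sketch: the closures stand in for the CloudKit re-fetch and local upsert.

enum FetchError: Error, Equatable {
  case networkUnavailable
  case unknownItem
}

enum PendingReason: Equatable {
  case fetchFailed(FetchError)
  case upsertFailed
}

struct RetryReport {
  var upserted: [String] = []
  var stillPending: [String: PendingReason] = [:]  // stays queued, with a reason
}

func retryUnsynced(
  _ ids: [String],
  fetch: (String) -> Result<String, FetchError>,
  upsert: (String) -> Bool
) -> RetryReport {
  var report = RetryReport()
  for id in ids {
    switch fetch(id) {
    case .success(let record):
      if upsert(record) {
        report.upserted.append(id)
      } else {
        report.stillPending[id] = .upsertFailed
      }
    case .failure(let error):
      // Instead of a bare `continue`, keep the ID pending and remember why.
      report.stillPending[id] = .fetchFailed(error)
    }
  }
  return report
}
```

The returned report could feed whatever logging or issue reporting the app already uses, so a record that is stuck for a recurring reason shows up instead of disappearing from view.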
Contributing Factor
The modifications array is sorted topologically at line 1524, but only after merging with previously unsynced records. The initial delivery from CKSyncEngine is not guaranteed to arrive in topological order, so parent records may appear after their children in the batch.
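Ordering a single batch so that parents come before their children is a standard topological sort. The sketch below uses Kahn's algorithm over foreign-key edges. The Modification type is hypothetical (real records would need their foreign keys derived from the schema), IDs are assumed unique within a batch, parents that are not in the batch are assumed to exist locally already or to be handled by retry, and records left over because of a reference cycle are appended in their original order rather than dropped.

```swift
// Hypothetical record shape: an ID plus the IDs it references via foreign keys.
struct Modification {
  let id: String
  let parentIDs: [String]
}

/// Reorders `mods` so each record follows any of its parents in the same batch.
func topologicallySorted(_ mods: [Modification]) -> [Modification] {
  let ids = Set(mods.map(\.id))
  var indegree: [String: Int] = [:]       // in-batch parents not yet emitted
  var children: [String: [String]] = [:]  // parent ID -> child IDs in this batch

  for m in mods {
    // Only edges to parents inside this batch constrain the order.
    let inBatchParents = Set(m.parentIDs).intersection(ids).subtracting([m.id])
    indegree[m.id] = inBatchParents.count
    for parentID in inBatchParents {
      children[parentID, default: []].append(m.id)
    }
  }

  let byID = Dictionary(mods.map { ($0.id, $0) }, uniquingKeysWith: { first, _ in first })
  var queue = mods.map(\.id).filter { indegree[$0] == 0 }
  var result: [Modification] = []
  var emitted: Set<String> = []
  var head = 0

  while head < queue.count {
    let id = queue[head]
    head += 1
    guard emitted.insert(id).inserted, let m = byID[id] else { continue }
    result.append(m)
    for childID in children[id, default: []] {
      indegree[childID, default: 0] -= 1
      if indegree[childID] == 0 { queue.append(childID) }
    }
  }

  // Anything still unemitted is part of a cycle; keep it in original order.
  for m in mods where !emitted.contains(m.id) {
    result.append(m)
    emitted.insert(m.id)
  }
  return result
}
```

For a batch delivered as child, grandchild, parent, plus an unrelated record whose parent is already local, this yields parent, the unrelated record, child, then grandchild.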
Possible Fixes
Add failed parent records to unsyncedRecordIDs: When upsertFromServerRecord fails for any reason (not just FK constraints), add the record to the retry queue so it isn't silently lost.
Retry from local metadata instead of CloudKit: The _lastKnownServerRecordAllFields blob in sqlitedata_icloud_metadata already contains the full record data. The retry could upsert from this local copy instead of re-fetching from CloudKit, avoiding the silent case .failure: continue path.
Ensure topological ordering of the initial CKSyncEngine delivery: Sort the incoming modifications before processing, not just after merging with unsynced records.
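The second fix could look roughly like this: retry directly from a locally cached snapshot, and repeat passes while at least one record succeeds, so a stuck parent and its child both resolve regardless of the order in which they are visited. JSON-encoded RecordSnapshot values are a hypothetical stand-in for the _lastKnownServerRecordAllFields blob; this sketch does not assume anything about how SQLiteData actually encodes that column.

```swift
import Foundation

// Hypothetical stand-in for the data cached in sync metadata.
struct RecordSnapshot: Codable {
  let id: String
  let parentID: String?
  let title: String
}

enum RetryError: Error {
  case missingSnapshot(String)
}

/// Retries `unsynced` from locally cached snapshots instead of re-fetching.
/// Passes repeat while at least one record succeeds.
/// Returns the failure reason for every ID still left in `unsynced`.
func retryFromLocalMetadata(
  unsynced: inout Set<String>,
  metadata: [String: Data],
  upsert: (RecordSnapshot) throws -> Void
) -> [String: Error] {
  let decoder = JSONDecoder()
  var failures: [String: Error] = [:]
  var madeProgress = true
  while madeProgress && !unsynced.isEmpty {
    madeProgress = false
    failures = [:]  // only the last pass's failures are reported
    for id in unsynced.sorted() {
      do {
        guard let blob = metadata[id] else { throw RetryError.missingSnapshot(id) }
        let snapshot = try decoder.decode(RecordSnapshot.self, from: blob)
        try upsert(snapshot)
        unsynced.remove(id)
        madeProgress = true
      } catch {
        failures[id] = error  // still queued, but the reason is kept
      }
    }
  }
  return failures
}
```

Because no network round-trip is involved, there is no fetch failure to skip; a record that still cannot be upserted (for example, one with no cached snapshot) remains queued and is returned with its reason.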
Workaround
Calling deleteLocalData() or deleting and reinstalling the app forces a full re-sync from CloudKit, which recovers the stuck records.