Data Migration with NSPersistentCloudKitContainer

Data migration with production user data is like repairing a running train. With NSPersistentCloudKitContainer, it is like repairing multiple running trains at the same time.

In the latest version of Percento, I refactored the schema and the way I compute transaction entries, so I had to recompute every user's existing entries. The migration starts by fetching the entries that still need it:

// Fetches every stock, fund, and crypto entry that was last written by an
// app version older than 2.7.5, oldest first.
private func getAllEntries(context: NSManagedObjectContext) -> [Entry] {
    let request = NSFetchRequest<Entry>(entityName: Entry.entityName())

    // Only these three account types are affected by the new computation.
    let stock = NSPredicate(format: "account.typeValue == 202")
    let fund = NSPredicate(format: "account.typeValue == 205")
    let crypto = NSPredicate(format: "account.typeValue == 206")
    let accountType = NSCompoundPredicate(orPredicateWithSubpredicates: [stock, fund, crypto])

    // Entries below this version number have not been migrated yet.
    let version = NSPredicate(format: "clientVersionNumber < 2070500") // 2.7.5

    request.predicate = NSCompoundPredicate(andPredicateWithSubpredicates: [accountType, version])
    request.sortDescriptors = [
        NSSortDescriptor(key: "date", ascending: true)
    ]

    // Run the fetch on the context's own queue; a failed fetch yields an empty array.
    var results: [Entry]?
    context.performAndWait {
        results = try? context.fetch(request)
    }
    return results ?? []
}

Long story short, I soon received reports from users that their historical data was wrong. From the way the data looked, the migration seemed to have been performed twice.

I had added the clientVersionNumber column precisely to tell whether an entry had already been migrated, so this shouldn't have happened at all, and I had tested the migration on my own device multiple times. That only a small set of users hit the problem confused me even more.
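
For reference, the threshold 2070500 in the predicate corresponds to version 2.7.5. One encoding that produces this number is sketched here. The helper name encodeClientVersion and the digit layout (two decimal digits each for minor and patch, plus an unused trailing pair) are assumptions for illustration, inferred only from the "2070500 // 2.7.5" comment, not Percento's actual code.

```swift
import Foundation

/// Hypothetical helper: packs "major.minor.patch" into a single integer so
/// that versions can be compared with `<` in a predicate.
/// Assumed layout: major | minor (2 digits) | patch (2 digits) | 00.
func encodeClientVersion(_ version: String) -> Int? {
    let parts = version.split(separator: ".").map { Int($0) }
    guard parts.count == 3,
          let major = parts[0], let minor = parts[1], let patch = parts[2],
          major >= 0, (0..<100).contains(minor), (0..<100).contains(patch)
    else { return nil }  // malformed input or a component that does not fit
    return major * 1_000_000 + minor * 10_000 + patch * 100
}
```

With this layout, any entry written by an older version compares as smaller than the 2.7.5 threshold, which is what the fetch predicate relies on.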

After god-knows-how-many hours, I reproduced the problem myself when I ran the app on my iPad. It turned out that the newly added clientVersionNumber column was 0 for my historical data, even though that data had already synced from the remote.

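This is the part you should pay attention to: after you deploy a new schema in the CloudKit dashboard, devices still running an old version of the app keep fetching the data without the new column. On such a device, entries that another device has already migrated can therefore still show the column's default value, which was 0 in my case, and look as if they had never been migrated.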

It was also an important lesson for me: many users run the same app on multiple devices.

To avoid performing the same migration twice, I now run a CKRecord query manually before calling the migration function. It asks CloudKit for the largest clientVersionNumber stored on any Entry record:

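// Asks CloudKit for the largest clientVersionNumber found on any Entry record.
// NSPersistentCloudKitContainer prefixes record types and fields with "CD_".
// Note: CloudKit can only run this query if recordName is queryable and
// CD_clientVersionNumber is sortable in the schema's indexes.
public func getCloudKitLargestEntryClientVersion(completion: @escaping (_ error: Error?, _ latestEntryclientVersionNumber: Int?) -> Void) {
    // Match every Entry record, sorted so the highest version comes first.
    let query = CKQuery(recordType: "CD_Entry", predicate: NSPredicate(value: true))
    query.sortDescriptors = [NSSortDescriptor(key: "CD_clientVersionNumber", ascending: false)]
    let queryOperation = CKQueryOperation(query: query)
    queryOperation.resultsLimit = 1 // only the top record is needed

    var fetchedRecords: [CKRecord] = []

    let perRecordBlock = { (fetchedRecord: CKRecord) -> Void in
        fetchedRecords.append(fetchedRecord)
    }
    queryOperation.recordFetchedBlock = perRecordBlock

    // The version is nil when no record was returned or the field is missing.
    queryOperation.queryCompletionBlock = { (queryCursor: CKQueryOperation.Cursor?, error: Error?) -> Void in
        completion(error, fetchedRecords.first?.object(forKey: "CD_clientVersionNumber") as? Int)
    }

    CKContainer.default().privateCloudDatabase.add(queryOperation)
}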
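
How the result of that query gates the migration can be sketched as a small pure function. This is a minimal sketch under stated assumptions, not Percento's actual logic: the names MigrationDecision and decideMigration are hypothetical, and treating a failed query as "try again later" rather than "migrate anyway" is a conservative choice made for this example.

```swift
import Foundation

/// Hypothetical result type for this sketch.
enum MigrationDecision: Equatable {
    case run         // no record in CloudKit has reached the target version
    case skip        // some device has already written records at the target version
    case retryLater  // the query failed, so the remote state is unknown
}

/// Hypothetical helper that turns the completion arguments of the CloudKit
/// query into a decision. `targetVersion` would be 2070500 for 2.7.5.
func decideMigration(error: Error?, largestRemoteVersion: Int?, targetVersion: Int) -> MigrationDecision {
    // Assumption: never migrate when the remote state could not be read.
    if error != nil { return .retryLater }
    // Assumption: no record, or a record without the field, means "not migrated".
    guard let remote = largestRemoteVersion else { return .run }
    return remote >= targetVersion ? .skip : .run
}
```

Inside the completion handler of getCloudKitLargestEntryClientVersion, the two arguments would be passed straight into decideMigration, and the local migration would be called only for .run. Keep in mind that the query only shows that at least one record at or above the target version exists in CloudKit; it says nothing about whether every entry has been migrated.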

Posted 2021-06-09

More writing at jakehao.com