Crème’s data storage adventures

Jul 16, 2011

The first version of Crème used Core Data. Part of the reason was that I simply wanted to learn the framework, and I’m glad I did. It’s a really powerful approach and I’m sure I’ll use it in my next desktop applications, if I ever make any.

For Crème, though, it was a bad choice, made worse by my eagerness to model everything as full-blown objects, even when not really necessary. I made a pretty elaborate schema, when in reality I could have done what I do now — keep most data as JSON blobs and keep only the indexes and metadata in some relational form. On devices, large complex tables and large record sizes mean complexity and slower performance. Core Data does many smart things to get around that, but I still hit the wall at one point.

A big problem I had was with multithreading. I would put objects into Core Data on a background thread and signal the main thread, which would try to read them… and they simply weren’t there. Now, there are warnings all over the docs that threads are hard to begin with, and with Core Data even more so. I’m not blaming the system; I’m sure it worked correctly. I was simply unable to bend it completely to my will.

Brent Simmons of NetNewsWire has an excellent post that sums up many of my troubles. I’ll add that you can’t create multicolumn indexes in Core Data; you can only index individual properties. So with more complex predicates, performance goes down.

So, I figured I needed alternatives, and asked myself: what’s the OPPOSITE of Core Data? What’s the simplest way to store data? Forget any SQL and such.

Cocoa has an easy answer: you can marshal and unmarshal Foundation objects (like NSDictionary, NSArray…) to disk. And since you can quite easily filter arrays of dictionaries with keypath predicates, you have a rudimentary database. I was curious to see how far I would get with this approach, and started by structuring the data correctly. All the full objects (tweets, DMs) are stored as separate JSON blobs, and I construct simple separate indexes for the metadata (e.g. tweet IDs ordered by time for a given view).
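To make that layout concrete, here is a minimal sketch of the idea, written in Python rather than Cocoa to keep it short. It is not Crème’s actual code: the file names, the tweet fields and the save_tweet / load_timeline helpers are all made up for illustration.

```python
import json
import tempfile
from pathlib import Path

INDEX_FILE = "timeline_index.json"  # hypothetical name for one view's index


def save_tweet(root: Path, tweet: dict) -> None:
    """Store the tweet as its own JSON blob and record it in the view index."""
    (root / f"{tweet['id']}.json").write_text(json.dumps(tweet))

    index_path = root / INDEX_FILE
    index = json.loads(index_path.read_text()) if index_path.exists() else []
    index.append({"id": tweet["id"], "created_at": tweet["created_at"]})
    # The index holds only IDs and timestamps, ordered newest first.
    index.sort(key=lambda entry: entry["created_at"], reverse=True)
    index_path.write_text(json.dumps(index))


def load_timeline(root: Path, limit: int) -> list:
    """Read the index, then open one blob file per tweet."""
    index = json.loads((root / INDEX_FILE).read_text())
    return [
        json.loads((root / f"{entry['id']}.json").read_text())
        for entry in index[:limit]
    ]


# Small demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for tweet_id, created_at in [(1, 100), (2, 300), (3, 200)]:
        save_tweet(root, {"id": tweet_id, "created_at": created_at,
                          "text": f"tweet {tweet_id}"})
    newest = load_timeline(root, limit=2)
```

Note that load_timeline opens one file per tweet, and save_tweet rewrites the whole index on every call. That is fine for a handful of objects and convenient to inspect by hand.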

This simple object marshaling actually worked quite well, but performance started to suck quite quickly. Disk access is really expensive on devices, and when you need to open and close a lot of files, it has massive overhead.

The third and final approach was a tried and true SQLite database over the FMDB bridge. I hadn’t written raw SQL in a while, and it brought back nice memories of my earlier web developer days. A properly structured and indexed database on a device can work wonders. As Simmons says, you can examine the query plans and tweak the indexes until you don’t see any full scans for common operations any more. Not so with Core Data, which is basically a black box that you can’t really examine and tweak.
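As a rough illustration of checking query plans, here is a sketch using Python’s built-in sqlite3 module instead of FMDB. The schema is an assumption for the sake of the example (a tweets table with the JSON blob in a body column), not Crème’s real one, and the query_plan helper is hypothetical. The index covers two columns, which is exactly the kind of multicolumn index mentioned earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tweets (
        id         INTEGER PRIMARY KEY,
        view       TEXT    NOT NULL,            -- which timeline this row belongs to
        created_at INTEGER NOT NULL,
        is_read    INTEGER NOT NULL DEFAULT 0,
        body       TEXT    NOT NULL             -- the full tweet as a JSON blob
    );
    -- Multicolumn index: filter by view, then walk rows in time order.
    CREATE INDEX idx_tweets_view_time ON tweets (view, created_at);
""")


def query_plan(sql, params=()):
    """Return the human-readable steps SQLite plans to take for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The description of each step is the last column of every row.
    return " | ".join(row[-1] for row in rows)


# A common operation: newest tweets for one view. Should use the index.
timeline_plan = query_plan(
    "SELECT body FROM tweets WHERE view = ? ORDER BY created_at DESC LIMIT 50",
    ("home",),
)

# A query with no supporting index: expect a full table scan.
unread_plan = query_plan("SELECT body FROM tweets WHERE is_read = 0")

print(timeline_plan)
print(unread_plan)
```

The exact wording of the plan differs between SQLite versions, but the thing to look for is the same: SEARCH with the index name for the common query, and SCAN for the one that needs another index or a rethink.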

One caveat is that FMDB’s docs are kind of lacking. But the source and headers are not that long or complicated, so you can just look at those, or at the rudimentary bundled example.

When you pull down in the Crème view to update your data, you’ll notice that it’s doing “housekeeping.” This means removing older read tweets from the indexes to keep the app fairly snappy. Crème stores a few hundred tweets per page. So if you’ve added twenty pages to Crème, theoretically we’d be talking about 4,000 objects. This works fine.

Out of curiosity, I initially did not have the housekeeping code and let the database grow as much as it liked, to see what impact that would have on performance. It grew to a bit above 20,000 objects. Everything still functioned fine, but some of the loading delays became too long for my liking, upwards of several seconds. Hence the housekeeping.
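A housekeeping pass of this kind can be sketched in a few lines of SQL. Again this uses Python’s sqlite3 and an assumed tweets table rather than Crème’s real schema, and the housekeep function and its keep parameter are hypothetical. The rule here: within a view, delete tweets that are already read and fall outside the newest few.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tweets (
        id         INTEGER PRIMARY KEY,
        view       TEXT    NOT NULL,
        created_at INTEGER NOT NULL,
        is_read    INTEGER NOT NULL DEFAULT 0
    )
""")


def housekeep(conn, view, keep):
    """Delete read tweets in `view` that are not among its newest `keep` rows."""
    cursor = conn.execute(
        """
        DELETE FROM tweets
        WHERE view = ?
          AND is_read = 1
          AND id NOT IN (
              SELECT id FROM tweets
              WHERE view = ?
              ORDER BY created_at DESC
              LIMIT ?
          )
        """,
        (view, view, keep),
    )
    conn.commit()
    return cursor.rowcount  # number of rows removed


# Demo: ten tweets, the six oldest already read.
for ts in range(1, 11):
    conn.execute(
        "INSERT INTO tweets (id, view, created_at, is_read) VALUES (?, ?, ?, ?)",
        (ts, "home", ts, 1 if ts <= 6 else 0),
    )
conn.commit()

deleted = housekeep(conn, "home", keep=3)
remaining = [row[0] for row in conn.execute("SELECT id FROM tweets ORDER BY id")]
```

In the demo, tweet 7 is older than the newest three but survives because it is still unread; only read tweets get removed.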

One thing not to be under-appreciated is the developer experience: how easy it is to “get to the metal” and poke around in each of the data objects. With Core Data, you’re effectively working with a black box. You have neither visibility into nor control over how it really works. Yes, you can debug the SQL, and you can poke around the constructed SQLite database, but you feel dirty, as if you’re not supposed to be doing that, kind of like peeking at your older brother’s girlie magazines when he’s not looking.

Using Foundation objects marshaled to/from disk, while terrible from a runtime performance perspective, is delightful when set up correctly. When you keep Finder open next to Simulator, you can see the data objects appearing and disappearing on disk in real time. When you’re interested, you can open any of them directly in a text editor and see if they were saved or loaded correctly. When there’s a bug, it’s easy to spot and trace.

SQLite is similar to that, but you’ll need a tool. The command-line tool works, but all that typing is a waste of time. I don’t know of any good free SQLite GUIs for the Mac, so I got Base from the Mac App Store and it worked great.

None of this helps when working on a device though. As best I can tell, there’s no way to easily poke around the device’s filesystem as a developer if you haven’t jailbroken anything, even just to explore your app’s guts. You just have to get it working on Simulator and hope for the best. I hope Apple is considering some kind of developer/debugging bridge to the device’s file system for this kind of storage debugging in future versions of Xcode and iOS.