Saving Data Safely

With the advent of Gingerbread, we’re going to be running a series of posts in this space about the aspects of Android 2.3 that developers should care about. One thing that developers should care about more than anything else is not losing data. The rules are changing slightly as Gingerbread arrives, so I thought that would be a good starting point. I didn’t write this; I pulled it together from the contents of an email thread involving Android engineers Brad Fitzpatrick, Dianne Hackborn, Brian Swetland, and Chris Tate.

The question is: how do you make really sure your data’s been written to persistent storage? The answer involves a low-level system call named fsync(). Old C programmers like me mostly learned this the hard way back in the Bad Old Days; in 2008 at OSCON I immensely enjoyed Eat My Data: How Everybody Gets File IO Wrong by Stewart Smith; I’ve included a picture I took of one of his slides.

The reason this should be of concern to Android developers is that with 2.3, an increasing proportion of devices, notably including the Nexus S, are going to be moving from YAFFS to the ext4 filesystem, which buffers much more aggressively; thus you need to be more assertive about making sure your data gets to permanent storage when you want it to.

If you just use SharedPreferences or SQLite, you can relax, because we’ve made sure they Do The Right Thing about buffering. But if you have your own on-disk format, keep in mind that your data doesn’t actually consistently reach the flash chip when you write() it or even when you close() it. There are several layers of buffering between you and the hardware! And because of ext4 buffering policy, any POSIX guarantees that you thought you had before (but actually didn’t), you especially don’t have now.

Some Android devices are already running non-YAFFS filesystems, but as we brought up the Nexus S, buffering issues have actually bitten us a couple of times in framework code. When the Gingerbread source code becomes available, you’ll find lots of examples of how file I/O should be done.

To start with, for raw data consider using one of the synchronous modes of java.io.RandomAccessFile, which take care of calling fsync() for you in the appropriate way. If you can’t, you’ll want Java code that looks something like this.

     public static boolean sync(FileOutputStream stream) {
         try {
             if (stream != null) {
                 stream.getFD().sync();
             }
             return true;
         } catch (IOException e) {
         }
         return false;

In some applications, you might even want to check the return status of the close() call.

Now of course, there are two sides to this story. When you call fsync() and force the data onto storage, that can be slow; worse, unpredictably slow. So be sure to call it when you need it, but be careful not to call it carelessly.

Background reading if you want to know more:

A Googly MySQL Cluster Talk

Google TechTalks April 28, 2006 Stewart Smith Stewart Smith works for MySQL AB as a software engineer working on MySQL Cluster. He is an active member of the free and open source software community, especially in Australia. ABSTRACT Part 1 – Introduction to MySQL Cluster The NDB storage engine (MySQL Cluster) is a high-availability storage engine for MySQL.

http://www.youtube.com/v/HJ930zMk96U?f=videos&app=youtube_gdata