Documents List API

 

There are a number of ways to add resources to your Google Documents List using the API. Most commonly, clients need to upload an existing resource, rather than create a new, empty one. Legacy clients may be doing this in an inefficient way. In this post, we’ll walk through why using resumable uploads makes your client more efficient.

The resumable upload process allows your client to send small segments of an upload over time, and confirm that each segment arrived intact. This has a number of advantages.

Resumable uploads have a customizable memory footprint on client systems

Since only one small segment of data is sent to the API at a time, clients can store less data in memory as they send data to the API. For example, consider a client uploading a PDF via a regular, non-resumable upload in a single request. The client might follow these steps:

  1. Open file pointer to PDF
  2. Pass file pointer and PDF to client library
  3. Client library starts request
  4. Client library reads 100,000 bytes and immediately sends 100,000 bytes
  5. Client library repeats until all bytes sent
  6. Client library returns response

But that 100,000 bytes isn’t a customizable value in most client libraries. In some environments, with limited memory, applications need to choose a custom chunk size that is either smaller or larger.

The resumable upload mechanism allows for a custom chunk size. That means that if your application only has 500KB of memory available, you can safely choose a chunk size of 256KB.

Resumable uploads are reliable even though a connection may not be

In the previous example, if any of the bytes fail to transmit, this non-resumable upload fails entirely. This often happens in mobile environments with unreliable connections. Uploading 99% of a file, failing, and restarting the entire upload creates a bad user experience. A better user experience is to resume and upload only the remaining 1%.

Resumable uploads support larger files

Traditional non-resumable uploads via HTTP have size limits depending on both the client and server systems. These limits are not applicable to resumable uploads with reasonable chunk sizes, as individual HTTP requests are sent for each chunk of a file. Since the Documents List API now supports file sizes up to 10GB, this is very important.

Resumable upload support is already in the client libraries for Google Data APIs

The Java, Python, Objective-C, and .NET Google Data API client libraries all include a mechanism by which you can initiate a resumable upload session. Examples of uploading a document with resumable upload using the client libraries is detailed in the documentation. Additionally, the new Documents List API Python client library now uses only the resumable upload mechanism. To use that version, make sure to follow these directions.

Deleting Blank Tiles

 

Creating raster tilesets almost invariably leads to the creation of some blank tiles – covering those areas of space where no features were present in the underlying dataset. Depending on the image format you use for your tiles, and the method you used to create them, those “blank” tiles may be pure white, or some other solid colour, or they may have an alpha channel set to be fully transparent.

Here’s an example of a directory of tiles I just created. In this particular z/x directory, more than half the tiles are blank. Windows explorer shows them as black but that’s because it doesn’t process the alpha channel correctly. They are actually all 256px x 256px PNG images, filled with ARGB (0, 0, 0, 0):

image

What to do with these tiles? Well, there’s two schools of thought:

  • The first is that they should be retained. They are, after all, valid image files that can be retrieved and overlaid on the map. Although they aren’t visually perceptible, the very presence of the file demonstrates that the dataset was tested at this location, and confirms that no features exist there. This provides important metadata about the dataset in itself, and confirms the tile generation process was complete. The blank images themselves are generally small, and so storage is not generally an issue.
  • The opposing school of thought is that they should be deleted. It makes no sense to keep multiple copies of exactly the same, blank tile. If a request is received for a tile that is not present in the dataset, the assumption can be made that it contains no data, and a single, generic blank tile can be returned in all such instances – there is no benefit of returning the specific blank tile associated with that tile request. This not only reduces disk space on the tile server itself, but the client needs only cache a single blank tile that can be re-used in all cases where no data is present.

I can see arguments in favour of both sides. But, for my current project, disk and cache space is at a premium, so I decided I wanted to delete any blank tiles from my dataset. To determine which files were blank, I initially thought of testing the filesize of the image. However, even though I knew that every tile was of a fixed dimension (256px x 256px), an empty tile can still vary in filesize according to the compression algorithm used. Then I thought I could loop through each pixel in the image and use GetPixel() to retrieve the data to see whether the entire image was the same colour, but it turns out that GetPixel() is slooooowwwww….

The best solution I’ve found is to use an unsafe method, BitMap.LockBits to provide direct access to the pixel byte data of the image, and then read and compare the byte values directly. In my case, my image tiles are 32bit PNG files, which use 4 bytes per pixel (BGRA), and my “blank” tiles are completely transparent (Alpha channel = 0). Therefore, in my case I used the following function, which returns true if all the pixels in the image are completely transparent, or false otherwise:

public static Boolean IsEmpty(string imageFileName)
{
  using (Bitmap b = ReadFileAsImage(imageFileName))
  {
    System.Drawing.Imaging.BitmapData bmData = b.LockBits(new Rectangle(0, 0, b.Width, b.Height), System.Drawing.Imaging.ImageLockMode.ReadOnly, b.PixelFormat);
    unsafe
    {
      int PixelSize = 4; // Assume 4Bytes per pixel ARGB
      for (int y = 0; y < b.Height; y++)
      {
        byte* p = (byte*)bmData.Scan0 + (y * bmData.Stride);
        for (int x = 0; x < b.Width; x++)         {           byte blue = p[x * PixelSize]; // Blue value. Just in case needed later           byte green = p[x * PixelSize + 1]; // Green. Ditto           byte red = p[x * PixelSize + 2]; // Red. Ditto           byte alpha = p[x * PixelSize + 3];           if (alpha > 0) return false;
        }
      }
    }
    b.UnlockBits(bmData);
  }

  return true;
}

 

It needs to be compiled with the /unsafe option (well, it did say in the title that this post was dangerous!). Then, I just walked through the directory structure of my tile images, passing each file into this function and deleting those where IsEmpty() returned true.