SQL Azure – Moving your data

 

Moving your data to the (public) cloud necessarily involves relinquishing some control over the setup and maintenance of the environment in which your data is hosted. Cloud-based hosting services such as Microsoft Azure are effectively just scalable shared hosting providers. Since parts of the server configuration are shared with other customers and (to make the service scalable) there is to be a standard template on which all instances are based, there are many system settings that your cloud provider won’t allow you to change on an individual basis.

For me, this is generally great. I’m not a DBA or SysAdmin and I have no interest in maintaining an OS, tweaking server configuration settings, installing updates, or patching hotfixes. The thought of delegating the tasks to ensure my server remains finely-oiled and up-to-date to Microsoft is very appealing.

However, this also has its own down-sides. One advantage of maintaining my own server is that, even though it might not be up-to-date or have the latest service packs applied, I know nobody else has tweaked it either. That means that, unless I’ve accidentally cocked something else up or sneezed on the delete key or something, a database-driven application that connects to my own hosted database should stay working day after day. When an upgrade is available I can choose when to apply it, and test to ensure that my applications work correctly following the upgrade according to my own plan.

Not so with SQL Azure.

Two examples of breaking changes I’ve recently experienced with SQL Azure, both seemingly as a result of changes rolled out since the July Service Release:

Firstly, if you use SQL Server Management Studio to connect and manage your SQL Azure databases, you need to upgrade SSMS to at least version 10.50.1777.0 in order to connect to an upgraded SQL Azure datacentre. This same change also broke any applications that rely on SQL Server Management Objects (including, for example, the SQL Azure Migration Wizard, resulting in the error described here). The solution to both these issues is thankfully relatively simple once diagnosed – run Windows Update and install the optional SQL Server 2008 SP1 service pack.

A more subtle change is that the behaviour of the actual SQL Azure database engine has changed, making it more comparable to Denali on-site SQL Server rather than SQL Server 2008 R2. Whereas, normally, upgrading SQL Server wouldn’t be a breaking change for most code (unless, of course, you were relying on a deprecated feature that was removed), the increase in spatial precision from 27bits to 48bits in SQL Denali means that you actually get different results from the same spatial query. Consider the following simple query:

DECLARE @line1 geometry = 'LINESTRING(0 11, 430 310)';
DECLARE @line2 geometry = 'LINESTRING(0 500, 650 0)';

SELECT @line1.STIntersection(@line2).ToString();

Previously, if you’d have run this query in SQL Azure you’d have got the same result as in SQL Server 2008/R2, which is POINT (333.88420666910952 243.16599486991572).

But then, overnight, SQL Azure is upgraded and running the same query now gives you this instead: POINT (333.88420666911088 243.16599486991646), which is consistent with the result from SQL Denali CTP3.

Not much of a difference, you might think… but think about what this means for any spatial queries that rely on exact comparison between points. How about this example using the same two geometry instances:

SELECT @line1.STIntersection(@line2).STIntersects(@line1);

SQL Azure query run in July 2011: 0. Same SQL Azure query run in August 2011: 1. Considering STIntersects() returns a Boolean, you can’t really get much more different than 1 and 0….

So, a precautionary tale: although SQL Azure hosting might have handed over the responsibility for actually performing any DB upgrades to Microsoft, the task of testing and ensuring that your code is up-to-date and doesn’t break from version to version is perhaps greater than ever, since there is no way to roll back or delay the upgrade to your little slice of the cloud.

OGR2OGR Importing Spatial Data to SQL Server

A few months back, I posted an article explaining how to import spatial data into SQL Server 2008 from any format supported by the OGR library (including ESRI shapefiles, GML, and TIGER data), using OGR2OGR. That article was written using OGR2OGR from v1.7 of the GDAL 1.7 library, which doesn’t support SQL Server 2008 directly, so I instead used OGR2OGR to create a CSV file containing spatial data in Well-Known Text format and then parsed that data in SQL Server using the STGeomFromText() method.

The good news is that things have become a bit easier since then, and version 1.8 of the GDAL library now has a MSSQLSpatial driver that can interface directly with geometry and geography data in SQL Server 2008.

The bad news is that most of the places that offer pre-compiled GDAL binaries for Windows have yet to update to the new version. FWTools, for example, still comes packaged only with v1.7. Likewise, the osgeo download site at http://download.osgeo.org/gdal/win32/ also lists GDAL versions up to v1.7.

So, if you want to get hold of the latest GDAL to import directly into SQL Server you’ll have to build it yourself from source, which can be downloaded from http://download.osgeo.org/gdal/gdal180.zip

Fortunately, the source has been very considerately packaged, and includes solution files that will build GDAL out-of-the-box in VS2005, 2008, and 2010. Simply load the .sln file, click build, and wait a few minutes:

Building GDAL 1.8 in VS2010

Then, if you look in the output directory (warmerdabldbin, by default) you should see a lovely collection of utilities for working with spatial data – GDAL (for working with raster data), and OGR (for its vector sibling).

Here’s the output of calling ogr2ogr –formats, which retrieves the list of supported vector spatial formats – note the MSSQLSpatial format supported for both read/write:

image

Example usage to load a shapefile to SQL Server as follows:

[php]
ogr2ogr -overwrite -f MSSQLSpatial "MSSQL:server=.\MSSQLSERVER2008;database=spatial;trusted_connection=yes" "TG20.shp"
[/php]

 

Render with Mapnik

I explained a little about the process by which you can overlay raster data on Bing Maps, by geo-referencing the source data (if necessary), projecting into the appropriate spherical Mercator projection, then cutting the resulting image into 256px x 256px tiles named according to the quadkey tile numbering system.

The application I used in the last example to do this was Microsoft Mapcruncher. Mapcruncher has got a lot of benefits:

  • It’s free
  • It’s a windows application with no special dependencies and easy to install
  • It’s got a nice GUI that’s very easy to use
  • It will perform geo-referencing/warping/tile-cutting for you as part of a single process

Mapcruncher is great if you have a relatively small raster image that you want to integrate into Bing Maps / Google Maps as part of a one-off process. However, there are many occasions when you might need a little more than that. As part of a current project, for example, I need to create a tileset that covers an area of approximately 60km x 50km (not that small), based on data held in SQL Server 2008 (not raster), that will be updated approximately every month (not one-off).

So I set out to evaluate alternatives, and the first I considered was mapnik.

Installing Mapnik

Mapnik is an open-source mapping toolkit. It’s what openstreetmap uses to render the tiles used in their base map imagery. They’ve got over 220Gb of XML data in the planet.osm file, covering the whole world, with thousands of updates every day, so if mapnik can render that I’m pretty certain it should be able to cope with my modest requirements.

Mapnik tiles in Open Street Map

Installing and configuring mapnik, unfortunately, turned out to be quite a challenge, and has taken me a significant amount of time to get a working installation. In fact, had I realised quite how many steps would be involved, I’d probably have taken more care to document them carefully. As it is, I’m going to try to note down in this post what I did while they’re still relatively fresh in my mind.

What about using pre-compiled Windows binaries?

Mapnik, like many open source applications, is primarily targeted at a UNIX stack. That means that a large part of the documentation will refer to commands like sudo apt-get, which will look fairly alien to many Windows users. There’s also a lot of dependencies on other packages – python, libboost, libpng, etc. which you may not be familiar with.

Now, fortunately, some kind people have taken the time to prepare pre-compiled Windows binaries of these various tools, and even packaged them together in convenient download format.

For example, the OSGeo4W package contains windows binary executables for Mapnik, along with the GDAL library (used for importing, converting of various spatial data formats), QGIS (open source desktop GIS application), and many other useful open source spatial goodies. The problem is that, with trying to keep track of all those separate dependencies, the package itself can become out-of-date quite quickly. The latest build of OSGeo4W, for example, is still based on python 2.5.2-1. The current build of the 2.x branch of python is 2.7.1, and even that represents the end-of-life release for the 2.x branch. The latest version of python is actually 3.2. I didn’t really want to start a project on software that had already been deprecated.

Likewise, the excellent FWTools project, which also bundles together many of the same packages as OSGeo4W still comes bundled with Python 2.3.4 and version 1.7 of the GDAL library. Crucially, for me, GDAL introduced support for SQL Server as a spatial data source only in version 1.8, so using FWTools wasn’t an option either.

What’s more, Mapnik itself seems to have forked into two versions – the current stable release being 0.7.1, but with many comments being made about breaking changes in the new 2.x development version. The only precompiled windows binaries I could find were of the 0.7 version and, again, I didn’t want to invest a lot of time setting up a project based on software that was about to go out of date.

So, precompiled binaries was, at this stage, a no-go.

Build-It-Yourself

I decided I was going to have to build my own installation, and here was my wish list:

  • Python 3.2
  • GDAL 1.8
  • Mapnik 2.x

Python was (thankfully) easy to install – there’s an installer package available from http://python.org/ftp/python/3.2/python-3.2.msi

GDAL was also not too bad – there are x86 and x64 packages (bundled with mapserver) that you can download from http://www.gisinternals.com/sdk/

Now onto Mapnik. And it is here, with retrospect, that I wish I’d stated taking notes. You can download the source for mapnik from here. However, before compiling mapnik itself, you need to download and/or compile its required dependencies. These are: proj4, boost, zlib, freetype, icu, libxml2, libpng, libjpeg, and libtiff.

Now, you could download the source for each of these and build them separately but fortunately, once again, some kind soul has done much of this work for us. If you go to the gnuwin32 project on sourceforge, you’ll find links to download most of these libraries. For those packages not included in gbuwin32, ICU is available from here, and you can get the latest zlib from here.

Once you’ve got all the dependencies sorted out, it’s onto the configuration changes. If you follow the article at http://trac.mapnik.org/wiki/Python3k you’ll see a number of steps required to rebuild the python bindings with Mapnik to target Python 3.x. After some fiddling about with these and a bit of guesswork, I managed eventually to get everything built.

Testing it Out

Mapnik comes with a test script so, gingerly, I tried running it. Lots of errors – couldn’t find xxx etc. I realised this was because I hadn’t set the environment variables and paths correctly so, after sorting this out, I had another go. This time looked a lot more promising:

image

And, lo and behold, here was the (beautiful) example image generated:

demo_high

Over-confident of my new found ability, I then tried altering the example rundemo.py script to point at a SQL Server datasource. Mapnik supports OGR datasources, so I first created a virtual layer that connected to my SQL Server. For the purposes of testing, I decided to select a set of data from the OS VectorMap District settlement area data (note that I’m not sure if OGR can deal with SQL Server’s native binary geography/geometry format, so I use STAsText() to get the WKT representation and then specify WKT encoding in the GeometryField):

[php]

MSSQL:server=.SQLEXPRESS;database=OSVectorMap;trusted_connection=yes
SELECT geom27700.STAsText() AS geomWKT FROM TG11_Settlement_Area</pre>
[/php]

Testing the virtual layer with ogrinfo seemed to suggest that everything was working ok:

image

So then I modifed the python script to add the new layer. Note that OS Vectormap data is defined using the OSGB British National Grid coordinate system (EPSG:27700), and mapnik expects the parameters for the srs property to use PROJ4 syntax, which you can get from http://spatialreference.org/ref/epsg/27700/proj4/:

[php]vectormap_lyr = mapnik.Layer(‘OS Vectormap’)
vectormap_lyr.srs = "+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +datum=OSGB36 +units=m +no_defs"

vectormap_lyr.datasource = mapnik.Ogr(file=’MSSQL.ovf’,layer="AASQLlayer")</pre>
[/php]

Unfortunately, here I hit another problem:

image

And it seems here my luck has run out, because I simply can’t work out how to get round this one. The error message is simply “Failed to open datasource”, yet ogrinfo confirms that the datasource is fine, and other GDAL/OGR components can read from it, so I don’t know if it’s because I built mapnik wrong or forgot to change/include a particular setting.

I did just notice that there’s a Google Summer of Code Project to improve Mapnik installation on Windows, and I’m really hoping it’s successful because the results generated by Mapnik are beautiful.

As a workaround, I actually used OGR2OGR to export my data from SQL Server into a shapefile called MSSQL_export.shp, and then used that as a datasource in Mapnik by changing the python datasource to:

[php]
<pre>vectormap_lyr.datasource = mapnik.Shapefile(file=’MSSQL_export’)</pre>
[/php]

Finally, after a bit of XML styling, I was able to get the following image (click for full size):

image

I’m actually really pleased with the image quality, but until I can get Mapnik to retrieve data directly from SQL Server there’s not much point proceeding with the tilecutting process – I can’t really justify an additional step of exporting from SQL Server to shapefile just to get Mapnik to load it.

If anyone has had any success of getting Mapnik and SQL Server to play nicely together, please let me know!