Parsing Free-Text Addresses and a UK Postcode Regular Expression Pattern

We’ll be attempting to replicate the functionality of Google Maps using nothing but freely-available tools and data – SQL Server Express, OS Open Data, and a dash of Silverlight.

One of the features I’ll be demonstrating is a basic geocoding function – i.e. given an address, placename, or landmark, how do you look up and return the coordinates representing the location so that the map can centre on that place? This is not really a spatial question at all – it’s a question of parsing a free-text user input and using that as the basis of a text search of the database.

The simplest way of doing this is to force your users to enter Street Number, Street Name, Town, and Postcode in separate input elements (and these match the fields in your database). In this case, your query becomes straightforward:

SELECT X, Y FROM AddressDatabase WHERE StreetNumber = ‘10’ AND StreetName = ‘Downing Street’ AND Town=’London’

Most databases don’t contain the location of every individual address. If there is no exact matching StreetNumber record, then you typically find the closest matching properties on the same road and interpolate between them (it seems reasonable to assume that Number 10 Downing Street will be somewhere between Number 9 and Number 11).

Forcing users to enter each element of the address separately doesn’t necessarily create the most attractive UI, however. What’s more common is to use a single free-text search box into which users can type whatever they’re searching for – a placename, address, landmark, postcode etc. Nice UI, but horrible to make sense of the input. In these cases, the user might supply:

“10 Downing Street, London”

“Downing Street, St James’, LONDON”

“10, Downing St. SW1A 2AA”

…not to mention “10 Downig Street. London”, and any other many of misspellings or alternative formats.

One approach you might want to take in these cases is to use a RegEx pattern matcher to determine if any part of the string supplied is a postcode. The UK postcode format is defined by British Standard BS7666, and can be described using the following regular expression pattern:

(GIR 0AA|[A-PR-UWYZ]([0-9][0-9A-HJKPS-UW]?|[A-HK-Y][0-9][0-9ABEHMNPRV-Y]?) [0-9][ABD-HJLNP-UW-Z]{2})

Matching the supplied address string against this RegEx doesn’t prove that a valid postcode was supplied, but just that some part of the user input matched the format for a postcode. The matching substring can then be looked up (say, against the CodePoint Open dataset) to confirm that it is real.

Once you’ve identified the postcode, you can then run a query to retrieve a list of roadnames that lie in that postcode, from something like the OSLocator dataset, and scan the remainder of the input to see if it contains any of those names. You can also scan for any numeric characters in the first part of the text input, which might represent a house number. If you find a matching property, with the same road name and valid postcode, you can be pretty sure you’ve found a match.

If you find more than valid match, or possibly several partial matches only, then you can of course present a disambiguation dialogue box – “Is this the 10 Downing Street you meant?”. For example, there are many “10 Downing Street”s in the UK – from Liverpool to Llanelli and Farnham to Fishwick…. without knowing either the town or the postcode, it could have referred to any of the following:

image

SQL Server: Easy Bulk Loading OS Locator Road Data

I’ve just loaded the Ordnance Survey “OS Locator” dataset (part of Ordnance Survey Open Data) into SQL Server 2008. OS Locator contains details of all the roads in Britain, in a gazetteer-style format. It is a 120Mb delimited text file, containing details of around 790,000 road entities (some roads are split into multiple entities if they cross districts etc.) – including the road name, the coordinates of its bounding box and centrepoint, classification, and the county and area in which it is located. (Note that this dataset doesn’t include the geometry of the shape of the road itself). You can find the technical specifications for the OS Locator dataset here: http://www.ordnancesurvey.co.uk/oswebsite/products/oslocator/docs/user_guide.pdf

Since the file is text-based, there’s a range of options available to import it – you could create an SSIS package, or use the import/export wizard, or one of a variety of third party ETL tools. However, for an absolutely dead-easy way to query the data and create a geometry point representing the centre of each road using purely T-SQL, I used the OPENROWSET bulk function in conjunction with a format file.

Not only is this method easy, but it’s utterly repeatable so (assuming the structure of the source data remains constant) you can run it again each time the underlying data gets refreshed. OPENROWSET allows you to query the text file source directly, as if it were a table in the database, from which you can SELECT columns of data, INSERT them into other tables etc.

The following shows the xml format file I used to specify the columns of data in the OS Locator file:

[php]<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="0" xsi:type="CharTerm" TERMINATOR=":"/>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR=":"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR=":"/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR=":"/>
<FIELD ID="4" xsi:type="CharTerm" TERMINATOR=":"/>
<FIELD ID="5" xsi:type="CharTerm" TERMINATOR=":"/>
<FIELD ID="6" xsi:type="CharTerm" TERMINATOR=":"/>
<FIELD ID="7" xsi:type="CharTerm" TERMINATOR=":"/>
<FIELD ID="8" xsi:type="CharTerm" TERMINATOR=":"/>
<FIELD ID="9" xsi:type="CharTerm" TERMINATOR=":"/>
<FIELD ID="10" xsi:type="CharTerm" TERMINATOR=":"/>
<FIELD ID="11" xsi:type="CharTerm" TERMINATOR=":"/>
<FIELD ID="12" xsi:type="CharTerm" TERMINATOR=":"/>
<FIELD ID="13" xsi:type="CharTerm" TERMINATOR=":"/>
<FIELD ID="14" xsi:type="CharTerm" TERMINATOR=":"/>
<FIELD ID="15" xsi:type="CharTerm" TERMINATOR="\r\n"/>
</RECORD>
<ROW>
<COLUMN SOURCE="0" NAME="Name" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="1" NAME="Classification" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="2" NAME="Centx" xsi:type="SQLINT"/>
<COLUMN SOURCE="3" NAME="Centy" xsi:type="SQLINT"/>
<COLUMN SOURCE="4" NAME="Minx" xsi:type="SQLINT"/>
<COLUMN SOURCE="5" NAME="Maxx" xsi:type="SQLINT"/>
<COLUMN SOURCE="6" NAME="Miny" xsi:type="SQLINT"/>
<COLUMN SOURCE="7" NAME="Maxy" xsi:type="SQLINT"/>
<COLUMN SOURCE="8" NAME="PostSector" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="9" NAME="Settlement" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="10" NAME="Locality" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="11" NAME="Cou_Unit" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="12" NAME="LocalAuth" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="13" NAME="Tile_10k" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="14" NAME="Tile_25k" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="15" NAME="Source" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>[/php]

Assuming that the format file above is saved as c:OSLocator_formatfile.xml, and the OS Locator download itself is saved as c:OS_Locator2010_2_Open.txt (the filename of the most recent locator download at the time of writing), then you can load the Locator dataset directly in SQL Server, creating a geometry point at the centre of each road in a single query as follows:

[php]SELECT *,
geometry::Point(Centx,Centy,27700) AS Centre
FROM OPENROWSET(
BULK ‘C:OS_Locator2010_2_Open.txt’,
FORMATFILE=’C:OSLocator_formatfile.xml’
) AS CSV;[/php]

And here’s what the data looks like:

image

Here’s the spatial results tab displaying the centre points of the first 5,000 rows of data:

image

And here’s what a small section of Norwich looks like having created a bounding box from the min/max coordinates of each road and overlaying them on Bing Maps (having first transformed the coordinates from EPSG:27700 to EPSG:4326):

image

Electric vehicle charging stations on Google Maps

At Google, we’ve been supporting environmentally-friendly cars through initiatives such as RechargeIT, an effort to accelerate the adoption of electric vehicles. As consumers embrace electric vehicles, we’ve also been adding information to Google Maps over the past few months to help users find charging stations. Just search on Google Maps for “ev charging station” plus the appropriate area, for example, “ev charging station near mountain view ca.”

We’d like to continue adding more charging stations to Google Maps, so we’re excited that our friends over at the the U.S. Department of Energy’s National Renewable Energy Laboratory (NREL) are working to make more data available.

NREL has launched the GeoEVSE Forum to help develop a detailed, accurate database of electric vehicle charging stations. We’ll be adding more EV charging locations to Google Maps as their database, and others, are updated. As always, you can use the Report a problem tool to inform us of any corrections.

If you represent an automaker, an electric vehicle charging station manufacturer or installer, an EV consumer group or any other interested party, you can join the GeoEVSE Forum by visiting this site.

We look forward to adding even more information to Google Maps to help you recharge your electric vehicle. Happy green driving!