Parsing Free-Text Addresses and a UK Postcode Regular Expression Pattern

We’ll be attempting to replicate the functionality of Google Maps using nothing but freely available tools and data – SQL Server Express, OS Open Data, and a dash of Silverlight.

One of the features I’ll be demonstrating is a basic geocoding function – i.e. given an address, placename, or landmark, how do you look up and return the coordinates representing the location so that the map can centre on that place? This is not really a spatial question at all – it’s a question of parsing a free-text user input and using that as the basis of a text search of the database.

The simplest way of doing this is to force your users to enter Street Number, Street Name, Town, and Postcode in separate input fields, each of which matches a field in your database. In this case, your query becomes straightforward:

SELECT X, Y FROM AddressDatabase WHERE StreetNumber = '10' AND StreetName = 'Downing Street' AND Town = 'London'
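Here’s a minimal sketch of that lookup driven by the separate input fields. It uses Python’s built-in sqlite3 module with an in-memory table purely as a stand-in for SQL Server Express; the table and column names are taken from the query above, `find_address` is a hypothetical helper, and the X/Y values are placeholders rather than real coordinates.

```python
import sqlite3

def find_address(conn, number, street, town):
    """Return (X, Y) for an exact match on all three fields, or None."""
    # "?" placeholders let the driver handle quoting, so input containing
    # an apostrophe (e.g. "St James'") can't break the SQL statement.
    return conn.execute(
        "SELECT X, Y FROM AddressDatabase "
        "WHERE StreetNumber = ? AND StreetName = ? AND Town = ?",
        (number, street, town),
    ).fetchone()

# In-memory stand-in for the address table; X/Y values are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AddressDatabase "
    "(StreetNumber TEXT, StreetName TEXT, Town TEXT, X REAL, Y REAL)"
)
conn.execute(
    "INSERT INTO AddressDatabase VALUES "
    "('10', 'Downing Street', 'London', 100.0, 200.0)"
)

print(find_address(conn, "10", "Downing Street", "London"))  # (100.0, 200.0)
print(find_address(conn, "12", "Downing Street", "London"))  # None
```

Passing the three field values as parameters, rather than pasting them into the SQL string, is what keeps this safe once the values come straight from user input.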

Most address databases don’t contain the location of every individual property. If there is no record with an exactly matching StreetNumber, you typically find the nearest numbered properties on the same road and interpolate between them (it seems reasonable to assume that Number 10 Downing Street will be somewhere between Number 9 and Number 11).
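The interpolation step might look something like this. It’s a sketch that assumes you already have (x, y) positions for some of the house numbers on the road, held in a plain dictionary; `interpolate_house` is a hypothetical helper, and it simply takes the nearest known numbers on either side of the target, as in the Number 9 / Number 11 example.

```python
def interpolate_house(target, known):
    """Estimate (x, y) for house number `target` on a single road.

    known: dict mapping house numbers to (x, y) positions on that road.
    Returns None when `target` isn't bracketed by two known numbers.
    """
    if target in known:
        return known[target]
    lower = [n for n in known if n < target]
    upper = [n for n in known if n > target]
    if not lower or not upper:
        return None  # can't interpolate beyond the ends of the road
    lo, hi = max(lower), min(upper)
    # Fraction of the way from the lower neighbour to the upper one.
    t = (target - lo) / (hi - lo)
    (x0, y0), (x1, y1) = known[lo], known[hi]
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))

# Placeholder positions for Numbers 9 and 11 (not real coordinates).
print(interpolate_house(10, {9: (0.0, 0.0), 11: (20.0, 10.0)}))  # (10.0, 5.0)
```

This assumes the numbers are evenly spaced along a straight line between the two neighbours. On roads with odd and even numbers on opposite sides, you would probably want to filter `known` by parity before interpolating.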

Forcing users to enter each element of the address separately doesn’t necessarily create the most attractive UI, however. What’s more common is a single free-text search box into which users can type whatever they’re searching for – a placename, address, landmark, postcode, etc. That makes for a nicer UI, but the input is much harder to make sense of. In these cases, the user might supply:

“10 Downing Street, London”

“Downing Street, St James’, LONDON”

“10, Downing St. SW1A 2AA”

…not to mention “10 Downig Street. London”, and any number of other misspellings or alternative formats.

One approach you might want to take in these cases is to use a RegEx pattern matcher to determine if any part of the string supplied is a postcode. The UK postcode format is defined by British Standard BS7666, and can be described using the following regular expression pattern:

(GIR 0AA|[A-PR-UWYZ]([0-9][0-9A-HJKPS-UW]?|[A-HK-Y][0-9][0-9ABEHMNPRV-Y]?) [0-9][ABD-HJLNP-UW-Z]{2})
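As a rough illustration, here’s how the pattern could be used to pull a candidate postcode out of free text with Python’s `re` module. Two choices here are assumptions rather than part of the pattern itself: the input is upper-cased first, and `\b` word boundaries are wrapped around the pattern so that it doesn’t match inside a longer token. `find_postcode` is a hypothetical helper name.

```python
import re

# The pattern from the text, with \b word boundaries added (an assumption).
POSTCODE_RE = re.compile(
    r"\b(GIR 0AA|[A-PR-UWYZ]([0-9][0-9A-HJKPS-UW]?|[A-HK-Y][0-9][0-9ABEHMNPRV-Y]?) [0-9][ABD-HJLNP-UW-Z]{2})\b"
)

def find_postcode(text):
    """Return the first substring shaped like a postcode, or None."""
    # Upper-case first so that "sw1a 2aa" is found too.
    m = POSTCODE_RE.search(text.upper())
    return m.group(1) if m else None

print(find_postcode("10, Downing St. SW1A 2AA"))   # SW1A 2AA
print(find_postcode("10 Downing Street, London"))  # None
```

The pattern requires exactly one space between the outward and inward parts of the postcode, so input such as SW1A2AA won’t match unless you relax that space (for example to ` ?`).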

Matching the supplied address string against this RegEx doesn’t prove that a valid postcode was supplied, only that some part of the user input has the format of a postcode. The matching substring can then be looked up (say, against the CodePoint Open dataset) to confirm that it is real.
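Confirming the match could then be as simple as a set lookup. This sketch assumes the postcodes from CodePoint Open have already been loaded into a Python set (the loading step isn’t shown), and normalises both sides by removing whitespace and upper-casing so that spacing differences don’t cause false negatives. `normalise` and `confirm_postcode` are hypothetical helper names.

```python
def normalise(postcode):
    """Upper-case and remove all whitespace: 'sw1a 2aa' -> 'SW1A2AA'."""
    return "".join(postcode.split()).upper()

def confirm_postcode(candidate, known_postcodes):
    """Return the normalised candidate if it's in the reference set, else None."""
    key = normalise(candidate)
    return key if key in known_postcodes else None

# Tiny stand-in for the reference data; a real set would hold every postcode
# from the dataset, normalised the same way.
KNOWN = {normalise("SW1A 2AA")}

print(confirm_postcode("SW1A 2AA", KNOWN))  # SW1A2AA
print(confirm_postcode("SW1A 9ZZ", KNOWN))  # None
```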

Once you’ve identified the postcode, you can run a query to retrieve a list of road names that lie in that postcode, from something like the OSLocator dataset, and scan the remainder of the input to see if it contains any of those names. You can also scan the first part of the text input for numeric characters, which might represent a house number. If you find a property with that house number and road name in a valid postcode, you can be fairly sure you’ve found a match.
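A sketch of that scan is below. It assumes the road names for the postcode have already been retrieved (from OSLocator or similar) into a list, treats the first run of digits outside the postcode (optionally followed by a single letter, such as 10A) as the house number, and accepts a road only if its full name appears in the input. `match_address` is a hypothetical helper.

```python
import re

def match_address(text, postcode, roads_in_postcode):
    """Return (house_number or None, road or None) found in `text`.

    roads_in_postcode: road names already retrieved for `postcode`
    (the retrieval itself is not shown).
    """
    # Remove the postcode so its digits aren't mistaken for a house number.
    remainder = text.upper().replace(postcode.upper(), " ")
    # Assumed house-number shape: digits, optionally followed by one letter.
    num = re.search(r"\b(\d+[A-Z]?)\b", remainder)
    house = num.group(1) if num else None
    # Accept the first road whose full name appears in the remaining text.
    road = next((r for r in roads_in_postcode if r.upper() in remainder), None)
    return house, road

print(match_address("10 Downing Street, SW1A 2AA", "SW1A 2AA",
                    ["Whitehall", "Downing Street"]))  # ('10', 'Downing Street')
```

Abbreviations are the obvious gap: “10, Downing St.” won’t match “Downing Street” here, and blindly expanding “St” would turn “St James’” into “Street James’”, so abbreviation handling needs more care than a simple find-and-replace.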

If you find more than one valid match, or only several partial matches, you can present a disambiguation dialogue box – “Is this the 10 Downing Street you meant?”. For example, there are many “10 Downing Street”s in the UK – from Liverpool to Llanelli and Farnham to Fishwick. Without knowing either the town or the postcode, the input could have referred to any of the following:

[Image: the possible “10 Downing Street” matches across the UK]
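Deciding between a single match, a disambiguation list, or nothing at all might be sketched as follows. The records are (number, street, town) tuples standing in for the address table, and street names are compared using the similarity ratio from Python’s `difflib`, so that a typo such as “Downig Street” still produces partial matches. `resolve` and its 0.85 cut-off are assumptions to tune, not recommended values.

```python
import difflib

def resolve(number, street, records, cutoff=0.85):
    """Return ("match", [rec]), ("ambiguous", recs) or ("none", [])."""
    hits = []
    for rec in records:
        rec_number, rec_street, _town = rec
        if rec_number != number:
            continue
        # Similarity in [0, 1]; tolerant of small typos in the street name.
        ratio = difflib.SequenceMatcher(
            None, street.upper(), rec_street.upper()).ratio()
        if ratio >= cutoff:
            hits.append((ratio, rec))
    # Best-scoring candidates first (stable, so ties keep their input order).
    hits.sort(key=lambda pair: pair[0], reverse=True)
    found = [rec for _, rec in hits]
    if len(found) == 1:
        return "match", found
    if found:
        return "ambiguous", found
    return "none", []

# Sample records using towns mentioned in the text.
RECORDS = [
    ("10", "Downing Street", "London"),
    ("10", "Downing Street", "Farnham"),
    ("9", "Downing Street", "London"),
]

status, found = resolve("10", "Downig Street", RECORDS)
print(status)                          # ambiguous
print([town for _, _, town in found])  # ['London', 'Farnham']
```

An "ambiguous" result is the cue for the “Is this the 10 Downing Street you meant?” dialogue, with the towns offered as the choices.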

Churchill on Ramping Down

“This is no time for ease and comfort.
It is the time to dare and endure.”

–Winston Churchill (1874–1965)
British prime minister during WWII

Sales revolution…

Too many people will lose today’s productivity in anticipation of the weekend.

On Monday, too many people will complain about the start of the week (search Twitter for the word “Mondays” at 8 am for proof).

Could you spark a little positive revolution and help someone else break out of the TGIF mentality (or yourself if it applies)? Could you help inspire a “let’s kick some @$$” Monday morning start to the week (luksa… it’s Polish for “let’s kick some @$$”… okay, it’s an acronym we made up… get a printable reminder here)?

Wouldn’t both be more fun (and profitable)? How about just starting it at home?

If you’ve not seen it, here’s a one-minute video from Nike that always gets to me (in a very, very good way).

(tbif: too bad it’s Friday… the last sales day of the week)

_____

Churchill on Being Relentless

“Never give in, never give in,
never, never, never, never.”

–Sir Winston Churchill (1874–1965)
British prime minister during WWII


Sales resilience…

A couple of bright sides to remember…

  1. Those gatekeepers keeping you from your prospects… You’ll love them once you’re on the other side and your competition comes calling. (Just be sure you’re continually qualifying your prospects – investing your effort only with the best possibilities… Get JustSell’s quick guide on qualifying).
  2. That deal you lose to a low-cost provider… Sometimes it can be more valuable in the long run. When the lowest-priced product or service doesn’t meet a customer’s expectations, a deeper appreciation of the price/value relationship develops. This can create a new sales opportunity from what was initially lost – an opportunity for a much stronger business relationship than might otherwise have existed. (Keep your cool and your kindness so that you’re the one they call if it happens.)

Here are 4 points on bouncing back.

_____

When you need a well-earned break (especially if you’re a soccer fan), enjoy this one-minute example of perseverance and focus from the soccer world. Amazing. A great one to pass along to your kids.