Many people wish to emulate a "nearest dealer" or "closest store" function. Let's assume you have a Stores table like this:
CREATE TABLE dbo.Stores ( StoreID INT IDENTITY(1,1) PRIMARY KEY, Storename VARCHAR(32) NOT NULL, Zip CHAR(5) NOT NULL ) GO SET NOCOUNT ON INSERT Stores(Address, Zip) SELECT 'Adelphia Fireworks', '07710' INSERT Stores(Address, Zip) SELECT 'West Milford BottleRockets', '07480' INSERT Stores(Address, Zip) SELECT 'Explosives of Ogdensburg', '07439' INSERT Stores(Address, Zip) SELECT 'Pompton Cherry Bombs, Inc.', '07442' |
So, you can imagine a user will enter their Zip code on a web page, let's say it is 07608 (Teterboro, NJ). You can't necessarily determine which store is closest to their Zip based on the value of the Zip code alone, and a relational database doesn't have a very good facility for making such a comparison anyway. You could convert to an INT and take the absolute value of the lowest difference, but that is not a very scientific method at all, since zip codes do not represent equal parts of a grid across the nation.
The first thing you need to find is a high-quality database that contains all the ZIP codes, and correlates them to their coordinates via latitude and longitude. You get what you pay for, so if you find a free version, you will likely spend a long time cleaning it up. One example I found was horribly out of date, had missing/corrupt data, and some values were just plain wrong. It is used later in this article, and I cleaned it up as best as I could; if you want accuracy, you'll want to buy the premium version of one of these products:
ZipCodeWorld ($40)
Zip Code Download ($30)
Or search around for similar products from other vendors.
Now, why is it so important to have a database that contains latitude and longitude? Primarily because there are quick and dirty ways to determine the distance between two points, given their lat and long coordinates. The user-defined function (UDF) I built for SQL Server 2000 looks like this:
CREATE FUNCTION dbo.CalculateDistance ( @zip1 CHAR(5), @zip2 CHAR(5) ) RETURNS DECIMAL(12,3) AS BEGIN DECLARE @lat1 DECIMAL(10,6), @lon1 DECIMAL(10,6), @lat2 DECIMAL(10,6), @lon2 DECIMAL(10,6), @rads DECIMAL(10,8), @dist DECIMAL(12,3), @calc DECIMAL(10,8) SELECT @rads = 57.29577951, @lat1 = lat, @lon1 = long FROM Zips WHERE Zip = @zip1 SELECT @lat2 = lat, @lon2 = long FROM Zips WHERE Zip = @zip2 SELECT @lat1 = @lat1 / @rads, @lon1 = @lon1 / @rads, @lat2 = @lat2 / @rads, @lon2 = @lon2 / @rads IF @lat1 = @lat2 AND @lon1 = @lon2 SET @dist = 0.00 ELSE BEGIN SET @calc = SIN(@lat1) * SIN(@lat2) + COS(@lat1) * COS(@lat2) * COS(@lon1 - @lon2) IF (@calc) > 1.0 SET @calc = 1.0 SET @dist = 3963.1 * ACOS(@calc) END RETURN (@dist) END GO |
Go ahead and try it with some Zip codes you know are close together, and some you know are far apart:
SELECT dbo.CalculateDistance('02100', '02840') -- Boston, MA to Newport, RI SELECT dbo.CalculateDistance('02100', '90210') -- Boston, MA to Beverly Hills, CA |
Results:
The calculation is based on this discussion and isn't perfect, of course; it doesn't take into account the fact that Earth isn't a perfect sphere, and it also calculates distances as the crow flies. So the exact mileage won't always be 100% accurate, and occasionally you will find that a store directly across a 4-mile-long, 1-mile-wide lake is deemed closer than the store 1.5 miles the other way.
So now, we need a repository so that we can look up any ZIP in the US, and compare it to our stores. We're going to create a table called Zips:
CREATE TABLE dbo.Zips ( ZIP CHAR(5) PRIMARY KEY CLUSTERED, City VARCHAR(24) NOT NULL, State CHAR(2) NOT NULL, Lat DECIMAL(10,6) NOT NULL, Long DECIMAL(10,6) NOT NULL ) |
And then populate it with Zip codes. The format of the population script will look like this:
SET NOCOUNT ON INSERT Zips SELECT '26151', 'MOUNT ZION', 'WV', 38.877455, -81.119470 INSERT Zips SELECT '26152', 'MUNDAY', 'WV', 39.021423, -81.204050 INSERT Zips SELECT '26153', 'MURRAYSVILLE', 'WV', 39.059143, -81.771370 ... |
However, rather than post it all within the article (it's over 3 MB!), I decided to zip it up so you can download and extract it on your own machine. You can download it here:
zipsInsert.zip (~649K)
(The contained file was updated 2006-05-01 to also create the tables, function and procedure described in this article.)
Once it's in place, you can use a stored procedure like this. The initial query is used to find the closest value match for a Zip code if it is not in our Zips table (sometimes a user makes a typo, or sometimes the Zip code database is out of date or incorrect). Usually, this should still yield a relatively accurate measurement and ranking. Feel free to change TOP 3 to whatever number of matches you think might be relevant.
CREATE PROCEDURE dbo.FindNearestStores @zip CHAR(5) AS BEGIN SET NOCOUNT ON IF ISNUMERIC(@zip) = 1 BEGIN SELECT TOP 1 @zip = RIGHT('00000'+Zip, 5) FROM Zips WHERE Zip <= CONVERT(INT, @zip) ORDER BY Zip DESC IF @zip IS NOT NULL BEGIN SELECT TOP 3 Miles = dbo.CalculateDistance(Zip, @zip), Storename, Zip FROM Stores ORDER BY Miles, Storename END END END GO |
This should perform quite well as long as your Stores table is not ultra-huge. If you find performance is a problem because of the UDF against the stores table, you might want to add a WHERE clause:
| WHERE State = (SELECT State FROM Zips WHERE Zip = @zip) |
This will make the query only operate against a single state, rather than all ~40,000 rows.
Here is a sample execution and result set:
EXEC dbo.FindNearestStores '07608' -- results: MilesStorenameZip ------------------------------------- 15.461Pompton Cherry Bombs, Inc.07442 23.088West Milford BottleRockets07480 32.064Explosives of Ogdensburg07439 |
The distance is not calculated based on the numeric value of the zip code, it is just a label here. So the same technique will work for a system based on UK post codes, Canadian postal codes, etc. Provided you can obtain a database which maps the mailing code to a latitude and longitude, the only thing you will do differently is the name of the tables/columns.
[ Comment, Edit or Article Submission ]