Posts Tagged ‘geohash’

Geohashing using matrices in Ruby

Monday, June 9th, 2008

After reading this article about geohashing, I thought I’d share an alternative implementation I have which uses a matrix to translate coordinates to and from a string.

People are generally scared of geolocation, but that’s because they’re overthinking it. Forget all that stuff about mapping onto a sphere - a set of coordinates are just a point on a 2D plane with dimensions 360*360. The fact that that plane is then mapped onto a sphere (well, a spheroid) is something that 99.9% of developers don’t have to worry about. All you care about most of the time is whether location X is near location Y.

So, since we have these vectors on a plane of known bounds, if we want to represent some coordinates as a single string, all we have to do is map them onto a very simple matrix.

Happily, our plane is of dimensions 360*360 - and we have 36 unique characters to play with! So what we’ll do is divide the plane into squares, then sucessively divide out the coordinates as we increase resolution on the plane. That way we reduce the vector into a string, keep the resolution, and retain the ability to compare to other nearby locations. We can also, of course, extract them out again later.

Encoding

Let’s start with the point 39.286534 -76.613558, like the article linked above.

 = [39.286534, -76.613558]

First, we need to normalise the numbers. Add 180 to both

 = 
.map! {|e| e =  e + 180}
=> [219.286534, 103.386442]

Now, our matrix. We have 36 characters in base36, so that is 36 squares we can have, 6 on each side. So we will start cutting the number at 60 degrees intervals.

First we need a matrix. We are just going to reuse this again and again, so we only need one. Dimensions 6*6 for a total of 36 squares.

require 'matrix'
 = Matrix.rows([
["0", "1", "2", "3", "4", "5"], 
["6", "7", "8", "9", "A", "B"], 
["C", "D", "E", "F", "G", "H"], 
["I", "J", "K", "L", "M", "N"], 
["O", "P", "Q", "R", "S", "T"], 
["U", "V", "W", "X", "Y", "Z"]
])

I had some code that generated that, but I can’t find it .. so just declare it manually. You’ll need to require ‘matrix’. If we were really serious we would do things a little different but we’re focussing on simplicity here and don’t want to start messing around with origins, etc.

>> [0,0]
=> "0"
>> [0,1]
=> "1"
>> [1,1]
=> "7"
>> [2,1]
=> "D"
>> [5,5]
=> "Z"

looking good.

First slice

So! The first cut.

slice = 60.0
first = .map {|e| (e / slice).to_i}
=> [3, 1]

Note we will cast to integer to get rid of the remainder, which we don’t care about. All we want is to find the first digit.

which is …

code = [first[0],first[1]]
=> "J"

Hooray! Our first character. Note that Ruby’s matrix class is a bit strange and won’t accept the straight array, you need to give two arguments .. not very nice but that’s the way it is.

Now, onto our next step .. but we’re forgetting something! Can you see what it is?

We need to reduce the coordinates in preparation for the next cut. But how to do that without affecting the digits afterwards?

Easy - with modulo!


=> [219.286534, 103.386442]
.map! {|e| e =  e % slice}
=> [39.286534, 43.386442]

Note the decimals are unchanged. We’ve simply removed the multiples of 60 we extracted for the first character of the code.

2nd slice

We have now placed our coordinates inside a 60*60 square. We have a 6*6 matrix to describe the next step. So, slice size is .. 10 degrees.

slice = 10.0
 
def process
  result = .map {|e| (e / slice).to_i}
  code << [result[0],result[1]]
  .map! {|e| e =  e % slice}
end
 
=> "JM"
=> [9.286534, 3.386442]

3rd slice

We are now in a 10*10 box, and are faced with a choice. Shall we continued with the 6-themed divisions, or go to 5? It’s up to you. If you choose 6, slicing will be easier, but the degree cuts will be horrible decimals. If you go to five, your slices become worth more after the 2nd character, your slice code will be more complex .. but the resolution of the slices will look nicer.

I’m going to go with 6 anyway.

slice = slice / 6 = 1.66666666666667
 
=> "JMW"
=> [0.953200666666684, 0.0531086666666689]

Hm. We have another problem. Our divisions are adding significant digits to our points. That’s bad, since if we continue our result will be more accurate - wrongly - than the data we fed in. In fact, it will be infinitely accurate, erroneously!

We need to measure the significant digits of our input points.

back to the original:

 = [219.286534, 103.386442]
 
sig_digits = .map {|e| e = e.abs.to_s.sub(/^.{1,3}\W/, '').size}.sort.last
=> 6
limit = 1.0/10 ** sig_digits

Hm. That’s not quite right, but will do for now. We will terminate when the slice size goes below that.

It’s getting tiresome typing all this out. Let’s combine it into a function:

def geocode(point_array)
  sig_digits = point_array.map {|e| e = e.abs.to_s.sub(/^.{1,3}\W/, '').size}.sort.last
   = 1.0/10 ** sig_digits
   = 360.0
   = ''
   = point_array
  .map! {|e| e =  e + 180}
  while  > 
     =  / 6
    result = .map {|e| (e / ).to_i}
     << [result[0],result[1]] if !result.include?(nil)
    .map! {|e| e =  e % }
  end
  return 
end
 
geocode([39.286534, -76.613558])
=> JMWIDIN7AM1

Aww. Our first geocode. How about trying something with less resolution?

geocode([39.2, -76.6])
=> JMWI1

Cool. As you can see, we keep the same code for as many characters we have resolution for.

Decoding

Now how about extraction?

I’ll just paste it in:

def extract(geocode)
  sig_digits = geocode.length
   = 360.0
   = [0.0,0.0]
  working = geocode.split(//)
  working.each do |c|
     =  / 6
    int = c.to_i(36)
    matcol = int % 6
    matrow = (int / 6).to_i
    [0] += matrow * 
    [1] += matcol * 
  end
  .map! do |r|
    left = r.to_s.split('.')[0].size + 1 # 1 extra for decimal point ...
    local_sigdigits = sig_digits - left
    r.round(local_sigdigits)
  end
  .map! {|e| e =  e - 180}
  return 
end
 
class Float # thanks Rails
  alias_method :round_without_precision, :round 
    def round(precision = nil) 
      precision = precision.to_i 
      precision > 0 ? (self*(10**precision)).round/(10**precision).to_f : round_without_precision 
    end 
end

Notice we don’t even have to touch the matrix getting the numbers back out. Since the rows and columns correspond to known multiples in a known order, we can just get the multiples out by dividing/modulo’ing by 6. Cool, huh?

What’s not so cool is my significant digit estimation there, which is, uh, wrong. I’ll fix it later.

So, does it work?

geo1 = [39.286534, -76.613558]
puts 'point 1: ' + geo1.inspect
code1 =  geocode(geo1)
puts 'code 1: ' + code1
excode1 = extract code1
puts 'extracted point 1: ' + excode1.inspect
puts
geo2 = [39.2, -76.6]
puts 'point 2: ' + geo2.inspect
code2 = geocode(geo2)
puts  'code 2: ' + code2
excode2 = extract code2
puts 'extracted point 2: ' + excode2.inspect
 
point 1: [39.286534, -76.613558]
code 1: JMWIDIN7AM1
extracted point 1: [39.2865334, -76.6135583]
 
point 2: [39.2, -76.6]
code 2: JMWI1
extracted point 2: [39.2, -76.6]

Cool. I need to fix that significant digit error, but it’s almost 4am and I can’t seem to get my brain around it right now ;-) Any suggestions for proper estimates welcome.

So what use can we make of a hashed set of coordinates, apart from being able to put them in a URL? Well, it’s now simple to estimate proximity.

One degree of latitude/longitude is equal to about 111km. So, if two places have the same first letter, they’re within 60*111km of each other. Second letter is 10*111km, 3rd is 1.67*111km, and so on. Unfortunately, that only works up to a point - places can be “next to each other” laterally but skip up 6 if they’re “next to” each other longitudinally. To get around that problem, we need to move to a 3D matrix - I have an implementation, but this post is already too long and 3D matrices are pretty hard to talk about. Anyway, with this system you can determine at a glance if two places are in roughly the same place, with a decent estimate of their general proximity.

It is definitely possible to write a function that will return the approximate distance between any two geohashed locations, but I will leave that as an exercise for the reader for now. Hint: look up the matrix coordinates using the divide/modulo trick from the extract function. The distance is just a vector * slice * km away.

Anyway, there you have it: my method of geohashing in Ruby. Totally incompatible with the first method, and I’ve probably made some horrible mistakes - but I like its method of operation; every successive character is directly mapped with increasing resolution on a consistent plane. I like the concinnity of the concept, hope you do too!