Geo Spatial Indexing with MongoDB


MongoDB is another NoSQL database that seems to be rising in popularity. Recently, I was evaluating NoSQL databases for a project. I was planning to use one for storing and managing a vast amount of Hadoop post-processed data for future queries and audit purposes. While I really liked the architecture of Cassandra, I was not happy with its limited querying and indexing capabilities. This is where MongoDB shines.

Quick MongoDB Review

MongoDB seems to have been designed with application development in mind, with its strong query and indexing capabilities. Here is a quick summary of MongoDB features.

  • MongoDB is a document database. A document has a set of fields and embedded documents. Essentially, a document is a multi-level map, whereas a Cassandra row with super columns is a two-level map. Embedded document fields are referenced by a dotted notation, e.g., blog.post.comments
  • Documents are stored in a collection, which is comparable to a Cassandra column family. Although MongoDB is schema-less, it is expected that similar documents will be stored in the same collection.
  • Data is stored in BSON, a binary serialization format that is a superset of JSON. Unlike Cassandra, MongoDB is data type aware. Every document is uniquely identified by a system generated id, which acts like a primary key.
  • An index is automatically created on the id field. Indexes on other fields or fields of embedded documents can be created as needed. A composite index on multiple fields can also be created.
  • The query language is JSON-like. It has the familiar feel of an RDBMS query with select fields and where clause conditionals. Query results are returned in a cursor-like structure.
  • An update may replace the whole document or modify selected fields. For update and delete, where clause conditionals can be specified.
  • Replication is important for backup and read scaling. MongoDB replication is based on a master slave architecture and asynchronous log shipping, very much like many RDBMS. In Cassandra, all nodes play an equal role and there is no such thing as a master.
  • Sharding or partitioning provides write scalability. In MongoDB any field of the document can be used for partitioning. MongoDB does ordered, range based partitioning with the user specified field. In contrast, Cassandra partitioning is based on the row key only and by default it is hash partitioned.
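
As a rough illustration of the multi-level map idea and the dotted notation from the first bullet, here is a minimal Ruby sketch that resolves a dotted path against a nested hash. The fetch_path helper and the sample document are hypothetical, made up for this example; MongoDB resolves dotted field names itself on the server side.

```ruby
# Hypothetical helper: walks a nested hash using a dotted field path,
# similar in spirit to how a field like "blog.post.comments" is addressed.
def fetch_path(doc, dotted_path)
  dotted_path.split(".").reduce(doc) do |node, key|
    # Give up with nil if an intermediate level is missing or not a hash
    return nil unless node.is_a?(Hash)
    node[key]
  end
end

# Sample document, invented for illustration
doc = {
  "blog" => {
    "post" => {
      "title" => "Geo Spatial Indexing",
      "comments" => ["Nice post", "Thanks"]
    }
  }
}

puts fetch_path(doc, "blog.post.comments").inspect
puts fetch_path(doc, "blog.author.name").inspect
```

The second lookup walks into a level that does not exist, so the helper returns nil instead of raising.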

Location based services

What really caught my attention is the geo spatial indexing feature of MongoDB, which is really great for location based applications.

Any location data that is stored as a latitude and longitude pair can be indexed in MongoDB. Once the index is in place, location based queries, like finding locations near a given location or locations within a perimeter, can be easily performed.
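
To make the "near" query concrete, here is a brute-force sketch in plain Ruby of what such a query returns conceptually: the stored points ordered by distance from a target. It assumes flat (Euclidean) distance over the raw coordinate values, with no spherical correction; the nearest helper and the sample points are hypothetical. A geo spatial index exists so that the database does not have to scan every document like this.

```ruby
# Hypothetical helper: returns the `num` points closest to `target`,
# using flat Euclidean distance on [lat, long] pairs (an assumption;
# no spherical correction is applied).
def nearest(points, target, num)
  points.sort_by do |lat, long|
    Math.sqrt((lat - target[0])**2 + (long - target[1])**2)
  end.first(num)
end

# Sample points, invented for illustration
points = [[40.70, -74.00], [40.69, -73.93], [41.50, -75.20]]
puts nearest(points, [40.68, -73.94], 2).inspect
```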

A location based service can be used to find the location of a person or an object. A common example of a location based application is locating nearby businesses for mobile phone users, e.g., finding the Italian restaurant nearest to the user’s current location.

Another usage is asset tracking, e.g., in the transportation industry. In asset tracking, GPS devices are attached to the moving assets being tracked. A location based service is also useful for mobile advertisement, where advertisements can be targeted based on a mobile phone user’s current location.

In one of my past projects, I had worked on a location based application for a GPS device. I decided to take MongoDB for a test drive to see how it can be used for a simple asset tracking application.

Asset Tracking with Geo Location Indexing

In our simple example application, messages arrive from GPS devices. Each message has the following components.

ESN       Latitude  Longitude  Time
38471649  -40.6863  73.9302    2010:12:06 23:40:16
38471649  -40.6945  74.2815    2010:12:06 23:41:38

ESN stands for Equipment Serial Number, which uniquely identifies the device sending the message. The message also has the latitude, longitude and the time when the message was sent.
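
Here is a minimal sketch of how one such message line could be turned into a document of the shape stored later in this post (esn, loc with lat and long, and time). The parse_gps_message helper is hypothetical, and it assumes the fields are whitespace separated exactly as shown above and that the timestamp is in UTC; neither assumption comes from a real device specification.

```ruby
# Hypothetical helper: parses one whitespace-separated GPS message line
# of the form "ESN LAT LONG YYYY:MM:DD HH:MM:SS" into a document hash.
# Assumes the timestamp is UTC.
def parse_gps_message(line)
  esn, lat, long, date, clock = line.split
  year, month, day = date.split(":").map(&:to_i)
  hour, min, sec = clock.split(":").map(&:to_i)
  {
    "esn"  => esn.to_i,
    "loc"  => { "lat" => lat.to_f, "long" => long.to_f },
    "time" => Time.utc(year, month, day, hour, min, sec)
  }
end

puts parse_gps_message("38471649 -40.6863 73.9302 2010:12:06 23:40:16").inspect
```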

MongoDB provides drivers in different languages. I used Ruby and wrote a simple class with methods to save messages and perform location based queries. There are also ORM products built on the basic MongoDB Ruby driver. Here I am using the basic driver.


require 'rubygems'
require 'mongo'

class Tracker
  def initialize
    @db = Mongo::Connection.new.db("test")
    @coll = @db.collection("trackerData")
  end

  # add geo spatial index
  def create_loc_index
    @coll.create_index([["loc", Mongo::GEO2D]])
  end

  # add time index
  def create_time_index
    @coll.create_index([["time", 1]])
  end

  # add composite location and time index
  def create_loc_time_index
    @coll.create_index([["loc", Mongo::GEO2D], ["time", 1]])
  end

  # drop all indexes
  def drop_indexes
    @coll.drop_indexes
  end

  # add new message
  def add_msg(esn, lat, long, time)
    doc = {"esn" => esn, "loc" => {"lat" => lat, "long" => long}, "time" => time}
    @coll.insert(doc)
  end

  # find all messages
  def find_all
    @coll.find().each { |doc| puts doc.inspect }
  end

  # find messages with location near given location
  def find_near(lat, long, num)
    @coll.find({"loc" => {"$near" => [lat, long]}}, {:limit => num}).
      sort(["$natural", -1]).each do |m|
      puts m.inspect
    end
  end

  # find messages with location within a given rectangle
  def find_within(ll_lat, ll_long, ur_lat, ur_long)
    lower_left = [ll_lat, ll_long]
    upper_right = [ur_lat, ur_long]
    box = [lower_left, upper_right]
    @coll.find({"loc" => {"$within" => {"$box" => box}}}).each do |m|
      puts m.inspect
    end
  end

  # find messages with location within a given rectangle and
  # within a time window
  def find_within_loc_time(ll_lat, ll_long, ur_lat, ur_long,
                           beg_time, end_time)
    lower_left = [ll_lat, ll_long]
    upper_right = [ur_lat, ur_long]
    box = [lower_left, upper_right]
    @coll.find({"loc" => {"$within" => {"$box" => box}},
                "time" => {"$gt" => beg_time, "$lte" => end_time}}).each do |m|
      puts m.inspect
    end
  end

  # find recent messages after given time
  def find_history(time)
    @coll.find("time" => {"$gt" => time}).each do |m|
      puts m.inspect
    end
  end

end

I am using two simple indexes. The method create_loc_index creates a location based index. The method create_time_index creates a time based index.  These methods need to be called only once. The method add_msg saves a new message.

The method find_near finds the locations near a given location. For asset tracking, this could be used to find all the devices, along with their locations and times, near a given location. If the assets being tracked are expected to be near some target location at some given time, this query could be useful. In that case the query should also include a time window. In plain English, the query would read: show all devices near xxx where time is greater than yyy and time is less than zzz.
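
The "near xxx, between yyy and zzz" query described above could be expressed by adding a time condition to the $near selector, in the same way find_within_loc_time combines $within with a time range. The sketch below only builds the selector hash, so it runs without a database; the near_in_window_selector name and the sample values are hypothetical.

```ruby
# Hypothetical helper: builds the selector for "near a point, within a
# time window". It only constructs the hash; it does not run the query.
def near_in_window_selector(lat, long, beg_time, end_time)
  {
    "loc"  => { "$near" => [lat, long] },
    "time" => { "$gt" => beg_time, "$lt" => end_time }
  }
end

# Sample values, invented for illustration
selector = near_in_window_selector(
  -40.69, 74.0,
  Time.utc(2010, 12, 6, 23, 0, 0),
  Time.utc(2010, 12, 7, 0, 0, 0)
)
puts selector.inspect
```

Inside the Tracker class, a selector like this would presumably be passed to @coll.find together with the :limit option, as find_near does.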

The method find_within returns all the devices within a specified rectangle defined by two pairs of latitude and longitude. For asset tracking, this query is useful to find out if the devices are within a target perimeter in a given time window. Again, the time window should be included as an additional query parameter.
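
For intuition, the rectangle test behind such a query can be written in a few lines of plain Ruby. This is only a sketch of the containment check for a box given by its lower-left and upper-right corners; the within_box? helper is hypothetical, and treating points on the edges as inside is an assumption of this example.

```ruby
# Hypothetical helper: true if point [lat, long] lies inside the rectangle
# given by its lower-left and upper-right corners (edges counted as inside,
# which is an assumption of this sketch).
def within_box?(point, lower_left, upper_right)
  point[0].between?(lower_left[0], upper_right[0]) &&
    point[1].between?(lower_left[1], upper_right[1])
end

# Sample corners and points, invented for illustration
lower_left  = [-41.0, 73.0]
upper_right = [-40.0, 75.0]

puts within_box?([-40.6863, 73.9302], lower_left, upper_right)
puts within_box?([-39.5, 73.9302], lower_left, upper_right)
```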

The method find_within_loc_time takes beg_time and end_time as additional arguments and does precisely that. It uses the composite index on location and time, created through the method create_loc_time_index. Here is a sample output from this query.

{"_id"=>BSON::ObjectId('4d198c63170b770aa2000010'),
"loc"=>{"lat"=>-141.34474, "long"=>79.67471},
"esn"=>38437642, "time"=>Sat Dec 04 07:21:09 UTC 2010}
{"_id"=>BSON::ObjectId('4d198c63170b770aa200000f'),
"loc"=>{"lat"=>-142.0858, "long"=>80.6207},
"esn"=>38437642, "time"=>Sat Dec 04 07:20:00 UTC 2010}
{"_id"=>BSON::ObjectId('4d198c63170b770aa2000013'),
"loc"=>{"lat"=>-139.15378, "long"=>76.87787},
"esn"=>38437642, "time"=>Sat Dec 04 07:24:33 UTC 2010}
{"_id"=>BSON::ObjectId('4d198c63170b770aa2000014'),
"loc"=>{"lat"=>-138.25162, "long"=>75.72623},
"esn"=>38437642, "time"=>Sat Dec 04 07:25:57 UTC 2010}
{"_id"=>BSON::ObjectId('4d198c63170b770aa2000011'),
"loc"=>{"lat"=>-140.5822, "long"=>78.7013},
"esn"=>38437642, "time"=>Sat Dec 04 07:22:20 UTC 2010}
{"_id"=>BSON::ObjectId('4d198c63170b770aa2000012'),
"loc"=>{"lat"=>-139.81966, "long"=>77.72789},
"esn"=>38437642, "time"=>Sat Dec 04 07:23:31 UTC 2010}

If the composite index is created, a separate location index is not necessary: queries based only on location can still make use of the composite index, since location is its leading field. Queries based only on time cannot, because time is not the leading field, so the simple time index is still needed for them. Finally, find_history returns all the messages after a specified time.

Final thoughts

I found MongoDB to be an easier and gentler transition to the NoSQL world. Technically and architecturally, it may not be as elegant and sound as Cassandra and HBase.

But I liked the pragmatic and developer friendly approach of MongoDB. MongoDB documentation can be found on the MongoDB site. This is a great post if you want a quick introduction to MongoDB.

About Pranab

I am Pranab Ghosh, a software professional in the San Francisco Bay area. I manipulate bits and bytes for the good of living beings and the planet. I have worked with myriad of technologies and platforms in various business domains for early stage startups, large corporations and anything in between. I am an active blogger and open source project owner. I am passionate about technology and green and sustainable living. My technical interest areas are Big Data, Distributed Processing, NOSQL databases, Machine Learning and Programming languages. I am fascinated by problems that don't have neat closed form solution.

6 Responses to Geo Spatial Indexing with MongoDB

  1. varun says:

    How about adding full-fledged support for spatial indexing of other complex shapes, e.g., polygons, multi-polygons, rectangles, etc.? Basically, a built-in R-tree-like implementation could do the job.

  2. Pingback: Presence Data Analytic using MongoDb and Map Reduce | Mawazo

  3. Pingback: vier's me2day

  4. In your code you have:

    “lat”=>-140.5822

    A latitude on earth must be between -90 and 90 ( http://en.wikipedia.org/wiki/Latitude)

    It is -90 on the south pole, and 90 on the north pole. A lat of -140 would be off the face of the earth. I think you may have your lat/lon reversed.

  5. Pingback: Notes from Where Camp Boston 2011 | GeoNotes
