神刀安全网

Creating Geo-Enabled applications with MongoDB, GeoJSON and MySQL

Creating Geo-Enabled applications with MongoDB, GeoJSON and MySQL This blog post will discuss creating geo-enabled applications with MongoDB, GeoJSON and MySQL.

Introduction

Recently I published a blog post about the new GIS features in MySQL 5.7 . Today I’ve looked into how to use MongoDB (I’ve tested with 3.0 and 3.2, with 3.2 being much faster) for the same purpose. I will also talk about  GIS in MySQL and MongoDB atPercona Live next week (together with my colleagueMichael Benshoof).

MongoDB and GIS

MongoDB has a very useful feature called “ geoNear .” There are other MongoDB spatial functions available to calculate the distance on a sphere (like the Earth), i.e. $nearSphere , $centerSphere , $near – but all of them have restrictions. The most important one is that they do not support sharding. The  geoNear command in MongodDB, on the other hand, supports sharding. I will use geoNear in this post.

For this test, I exported Open Street Map data from MySQL to MongoDB (see the “ Creating GEO-enabled applications with MySQL 5.6 ” post for more details on how to load this data to MySQL).

  1. Export the data to JSON. In MySQL 5.7, we can use JSON_OBJECT to generate the JSON file:
    mysql> SELECT JSON_OBJECT('name', replace(name, '"', ''),        'other_tags', replace(other_tags, '"', ''), 'geometry', st_asgeojson(shape)) as j        FROM `points`        INTO OUTFILE '/var/lib/mysql-files/points.json';  Query OK, 13660667 rows affected (4 min 1.35 sec) 

  2. Use mongoimport   to import JSON into MongoDB (I’m using 24 threads, -j 24, to use parallel import):
    mongoimport --dbosm --collectionpoints -j 24 --file /var/lib/mysql-files/points.json  2016-04-11T22:38:10.029+0000    connectedto: localhost 2016-04-11T22:38:13.026+0000    [........................] osm.points  31.8 MB/2.2 GB (1.4%) 2016-04-11T22:38:16.026+0000    [........................] osm.points  31.8 MB/2.2 GB (1.4%) 2016-04-11T22:38:19.026+0000    [........................] osm.points  31.8 MB/2.2 GB (1.4%) … 2016-04-11T23:12:13.447+0000    [########################] osm.points  2.2 GB/2.2 GB (100.0%) 2016-04-11T23:12:15.614+0000    imported 13660667 documents 

  3. Create a 2d index:
    mongo > use osm switchedto dbosm > db.points.createIndex({ geometry : "2dsphere" } ) {         "createdCollectionAutomatically" : false,         "numIndexesBefore" : 1,         "numIndexesAfter" : 2,         "ok" : 1 } 

Another option would be using the osm2mongo Ruby script, which will convert the osm file and load it directly to MongoDB.

Now I can use the geoNear command to find all the restaurants near my location:

> db.runCommand( { geoNear: "points",                     near: { type: "Point" ,  coordinates: [ -78.9064543, 35.9975194 ]},                    spherical: true, ... query: { name: { $exists: true, $ne:null},                    "other_tags": { $in: [ /.*amenity=>restaurant.*/, /.*amenity=>cafe.*/ ] } },                    "limit": 5, "maxDistance": 10000 } ) {         "results" : [                 {                         "dis" : 127.30183814835166,                         "obj" : {                                 "_id" : ObjectId("570329164f45f7f0d66f8f13"),                                 "name" : "Pop's",                                 "geometry" : {                                         "type" : "Point",                                         "coordinates" : [                                                 -78.9071795,                                                 35.998501                                         ]                                 },                                 "other_tags" : "addr:city=>Durham,addr:country=>US,addr:housenumber=>605,addr:street=>West Main Street,amenity=>restaurant,building=>yes"                         }                 },                 {                         "dis" : 240.82201047521244,                         "obj" : {                                 "_id" : ObjectId("570329df4f45f7f0d68c16cb"),                                 "name" : "toast",                                 "geometry" : {                                         "type" : "Point",                                         "coordinates" : [                                                 -78.9039761,                                                 35.9967069                                         ]                                 },                                 "other_tags" : "addr:full=>345 West Main Street, Durham, NC 27701, US,amenity=>restaurant,website=>http://toast-fivepoints.com/"                         }                 }, ... } 

MongoDB 3.0 vs 3.2 with geoNear

MongoDB 3.2 features Geospatial Optimization :

MongoDB 3.2 introduces version 3 of 2dsphere indexes , which index GeoJSON geometries at a finer gradation. The new version improves performance of 2dsphere index queries over smaller regions. In addition, for both 2d indexes and 2dsphere indexes , the performance of geoNear queries has been improved for dense datasets.

I’ve tested the performance of the above geoNear query with MongoDB 3.0 and MongoDB 3.2 (both the old and new versions of 2dsphere index). All the results statistics are for a "limit" : 5 and "maxDistance" : 10000 .

MongoDB 3.0 , index version 2 :

> db.points.getIndexes() ...         {                 "v" : 1,                 "key" : {                         "geometry" : "2dsphere"                 },                 "name" : "geometry_2dsphere",                 "ns" : "osm.points",                 "2dsphereIndexVersion" : 2         } ]   "stats" : {         "nscanned" : 1728,         "objectsLoaded" : 1139,         "avgDistance" : 235.76379903759667,         "maxDistance" : 280.2681226202938,         "time" : 12 }, 

MongoDB 3.2 , index version 2 :

> db.points.getIndexes() [ ...         {                 "v" : 1,                 "key" : {                         "geometry" : "2dsphere"                 },                 "name" : "geometry_2dsphere",                 "ns" : "osm.points",                 "2dsphereIndexVersion" : 2         } ] ...   "stats" : {           "nscanned" : 513,           "objectsLoaded" : 535,           "avgDistance" : 235.76379903759667,           "maxDistance" : 280.2681226202938,           "time" : 5 }, 

What is interesting here is that even with the "2dsphereIndexVersion" : 2 , MongoDB 3.2 performs much faster and scans a much smaller number of documents.

MongoDB 3.2 , index version 3:

> db.points.dropIndex("geometry_2dsphere") { "nIndexesWas" : 2, "ok" : 1 } > db.points.createIndex({ geometry : "2dsphere" } ) {         "createdCollectionAutomatically" : false,         "numIndexesBefore" : 1,         "numIndexesAfter" : 2,         "ok" : 1 } > db.points.getIndexes() [         {                 "v" : 1,                 "key" : {                         "_id" : 1                 },                 "name" : "_id_",                 "ns" : "osm.points"         },         {                 "v" : 1,                 "key" : {                         "geometry" : "2dsphere"                 },                 "name" : "geometry_2dsphere",                 "ns" : "osm.points",                 "2dsphereIndexVersion" : 3         } ]   "stats" : {           "nscanned" : 144,           "objectsLoaded" : 101,           "avgDistance" : 235.76379903759667,           "maxDistance" : 280.2681226202938,           "time" : 1         }, 

That is significantly faster, 1ms for five results!

MySQL and GeoJSON revisited

To compare to the performance of the above query, I’ve created a similar query in MySQL. First of all, we will need to use the good old bounding rectangle (envelope) trick to only include the points in the 10 miles radius (or so). If we don’t, MySQL will not be able to use spatial (RTREE) index. I’ve created the following function to generate the envelope:

DELIMITER // CREATE DEFINER = current_user() FUNCTION create_envelope(lat decimal(20, 14), lon decimal(20, 14), dist int) RETURNS geometry DETERMINISTIC begin declare point_text varchar(255); declare l varchar(255); declare p geometry; declare env geometry; declare rlon1 double; declare rlon2 double; declare rlat1 double; declare rlat2 double;   set point_text = concat('POINT(', lon, ' ', lat, ')'); set p = ST_GeomFromText(point_text, 1); set rlon1 = lon-dist/abs(cos(radians(lat))*69); set rlon2 = lon+dist/abs(cos(radians(lat))*69); set rlat1 = lat-(dist/69); set rlat2 = lat+(dist/69); set l = concat('LineString(', rlon1, ' ', rlat1, ',', rlon2 , ' ', rlat2, ')'); set env= ST_Envelope(ST_GeomFromText(l, 1));  return env; end //   DELIMITER ;   mysql> set @lat= 35.9974043; Query OK, 0 rows affected (0.00 sec)   mysql> set @lon = -78.9045615; Query OK, 0 rows affected (0.00 sec)   mysql> select st_astext(create_envelope(@lat, @lon, 10)); +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | st_astext(create_envelope(@lat, @lon, 10)) | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | POLYGON((-79.08369589058249 35.852476764,-78.72542710941751 35.852476764,-78.72542710941751 36.142331836,-79.08369589058249 36.142331836,-79.08369589058249 35.852476764)) | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) 

Then we can use the following query (an update of theGeoJSON query from my previous post):

set @lat= 35.9974043; set @lon = -78.9045615; set @p = ST_GeomFromText(concat('POINT(', @lon, ' ', @lat, ')'), 1); set group_concat_max_len = 100000000;  SELECT CONCAT('{   "type": "FeatureCollection",   "features": [   ',   GROUP_CONCAT('{    "type": "Feature",       "geometry": ', ST_AsGeoJSON(shape), ',       "properties": {"distance":', st_distance_sphere(shape, @p) , ', "name":"', name , '"}    }' order by st_distance_sphere(shape, @p)),   '] }') as j FROM points_new WHERE st_within(shape, create_envelope(@lat, @lon, 10))   and (other_tags like '%"amenity"=>"cafe"%' or other_tags like '%"amenity"=>"restaurant"%')   and name is not null   and st_distance_sphere(shape, @p) < 1000; ... 1 row in set (0.04 sec) 

The time is slower: 40ms in MySQL compared to 1ms – 12ms in MongoDB. The box is AWS EC2 t2.medium.

To recap the difference between MongoDB geoNear and MySQL st_distance_sphere :

  • MongoDB geoNear  uses 2dsphere index, so it is fast; however, it can’t just calculate the distance between two arbitrary points
  • MySQL  st_distance_sphere is a helper function and will only calculate the distance between two points; it will not use an index – we will have to use the create_envelope  function to restrict the search so MySQL will use an index

Time-wise, this is not an apples to apples comparison as the query is quite different and uses a different technique.

Visualizing the results

Results for GeoJSON for Google Maps API:

{  "type": "FeatureCollection",  "features": [  {   "type": "Feature",     "geometry": {"type": "Point", "coordinates": [-78.9036457, 35.997125]},     "properties": {"distance":87.67869122893659, "name":"Pizzeria Toro"}   },{   "type": "Feature",     "geometry": {"type": "Point", "coordinates": [-78.9039761, 35.9967069]},     "properties": {"distance":93.80064086575564, "name":"toast"}   },{   "type": "Feature",     "geometry": {"type": "Point", "coordinates": [-78.9031929, 35.9962871]},     "properties": {"distance":174.8300018385443, "name":"Dame's Chicken and Waffles"}   }, ... } 

Now we can add those on a map:

Creating Geo-Enabled applications with MongoDB, GeoJSON and MySQL

Back to MongoDB: pluses and minuses

MongoDB uses Google’s S2 library to perform GIS calculations. The geoNear command is fast and easy to use for finding points of interests near you (which is the most common operation). However, full GIS support does not natively exist.

Another issue I came across when creating a 2dsphere index: MongoDB is very strict when checking the lines and polygons. For example:

> db.lines.createIndex({ geometry : "2dsphere" } ) {         "createdCollectionAutomatically" : false,         "numIndexesBefore" : 1,         "errmsg" : "exception: Can't extract geo keys: { _id: ObjectId('570308864f45f7f0d6dfbed2'), name: "75 North", geometry: { type: "LineString", coordinates: [ [ -85.808852, 41.245582 ], [ -85.808852, 41.245582 ] ] }, other_tags: "tiger:cfcc=>A41,tiger:county=>Kosciusko, IN,tiger:name_base=>75,tiger:name_direction_suffix=>N,tiger:reviewed=>no" }  GeoJSON LineString must have at least 2 vertices: [ [ -85.808852, 41.245582 ], [ -85.808852, 41.245582 ] ]",         "code" : 16755,         "ok" : 0 } 

MongoDB complains about this: type: “LineString”, coordinates: [ [ -85.808852, 41.245582 ], [ -85.808852, 41.245582 ] ]

This is a “bad” line string as the starting point and ending point are the same. I had to remove the bad data from my MongoDB imported dataset, which was tricky by itself. MongoDB (as opposed to MySQL) does not have a native way to compare the values inside the JSON, so I had to use $where construct – which is slow and acquires a global lock :

> db.lines.remove({"geometry.type": "LineString", "geometry.coordinates": {$size:2},  $where: "this.geometry.coordinates[0][0] ==  this.geometry.coordinates[1][0]       && this.geometry.coordinates[0][1] == this.geometry.coordinates[1][1]" })   WriteResult({ "nRemoved" : 22 }) 

After that, I was able to add the 2dsphere index.

Conclusion

MongoDB looks good, is pretty fast and easy for geo-proximity search queries – until you go outside of the one function and need full GIS support (which does not natively exist). It may be trickier to implement other GIS functions like st_contains or st_within .

,,

转载本站任何文章请注明:转载至神刀安全网,谢谢神刀安全网 » Creating Geo-Enabled applications with MongoDB, GeoJSON and MySQL

分享到:更多 ()

评论 抢沙发

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址
分享按钮