神刀安全网

Hans-Juergen Schoenig: Beating Uber with a PostgreSQL prototype

The other day I got a link to an interesting post published by Uber, which has caught our attention here at Cybertec: https://eng.uber.com/go-geofence

The idea behind geo-fencing is to provide information about an area to users. Somebody might want to find a taxi near a certain location or somebody might simply want to order a Pizza from a nearby restaurant.

According to the blog post Uber has solved this problem using some hand-made GO code. Uber’s implementation: 95% <5ms.

Another Blog also had an eye on it:  https://medium.com/@buckhx/unwinding-uber-s-most-efficient-service-406413c5871d#.vdsg0fhoi

Of course, to a team of PostgreSQL professionals, 5 ms is quite a lot so we tried to do better than that.

Beating Uber? Here is how it works!

The first thing we need to compete is some test data. Some nice data can be found here:

https://github.com/buckhx/gofence-profiling/tree/master/nyc

Then the data can be loaded with psql nicely:

/set content `cat nyc_census_2010_tracts.geojson`  CREATE TEMPORARY TABLE geojson (data jsonb);  INSERT INTO geojson VALUES (:'content');  CREATE TABLE census_tracts (      id serial primary key,      ntaname text,      boroname text,      shape geometry);  INSERT INTO census_tracts (ntaname, boroname, shape) SELECT      geom->'properties'->>'NTAName',      geom->'properties'->>'BoroName',      st_geomfromgeojson(geom->>'geometry')  FROM geojson, LATERAL jsonb_array_elements(data->'features') geom;

The data can already be queried. The following query does what Uber tries to achieve:

SELECT *              FROM            census_tracts              WHERE          ST_Contains(shape, ST_Point(-73.9590, 40.7830));

However, to be fair:The sample set is not big enough yet. To increase the amount of data roughly to the size of Uber’s database, the following SQL can do the job:

CREATE TABLE tracts_100k (LIKE census_tracts INCLUDING ALL);  INSERT INTO tracts_100k (ntaname, boroname, shape)              SELECT ntaname || ' ' || offs::text, boroname || ' ' || offs::text,                          ST_Translate(shape, offs, 0)              FROM census_tracts,                          LATERAL generate_series(0,360,360/50) offs;

Checking latency

After applying some nice indexing we can already see, how well our query behaves:

<em>CREATE EXTENSION file_fdw;</em>  <em>CREATE SERVER files FOREIGN DATA WRAPPER file_fdw;</em>  <em>CREATE FOREIGN TABLE geobench (</em>  <em>            client int,         </em>  <em>            tx_no int,</em>  <em>            time int, file_no int, time_epoch int, time_us int)</em>  <em>SERVER files</em>  <em>OPTIONS (filename '/home/user/code/gis-build/gofence-profiling/nyc/pgbench_log.7665',</em>  <em>            format 'text', delimiter ' ');</em>  <em> </em>  <em>SELECT percentile_disc(ARRAY[0.5,0.95,0.99])</em>  <em>                        WITHIN GROUP (ORDER BY ROUND(time/1000.,3)) latency_ms</em>  <em>FROM geobench;</em>

The results are very promising. Our version is around 40 times faster than the Uber one. Clearly, Uber should consider using PostgreSQL instead of custom code. Given the fact that we invested around 30 minutes to get this done, even developing the business logic is faster with PostgreSQL.

Photo (c) Uber

Hans-Juergen Schoenig

Hans-Jürgen Schönig has 15 years of experience with PostgreSQL. He is consultant and CEO of the company „Cybertec Schönig & Schönig GmbH“ (www.postgresq-support.de), which has served countless customers around the globe.

转载本站任何文章请注明:转载至神刀安全网,谢谢神刀安全网 » Hans-Juergen Schoenig: Beating Uber with a PostgreSQL prototype

分享到:更多 ()

评论 抢沙发

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址