Want to help @openaddr derive a parcel dataset for the US? I know of funding for a quick development project.
Many people who work at Postlight are the kinds of people who sit up straight in their chair when someone says “derive a parcel data set.” So I pinged Michal and was introduced to Waldo Jaquith, another helpful Internet person and Director of U.S. Open Data. Then, as Waldo wrote:
U.S. Open Data is a long-time supporter of the Open Addresses project, a volunteer-run project that aggregates government-published address datasets to create a global repository of the coordinates of street addresses. Anecdotally, project volunteers had noticed that a fair number of the data sources contained not just the latitude and longitude of an address, but the boundaries of the parcel. That raised the question of how many of the indexed 257 million addresses might include boundary data that was going unused. Could we have accidentally collected millions of cadastral records?
So we hired Postlight to figure it out for us. Developer Bryan Bickford spent a little over a week creating a Python-based tool to find and extract parcel data from OpenAddresses’ records.
Bryan’s work gave us a hard number: of the 1,511 data sources ingested by OpenAddresses, 383 include parcel boundaries (or 25%). There are a total of 30,461,769 parcels included…
I realize that there are a lot of numbers in the paragraph above. The gist is that a nice person used some money from The Shuttleworth Foundation to hire us to look at some freely available data, and we found the pattern they suggested we’d find, and as a result the coordinates for 30 million parcels of land were added to the global geographic commons. Cool!
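The post doesn’t include the tool’s code, but the core idea, scanning each source’s records for polygon geometries rather than bare points, can be sketched in a few lines. This is a minimal illustration, not Bryan’s actual implementation; the GeoJSON layout, the function name, and the sample data are all assumptions made for the example.

```python
# Hedged sketch of parcel detection: the real Postlight tool isn't shown in
# the post, so this illustrates the general approach on made-up sample data.
# Assumption: each source is GeoJSON whose features carry either a bare
# point coordinate or a full parcel boundary polygon.
import json

def count_parcels(geojson_source: str) -> int:
    """Count features whose geometry is a polygon (a parcel boundary)
    rather than a point (a bare address coordinate)."""
    data = json.loads(geojson_source)
    return sum(
        1
        for feature in data.get("features", [])
        if feature.get("geometry", {}).get("type") in ("Polygon", "MultiPolygon")
    )

# Hypothetical sample source: one point-only address, one parcel polygon.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-77.0, 38.9]}},
        {"type": "Feature",
         "geometry": {"type": "Polygon",
                      "coordinates": [[[-77.0, 38.9], [-77.0, 39.0],
                                       [-76.9, 39.0], [-77.0, 38.9]]]}},
    ],
})
print(count_parcels(sample))  # → 1
```

Run over all 1,511 sources, a counter like this is enough to produce the headline numbers: which sources contain any polygons at all, and how many parcels they hold in total.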
There are massive sets of data out there, floating around, released into the wild by the government, NGOs, and other kinds of organizations. They contain wonders and mysteries. All it takes is time and a little funding for them to share their secrets.
What’s exciting is that right now, while some days the world seems to be going to hell in a mechanically-made basket, we’re in a good moment over the lifespan of Moore’s Law—lots of data to explore, relatively fast bandwidth to download it, fast processors to process it, big hard drives to store it, frameworks for collaborating, and tons of tools left over from the orgiastic explosion of “big data” interest throughout the tech industry.
The expensive part is still programmers, designers, and product managers, which—well, that’s Postlight’s business, so thank God. But once a programmer does something, they can put it on GitHub and never do it the first time again, which is also what happened here.
What can you do with millions of geographic parcels? You can map with them! The open mapping scene is wild right now. Not long ago I noticed lots of weird people started to leave their museum and publishing jobs to go work at Mapzen. (The person who wrote that first tweet is the VP of Mapzen.)
Mapzen is a very large open-source mapping stack—like Google Maps but you can download all the data (it’s funded by Samsung). There’s also another mapping tool called Mapbox, which mixes together open and proprietary data. The two platforms use a lot of OpenStreetMap data. You can use Mapzen data inside Mapbox. All these people go to the same conferences. They want to work together but also to make their own awesome things. As a result there’s an enormous amount of geographic data coming online, in increasingly useful formats, and the people bringing it online are getting salaries and making hard choices about how to bundle things up for future users, not just putting it all out there with good intentions.