Tutorial: Web scraping and mapping breweries with import.io and R

Getting information on craft breweries can be a difficult process, with data dispersed over multiple websites or in formats unusable for analysis.

Import.io turns webpages into analysis-ready data with little or no setup. A previous tutorial covers the process in detail.

This method works well for extracting metadata on craft breweries and overall beer ratings from BeerAdvocate, a popular online community that supports beer education and events and hosts a forum for rating beers.

Getting started

Difficulty level: Intermediate.

This tutorial will make sense if you already know how to use R or have gone through these previous walkthroughs:

You’ll need the import.io app and R installed.

Getting the data from the website

1. Go to BeerAdvocate.com and navigate to the search for a place, select the state (Connecticut), then follow the breweries link.

2. BeerAdvocate paginates its results, so we need to determine the URL structure to be sure we scrape every page of results rather than just the first.

Luckily, this is fairly simple: click each of the results links (i.e. 1-20, 21-40, 41-60) and note the URL of each page.
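Once the pattern is clear, the page URLs can also be generated rather than copied by hand. Here is a minimal sketch in R, assuming the results are paged 20 at a time through a `start` query parameter. The base URL and the parameter name are placeholders, not the actual BeerAdvocate links, so substitute whatever appears in your browser's address bar.

```r
# Hypothetical base URL and paging parameter: replace both with the
# pattern you observe when clicking through the results links.
base_url  <- "https://www.example.com/breweries?state=CT"
page_size <- 20   # results per page (1-20, 21-40, 41-60)
n_pages   <- 3    # number of results pages to cover

# Offset of the first result on each page: 0, 20, 40
offsets <- seq(from = 0, by = page_size, length.out = n_pages)

# One URL per results page
page_urls <- paste0(base_url, "&start=", offsets)
print(page_urls)
```

The resulting vector can be pasted into a bulk extractor one URL per line.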

Here are the necessary links:

3. After setting up an account on import.io (you can sign up by linking your GitHub account), go to your My Data page. Paste the links from the previous step into the bulk extractor, found in the “How would you like to use this API?” dropdown for your Magic API, and press the button to run the queries.

4. The new output page will be a tabular view of all of the extracted link data ready for export in multiple formats such as Spreadsheet (for CSV), HTML, and JSON. We will download the Spreadsheet format for this tutorial.

Now that we have the data we can take it into R to process and visualize it.

Data pre-processing

5. Read the data into R:
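A minimal sketch of this step, assuming the Spreadsheet export was saved as `breweries.csv` in the working directory. The file name and the `read_breweries` helper are illustrative, not taken from the original tutorial.

```r
# Hypothetical file name: use the path of the Spreadsheet (CSV) export
# downloaded from import.io.
csv_path <- "breweries.csv"

read_breweries <- function(path) {
  # Keep text as character rather than factors, and treat empty cells as NA
  read.csv(path, stringsAsFactors = FALSE, na.strings = c("", "NA"))
}

if (file.exists(csv_path)) {
  breweries <- read_breweries(csv_path)
  str(breweries)   # check column names and types before cleaning
}
```

Reading text columns as character rather than factor makes the later string cleanup simpler.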

6. The export contains link columns and columns holding more than one piece of information. That is easy to fix: remove duplicate columns, create coherent column headers, specify and unify missing data, and extract the necessary information from the remaining columns.
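The four cleanup operations can be sketched as follows. The toy data frame, its column names, the markers treated as missing, and the `clean_breweries` helper are all assumptions made for illustration; the real export will have different columns, so adapt each step to what `str()` shows.

```r
# Toy data standing in for the import.io export; every column name and
# value here is hypothetical.
raw <- data.frame(
  name_link  = c("http://example.com/1", "http://example.com/2"),
  name_text  = c("Brewery A", "Brewery B"),
  name_text2 = c("Brewery A", "Brewery B"),        # duplicate of name_text
  location   = c("1 Main St, Hartford, CT", "-"),  # "-" marks missing data
  score      = c("Avg: 3.9", "Avg: -"),            # number mixed with text
  stringsAsFactors = FALSE
)

clean_breweries <- function(df) {
  # 1. Remove columns whose contents duplicate an earlier column
  df <- df[, !duplicated(as.list(df)), drop = FALSE]
  stopifnot(ncol(df) == 4)
  # 2. Give the remaining columns coherent headers
  names(df) <- c("url", "name", "address", "rating")
  # 3. Unify the different spellings of "missing" as NA
  df[] <- lapply(df, function(x) {
    x[x %in% c("", "-", "N/A")] <- NA
    x
  })
  # 4. Keep only digits and the decimal point, then convert to numeric;
  #    entries with no number left become NA
  df$rating <- as.numeric(gsub("[^0-9.]", "", df$rating))
  df
}

breweries_clean <- clean_breweries(raw)
print(breweries_clean)
```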

Geolocation of addresses for plotting on a map

7. Grab some latitudes and longitudes for those craft brewery addresses.
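One way to structure this step is to wrap whichever geocoding service you use behind a small helper that tolerates missing addresses and failed lookups. The sketch below is an assumption about how that might look, not the original tutorial's code: `geocode_addresses` and `fake_geocoder` are hypothetical names, and the stub returns fixed illustrative coordinates so the example runs offline.

```r
# geocoder: any function that takes one address string and returns a
# data frame with numeric columns `lon` and `lat`.
geocode_addresses <- function(addresses, geocoder) {
  rows <- lapply(addresses, function(a) {
    empty <- data.frame(lon = NA_real_, lat = NA_real_)
    if (is.na(a)) return(empty)                    # nothing to look up
    res <- tryCatch(geocoder(a), error = function(e) empty)
    data.frame(lon = res$lon[1], lat = res$lat[1]) # keep the first match
  })
  data.frame(address = addresses, do.call(rbind, rows),
             stringsAsFactors = FALSE)
}

# Stub geocoder for illustration only: fixed coordinates, and an error
# for an address it cannot resolve.
fake_geocoder <- function(address) {
  if (address == "nowhere") stop("no match")
  data.frame(lon = -72.68, lat = 41.76)
}

coords <- geocode_addresses(
  c("1 Main St, Hartford, CT", NA, "nowhere"),
  fake_geocoder
)
print(coords)
```

In practice the stub would be replaced by a real service. For example, the `geocode()` function in the ggmap package returns `lon` and `lat` columns and could be passed in as `function(a) ggmap::geocode(a)`, though it needs network access and an API key.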
