神刀安全网

shots: pull down the entire Internet into a single animated gif

description

shots builds on waybackpack, a python program that pulls down the entire Wayback Machine archive for a given URL. it goes one step further by grabbing screenshots of each archived page, filtering out blank pages and visually similar pages, and ultimately creating a filmstrip of the website, as well as an animated gif that shows how the website evolved over time.

sample

[image: sample animated gif]

install

pip install waybackpack
npm i shots -S

usage

import shots from 'shots';

shots({
  dest: 'resources/shots',
  site: 'amazon.com'
});

the shots function returns a Promise that resolves once an animated gif of the site’s history and a side-by-side static filmstrip image have been generated in resources/shots/output, as 1024x768.gif and 1024x768.png respectively.
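to make the output location concrete, here is a minimal sketch that builds both file paths from a dest directory and a size. outputPaths is a hypothetical helper written for illustration, not part of the shots api, and it assumes the file names always follow the size string as in the example above.

```javascript
// hypothetical helper: builds the output paths described above
// for a given dest directory and screenshot size
function outputPaths(dest, size = '1024x768') {
  const dir = `${dest}/output`;
  return {
    gif: `${dir}/${size}.gif`,
    filmstrip: `${dir}/${size}.png`
  };
}

const paths = outputPaths('resources/shots');
console.log(paths.gif);
console.log(paths.filmstrip);
```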

you can specify different options.

api

the shots api is exported as a single shots(options) function that returns a Promise. its options are outlined below.

options

there are several options, described next.

options.dest

directory used to store all wayback machine archive pages, their screenshots, the diffs between those screenshots, and your glorious output gifs. defaults to a temporary directory.

note that you’ll get that path back from the shots promise, e.g:

shots().then(dest => {
  // ...
})

options.concurrency

concurrency level used throughout the lib. determines how many screenshots are being taken at any given time, or how many diffs are being computed, etc.
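as an illustration of what a concurrency level means in practice, the sketch below runs an async worker over a list while keeping at most limit calls in flight. mapLimit is a hypothetical helper, and this is not necessarily how shots implements its own concurrency.

```javascript
// hypothetical helper: runs `worker` over `items`, keeping at most
// `limit` calls in flight, and resolves with the results in input order
function mapLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // each runner claims the next unprocessed index until none are left
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = [];
  const count = Math.min(Math.max(1, limit), items.length);
  for (let i = 0; i < count; i++) {
    runners.push(runner());
  }
  return Promise.all(runners).then(() => results);
}
```

with a limit of 2 and five items, two workers start immediately and each picks up a new item as soon as it finishes the previous one.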

options.pageres

options merged with defaults shown below and passed to pageres. only 9999x9999-formatted sizes are supported (e.g: don’t use 'iphone 5s').

{
  "crop": true,
  "scale": 1,
  "sizes": ["1024x768"]
}
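a minimal sketch of the merge and the size restriction described above. resolvePageresOptions is a hypothetical helper, and both the shallow merge and the thrown error are assumptions made for illustration; shots may merge or validate differently.

```javascript
// defaults as listed in the options.pageres section
const defaults = { crop: true, scale: 1, sizes: ['1024x768'] };

// hypothetical helper: shallow-merges user options over the defaults
// and rejects sizes that are not in WIDTHxHEIGHT form
function resolvePageresOptions(user = {}) {
  const merged = Object.assign({}, defaults, user);
  const bad = merged.sizes.filter(size => !/^\d+x\d+$/.test(size));
  if (bad.length > 0) {
    throw new Error(`unsupported sizes: ${bad.join(', ')}`);
  }
  return merged;
}
```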

options.sites

a site (or any url, really) that you want to work with. can also be an array of sites.

options.site

alias for options.sites.
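since both names accept either a single site or an array, a caller-side normalization could look like the sketch below. resolveSites is a hypothetical helper, and giving options.sites precedence when both are set is an assumption, not documented behavior.

```javascript
// hypothetical helper: accepts options.sites or its alias options.site,
// as a single string or an array, and always returns an array
function resolveSites(options = {}) {
  // assumption: options.sites wins if both names are provided
  const value = options.sites !== undefined ? options.sites : options.site;
  if (value === undefined) {
    return [];
  }
  return Array.isArray(value) ? value : [value];
}
```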

options.tolerance

number between 0 and 100 where 100 means every screenshot will be considered different, whereas 0 means every screenshot will be considered the same. screenshots considered "duplicates" (within the tolerated range) are left out when building the gif and filmstrip image.
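one possible reading of tolerance, sketched under loud assumptions: each screenshot has a percentage difference (0 to 100) against the screenshot before it, and it is kept only when that difference reaches 100 - tolerance. keptIndexes and this threshold mapping are hypothetical; the actual comparison in shots may differ.

```javascript
// hypothetical helper: `diffs[i]` is the percentage difference (0 to 100)
// between screenshot i and the screenshot before it
function keptIndexes(diffs, tolerance) {
  // assumption: tolerance 100 keeps everything, tolerance 0 keeps
  // only screenshots that differ completely from their predecessor
  const threshold = 100 - tolerance;
  const kept = [];
  diffs.forEach((diff, index) => {
    // the first screenshot has nothing to compare against, so keep it
    if (index === 0 || diff >= threshold) {
      kept.push(index);
    }
  });
  return kept;
}
```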

steps

note that shots has a long runtime, due to the nature of the task it performs. be prepared to wait a few minutes until the gif is finally written to disk.

the following steps happen in series. the tasks in each step are executed concurrently where possible.

  • runs waybackpack for every provided options.site, starting at the last timestamp that can be found in the ${dest}/pages directory to save time
  • takes screenshots of every archive page, except for pages we have existing screenshots for at ${dest}/screenshots
  • computes the difference between every screenshot and the previous ones
    • screenshots considered to be the same according to tolerance are discarded
    • screenshots considered to be noise (e.g: failed page loads) are discarded
  • creates the filmstrip
  • creates the gif
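the first step above resumes from the last timestamp already on disk. a minimal sketch of picking that timestamp, assuming the entries under ${dest}/pages are named after 14-digit Wayback Machine timestamps (YYYYMMDDhhmmss). lastTimestamp is a hypothetical helper and the directory layout is an assumption.

```javascript
// hypothetical helper: given the entry names found under ${dest}/pages,
// returns the most recent 14-digit timestamp, or null if there is none
function lastTimestamp(names) {
  // fixed-width digit strings sort chronologically as plain strings
  const stamps = names.filter(name => /^\d{14}$/.test(name)).sort();
  return stamps.length > 0 ? stamps[stamps.length - 1] : null;
}
```

a null result would mean there is nothing to resume from, so the whole archive has to be fetched.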

debugging and logging

if you want to print debugging statements, shots uses debug, so you can run DEBUG=shots node app and you’ll see tons of debug information pop up on your screen.

license

mit
