神刀安全网

Waybackpack: download the entire Wayback Machine archive for a given URL

waybackpack

Waybackpack is a command-line tool that lets you download the entire Wayback Machine archive for a given URL.

For instance, to download every copy of the Department of Labor’s homepage before 1997, you’d run:

# Create a directory where you'll store the saved pages. mkdir ~/Downloads/dol-dot-gov  # Download all the dol.gov copies to that directory waybackpack dol.gov -d ~/Downloads/dol-dot-gov --end 1997

Or, just to print the URLs of all archived snapshots:

waybackpack dol.gov --list

Installation

pip install waybackpack 

Usage

waybackpack [-h] (-d DIR | --list) [--original] [--root ROOT]               [--prefix PREFIX] [--suffix SUFFIX] [--start START] [--end END]               [--quiet]               url  positional arguments:   url                The URL of the resource you want to download.  optional arguments:   -h, --help         show this help message and exit   -d DIR, --dir DIR  Directory to save the files.   --list             Instead of downloading the files, only print the list of                      snapshots.   --original         Fetch file in its original state, without snappshotted                      images/CSS/JS.   --root ROOT        The root URL from which to serve snapshotted resources.                      Default: 'https://web.archive.org'   --prefix PREFIX    Prefix to prepend to saved files. Defaults to the URL,                      with all non-alphanumeric characters replaced with                      hyphens.   --suffix SUFFIX    Suffix to append to saved files. Defaults to the file                      extension of the URL you're downloading.   --start START      Timestamp-string indicating the earliest snapshot to                      download. Should take the format YYYYMMDDhhss, though you                      can omit as many of the trailing digits as you like.                      E.g., '201501' is valid.   --end END          Timestamp-string indicating the latest snapshot to                      download. Should take the format YYYYMMDDhhss, though you                      can omit as many of the trailing digits as you like.                      E.g., '201604' is valid.   --quiet            Don't log progress to stderr. 

Support

Waypackback is written in dependency-less Python, and should work wherever Python works. Should be compatible with both Python 2 and Python 3.

Thanks

Many thanks to the following users for catching bugs and/or proposing fixes:

转载本站任何文章请注明:转载至神刀安全网,谢谢神刀安全网 » Waybackpack: download the entire Wayback Machine archive for a given URL

分享到:更多 ()

评论 抢沙发

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址