Waybackpack is a command-line tool that lets you download the entire Wayback Machine archive for a given URL.
For instance, to download every copy of the Department of Labor’s homepage before 1997, you’d run:
# Create a directory where you'll store the saved pages. mkdir ~/Downloads/dol-dot-gov # Download all the dol.gov copies to that directory waybackpack dol.gov -d ~/Downloads/dol-dot-gov --end 1997
Or, just to print the URLs of all archived snapshots:
waybackpack dol.gov --list
pip install waybackpack
waybackpack [-h] (-d DIR | --list) [--original] [--root ROOT] [--prefix PREFIX] [--suffix SUFFIX] [--start START] [--end END] [--quiet] url positional arguments: url The URL of the resource you want to download. optional arguments: -h, --help show this help message and exit -d DIR, --dir DIR Directory to save the files. --list Instead of downloading the files, only print the list of snapshots. --original Fetch file in its original state, without snappshotted images/CSS/JS. --root ROOT The root URL from which to serve snapshotted resources. Default: 'https://web.archive.org' --prefix PREFIX Prefix to prepend to saved files. Defaults to the URL, with all non-alphanumeric characters replaced with hyphens. --suffix SUFFIX Suffix to append to saved files. Defaults to the file extension of the URL you're downloading. --start START Timestamp-string indicating the earliest snapshot to download. Should take the format YYYYMMDDhhss, though you can omit as many of the trailing digits as you like. E.g., '201501' is valid. --end END Timestamp-string indicating the latest snapshot to download. Should take the format YYYYMMDDhhss, though you can omit as many of the trailing digits as you like. E.g., '201604' is valid. --quiet Don't log progress to stderr.
Waypackback is written in dependency-less Python, and should work wherever Python works. Should be compatible with both Python 2 and Python 3.
Many thanks to the following users for catching bugs and/or proposing fixes:
转载本站任何文章请注明：转载至神刀安全网，谢谢神刀安全网 » Waybackpack: download the entire Wayback Machine archive for a given URL