I recently came across the website for Pepperstone, a forex broker. They give away FX tick data for 15 currency pairs. The data is stored in compressed CSV files, one file per month per currency pair. They include “fractional pip spreads in millisecond detail,” with timestamps in GMT. This seems to be really good data, and I was surprised to find it for free. New files arrive monthly, with about a two-month lag.
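To give a sense of what's inside the archives: as far as I can tell the rows follow the TrueFX CSV convention of currency pair, GMT timestamp with millisecond precision, bid, ask — but verify that against an actual file before relying on it. The sample row below is made up for illustration. A minimal Python 3 sketch that parses a row and computes the fractional-pip spread:

```python
import csv
import io
from datetime import datetime

# Assumed column layout (TrueFX convention): pair, GMT timestamp, bid, ask.
# The sample row is fabricated for illustration only.
sample = "EUR/USD,20150105 09:30:00.123,1.21001,1.21004\n"

for pair, stamp, bid, ask in csv.reader(io.StringIO(sample)):
    ts = datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f")  # GMT, millisecond detail
    spread_pips = (float(ask) - float(bid)) / 0.0001     # 1 pip = 0.0001 for EUR/USD
    print(pair, ts.isoformat(), round(spread_pips, 2))
```

Note the pip size of 0.0001 only holds for pairs quoted to four decimals; JPY pairs use 0.01.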
The only problem is the hassle of downloading such a large number of files: 1,215 zip files, totaling 21GB. So I wrote a Python script to handle it. The script should be saved in the download folder. It parses the HTML of the download page to find the links to the zip files. It skips any file already present in the target folder, so it is suitable for running incremental updates.
import urllib, os, urllib2, re, ntpath

folder = r'E:/FX'
localFiles = os.listdir(folder)
print 'already downloaded ' + str(len(localFiles)) + ' files'

# find all available files on the download page
url = 'https://pepperstone.com/en/client-resources/historical-tick-data'
req = urllib2.Request(url)
page = urllib2.urlopen(req)
html = page.read()
html = re.sub(r'[\n\r]+', '', html)  # strip line breaks so links match in one pass
anchor_pattern = re.compile(r'http://www\.truefx\.com/dev/data/.*?\.zip')
anchors = anchor_pattern.findall(html)

# determine the missing files
remoteFiles = []
for anchor in anchors:
    filename = ntpath.basename(anchor)
    if filename not in localFiles:
        remoteFiles.append(anchor)

# download the missing files
print 'downloading ' + str(len(remoteFiles)) + ' new files'
os.chdir(folder)
for remoteFile in remoteFiles:
    filename = ntpath.basename(remoteFile)
    try:
        urllib.urlretrieve(remoteFile, filename)
        print 'finished with: ' + filename
    except IOError:
        print 'error with file: ' + filename
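The script above targets Python 2 (urllib2, print statements). For anyone on Python 3, the core step — extracting the zip links from the page and filtering out files already downloaded — can be sketched as a small, testable function. The truefx.com URL pattern is carried over from the script and may have changed since, so check it against the live page:

```python
import os
import re

# URL pattern assumed from the original script; verify against the live page.
ZIP_PATTERN = re.compile(r'https?://www\.truefx\.com/dev/data/\S*?\.zip')

def missing_links(html, local_files):
    """Return every .zip link in html whose filename is not already local."""
    have = set(local_files)
    return [url for url in ZIP_PATTERN.findall(html)
            if os.path.basename(url) not in have]
```

Downloading the result is then one call per link, e.g. `urllib.request.urlretrieve(url, os.path.basename(url))`.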