This is the third part in a short series of blog posts about quantum Monte Carlo (QMC). The series is derived from an introductory lecture I gave on the subject at the University of Guelph.
Introduction to QMC – Part 3: Markov Chains and the Metropolis algorithm
So far in this series we have seen various examples of random sampling. Here we’ll look at a simple Python script that uses Markov chains and the Metropolis algorithm to randomly sample complicated two-dimensional probability distributions.
If you come from a math, statistics, or physics background you may have leaned that a Markov chain is a set of states that are sampled from a probability distribution .
More recently, they have been used to string together words and make pseudo-random sentences . In this case the state is defined by e.g. the current and previous words in the sentence and the next words is generated based on this “state”. We won’t be looking at this sort of thing today, but instead going back to where it all began.
In the early 1900’s a Soviet mathematician named Andrey Markov published a series of papers describing, in part, his method for randomly sampling probability distributions using a dependent data set. It was not always clear how this could be done, and some believed that the law of large numbers (and hence the central limit theorem ) would only apply to an independent data set. Among these disbelievers was another Soviet professor named Pavel Nekrasov and there’s an interesting story about a “rivalry” between Markov and Nekrasov. To quote Eugene Seneta (1996) :
“Nekrasov’s (1902) attempt to use mathematics and statistics in support of ‘free will’ … led Markov to construct a scheme of dependent random variables in his 1906 paper”
Markov layed out the rules for properly creating a chain, which is a series of states where each is connected, in sequence, according to a specific set of rules . The transition between states must be ergodic, therefore
- Any state can be achieved within a finite number of steps; this ensures that the entire configuration space is traversable
- There is a chance of staying in the same place when the system steps forward
- The average number of steps required to return to the current state is finite
These rules may not apply, as such, for the modern Markovian-chain pseudo random text generators discussed above. However for other applications (such as QMC) these are very important.
The stage was set, but Markov would never live to see his ideas applied to QMC. This was done in the 1950’s and paralleled the creation of the worlds first electronic computer. And speaking about that …
QMC requires a set of configurations distributed according to the probability distribution (i.e., sampled from the square of the wave function). A configuration is a vector that contains the positions (e.g., , , coordinates) of each particle.
Recall, for comparison, how we used the Galton board to sample the binomial distribution. For QMC the probability distributions , where is a “many-body” configuration vector, are much more complicated and samples can be produced using the Metropolis algorithm. This algorithm obeys the rules for creating a Markov chain and adds some (crucial) details. Namely, when transitioning from the current state to the next in the chain of configurations, we accept the move with probability:
where is the configuration of the current state and is the configuration of the next proposed state. The move itself involves shifting the location of some or (in practice) all of the particles and this is done randomly according to a transition rule . In my experience it’s usually the case that is equal to the opposite transition and therefore we can simplify the acceptance probability to:
In English, this means we take as the ratio of the square of the wave function evaluated at the proposed and current configurations, or we take as 1 if this ratio is larger than 1. If then we accept the move and if we do the following:
- Produce a random number from 0 to 1
- Accept the move if , otherwise reject
The average acceptance ratio is an important quantity for controlling and understanding simulations and it will depend on the “maximum move size” (which is the maximum distance each particle can be shifted in each coordinate for each move – the actual distance shifted will depend also on a random number). Usually a desirable acceptance ratio is 50%.
Metropolis Monte Carlo sampling with Python
Let’s look at a simple script for sampling two-dimensional probability distributions. If you’re familiar with Python then reading over the code should be a great way of solidifying / understanding the Metropolis algorithm as discussed above.
import numpy as np def Metroplis_algorithm(N, m, dr): ''' A Markov chain is constructed, using the Metropolis algorithm, that is comprised of samples of our probability density: psi(x,y). N - number of random moves to try m - will return a sample when i%m == 0 in the loop over N dr - maximum move size (if uniform), controls the acceptance ratio ''' # we'll want to return the average # acceptance ratio a_total = 0 # sample locations will be stored in a list samples =  # get the starting configuration # and sample probability distribution # we'll start at r=(0,0) r_prime = np.zeros(2) p_prime = psi(r_prime, r_prime) for i in range(N): # propose a random move: r'-> r r = r_prime + np.random.uniform(-dr,dr, size=2) p = psi(r, r) # calculate the acceptance ratio # for the proposed move a = min(1, p/p_prime) a_total += a # check for acceptance p_prime, r_prime = check_move(p_prime, p, r_prime, r) if i%m == 0: samples.append(r) return np.array(samples), a_total/N*100.0 def check_move(p_prime, p, r_prime, r): ''' The move will be accepted or rejected based on the ratio of p/p_prime and a random number. ''' if p/p_prime >= 1: # accept the move return p, r else: rand = np.random.uniform(0, 1) if p/p_prime + rand >= 1: # accept the move return p, r else: # reject the move return p_prime, r_prime
Here we are building one Markov chain by propagating a single “walker”. A walker is generally a configuration of particles, but in our case we are only worrying about one “particle” – our sampling location. In order to ensure that our samples are sufficiently well spread out, we only take one sample at every iterations. The probability distribution is called `psi` and it takes the positional arguments and . We’ll use this tricky combination of 2D (bivariate) Gaussians:
import matplotlib.mlab as mlab def psi(x, y): ''' Our probability density function is the addition of two 2D Gaussians with different shape. ''' g1 = mlab.bivariate_normal(x, y, 2.0, 2.0, -5, -5, 0) g2 = mlab.bivariate_normal(x, y, 0.5, 5.0, 10, 10, 0) return g1 + g2
Let’s see what happens when we run this script:
N, m, dr = 50000, 10, 3.5 samples, a = Metroplis_algorithm(N, m, dr)
We get the following samples (or something similar):
Because the first configuration in the Markov chain was defined as
, the algorithm pulled the walker into the nearest area of high probability and the other area was completely ignored! Perhaps this can be corrected by increasing the maximum move size for the “particle” at each iteration.
As can be seen, this doesn’t really work. The move size must be quite large to achieve the desired effect, and by this point the sampling quality has degraded substantially. Notice how the acceptance ratio is correlated to the quality of sampling, and changes in response to altering the average move size (which is equal to half of the maximum move size: ).
A better solution is to propagate multiple walkers (effectively building a set of Markov chains) and choosing the initial configurations randomly in the simulation area. This way, although walkers may still be trapped inside one of the two Gaussians, they will be more evenly distributed between the two. Below we can see the results of doing this for a large number of iterations.
In this case, because the one distribution is so skinny, its beneficial to reduce the move size (as seen in the right panel) even though the acceptance ratio becomes larger than 50%. The modified script for using more than one walker is included below:
def Metroplis_algorithm_walkers(N, m, walkers, dr): ''' A Markov chain is constructed, using the Metropolis algorithm, that is comprised of samples of our probability density: psi(x,y). N - number of random moves to try m - will return a samples when i%m == 0 in the loop over N walkers - number of unique Markov chains dr - maximum move size, controls the acceptance ratio ''' # we'll want to return the average # acceptance ratio a_total = 0 # sample locations will be stored in a list samples =  # get the starting configuration # and sample probability distribution # we'll start at a randomly # selected position for each walker r_prime = [np.random.uniform(-10, 15, size=2) for w in range(walkers)] p_prime = [psi(r_prime[w], r_prime[w]) for w in range(walkers)] # initialize lists r = [np.zeros(2) for w in range(walkers)] p = [np.zeros(1) for w in range(walkers)] for i in range(N): for w in range(walkers): # propose a random move: r'-> r r[w] = r_prime[w] + np.random.uniform(-dr,dr, size=2) p[w] = psi(r[w], r[w]) # calculate the acceptance ratio # for the proposed move a = min(1, p[w]/p_prime[w]) # update the total a_total += a # check for acceptance p_prime[w], r_prime[w] = check_move(p_prime[w], p[w], r_prime[w], r[w]) if i%m == 0: samples.append(r[w]) return np.array(samples), a_total/N/walkers*100.0 def check_move(p_prime, p, r_prime, r): ''' The move will be accepted or rejected based on the ratio of p/p_prime and a random number. ''' if p/p_prime >= 1: # accept the move return p, r else: rand = np.random.uniform(0, 1) if p/p_prime + rand >= 1: # accept the move return p, r else: # reject the move return p_prime, r_prime
I thought it would also be fun to plot the distributions in 3D so I modified code from a matplotlib contour plot example and produced this:
The samples can be seen as blue dots at the base of the distributions.
I’ve included the modified code here:
import matplotlib.pyplot as plt from mpl_toolkits.mplot3d import Axes3D from matplotlib import cm fig = plt.figure(figsize=(14,14)) ax = fig.gca(projection='3d') # ax = fig.add_subplot(111, projection='3d') x1, x2 = -15, 15 y1, y2 = -15, 30 # set up a meshgrid - like labeling (x,y) coordinates # for each vertex on a piece of graph paper dx = 0.1 pad = 5 x = np.arange(x1, x2, dx) y = np.arange(y1, y2, dx) X, Y = np.meshgrid(x, y) # define Z as the value of the probability # distribution q at each 'vertex' # Z becomes a 2D Numpy array Z = psi(X, Y) # plot ax.plot_wireframe(X,Y,Z, rstride=5, cstride=7, color='r', alpha=0.7) ax.scatter(samples[:, 0], samples[:, 1], color='b', s=0.2) # make it pretty (as found in Axes3D.contour documentation) # cset = ax.contour(X, Y, Z, zdir='z', offset=-100, cmap=cm.coolwarm) cset = ax.contour(X, Y, Z, zdir='y', offset=y2, cmap=cm.coolwarm) cset = ax.contour(X, Y, Z, zdir='x', offset=x1, cmap=cm.coolwarm) # define the limits ax.set_xlabel('x', labelpad=15, fontsize=15) ax.set_xlim(x1, x2) ax.set_ylabel('y', labelpad=15, fontsize=15) ax.set_ylim(y1, y2) ax.set_zlabel('psi(x,y)', labelpad=15, fontsize=15) ax.set_zlim(0, 0.06) # ax.view_init(elev=20, azim=-45) plt.savefig('pretty_plot_metropolis_sampling.png', bbox_inches='tight', dpi=144) plt.show()
The 2D plots we’ve been looking at were produced using this code:
import matplotlib.pyplot as plt def plot_samples(samples, psi, limits=): ''' Plot the results of our Monte Carlo sampling along with the underlying probability distribution psi. ''' # set up a meshgrid - like labeling (x,y) # coordinates for each vertex on a piece # of graph paper dx = 0.1 pad = 5 if limits: xlow, xhigh = limits, limits ylow, yhigh = limits, limits else: xlow = np.min(samples)-pad xhigh = np.max(samples)+pad ylow = np.min(samples)-pad yhigh = np.max(samples)+pad x = np.arange(xlow, xhigh, dx) y = np.arange(ylow, yhigh, dx) X, Y = np.meshgrid(x, y) # define Z as the value of the probability # distribution psi at each 'vertex' # Z becomes a 2D Numpy array Z = psi(X, Y) # must be feeding in numpy arrays below plt.scatter(samples[:, 0], samples[:, 1], alpha=0.5, s=1) CS = plt.contour(X, Y, Z, 10) plt.clabel(CS, inline=1, fontsize=10) plt.xlim(xlow, xhigh) plt.ylim(ylow, yhigh) plt.xlabel('x', fontsize=20) plt.ylabel('y', fontsize=20) plt.tick_params(axis='both', which='major', labelsize=15)
Thanks for reading! You can find the entire ipython notebook document here . If you would like to discuss any of the plots or have any questions or corrections, please write a comment. You are also welcome to email me at firstname.lastname@example.org or tweet me @ agalea91
 – A great Python example of pseudo-random text generation can be found here .
 – These rules can be found on page 58 of my MSc thesis .