Class Session 1: The Markov matrix and the exponential convergence
In this first class session, we study a core tool of Markov Chain Monte Carlo algorithms: the Markov matrix. It will allow you to
  • describe in a compact way the evolution of the configuration probabilities,
  • depict their basic features (time evolution, conservation of probability),
  • understand why the convergence to the steady state is in general exponential, and evaluate the typical convergence time.
This approach is in fact very general: not only will it allow you to design dynamical rules which sample a given probability law π_C (as you have seen, this question was one of the main topics of the first lecture), but it will also allow you to design and study physical systems from their dynamical rules.

Don't hesitate to ask questions and make remarks on this wiki page.

A toy example: the pebble game

We consider a simple system: the 3x3 pebble game, where one particle moves on a 3x3 chessboard without periodic boundary conditions. The positions on the chessboard are denoted 1, 2, ..., 9; the four neighbors of site 1 are (2, 4, 1, 1) (the two moves that would leave the board point back to site 1 itself, so that rejected moves leave the pebble in place), and similarly for the other sites.

[Figure: peppe_fig.png — the 3x3 pebble board, sites numbered 1 to 9]
For this case we can explicitly write down the Markov matrix, which exactly solves the dynamics of the system. The conclusions that we will draw from this simple example are, however, general, and can be extended to more complex cases that are not exactly solvable.

Markov matrix

If we start at t=0 with the pebble on site 1, we know that at t=1 the pebble will be on sites 2 and 4 with probability 1/4 each (what is the probability for the pebble to remain on site 1 at t=1?). In order to compute the exact probabilities at all time steps, it is convenient to introduce a vector:
π(t) = { π_1(t), ..., π_9(t) }
containing the probabilities of finding the pebble at time t on sites 1, 2, ..., 9. The Monte Carlo algorithm is fully encoded in a Markov transfer matrix T, whose element T_{ba} gives the probability p(a→b) for the pebble to move from site a to site b. What is the dimension of the matrix T in the 3x3 pebble game?
The following short code (similar to the one seen during Lecture 1) writes the Markov transfer matrix T for the 3x3 pebble game. Notice that we have made use of the python package numpy, which (you will soon realize) is very convenient and efficient for dealing with matrices, vectors and linear algebra in general.
import numpy
neighbor =  [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
             [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
             [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))  # creates 9x9 matrix full of zeroes
for k in range(9):
    for neigh in range(4): transfer[neighbor[k][neigh], k] += 0.25
print(transfer)
Let's study it in detail and observe the matrix elements.
  1. Compute the sum of every column of T. What do you observe? What is your interpretation?
  2. How are the probability vectors of the pebble system at time t=0, π(0), and at time t=1, π(1), related? Write π(t) as a function of the matrix T and of π(0) only. (A quick numerical check is sketched below.)
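As a quick numerical check (a minimal sketch; it assumes the arrays neighbor and transfer defined in the program above), you can append the following lines to verify the column sums and to apply one time step of the evolution:
print(transfer.sum(axis=0))         # every column of T sums to one
pi_0 = numpy.zeros(9)               # initial state: pebble on site 1
pi_0[0] = 1.0
print(numpy.dot(transfer, pi_0))    # pi(1) = T pi(0)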

The steady state of the pebble game: Eigenvalues of the Markov matrix


Let's apply the Markov matrix that we constructed above recursively (100 times) to the initial state where the pebble is on site 1, employing the following program (pebble_transfer.py):
import numpy
 
neighbor =  [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
             [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
             [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))
for k in range(9):
    for neigh in range(4): transfer[neighbor[k][neigh], k] += 0.25
position = numpy.zeros(9)
position[0] = 1.0
for t in range(100):
    print(t, '  ', ["%0.5f" % i for i in position])
    position = numpy.dot(transfer, position)
  • What do you observe about the final state? What does it mean in relation to the Markov matrix? Run the following modified program (pebble_transfer_eigen_pr.py)...
import numpy
 
neighbor =  [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
             [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
             [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))
for k in range(9):
    for neigh in range(4): transfer[neighbor[k][neigh], k] += 0.25
eigenvalues, eigenvectors = numpy.linalg.eig(transfer)
for i in range(9):
    print("eigenvalue")
    print(eigenvalues[i])
    print("eigenvector")
    print(numpy.transpose(eigenvectors)[i])  # the columns of eigenvectors are the right eigenvectors
    print("==================")
  • Observe in detail the eigenvalues and the eigenvectors. The maximum eigenvalue is clearly 1. Give a physical interpretation of the corresponding eigenvector in terms of the Markov dynamics (a sketch that extracts and normalizes this eigenvector is given below). Do you think that there may exist a situation for which the maximum eigenvalue of the Markov matrix is strictly larger than 1? Why?
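As a hint for the interpretation, the following lines (a sketch assuming the arrays eigenvalues and eigenvectors computed in the program above) extract the right eigenvector of eigenvalue 1 and normalize it as a probability vector:
i_max = numpy.argmax(eigenvalues.real)     # index of the eigenvalue equal to 1
pi_steady = eigenvectors[:, i_max].real    # the corresponding right eigenvector (a column)
pi_steady = pi_steady / pi_steady.sum()    # normalize so that the components sum to one
print(pi_steady)                           # all components are equal to 1/9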

Eigenvalues and Eigenvectors of the Markov matrix, general case

A reminder: the Markov Chain Monte Carlo algorithm

Consider a system described by a set of configurations C. These could be the positions of a single particle (as in the pebble game), the full set of possible positions of every particle in a multi-particle system, or the set of all possible states of an Ising model. We would like to sample a configuration C with a given probability π_C. A solution to this problem is the Markov Chain Monte Carlo method: we define a dynamics over the ensemble of configurations such that, in the long-time limit, each configuration is visited with the correct probability π_C. As discussed in the lecture, a Markov chain describes an evolution which is fully specified by the transition probabilities p_{C→C1} between configurations C and C1. The probability π_C(t) of being at time t in configuration C evolves according to the following recursion relation:
π_C(t+1) = Σ_{C1} π_{C1}(t) p_{C1→C}
When π_C(t) converges towards a steady probability distribution π^st, also called the steady state, the latter thus has to satisfy
π^st_C = Σ_{C1} π^st_{C1} p_{C1→C}   (for all C)
but thanks to the property of probability conservation, Σ_{C1} p_{C→C1} = 1 (remember the sum of the column elements in the T matrix of the pebble game), this can also be rewritten as
Σ_{C1} π^st_C p_{C→C1} = Σ_{C1} π^st_{C1} p_{C1→C}   (for all C)
This is called the global balance condition.
The detailed balance condition is more restrictive:
π^st_C p_{C→C1} = π^st_{C1} p_{C1→C}   (for all C and C1)
and is a sufficient condition for the global balance condition to be verified.
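For the pebble game, whose steady state is the uniform distribution π^st_C = 1/9, detailed balance can be checked directly on the matrix elements. A minimal sketch, assuming the transfer matrix from the programs above:
pi_st = numpy.ones(9) / 9.0           # uniform steady state of the pebble game
flux = transfer * pi_st               # flux[b, a] = pi_st[a] p(a -> b)
print(numpy.allclose(flux, flux.T))   # True: detailed balance holds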

We have identified a property that the steady state has to satisfy, but nothing yet guarantees that such a distribution exists, nor that it is really the large-time limit of the probability distribution π(t). To answer these questions, let's use the Markov matrix and understand the connection between global balance and the convergence of the Markov chain in more detail.
  1. Rewrite the stochastic property of the matrix T (concerning probability conservation and the sum of columns), which we have tested for the 3x3 pebble game, in vector form: show that it is equivalent to 1·T = 1, where 1 is the row vector full of ones.
  2. Is the matrix T symmetric, in general? Why? This implies that, for an eigenvalue λ, one must distinguish the left eigenvectors (such that V.T = λ V ) and the right eigenvectors (such that T.V = λ V ).
  3. What does the equality 1·T = 1 mean for the vector 1 and the matrix T? What does it imply for the right eigenvectors of T? Use the mathematical fact that to any left eigenvector there corresponds a right eigenvector with the same eigenvalue. (To justify this: note that the spectrum of T and the spectrum of its transpose are the same, and that a left eigenvector of T is a right eigenvector of the transpose of T.)
  4. Show, using the result of the previous question (the conservation of probability), that no eigenvalue of T can have modulus strictly larger than 1 (a numerical check is sketched below).
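This can be confirmed numerically for the pebble game (a one-line check, assuming the eigenvalues computed with numpy.linalg.eig above):
print(sorted(abs(eigenvalues), reverse=True))   # all moduli are <= 1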

Let's now focus on the right eigenvector of T of eigenvalue λ1 = 1, and let's call it π1.
  1. Write the eigenvalue equation explicitly, for the components of π1.
  2. Interpret it in terms of the global balance condition: what is the link between π1(C) and π^st(C)?

To summarize: the global balance condition is a necessary condition for the probability vector π(t) to converge to a steady distribution (i.e. for the Markov Chain Monte Carlo algorithm to converge), but it is not sufficient, as we shall see at the end of this class session.

Exponential convergence

The Markov matrix provides not only the steady state but also a complete set of information on the dynamics of the system. In particular, it can tell us how much time the Markov algorithm typically needs to reach its steady state. This information is essential if we want to extract from the Monte Carlo simulation genuine information about the observables of the system. We once again consider the pebble game, to present a clear example. Apply iteratively the Markov matrix to the difference between the initial and the steady vectors, π(t=0) − π^st, using the following code (pebble_transfer_sub_2.py):
import numpy, pylab
 
neighbor =  [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
             [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
             [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))
for k in range(9):
    for neigh in range(4): transfer[neighbor[k][neigh], k] += 0.25
position = numpy.zeros(9)
position[0] = 1.0
list_T = []
list_av = []
list_fit= []
for t in range(100):
    list_T.append( t )
    list_av.append( abs(position[1]- 1.0 / 9.0) )
    list_fit.append( numpy.power(0.75,t) )
    print(t, '  ', ["%0.5f" % abs(i - 1.0 / 9.0) for i in position])
    position = numpy.dot(transfer, position)
pylab.semilogy(list_T, list_av, 'bo-', clip_on=False)
pylab.semilogy(list_T, list_fit, 'ro', clip_on=False)
pylab.annotate('(0.75)$^t$', xy=(10, 0.1), xytext=(10, 0.1))
pylab.title('dynamic evolution of the probability vector component' )
pylab.xlabel('$T$', fontsize=16)
pylab.ylabel(r'$\pi_i-\pi^{st}$', fontsize=16)
#pylab.show()
#pylab.savefig('exp-decay.png' )
pylab.clf()
You observe that indeed all the components of the resulting vector go rapidly to zero, and that, if we plot these values as a function of time on semi-log axes (un-comment the line pylab.show()), the decay is exponential (i.e. it is linear on the semi-log scale). Notice that we have made use of the python package pylab, which allows one to easily draw and display graphics. We can in fact show that the decay rate is set by the second largest eigenvalue: the difference decays as (0.75)^t = e^{−t/τ}, with τ = −1/ln(0.75) ≈ 3.5.

In order to understand why the second largest eigenvalue determines the decay rate, let's sort the other eigenvalues λ2, λ3, ... of T by decreasing modulus, 1 > |λ2| ≥ |λ3| ≥ ..., and let's call π2, π3, ... the corresponding eigenvectors.
  1. Decompose the initial condition π(0) on the basis of eigenvectors.
  2. From this, determine π(t) as a function of the λ2, λ3, ... and the π1, π2, π3, ...
  3. Determine the relation between λ2 and the convergence time to the steady state (a numerical check is sketched below).
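The relation can be checked numerically for the pebble game. A minimal sketch, assuming the transfer matrix from the programs above:
eigenvalues = numpy.linalg.eig(transfer)[0]
moduli = sorted(abs(eigenvalues), reverse=True)
lambda_2 = moduli[1]                  # second-largest modulus, equal to 0.75 here
tau = -1.0 / numpy.log(lambda_2)      # convergence time, about 3.5 steps
print(lambda_2, tau)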

Perron-Frobenius theorem


We have seen above that the global balance condition is a necessary condition for the probability vector π(t) to converge to the desired distribution (i.e. for the Markov Chain Monte Carlo algorithm to converge), but it is not a sufficient condition. The Perron-Frobenius theorem gives the conditions for the uniqueness of the steady state and for the convergence of π(t) to it.

Uniqueness

The dynamics is said to be irreducible if, equivalently,
  • no change of basis based on permutations can transform T into a block-diagonal matrix;
  • or, more physically: every configuration can be reached from any other one in a finite number of steps.
Then, if the matrix T is stochastic (i.e. it has entries ≥ 0 and the sum of the entries of every column is equal to 1), there exists a unique right eigenvector π1 of eigenvalue 1.
  1. Is the dynamics of the pebble game irreducible?
  2. Give a simple example of a Markov dynamics which is reducible (i.e. not irreducible), for example two disconnected pebble boards, and explain why a Markov Chain Monte Carlo simulation would not always give the same result, depending on the initial state (see the sketch after this list).
  3. Discuss the uniqueness of the maximum eigenvalue.
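To illustrate question 2, here is a hypothetical minimal example (it is not part of the pebble-game programs): two disconnected two-site boards give a block-diagonal Markov matrix, for which the eigenvalue 1 is degenerate.
import numpy
# two disconnected pairs of sites, {0, 1} and {2, 3}: no transition connects the blocks
T_red = numpy.array([[0.5, 0.5, 0.0, 0.0],
                     [0.5, 0.5, 0.0, 0.0],
                     [0.0, 0.0, 0.5, 0.5],
                     [0.0, 0.0, 0.5, 0.5]])
print(numpy.linalg.eig(T_red)[0])   # the eigenvalue 1 appears twice: one steady state per block
Depending on the block in which the pebble starts, the chain converges to a different steady state.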

Aperiodicity

A second condition is required to ensure that the dynamics converges: it is called aperiodicity.
The dynamics is called periodic if there exists an initial state π(0) such that, after a finite number of time steps strictly larger than 1, one obtains the same state again. The minimal number of such time steps is called the period of the state. The second part of the Perron-Frobenius theorem states that if the Markov matrix is aperiodic (i.e. if it has no periodic states), then the dynamics converges to the steady state π^st.
  1. Consider a Markov chain with a periodic state π(0) of period p. What does it imply for the spectrum of the matrix T^p?
  2. What does this imply for the spectrum of the matrix T?
  3. Show that, in general, π(t) cannot converge to π^st.

Example of periodic Markov chains

Consider the example of a particle jumping between two sites, moving to the other site with probability 1 at each time step. Explain why the dynamics is periodic, and find the eigenvalues and eigenvectors of the Markov matrix. Comment.
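A minimal sketch of this two-site chain (the names below are illustrative):
import numpy
# the particle jumps to the other site with probability 1 at each time step
T_flip = numpy.array([[0.0, 1.0],
                      [1.0, 0.0]])
print(numpy.linalg.eig(T_flip)[0])   # eigenvalues 1 and -1: a second eigenvalue of modulus 1 signals periodicity
pi = numpy.array([1.0, 0.0])
for t in range(4):
    print(t, pi)                     # pi(t) oscillates and never converges to (1/2, 1/2)
    pi = numpy.dot(T_flip, pi)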
