D. A. Lavis, the Concept of Probability in Statistical Mechanics

Mar 11 2023 · LastMod: Mar 11 2023

URL. This is a review article discussing various formulations of the foundations of statistical mechanics, but it reads more like an introduction to the ideas of Jaynes. Some personal views are scattered through these notes without being explicitly marked as my own.


Kinetic Theory of Gases

Recall the Maxwell law for a gas of $N$ small, hard, perfectly elastic spheres: the number of spheres whose speed lies between $v$ and $v+dv$ is $$ f_1(v) dv = \frac{4N}{\alpha^3 \sqrt{\pi}}v^2 \exp\big(-\frac{v^2}{\alpha^2}\big) dv. $$ This quantity is used to define the $H$-function of the $H$-theorem: $$ H(t) = \int d^3 v f_1(v,t)\ln [f_1(v,t)]. $$ Writing $f_1(v,t) = N\rho_1(v,t)$, where $\rho_1(v,t)$ is the probability distribution of velocity over the velocity space, and dividing out the factor $N$, it is easy to see that $H(t)$ is, up to an additive constant, nothing but the negative "entropy" in the velocity space.
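As a rough numerical check of the Maxwell law, the sketch below integrates the speed distribution and confirms that it counts $N$ spheres in total, with mean square speed $3\alpha^2/2$. The prefactor used is the standard $4N/(\alpha^3\sqrt{\pi})$. The values of `N_PARTICLES` and `ALPHA`, the cutoff at $12\alpha$, and the helper names are illustrative assumptions, not taken from the article.

```python
import math

def maxwell_speed_density(v, n_particles, alpha):
    """Number of spheres per unit speed, f_1(v), in Maxwell's form."""
    prefactor = 4.0 * n_particles / (alpha**3 * math.sqrt(math.pi))
    return prefactor * v**2 * math.exp(-(v / alpha) ** 2)

def integrate(func, lower, upper, steps=20000):
    """Composite midpoint rule on [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (k + 0.5) * width) for k in range(steps))

N_PARTICLES = 1000.0  # illustrative value
ALPHA = 1.5           # illustrative value of the speed scale alpha
CUTOFF = 12.0 * ALPHA # the tail beyond this point is negligible

# Total number of spheres: should reproduce N_PARTICLES.
total = integrate(lambda v: maxwell_speed_density(v, N_PARTICLES, ALPHA), 0.0, CUTOFF)

# Mean square speed: should equal 3 * alpha^2 / 2.
mean_square_speed = integrate(
    lambda v: v**2 * maxwell_speed_density(v, N_PARTICLES, ALPHA), 0.0, CUTOFF
) / total

print(total, mean_square_speed)
```

With the mean kinetic energy $m\overline{v^2}/2 = 3kT/2$, the second result corresponds to $\alpha^2 = 2kT/m$.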

Originally the quantity $f_1(v,t)$ was considered to be the actual distribution of the $N$ particles of the gas. To see this, recall how it was derived. It was assumed that the $x,y,z$ components of the velocity are independent of each other, thus $$ f(v_x,v_y,v_z) dv_x dv_y dv_z = f(v_x)dv_x f(v_y) dv_y f(v_z) dv_z. $$ At the same time the distribution of velocities should be isotropic, $$ f(v_x,v_y,v_z) = f(v^2) = f(v_x^2 + v_y^2 +v_z^2) = f(v_x)f(v_y)f(v_z), $$ so that $$ \ln f(v_x^2 + v_y^2 + v_z^2) = \ln f(v_x) + \ln f(v_y) + \ln f(v_z). $$ A possible solution is $$ f(v_i) = C e^{-B v_i^2}. $$ The constants are then determined by

  1. the normalization of the velocity distribution,
  2. the formula for the average energy $\overline{\epsilon} = 3kT/2 = m\overline{v^2}/2$, where $\overline{v^2}$ is computed by

$$ \overline{v^2} = \int_{-\infty}^{+\infty} v^2 \cdot f(v)d^3 v = 4\pi \int_0^\infty v^2 \cdot f(v) v^2 dv. $$
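As a sketch of how the two conditions fix the constants, assume that each one-component factor $f(v_i)$ is normalized to unity (so that $f$ is a probability density rather than a particle count) and write $m$ for the mass of a sphere. Normalization gives

$$ \int_{-\infty}^{+\infty} C e^{-B v_i^2} dv_i = C\sqrt{\frac{\pi}{B}} = 1 \quad\Rightarrow\quad C = \sqrt{\frac{B}{\pi}}, $$

the second moment of each component is

$$ \overline{v_i^2} = \int_{-\infty}^{+\infty} v_i^2 \, C e^{-B v_i^2} dv_i = \frac{1}{2B} \quad\Rightarrow\quad \overline{v^2} = \overline{v_x^2} + \overline{v_y^2} + \overline{v_z^2} = \frac{3}{2B}, $$

and the energy condition then yields

$$ \frac{m\overline{v^2}}{2} = \frac{3m}{4B} = \frac{3kT}{2} \quad\Rightarrow\quad B = \frac{m}{2kT}. $$

Comparing the exponent with the Maxwell law identifies $\alpha^2 = 1/B = 2kT/m$.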

A technical problem for the kinetic theory of gases was to resolve

  1. the conflict between the reversibility of mechanical laws and the irreversibility of natural processes as described by the second law of thermodynamics,
  2. the incompatibility of the second law of thermodynamics with a mechanical theory of heat based on a dynamical system that is usually recurrent (Poincaré's recurrence theorem).

Since there was no probability involved and the kinetic theory was thought to be trying to describe what thermodynamic phenomena actually are, these two were actual problems.

Interpretation of Probability

Boltzmann's defence consisted of points that have since become standard textbook arguments:

  1. The number of molecules is large for macroscopic physical systems, so the recurrence time would be extremely large.
  2. In practice a finite system is never completely isolated, so the recurrence doesn't apply.
  3. One needs a clear definition, for the dynamical system, of what is meant by a macrostate.
  4. The second law is a statistical law from the molecular point of view.

The immediate effect of the last point is a change in how the quantity $f_1(v,t)$ is perceived. From $\rho_1(v,t) = f_1(v,t)/N$ to $\rho(x,p,t)$, the probability density function on the phase space $\Gamma$, is a relatively small step. Now probability is always accompanied by debates about its interpretation. In statistical mechanics, the presence of statistics is usually considered in three lights: Ensemble, Ergodic, Probabilistic.


The ensemble point of view. Regard the subject as a procedure for arriving at answers: abandon the attempt to follow the precise changes of state that take place in a particular system, and study instead the behaviour of a collection, or ensemble, of systems. This point of view doesn't really say anything about the interpretation of "probability" and doesn't explain why considering the ensemble is useful in making reasonable predictions. Since the subject of study is the ensemble and not a particular system, one can avoid philosophical speculations. It is really the shut-up-and-calculate stance, similar to the many-worlds interpretation in QM.


The ergodic point of view. This regards probabilistic ideas in the light of ergodic theory. Briefly, the system travels along a path on the phase space $\Gamma$, starting from $(x_0,p_0)$, and the thermodynamic quantities $Q_T$ are identified with the time averages $\tilde{Q}(x_0,p_0)$ of the corresponding mechanical phase functions $Q$ along this path over a period of time $\tau$, in the limit $\tau \to \infty$. Birkhoff showed that almost everywhere in $\gamma\subseteq\Gamma$ (a subset of finite volume that is invariant under the Hamiltonian flow) the limit $\lim_{\tau\to\infty}\tilde{Q}(x_0,p_0) = \hat{Q}(x_0,p_0)$ exists and is a constant of motion, so if $\hat{Q}$ takes the same value almost everywhere in $\gamma$, then $\hat{Q} = \overline{Q}$, where $\overline{Q}$ is the average of $Q$ over $\gamma$ with respect to the invariant measure $\mu$. A system is said to be ergodic if for all phase functions $Q$ integrable over $\gamma$ we have $\hat{Q} = \overline{Q}$ almost everywhere. The problem becomes whether the set of $\mu$-measure zero (excluded by "almost everywhere") can really be neglected, i.e. whether one may assume that a measurement is never made starting at one of its points. This makes the meaning of the measure $\mu$ important, and one needs to assume some interpretation of $\mu$, which is a subject of statistics and probability. The only way around this would be to argue from the ergodic hypothesis, which assumes that the path of the system passes through every point of $\gamma$, but this cannot be true. The alternative, the quasi-ergodic hypothesis, has not proved sufficient to establish ergodicity. There is another approach called metric transitivity.
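A toy illustration of the relation $\hat{Q} = \overline{Q}$ is the one-dimensional harmonic oscillator, chosen here as an assumption because it can be solved exactly (it is not an example from the article). With $H = (p^2 + x^2)/2$ the energy surface is a circle traversed by a single orbit. The sketch compares the time average of $Q = x^2$ along the orbit with its average over the energy circle, taken with the measure that is uniform in the angle. The amplitude, phase and time spans are arbitrary choices.

```python
import math

def time_average(observable, amplitude, phase, tau, steps=200000):
    """Midpoint-rule estimate of (1/tau) * integral_0^tau Q(x(t), p(t)) dt
    along the exact orbit x = A cos(t + phi), p = -A sin(t + phi)."""
    dt = tau / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        x = amplitude * math.cos(t + phase)
        p = -amplitude * math.sin(t + phase)
        total += observable(x, p)
    return total / steps

def phase_average(observable, amplitude, samples=100000):
    """Average of Q over the energy circle, uniform in the angle."""
    total = 0.0
    for k in range(samples):
        theta = 2.0 * math.pi * (k + 0.5) / samples
        total += observable(amplitude * math.cos(theta), -amplitude * math.sin(theta))
    return total / samples

def position_squared(x, p):
    return x * x

AMPLITUDE = 2.0  # arbitrary; fixes the energy E = A^2 / 2

q_phase = phase_average(position_squared, AMPLITUDE)
q_time_short = time_average(position_squared, AMPLITUDE, phase=0.3, tau=1.0)
q_time_long = time_average(position_squared, AMPLITUDE, phase=0.3, tau=1000.0)

print(q_phase, q_time_short, q_time_long)
```

The short-time average depends on the starting point, while the long-time average approaches the phase average $A^2/2$. For systems with many degrees of freedom no such explicit check is available, which is why the measure-zero question matters.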


The probabilistic point of view. Interpret the probabilistic ideas as really probabilistic in essence, and take part in the real debates within the theory of probability: frequentist, Bayesian, etc.

A point I want to make is that one should not accept that "probabilities do not describe reality - only our information about reality - […]", as E. T. Jaynes puts it, since our information about reality is the only reality.

Equilibrium Statistical Mechanics

Despite the debates about the foundations, equilibrium statistical mechanics works well. This is because the superstructure is based on a few agreed propositions:

  1. The fact that equilibrium corresponds to having a probability density function $\rho$ that is not an explicit function of time.
  2. The form for $\rho$ which should be used in given sets of physical circumstances.

An integral of the equations of motion is called an isolating constant of motion if it can be used to reduce the dimension of a set invariant under the flow, i.e. if it defines a surface in $\Gamma$. If the energy given by the value of the Hamiltonian is the only isolating constant of motion, then the appropriate probability density function is agreed to be obtained by assigning equal probabilities to the points of an accessible region of phase space. The invariant set $\gamma$ is usually taken to be the shell $E < H(x,p) < E+\Delta E$, which induces the microcanonical distribution in the limit $\Delta E \to 0$. The canonical distribution is obtained using either the central limit theorem or the method of steepest descents. These procedures are asymptotically valid for systems with a large number of microsystems (subjectivists do not need this large-number limit).

Hence the foundational problem for equilibrium statistical mechanics is to justify the use of the uniform distribution over an energy shell.

  1. Ergodic theory: If the Hamiltonian is the only isolating constant of motion, then ergodic theory will almost give a justification (one still needs the energy surface to be metrically transitive), but proving the non-existence of additional isolating constants of motion is difficult, and their existence would result in a form of thermodynamics significantly different from the standard one.
  2. Ensemble: systems with all values of the other, unknown isolating constants of motion are included in the ensemble, so additional isolating constants of motion pose no problem, but no justification for the use of the uniform distribution is given.
  3. Justifying the uniform distribution by showing that equilibrium arises in the long-time limit from non-equilibrium situations. This approach subdivides into 1. the Brussels-Austin School (objectivist/Prigogine) and 2. the maximum entropy method (subjectivist/information theoretic/Bayesian/Jaynes, etc.).

The Maximum Entropy Method

While the maximum entropy method is primarily a method of non-equilibrium statistical mechanics, it has a form specifically for equilibrium.

The motto is that a probability distribution should model the best prediction one can make of observable phenomena, based on the information available. The fact that there are unknown constants of motion is irrelevant, since predictions are made on the basis of available information.

Consider a system with discrete energy levels $\{E_i\}_i$. Then one asks the following question:

What is the best probability distribution for the random variable $E$ based on the information available to us?

while the standard objectivist formulation would be

Given the physical environment of the system, what is the probability distribution for $E$?

Now, given an appropriate measure of uncertainty, if one chooses the probability distribution that maximizes the uncertainty relative to the available information, then this will be the best probability distribution, since it assumes as little as possible. Jaynes, building on Shannon's uniqueness theorem, takes the measure of uncertainty satisfying some reasonable mathematical properties to be Shannon's entropy $$ S_I(p_i) = -\sum_i p_i \ln (p_i). $$ Subject only to the condition $\sum_i p_i = 1$, maximizing $S_I(p_i)$ over $n$ energy levels gives the same result as that hypothesized by the objectivists: the uniform distribution $p_i = 1/n$. The objectivist, on the other hand, supposes that there is a probability associated with an experiment to determine the state of the system. A Popperian will hypothesize that $p_i=1/n$; a sequence of experiments will then be used to see if the hypothesis is falsified. A frequentist will define the probability to be $p_i=1/n$.
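The claim that normalization alone leads to the uniform distribution can be probed numerically. The sketch below compares the Shannon entropy of the uniform distribution on $n$ levels with that of many randomly drawn distributions. The number of levels, the random seed and the helper names are illustrative assumptions. Random sampling only illustrates the maximum, it does not prove it.

```python
import math
import random

def shannon_entropy(probabilities):
    """S_I = -sum_i p_i ln p_i, with the convention 0 ln 0 = 0."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)

def random_distribution(n, rng):
    """A random normalized distribution on n levels (positive weights, rescaled)."""
    weights = [rng.random() + 1e-12 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

N_LEVELS = 5            # illustrative number of energy levels
rng = random.Random(0)  # fixed seed so the run is reproducible

uniform = [1.0 / N_LEVELS] * N_LEVELS
s_uniform = shannon_entropy(uniform)
s_trials = [shannon_entropy(random_distribution(N_LEVELS, rng)) for _ in range(1000)]

print(s_uniform, math.log(N_LEVELS), max(s_trials))
```

The uniform distribution attains $S_I = \ln n$, and none of the sampled distributions exceeds it.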

For general cases, the maximum entropy method for equilibrium is identical to what is described here (Information-theoretic interpretation of thermodynamics).
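The linked note is not reproduced here. As a minimal sketch of the best-known instance, assume that the only information beyond normalization is a prescribed mean energy $\sum_i p_i E_i = U$. Maximizing $S_I$ with a Lagrange multiplier $\beta$ then gives the familiar form $p_i \propto e^{-\beta E_i}$, and $\beta$ is fixed by the constraint. The energy levels, the target $U$ and the bisection bounds below are hypothetical choices made for illustration.

```python
import math

def shannon_entropy(probabilities):
    """S_I = -sum_i p_i ln p_i, with the convention 0 ln 0 = 0."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)

def gibbs_distribution(energies, beta):
    """p_i proportional to exp(-beta * E_i); energies are shifted for stability."""
    e_min = min(energies)
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    partition = sum(weights)
    return [w / partition for w in weights]

def mean_energy(energies, beta):
    return sum(p * e for p, e in zip(gibbs_distribution(energies, beta), energies))

def solve_beta(energies, target, lo=-50.0, hi=50.0, iterations=200):
    """Bisection on beta; the mean energy decreases as beta increases."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mean_energy(energies, mid) > target:
            lo = mid  # mean still too high, so beta must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

ENERGIES = [0.0, 1.0, 2.0, 3.0]  # hypothetical energy levels
TARGET = 1.0                     # hypothetical prescribed mean energy U

beta = solve_beta(ENERGIES, TARGET)
p_maxent = gibbs_distribution(ENERGIES, beta)

# Another distribution with the same mean energy, for comparison.
p_other = [0.4, 0.3, 0.2, 0.1]

print(beta, p_maxent, shannon_entropy(p_maxent), shannon_entropy(p_other))
```

Both distributions satisfy the same constraint, but the exponential one has the larger entropy, as expected of the maximum entropy solution.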

Conceptual Problem: What is it that has entropy?

One might object that in the information-theoretic interpretation the entropy $S$ of the physical system itself doesn't exist, but what is the physical system itself? One can say that the entropy computed is the entropy $S_M$ of a model $M$ of the system and that, just as for all other theoretical and physical quantities, the fact that one is computing $S_M$ doesn't mean that $S$ doesn't exist. But I can go on to ask: do "actual" physical quantities exist, independent of one's knowledge? The problem with this (seemingly) subjectivist view is that one needs to clarify what the object of the knowledge is, i.e. in "independent of one's knowledge", knowledge of what? This should be a subject of the theory of meaning and logic.

Non-Equilibrium Statistical Mechanics

For the maximum entropy approach the treatment is nearly identical to the equilibrium one. The observables are now time-dependent, the constraints should be updated after each maximization, and the rest is identical to what is described here (Information-theoretic interpretation of thermodynamics). A point to be made is that entropy has not been shown to be monotonically increasing. The inequality $S(\rho(t_0)) \leq S(\rho(t))$ always holds for a state starting in $\rho(t_0)$, since $\rho(t)$ is obtained by maximizing $S$ at time $t$, but if a measurement is made at $t_1 > t_0$, then $S(\rho(t_2))> S(\rho(t_1))$ for every $t_2 > t_1$, since the available information is now updated.

The objectivists, on the other hand, criticize the approach taken by Boltzmann and consider chaotic dynamics a necessary condition for irreversible behaviour. The appearance of chaos here is interesting, since classical chaos itself is not well understood and might itself be related to the foundations of mathematics, in particular the intuitionistic approach to the reals.