
Cournot and the frequentist's secret
This post contains the content of a talk, titled “Cournot and the frequentist’s secret”, that I gave for the Statistics Discussion Group at the Faculty of Bioscience Engineering (UGent). This topic arose in a very natural way during my research on conformal prediction, since two of the most influential researchers in the field of conformal prediction are the very same people that have written most of the recent papers on Cournot’s principle (Vladimir Vovk and Glenn Shafer).
The structure of the talk was as follows:
- A little history of probability theory.
- What is Cournot’s principle about?
- What are its different forms?
- How did people interpret this principle?
- Can this be applied to time series?
Our story starts in the early 18th century with (Jacob) Bernoulli and De Moivre. At that time, probabilities were defined in a very frequentist manner, i.e. as the ratio of ‘equally likely’ cases to the total number of cases. So, given an event for which we want to calculate the probability, choose any representative case $a\in\mathcal{X}$, where $\mathcal{X}$ is the set of all cases. Then
\[P(a) := \frac{|\{b\in\mathcal{X}\mid b\sim a\}|}{|\mathcal{X}|}\,,\]where $\sim$ denotes the equivalence relation of ‘equally likely’. In fact, this definition is already sufficient to prove some very important results such as the law of total probability, also called the addition theorem, and the law of compound probability, also called the multiplication theorem. Another important result that could be proved using this ‘crude’ definition is the law of large numbers.1 This theorem states that, given a sufficiently long sequence of observations, the observed frequency of an event approaches the true probability of the event with sufficiently high probability. Bernoulli’s interpretation of this result was that this ‘sufficiently high probability’ equates to a moral certainty and, consequently, frequencies can be used as proxies for true probabilities. This interpretation is the first instance of Cournot’s principle.
During the 18th century, Bernoulli’s interpretation of (very) high probabilities led to a lot of discussion in academic circles. d’Alembert (1760) looked at the other extreme, that of events of very small probability. Consider a fair coin toss, where one side is as likely to be obtained as the other. The event of obtaining 100 heads in a row is mathematically very small (the probability is $0.5^{100}\,$) but is still possible in theory. However, d’Alembert wondered whether it is actually physically possible at all. Shortly thereafter (1777), Buffon argued that the distinction between moral and physical certainty is only a matter of order of magnitude: $99.99\%$ might be deemed morally certain, $99.999999\%$ might be deemed physically certain.
In the 19th century, with the advent of geometric probability and statistical physics, the distinction between moral and physical certainty got even more relevant. For example, Boltzmann’s theory of statistical mechanics explains irreversible processes through the principle of entropy maximization, in particular, transitions from states with high entropy to states with low entropy have a vanishingly small probability of occurrence. Another example is the recurrence theorem by Poincaré, which states that almost all isolated systems will return to a state that is arbitrarily close to their initial state. ‘Almost all’ should be read as the states for which this does not happen occur with vanishingly small probability. Such situations were captured by the following statement by Cournot (1843):
… The physically impossible event is therefore the one that has infinitely small probability, and only this remarsk gives substance — objective and phenomenal value2 — to the theory of mathematical probability.
In the 20th century, after the second world war, people started using this principle for events with small, but not vanishingly small, probability. There were two important schools in probability theory: the Russian school (Chuprov, Markov, …) and the French school (Borel, Hadamard, Fréchet, Lévy, …). The prevailing opinion, as introduced by Markov, was that the statistical practicioner had the responsibility of determining the right threshold for the principle to apply. Borel introduced a hierarchy of such thresholds for science (refining Buffon’s opinion):3
- Human scale: $p\leq10^{-6}$,
- Terrestrial scale: $p\leq10^{-15}$, and
- Cosmic scale: $p\leq10^{-50}$.
Hadamard explained that probability theory can be founded in two main rules:
- Perfect equivalence (formalizing the notion of ‘equally likely’ as used by Bernoulli and De Moivre).
- Cournot’s principle.
The former is a mathematical notion and cannot be checked in practice. We usually have to relax the perfect equivalence relation to some approximate notion (we have to coarse grain the physical system). It still suffices to work out a theory of probability. However, it only exists at the mathematical/metaphysical level. It is Cournot’s principle — this was emphasized more strongly by Hadamard’s student Lévy — that connects probability theory to our world! The relation being given by Bernoulli’s law of large numbers.4 Although it was deemed fundamental, it only received its current name in 1940 when Fréchet coined the term Cournot’s principle. (Before that it was known as Cournot’s lemma in the Russian school and the only/fundamental law of chance as introduced by Borel).
Aside from coining its current name, Fréchet also made a distinction between two forms of Cournot’s principle. The strong form says that an event with (vanishingly) small probability, singled out in advance, will not happen. The weak form, on the other hand, says that such an event will only happen very rarely in repeated trials. It is the strong form that combines with the law of large numbers to lead to the frequentist definition of probabilities. The weak law only applies that frequencies ‘usually’ give a good estimate of the true probabilities. This weak form was called the general principle of chance by Castelnuovo.
Whereas the French and Russian schools embraced the principle, the English and German schools did not share the enthusiasm. The British were less interested in the formal mathematical underpinnings of the theory and only seemed to care about the application of the (frequentist) theory to statistics. On the other hand, the German researchers were rather skeptical of Cournot’s principle. Although they agreed with the idea that practicioners should fix a threshold for when to deem an event improbable, the philosophers refuted the general idea. von Kries argues that combining Cournot’s principle with Bernoulli’s law of large numbers is nonsensical. The principle was simply d’Alembert’s mistake! Objective probabilities might exist, but only when some conditions are met. To express these conditions, von Kries introduced the notion of a Spielraum, which is the range or set of possible (initial) conditions that are relevant to the events of interest. From the point of view of modern probability theory, this would be taken as the sample space $\Omega$. However, von Kries required three extra conditions:
- Indifference: The different conditions in the Spielräume are equally probable. One condition is as likely as the other and we have no logical preference for any alternative over another.
- Originality (or naturality): Spielräume are invariant when taking into account their individual histories, i.e. there exists no prior, more fundamental Spielräume from which they can be deduced.
- Comparability: If there is a unique and nonarbitrary way to decompose a Speilraum into its equiprobable alternatives.
The Spielraum principle then states that probabilities can be calculated in the Bernoulli–De Moivre way if and only if these conditions hold. Although this school of thought mainly remained confined to Germany, John Maynard Keynes did introduce it to the English school, where it did have some impact and might have influenced other philosophers such as Wittgenstein.
In the first half of the 20th century, the French dominance in probability theory was finally lost to the Russian school. Kolmogorov adopted a similar point of view as Hadamard, where probability is based on two foundational principles. For Kolmogorov these where frequentism and Cournot’s principle (in its strong form). He also showed great interest in the debates about foundational issues, but given the political environment, he tried to keep philosophical statements brief, an approach that was later also adopted by the French school through the Bourbaki movement.
The ideas of Kolmogorov where also picked up in the United States by Doob, who extended the theory of measurable spaces to the setting of stochastic processes, i.e. sequences of random variables. The main remaining issue here is that we cannot use the frequentist approach of repeated trials. For example, for any time series, such as daily precipitation measurements or daily stock market indexes, we only observe one trial (as we cannot go back in time and start a new observation).
A possible solution is given by the observation that Kolmogorov did not need the frequentist assumption in his framework since, as we have seen above, Cournot’s principle combined with repeated trials automatically gives this result (the law of large numbers). The principle itself, however, does not require repeated trials and can be applied to a single observation. We can test a hypothetical probability measure over the space of sample paths/histories, by calculating the (a priori) probability of observing a given path. If this probability is (very) small, Cournot’s principle tells us that this probability measure is unlikely as the given event will not be observed.
- Shafer, Glenn. (2016). From Cournot’s principle to market efficiency. http://www.probabilityandfinance.com.
- Shafer, Glenn, and Vovk, Vladimir. (2006). The sources of Kolmogorov’s Grundbegriffe. Statistical Science 21: 70–98.
- Zabel, Sandy. (2016). Johannes von Kries’s “Principien”: A Brief Guide for the Perplexed. Journal for General Philosophy of Science / Zeitschrift für allgemeine Wissenschaftstheorie 47: 131–150.
-
This terminology was introduced by Poisson. This first instance of the theorem was, however, only valid for binary variables (i.e. Bernoulli trials). ↩
-
This is a reference to Kant. ↩
-
Note that these probabilities are much smaller than the thresholds for $p$-values that are being used in contemporary science (even in the notoriously strict field of particle physics). ↩
-
The true probability can be seen as an objective/physical quantity that is measured through the frequency of observed events. ↩