The Expected Maximum of Independent and Identically Distributed Geometric Random Variables

In my last post I derived the formula for the expected maximum of n independent and identically distributed (iid) exponential random variables.  If each random variable has rate parameter \lambda, then the expected maximum is \frac{1}{\lambda} \sum_{i=1}^n \frac{1}{i} = \frac{H_n}{\lambda}, where H_n is the nth harmonic number.

This post considers the discrete version of the same problem: the expected maximum of n independent and identically distributed geometric random variables.  Unfortunately, there is no known nice, closed-form expression for the expected maximum in the discrete case.  However, the answer for the corresponding exponential random variables turns out to be a good approximation.

Let X_1, X_2, \ldots X_n be n iid geometric(p) random variables.  Let T_n denote the maximum of the X_i variables.  Let q = 1- p.  Let the rate parameter for the corresponding exponential distribution be \lambda = - \log(1-p).  Now, we have

\displaystyle E[T_n] = \sum_{k=0}^{\infty} P(T_n > k) = \sum_{k=0}^{\infty} (1 - P(T_n \leq k))
\displaystyle = \sum_{k=0}^{\infty} (1 - P(X_1 \leq k) P(X_2 \leq k) \cdots P(X_n \leq k)) = \sum_{k=0}^{\infty} (1 - (1-q^k)^n),

as the cumulative distribution function of a geometric(p) random variable is 1 - (1-p)^k.

By viewing the infinite sum as right- and left-hand Riemann sum approximations of the corresponding integral we obtain

\displaystyle \int_0^{\infty} (1 - (1 - q^x)^n) dx \leq E[T_n] \leq 1 + \int_0^{\infty} (1 - (1 - q^x)^n) dx.

Now, let’s take a closer look at the integral that occurs on both sides of this inequality. With the variable switch u = 1 - q^x we have

\displaystyle \int_0^{\infty} (1 - (1 - q^x)^n) dx = -\frac{1}{\log q} \int_0^1 \frac{1 - u^n}{1-u} du = -\frac{1}{\log q} \int_0^1 \left(1 + u + \cdots + u^{n-1}\right) du
\displaystyle = -\frac{1}{\log q} \left(1 + \frac{1}{2} + \cdots + \frac{1}{n}\right) = -\frac{1}{\log q} H_n,
proving the fairly tight bounds

\displaystyle \frac{1}{\lambda} H_n \leq E[T_n] \leq 1 + \frac{1}{\lambda} H_n.

We can obtain a more precise approximation by using the Euler-Maclaurin summation formula for approximating a sum by an integral.  Up to a first-order error term, it says that

\displaystyle E_n = \sum_{k=0}^{\infty} (1 - (1-q^k)^n) \approx \int_0^{\infty} (1 - (1 - q^x)^n) dx + \frac{1}{2},
yielding the approximation

\displaystyle E_n \approx -\frac{1}{\log q} H_n + \frac{1}{2},

with error term given by
\displaystyle \int_0^{\infty} n (\log q) q^x (1 - q^x)^{n-1} \left(x - \lfloor x \rfloor - \frac{1}{2}\right) dx.
One can verify that this is quite small unless n is also small or q is extreme.

See also Bennett Eisenberg’s paper “On the expectation of the maximum of IID geometric random variables,” Statistics and Probability Letters 78 (2008), 135-143.

Posted in probability | Leave a comment

The Expected Maximum of Independent and Identically Distributed Exponential Random Variables

Finding the expected maximum of independent and identically distributed (iid) exponentially distributed random variables is a standard calculation in an undergraduate course in probability theory.  This post goes through the details of that calculation.  (My next post will address the more difficult question of the expected maximum of iid geometric random variables.)

There are three properties of exponential random variables that we need.

  1. Exponentially distributed random variables are memoryless.  Formally, if X is exponentially distributed, then P(X > t+s | X > s) = P(X > t).  Less formally, suppose that the lifetime of an electronic component is exponentially distributed.  The memoryless property says that the probability that the component will last at least t units longer, given that the component has lasted s units of time, is independent of the value of s.
  2. The minimum of n exponentially distributed random variables with rate parameters \lambda_1, \lambda_2, \ldots, \lambda_n is itself exponentially distributed with rate parameter \sum_{i=1}^n \lambda_i.  (For a proof, see the Wikipedia page on the exponential distribution.)
  3. The exponential distribution is a continuous distribution.  (This is a very basic property of the exponential distribution.  However, the geometric distribution has the first two properties as well, yet its expected maximum is more complicated.)

Now, suppose we have n iid exponential(\lambda) random variables, and let T_i be the ith smallest of these.  Since T_1 is exponential(n \lambda), E[T_1] = \frac{1}{n \lambda}.

Since exponential random variables are continuous, the probability that any two of the n random variables have the same value is 0.  Thus the n-1 other random variables all have values larger than T_1.  However, the memoryless property says knowledge of T_1 essentially “resets” the values of the other random variables, so that the time between T_1 and T_2 is the same (distributionally) as the time until the first of n-1 iid exponential(\lambda) random variables takes on a value.  Thus E[T_2 - T_1] = \frac{1}{(n-1) \lambda}.  

Applying the same logic, we get that, for 1 \leq i \leq n-1, E[T_{i+1} - T_i] = \frac{1}{(n-i) \lambda}.  Thus the expected value of the maximum of the n random variables is

\displaystyle E[T_n] = E\left[T_1 + \sum_{i=1}^{n-1} (T_{i+1}-T_i) \right] = \sum_{i=0}^{n-1} \frac{1}{(n-i) \lambda} = \sum_{i=1}^n \frac{1}{i \lambda} = \frac{H_n}{\lambda},

where H_n is the nth harmonic number.

Posted in probability | Leave a comment

A Combinatorial Proof of a Stirling Number Formula

One of the fundamental properties of the (unsigned) Stirling numbers of the first kind is that they can be used to convert from rising factorial powers to ordinary powers via the formula \displaystyle x^{\overline{n}} = \sum_{k=0}^n \left[ n \atop k \right] x^k.  This post gives a combinatorial proof of this formula.

Suppose we can color the numbers in a permutation according to cycle (i.e., numbers in the same cycle are the same color). We can color two different cycles the same color. If there are n elements to permute and x colors, how many of these cycle-colored permutations are there?

One way to count them is to condition on the number of cycles. There are \left[ n \atop k \right] ways to choose a permutation on [n] with k cycles, and there are x^k ways to color the k cycles once chosen. So the answer is \displaystyle \sum_{k=0}^n \left[ n \atop k \right] x^k.

Another way to count them is to consider the elements 1, 2, \ldots, n in turn.  First, color the number 1 in x ways.  Then the number 2 can either start a new colored cycle, in x ways, or go after 1 in 1′s cycle, in 1 way. So there are x + 1 choices for the number 2.  The number 3 can start a new colored cycle, in x ways, or go immediately after 1 or 2 in their cycles, in 2 ways. So there are x + 2 choices for the number 3.  In general, there are there are x+k-1 choices for where to place the number k.  Thus there are x(x+1) \cdots (x+n-1) = x^{\overline{n}} choices for placing the n elements in order to create a cycle-colored permutation.

This argument assumes that x is a positive integer.  To extend the proof for all real numbers, note that the formula \displaystyle x^{\overline{n}} = \sum_{k=0}^n \left[ n \atop k \right] x^k is a polynomial in x of degree n.  Since we have shown that this expression holds for infinitely many values of x (the positive integers), it must hold for all real values of x.

(A variant of this argument appears in the second edition of Volume 1 of Richard Stanley’s Enumerative Combinatorics, pp. 33-34.)

Posted in combinatorics | Leave a comment

The Catalan Numbers from Their Generating Function

Deriving the expression \displaystyle C_n = \frac{1}{n+1} \binom{2n}{n} for the Catalan numbers from the Catalan recurrence \displaystyle C_{n+1} = \sum_{k=0}^n C_k C_{n-k} is a classic exercise in generating function manipulation, although the details are sometimes omitted in textbooks (and on the Wikipedia page).  This post fills in the details.

The Catalan numbers are the solution to many combinatorial problems.  See, for example, this selection from Vol. II, Chapter 6, of Stanley’s Enumerative Combinatorics [1], as well as the Catalan addendum, for many such combinatorial interpretations.  My favorite is that C_n is the number of lattice paths from (0,0) to (n,n) that do not go above the diagonal y = x.

To derive a closed-form for C_n from the recurrence, start by multiplying both sides by x^n and summing up:

\displaystyle  \sum_{n=0}^{\infty} C_{n+1} x^n = \sum_{n=0}^{\infty} \left(\sum_{k=0}^n C_k C_{n-k}\right)x^n \\  \implies \frac{1}{x}\sum_{n=0}^{\infty} C_{n+1} x^{n+1} = C^2(x) \\  \implies \frac{1}{x}\sum_{n=1}^{\infty} C_n x^n = C^2(x) \\  \implies \frac{1}{x} \left(\sum_{n=0}^{\infty} C_n x^n - C_0 \right) = C^2(x) \\  \implies \frac{1}{x} C(x) - 1 = C^2(x) \\  \implies xC^2(x) - C(x) + 1 = 0 \\  \implies C(x) = \frac{1 \pm \sqrt{1 - 4x}}{2x}.

Now, do we want the positive or the negative square root?  We know that C_0 = 1, so the generating function satisfies C(0) = 1.  Subbing in 0 for either form of C(x) gives division by zero.  However, \displaystyle \lim_{x \to 0^+} C(x) is the indeterminate form 0/0 for the negative square root, so let’s take a closer look at this limit.  (The same limit evaluates to \infty for the positive square root, so that cannot be the right choice.)

Applying l’Hôpital’s rule, we have

\displaystyle \lim_{x \to 0^+} \frac{1 - \sqrt{1 - 4x}}{2x} = \lim_{x \to 0^+} \frac{2 (1-4x)^{-1/2}}{2} = 1.

The negative square root does evaluate to the correct number, so that one must be the right choice for C(x):

\displaystyle C(x) = \frac{1 - \sqrt{1 - 4x}}{2x}.

Applying Newton’s generalized binomial series, we then have

\displaystyle C(x) = \frac{1}{2x} \left(1 - \sqrt{1 - 4x} \right) = \frac{1}{2x} \left(1 - \sum_{n=0}^{\infty} \binom{1/2}{n} (-4x)^n \right).

The coefficient of x^n in the summand simplifies to

\displaystyle  \frac{\frac{1}{2} (\frac{1}{2}-1) \cdots (\frac{1}{2}-n+1) }{n!} (-4)^n \\  = \frac{1(1-2) \cdots (1-2n+2) }{n!} (-1)^n 2^n \\  = \frac{(-1)(-3) \cdots (-(2n-3)) }{(n!)^2} (-1)^n 2^n (n)(n-1) \cdots (1) \\  = \frac{(-1)(-3) \cdots (-(2n-3)) }{(n!)^2} (-1)^n (2n)(2n-2) \cdots (2) \\  = - \frac{(2n)!}{(n!)^2 (2n-1)} \\  = - \binom{2n}{n} \frac{1}{2n-1}.

Therefore,
\displaystyle  C(x) = \frac{1}{2x} \left(1 + \sum_{n=0}^{\infty} \binom{2n}{n} \frac{1}{2n-1} x^n \right) \\  = \frac{1}{2x} \left(1 + -1 + \sum_{n=1}^{\infty} \binom{2n}{n} \frac{1}{2n-1} x^n \right) \\  = \frac{1}{2} \sum_{n=1}^{\infty} \binom{2n}{n} \frac{1}{2n-1} x^{n-1} \\  = \frac{1}{2} \sum_{n=0}^{\infty} \binom{2(n+1)}{n+1} \frac{1}{2n+1} x^n \\  = \sum_{n=0}^{\infty} \frac{2(n+1)(2n+1)}{2(n+1)^2}\binom{2n}{n} \frac{1}{2n+1} x^n \\  = \sum_{n=0}^{\infty} \frac{1}{n+1}\binom{2n}{n} x^n.

Thus \displaystyle C_n = \frac{1}{n+1}\binom{2n}{n}, the coefficient of x^n in the generating function C(x).

References

1.  Richard Stanley, Enumerative Combinatorics, Vol. II, 2nd ed., Cambridge University Press, 2011.

Posted in Catalan numbers, combinatorics | 2 Comments

The Exponential Generating Function for the Derangement Numbers

Recently I ran across a derivation of the exponential generating function for the derangement numbers that I like better than the ones I had seen before.  (Perhaps it’s standard, but it was new to me.)  It’s Problem 10 in Chapter 5 of the text [1] I’m currently using for combinatorics.

A derangement is a permutation with no fixed points.  One of the standard recurrences for the number of derangements D_n on n elements is D_n = n D_{n-1} + (-1)^n, valid for n \geq 1.  The base case is D_0 = 1.  Now, multiply this recurrence by x^n/n! and sum over n to get

\displaystyle  \sum_{n \geq 1} D_n \frac{x^n}{n!} = \sum_{n \geq 1} nD_{n-1} \frac{x^n}{n!} + \sum_{n \geq 1} (-1)^n \frac{x^n}{n!}

\displaystyle  \implies \sum_{n \geq 0} D_n \frac{x^n}{n!} - 1 = x \sum_{n \geq 1} D_{n-1} \frac{x^{n-1}}{(n-1)!} + \sum_{n \geq 0} (-1)^n \frac{x^n}{n!} - 1

\displaystyle  \implies G(x) = x G(x) + e^{-x}

\displaystyle  \implies G(x) = \frac{e^{-x}}{1-x}.

(I’ve also given this answer on math.SE.)

  1. W. D. Wallis and J. C. George, Introduction to Combinatorics, CRC Press, 2011.
Posted in combinatorics | 2 Comments

Symmetric 0-1 Matrices with All Eigenvalues Positive, Part 2

A recent post and question on math.SE ask for an intuitive proof of the fact that the identity is the only symmetric 0-1 matrix with all eigenvalues positive.  As I mentioned in that recent post, Robert Israel’s argument is quite nice, but I was hoping for a more big-picture-type proof.  ”Intuitive” is, of course, rather fuzzy, but I found two proofs that were along the lines of what I was hoping for.  (I answered my own math.SE question with these proofs.)

Let A be a symmetric, 0-1 matrix with all eigenvalues positive.


Proof 1: Correlation matrices.

Let B be the diagonal matrix consisting of 1′s where A has 0′s and 0′s where A has 1′s. Then A+B is a correlation matrix. Since adding nonnegative values to A’s diagonal can’t decrease any of A’s eigenvalues (see, for example, Weyl’s inequality), A+B has all eigenvalues positive and is thus invertible. But the only invertible 0-1 correlation matrix is the identity. Since A+B is the identity, A must be a diagonal matrix. But the only invertible 0-1 diagonal matrix is the identity. Therefore, A is the identity matrix.


Proof 2: Linear transformations.

Since A is a real symmetric matrix it is orthogonally diagonalizable, which means that it represents a linear transformation with scaling in mutually perpendicular directions. Since the eigenvalues are the scaling factors for the different directions, the scaling factors are all positive. This implies that for x ≠ 0A cannot map x to a vector that is perpendicular to x. (This follows geometrically, but you can also see this by remembering that positive definite implies something even stronger: Ax must make an acute angle with x.)

This non-orthogonality restriction between Ax and x rules out any diagonal elements of A being 0 as otherwise aii = 0 would imply ei and Aei are orthogonal.

Since A has all 1′s along its diagonal, the non-orthogonality restriction now rules out any off-diagonal elements being 1. Suppose otherwise; i.e., that aij = aji = 1 with aii = ajj = 1. Then A maps both ei and ej to the vector (1,1) when the range of A is restricted to the 2D subspace given by span{eiej}. But this means that A maps eej to the vector (0,0) when the range of A is restricted to span{eiej}. Since e - ej is itself in span{eiej}, this implies that e - ej and A(e - ej) are orthogonal, and we have our contradiction with the non-orthogonality restriction.

The only option left is that A is the identity matrix.

Posted in linear algebra, matrices | 4 Comments

A Simple Proof that the Largest Eigenvalue of a Stochastic Matrix is 1

A stochastic matrix is a square matrix whose entries are non-negative and whose rows all sum to 1.  The transition matrix for a finite-state Markov chain is a stochastic matrix, and so they are essential for tackling problems that can be modeled as Markov chains.

One of the many interesting properties of a stochastic matrix is that its largest eigenvalue is 1.  I believe this can be proved using the Perron-Frobenius theorem.  However, I found a simple, self-contained proof, thanks in part to an answer to a different question of mine on math.SE.

Theorem: The largest eigenvalue of a stochastic matrix is 1.

Proof: First, if A is a stochastic matrix, then A11, since each row of A sums to 1.  This proves that 1 is an eigenvalue of A.  Second, suppose there exists λ > 1 and x such that A= λx.  Let xi be a largest element of x.   Since the rows of A are nonnegative and sum to 1, each entry in λx is a convex combination of the elements of x.  Thus no entry in λx can be larger than xi.  But since λ > 1, λxi > xi.  Contradiction.  Therefore, the largest eigenvalue of A is 1.

Posted in linear algebra, matrices, probability | Leave a comment