## The Expected Maximum of Independent and Identically Distributed Geometric Random Variables

In my last post I derived the formula for the expected maximum of n independent and identically distributed (iid) exponential random variables.  If each random variable has rate parameter $\lambda$, then the expected maximum is $\frac{1}{\lambda} \sum_{i=1}^n \frac{1}{i} = \frac{H_n}{\lambda}$, where $H_n$ is the nth harmonic number.

This post considers the discrete version of the same problem: the expected maximum of n independent and identically distributed geometric random variables.  Unfortunately, no nice closed-form expression is known for the expected maximum in the discrete case.  However, the answer for the corresponding exponential random variables turns out to be a good approximation.

Let $X_1, X_2, \ldots, X_n$ be n iid geometric(p) random variables, where each $X_i$ counts the number of trials up to and including the first success.  Let $T_n$ denote the maximum of the $X_i$ variables.  Let $q = 1- p$.  Let the rate parameter for the corresponding exponential distribution be $\lambda = - \log(1-p) = -\log q$.  Now, we have

$\displaystyle E[T_n] = \sum_{k=0}^{\infty} P(T_n > k) = \sum_{k=0}^{\infty} (1 - P(T_n \leq k))$
$\displaystyle = \sum_{k=0}^{\infty} (1 - P(X_1 \leq k) P(X_2 \leq k) \cdots P(X_n \leq k)) = \sum_{k=0}^{\infty} (1 - (1-q^k)^n),$

as the cumulative distribution function of a geometric(p) random variable is $P(X \leq k) = 1 - q^k$ for nonnegative integers $k$.  (The first equality is the standard tail-sum formula for the expectation of a random variable taking values in the positive integers.)
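As a sanity check, the tail sum can be computed directly and compared against a Monte Carlo simulation of the maximum.  This is a minimal sketch, not anything from the post itself; the function names, tolerance, trial count, and the parameter choice $n = 5$, $p = 0.3$ are all mine.

```python
import math
import random

def expected_max_tail_sum(n, p, tol=1e-12):
    """E[T_n] = sum over k >= 0 of (1 - (1 - q^k)^n), truncated once terms fall below tol."""
    q = 1.0 - p
    total, k = 0.0, 0
    while True:
        term = 1.0 - (1.0 - q ** k) ** n
        if term < tol:
            return total
        total += term
        k += 1

def expected_max_monte_carlo(n, p, trials=100_000, seed=0):
    """Average the max of n geometric(p) draws on {1, 2, ...} via inverse transform."""
    rng = random.Random(seed)
    log_q = math.log1p(-p)  # log(1 - p) = log q
    total = 0
    for _ in range(trials):
        draws = (math.ceil(math.log1p(-rng.random()) / log_q) for _ in range(n))
        total += max(1, max(draws))  # guard the measure-zero case u == 0
    return total / trials

print(expected_max_tail_sum(5, 0.3))
print(expected_max_monte_carlo(5, 0.3))
```

For $n = 1$ the tail sum reduces to $\sum_{k \ge 0} q^k = 1/p$, the familiar geometric mean $1/p$, which makes a handy spot check.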

Since the summand $1 - (1 - q^x)^n$ is decreasing in $x$, the infinite sum is sandwiched between the right- and left-hand Riemann sum approximations of the corresponding integral, so we obtain

$\displaystyle \int_0^{\infty} (1 - (1 - q^x)^n) dx \leq E[T_n] \leq 1 + \int_0^{\infty} (1 - (1 - q^x)^n) dx.$

Now, let’s take a closer look at the integral that appears on both sides of this inequality. With the substitution $u = 1 - q^x$, so that $du = -q^x \log q \, dx$ and $dx = -\frac{du}{(1-u)\log q}$, we have

$\displaystyle \int_0^{\infty} (1 - (1 - q^x)^n) dx = -\frac{1}{\log q} \int_0^1 \frac{1 - u^n}{1-u} du = -\frac{1}{\log q} \int_0^1 \left(1 + u + \cdots + u^{n-1}\right) du$
$\displaystyle = -\frac{1}{\log q} \left(1 + \frac{1}{2} + \cdots + \frac{1}{n}\right) = -\frac{1}{\log q} H_n,$
proving the fairly tight bounds

$\displaystyle \frac{1}{\lambda} H_n \leq E[T_n] \leq 1 + \frac{1}{\lambda} H_n.$
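Both the integral evaluation and the resulting sandwich are easy to check numerically.  A minimal sketch using only the standard library; the trapezoid step size, the integration cutoff, and the parameter choice $n = 5$, $p = 0.3$ are arbitrary choices of mine:

```python
import math

def tail_sum(n, p, tol=1e-12):
    """E[T_n] = sum over k >= 0 of (1 - (1 - q^k)^n), truncated once terms fall below tol."""
    q = 1.0 - p
    total, k = 0.0, 0
    while True:
        term = 1.0 - (1.0 - q ** k) ** n
        if term < tol:
            return total
        total += term
        k += 1

def integral(n, p, h=1e-3, x_max=80.0):
    """Trapezoidal approximation of the integral of 1 - (1 - q^x)^n over [0, infinity)."""
    q = 1.0 - p
    f = lambda x: 1.0 - (1.0 - q ** x) ** n
    steps = int(x_max / h)
    total = 0.5 * (f(0.0) + f(x_max)) + sum(f(i * h) for i in range(1, steps))
    return total * h

n, p = 5, 0.3
lam = -math.log(1.0 - p)                     # rate of the matching exponential
h_n = sum(1.0 / i for i in range(1, n + 1))  # harmonic number H_n
print(integral(n, p), h_n / lam)             # integral should match H_n / lambda
print(h_n / lam, tail_sum(n, p), 1 + h_n / lam)  # lower bound, E[T_n], upper bound
```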

We can obtain a more precise approximation by using the Euler-Maclaurin summation formula for approximating a sum by an integral.  Up to a first-order error term, it gives $\sum_{k=0}^{\infty} f(k) \approx \int_0^{\infty} f(x)\,dx + \frac{f(0) + f(\infty)}{2}$; here $f(x) = 1 - (1-q^x)^n$, with $f(0) = 1$ and $f(\infty) = 0$, so that

$\displaystyle E[T_n] = \sum_{k=0}^{\infty} (1 - (1-q^k)^n) \approx \int_0^{\infty} (1 - (1 - q^x)^n) dx + \frac{1}{2},$
yielding the approximation

$\displaystyle E[T_n] \approx -\frac{1}{\log q} H_n + \frac{1}{2} = \frac{H_n}{\lambda} + \frac{1}{2},$

with error term given by
$\displaystyle \int_0^{\infty} n (\log q) q^x (1 - q^x)^{n-1} \left(x - \lfloor x \rfloor - \frac{1}{2}\right) dx.$
One can verify that this error is quite small unless $n$ is small or $q$ is extreme.
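To see how good the approximation is in practice, here is a small numerical comparison (a sketch of my own; the parameter grid and helper names are not from the post).  It tabulates the exact tail sum against $-H_n/\log q + \tfrac{1}{2}$:

```python
import math

def exact(n, p, tol=1e-12):
    # E[T_n] as the tail sum: sum over k >= 0 of (1 - (1 - q^k)^n)
    q = 1.0 - p
    total, k = 0.0, 0
    while True:
        term = 1.0 - (1.0 - q ** k) ** n
        if term < tol:
            return total
        total += term
        k += 1

def approx(n, p):
    # The Euler-Maclaurin approximation: -H_n / log q + 1/2
    h_n = sum(1.0 / i for i in range(1, n + 1))
    return -h_n / math.log(1.0 - p) + 0.5

for n in (2, 10, 100):
    for p in (0.1, 0.5, 0.9):
        e, a = exact(n, p), approx(n, p)
        print(f"n = {n:3d}, p = {p}: exact = {e:8.4f}, approx = {a:8.4f}, error = {e - a:+.4f}")
```

A case that can be checked by hand: for $n = 2$ and $p = q = \tfrac12$, the tail sum is $\sum_{k \ge 0} (2q^k - q^{2k}) = \frac{2}{1-q} - \frac{1}{1-q^2} = \frac{8}{3}$, and the approximation $\frac{3/2}{\log 2} + \frac12$ already agrees to within a few thousandths.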

See also Bennett Eisenberg’s paper “On the expectation of the maximum of IID geometric random variables,” Statistics and Probability Letters 78 (2008), 135-143.

(I first posted this argument online as my answer to “Expectation of the maximum of IID geometric random variables” on Math Stack Exchange.)