A birth-and-death process is a special kind of continuous-time Markov chain that has applications in queuing theory.  If we assume that arrivals to a queuing system follow a Poisson process and that service times are exponentially distributed, then the resulting queuing system is a birth-and-death process.  In this post we’ll derive the steady-state probabilities for a general birth-and-death process.  At the end we’ll take the special case in which the arrival rates are all the same and the service rates are all the same.  This will give us the steady-state probabilities for the basic M/M/1 queuing system.

In steady-state for a continuous-time Markov chain, the rate at which individuals transition into a particular state must equal the rate at which individuals transition out of that state.  If $\lambda_n$ is the birth rate for state n, $\mu_n$ is the death rate for state n, and $P_n$ is the probability of being in state n in steady-state, then we have the set of balance equations

$\displaystyle \lambda_0 P_0 = \mu_1 P_1,$
$\displaystyle (\lambda_n + \mu_n)P_n = \lambda_{n-1} P_{n-1} + \mu_{n+1} P_{n+1}$, for $n \geq 1$.

The first equation gives us $\displaystyle P_1 = \frac{\lambda_0}{\mu_1} P_0$.  Substituting into the second equation when $n =1$ yields

$\displaystyle P_2 = \frac{(\lambda_1 + \mu_1)P_1 - \lambda_0 P_0}{\mu_2} = \frac{(\lambda_1 + \mu_1)P_1 - \mu_1 P_1}{\mu_2} = \frac{\lambda_1}{\mu_2}P_1 = \frac{\lambda_1 \lambda_0}{\mu_2 \mu_1} P_0$.

This suggests that we might have $\displaystyle P_n = \frac{\lambda_{n-1}}{\mu_n} P_{n-1}$ for $n \geq 1$.  Taking this as an induction hypothesis, we have

$\displaystyle P_{n+1} = \frac{(\lambda_n + \mu_n)P_n - \lambda_{n-1} P_{n-1}}{\mu_{n+1}} = \frac{(\lambda_n + \mu_n)P_n - \mu_n P_n}{\mu_{n+1}} = \frac{\lambda_n}{\mu_{n+1}} P_n.$

Therefore, $\displaystyle P_n = \frac{\prod_{i=0}^{n-1} \lambda_i}{\prod_{i=1}^n \mu_i} P_0.$

Assuming that the birth and death rates are such that we actually have well-defined steady state probabilities, $\displaystyle \sum_{n=0}^{\infty} P_n = 1$, and so

$\displaystyle \sum_{n=0}^{\infty} \frac{\prod_{i=0}^{n-1} \lambda_i}{\prod_{i=1}^n \mu_i} P_0 = 1$,

which implies

$\displaystyle P_0 = \left( \sum_{n=0}^{\infty} \frac{\prod_{i=0}^{n-1} \lambda_i}{\prod_{i=1}^n \mu_i} \right)^{-1}$.

In general, then,

$\displaystyle P_n = \left(\prod_{i=1}^n \frac{\lambda_{i-1}}{\mu_i} \right) \left(\sum_{m=0}^{\infty} \frac{\prod_{i=0}^{m-1} \lambda_i}{\prod_{i=1}^m \mu_i} \right)^{-1}$.
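
Here is a quick numerical sketch of this formula in Python (the function name, the example rates, and the truncation level N are all my own choices; truncating the normalizing sum at N is only sensible when the tail of the sum is negligible):

```python
def birth_death_steady_state(birth, death, N):
    """Approximate steady-state probabilities P_0, ..., P_N by truncating
    the infinite normalizing sum at N.

    birth(i) returns lambda_i for i >= 0; death(i) returns mu_i for i >= 1."""
    # r[n] = (lambda_0 ... lambda_{n-1}) / (mu_1 ... mu_n), with r[0] = 1
    r = [1.0]
    for n in range(1, N + 1):
        r.append(r[-1] * birth(n - 1) / death(n))
    total = sum(r)  # truncated version of the normalizing sum
    return [x / total for x in r]

# Hypothetical example: lambda_i = 1 for all i, mu_i = 0.75 * min(i, 2)
# (an M/M/2-style system with two servers, each of rate 0.75)
p = birth_death_steady_state(lambda i: 1.0, lambda i: 0.75 * min(i, 2), N=200)
print(p[:4])
```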

Application to the M/M/1 queuing system.

For an M/M/1 queuing system (i.e., a system with Markovian, or exponential, interarrival and service times, plus a single server), we have $\lambda_i = \lambda$ and $\mu_i = \mu$ for each i.  In this case, the formula for $P_0$ simplifies nicely.  We get

$\displaystyle P_0 = \left( \sum_{n=0}^{\infty} \frac{\lambda^n}{\mu^n} \right)^{-1} = \left( \sum_{n=0}^{\infty} \left(\frac{\lambda}{\mu} \right)^n \right)^{-1} = \left( \frac{1}{1 - \lambda/\mu} \right)^{-1} = 1 - \frac{\lambda}{\mu}.$

In the second-to-last step we use the formula for a geometric series.  This step requires $\lambda < \mu$; otherwise, the infinite series fails to converge.  This should make sense: If the arrival rate is larger than the service rate then over time the queue will get longer and longer and so will never approach steady-state.  (It is perhaps harder to see intuitively that we do not get a steady state if the arrival and service rates are equal, but that falls out of the mathematics here.)

In general, then, for the M/M/1 queuing system, the probability that there are exactly n customers in the system in steady-state is given by $\displaystyle P_n = \frac{\lambda^n}{\mu^n} \left(1 - \frac{\lambda}{\mu} \right).$
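
As a sanity check, here is a small simulation sketch in Python (the example rates $\lambda = 2$ and $\mu = 3$, the seed, and the number of steps are arbitrary choices): it runs the continuous-time chain directly and compares the fraction of time spent in each state with $(1 - \lambda/\mu)(\lambda/\mu)^n$.

```python
import random

lam, mu = 2.0, 3.0  # arbitrary example rates with lam < mu
rho = lam / mu

random.seed(1)
time_in_state, n, total_time = {}, 0, 0.0
for _ in range(500_000):
    rate = lam + (mu if n > 0 else 0.0)  # total transition rate in state n
    dt = random.expovariate(rate)        # exponential holding time
    time_in_state[n] = time_in_state.get(n, 0.0) + dt
    total_time += dt
    # the next event is an arrival with probability lam / rate
    n = n + 1 if random.random() < lam / rate else n - 1

for k in range(5):
    print(k, time_in_state.get(k, 0.0) / total_time, (1 - rho) * rho**k)
```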

## The Product and Quotient Rules for Differential Calculus

A couple of weeks ago one of my senior colleagues subbed for me on the day we discussed the product and quotient rules for differential calculus.  Afterwards he told me that he had never seen the way that I introduced these formulas, so I thought I would reproduce the derivations here. They are not original, but I cannot remember now where I first saw them.  The product rule derivation is more standard, I think.  The quotient rule derivation is nice because it avoids the chain rule.

The Product Rule.

The product rule tells us how to evaluate $\dfrac{d}{dx} f(x)g(x)$.

To understand this, it helps to visualize what is going on.  You can think of the product as representing the area of a rectangle with sides $f(x)$ and $g(x)$.  If we increase $x$ by $\Delta x$ we are (if f and g are increasing at $x$) making tiny increases in the side lengths of the rectangle.  Effectively, we are adding the three shaded areas (two blue and one gray) to the white rectangle in the picture below.  (Image borrowed from the Wikipedia article on the product rule.)

Writing $\Delta f(x) = f(x + \Delta x) - f(x)$ and $\Delta g(x) = g(x + \Delta x) - g(x)$, the blue shaded area on the top is $\Delta f(x) g(x)$, the blue shaded area on the right is $f(x) \Delta g(x)$, and the gray shaded area in the corner is $\Delta f(x) \Delta g(x)$.

With this in mind, let’s take the limit of the difference quotient in order to calculate the derivative. This gives

$\displaystyle \dfrac{d}{dx} f(x)g(x) = \lim_{\Delta x \to 0} \frac{\Delta f(x) g(x) + f(x) \Delta g(x) + \Delta f(x) \Delta g(x)}{\Delta x}$
$\displaystyle = g(x) \lim_{\Delta x \to 0} \frac{\Delta f(x)}{\Delta x} + f(x) \lim_{\Delta x \to 0} \frac{\Delta g(x)}{\Delta x} + \lim_{\Delta x \to 0} \frac{\Delta f(x) \Delta g(x)}{\Delta x}$
$\displaystyle = g(x) f'(x) + f(x) g'(x) + f'(x) \cdot 0$
$= g(x) f'(x) + f(x) g'(x)$, which is the product rule.
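
The three-area decomposition is easy to check numerically.  Here is a short sketch in Python (the functions $f = \sin$ and $g = \exp$, the point $x$, and the increment $\Delta x$ are arbitrary choices):

```python
import math

f, g = math.sin, math.exp  # arbitrary example functions
x, dx = 1.2, 1e-6          # arbitrary point and small increment

df = f(x + dx) - f(x)
dg = g(x + dx) - g(x)

# The difference quotient equals the three-area decomposition exactly.
lhs = (f(x + dx) * g(x + dx) - f(x) * g(x)) / dx
rhs = (df * g(x) + f(x) * dg + df * dg) / dx
print(lhs, rhs)

# For small dx, both approach g(x) f'(x) + f(x) g'(x); here f' = cos, g' = exp.
print(math.cos(x) * math.exp(x) + math.sin(x) * math.exp(x))
```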

(This argument is essentially the one on the Wikipedia page for the product rule.)

The Quotient Rule.

The quotient rule tells us how to evaluate $\dfrac{d}{dx} \dfrac{f(x)}{g(x)}$.

A standard way to derive the quotient rule is to use the product rule together with the chain rule.  However, by the point in the semester at which I want to introduce the quotient rule we generally haven’t seen the chain rule yet.  Here is a derivation that avoids the chain rule.

Let $\displaystyle Q(x) = \frac{f(x)}{g(x)}$.  We want $Q'(x)$.  Algebra gives us $f(x) = Q(x)g(x)$.  Differentiating both sides, we have $f'(x) = g(x) Q'(x) + Q(x)g'(x)$, by the product rule.  Now we solve for $Q'(x)$ to obtain the quotient rule:

$\displaystyle Q'(x) = \dfrac{f'(x) - Q(x) g'(x)}{g(x)} = \dfrac{f'(x) - \frac{f(x)}{g(x)}g'(x)}{g(x)} = \dfrac{\frac{g(x)}{g(x)}f'(x) - \frac{f(x)}{g(x)} g'(x)}{g(x)} = \dfrac{g(x) f'(x) - f(x)g'(x)}{(g(x))^2}.$
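
Here is a quick symbolic check of this derivation using SymPy (the placeholder symbol Qp standing in for $Q'(x)$ is my own device): differentiate $f = Qg$, solve for $Q'$, and compare with the quotient rule.

```python
import sympy as sp

x = sp.symbols('x')
f, g = sp.Function('f')(x), sp.Function('g')(x)
Qp = sp.symbols('Qp')  # stand-in for Q'(x)

# Differentiate f = Q g with Q = f/g, then solve for Q'.
eq = sp.Eq(sp.diff(f, x), g * Qp + (f / g) * sp.diff(g, x))
sol = sp.solve(eq, Qp)[0]

quotient_rule = (g * sp.diff(f, x) - f * sp.diff(g, x)) / g**2
print(sp.simplify(sol - quotient_rule))  # prints 0
```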

## Which is bigger: π^e or e^π?

The question of whether $\pi^e$ or $e^{\pi}$ is larger is a classic one.  Of course, with a calculator it is easy to see what the answer is.  But how would you answer the question without a calculator?

There are lots of interesting ways to tackle the question.  For this (short) post I would like to highlight one I saw on math.SE, given by Aryabhata.  We need the assumptions that $\pi > e$ and $e^x > x+1$ for $x > 0$.  (One way to view the latter claim is that, for $x > 0$, $e^x$ is larger than the first two terms of its Maclaurin series.  Since all the rest of the terms of the Maclaurin series are positive, and the Maclaurin series converges to $e^x$, we get that $e^x > x+1$ for $x > 0$.)

On to the proof.  Since $\pi > e$, $\pi/e - 1 > 0$.  Applying the second assumption, then, we get $\displaystyle e^{\pi/e-1} > \pi/e$.  This means that $\displaystyle e^{\pi/e} > \pi$, which in turn implies that $\displaystyle e^{\pi} > \pi^e$.
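
A few lines of Python confirm both the intermediate inequality and the conclusion:

```python
import math

print(math.e ** (math.pi / math.e - 1), math.pi / math.e)  # 1.168... > 1.155...
print(math.pi ** math.e, math.e ** math.pi)                # 22.459... < 23.140...
```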

## The Expected Maximum of Independent and Identically Distributed Geometric Random Variables

In my last post I derived the formula for the expected maximum of n independent and identically distributed (iid) exponential random variables.  If each random variable has rate parameter $\lambda$, then the expected maximum is $\frac{1}{\lambda} \sum_{i=1}^n \frac{1}{i} = \frac{H_n}{\lambda}$, where $H_n$ is the nth harmonic number.

This post considers the discrete version of the same problem: the expected maximum of n independent and identically distributed geometric random variables.  Unfortunately, there is no known nice, closed-form expression for the expected maximum in the discrete case.  However, the answer for the corresponding exponential random variables turns out to be a good approximation.

Let $X_1, X_2, \ldots, X_n$ be n iid geometric(p) random variables.  Let $T_n$ denote the maximum of the $X_i$ variables.  Let $q = 1 - p$.  Let the rate parameter for the corresponding exponential distribution be $\lambda = - \log(1-p)$.  Now, we have

$\displaystyle E[T_n] = \sum_{k=0}^{\infty} P(T_n > k) = \sum_{k=0}^{\infty} (1 - P(T_n \leq k))$
$\displaystyle = \sum_{k=0}^{\infty} (1 - P(X_1 \leq k) P(X_2 \leq k) \cdots P(X_n \leq k)) = \sum_{k=0}^{\infty} (1 - (1-q^k)^n),$

as the cumulative distribution function of a geometric(p) random variable (taking values in $\{1, 2, \ldots\}$) is $1 - (1-p)^k = 1 - q^k$ at integer $k \geq 0$.

The summand is decreasing in $k$, so by viewing the infinite sum as right- and left-hand Riemann sum approximations of the corresponding integral we obtain

$\displaystyle \int_0^{\infty} (1 - (1 - q^x)^n) dx \leq E[T_n] \leq 1 + \int_0^{\infty} (1 - (1 - q^x)^n) dx.$

Now, let’s take a closer look at the integral that occurs on both sides of this inequality. With the substitution $u = 1 - q^x$ we have

$\displaystyle \int_0^{\infty} (1 - (1 - q^x)^n) dx = -\frac{1}{\log q} \int_0^1 \frac{1 - u^n}{1-u} du = -\frac{1}{\log q} \int_0^1 \left(1 + u + \cdots + u^{n-1}\right) du$
$\displaystyle = -\frac{1}{\log q} \left(1 + \frac{1}{2} + \cdots + \frac{1}{n}\right) = -\frac{1}{\log q} H_n,$
proving the fairly tight bounds

$\displaystyle \frac{1}{\lambda} H_n \leq E[T_n] \leq 1 + \frac{1}{\lambda} H_n.$

We can obtain a more precise approximation by using the Euler-Maclaurin summation formula for approximating a sum by an integral.  Up to a first-order error term, it says that

$\displaystyle E[T_n] = \sum_{k=0}^{\infty} (1 - (1-q^k)^n) \approx \int_0^{\infty} (1 - (1 - q^x)^n) dx + \frac{1}{2},$
yielding the approximation

$\displaystyle E[T_n] \approx -\frac{1}{\log q} H_n + \frac{1}{2},$

with error term given by
$\displaystyle \int_0^{\infty} n (\log q) q^x (1 - q^x)^{n-1} \left(x - \lfloor x \rfloor - \frac{1}{2}\right) dx.$
One can verify that this error term is quite small unless n is small or q is extreme.
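
Here is a short numerical sketch of the approximation in Python (the function name, the example values $n = 10$ and $p = 0.3$, and the truncation level are arbitrary choices):

```python
import math

def expected_max_geometric(n, p, terms=10_000):
    """E[T_n] for n iid geometric(p) variables, from the exact series
    sum_{k >= 0} (1 - (1 - q^k)^n), truncated after `terms` terms."""
    q = 1.0 - p
    return sum(1.0 - (1.0 - q**k) ** n for k in range(terms))

n, p = 10, 0.3                       # arbitrary example values
lam = -math.log(1.0 - p)             # rate of the matching exponential
H_n = sum(1.0 / i for i in range(1, n + 1))

print(expected_max_geometric(n, p))  # exact (truncated) value
print(H_n / lam + 0.5)               # the approximation above
```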

See also Bennett Eisenberg’s paper “On the expectation of the maximum of IID geometric random variables,” Statistics and Probability Letters 78 (2008), 135-143.

(I first posted this argument online as my answer to “Expectation of the maximum of IID geometric random variables” on Math Stack Exchange.)


## The Expected Maximum of Independent and Identically Distributed Exponential Random Variables

Finding the expected maximum of independent and identically distributed (iid) exponentially distributed random variables is a standard calculation in an undergraduate course in probability theory.  This post goes through the details of that calculation.  (My next post will address the more difficult question of the expected maximum of iid geometric random variables.)

There are three properties of exponential random variables that we need.

1. Exponentially distributed random variables are memoryless.  Formally, if X is exponentially distributed, then $P(X > t+s | X > s) = P(X > t)$.  Less formally, suppose that the lifetime of an electronic component is exponentially distributed.  The memoryless property says that the probability that the component will last at least t units longer, given that the component has lasted s units of time, is independent of the value of s.
2. The minimum of n exponentially distributed random variables with rate parameters $\lambda_1, \lambda_2, \ldots, \lambda_n$ is itself exponentially distributed with rate parameter $\sum_{i=1}^n \lambda_i$.  (For a proof, see the Wikipedia page on the exponential distribution.)
3. The exponential distribution is a continuous distribution.  (This is a very basic property of the exponential distribution.  However, the geometric distribution has the first two properties as well, yet its expected maximum is more complicated.)

Now, suppose we have n iid exponential($\lambda$) random variables, and let $T_i$ be the ith smallest of these.  By property 2, $T_1$ is exponential($n\lambda$), so $E[T_1] = \frac{1}{n \lambda}$.

Since exponential random variables are continuous, the probability that any two of the n random variables have the same value is 0.  Thus the n-1 other random variables all have values larger than $T_1$.  However, the memoryless property says knowledge of $T_1$ essentially “resets” the values of the other random variables, so that the time between $T_1$ and $T_2$ is the same (distributionally) as the time until the first of n-1 iid exponential($\lambda$) random variables takes on a value.  Thus $E[T_2 - T_1] = \frac{1}{(n-1) \lambda}$.

Applying the same logic, we get that, for $1 \leq i \leq n-1$, $E[T_{i+1} - T_i] = \frac{1}{(n-i) \lambda}$.  Thus the expected value of the maximum of the n random variables is

$\displaystyle E[T_n] = E\left[T_1 + \sum_{i=1}^{n-1} (T_{i+1}-T_i) \right] = \sum_{i=0}^{n-1} \frac{1}{(n-i) \lambda} = \sum_{i=1}^n \frac{1}{i \lambda} = \frac{H_n}{\lambda},$

where $H_n$ is the nth harmonic number.
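
A short simulation sketch in Python bears this out (the example values $n = 10$ and $\lambda = 2$, the seed, and the trial count are arbitrary choices):

```python
import random

n, lam = 10, 2.0  # arbitrary example values
H_n = sum(1.0 / i for i in range(1, n + 1))

random.seed(1)
trials = 200_000
total = sum(max(random.expovariate(lam) for _ in range(n))
            for _ in range(trials))
print(total / trials, H_n / lam)  # simulated vs. exact expected maximum
```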


## A Combinatorial Proof of a Stirling Number Formula

One of the fundamental properties of the (unsigned) Stirling numbers of the first kind is that they can be used to convert from rising factorial powers to ordinary powers via the formula $\displaystyle x^{\overline{n}} = \sum_{k=0}^n \left[ n \atop k \right] x^k.$  This post gives a combinatorial proof of this formula.

Suppose we color the numbers in a permutation according to their cycles; that is, numbers in the same cycle receive the same color, and two different cycles may receive the same color. If there are n elements to permute and x colors available, how many of these cycle-colored permutations are there?

One way to count them is to condition on the number of cycles. There are $\left[ n \atop k \right]$ ways to choose a permutation on $[n]$ with k cycles, and there are $x^k$ ways to color the k cycles once chosen. So the answer is $\displaystyle \sum_{k=0}^n \left[ n \atop k \right] x^k.$

Another way to count them is to consider the elements $1, 2, \ldots, n$ in turn.  First, color the number 1 in x ways.  Then the number 2 can either start a new colored cycle, in x ways, or go after 1 in 1's cycle, in 1 way. So there are x + 1 choices for the number 2.  The number 3 can start a new colored cycle, in x ways, or go immediately after 1 or 2 in their cycles, in 2 ways. So there are x + 2 choices for the number 3.  In general, there are $x+k-1$ choices for where to place the number k.  Thus there are $x(x+1) \cdots (x+n-1) = x^{\overline{n}}$ ways to place the n elements in order to create a cycle-colored permutation.

This argument assumes that x is a positive integer.  To extend the result to all real numbers, note that both sides of the formula $\displaystyle x^{\overline{n}} = \sum_{k=0}^n \left[ n \atop k \right] x^k$ are polynomials in x of degree n.  Since the two polynomials agree for infinitely many values of x (the positive integers), they must agree for all real values of x.

(A variant of this argument appears in the second edition of Volume 1 of Richard Stanley’s Enumerative Combinatorics, pp. 33-34.)
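
Here is a short numerical check of the identity in Python (the function names and the example values of n and x are my own choices); it builds the Stirling numbers from the standard recurrence $\left[ n \atop k \right] = \left[ {n-1} \atop {k-1} \right] + (n-1)\left[ {n-1} \atop k \right]$ and compares both sides:

```python
def stirling_first_kind(n):
    """Row of unsigned Stirling numbers of the first kind [n, k], k = 0..n,
    from the recurrence [n, k] = [n-1, k-1] + (n-1) [n-1, k]."""
    row = [1]  # [0, 0] = 1
    for m in range(1, n + 1):
        row = [(row[k - 1] if k >= 1 else 0) + (m - 1) * (row[k] if k < m else 0)
               for k in range(m + 1)]
    return row

def rising_factorial(x, n):
    result = 1
    for i in range(n):
        result *= x + i
    return result

n = 6  # arbitrary example value
for x in range(1, 5):
    lhs = rising_factorial(x, n)
    rhs = sum(c * x**k for k, c in enumerate(stirling_first_kind(n)))
    print(x, lhs, rhs)  # the two sides agree
```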

## The Catalan Numbers from Their Generating Function

Deriving the expression $\displaystyle C_n = \frac{1}{n+1} \binom{2n}{n}$ for the Catalan numbers from the Catalan recurrence $\displaystyle C_{n+1} = \sum_{k=0}^n C_k C_{n-k}$ is a classic exercise in generating function manipulation, although the details are sometimes omitted in textbooks (and on the Wikipedia page).  This post fills in the details.

The Catalan numbers are the solution to many combinatorial problems.  See, for example, this selection from Vol. II, Chapter 6, of Stanley’s Enumerative Combinatorics [1], as well as the Catalan addendum, for many such combinatorial interpretations.  My favorite is that $C_n$ is the number of lattice paths from (0,0) to $(n,n)$ that do not go above the diagonal $y = x$.

To derive a closed form for $C_n$ from the recurrence, let $C(x) = \sum_{n=0}^{\infty} C_n x^n$ denote the generating function of the Catalan numbers.  Multiply both sides of the recurrence by $x^n$ and sum up:

$\displaystyle \sum_{n=0}^{\infty} C_{n+1} x^n = \sum_{n=0}^{\infty} \left(\sum_{k=0}^n C_k C_{n-k}\right)x^n \\ \implies \frac{1}{x}\sum_{n=0}^{\infty} C_{n+1} x^{n+1} = C^2(x) \\ \implies \frac{1}{x}\sum_{n=1}^{\infty} C_n x^n = C^2(x) \\ \implies \frac{1}{x} \left(\sum_{n=0}^{\infty} C_n x^n - C_0 \right) = C^2(x) \\ \implies \frac{1}{x} \left( C(x) - 1 \right) = C^2(x) \\ \implies xC^2(x) - C(x) + 1 = 0 \\ \implies C(x) = \frac{1 \pm \sqrt{1 - 4x}}{2x}.$

Now, do we want the positive or the negative square root?  We know that $C_0 = 1$, so the generating function satisfies $C(0) = 1$.  Subbing in 0 for either form of $C(x)$ gives division by zero.  However, $\displaystyle \lim_{x \to 0^+} C(x)$ is the indeterminate form $0/0$ for the negative square root, so let’s take a closer look at this limit.  (The same limit evaluates to $\infty$ for the positive square root, so that cannot be the right choice.)

Applying l’Hôpital’s rule, we have

$\displaystyle \lim_{x \to 0^+} \frac{1 - \sqrt{1 - 4x}}{2x} = \lim_{x \to 0^+} \frac{2 (1-4x)^{-1/2}}{2} = 1.$

The negative square root does evaluate to the correct number, so that one must be the right choice for $C(x)$:

$\displaystyle C(x) = \frac{1 - \sqrt{1 - 4x}}{2x}.$

Applying Newton’s generalized binomial series, we then have

$\displaystyle C(x) = \frac{1}{2x} \left(1 - \sqrt{1 - 4x} \right) = \frac{1}{2x} \left(1 - \sum_{n=0}^{\infty} \binom{1/2}{n} (-4x)^n \right).$

The coefficient of $x^n$ in the summand simplifies to

$\displaystyle \frac{\frac{1}{2} (\frac{1}{2}-1) \cdots (\frac{1}{2}-n+1) }{n!} (-4)^n \\ = \frac{1(1-2) \cdots (1-2n+2) }{n!} (-1)^n 2^n \\ = \frac{(-1)(-3) \cdots (-(2n-3)) }{(n!)^2} (-1)^n 2^n (n)(n-1) \cdots (1) \\ = \frac{(-1)(-3) \cdots (-(2n-3)) }{(n!)^2} (-1)^n (2n)(2n-2) \cdots (2) \\ = - \frac{(2n)!}{(n!)^2 (2n-1)} \\ = - \binom{2n}{n} \frac{1}{2n-1}.$

Therefore,
$\displaystyle C(x) = \frac{1}{2x} \left(1 + \sum_{n=0}^{\infty} \binom{2n}{n} \frac{1}{2n-1} x^n \right) \\ = \frac{1}{2x} \left(1 - 1 + \sum_{n=1}^{\infty} \binom{2n}{n} \frac{1}{2n-1} x^n \right) \\ = \frac{1}{2} \sum_{n=1}^{\infty} \binom{2n}{n} \frac{1}{2n-1} x^{n-1} \\ = \frac{1}{2} \sum_{n=0}^{\infty} \binom{2(n+1)}{n+1} \frac{1}{2n+1} x^n \\ = \sum_{n=0}^{\infty} \frac{2(n+1)(2n+1)}{2(n+1)^2}\binom{2n}{n} \frac{1}{2n+1} x^n \\ = \sum_{n=0}^{\infty} \frac{1}{n+1}\binom{2n}{n} x^n.$

Thus $\displaystyle C_n = \frac{1}{n+1}\binom{2n}{n}$, the coefficient of $x^n$ in the generating function $C(x)$.
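
As a final check, here is a short Python sketch (computing the first ten terms is an arbitrary choice) that builds the sequence from the recurrence and compares it with the closed form:

```python
from math import comb

# Build C_0, ..., C_9 from the Catalan recurrence...
C = [1]
for n in range(9):
    C.append(sum(C[k] * C[n - k] for k in range(n + 1)))

# ...and compare with the closed form C_n = binom(2n, n) / (n + 1).
print(C)
print([comb(2 * n, n) // (n + 1) for n in range(10)])
```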

References

1.  Richard Stanley, Enumerative Combinatorics, Vol. II, 2nd ed., Cambridge University Press, 2011.
