## A Math Error in the New Yorker

I normally use this blog to talk about mathematical questions that interest me.  However, I saw a math error in the New Yorker yesterday that I think is worth commenting on.

James Surowiecki has an article (The Mobility Myth) arguing that income mobility (the ability of groups to change their economic status) in the United States isn’t very high.  At one point he makes the following statement: “The middle class isn’t all that mobile, either: only twenty per cent of people born into the middle quintile ever make it into the top one.”

This sounds pretty bad at first glance: Twenty percent isn’t that large of a percentage.  However, the top quintile is, by definition, the top one-fifth; i.e., the top twenty percent.  If the United States were to have perfect income mobility (in the sense that a parent’s economic status has absolutely no effect on that of their children) then we would see exactly twenty percent of children born into the middle class end up in the top quintile (as well as twenty percent in the bottom quintile and twenty percent in each of the other three quintiles).  In other words, the argument that “twenty percent of people born into the middle quintile make it into the top one” is actually an argument in favor of high income mobility in the United States, not against it!

Surowiecki’s misunderstanding about what quintiles actually mean causes some of his other claims not to be as drastic as they sound.  For example, “fewer than ten per cent [of the bottom quintile] get into the top quintile,” while still not that great, isn’t quite so bad once you realize that twenty percent is the goal.  Similarly with “In San Jose, just thirteen per cent of people in the bottom quintile make it to the top.”  (However, I should point out that, given the number of people we’re talking about here, the difference between twenty percent and thirteen percent is quite large in absolute terms.)
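To make the twenty-percent benchmark concrete, here is a quick simulation sketch in Python.  The setup is hypothetical and purely illustrative: in a perfect-mobility world, a child's destination quintile is independent of the parent's, i.e., uniform over the five quintiles, so about twenty percent of middle-quintile children land in the top quintile.

```python
import random

random.seed(1)

# Hypothetical perfect-mobility world: each child born into the middle
# quintile is assigned a destination quintile uniformly at random.
N = 100_000
destinations = [random.randrange(5) for _ in range(N)]

# Share reaching the top quintile (coded 4); should be close to 0.20.
share_top = destinations.count(4) / N
print(round(share_top, 3))
```

Any observed share meaningfully below 0.20, as in the figures Surowiecki cites, is then evidence of *imperfect* mobility.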

I’m not going to comment on the merits of the rest of Surowiecki’s column, as to do so would be to step too far from my area of expertise.  I will add, though, that his data appear to come from the study “Where is the Land of Opportunity?: Intergenerational Mobility in the United States,” by Chetty, Hendren, Kline, and Saez.  This document contains a wealth of information to interest data guys like me.  See especially Table III and Online Appendix Table IV, which appear to be the sources of some of the numbers Surowiecki cites.


## The Maximum Value of Spearman’s Footrule Distance

Given a permutation $\sigma$ on n elements, Spearman’s Footrule Distance $F(\sigma)$ is the sum of the absolute differences between i and $\sigma(i)$ over all values of i:

$\displaystyle F(\sigma) = \sum_{i=1}^n |\sigma(i) - i|.$

Spearman’s Footrule Distance can be thought of as a measure of the disarray of a permutation.  Other such measures include Spearman’s rank correlation, which uses the sum of squared deviations rather than the sum of absolute deviations, and Kendall’s tau, which counts the number of pairwise inversions.

In this post we prove that $\displaystyle \max_{\sigma} F(\sigma) = \begin{cases} 2m^2, & n = 2m; \\ 2m^2+2m, & n = 2m+1; \end{cases}$
where the maximum is taken over all permutations $\sigma$ on n elements.  The argument is that given by user4140 in this post on Math.SE with some details added.

First, let’s look at the case where n is even.  Let $n = 2m$.  Let $\sigma$ be any permutation with property P; namely, that $\sigma(i) > m$ if $i \leq m$ and $\sigma(i) \leq m$ if $i > m$.  Then

$\displaystyle F(\sigma) = \sum_{i=1}^{2m} |\sigma(i) - i| = \sum_{i=1}^m (\sigma(i) - i) + \sum_{i=m+1}^{2m} (i - \sigma(i))$.

Now, in the first sum $\sigma(i)$ ranges over the values m+1 through 2m, hitting each exactly once.  In the second sum $\sigma(i)$ does the same with the values 1 through m.  Therefore,

$\displaystyle F(\sigma) = 2 \sum_{i=m+1}^{2m} i - 2 \sum_{i=1}^m i = 2m(2m+1) - m(m+1) - m(m+1)$
$\displaystyle = 2m(2m+1) - 2m(m+1) = 2m(2m+1-m-1) = 2m^2.$

Now, suppose we have a permutation $\sigma$ that does not have property P.  Suppose, without loss of generality, that $\sigma(i) \leq m$ for some $i \leq m$.  Then, by the pigeonhole principle, there exists some $j > m$ such that $\sigma(j) > m$.  Swap $\sigma(i)$ and $\sigma(j)$ to create a new permutation $\sigma'$.  Place the four points $i, \sigma(i), j, \sigma(j)$ on a number line according to their numerical values.  Since $i$ and $\sigma(i)$ are both at most m while $j$ and $\sigma(j)$ both exceed m, the two leftmost points are $i$ and $\sigma(i)$ (in some order) and the two rightmost are $j$ and $\sigma(j)$.  Let A, B, and C denote, respectively, the distances between consecutive points, from left to right.  Then $|\sigma(i) - i| + |\sigma(j) - j| = A+C$.  Regardless of the order of $i, \sigma(i), j$, and $\sigma(j)$, we have $|\sigma'(i) - i| + |\sigma'(j) - j| = A + 2B + C$.  This is because the A and C gaps must still be covered under the $\sigma'$ permutation, and the middle gap B must now be crossed twice.  Since $B \geq 1$, the swap increases the footrule distance by $2B > 0$.

Therefore, any permutation that does not have property P cannot achieve the maximum value of Spearman’s Footrule Distance.  Since every permutation $\sigma$ that does have property P has $F(\sigma) = 2m^2$, this must be the maximum value of Spearman’s Footrule Distance when $n = 2m$.

For $n = 2m+1$ the argument is similar.  We need property Q; namely, that $\sigma(i) > m+1$ if $i \leq m$, $\sigma(m+1) = m+1$, and $\sigma(i) \leq m$ if $i > m+1$.  Then

$\displaystyle F(\sigma) = \sum_{i=1}^{2m+1} |\sigma(i) - i| = \sum_{i=1}^m (\sigma(i) -i) + \sum_{i=m+2}^{2m+1} (i - \sigma(i))$
$\displaystyle = (2m+1)(2m+2) - (m+1)(m+2) - m(m+1) = 2m^2+2m.$

The argument for permutations $\sigma$ that do not satisfy property Q is similar.  This proves the result.
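The claim is easy to check by brute force for small n.  Here is a quick Python sketch (the function names are mine) verifying the formula, which for both parities can also be written as $\lfloor n^2/2 \rfloor$:

```python
from itertools import permutations

def footrule(sigma):
    """Spearman's footrule distance F(sigma), where sigma is a tuple
    giving sigma(1), ..., sigma(n)."""
    return sum(abs(s - i) for i, s in enumerate(sigma, start=1))

def max_footrule(n):
    """Maximum of F over all permutations of 1..n, by brute force."""
    return max(footrule(p) for p in permutations(range(1, n + 1)))

# Compare with the closed form: 2m^2 when n = 2m, 2m^2 + 2m when n = 2m + 1
# (equivalently, floor(n^2 / 2) in both cases).
for n in range(1, 8):
    m = n // 2
    predicted = 2 * m * m + (2 * m if n % 2 else 0)
    assert max_footrule(n) == predicted == n * n // 2
```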

For more information on Spearman’s Footrule Distance, see Diaconis and Graham, “Spearman’s Footrule as a Measure of Disarray,” Journal of the Royal Statistical Society, Series B, Vol. 39, No. 2 (1977) , pp. 262-268.

## Euler Sums, Part I

Sums of the form $\displaystyle E(p,q) = \sum_{n=1}^{\infty} \frac{H^{(p)}_n}{n^q}$, where $\displaystyle H^{(p)}_n = \sum_{k=1}^n \frac{1}{k^p}$ (i.e., the nth harmonic number of order p), are sometimes called Euler sums.  There are a large number of interesting ways to evaluate these Euler sums, such as converting them to integrals, applying complex analysis, and rewriting them as polylogarithms.  I tend to prefer the most basic techniques, though – those that involve manipulating the sum directly.

This post reproduces one of those basic techniques for evaluating $\displaystyle \sum_{n=1}^{\infty} \frac{H_n}{n^2}$.  It appeared as part of an answer by Rob Johnson to a question of mine about the evaluation of three different alternating Euler sums.  (An alternating Euler sum is one of the form $\displaystyle \sum_{n=1}^{\infty} \frac{(-1)^{n-1} H^{(p)}_n}{n^q}.$)

Here’s Rob’s argument.

$\displaystyle \sum_{n=1}^{\infty} \frac{H_n}{n^2} = \sum_{n=1}^{\infty} \sum_{k=1}^{\infty} \frac{1}{n^2} \left( \frac{1}{k} - \frac{1}{k+n} \right)$

$\displaystyle = \sum_{n=1}^{\infty} \sum_{k=1}^{\infty} \frac{1}{n k (k+n)}$

$\displaystyle = \sum_{k=1}^{\infty} \sum_{n=k+1}^{\infty} \frac{1}{(n-k)kn}$

$\displaystyle = \sum_{n=2}^{\infty} \sum_{k=1}^{n-1} \frac{1}{(n-k)kn}$

$\displaystyle = \sum_{n=2}^{\infty} \sum_{k=1}^{n-1} \frac{1}{n^2} \left(\frac{1}{k} + \frac{1}{n-k}\right)$

$\displaystyle = 2\sum_{n=1}^{\infty} \frac{1}{n^2} H_{n-1}$

$\displaystyle = 2\sum_{n=1}^{\infty} \frac{1}{n^2} H_n - 2\sum_{n=1}^{\infty} \frac{1}{n^3}$

$\displaystyle \implies \sum_{n=1}^{\infty} \frac{H_n}{n^2} = 2 \zeta(3)$.

This is a clever sequence of manipulations.  The first good idea is to express $H_n$ as a telescoping series in the first step.  The variable switch in step three (n-k for n) is nice because, after swapping the order of summation in step four and the partial fractions decomposition in step five, it sets up the crucial step six: $\displaystyle \sum_{k=1}^{n-1} \left(\frac{1}{k} + \frac{1}{n-k}\right)$ is just $H_{n-1}$ added forwards and backwards (and so twice).  (The switch in the lower index in step six from $n = 2$ to $n = 1$ is because $H_0$ is an empty sum and so is $0$.)  Step seven is also nice: Adding two copies of $\displaystyle \sum_{n=1}^{\infty} \frac{1}{n^3}$ to the sum in line six enables you to change the $H_{n-1}$ to $H_n$, giving you two copies of the Euler sum you’re trying to evaluate on the right side of the equation in addition to the one already on the left side.  Of course, if you add two copies of $\displaystyle \sum_{n=1}^{\infty} \frac{1}{n^3}$ you have to subtract two copies as well, and the basic algebra that remains in step eight gives you the final, simple, beautiful result:

$\displaystyle \sum_{n=1}^{\infty} \frac{H_n}{n^2} = 2\zeta(3)$.
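As a sanity check, the partial sums of the series do approach $2\zeta(3) \approx 2.404$.  Here is a small Python sketch (the cutoff N is an arbitrary choice of mine):

```python
# Partial sums of sum_{n >= 1} H_n / n^2, compared against 2*zeta(3).
ZETA_3 = 1.2020569031595943  # Apery's constant, zeta(3)

N = 200_000
H = 0.0       # running harmonic number H_n
total = 0.0   # running partial sum of H_n / n^2
for n in range(1, N + 1):
    H += 1.0 / n
    total += H / (n * n)

# The tail is roughly (log N)/N, so the partial sum is within about 1e-4.
assert abs(total - 2 * ZETA_3) < 1e-3
```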

For more information on evaluating Euler sums, you can see the rest of Rob’s answer or some of the other nice answers to my question about evaluating some simple alternating Euler sums.  (The integral representations in this answer are particularly nice, but the others are good, too.)  Michael Hoffman’s page on multiple zeta values and Euler sums contains a huge number of references for evaluation of Euler sums and their variations.  It’s a great resource.  I’ve also recently been thumbing through Ovidiu Furdui’s problem book Limits, Series, and Fractional Part Integrals [1].  Chapter 3, “A Bouquet of Series,” contains explicit evaluation of several Euler and Euler-type sums.

References

1.  Ovidiu Furdui, Limits, Series, and Fractional Part Integrals: Problems in Mathematical Analysis, Springer Problem Books in Mathematics, 2013.

## Steady-State Probabilities for Birth-and-Death Processes

A birth-and-death process is a special kind of continuous-time Markov chain that has applications in queuing.  If we assume that arrivals to a queuing system follow a Poisson process and that service times are exponentially distributed, then the resulting queuing system is a birth-and-death process.  In this post we’ll derive the steady-state probabilities for a general birth-and-death process.  At the end we’ll take the special case in which the arrival rates are the same and the service rates are the same.  This will give us the steady-state probabilities for the basic M/M/1 queuing system.

In steady-state for a continuous-time Markov chain the rate at which individuals transition into a particular state must be equal to the rate at which individuals transition out of that state.  If $\lambda_n$ is the birth rate for state n, $\mu_n$ is the death rate for state n, and $P_n$ is the probability of being in state n in steady-state, then we have the set of equations

$\displaystyle \lambda_0 P_0 = \mu_1 P_1,$
$\displaystyle (\lambda_n + \mu_n)P_n = \lambda_{n-1} P_{n-1} + \mu_{n+1} P_{n+1}$, for $n \geq 1$.

The first equation gives us $\displaystyle P_1 = \frac{\lambda_0}{\mu_1} P_0$.  Substituting into the second equation when $n =1$ yields

$\displaystyle P_2 = \frac{(\lambda_1 + \mu_1)P_1 - \lambda_0 P_0}{\mu_2} = \frac{(\lambda_1 + \mu_1)P_1 - \mu_1 P_1}{\mu_2} = \frac{\lambda_1}{\mu_2}P_1 = \frac{\lambda_1 \lambda_0}{\mu_2 \mu_1} P_0$.

This suggests that we might have $\displaystyle P_n = \frac{\lambda_{n-1}}{\mu_n} P_{n-1}$ for $n \geq 1$.  Assuming, as an induction hypothesis, that this holds up through n, we have

$\displaystyle P_{n+1} = \frac{(\lambda_n + \mu_n)P_n - \lambda_{n-1} P_{n-1}}{\mu_{n+1}} = \frac{(\lambda_n + \mu_n)P_n - \mu_n P_n}{\mu_{n+1}} = \frac{\lambda_n}{\mu_{n+1}} P_n.$

Therefore, $\displaystyle P_n = \frac{\prod_{i=0}^{n-1} \lambda_i}{\prod_{i=1}^n \mu_i} P_0.$

Assuming that the birth and death rates are such that we actually have well-defined steady state probabilities, $\displaystyle \sum_{n=0}^{\infty} P_n = 1$, and so

$\displaystyle \sum_{n=0}^{\infty} \frac{\prod_{i=0}^{n-1} \lambda_i}{\prod_{i=1}^n \mu_i} P_0 = 1$,

which implies

$\displaystyle P_0 = \left( \sum_{n=0}^{\infty} \frac{\prod_{i=0}^{n-1} \lambda_i}{\prod_{i=1}^n \mu_i} \right)^{-1}$.

In general, then,

$\displaystyle P_n = \left(\prod_{i=1}^n \frac{\lambda_{i-1}}{\mu_i} \right) \left(\sum_{k=0}^{\infty} \frac{\prod_{i=0}^{k-1} \lambda_i}{\prod_{i=1}^k \mu_i} \right)^{-1}$.

Application to the M/M/1 queuing system.

For an M/M/1 queuing system (i.e., a system with Markovian, or exponential, interarrival and service times, plus a single server), we have $\lambda_i = \lambda$ and $\mu_i = \mu$ for each i.  In this case, the formula for $P_0$ simplifies nicely.  We get

$\displaystyle P_0 = \left( \sum_{n=0}^{\infty} \frac{\lambda^n}{\mu^n} \right)^{-1} = \left( \sum_{n=0}^{\infty} \left(\frac{\lambda}{\mu} \right)^n \right)^{-1} = \left( \frac{1}{1 - \lambda/\mu} \right)^{-1} = 1 - \frac{\lambda}{\mu}.$

In the second-to-last step we use the formula for a geometric series.  This step requires $\lambda < \mu$; otherwise, the infinite series fails to converge.  This should make sense: If the arrival rate is larger than the service rate then over time the queue will get longer and longer and so will never approach steady-state.  (It is perhaps harder to see intuitively that we do not get a steady state if the arrival and service rates are equal, but that falls out of the mathematics here.)

In general, then, for the M/M/1 queuing system, the probability that there are exactly n customers in the system in steady-state is given by $\displaystyle P_n = \frac{\lambda^n}{\mu^n} \left(1 - \frac{\lambda}{\mu} \right).$
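The product formula and its M/M/1 specialization are easy to check numerically.  Below is a small Python sketch (the function name and truncation level are my own choices, not from the derivation) that computes the truncated steady-state distribution for a general birth-and-death process and confirms $P_n = (\lambda/\mu)^n (1 - \lambda/\mu)$ in the M/M/1 case:

```python
def bd_steady_state(lam, mu, N):
    """Steady-state probabilities P_0..P_N for a birth-and-death process
    truncated at state N.  lam[n] is the birth rate in state n (n < N)
    and mu[n] is the death rate in state n (n >= 1; mu[0] is unused)."""
    weights = [1.0]
    for n in range(1, N + 1):
        # P_n = (lambda_{n-1} / mu_n) * P_{n-1}
        weights.append(weights[-1] * lam[n - 1] / mu[n])
    total = sum(weights)
    return [w / total for w in weights]

# M/M/1 case: lambda_i = 1, mu_i = 2, so rho = 1/2 and P_n = rho^n (1 - rho).
lam, mu, N = 1.0, 2.0, 200
P = bd_steady_state([lam] * N, [0.0] + [mu] * N, N)
rho = lam / mu
assert abs(P[0] - (1 - rho)) < 1e-12
assert abs(P[3] - rho**3 * (1 - rho)) < 1e-12
```

Truncating at N = 200 is harmless here because the tail mass decays geometrically in $\rho$.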

## The Product and Quotient Rules for Differential Calculus

A couple of weeks ago one of my senior colleagues subbed for me on the day we discussed the product and quotient rules for differential calculus.  Afterwards he told me that he had never seen the way that I introduced these formulas, so I thought I would reproduce the derivations here. They are not original, but I cannot remember now where I first saw them.  The product rule derivation is more standard, I think.  The quotient rule derivation is nice because it avoids the chain rule.

The Product Rule.

The product rule tells us how to evaluate $\dfrac{d}{dx} f(x)g(x)$.

To understand this it helps to visualize what is going on.  You can think of the product as representing the area of a rectangle with sides $f(x)$ and $g(x)$.  If we increase $x$ by $\Delta x$ we are (if f and g are increasing at $x$) making tiny increases in the side lengths of the rectangle.  Effectively, we are adding the three shaded areas (two blue and one gray) to the white rectangle in the picture below.  (Image borrowed from the Wikipedia article on the product rule.)

The blue shaded area on the top is $\Delta f(x) g(x)$, the blue shaded area on the right is $f(x) \Delta g(x)$, and the gray shaded area in the corner is $\Delta f(x) \Delta g(x)$.

With this in mind, let’s take the limit of the difference quotient in order to calculate the derivative. This gives

$\displaystyle \dfrac{d}{dx} f(x)g(x) = \lim_{\Delta x \to 0} \frac{\Delta f(x) g(x) + f(x) \Delta g(x) + \Delta f(x) \Delta g(x)}{\Delta x}$
$\displaystyle = g(x) \lim_{\Delta x \to 0} \frac{\Delta f(x)}{\Delta x} + f(x) \lim_{\Delta x \to 0} \frac{\Delta g(x)}{\Delta x} + \lim_{\Delta x \to 0} \frac{\Delta f(x) \Delta g(x)}{\Delta x}$
$\displaystyle = g(x) f'(x) + f(x) g'(x) + f'(x)(0)$
$= g(x) f'(x) + f(x) g'(x)$, which is the product rule.

(This argument is essentially the one on the Wikipedia page for the product rule.)
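For a quick numerical sanity check of the rule, we can compare a symmetric difference quotient of $f(x)g(x)$ with $f'(x)g(x) + f(x)g'(x)$.  A Python sketch (the test functions sin and exp are arbitrary choices of mine):

```python
import math

def deriv(F, x, h=1e-6):
    """Symmetric difference-quotient approximation to F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

f, fp = math.sin, math.cos   # f and its known derivative
g, gp = math.exp, math.exp   # g and its known derivative
x = 0.7

lhs = deriv(lambda t: f(t) * g(t), x)   # numerical d/dx [f(x) g(x)]
rhs = fp(x) * g(x) + f(x) * gp(x)       # the product rule
assert abs(lhs - rhs) < 1e-6
```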

The Quotient Rule.

The quotient rule tells us how to evaluate $\dfrac{d}{dx} \dfrac{f(x)}{g(x)}$.

A standard way to derive the quotient rule is to use the product rule together with the chain rule.  However, by the point in the semester at which I want to introduce the quotient rule we generally haven’t seen the chain rule yet.  Here is a derivation that avoids the chain rule.

Let $\displaystyle Q(x) = \frac{f(x)}{g(x)}$.  We want $Q'(x)$.  Algebra gives us $f(x) = Q(x)g(x)$.  Differentiating both sides, we have $f'(x) = g(x) Q'(x) + Q(x)g'(x)$, by the product rule.  Now we solve for $Q'(x)$ to obtain the quotient rule:

$\displaystyle Q'(x) = \dfrac{f'(x) - Q(x) g'(x)}{g(x)} = \dfrac{f'(x) - \frac{f(x)}{g(x)}g'(x)}{g(x)} = \dfrac{\frac{g(x)}{g(x)}f'(x) - \frac{f(x)}{g(x)} g'(x)}{g(x)} = \dfrac{g(x) f'(x) - f(x)g'(x)}{(g(x))^2}.$
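The same kind of numerical check works for the quotient rule.  A Python sketch (again with arbitrary test functions, chosen so the denominator is never zero):

```python
import math

def deriv(F, x, h=1e-6):
    """Symmetric difference-quotient approximation to F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

f, fp = math.sin, math.cos   # numerator and its known derivative
g, gp = math.exp, math.exp   # denominator (nonzero) and its known derivative
x = 0.7

lhs = deriv(lambda t: f(t) / g(t), x)              # numerical d/dx [f/g]
rhs = (g(x) * fp(x) - f(x) * gp(x)) / g(x) ** 2    # the quotient rule
assert abs(lhs - rhs) < 1e-6
```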

## Which is bigger: π^e or e^π?

The question of whether $\pi^e$ or $e^{\pi}$ is larger is a classic one.  Of course, with a calculator it is easy to see what the answer is.  But how would you answer the question without a calculator?

There are lots of interesting ways to tackle the question.  For this (short) post I would like to highlight one I saw on math.SE, given by Aryabhata.  We need the assumptions that $\pi > e$ and $e^x > x+1$ for $x > 0$.  (One way to view the latter claim is that, for $x > 0$, $e^x$ is larger than the first two terms of its Maclaurin series.  Since all the rest of the terms of the Maclaurin series are positive, and the Maclaurin series converges to $e^x$, we get that $e^x > x+1$ for $x > 0$.)

On to the proof.  Since $\pi > e$, we have $\pi/e - 1 > 0$.  Applying the second assumption with $x = \pi/e - 1$, we get $\displaystyle e^{\pi/e-1} > \pi/e$.  Multiplying both sides by $e$ gives $\displaystyle e^{\pi/e} > \pi$, and raising both sides to the power $e$ gives $\displaystyle e^{\pi} > \pi^e$.
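And, for the skeptical, a one-line numerical confirmation in Python:

```python
import math

# pi^e ≈ 22.459 and e^pi ≈ 23.141, so e^pi is indeed the larger one.
assert math.e ** math.pi > math.pi ** math.e
print(math.pi ** math.e, math.e ** math.pi)
```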

## The Expected Maximum of Independent and Identically Distributed Geometric Random Variables

In my last post I derived the formula for the expected maximum of n independent and identically distributed (iid) exponential random variables.  If each random variable has rate parameter $\lambda$, then the expected maximum is $\frac{1}{\lambda} \sum_{i=1}^n \frac{1}{i} = \frac{H_n}{\lambda}$, where $H_n$ is the nth harmonic number.

This post considers the discrete version of the same problem: the expected maximum of n independent and identically distributed geometric random variables.  Unfortunately, there is no known nice, closed-form expression for the expected maximum in the discrete case.  However, the answer for the corresponding exponential random variables turns out to be a good approximation.

Let $X_1, X_2, \ldots X_n$ be n iid geometric(p) random variables.  Let $T_n$ denote the maximum of the $X_i$ variables.  Let $q = 1- p$.  Let the rate parameter for the corresponding exponential distribution be $\lambda = - \log(1-p)$.  Now, we have

$\displaystyle E[T_n] = \sum_{k=0}^{\infty} P(T_n > k) = \sum_{k=0}^{\infty} (1 - P(T_n \leq k))$
$\displaystyle = \sum_{k=0}^{\infty} (1 - P(X_1 \leq k) P(X_2 \leq k) \cdots P(X_n \leq k)) = \sum_{k=0}^{\infty} (1 - (1-q^k)^n),$

as the cumulative distribution function of a geometric(p) random variable is $1 - (1-p)^k$.

By viewing the infinite sum as right- and left-hand Riemann sum approximations of the corresponding integral (the summand is a decreasing function of k, so both bounds apply) we obtain

$\displaystyle \int_0^{\infty} (1 - (1 - q^x)^n) dx \leq E[T_n] \leq 1 + \int_0^{\infty} (1 - (1 - q^x)^n) dx.$

Now, let’s take a closer look at the integral that occurs on both sides of this inequality. With the variable switch $u = 1 - q^x$ we have

$\displaystyle \int_0^{\infty} (1 - (1 - q^x)^n) dx = -\frac{1}{\log q} \int_0^1 \frac{1 - u^n}{1-u} du = -\frac{1}{\log q} \int_0^1 \left(1 + u + \cdots + u^{n-1}\right) du$
$\displaystyle = -\frac{1}{\log q} \left(1 + \frac{1}{2} + \cdots + \frac{1}{n}\right) = -\frac{1}{\log q} H_n,$
proving the fairly tight bounds

$\displaystyle \frac{1}{\lambda} H_n \leq E[T_n] \leq 1 + \frac{1}{\lambda} H_n.$

We can obtain a more precise approximation by using the Euler-Maclaurin summation formula for approximating a sum by an integral.  Up to a first-order error term, it says that

$\displaystyle E[T_n] = \sum_{k=0}^{\infty} (1 - (1-q^k)^n) \approx \int_0^{\infty} (1 - (1 - q^x)^n) dx + \frac{1}{2},$
yielding the approximation

$\displaystyle E[T_n] \approx -\frac{1}{\log q} H_n + \frac{1}{2},$

with error term given by
$\displaystyle \int_0^{\infty} n (\log q) q^x (1 - q^x)^{n-1} \left(x - \lfloor x \rfloor - \frac{1}{2}\right) dx.$
One can verify that this error term is quite small unless n is small or q is extreme.
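Both the Riemann-sum bounds and the Euler-Maclaurin estimate are easy to check numerically.  The Python sketch below (the function name and tolerance are my own choices) computes $E[T_n]$ directly from the series $\sum_{k \geq 0} (1 - (1-q^k)^n)$ and compares:

```python
import math

def expected_max_geometric(n, p, tol=1e-15):
    """E[max of n iid geometric(p) variables], via E[T_n] = sum_k P(T_n > k)."""
    q = 1 - p
    total, k = 0.0, 0
    while True:
        term = 1 - (1 - q**k) ** n   # P(T_n > k)
        total += term
        if term < tol:
            return total
        k += 1

n, p = 10, 0.3
lam = -math.log(1 - p)                        # rate of matching exponential
H_n = sum(1.0 / i for i in range(1, n + 1))   # nth harmonic number
E = expected_max_geometric(n, p)

assert H_n / lam <= E <= 1 + H_n / lam        # the Riemann-sum bounds
assert abs(E - (H_n / lam + 0.5)) < 0.1       # the Euler-Maclaurin estimate
```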

See also Bennett Eisenberg’s paper “On the expectation of the maximum of IID geometric random variables,” Statistics and Probability Letters 78 (2008), 135-143.

(I first posted this argument online as my answer to “Expectation of the maximum of IID geometric random variables” on Math Stack Exchange.)
