An Expected Value Connection Between Order Statistics from a Discrete and a Continuous Distribution

Years ago, in the course of doing some research on another topic, I ran across the following result relating the expected values of the order statistics from a discrete and a continuous distribution.  I found it rather surprising.

Theorem: Fix n, and let 1 \leq i \leq n. Let X_1 < X_2 < \cdots < X_i be the order statistics from a random sample of size i from the continuous uniform [0,n+1] distribution. Let Y_1 < Y_2 < \cdots < Y_i be the order statistics from a random sample of size i, chosen without replacement, from the discrete uniform \{1, \ldots, n\} distribution. Then, for each j, 1 \leq j \leq i, E[X_j] = E[Y_j].

Why would the expected values of the order statistics from a discrete distribution and a continuous distribution with different ranges match up exactly?  Particularly when the values from the discrete distribution are chosen without replacement?  It’s not too hard to prove, using basic order statistic properties, that E[X_j] = j(n+1)/(i+1) and that E[Y_j] = j(n+1)/(i+1) as well.  I didn’t find that very satisfying, though.  It took me a little while, but eventually I found a proof that uses a transformation from one sample to the other.  Here it is.

First, though, I need a lemma (which is fairly intuitive and which I won’t prove):

Lemma: Let X_1 \leq X_2 \leq \cdots \leq X_n be the order statistics from a random sample of size n from a distribution F (continuous or discrete). Let \sigma be a uniformly random permutation of \{1, \ldots, n\}, independent of the values of X_1, X_2, \ldots, X_n. Then, for any 1 \leq i \leq n, the values X_{\sigma(1)}, X_{\sigma(2)}, \ldots, X_{\sigma(i)} are a random sample of size i from distribution F.

Now the proof.

Proof: Select a random sample of i values from the continuous uniform [0,n+1] distribution in the following manner: Randomly select n values from [0,n+1] and order them W_1 < W_2 < \cdots < W_n.  Then randomly select i of these n values, independently of the values of the W_j’s, and let X_1 < X_2 < \cdots < X_i be the selected values in increasing order.  By the lemma, this produces a random sample of size i.  In addition, selecting i of the n values is equivalent to randomly selecting i of the n indices; i.e., to selecting i values without replacement from the discrete uniform \{1, \ldots, n\} distribution.  Thus we have, for each k \in \{1, \ldots, n\}, P(X_j = W_k) = P(Y_j = k).  In addition, E[W_k] = k(n+1)/(n+1) = k, since W_k is the k^{th} smallest value in a random sample of size n from the continuous uniform [0,n+1] distribution; and, because the index selection is independent of the values of the W_j’s, E[X_j | X_j = W_k] = E[W_k].  Therefore, by the law of total expectation,

\displaystyle \begin{aligned} E[X_j] &= \sum_{k=1}^n E[X_j | X_j = W_k] \; P(X_j = W_k) = \sum_{k=1}^n E[W_k] \; P(Y_j = k) \\ &= \sum_{k=1}^n k \; P(Y_j = k) = E[Y_j]. \end{aligned}
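
If you’d like to see the theorem in action numerically, here is a minimal Monte Carlo sketch in Python (the values of n, i, and the number of trials are arbitrary illustrative choices, not part of the argument):

```python
import random

def simulate(n=7, i=3, trials=200_000):
    # Estimate E[X_j] and E[Y_j] for each j = 1, ..., i and compare with j(n+1)/(i+1).
    sum_x = [0.0] * i
    sum_y = [0.0] * i
    for _ in range(trials):
        # Continuous sample: i values from uniform [0, n+1], in increasing order.
        xs = sorted(random.uniform(0, n + 1) for _ in range(i))
        # Discrete sample: i values without replacement from {1, ..., n}, in increasing order.
        ys = sorted(random.sample(range(1, n + 1), i))
        for j in range(i):
            sum_x[j] += xs[j]
            sum_y[j] += ys[j]
    for j in range(i):
        exact = (j + 1) * (n + 1) / (i + 1)   # j(n+1)/(i+1), with j counted from 1
        print(f"j={j+1}: E[X_j] ~ {sum_x[j]/trials:.3f}, "
              f"E[Y_j] ~ {sum_y[j]/trials:.3f}, exact = {exact:.3f}")

simulate()
```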

Two Methods for Proving an Alternating Binomial Identity

Recently I gave an optional homework problem in which I asked students to prove the following binomial identity:

\displaystyle \sum_{k=0}^n \binom{n}{k} \frac{2^k}{k+1} (-1)^k = \frac{1}{n+1} [ n \text{ is even}].

(Here, I’m using the Iverson bracket notation [P] in which [P] = 1 if P is true and [P] =  0 if P is false.)
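
Before turning to the proofs, here is a quick exact check of the identity for small n, using Python’s Fraction type so there is no floating-point error (a sanity check only, not part of either proof):

```python
from fractions import Fraction
from math import comb

def lhs(n):
    # sum_{k=0}^n C(n,k) * 2^k * (-1)^k / (k+1)
    return sum(Fraction(comb(n, k) * 2**k * (-1)**k, k + 1) for k in range(n + 1))

def rhs(n):
    # (1/(n+1)) * [n is even]
    return Fraction(1, n + 1) if n % 2 == 0 else Fraction(0)

for n in range(12):
    assert lhs(n) == rhs(n)
print("identity verified for n = 0, ..., 11")
```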

I intended for students to use the following approach.

Method 1.

First, prove the identity

\displaystyle \sum_{k=0}^n \binom{n}{2k}\frac{1}{2k+1} = \frac{2^n}{n+1}.

This was an earlier problem in the exercises, so the students could have just cited it.  But let’s prove it with the absorption identity, \displaystyle \binom{n}{k} \frac{n+1}{k+1} = \binom{n+1}{k+1}.

Replacing k with 2k in the absorption identity, we have \displaystyle \binom{n}{2k} \frac{n+1}{2k+1} = \binom{n+1}{2k+1}.  Then

\displaystyle \sum_{k=0}^n \binom{n}{2k}\frac{1}{2k+1} = \frac{1}{n+1} \sum_{k=0}^n \binom{n+1}{2k+1} = \frac{2^{n+1}}{2(n+1)} = \frac{2^n}{n+1}.

(Here, the second summation is just adding up the odd-index numbers in row n+1 of Pascal’s triangle.  This is known to be half the value of the sum of the numbers in row n+1; i.e., half of 2^{n+1}.)

Now that we’ve proved this identity, let’s rewrite it as

\displaystyle \sum_{k=0}^n \binom{n}{k} \frac{(-1)^k}{k+1} [k \text{ is even}] = \frac{2^n}{n+1}.

(Notice that the left side is the same as the left side of the identity we just proved: the Iverson bracket zeroes out the odd-indexed terms, and (-1)^k = 1 for the even values of k that remain.)  Now we can apply binomial inversion to get

\displaystyle \sum_{k=0}^n \binom{n}{k} \frac{2^k}{k+1} (-1)^k = \frac{1}{n+1} [n \text{ is even}].
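
To make the inversion step concrete, here is a small sketch (the names binomial_transform, f, and g are my own, chosen for illustration) that verifies both directions of the inversion exactly for small n:

```python
from fractions import Fraction
from math import comb

def binomial_transform(h, n):
    # Maps a sequence h to the value sum_{k=0}^n C(n,k) (-1)^k h(k).
    # Binomial inversion says this transform is its own inverse.
    return sum(comb(n, k) * (-1)**k * h(k) for k in range(n + 1))

def f(k):
    # [k is even] / (k+1): the bracketed left-side sequence above
    return Fraction(1, k + 1) if k % 2 == 0 else Fraction(0)

def g(n):
    # 2^n / (n+1): the right side of the rewritten identity
    return Fraction(2**n, n + 1)

for n in range(10):
    assert binomial_transform(f, n) == g(n)   # the rewritten identity
    assert binomial_transform(g, n) == f(n)   # its inversion, i.e., the original identity
```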

This was a fairly hard problem to do this way, in the sense that it’s not obvious from the first identity we proved that it can be rewritten in a form to which binomial inversion applies.  However, my student Jiawen (Christine) Li found an easier way to prove the original identity by using the absorption identity directly.  Christine’s proof is Method 2.

Method 2.

Applying the absorption identity, we have

\displaystyle \begin{aligned} \sum_{k=0}^n \binom{n}{k} \frac{2^k}{k+1} (-1)^k &= \frac{1}{n+1} \sum_{k=0}^n \binom{n+1}{k+1} (-2)^k = \frac{1}{n+1} \sum_{k=1}^{n+1} \binom{n+1}{k} (-2)^{k-1} \\ &= \frac{-1}{2(n+1)} \sum_{k=1}^{n+1} \binom{n+1}{k} (-2)^k = \frac{-1}{2(n+1)} \left( \sum_{k=0}^{n+1} \binom{n+1}{k} (-2)^k - 1 \right) \\ &= \frac{-1}{2(n+1)} \left( (-1)^{n+1} - 1\right) = \frac{1}{2(n+1)} \left( (-1)^n + 1 \right) \\ &= \frac{2}{2(n+1)} [n \text{ is even}] = \frac{1}{n+1} [ n \text{ is even}]. \end{aligned}

For more examples with the absorption identity, see this post.

A Bonus Question on Convergent Series

Occasionally when teaching the sequences and series material in second-semester calculus I’ve included the following question as a bonus:

Question: Suppose \displaystyle \sum_{n=1}^{\infty} a_n is absolutely convergent.  Does that imply anything about the convergence of \displaystyle \sum_{n=1}^{\infty} a^2_n?

The answer is that \displaystyle \sum_{n=1}^{\infty}a^2_n converges.  I’m going to give two proofs.  The first is the more straightforward approach, but it is somewhat longer.  The second is clever and shorter.

First method: If \displaystyle \sum_{n=1}^{\infty} a_n is absolutely convergent, then, by the divergence test, \displaystyle \lim_{n \to \infty} |a_n| = 0.  Thus there exists some N > 0 such that if n \geq N then |a_n| \leq 1.  This means that, for n \geq N, a^2_n \leq |a_n|.  By the direct comparison test, then, \displaystyle \sum_{n=1}^{\infty} a^2_n converges.

Second method: As we argued above, \displaystyle \lim_{n \to \infty} |a_n| = 0.  Thus \displaystyle \lim_{n \to \infty} \frac{a^2_n}{|a_n|} = \lim_{n \to \infty} |a_n| = 0.  By the limit comparison test, then, \sum_{n=1}^{\infty} a^2_n converges.  (If some a_n = 0, those terms contribute nothing to either series, so we may discard them before forming the ratio.)

(The first method is David Mitra’s answer and the second method is my answer to this Math.SE question.)
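
As a numerical illustration (not a proof), here is a quick sketch using the absolutely convergent series with a_n = (-1)^n/n^{1.5}; both partial sums settle down, as the argument predicts:

```python
# Illustration only: a_n = (-1)^n / n^1.5 gives an absolutely convergent series.
a = lambda n: (-1)**n / n**1.5

N = 10**6
sum_abs = sum(abs(a(n)) for n in range(1, N + 1))   # partial sum of sum |a_n|
sum_sq  = sum(a(n)**2  for n in range(1, N + 1))    # partial sum of sum a_n^2

# Here |a_n| <= 1 for every n, so a_n^2 <= |a_n|, the comparison used in the first method.
print(f"partial sum of |a_n|:  {sum_abs:.6f}")   # approaches zeta(1.5), about 2.612
print(f"partial sum of a_n^2:  {sum_sq:.6f}")    # approaches zeta(3), about 1.202
```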

My Experiences on a Post-Election Panel

No mathematics post this month.  Instead, I’m just going to link to an article I published last week in Inside Higher Ed.  In this article I describe my experiences as the conservative voice on a panel held on my campus on November 9.  This was the day after the U.S. Presidential election, and the purpose of the panel was to help the mostly progressive campus make sense of Donald Trump’s win.

Relations That Are Symmetric and Antisymmetric

When I teach relations and their properties, the question of whether a relation can be both symmetric and antisymmetric always seems to arise.  This post addresses that question.

First, a reminder of the definitions here: A relation \rho on a set S is symmetric if, for all x,y \in S, (x,y) \in \rho implies that (y,x) \in \rho.  A relation \rho on a set S is antisymmetric if, for all x,y \in S, (x,y) \in \rho and (y,x) \in \rho together imply that x = y.

At first glance the definitions look a bit strange, in the sense that we would expect antisymmetric to mean “not symmetric.”  But that’s not quite what antisymmetric means. In fact, it is possible for a relation to be both symmetric and antisymmetric.

The boring (trivial) example is the empty relation \rho on a set.  In this case, the antecedents in the definitions of symmetric ((x,y) \in \rho) and antisymmetric ((x,y) \in \rho and (y,x) \in \rho) are never satisfied.  Thus the conditional statements in the definitions of the two properties are vacuously true, and so the empty relation is both symmetric and antisymmetric.

More interestingly, though, there are nontrivial examples.  Let’s think through what those might be. Suppose (x,y) \in \rho.  If \rho is symmetric, then we require (y,x) \in \rho as well.  If \rho is antisymmetric, then, since (x,y) \in \rho and (y,x) \in \rho, we must have x = y.  And this gives us a characterization of relations that are both symmetric and antisymmetric:

If a relation \rho on a set S is both symmetric and antisymmetric, then the only ordered pairs in \rho are of the form (x,x), where x \in S.  Conversely, any relation containing only pairs of this form is both symmetric and antisymmetric, so this condition really is a characterization.

Thus, for example, the equality relation on the set of real numbers is both symmetric and antisymmetric.

Perhaps the sense in which antisymmetric opposes symmetric is clearer now.  Symmetric means that there cannot be one-way relationships between two different elements.  Antisymmetric means that there cannot be two-way relationships between two different elements.  Both definitions allow for a relationship between an element and itself.
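
To make the definitions concrete, here is a short sketch (the helper names is_symmetric and is_antisymmetric are my own) that tests both properties on a few small relations:

```python
def is_symmetric(rel):
    # (x, y) in rel implies (y, x) in rel
    return all((y, x) in rel for (x, y) in rel)

def is_antisymmetric(rel):
    # (x, y) in rel and (y, x) in rel imply x == y
    return all(x == y for (x, y) in rel if (y, x) in rel)

S = {1, 2, 3}
equality   = {(x, x) for x in S}                          # both properties hold
less_equal = {(x, y) for x in S for y in S if x <= y}     # antisymmetric, not symmetric
not_equal  = {(x, y) for x in S for y in S if x != y}     # symmetric, not antisymmetric
empty      = set()                                        # both, vacuously

for name, rel in [("equality", equality), ("<=", less_equal),
                  ("!=", not_equal), ("empty", empty)]:
    print(name, is_symmetric(rel), is_antisymmetric(rel))
```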

How Do You Tell Whether Two Lines Intersect in 3D?

How do you tell whether two three-dimensional lines intersect?  My colleague Greg Elliott in physics asked me this recently.  He understood how the usual method works, but he wanted to know whether there was a more clever method.  In this post I’ll discuss the usual method as well as the alternative method that I first suggested and that we then refined.  (I’ll leave it to the reader to determine whether the second method is “more clever.”)  Throughout, I’ll assume that you already know the lines aren’t parallel.

To see how the methods work, let’s look at an example.  Do the lines (t, -2+3t, 4-t) and (2s, 3+s, -3+4s) intersect?

Method 1:  The usual method is to set the components equal to each other and see whether there is a solution.  This gives the system

\displaystyle \begin{aligned} t &= 2s \\ -2+3t &= 3+s \\ 4-t &= -3+4s \end{aligned}

Substituting the first equation into the second yields -2+3(2s) = 3+s \implies s = 1.  Substituting the first equation into the third yields 4-(2s) = -3+4s \implies s = 7/6.  Since s cannot take on both values, the lines do not intersect.  (If we had gotten the same value for s then the lines would intersect.)
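
Here is a minimal sketch of Method 1 in code (the function name lines_intersect and the tolerance are illustrative choices): solve for t and s from two of the component equations, then check whether the full system is satisfied.

```python
def lines_intersect(p, u, q, v, tol=1e-12):
    # Method 1: solve p + t*u = q + s*v using two components, then check all three.
    # Assumes the lines are not parallel.
    for i in range(3):
        for j in range(3):
            if i == j:
                continue
            # The 2x2 system in (t, s):  t*u_i - s*v_i = b_i,  t*u_j - s*v_j = b_j.
            det = -u[i] * v[j] + v[i] * u[j]
            if abs(det) > tol:
                bi, bj = q[i] - p[i], q[j] - p[j]
                t = (-bi * v[j] + v[i] * bj) / det
                s = (u[i] * bj - bi * u[j]) / det
                return all(abs(p[k] + t * u[k] - (q[k] + s * v[k])) < tol
                           for k in range(3))
    return False  # not reached for non-parallel lines

# The example: (t, -2+3t, 4-t) and (2s, 3+s, -3+4s)
print(lines_intersect((0, -2, 4), (1, 3, -1), (0, 3, -3), (2, 1, 4)))  # False
```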

Method 2: Two non-parallel lines intersect if and only if they lie in the same plane.  Thus, first construct the plane defined by the first line and the direction vector for the second line.  Then see whether the second line lies in this plane.  To construct the plane, take the direction vectors for the two lines: {\bf u} = \langle 1, 3, -1 \rangle and {\bf v} = \langle 2, 1, 4 \rangle.  Then cross them to get the vector {\bf w} = \langle 13, -6, -5 \rangle.  A plane needs a point and a normal vector.  Take any point from the first line; the simplest is probably (0, -2, 4).  The equation for the plane defined by the first line and {\bf v} is thus 13x - 6(y+2) - 5(z-4) = 0.  Now, since {\bf w} is orthogonal to {\bf v} by construction, the second line lies in this plane exactly when any one point of that line satisfies the plane equation.  Substituting in the point (0, 3, -3) from the second line yields 0 - 6(3+2) - 5(-3-4) = -30+35 = 5 \neq 0.  Thus the second line does not lie in this plane, and the two lines do not intersect.

(Formula for Method 2).  Generalizing from this example, we can give an explicit condition in terms of a formula.  A line in 3D is determined by a point {\bf p} and a direction vector {\bf u}.  Let the second line be determined by point {\bf q} and direction vector {\bf v}.  Take the two direction vectors {\bf u} and {\bf v} and cross them to get the vector {\bf u} \times {\bf v}.  Then take the displacement vector {\bf p} - {\bf q} between the two points.  Since {\bf u} \times {\bf v} is orthogonal to the plane defined by the direction vectors, the only way the two lines can lie in the same plane is if {\bf u} \times {\bf v} is orthogonal to the displacement vector {\bf p} - {\bf q}.  In other words, the two lines lie in the same plane, and thus intersect, if and only if ({\bf u} \times {\bf v}) \cdot ({\bf p} - {\bf q}) = 0.
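
And here is a minimal sketch of the Method 2 formula (the helper names cross and coplanar_test are my own):

```python
def cross(u, v):
    # The cross product u x v.
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def coplanar_test(p, u, q, v, tol=1e-12):
    # Non-parallel lines p + t*u and q + s*v intersect iff (u x v) . (p - q) = 0.
    w = cross(u, v)
    d = tuple(pi - qi for pi, qi in zip(p, q))
    return abs(sum(wi * di for wi, di in zip(w, d))) < tol

# The example again: the triple product is -5, so the lines do not intersect.
print(coplanar_test((0, -2, 4), (1, 3, -1), (0, 3, -3), (2, 1, 4)))  # False
```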

Cassini’s Identity without Matrix Powers

Cassini’s identity for Fibonacci numbers says that F_{n+1} F_{n-1} - F_n^2 = (-1)^n.  The classic proof of this shows (by induction) that \begin{bmatrix} F_{n+1} & F_n \\ F_n & F_{n-1} \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}^n.  Since \det \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} = -1 and the determinant of a product is the product of the determinants, Cassini’s identity follows.

In this post I’m going to give a different proof involving determinants, but one that does not use powers of the Fibonacci matrix.

Let’s start with the 2 \times 2 identity matrix, which we’ll call A_0:

\displaystyle A_0 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.

To construct A_1, add the second row to the first and then swap the two rows.  This gives us

\displaystyle A_1 = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}.

Continue this process of adding the second row to the first and then swapping the two rows.  This yields

\displaystyle A_2 = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}, A_3 = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}, A_4 = \begin{bmatrix} 2 & 3 \\ 3 & 5 \end{bmatrix}, \ldots .

Since A_1 = \begin{bmatrix} F_0 & F_1 \\ F_1 & F_2 \end{bmatrix} and F_{n+1} = F_n + F_{n-1}, the fact that we’re adding rows each time means that A_n = \begin{bmatrix} F_{n-1} & F_n \\ F_n & F_{n+1} \end{bmatrix}.

Since adding a row to another row doesn’t change the determinant, and swapping two rows changes only the sign of the determinant, we have

\displaystyle F_{n+1} F_{n-1} - F_n^2 = \det A_n = (-1)^n \det A_0 = (-1)^n,

which is Cassini’s identity.
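
For anyone who wants to check the construction numerically, here is a short sketch that carries out the row operations and verifies both the form of A_n and Cassini’s identity (a verification, not a replacement for the proof):

```python
def cassini_check(n_max=15):
    a, b, c, d = 1, 0, 0, 1      # A_0 = [[1, 0], [0, 1]]
    sign = 1                     # det A_0 = 1
    fib = [0, 1]                 # F_0, F_1, ...
    while len(fib) <= n_max + 1:
        fib.append(fib[-1] + fib[-2])
    for n in range(1, n_max + 1):
        a, b = a + c, b + d          # add the second row to the first (det unchanged)
        a, b, c, d = c, d, a, b      # swap the two rows
        sign = -sign                 # the swap flips the sign of the determinant
        # A_n should be [[F_{n-1}, F_n], [F_n, F_{n+1}]] with det = (-1)^n.
        assert (a, b, c, d) == (fib[n - 1], fib[n], fib[n], fib[n + 1])
        assert a * d - b * c == sign == (-1) ** n
    print("Cassini's identity verified for n = 1, ...,", n_max)

cassini_check()
```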

See also my paper “Fibonacci Identities via the Determinant Sum Property.”
