The Validity of Mathematical Induction

Suppose you have some statement $S(k)$.  Mathematical induction says that the following is sufficient to prove that $S(k)$ is true for all natural numbers k.

1. $S(1)$ is true.
2. For any natural number k, if $S(k)$ is true, then $S(k+1)$ is true.

The idea is that the first two statements imply that $S(2)$ is true.  Then the truth of $S(2)$ together with the second statement implies that $S(3)$ is true.  Then the truth of $S(3)$ together with the second statement implies that $S(4)$ is true.  One can continue with this process, showing that $S(k)$ is true for all natural numbers k.

While this is the idea, the formal proof that mathematical induction is a valid proof technique tends to rely on the well-ordering principle of the natural numbers; namely, that every nonempty set of positive integers contains a least element.  See, for example, here.

However, if you dig into this a little further you eventually realize that mathematical induction is tied up with the very definition of the natural numbers.  See, for example, here.  The rest of this post is a proof of the validity of mathematical induction that uses that idea explicitly.  (It’s taken from Fitzpatrick’s Advanced Calculus, second edition, pages 5-6.)

First, the following definition: A set S of real numbers is said to be inductive provided that (1) $1 \in S$ and (2) if $x \in S$ then $x+1 \in S$.

There are lots of inductive sets: the reals, the rationals, and the integers, for example.

Next, we define the natural numbers: The set of natural numbers $\mathbb{N}$ is the intersection of all inductive subsets of the real numbers.

Now we’re ready to prove the validity of mathematical induction.  Let $A = \{ k \in \mathbb{N} | S(k) \text{ is true}\}$.  Since $S(1)$ is true, $1 \in A$.  Since the truth of $S(k)$ for some natural number $k$ implies the truth of $S(k+1)$ for the natural number $k+1$, we have that if $k \in S$ then $k+1 \in S$.  Thus the set A is an inductive set.  By definition of $\mathbb{N}$, this means $\mathbb{N} \subseteq A$.  However, the definition of A means that $A \subseteq \mathbb{N}$.  Thus $A = \mathbb{N}$.  Therefore, $S(k)$ is true for all natural numbers.

The Divergence of the Harmonic Series

The fact that the harmonic series, $1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6 + \cdots$, diverges has been known since the time of Nicole Oresme in the 14th century, but this fact is still somewhat surprising from a numerical standpoint.  After all, each successive term only adds a small amount to the total.  For example, if you add up the first 100 terms (i.e., $1 + 1/2 + \cdots + 1/100$) you only get a value of about 5.2.  Adding up the first 1000 terms only gives you a value of about 7.5.  Adding up the first 10,000 terms only gives you a value of about 9.8.

In fact, suppose you added up the first quintillion terms in the sum.  That’s $10^{18}$ terms, or 1,000,000,000,000,000,000 terms.  A quintillion is such a large number that if you traveled a quintillion centimeters you would be in the next solar system.  Yet adding up the first quintillion terms in the harmonic series would only give you a value of about 42.

The harmonic series grows very, very slowly.

Yet it does eventually get arbitrarily large.  To put that more precisely, no matter how large you posit a real number S there will always be some (probably quite large) integer N such that $1 + 1/2 + \cdots + 1/N > S$.

Let’s prove that the harmonic series diverges.  The proof is by contradiction.  First, suppose that the harmonic series converges to some number S, so that

$\displaystyle S = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} + \cdots.$

Let’s pair up the fractions on the right side, like so:

$\displaystyle S = \left(1 + \frac{1}{2}\right) + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \frac{1}{6}\right) + \left(\frac{1}{7} + \frac{1}{8}\right) + \cdots.$

Within each pair of parentheses, there’s a larger number and a smaller number.  If we replace the larger number in each pair with the smaller number, the value on the right must be smaller than it was before.  That leaves us

$\displaystyle S > \left(\frac{1}{2} + \frac{1}{2}\right) + \left(\frac{1}{4} + \frac{1}{4}\right) + \left(\frac{1}{6} + \frac{1}{6}\right) + \left(\frac{1}{8} + \frac{1}{8}\right) + \cdots \\ = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots \\ = S.$

This gives us $S > S$,  a logical contradiction.  Thus our assumption that the harmonic series converges must be false.  Therefore, the harmonic series diverges.

There are a lot of proofs that the harmonic series diverges.  For several of these, see Kifowit and Stamps [1].  (The proof I’ve given here is Proof 6 in their paper.)

Reference

1.  Steven J. Kifowit and Terra A. Stamps, “The Harmonic Series Diverges Again and Again,” AMATYC Review 27 (2), pp. 31-43, Spring 2006.

The Secretary Problem With the Two Best

The secretary problem is the following: Suppose a manager wants to hire the best person for his secretary out of a group of n candidates.  He interviews the candidates one by one.  After interviewing a particular candidate he must either (1) hire the candidate he just interviewed, rejecting all other candidates, or (2) reject the current candidate and interview the next candidate in the list.  (A manager cannot change his mind and hire a previously-rejected candidate.)

What’s the best strategy for the manager?  The best strategy turns out to be to reject the first $n/e$ candidates (in other words, reject about the first 37% of candidates) and then hire the next one who is better than all candidates seen thus far.  Somewhat surprisingly, this strategy gives the manager a probability of $1/e$ (about 37%) of hiring the best candidate out of the entire group!

But what if the manager is a little more flexible and changes his goal to hiring one of the two best candidates?  The analysis is a little more complicated, so let’s simplify things somewhat by assuming that he has 100 candidates to interview.  In this case, using a similar strategy, the manager can get the probability of hiring one of the two best candidates above 50%.  The rest of this post contains the derivation of this fact.

Suppose the manager’s strategy is (like in the original version of the secretary problem) to reject the first k candidates and then hire the next candidate who is better than all candidates seen thus far.  Let’s condition on the absolute ranking of the best candidate among this first k.

If the #1 candidate is the best among the first k, the manager will never see a better candidate and thus will reject all the candidates.  The probability of success here is 0.

If the #2 candidate is the best among the first k, the next candidate that the manager sees who is better than any of the first k must be the #1 candidate.  The manager will end up hiring the #1 person.  This scenario happens with probability $\frac{k}{100} \cdot \frac{100-k}{99}$, as the probability that the #2 candidate falls in the first k is $\frac{k}{100}$, and, given this, the probability that the #1 candidate falls in the last $100-k$ is $\frac{100-k}{99}$.

If the #3 candidate is the best among the first k, the next candidate that the manager sees who is better than any of the first k must be the #1 or the #2 candidate.  The manager will thus end up hiring one of the two best people.  This scenario happens with probability $\frac{k}{100} \cdot \frac{100-k}{99} \cdot \frac{99-k}{98}$.  To see this, the probability that the #3 candidate falls in the first k is $\frac{k}{100}$.  Given this, the probability that the #1 candidate does not fall in the first k is $\frac{100-k}{99}$, and given all of that, the probability that the #2 candidate does not fall in the first k is $\frac{99-k}{98}$.

If the #4 candidate is the best among the first k, then the manager is not guaranteed to end up hiring one of the two best candidates.  There’s a 2/3 chance that the manager interviews the #1 or #2 candidate before the #3 candidate after rejecting the first k.  The manager’s probability of success is thus $\frac{2}{3} \cdot \frac{k}{100} \cdot \frac{100-k}{99} \cdot \frac{99-k}{98} \cdot \frac{98-k}{97}$.

Let’s jump to the general case.  If the #j candidate is the best among the first k, then the manager will end up hiring one of the two best candidates with probability $\frac{2}{j-1}$.  The manager’s probability of success is thus $\frac{2}{j-1} \cdot \frac{k}{100} \cdot \frac{100-k}{99} \cdot \frac{99-k}{98} \cdots \frac{100-k-(j-2)}{100-(j-1)}$.

The largest possible value for j is $100 - k + 1$.

Putting this all together, then, the probability that the manager hires one of the two best candidates using the strategy of rejecting the first k and then hiring the next candidate who is better than all seen before is

$\displaystyle \frac{k}{100} \cdot \frac{100-k}{99} + \sum_{j=3}^{100-k+1} \frac{k}{100} \cdot \frac{(100-k)(99-k) \cdots (100-k-(j-2))}{100 \cdot 99 \cdot 98 \cdots (100-(j-1))}$.

The value of this expression for different values of k is given below.  As you can see, the optimal strategy of this type is for the manager to reject the first 30 candidates outright and then take the best candidate he next sees.  Doing so gives him a greater than 50% probability of selecting one of the two best candidates.  Any value of k from 22 to 39, though, also gives him a better than 50% probability.  And even taking k to be anything from 12 to 56 gives him a better than 40% change of hiring one of the two best candidates.  This type of strategy is thus rather robust.

k    Probability
1     0.0935476
2     0.147297
3     0.191249
4     0.228736
5     0.261425
6     0.290316
7     0.316075
8     0.33918
9     0.359986
10   0.378773
11   0.395761
12   0.411133
13   0.425041
14   0.437612
15   0.448957
16   0.45917
17   0.468335
18   0.476526
19   0.483808
20   0.490239
21   0.495872
22   0.500755
23   0.504931
24   0.508439
25   0.511316
26   0.513595
27   0.515306
28   0.516479
29   0.51714
30   0.517313
31   0.517021
32   0.516287
33   0.515129
34   0.513567
35   0.511619
36   0.509302
37   0.506631
38   0.503622
39   0.500288
40   0.496643
41   0.492701
42   0.488473
43   0.48397
44   0.479205
45   0.474186
46   0.468926
47   0.463433
48   0.457716
49   0.451784
50   0.445647
51   0.439311
52   0.432786
53   0.426077
54   0.419194
55   0.412142
56   0.404928
57   0.39756
58   0.390042
59   0.382382
60   0.374584
61   0.366656
62   0.358601
63   0.350426
64   0.342136
65   0.333735
66   0.325228
67   0.31662
68   0.307916
69   0.29912
70   0.290236
71   0.281268
72   0.272221
73   0.263097
74   0.253902
75   0.244639
76   0.235311
77   0.225922
78   0.216475
79   0.206973
80   0.197421
81   0.187821
82   0.178175
83   0.168488
84   0.158762
85   0.149
86   0.139204
87   0.129378
88   0.119524
89   0.109645
90   0.0997434
91   0.0898213
92   0.0798815
93   0.0699263
94   0.0599581
95   0.0499792
96   0.0399917
97   0.0299979
98   0.02
99   0.01
100 0.0

Some Thoughts on Teaching Math for Social Justice

Short version: I’m not in favor of it.  For a teacher to use their position of power to push their political and social views on their students is an abuse of that power.

Long version: See the rest of this post.

Recently someone posted some “Teaching Math for Social Justice” presentation slides on a math mailing list I’m on.  This got me to thinking more about this topic and why it has always bothered me.  What follows isn’t an argument so much as it is a collection of thoughts.

The term “social justice” is a bit slippery.  What does it mean, actually?  It sounds like it’s a subset of justice, but that’s not how it’s used in practice.  In practice, “social justice” seems to me to be pretty much identical to “a progressive’s view of justice.”  It’s one of those political buzzwords that sounds like every decently moral person ought to support, but when you dig down to how it’s used and what it means in practice, you find it always promoting one side of the political spectrum’s vision of how the world should be.  (The closest analogy on the right to “social justice” that I can think of is “family values.”)

So, to me, someone advocating “teaching math for social justice” is saying that they are going to use their mathematics classes to push their political and social views on their students.  That has no place in a classroom.  It’s an abuse of power.  It’s certainly not just.

The usual rationalization that I’ve heard (and it’s one that shows up in the slides) is something along the lines of “Teaching is a political act.  It can’t be neutral.  Since it can’t be neutral I’m just going to use it to advocate for my own beliefs.”  (Well, the conclusion is never stated that baldly, but that’s what it always amounts to.)  This argument has always struck me as self-serving and a cop-out.  There are lots of ideals that are almost certainly unattainable – a perfectly just society, a completely neutral press, an end to crime and poverty and all of our other social ills, for example.  Does that mean that we should give up – stop trying for a more just society, descend into purely partisan journalism, quit punishing crime and working to alleviate poverty?  Absolutely not.  Neither should we stop trying to keep our teaching focused on the subject matter, remaining neutral with respect to topics and subjects that do not directly pertain to that subject matter.

Another justification is that using political and social examples in a math class can be effective teaching tools.  I agree with this, but it has to be done carefully.  I also don’t think the teaching-math-for-social-justice advocates would be that interested in, say, the most famous use of Simpson’s paradox (a phenomenon often discussed in introductory statistics): showing that a claim of bias against women in graduate admissions at the University of California-Berkeley was completely unfounded.  When they advocate for the use of political and social examples in math, they mean examples that support their politics.

I think some of the motivation for teaching math for social justice comes from the fact that teaching math is hard.  It can be difficult for some people to get fired up about continuing to find ways to lead students through concepts that are often complex and abstract.  It’s a whole lot easier to get fired up about advocating for a worldview that you care deeply about, slowly substituting advocacy for the subject you’re supposed to be teaching.  This is a temptation that all teachers must resist.  You were hired to teach a particular subject, and your students are expecting you to teach that subject.  The same argument applies to fields other than math, some of which (it seems to me) have been partially or even mostly taken over by the social justice folks.  Please let’s keep math as free as possible from the politicization that dominates so many other academic fields.  Let’s keep social justice out of the teaching of mathematics.

If you’re a teacher (as the title of Stanley Fish’s book goes), save the world on your own time.  Leave your advocacy at the classroom door.

Pascal Matrices and Binomial Inversion

In this post we’ll look at the relationship between a Pascal matrix, its inverse, and binomial inversion.  It turns out that these are the same concepts viewed from two different angles.

The Pascal matrix $P_m$ is the $(m+1) \times (m+1)$ matrix containing Pascal’s triangle through row m.  For example,

$\displaystyle P_3 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 1 & 3 & 3 & 1 \end{bmatrix}.$

The inverse of a Pascal matrix turns out to be a slight variation on that Pascal matrix – one in which some of the entries are negative.  For example,

$\displaystyle P_3^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 1 & -2 & 1 & 0 \\ -1 & 3 & -3 & 1 \end{bmatrix}.$

More precisely, $P_m$ is the $(m+1) \times (m+1)$ whose $(i,j)$ entry (starting with row 0 and column 0) is $\binom{i}{j}$, and $P_m^{-1}$ is the $(m+1) \times (m+1)$ matrix whose $(i,j)$ entry (again starting with row 0 and column 0) is $\binom{i}{j}(-1)^{i-j}$.

Binomial inversion is the following property: Given two sequences $\{a_n\}, \{b_n\}$, we have

$\displaystyle b_n = \sum_{k=0}^n \binom{n}{k} a_k \iff a_n = \sum_{k=0}^n \binom{n}{k} b_k (-1)^{n-k}$.

Binomial inversion is useful for proving sum identities involving the binomial coefficients: If you can prove one such identity, you get a second identity immediately from binomial inversion.

Let’s look at the connection between Pascal matrices and binomial inversion now.  Suppose you take a sequence $\{a_n\}$ and turn its first $m+1$ entries into a column vector ${\bf a}_m$.  For example,

$\displaystyle {\bf a}_3 = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix}$.

Now, multiply ${\bf a}_m$ by $P_m$.  This produces another vector that we can call ${\bf b}_m$.  In other words, $P_m {\bf a}_m = {\bf b}_m$.  For example,

$\displaystyle P_3 {\bf a}_3 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1& 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 1 & 3 & 3 & 1 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix} = {\bf b}_3$.

If we multiply both sides of this equation by $P_m^{-1}$, we get ${\bf a}_m = P^{-1}_m {\bf b}_m$.  For example,

$\displaystyle {\bf a}_3 = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -1& 1 & 0 & 0 \\ 1 & -2 & 1 & 0 \\ -1 & 3 & -3 & 1 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix} = P_3^{-1} {\bf b}_3$.

Let’s take a closer look at how ${\bf b}_m$ is calculated.  Entry n$b_n$, is the inner product of row n of $P_m$ and the ${\bf a}_m$ vector.  In other words,

$\displaystyle b_n = \sum_{k=0}^n \binom{n}{k} a_k$.

In addition, if we look at how entry n is calculated in ${\bf a}_m$ via the ${\bf a}_m = P^{-1}_m {\bf b}_m$ equation, we have that $a_n$ is the inner product of row n of $P_m^{-1}$ and the ${\bf b}_m$ vector.  In other words,

$\displaystyle a_n = \sum_{k=0}^n \binom{n}{k} (-1)^{n-k} b_k$.

Thus the relationship between the $\{a_n\}$ and $\{b_n\}$ sequences expressed via the Pascal matrix and its inverse is that

$\displaystyle b_n = \sum_{k=0}^n \binom{n}{k} a_k \iff a_n = \sum_{k=0}^n \binom{n}{k} (-1)^{n-k} b_k$.

This is exactly binomial inversion.

Thus binomial inversion is just expressing what the inverse of a Pascal matrix is, and knowledge of the inverse of a Pascal matrix gives you the binomial inversion formula.

Posted in binomial coefficients, matrices | 2 Comments

Counting Poker Hands

For this post I’m going to go through a classic exercise in combinatorics and probability; namely, proving that the standard ranking of poker hands is correct.

First, here are the standard poker hands, in ranked order.

1. Straight flush: Five cards of the same suit, in sequential numerical order.
2. Four-of-a-kind: Four cards of the same denomination and one card of a different denomination.
3. Full house: Three cards of one denomination and two cards of the same different denomination.
4. Flush: Five cards of the same suit.
5. Straight: Five cards in sequential numerical order.
6. Three-of-a-kind: Three cards of one denomination and two cards of two other, different, denominations.
7. Two pair: Two cards of one denomination, two cards of a second denomination, and one card of a third denomination.
8. One pair: Two cards of one denomination and three cards of three other denominations.
9. High card: None of the above.

Let’s count the number of ways each of these hands can occur.

First, there are $\binom{52}{5} = 2,598,960$ possible poker hands, as we’re choosing five cards from fifty-two.

1. Straight flush:
1. Choose the suit, in $\binom{4}{1} = 4$ ways.
2. Then choose the denomination of the smallest card in the straight, in $\binom{10}{1} = 10$ ways.
3. Once you choose the denomination of the smallest card in the straight, the rest of the denominations in the straight are determined.
4. Thus there are 4(10) = 40 straight flushes.
5. Probability: 40/2,598,960 = 0.0000154.
2. Four-of-a-kind:
1. Choose the denomination, in $\binom{13}{1} = 13$ ways.
2. Then choose the four cards in that denomination, in $\binom{4}{4} = 1$ way.
3. Finally, choose the off card, in $\binom{48}{1} = 48$ ways.
4. Thus there are 13(1)(48) = 624 four-of-a-kind hands.
5. Probability: 624/2,598,960 = 0.000240.
3. Full house:
1. Choose the denomination for three cards, in $\binom{13}{1} = 13$ ways.
2. Next choose three of four cards in that denomination in $\binom{4}{3} = 4$ ways.
3. Then choose the denomination for two cards in $\binom{12}{1} = 12$ ways.
4. Finally, choose two of four cards in the second denomination in $\binom{4}{2} = 6$ ways.
5. Thus there are 13(4)(12)(6) = 3744 full house hands.
6. Probability: 3744/2,598,960 = 0.00144.
4. Flush:
1. Choose the suit in $\binom{4}{1} = 4$ ways.
2. Then choose five cards from that suit in $\binom{13}{5} = 1287$ ways.
3. This gives 4(1287) = 5148.
4. However, we’ve included the straight flushes in this count, so we must subtract them off to get $5148 - 40 = 5048$ flush hands.
5. Probability: 5048/2,598,960 = 0.00194.
5. Straight:
1. Choose the denomination of the smallest card in the straight, in $\binom{10}{1} = 10$ ways.
2. This fixes the denominations in the straight.
3. Next, for each of the five denominations in the straight, we have four choices for the suit, giving 4(4)(4)(4)(4) = 1024 choices for the suits.
4. This gives 10(1024) = 10,240.
5. However, we’ve included the straight flushes in this count, so we must subtract them off to get $10,240 - 40 = 10,200$ straight hands.
6. Probability: 10,200/2,598,960 = 0.00392.
6. Three-of-a-kind:
1. Choose the denomination for the three-of-a-kind in $\binom{13}{1} = 13$ ways.
2. Next, choose three cards from that denomination in $\binom{4}{3} = 4$ ways.
3. Then choose two different denominations from the remaining 12 in $\binom{12}{2} = 66$ ways.
4. Then select one card from the larger of the two off-denominations in $\binom{4}{1} = 4$ ways.
5. Finally, select one card from the smaller of the two off-denominations in $\binom{4}{1} = 4$ ways.
6. All together, there are 13(4)(66)(4)(4) = 54,912 three-of-a-kind hands.
7. Probability: 54,912/2,598,960 = 0.021.
7. Two pair:
1. Choose the two denominations for the pairs in $\binom{13}{2} = 78$ ways.
2. Choose two cards from the larger of the chosen denominations in $\binom{4}{2} = 6$ ways.
3. Choose two cards from the smaller of the chosen denominations in $\binom{4}{2} = 6$ ways.
4. Choose the fifth card from the 44 cards not in one of the two chosen denominations in $\binom{44}{1} = 44$ ways.
5. Thus there are 78(6)(6)(44) = 123,552 two-pair hands.
6. Probability: 123,552/2,598,960 = 0.0475.
8. One pair:
1. Choose the denomination for the pair in $\binom{13}{1} = 13$ ways.
2. Choose two cards from that denomination in $\binom{4}{2} = 6$ ways.
3. Choose three different denominations for the remaining three cards in $\binom{12}{3} = 220$ ways.
4. Choose one card from each of those three different denominations in $\binom{4}{1} \binom{4}{1} \binom{4}{1} = 4(4)(4) = 64$ ways.
5. Thus there are 13(6)(220)(64) = 1,098,240 one-pair hands.
6. Probability: 1,098,240/2,598,960 = 0.423.
9. High card:
1. The sum of all the previous counts is 1,296,420.
2. Thus there are $2,598,960 - 1,296,420 = 1,302,540$ high card hands.
3. Probability: 1,302,540/2,598,960 = 0.501.

An Expected Value Connection Between Order Statistics from a Discrete and a Continuous Distribution

Years ago, in the course of doing some research on another topic, I ran across the following result relating the expected values of the order statistics from a discrete and a continuous distribution.  I found it rather surprising.

Theorem: Fix n, and let $1 \leq i \leq n$. Let $X_1 < X_2 < \cdots < X_i$ be the order statistics from a random sample of size i from the continuous uniform $[0,n+1]$ distribution. Let $Y_1 < Y_2 < \cdots < Y_i$ be the order statistics from a random sample of size i, chosen without replacement, from the discrete uniform $\{1, \ldots, n\}$ distribution. Then, for each j, $1 \leq j \leq i$, $E[X_j] = E[Y_j]$.

Why would the expected values of the order statistics from a discrete distribution and a continuous distribution with different ranges match up exactly?  Particularly when the values from the discrete distribution are chosen without replacement?  It’s not too hard to prove, using basic order statistic properties, that $E[X_j] = j(n+1)/(i+1)$ and that $E[Y_j] = j(n+1)/(i+1)$ as well.  I didn’t find that very satisfying, though.  It took me a little while, but eventually I found a proof that uses a transformation from one sample to the other.  Here it is.

First, though, I need a lemma (which is fairly intuitive and which I won’t prove):

Lemma: Let $X_1 \leq X_2 \leq \cdots \leq X_n$ be the order statistics from a random sample of size n from a distribution F (continuous or discrete). Let $\sigma$ be a random permutation of the values $\{1, \ldots, n\}$, where $\sigma$ is independent of the values of $X_1, X_2, \ldots, X_n$. Then the values $X_{\sigma(1)}, X_{\sigma(2)}, \ldots, X_{\sigma(i)}$ are a random sample of size i from distribution F.

Now the proof.

Proof: Select a random sample of i values from the continuous uniform $[0,n+1]$ distribution in the following manner: Randomly select n values from $[0,n+1]$ and order them $W_1 < W_2 < \cdots < W_n$. Then randomly select i of these n values independently of the values of the $W_j$‘s. By the lemma, this produces a random sample of i values. In addition, the previous step is equivalent to randomly selecting i of the n indices; i.e., selecting i values without replacement from the discrete uniform $\{1, \ldots, n\}$ distribution. Thus we have, for each $k \in \{1, \ldots, n\}$, $P(X_j = W_k) = P(Y_j = k)$. In addition, $E[W_k] = k(n+1)/(n+1) = k$, since $W_k$ is the $k^{th}$ smallest value of a random sample of size n from the continuous uniform $[0,n+1]$ distribution. Therefore, by the law of total expectation,

$\displaystyle E[X_j] = \sum_{k=1}^n E[X_j | X_j = W_k] \; P(X_j = W_k) = \sum_{k=1}^n E[W_k] \; P(Y_j = k) \\ = \sum_{k=1}^n k \; P(Y_j = k) = E[Y_j].$