An Expected Value Connection Between Order Statistics from a Discrete and a Continuous Distribution

Years ago, in the course of doing some research on another topic, I ran across the following result relating the expected values of the order statistics from a discrete and a continuous distribution.  I found it rather surprising.

Theorem: Fix n, and let 1 \leq i \leq n. Let X_1 < X_2 < \cdots < X_i be the order statistics from a random sample of size i from the continuous uniform [0,n+1] distribution. Let Y_1 < Y_2 < \cdots < Y_i be the order statistics from a random sample of size i, chosen without replacement, from the discrete uniform \{1, \ldots, n\} distribution. Then, for each j, 1 \leq j \leq i, E[X_j] = E[Y_j].

Why would the expected values of the order statistics from a discrete distribution and a continuous distribution with different ranges match up exactly?  Particularly when the values from the discrete distribution are chosen without replacement?  It’s not too hard to prove, using basic order statistic properties, that E[X_j] = j(n+1)/(i+1) and that E[Y_j] = j(n+1)/(i+1) as well.  I didn’t find that very satisfying, though.  It took me a little while, but eventually I found a proof that uses a transformation from one sample to the other.  Here it is.
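The closed-form claim is easy to check numerically. The sketch below is a quick Monte Carlo sanity check, not part of the proof; the particular values of n, i, and j are illustrative choices.

```python
import random

# Monte Carlo check that E[X_j] = E[Y_j] = j(n+1)/(i+1).
# The values of n, i, j are illustrative, not from the post.
random.seed(0)
n, i, j = 9, 4, 2
trials = 200_000

sum_x = 0.0
sum_y = 0.0
for _ in range(trials):
    # Continuous case: sample of size i from uniform [0, n+1], j-th order statistic.
    xs = sorted(random.uniform(0, n + 1) for _ in range(i))
    sum_x += xs[j - 1]
    # Discrete case: sample of size i without replacement from {1, ..., n}.
    ys = sorted(random.sample(range(1, n + 1), i))
    sum_y += ys[j - 1]

print(sum_x / trials, sum_y / trials, j * (n + 1) / (i + 1))
```

With these parameters both empirical means should land close to j(n+1)/(i+1) = 4.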

First, though, I need a lemma (which is fairly intuitive and which I won’t prove):

Lemma: Let X_1 \leq X_2 \leq \cdots \leq X_n be the order statistics from a random sample of size n from a distribution F (continuous or discrete). Let \sigma be a random permutation of the values \{1, \ldots, n\}, where \sigma is independent of the values of X_1, X_2, \ldots, X_n. Then the values X_{\sigma(1)}, X_{\sigma(2)}, \ldots, X_{\sigma(i)} are a random sample of size i from distribution F.
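Though I won't prove the lemma, it is easy to see it in action. In the sketch below, picking i of the n order statistics via an independent random permutation is compared against drawing a fresh sample of size i directly; the distribution F, here exponential with rate 1, and the sizes n and i are illustrative assumptions.

```python
import random

# Sanity check of the lemma: take the order statistics of a sample of size n,
# pick i of them via an independent random permutation, and compare against a
# fresh sample of size i drawn directly from F. Here F = Exp(1); n, i are
# illustrative. We compare the empirical means of the two procedures.
random.seed(1)
n, i = 6, 3
trials = 100_000

mean_subset = 0.0
mean_fresh = 0.0
for _ in range(trials):
    xs = sorted(random.expovariate(1.0) for _ in range(n))  # order statistics
    sigma = random.sample(range(n), i)  # i positions of a random permutation
    mean_subset += sum(xs[k] for k in sigma) / i
    mean_fresh += sum(random.expovariate(1.0) for _ in range(i)) / i

print(mean_subset / trials, mean_fresh / trials)
```

Both averages should be close to 1, the mean of the Exp(1) distribution.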

Now the proof.

Proof: Select a random sample of i values from the continuous uniform [0,n+1] distribution in the following manner: First, randomly select n values from [0,n+1] and order them W_1 < W_2 < \cdots < W_n. Then randomly select i of these n values, independently of the values of the W_j's. By the lemma, this produces a random sample of size i from the continuous uniform [0,n+1] distribution. Moreover, the second step is equivalent to randomly selecting i of the n indices; i.e., to selecting i values without replacement from the discrete uniform \{1, \ldots, n\} distribution. Thus we have, for each k \in \{1, \ldots, n\}, P(X_j = W_k) = P(Y_j = k). In addition, since W_k is the k^{th} smallest value in a random sample of size n from the continuous uniform [0,n+1] distribution, E[W_k] = k(n+1)/(n+1) = k. Therefore, by the law of total expectation,

\displaystyle E[X_j] = \sum_{k=1}^n E[X_j \mid X_j = W_k] \; P(X_j = W_k) = \sum_{k=1}^n E[W_k] \; P(Y_j = k) = \sum_{k=1}^n k \; P(Y_j = k) = E[Y_j].
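The coupling at the heart of the proof can also be watched directly: the j-th order statistic of the subsample equals W_k exactly when the j-th smallest chosen index is k, so the index of X_j among the W's should have the same distribution as Y_j. The sketch below checks this empirically; the parameters are illustrative.

```python
import random
from collections import Counter

# The coupling from the proof: sample n uniforms on [0, n+1], sort them,
# subsample i of them, and record which W_k the j-th smallest chosen value is.
# That index should be distributed like Y_j, the j-th order statistic of a
# without-replacement sample from {1, ..., n}. Parameters are illustrative.
random.seed(2)
n, i, j = 9, 4, 2
trials = 100_000

counts_x = Counter()  # index k such that X_j = W_k, via the coupling
counts_y = Counter()  # value of Y_j from direct without-replacement sampling
for _ in range(trials):
    w = sorted(random.uniform(0, n + 1) for _ in range(n))
    chosen = sorted(random.sample(w, i))        # subsample of the W's
    counts_x[w.index(chosen[j - 1]) + 1] += 1   # which W_k is X_j? (1-based)
    ys = sorted(random.sample(range(1, n + 1), i))
    counts_y[ys[j - 1]] += 1

for k in range(1, n + 1):
    print(k, counts_x[k] / trials, counts_y[k] / trials)
```

The two empirical distributions over k should agree up to sampling noise, which is exactly the identity P(X_j = W_k) = P(Y_j = k).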
