A Coin-Flipping Problem

One problem that I’ve assigned when discussing Markov chains is to calculate the expected number of flips required for a particular pattern to appear.  (Here I mean a pattern such as heads, heads, heads, or HHH.)  In this post I’m going to discuss another approach – one that doesn’t use Markov chains – to solve this problem.

Suppose we want to find the expected number of flips required for the pattern HHH to appear.  Call this X.  We can calculate X by conditioning on the various patterns we might achieve that do or don’t give us HHH.  For example, for our first few flips we could observe T, HT, HHT, or HHH.  These cover the entire outcome space and have probability 1/2, 1/4, 1/8, and 1/8, respectively.

  • If we observe T, then we effectively have to start the entire process over, and we’ve used one flip to get to that point.  So, if we observe T, then the average number of flips required is 1+X.
  • If we observe HT, then we also have to start the entire process over, and we’ve used two flips to get there.  For this outcome, the average number of flips required is 2+X.
  • If we observe HHT, then we have to start the entire process over, and we’ve used three flips.  The average number of flips required for this scenario is 3+X.
  • Finally, if we observe HHH, then we have achieved our goal.  The number of flips required is 3.

All together, then, the average number of flips required satisfies the equation

\displaystyle X = 1/2(1+X) + 1/4(2+X) + 1/8(3+X) + 1/8(3).

Solving for X, we obtain X = 14.  So it takes 14 flips, on average, to obtain three consecutive heads.


What if we have a more complicated pattern, though?  Let’s look at HHT as an example.  Let X be the expected number of flips required for HHT to appear for the first time.

Once again, we can condition on the various patterns we might achieve that do or don’t give us HHT.  These are T, HT, HHH, and HHT.  As in the previous example, these cover the entire outcome space and have probability 1/2, 1/4, 1/8, and 1/8, respectively.

  • If we observe T, then we have to start over.  The average number of flips required is 1+X.
  • If we observe HT, then we have to start over.  The average number of flips required is 2+X.
  • If we observe HHH, then we don’t have to start over.  It’s entirely possible that the HH at the end of HHH could be followed by a T, and then we would have achieved our pattern!  Mathematically, this means that things get a bit more complicated.
    • Let E_{HH} denote the expected number of flips required for HHT to appear given that we currently have HH.
    • Thus if we start with HHH, then the average number of flips required to obtain HHT is 3 + E_{HH}.
    • Now we need to determine E_{HH}.
      • If we currently have HH, then the next flip is either a T, which completes our pattern, or an H, we means we must start over in our quest to complete HHT given that we currently have HH.
      • Thus E_{HH} = 1/2(1) + 1/2(1 + E_{HH}).
      • Solving this yields E_{HH} = 2.
    • Thus if we observe HHH, then the average number of flips required to obtain HHT is 3 + 2 = 5.
  • Finally, if we observe HHT, then we have achieved our goal in 3 flips.

Therefore, X = 1/2(1 + X) + 1/4(2 + X) + 1/8(5) + 1/8(3).  Solving this equation for X gives us X = 8 flips.

Notice that it takes fewer flips, on average, to achieve HHT than it does HHH.  This is because we don’t always have to start over every time the sequence fails to match our goal sequence.

The interested reader is invited to find the expected number of flips for the other sequences.

Posted in coin flipping, Markov chains, probability | Leave a comment

A Request for a Proof of a Binomial Identity

A few weeks ago I received an email from Professor Steve Drekic at the University of Waterloo. He asked if I knew of a way to prove the following binomial identity:

\displaystyle \sum_{k=1}^n \frac{1}{2k-1} \binom{2k}{k} \binom{2n-2k}{n-k} = \binom{2n}{n}.

(He told me later that he wanted it to help prove that the random walk on the integers with transition probability p = 1/2 is null recurrent.)

It’s an interesting binomial identity, in that it has a nontrivial but not overly complicated sum on the left side and a simple expression on the right. Yet I had not seen it before. I was able to find a proof that uses generating functions, and I thought I would reproduce it here.

Lemma 1.

\displaystyle \binom{1/2}{k} = \frac{-1}{2k-1} \binom{-1/2}{k}.

Proof.
By Identity 17 in my book The Art of Proving Binomial Identities [1],

\displaystyle \binom{1/2}{k} = \frac{(1/2)^{\underline{k}}}{k!} = \frac{(1/2) (-1/2)^{\underline{k}}}{(-1/2-k+1) k!} = \frac{(-1/2)^{\underline{k}}}{-(2k-1)) k!} = \frac{-1}{2k-1} \binom{-1/2}{k},

where in the second step we use properties of the falling factorial x^{\underline{k}}.

Next, we need the following generating function.

Lemma 2.

Proof.
\displaystyle - \sqrt{1-4x} = \sum_{k=0}^{\infty} \frac{1}{2k-1} \binom{2k}{k} x^k.

By Newton’s binomial series (Identity 18 in my book [1]),
\displaystyle  -\sqrt{1-4x} = -(1-4x)^{1/2} = -\sum_{k=0}^{\infty} \binom{1/2}{k} (-4x)^k \\ = -\sum_{k=0}^{\infty} \frac{-1}{2k-1} \binom{-1/2}{k} (-4x)^k, \text{ by Lemma 1} \\ = \sum_{k=0}^{\infty} \frac{1}{2k-1} \left(\frac{-1}{4}\right)^k \binom{2k}{k} (-4x)^k, \text{ by Identity 30 in [1]} \\ = \sum_{k=0}^{\infty} \frac{1}{2k-1} \binom{2k}{k} x^k.

Finally, we’re ready to prove the identity.

Identity 1.

\displaystyle \sum_{k=1}^n \frac{1}{2k-1} \binom{2k}{k} \binom{2n-2k}{n-k} = \binom{2n}{n}.

Proof.
Let a_k = \binom{2k}{k}/(2k-1), and let b_k = \binom{2k}{k}. Since the generating function for (a_k) is -\sqrt{1-4x} (Lemma 2), and the generating function for (b_k) is 1/\sqrt{1-4x} (Identity 150 in [1]), the generating function for their convolution,

\displaystyle c_n = \sum_{k=0}^n \frac{1}{2k-1} \binom{2k}{k} \binom{2n-2k}{n-k},

is, by the convolution property for generating functions (see Theorem 13 in [1]),

\displaystyle - \frac{\sqrt{1-4x}}{\sqrt{1-4x}} = -1.

Since -1 just generates the sequence -1, 0, 0, 0, \ldots, this means that

\displaystyle \sum_{k=0}^n \frac{1}{2k-1} \binom{2k}{k} \binom{2n-2k}{n-k} = \begin{cases} -1, & \: n = 0; \\ 0, & \: n \geq 1.\end{cases}

Therefore, when n \geq 1, we have
\displaystyle  \sum_{k=0}^n \frac{1}{2k-1} \binom{2k}{k} \binom{2n-2k}{n-k} = 0 \\ \implies \sum_{k=1}^n \frac{1}{2k-1} \binom{2k}{k} \binom{2n-2k}{n-k} - \binom{2n}{n} = 0 \\ \implies \sum_{k=1}^n \frac{1}{2k-1} \binom{2k}{k} \binom{2n-2k}{n-k} = \binom{2n}{n}.

References

1. Michael Z. Spivey, The Art of Proving Binomial Identities, CRC Press, 2019.

Posted in binomial coefficients, generating functions | 2 Comments

Strong Induction Wasn’t Needed After All

Lately when I’ve taught the second principle of mathematical induction – also called “strong induction” – I’ve used the following example to illustrate why we need it.

Prove that you can make any amount of postage of 12 cents or greater using only 4-cent and 5-cent stamps.

At this point in a class we would have just done several examples using regular induction, and so we would be naturally inclined to try to prove this postage stamp problem using that technique.  The base case is easy: Just use three 4-cent stamps to make 12 cents of postage.  The induction hypothesis is as usual, too: Suppose, for some k \geq 12, that we can make k cents’ worth of postage using only 4-cent and 5-cent stamps.  But regular induction fails at the induction step: To prove that you can make k+1 cents using only 4-cent and 5-cent stamps, knowing that you can make k cents isn’t helpful, since you can’t add either a 4-cent or a 5-cent stamp to k cents’ worth of postage to generate k+1 cents’ worth of postage.  Instead, you need to know that you can make, say, k-3 cents’ worth of postage.  Then you can add a 4-cent stamp to that amount to produce your k+1 cents.  In other words, you need to assume in the induction hypothesis not that you can make k cents but that you can make 12, 13, \ldots, k cents.  And that’s the essence of strong induction.

However, before we get to this realization my students will often suggest that we can produce k+1 cents from k cents by simply removing one of the 4-cent stamps used to produce k cents and replacing it with a 5-cent stamp.  This is a good idea.  However, it assumes that we actually used at least one 4-cent stamp to produce k cents, and that’s a faulty assumption.  Sometimes we don’t need a 4-cent stamp, such as if we use three 5-cent stamps to produce 15 cents.  (In fact, for 15 cents we must use only 5-cent stamps.)  If we don’t use any 4-cent stamps then we can’t generate k+1 cents from k cents by replacing a 4-cent stamp with a 5-cent one.

Normally the students are a little chagrined that this idea fails.  And that happened again this term.  After discussing the idea and why it doesn’t quite work I was about ready to move on and introduce strong induction when one of my students interjected, “I’ve got it!  I’ve got it!”  He pointed out that if we’re going to use only 5-cent stamps to generate k cents, when k \geq 12, then we’ll need at least three of them.  That’s true, I said.  Then, he added, if we have at least three 5-cent stamps to make k cents, we can replace them with four 4-cent stamps to yield k+1 cents’ worth of postage.  That covers the remaining case.  I was impressed enough that I applauded the class for their collective work in solving the problem.

To recap: You don’t actually need strong induction to solve the stamp problem described above.  Use regular induction, and break the induction step into two cases: (1) If you use at least one 4-cent stamp to make k cents, replace it with a 5-cent stamp to make k+1 cents.  (2) If you only use 5-cent stamps to make k cents, you’ll need at least three of them.  Replace those three with four 4-cent stamps to generate k+1 cents’ worth of postage.

Well done, Spring 2020 Math 210 students!

Posted in number theory, proof techniques | Leave a comment

Arguments for 0.9999… Being Equal to 1

Recently I tried to explain to my 11-year-old son why 0.9999… equals 1.  The standard arguments for 0.9999... = 1 (at least the ones I’ve seen) assume more math background than he has.  So I tried another couple of arguments, and they seemed to convince him.

The first argument.  The usual claim you get from people who aren’t yet convinced that 0.9999... = 1 is that 0.9999… is the real number just before 1 on the number line.  Let’s suppose this is true.  What is the average of 0.9999… and 1?  Stop and think about that for a bit.

If the average exists, it must be larger than 0.9999…, and it must be smaller than 1.  But if 0.9999… is just before 1 on the number line, there can’t be such a number.  So either (1) the averaging operation doesn’t apply to 0.9999… and 1, (2) there are numbers between 0.9999… and 1 on the number line, or (3) 0.9999… equals 1.  But (1) immediately leads to the question “Why not?”, which has no obvious answer, and (2) leads to the question of what their decimal representations would be, which also has no obvious answer.  Explanation (3), that 0.9999… equals 1, starts to look more plausible.

The second argument.  Again, let’s assume that 0.9999… is the real number just before 1 on the number line.  If this is true, then what is the difference of 1 and 0.9999…?  Again, stop and think about that for a bit.

If it’s not zero, then you could just add half of that difference to 0.9999… to get a new number between 0.9999… and 1, which not only contradicts our assumption but also forces us to come up with the decimal representation of such a number.  If it is 0, then you have that 0.9999... = 1.  And if you try to argue that you can’t subtract 0.9999… from 1, then you need to explain why that operation is not allowed for those two real numbers.  (This second argument is a lot like the first one, really.)  The most reasonable of the three options is that the difference is 0, which means that 0.9999… is actually equal to 1.

Final comments and the standard algebra argument.  Both arguments are reductio ad absurdum arguments; that is, they assume that 0.9999… is not equal to 1 and then reason to a contradiction.  The other arguments that I’ve seen are all direct arguments; i.e., they reason from basic mathematical principles to the conclusion that 0.9999… equals 1.

For example, here’s the standard argument via algebra.  We know that 0.9999… must be equal to some number, so let’s call that number x.  Multiplying by 10 yields 10x = 9.9999....  Subtracting the first equation from the second leaves us 9x = 9.0000..., which implies that x = 1.  Thus 0.9999... = 1.

The algebraic argument is a great one, provided you know algebra.  But for my preteen with a pre-algebra background, these two reductio ad absurdum arguments seemed to be enough to convince him.

Posted in arithmetic, number theory | Leave a comment

Diversity Statements in Hiring

Recently Abigail Thompson, chair of the mathematics department at the University of California, Davis, and a vice president of the American Mathematical Society, published this article in Notices of the American Mathematical Society.  The article includes the following statement:

Faculty at universities across the country are facing an echo of the loyalty oath [of the 1950s], a mandatory “Diversity Statement” for job applicants.  The professed purpose is to identify candidates who have the skills and experience to advance institutional diversity and equity goals.  In reality it’s a political test, and it’s a political test with teeth.

She goes on to explain why.

Why is it a political test? Politics are a reflection of how you believe society should be organized. Classical liberals aspire to treat every person as a unique individual, not as a representative of their gender or their ethnic group. The sample rubric dictates that in order to get a high diversity score, a candidate must have actively engaged in promoting different identity groups as part of their professional life. The candidate should demonstrate “clear knowledge of, experience with, and interest in dimensions of diversity that result from different identities” and describe “multiple activities in depth.” Requiring candidates to believe that people should be treated differently according to their identity is indeed a political test.

I agree.  The use of diversity statements in hiring is a political test and thus is inherently discriminatory.  We should abandon the practice.

Posted in campus issues, diversity, politics | 2 Comments

Finding the Area of an Irregular Polygon

Finding the area of an irregular polygon via geometry can be a bit of a chore, as the process depends heavily on the shape of the polygon.  It turns out, however, that there’s a formula that can give you the area of any polygon as long as you know the polygon’s vertices.  That formula is based on Green’s Theorem.

Green’s Theorem is a powerful tool in vector analysis that shows how to convert from a line integral around a closed curve to a double integral over the region enclosed by the curve and vice versa.  More specifically, the circulation-curl form of Green’s Theorem states that, if D is a closed region with boundary curve C and C is oriented counterclockwise, then

\displaystyle \oint_C (M dx + N dy) = \iint_D \left( \frac{\partial N}{\partial x} - \frac{\partial M}{\partial y} \right) dA.

The special case of Green’s Theorem that can generate our formula for the area of a polygon has N = x and M = 0.  This yields

\displaystyle \oint_C x dy = \iint_D dA.

Since the double integral here gives the area of region D, this equation says that we can find the area of D by evaluating the line integral of x dy counterclockwise around the curve C.  The boundary curve for a polygon consists of a finite set of line segments, though, and so to obtain our formula we just need to find out what \oint_L x dy is for a line segment L that runs from a generic point (x_1, y_1) to another generic point (x_2, y_2).  Let’s do that now.

We need a parameterization of L.  That requires a point on the line segment L and a vector in the direction of L.  The starting point (x_1, y_1) and the vector (x_2 - x_1){\bf i} + (y_2 - y_1){\bf j} from the starting point to the ending point work well, giving us the parameterization

\displaystyle {\bf r}(t) = (x_1 + t(x_2-x_1)){\bf i} + (y_1 + t(y_2-y_1)){\bf j}, \: 0 \leq t \leq 1.

With x = x_1 + t(x_2-x_1) and y = y_1 + t(y_2-y_1), we have dy = (y_2 - y_1)dt.  This gives us the integral

\displaystyle \int_0^1 (x_1 + t(x_2-x_1))(y_2-y_1)dt \\ = (y_2-y_1)\int_0^1 (x_1 + t(x_2-x_1))dt \\ = (y_2-y_1)\left[tx_1  + \frac{t^2}{2}(x_2-x_1)\right]_0^1 \\ = (y_2-y_1)\frac{x_1+x_2}{2}.

In other words, the line integral of x dy from (x_1, y_1) to  (x_2, y_2) is \displaystyle (y_2-y_1)\frac{x_1+x_2}{2}, the difference in the y coordinates times the average of the x coordinates.

Let’s put all this together to get our formula.  Suppose we have a polygon D with n vertices. Start with any vertex.  Label it (x_1, y_1).  Then move counterclockwise around the polygon D, labeling successive vertices (x_2, y_2), (x_3, y_3), and so forth, until you label the last vertex (x_n, y_n).  Finally, give the starting vertex a second label of (x_{n+1}, y_{n+1}).  Then the area of D is given by

\displaystyle \text{Area of } D = \sum_{i=1}^n (y_{i+1} - y_i) \frac{x_i + x_{i+1}}{2}.

In words, this means that to find the area of polygon D you can just take the difference of the y coordinates times the average of the x coordinates of the endpoints of each line segment making up the polygon and then add up.

This formula isn’t the only such formula for finding the area of a polygon.  In fact, a more common formula that can also be proved with Green’s Theorem is

\displaystyle \text{Area of } D=  \frac{1}{2} \sum_{i=1}^n (x_i y_{i+1} - x_{i+1}y_i).

This second formula looks a little nicer to the eye, but I prefer the first formula for calculations by hand.

Posted in analytic geometry, calculus, Green's Theorem | 1 Comment

An observation on the unit circle

We did a quick review of the unit circle in my multivariate calculus class last week, and I pointed out a fact about the sines and cosines of the common angles in the first quadrant that some of the students appeared not to have seen before.  I thought I would record it here.

The five angles most commonly encountered in the first quadrant, together with their coordinates on the unit circle, are as follows (angle measures are in radians):

  • 0: (1,0)
  • \pi/6: (\sqrt{3}/2, 1/2)
  • \pi/4: (\sqrt{2}/2, \sqrt{2}/2)
  • \pi/3: (1/2, \sqrt{3}/2)
  • \pi/2: (0, 1)

Part of the reason these coordinates are important is that they tell you the sine and cosine of the corresponding angle.  The cosine value is the first coordinate (technical term: abscissa), while the sine value is the second coordinate (technical term: ordinate).  For example, \cos (\pi/3) = 1/2 and \sin(\pi/3) = \sqrt{3}/2.

Here’s another way of writing the same information, one that illustrates a pattern with the coordinate values.

  • 0: (\sqrt{4}/2, \sqrt{0}/2)
  • \pi/6: (\sqrt{3}/2, \sqrt{1}/2)
  • \pi/4: (\sqrt{2}/2, \sqrt{2}/2)
  • \pi/3: (\sqrt{1}/2, \sqrt{3}/2)
  • \pi/2: (\sqrt{0}/2, \sqrt{4}/2)

Every coordinate value is of the form \sqrt{i}/2, where i is a value between 0 and 4.  As the angle measure increases from 0 radians to \pi/2 radians, the cosine value decreases from \sqrt{4}/2 down to \sqrt{0}/2, while the sine value increases from \sqrt{0}/2 up to \sqrt{4}/2.  Perhaps this pattern will help some folks to memorize the coordinate values more easily.

Posted in trigonometry | Leave a comment

Six Proofs of a Binomial Identity

I’m a big fan of proving an identity in multiple ways, as I think each perspective gives additional insight into why the identity is true.  In this post we’ll work through six different proofs of the binomial identity \displaystyle \sum_{k=0}^n \binom{n}{k} k = n 2^{n-1}.

1. The absorption identity proof.

The absorption identity states that, for real n, k \binom{n}{k} = n \binom{n-1}{k-1}.  Thus we have

\displaystyle \sum_{k=0}^n \binom{n}{k} k = \sum_{k=0}^n \binom{n-1}{k-1} n = n \sum_{k=-1}^{n-1} \binom{n-1}{k} = n \sum_{k=0}^{n-1} \binom{n-1}{k} = n 2^{n-1}, recalling that \binom{n-1}{-1} = 0.

2. The combinatorial proof.

How many chaired committees of any size can be formed from a group of n people?

One way to count them is to choose the committee first and then choose the chair.  First, we condition on the size of the committee.  If there are k people on the committee, there are \binom{n}{k} ways to choose the people to be on the committee.  Then there are k ways to choose the chair.  Summing up over all possible values of k, we find that the answer is \sum_{k=0}^n \binom{n}{k} k.

Another way is to choose the chair first and then choose the committee.  There are n choices for the chair.  Then, for each of the remaining n-1 people, we have two options: Each person could be on the committee or not.  Thus there are 2^{n-1} ways to choose the committee once the chair is chosen.  This gives an answer of n 2^{n-1} when choosing the chair first.

Since the two answers must be equal, we have \displaystyle \sum_{k=0}^n \binom{n}{k} k = n 2^{n-1}.

3. The calculus proof.

Differentiate the binomial theorem, \displaystyle \sum_{k=0}^n \binom{n}{k} x^k = (x+1)^n, with respect to x to obtain \displaystyle \sum_{k=0}^n \binom{n}{k} k x^{k-1} = n(x+1)^{n-1}.  Letting x = 1, we have \displaystyle \sum_{k=0}^n \binom{n}{k} k = n 2^{n-1}.

4. The probabilitistic proof.

Imagine an experiment where you flip a fair coin n times.  What is the expected number of heads for this experiment?

One way to determine this starts by conditioning on the number k of heads.  If there are k heads, there are \binom{n}{k} ways of choosing which k flips will be the heads.  The probability that these flips are all heads is (1/2)^k, and the probability that the remaining flips are all tails is (1/2)^{n-k}.  Multiply these together and apply the definition of expected value to get an answer of \displaystyle \sum_{k=0}^n \binom{n}{k} k \left(\frac{1}{2} \right)^k \left(\frac{1}{2} \right)^{n-k} = \sum_{k=0}^n \binom{n}{k} k \left( \frac{1}{2} \right)^n = \frac{1}{2^n} \sum_{k=0}^n \binom{n}{k} k.

Another way is to use indicator variables.  Let X_k be 1 if flip k is heads and 0 if flip k is tails.  Then the number of heads in the sequence of n flips is \sum_{k=0}^n X_k.  Also, the expected value of X_k is, by definition, E(X_k) = 1 (1/2) + 0 (1/2) = 1/2.  Thus the expected number of heads is E \left( \sum_{k=0}^n X_k \right) = \sum_{k=0}^n E(X_k) = \sum_{k=0}^n (1/2) =n/2.  (This is a formal way of arguing for the answer of n/2 that our intuition says should be the expected number of heads.)

Equating our two answers, we have \displaystyle \frac{1}{2^n} \sum_{k=0}^n \binom{n}{k} k = \frac{n}{2}, which implies \displaystyle \sum_{k=0}^n \binom{n}{k} k = n 2^{n-1}.

5. The exponential generating functions proof

For this proof we’re going to need a definition and a few properties of exponential generating functions.

First, the binomial convolution of the sequences (a_n) and (b_n) is given by \displaystyle \sum_{k=0}^n \binom{n}{k} a_k b_{n-k}.

Second, we have the following.  (See, for example, pages 126-128 of my book The Art of Proving Binomial Identities. [1])

  1. If \displaystyle f_a(x) = \sum_{n=0}^{\infty} a_n \frac{x^n}{n!} and \displaystyle f_b(x) = \sum_{n=0}^{\infty} b_n \frac{x^n}{n!} then \displaystyle f_a(x) f_b(x) = \sum_{n=0}^{\infty} \left(\sum_{k=0}^n \binom{n}{k} a_k b_{n-k} \right) \frac{x^n}{n!}.  (This is the binomial convolution property for exponential generating functions.)
  2. We have \displaystyle e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}.  (This is the Maclaurin series for e^x.)
  3. If \displaystyle f_a(x) = \sum_{n=0}^{\infty} a_n \frac{x^n}{n!} then x f_a(x) is the exponential generating function for the sequence given by (n a_{n-1}) = (0, a_0, 2a_1, 3a_2, \ldots).

By the definition of binomial convolution, \sum_{k=0}^n \binom{n}{k} k is the binomial convolution of the sequences with a_n = n and b_n = 1.  What are the exponential generating functions of these two sequences?

By Property 2, the exponential generating function for the sequence (b_n), with b_n = 1, is e^x.

If we take the sequence 1, 1, 1, \ldots, append a 0 to the front, and multiply it by n, we have the sequence with a_n = n.  By Property 3, then, the exponential generating function for the sequence (a_n) is x e^x.

Thus, by Property 1, the exponential generating function for \sum_{k=0}^n \binom{n}{k} k is x e^x (e^x) = x e^{2x}.  However, \displaystyle e^{2x} = \sum_{k=0}^{\infty} \frac{(2x)^k}{k!} = \sum_{k=0}^{\infty} \frac{2^k x^k}{k!}, which means that e^{2x} is the exponential generating function for the sequence (2^n).  Thus, by Property 3, x e^{2x} is also the exponential generating function for the sequence (n 2^{n-1}).   Since \left(\sum_{k=0}^n \binom{n}{k} k \right) and \left(n 2^{n-1}\right) have the same generating function, they must be equal, and so \displaystyle \sum_{k=0}^n \binom{n}{k} k = n 2^{n-1}.

6. The finite differences bootstrapping proof

This proof requires a result that I don’t think is that well-known.  It’s Theorem 4 in my paper “Combinatorial Sums and Finite Differences.” [2]

Theorem: Let (a_k) and (b_k) be sequences such that (a_k) is the finite difference sequence for (b_k); i.e., \Delta b_k = b_{k+1} - b_k = a_k for each k \geq 0.  If g_n = \sum_{k=0}^n a_k and h_n = \sum_{k=0}^n b_k then, for n \geq 0, \displaystyle h_n = 2^n \left( b_0 + \sum_{k=1}^n \frac{g_{k-1}}{2^k} \right).

First, if b_k = 1, then a_k = 0.  Thus g_n = 0, and, by the theorem, h_n = \sum_{k=0}^n \binom{n}{k} = 2^n.

Next, if b_k = k, then a_k = k+1 - k = 1.  By the theorem and the previous result, then, \displaystyle \sum_{k=0}^n \binom{n}{k} k = 2^n \left(0 + \sum_{k=1}^n \frac{2^{k-1}}{2^k} \right) = 2^n \sum_{k=1}^n \frac{1}{2} = 2^n \frac{n}{2} = n 2^{n-1}.

References 

  1. Spivey, Michael Z.  The Art of Proving Binomial Identities.  CRC Press, 2019.
  2. Spivey, Michael Z.  “Combinatorial Sums and Finite Differences.”  Discrete Mathematics, 307 (24): 3130-3146, 2007.
Posted in binomial coefficients, calculus, combinatorics, generating functions, probability | 2 Comments

Zeno’s Paradoxes

The ancient Greek philosopher Zeno of Elea is known for proposing several paradoxes related to time, space, motion, and infinity.  In this post we’ll focus on one of Zeno’s paradoxes and discuss how some ideas associated with calculus might or might not be able to resolve it.

The most famous of Zeno’s paradoxes is perhaps the Dichotomy paradox.  In this one, Zeno points out that in order for an object in motion to reach its goal it must first cover half the distance to its goal.  But to do so it must have previously covered half of the first half, or a quarter of the total distance.  Continuing with this line of thinking, we see that the object must have already covered an eighth of the total distance before reaching the quarter point, a sixteenth of the total distance before reaching the one-eighth point, and so forth, in order to complete its goal.  We end up with an infinite sequence of time intervals that the object must cover in order to travel the finite distance to its goal.  This seems impossible, for multiple reasons.  For one, it seems likely that one cannot actually complete infinitely many tasks.  For two, since an object’s initial movement interval can always be halved, how would an object even start moving?  Because of objections such as these, the conclusion of the paradox is that motion cannot occur.

Of course we know that things do, in fact, move, so the real intellectual challenge here is to determine exactly where Zeno’s line of reasoning fails.

One way out is to posit that space and time are not actually infinitely divisible.  Mathematicians assume that they are, but it might be the case that, say, real numbers are an intellectual abstraction that only approximates the true discrete nature of reality.  This would turn some basic calculus concepts on their heads; for example, it would make a derivative an approximation of a difference quotient, rather than the other way around.

Alternatively, some folks think that the precise nineteenth-century epsilon-delta notion of a limit has resolved Zeno’s paradoxes.  Certainly the Dichotomy paradox as presented here is considering a problem very close to the one that led to the precise definition of the derivative.  In the Dichotomy paradox there is an infinite sequence of time intervals involved, but they are getting smaller and smaller.  Moreover, each time interval has an associated distance interval, and these distance intervals are also getting smaller and smaller.  By dividing the length of each distance interval by the length of its corresponding time interval we get the sequence of average speeds associated with these intervals.  Mathematically, this is a sequence of difference quotients.  So the problem, in the Dichotomy paradox, of what happens as the distance and time intervals get smaller and smaller is closely related to the problem of what happens to the difference quotients as the associated intervals get smaller and smaller.

Over three hundred years after the development of calculus mathematicians are quite comfortable with what happens to an infinite sequence of difference quotients as the associated intervals get smaller and smaller: If those difference quotients are consistently getting closer and closer to a specific number then we call that specific number the derivative.  And we define precisely what we mean by “consistently getting closer and closer” through an epsilon-delta (or perhaps epsilon-N) argument.  Thus we have a consistent way to talk about the motion that the Dichotomy paradox says cannot occur.  Moreover, the value of the derivative is the speed of the object, which is a concept that makes no sense unless the object is actually moving.

But does this formulation really resolve Zeno’s Dichotomy paradox?  I’m not convinced it does.  I’m coming to the conclusion that mathematicians have merely defined away the problem posed by Zeno.  To be sure, mathematicians have found a way to talk about what happens to an infinite sequence of numbers that is both logically coherent and fits what we understand to be happening with those numbers.  On the other hand, can an actual person perform infinitely many tasks?  I would argue not, and I don’t think the limit approach truly addresses this aspect of the Dichotomy paradox.  It just avoids it in a consistent way.

At the very least, though, Zeno’s paradoxes have forced mathematicians to consider carefully what they mean by notions such as “infinitely small” and “arbitrarily large,” as well as the concept of infinity itself.  And that is no small thing.

Posted in calculus, infinity, logic, paradoxes, real analysis, sequences and series | Leave a comment

Five Theses on Mathematics in Games

This past weekend I gave a talk entitled “Mathematics Through Narrative” at the NarraScope conference on narrative games at MIT in Boston.  Much of what I discussed I had also put into a series of blog posts several months ago.  (See part 1, part 2, and part 3 of those posts.)

However, in the talk I also presented a set of five theses on mathematics in games that did not appear in those blog posts.  Here they are, with commentary.

Mathematics in games should…

  1. Be integrated into the story world.
  2. Be used to solve puzzles in-game.
  3. Feature, if possible, repetition.
  4. Allow the player to play with the mathematical concepts.
  5. Avoid symbolic manipulation as much as possible.

Mathematics in games should be integrated into the story world.  This was the primary point I was making in part 1 of the series of blog posts from several months back, so perhaps just a quick recap is all that’s necessary here.  First, we retain new ideas (both mathematical and otherwise) when we can situate them in a web of mental connections that we already have.  (I don’t think this is a particularly controversial or outlandish claim.)  Therefore, games that attempt to teach mathematics should integrate the math into the story world.  They should not yank the player out of the story world and force him to solve mathematics problems that are otherwise irrelevant to that story world in order to advance in the game.  To do otherwise turns the gaming experience into a homework set with a fancy wrapper around it.  It also likely leaves the impression that mathematics isn’t relevant to real life.  We can do better than that.

Mathematics in games should be used to solve puzzles in-game.  This follows from the previous thesis.  If we’re going to integrate math into the story world, how do we go about doing it?  Well, lots of games from a variety of genres have puzzles that must be solved in order to advance in the game.  And mathematics is a great source of ideas for puzzles.  It’s also an underutilized one: There is a lot of mathematics out there, but I have seen very little of it actually appear in a game.  I’m sure that’s because mathematics is perceived as inaccessible.  In addition, game designers aren’t used to thinking about mathematical ideas as part of their puzzle-building toolkit, beyond some standard logic puzzles.  However, there are examples of mathematical ideas – even advanced ones – appearing in games.  Jessica Sklar has an article [1] on linear algebra in game puzzles.  I also have an article [2] that discusses – among other examples – the Klein bottle puzzle in Trinity and Zeno’s bridge in Beyond Zork.  Then there’s my A Beauty Cold and Austere, whose gameplay is all about solving mathematics-based puzzles.  There is still a lot more mathematics out there that could be tapped to create engaging challenges for players, too.  Combinatorial optimization problems in particular seem like a rich, underutilized puzzle source.

Mathematics in games should feature, if possible, repetition.  I don’t think this is a particularly controversial claim.  After all, it’s clear that we learn and retain ideas not just when they’re placed in a context but also through continued repetition.  I put the qualifier “if possible” in there because the repetition goal can cut against other design goals.  If a game is going for breadth, mathematically-speaking, then there’s not going to be space in the game to hit every mathematical idea multiple times.  In addition, if a game just wants to feature a few mathematical ideas as puzzles or has more of an exposure-to-mathematics goal in mind, then repetition becomes less important.  But if a game really has the intent to teach mathematics, then repetition must be one of the features the game designer has to keep in mind.

Mathematics in games should allow the player to play with the mathematical concepts.  This is related to the previous thesis.  One way to implement something like repetition without creating multiple puzzles that feature exactly the same concept (which, if not carefully done, could get boring for the player) is to create a mini-sandbox featuring the mathematical idea you want to get across.  Then the player can engage the concept from multiple angles, seeing how it responds under different choices she makes.  I’ll draw an analogy here with people: If you want to get to know someone, you’ll have to see how that person responds in lots of different scenarios (e.g., with friends, with family, in good times, in stressful situations), not just in the context in which you originally met them.  The same is true with ideas – mathematical and otherwise.  You can’t truly understand an idea until you look at it from multiple perspectives.  One example in A Beauty Cold and Austere is the sequences-and-series machine.  By playing with the various controls on the machine you can create sixteen different settings for the golden path leading out of the room, each of which represents a different sequence or series.  Following the path for a particular setting takes the player to a number space corresponding to the limit associated with the underlying sequence or series.  Some of these paths represents solutions to puzzles, and some just exist to allow the player to engage with the mathematics.

Mathematics in games should avoid symbolic manipulation as much as possible.  An expression like \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k} = \ln 2 is a concise way to represent the alternating harmonic series and its value, but it’s also a foreign language to the vast majority of players out there.  As such, when the typical player encounters something like this expression in a game it is not inviting or engaging.  Instead, it’s intimidating and off-putting.  As game designers it is our responsibility to find ways to express fascinating mathematical ideas without triggering players’ phobias around mathematics, many of which are (I’m convinced) tied to those ideas’ symbolic representations.  For example, when I gave my talk at NarraScope I displayed the alternating harmonic series equation above as an example of something to avoid.  Then I demoed a couple of puzzles from A Beauty Cold and Austere, the second of which was the sequences-and-series machine.  After playing with several of the settings to see what happened I picked a particular collection of settings and asked the audience whether we could predict where the golden path would lead.  We agreed that these settings would not get us to infinity (unlike another setting) nor result in values that bounced around more and more wildly (also unlike another setting).  Instead, these settings would create a path that would eventually settle down to a limiting value somewhere between 0 and 1.  And sure enough, that’s what happened, with the game telling us that the actual value was the natural logarithm of 2.  Then I pointed out that we had just recreated the ideas behind the alternating harmonic series equation I had displayed fifteen minutes earlier, without using any mathematical symbols at all.  That produced multiple murmurs from the audience, including a “mind blown” comment from someone sitting near the front.  (I’ll admit that that was my favorite moment of the whole talk.)  The point is that we can avoid symbolic manipulation when representing mathematics in games, even if we have to get creative in how we do it.

So, these are my five theses on featuring mathematics in games.  It’s a full ninety fewer than Martin Luther had.  I also did not nail them to one of the classroom doors at MIT.

References

  1. Sklar, Jessica.  “Dials and levers and glyphs, oh my! Linear algebra solutions to computer game puzzles.” Mathematics Magazine 79(5), 2006: 360-367.
  2. Spivey, Michael Z.  “Mathematics through narrative,” Math Horizons 26(3), 2019: 22-25.
Posted in games, teaching | 1 Comment