## Minimum Number of Wagons to Connect Every City in Ticket to Ride – Europe

How many wagons would you need to create a route network in the board game Ticket to Ride – Europe so that every city is connected to every other city through the network?  If you could do this, you could potentially earn every destination ticket in the game.

To answer this question, we first need to make a graph out of the Ticket to Ride – Europe map.  We represent each city by a vertex and each city-to-city route by an arc whose weight is the number of wagons it takes to claim that route.  The question then becomes one of finding a minimum spanning tree that connects all the vertices in this graph.  The total weight of the minimum spanning tree will be the answer to our question.

Fortunately, there are some fast algorithms for finding minimum spanning trees, and some of them are simple enough that we can easily perform them by hand.  For this post we’ll use Prim’s algorithm (see the short code sketch after the list), which involves the following steps:

1. Start at any vertex to initialize the tree.
2. Select a minimum-weight arc that is not in the tree but that connects a vertex in the tree with one not in the tree.  Add that arc and the new vertex to the tree.
3. Repeat Step 2 until all vertices are in the tree.
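To make the procedure concrete, here’s a minimal Python sketch of Prim’s algorithm.  The route data is an assumed toy excerpt of the map (just the six cities discussed in the next few paragraphs), not the full Europe graph.

```python
# A minimal sketch of Prim's algorithm on a few assumed routes from the map.
import heapq

def prim(graph, start):
    """graph: {city: [(wagons, neighbor), ...]}; returns the tree's arcs and total weight."""
    in_tree = {start}
    frontier = list(graph[start])     # arcs leaving the tree, ordered by weight
    heapq.heapify(frontier)
    tree, total = [], 0
    while frontier:
        w, city = heapq.heappop(frontier)
        if city in in_tree:
            continue                  # this arc now leads to a city already in the tree
        in_tree.add(city)
        tree.append((city, w))
        total += w
        for arc in graph[city]:
            heapq.heappush(frontier, arc)
    return tree, total

routes = {
    "Edinburgh": [(4, "London")],
    "London": [(4, "Edinburgh"), (2, "Amsterdam"), (2, "Dieppe")],
    "Amsterdam": [(2, "London"), (1, "Bruxelles")],
    "Bruxelles": [(1, "Amsterdam"), (2, "Dieppe"), (2, "Paris")],
    "Dieppe": [(2, "London"), (2, "Bruxelles"), (1, "Paris")],
    "Paris": [(1, "Dieppe"), (2, "Bruxelles")],
}
print(prim(routes, "Edinburgh"))      # ten wagons connect these six cities
```

On this excerpt, the heap’s (weight, city) ordering even reproduces the alphabetical tiebreaker we use by hand below.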

Edinburgh is a logical city in which to start, since from there the only available arc is the 4-wagon route to London.  Add London and the route from Edinburgh to London to our tree.

Now, there are two minimum-weight arcs out of Edinburgh or London to new cities.  These are the two-wagon routes from London to Amsterdam and from London to Dieppe.  It actually doesn’t matter which one we pick; the proof of correctness for Prim’s algorithm tells us that either would lead to a minimum spanning tree, even if we don’t eventually choose the other route for our tree.  Let’s take alphabetical order as the tiebreaker and choose the London to Amsterdam route for two wagons.

In the next iteration the shortest route available to us isn’t the two-wagon route from London to Dieppe; it’s the one-wagon route from Amsterdam to Bruxelles.  So we must take that one.

Next iteration: There are multiple two-wagon routes available now: London to Dieppe, Bruxelles to Dieppe, Bruxelles to Paris, Bruxelles to Frankfurt, and Amsterdam to Frankfurt.  Alphabetical order for the new city gives us Dieppe, and alphabetical order again would select Bruxelles over London as the city to connect Dieppe to.  (Again, any of these choices would give us a minimum spanning tree at the end, so all we need is some tiebreaker criterion.)

Next choice is Dieppe to Paris for one wagon.

In the next iteration there are multiple two-wagon routes available again: Dieppe to Brest, Bruxelles to Frankfurt, and Amsterdam to Frankfurt.  We do not include the London-to-Dieppe route or the Bruxelles-to-Paris route because we only consider arcs from cities already in our tree to new cities.  Alphabetical order as the tiebreaker criterion means we select Dieppe to Brest as our new arc.

From here, I’ll just list the remaining arcs that are added, in order of selection (again using our alphabetical order tiebreaker).  For completeness I’ll include the arcs and vertices we’ve already selected.

1. Edinburgh (start)
2. Edinburgh to London (4 wagons)
3. London to Amsterdam (2)
4. Amsterdam to Bruxelles (1)
5. Bruxelles to Dieppe (2)
6. Dieppe to Paris (1)
7. Dieppe to Brest (2)
8. Amsterdam to Frankfurt (2)
9. Frankfurt to Essen (2)
10. Essen to Berlin (2)
11. Frankfurt to Munchen (2)
12. Munchen to Venezia (2)
13. Venezia to Roma (2)
14. Roma to Brindisi (2)
15. Venezia to Zagrab (2)
16. Zagrab to Budapest (2)
17. Budapest to Wien (1)
18. Munchen to Zurich (2)
19. Zurich to Marseille (2)
20. Essen to Kobenhavn (3)
21. Brindisi to Palermo (3)
22. Budapest to Sarajevo (3)
23. Sarajevo to Sofia (2)
24. Sofia to Bucuresti (2)
25. Sofia to Athina (3)
26. Athina to Smyrna (2)
27. Smyrna to Constantinople (2)
28. Constantinople to Angora (2)
29. Angora to Erzurum (3)
30. Erzurum to Sochi (3)
31. Sochi to Rostov (2)
32. Rostov to Kharkov (2)
33. Sochi to Sevastopol (2)
34. Kobenhavn to Stockholm (3)
35. Marseille to Barcelona (4)
36. Barcelona to Madrid (2)
37. Barcelona to Pamplona (2)
38. Madrid to Cadiz (3)
39. Cadiz to Lisboa (2)
40. Berlin to Danzig (4)
41. Danzig to Warszawa (2)
42. Danzig to Riga (3)
43. Warszawa to Wilno (3)
44. Wilno to Kyiv (2)
45. Kyiv to Smolensk (3)
46. Smolensk to Moskva (2)
47. Moskva to Petrograd (4)

This gives us a total of 108 required wagons.  Since you have only 45 wagons to play in a game of Ticket to Ride – Europe, you can’t get anywhere close to connecting all of the cities in this game.  Even two complete sets (i.e., 90 wagons) wouldn’t be enough.  (We haven’t considered stations in this analysis, but even if you play all three of your stations to substitute for three of the four-wagon routes required in the minimum spanning tree, you still need 96 wagons to complete the rest of the tree.)
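As a quick arithmetic check on those totals, here’s a sketch that sums the 46 arc weights from the list above (hand-copied, so treat it as illustrative):

```python
# Arc weights from the spanning tree above, in order of selection (items 2-47).
weights = [4, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 3, 3, 3,
           2, 2, 3, 2, 2, 2, 3, 3, 2, 2, 2, 3, 4, 2, 2, 3, 2, 4, 2, 3, 3,
           2, 3, 2, 4]
assert len(weights) == 46      # a spanning tree on 47 cities has 46 arcs
print(sum(weights))            # 108
print(sum(weights) - 3 * 4)    # 96, if three stations cover three 4-wagon routes
```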

So, even if it were possible in the game to hold all the destination tickets, you cannot actually earn them all.  (That should not surprise anyone who has played Ticket to Ride – Europe, I imagine.)  What may be a bit more surprising is what our answer of 108 implies about just how far from possible it is to earn every destination ticket.

A couple of fun observations: (1) At one point Zagrab actually wins the alphabetical order criterion!  This is because the alternative choice is Zurich.  (2)  For quite a long time after selecting Edinburgh to London we can avoid choosing a route that requires more than two wagons.  This is because many of the cities in northwestern and central Europe on the map are connected by only one- or two-wagon routes.

## The Max Flow-Min Cut Theorem via Linear Programming Duality

The Max Flow-Min Cut Theorem states that the maximum flow from the source node to the sink node through a capacitated network is equal to the capacity of the minimum cut that separates that source node from the sink node.  In this post we’ll show that this result, which can be proved using only network concepts, also follows from the strong duality theorem for linear programming.

Here’s a linear programming formulation of the maximum flow problem.

$\displaystyle \begin{aligned} \text{Maximize } f & \\ \text{subject to } &\sum_{j:(i,j) \in A} x_{ij} - \sum_{k:(k,i) \in A} x_{ki} = \begin{cases} f, & i = s; \\ -f, & i = t; \\ 0, &\text{otherwise;} \end{cases} \text{ for all } i \in N; \\ &0 \leq x_{ij} \leq u_{ij}, \text{ for all } (i,j) \in A. \end{aligned}$

Here, f is the total amount of flow, $x_{ij}$ is the amount of flow sent along arc $(i,j)$, A is the set of arcs, N is the set of nodes, s is the (single) source, t is the (single) sink, and $u_{ij}$ is the capacity (upper bound) of arc $(i,j)$.

The constraints involving the summations are the flow-balance constraints.  They enforce the idea that the total amount of flow into a node must equal the total amount of flow out of that node.  The two exceptions are the source and the sink, which emit and absorb, respectively, all the flow through the network.
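To make the formulation concrete, here’s a small sketch that feeds it to scipy.optimize.linprog on an assumed four-node toy network (the network and its capacities are mine, not from the post):

```python
# Max-flow LP on a toy network: nodes s, a, b, t and five capacitated arcs.
import numpy as np
from scipy.optimize import linprog

nodes = ["s", "a", "b", "t"]
arcs = [("s", "a", 3), ("s", "b", 2), ("a", "b", 1), ("a", "t", 2), ("b", "t", 3)]

# Variables: one flow x_ij per arc, plus the total flow f as the last variable.
A_eq = np.zeros((len(nodes), len(arcs) + 1))
for k, (i, j, _) in enumerate(arcs):
    A_eq[nodes.index(i), k] = 1.0     # x_ij leaves node i
    A_eq[nodes.index(j), k] = -1.0    # x_ij enters node j
A_eq[nodes.index("s"), -1] = -1.0     # source emits f
A_eq[nodes.index("t"), -1] = 1.0      # sink absorbs f
b_eq = np.zeros(len(nodes))

c = np.zeros(len(arcs) + 1)
c[-1] = -1.0                          # maximize f by minimizing -f
bounds = [(0, u) for (_, _, u) in arcs] + [(0, None)]

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("max flow:", -res.fun)          # 5.0
print("capacity of the cut ({s}, {a,b,t}):", 3 + 2)  # 5, matching the max flow
```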

The dual of this problem is

$\displaystyle \begin{aligned} \text{Minimize } &\sum_{(i,j) \in A} u_{ij} z_{ij} \\ \text{subject to } &y_i - y_j + z_{ij} \geq 0, \:\: \text{ for all } (i,j) \in A; \\ &-y_s + y_t = 1; \\ &z_{ij} \geq 0, \text{ for all } (i,j) \in A. \end{aligned}$

Here, $z_{ij}$ is the dual variable associated with the upper bound constraint on arc $(i,j)$, and $y_i$ is the dual variable associated with the flow-balance constraint on node i.

Next, let’s take a look at why the dual problem corresponds to the problem of finding a minimum cut.  In a search for an optimal solution we will want to set $z_{ij} = 0$ as much as possible.  This means that we will want to set $y_i = y_j$ for as many arcs $(i,j)$ as possible. (*)  However, we cannot set $y_i = y_j$ for all arcs $(i,j)$, as the requirement $-y_s + y_t = 1$ means $y_t$ will have a value one unit larger than $y_s$.  So in a search for an optimal solution to the dual we can consider only those cases in which we break the node set into two pieces: Those nodes i satisfying $y_i = y_s$ and those nodes j satisfying $y_j = y_t$.  Thus this set of cases in which the optimal solution must be found corresponds to the set of $s-t$ cuts that place the source and sink in different subsets.

Does this set of cases leave out any $s-t$ cuts, though?  No.  To see this, note that every cut $(S,\bar{S})$ with $s \in S, t \in \bar{S}$ corresponds to a feasible solution to the dual in the following manner.  Let $y_s = 0, y_t = 1$.  Then let $y_i = \begin{cases} 0, & i \in S; \\ 1, & i \in \bar{S}. \end{cases}$

Finally, let $z_{ij} = \begin{cases} 1, & (i,j) \in (S, \bar{S}); \\ 0, & (i,j) \not\in (S, \bar{S}). \end{cases}$

This assignment satisfies the dual problem, and its corresponding objective function value is exactly the capacity of the cut $(S, \bar{S})$.
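Here’s that construction as a quick sketch, on the same assumed toy network as in the earlier code block; it builds y and z from the cut $S = \{s\}$ and checks dual feasibility and the objective value:

```python
# Build the dual-feasible point corresponding to a cut and check its objective.
nodes = ["s", "a", "b", "t"]
arcs = [("s", "a", 3), ("s", "b", 2), ("a", "b", 1), ("a", "t", 2), ("b", "t", 3)]

S = {"s"}                              # an s-t cut: S and its complement
y = {v: 0 if v in S else 1 for v in nodes}
z = {(i, j): 1 if (i in S and j not in S) else 0 for (i, j, _) in arcs}

assert all(y[i] - y[j] + z[(i, j)] >= 0 for (i, j, _) in arcs)  # dual feasibility
assert -y["s"] + y["t"] == 1
print(sum(u * z[(i, j)] for (i, j, u) in arcs))  # 5, the capacity of the cut
```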

Since the assignment of zero flow through the network is feasible for the original problem, both the original problem and its dual are feasible.  By the strong duality theorem, then, the original problem and its dual have optimal solutions whose objective function values are equal.  Therefore, the maximum flow through the network must equal the capacity of the minimum cut that separates the source node s from the sink node t.

(*)  Well, the constraint actually just requires $y_i \geq y_j$, but we know from complementary slackness that if there’s positive flow on an arc $(i,j)$ in the optimal solution to the maximum flow problem, then the corresponding dual constraint $y_i - y_j + z_{ij} \geq 0$ must be satisfied at equality.  So for the dual constraints that actually bind, and thus actually matter, we’ll have $y_i - y_j + z_{ij} = 0$ at the dual optimum.

## Solving Second-Order Difference Equations with Constant Coefficients: Part II

In this post we’ll discuss the missing case from last month’s derivation of the solution for second-order difference equations with constant coefficients; namely, the case in which the characteristic equation has a repeated root rather than two distinct roots.

Once again, suppose we have the difference equation $a_{n+2} = A a_{n+1} + B a_n$.  The characteristic equation is still $r^2 = Ar + B$.  Let’s suppose now that the latter equation has the single root $r_0$ rather than distinct roots $r_1$ and $r_2$.

Much of the derivation goes through as before with $r_1 = r_2 = r_0$, so we’ll pick up the derivation where it starts to differ.  This occurs with the partial fractions decomposition.  Instead of using partial fractions decomposition to rewrite

$\displaystyle S = \frac{a_0 + (a_1 - A a_0)x}{(1-r_1 x)(1-r_2 x)}$

as

$\displaystyle S = \frac{C}{1-r_1 x} + \frac{D}{1-r_2 x}$,

having repeated roots means we must do the decomposition as

$\displaystyle S = \frac{C}{1-r_0 x} + \frac{D}{(1-r_0 x)^2}$.

The question now is how to expand the second fraction into an infinite series.  In the previous derivation we used the geometric series formula

$\displaystyle \frac{1}{1-rx} = \sum_{n=0}^{\infty} r^n x^n$.

Differentiating both sides of this formula, we obtain

$\displaystyle \frac{r}{(1-rx)^2} = \sum_{n=1}^{\infty} n r^n x^{n-1}$.

We’ll rewrite this and shift indices to get

$\displaystyle \frac{1}{(1-rx)^2} = \sum_{n=1}^{\infty} n r^{n-1} x^{n-1} = \sum_{n=0}^{\infty} (n+1) r^n x^n$.

Now we can express S as

$\displaystyle S = C \sum_{n=0}^{\infty} r_0^n x^n + D \sum_{n=0}^{\infty} (n+1) (r_0^n x^n) = \sum_{n=0}^{\infty} ((C +D) r_0^n + D n r_0^n)x^n.$

Letting $E = C+D$, this gives us

$\displaystyle S = \sum_{n=0}^{\infty} (E r_0^n + D n r_0^n)x^n.$

As before, if we remember that $\displaystyle S = \sum_{n=0}^{\infty} a_n x^n$ and that the only way two formal power series can be equal is if their coefficients are equal, we get the solution in the repeated roots case to be

$\displaystyle a_n = E r_0^n + D n r_0^n$,

where E and D depend on the initial conditions $a_0$ and $a_1$.
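As a sanity check, here’s a short Python sketch for an assumed example with a repeated root: $a_{n+2} = 4a_{n+1} - 4a_n$, whose characteristic equation $(r-2)^2 = 0$ gives $r_0 = 2$.  It computes E and D from the initial conditions and compares the closed form against direct iteration.

```python
# Repeated-root check: a_{n+2} = 4 a_{n+1} - 4 a_n, so r0 = 2 (double root).
A, B = 4, -4
r0 = 2.0
a0, a1 = 1, 6
E = a0                         # from n = 0: a_0 = E
D = a1 / r0 - E                # from n = 1: a_1 = (E + D) r0
seq = [a0, a1]
for _ in range(2, 10):
    seq.append(A * seq[-1] + B * seq[-2])
for n, an in enumerate(seq):
    assert an == E * r0**n + D * n * r0**n   # a_n = E r0^n + D n r0^n
print(seq)  # [1, 6, 20, 56, 144, 352, 832, 1920, 4352, 9728]
```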

## Solving Second-Order Difference Equations with Constant Coefficients

Suppose you have a linear, homogeneous second-order difference equation with constant coefficients of the form $a_{n+2} = A a_{n+1} + B a_n$.  The solution procedure is as follows: “Guess” a solution of the form $a_n = r^n$.  Then substitute this guess into the difference equation to obtain $r^{n+2} = A r^{n+1} + Br^n$.  Divide both sides by $r^n$ (assuming $r \neq 0$), yielding $r^2 = Ar + B$.  Solve this equation to obtain the two solutions $r = r_1$ and $r = r_2$.  This gives two solutions to the original difference equation: $a_n = r_1^n$ and $a_n = r_2^n$.  Assuming $r_1 \neq r_2$, and after arguing that the solutions to the original equation form a vector space of dimension 2 (we’ll leave that aside for now), your two solutions to the original equation form a basis for the solution space.  This means that all solutions to the difference equation can be represented as a linear combination of the two basis solutions $r_1^n$ and $r_2^n$, which means that the general solution to $a_{n+2} = A a_{n+1} + B a_n$ is $a_n = C r_1^n + D r_2^n$, where C and D are determined by the initial conditions $a_0$ and $a_1$.

Many of my students do not find this argument satisfying (although it is valid).  They don’t like the guessing aspect near the beginning.  How do you know the right guess?  How would you come up with that guess?  In this post we’ll do a direct derivation showing that $a_n = C r_1^n + D r_2^n$ is indeed the correct solution (with no guesses!).

The derivation uses generating functions.  We’ll start by reindexing the original recurrence into the form $a_n = A a_{n-1} + B a_{n-2}$.  Then multiply this equation by $x^n$ and then sum as n goes from 2 to infinity to obtain

$\displaystyle \sum_{n=2}^{\infty} a_n x^n = A \sum_{n=2}^{\infty} a_{n-1} x^n + B \sum_{n=2}^{\infty} a_{n-2} x^n$.

We want to reindex this equation so that each summation has the same bounds and the same subscripts for $a_n$.  Doing this yields

$\displaystyle \sum_{n=0}^{\infty} a_n x^n - a_0 - a_1 x= A \sum_{n=1}^{\infty} a_n x^{n+1} + B \sum_{n=0}^{\infty} a_n x^{n+2},$ or

$\displaystyle \sum_{n=0}^{\infty} a_n x^n - a_0 - a_1 x= A x \left( \sum_{n=0}^{\infty} a_n x^n - a_0 \right) + B x^2 \sum_{n=0}^{\infty} a_n x^n$.

For simplicity’s sake, let’s let $\displaystyle S = \sum_{n=0}^{\infty} a_n x^n$.  This substitution and some rearrangement produces

$\displaystyle S - Ax S - Bx^2 S = a_0 + a_1 x - A a_0 x$, which implies

$\displaystyle S = \frac{a_0 + (a_1 - A a_0)x}{1 - Ax - Bx^2}$.

Now, let’s factor the denominator.  We already know that $x^2 - Ax - B = (x-r_1)(x-r_2)$.  Replacing x with $1/x$ produces $1/x^2 - A/x - B = (1/x-r_1)(1/x-r_2)$.  Then multiply by $x^2$ to obtain $1 - Ax - Bx^2 = (1-r_1 x)(1 - r_2 x)$.  Therefore,

$\displaystyle S = \frac{a_0 + (a_1 - A a_0)x}{(1-r_1 x)(1-r_2 x)}$.

Applying partial fractions decomposition, we can write this as

$\displaystyle S = \frac{C}{1-r_1 x} + \frac{D}{1-r_2 x}$,

for some constants C and D.  We could figure out what C and D are in terms of $a_0$ and $a_1$, but for the purposes of this derivation we don’t need to do that.

Now we’re going to use a key property of generating functions: they can be treated as formal power series, with the consequence that you can expand them as power series without worrying about questions of convergence.  (See, for example, Wilf’s text generatingfunctionology.)  Applying the geometric series expansion, we have

$\displaystyle S = C \sum_{n=0}^{\infty} (r_1 x)^n + D \sum_{n=0}^{\infty} (r_2 x)^n = \sum_{n=0}^{\infty} (C r_1^n + D r_2^n)x^n.$

Remembering that $\displaystyle S = \sum_{n=0}^{\infty} a_n x^n$, we now have

$\displaystyle \sum_{n=0}^{\infty} a_n x^n = \sum_{n=0}^{\infty} (C r_1^n + D r_2^n)x^n.$

Since the only way two formal power series can be equal is if their coefficients are equal, we get the solution to the original difference equation to be

$a_n = C r_1^n + D r_2^n$.

Remember, this is for the case when the characteristic polynomial $r^2 - Ar - B$ has two distinct roots $r_1$ and $r_2$.  Next month we’ll take a look at the repeated root case, where $r_1 = r_2$.
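For a quick numerical confirmation of the distinct-roots formula, here’s a sketch using the Fibonacci recurrence ($A = B = 1$, $a_0 = 0$, $a_1 = 1$) as an assumed example; C and D are solved from the initial conditions rather than computed symbolically.

```python
# Distinct-roots check on Fibonacci: r1, r2 are the roots of r^2 = r + 1.
import math

A, B = 1, 1
a0, a1 = 0, 1
disc = math.sqrt(A * A + 4 * B)
r1, r2 = (A + disc) / 2, (A - disc) / 2
C = (a1 - a0 * r2) / (r1 - r2)       # solve a_0 = C + D, a_1 = C r1 + D r2
D = a0 - C
seq = [a0, a1]
for _ in range(2, 20):
    seq.append(A * seq[-1] + B * seq[-2])
for n, an in enumerate(seq):
    assert abs(an - (C * r1**n + D * r2**n)) < 1e-6
print(seq[:10])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```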

The Princeton Alumni Weekly has a great article up about the mathematicians at Princeton and the Institute for Advanced Study in the 1930s and 1940s.  I’ve heard a lot of anecdotes about mathematicians over the years, but I had not heard any of the ones in this article before.  Check it out!

## A Quasiperfect Number Must Be an Odd Perfect Square

Let $\sigma$ be the sum of divisors function.  For example, $\sigma(5) = 1 + 5 = 6$, as the divisors of 5 are 1 and 5.  Similarly, $\sigma(6) = 1 + 2 + 3 + 6 = 12$, as the divisors of 6 are 1, 2, 3, and 6.

A perfect number is one that is equal to the sum of its divisors, excluding itself.  Thus 6 is a perfect number, as $6 = 1 + 2 + 3$.  In terms of the divisor function, a perfect number is one such that $\sigma(n) = 2n$.  An interesting theorem about perfect numbers is that an even number is perfect if and only if it is of the form $2^{p-1}(2^p-1)$, where both p and $2^p-1$ are prime.

Similarly, a quasiperfect number is a number n such that $\sigma(n) = 2n+1$.  We don’t know whether there are any quasiperfect numbers, but we do know that if they exist there are certain properties they must have.  In this post we will work through the following result of Cattaneo [1].  (I gave the same argument in this post on math.SE.)
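Before the proof, here’s a small sketch of the definitions in Python: a brute-force $\sigma$, a scan that recovers the first few perfect numbers, and a scan that (as expected) turns up no quasiperfect numbers below 10,000.

```python
# Brute-force sum-of-divisors, plus scans for perfect and quasiperfect numbers.
def sigma(n):
    total, i = 0, 1
    while i * i <= n:
        if n % i == 0:
            total += i
            if i != n // i:
                total += n // i      # count the paired divisor n/i once
        i += 1
    return total

assert sigma(5) == 6 and sigma(6) == 12
print([n for n in range(2, 10000) if sigma(n) == 2 * n])      # [6, 28, 496, 8128]
print([n for n in range(2, 10000) if sigma(n) == 2 * n + 1])  # [] -- none known
```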

Theorem: If $\sigma(n) = 2n+1$, then n is an odd perfect square.

Proof: If $n = \prod_{i=1}^r p_i^{a_i}$ is the prime factorization of n, then we know that
$\displaystyle \sigma(n) = \prod_{i=1}^r (1 + p_i + p_i^2 + \cdots + p_i^{a_i}).$
Since $\sigma(n) = 2n+1$ is odd, though, each factor $1 + p_i + p_i^2 + \cdots + p_i^{a_i}$ must also be odd.

For each odd prime factor $p_i$, then, there must be an odd number of terms in the sum $1 + p_i + p_i^2 + \cdots + p_i^{a_i}$. Thus $a_i$ must be even. This means that if $p_i$ is odd, $p_i^{a_i}$ is an odd perfect square. The product of odd perfect squares is another odd perfect square, and therefore $n = 2^s m^2$, where m is odd.

Now, since $\sigma$ is multiplicative and $\sigma(2^s) = 2^{s+1}-1$, we have $\sigma(n) = (2^{s+1}-1)M$, where

$\displaystyle M = \prod_{p_i \text{ odd}} (1 + p_i + p_i^2 + \cdots + p_i^{a_i})$.

Since $\sigma(n) = 2n+1$,
$\displaystyle \begin{aligned} &(2^{s+1}-1)M = 2^{s+1}m^2 + 1 \\ \implies \ &(2^{s+1}-1)(M - m^2) = m^2 + 1 \\ \implies \ &(2^{s+1}-1)(M - m^2)-1 = m^2. \end{aligned}$
This means that -1 is a quadratic residue modulo each prime factor of $2^{s+1}-1$. The first supplement to the law of quadratic reciprocity, though, says that -1 is a quadratic residue of an odd prime p if and only if $p \equiv 1 \bmod 4$. Thus all prime factors of $2^{s+1}-1$ are congruent to $1 \bmod 4$. The product of numbers congruent to $1 \bmod 4$ is still congruent to $1 \bmod 4$, so $2^{s+1}-1 \equiv 1 \bmod 4$. However, if $s > 0$, then $2^{s+1}-1 \equiv 3 \bmod 4$. Thus s must be $0$. Therefore, $n = m^2$, where m is odd.
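Here’s a quick brute-force sketch of the quadratic residue fact used in that step (small odd primes only):

```python
# -1 is a quadratic residue mod an odd prime p exactly when p % 4 == 1.
def is_prime(p):
    return p > 1 and all(p % d for d in range(2, int(p**0.5) + 1))

for p in filter(is_prime, range(3, 200)):
    has_sqrt_of_minus_one = any((x * x + 1) % p == 0 for x in range(1, p))
    assert has_sqrt_of_minus_one == (p % 4 == 1)
print("checked all odd primes below 200")
```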

References

1.  Paolo Cattaneo, “Sui numeri quasiperfetti,” Bollettino dell’Unione Matematica Italiana, Serie 3, Vol. 6 (1951), no. 1, pp. 59–62.

## Determinant of a Symmetric Pascal Matrix

The infinite symmetric Pascal matrix Q is given by

$\displaystyle Q = \begin{bmatrix} 1 & 1 & 1 & 1 & \cdots \\ 1 & 2 & 3 & 4 & \cdots \\ 1 & 3 & 6 & 10 & \cdots \\ 1 & 4 & 10 & 20 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix},$

where entry $(i,j)$ in Q is $\binom{i+j}{i}$.  (Note that we begin indexing the matrix with 0, not 1, in keeping with the way Pascal’s triangle is usually indexed.)

The purpose of this post is to show that the determinant of $Q_n$, the $(n+1) \times (n+1)$ upper-left submatrix of Q, is 1.

For example, here’s $Q_4$:

$\displaystyle Q_4 = \begin{bmatrix} 1 & 1 & 1 & 1 & 1\\ 1 & 2 & 3 & 4 & 5\\ 1 & 3 & 6 & 10 & 15\\ 1 & 4 & 10 & 20 & 35 \\ 1 & 5 & 15 & 35 & 70 \end{bmatrix}.$

The proof starts with Vandermonde’s identity,

$\displaystyle \sum_{k=0}^n \binom{n}{k} \binom{m}{r-k} = \binom{n+m}{r}$.

Let $r = m$ in Vandermonde, and then use the symmetry property of the binomial coefficients, $\binom{m}{m-k} = \binom{m}{k}$.  This gives us

$\displaystyle \sum_{k=0}^n \binom{n}{k} \binom{m}{k} = \binom{n+m}{n}$.
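Here’s a quick spot check of this corollary for small values, as a Python sketch:

```python
# Check sum_k C(n,k) C(m,k) == C(n+m, n) for all 0 <= n, m < 8.
from math import comb

assert all(
    sum(comb(n, k) * comb(m, k) for k in range(n + 1)) == comb(n + m, n)
    for n in range(8) for m in range(8)
)
print("identity verified for n, m < 8")
```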

Now, let’s create matrix $P_n$, where the $(i,j)$ entry in $P_n$ is $\binom{i}{j}$.  For example, $P_4$ is given by

$\displaystyle P_4 = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 2 & 1 & 0 & 0 \\ 1 & 3 & 3 & 1 & 0 \\ 1 & 4 & 6 & 4 & 1 \end{bmatrix}$.

Now, let’s consider the product $P_n P^T_n$.  Since the $(i,j)$ entry in $P^T_n$ is $\binom{j}{i}$, entry $(n,m)$ in $P_n P^T_n$ is given by

$\displaystyle \sum_{k=0}^n \binom{n}{k} \binom{m}{k}$,

which, according to the corollary to Vandermonde’s identity that we just proved above, is $\binom{n+m}{n}$.

Since $\binom{n+m}{n}$ is entry $(n,m)$ in $Q_n$, this means that $P_n P^T_n = Q_n$.  For example,

$\displaystyle \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 2 & 1 & 0 & 0 \\ 1 & 3 & 3 & 1 & 0 \\ 1 & 4 & 6 & 4 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 & 4 \\ 0 & 0 & 1 & 3 & 6 \\ 0 & 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1\\ 1 & 2 & 3 & 4 & 5\\ 1 & 3 & 6 & 10 & 15\\ 1 & 4 & 10 & 20 & 35 \\ 1 & 5 & 15 & 35 & 70 \end{bmatrix}.$

Since $P_n$ and $P^T_n$ are triangular matrices with all 1s on the diagonal, we clearly have $\det P_n = \det P^T_n = 1$.  Combining this with the fact that $\det (AB) = \det A \det B$ in general, we have that

$\displaystyle \det Q_n = 1$.

We’ve done more than prove that $\det Q_n = 1$, actually; we’ve also found the LU decomposition of $Q_n$!
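Here’s a short sketch verifying both facts numerically for small n (numpy for the matrix work; exact integer arithmetic for the product check):

```python
# Check that P_n P_n^T = Q_n and det Q_n = 1 for small n.
import numpy as np
from math import comb

def P(n):
    return np.array([[comb(i, j) for j in range(n + 1)] for i in range(n + 1)])

def Q(n):
    return np.array([[comb(i + j, i) for j in range(n + 1)] for i in range(n + 1)])

for n in range(1, 8):
    Pn = P(n)
    assert (Pn @ Pn.T == Q(n)).all()        # the LU factorization Q_n = P_n P_n^T
    assert round(np.linalg.det(Q(n))) == 1  # determinant exactly 1
print("verified for n = 1 through 7")
```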

For further reading I recommend Alan Edelman and Gilbert Strang’s article “Pascal Matrices” (American Mathematical Monthly 111(3): March 2004, pp. 189-197).  They give four proofs (including this one) that $\det Q_n = 1$.  Each proof describes a different way of establishing the LU decomposition given here.
