In a previous post, we solved a small instance of the traveling salesman problem (TSP) around Singapore’s hawker centers using linear programming. Today, we will talk about limitations of the linear programming approach to TSP, namely a recent result showing a super-polynomial lower bound on the number of linear constraints needed to characterize the set of valid tours. The paper is “Linear vs. Semidefinite Extended Formulations: Exponential Separation and Strong Lower Bounds” by Fiorini, Massar, Pokutta, Tiwary, and de Wolf (hereafter referred to as FMPTW) and shared the best paper award at the STOC 2012 conference.
A key tool in this lower bound is another concept we have discussed on this blog, the non-negative rank. Believe it or not, those posts were written to prepare for this one! Even so, this post will be more technical than usual. Before we get started, let me thank Ronald de Wolf for his comments on this post, and for explaining this work to me when he visited CQT at the end of last year.
To put the FMPTW work in context, let me first recap the approach we took to finding the shortest tour through Singapore’s hawker centers. We considered variables \(x_{i,j}\) meant to represent the presence or absence of an edge between city \(i\) and \(j\) in a tour. We then brainstormed linear constraints on the \(x_{i,j}\) satisfied by true tours—starting out with basic constraints like all edge weights should be non-negative and the total weight of edges incident to any city should be two. Finding the shortest “tour” with respect to these linear constraints can be done efficiently, as it is a linear program. We saw, however, that the solution was not a true tour but had problems like non-integral weight edges and small cycles. To combat this, we did several iterations of adding more linear constraints in an attempt to rub out spurious solutions. A natural question is: when will this process stop? How many linear constraints are needed to describe the set of valid tours? The work of FMPTW gives the first super-polynomial lower bound on the number of linear constraints needed to characterize the set of valid tours: in fact, they show a bound of \(2^{n^{1/4}}\) for tours on \(n\) cities. (Ronald tells me they have now improved this to \(2^{\sqrt{n}}\).)
I want to clear up one thing that might be confusing with this introduction to their result. In the hawker center problem, we did not develop a set of linear constraints to characterize all valid tours on 18 cities—we just needed to eliminate false solutions near the optimal tour for our particular problem. To do this, we chose the constraints adaptively, meaning that when the linear program returned a spurious solution, we scratched our heads and identified a fault of this solution (like having a small cycle) and added a linear constraint to eliminate this fault. For a different configuration of hawker centers, our
final linear program might still return a spurious solution. The FMPTW result, on the other hand, speaks about the number of linear inequalities needed to characterize the set of valid tours. Such a characterization is not dependent on the objective function (the configuration of the cities), and so would universally allow you to solve a TSP problem by optimizing with respect to these inequalities.
The paper actually deals with more general linear programming formulations called extended formulations that allow the introduction of extra variables. In addition to the variable \(x \in R^{n \choose 2}\) whose \((i,j)\) entry represents the “strength” of the edge between cities \(i\) and \(j\) in a tour
(and ideally would be either 0 or 1), we are now also allowed some extra variables \(y \in R^k\) that don’t enter in the objective function. An extended formulation for TSP expresses the convex hull of the set of valid tours by the set \(\{x : \exists y \ge 0 : Ax +By = b\}\). The size of an extended formulation is the number of constraints, that is the number of rows in \(A\) and \(B\).
While it may be surprising that allowing extra variables can make a difference, it can. For example, in solving the hawker tour problem a key role was played by subtour inequalities to eliminate small cycles—inequalities of the form \(\sum_{i<j \in S} x_{i,j} \le |S|-1\) for a set \(S \subset [n]\). We did not add all the subtour inequalities, though, because there are exponentially many! The extended formulation size of the linear programming relaxation of TSP with all subtour inequalities, however, is still polynomial, in fact \(O(n^3)\).
The connection between extended formulation size and non-negative rank goes back to a beautiful result by Yannakakis over 20 years ago. We first need a couple of definitions. For our purposes a (convex) polytope \(P\) is a bounded set defined by linear inequalities \(P = \{x : Ax \le b\}\). A slack matrix \(S\) for \(P\) is a matrix with rows labeled by constraints (i.e. rows of \(A\), which we denote by \(A_i\) for the \(i\)th row) and columns labeled by points in \(P\) whose convex hull is all of \(P\). The \((i,j)\) entry corresponding to a row labeled by \(A_i\) and column by a point \(v_j\) is \(S(i,j)=b_i – A_i v_j\). That is, the \((i,j)\) entry of \(S\) indicates the slack of \(v_j\) with respect to the \(i^{th}\) constraint. As \(v_j \in P\) it must be the case that \(A_iv_j \le b_i\), so all the entries of the slack matrix are non-negative. An entry is \(0\) exactly when \(v_j\) satisfies the corresponding constraint with equality.
Note that there can be many different slack matrices for a polytope \(P\) as we have some freedom in choosing \(A,b\) and the points in \(P\). The extension complexity of \(P\) is the size of a smallest extended formulation defining \(P\). Yannakakis showed the following.
Theorem 1 (Yannakakis) Let \(P\) be a polytope. For any slack matrix \(S\) of \(P\), the non-negative rank of \(S\) is equal to the extension complexity of \(P\).
Thus in particular the non-negative rank of all slack matrices of \(P\) is the same.
Showing lower bounds on non-negative rank is very difficult, and we have few strong lower bounds for concrete examples. For example, even the following is open:
Question 2: Consider an \(n\)-by-\(n\) matrix \(M\) where \(M(i,j) = (i-j)^2\) for \(i,j \in [n]\). The rank of \(M\) is \(3\). What is the non-negative rank of \(M\)?
It is conjectured that the non-negative rank is \(\Omega(n)\), yet the best lower bound is \(\Omega(\log n)\).
One of the most interesting lower bounds on non-negative rank is for the unique disjointness matrix
from communication complexity. Consider a \(2^n\)-by-\(2^n\) matrix \(M\) indexed by \(n\)-bit strings. If \(x \cap y = \emptyset\) then \(M(x,y)=1\). If \(|x \cap y| =1\) then \(M(x,y)=0\). No matter how you fill out the rest of \(M\) with non-negative numbers, the non-negative rank will be \(2^{\Omega(n)}\). This lower bound follows from the key lemma in Razborov’s \(\Omega(n)\) lower bound on the randomized communication complexity of the disjointness function.
As unique disjointness provides a non-trivial family of matrices for which there is strong lower bound on the non-negative rank, it would be great to embed it in the slack matrix for an interesting combinatorial problem. This is exactly what FMPTW do.
Instead of directly working with TSP, they instead work with the correlation polytope, the convex hull of \(\{ bb^t : b \in \{0,1\}^n\}\). While not as famous as TSP, this is another important and well studied polytope. FMPTW provide a reduction to show that lower bounds on the extension complexity of the correlation polytope provide (slightly weaker) lower bounds on extension complexity of the TSP polytope. We will just focus on showing the lower bound on the extension complexity of the correlation polytope.
Now what matrices can appear as slack matrices of the correlation polytope? Here is a more general, and I think simpler, construction than that given by FMPTW to show a useful class of matrices that are slack matrices for the correlation polytope.
Lemma 3: Let \(p(z) = a + bz+cz^2\) be a single variate degree 2 polynomial that is non-negative on non-negative integers. Consider the matrix \(M(x,y) = p(| x \cap y)|)\) for \(x, y \in \{0,1\}^n\). Then \(M\) is a submatrix of a slack matrix for the correlation polytope.
Proof:
As \(p\) is non-negative on non-negative integers, \(-bz -cz^2 \le a\) is a valid inequality for integers \(z \ge 0\). Note that \(\langle xx^t , yy^t \rangle = | x \cap y|^2\) and \(\langle \mathrm{diag}(x), yy^t \rangle = |x \cap y|\) for \(x, y \in \{0,1\}^n\). Thus \(\langle – b \cdot \mathrm{diag}(x) – c\cdot xx^t, yy^t \rangle \le a\) is a valid inequality, whose slack is \(p(|x \cap y|)\). Note that the columns of \(M\) are labeled by vertices of the correlation polytope \(yy^t\) for \(y \in \{0,1\}^n\) and likewise the constraints are labeled by \(xx^t\) for \(x \in \{0,1\}^n\).
The matrix used by FMPTW is \(F(x,y) = p(|x \cap y|)\) where \(p(z) = (z-1)^2\). By Lemma 3 this matrix is a slack matrix for the correlation polytope and moreover \(F\) is an instance of unique disjointness. Thus we get an exponential lower bound on the non-negative rank of \(F\), and hence the extension complexity of the correlation polytope.
The way that FMPTW came up with this matrix was through—of all things—quantum communication complexity! That story will have to wait for a future post.