NP-complete

Computational Complexity
Polynomial function
Defining complexity class
Complexity class P
Complexity class NP
NP-completeness
    Polynomial reduction
    Definition
    How to prove that a problem is NP-complete
    Example of NP-complete proof
NP-hard
Picture of complexity space
We study how hard or how easy it is to solve a computational problem.  For some classes of problems there exist algorithms that solve them efficiently.  Some classes of problems are known to be so difficult that it is impossible to solve them efficiently.  For other classes the difficulty is unknown: we do not know any algorithm that solves them efficiently, and yet we do not know how to prove that no such algorithm exists.

(On the efficiency of algorithms and the easy/hard distinction, see Table 13-2, Somchaip, p. 323.)

The design and analysis of algorithms that we have seen previously focuses on algorithms: we study particular algorithms that solve problems, and those algorithms are what we analyse.  To study the difficulty of problems we cannot rely on analysing individual algorithms; instead, the focus is on the problems themselves.

Computational Complexity

The theory of complexity (or computational complexity) is the study of the difficulty of problems.  Given an algorithm that solves a problem in big-oh( f(n) ), we want f(n) to be as small as possible.  In contrast, in the study of complexity we want to find the largest g(n) for which we can prove that ANY algorithm solving this problem must be in big-omega( g(n) ).  That is, g(n) is a lower bound on the complexity of the problem.  A tighter lower bound means we have better knowledge about the problem.  We are satisfied when f(n) is-in big-theta( g(n) ); that means our knowledge about the problem and how to solve it is complete.  As we will see, for most classes of problems we have not yet reached this state of knowledge.

Remember that we do not study any concrete algorithm; hence our analysis will consist only of "proving" properties of problems.

There are several approaches to finding the complexity of a problem.  I will show three approaches; two of them will be explained only briefly, and we will concentrate on the last one:

1)  information-theoretic approach

This approach analyses the information required to solve the problem.  For example, the game of twenty questions can be modelled as a decision tree; the height of the tree (as a function of the problem size) is the complexity of the problem.  The big-omega( n log n ) lower bound for comparison-based sorting is obtained by this approach.
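
The counting argument behind the sorting bound, in brief: a comparison-based sort must distinguish all n! possible orderings of its input, so its decision tree needs at least n! leaves, and a binary tree with n! leaves has height at least lg(n!).  Since lg(n!) >= (n/2) lg(n/2), the height, and hence the number of comparisons, is big-omega( n log n ).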

2)  adversary argument

This approach uses an adversary, an imaginary agent, that makes a solver work as hard as possible to solve the problem.  Imagine again a game of guessing a number: I think of a number and you must find out what it is by asking only questions of the form: is the number bigger than, smaller than, or equal to n?  Now, I try to make you work as hard as possible (answer as many questions as possible).  In the beginning, I do not fix any number.  When asked a question, I give an answer that is consistent with all previous answers and yet avoids committing to any particular number.  Arguing this way, the guesser must ask no fewer questions than are absolutely necessary to find the answer ( lg n ).  This number of questions is the difficulty of the problem.
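
A minimal sketch of such an adversary in Python (the Adversary class and its interface are illustrative, not taken from any library): it keeps the set of numbers consistent with all answers so far as an interval and always answers so that the larger half survives.

class Adversary:
    """Adversary for 'guess a number in 1..n' with bigger/smaller/equal questions."""
    def __init__(self, n):
        self.lo, self.hi = 1, n                    # numbers still consistent with all answers

    def answer(self, guess):
        # Count consistent candidates strictly below and strictly above the guess.
        below = max(0, min(self.hi + 1, guess) - self.lo)
        above = max(0, self.hi - max(self.lo - 1, guess))
        if below == 0 and above == 0:
            return "equal"                         # forced: only one candidate is left
        if above >= below:                         # keep the larger half alive
            self.lo = max(self.lo, guess + 1)
            return "bigger"
        else:
            self.hi = min(self.hi, guess - 1)
            return "smaller"

Against this adversary every guessing strategy needs about lg n questions, because each answer eliminates at most half of the numbers that are still possible.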

3) reduction approach

This approach compares the difficulty of a problem with that of a problem of known difficulty.  The assertion is of the form: problem A is not harder to solve than problem B.  We prove it by providing a way to transform instances of problem A into instances of problem B.  This process is called "reduction".  Reduction will be discussed in depth in a later lecture.

Polynomial function

The polynomial function (running time in big-oh( n^k )) is proposed as the "dividing line" between easy and hard problems.  This criterion is robust: the analysis does not depend on the machine or the programming language.  Polynomials are closed under addition, multiplication and composition (the result is again a polynomial), which simplifies the analysis.  It has been tested for the last 40 years and found to be suitable.

Using the polynomial criterion, all problems that we studied previously are classified as "easy".  Examples of hard problems are: the travelling salesman problem (TSP), graph colouring, 0/1 knapsack, the Hamiltonian graph problem, the longest simple path in a graph, and the satisfiability problem.  This group of problems has many important practical applications, and yet we do not have any efficient algorithm to solve them.  However, the study of complexity shows that they belong to the same class: if we can find an efficient algorithm for any one of these problems, then there exist efficient algorithms for all of them.  This is very significant!

Examples of problems that can be proved too difficult for computers to solve are: solving Diophantine equations and the halting problem.  They are proven to be non-computable.

Word of caution about efficiency.

Using polynomials to distinguish between easy and hard problems is sound; however, remember that asymptotic analysis concerns the "growth" of functions, i.e. what happens when the size of the problem grows toward infinity (the comparison only takes effect beyond some threshold n > n0).  See the following example:
Algorithm A: big-theta( n^(lg lg n) )
Algorithm B: big-theta( n^10 )
By definition B is efficient (polynomial) and A is inefficient (super-polynomial).  However, for B to actually run faster than A, the size of the problem must be larger than n0, which is about 10^308.
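
The threshold comes from comparing the exponents: n^(lg lg n) >= n^10 iff lg lg n >= 10 iff lg n >= 2^10 = 1024 iff n >= 2^1024, and 2^1024 is about 1.8 x 10^308.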

Defining complexity classes

We will use the formal-language framework to define complexity classes.  The problems studied are restricted to "decision" problems, i.e. problems whose answer is yes/no.  "Optimisation" problems can be transformed into decision problems by including a constant bound; for example, TSP can be transformed into "is there a tour that costs at most k?"

An algorithm A "accepts" a string x is-in {0, 1}* when given the input x the algorithm outputs A(x) = 1.  The language accepted by A is the set L = { x is-in {0, 1}* : A(x) =1 }.  A "rejects" x when A(x) = 0.

Even if L is accepted by A, A might not reject every x not-in L; for example, the program might get stuck in a loop.  L is said to be "decided" by A if every input is either accepted or rejected by A.

We can informally define a complexity class as a set of languages, where membership of a language L in the class is determined by the resources an algorithm needs to decide whether a given string x belongs to L.
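
As a toy illustration, consider the language of binary strings containing an even number of 1s; the following algorithm halts on every input and runs in linear time, so it decides this language in polynomial time.

def A(x: str) -> int:
    # Decide L = { x in {0,1}* : x contains an even number of 1s }.
    # Returns 1 (accept) or 0 (reject); it halts on every input, so it decides L.
    return 1 if x.count("1") % 2 == 0 else 0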

The complexity class P

Def:  P = { L subset-of {0, 1}* : there exists an algorithm A that decides L in polynomial time }.

Thm 36.2  P = {L: L is accepted by a polynomial-time algorithm}

(read the proof in CLR p.923)

Please note that this definition does not allow probabilistic algorithms to be used to place a problem in P.

The complexity class NP

There are problems that are very hard to solve but for which the validity of a proposed solution can be "verified" efficiently.  For example, given a graph G = <V,E>, find a Hamilton cycle (one which visits every node exactly once and is a cycle).  Finding the Hamilton cycle is very difficult, but given a proposed solution it is easy to verify whether it is indeed a Hamilton cycle.  Another example is satisfiability of a Boolean formula: given a Boolean formula, find a truth assignment to the variables that makes the formula true.  The truth assignment is very difficult to find, since the number of possible assignments is 2^n where n is the number of variables.  However, verifying whether a proposed assignment satisfies the formula is easy.
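
A minimal verifier sketch in Python (the function name and the graph representation are illustrative choices): the certificate is a proposed ordering of the vertices, and checking it takes polynomial time even though finding one appears to be hard.

def verify_hamilton_cycle(V, E, cert):
    # Check whether cert (a list of vertices) is a Hamilton cycle of <V, E>,
    # where E is a set of frozensets {u, v}.  Runs in polynomial time.
    if len(cert) != len(V) or set(cert) != set(V):
        return False                                  # must visit every vertex exactly once
    n = len(cert)
    return all(frozenset((cert[i], cert[(i + 1) % n])) in E for i in range(n))

V = {1, 2, 3, 4}
E = {frozenset(p) for p in [(1, 2), (2, 3), (3, 4), (4, 1)]}
print(verify_hamilton_cycle(V, E, [1, 2, 3, 4]))      # True: a valid certificate
print(verify_hamilton_cycle(V, E, [1, 3, 2, 4]))      # False: {1,3} is not an edge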

A proof system for a decision problem X is a set F of pairs <x,q>, where q is drawn from a set Q of candidate certificates.  For any x is-in X there must exist at least one q is-in Q such that <x,q> is-in F; on the other hand, no such q exists when x is-not-in X.  To convince someone that x is-in X, it suffices to show some q is-in Q such that <x,q> is-in F.  Any q such that <x,q> is-in F is called a "certificate" that x is-in X.

NP is the class of decision problems that have an "efficient" proof system.  In other words, NP is the class of problems whose solutions can be verified in polynomial time.

Def.  NP is the class of decision problems X such that: for every yes-instance x of X there is a certificate q whose size is at most polynomial in the size of x, and there exists a polynomial-time algorithm that can verify whether or not <x,q> is-in F, where F is a proof system for X.

NOTE: NP stands for Nondeterministic Polynomial time.  NP was originally defined as the class of problems that can be solved by a nondeterministic algorithm in polynomial time.  Although that definition is different, it is equivalent to the one presented here.

Whether this class of problems is easy or difficult to solve is still unknown (there is no proof either way).

The first theorem establishes a relation between P and NP.

Theorem:  P is-subset of NP

Proof:  Consider an arbitrary decision problem X is-in P.  The answer can be computed directly.  We can use a trivial certificate, say q = 0, for any yes-instance (and no certificate for a no-instance).  The pair <x,q> can be verified in polynomial time: the verifier simply ignores q and solves the instance x itself, which takes polynomial time because X is-in P.  QED

The central open question is whether or not the set inclusion is proper.
     P = NP ?

NP-completeness

For many problems no efficient algorithm is known, yet no one has managed to prove that they are hard.  Maybe there exist efficient algorithms for these problems that we have not yet discovered.  Perhaps these problems are really hard, but we do not have the mathematical techniques to prove it.

What has been accomplished is to establish a class of problems of similar difficulty, called "NP-complete".  The implication of NP-completeness is this: if there is an efficient algorithm to solve any one of the NP-complete problems, then it would automatically provide efficient algorithms for all of them.

To define the NP-complete class we need the notion of "polynomial reduction".

Polynomial reduction

Def:  Let X and Y be two decision problems.  X is "polynomially many-one reducible" to Y if there exists a function f, computable in polynomial time, such that x is-in X iff f(x) is-in Y.  This is denoted X <=p Y.

Let Ax be an algorithm that solves X and Ay an algorithm that solves Y.  The concept of reduction can be illustrated by the diagram below (Ax is built from f and Ay):

                  Ax
   x  -->  [  f --> f(x) -->  Ay  ]  -->  yes/no

If there exists a polynomial-time algorithm Ay that solves Y, then computing f(x) and running Ay on the result solves X in polynomial time (because f(x) can be computed in polynomial time).

Thus X <=p Y can be read as "X is not harder to solve than Y".
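
In code, the reduction is simply function composition (a generic sketch; f and solve_Y stand for some polynomial-time reduction and some algorithm for Y):

def solve_X(x, f, solve_Y):
    # If f is computable in polynomial time and solve_Y runs in polynomial time,
    # then this composition solves X in polynomial time.
    return solve_Y(f(x))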

Example: we prove the polynomial equivalence of two Hamilton cycle problems.  Let HAM be the problem of finding a Hamilton cycle in a graph if one exists; we allow it to return an arbitrary answer when presented with a non-Hamiltonian graph.  Let HAMD be the decision problem of deciding whether a given graph is Hamiltonian.

Thm:  HAM =p HAMD

Proof:  first we prove the direction HAMD <=p HAM.

function HamD(G)
  d = Ham(G)
  if d defines a Hamilton cycle in G
    then return TRUE
    else return FALSE

This algorithm solves HAMD correctly provided that Ham solves HAM correctly.  It is clear that HamD takes polynomial time if each call to Ham is counted at unit cost.

Now, the other direction: HAM <=p HAMD.

Consider each edge in turn.  Remove the edge and ask whether the graph is still Hamiltonian.  The edge is kept only when removing it makes the graph non-Hamiltonian.  When all edges have been considered, the remaining graph contains exactly one Hamilton cycle.

function  Ham(G)
  if HamD(G) = FALSE then return "no solution"
  for each e is-in E
    if HamD( <V, E - {e}> ) then E = E - {e}
  return the unique Hamilton cycle formed by the remaining edges of G

Ham takes polynomial time if we count each call to HamD at unit cost.

Example of reduction:  Let TSPD be the decision version of TSP.  TSP: given a graph with costs on the edges, find a tour that visits every node exactly once, is a cycle, and has minimum total cost.  The decision version: decide whether or not there exists a valid tour with cost at most L.

Prove that HAMD <=p TSPD

Proof:  Let G = <V,E> be a graph with n nodes.  Let f(G) = <H,c,L> be the instance of TSPD in which H is the complete graph on V, with the cost function

c(u,v) = 1  if {u,v} is-in E
       = 2  otherwise

and bound L = n.  Any Hamilton cycle in G translates into a tour in H that costs exactly n.  If there is no Hamilton cycle in G, any tour in H must use at least one edge of cost 2, so its total cost is at least n+1.  Therefore G is a yes-instance of HAMD iff f(G) = <H,c,L> is a yes-instance of TSPD.
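
A sketch of the transformation f in Python (the representation of E as a set of frozensets is an illustrative choice):

from itertools import combinations

def ham_to_tspd(V, E):
    # Build the TSPD instance <H, c, L> from the HAMD instance <V, E>:
    # H is the complete graph on V, edges of G cost 1, non-edges cost 2, L = |V|.
    c = {frozenset((u, v)): (1 if frozenset((u, v)) in E else 2)
         for u, v in combinations(V, 2)}
    return c, len(V)

A tour of cost at most |V| exists in the complete graph exactly when it uses only cost-1 edges, i.e. exactly when G has a Hamilton cycle.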

Example:  Let SAT-3-CNF be the satisfiability problem for Boolean formulas in conjunctive normal form with at most 3 literals per clause.  Let 3COL be the problem of deciding whether a given graph can be coloured with 3 colours.
     SAT-3-CNF <=p 3COL

Build a graph that corresponds to the Boolean formula.
Consider first the truth assignment of the variables.

<graph representation of Boolean variable truth assignment>

Now consider the clauses.

<widget that represents a clause>

Any valid three-colouring of the graph provides a truth assignment for the Boolean formula and vice versa.  Thus, the graph can be coloured with 3 colours iff the formula is satisfiable.  The graph can be constructed efficiently; it contains 3 + 2t + 6k nodes and 3 + 3t + 12k edges, where t is the number of variables and k is the number of clauses in the Boolean formula.

============================================
Def: A decision problem X is NP-complete if
1)  X is-in NP
2)  Y <=p X for every problem Y is-in NP
============================================

This definition is one of the most important discoveries in computer science.  The NP-complete problems can be said to be "the most difficult problems in NP".

How to prove that a problem is NP-complete?

We can use a problem that is already known to be NP-complete.

Theorem:  Let X be an NP-complete problem.  If a decision problem Y is-in NP and X <=p Y, then Y is also NP-complete.

Proof:  The first condition of the definition of NP-completeness, Y is-in NP, holds by assumption.  For the second condition, consider an arbitrary Z is-in NP.  Since X is NP-complete and Z is-in NP, we have Z <=p X.  By assumption X <=p Y; hence, by transitivity of reduction,
Z <=p Y, which is the second condition for Y to be NP-complete.

Therefore, to prove that a problem Y is NP-complete there are two steps:
1) prove that Y is-in NP;
2) take a known NP-complete problem X and reduce it to the new problem: X <=p Y.

The first NP-completeness proof came from Cook (1971) for SAT-CNF (satisfiability of Boolean formulas in conjunctive normal form).  Its proof is too difficult to be presented here.

Example of NP-completeness proof

Thm:  SAT is NP-complete
Proof:
1) SAT is-in NP (a proposed truth assignment can be verified in polynomial time)
2) SAT-CNF <=p SAT
Since SAT-CNF is a restricted form of SAT, any formula in SAT-CNF is already a SAT formula, so the reduction is the identity transformation, which is trivially computable in polynomial time.

Thm:  SAT-3-CNF is NP-complete
Proof:
1)  SAT-3-CNF is-in NP (a proposed truth assignment can be verified in polynomial time)
2)  Now we have two NP-complete problems that can be used: SAT-CNF and SAT.  We can do either SAT-CNF <=p SAT-3-CNF or
SAT <=p SAT-3-CNF.  We choose
SAT-CNF <=p SAT-3-CNF.
We must find f such that if B is a formula in SAT-CNF, then E = f(B) is a formula in SAT-3-CNF and E is satisfiable iff B is satisfiable.  Consider one clause with k literals.

case 1:  k <= 3,  E = B (the clause is kept unchanged).
case 2:  k = 4,  let B = l1 + l2 + l3 + l4; add a new variable u:
            E = (l1 + l2 + u)(/u + l3 + l4)   where / denotes negation.
            B is satisfiable iff u can be assigned so that E is satisfiable.
case 3:  k > 4,  add new variables u1, ..., u_(k-3):
            E = (l1 + l2 + u1)(/u1 + l3 + u2)(/u2 + l4 + u3) . . . (/u_(k-3) + l_(k-1) + l_k)
All clauses can be transformed in the same way.
The number of additional variables is polynomial in the size of the original formula, and the transformation can be done in polynomial time.

example:  B = (p + /q + r + s)(/r + s)(/p + s + /x + v + /w)
E = (p + /q + u1)(/u1 + r + s)(/r + s)(/p + s + u2)
 (/u2 + /x + u3)(/u3 + v + /w)
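
A sketch of the transformation in Python (clauses are represented as lists of literal strings with "/" marking negation, matching the notation above; the fresh variables are named u1, u2, ...):

def to_3cnf(clauses):
    # Transform a CNF formula into an equisatisfiable one with <= 3 literals per clause.
    out, fresh = [], 0
    for clause in clauses:
        if len(clause) <= 3:                        # case 1: keep the clause unchanged
            out.append(list(clause))
            continue
        lits = list(clause)
        fresh += 1
        u = "u%d" % fresh
        out.append([lits[0], lits[1], u])           # (l1 + l2 + u1)
        for lit in lits[2:-2]:                      # middle of the chain
            fresh += 1
            v = "u%d" % fresh
            out.append(["/" + u, lit, v])
            u = v
        out.append(["/" + u, lits[-2], lits[-1]])   # (/u_(k-3) + l_(k-1) + l_k)
    return out

B = [["p", "/q", "r", "s"], ["/r", "s"], ["/p", "s", "/x", "v", "/w"]]
print(to_3cnf(B))     # reproduces the formula E above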

More reductions

The classic six NP-complete problems from Garey and Johnson (1979) are shown below:

SAT <=p 3SAT
3SAT <=p 3DM,  3SAT <=p VC
3DM <=p PARTITION
VC <=p  HC,  VC <=p CLIQUE

where

3SAT is a restricted form of SAT where each clause contains exactly 3 literals.  (in CNF)

3DM is 3-dimensional matching.  Given a set M is-subset-of W x X x Y, where W, X, Y are disjoint sets each containing q elements: does M contain a matching M' is-subset-of M such that |M'| = q and no two elements of M' agree in any coordinate?  It is a generalisation of the 2-dimensional marriage (bipartite matching) problem, which has a polynomial-time algorithm.

VC is vertex cover.  Given a graph G = <V,E> and K <= |V|: is there a vertex cover of size at most K, i.e. a subset V' of V such that |V'| <= K and, for each edge {u,v} is-in E, at least one of u and v belongs to V'?

CLIQUE.  Given a graph G = <V,E> and J <= |V|: does G contain a clique of size J or more, that is, a subset V' of V such that |V'| >= J and every two vertices in V' are joined by an edge in E?  (A clique induces a complete subgraph.)

HC is the Hamilton circuit problem.  Does G contain a Hamilton circuit, that is, an ordering <v1, v2, ..., vn> of V, where n = |V|, such that {vn,v1} is-in E and {vi, vi+1} is-in E for all i, 1 <= i < n?

PARTITION.  Given a finite set A and a size s(a) is-in Z+ for each a is-in A: is there a subset A' of A such that
  (sum over a is-in A') s(a)  =  (sum over a is-in A - A') s(a) ?

Currently there are thousands of problems proven to be NP-complete, and this set keeps growing.

NP-hard

P and NP are defined for decision problems.  To show that a problem X is highly likely to be difficult, it is often enough to prove that Y <=p X for some NP-complete problem Y; X does not have to be in NP.  We then call X NP-hard.  The obvious advantage of the NP-hard notion is that the problem can be more general, such as finding the chromatic number of a graph (COLC, optimal graph colouring).

It is obvious that 3COL <=p COLC (a graph is 3-colourable exactly when its chromatic number is at most 3).  However, COLC is not a decision problem, so it is not in NP as defined here.  Knowing that a problem is NP-hard tells us a lot: it is likely to be very difficult.  Knowing only that a problem is-in NP is not as useful.

Besides the NP-hard problems, there are decision problems that are apparently not in NP.  For example, the "exact colouring" problem: can a graph be coloured with exactly k colours and no fewer?  We can imagine how to verify that a graph is k-colourable; however, it seems unlikely that there is a short certificate that a graph cannot be coloured with fewer than k colours.  Another example is the unique satisfiability problem: is it true that there is exactly one truth assignment that makes a given Boolean formula true?

Picture of complexity space

P is-subset  NP
NPC is-subset NP
NPC intersect P is believed to be empty (this is equivalent to P != NP)

End NP-complete  Last update 21 August 2002