Lower Bounds for Property Testing Luca Trevisan U C Berkeley.
Date posted: 21-Dec-2015
Lower Bounds for Property Testing
Luca Trevisan
U C Berkeley
Sub-linear Time Algorithms
• Want to design algorithms that run in less than linear time
– cannot read the entire input
– must be probabilistic and approximate
• For optimization problems:
– compute a numerical approximation of the optimum cost (and an implicit representation of an approximate solution?)
• For decision problems:
– what is approximation?
Graph Property Testing [GGR]
Testing a property P with accuracy ε
• Given a graph G that has property P
– accept with probability > 3/4
• Given a graph G that is ε-far from property P
– accept with probability < 1/4
ε-far = must change an ε fraction of the representation of G to get property P
Intuition: input (not output) is approximate
Different Representations
• G is represented as an adjacency matrix
– ε-far = must add/remove εn^2 edges
• G has max degree d and is represented using adjacency lists
– ε-far = must add/remove εdn edges
(Some extra subtleties in bounded-degree case)
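To make the two notions of distance concrete, here is a minimal sketch for the property "bipartite", where only edge deletions matter. The function names are hypothetical, and the brute force over all 2-partitions is exponential, so it is only meant for tiny graphs.

```python
from itertools import product

def min_edges_to_bipartite(n, edges):
    """Fewest edge deletions that make the graph bipartite (brute force, tiny n only)."""
    best = len(edges)
    for sides in product((0, 1), repeat=n):
        # Edges with both endpoints on the same side must be deleted.
        violated = sum(1 for u, v in edges if sides[u] == sides[v])
        best = min(best, violated)
    return best

def is_eps_far_dense(n, edges, eps):
    """Adjacency-matrix model: eps-far iff at least eps * n^2 edits are needed."""
    return min_edges_to_bipartite(n, edges) >= eps * n * n

def is_eps_far_sparse(n, d, edges, eps):
    """Bounded-degree model: eps-far iff at least eps * d * n edits are needed."""
    return min_edges_to_bipartite(n, edges) >= eps * d * n

# K4: the best 2-partition is 2+2, which leaves one edge inside each side.
k4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
print(min_edges_to_bipartite(4, k4))  # prints 2
```

The same graph can be far in one model and close in the other, because the normalization (n^2 versus dn) differs.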
Purpose of This Talk
• Discuss algorithms and lower bounds for
– Sub-linear time property testing for some basic graph properties
– Sub-linear time approximation algorithms for some basic optimization problems
(we’ll mostly discuss lower bounds)
Motivations
• Large data sets
– web, Walmart, Amazon, phone calls, . . .
– linear time can still be infeasible
Fine print: most research on property testing focuses on problems having no connection to applications with large data sets
• Goal for theory research
– Develop general algorithmic techniques (like dynamic programming, local search, … for P)
– Develop general techniques for impossibility results (like NP-completeness)
Property Testing and Approximation in Adjacency Matrix Representation
Bipartiteness Algorithm [GGR,AK]
Testing bipartiteness of a given graph G
• Pick (1/ε) polylog(1/ε) vertices and check whether they induce a bipartite graph; if so accept, otherwise reject
• If G is bipartite then the algorithm accepts with probability 1
• If G is ε-far from bipartite, then whp the algorithm discovers an odd cycle (non-trivial to prove)
• Running time: O((1/ε^2) polylog(1/ε)), since the sampled vertices span that many pairs
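The sampling test above can be sketched as follows. This is an illustration under assumptions, not the exact procedure of [GGR, AK]: `has_edge` is a hypothetical adjacency-matrix oracle, the sample size is left to the caller, and the induced subgraph is checked by plain BFS 2-coloring.

```python
import random
from collections import deque

def induces_bipartite(vertices, has_edge):
    """BFS 2-coloring of the subgraph induced by `vertices`, using edge queries."""
    color = {}
    for s in vertices:
        if s in color:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in vertices:
                if v != u and has_edge(u, v):
                    if v not in color:
                        color[v] = 1 - color[u]
                        queue.append(v)
                    elif color[v] == color[u]:
                        return False  # odd cycle inside the sample
    return True

def bipartite_tester_dense(n, has_edge, sample_size, rng=random):
    """Accept iff a random sample of vertices induces a bipartite graph."""
    sample = rng.sample(range(n), min(sample_size, n))
    return induces_bipartite(sample, has_edge)

# A complete bipartite graph (by parity) is always accepted: one-sided error.
print(bipartite_tester_dense(50, lambda u, v: u % 2 != v % 2, 10))  # prints True
# A complete graph is rejected as soon as the sample has three vertices.
print(bipartite_tester_dense(50, lambda u, v: True, 10))  # prints False
```

The check queries every pair in the sample, which is where the quadratic dependence on the sample size comes from.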
Lower Bounds [BT]
• Ω(1/ε^1.5) for adaptive algorithms
• Ω(1/ε^2) for non-adaptive algorithms
• The bounds apply to the ‘query complexity’ of the algorithm (and hence, a fortiori, to its running time)
Proof for one-sided error case
• Pick a random graph with edge probability 3ε
– whp it is ε-far from bipartite
• Consider the view of a (possibly adaptive) algorithm that makes q ‘queries’ and finds an odd cycle whp
– it sees O(εq) edges and O(ε^2 q^2) pairs of connected vertices
– a cycle can be discovered only by querying two vertices in the same connected component
– it takes Ω(1/ε) such attempts
– q = Ω(1/ε^1.5)
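The arithmetic behind the last three lines can be spelled out as below. This is a back-of-the-envelope reading of the slide, not the full argument of [BT]; m denotes the number of edges seen.

```latex
\begin{align*}
m &= O(\varepsilon q)
  && \text{(each query finds an edge with probability } 3\varepsilon\text{)}\\
\#\{\text{pairs in a common component}\} &\le \binom{m+1}{2} = O(\varepsilon^{2} q^{2})
  && \text{(a component with } e \text{ edges has at most } e+1 \text{ vertices)}\\
\#\{\text{such pairs that must be queried}\} &= \Omega(1/\varepsilon)
  && \text{(each one closes a cycle with probability } 3\varepsilon\text{)}\\
\varepsilon^{2} q^{2} = \Omega(1/\varepsilon)
  \;&\Longrightarrow\; q = \Omega\bigl(\varepsilon^{-3/2}\bigr).
\end{align*}
```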
One-sided error non-adaptive
• Pick a random graph with edge probability 3ε
• Consider the view of a non-adaptive algorithm that makes q ‘queries’
• Same as:
– Start with a q-edge graph
– Independently delete each edge with probability 1 - 3ε
• If q = o(1/ε^2) then the view is a forest with probability 1 - o(1)
– Proof: There are at most O(q^(t/2)) cycles of length t
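The counting step can be completed as follows, assuming each queried pair survives as an edge with probability 3ε. One way to get the O(q^(t/2)) count is through closed walks: in a graph with q edges and adjacency eigenvalues λ_i, the number of closed walks of length t is Σ λ_i^t ≤ (Σ λ_i^2)^(t/2) = (2q)^(t/2).

```latex
\begin{align*}
\mathbb{E}\bigl[\#\text{cycles in the view}\bigr]
  &\le \sum_{t \ge 3} (2q)^{t/2}\,(3\varepsilon)^{t}
   = \sum_{t \ge 3} \bigl(18\,\varepsilon^{2} q\bigr)^{t/2}
   = \frac{x^{3/2}}{1 - x^{1/2}},
   \qquad x = 18\,\varepsilon^{2} q .
\end{align*}
```

If q = o(1/ε^2) then x = o(1), the expectation is o(1), and by Markov's inequality the view contains no cycle with probability 1 - o(1).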
Two-Sided Error
• Two distributions:
• Gfar: random graph with edge probability 3ε
• Gbip: first pick a random partition, then each edge crossing the partition exists with probability 6ε
• Distributions indistinguishable by
– Non-adaptive algorithms of query complexity o(1/ε^2)
– Adaptive algorithms of query complexity o(1/ε^1.5)
Both tight for these distributions
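A minimal sketch of samplers for the two distributions, with hypothetical function names. A pair crosses a random partition with probability 1/2, so in both distributions each individual pair is an edge with probability 3ε; only the correlations between pairs differ.

```python
import random

def sample_far(n, eps, rng):
    """Gfar: each pair is an edge independently with probability 3*eps."""
    return {(u, v) for u in range(n) for v in range(u + 1, n)
            if rng.random() < 3 * eps}

def sample_bip(n, eps, rng):
    """Gbip: random 2-partition; each crossing pair is an edge with probability 6*eps."""
    side = [rng.randrange(2) for _ in range(n)]
    edges = {(u, v) for u in range(n) for v in range(u + 1, n)
             if side[u] != side[v] and rng.random() < 6 * eps}
    return edges, side

rng = random.Random(0)
far_edges = sample_far(200, 0.05, rng)
bip_edges, side = sample_bip(200, 0.05, rng)
# Both edge counts concentrate around 3 * eps * (n choose 2).
print(len(far_edges), len(bip_edges))
```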
Generality/Lessons
• Possible lesson: try a random graph as a possible distribution of ‘hard’ instances far from having the property
• Not good for the “triangle-freeness” property, whose complexity is possibly the most interesting open question in the adjacency matrix model.
Triangle-free Graphs
• Want to distinguish triangle-free graphs from graphs where one needs to remove εn^2 edges to break all triangles
• Solvable in time super-exponential in 1/ε
• Polynomial in 1/ε is impossible [Alon]
• 2^poly(1/ε) possible?
• Simplest special case of more general (and important) question
Sublinear Time Approximation
• Max CUT and other graph problems can be approximated within (1+ε) in graphs with at least εn^2 edges in time 2^poly(1/ε) [GGR]
• Max 3SAT can be approximated within (1+ε) in instances with at least εn^3 clauses in time 2^poly(1/ε), and similar results hold for other satisfiability problems [AFKK]
• Lower bounds?
Property Testing and Approximation in Adjacency List Representation
Bipartiteness [GR]
Testing bipartiteness
• Repeat polylog n times:
– Start at a random vertex and take sqrt(n) random walks of length polylog n; if two of them combine to form an odd cycle, reject
• If no repetition rejects, accept
• Analysis:
– in a graph where you need to remove a constant fraction of edges to make it bipartite, the algorithm finds an odd cycle whp
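The random-walk test can be sketched as below. This is a simplified illustration rather than the exact procedure of [GR]: the names are hypothetical, the numbers of repetitions and walks and the walk length are left to the caller, and "two walks combine to form an odd cycle" is approximated by "some vertex is reached from the start at both an even and an odd step", which certifies an odd closed walk.

```python
import random

def odd_cycle_from(start, adj, num_walks, walk_len, rng):
    """True if walks from `start` reach some vertex at both step parities."""
    parities = {start: {0}}
    for _ in range(num_walks):
        v = start
        for step in range(1, walk_len + 1):
            if not adj[v]:
                break  # isolated vertex: nothing to walk on
            v = rng.choice(adj[v])
            seen = parities.setdefault(v, set())
            seen.add(step % 2)
            if len(seen) == 2:
                return True  # even and odd routes to v: odd closed walk
    return False

def bipartite_tester_sparse(adj, repetitions, num_walks, walk_len, rng=random):
    """Accept (True) unless some repetition finds evidence of an odd cycle."""
    for _ in range(repetitions):
        start = rng.randrange(len(adj))
        if odd_cycle_from(start, adj, num_walks, walk_len, rng):
            return False
    return True

even_cycle = [[(i - 1) % 10, (i + 1) % 10] for i in range(10)]
print(bipartite_tester_sparse(even_cycle, 5, 20, 10, random.Random(0)))  # prints True
```

In a bipartite graph every walk from the start reaches a given vertex with a fixed parity, so the sketch never rejects a bipartite graph.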
Matching Lower Bound [GR]
• Define two distributions of graphs:
– Gfar: a random Hamiltonian circuit, plus a random matching (whp 1/100-far from bipartite)
– Gbip: a random Hamiltonian circuit, plus a random matching conditioned on making the graph bipartite
• Gfar and Gbip are indistinguishable to algorithms of query complexity o(sqrt(n)).
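A sketch of how the two distributions could be sampled, assuming n is even and ignoring details such as a matching edge coinciding with a circuit edge. The function names are hypothetical. An even Hamiltonian circuit has a unique 2-coloring (even versus odd positions), so the graph stays bipartite exactly when every matching edge joins an even position to an odd one.

```python
import random

def random_hamiltonian_cycle(n, rng):
    """Random cyclic order of the vertices and the n edges it defines."""
    order = list(range(n))
    rng.shuffle(order)
    edges = [(order[i], order[(i + 1) % n]) for i in range(n)]
    return order, edges

def sample_gfar(n, rng):
    """Random Hamiltonian circuit plus a uniformly random perfect matching."""
    _, cycle = random_hamiltonian_cycle(n, rng)
    perm = list(range(n))
    rng.shuffle(perm)
    matching = [(perm[i], perm[i + 1]) for i in range(0, n, 2)]
    return cycle, matching

def sample_gbip(n, rng):
    """Same, but the matching only joins even circuit positions to odd ones."""
    order, cycle = random_hamiltonian_cycle(n, rng)
    evens, odds = order[0::2], order[1::2]
    rng.shuffle(odds)
    matching = list(zip(evens, odds))
    return cycle, matching, order

cycle, matching, order = sample_gbip(10, random.Random(0))
print(len(cycle), len(matching))  # prints 10 5
```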
Approximation Algorithms
• Minimum spanning tree
– given a connected weighted graph of degree d with weights in the range {1,…,w}, can approximate the MST weight within (1+ε) in time about O(dw/ε^2) [Chazelle, Rubinfeld, T]
• Max SAT
– Given a CNF where every variable occurs at most d times, can approximate the Max SAT optimum within .618, presumably also 2/3, in O(d) time [Hopefully will get 3/4-ε]
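For the minimum spanning tree bullet, one identity that such an estimate can exploit is MST weight = n - w + c_1 + ... + c_(w-1), where c_i is the number of connected components of the subgraph containing only the edges of weight at most i. The sketch below, with hypothetical names, computes the c_i exactly and compares the result with Kruskal's algorithm; a sub-linear algorithm would instead estimate each c_i by sampling, which is not shown.

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.count = n  # current number of components

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False
        self.parent[ru] = rv
        self.count -= 1
        return True

def mst_weight_via_components(n, weighted_edges, w):
    """n - w + sum of c_i for i = 1..w-1 (graph connected, weights in 1..w)."""
    total = n - w
    for i in range(1, w):
        uf = UnionFind(n)
        for u, v, wt in weighted_edges:
            if wt <= i:
                uf.union(u, v)
        total += uf.count
    return total

def mst_weight_kruskal(n, weighted_edges):
    uf = UnionFind(n)
    return sum(wt for u, v, wt in sorted(weighted_edges, key=lambda e: e[2])
               if uf.union(u, v))

edges = [(0, 1, 1), (1, 2, 2), (2, 3, 1), (0, 3, 3), (0, 2, 2)]
print(mst_weight_via_components(4, edges, 3), mst_weight_kruskal(4, edges))  # prints 4 4
```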
Testing 3-Colorability
• NP-hard in adjacency list representation
• Only for small enough ε
– Can find a 3-coloring good for 80% of the edges in a 3-colorable graph using SDP
– NP-hard to find a 3-coloring good for a 98% (?) fraction of edges
• Gives non-tight, and conditional lower bound for query complexity
Other Problems
• Query complexity of the following problems is ‘equivalent’ to the query complexity of testing 3-colorability
– Testing satisfiability of a 3SAT instance
• Every variable occurs in O(1) clauses, “adjacency list” representation
– Approximating max cut, vertex cover, independent set, . . ., in bounded-degree graphs
– Approximating Max SAT, Max 2SAT, . . .
• Lower bound of sqrt(n) for all problems
– Reduction from bipartiteness
Tight Lower Bound [BOT]
• For one-sided error algorithms:
– Ω(n) query complexity to distinguish 3-colorable graphs from graphs that are (1/3 – ε)-far
– Lower bound applies to testing problems that are solvable in polynomial time
• For two-sided error algorithms:
– For some ε, Ω(n) query complexity to distinguish 3-colorable graphs from graphs that are ε-far.
Using Reductions. . .
• Unconditionally, algorithms running in time o(n) cannot:
– Approximate Max 3SAT better than 7/8
– Approximate Max Cut in bounded-degree graphs better than 16/17
– . . .
• Håstad ’97 proved that the above approximation problems are NP-hard
The 3-Coloring Lower Bound
• Consider first one-sided error algorithms
• It’s enough to find a graph G that is (1/3 – ε)-far from 3-colorable, but every subgraph of size < δn is 3-colorable
– (for every ε there is a δ such that . . .)
• Then an algorithm of query complexity < δn either accepts G (which is wrong) or rejects some 3-colorable graph (which means the algorithm does not have one-sided error)
The Graph
• Pick a graph of degree O(1/ε^2) at random (pick that many random matchings)
• Then it is (1/3 – ε)-far whp
• But, for some δ, whp, every subgraph induced by k < δn vertices contains < 1.5k edges
• In a minimal non-3-colorable graph, every vertex has degree at least 3
• Every subgraph induced by < δn vertices is 3-colorable
[Erdős]
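The last three bullets combine as follows; this only spells out the step the slide leaves implicit.

```latex
\begin{align*}
&\text{Suppose some set of fewer than } \delta n \text{ vertices induces a non-3-colorable subgraph.}\\
&\text{Let } H \text{ be a minimal non-3-colorable subgraph of it, on } k < \delta n \text{ vertices.}\\
&\text{Every vertex of } H \text{ has degree} \ge 3
   \;\Longrightarrow\; |E(H)| \;\ge\; \tfrac{3k}{2} = 1.5\,k .\\
&\text{But } E(H) \subseteq E\bigl(G[V(H)]\bigr) \text{ and } \bigl|E\bigl(G[V(H)]\bigr)\bigr| < 1.5\,k ,
   \text{ a contradiction.}
\end{align*}
```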
Derandomization
• For constants d, ε, δ, and for every sufficiently large n, we can explicitly construct a graph
– on n vertices,
– of max degree d,
– ε-far from 3-colorable,
– such that every subset of δn vertices induces a 3-colorable subgraph.
Two-Sided Error Algorithms
• Need to define two distributions of graphs Gcol and Gfar such that
• Graphs in Gcol are (almost) always 3-colorable
• Graphs in Gfar are (almost) always far from 3-colorable
• To an algorithm of bounded query complexity, Gcol and Gfar look (almost) the same
Main Step
• Define two distributions Dsat and Dfar of instances of E3LIN-2 (systems over GF(2) with 3 variables per equation)
– Systems in Dsat are always satisfiable
– Systems in Dfar are (almost) always (1/2-ε)-far from satisfiable
– To an algorithm of bounded query complexity, Dsat and Dfar look the same
• We get Gcol and Gfar using a reduction from approximate E3LIN-2 to approximate 3-coloring
E3LIN-2
X1 + X3 + X10 = 0 mod 2
X2 + X3 + X4 = 1 mod 2
X1 + X2 + X9 = 0 mod 2
. . .
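A small sketch of how such a system can be represented and evaluated, using the three equations shown; the representation (a triple of variable indices plus a right-hand-side bit) and the function name are illustrative choices. A uniformly random assignment satisfies each equation with probability 1/2, which is why (1/2 - ε)-far is essentially the largest distance one can ask for.

```python
def satisfied_fraction(equations, assignment):
    """Fraction of equations x_i + x_j + x_k = b (mod 2) satisfied by `assignment`."""
    ok = sum(1 for (i, j, k), b in equations
             if (assignment[i] + assignment[j] + assignment[k]) % 2 == b)
    return ok / len(equations)

# The three equations listed above.
equations = [((1, 3, 10), 0), ((2, 3, 4), 1), ((1, 2, 9), 0)]

all_zero = {v: 0 for v in range(1, 11)}
fixed = dict(all_zero)
fixed[4] = 1  # flipping X4 repairs the second equation

print(satisfied_fraction(equations, all_zero) < 1.0)  # prints True
print(satisfied_fraction(equations, fixed))  # prints 1.0
```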
Main Building Block
• We show that for every c there is a δ such that there exists a left-hand side with
– n variables, cn equations, 3 variables per equation, every variable occurs in 3c equations
– every δn equations are linearly independent
• Pick the left-hand side at random
– repeat 3c times: pick at random a set of n/3 disjoint triples of variables
• Explicit construction?
– Need strong unique-neighbor expanders
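The random construction can be sketched as below, assuming n is divisible by 3; the names are hypothetical. The helper `gf2_rank` computes the rank over GF(2) of a set of equations, so that linear independence of a small subset can be checked directly; it does not verify the "every δn equations" property, which would require looking at all subsets.

```python
import random

def random_left_hand_side(n, c, rng):
    """3c rounds; each round splits the n variables into n/3 disjoint triples."""
    assert n % 3 == 0
    equations = []
    for _ in range(3 * c):
        perm = list(range(n))
        rng.shuffle(perm)
        equations += [tuple(perm[i:i + 3]) for i in range(0, n, 3)]
    return equations  # c*n triples; every variable occurs in 3c of them

def gf2_rank(triples):
    """Rank over GF(2) of the 0/1 rows with ones at the listed positions."""
    basis = {}  # highest set bit -> reduced row (as an int bitmask)
    for t in triples:
        row = 0
        for x in t:
            row ^= 1 << x
        while row:
            top = row.bit_length() - 1
            if top not in basis:
                basis[top] = row
                break
            row ^= basis[top]  # eliminate the leading bit
    return len(basis)

lhs = random_left_hand_side(12, 2, random.Random(0))
print(len(lhs))  # prints 24
# Four triples in which every variable occurs exactly twice sum to zero.
print(gf2_rank([(0, 1, 2), (0, 3, 4), (1, 3, 5), (2, 4, 5)]))  # prints 3
```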
Distributions
• The left-hand side is always as before
• In Dsat, we pick a random assignment to the variables, and set the right-hand side consistently
– always satisfiable
• In Dfar, we pick the right-hand side uniformly at random
– With high probability, (1/2 – O(1/sqrt c))-far
Indistinguishability
• The two distributions differ only in the right-hand side
• In Dfar, it is uniformly distributed
• In Dsat, it is δn-wise independent
– Linear independence implies statistical independence
• They look the same to an algorithm that sees fewer than δn equations
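The step "linear independence implies statistical independence" can be checked by brute force on a toy instance. In the sketch below (hypothetical names, 6 variables), the right-hand sides induced by a uniformly random assignment are exactly uniform for three linearly independent equations, and visibly non-uniform once a fourth, dependent equation is added.

```python
from itertools import product
from collections import Counter

def rhs_distribution(n, triples):
    """Counts of right-hand-side vectors over all 2^n assignments."""
    counts = Counter()
    for x in product((0, 1), repeat=n):
        counts[tuple(x[i] ^ x[j] ^ x[k] for i, j, k in triples)] += 1
    return counts

independent = [(0, 1, 2), (0, 3, 4), (1, 3, 5)]
dependent = independent + [(2, 4, 5)]  # the four equations sum to zero

uniform = rhs_distribution(6, independent)
print(len(uniform), set(uniform.values()))  # prints 8 {8}

skewed = rhs_distribution(6, dependent)
print(len(skewed))  # prints 8
```

In the second case only 8 of the 16 possible right-hand sides ever occur (those with even parity), so an algorithm that sees all four equations could tell Dsat from Dfar.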
Conclusion of the Argument
• No algorithm of “query complexity” o(n) can distinguish satisfiable instances of E3LIN-2 from instances that are (1/2-ε)-far from satisfiable
• For some ε, no algorithm of query complexity o(n) can distinguish 3-colorable graphs from graphs that are ε-far from 3-colorable.
• No algorithm of query complexity o(n) can approximate Max 3SAT better than 7/8 . . .
Generality/Lessons
• Reductions are useful and extend results to several problems
• In adjacency matrix (dense graph) setting, several and general algorithms. Few and ad-hoc lower bounds
• In adjacency list (sparse graph) setting, vice versa.
Open Questions
• Show that distinguishing 3-colorable graphs from (1/3-ε)-far graphs requires query complexity Ω(n)
– we can only prove it for one-sided error
• Show that approximating Max SAT better than ¾ and Max CUT better than ½ requires query complexity Ω(n)
– we only know Ω(sqrt(n)) [implicit in GR]
– would “explain” why we need SDP