Advanced Topics NP-complete reports. Continue on NP, parallelism.
Uploaded by kory-harris
Reprise: Non-determinism
• Informal: add to any algorithm
– taking a guess at one or more places
– forking and pursuing one or more possibilities
• If there is a non-deterministic algorithm, then there is a regular/standard algorithm
– just try all the possibilities
– may take a long time
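As a minimal sketch of "just try all the possibilities", the code below replaces the non-deterministic guess with an exhaustive loop. The subset sum problem and the function name `subset_sum_brute_force` are illustrative choices, not taken from the slides.

```python
from itertools import combinations


def subset_sum_brute_force(numbers, target):
    """Return a list of chosen numbers that sums to target, or None.

    A non-deterministic algorithm would "guess" the right subset.
    This deterministic version tries every subset instead, so it may
    examine up to 2**n subsets for n numbers.
    """
    for size in range(len(numbers) + 1):
        for candidate in combinations(numbers, size):
            if sum(candidate) == target:
                return list(candidate)
    return None
```

Each extra number doubles the number of subsets, which is why this approach "may take a long time".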
Reprise: the class P
• … is all problems for which there exists an algorithm whose running time is bounded by a polynomial in the input size.
Reprise: the class NP
• all problems for which there is an algorithm, possibly non-deterministic, whose running time, assuming it takes the right paths, is bounded by a polynomial
• Alternative definition: given a proposed answer, you can check that it is correct in polynomial time.
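The "check the answer" definition can be illustrated with graph coloring, one of the report topics. The sketch below assumes a graph given as a list of edges and a proposed coloring given as a dictionary; the function name `is_valid_coloring` is hypothetical.

```python
def is_valid_coloring(edges, coloring, num_colors):
    """Check a proposed coloring in time linear in the size of the graph.

    edges: list of (u, v) vertex pairs
    coloring: dict mapping each vertex to a color
    num_colors: maximum number of distinct colors allowed
    """
    if len(set(coloring.values())) > num_colors:
        return False
    for u, v in edges:
        # Every endpoint must be colored, and neighbors must differ.
        if u not in coloring or v not in coloring:
            return False
        if coloring[u] == coloring[v]:
            return False
    return True
```

Checking takes one pass over the edges, while finding a valid coloring with few colors is the part with no known polynomial-time algorithm.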
Reprise: does P = NP?
• Is it possible to find actual standard (deterministic) polynomial-time algorithms for these NP problems?
• THE great problem of computer science.
• Proving it false would also be significant.
• Theoretical problem with considerable practical value.
NP complete
• The NP problems into which every other NP problem can be translated in polynomial time. In particular, they can be translated into each other in polynomial time, so…
• If one of the problems can be solved in polynomial time
– aka tractable
• … they all can.
NP-hard
• A problem is NP-hard if there is an NP-complete problem that can be translated into it in polynomial time.
– but not necessarily the other way.
• NP-hard problems are at least as hard as NP-complete problems.
NP-hard example
• Robot path planning in a dynamic environment
Reports on NP-complete problems
• Tetris
• Knapsack problem
• Steiner Tree problem
• Graph coloring
• Minesweeper
• Subset sum problem
Note
• There are fast methods for getting answers to NP-hard problems, but the answers aren't guaranteed to be optimal.
• These methods are called heuristics or approximations
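A small sketch of a heuristic, using the knapsack problem from the report list. It assumes the 0/1 version of the problem and a greedy rule (take items in order of value per unit weight); both the rule and the name `greedy_knapsack` are illustrative assumptions, not something stated in the slides.

```python
def greedy_knapsack(items, capacity):
    """Greedy heuristic for 0/1 knapsack.

    items: list of (value, weight) pairs with weight > 0
    capacity: maximum total weight
    Returns (total_value, chosen_items). Fast, but not guaranteed optimal.
    """
    chosen = []
    total_value = 0
    remaining = capacity
    # Consider the items with the best value-to-weight ratio first.
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if weight <= remaining:
            chosen.append((value, weight))
            total_value += value
            remaining -= weight
    return total_value, chosen
```

For the items (60, 10), (100, 20), (120, 30) and capacity 50, this rule picks the first two items for a value of 160, while taking the last two would give 220. The heuristic runs quickly but misses the optimum.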
Distributed computing
• Approach to NP problems: fork a new process
• That is, use distributed computing to investigate the different choices
• Some problems may be embarrassingly parallel: the work splits into independent pieces that need no communication.
Sources
• Many
• Google: http://code.google.com/edu/parallel/mapreduce-tutorial.html
• Note: there is controversy re: MapReduce
– may be issue of patent
– is it the right framework?
– ??
Concepts
• key/value pair
• Master / Worker
• nodes on network
– may be one Master and many Workers
• hashing: quick way to find data (key/value data)
• piece / partition / split / shard
Example from Google tutorial
• Compute pi using many workers, each doing a calculation using a pseudo-random function.
– no data (NOT a typical MapReduce problem)
• Worker picks a random point in the square. If it is in the circle, the worker increments a counter.
• http://faculty.purchase.edu/jeanine.meyer/processing/piEstimate/applet/
Formulas
• Area_of_circle = pi * r^2
• Area_of_square containing circle = 4 * r^2
• So r^2 = Area_of_square / 4
• Let Ac be Area_of_circle and As be Area_of_square
• Then pi = 4 * Ac / As
• Estimate for pi is 4 * counter / Number_of_points_tried
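The estimate can be sketched for a single worker as follows. The code assumes a circle of radius 1 centered in the square from -1 to 1 on each axis; the function name `estimate_pi` and the `seed` parameter are illustrative.

```python
import random


def estimate_pi(num_points, seed=None):
    """Estimate pi as 4 * counter / num_points.

    Picks num_points pseudo-random points in the square [-1, 1] x [-1, 1]
    and counts how many fall inside the circle of radius 1.
    """
    rng = random.Random(seed)
    counter = 0
    for _ in range(num_points):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:  # the point is inside the circle
            counter += 1
    return 4 * counter / num_points
```

The estimate improves slowly: roughly, halving the typical error takes four times as many points.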
Informal proof
• The chance that a random point in the square lands in the circle equals the ratio of the areas, Ac / As.
• Choosing many points randomly estimates that ratio: counter / Number_of_points_tried approximates Ac / As.
• We could [simply] use for-loops and do the calculation for every point.
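One way to read the for-loop alternative is to test every point on a regular grid rather than random points. The sketch below assumes that reading; the grid of cell centers and the name `estimate_pi_grid` are assumptions made for illustration.

```python
def estimate_pi_grid(points_per_side):
    """Estimate pi by testing the center of every cell in a regular grid.

    The grid covers the square [-1, 1] x [-1, 1] with
    points_per_side * points_per_side cells.
    """
    counter = 0
    for i in range(points_per_side):
        for j in range(points_per_side):
            # Center of cell (i, j); each cell is 2 / points_per_side wide.
            x = -1.0 + (2 * i + 1) / points_per_side
            y = -1.0 + (2 * j + 1) / points_per_side
            if x * x + y * y <= 1.0:
                counter += 1
    return 4 * counter / (points_per_side ** 2)
```

The nested loops are deterministic, so repeated runs give the same estimate, and the rows of the grid could still be split among workers.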
MapReduce
• Model for distributed (aka parallel) computing
• There are different products that implement MapReduce. From a Google search:
– Google
– Apache Hadoop: Open source
– Teradata
– Amazon
– Greenplum
– Platform
MapReduce
• The programmer sets up a program for the Master and for the Workers. Typically, the Master program sets up and partitions the input array(s).
• Typically, data is key/value pairs.
• Programmers write
– Map functions that process data, possibly making use of functions in the MapReduce library
– Reduce functions that combine the results
• Workers work on Map tasks and/or Reduce tasks. The Map task is applied to the worker's piece (aka shard) of the input array.
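A more typical key/value use of the model is counting words. The sketch below imitates the map, group-by-key, and reduce stages inside one ordinary Python process. It does not use any real MapReduce library, and all function names (`map_words`, `group_by_key`, `reduce_counts`, `word_count`) are hypothetical.

```python
from collections import defaultdict


def map_words(shard):
    """Map task: turn one shard (a list of text lines) into (word, 1) pairs."""
    return [(word.lower(), 1) for line in shard for word in line.split()]


def group_by_key(pairs):
    """Collect all values that share a key, as a framework would between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce task: combine all the values for one key."""
    return key, sum(values)


def word_count(shards):
    """Run the map tasks shard by shard, then reduce key by key."""
    intermediate = []
    for shard in shards:  # in a real system, each shard goes to a Worker
        intermediate.extend(map_words(shard))
    grouped = group_by_key(intermediate)
    return dict(reduce_counts(key, values) for key, values in grouped.items())
```

For example, `word_count([["a b a"], ["b c"]])` gives counts of 2 for "a", 2 for "b", and 1 for "c". In a real deployment, the grouping step would typically use hashing to decide which Worker receives each key.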
MapReduce for pi estimate
• Not typical in that there is no data
• The map function does the calculation
• When all done, the reduce function adds up all the individual counters and calculates the estimate for pi
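The division of labor for the pi estimate might look like the sketch below. It simulates the workers one after another in a single process, with no real MapReduce framework involved. The function names and the choice to seed each worker's generator with its id are assumptions for illustration.

```python
import random
from functools import reduce


def map_count_hits(task):
    """Map task: one worker counts how many of its points land in the circle."""
    worker_id, num_points = task
    rng = random.Random(worker_id)  # a separate pseudo-random stream per worker
    hits = 0
    for _ in range(num_points):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def reduce_add(total, hits):
    """Reduce step: add one worker's counter to the running total."""
    return total + hits


def estimate_pi_mapreduce(num_workers, points_per_worker):
    """Simulate the Master: hand out tasks, then combine the counters."""
    tasks = [(worker_id, points_per_worker) for worker_id in range(num_workers)]
    counters = list(map(map_count_hits, tasks))  # would run in parallel on Workers
    total_hits = reduce(reduce_add, counters, 0)
    return 4 * total_hits / (num_workers * points_per_worker)
```

Because the map tasks share no data, the `map` call is the part that a real framework could spread across many machines.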
Speed up for pi estimate
• Suppose
– each trial (getting the 2 random values and determining if in circle) takes K time units
– 1000 workers calculate 1000000 values all together, 1000 values each
– adding 2 numbers takes 1 time unit
• Time without distributed computing: 1000000*K
• Time with distributed computing: 1000*K + 1000
• Speed up is 1000000*K / (1000*K + 1000) = 1000*K / (K + 1), slightly less than 1000 when K is large
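The speed-up arithmetic can be checked with a few lines of code. This sketch follows the slide's simple cost model (the workers run at the same time, then their counters are added one at a time) and ignores communication costs; the function name `speedup` and its parameters are illustrative.

```python
def speedup(total_points, num_workers, k, add_cost=1):
    """Ratio of sequential time to distributed time under the slide's model.

    total_points: number of trials overall
    num_workers: number of workers, each doing an equal share of the trials
    k: time units per trial
    add_cost: time units to add one worker's counter to the total
    """
    sequential_time = total_points * k
    parallel_time = (total_points // num_workers) * k + num_workers * add_cost
    return sequential_time / parallel_time
```

With 1000000 points and 1000 workers the ratio works out to 1000*K / (K + 1), so a larger K brings the speed up closer to 1000.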
Follow-up
• Look up examples using MapReduce
• Note: one example is Google maintaining its keyword index by scanning (crawling) the web
Speaker
Twitter: @kmwinterfield
• IBM Smarter Cities
• Social media for political campaigns
• World Community Grid
Homework
• Prepare question for Kevin
– follow on Twitter and send message OR
– post on Moodle
• Continue with postings
• Research a unique NP-complete problem and post a summary and source!