SAT and CSP competitions & benchmark libraries: some lessons learnt?
Toby Walsh
NICTA & UNSW, Sydney, Australia
What’s the best way to benchmark systems?
Outline
» Benchmark libraries
  » Founding CSPLib.org
» Competitions
  » SAT competition judge
  » TPTP competition judge
  » …
Why?
» Why did I set up CSPLib.org?
  » I needed problems against which to benchmark my latest inference techniques
  » Zebra and random problems don’t cut it!
  » I thought it would help unify and advance the CP community
Random problems
» +ve
  » Easy to generate
  » Hard (if chosen from phase transition)
  » Impossible to cheat
    » If you can solve 1000-variable random 3SAT problems at clause-to-variable ratio l/n=4.2, I’ll be impressed
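A minimal sketch of how such instances might be generated, assuming the usual fixed-clause-length model (each clause picks 3 distinct variables and negates each with probability 1/2). The function names `random_3sat` and `to_dimacs` are illustrative and not from the talk.

```python
import random


def random_3sat(num_vars, ratio=4.2, seed=None):
    """Return a random 3SAT instance as a list of clauses.

    Each clause is a list of 3 non-zero integers: a positive integer v
    stands for variable v, a negative integer for its negation.
    """
    rng = random.Random(seed)
    num_clauses = round(ratio * num_vars)  # l = (l/n) * n
    clauses = []
    for _ in range(num_clauses):
        # Pick 3 distinct variables, then flip a fair coin for each sign.
        variables = rng.sample(range(1, num_vars + 1), 3)
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses


def to_dimacs(num_vars, clauses):
    """Render the instance in DIMACS CNF text form."""
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    lines += [" ".join(str(lit) for lit in clause) + " 0" for clause in clauses]
    return "\n".join(lines)


# 1000 variables at l/n = 4.2, close to the phase transition.
instance = random_3sat(1000, ratio=4.2, seed=0)
dimacs_text = to_dimacs(1000, instance)
```

Fixing the seed makes an instance reproducible, which matters if the same problems are to be handed to several solvers.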
Random problems
» -ve
  » Lack structures found in real world
  » Unrepresentative
    » E.g. random 3SAT problems either have many solutions or none
  » Different methods work well on them
    » Random SAT: forward looking algorithms
    » Industrial SAT: backward looking algorithms
Why?
» Thesis: every mature field has a benchmark library
  » Deduction started in 1960s
    » TPTP set up in 1993
  » SAT started in 1960s
    » SAT DIMACS challenge in 1992
    » SATLib set up in 1999
  » CP started in 1970s
    » CSPLib set up in 1998
Why?
» Thesis: every mature field has a benchmark library
  » Spatial and temporal reasoning started in early 80s (or before?)
  » It’s been approximately 30 years so it’s about time you guys set one up!
Benchmark libraries
» CSPLib.org
  » Over 35k unique visitors
  » Still not everything I’d want it to be
  » But state of the art for experimentation is now much better than it was
    » I haven’t seen a zebra for a very long time
An ideal library
» Desiderata taken from:
  » CSPLib: a benchmark library for constraints, Proc. CP-99
An ideal library
» Location
  » On the web and easy to find
    » TPTP.org
    » CSPLib.org
    » SATLib.org
    » QBFLib.org
    » …
    » http://elib.zib.de/pub/mp-testdata/tsp/tsplib/tsplib.html
    » http://mat.gsia.cmu.edu/COLOR/instances.html
An ideal library
» Easy to use
  » Tools to make benchmarking as painless as possible
    » tptp2X, …
» Diverse
  » To help prevent over-fitting
An ideal library
» Large
  » Growing continuously
  » Again helps to prevent over-fitting
» Extensible
  » To new problems or domains
An ideal library
» Complete
  » One stop for your problems
» Topical
  » For instance, it should report current best solutions found
An ideal library
» Independent
  » Not tied to a particular solver or proprietary input language
» Mix of difficulties
  » Hard and easy problems
  » Solved and open problems
  » With perhaps even a difficulty index?
An ideal library
» Accurate
  » It should be trusted
» Used
  » A valued resource for the community
Problem format
» Lo-tech or hi-tech?
Lo-tech formats
» DIMACS format used in SATLib
c a simple example
p cnf 3 2
1 -1 0
1 2 3 0
This represents: (x or -x) and (x or y or z)
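To illustrate how little machinery the format needs, here is a minimal reader sketch. It assumes plain DIMACS CNF only (comment lines starting with `c`, one `p cnf` header, clauses terminated by `0`); the name `parse_dimacs` is illustrative.

```python
def parse_dimacs(text):
    """Parse DIMACS CNF text into (num_vars, clauses)."""
    num_vars = None
    clauses = []
    current = []  # literals of the clause being read
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("c"):
            continue  # skip blank lines and comments
        if line.startswith("p"):
            _, fmt, nv, _nc = line.split()
            if fmt != "cnf":
                raise ValueError(f"unsupported format: {fmt}")
            num_vars = int(nv)
            continue
        for token in line.split():
            literal = int(token)
            if literal == 0:
                clauses.append(current)  # 0 terminates a clause
                current = []
            else:
                current.append(literal)
    return num_vars, clauses


EXAMPLE = """c a simple example
p cnf 3 2
1 -1 0
1 2 3 0
"""

num_vars, clauses = parse_dimacs(EXAMPLE)
```

Here variables 1, 2 and 3 play the roles of x, y and z, and a minus sign marks negation.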
Lo-tech formats
» DIMACS format used in SATLib
  » +ve
    » All programming languages can read integers!
    » Small amount of extensibility built in (e.g. QBF)
  » -ve
    » Larger extensions are problematic (e.g. beyond CNF to arbitrary Boolean circuits)
Hi-tech formats
» CP competition
<instance>
  <presentation name="4-queens" description="This problem involves placing 4 queens on a chessboard" nbSolutions="at least 1" format="XCSP1.1 (XML CSP Representation 1.1)" />
  <domains nbDomains="1">
    <domain name="dom0" nbValues="4" values="1..4" />
  </domains>
  <variables nbVariables="4">
    <variable name="X0" domain="dom0"/>
    …
  </variables>
  <relations nbRelations="3">
    <relation name="rel0" domain="dom0 dom0" nbConflicts="10" conflicts="(1,1)(1,2)(2,1)(2,2)(2,3)(3,2)(3,3)(3,4)(4,3)(4,4)" />
    …
  </relations>
  <constraints nbConstraints="6">
    <constraint name="C0" scope="X0 X1" relation="rel0"/>
    …
Hi-tech formats
» XML
  » +ve
    » Easy to extend
    » Parsing tools can be provided
  » -ve
    » Complex and verbose
    » Computers can parse terse structures easily
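As a rough sketch of what a provided parsing tool could look like, the snippet below reads a well-formed copy of the `rel0` relation from the 4-queens instance using Python's standard `xml.etree.ElementTree`. The helper name `parse_relation` is an assumption, and it covers only the `conflicts` attribute shown on the slide, not the full XCSP format.

```python
import re
import xml.etree.ElementTree as ET

# Well-formed copy of the single relation element from the 4-queens example.
RELATION_XML = (
    '<relation name="rel0" domain="dom0 dom0" nbConflicts="10" '
    'conflicts="(1,1)(1,2)(2,1)(2,2)(2,3)(3,2)(3,3)(3,4)(4,3)(4,4)" />'
)


def parse_relation(relation_xml):
    """Return (name, conflict_tuples) for one <relation> element."""
    elem = ET.fromstring(relation_xml)
    # Each "(a,b)" group in the attribute is one forbidden pair of values.
    groups = re.findall(r"\(([^)]*)\)", elem.get("conflicts"))
    tuples = [tuple(int(v) for v in group.split(",")) for group in groups]
    declared = int(elem.get("nbConflicts"))
    if declared != len(tuples):
        raise ValueError(f"nbConflicts={declared} but found {len(tuples)} tuples")
    return elem.get("name"), tuples


name, conflicts = parse_relation(RELATION_XML)
```

The XML layer is handled by the library; the tuple list inside the attribute still needs its own small parser, which is part of the verbosity trade-off noted above.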
No-tech formats
» CSPLib
  » Problems are specified in natural language
  » No agreement at that time on an input language
  » One focus was on how you model a problem
  » Today there is more consensus on modelling languages like Zinc
No-tech formats
» CSPLib
  » Problems are specified in natural language
  » But you can still provide in one place
    » Input data
    » Results
    » Code
    » Parsers …
Getting problems
» Submit them yourself
  » Initially, you must do this so the library has some critical mass the first time people look at it
  » But it becomes tiresome and unrepresentative to do so continually
» Ask at every talk
  » Tried for several years but it (almost) never worked
Getting problems
» Need some incentive
  » Offer money?
  » Price of entry for the competition?
  » If you have a competition, users will submit problems that their solver is good at?
Competitions
Libraries + Competitions
» You can have a library without a competition
» But you can’t have a competition without a library
Libraries + Competitions
» Libraries then competition
  » TPTP then CASC
  » Easy and safe!
» Libraries and competition
  » Planning
  » RoboCup
  » …
Increasing complexity
» Constraints
  » 1st year, binary extensional
  » 2nd year, limited number of globals
  » 3rd year, unlimited
» Planning
  » Increasing complexity
  » Time, metrics, uncertainty, …
Benefits
» Gets ideas implemented
» Rewards engineering
  » Progress needs both science and engineering!
» Puts it all together
Benefits
» Gives greater importance to important low-level issues
  » In SAT:
    » Watched literals
    » VSIDS
    » …
Benefits
» Witness the progress in SAT
  » 1985, 10s vars
  » 1995, 100s vars
  » 2005, 1000s vars
  » …
  » Not just Moore’s law at play!
Pitfalls
» Competitions require lots of work
  » Organizers get limited (academic) reward
  » One solution is to also organize competition special issues
Pitfalls
» Competitions encourage incremental improvements
  » Don’t have them too often!
» You may discover a local minimum
  » E.g. MDPs for speech recognition
  » Give out best new solver prize?
The Chaff story
» Industrial problems, SAT & UNSAT instances
  » 2008, 1st MiniSAT (son of zChaff)
  » 2007, 1st RSAT (son of MiniSAT)
  » 2006, 1st MiniSAT
  » 2005, 1st SatELite GTI (MiniSAT+preprocessor)
  » 2004, 1st zChaff (Forklift from 2003 was better)
  » 2003, 1st Forklift
  » 2002, 1st zChaff
Other issues
» Man-power
  » Organizers
    » One is not enough?
  » Judges
    » All rules need interpretation
» Compute-power
  » Find a friendly cluster
Other issues
» Multiple tracks
  » SAT/UNSAT
  » Random/industrial/crafted
  » …
  » Certificate/Uncertificated
Other issues
» Holding problems back if possible
  » Release some problems so competitors can ensure solver compliance
  » But hold most back so competition is blind!
Other issues
» Multiple phases
  » Too many solvers for all to compete with long timeouts
  » First phase to test correctness
  » Second phase to throw out the slow solvers (who cost you many timeouts)
  » Third phase to differentiate between better solvers
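The first two phases can be sketched as a simple filter. Everything here is hypothetical: the entry records, the field names `wrong_answers` and `solved_short`, and the threshold are made up for illustration, and real competitions define their own rules.

```python
def select_finalists(entries, min_solved_short):
    """Return the solvers that survive phases 1 and 2, sorted by name.

    entries maps a solver name to a dict with:
      "wrong_answers": number of incorrect results in the correctness phase
      "solved_short":  number of problems solved under the short timeout
    """
    # Phase 1: correctness. Any wrong answer disqualifies the solver.
    correct = {
        name: entry
        for name, entry in entries.items()
        if entry["wrong_answers"] == 0
    }
    # Phase 2: short timeouts. Drop solvers that solve too few problems,
    # since they would cost many long timeouts in the final phase.
    return sorted(
        name
        for name, entry in correct.items()
        if entry["solved_short"] >= min_solved_short
    )


# Hypothetical results for three entrants.
entries = {
    "solverA": {"wrong_answers": 0, "solved_short": 40},
    "solverB": {"wrong_answers": 1, "solved_short": 55},
    "solverC": {"wrong_answers": 0, "solved_short": 5},
}

finalists = select_finalists(entries, min_solved_short=20)
```

Only the finalists would then be run with long timeouts in the third phase.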
Other issues
» Reward function
  » <#completed, average time, …>
  » solution purse + speed purse
    » Points for each problem divided between those solvers that solve it
» Getting buy-in from competitors
  » It will (and should) evolve over time!
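A minimal sketch of the "points divided between those solvers that solve it" idea, assuming an equal split of a fixed purse per problem. The purse size, solver names and results below are hypothetical, and the speed purse is not modelled.

```python
from collections import defaultdict


def solution_purse(results, purse_per_problem):
    """Split each problem's purse equally among the solvers that solved it.

    results maps a problem name to the set of solvers that solved it.
    """
    scores = defaultdict(float)
    for solvers in results.values():
        if not solvers:
            continue  # nobody solved it, so nobody is paid
        share = purse_per_problem / len(solvers)
        for solver in solvers:
            scores[solver] += share
    return dict(scores)


# Hypothetical outcome for four problems and three solvers.
results = {
    "p1": {"A", "B", "C"},  # easy: each solver gets a third
    "p2": {"A"},            # only A solves it: A takes the whole purse
    "p3": set(),            # unsolved
    "p4": {"A", "B"},
}

scores = solution_purse(results, purse_per_problem=6.0)
```

Under this scheme a problem that few solvers crack is worth more to each of them, so the ranking rewards solving hard instances and not just many easy ones.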
Other issues
» Prizes
  » Give out many!
  » Good for people’s CVs
  » Good motivator for future years
Other issues
» Open or closed source?
  » Open to share progress
  » Closed to get the best
» Last year’s winner
  » Condition of entry
  » To see progress is being made!
Other issues
» Smallest unsolved problem
  » Give a prize!
» Timing
  » Run during the conference
  » Creates a buzz so people enter next year
  » Get a slot in program to discuss results
  » Get a slot in banquet to give out prizes
Conclusions
» Benchmark libraries
  » When an area is several decades old, why wouldn’t you have one?
» Competitions
  » Designed well, held not too frequently, & with buy-in from the community, why wouldn’t you?
Questions
» Disagreements
» Other opinions
» Different experiences
» …