Cooperative Computing for Data Intensive Science

  • Cooperative Computing for Data Intensive Science
    Douglas Thain, University of Notre Dame
    NSF Bridges to Engineering 2020 Conference, 12 March 2008

  • What is Cooperative Computing?
    By combining our computing and storage resources, we can attack problems larger than we could alone.
    I can use your computer when it is idle, and vice versa. (Most computers are idle about 90 percent of the day.)
    Also known as: grid computing, distributed computing, metacomputing, volunteer computing, etc.

  • Who Needs Cooperative Computing?
    Many fields of study rely on simulation and data processing to conduct science: physics, chemistry, biology, engineering, finance, sociology, computer science.

    More Computing == Better Results
    NOT High Performance: speed up one program.
    High Throughput: produce as many results as possible over the next day / week / year.

  • Cooperative Computing Lab
    We design and build distributed systems that help people attack BIG problems.
    We work directly with end users to make sure that our solutions affect the real world.
    We operate a modest computing system as both a production service and a research testbed: currently about 500 CPUs and 300 disks.
    CS research challenges: scalability, robustness, usability, debugging, and performance.

  • What Makes This Challenging?
    The Programming Model: "I want to process 10 TB of data on 100 machines, then distribute it across 20 disks, then view the best results on my workstation."
    Fault Tolerance: something is always broken!
    Performance Robustness: there is always one slowpoke.
    Debugging: "My job runs correctly here but not there...!?"
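One common response to "something is always broken" is to resubmit a task that fails. The Python sketch below shows that idea in its simplest form; it is an illustration under stated assumptions, not the lab's software. `run_with_retries`, `make_flaky_task`, and `TaskFailed` are hypothetical names, and machine failures are simulated by a task that raises a fixed number of times before succeeding.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TaskFailed(Exception):
    """Raised when a task still fails after every allowed attempt."""


def run_with_retries(task: Callable[[], T], max_attempts: int = 3) -> T:
    """Call `task`, retrying after any exception, up to `max_attempts` calls.

    In a real pool each retry would be resubmitted, ideally to a different
    machine; this sketch simply calls the task again in-process.
    """
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return task()
        except Exception as err:  # treat every failure (lost node, full disk, ...) alike
            last_error = err
    raise TaskFailed(f"gave up after {max_attempts} attempts") from last_error


def make_flaky_task(failures_before_success: int) -> Callable[[], str]:
    """Hypothetical task that fails a fixed number of times, then returns "ok"."""
    calls = 0

    def task() -> str:
        nonlocal calls
        calls += 1
        if calls <= failures_before_success:
            raise RuntimeError(f"simulated machine failure #{calls}")
        return "ok"

    return task


if __name__ == "__main__":
    print(run_with_retries(make_flaky_task(2), max_attempts=3))  # prints "ok" (third call succeeds)
    try:
        run_with_retries(make_flaky_task(5), max_attempts=3)
    except TaskFailed as err:
        print(err)  # prints "gave up after 3 attempts"
```

Retrying alone does not address the slowpoke problem: a task that is merely slow never raises, so a real system also needs timeouts or duplicate execution, which this sketch leaves out.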

  • An Example Collaboration:

    Biometrics Research and Distributed Systems

  • A Common Pattern in Biometrics
    Sample workload: 4000 images of 256 KB each, 1 s per call to F; 185 CPU-days in total.

    Future workload: 60000 images of 1 MB each, 0.1 s per call to F; 4166 CPU-days in total.
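The CPU-day figures are consistent with evaluating F once for every ordered pair of images, i.e. n² calls: 4000² × 1 s ≈ 185 CPU-days and 60000² × 0.1 s ≈ 4166.7 CPU-days (the slide truncates this to 4166). The minimal Python sketch below reproduces that arithmetic and adds a sequential reference version of the pattern, assuming the result is a matrix with M[i][j] = F(S[i], S[j]) over one image set S. The n² count is inferred from the slide's numbers rather than stated on it, and `all_pairs` and `cpu_days` are hypothetical names.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")

SECONDS_PER_DAY = 24 * 60 * 60


def all_pairs(f: Callable[[T, T], R], items: Sequence[T]) -> List[List[R]]:
    """Sequential reference version: result[i][j] = f(items[i], items[j])."""
    return [[f(a, b) for b in items] for a in items]


def cpu_days(n_items: int, seconds_per_call: float) -> float:
    """CPU time to fill the full n x n matrix (n * n calls to f), in CPU-days."""
    return n_items * n_items * seconds_per_call / SECONDS_PER_DAY


if __name__ == "__main__":
    # Toy comparison function on numbers; a real F would compare two images.
    print(all_pairs(lambda a, b: abs(a - b), [1, 2, 4, 8]))
    # [[0, 1, 3, 7], [1, 0, 2, 6], [3, 2, 0, 4], [7, 6, 4, 0]]
    print(int(cpu_days(4000, 1.0)))   # 185  (sample workload)
    print(int(cpu_days(60000, 0.1)))  # 4166 (future workload)
```

The quadratic growth is the point: 15 times more images at one tenth the cost per call still means roughly 22 times more total CPU time.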

  • Non-Expert User Using 500 CPUs

  • All Pairs Production System
    Components: web portal; All-Pairs engine; 300 active storage units; 500 CPUs, 40 TB disk.
    1 - Upload F and S into the web portal.
    2 - AllPairs(F, S) is passed to the All-Pairs engine.
    3 - O(log n) distribution by spanning tree.
    4 - Choose optimal partitioning and submit batch jobs.
    5 - Collect and assemble results.
    6 - Return result matrix to user.
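Steps 3 and 4 of the production system can be illustrated with a small Python sketch. The slide does not say which spanning tree or which partitioning model the engine uses, so both functions rest on assumptions: `spanning_tree_schedule` assumes a simple doubling broadcast (each node that already holds the data forwards it to one new node per round, which reaches n nodes in ceil(log2 n) rounds), and `partition_jobs` cuts the result matrix into tiles of a fixed, hypothetical size instead of choosing an optimal one. Both names are hypothetical.

```python
import math
from typing import List, Tuple

Transfer = Tuple[int, int]  # (sender, receiver)


def spanning_tree_schedule(n_nodes: int) -> List[List[Transfer]]:
    """Rounds of copies that spread data from node 0 to nodes 1..n_nodes-1.

    Every node that already holds the data sends it to one new node per
    round, so the number of holders doubles each round and all nodes are
    reached in ceil(log2(n_nodes)) rounds.
    """
    holders = [0]
    next_node = 1
    rounds: List[List[Transfer]] = []
    while next_node < n_nodes:
        transfers: List[Transfer] = []
        for sender in holders:
            if next_node >= n_nodes:
                break
            transfers.append((sender, next_node))
            next_node += 1
        # Receivers only start forwarding in the next round.
        holders.extend(receiver for _, receiver in transfers)
        rounds.append(transfers)
    return rounds


def partition_jobs(n_items: int, block: int) -> List[Tuple[range, range]]:
    """Cut the n x n All-Pairs matrix into block x block tiles (edge tiles may be smaller).

    Each tile (rows, cols) would become one batch job.
    """
    starts = range(0, n_items, block)
    return [
        (range(r, min(r + block, n_items)), range(c, min(c + block, n_items)))
        for r in starts
        for c in starts
    ]


if __name__ == "__main__":
    schedule = spanning_tree_schedule(300)  # 300 nodes, counting the source
    print(len(schedule), math.ceil(math.log2(300)))  # 9 9
    jobs = partition_jobs(4000, 500)  # hypothetical tile size
    print(len(jobs))  # 64
```

With 300 nodes the doubling schedule finishes in 9 rounds, whereas a single source copying to each node in turn would need 299 sequential transfers.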

  • Some Results on Real Workload

  • Collaboration is Where the Interesting Problems Are!
    (Cooperative Computing Provides the Resources)

  • What Makes a Collaboration Work?
    Like a marriage? (Old joke.)
    First, a show of commitment: go after some low-hanging fruit, and publish it.
    A proposal for funding only succeeds if you have already started working together.
    Need very concrete goals: your partner may not share your idea of an interesting tangent.
    Students sometimes need a big push to leave their comfort zone and work together.

  • For more information:
    Douglas Thain, dthain@nd.edu
    Cooperative Computing Lab

    Apply for the Summer 2008 REU.
    Supported by NSF Grants CCF-0621434 and CNS-0643229.