
Art of Multiprocessor Programming

Programming Paradigms for Concurrency

Pavol Černý, Vasu Singh, Thomas Wies

Programming Paradigms for Concurrency

Three parts covering three major paradigms.

1. Classical shared-memory programming (Pavol Černý)
2. Programming with transactional memories (Vasu Singh)
3. Message-passing programming (Thomas Wies)

Administrivia

• Course webpage: http://pub.ist.ac.at/courses/ppc10/
• Feel free to contact the instructors: [email protected]
• Current plan: 6 homework assignments, two per course part
• Class project (much more on this today)
• Grades: 60% course project, 40% homework
• Register at [email protected]


Programming Paradigms for Concurrency

Part I: Shared Memory Programming

Pavol Černý

Mutual Exclusion

Accessing a shared resource (figure: each process moves between the states "no access" and "access"):
• A raised flag means "I am going to use the shared resource".
• A lowered flag means "I am not using the shared resource".

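Mutual Exclusion: Attempt 1

Alice (process 0), flag[0] initially 0:
• no access → test: wait until flag[1] = 0
• test → access: flag[0] ← 1
• access → no access: flag[0] ← 0

Bob (process 1), flag[1] initially 0:
• no access → test: wait until flag[0] = 0
• test → access: flag[1] ← 1
• access → no access: flag[1] ← 0

Boom! Both processes can pass the test before either raises its flag, and then both are in access at the same time.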
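To make the "Boom!" concrete, here is a small Java replay of one bad interleaving. It is a sketch based on our reading of the slide's diagram (test the other flag first, raise your own flag afterwards); the class name Attempt1Trace and the integer state encoding are hypothetical. Each call to step(i) performs one atomic transition of process i, so the schedule {0, 1, 0, 1} means: Alice tests, Bob tests, Alice enters, Bob enters.

```java
import java.util.Arrays;

/** Replays one interleaving of Attempt 1 ("test, then raise the flag"). Hypothetical helper. */
public class Attempt1Trace {
    static final int NO_ACCESS = 0, TEST = 1, ACCESS = 2;

    final int[] flag = new int[2];                   // shared flags, both initially 0
    final int[] state = {NO_ACCESS, NO_ACCESS};      // state[0] = Alice, state[1] = Bob

    /** Performs one atomic transition of process i (0 = Alice, 1 = Bob). */
    void step(int i) {
        int other = 1 - i;
        switch (state[i]) {
            case NO_ACCESS:
                if (flag[other] == 0) state[i] = TEST;   // test: the other flag is lowered
                break;
            case TEST:
                flag[i] = 1;                             // raise own flag ...
                state[i] = ACCESS;                       // ... and enter
                break;
            default:
                break;                                   // stay in access for this trace
        }
    }

    public static void main(String[] args) {
        Attempt1Trace t = new Attempt1Trace();
        for (int p : new int[] {0, 1, 0, 1}) t.step(p);
        System.out.println("states = " + Arrays.toString(t.state));  // prints states = [2, 2]
    }
}
```

Both entries end up as 2 (ACCESS): the two tests happen while both flags are still 0, which is exactly the window the protocol leaves open.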

Mutual Exclusion: Attempt 2

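Alice (process 0), flag[0] initially 0:
• no access → request: flag[0] ← 1
• request → access: wait until flag[1] = 0
• access → no access: flag[0] ← 0

Bob (process 1), flag[1] initially 0:
• no access → request: flag[1] ← 1
• request → access: wait until flag[0] = 0
• access → no access: flag[1] ← 0

Now what?! If both raise their flags before either checks the other's, each waits forever for the other's flag to drop (deadlock).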
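The deadlock can be replayed the same way. In this sketch (again our reading of the diagram; Attempt2Trace is a hypothetical name), step(i) returns false when process i has no enabled transition. After both processes raise their flags, neither can ever move again, because neither flag is lowered while its owner waits.

```java
import java.util.Arrays;

/** Replays one interleaving of Attempt 2 ("raise the flag, then wait"). Hypothetical helper. */
public class Attempt2Trace {
    static final int NO_ACCESS = 0, REQUEST = 1, ACCESS = 2;

    final int[] flag = new int[2];                   // shared flags, both initially 0
    final int[] state = {NO_ACCESS, NO_ACCESS};

    /** One atomic transition of process i; returns false if process i is blocked. */
    boolean step(int i) {
        int other = 1 - i;
        switch (state[i]) {
            case NO_ACCESS:
                flag[i] = 1;                 // request: raise own flag first
                state[i] = REQUEST;
                return true;
            case REQUEST:
                if (flag[other] == 0) {      // enter only if the other flag is lowered
                    state[i] = ACCESS;
                    return true;
                }
                return false;                // blocked: the other flag is raised
            default:
                flag[i] = 0;                 // leave access: lower own flag
                state[i] = NO_ACCESS;
                return true;
        }
    }

    public static void main(String[] args) {
        Attempt2Trace t = new Attempt2Trace();
        t.step(0);                           // Alice raises flag[0]
        t.step(1);                           // Bob raises flag[1]
        boolean aliceMoves = t.step(0);
        boolean bobMoves = t.step(1);
        // prints [1, 1] aliceMoves=false bobMoves=false
        System.out.println(Arrays.toString(t.state) + " aliceMoves=" + aliceMoves + " bobMoves=" + bobMoves);
    }
}
```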

Mutual Exclusion: Attempt 3

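Alice (process 0), flag[0] initially 0:
• no access → request: flag[0] ← 1, then turn ← 1
• request → access: wait until flag[1] = 0 or turn = 0
• access → no access: flag[0] ← 0

Bob (process 1), flag[1] initially 0:
• no access → request: flag[1] ← 1, then turn ← 0
• request → access: wait until flag[0] = 0 or turn = 1
• access → no access: flag[1] ← 0

OK, works! (This is Peterson's algorithm.)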
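A minimal Java sketch of Attempt 3, assuming Java 8 or later. PetersonLock and run are hypothetical names; ids 0 and 1 play Alice and Bob. The flags live in an AtomicIntegerArray and turn is volatile, so every flag and turn access has volatile (sequentially consistent) semantics; with plain int fields, the Java memory model permits reorderings that the correctness argument above does not account for. The run method checks the lock by having two threads increment an unsynchronized counter inside the critical section.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

/** Attempt 3 (Peterson's algorithm) for two threads with ids 0 (Alice) and 1 (Bob). */
public class PetersonLock {
    // flag[i] = 1 means "thread i wants the resource"; each element
    // of an AtomicIntegerArray is read and written with volatile semantics.
    private final AtomicIntegerArray flag = new AtomicIntegerArray(2);
    private volatile int turn;

    public void lock(int me) {
        int other = 1 - me;
        flag.set(me, 1);          // flag[me] <- 1
        turn = other;             // Alice writes turn <- 1, Bob writes turn <- 0
        // Wait until flag[other] = 0 or turn = me.
        while (flag.get(other) == 1 && turn == other) {
            Thread.yield();       // give the other thread a chance to run, even on one core
        }
    }

    public void unlock(int me) {
        flag.set(me, 0);          // flag[me] <- 0
    }

    private static int counter;   // plain field, only touched inside the critical section

    /** Two threads each increment counter 'iterations' times under the lock. */
    public static int run(int iterations) {
        PetersonLock lock = new PetersonLock();
        counter = 0;
        Thread[] threads = new Thread[2];
        for (int id = 0; id < 2; id++) {
            final int me = id;
            threads[id] = new Thread(() -> {
                for (int k = 0; k < iterations; k++) {
                    lock.lock(me);
                    try {
                        counter++;
                    } finally {
                        lock.unlock(me);
                    }
                }
            });
            threads[id].start();
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(run(100_000));   // prints 200000 if no increment was lost
    }
}
```

Busy-waiting with Thread.yield() is a deliberate simplification; it keeps the sketch close to the slide's "wait until" rather than aiming for performance.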

Mutual exclusion: questions to ponder

1. Can we make do with two shared bits (instead of three)?
2. How can one extend this idea to n processes?
3. Does the algorithm work in Java? Run it and see. What is the problem? Where is the fault in our proof?

Schedule

• October 7th: Intro + Projects
• October 14th: Mutual Exclusion
• October 21st: Synchronization primitives: spin-locks, monitors, barriers
• October 28th: Theory of concurrent objects, linearizability
• November 4th: Case studies: linked lists, concurrent hashing; fine-grained locking, lazy and lock-free implementations

How many of you have seen…?

1. Have you written concurrent programs? In Java? In pthreads? …
2. Bakery algorithm?
3. Queue locks?
4. Linearizability? Sequential consistency?
5. compareAndSet?
6. Concurrent hashtables?

Projects

• Topic:
  • good: choose from among our suggestions
  • better: define your own project
• On your own or in groups of two
• Pick a project: before Christmas
• Progress report 1: January 15th (2 pages)
• Presentation and final report: January 27th and February 3rd (final report: 4 pages)

Project 1: Irregular data parallelism

Example: Delaunay mesh refinement (figure: a "cavity", the region of the mesh that one refinement step retriangulates)

Effects of updates are local, but not bounded statically (“irregular”).

Can we still exploit locality for parallelism?


(Figure: locality of effects in mesh retriangulation.)

Project 1: Irregular data parallelism

Lonestar benchmark suite: http://iss.ices.utexas.edu/lonestar/index.html

1. Barnes-Hut N-body simulation
2. Delaunay mesh refinement
3. Focused communities
4. Delaunay triangulation
5. …

Project:
1. Pick one of these applications.
2. Find a good (possibly novel) way of parallelizing it.
3. Implement it by modifying the sequential implementation provided in the Lonestar benchmarks.
4. Confirm the improvement in running time by experiment.
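One common way to exploit locality whose extent is only known at run time is speculative (optimistic) execution over a worklist: a thread takes a task, discovers its neighborhood, tries to lock every node in it, and on any conflict releases its locks and puts the task back. The sketch below is a toy under stated assumptions, not a mesh refinement: the "mesh" is an int array, neighborhood() is a hypothetical stand-in for computing a cavity, and the update simply increments each node in the neighborhood. Because the updates commute, the final array must equal that of a sequential run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Toy speculative worklist with per-node locks (hypothetical example). */
public class SpeculativeWorklist {
    final int[] value;                      // stand-in for mesh elements
    final ReentrantLock[] locks;            // one lock per element
    final AtomicInteger aborts = new AtomicInteger();

    SpeculativeWorklist(int nodes) {
        value = new int[nodes];
        locks = new ReentrantLock[nodes];
        for (int i = 0; i < nodes; i++) locks[i] = new ReentrantLock();
    }

    /** Hypothetical neighborhood ("cavity"): only known once the task runs. */
    int[] neighborhood(int task) {
        int n = value.length;
        return new int[] {task % n, (task * 7 + 3) % n, (task * 13 + 5) % n};
    }

    /** Locks the whole neighborhood, applies the update, and returns false on conflict. */
    boolean tryRun(int task) {
        int[] nb = neighborhood(task);
        List<ReentrantLock> held = new ArrayList<>();
        try {
            for (int node : nb) {
                ReentrantLock l = locks[node];
                if (l.isHeldByCurrentThread()) continue;   // node listed twice
                if (!l.tryLock()) return false;            // conflict with another task
                held.add(l);
            }
            for (int node : nb) value[node]++;             // update stays inside the neighborhood
            return true;
        } finally {
            for (ReentrantLock l : held) l.unlock();       // release on commit and on abort
        }
    }

    void worker(ConcurrentLinkedQueue<Integer> work) {
        Integer task;
        while ((task = work.poll()) != null) {
            if (!tryRun(task)) {
                aborts.incrementAndGet();
                work.add(task);                            // abort: retry the task later
            }
        }
    }

    void run(int tasks, int threads) {
        ConcurrentLinkedQueue<Integer> work = new ConcurrentLinkedQueue<>();
        for (int t = 0; t < tasks; t++) work.add(t);
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> worker(work));
            ts[i].start();
        }
        try {
            for (Thread t : ts) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SpeculativeWorklist w = new SpeculativeWorklist(64);
        w.run(10_000, 4);
        long sum = 0;
        for (int v : w.value) sum += v;
        System.out.println("sum = " + sum + ", aborts = " + w.aborts.get());  // sum is always 30000
    }
}
```

The abort count depends on how often neighborhoods overlap; in a real application, keeping it low relative to useful work is what decides whether the parallelization pays off.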

Project 2: Deductive verification: proving a concurrent data structure correct

1. Pick an implementation of a concurrent data structure: a stack, a queue, a set, …
2. Pick a theorem prover or a verification tool, for example PVS or QED.
3. Prove that the implementation is linearizable.

(Figure: a sorted linked list 3 → 5 → 7 → 9 with two concurrent operations, P1: remove(7) and P2: remove(5).)
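The figure hints at why such proofs are needed: without synchronization, P1 can unlink 7 by setting 5.next to 9 while P2 unlinks 5 by setting 3.next to 7, leaving 3 → 7 → 9, so 7 survives its own removal. Below is a minimal sketch of one candidate data structure for the project, a sorted list set with hand-over-hand (lock-coupling) locking. The class and method names are hypothetical, keys are plain ints, and the two sentinel values Integer.MIN_VALUE and Integer.MAX_VALUE cannot be stored. Because remove(5) must hold the locks on 3 and 5, and remove(7) the locks on 5 and 7, the two unlinks cannot interleave as above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Sorted linked-list set of ints with hand-over-hand (lock-coupling) locking. */
public class FineGrainedListSet {
    private enum Op { ADD, REMOVE, CONTAINS }

    private static final class Node {
        final int key;
        Node next;
        final ReentrantLock lock = new ReentrantLock();
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    // Sentinels: stored keys must lie strictly between MIN_VALUE and MAX_VALUE.
    private final Node head = new Node(Integer.MIN_VALUE, new Node(Integer.MAX_VALUE, null));

    public boolean add(int key)      { return apply(Op.ADD, key); }
    public boolean remove(int key)   { return apply(Op.REMOVE, key); }
    public boolean contains(int key) { return apply(Op.CONTAINS, key); }

    private boolean apply(Op op, int key) {
        head.lock.lock();
        Node pred = head;
        try {
            Node curr = pred.next;
            curr.lock.lock();
            try {
                // Hand-over-hand: lock the next node before releasing the one behind it.
                while (curr.key < key) {
                    pred.lock.unlock();
                    pred = curr;
                    curr = curr.next;
                    curr.lock.lock();
                }
                boolean present = curr.key == key;
                switch (op) {
                    case ADD:
                        if (present) return false;
                        pred.next = new Node(key, curr);
                        return true;
                    case REMOVE:
                        if (present) pred.next = curr.next;   // unlink curr while holding pred and curr
                        return present;
                    default:
                        return present;
                }
            } finally {
                curr.lock.unlock();
            }
        } finally {
            pred.lock.unlock();
        }
    }

    /** Snapshot of the keys; only meaningful when no other thread is modifying the set. */
    List<Integer> toList() {
        List<Integer> keys = new ArrayList<>();
        for (Node n = head.next; n.next != null; n = n.next) keys.add(n.key);
        return keys;
    }

    /** P1 runs remove(a) and P2 runs remove(b) concurrently; returns the remaining keys. */
    static List<Integer> removeConcurrently(List<Integer> initial, int a, int b) {
        FineGrainedListSet set = new FineGrainedListSet();
        for (int k : initial) set.add(k);
        Thread p1 = new Thread(() -> set.remove(a));
        Thread p2 = new Thread(() -> set.remove(b));
        p1.start();
        p2.start();
        try {
            p1.join();
            p2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return set.toList();
    }

    public static void main(String[] args) {
        System.out.println(removeConcurrently(List.of(3, 5, 7, 9), 7, 5));   // prints [3, 9]
    }
}
```

A linearizability proof for this structure would typically pick, for each operation, a linearization point inside the critical section where the last locks are held.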


References:

1. R. Colvin, L. Groves, V. Luchangco, M. Moir. Formal Verification of a Lazy Concurrent List-Based Set Algorithm. CAV 2006.
2. T. Elmas, S. Qadeer, A. Sezgin, O. Subasi, S. Tasiran. Simplifying Linearizability Proofs with Reduction and Abstraction. TACAS 2010.


Project 3: Performance measurement / performance model for concurrent programs

1. Pick a problem with at least three or four different solutions, for example:
   1. lock implementations
   2. data structures: queues, stacks, sets, …
2. Examine the performance of the solutions in different settings:
   1. a small vs. a large number of threads
   2. 2 cores and a small amount of memory (laptop) vs. 8 cores and a large memory/cache (server)
   3. different usage models
   4. input that generates little vs. input that generates lots of contention
3a. Find a hybrid solution that works well in a particular setting, or
3b. Find a performance model that explains the data.
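As a starting point for step 2, here is a deliberately simple timing harness, assuming the problem is a shared counter with three solutions: synchronized, ReentrantLock, and AtomicLong. All class names are hypothetical. It varies only the thread count; memory size, usage models and contention levels would be further parameters. This is not a rigorous benchmark: it ignores JIT warm-up and measurement noise, for which a dedicated harness such as JMH is more reliable.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

/** Times three shared-counter implementations under varying thread counts (hypothetical example). */
public class CounterBench {
    interface Counter { void increment(); long get(); }

    static final class SyncCounter implements Counter {
        private long n;
        public synchronized void increment() { n++; }
        public synchronized long get() { return n; }
    }

    static final class LockCounter implements Counter {
        private final ReentrantLock lock = new ReentrantLock();
        private long n;
        public void increment() { lock.lock(); try { n++; } finally { lock.unlock(); } }
        public long get() { lock.lock(); try { return n; } finally { lock.unlock(); } }
    }

    static final class AtomicCounter implements Counter {
        private final AtomicLong n = new AtomicLong();
        public void increment() { n.incrementAndGet(); }
        public long get() { return n.get(); }
    }

    /** Runs 'threads' threads doing 'perThread' increments each; returns elapsed nanoseconds. */
    static long time(Counter c, int threads, int perThread) {
        Thread[] ts = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> { for (int k = 0; k < perThread; k++) c.increment(); });
            ts[i].start();
        }
        try {
            for (Thread t : ts) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        String[] names = {"synchronized", "ReentrantLock", "AtomicLong"};
        for (int threads : new int[] {1, 2, 4, 8}) {
            Counter[] cs = {new SyncCounter(), new LockCounter(), new AtomicCounter()};
            for (int i = 0; i < cs.length; i++) {
                long ns = time(cs[i], threads, 200_000);
                System.out.printf("%-14s threads=%d  %8.1f ms  count=%d%n",
                        names[i], threads, ns / 1e6, cs[i].get());
            }
        }
    }
}
```

Printing the final count next to each timing is a cheap sanity check that every implementation is still correct under contention.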

Project 4: Your own!