Barry Wimlett, Technical Specialist, BSc MBCS, Barry@blackmarble.com, blackmarble.com

Page 1:

Barry Wimlett

Technical Specialist

BSc MBCS

Barry@blackmarble.com

blackmarble.com

Page 2:

Concurrent Programming: a lap around what’s new in Visual Studio 2010 and .NET 4.0

Page 3:

Moore’s Law (April 19, 1965)

• Transistor count and computing power double every 2* years for the same cost.

• Law #2: manufacturing plant costs double at the same time.

• http://www.wired.com/thisdayintech/tag/moores-law/

• ftp://download.intel.com/museum/Moores_Law/Articles-Press_Releases/Gordon_Moore_1965_Article.pdf

* often quoted as 18 months; Moore’s 1965 article projected a doubling every year

Page 4:

Some Data

Year  Processor      Transistor Count    MHz
1975  6502                      4,000      1
1979  8086                     30,000      4
1984  286                     134,000     12
1987  386SX                   270,000     20
1988  386DX                   275,000     50
1989  486                   1,200,000     60
1993  P60                   3,100,000     60
1995  PPro                  5,500,000    200
1997  K6-200                8,800,000    200
1997  Pentium2              7,500,000    233
1999  Athlon               22,000,000    600
1999  Pentium3             28,000,000    600
2000  P4                   42,000,000  1,400
2001  Athlon T-Bird        37,000,000  1,000
2004  Athlon64            105,000,000  2,400
2005  Athlon64x2          233,000,000  2,400
2007  Phenom              450,000,000  2,200
2009  PhenomII            758,000,000  2,800
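
As a rough check of the two-year figure against the data on this slide, the implied doubling period can be worked out from the first and last rows (1975 and 2009). This is only a back-of-envelope sketch: it takes the slide's transistor counts at face value and assumes smooth exponential growth between the two end points.

```latex
\[
T_{\text{double}}
  = \frac{t_2 - t_1}{\log_2 (N_2 / N_1)}
  = \frac{2009 - 1975}{\log_2 (758{,}000{,}000 / 4{,}000)}
  = \frac{34}{\log_2 189{,}500}
  \approx \frac{34}{17.5}
  \approx 1.9 \text{ years}
\]
```

So the table's end points are consistent with a doubling roughly every two years.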

Page 5:

Transistor Counts & Clock Speeds

[Charts: transistor count (axis 0 to 800,000,000) and clock speed (axis 0 to 3,000 MHz), plotted by year from 1975 to 2007]

Page 6:

Page 7:

Transistor Count and Clock Speeds – Log10

[Chart: lg transistor count, lg clock speed and number of cores, plotted by year from 1975 to 2009; vertical axis 0 to 10]

Page 8:

“The Honeymoon”

• Wintel: “Grove giveth, and Gates taketh away.”

• A lazy programmer’s dream

Page 9:

Moore’s Law is NOT dead.

• http://www.engadget.com/2010/05/03/nvidia-vp-says-moores-law-is-dead

• http://arstechnica.com/business/news/2010/05/moores-law-is-not-dead-its-merely-pining-for-the-fjords.ars

• Over the last few years clock speeds have flattened out at approx. 3 GHz
  – limited by silicon technology

• Transistor counts continue to increase, but...

• More cores, not faster processors

Page 10:

Problems With More Cores

• It all goes wrong once you leave the package

• “The Flat” metaphor for shared resources.

• Cooperation required, just like development projects: blindly throwing more processors at a problem will not necessarily increase performance, and any increase is rarely linear.

Page 11:

Single Core

[Diagram labels: CPU Core, Cache, Data/Address Bus, Memory I/O]

Page 12:

Multi Core

[Diagram labels: CPU Core ×2, Cache ×3, Data/Address Bus, Memory I/O]

Page 13:

ManyCore

[Diagram labels: CPU Core ×5, Cache ×6, Data/Address Bus, Memory I/O]

Page 14:

Windows 7/Server 2008

• Kernel and I/O contention
  – Mention Windows 7 kernel advances @ 128 cores
  – http://www.osnews.com/story/22501/Microsoft_Kernel_Engineers_Talk_About_Windows_7_s_Kernel

Page 15:

For Windows 7, Microsoft removed several locks that seriously hindered performance - all without breaking a single application. The global dispatcher lock, for instance, is gone completely, and replaced by fine-grained locking which provides 11 types of more specific locks as well as rules on how locks can be obtained so that you no longer run into deadlocks.

The pre-7 dispatcher spent 15% of the CPU time waiting to acquire contended locks. "If you think about it, 15% of the time on a 128-processor system is, more than 15 of these CPUs are pretty much full-time just waiting to acquire contended locks. So we're not getting the most out of this hardware," kernel engineer Arun Kishan explained.

Page 16:

That has obviously changed in modern times, and in Vista, this architecture simply gave in. The statistic Wang gave during the talk was pretty... Disconcerting. "As you went to 128 processors, SQL Server itself had an 88% PFN lock contention rate. Meaning, nearly one out of every two times it tried to get a lock, it had to spin to wait for it... Which is pretty high, and would only get worse as time went on."

The more fine-grained approach in Windows 7 and Windows Server 2008R2 yields some serious performance improvements: on 32-processor configurations, some operations in SQL and other applications perform 15 times faster than on Vista. And remember, the new fine-grained method has been implemented without any application breakage.
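
The idea of swapping one global lock for many narrower ones can be illustrated in user-mode C#. This is only an analogy and not how the Windows kernel is written: the `Histogram` class and its per-bucket lock array are hypothetical names made up for this sketch. Threads that touch different buckets no longer queue behind a single lock.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical example type: one lock per bucket instead of one lock for the whole table.
public sealed class Histogram
{
    private readonly long[] _counts;
    private readonly object[] _locks;

    public Histogram(int buckets)
    {
        _counts = new long[buckets];
        _locks = new object[buckets];
        for (int i = 0; i < buckets; i++) _locks[i] = new object();
    }

    public void Add(int value)
    {
        int bucket = (value & int.MaxValue) % _counts.Length;
        // Only callers that hit the same bucket contend with each other.
        lock (_locks[bucket]) { _counts[bucket]++; }
    }

    public long Total()
    {
        long total = 0;
        for (int i = 0; i < _counts.Length; i++)
        {
            lock (_locks[i]) { total += _counts[i]; }
        }
        return total;
    }
}

public static class FineGrainedLockDemo
{
    public static long Fill(int items, int buckets)
    {
        var histogram = new Histogram(buckets);
        Parallel.For(0, items, i => histogram.Add(i));
        return histogram.Total();
    }

    public static void Main()
    {
        Console.WriteLine(Fill(100000, 16));
    }
}
```

The trade-off is the one the quote hints at: more locks mean less contention, but also more rules about the order in which they may be taken.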

Page 17:

Multi-threading

• Running work in parallel to speed up compute-intensive tasks.

Page 18:

Async Programming

• BeginFirstThing() ... do something else ... EndFirstThing()

• Allows the processor to “do something more useful” while waiting for disk, network or other I/O.

• Helps keep the UI responsive in Windows apps by letting the UI run while I/O is done in the background.
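
A minimal sketch of the Begin / do something else / End shape, using the `BeginRead` and `EndRead` pair on `FileStream`. The temp file and the "other work" loop are placeholders invented for the example, and error handling is left out.

```csharp
using System;
using System.IO;
using System.Text;

public static class BeginEndDemo
{
    // Starts an asynchronous read, does unrelated work, then collects the result.
    public static string ReadWhileWorking(string path)
    {
        var buffer = new byte[4096];
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, 4096, useAsync: true))
        {
            // Begin: kicks off the read and returns straight away.
            IAsyncResult pending = stream.BeginRead(buffer, 0, buffer.Length, null, null);

            // Do something else while the disk is busy (placeholder work).
            long busyWork = 0;
            for (int i = 0; i < 1000; i++) busyWork += i;

            // End: blocks only if the read has not finished yet.
            int bytesRead = stream.EndRead(pending);
            return Encoding.UTF8.GetString(buffer, 0, bytesRead);
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();   // scratch file for the demo
        File.WriteAllText(path, "hello async");
        Console.WriteLine(ReadWhileWorking(path));
        File.Delete(path);
    }
}
```

In a UI application the End call would normally live in a callback rather than on the UI thread, so the window never blocks.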

Page 19:

Problems

• Atomic data access operations, locking

• Deadlocks

• Race conditions

• Order of execution

• All subtle, infrequent, and difficult to detect and replicate
  – attaching a debugger affects how the software behaves

• Not very unit testable.
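
A small sketch of the race-condition problem from the list above: many threads incrementing one shared counter. `total++` is a read, an add and a write, so concurrent updates can be lost, while `Interlocked.Increment` does the same step atomically. The class and method names are made up for the example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class RaceDemo
{
    // Unsynchronised read-modify-write: increments can be lost.
    public static int UnsafeCount(int iterations)
    {
        int total = 0;
        Parallel.For(0, iterations, i => { total++; });
        return total;
    }

    // Interlocked.Increment makes each increment atomic.
    public static int SafeCount(int iterations)
    {
        int total = 0;
        Parallel.For(0, iterations, i => { Interlocked.Increment(ref total); });
        return total;
    }

    public static void Main()
    {
        const int n = 1000000;
        Console.WriteLine("unsafe: " + UnsafeCount(n)); // varies from run to run, never more than n
        Console.WriteLine("safe:   " + SafeCount(n));   // always n
    }
}
```

The unsafe version may well print the right answer on a quiet machine, which is exactly why this class of bug is hard to reproduce.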

Page 20:

User as a bottleneck

• User does not scale out

• Office apps and other heavily interactive software are limited more by the user than by the processor.
  – Think about Excel™ and recalculation of spreadsheets: “JFDI” versus “thinking about it too hard”, a real-time scheduling type of problem

Page 21:

The future

• New languages: F#, Axum

• Probably as big a change as the shift from assembler to C

• Task orientated

• Less imperative: what you want doing, not how you want it doing.
  – Making a cup of tea.

Page 22:

Task-Orientated

• Focus on what we want to achieve, not how to do it.

• Read-only values where possible (immutability, less conflict)

• Using “workflow” and “workflow-like” programming (Azure) to scale out
  – PDC’08
  – Axum @ PDC’09

Page 23:

New for .NET 4 for NOW

• ThreadPool.QueueUserWorkItem is dead, long live Tasks

• Work-stealing queues in the ThreadPool

• AsParallel and PLINQ

• Collections/Bags/Lists/Queues/Locks
  http://msdn.microsoft.com/en-us/library/system.collections.concurrent.aspx

• Rx Extensions: IObservable<T>

• ParallelExtensionsExtras
  http://blogs.msdn.com/pfxteam/archive/2010/04/04/9990342.aspx

Page 24:

Parallel Computing and .NET 4

[Diagram: the parallel computing stack in Visual Studio 2010 and .NET 4]

Layers: Tools, Managed Languages, Managed Libraries, Native Libraries, Managed Concurrency Runtime, Native Concurrency Runtime, Operating System

Components: Visual Studio 2010, Parallel Debugger Windows, Profiler Concurrency Analysis, Microsoft Research, Race Detection, Fuzzing, Axum, Visual F#, DryadLINQ, Rx, Parallel LINQ, Task Parallel Library, Parallel Pattern Library, Async Agents Library, Data Structures, ThreadPool, Task Scheduler, Resource Manager, Threads, UMS Threads, HPC Server

Key: Research / Incubation; Visual Studio 2010 / .NET 4; Windows 7 / Server 2008 R2

Page 25:

ThreadPool in .NET 3.5

[Diagram labels: Program Thread, Global Queue holding Items 1 to 6, two worker threads]

Thread Management: Starvation Detection, Idle Thread Retirement

Page 26:

ThreadPool in .NET 4

[Diagram labels: Program Thread, Lock-Free Global Queue, a Local Work-Stealing Queue for each of Worker Thread 1 to Worker Thread p, Tasks 1 to 6]

Thread Management: Starvation Detection, Idle Thread Retirement, Hill-climbing

Page 27:

Demo Time

“All Hail Murphy!”*

* An appeasement for the mischievous demo gods.

Page 28:

Easy wins with PLINQ

• Uses TPL

• ParallelQuery<T> (IParallelEnumerable<T> in early previews)

• ParallelEnumerable.AsParallel<T>()

• Migration to LINQ is a good first step to parallelisation

• Also Parallel.ForEach

• Choose carefully for best performance, but either is probably better than the alternatives.

• Lots of knobs.

• http://blogs.msdn.com/pfxteam/archive/2010/04/21/9997559.aspx
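
A minimal sketch of the "migrate to LINQ first" point: once the work is a LINQ query, adding `AsParallel()` is often the only change. The prime-counting workload and the names below are invented for illustration, and whether the parallel version is actually faster depends on the data size and the machine.

```csharp
using System;
using System.Linq;

public static class PlinqDemo
{
    // Deliberately naive CPU-bound test so there is something worth parallelising.
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; (long)d * d <= n; d++)
        {
            if (n % d == 0) return false;
        }
        return true;
    }

    public static int CountPrimesSequential(int limit)
    {
        return Enumerable.Range(0, limit).Count(n => IsPrime(n));
    }

    // Same query; AsParallel() hands the work to PLINQ.
    public static int CountPrimesParallel(int limit)
    {
        return Enumerable.Range(0, limit).AsParallel().Count(n => IsPrime(n));
    }

    public static void Main()
    {
        Console.WriteLine(CountPrimesSequential(100)); // 25
        Console.WriteLine(CountPrimesParallel(100));   // 25
    }
}
```

The "knobs" are further calls on the parallel query, such as `AsOrdered()` and `WithDegreeOfParallelism(...)`.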

Page 29:

public static IEnumerable<T> Zipping<T>(IEnumerable<T> a, IEnumerable<T> b)
{
    return a
        .AsParallel()
        .AsOrdered()
        .Select(element => ExpensiveComputation(element))
        .Zip(
            b
                .AsParallel()
                .AsOrdered()
                .Select(element => DifferentExpensiveComputation(element)),
            (a_element, b_element) => Combine(a_element, b_element));
}

Page 30:

public static IEnumerable<T> Zipping<T>(IEnumerable<T> a, IEnumerable<T> b)
{
    var numElements = Math.Min(a.Count(), b.Count());
    var result = new T[numElements];
    // Take(numElements) keeps index within both inputs when a is longer than b.
    Parallel.ForEach(a.Take(numElements), (element, loopstate, index) =>
    {
        var a_element = ExpensiveComputation(element);
        // index is a long; ElementAt expects an int.
        var b_element = DifferentExpensiveComputation(b.ElementAt((int)index));
        result[index] = Combine(a_element, b_element);
    });
    return result;
}

Page 31:

TPL - Task is Your New Best Friend

• ThreadPool.QueueUserWorkItem
  – Great for fire-and-forget
  – But what about…
    • Waiting
    • Canceling
    • Continuing
    • Composing
    • Exceptions
    • Dataflow
    • Debugging
    • …
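
A short sketch of how `Task` covers three of the items above (continuing, exceptions, canceling) using only .NET 4 APIs. The method names and the toy workloads are made up for the example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskDemo
{
    // Waiting and continuing: chain a second step onto a running task.
    public static int ComputeThenDouble(int input)
    {
        Task<int> first = Task.Factory.StartNew(() => input + 1);
        Task<int> second = first.ContinueWith(t => t.Result * 2);
        return second.Result; // blocks until the whole chain has finished
    }

    // Exceptions: a failure inside the task surfaces as an AggregateException.
    public static string RunFaulting()
    {
        Action fail = () => { throw new InvalidOperationException("boom"); };
        Task work = Task.Factory.StartNew(fail);
        try { work.Wait(); return "no error"; }
        catch (AggregateException ex) { return ex.InnerException.Message; }
    }

    // Canceling: cooperative cancellation through a token.
    public static bool RunCancelled()
    {
        var cts = new CancellationTokenSource();
        CancellationToken token = cts.Token;
        Task work = Task.Factory.StartNew(() =>
        {
            while (!token.IsCancellationRequested) Thread.Sleep(1);
            token.ThrowIfCancellationRequested();
        }, token);

        cts.Cancel();
        try { work.Wait(); }
        catch (AggregateException) { /* cancellation is reported here */ }
        return work.IsCanceled;
    }

    public static void Main()
    {
        Console.WriteLine(ComputeThenDouble(20)); // 42
        Console.WriteLine(RunFaulting());         // boom
        Console.WriteLine(RunCancelled());        // True
    }
}
```

None of this is available from `ThreadPool.QueueUserWorkItem` without hand-rolled events and shared state.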

Page 32:

Demo Time

Page 33:

IObserver<T>, IObservable<T>

• Part of the Rx (Reactive Extensions) framework

• Dual of IEnumerable<T>, i.e. push versus pull

• Good for replacing events and for asynchronous I/O programming

• Used in the Silverlight Toolkit for unit testing

• See “BurningMonk”’s article on drag and drop and IObservable<T>.

• cf. ProducerConsumer
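
To make "push versus pull" concrete, here is a hand-rolled sketch that uses only the two interfaces from the base class library, not the Rx library itself. `NumberSource` and `SumObserver` are hypothetical types written for this example, and the source is single-threaded for simplicity.

```csharp
using System;
using System.Collections.Generic;

// Pushes each value to every subscriber.
public sealed class NumberSource : IObservable<int>
{
    private readonly List<IObserver<int>> _observers = new List<IObserver<int>>();

    public IDisposable Subscribe(IObserver<int> observer)
    {
        _observers.Add(observer);
        return new Unsubscriber(_observers, observer);
    }

    public void Publish(IEnumerable<int> values)
    {
        foreach (int value in values)
        {
            foreach (var observer in _observers.ToArray()) observer.OnNext(value);
        }
        foreach (var observer in _observers.ToArray()) observer.OnCompleted();
    }

    // Disposing the subscription removes the observer again.
    private sealed class Unsubscriber : IDisposable
    {
        private readonly List<IObserver<int>> _list;
        private readonly IObserver<int> _observer;

        public Unsubscriber(List<IObserver<int>> list, IObserver<int> observer)
        {
            _list = list;
            _observer = observer;
        }

        public void Dispose() { _list.Remove(_observer); }
    }
}

public sealed class SumObserver : IObserver<int>
{
    public int Sum { get; private set; }
    public bool Completed { get; private set; }

    public void OnNext(int value) { Sum += value; }
    public void OnError(Exception error) { }
    public void OnCompleted() { Completed = true; }
}

public static class ObservableDemo
{
    public static int Run()
    {
        var source = new NumberSource();
        var observer = new SumObserver();
        using (source.Subscribe(observer))
        {
            source.Publish(new[] { 1, 2, 3 }); // the source drives the observer
        }
        return observer.Sum;
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // 6
    }
}
```

With IEnumerable<T> the consumer calls MoveNext to pull the next value; here the producer calls OnNext to push it.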

Page 34:

Code Show and Tell

• GPS example from MSDN

Page 35:

New Sync Primitives in .NET 4

• Thread-safe, scalable collections
  – IProducerConsumerCollection<T>
    • ConcurrentQueue<T>
    • ConcurrentStack<T>
    • ConcurrentBag<T>
  – ConcurrentDictionary<TKey,TValue>

• Phases and work exchange
  – Barrier
  – BlockingCollection<T>
  – CountdownEvent

• Partitioning
  – {Orderable}Partitioner<T>
    • Partitioner.Create

• Exception handling
  – AggregateException

• Initialization
  – Lazy<T>
    • LazyInitializer.EnsureInitialized<T>
  – ThreadLocal<T>

• Locks
  – ManualResetEventSlim
  – SemaphoreSlim
  – SpinLock
  – SpinWait

• Cancellation
  – CancellationToken{Source}

• Public, and used throughout PLINQ and TPL

• Address many of today’s core concurrency issues
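
As one example of these types working together, here is a sketch of a producer feeding several consumers through a bounded `BlockingCollection<T>`, with a `ConcurrentBag<T>` collecting each consumer's partial result. The item counts, capacity and names are arbitrary choices for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ProducerConsumerDemo
{
    public static long SumThroughQueue(int itemCount, int consumerCount)
    {
        // Bounded, so a fast producer blocks instead of growing the queue without limit.
        using (var queue = new BlockingCollection<int>(boundedCapacity: 100))
        {
            var partialSums = new ConcurrentBag<long>();
            var consumers = new Task[consumerCount];

            for (int c = 0; c < consumerCount; c++)
            {
                consumers[c] = Task.Factory.StartNew(() =>
                {
                    long local = 0;
                    // Blocks while the queue is empty; ends after CompleteAdding.
                    foreach (int item in queue.GetConsumingEnumerable()) local += item;
                    partialSums.Add(local);
                }, TaskCreationOptions.LongRunning);
            }

            for (int i = 1; i <= itemCount; i++) queue.Add(i);
            queue.CompleteAdding(); // tells the consumers no more items are coming

            Task.WaitAll(consumers);

            long sum = 0;
            foreach (long partial in partialSums) sum += partial;
            return sum;
        }
    }

    public static void Main()
    {
        Console.WriteLine(SumThroughQueue(1000, 4)); // 500500
    }
}
```

No explicit lock appears anywhere: the collection handles the blocking, signalling and thread safety.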

Page 36:

Parallel Extensions Extras

Useful, but either too specific or not mature enough to properly enter the framework. Built on top of .NET 4.0 objects.

Not fully tested and being augmented continually; feedback welcome.

• LINQ to Tasks
• Task<TResult>.ToObservable
• Additional Task extension methods
• BlockingCollectionExtensions
• StaTaskScheduler
• ConcurrentExclusiveInterleave
• Additional TaskSchedulers
• ReductionVariable<T>
• ObjectPool<T>
• Pipeline
• ParallelDynamicInvoke
• AsyncCache
• More to come...

Page 37:

LINQ for Tasks
http://blogs.msdn.com/pfxteam/archive/2010/04/04/9990343.aspx

• GOAL:

Task<string> result = from x in Task.Factory.StartNew(() => ProduceInt())
                      from y in Task.Factory.StartNew(() => Process(x))
                      select y.ToString();

The LinqToTasks.cs file in ParallelExtensionsExtras provides a set of more complete implementations, covering Select, SelectMany, Where, Join, GroupJoin, GroupBy, OrderBy, and more.
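
A rough sketch of why the GOAL query can compile at all: the C# compiler turns two `from` clauses into a call to a method named `SelectMany`, so supplying that as an extension method on `Task<T>` is enough. This is a simplified illustration and not the LinqToTasks.cs code; it ignores cancellation and scheduler options, and `ProduceInt` and `Process` are stand-ins invented here.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskLinqSketch
{
    // Lets "from x in task select f(x)" compile.
    public static Task<TResult> Select<T, TResult>(
        this Task<T> source, Func<T, TResult> selector)
    {
        return source.ContinueWith(t => selector(t.Result));
    }

    // Lets two "from" clauses chain one task onto the result of another.
    public static Task<TResult> SelectMany<T, TMid, TResult>(
        this Task<T> source,
        Func<T, Task<TMid>> next,
        Func<T, TMid, TResult> resultSelector)
    {
        return source
            .ContinueWith(t => next(t.Result)
                .ContinueWith(m => resultSelector(t.Result, m.Result)))
            .Unwrap(); // flattens Task<Task<TResult>> into Task<TResult>
    }
}

public static class TaskLinqDemo
{
    static int ProduceInt() { return 20; }      // hypothetical stand-in
    static int Process(int x) { return x + 1; } // hypothetical stand-in

    public static Task<string> Run()
    {
        return from x in Task.Factory.StartNew(() => ProduceInt())
               from y in Task.Factory.StartNew(() => Process(x))
               select y.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Run().Result); // 21
    }
}
```

The second task is only created after the first has produced `x`, which is what makes the query a sequence of dependent steps.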

Page 38:

Pipeline

Stage1 → Stage2 → Stage3

Pipeline.Create(rawChunk => Compress(rawChunk))
        .Next(compressedChunk => Encrypt(compressedChunk));

Page 39:

Visual Studio

• Some of the new tooling.

Page 40:

Summary and Links

• Understand the Impact of Low-Lock Techniques in Multithreaded Apps
  http://msdn.microsoft.com/en-us/magazine/cc163715.aspx

• Key Links
  – Parallel Computing Dev Center
    • http://msdn.com/concurrency
  – Code samples
    • http://code.msdn.microsoft.com/ParExtSamples
  – Blogs
    • Managed: http://blogs.msdn.com/pfxteam
    • Tools: http://blogs.msdn.com/visualizeparallel
  – Forums
    • http://social.msdn.microsoft.com/Forums/en-US/category/parallelcomputing
