The Need for Speed: Parallel I/O and the New Tick-Tock in Computing
Today’s Presenters
Jon Toigo, Chairman, Data Management Institute
Sushant Rao, Senior Director of Product Marketing, DataCore
Copyright © 2015 by the Data Management Institute, LLC. All Rights Reserved.
Presented By Jon Toigo
Managing Principal, Toigo Partners International; Chairman, Data Management Institute
Ask anyone…
• Application performance, especially in virtual server settings, is becoming a big problem…
• Flash memories and faster disk drives and interconnects seem to be a temporary fix to the problem…
• What can be done? History provides a clue…
History repeats itself
• In the 1890s, Karl Benz, Gottlieb Daimler and Wilhelm Maybach perfected the first automobiles to run on petroleum.
• Other notables, including Frederick William Lanchester and George F. Foss, improved designs…though Foss drove his vehicle for four years, ignoring official warnings of impending arrest for his “mad antics.”
• Foss’s single cylinder engine produced about .024 horsepower and delivered dizzying speeds of up to 24 mph (45 km/h)
The Need for Speed
• By 1906, two, three, four and eight cylinder engines began to appear.
• The V8, invented by Levavasseur in 1902 and first built in 1906 by engineers in Redondo Beach, CA, who called their engine “the Coyote,” leveraged essentially two 4 cylinder engines driving a common crankshaft
• Rolls-Royce first manufactured a V8 engine powered vehicle, but the first mass-produced V8 vehicle came from Cadillac, the Model 51, sporting 70 horsepower…
Buick Model F 2 Cylinder Touring Car
4 cylinder Fiat with 48 horses and 56 mph (90 km/h)
Cadillac Model 51 (1915): First production V8 made 70 horsepower
An example of how embracing parallelism drove performance gains
• Over time, we learned to increase the number of cylinders and to configure them to derive greater power and speed
• 4 cylinder engines produced power strokes every half crankshaft revolution; an 8 cylinder engine every quarter revolution.
• Power strokes are translated to the drive train to increase propulsion
• Crossplane crankshafts were introduced to reduce vibration from multi-engine parallel operations – though racing V8s continue to use a single plane crankshaft for faster acceleration
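The half-revolution and quarter-revolution figures follow from simple arithmetic. A minimal sketch, assuming a four-stroke engine (each cylinder fires once every two crankshaft revolutions) with evenly spaced firing; the function name is illustrative and not from the presentation:

```python
def power_stroke_interval(cylinders: int) -> float:
    """Crankshaft revolutions between successive power strokes.

    Assumption: a four-stroke engine, so each cylinder delivers one
    power stroke per two crankshaft revolutions, and the cylinders
    fire at evenly spaced intervals.
    """
    if cylinders < 1:
        raise ValueError("cylinders must be at least 1")
    return 2.0 / cylinders


# Compare a single-cylinder engine with the 4 and 8 cylinder cases.
for n in (1, 4, 8):
    interval = power_stroke_interval(n)
    print(f"{n} cylinder(s): one power stroke every {interval} revolutions")
```

Doubling the cylinder count halves the gap between power strokes, which is the parallelism point the slide is making.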
Today…
• Chevy Camaro SS creates 455 hp using a 6.2 liter V8; the ZL1 model delivers 184 mph
• Ford Mustang GT does 435 hp with a 5.0 liter V8; the Shelby GT500 version makes 189 mph
• The Challenger SRT Hellcat delivers 707 hp and 199 mph (burning 1.5 gallons of petrol per minute) with a 6.2 liter supercharged V8
• Those are American “muscle cars”
Equal time to the Europeans
• The Alfa Romeo 4C boasts 237 hp from a 1.7 liter turbocharged 4 cylinder engine
• The 2015 Porsche Boxster delivers 265 hp from a 2.7 liter 6 cylinder engine, while the S and GTS produce 315 and 330 hp respectively from a 3.4 liter 6
• Jaguar’s F Type delivers 190 mph and 495 horses from an optional supercharged V8
• The BMW Z4 makes 249 hp from a turbo-charged 2.0 liter 4 cylinder, and sDrive35i models deliver 335 hp from a 3.0 liter twin turbo six cylinder
• Ferrari 458 Italia makes 562 hp with a 4.5 liter V8
• Lamborghini Aventador makes well over 200 mph with a 6.5 liter 12 cylinder engine sporting 691 hp
Okay. We get it…
• Engine designers have been coupling many cylinders into parallel configurations yielding all kinds of expensive go fast cars that few of us can afford…
Mercedes-Benz SLS AMG: 583 hp from 6.2 liter V8, 196 mph, starting at $221,580
Audi R8: 4.2 liter V8 does 430 hp, 5.3 liter V10 does 550 hp, starting at $182,500
Lamborghini Huracan LP610-4: 5.2 liter V10 for 602 hp, 202 mph, starting at $244,000
Porsche 918 Spyder: 4.6 liter V8 makes 608 hp, $845,000 (recently pulled for serial problems)
So it has been with application performance…
• Server virtualization promised application performance and hosting efficiency…
• When it wasn’t delivered, different explanations and solutions were proffered…
  • Servers are overburdened with I/O tasks they don’t need to execute: offload these tasks to array controllers (aka server motherboards attached to disk drives) to free up CPU for more important work
  • Storage is too slow to keep up with CPU: add lots of Flash memory cache, spoofing the problem
  • Storage is too slow to keep up with CPU: ditch “legacy” SAN storage for direct-attached software-defined storage
  • VMs produce random I/O, ultimately randomizing data placement on storage: deploy log structuring to overcome the I/O blender effect
Innovation is good when it does the job…
• But when the problem is performance, shouldn’t we address the causes of performance problems…
• Four basic categories
  • Overloaded CPU
  • Overloaded Memory
  • Storage Latency
  • Network Latency
• Focus has been on storage latency…
But to understand speed problems, sometimes it helps to look at the track…
• The I/O path has lots of interlocking pieces…
STORAGE I/O: Elements that handle the reading and writing of data to physical storage media
RAW I/O: Elements ahead of storage that impact I/O performance
But some performance issues have to do with inefficiencies in the engine…
• The CPU is the engine of the computer
• Like cylinders in an auto engine, chip cores determine the capacity and throughput of the CPU
• As with autos, CPUs have undergone significant change over time…
From the 60s until the early 90s, much work in industry focused on making multiple low power CPUs work together in parallel processing configurations for improved performance…
[Image: First transistorized CPU]
Then came the unicore tick-tock…
• Unicore (uniprocessor) – a computer system with a single central processing unit. All processing tasks share a single CPU
Moore: “Transistors on a die will double every 24 months…”
House: “What he said. Plus, we will see chip clock speeds double every 18 months…”
ALU = Arithmetic Logic Unit
Ramifications…
• Computer designs based on sequential processing and unicore chips drove the PC and server revolution…
• Innovations and speed improvements occurred too fast for the parallel processing engineers to keep pace…leading to some inherently limited thinking…
It worked…until it didn’t
• Two things have happened
  • First, unicore ran out of steam (House’s Hypothesis peaked in 2004)
  • Second, Moore’s Law proceeded and transistor densities continued to double
• Result: Multicore processors that did not demonstrate significant clock speed improvements…
Multicore chips + threading technology enabled many more logical cores…
• Seized upon by Microsoft, VMware and others to do concurrent sequential processing of application code
• But at the expense of I/O efficiency: I/O waits for sequential processing
Multicore chips, meet Parallel I/O
• Parallel I/O is simply the use of a percentage of available logical CPU cores to provide a scalable I/O processing engine…
• Derived from multiprocessor architecture, implemented by DataCore as easily deployed and configured software…
[Diagram: Eight “Logical” Cores]
Allocate a percentage of logical cores to processing I/O exclusively…
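The idea of setting aside a share of logical cores for I/O can be illustrated with a small sketch. This is not DataCore's implementation, which the slides do not describe. The helper names, the 25% share, and the thread pool standing in for dedicated I/O cores are all assumptions made for illustration, and a thread pool does not actually pin work to particular cores.

```python
from concurrent.futures import ThreadPoolExecutor


def io_worker_count(logical_cores: int, io_fraction: float) -> int:
    """How many logical cores to reserve for I/O (hypothetical helper).

    Rounds down, but always reserves at least one I/O worker and always
    leaves at least one core for compute.
    """
    if logical_cores < 2:
        raise ValueError("need at least two logical cores")
    if not 0.0 < io_fraction < 1.0:
        raise ValueError("io_fraction must be between 0 and 1")
    wanted = int(logical_cores * io_fraction)
    return max(1, min(logical_cores - 1, wanted))


def handle_io(request_id: int) -> str:
    """Stand-in for servicing a single I/O request."""
    return f"request {request_id} serviced"


def run_io_pool(logical_cores: int, io_fraction: float, requests: int) -> list:
    """Service the requests with a pool sized to the reserved I/O share."""
    workers = io_worker_count(logical_cores, io_fraction)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in request order, whichever worker ran them.
        return list(pool.map(handle_io, range(requests)))


# Eight logical cores, as in the slide; the 25% I/O share is an assumption.
results = run_io_pool(logical_cores=8, io_fraction=0.25, requests=6)
print(len(results), "requests serviced by", io_worker_count(8, 0.25), "I/O workers")
```

The point of the sketch is only the sizing rule: the I/O engine grows with the number of logical cores rather than being fixed at one.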
And in parallel…
Truth be told, comparatively few virtualized application performance issues are storage related…
• Easy to see when they aren’t
• Despite what hypervisor vendors may say, what you may need is parallel I/O processing to alleviate RAW I/O congestion and to facilitate I/O operational efficiency…
[Diagram: SERVERS show High CPU Processing Cycles (Hot CPU); STORAGE shows Short Storage I/O Queues]
There are other issues that contribute to performance problems…
• The I/O Blender Effect is real and needs to be addressed to prevent randomization of data placement on storage devices
• Interconnects need to be properly configured or virtualized for more efficient allocation
• Storage devices need to be used in a manner appropriate to capabilities and workload requirements…
However, Parallel I/O establishes a new tick-tock in storage that will only improve as cores multiply…
Copyright © 2015 DataCore Software Corp. – All Rights Reserved.
Impact of Parallel I/O on Storage Performance
Sushant Rao, Sr. Director of Product Marketing
Storage (I/O) is the Bottleneck, especially for Virtualized Infrastructure
[Chart: Compute vs Storage Performance, 1990 to 2020, showing the performance gap between compute and storage]
Yearly Performance Gains
• Compute: 26%
• Storage: 2%
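To see how quickly the two yearly rates diverge, the gains can be compounded. A minimal sketch, assuming both rates stay constant and compound annually (the slide gives only the yearly figures); the function name is illustrative:

```python
def relative_gap(years: int, compute_gain: float = 0.26, storage_gain: float = 0.02) -> float:
    """Ratio of compute performance to storage performance after `years`.

    Assumption: both start equal and improve at constant compound
    yearly rates (26% and 2%, the figures quoted on the slide).
    """
    return ((1.0 + compute_gain) / (1.0 + storage_gain)) ** years


for years in (1, 5, 10, 20):
    print(f"after {years:2d} years compute leads storage by {relative_gap(years):.1f}x")
```

Under these assumptions the gap grows roughly eightfold over a decade, which is why a small yearly difference reads as a wide gap on the chart.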
Big Picture: DataCore Parallel I/O Architecture
IO-Starved Virtualized Servers
[Chart: Work Potential, 2000 to 2020. Compute rises with increasingly faster uni-processors, then CPU clock rates slow down and more cores per socket appear. Serial IO falls behind Compute, leaving an IO Gap.]
Serial vs. Parallel Processing
[Diagram: a pile of work divided into Loads 1 to 5. 1 worker (Serial) handles Load 1 through Load 5 one after another; 5 workers (Parallel) handle Loads 1 to 5 at the same time, shortening the time to finish.]
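The worker analogy can be tried out directly. A minimal sketch, assuming each load is I/O-bound waiting simulated with a short sleep; the load duration and function names are illustrative choices, not taken from the presentation:

```python
import time
from concurrent.futures import ThreadPoolExecutor

LOAD_SECONDS = 0.05  # assumed time for one worker to handle one load


def handle_load(load_id: int) -> int:
    """Stand-in for one load of I/O-bound work."""
    time.sleep(LOAD_SECONDS)
    return load_id


def run_serial(loads):
    """One worker handles every load, one after another."""
    start = time.perf_counter()
    done = [handle_load(load) for load in loads]
    return done, time.perf_counter() - start


def run_parallel(loads, workers):
    """Several workers each take a load at the same time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        done = list(pool.map(handle_load, loads))
    return done, time.perf_counter() - start


loads = [1, 2, 3, 4, 5]
serial_done, serial_time = run_serial(loads)
parallel_done, parallel_time = run_parallel(loads, workers=5)
print(f"1 worker:  {serial_time:.2f} s")
print(f"5 workers: {parallel_time:.2f} s")
```

Both runs complete the same five loads; only the time to finish differs, and the exact timings depend on the machine.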
Modern Multi-core CPUs
[Diagram: 10 cores, shown as Worker 1 through Worker 10]
Multiple “workers” capable of simultaneously handling compute, networking and I/O loads
Standard use of Multi-core CPUs in Virtual Servers
[Diagram: VM1 through VM5 run on separate cores (Sequential Computing with Concurrency), a single core handles I/O (Serial I/O), and the remaining cores are idle]
VM = Virtual Machine
Serial I/O Bottleneck in Virtualized Server
[Diagram: the workload arrives at a server where Compute feeds a single I/O core and the other cores are idle]
• Compute waits on I/O
• CPU cores are wasted
• Very little work gets done
Impact: Many servers needed to spread I/O
[Diagram: the workload spread across Server 1 through Server 5]
Turbo-Charge through Parallel I/O
[Diagram: the workload arrives at a server where Compute feeds multiple cores processing I/O in parallel]
• I/O keeps pace with compute demands
• CPU cores are fully used
• Lots of work gets done in very little time
Adaptive Parallel I/O
[Animated chart: Response Time (millisec) and IOPS as the workload increases, reaching 400,000 IOPS at < 1 millisec; the final frames are labeled “No more load”]
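The two figures on this chart, 400,000 IOPS and a response time under 1 millisecond, can be related through Little's Law (average requests in flight = throughput × response time). This is an illustration added here and not a number from the presentation; the function name is hypothetical.

```python
def outstanding_ios(iops: float, response_time_ms: float) -> float:
    """Average number of I/O requests in flight, by Little's Law.

    in flight = arrival rate (per second) * time in system (seconds)
    """
    return iops * response_time_ms / 1000.0


# 400,000 IOPS at the 1 millisecond upper bound quoted on the slide.
print(outstanding_ios(400_000, 1.0))
```

Because the response time is quoted as under 1 millisecond, the result is an upper bound: at most about 400 requests are in flight on average at that throughput.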
DataCore’s Adaptive use of Multi-core CPUs in Virtual Servers
[Diagram: ten cores, Worker 1 through Worker 10. VM1 through VM5 run as Sequential Computing with Concurrency, while the remaining cores process I/O at the same time (Parallel I/O).]
VM = Virtual Machine
PARALLEL I/O \ˈper-ə-ˌlel, ˈpa-rə-, -ləl\ \ˈī-(ˌ)ō\
Translations:
• Work completes in 1/5th* the time
• 2* machines can do the work of 10
*Varies based on number of I/O worker cores
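The starred figures are consistent with five I/O worker cores under ideal scaling. A minimal sketch of that arithmetic, assuming the work is entirely I/O-bound and scales linearly with the number of I/O workers (an idealization; real results will vary, as the footnote says). The function names are illustrative:

```python
import math


def time_fraction(io_workers: int) -> float:
    """Fraction of the serial completion time, assuming linear scaling."""
    if io_workers < 1:
        raise ValueError("io_workers must be at least 1")
    return 1.0 / io_workers


def machines_needed(serial_machines: int, io_workers: int) -> int:
    """Machines needed to match `serial_machines` servers doing serial I/O."""
    if io_workers < 1:
        raise ValueError("io_workers must be at least 1")
    return math.ceil(serial_machines / io_workers)


# Five I/O worker cores is an assumed value chosen to match the slide's figures.
print(time_fraction(5))        # fraction of the original time
print(machines_needed(10, 5))  # machines doing the work of 10
```

Changing the assumed worker count changes both results, which is what the asterisk on the slide signals.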
DataCore Parallel I/O Breakthrough
• Work completes in 1/5 the time
• 2 machines can do work of 10
• 5X lower overall solution cost
• All-inclusive simplicity
  ► Compute & storage services combined
Adaptive Parallel I/O available in 2 Products
SANsymphony-V: Converged Storage Server
• Separate compute and shared storage tiers
• Virtualized and non-virtualized applications
Virtual SAN: Hyper-converged
• Compute & shared storage on same nodes
• Virtualized applications
Free Trial: datacore.com/resources/software-downloads.aspx
DataCore Benefits at a Glance
Surveyed DataCore™ Customers Report Up To:
• 100% reduction in storage-related downtime
• 75% reduction in storage costs
• 4x capacity utilization
• 90% decrease in time spent on routine storage tasks
• 10x performance increase
www.techvalidate.com
Proven. Globally.
• 25,000+ Deployments Worldwide
• 10,000+ Customers
• 10th Gen Product
• Companies in all Industries & Sizes
Market: Software-defined Storage
Technology: Storage Virtualization
Main Offices: Australia, Germany, France, Japan, UK, USA
Request a Live Demo
Schedule a 15-minute live demo with one of our technical consultants at http://info.datacore.com/LiveDemo
Thank You!
www.datacore.com