Hardware Support for Spin Management in Overcommitted Virtual Machines

Philip Wells Koushik ChakrabortyGurindar Sohi

{pwells, kchak, sohi}@cs.wisc.edu

PACT, Sept. 2006Wells, Chakraborty & Sohi. Hardware Support for Spin Management... 2

Paper overview

• Want to run unmodified OSs in overcommitted VM– But OS will spin excessively

• Propose hardware spin detection to mitigate – Allows more flexible scheduling– Enables other applications

• Server consolidation case study– Improve throughput & performance isolation

Talk outline

• Background & motivation– Overcommitted VM: What & why?

• Problem with overcommitted VMs• Hardware spin detection• Case study• Summary

Background: VMMs

• Virtual Machine Monitors (VMMs)– Translate interface exposed by hardware into interface

expected by OS

• Focus on pure virtualization– No modifications to OS (e.g. VMWare)

• Focus on processor virtualization– Mapping virtual processors to physical processors

Background: Processor virtualization

Guest OS A

Guest OS B

VCPU1 VCPU0VCPU0 VCPU1

Physical Processors

Guest OS A Guest OS B

PCPU0 PCPU1

Virtual Processors

• VMM exposes more VCPUs than PCPUs– Machine is overcommitted

– How to map VCPUs to PCPUs?

Guest VMs

• Gang scheduling (or co-scheduling)– All VCPUs are running, or none are

– Ensures environment similar to non-virtualized

– Used, e.g., by VMWare, Cellular Disco But not flexible

Guest OS A

Guest OS B

Virtual Processors

Guest VMs

Physical ProcessorsPCPU0 PCPU1

VMM PausedRunning

• Desire to restrict available PCPUs– Individual guest VM is overcommitted

– Server consolidation case study later…

Guest OS A

Guest OS B

Virtual Processors

Guest VMs

VMM Paused RunningRunning

PCPU3PCPU0 Physical Processors

Guest OSGuest VM

Virtual Processors

PCPU2PCPU1

• Desire to restrict available PCPUs– Thermal management – without OS support

[e.g. Heat & Run, Powell, ASPLOS ’04]

– Dynamic specialization[e.g. Computation Spreading, Chakraborty, ASPLOS ’06]

Gang scheduling infeasible

VCPU3VCPU0 VCPU2VCPU1

Talk outline

• Background & motivation• Problem with overcommitted VMs

– Spin overhead: How much and why?

• Hardware spin detection• Case study• Summary

So, what’s the problem? Spin!

• Multiprocessor OSs make assumption: All VCPUs are always executing

– Clearly not possible in overcommitted environment– Causes severe performance problems with

synchronization

OS spin: Methodology

• Highly overcommitted for illustration– 24 VCPUs, 8 PCPUs

– Simics full-system simulation running Solaris & Linux (SPARC)

– VCPU is given a timeslice on PCPU

– Share physical processor, TLB arrays, caches

– Assume HW mechanism for switching VCPU state

SW Threads

Physical Processors

PCPU0 PCPU1 PCPU2 PCPU3

V0V1V2 V6V7V8V3V4V5 V9 10 11 Virtual Processors

OS spin: Overhead

• 1-4X more instructions– Same amount of work

Spinning!

– Longer slices worse

– User instr stable

– Solaris spins more than Linux

Solaris Linux

Timeslice:

OS spin: Where does it come from?

• OS mutex locks– Requester often spins on held lock– Especially if OS thinks lock holder is currently executing

• Frequent cross-calls (software interrupts)– TLB shootdowns & scheduling– Initiator blocks until recipient processes interrupt– Much more frequent in Solaris than Linux

• Other workloads have user spin and idle loop

Propose hardware spin detection to mitigate

Talk outline

• Background & motivation• OS spin overhead• Hardware spin detection

– Spin Detection Buffer (SDB)

• Case study• Summary

Hardware spin detection

• Observation:A program that’s not performing useful work makes few changes to program state

• Use hardware to detect changes to state– Requiring no changes misses cases of spinning

• Or even temporally silent changes…

– Allowing too many changes causes false positives• Performance (not correctness) issues if too frequent

• No software modifications

Hardware spin detection cont…

• Proposed heuristic takes the middle ground– Observe < 8 unique stores in 1k commits spin– Uniqueness defined by address & data– Works very well for OS

• But, user programs search– Register allocated index no stores– Also check for < 8 unique loads in user code– Avoids false positives from user code

Spin Detection Buffer (SDB)

• Implement heuristic with two 8-entry CAMs– After ST (+LD in user) commits: Search CAM; insert if unique– Check for less than 8 entries every 1k instr.– Off critical path, has low B/W requirements– Low activity

St [A], 0x10

Commit

St [B], 0x10Ld [A]Ld [C]Ld [B]

Unique St?

User mode &Unique Ld?

456Incr.

=1024?

Spin Detection Buffer: Effectiveness

Solaris Linux

Timeslice:

No spin detection

• Using SDB– No undetected

spins (that we are aware)

– Very few false positives

Spin Detection Buffer: Related work

• “Safe-points,”– [Uhlig, et al., VM ‘04]

• Spin Detection Hardware (not cited in paper)– Li, Lebeck & Sorin [IEEE Trans. Par. Dist. Sys., June ‘06]

HardwareFalsePos

UserSpin

OSMutex

IdleLoop

Safe Pts

Li, et al.

Crosscalls

SDB Some Few

• But, overcommitting individual VM was not goal of other work

Talk outline

• Background & motivation• OS spin overhead• Hardware spin detection• Case study

– Server consolidation

• Summary

Case study: Consolidated servers

• Run multiple services on one physical server– Better utilize physical servers– Centralize data center

• Minimal changes to guest OSs and configs– Pure virtualization (e.g., VMWare ESX Server)

Guest OS A

Guest OS B

VCPU0 VCPU0

Consolidated servers cont…

• Partition PCPUs among guest OSs– Good cache locality, performance isolation & response latency

• Or, overcommit VMM and gang schedule– Allows guest VMs to handle bursts in demand

VCPU1 VCPU1 Virtual Processors

PausedRunning

Guest VMs

VMM Running

Consolidated servers cont…

• Use SDB for more flexibility for a variety of scenarios– E.g. partition PCPUs among VMs

Guest OS A

Guest OS B

Virtual Processors

Guest VMs

VMM Paused RunningRunning

• Two workloads consolidated into one checkpoint– 16 VCPUs, 8 PCPUs– Do not share any physical memory– Share physical processor, TLB arrays, caches– Idealized software VMM

• No overhead from virtualizing memory, I/O

Consolidated servers: Methodology

Consolidated servers: Locality

• 16 VCPUs 100% utilized– 10ms has better locality

than 100μs

– SDB allows throughput of 10ms & response latency of 100us

• 16 VCPUs ~50% utilized– Expected case

– SDB avoids wasting resources on idle VCPUs

Summary

• Many reasons to overcommit a guest VM – But unmodified OS will spin

• Spin Detection Buffer (SDB)– Hardware technique to detect useless ‘work’– Performs much better than other proposals

• Consolidated server case study– SDB allows more flexibility than gang scheduling– Can optimize for cache locality, performance isolation,

Backup slides

Workloads

• Multithreaded (Solaris & Linux)– Apache (Solaris) – 15k trans– Apache (Linux) – 30k trans– pmake (Solaris & Linux) – 1.5B user instr – Zeus (Solaris) – 7500 trans– OLTP (Solaris) – 750 trans – pgbench (Solaris) – 3000 trans– Spec2000 Mix (Linux) – 24 at once, 500M cycles

Methodology

• Simics full system simulation– Commercial workloads on Solaris 9 & Linux 2.6.10– Multiple trials, avg w/ 95% C.I. on runtime graphs

• Each Core– In-order, idealized, 1 GHz– Private L2s: 4-way, 15 cycle, 512k

• Shared– Exclusive L3: 8MB, 16-way, 55 cyc. load-to-use, 8 banks

Safe points comparison

Solaris Linux

Consolidated servers: Isolation

Hardware Support for Spin Management in Overcommitted Virtual Machines

Documents

Transcript of Hardware Support for Spin Management in Overcommitted Virtual Machines

The Impact of an Organization’s Culture towards Employees’ … · 2020. 7. 29. · dilemma is that frontline workers are motionlessly untrained, paid at low wages, and overcommitted

From Virtualy Constrained to Really Overcommitted€¦ · From VIRTUALly Constrained to REALly overcommitted Adrian Burke DB2 SWAT team SVL agburke@us.ibm.com. Agenda • Storage

SPIN STATES AND SPIN-ORBIT COUPLING IN NANOSTRUCTURES · SPIN STATES AND SPIN-ORBIT COUPLING IN ... SPIN STATES AND SPIN-ORBIT COUPLING IN NANOSTRUCTURES ... At zero magnetic eld

Spin Spin Spin

Chapter XX Spin Structures and Spin Wave Excitations · Spin Structures and Spin Wave Excitations Q1 ... 2 Magnetism and Spin 1 ... in the framework of spin hydrody-namics (Andreev,

SpinSpin--spin coupling analysis spin coupling analysisfile.upi.edu/Direktori/FPMIPA/JUR._PEND._FISIKA... · 2012. 3. 8. · SpinSpin--spin coupling (…) spin coupling (…) •

Spin-Spin and Spin-Orbit coupling studies of small species and …kth.diva-portal.org/smash/get/diva2:309923/FULLTEXT01.pdf · 2010-04-09 · Spin-Spin and Spin-Orbit coupling studies

Designing electron spin textures and spin interferometers ...

DTS: Trendwatch/Minister to the Overworked Smart Girlaspire2.com/wp-content/uploads/2014/11/TrendwatchMinister.pdfRx for the Highly Caffeinated, Tech-Savvy, Overcommitted Woman: Trendwatch

Technology A New ‘Spin’phaelosopher.files.wordpress.com/2015/03/...an even more overlooked, and under-estimated factor; its rate of molecular spin. Hydration Machines™ Rainmaker

Carbon-hydrogen Spin-spin Coupling Constants

Energy efficient spin-locking in multi-core machines

New spin on smart at Titas - UMORFIL · WSA New spin on smart at Titas company will be the first one to dye woven fabrics with supercritical CO2. “Dyecoo machines currently represent

Spin-Mediated Transport in Superconducting and Spin ...

Spin-spin coupling in NMR

Competitive Product - Case Medical, Inc. · 2017-07-17 · Competitive Spin Price Weight ... - Infant incubators, cribs and warmers - Anesthesia machines and respiratory therapy equipment

Spin Diffusion NMR For Distance Determination Spin Diffusion Methods in SSNMR 1H-driven X-spin isotropic spin diffusion: Direct 1H spin diffusion: • With 1H evolution and X-spin

Tips and Tools Tool Time 8. Points to Remember Take a break from precepting when you are overcommitted and stressed. 8-2.

Lateral Spin Transport (Diffusive Spin Current)

Spin Riveting Machines by Orbital Systems (Bombay) Pvt. Ltd. Mumbai