Scaling Guest OS Critical Sections with CS · Scaling Guest OS Critical Sections with eCS Sanidhya...

Scaling Guest OS Critical Sections with eCS

Sanidhya Kashyap, Changwoo Min, Taesoo Kim

The physical and virtual CPU abstraction

● Mismatch between

CPU abstraction

Physical machine (Host)pCPU 1 pCPU 2 pCPU 3 pCPU 4

CPU abstraction

Hardware abstraction Physical machine (Host)

pCPU 1 pCPU 2 pCPU 3 pCPU 4

CPU abstraction

Physical machine (Host)pCPU 1

Hypervisor

pCPU 2 pCPU 3 pCPU 4

Virtual machinevCPU 1 vCPU 2 vCPU 3 vCPU 4

App App App...Hardware abstraction

CPU abstraction

Hardware abstraction

Software abstraction

Hypervisor

App App App...● Mismatch between

CPU abstraction

● VM consolidation- Contention on pCPU

Hypervisor

App App App...

Multiple vCPUs

Hypervisor

CPU abstraction

● VM consolidation- Contention on pCPU

Hypervisor

App App App...

Multiple vCPUs

Hypervisor

A vCPU can be preempted without notification

CPU abstraction

● VM consolidation- Contention on vCPU

Hypervisor

App App App...

Multiple vCPUs

Hypervisor

A vCPU can be preempted without notification

Double scheduling issue

vCPU 1 vCPU 3vCPU 2vCPU 1

Double scheduling: Lock holder preemption (LHP)

● vCPU holding a lock is preempted

● Preemption hinders forward progress of the VM

● Can lead to application slowdown by 20 -- 130%

vCPUscheduled

vCPUpreempted

Access a file Running task in a VM

Efforts to mitigate preemption issues

● Focussed only non-blocking locks

○ Acquire iff sufficient schedule time

● Hotplug vCPUs on the fly

○ May not scale to large vCPU VMs

● VM co-scheduling

○ Does not always alleviate the issue

● Mostly address other preemption

problem

○ Blocking locks

○ Unfair non-blocking locks

● Hardware features to mitigate

preemptions

Research efforts Current practice

Efforts to mitigate preemption issues

● Focussed only non-blocking locks

○ Acquire iff sufficient schedule time

● Hotplug vCPUs on the fly

○ May not scale to large vCPU VMs

● VM co-scheduling

○ Does not always alleviate the issue

● Mostly address other preemption

problem

○ Blocking locks

○ Unfair non-blocking locks

● Hardware features to mitigate

preemptions

Research efforts Current practice

Prior approaches are mostly specialized

Still the double scheduling is looming!● LHP for blocking locks

○ mutex, rwsem

● Readers preemption (RP) in read-write locks

○ A reader is preempted while holding the lock

● Interrupt context preemption (ICP)

○ Preemption of a vCPU processing an interrupt

● Blocked-waiter wakeup (BWW)

○ Waking up a blocked thread on an idle vCPU is at least 10 times costlier

Still the double scheduling is looming!● LHP for blocking locks

○ mutex, rwsem

● Readers preemption (RP) in read-write locks

○ A reader is preempted while holding the lock

● Interrupt context preemption (ICP)

○ Preemption of a vCPU processing an interrupt

● Blocked-waiter wakeup (BWW)

○ Waking up a blocked thread on an idle vCPU is at least 10 times costlier

Semantic gap between virtual and physical CPU

Our approach to address semantic gap

Insight:A vCPU may be running a critical task!

Approach:Avoid preempting a vCPU with a critical task

Design:Identify and mark/unmark a critical task

vCPU 1vCPU 1 vCPU 2vCPU 2 vCPU 3

Identifying each critical section with eCS

ScheduledvCPU

PreemptedvCPU

Access a file

● Synchronization primitives protect critical sections → ensure OS progress

● Mark and unmark critical sections before and after the critical section

● Conservative, but effective approach to address each preemption problem

○ 60 LoC annotates 85K lock invocations in 13M LoC in Linux

Running task in a VM

vCPU 1vCPU 1 vCPU 2vCPU 2 vCPU 3

ScheduledvCPU

PreemptedvCPU

Access a file EnlightenedvCPU

vCPU 1 vCPU 2vCPU 2 vCPU 3

ScheduledvCPU

PreemptedvCPU

Access a file EnlightenedvCPU

Sharing the state for efficient notification

vCPU(A) vCPU(B) vCPU(C)

eCSstates

pcpu_overloaded (0/1)vcpu_preempted (0/1)

non_preemptable_ecs_countpreemptable_ecs_count

vCPU(A)state

eCSstates

vCPU(B)state

vCPU(C)state

Hypervisor

● Each vCPU shares memory with the hypervisor

● vCPU updates information for critical sections

○ Notifies critical task to the hypervisor

● Hypervisor also updates scheduler context

before/after scheduling out a vCPU

○ Enables vCPU to make efficient scheduling

decisions

Lightweight para-virtualized APIs to update states

eCSstates

pcpu_overloaded (0/1)vcpu_preempted (0/1)

Hint API

VM → Hypervisor

activate_non_preemptable_ecs(cpu)

deactivate_non_preemptable_ecs(cpu_id)

activate_preemptable_ecs(cpu_id))

deactivate_preemptable_ecs(cpu_id)

Hypervisor → VMis_vcpu_preempted(cpu_id)

is_pcpu_overloaded(cpu_id)

non_preemptable_ecs_countpreemptable_ecs_count

vCPU(A)state

eCSstates

vCPU(B)state

vCPU(C)state

Hypervisor

Updated by each vCPU; read by the hypervisor

Update by the hypervisor; read by a vCPU

Hypervisor checks eCS state before scheduling out a vCPU

Access a file

eCSstates

ecs_count (0)

ecs_count (1)ecs_count (0)

Time sharedpCPU 1

vCPU 1VM2

vCPU 1VM1

ScheduledvCPU

PreemptedvCPU

EnlightenedvCPU

➀ Running vCPU 1➁ vCPU 1 acquires lock➂ vCPU 1 updates eCS count➃ Hypervisor checks states before vCPU 1 preemption➄ Hypervisor lets vCPU 1 runs for extra time➅ vCPU 1 finishes and updates eCS count➆ Hypervisor penalizes vCPU 1 later

➂➃

Hypervisor checks eCS state before scheduling out a vCPU

Access a file

eCSstates

ecs_count (0)

ecs_count (1)ecs_count (0)

Time sharedpCPU 1

vCPU 1VM2

vCPU 1VM1

ScheduledvCPU

PreemptedvCPU

EnlightenedvCPU

➀ Running vCPU 1➁ vCPU 1 acquires lock➂ vCPU 1 updates eCS count➃ Hypervisor checks states before vCPU 1 preemption➄ Hypervisor lets vCPU 1 runs for extra time➅ vCPU 1 finishes and updates eCS count➆ Hypervisor penalizes vCPU 1 later

➂➃

Extended schedule Penalized schedule

The case for system eventual fairness● Hypervisor accounts extra time and later penalizes the enlightened VM

○ Penalize the schedule of an enlightened VM

○ Extend the schedule of the very next VM

● Hypervisor optimistically extends time for an enlightened CS

○ Decision made just before scheduling out a vCPU

○ Extra time (schedule) to avoid preemption: 1 ms

Even vCPU can make efficient scheduling decisions

● Share the hypervisor context with each VM○ Lock waiters can avoid bWW problem

● Virtualized scheduling-aware spinning○ Lock waiter keeps spinning until the lock is not

acquired if the pCPU is not overloaded

eCSstates

vCPU(A)state

eCSstates

vCPU(B)state

vCPU(C)state

Hypervisor

pcpu_overloaded (0/1)

Implementation

● Rely on paravirtualized VM

● Extended scheduler’s preempt_notifier API to check eCS states

○ Rely on scheduler_tick() to avoid vCPU preemption

● Overall implementation is 1000 LoC

○ 60 LoC for annotating almost every lock-based critical section

Evaluation

● Does eCS improves VM’s performance?

● Does hypervisor maintain system eventual fairness?

● Setup: 8-socket, 80-core NUMA machine

Impact of eCS in over-committed scenario

Apache web server Psearchy

● Experiment: run two VMs running same application

● eCS improves application throughput by 1.2 -- 2.3X

● eCS avoids preemptions by 85.8--100% → an extra schedule tick is sufficient

Preemptions avoided

Impact of eCS in under-committed scenario

● Experiment: Run only one VM with an application

● eCS improves application performance by 1.2 -- 1.9X

● Virtualized scheduling-aware spinning addresses BWW for blocking locksApache web server Psearchy

System eventual fairness

● Experiment: an application reading a file

● Hypervisor’s scheduler (CFS) maintains eventual fairness

● Both VMs get equal time even though VM2 (eCS) is granted extra schedules

● CFS maintains eventual fairness by penalizing VM2

○ Each run for equal time (4.95 seconds out of 10 seconds)

Discussion

● Right approach for Linux adoption

○ Leverage steal_time_struct that exposes preempted method

● Annotation

○ Use VM → Hypervisor API to mark functions

● Extending the concept to the userspace

○ Require composable scheduling abstraction to support user space

Conclusion

● Double scheduling leads to several preemption problems

● Six lightweight paravirtualized methods to annotate critical sections

● Leverage hypervisor’s scheduler to mitigate vCPU preemptions

● Allow vCPU to make efficient scheduling decision

● A generic approach to mitigate all preemption problems!

Thank you!

Scaling Guest OS Critical Sections with CS · Scaling Guest OS Critical Sections with eCS Sanidhya...

Documents

Transcript of Scaling Guest OS Critical Sections with CS · Scaling Guest OS Critical Sections with eCS Sanidhya...

Ground Level - Greystone on Hudson · artist rendering guest bedroom 1 guest bath 1 guest bath 2 guest bath 3 guest bedroom 2 guest wing arcade gym/yoga room lower pantry utility/mechanical

sosa - JSI Furniture · sosa sosa guest seating guest chair guest chair guest chair guest chair guest chair guest chair. guge mix and match nine progressive back styles The modern

Sensory Map - mktg.mlbstatic.com · guest services sections 111, 242 & 314 sections 111, 218, 233 & 314 carter’s nursing lounge section 141 p r e s b o x coca-cola red porch ce

Chapter Nine Measurement and Scaling: Noncomparative Scaling Techniques.

Chapter 10: OSPF Tuning and Troubleshooting - … Scaling Networks/Week 7/ScaNv6... · Chapter 10: OSPF Tuning and Troubleshooting Scaling Networks. ... Chapter 10 - Sections & Objectives

Scaling Factors and scaling parameters

Measurement & Scaling Dr. Surej P John. Comparative Scaling.

Scaling inBlast Neurotrauma - IRCOBI · strain scaling models were combined in order to isolate pressure and duration scaling. From the isolated scaling models the peak pressure scaling

JIMI falconer bio march 20 2018... · Lek Musik) (Guest Mix) OLD NORSK SESSIONS (Guest Mix) W (Guest Mix) eland (Guest mix) ONVERT SESSIONS (Guest mix) TURO SOUNDS (Guest mix) INNERVISIONS

IMPLEMENTATION OF FARM TO MARKET ROAD (FMR) … · 3. Longitudinal, transverse, hairline cracks, potholes/scaling and spalling on some sections of the seven FMRs were noted, thus,

Chapter Ten Measurement and Scaling: Noncomparative Scaling Techniques.

Guest Operating System · 9 Choosing and Installing Guest Operating Systems The following sections provide notes on installing specific guest operating systems under the VMware products

MAIN ACTIVE FAULTS FROM THE EASTERN PART OF ROMANIA ... · • Geological maps of faults expressed at the surface. • Geological sections across the active fault system. • Scaling

AWS Auto ScalingAWS Auto Scaling User Guide Features of AWS Auto Scaling What Is AWS Auto Scaling? AWS Auto Scaling enables you to conﬁgure automatic scaling for the AWS resources

Measurement and scaling fundamentals and comparative scaling

Report Number: 2536-18002 Project No.: 29225€¦ · IAPMO IGC 335-2018: 5 Rapid Scaling Test 5.1 General Test – FOLLOWED. The rapid scaling tests specified in Sections 5.3 to 5.4

Non ComparativeNon comparative scaling techniques.ppt Scaling Techniques

Scaling a marketing intelligence platform - Aleph.bet event #scaling

1 Knowledge Representation and Reasoning Focus on Sections 10.1-10.3, 10.6 Guest Lecturer: Eric Eaton University of Maryland Baltimore County Lockheed.

Challenges and Opportunities with UGT Enzymes · • In vitro scaling methods • Simple allometric scaling • Interspecies scaling methods: - single species scaling with and without