VIO Technology


Transcript of VIO Technology

Page 1: VIO Technology


"Any sufficiently advanced technology will have the appearance of magic."…Arthur C. Clarke

Section 2: The Technology

Page 2: VIO Technology


Section Objectives

On completion of this unit you should be able to:

– Describe the relationship between technology and solutions.

– List key IBM technologies that are part of the POWER5 products.

– Describe the functional benefits that these technologies provide.

– Discuss the appropriate use of these technologies.

Page 3: VIO Technology


IBM and Technology

Science → Technology → Products → Solutions

Page 4: VIO Technology


Technology and innovation

Having technology available is a necessary first step.

Finding creative new ways to use the technology for the benefit of our clients is what innovation is about.

Solution design is an opportunity for innovative application of technology.

Page 5: VIO Technology


When technology won’t ‘fix’ the problem

When the technology is not related to the problem.

When the client has unreasonable expectations.

Page 6: VIO Technology


POWER5 Technology

Page 7: VIO Technology


POWER4 and POWER5 cores

(Diagram: the POWER4 core and the POWER5 core, side by side.)

Page 8: VIO Technology


POWER5

Designed for entry and high-end servers

Enhanced memory subsystem

Improved performance

Simultaneous Multi-Threading

Hardware support for Shared Processor Partitions (Micro-Partitioning)

Dynamic power management

Compatibility with existing POWER4 systems

Enhanced reliability, availability, serviceability

(Chip diagram: two SMT cores sharing a 1.9 MB L2 cache, with on-chip L3 directory and memory controllers, a GX+ bus interface, and an enhanced distributed switch providing chip-chip, MCM-MCM, and SMP link connections.)

Page 9: VIO Technology


Enhanced memory subsystem

Improved L1 cache design
– 2-way set associative i-cache
– 4-way set associative d-cache
– New replacement algorithm (LRU instead of FIFO)

Larger L2 cache
– 1.9 MB, 10-way set associative

Improved L3 cache design
– 36 MB, 12-way set associative
– L3 on the processor side of the fabric
– Satisfies L2 cache misses more frequently
– Avoids traffic on the interchip fabric

On-chip L3 directory and memory controller
– L3 directory on the chip reduces off-chip delays after an L2 miss
– Reduced memory latencies

Improved pre-fetch algorithms


Page 10: VIO Technology


Enhanced memory subsystem

(Diagram: POWER4 vs. POWER5 system structures. In POWER4 the L3 cache sits on the memory side of the fabric controller; in POWER5 the L3 cache moves to the processor side of the fabric and the memory controller moves on chip.)

– Reduced L3 latency

– Faster access to memory

– Larger SMPs (64-way)

– Number of chips cut in half

Page 11: VIO Technology


Simultaneous Multi-Threading (SMT)

What is it?

Why would I want it?

Page 12: VIO Technology


POWER4 pipeline

(Diagram: the POWER4 instruction pipeline, with branch, load/store, fixed-point, and floating-point pipelines; out-of-order processing runs between group dispatch and group commit, with branch-redirect and interrupt/flush paths back to instruction fetch. The POWER5 pipeline is labeled alongside for comparison.)

POWER4 instruction pipeline (IF = instruction fetch, IC = instruction cache, BP = branch predict, D0 = decode stage 0, Xfer = transfer, GD = group dispatch, MP = mapping, ISS = instruction issue, RF = register file read, EX = execute, EA = compute address, DC = data caches, F6 = six-cycle floating-point execution pipe, Fmt = data format, WB = write back, and CP = group commit)

Page 13: VIO Technology


Multi-threading evolution

Execution unit utilization is low in today's microprocessors.

– Average execution unit utilization is roughly 25% across a broad spectrum of environments.

(Chart: per-cycle usage of the fixed-point (FX0, FX1), load/store (LS0, LS1), floating-point (FP0, FP1), branch, and CR-logical execution units for a single instruction stream fetched from memory and the i-cache.)

Page 14: VIO Technology


Coarse-grained multi-threading

Two instruction streams, but only one thread executes at any instant.

Hardware swaps in the second thread when a long-latency event occurs.

Each swap requires several cycles.

(Chart: execution unit usage over processor cycles for two instruction streams, with idle cycles at each thread swap.)

Page 15: VIO Technology


Coarse-grained multi-threading (Cont.)

The processor (for example, the RS64-IV) can store the context of two threads.

– Rapid switching between threads minimizes lost cycles due to I/O waits and cache misses.

– Can yield ~20% improvement for OLTP workloads.

Coarse-grained multi-threading is only beneficial where the number of active threads exceeds twice the number of CPUs.

– AIX must create a "dummy" thread if there are not enough real threads.

• Unnecessary switches to “dummy” threads can degrade performance ~20%

• Does not work with dynamic CPU deallocation

Page 16: VIO Technology


Fine-grained multi-threading

A variant of coarse-grained multi-threading.

Threads execute in round-robin fashion.

A cycle remains unused when a thread encounters a long-latency event.

(Chart: execution unit usage over processor cycles, with the two instruction streams alternating every cycle.)

Page 17: VIO Technology


POWER5 pipeline

(Diagram: the POWER5 instruction pipeline, structurally the same as the POWER4 pipeline shown earlier, with branch, load/store, fixed-point, and floating-point pipelines, out-of-order processing between group dispatch and group commit, and branch-redirect and interrupt/flush paths.)

POWER5 instruction pipeline (IF = instruction fetch, IC = instruction cache, BP = branch predict, D0 = decode stage 0, Xfer = transfer, GD = group dispatch, MP = mapping, ISS = instruction issue, RF = register file read, EX = execute, EA = compute address, DC = data caches, F6 = six-cycle floating-point execution pipe, Fmt = data format, WB = write back, and CP = group commit)

Page 18: VIO Technology


Simultaneous multi-threading (SMT)

Reducing the number of unused execution unit slots yields a 25-40% performance boost, and sometimes more.

(Chart: execution unit usage over processor cycles, with instructions from both streams issuing in the same cycle and far fewer idle slots.)

Page 19: VIO Technology


Simultaneous multi-threading (SMT) (Cont.)

Each chip appears as a 4-way SMP to software

– Allows instructions from two threads to execute simultaneously

Processor resources optimized for enhanced SMT performance

– No context switching, no dummy threads

Thread priority controlled by hardware, the POWER Hypervisor, or the OS

– Dynamic feedback of shared resources allows for balanced thread execution

Dynamic switching between single-threaded and multi-threaded mode
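On AIX 5L V5.3 this mode switch is exposed through the smtctl command. A minimal sketch (defaults and output vary by system):

    smtctl                  # display the current SMT mode
    smtctl -m off -w now    # disable SMT immediately for this partition
    smtctl -m on -w boot    # re-enable SMT at the next reboot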

Page 20: VIO Technology


Dynamic resource balancing

Threads share many resources

– Global Completion Table, Branch History Table, Translation Lookaside Buffer, and so on

Higher performance realized when resources balanced across threads

– Tendency to drift toward extremes accompanied by reduced performance

Page 21: VIO Technology


Adjustable thread priority

Instances when unbalanced execution is desirable

– No work for opposite thread

– Thread waiting on lock

– Software-determined non-uniform balance

– Power management

Control instruction decode rate

– Software/hardware controls eight priority levels for each thread

(Chart: instructions per cycle for thread 0 and thread 1 at hardware thread priority pairs 0,7 / 1,1 / 2,7 / 4,7 / 6,7 / 7,7 / 7,6 / 7,4 / 7,2 / 7,0; the 1,1 pair is power save mode, and the 0,7 and 7,0 pairs correspond to single-threaded operation.)

Page 22: VIO Technology


Single-threaded operation

Advantageous for execution-unit-limited applications

– Floating-point or fixed-point intensive workloads

Execution-unit-limited applications provide minimal performance leverage for SMT

– The extra resources necessary for SMT provide a higher performance benefit when dedicated to a single thread

Determined dynamically on a per-processor basis

(Diagram: thread states Dormant, Null, and Active, with transitions controlled by software or by hardware/software.)

Page 23: VIO Technology


Micro-Partitioning

Page 24: VIO Technology


Micro-Partitioning overview

Mainframe-inspired technology

Virtualized resources shared by multiple partitions

Benefits
– Finer-grained resource allocation
– More partitions (up to 254)
– Higher resource utilization

New partitioning model
– POWER Hypervisor
– Virtual processors
– Fractional processor capacity partitions
– Operating system optimized for Micro-Partitioning exploitation
– Virtual I/O

Page 25: VIO Technology


Processor terminology

(Diagram: installed physical processors are either deconfigured, inactive (CUoD), dedicated, or shared. Shared processors form the shared processor pool that backs the virtual processors of shared processor partitions, whose entitled capacity is drawn from the pool; with SMT on, each virtual processor presents logical processors. Dedicated processor partitions own whole physical processors, with SMT on or off.)

Page 26: VIO Technology


Shared processor partitions

Micro-Partitioning allows for multiple partitions to share one physical processor

Up to 10 partitions per physical processor

Up to 254 partitions active at the same time

Partition's resource definition

– Minimum, desired, and maximum values for each resource

– Processor capacity

– Virtual processors

– Capped or uncapped
• Capacity weight

– Dedicated memory
• Minimum of 128 MB, then 16 MB increments

– Physical or virtual I/O resources

– Physical or virtual I/O resources

(Diagram: six micro-partitions, LPAR 1 through LPAR 6, sharing four physical processors.)

Page 27: VIO Technology


Understanding min/max/desired resource values

The desired value for a resource is given to a partition if enough resource is available.

If there is not enough resource to meet the desired value, then a lower amount is allocated.

If there is not enough resource to meet the minimum value, the partition will not start.

The maximum value is only used as an upper limit for dynamic partitioning operations.

Page 28: VIO Technology


Partition capacity entitlement

Processing units

– 1.0 processing unit represents one physical processor

Entitled processor capacity

– Commitment of capacity that is reserved for the partition

– Sets the upper limit of processor utilization for capped partitions

– Each virtual processor must be granted at least 1/10 of a processing unit of entitlement

Shared processor capacity is always delivered in terms of whole physical processors

(Diagram: one physical processor provides 1.0 processing units; example partitions receive 0.5 and 0.4 processing units; the minimum requirement is 0.1 processing units.)
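For example, a partition entitled to 0.5 processing units can define at most five virtual processors, since each virtual processor must be granted at least 0.1 processing unit (5 × 0.1 = 0.5).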

Page 29: VIO Technology


Capped and uncapped partitions

Capped partition

– Not allowed to exceed its entitlement

Uncapped partition

– Is allowed to exceed its entitlement

Capacity weight

– Used for prioritizing uncapped partitions

– Value 0-255

– Value of 0 referred to as a “soft cap”

Page 30: VIO Technology


Partition capacity entitlement example

Shared pool has 2.0 processing units available

LPARs activated in sequence

Partition 1 activated

– Min = 1.0, max = 2.0, desired = 1.5

– Starts with 1.5 allocated processing units

Partition 2 activated

– Min = 1.0, max = 2.0, desired = 1.0

– Does not start

Partition 3 activated

– Min = 0.1, max = 1.0, desired = 0.8

– Starts with 0.5 allocated processing units
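The arithmetic: partition 1 leaves 2.0 − 1.5 = 0.5 processing units in the pool. That is below partition 2's minimum of 1.0, so partition 2 cannot start; it does satisfy partition 3's minimum of 0.1, so partition 3 starts with the remaining 0.5 even though it desired 0.8.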

Page 31: VIO Technology


Understanding capacity allocation – An example

A workload is run under different configurations.

The size of the shared pool (number of physical processors) is fixed at 16.

The capacity entitlement for the partition is fixed at 9.5.

No other partitions are active.

Page 32: VIO Technology


Uncapped – 16 virtual processors

16 virtual processors.

Uncapped.

Can use all available resource.

The workload requires 26 minutes to complete.

(Chart: processing units consumed vs. elapsed time in minutes; uncapped, 16 physical processors, 16 virtual processors, 9.5 capacity entitlement.)

Page 33: VIO Technology


Uncapped – 12 virtual processors

12 virtual processors.

Even though the partition is uncapped, it can only use 12 processing units.

The workload now requires 27 minutes to complete.

(Chart: processing units consumed vs. elapsed time in minutes; uncapped, 16 physical processors, 12 virtual processors, 9.5 capacity entitlement.)

Page 34: VIO Technology


Capped

The partition is now capped and resource utilization is limited to the capacity entitlement of 9.5.

– Capping limits the amount of time each virtual processor is scheduled.

– The workload now requires 28 minutes to complete.

(Chart: processing units consumed vs. elapsed time in minutes; capped, 16 physical processors, 12 virtual processors, 9.5 capacity entitlement.)

Page 35: VIO Technology


Dynamic partitioning operations

Add, move, or remove processor capacity
– Remove, move, or add entitled shared processor capacity
– Change between capped and uncapped processing
– Change the weight of an uncapped partition
– Add and remove virtual processors
• Provided CE/VP stays at or above 0.1

Add, move, or remove memory

– 16 MB logical memory block

Add, move, or remove physical I/O adapter slots

Add or remove virtual I/O adapter slots

Min/max values defined for LPARs set the bounds within which DLPAR can work
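As a hedged sketch of such operations from the HMC command line (the managed system name p5system and partition name lpar3 are hypothetical, and flags vary by HMC release):

    chhwres -m p5system -r proc -o a -p lpar3 --procunits 0.5   # add 0.5 processing units
    chhwres -m p5system -r mem -o r -p lpar3 -q 256             # remove 256 MB of memory
    lshwres -m p5system -r proc --level lpar                    # list partition processor resources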

Page 36: VIO Technology


Dynamic LPAR

Standard on all new systems

(Diagram: an HMC managing four live partitions on the POWER Hypervisor, Part#1 Production, Part#2 Legacy Apps, Part#3 Test/Dev, and Part#4 File/Print, running AIX 5L and Linux, with resources moved between live partitions.)

Page 37: VIO Technology


Firmware

POWER Hypervisor

Page 38: VIO Technology


POWER Hypervisor strategy

New Hypervisor for POWER5 systems

– Further convergence with iSeries

– But brands will retain unique value propositions

– Reduced development effort

– Faster time to market

New capabilities on pSeries servers

– Shared processor partitions

– Virtual I/O

New capability on iSeries servers

– Can run AIX 5L

Page 39: VIO Technology


POWER Hypervisor component sourcing

(Diagram: the H-Call interface sits above POWER Hypervisor components drawn from both the pSeries and iSeries code bases: virtual Ethernet (VLAN), virtual I/O (SCSI IOA, LAN IOA), shared processor LPAR, Capacity on Demand, partition on demand, 255 partitions, NVRAM, message passing, load from flash, bus recovery, dump, location codes, I/O configuration, and slot/tower and drawer concurrent maintenance; the nucleus derives from SLIC, with management through the HMC/HSC and platform services in the FSP.)

Page 40: VIO Technology


POWER Hypervisor functions

Same functions as the POWER4 Hypervisor
– Dynamic LPAR
– Capacity Upgrade on Demand

New, active functions
– Dynamic Micro-Partitioning
– Shared processor pool
– Virtual I/O
– Virtual LAN

The machine is always in LPAR mode
– Even with all resources dedicated to one OS

(Diagram: POWER Hypervisor functions: dynamic Micro-Partitioning over a shared processor pool of POWER5 chips, virtual I/O to disk and LAN, dynamic LPAR, and Capacity Upgrade on Demand tracking planned vs. actual client capacity growth.)

Page 41: VIO Technology


POWER Hypervisor implementation

Design enhancements to previous POWER4 implementation enable the sharing of processors by multiple partitions

– Hypervisor decrementer (HDECR)

– New Processor Utilization Resource Register (PURR)

– Refined virtual processor objects

• Does not include physical characteristics of the processor

– New Hypervisor calls

Page 42: VIO Technology


POWER Hypervisor processor dispatch

Manages a set of processors on the machine (the shared processor pool).

POWER5 generates a 10 ms dispatch window.
– The minimum allocation is 1 ms per physical processor.

Each virtual processor is guaranteed its entitled share of processor cycles during each 10 ms dispatch window.
– ms per VP per window = CE × 10 / number of VPs

The partition entitlement is evenly distributed among the online virtual processors.

Once a capped partition has received its CE within a dispatch interval, it becomes not-runnable.

A VP dispatched within 1 ms of the end of the dispatch interval will receive half its CE at the start of the next dispatch interval.
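Worked example: a partition with CE = 0.8 processing units and two online virtual processors is guaranteed 0.8 × 10 / 2 = 4 ms of physical processor time per virtual processor in every 10 ms dispatch window.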

(Diagram: a shared processor pool of four POWER5 chips; the POWER Hypervisor dispatches the virtual processor capacity entitlements of six shared processor partitions onto physical CPUs 0-3.)

Page 43: VIO Technology


Dispatching and interrupt latencies

Virtual processors have dispatch latency.

Dispatch latency is the time between a virtual processor becoming runnable and being actually dispatched.

Timers and external interrupts are also subject to latency.

Page 44: VIO Technology


Shared processor pool

Processors not associated with dedicated processor partitions.

No fixed relationship between virtual processors and physical processors.

The POWER Hypervisor attempts to use the same physical processor.
– Affinity scheduling
– Home node


Page 45: VIO Technology


Affinity scheduling

When dispatching a VP, the POWER Hypervisor attempts to preserve affinity by using:

– Same physical processor as before, or

– Same chip, or

– Same MCM

When a physical processor becomes idle, the POWER Hypervisor looks for a runnable VP that:

– Has affinity for it, or

– Has no affinity to any physical processor, or

– Is uncapped

Similar to AIX affinity scheduling

Page 46: VIO Technology


Operating system support

Micro-Partitioning capable operating systems need to be modified to cede a virtual processor when they have no runnable work.

– Failure to do this wastes CPU resources; for example, a partition would spend its CE waiting for I/O.

– Ceding results in better utilization of the pool.

May confer the remainder of their timeslice to another VP

– For example, a VP holding a lock

Can be redispatched if they become runnable again during the same dispatch interval

Page 47: VIO Technology


Example

(Timeline: two consecutive 10 ms POWER Hypervisor dispatch intervals on physical processors 0 and 1, showing the virtual processors of LPAR 1, LPAR 2, and LPAR 3 dispatched in turn, with idle gaps once the capped partitions have consumed their entitlement.)

LPAR1: capacity entitlement = 0.8 processing units; virtual processors = 2 (capped)

LPAR2: capacity entitlement = 0.2 processing units; virtual processors = 1 (capped)

LPAR3: capacity entitlement = 0.6 processing units; virtual processors = 3 (capped)

Page 48: VIO Technology


POWER Hypervisor and virtual I/O

I/O operations without dedicating resources to an individual partition.

POWER Hypervisor virtual I/O related operations
– Provides control and configuration structures for the virtual adapter images required by the logical partitions
– Operations that allow partitions controlled and secure access to physical I/O adapters in a different partition
– The POWER Hypervisor does not own any physical I/O devices; they are owned by an I/O hosting partition

I/O types supported
– SCSI
– Ethernet
– Serial console

Page 49: VIO Technology


Performance monitoring and accounting

CPU utilization is measured against CE.
– An uncapped partition receiving more than its CE will record 100% but will actually be using more.

SMT
– Thread priorities compound the variable execution rate.
– Twice as many logical CPUs.

For accounting, intervals may be incorrectly allocated.
– New hardware support is required.

The Processor Utilization Resource Register (PURR) records the actual clock ticks spent executing a partition.
– Used by performance commands (for example, through new flags) and accounting modules, as illustrated below.
– Third-party tools will need to be modified.
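As a hedged illustration on AIX 5L V5.3, PURR-based statistics surface through commands such as lparstat and mpstat (column sets vary by level):

    lparstat 1 5    # physc = physical processors consumed, %entc = percent of entitlement used
    mpstat -s 1 1   # per-logical-CPU view of how work spreads across SMT threads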

Page 50: VIO Technology


Virtual I/O Server

Page 51: VIO Technology


Virtual I/O Server

Provides an operating environment for virtual I/O administration
– Virtual I/O Server administration
– Restricted, scriptable command line user interface (CLI)

Minimum hardware requirements
– POWER5 VIO-capable machine
– Hardware Management Console
– Storage adapter
– Physical disk
– Ethernet adapter
– At least 128 MB of memory

Capabilities of the Virtual I/O Server
– Ethernet adapter sharing
– Virtual SCSI disk
• Virtual I/O Server Version 1.1 supports selected configurations, including specific models of EMC, HDS, and STK disk subsystems attached using Fibre Channel
– Interacts with AIX and Linux partitions

Page 52: VIO Technology


Virtual I/O Server (Cont.)

Installation CD provided when the Advanced POWER Virtualization feature is ordered

Configuration approaches for high availability

– Virtual I/O Server
• LVM mirroring
• Multipath I/O
• EtherChannel

– Second Virtual I/O Server instance in another partition

Page 53: VIO Technology


Virtual SCSI

Allows sharing of storage devices

Vital for shared processor partitions

– Overcomes the potential shortage of physical adapter slots as Micro-Partitioning raises the number of partitions

– Allows the creation of logical partitions without the need for additional physical resources

Allows attachment of previously unsupported storage solutions

Page 54: VIO Technology


VSCSI server and client architecture overview

Virtual SCSI is based on a client/server relationship.

The virtual I/O resources are assigned using an HMC.

Virtual SCSI enables sharing of adapters as well as disk devices.

Dynamic LPAR operations are allowed.

Dynamic mapping between physical and virtual resources on the Virtual I/O Server.

(Diagram: the Virtual I/O Server partition owns a physical adapter and physical SCSI/FC disks, exporting logical volumes 1 and 2 through VSCSI server adapters; the POWER Hypervisor connects them to VSCSI client adapters in AIX and Linux client partitions, where they appear as hdisks.)

Page 55: VIO Technology


Virtual devices

Are defined as LVs in the I/O server partition
– Normal LV rules apply

Appear as real devices (hdisks) in the hosted partition

Can be manipulated using the Logical Volume Manager just like an ordinary physical disk

Can be used as a boot device and as a NIM target

Can be shared by multiple clients

(Diagram: an LV in the Virtual I/O Server partition's LVM is exported through a VSCSI server adapter and surfaces, via the POWER Hypervisor, as a virtual disk (hdisk) in the client partition's LVM.)

Page 56: VIO Technology


SCSI RDMA and Logical Remote Direct Memory Access

SCSI transport protocols define the rules for exchanging information between SCSI initiators and targets.

Virtual SCSI uses the SCSI RDMA Protocol (SRP).
– SCSI initiators and targets can directly transfer information between their respective address spaces.

SCSI requests and responses are sent using the virtual SCSI adapters.

The actual data transfer, however, is done using the Logical Redirected DMA protocol.

(Diagram: the VSCSI initiator device driver in the AIX client partition and the VSCSI target device driver in the Virtual I/O Server partition exchange commands over the POWER Hypervisor's Reliable Command/Response Transport; data buffers move via Logical Remote Direct Memory Access through the device mapping, the physical adapter device driver, and the physical adapter.)

Page 57: VIO Technology


Virtual SCSI security

Only the owning partition has access to its data.

Data is copied directly from the PCI adapter to the client's memory.

Page 58: VIO Technology


Performance considerations

Virtual SCSI I/O takes roughly twice as many processor cycles as locally attached disk I/O, spread evenly across the client partition and the Virtual I/O Server.

– The path of each virtual I/O request involves several sources of overhead that are not present in a non-virtual I/O request.

– For a virtual disk backed by the LVM, there is also the performance impact of going through the LVM and disk device drivers twice.

If multiple partitions are competing for resources from a VSCSI server, care must be taken to ensure that enough server resources (CPU, memory, and disk) are allocated to do the job.

If not constrained by CPU performance, throughput is comparable to doing local I/O in a dedicated partition.

Because the server I/O partition does no caching in memory, its memory requirements should be modest.

Page 59: VIO Technology


Limitations

The hosting partition must be available before the hosted partition boots.

Virtual SCSI supports FC, parallel SCSI, and SCSI RAID.

Maximum of 65535 virtual slots in the I/O server partition.

Maximum of 256 virtual slots on a single partition.

Support for all mandatory SCSI commands.

Not all optional SCSI commands are supported.

Page 60: VIO Technology


Implementation guideline

Partitions with high performance and disk I/O requirements are not recommended for implementing VSCSI.

Partitions with very low performance and disk I/O requirements can be configured at minimum expense to use only a portion of a logical volume.

Good candidates include:

– Boot disks for the operating system.

– Web servers that will typically cache a lot of data.

Page 61: VIO Technology



LVM mirroring

This configuration protects virtual disks in a client partition against failure of:

– One physical disk

– One physical adapter

– One virtual I/O server

Many possibilities exist to exploit this great function!

(Diagram: the client partition LVM-mirrors across two VSCSI client adapters; each maps through a separate Virtual I/O Server partition with its own physical SCSI adapter and physical SCSI disk.)

Page 62: VIO Technology



Multipath I/O

This configuration protects virtual disks in a client partition against:

– Failure of one physical FC adapter in one I/O server

– Failure of one Virtual I/O Server

Physical disk is assigned as a whole to the client partition

Many possibilities exist to exploit this great function!

(Diagram: an ESS physical disk is reached through two Virtual I/O Server partitions, each with its own physical FC adapter connected via a SAN switch; the client partition runs multipath I/O across two VSCSI client adapters.)

Page 63: VIO Technology


Virtual LAN overview

Virtual network segments on top of physical switch devices.

All nodes in a VLAN can communicate without any L3 routing or inter-VLAN bridging.

VLANs provide:
– Increased LAN security
– Flexible network deployment over traditional network devices

VLAN support in AIX is based on the IEEE 802.1Q VLAN implementation (see the sketch after the diagram).
– VLAN ID tagging of Ethernet frames
– VLAN ID restricted switch ports

(Diagram: switches A, B, and C connecting nodes on VLAN 1 and VLAN 2; traffic cannot cross between the two VLANs.)
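A minimal sketch of that AIX implementation (the adapter name ent0 and VLAN ID 2 are hypothetical): an 802.1Q VLAN device is layered on a physical Ethernet adapter, creating a new adapter on which IP is then configured.

    mkdev -c adapter -s vlan -t eth -a base_adapter=ent0 -a vlan_tag_id=2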

Page 64: VIO Technology


Virtual Ethernet

Enables inter-partition communication.

– In-memory, point-to-point connections

Physical network adapters are not needed.

Similar to high-bandwidth Ethernet connections.

Supports multiple protocols (IPv4, IPv6, and ICMP).

No Advanced POWER Virtualization feature required.

– POWER5 Systems

– AIX 5L V5.3 or appropriate Linux level

– Hardware management console (HMC)

Page 65: VIO Technology


Virtual Ethernet connections

VLAN technology implementation
– Partitions can only access data directed to them.

Virtual Ethernet switch provided by the POWER Hypervisor.

Virtual LAN adapters appear to the OS as physical adapters.
– The MAC address is generated by the HMC.

1-3 Gb/s transmission speed.
– Support for large MTUs (~64 KB) on AIX.

Up to 256 virtual Ethernet adapters.
– Up to 18 VLANs each.

Bootable device support for NIM OS installations.

(Diagram: AIX and Linux partitions, each with a virtual Ethernet adapter, connected through the virtual Ethernet switch in the POWER Hypervisor.)

Page 66: VIO Technology


Virtual Ethernet switch

Based on IEEE 802.1Q VLAN standard

– OSI-Layer 2

– Optional Virtual LAN ID (VID)

– 4094 virtual LANs supported

– Up to 18 VIDs per virtual LAN port

Switch configuration through HMC

Page 67: VIO Technology


How it works

(Flowchart: a frame entering the virtual VLAN switch port from a virtual Ethernet adapter is processed by the POWER Hypervisor, which caches the source MAC, checks for an IEEE VLAN header and inserts one if missing, verifies that the port is allowed for the VLAN number, and looks up the destination MAC in its table; the frame is then delivered to the configured associated switch port, passed to a defined trunk adapter, or dropped.)

Page 68: VIO Technology


Performance considerations

Virtual Ethernet performance

– Throughput scales nearly linear with the allocated capacity entitlement

Virtual LAN vs. Gigabit Ethernet throughput

– Virtual Ethernet adapter has higher raw throughput at all MTU sizes

– In-memory copy is more efficient at larger MTU

(Charts: virtual Ethernet TCP_STREAM throughput per 0.1 CPU entitlement (Mb/s) for MTU sizes 1500, 9000, and 65394 at entitlements from 0.1 to 1.0; and TCP_STREAM throughput of VLAN vs. Gigabit Ethernet, simplex and duplex, at each MTU size.)

Page 69: VIO Technology


Limitations

Virtual Ethernet can be used in both shared and dedicated processor partitions provided with the appropriate OS levels.

A mixture of Virtual Ethernet connections, real network adapters, or both is permitted within a partition.

Virtual Ethernet can only connect partitions within a single system.

A system’s processor load is increased when using virtual Ethernet.

Page 70: VIO Technology


Implementation guideline

Know your environment and the network traffic.

Choose a high MTU size where it makes sense for the network traffic in the Virtual LAN.

Use the MTU size 65394 if you expect a large amount of data to be copied inside your Virtual LAN.

Enable tcp_pmtu_discover and udp_pmtu_discover in conjunction with MTU size 65394 (see the example after this list).

Do not turn off SMT.

No dedicated CPUs are required for virtual Ethernet performance.
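A minimal sketch of those AIX tunables (shown with the values that enable path MTU discovery; persistence across reboots must be arranged separately):

    no -o tcp_pmtu_discover=1
    no -o udp_pmtu_discover=1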

Page 71: VIO Technology


Connecting Virtual Ethernet to external networks

Routing

– The partition that routes the traffic to the external network does not necessarily have to be the Virtual I/O Server.

(Diagram: in each of two systems, an AIX partition owns the physical adapter and routes between the internal virtual Ethernet subnet, 3.1.1.x in one system and 4.1.1.x in the other, and the external IP subnets 1.1.1.x and 2.1.1.x, which are joined by an external IP router.)

Page 72: VIO Technology


Shared Ethernet Adapter

Connects internal and external VLANs using one physical adapter.

SEA is a new service that acts as a layer 2 network switch.

– Securely bridges network traffic from a virtual Ethernet adapter to a real network adapter

SEA service runs in the Virtual I/O Server partition.

– Advanced POWER Virtualization feature required

– At least one physical Ethernet adapter required

No physical I/O slot or network adapter is required in the client partition.

Page 73: VIO Technology


Shared Ethernet Adapter (Cont.)

Virtual Ethernet MAC addresses are visible to outside systems.

Broadcast/multicast is supported.

ARP (Address Resolution Protocol) and NDP (Neighbor Discovery Protocol) can work across a shared Ethernet.

One SEA can be shared by multiple VLANs and multiple subnets can connect using a single adapter on the Virtual I/O Server.

Virtual Ethernet adapter configured in the Shared Ethernet Adapter must have the trunk flag set.

– The trunk Virtual Ethernet adapter enables a layer-2 bridge to a physical adapter

IP fragmentation is performed or an ICMP packet too big message is sent when the shared Ethernet adapter receives IP (or IPv6) packets that are larger than the MTU of the adapter that the packet is forwarded through.

Page 74: VIO Technology


Virtual Ethernet and Shared Ethernet Adapter security

VLAN (virtual local area network) tagging follows the IEEE 802.1Q standard.

The implementation of this VLAN standard ensures that the partitions have no access to foreign data.

Only the network adapters (virtual or physical) that are connected to a port (virtual or physical) that belongs to the same VLAN can receive frames with that specific VLAN ID.

Page 75: VIO Technology


Performance considerations

Virtual I/O Server performance

– Adapters stream data at media speed if the Virtual I/O server has enough capacity entitlement.

– CPU utilization per Gigabit of throughput is higher with a Shared Ethernet adapter.

(Charts: Virtual I/O Server TCP_STREAM throughput (Mb/s) and normalized CPU utilization (%cpu/Gb) for MTU 1500 and 9000, simplex and duplex.)

Page 76: VIO Technology


Limitations

System processors are used for all communication functions, leading to a significant amount of system processor load.

One of the virtual adapters in the SEA on the Virtual I/O server must be defined as a default adapter with a default PVID.

Up to 16 Virtual Ethernet adapters with 18 VLANs on each can be shared on a single physical network adapter.

Shared Ethernet Adapter requires:

– POWER Hypervisor component of POWER5 systems

– AIX 5L Version 5.3 or appropriate Linux level

Page 77: VIO Technology


Implementation guideline

Know your environment and the network traffic.

Use a dedicated network adapter if you expect heavy network traffic between Virtual Ethernet and local networks.

If possible, use dedicated CPUs for the Virtual I/O Server.

Choose 9000 for MTU size, if this makes sense for your network traffic.

Don’t use Shared Ethernet Adapter functionality for latency critical applications.

With MTU size 1500, you need about 1 CPU per gigabit Ethernet adapter streaming at media speed.

With MTU size 9000, 2 Gigabit Ethernet adapters can stream at media speed per CPU.

Page 78: VIO Technology


Shared Ethernet Adapter configuration

The Virtual I/O Server is configured with at least one physical Ethernet adapter.

One Shared Ethernet Adapter can be shared by multiple VLANs.

Multiple subnets can connect using a single adapter on the Virtual I/O Server.

(Diagram: the Virtual I/O Server bridges VLAN 1 and VLAN 2 from the virtual Ethernet switch through one Shared Ethernet Adapter on physical adapter ent0 to an external AIX server at 10.1.1.14 on VLAN 1 and a Linux server at 10.1.2.15 on VLAN 2; the client partitions hold 10.1.1.11 on VLAN 1 and 10.1.2.11 on VLAN 2.)

Page 79: VIO Technology


Multiple Shared Ethernet Adapter configuration

Maximizing throughput

– Using several Shared Ethernet Adapters

– More queues

– More performance

(Diagram: as in the previous configuration, but the Virtual I/O Server uses two Shared Ethernet Adapters over two physical adapters, ent0 and ent1, one per VLAN.)

Page 80: VIO Technology


Multipath routing with dead gateway detection

This configuration protects your access to the external network against:

– Failure of one physical network adapter in one I/O server

– Failure of one Virtual I/O server

– Failure of one gateway

(Diagram: an AIX partition with two virtual Ethernet adapters, 9.3.5.12 on VLAN 1 and 9.3.5.22 on VLAN 2, each bridged by a separate Virtual I/O Server's Shared Ethernet Adapter, at 9.3.5.11 and 9.3.5.21, over its own physical adapter to the external network; multipath routing with dead gateway detection holds a default route to 9.3.5.10 via 9.3.5.12 and a default route to 9.3.5.20 via 9.3.5.22.)

Page 81: VIO Technology


Shared Ethernet Adapter commands

Virtual I/O Server commands

– lsdev -type adapter: Lists all the virtual and physical adapters.

– Choose the virtual Ethernet adapter to map to the physical Ethernet adapter.

– Make sure the physical and virtual interfaces are unconfigured (down or detached).

– mkvdev: Maps the physical adapter to the virtual adapter, creates a layer 2 bridge, and defines the default virtual adapter with its default VLAN ID. It creates a new Ethernet device (for example, ent5).

– mktcpip: Configures TCP/IP on the new Ethernet interface (for example, ent5), as illustrated below.

Client partition commands

– No new commands are needed; the typical TCP/IP configuration is done on the virtual Ethernet interface defined in the client partition profile on the HMC.
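A minimal sketch of this sequence on the Virtual I/O Server CLI, assuming ent0 is the physical adapter, ent2 is the trunk virtual Ethernet adapter with PVID 1, and the host name and addresses are hypothetical:

    mkvdev -sea ent0 -vadapter ent2 -default ent2 -defaultid 1
    mktcpip -hostname vios1 -inetaddr 10.1.1.2 -interface en5 -netmask 255.255.255.0 -gateway 10.1.1.1

(mkvdev reports the name of the new SEA device, for example ent5, whose interface en5 is then configured with mktcpip.)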

Page 82: VIO Technology


Virtual SCSI commands

Virtual I/O Server commands

– To map an LV (see the sketch after this list):
• mkvg: Creates the volume group, where a new LV will be created using the mklv command.
• lsdev: Shows the virtual SCSI server adapters that could be used for mapping with the LV.
• mkvdev: Maps the virtual SCSI server adapter to the LV.
• lsmap -all: Shows the mapping information.

– To map a physical disk:
• lsdev: Shows the virtual SCSI server adapters that could be used for mapping with a physical disk.
• mkvdev: Maps the virtual SCSI server adapter to a physical disk.
• lsmap -all: Shows the mapping information.

Client partition commands

– No new commands needed; the typical device configuration uses the cfgmgr command.
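A minimal sketch of both mappings on the Virtual I/O Server CLI (the names hdisk2, hdisk3, vhost0, vhost1, and the rest are hypothetical):

    mkvg -vg clients_vg hdisk2                                # volume group to hold backing LVs
    mklv -lv client1_lv clients_vg 10G                        # logical volume to export
    mkvdev -vdev client1_lv -vadapter vhost0 -dev vclient1    # map the LV to a VSCSI server adapter
    mkvdev -vdev hdisk3 -vadapter vhost1 -dev vclient2        # map a whole physical disk
    lsmap -all                                                # verify the mappings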

Page 83: VIO Technology


Section Review Questions

1. Any technology improvement will boost performance of any client solution.

a. True

b. False

2. The application of technology in a creative way to solve client’s business problems is one definition of innovation.

a. True

b. False

Page 84: VIO Technology


Section Review Questions

3. A client's satisfaction with your solution can be enhanced by which of the following?

a. Setting expectations appropriately.

b. Applying technology appropriately.

c. Communicating the benefits of the technology to the client.

d. All of the above.

Page 85: VIO Technology


Section Review Questions

4. Which of the following are available with POWER5 architecture?

a. Simultaneous Multi-Threading.

b. Micro-Partitioning.

c. Dynamic power management.

d. All of the above.

Page 86: VIO Technology


Section Review Questions

5. Simultaneous Multi-Threading is the same as hyperthreading; IBM just gave it a different name.

a. True.

b. False.

Page 87: VIO Technology


Section Review Questions

6. In order to bridge network traffic between the Virtual Ethernet and external networks, the Virtual I/O Server has to be configured with at least one physical Ethernet adapter.

a. True.

b. False.

Page 88: VIO Technology


Review Question Answers

1. b

2. a

3. d

4. d

5. b

6. a

Page 89: VIO Technology


Unit Summary

You should now be able to:

– Describe the relationship between technology and solutions.

– List key IBM technologies that are part of the POWER5 products.

– Describe the functional benefits that these technologies provide.

– Discuss the appropriate use of these technologies.

Page 90: VIO Technology


Reference

You may find more information here:

– IBM eServer pSeries AIX 5L Support for Micro-Partitioning and Simultaneous Multi-threading, White Paper

– Introduction to Advanced POWER Virtualization on IBM eServer p5 Servers, SG24-7940

– IBM eServer p5 Virtualization – Performance Considerations, SG24-5768