PCI Passthrough and ITS Support in Xen/ARM: Xen Dev Summit 2015 Presentation
PCI Passthrough and GICv3-ITS in Xen ARM
Manish Jaggi
Vijaya Kumar Kilari
Cavium, Inc.
+ Demo on Dual Socket 48x2 Core ARMv8 Board
Page 2©2015 Cavium Inc. All rights reserved. Confidential and Proprietary.
Agenda
Status of Xen support from Cavium
Top level architecture
Additions in Xen for PCI passthrough
ITS architecture
– ARM specification
– Virtual ITS driver in Xen
Xen NUMA demo on the Cavium ThunderX platform
Questions
Status of Xen Support from Cavium
Xen 4.5+ (current)
– Demoed at Linaro Connect
– Initial NUMA support
Xen 4.6
– Basic ThunderX platform support
– GICv3 support
Xen 4.7
– vITS support
– PCI passthrough patches in Xen and Linux
– NUMA patches
Linaro Connect Demo – Xen running on a single-socket 48-core ThunderX
ThunderX System Dual Socket Reference Platform
Standard industry form factor: ½ SSI motherboard (x2), 2U 19" rack-mount chassis
Volume server I/O: PCIe Gen3, 10Gb or 40Gb Ethernet, integrated SATA
Up to 128GB memory
Full systems management with BMC and IPMI
http://cavium.com/pdfFiles/ThunderX_CRB_2S_Rev1.pdf
Xen NUMA running on dual socket, 48x2 cores

[Diagram: dom0 and two domUs, each with vCPUs and a vITS, running on the Xen hypervisor across Node 0 (48 cores) and Node 1 (48 cores), each node with its own DDR.]
Top Level Architecture
[Diagram: I/O virtualization with the System MMU. dom0 and a domU, each with vCPUs and a vITS receiving vLPIs, run on the Xen virtual ITS driver. PCIe endpoints EP1 and EP2 sit behind the PCIe host bridge; the GICv3 ITS translates (DeviceID, MSI_Index) => LPI via the Interrupt Translation Table; the SMMU maps StreamID => ContextBank, where ContextBank = {…, Domain PageTable, …}. R/W and MSI/X traffic reaches the DDR controller and DDR.]
Additions in Xen/ARM (proposed / implemented)
PCIe host controller support in Xen
– pci_conf_read/write calls handled by the host controller driver
– device-tree based
vITS emulation support
Hypercall to map a Linux segment ID to the appropriate PCI host controller
xl toolstack additions
– mapping of the GITS_ITRANSLATER space into the domain
– assign_device hypercall enhanced to support vDeviceID
Frontend/backend changes
– no communication needed for MSI
– frontend PCI bus msi-parent => its node in the guest device tree
SMMU additions
PCIe Host Controller support in Xen
The init function in the PCI host controller driver registers host bridge callbacks:

    int pci_hostbridge_register(pci_hostbridge_t *pcihb);
    struct pci_hostbridge_ops {
        u32 (*pci_conf_read)(struct pci_hostbridge *, u32 bus, u32 devfn,
                             u32 reg, u32 bytes);
        void (*pci_conf_write)(struct pci_hostbridge *, u32 bus, u32 devfn,
                               u32 reg, u32 bytes, u32 val);
    };

    struct pci_hostbridge {
        u32 segno;
        paddr_t cfg_base;
        paddr_t cfg_size;
        struct dt_device_node *dt_node;
        struct pci_hostbridge_ops ops;
        struct list_head list;
    };
PHYSDEVOP_pci_host_bridge_add
    #define PHYSDEVOP_pci_host_bridge_add    44
    struct physdev_pci_host_bridge_add {
        /* IN */
        uint16_t seg;
        uint64_t cfg_base;
        uint64_t cfg_size;
    };
This hypercall is invoked before dom0 invokes the PHYSDEVOP_pci_device_add hypercall. The handler code invokes … to update the segment number in pci_hostbridge:
int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base, uint64_t cfg_size);
xl toolstack additions - DOMCTL
For domU, while creating the domain, the toolstack reads the IPA from the macro GITS_ITRANSLATER_SPACE in xen/include/public/arch-arm.h. The PA is obtained from a new hypercall which returns the PA of the GITS_ITRANSLATER_SPACE. The toolstack then issues a hypercall to create a stage 2 mapping.
Hypercall Details: XEN_DOMCTL_get_itranslater_space
    /* XEN_DOMCTL_get_itranslater_space */
    struct xen_domctl_get_itranslater_space {
        /* OUT variables. */
        uint64_aligned_t start_addr;
        uint64_aligned_t size;
    };
    typedef struct xen_domctl_get_itranslater_space xen_domctl_get_itranslater_space_t;
    DEFINE_XEN_GUEST_HANDLE(xen_domctl_get_itranslater_space_t);
xl toolstack additions – device assignment

Reserved areas in guest memory space: part of the guest address space is reserved for mapping the assigned PCI devices' BAR regions. The toolstack is responsible for allocating ranges from this area and creating stage 2 mappings for the domain. This area is defined in public/arch-arm.h:
    /* For 32-bit BARs */
    #define GUEST_BAR_BASE_32 <<>>
    #define GUEST_BAR_SIZE_32 <<>>
    /* For 64-bit BARs */
    #define GUEST_BAR_BASE_64 <<>>
    #define GUEST_BAR_SIZE_64 <<>>
New entries in xenstore for device BARs:

    /local/domain/0/backend/pci/1/0
        vdev-N
            BDF = ""
            BAR-0-IPA = ""
            BAR-0-PA = ""
            BAR-0-SIZE = ""
            ...
            BAR-M-IPA = ""
            BAR-M-PA = ""
            BAR-M-SIZE = ""
Hypercall Modification (XEN_DOMCTL_assign_device)
    struct xen_domctl_assign_device {
        uint32_t dev;   /* XEN_DOMCTL_DEV_* */
        union {
            struct {
                uint32_t machine_sbdf;  /* machine PCI ID of assigned device */
                uint32_t guest_sbdf;    /* guest PCI ID of assigned device */
            } pci;
            struct {
                uint32_t size;          /* length of the path */
                XEN_GUEST_HANDLE_64(char) path; /* path to the device tree node */
            } dt;
        } u;
    };
SMMU Code additions
iommu_ops functions:
    PHYSDEVOP_pci_add_device    => .add_device    = arm_smmu_add_dom0_dev
    PHYSDEVOP_pci_remove_device => .remove_device = arm_smmu_remove_device
Mapping between streamID – deviceID – PCI SBDF – requesterID
In the simplest case all of these are equal to the BDF, but some devices use a different requester ID for DMA transactions.
Suggestions on how to handle this are welcome.
pci-frontend bus gicv3-its node binding for domU

It is assumed that the toolstack generates a gicv3-its node in the domU device tree. As of now, the ARM PCI passthrough design supports device assignment only to guests that have gicv3-its support.
All devices assigned to a domU are enumerated on a PCI frontend bus; on this bus the interrupt parent is set to the gicv3-its for ARM systems.
Since the gicv3-its is emulated in Xen, all accesses by the domU driver are trapped. This enables configuration and direct injection of MSIs (LPIs) into the guest, so frontend-backend communication for MSI is no longer required.
Frontend-backend communication is required only for reading the PCI configuration space by dom0 on behalf of domU.
ITS
The Interrupt Translation Service (ITS) is ARM's specification for supporting PCI MSI(-X).
MSI(-X)s are handled as Locality-specific Peripheral Interrupts (LPIs), starting from IRQ number 8192.
LPIs are targeted directly at CPUs. Software sends ITS commands such as MAPD, MAPVI, MOVI, INT, SYNC and INV to the ITS hardware to set up MSI(-X) translation.
Command completion is notified by:
– Polling
– Interrupt notification, by placing an INT command
ITS HW-SW Interaction

[Diagram: software writes ITS commands into a command queue described by the BASER, CWRITER and CREADR registers; the ITS hardware reads the commands and configures the Device Table, the per-device ITT tables, the LPI configuration table and the per-CPU LPI pending tables, which target the CPUs. Legend: the command queue is allocated by software and used by both software and hardware; the tables are allocated by software and used by hardware.]
Major challenges in virtualizing ITS
ITS commands should be processed with minimal latency, without blocking a vCPU for a long duration.
All guests should get a fair share of time for processing their guest ITS commands.
A guest must not be able to put Xen into a DoS by sending commands continuously.
– Solution: do not send guest ITS commands to the hardware; just emulate them.
Processing global ITS commands like SYNC, INVALL etc. on platforms with a multi-node ITS.
– Solution: one virtual ITS per domain, and ignore guest SYNC, INVALL and DISCARD commands.
Major challenges in virtualizing ITS (continued)
Handling a guest ITS driver that uses the INT command for completion notification.
– Solution: Xen injects a virtual LPI back into the guest when the INT command is emulated.
ITS virtualization in XEN
ITS virtualization:
– Command queue virtualization
– LPI configuration table virtualization
– GITS registers virtualization
XEN ITS Initialization

[Diagram: memory allocated by Xen for the ITS hardware: the physical ITS command queue (BASER, CWRITER, CREADR), the Device Table and the per-device ITT tables.]
(1) Dom0 sends the PHYSDEVOP_pci_device_add hypercall for a device.
(2) Xen allocates an ITT table for the device and sends a MAPD command to the ITS hardware.
(3) Xen allocates (physical) LPIs for the device and sends MAPVI commands.
ITS command Virtualization

[Diagram: memory allocated by the guest: the virtual command queue (BASER, CWRITER, CREADR), the Device Table and the ITT tables.]
(1) A guest update of a command in the virtual queue traps to Xen.
(2) Xen uses the guest's Device Table and ITT table memory to note down the guest ITS command information.
MAPD/MAPVI ITS command Virtualization

[Diagram: guest-allocated virtual command queue and Device Table; a Device Table entry holds the ITT IPA (8 bytes) and size (8 bytes); an ITT entry holds the vLPI (vID) and the collection ID.]
(1) Xen reads MAPD (DevID, ITT IPA, size) and finds the IPA and size of the ITT table for the devid.
(2) Xen uses the guest's Device Table memory to note down the ITT table IPA and size.
(3) Xen reads the MAPVI (DevID, vID, collection) command.
(4) Xen uses the guest's Device Table to find the address of the ITT for the device, and updates the ITT entry indexed by the ID with the vLPI and collection ID.
LPI Routing to Guest

[Diagram: guest-allocated Device Table (ITT IPA, size) and ITT table (vLPI, collection ID).]
(1) Xen receives a pLPI from the hardware.
(2) Xen queries the Device Table and gets the ITT table.
(3) From the ITT table, Xen gets the virtual LPI (vLPI).
(4) Xen injects the vLPI into the guest.
References:
– vITS design doc: http://xenbits.xen.org/people/ianc/vits/draftG.pdf
– Patches (22): http://osdir.com/ml/general/2015-07/msg35182.html
– PCI passthrough design doc: http://www.gossamer-threads.com/lists/xen/devel/394962
Xen Dual (Socket/Node) NUMA Demo

[Diagram: dom0 and two domUs, each with vCPUs and a vITS, on the Xen hypervisor across Node 0 (48 cores) and Node 1 (48 cores), each node with local DDR.]
# xl list
Name          ID   Mem  VCPUs  State   Time(s)
Domain-0       0  2048      8  r-----    128.9
domu-node0     1  2048      4  -b----      1.4
domu-node1     2  2048      4  -b----      0.6

# xl cpupool-list
Name         CPUs  Sched   Active  Domain count
Pool-node0     48  credit  y       2
Pool-node1     48  credit  y       1

# xl cpupool-list -c
Name         CPU list
Pool-node0   0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47
Pool-node1   48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95