Policy-driven, Platform-aware Nova Scheduler

Policy-driven, Platform-aware Nova Scheduler Adrian Hoban, Principal Engineer, Intel Ramki Krishnan, Distinguished Engineer, CTO NFV, Dell Tim Hinrichs, CTO, Styra

Page 1: Policy-driven, Platform-aware Nova Scheduler

Policy-driven, Platform-aware Nova Scheduler

Adrian Hoban, Principal Engineer, Intel

Ramki Krishnan, Distinguished Engineer, CTO NFV, Dell

Tim Hinrichs, CTO, Styra

Page 2: Policy-driven, Platform-aware Nova Scheduler

Dell - Restricted - Confidential

Team

• Core Team (besides presenters)

– Arun Yerra, Dell

– Dilip Krishnaswamy, IBM Research

– Joseph Gasparakis, Intel

– Ruby Krishnaswamy, Orange

• Contributor Acknowledgement

– Anoop Ghanwani, Dell

– Diego Lopez, Telefonica (Operator)

– Francisco Javier, Telefonica (Operator)

– Frank Zdarsky, Red Hat

– Jim Hao Chen, Northwestern University

– Norival Figueira, Brocade

– Peter Willis, BT (Operator)

– Sridhar Ramaswamy, Brocade

– Steve Gordon, Red Hat

– Sylvain Bauza, Red Hat

– Uri Elzur, Intel

Page 3: Policy-driven, Platform-aware Nova Scheduler

OpenStack Nova Scheduler Challenges

• Platform Features Beyond Compute

– SDS use case: High-performance storage and compute isolation

– Wait for next OpenStack Release?

• Ease of Use

– General use case: Determine highly loaded or unusable hosts

– Build use case specific analysis tools?

• Initial Placement vs other Functions

– NFV use case: Dynamic monitoring and violation detection

– Design one-off monitoring framework?

Admin

User

Page 4: Policy-driven, Platform-aware Nova Scheduler

Application Performance-aware Workload Placement (1)

Delivering “Low-latency, reliable delivery” workloads, e.g. Broadcast Video, Distance Learning, Augmented Reality in the Telco Cloud

• NFV Orchestrator - End-to-end - Intra-dc, Inter-dc WAN etc.

• Exemplary VNFs - Stateful firewall, Wireless Video Proxy, Crypto

• Compute: Fine grained resource partitioning for VM

– Dedicated core(s) AND NUMA awareness AND L3 cache partitioning [1] AND SR-IOV *** ELSE ***

– Dedicated core(s) AND NUMA awareness AND L3 cache partitioning AND DPDK vSwitch *** ELSE ***

– Dedicated physical server

• Network: Overlay/Underlay QoS

– High QoS AND Minimum buffer depth in switches

• Storage: High Performance Logging

– NVMe SSD based storage *** ELSE *** SSD based storage

Ref. [1] - Intel RDT - http://www.intel.com/content/www/us/en/architecture-and-technology/resource-director-technology.html

Figure labels: 3G, 4G, 5G; Premium Quality Video; Poor Quality Video; Infrastructure Issues

Page 5: Policy-driven, Platform-aware Nova Scheduler

Application Performance-aware Workload Placement (2)

Delivering “Classic enterprise” workloads, e.g. Email, CRM in the Telco Cloud

• Exemplary data plane VNFs - Stateful firewall, IDS/IPS, WAN Opt and IPSEC crypto

• Compute: Deterministic performance by avoiding memory contention

– NUMA awareness AND SR-IOV *** ELSE ***

– NUMA awareness

• Network: No HA requirement

• Storage: SSD for High performance logging

Delivering “Residential broadband” workloads, e.g. cost-effective Internet in the Telco Cloud

• Exemplary data plane VNFs - NAT

• Compute/Network: Max capacity limit

• Storage: HDD for Low cost

Page 6: Policy-driven, Platform-aware Nova Scheduler

Policy-driven Scheduler Approach (1)

Minimize Vendor Lock-in and Dependency

Maximize feature velocity

• Extensibility

– Admin/User can add new compute (Nova), networking (Neutron), storage (Cinder) constraints on the fly

• Understandability

– Admin/User uses human-readable scheduling policies and builds analysis tools on an as-needed basis

• Monitoring

– Admin/User benefits from a single representation for handling variation in resource utilization and initial placement

Minimize additional code

No custom analysis tools

No delay in monitoring feature availability

Page 7: Policy-driven, Platform-aware Nova Scheduler

Policy-driven Scheduler Approach (2)

Best of Breed

• Imperative Interface Choices

– Extensions to current JSON filter - JSON Weight

• Declarative Interface Choices

– JSON Filter extensions to current Nova Flavors

– Datalog embedded in YAML for flexible constraint specification and database manipulation

Enable user to customize specific applications

Address User understandability, Admin extensibility

Page 8: Policy-driven, Platform-aware Nova Scheduler

Imperative Example

User Describes Desired Hardware

User Request (to the Policy-driven Scheduler)

– NUMA and SR-IOV (weight 2)

– else NUMA and more cores (weight 1)

Hosts

– Host1: SRIOV

– Host2: NUMA, SRIOV

– Host3: NUMA, more cores

– Host4: L3 partitioning

Output

– Host 2

– Host 3
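
The imperative example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Host` class, the `schedule` function and the reading of the "2" and "1" on the slide as preference weights are hypothetical and are not part of Nova or of the proposed specs.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    features: set = field(default_factory=set)


# Host capabilities as listed on the slide.
HOSTS = [
    Host("Host1", {"SRIOV"}),
    Host("Host2", {"NUMA", "SRIOV"}),
    Host("Host3", {"NUMA", "more cores"}),
    Host("Host4", {"L3 partitioning"}),
]

# Ordered alternatives: (required features, weight). Higher weight is preferred.
# Assumption: the "2" and "1" on the slide are these weights.
REQUEST = [
    ({"NUMA", "SRIOV"}, 2),
    ({"NUMA", "more cores"}, 1),
]


def schedule(hosts, request):
    """Return (host name, weight) pairs for hosts matching any alternative, best first."""
    ranked = []
    for host in hosts:
        # A host takes the weight of the best alternative it fully satisfies.
        weights = [w for required, w in request if required <= host.features]
        if weights:
            ranked.append((host.name, max(weights)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


print(schedule(HOSTS, REQUEST))  # [('Host2', 2), ('Host3', 1)]
```

Host1 and Host4 drop out because neither satisfies a complete alternative; the "else" branch only ranks Host3 below Host2 rather than excluding it.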

Page 9: Policy-driven, Platform-aware Nova Scheduler

Declarative Example

User Describes Workload

User Request (to the Policy-driven Scheduler)

– affinity: ["vm123", "vm456"]

– memory: 10GB

– type: "low-latency, reliable-delivery"

Policy Store

– Policy: This type requires local ephemeral SSD-backed storage

Host2 data

– memory: 20GB

– storage: ssd

Output

– Host 2
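
A minimal Python sketch of the declarative flow follows, assuming the admin's policy store maps a workload type to required host properties. Everything here is hypothetical: `POLICY_STORE`, `HOST_DATA`, `select_hosts`, the data for hosts other than Host 2, and the reading of `affinity` as "the host already runs the listed VMs".

```python
# Hypothetical policy store: workload type -> host properties the admin requires.
POLICY_STORE = {
    "low-latency, reliable-delivery": {"storage": "ssd"},
}

# Hypothetical host data; only Host 2's memory and storage come from the slide.
HOST_DATA = {
    "Host 1": {"memory_gb": 8, "storage": "hdd", "vms": []},
    "Host 2": {"memory_gb": 20, "storage": "ssd", "vms": ["vm123", "vm456"]},
    "Host 3": {"memory_gb": 32, "storage": "hdd", "vms": []},
    "Host 4": {"memory_gb": 4, "storage": "ssd", "vms": []},
}

REQUEST = {
    "affinity": ["vm123", "vm456"],
    "memory_gb": 10,
    "type": "low-latency, reliable-delivery",
}


def select_hosts(request, policy_store, host_data):
    """Return hosts that satisfy the user request and the admin policy for its type."""
    required = policy_store.get(request["type"], {})
    selected = []
    for name, data in host_data.items():
        enough_memory = data["memory_gb"] >= request["memory_gb"]
        meets_policy = all(data.get(k) == v for k, v in required.items())
        # Assumption: affinity means the host already runs every listed VM.
        co_located = all(vm in data["vms"] for vm in request["affinity"])
        if enough_memory and meets_policy and co_located:
            selected.append(name)
    return selected


print(select_hosts(REQUEST, POLICY_STORE, HOST_DATA))  # ['Host 2']
```

The user never mentions SSDs; the storage requirement comes entirely from the admin's policy for the workload type, which is the point of the declarative interface.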

Page 10: Policy-driven, Platform-aware Nova Scheduler

OpenStack Nova Scheduler

Host 1, Host 2, Host 3, ... Host N

• Filters

– 30+ types of filters.

– Find the subset of suitable hosts.

– Result: Host 1, Host 3, Host 8, Host 9

• Weighting

– Order suitable hosts.

– Result: Host 8, Host 1, Host 9, Host 3
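
The two-stage flow (filter, then weigh) can be sketched as follows. This is an illustration only, not Nova's implementation: the filter and weigher functions, the host attributes and their values are all hypothetical, chosen so that the result matches the host order on the slide.

```python
def ram_filter(host, request):
    """Pass hosts with enough free RAM for the request."""
    return host["free_ram_gb"] >= request["ram_gb"]


def zone_filter(host, request):
    """Pass hosts in the requested availability zone."""
    return host["zone"] == request["zone"]


def free_ram_weigher(host, request):
    """Prefer hosts with more free RAM."""
    return host["free_ram_gb"]


def schedule(hosts, request, filters, weigher):
    # Stage 1: keep only hosts accepted by every filter.
    suitable = [h for h in hosts if all(f(h, request) for f in filters)]
    # Stage 2: order the survivors, highest weight first.
    return sorted(suitable, key=lambda h: weigher(h, request), reverse=True)


# Hypothetical host data.
HOSTS = [
    {"name": "Host 1", "free_ram_gb": 7, "zone": "az1"},
    {"name": "Host 2", "free_ram_gb": 0, "zone": "az1"},
    {"name": "Host 3", "free_ram_gb": 1, "zone": "az1"},
    {"name": "Host 8", "free_ram_gb": 10, "zone": "az1"},
    {"name": "Host 9", "free_ram_gb": 3, "zone": "az1"},
    {"name": "Host N", "free_ram_gb": 16, "zone": "az2"},
]

request = {"ram_gb": 1, "zone": "az1"}
ordered = [h["name"] for h in schedule(HOSTS, request, [ram_filter, zone_filter], free_ram_weigher)]
print(ordered)  # ['Host 8', 'Host 1', 'Host 9', 'Host 3']
```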

Page 11: Policy-driven, Platform-aware Nova Scheduler

Nova Scheduler Filter

• Administrator configures the filter list (30+ options)

• scheduler_default_filters=RamFilter,ComputeFilter,AvailabilityZoneFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter

• Admin configures various filter input data sets such as the flavor definition with extra_specs

Host 1, Host 3, Host 8, Host 9: Each host complies with an imperative request based on user and admin input.

E.g. 4GB for VM, huge pages, AES-NI, same availability zone, PCIe accelerators, can meet image property requirements, etc.

Page 12: Policy-driven, Platform-aware Nova Scheduler

Nova Scheduler Weight

• Configured by the administrator.

• RAM

– Spread across hosts evenly based on RAM utilisation.

• Metrics

– Weigh hosts based on a combination of the weight associated with the specified host_state metrics.

• IO Ops

– Weigh hosts based on I/O operations.

• Affinity

– Weigh hosts based on the number of instances from a given server group.

– Affinity and Anti-Affinity options available.

RAM Centric weighting policy

Host 8: 10GB Free

Host 1: 7GB Free

Host 9: 3GB Free

Host 3: 1GB Free
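
The RAM-centric ordering above can be reproduced with a short sketch. It is loosely modelled on the idea of normalizing weigher output and scaling it by an admin-set multiplier; the `normalize` and `order_by_ram` functions and the `multiplier` parameter are hypothetical names for this illustration, not Nova code.

```python
# Free RAM per host, in GB, as shown on the slide.
FREE_RAM_GB = {"Host 8": 10, "Host 1": 7, "Host 9": 3, "Host 3": 1}


def normalize(values):
    """Scale raw values linearly into the range [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


def order_by_ram(free_ram, multiplier=1.0):
    """Order hosts by normalized free RAM times an admin-configured multiplier."""
    weights = {h: multiplier * w for h, w in normalize(free_ram).items()}
    return sorted(weights, key=weights.get, reverse=True)


# A positive multiplier spreads instances towards the emptiest hosts.
print(order_by_ram(FREE_RAM_GB))        # ['Host 8', 'Host 1', 'Host 9', 'Host 3']
# A negative multiplier stacks instances onto the fullest hosts instead.
print(order_by_ram(FREE_RAM_GB, -1.0))  # ['Host 3', 'Host 9', 'Host 1', 'Host 8']
```

Because the multiplier is a single global setting in this sketch, it cannot vary per availability zone or host aggregate.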

Page 13: Policy-driven, Platform-aware Nova Scheduler

Problem Statement(s) – Nova Placement

• Administrator input to the filter scheduler is largely static and Nova centric

– E.g. flavour and extra_spec definitions, Host aggregate definitions, etc.

• Not possible to deploy to a given service level with different infrastructure resource allocations (in the same request) under policy governance.

• Not possible to modify the weighting configuration/policy for different parts of the environment such as per availability zone or host aggregates.

Page 14: Policy-driven, Platform-aware Nova Scheduler

Empower User: JsonFilter + JsonWeight

User Request

– NUMA and SR-IOV weighted 2

– NUMA and more cores weighted 1

Filter Scheduler

– JsonFilter

– JsonWeight

Host Data (Nova’s HostState)

Page 15: Policy-driven, Platform-aware Nova Scheduler

Empower Admin 1: New Filter

User Request

– workload: "low-latency, reliable-delivery"

– tenant-id: "pepsi"

Filter Scheduler

– AdminJsonFilter

– AdminJsonWeight

Policy Store (Config, File)

Host Data (Nova’s HostState)

Pro: Extensible by admin to external data sources like Cinder and Neutron.

Con: New filter on already long list.

Page 16: Policy-driven, Platform-aware Nova Scheduler

Empower Admin 2: Modify Existing Filters

Flavor fields

Field              Description
vCPUs              Number of virtual CPUs
Memory_MB          VM memory in megabytes
Disk               Virtual root disk size in GB
Extra_specs        Key-value pairs
Policy             AND/OR/NOT of tests

Host Aggregate Fields

Field              Description
ID                 Identifier of the host aggregate
Name               Name of the host aggregate
AvailabilityZone   Availability zone of the aggregate
Hosts              List of hosts in group
Metadata           Key-value pairs
Policy             AND/OR/NOT of tests

Pro: Extensible by admin. Already part of workflow.

Con: Adds complexity to established filters

Page 17: Policy-driven, Platform-aware Nova Scheduler

Status

• Concept stage with early drafts of several specs

– Imperative: json-weight

– Declarative:

    – New scheduler: policy-based-scheduler

    – New filter+weight: admin-json-filter

    – Modify existing flavor: flavor-policy

    – New Host aggregate field: host-aggregate-policy

• 3 sessions at this summit

– Wednesday, 9-10:30 (Nova scheduler working session)

– Wednesday, 11-11:40 (Congress Integrations session)

– Wednesday, 11:45-12:30 (NFV Orchestration BoF)

Page 18: Policy-driven, Platform-aware Nova Scheduler

Key Takeaways

• Contributors: 10+ companies

• Goal: Policy-driven scheduling, Service-assured resource-allocation

• Approach:

– Imperative: User describes desired hardware in policy language OR

    – JSON Weight

– Declarative: User describes application; admin maps application to hardware

    – Admin JSON Filter, Admin JSON Weight

    – Enhance Flavor and Host Aggregates

• Weekly meeting: 8am Pacific = 1300 UTC

– Please join us!

Page 20: Policy-driven, Platform-aware Nova Scheduler

Intel Legal Notices and Disclaimers

• Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer.

• No computer system can be absolutely secure.

• Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance.

• Intel, the Intel logo and others are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.

• © 2016 Intel Corporation.

Page 21: Policy-driven, Platform-aware Nova Scheduler

Policy Language: JsonFilter and JsonWeight

For "low-latency" workloads:

• At least 8GB of free ram

• At least 8 free vCPUs

• NUMA awareness

['or', ['and', ['=', '$user.type', 'low-latency'],
               ['>=', '$host.free_ram_mb', 8*1024],
               ['>=', '$host.vcpus_total' - '$host.vcpus_used', 8],
               ['not', ['=', '$host.numa_topology', 'None']]]]
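
To show how such a nested-list policy could be evaluated, here is a small recursive interpreter in Python. It is a sketch under assumptions, not Nova's JsonFilter: the `evaluate` function, the explicit `'-'` operator (standing in for the subtraction written inline on the slide), the use of `>=` to express "at least", and Python's `None` in place of the string 'None' are choices made for this illustration.

```python
import operator

COMPARISONS = {
    "=": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def evaluate(expr, ctx):
    """Recursively evaluate a nested-list policy expression against ctx."""
    if isinstance(expr, str) and expr.startswith("$"):
        # '$host.free_ram_mb' -> ctx['host']['free_ram_mb']
        scope, key = expr[1:].split(".", 1)
        return ctx[scope][key]
    if not isinstance(expr, list):
        return expr  # literal value
    op, *args = expr
    if op == "and":
        return all(evaluate(a, ctx) for a in args)
    if op == "or":
        return any(evaluate(a, ctx) for a in args)
    if op == "not":
        return not evaluate(args[0], ctx)
    if op == "-":  # hypothetical arithmetic operator
        return evaluate(args[0], ctx) - evaluate(args[1], ctx)
    left, right = (evaluate(a, ctx) for a in args)
    return COMPARISONS[op](left, right)


POLICY = ["or", ["and",
    ["=", "$user.type", "low-latency"],
    [">=", "$host.free_ram_mb", 8 * 1024],
    [">=", ["-", "$host.vcpus_total", "$host.vcpus_used"], 8],
    ["not", ["=", "$host.numa_topology", None]],
]]

numa_host = {
    "user": {"type": "low-latency"},
    "host": {"free_ram_mb": 16384, "vcpus_total": 32, "vcpus_used": 8,
             "numa_topology": "2 cells"},
}
plain_host = {
    "user": {"type": "low-latency"},
    "host": {"free_ram_mb": 16384, "vcpus_total": 32, "vcpus_used": 8,
             "numa_topology": None},
}

print(evaluate(POLICY, numa_host))   # True
print(evaluate(POLICY, plain_host))  # False
```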

Page 22: Policy-driven, Platform-aware Nova Scheduler

Policy Language: YAML based policy

parameters:
  availability_zone:
    type: String
    label: availability zone number
    description: Name of the availability zone server should be hosted on.
  affinity:
    type: String
    label: Affinity
    description: Affinity Group Id
  ram:
    type: integer
    label: RAM
    description: Minimum RAM size required by server instance in GB.
hard_constraints:
  ram_constraint:
    operation_type: min
    value: { get_param: ram }
  affinity_constraint:
    operation_type: equals
    value: { get_param: affinity }
  availability_zone_constraint:
    operation_type: equals
    value: { get_param: availability_zone }
soft_constraints:
  ram_factor:
    operation_type: multiplication
    value: { get_param: ram-weight }
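
One possible reading of this policy is that hard constraints filter hosts and soft constraints score them. The Python sketch below works on the policy as an already-parsed dictionary and makes several assumptions that the slide does not state: that `min` means "host value is at least the parameter", that `multiplication` means "host value times the parameter", and that a constraint named `ram_constraint` or `ram_factor` applies to a host attribute called `ram`. All function names and host data are hypothetical.

```python
# The constraint sections of the policy, as a parsed dictionary.
POLICY = {
    "hard_constraints": {
        "ram_constraint": {"operation_type": "min", "value": {"get_param": "ram"}},
        "affinity_constraint": {"operation_type": "equals", "value": {"get_param": "affinity"}},
        "availability_zone_constraint": {"operation_type": "equals",
                                         "value": {"get_param": "availability_zone"}},
    },
    "soft_constraints": {
        "ram_factor": {"operation_type": "multiplication", "value": {"get_param": "ram-weight"}},
    },
}


def resolve(value, params):
    """Replace {get_param: name} with the user-supplied parameter value."""
    if isinstance(value, dict) and "get_param" in value:
        return params[value["get_param"]]
    return value


def attribute_of(name):
    # Assumption: 'ram_constraint' and 'ram_factor' both refer to host attribute 'ram'.
    return name.rsplit("_", 1)[0]


def passes_hard(host, policy, params):
    for name, constraint in policy["hard_constraints"].items():
        actual = host[attribute_of(name)]
        wanted = resolve(constraint["value"], params)
        if constraint["operation_type"] == "min" and actual < wanted:
            return False
        if constraint["operation_type"] == "equals" and actual != wanted:
            return False
    return True


def soft_score(host, policy, params):
    score = 0
    for name, constraint in policy["soft_constraints"].items():
        if constraint["operation_type"] == "multiplication":
            score += host[attribute_of(name)] * resolve(constraint["value"], params)
    return score


def rank(hosts, policy, params):
    """Drop hosts failing any hard constraint, then order by soft score."""
    ok = [h for h in hosts if passes_hard(h, policy, params)]
    return sorted(ok, key=lambda h: soft_score(h, policy, params), reverse=True)


# Hypothetical hosts and request parameters.
HOSTS = [
    {"name": "h1", "ram": 16, "affinity": "g1", "availability_zone": "az1"},
    {"name": "h2", "ram": 64, "affinity": "g1", "availability_zone": "az1"},
    {"name": "h3", "ram": 128, "affinity": "g2", "availability_zone": "az1"},
]
PARAMS = {"ram": 8, "affinity": "g1", "availability_zone": "az1", "ram-weight": 2}

print([h["name"] for h in rank(HOSTS, POLICY, PARAMS)])  # ['h2', 'h1']
```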

Page 23: Policy-driven, Platform-aware Nova Scheduler

Policy Language: DataLog

main(host) :-
    nova:host(host),
    not eliminated_host(host),
    max_host_score(host, max)

eliminated_host(host) :-
    nova:host(host),
    request:same_hosts(vm),
    not nova:deployed(vm, host)

eliminated_host(host) :-
    nova:host(host),
    request:different_hosts(vm),
    nova:deployed(vm, host)

max_host_score(host, max(score)) :-
    weight(host, score)

weight(host, ram_weight) :-
    request:ram(requested_ram),
    nova:host_ram(host, actual_ram),
    ram_weight = actual_ram - requested_ram / 256
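
For readers less familiar with Datalog, the rules can be paraphrased in Python. This is a hand translation for illustration, not generated by or taken from Congress: the function names and sample data are hypothetical, the weight formula is taken as written with standard operator precedence, and picking the single highest-scoring host at the end is an added assumption.

```python
def eliminated_hosts(hosts, deployed, same_hosts, different_hosts):
    """Mirror the two eliminated_host rules. deployed is a set of (vm, host) pairs."""
    eliminated = set()
    for host in hosts:
        # Rule 1: a VM we must share a host with is not deployed here.
        if any((vm, host) not in deployed for vm in same_hosts):
            eliminated.add(host)
        # Rule 2: a VM we must avoid is deployed here.
        if any((vm, host) in deployed for vm in different_hosts):
            eliminated.add(host)
    return eliminated


def weight(host_ram, requested_ram):
    """Mirror the weight rule, with the expression evaluated as written."""
    return {h: actual_ram - requested_ram / 256 for h, actual_ram in host_ram.items()}


def main(hosts, host_ram, deployed, same_hosts, different_hosts, requested_ram):
    """Return the score of every host that was not eliminated."""
    gone = eliminated_hosts(hosts, deployed, same_hosts, different_hosts)
    scores = weight(host_ram, requested_ram)
    return {h: scores[h] for h in hosts if h not in gone}


# Hypothetical data: vm9 runs on h3 and the request asks for a different host.
hosts = ["h1", "h2", "h3"]
host_ram = {"h1": 4096, "h2": 8192, "h3": 16384}
deployed = {("vm9", "h3")}

result = main(hosts, host_ram, deployed, same_hosts=[], different_hosts=["vm9"],
              requested_ram=2048)
print(result)                          # {'h1': 4088.0, 'h2': 8184.0}
print(max(result, key=result.get))     # h2
```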