Lucian Popa Minlan Yu Steven Y. Ko UC Berkeley/ICSI Princeton Univ. Princeton Univ. Sylvia Ratnasamy...



  • Slide 1

CloudPolice: Taking Access Control Out of the Network
Lucian Popa (UC Berkeley/ICSI), Minlan Yu (Princeton Univ.), Steven Y. Ko (Princeton Univ.), Sylvia Ratnasamy (Intel Labs Berkeley), Ion Stoica (UC Berkeley)

  • Slide 2

Context: Infrastructure as a Service; virtualized clouds; traffic internal to cloud. Diagram labels: Hypervisor, VM.

  • Slide 3

Context: Cloud computing requires network access control.

  • Slide 4

Context: Cloud computing requires network access control. Access control policy of tenant X = what network traffic tenant X is willing to accept. Diagram labels: Tenant X ("Y can talk to me"), Tenant Y.

  • Slide 5

Why Access Control in Clouds? (1) For isolation. Policy: deny incoming traffic from any other tenant. Diagram labels: Amazonia, Exbay.

  • Slide 6

Why Access Control in Clouds? (2) For inter-tenant & tenant-provider communication. Policy: allow/deny traffic from specific tenants. Increasingly common in cloud environments: low latency and high bandwidth; ease of service composition. Diagram labels: Amazonia, Exbay.

  • Slide 7

Why Access Control in Clouds? (2) For inter-tenant & tenant-provider communication. Policy: allow/deny traffic from specific tenants. Real-time bidding advertising. Diagram labels: Exbay, Amazonia.

  • Slide 8

Why Access Control in Clouds? (2) For inter-tenant & tenant-provider communication. Policy: allow/deny traffic from specific tenants. Real-time bidding advertising: send information about client. Diagram labels: Exbay, Amazonia, Ad Network 1, Ad Network 2, Ad Networks.

  • Slide 9

Why Access Control in Clouds? (2) For inter-tenant & tenant-provider communication. Policy: allow/deny traffic from specific tenants. Real-time bidding advertising: receive ad bids. Diagram labels: Exbay, Amazonia, Ad Network 1, Ad Network 2, Ad Networks.

  • Slide 10

Why Access Control in Clouds? (2) For inter-tenant & tenant-provider communication. Policy: allow/deny traffic from specific tenants. Real-time bidding advertising: return ad of highest bidder. Diagram labels: Exbay, Amazonia, Ad Network 1, Ad Network 2, Ad Networks.

  • Slide 11

Why Access Control in Clouds?
(2) For inter-tenant & tenant-provider communication. Policy: allow/deny traffic from specific tenants. Real-time bidding advertising. Policy of Exbay: allow traffic from AdNetworks, deny all other traffic. Diagram labels: Exbay, Amazonia, Ad Network 1, Ad Network 2, Ad Networks.

  • Slide 12

Why Access Control in Clouds? (2) For inter-tenant & tenant-provider communication. Policy: allow/deny traffic from specific tenants. Other service examples: database (SimpleDB), desktop, communication (SQS), map-reduce++, Facebook, host managing, locking, etc. Diagram labels: Exbay, Amazonia, Ad Network 1, Ad Network 2, Ad Networks.

  • Slide 13

Why Access Control in Clouds? (3) For inter-tenant & tenant-provider communication. Policy: weighted bandwidth allocation between tenants. Diagram labels: Exbay, Amazonia, Ad Network 1, Ad Network 2, Ad Networks.

  • Slide 14

Why Access Control in Clouds? (3) For inter-tenant & tenant-provider communication. Policy: weighted bandwidth allocation between tenants. Share bandwidth fairly among tenants regardless of #VM sources. Diagram labels: Exbay, Amazonia, Ad Network 1, Ad Network 2, Ad Networks, Nextbay.

  • Slide 15

Why Access Control in Clouds? (3) For inter-tenant & tenant-provider communication. Policy: weighted bandwidth allocation between tenants. Other example policies: rate-limited access; allow only locally initiated connections; nighttime access only. Diagram labels: Exbay, Amazonia, Ad Network 1, Ad Network 2, Ad Networks, Nextbay.

  • Slide 16

Why Access Control in Clouds?
(4) DoS protection. One tenant can attack another tenant: reduce bandwidth and slow down machines. Attackers are more powerful: higher bandwidths. Barrier is lower: pay for attacking hosts (compromise credit cards instead of hosts). Diagram labels: Exbay, Ad Network 1, Ad Network 2, Ad Networks, Nextbay, AmazoniaX.

  • Slide 17

Hence, the problem: Want access control in clouds that: is resilient to DoS; supports rich inter-tenant policies.

  • Slide 18

Hence, the problem: Want access control in clouds that: is resilient to DoS; supports rich inter-tenant policies; scales (100k servers, 10k tenants).

  • Slide 19

Hence, the problem: Want access control in clouds that: is resilient to DoS; supports rich inter-tenant policies; scales; tolerates high dynamicity (100k VMs started per day, more than one per second).

  • Slide 20

Hence, the problem: Want access control in clouds that: is resilient to DoS; supports rich inter-tenant policies; scales; tolerates high dynamicity. Traditional access control mechanisms are not well suited to meeting these requirements.

  • Slide 21

Existing Access Control: Cloud APIs are narrow: on/off; no locally initiated connections, no rate-limiting, no weighted allocation. Mechanisms inherited from enterprises: VLANs, firewalls.

  • Slide 22

Existing Access Control: Cloud APIs are narrow: on/off; no locally initiated connections, no rate-limiting, no weighted allocation. Mechanisms inherited from enterprises: VLANs, firewalls. But clouds != enterprises.

  • Slide 23

Clouds != Enterprises. Enterprises are not multi-tenant: few DoS concerns between departments; typically simpler policies. Enterprises don't have the same dynamicity and scale: 10k tenants vs. 10s of departments; 1 VM/s vs. mostly static. Clouds have different network designs: high bisection bandwidths, multiple paths, different L2/L3 mix; many new topologies: FatTree, VL2, BCube, DCell, etc.
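
The weighted bandwidth allocation policy from slides 13 to 15 ("share bandwidth fairly among tenants regardless of #VM sources") can be illustrated with a small sketch. This is only an assumed reading of that policy, not the authors' mechanism: the function names, the 1000 Mbps link, the equal weights, and the VM counts are all hypothetical.

```python
from typing import Dict


def tenant_shares(capacity_mbps: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Split a destination link between tenants in proportion to their weights."""
    total = sum(weights.values())
    return {tenant: capacity_mbps * w / total for tenant, w in weights.items()}


def per_vm_rate(tenant_share_mbps: float, active_source_vms: int) -> float:
    """Divide one tenant's share evenly among its active source VMs."""
    return tenant_share_mbps / active_source_vms


# Hypothetical numbers: two tenants with equal weight send to Exbay.
shares = tenant_shares(1000.0, {"Ad Network 1": 1.0, "Nextbay": 1.0})

# Nextbay uses 50 source VMs, Ad Network 1 only 2, yet each tenant gets 500 Mbps.
print(shares)
print(per_vm_rate(shares["Ad Network 1"], 2))   # 250.0
print(per_vm_rate(shares["Nextbay"], 50))       # 10.0
```

Under this reading, starting more source VMs lowers a tenant's per-VM rate instead of raising its total share.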

  • Slide 24

VLANs not well suited for clouds. Inflexible policies. Difficult to scale (cloud size & dynamicity): limited number, spanning tree. Limited network designs: no L3 networks, no multiple paths, inter-VLAN through router.

  • Slide 25

Firewalls not well suited for clouds. Offering DoS protection is difficult: must be applied at source; hard to update. Inflexible policies. Scale through prefix aggregation: difficult to manage 10k tenants with multiple prefixes, different scaling requirements; no L3 networks.

  • Slide 26

Recap: Traditional access control is not well suited for clouds. It couples access control with network operation: with switching (VLANs); with address assignment (firewalls).

  • Slide 27

Recap: Traditional access control is not well suited for clouds. It couples access control with network operation: with switching (VLANs); with address assignment (firewalls). CloudPolice takes access control out of the network.

  • Slide 28

Outline. Part 1: Context and Motivation. Access control for clouds: why and what? Limitations of traditional mechanisms. Part 2: CloudPolice. Approach. Operation. Cloud Police

  • Slide 29

Goal: Network Access Control for Clouds that is: 1. Independent of network topology and addressing 2. Scalable (millions of hosts, high churn) 3. Flexible (on/off access, rated access, fair access) 4.
Robust to (internal) DDoS attacks

  • Slide 30

CloudPolice: Sufficient and advantageous to implement access control only within hypervisors. Diagram labels: Hypervisor, VM.

  • Slide 31

CloudPolice: Sufficient and advantageous to implement access control only within hypervisors. Trusted. Network independent. Full software programmability: flexible. Close to VMs: block unwanted traffic before it enters the network and help against DoS. Easy deployability. Diagram labels: Hypervisor, VM.

  • Slide 32

CloudPolice: Sufficient and advantageous to implement access control only within hypervisors. CloudPolice Policy Model: Group = set of tenant VMs with the same access control policy. Diagram labels: Hypervisor, VM.

  • Slide 33

CloudPolice: Sufficient and advantageous to implement access control only within hypervisors. CloudPolice Policy Model: Policy = set of Rules. Rule = IF Condition THEN Action. Diagram labels: Hypervisor, VM.

  • Slide 34

CloudPolice: Sufficient and advantageous to implement access control only within hypervisors. CloudPolice Policy Model: Condition = logical expression with predicates based on: group of sender; packet header; current time; history of traffic. Diagram labels: Hypervisor, VM.

  • Slide 35

CloudPolice: Sufficient and advantageous to implement access control only within hypervisors. CloudPolice Policy Model: Action: allow; block; rate-limit (token bucket). Diagram labels: Hypervisor, VM.

  • Slide 36

CloudPolice: Sufficient and advantageous to implement access control only within hypervisors. CloudPolice Policy Model: Action: allow; block; rate-limit (token bucket). Applied per flow, source VM, or source group. Diagram labels: Hypervisor, VM.

  • Slide 37

CloudPolice: Hypervisor-based. Diagram labels: Hypervisor, VM, Src. VM, Dst. VM.

  • Slide 38

CloudPolice: Hypervisor-based. Avoid DoS and wasted resources: apply policy at source. Diagram labels: Hypervisor, VM, Src. VM, Dst. VM.

  • Slide 39

CloudPolice: Hypervisor-based. How to apply the destination's policy at the source hypervisor? Diagram labels: Hypervisor, VM, Src. VM, Dst. VM.

  • Slide 40

CloudPolice: Hypervisor-based. Centralized policy repository? Diagram labels: Hypervisor, VM, Src. VM, Dst.
VM.

  • Slide 41

CloudPolice: Hypervisor-based. Centralized policy repository? Diagram labels: Hypervisor, VM, Src. VM, "Allow?", Dst. VM.

  • Slide 42

CloudPolice: Hypervisor-based. Centralized policy repository? A centralized service requires high availability and throughput: 100k servers and 10 new flows/VM/s: 1M decisions/s on average! Caching can be ineffective (random patterns, malicious pollution). A centralized service can be a DoS target. Diagram labels: Hypervisor, VM, Src. VM, "Allow?", Dst. VM.

  • Slide 43

CloudPolice: Hypervisor-based, decentralized. Diagram labels: Hypervisor, VM, Src. VM, Dst. VM.

  • Slide 44

CloudPolice: Hypervisor-based, decentralized. Distribute all policies to all hypervisors? Diagram labels: Hypervisor, VM, Src. VM, Dst. VM.

  • Slide 45

CloudPolice: Hypervisor-based, decentralized. Distribute all policies to all hypervisors? Diagram labels: Hypervisor, VM, Src. VM, Dst. VM, "Allow?".

  • Slide 46

CloudPolice: Hypervisor-based, decentralized. Distribute all policies to all hypervisors? Too heavyweight if network independent: full group membership required; group updates propagated everywhere. 100k new VMs/day, 100k servers: 100k updates/s on average. Diagram labels: Hypervisor, VM, Src. VM, Dst. VM.

  • Slide 47

CloudPolice: Hypervisor-based, decentralized. Apply at destination and enforce at source. Diagram labels: Hypervisor, VM, Src. VM, Dst. VM, "Apply destination's policy".

  • Slide 48

CloudPolice: Hypervisor-based, decentralized. Apply at destination and enforce at source. Diagram labels: Hypervisor, VM, Src. VM, Dst.
VM, "Enforce policy's action".

  • Slide 49

Inspired by Internet Research. Internet solutions to DDoS: push-back filters [AIP, Pushback, AITF, StopIt]; network capabilities [SIFF, TVA]. Handle large and dynamic networks, millions of users.

  • Slide 50

Inspired by Internet Research. Internet solutions to DDoS: push-back filters [AIP, Pushback, AITF, StopIt]; network capabilities [SIFF, TVA]. Handle large and dynamic networks, millions of users. More easily deployed: clouds != Internet. Clouds are controlled environments: both communication endpoints can be controlled; single administrative domain. New tools: trusted software layer (hypervisor).

  • Slide 51

Outline. Part 1: Context and Motivation. Access control for clouds: why and what? Limitations of traditional mechanisms. Part 2: CloudPolice. Approach. Operation. Cloud Police

  • Slide 52

CloudPolice. Each hypervisor needs to know, for hosted VMs: group and policy. Diagram labels: Hypervisor; VMs X, Y, Z; "Policies for X, Y and Z".
X's group policy:
IF group = A allow
IF group = B block
IF group = C & port = 80 rate-limit to 100 Mbps
Y's group policy:
Z's group policy: IF
Policy could also be specified / updated by VM. Installed by provider service that starts VMs.

  • Slide 53

CloudPolice: Filter for incoming/outgoing flows. Diagram labels: Hypervisor; VMs X, Y, Z.

  • Slide 54

CloudPolice inserts a control packet containing the group of Z and the first packet header. Diagram labels: Hypervisor; VMs X, Y, Z; VMs A, B, C; "Start flow to C"; "Z group".

  • Slide 55

CloudPolice verifies the policy of the destination VM. If allowed, packets are forwarded to the destination VM. If blocked or rate-limited, send a control packet to the source hypervisor to block or rate-limit the source (flow/VM). Soft-state and timeouts handle policy invalidations and packet losses. Diagram labels: Hypervisor; VMs X, Y, Z; VMs A, B, C; "Block/rate-limit"; "Z group".

  • Slide 56

Scalability: CloudPolice takes the best of both worlds: centralized vs.
every server stores all policies. Load spread across all servers: maintaining and enforcing policies. Update propagation is contained: group membership updates not propagated; policy updates propagated only to group.

  • Slide 57

Scalability. Columns: CloudPolice, Centralized, All to all.
Max load: O(1), O(N), O(1)
Group update: O(1), O(N)
Policy update: O(|Group|), O(1), O(N)

  • Slide 58

Security Analysis Sketch. Attackers: VMs corrupted or paid for by malicious tenants. Attacks considered: violate access control policies to reach destination; DoS with unauthorized traffic; DoS with authorized traffic. Assumptions: hypervisors not compromised.

  • Slide 59

Security Analysis Sketch. Violate access control policies to reach destination: policy distributed securely to hypervisor; control packets cannot be spoofed, only sent by hypervisors. Diagram labels: Hypervisor; VMs X, Y, Z; "Fake group".

  • Slide 60

Security Analysis Sketch. Violate access control policies to reach destination. DoS with unauthorized traffic: control packets block unauthorized traffic at source.

  • Slide 61

Security Analysis Sketch. Violate access control policies to reach destination. DoS with unauthorized traffic: control packets block unauthorized traffic at source; attackers attempt to cause drops of control packets. Diagram label: "Block/rate-limit".

  • Slide 62

Security Analysis Sketch. Violate access control policies to reach destination. DoS with unauthorized traffic: control packets block unauthorized traffic at source; attackers attempt to cause drops of control packets; retry or prioritize control packets.

  • Slide 63

Security Analysis Sketch. Violate access control policies to reach destination. DoS with unauthorized traffic. DoS with authorized traffic: also need performance isolation for full protection. Diagram label: "Congestion".

  • Slide 64

Security Analysis Sketch. Violate access control policies to reach destination. DoS with unauthorized traffic. DoS with authorized traffic: also need performance isolation for full protection. CloudPolice can implement some performance isolation: rate-limit. Rate-limit to fair share of destination link. Share
access link evenly between destination VMs.

  • Slide 65

Future Work. Implement CloudPolice prototype. Extend CloudPolice: policies with application-level semantics (dynamic policies); policies based on group-wide state. Beyond access control? More flexible actions, e.g., send to middlebox; performance isolation framework.

  • Slide 66

Summary. Access control in cloud computing requires new mechanisms and extended policies. CloudPolice: takes advantage of trusted hypervisors; inspired by past work on Internet DDoS protection. Properties: network independent; scalable; flexible; robust to (internal) DDoS attacks.

  • Slide 67

Backup Slides

  • Slide 68

Related Work: OpenFlow & (Onix | Difane) & OpenVSwitch. Decisions not based on logical identifier (group/tenant). Onix: only isolation framework. OpenFlow actions designed for switches (e.g., currently can't rate-limit). Require scaling central controller vs. software update for CloudPolice.

  • Slide 69

Contributions. Identify that a new access control mechanism is needed in clouds. Pinpoint the challenges and requirements. Identify that access control should be done in hypervisors. Propose CloudPolice, a mechanism that satisfies the requirements.

  • Slide 70

Compromise Single Hypervisor. Can prevent compromised hypervisors from violating security policies. 1. Security credentials associated with group identifier: cannot be sent if unknown (known only for hosted VMs); e.g., group ID has key in name. 2. Prevent spoofed control packets in the network: like IP anti-spoofing in switches/routers.

  • Slide 71

Today's Cloud Mechanisms?
Solutions not public: could be similar to our solution; could provide fewer properties. API is narrow: on/off between groups; no locally initiated connections, no rate-limiting, no weighted allocation.

  • Slide 72

Feasibility. Working on implementing CloudPolice prototype. Fast path: act on per-flow state. Open VSwitch and software routers [RouteBricks, PacketShader] suggest this is feasible. Slow path: execute policy and install flow state. 1/N of requirements for centralized repository. Few hosted VMs: dominated by policy complexity. Software router applications suggest if-then-else structures can be parsed fast [RBF].

  • Slide 73

Other Related Work. VL2's approach, if it would be applied to hypervisors: centralized repository; can violate policies if IP of destination known.

  • Slide 74

Firewalls not Suited for Clouds. Not well suited against DoS: must be applied at source; hard to update. Inflexible policies for clouds. Scaling & network designs. With no prefix aggregation: difficult to scale (100k+ entries); needs updating on all VM starts (more than once/s). With prefix aggregation: complex to manage 10k tenants with multiple prefixes, different scaling requirements; no L3 networks.
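
The policy model from slides 32 to 36 (Policy = set of Rules, Rule = IF Condition THEN Action) and the example policy for X's group on slide 52 can be sketched in code. This is a minimal illustration, not the CloudPolice implementation: the class and function names are hypothetical, and both the first-match rule order and the default "block" action are assumptions the slides do not state.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Flow:
    src_group: str  # group of the sender, as carried in the control packet
    dst_port: int   # taken from the first packet's header


@dataclass(frozen=True)
class Action:
    kind: str                        # "allow", "block", or "rate-limit"
    rate_mbps: Optional[int] = None  # only meaningful for "rate-limit"


@dataclass(frozen=True)
class Rule:
    condition: Callable[[Flow], bool]  # IF Condition
    action: Action                     # THEN Action


def evaluate(policy: List[Rule], flow: Flow,
             default: Action = Action("block")) -> Action:
    """Return the action of the first matching rule (assumed semantics)."""
    for rule in policy:
        if rule.condition(flow):
            return rule.action
    return default  # assumed default; the slides do not specify one


# X's group policy, transcribed from slide 52.
x_policy = [
    Rule(lambda f: f.src_group == "A", Action("allow")),
    Rule(lambda f: f.src_group == "B", Action("block")),
    Rule(lambda f: f.src_group == "C" and f.dst_port == 80,
         Action("rate-limit", 100)),
]

print(evaluate(x_policy, Flow("A", 22)))
print(evaluate(x_policy, Flow("C", 80)))
```

Conditions on current time or traffic history (slide 34) would need extra fields on `Flow` or extra state in the evaluator; they are left out to keep the sketch short.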
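
Slide 35 names a token bucket as the rate-limit mechanism. The sketch below shows the standard token bucket technique only; the class name, the bit-based units, and the explicit `now` argument (used instead of a wall clock so the behavior is easy to follow) are assumptions, and nothing here is taken from a CloudPolice prototype.

```python
class TokenBucket:
    """Standard token bucket: tokens refill at a fixed rate up to a burst cap."""

    def __init__(self, rate_bps: float, burst_bits: float, now: float = 0.0):
        self.rate_bps = rate_bps      # refill rate in bits per second
        self.burst_bits = burst_bits  # maximum number of stored tokens
        self.tokens = burst_bits      # start with a full bucket
        self.last = now               # time of the last refill

    def allow(self, packet_bits: float, now: float) -> bool:
        """Refill for the elapsed time, then spend tokens if enough remain."""
        elapsed = now - self.last
        self.tokens = min(self.burst_bits, self.tokens + elapsed * self.rate_bps)
        self.last = now
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True
        return False


# 100 Mbps limit (as in the slide 52 example) with a burst of one 1500-byte packet.
bucket = TokenBucket(rate_bps=100e6, burst_bits=12_000)
print(bucket.allow(12_000, now=0.0))  # True: the bucket starts full
print(bucket.allow(12_000, now=0.0))  # False: no tokens left yet
print(bucket.allow(12_000, now=1.0))  # True: refilled (capped at the burst size)
```

Per slide 36, one such bucket could be kept per flow, per source VM, or per source group, depending on how the rule is scoped.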
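
The "apply at destination, enforce at source" operation from slides 47, 48, 54, and 55 can be sketched as two cooperating hypervisor objects. This is a simplified model under several assumptions: only the block action is modeled, the control packet is represented as a direct method call, the 30-second soft-state timeout is an arbitrary value, and all class and method names are hypothetical.

```python
from typing import Callable, Dict, Tuple


class SourceHypervisor:
    """Enforces block decisions received from destination hypervisors."""

    def __init__(self, ttl_s: float = 30.0):
        self.ttl_s = ttl_s  # assumed soft-state lifetime
        self.blocked: Dict[Tuple[str, str], float] = {}  # (src, dst) -> expiry

    def on_control_packet(self, src_vm: str, dst_vm: str, now: float) -> None:
        """Install soft state that blocks src_vm -> dst_vm until it expires."""
        self.blocked[(src_vm, dst_vm)] = now + self.ttl_s

    def may_send(self, src_vm: str, dst_vm: str, now: float) -> bool:
        expiry = self.blocked.get((src_vm, dst_vm))
        if expiry is None:
            return True
        if now >= expiry:
            # Soft state timed out; the destination re-evaluates the next flow.
            del self.blocked[(src_vm, dst_vm)]
            return True
        return False


class DestinationHypervisor:
    """Applies the policy of each hosted VM to incoming flows."""

    def __init__(self, policies: Dict[str, Callable[[str], str]]):
        self.policies = policies  # dst_vm -> (sender group -> "allow"/"block")

    def on_new_flow(self, src_vm: str, src_group: str, dst_vm: str,
                    source: SourceHypervisor, now: float) -> bool:
        """Return True if the flow is delivered; otherwise push a block back."""
        if self.policies[dst_vm](src_group) == "allow":
            return True
        source.on_control_packet(src_vm, dst_vm, now)
        return False


# VM Z (in a hypothetical group "Zgroup") starts a flow to C, which accepts only group A.
src_hv = SourceHypervisor()
dst_hv = DestinationHypervisor({"C": lambda g: "allow" if g == "A" else "block"})
print(dst_hv.on_new_flow("Z", "Zgroup", "C", src_hv, now=0.0))  # False
print(src_hv.may_send("Z", "C", now=1.0))                       # False
print(src_hv.may_send("Z", "C", now=31.0))                      # True
```

Letting the block state expire is one way to read "soft-state and timeouts handle policy invalidations and packet losses": a stale decision disappears on its own, and a lost control packet is simply re-sent when the next flow arrives.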
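
The load estimates on slides 42 and 46 can be reproduced with a few lines of arithmetic. The one-VM-per-server figure is an assumption made here so that the slide 42 product comes out to 1M; the slides do not state the VM density. All variable names are illustrative.

```python
SERVERS = 100_000
NEW_FLOWS_PER_VM_PER_S = 10
VMS_PER_SERVER = 1            # assumption, not stated on the slides
NEW_VMS_PER_DAY = 100_000
SECONDS_PER_DAY = 24 * 60 * 60

# Slide 42: every new flow asks the centralized repository for a decision.
central_decisions_per_s = SERVERS * VMS_PER_SERVER * NEW_FLOWS_PER_VM_PER_S

# Slides 19 and 46: VM start rate, and one group update per start per server
# when every hypervisor stores every policy.
vm_starts_per_s = NEW_VMS_PER_DAY / SECONDS_PER_DAY
all_to_all_updates_per_s = vm_starts_per_s * SERVERS

print(central_decisions_per_s)          # 1000000
print(round(vm_starts_per_s, 2))        # 1.16
print(round(all_to_all_updates_per_s))  # 115741
```

The last figure is the "100k updates/s on average" of slide 46 before rounding, and the middle one is the "more than one per second" of slide 19.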