Overload - Brown University

Overload

Qingyi Lu & Ke Ding

Outline

● Overview of overload
● Motivation (mono → micro)
● Common problem
● Differences
● WeChat
● Azure
● ATOM

Overview of Overload

● Overload: The system’s workload exceeds its maximum processing capacity.

● Reasons:

- Excessive request traffic (visits)

- Bottlenecks and failures within the system

- Backend failures and delays


● Problems:

- CPU and memory can hit their limits

- The system’s responses slow down

- The system’s processing capacity can drop sharply

● Common solutions: load balancing, flow control, monitoring, etc.

Overload Control of Monolith and Microservices

● Monolith:

A small number of service components with trivial dependencies.

● Microservices:

Architecture and dependencies are increasingly complex:

- All microservices must be monitored

- Overload is hard to handle independently

- Overload control must adapt to service changes, workload dynamics, and the external environment

Overload Control in Practice

● Overload Control for Scaling WeChat Microservices

Complex core service architecture

● Azure Responses to COVID-19

Emergency solution

● ATOM: Model-Driven Autoscaling for Microservices

CPU & replicas

Overload Control for Scaling WeChat Microservices

Observation

1. WeChat’s microservice architecture

Complex dependencies among services

2. Deployment of WeChat Services

Centralized or SLA-based overload control cannot keep up with rapid service changes at large scale

3. Dynamic Workload

The overload control mechanism must adaptively tolerate workload fluctuations

Overload Scenarios

Subsequent Overload
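Subsequent overload means a single user request hits overloaded services more than once, either the same service invoked several times or several overloaded services. A small calculation shows why this is costly. Assume, as a simplification and not a measured figure, that an overloaded service sheds calls at random and admits each one with probability `admit_prob`; a request that needs `calls` successful calls then succeeds with probability `admit_prob ** calls`.

```python
def request_success_rate(admit_prob: float, calls: int) -> float:
    """Probability that a request completes when it needs `calls` independent
    calls to overloaded services, each admitted with probability `admit_prob`
    (random, priority-unaware shedding; a simplifying assumption)."""
    if not 0.0 <= admit_prob <= 1.0 or calls < 0:
        raise ValueError("admit_prob must be in [0, 1] and calls must be >= 0")
    return admit_prob ** calls


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(k, request_success_rate(0.5, k))
    # 1 0.5
    # 2 0.25
    # 4 0.0625
```

Each extra hop multiplies the loss, and the work already done for requests that eventually fail is wasted. Shedding consistently by priority, so the same requests are dropped at every hop, avoids this compounding.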

Challenge & Insight

● No single entry point for service requests, and call paths are complex

● Excessive request aborts waste computational resources

● Excessive requests degrade user experience (due to high service response latency)

● Service Agnostic: Decoupling and Dynamic Development

● Independent but Collaborative: Granule and Subsequent Overload

● Efficient and Fair: Partial Failures

Design

1. How to detect overload: Overload Detection
2. How to control overload: Service Admission Control

Overload Detection

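Detect overload by the average waiting time of requests in the pending queue (queuing time).

Why not use response time? Response time also includes the time spent in downstream services, so a slow downstream can make a healthy server look overloaded; queuing time reflects only the local server’s load.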
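A minimal sketch of queuing-time-based detection, assuming timestamps in seconds for when a request enters the pending queue and when a worker dequeues it. The window (1 s or 2000 requests, whichever comes first) and the 20 ms threshold are illustrative defaults chosen for this sketch, not values confirmed by these slides.

```python
from typing import Optional


class QueuingTimeDetector:
    """Windowed overload detector based on average queuing time.

    A window closes after `window_s` seconds or `window_requests` requests,
    whichever comes first; the service is flagged as overloaded when the
    window's average queuing time exceeds `threshold_ms`.
    """

    def __init__(self, threshold_ms: float = 20.0, window_s: float = 1.0,
                 window_requests: int = 2000) -> None:
        self.threshold_ms = threshold_ms
        self.window_s = window_s
        self.window_requests = window_requests
        self.overloaded = False
        self._reset()

    def _reset(self) -> None:
        self._start: Optional[float] = None
        self._total_ms = 0.0
        self._count = 0

    def record(self, enqueue_ts: float, dequeue_ts: float) -> Optional[bool]:
        """Record one request; return the verdict when a window closes, else None."""
        if self._start is None:
            self._start = dequeue_ts
        # Queuing time only: waiting in the pending queue before processing,
        # so slow downstream calls do not inflate the signal.
        self._total_ms += (dequeue_ts - enqueue_ts) * 1000.0
        self._count += 1
        window_closed = (self._count >= self.window_requests
                         or dequeue_ts - self._start >= self.window_s)
        if not window_closed:
            return None
        self.overloaded = self._total_ms / self._count > self.threshold_ms
        self._reset()
        return self.overloaded
```

Between windows, `overloaded` keeps the last verdict, which the admission-control logic can consult on every incoming request.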

Service Admission Control

1. Business-oriented Admission Control
2. User-oriented Admission Control
3. Session-oriented Admission Control
4. Adaptive Admission Control
5. Collaborative Admission Control

Business-oriented Admission Control

● Requests are prioritized based on their business significance
● Subsequent requests inherit the same business priority
● Advantages:

- Service agnostic: business priority is independent of the business logic of any service

- Easy to maintain: business priority is assigned at the entry services and reflects changes to basic and leaf services
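The inheritance rule above can be sketched as follows. The action names, priority numbers, and the convention that a smaller number means higher priority are all hypothetical; the point is that only the entry service consults the table, and every downstream call copies the priority without knowing anything about the business logic.

```python
from dataclasses import dataclass

# Hypothetical business-priority table held only by the entry services.
# Smaller number = higher priority (a convention assumed for this sketch).
BUSINESS_PRIORITY = {"login": 1, "payment": 2, "send_message": 3, "feed_refresh": 4}
DEFAULT_PRIORITY = 10  # hypothetical: unknown actions get the lowest priority


@dataclass(frozen=True)
class Request:
    action: str
    business_priority: int


def entry_request(action: str) -> Request:
    """Entry service: look up the business priority once, at the edge."""
    return Request(action, BUSINESS_PRIORITY.get(action, DEFAULT_PRIORITY))


def downstream_request(parent: Request, action: str) -> Request:
    """Any downstream call inherits the parent's priority unchanged."""
    return Request(action, parent.business_priority)


def admit_business(req: Request, admission_level: int) -> bool:
    """Admit requests whose business priority is at least as high as the level."""
    return req.business_priority <= admission_level
```

For example, `downstream_request(entry_request("payment"), "account_lookup")` carries priority 2, so a downstream server at admission level 2 still accepts it while rejecting `feed_refresh` traffic.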

User-oriented Admission Control

Example:

Current business priority level is T

Overload detected

Level changes to T-1 (partial failure)

System underloaded

Level is set back to T
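A sketch of a compound admission level (B, U) that combines business priority with a per-user priority, so traffic at business priority T can fail partially instead of all at once. Deriving the user priority from a hash of the user ID salted with the current hour, the 128 user levels, and smaller-is-higher numbering are assumptions of this sketch.

```python
import hashlib
from typing import Tuple

USER_LEVELS = 128  # hypothetical number of user-priority levels


def user_priority(user_id: str, hour: int) -> int:
    """Hypothetical user priority in 1..USER_LEVELS (1 = highest).

    Hashing the user ID with the current hour keeps a user's priority stable
    within the hour but rotates it across hours, so the same users are not
    always the ones shed.
    """
    digest = hashlib.sha256(f"{hour}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % USER_LEVELS + 1


def admit_compound(business: int, user: int, level: Tuple[int, int]) -> bool:
    """Compound admission level (B, U); smaller numbers = higher priority.

    Business priority higher than B is always admitted; at exactly B, only
    users with priority <= U are admitted.
    """
    b_level, u_level = level
    return business < b_level or (business == b_level and user <= u_level)
```

Lowering U step by step sheds priority-T requests gradually; only when U is exhausted does the business level itself need to move to T-1.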

Other Admission Control

● Session-oriented Admission Control:

Prioritizes requests based on the session ID

● Adaptive Admission Control:

Adapts to load status changes to minimize the impact on overall service quality

● Collaborative Admission Control:

Each upstream service learns the latest admission level of its downstream servers
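Adaptive admission control can be sketched with a histogram of last window's request counts per compound level (business, user). When the detector reports overload, the level is tightened so that roughly `alpha` fewer requests are admitted; when underloaded, it is relaxed by roughly `beta`. The step sizes (5% and 1%) and the rule of never shedding the top level entirely are illustrative assumptions.

```python
from typing import Dict, Tuple

Level = Tuple[int, int]  # (business priority, user priority); smaller = higher


def adjust_level(histogram: Dict[Level, int], current: Level, overloaded: bool,
                 alpha: float = 0.05, beta: float = 0.01) -> Level:
    """Choose the next admission level from last window's per-level counts.

    Overloaded: admit about (1 - alpha) of what the current level admitted.
    Underloaded: admit about (1 + beta) of it.
    Tuple comparison matches the compound rule: (b, u) <= (B, U) lexicographically.
    """
    if not histogram:
        return current
    levels = sorted(histogram)  # highest priority first
    admitted = sum(n for lvl, n in histogram.items() if lvl <= current)
    target = admitted * ((1 - alpha) if overloaded else (1 + beta))
    chosen = levels[0]  # sketch choice: never shed the top level entirely
    running = 0
    for lvl in levels:
        running += histogram[lvl]
        if running > target:
            break
        chosen = lvl  # lowest-priority level whose cumulative count still fits
    return chosen
```

With coarse buckets a small `beta` may leave the level unchanged for a window; the level then relaxes only once the target covers the next bucket, which makes recovery deliberately slower than shedding.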

Service Admission Control Workflow
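One step of the workflow, the collaborative check, can be sketched on the upstream side. It assumes each downstream server piggybacks its current compound admission level (business, user) on its responses; the upstream caches it and drops locally any request the downstream would reject anyway, saving the network hop and the downstream's work. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Level = Tuple[int, int]  # (business, user) priority; smaller = higher priority


@dataclass
class Upstream:
    """Caches the admission level each downstream reports in its responses."""
    downstream_levels: Dict[str, Level] = field(default_factory=dict)

    def on_response(self, service: str, admission_level: Level) -> None:
        # Collaborative step: learn the downstream's latest admission level.
        self.downstream_levels[service] = admission_level

    def should_send(self, service: str, priority: Level) -> bool:
        # No level learned yet: send optimistically and let the downstream decide.
        level = self.downstream_levels.get(service)
        return level is None or priority <= level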

Evaluation - Queuing time vs. Response time

Evaluation - Different Types

Lessons learned

● Overload control in the large-scale microservice architecture must be decentralized and autonomous in each service

● The algorithmic design of overload control should take into account a variety of feedback mechanisms

● An effective overload control design always derives from comprehensive profiling of processing behavior under the actual workload

Azure Responses to COVID-19

Observations & Insight

Observation:

- Huge increase in people working from home, learning remotely, and staying connected with friends online

- Impact on healthcare: huge amounts of data used to analyze the virus

Insight:

- Help people adapt to this new world
- Prioritize critical customers: doctors and nurses in hospitals, emergency management services, critical government infrastructure

For Goods Program

● Guiding principles: do no harm, outcome driven, unique value to affect outcomes, open and collaborative

● Example:

Response Framework

● Meet Demand

Address capacity in the hardest-hit regions and scale up

● Forecast

Be well prepared for potential demand scenarios

● Optimize

Optimize services

Network

Incredible growth in VPN and WAN usage

- WAN scaling:

12 new edge sites

25% increased peering capacity

100+ terabits


- WAN traffic optimization: load-balance traffic

Services on Azure - Teams

Services on Azure - Windows Virtual Desktop

Service scale out:

- More gateways & front ends per cluster
- Additional clusters per region
- Deployed to more regions for best performance
- More regions coming for data residency

Optimization:

- Fine-tuned database indexes
- Created client-side cache + read-only replicas
- Rebalanced traffic routing for nearby regions

Azure Security

● Azure Active Directory
● Application Proxy

- Adjusted capacity: monitoring and alerting

- Increased scale units: availability across regions

- Provided higher throttling limits to customers

Confidential computing on Azure

Trusted Execution Environment

Example:

ATOM: Model-Driven Autoscaling for Microservices

Observation and Challenge

1. Rule-based autoscaling yields different performance gains depending on the current workload. Vertical scaling changes CPU; horizontal scaling changes the number of replicas.

2. Previous methods focus on either vertical or horizontal scaling rather than both

Insight

1. Estimate performance
2. Auto-scale by changing CPU and replicas, which combines horizontal and vertical scaling

Layered Queueing Network


1. Previous work applies the utilization technique: least-squares regression with utilization as the feature to estimate service demand, U = XD (U: utilization, X: throughput, D: service demand).

2. Here, the queue-length technique is used instead, relating response time and queue length: R = LD (R: response time, L: queue length).
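A minimal sketch of the least-squares step for U = XD, assuming a single service with one resource and paired samples of throughput X (requests/s) and utilization U (fraction of a CPU). Fitting a line through the origin by minimizing the sum of (U_i - X_i D)^2 gives D = sum(X_i U_i) / sum(X_i^2), with D in seconds of CPU per request.

```python
from typing import Sequence


def estimate_demand(regressor: Sequence[float], observed: Sequence[float]) -> float:
    """Least-squares slope through the origin: observed ~ regressor * D.

    For the utilization technique, regressor = throughput X and observed = U.
    """
    if len(regressor) != len(observed) or not regressor:
        raise ValueError("need equally sized, non-empty samples")
    num = sum(x * y for x, y in zip(regressor, observed))
    den = sum(x * x for x in regressor)
    if den == 0:
        raise ValueError("regressor samples are all zero")
    return num / den
```

For example, `estimate_demand([10, 20, 40], [0.2, 0.4, 0.8])` gives 0.02 s per request. Taken literally, the slide's R = LD fits the same way by passing queue lengths as the regressor and response times as the observations; a full layered queueing network model would estimate one demand per service and resource.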

ATOM: Autoscaling Microservice

1. Maximize the revenue of transactions
2. MAPE-K (monitor, analyse, plan and execute with a shared knowledge base)
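The MAPE-K loop can be sketched as four pluggable phases that all read and write one shared knowledge base. The phase signatures and the `config` key are assumptions of this sketch; in an ATOM-style scaler, analyse would consult the performance model and plan would pick the next CPU/replica configuration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional

Knowledge = Dict[str, Any]


@dataclass
class MapeK:
    """One iteration: Monitor -> Analyse -> Plan -> Execute over shared knowledge."""
    monitor: Callable[[Knowledge], None]             # writes fresh metrics
    analyse: Callable[[Knowledge], bool]             # True if adaptation is needed
    plan: Callable[[Knowledge], Dict[str, Any]]      # returns a new configuration
    execute: Callable[[Dict[str, Any]], None]        # applies it to the system
    knowledge: Knowledge = field(default_factory=dict)

    def step(self) -> Optional[Dict[str, Any]]:
        self.monitor(self.knowledge)
        if not self.analyse(self.knowledge):
            return None
        config = self.plan(self.knowledge)
        self.execute(config)
        self.knowledge["config"] = config  # record the decision for the next loop
        return config
```

For example, `MapeK(monitor=lambda k: k.update(cpu=0.9), analyse=lambda k: k["cpu"] > 0.8, plan=lambda k: {"replicas": 3}, execute=print).step()` applies and returns `{"replicas": 3}`.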

ATOM Algo Detail

ATOM: Autoscaling Microservice

1. UH: Horizontal scaling
2. UV: Vertical scaling
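Combining both knobs means choosing a (replicas, CPU per replica) pair jointly. The sketch below substitutes brute-force search and a toy capacity model for ATOM's model-driven optimization: each replica is assumed to serve min(cpu, max_parallelism) / demand_s requests per second, and profit is transaction revenue minus hypothetical per-replica and per-CPU costs.

```python
from itertools import product
from typing import Iterable, Tuple


def best_config(arrival_rate: float, demand_s: float, replicas: Iterable[int],
                cpu_shares: Iterable[float], max_parallelism: float,
                revenue_per_req: float, cost_per_replica: float,
                cost_per_cpu: float) -> Tuple[int, float]:
    """Exhaustively pick (replicas, cpu per replica) maximizing profit.

    Toy model (assumption): vertical scaling helps only up to
    max_parallelism cores per replica; horizontal scaling is linear.
    """
    def profit(cfg: Tuple[int, float]) -> float:
        r, c = cfg
        capacity = r * min(c, max_parallelism) / demand_s  # requests/s
        throughput = min(arrival_rate, capacity)
        return throughput * revenue_per_req - r * (cost_per_replica + c * cost_per_cpu)

    return max(product(replicas, cpu_shares), key=profit)
```

With 100 requests/s, a 0.05 s demand, at most 2 useful cores per replica, and costs of 2 per replica and 2 per CPU, the search picks 3 replicas with 2 CPUs each: neither pure horizontal (more 1-CPU replicas) nor pure vertical (bigger replicas) scaling is cheaper for the same throughput.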

Lessons learned & future direction

Lesson: Machine learning helps extract useful information to find better strategies with high likelihood. Finding relevant features for autoscaling decisions is helpful.

Future direction: Can we move the offline training to online sequential training?

Comparisons - Similarities

● Same overall design

- Detect overload or estimate performance

- Derive the solution

Comparisons - Differences

● DAGOR (WeChat)

Based more on rule-based methods

● ATOM

Based more on machine learning and feature-based methods

● Azure

Based more on meeting demand first, then forecasting, then optimizing

Questions from students

● Some requests also don't need everything to succeed (e.g. can send an OK, then resolve later). Is that something that can be handled?

● Is it possible to obtain the collection of services required to complete certain requests? In case a critical service is overloaded and thus has an admission level higher than the priority of the request, can we reject it at the entry service?

● Why are the default timeout and queueing time not even changed or optimized?
