PROGRAMMING SUPPORT AND ADAPTIVE CHECKPOINTING FOR HIGH-THROUGHPUT DATA SERVICES WITH LOG-BASED RECOVERY DSN 2010 Jingyu Zhou Shanghai Jiao Tong Univ Caijie Zhang Google Inc. Hong Tang Yahoo! Jiesheng Wu Microsoft Tao Yang UC Santa Barbara

Background

• Many large-scale data-mining and offline applications at Google, Yahoo, Microsoft, Ask.com, etc. require
  • High data parallelism/throughput
  • Data persistence
  • But not so stringent availability
• E.g., the URL property service (UPS) at the Ask.com search offline mining platform
  • Hundreds of application modules access UPS

Examples of high-throughput data services for web mining/search

[Figure: Internet Web documents, crawlers, document DBs, data mining jobs, and data/info services (e.g., URL property service, 100K-500K/s) over 10-50 billion URLs.]

Existing Approaches for High Performance and Persistence

• Database systems
  • High overhead limits their performance while supporting general features
  • Need more machine resources
• Related work and well-known techniques for high availability
  • Data replication
  • Log-based recovery
  • Checkpointing

Challenges and Focus of This Work

• System design with careful selection and integration of fault-tolerance techniques for high-throughput computing
  • Trade off availability, allowing some downtime
  • Low cost: logging/checkpointing
  • Fine-grained for minimum service disruption
  • Local data recovery, periodic remote backup
• Programming support
  • Lightweight, simplifying construction of robust data services

SLACH: Selective Logging & Adaptive CHeckpointing

• Targeted data services
  • Request-driven thread model
  • In-memory objects
  • Data independence
• Similar to key-value stores in BigTable/Dynamo, but higher throughput

Architecture of SLACH

Main Techniques

• Selective operation logging
  • Only log write operations: (oid, op_type, parameters, timestamp)
  • Write-ahead log, i.e., write the log record first, then apply the operation
• Object-level checkpointing with adaptive load control to avoid service disruptions
  • Checkpoint objects one by one, still allowing concurrent access to other objects
  • Perform checkpointing when load is low to amortize the cost of checkpointing
• Lightweight API while supporting legacy code
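
The selective logging rule above can be illustrated with a minimal sketch. `LogRecord`, `Store`, and the in-memory `std::vector` standing in for a durable log are assumptions made for illustration, not SLACH's actual data structures; only the record fields (oid, op_type, parameters, timestamp) and the log-before-apply order come from the slide.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical log record mirroring the (oid, op_type, parameters, timestamp) tuple.
struct LogRecord {
  int64_t oid;
  int op_type;
  std::string parameters;  // serialized operation arguments
  uint64_t timestamp;      // a logical counter here, not wall-clock time
};

// Hypothetical store: reads bypass the log, writes are logged before being applied.
class Store {
 public:
  std::string read(int64_t oid) const {
    auto it = data_.find(oid);
    return it == data_.end() ? std::string() : it->second;  // reads are never logged
  }
  void write(int64_t oid, const std::string& value) {
    log_.push_back({oid, kOpSet, value, next_ts_++});  // 1) append the log record
    data_[oid] = value;                                // 2) then apply the operation
  }
  const std::vector<LogRecord>& log() const { return log_; }

 private:
  static constexpr int kOpSet = 0;  // the only op type in this sketch
  std::unordered_map<int64_t, std::string> data_;
  std::vector<LogRecord> log_;  // stands in for a durable, append-only log file
  uint64_t next_ts_ = 0;
};
```

Because reads leave no log record, the log grows only with the write traffic, which is the point of logging selectively.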

Object-level Checkpoints

Adaptive Checkpointing Control

• Goal is to balance checkpoint cost and recovery speed
  • Checkpoint less frequently -> larger logs -> lengthy recovery
  • Checkpoint too often -> higher overhead
• Ideally,
  • High server load -> checkpoint less frequently
  • Low server load -> checkpoint more frequently
• Adjust between a Low Watermark (LW) and a High Watermark (HW) of service load

$load_{curr} = \alpha \times load_{prev} + (1 - \alpha) \times sample$

Adaptive Checkpointing Frequency

• Checkpoint threshold between LB and UB
  • LB and UB are log size parameters, determined by the application

$Threshold = LB + F(load) \times (UB - LB)$, where $F(load)$ is a function of the current server load.
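
The threshold formula can be sketched as follows. The definition of F(load) is not given in this transcript, so `assumed_f` below is purely an illustrative guess: it rises from 0 at the low watermark to 1 at the high watermark and is shaped by an exponent. The struct `CkptParams` and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical parameter bundle for one service.
struct CkptParams {
  double lb, ub;  // lower/upper bound of the log size that triggers a checkpoint
  double lw, hw;  // low/high load watermarks, as fractions in [0, 1]
  double beta;    // scaling exponent (its exact role is an assumption here)
};

// ASSUMPTION: an illustrative F(load), not the function used by SLACH.
// Returns 0 at or below LW, 1 at or above HW, and a curve in between.
double assumed_f(double load, const CkptParams& p) {
  double x = (load - p.lw) / (p.hw - p.lw);
  x = std::min(1.0, std::max(0.0, x));  // clamp to [0, 1]
  return std::pow(x, p.beta);
}

// Threshold = LB + F(load) * (UB - LB)
double ckpt_threshold(double load, const CkptParams& p) {
  return p.lb + assumed_f(load, p) * (p.ub - p.lb);
}
```

Under this assumed F, a lightly loaded server checkpoints as soon as the log reaches LB entries, while a heavily loaded one defers checkpointing until the log reaches UB, which matches the stated goal of checkpointing less often under high load.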

SLACH Programming Support

• Application developers
  • Call the SLACH function log() to log an object operation
  • Define 3 callback functions:
    1) what to checkpoint (call SLACH's ckpt() for each selected object),
    2) recover one object from a checkpoint,
    3) replay a log operation.
• SLACH
  • Provides functions log() and ckpt()
  • Calls the user's checkpoint callback function during checkpointing
  • Calls the user's recover function during recovery from a checkpoint
  • Calls the user's replay function when recovering from a log

SLACH API for Applications

class SLACH::API {
public:
  /* register ckpt. policy and parameters */
  void register_policy(const Policy& p);

  /* log one write operation */
  void log(int64_t obj_id, int op, ...);

  /* checkpoint one object */
  void ckpt(int64_t obj_id, const void* addr, uint32_t size);
};

SLACH Interface

class SLACH::Application {
protected:
  /* application checkpoint callback function */
  virtual void ckpt_callback() = 0;

  /* callback of loading one object checkpoint */
  virtual void load_one_callback(int64_t obj_id, const void* addr, uint32_t size) = 0;

  /* callback of replaying one operation log */
  virtual void replay_one_callback(int64_t obj_id, int op, const para_vec& args) = 0;
};

An Example: Application-level code

struct Item {
  double price;
  int quantity;
};

class MyService : public SLACH::Application {
private:
  Item obj[1000];               /* application objects being accessed */
  SLACH::API slach_;            /* SLACH API */
  static const int OP_PRICE=0;  /* an op type */

public:
  void update_price(int id, double p) {
    /* log the selected object update operation, then apply it */
    slach_.log(id, OP_PRICE, &p, sizeof(p));
    obj[id].price = p;
  }

An Example: Call-back functions

  /* SLACH calls this user function during checkpointing. */
  void ckpt_callback() {
    for (int i=0; i<1000; i++)
      slach_.ckpt(i, &obj[i], sizeof(obj[i]));
  }

  /* SLACH calls this when recovering an object from a checkpoint. */
  void load_one_callback(int64_t id, const void *p, uint32_t size) {
    memcpy(&obj[id], p, size);
  }

  /* SLACH calls this when recovering an object by log replaying. */
  void replay_one_callback(int64_t id, int op, const para_vec& args) {
    switch (op) {
    case OP_PRICE:
      obj[id].price = *(const double*)args[0].second;
      break;
    // ...
    }
  }
};
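
To make the recovery flow concrete, here is a self-contained, single-threaded sketch of how the callbacks could fit together: recovery loads each object's checkpoint and then replays the logged operations. `MockSlach` and `PriceService` are hypothetical stand-ins that keep the checkpoint and log in memory. This is not the SLACH implementation, the `log()` signature is simplified to a single `double` argument, and clearing the log after a full checkpoint is an assumption of the mock.

```cpp
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

struct Item { double price; int quantity; };

// Hypothetical stand-in for the framework: checkpoints and log live in memory.
class MockSlach {
 public:
  struct Op { int64_t id; int op; double arg; };
  void log(int64_t id, int op, double arg) { log_.push_back({id, op, arg}); }
  void ckpt(int64_t id, const void* addr, uint32_t size) {
    const char* bytes = static_cast<const char*>(addr);
    ckpt_[id].assign(bytes, bytes + size);  // keep a byte copy of the object
  }
  // Assumption: once every object is checkpointed, older log records are dropped.
  void truncate_log() { log_.clear(); }
  const std::map<int64_t, std::vector<char>>& checkpoints() const { return ckpt_; }
  const std::vector<Op>& ops() const { return log_; }

 private:
  std::map<int64_t, std::vector<char>> ckpt_;
  std::vector<Op> log_;
};

class PriceService {
 public:
  static const int OP_PRICE = 0;
  explicit PriceService(MockSlach& s) : slach_(s) { std::memset(obj_, 0, sizeof(obj_)); }

  void update_price(int id, double p) {
    slach_.log(id, OP_PRICE, p);  // log first
    obj_[id].price = p;           // then apply
  }
  double price(int id) const { return obj_[id].price; }

  void checkpoint_all() {
    for (int i = 0; i < kN; i++) slach_.ckpt(i, &obj_[i], sizeof(obj_[i]));
    slach_.truncate_log();
  }

  // Recovery: load each object's checkpoint, then replay the remaining log in order.
  void recover() {
    for (const auto& kv : slach_.checkpoints())
      std::memcpy(&obj_[kv.first], kv.second.data(), kv.second.size());
    for (const auto& op : slach_.ops())
      if (op.op == OP_PRICE) obj_[op.id].price = op.arg;
  }

 private:
  static const int kN = 4;  // tiny object table for the sketch
  Item obj_[kN];
  MockSlach& slach_;
};
```

A service instance that updates prices, checkpoints, and updates again can be "crashed" by discarding it. A fresh instance built over the same `MockSlach` then calls `recover()` and ends up with the latest prices, because the post-checkpoint updates are still in the log.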

SLACH Implementation and Applications

• Part of the Ask.com middleware infrastructure, written in C++, for the data mining and search offline platform
• Application samples:
  • UPS (URL property service) for recording properties of all URLs crawled/collected
  • HIS (Host information service) for recording properties of all hosts crawled on the web
  • 20-80% of write traffic
  • Running on a cluster of hundreds of machines
  • In production for the last 3 years
• Significantly reduced development time (1-2 months vs. a few days)

Characteristics of UPS/HIS

Performance characteristics of UPS/HIS per partition:

        Data     Max. Read     Max. Write
UPS     1.9GB    110K Req/s    56K Req/s
HIS     2.1GB    58K Req/s     16K Req/s

Parameters for adaptive checkpoint control:

Parameter                      UPS              HIS
α (moving average)             0.8              0.8
LB/UB (lower/upper bound)      1M-8M entries    0.3M-1.8M
LW/HW (low/high watermark)     20%-85%          35%-85%
β (scaling)                    3                6
w (sampling window)            5s               5s

Evaluation

• Impact of logging overhead
• System behavior during checkpointing
• Effectiveness of adaptive checkpoint control
• Performance comparison of hash table implementations using SLACH and BerkeleyDB

Evaluation Setting

• Benchmarks
  • UPS (URL property service)
  • HIS (Host-level property service)
  • Persistent Hash Table (PHT)
• Metric: throughput loss percent
• Hardware: a 15-node cluster, gigabit link

$LossPercent = \left(1 - \frac{SuccessfulRequests}{TotalRequests}\right) \times 100$

Selective Logging Overhead of UPS

• Base: logging is disabled

• Log: selective logging is enabled

Negligible impact when server load < 40%.

System Performance During Checkpointing (100% server load)

During ckpt, 8.9% throughput drop

During ckpt, 57.6% increase of response time

Effectiveness of Adaptive Threshold Controller – Performance Comparison in UPS

• Fixed threshold policy: 8M has lower runtime overhead because it checkpoints less frequently
• Adaptive approach has performance comparable to the fixed policy of 8M

Effectiveness of Threshold Controller – Recovery Speed

• Fixed threshold -> fixed log size -> same recovery time
• Adaptive approach: small log for light load (less recovery time), large log for higher load (more recovery time)

PHT vs. Berkeley DB

• With 30-B values, SLACH is 5.3 times higher
• SLACH is better for all value sizes, because
  1. BDB incurs more per-operation overhead
  2. BDB involves more disk I/Os
• SLACH checkpointing has less overhead
  1. BDB checkpointing is not asynchronous
  2. SLACH fuzzy checkpointing still allows access

Conclusions

• SLACH contributions
  • A lightweight programming framework for very high-throughput, persistent data services
    • Simplifies application construction while meeting reliability demands
    • Selective logging to enhance performance
  • System design with careful integration of multiple techniques
    • Dynamically adjusts checkpoint frequency to meet throughput demands
    • Fine-grained checkpointing without service disruptions
• Evaluation of the integrated scheme in production applications

Data and Failure Models

• Data independence and object-oriented access model
  • Key-value store as in Dynamo/BigTable, but with much higher throughput demand per machine
  • Each object is a contiguous memory block
    • Middleware infrastructure can handle noncontiguous ones
• Fail-stop
  • Focus on local recovery from application failures
  • OS/hardware failures can be dealt with by remote checkpoints
    • Implemented, but not in the scope of this paper