lec25-cloud computing with mesos€¦ · 11/30/11 2 “cloudcomputing:” • infinitescale…...

13
11/30/11 1 (Private) Cloud Computing with Mesos at Twi9er Benjamin Hindman @benh what is cloud computing? self-service scalable economic elastic virtualized managed utility pay-as-you-go what is cloud computing? “cloud” refers to large Internet services running on 10,000s of machines (Amazon, Google, Microsoft, etc) “cloud computing” refers to services by these companies that let external customers rent cycles and storage Amazon EC2: virtual machines at 8.5¢/hour, billed hourly Amazon S3: storage at 15¢/GB/month Google AppEngine: free up to a certain quota Windows Azure: higherlevel than EC2, applications use API what is cloud computing? cheap nodes, commodity networking selfservice (use personal credit card) and payas yougo virtualization from colocation, to hosting providers running the web server, the database, etc and having you just FTP your files … now you do all that yourself again! economic incentives provider: sell unused resources customer: no upfront capital costs building data centers, buying servers, etc

Transcript of lec25-cloud computing with mesos€¦ · 11/30/11 2 “cloudcomputing:” • infinitescale…...

Page 1: lec25-cloud computing with mesos€¦ · 11/30/11 2 “cloudcomputing:” • infinitescale… “cloudcomputing:” • alwaysavailable… challengesinthecloudenvironment: • cheapnodesfail

11/30/11

1

(Private)  Cloud  Computing  with  Mesos  at  Twi9er

Benjamin  Hindman @benh

what  is  cloud  computing?

self-service scalable

economic

elastic

virtualized

managed

utility

pay-as-you-go

what  is  cloud  computing? •  “cloud”  refers  to  large  Internet  services  running  on  10,000s  of  

machines  (Amazon,  Google,  Microsoft,  etc)

•  “cloud  computing”  refers  to  services  by  these  companies  that  let  external  customers  rent  cycles  and  storage –  Amazon  EC2:  virtual  machines  at  8.5¢/hour,  billed  hourly –  Amazon  S3:  storage  at  15¢/GB/month –  Google  AppEngine:  free  up  to  a  certain  quota –  Windows  Azure:  higher-­‐‑level  than  EC2,  applications  use  API

what  is  cloud  computing? •  cheap  nodes,  commodity  networking

•  self-­‐‑service  (use  personal  credit  card)  and  pay-­‐‑as-­‐‑you-­‐‑go

•  virtualization –  from  co-­‐‑location,  to  hosting  providers  running  the  web  server,  the  database,  etc  and  having  you  just  FTP  your  files  …  now  you  do  all  that  yourself  again!

•  economic  incentives – provider:  sell  unused  resources –  customer:  no  upfront  capital  costs  building  data  centers,  buying  servers,  etc

Page 2: lec25-cloud computing with mesos€¦ · 11/30/11 2 “cloudcomputing:” • infinitescale… “cloudcomputing:” • alwaysavailable… challengesinthecloudenvironment: • cheapnodesfail

11/30/11

2

“cloud  computing”

•  infinite  scale  …

“cloud  computing”

•  always  available  …

challenges  in  the  cloud  environment

•  cheap  nodes  fail,  especially  when  you  have  many – mean  time  between  failures  for  1  node  =  3  years – mean  time  between  failures  for  1000  nodes  =  1  day –  solution:  new  programming  models  (especially  those  where  you  can  efficiently  “build-­‐‑in”  fault-­‐‑tolerance)

•  commodity  network  =  low  bandwidth –  solution:  push  computation  to  the  data

moving  target

infrastructure  as  a  service  (virtual  machines)          è  software/platforms  as  a  service   why? •  programming  with  failures  is  hard •  managing  lots  of  machines  is  hard

Page 3: lec25-cloud computing with mesos€¦ · 11/30/11 2 “cloudcomputing:” • infinitescale… “cloudcomputing:” • alwaysavailable… challengesinthecloudenvironment: • cheapnodesfail

11/30/11

3

moving  target

infrastructure  as  a  service  (virtual  machines)          è  software/platforms  as  a  service   why? •  programming  with  failures  is  hard •  managing  lots  of  machines  is  hard

programming  with  failures  is  hard

•  analogy:  concurrency/parallelism –  imagine  programming  with  threads  that  randomly  stop  executing

– can  you  reliably  detect  and  differentiate  failures?

•  analogy:  synchronization –  imagine  programming  where  communicating  between  threads  might  fail  (or  worse,  take  a  very  long  time)

– how  might  you  change  your  code?

problem:  distributed  systems  are  hard

solution:  abstractions  (higher-­‐‑level  

frameworks)

Page 4: lec25-cloud computing with mesos€¦ · 11/30/11 2 “cloudcomputing:” • infinitescale… “cloudcomputing:” • alwaysavailable… challengesinthecloudenvironment: • cheapnodesfail

11/30/11

4

MapReduce

•  Restricted  data-­‐‑parallel  programming  model  for  clusters  (automatic  fault-­‐‑tolerance)

•  Pioneered  by  Google – Processes  20  PB  of  data  per  day

•  Popularized  by  Apache  Hadoop  project – Used  by  Yahoo!,  Facebook,  Twi9er,  …

beyond  MapReduce

•  many  other  frameworks  follow  MapReduce’s  example  of  restricting  the  programming  model  for  efficient  execution  on  clusters – Dryad  (Microsoft):  general  DAG  of  tasks – Pregel  (Google):  bulk  synchronous  processing – Percolator  (Google):  incremental  computation – S4  (Yahoo!):  streaming  computation – Piccolo  (NYU):  shared  in-­‐‑memory  state – DryadLINQ  (Microsoft):  language  integration – Spark  (Berkeley):  resilient  distributed  datasets

everything  else

•  web  servers  (apache,  nginx,  etc) •  application  servers  (rails) •  databases  and  key-­‐‑value  stores  (mysql,  cassandra)

•  caches  (memcached) •  all  our  own  twi9er  specific  services  …

managing  lots  of  machines  is  hard

•  ge9ing  efficient  use  of  out  a  machine  is  non-­‐‑trivial  (even  if  you’re  using  virtual  machines,  you  still  want  to  get  as  much  performance  as  possible)

nginx Hadoop nginx Hadoo

p

Page 5: lec25-cloud computing with mesos€¦ · 11/30/11 2 “cloudcomputing:” • infinitescale… “cloudcomputing:” • alwaysavailable… challengesinthecloudenvironment: • cheapnodesfail

11/30/11

5

managing  lots  of  machines  is  hard

•  ge9ing  efficient  use  of  out  a  machine  is  non-­‐‑trivial  (even  if  you’re  using  virtual  machines,  you  still  want  to  get  as  much  performance  as  possible)

nginx Hadoop

problem:  lots  of  frameworks  and  services  

…  how  should  we  allocate  resources  (i.e.,  parts  of  a  

machine)  to  each?  

idea:  can  we  treat  the  datacenter  as  one  big  computer  and  multiplex  

applications  and  services  across  available  machine  resources?

solution:  mesos

•  common  resource  sharing  layer   – abstracts  resources  for  frameworks

nginx Hadoop Mesos  nginx Hadoo

p

uniprograming   mul/programing  

Page 6: lec25-cloud computing with mesos€¦ · 11/30/11 2 “cloudcomputing:” • infinitescale… “cloudcomputing:” • alwaysavailable… challengesinthecloudenvironment: • cheapnodesfail

11/30/11

6

twi9er  and  the  cloud

•  owns  private  datacenters  (not  a  consumer) – commodity  machines,  commodity  networks

•  not  selling  excess  capacity  to  third  parties  (not  a  provider)

•  has  lots  of  services  (especially  new  ones) •  has  lots  of  programmers •  wants  to  reduce  CAPEX  and  OPEX

twi9er  and  mesos

•  use  mesos  to  get  cloud  like  properties  from  datacenter  (private  cloud)  to  enable  “self-­‐‑service”  for  engineers

(but  without  virtual  machines)

computa/on  model:  frameworks  •  A  framework  (e.g.,  Hadoop,  MPI)  manages  one  or  more  jobs  in  a  computer  cluster  

•  A  job  consists  of  one  or  more  tasks  •  A  task  (e.g.,  map,  reduce)  is  implemented  by  one  or  more  processes  running  on  a  single  machine  

 

23

cluster  

Framework Scheduler  (e.g.,  Job  Tracker)

Executor  (e.g.,  Task    Tracker)  

Executor  (e.g.,  Task  Traker)  

Executor  (e.g.,  Task  Tracker)  

Executor    (e.g.,  Task  Tracker)  

task  1 task  5

task  3 task  7 task  4

task  2 task  6

Job  1:  tasks  1,  2,  3,  4  Job  2:  tasks  5,  6,  7  

two-­‐level  scheduling  

•  Advantages:    – Simple  à  easier  to  scale  and  make  resilient  – Easy  to  port  exis/ng  frameworks,  support    new  ones  

•  Disadvantages:    – Distributed  scheduling  decision  à  not  op/mal   24

Mesos  Master  

Organization   policies

Resource   availability

Framework scheduler

Task schedule

Fwk schedule

Framework scheduler Framework  Scheduler  

Page 7: lec25-cloud computing with mesos€¦ · 11/30/11 2 “cloudcomputing:” • infinitescale… “cloudcomputing:” • alwaysavailable… challengesinthecloudenvironment: • cheapnodesfail

11/30/11

7

resource  offers  

•  Unit  of  alloca/on:  resource  offer    – Vector  of  available  resources  on  a  node  –   E.g.,    node1:  <1CPU,  1GB>,  node2:  <4CPU,  16GB>    

•  Master  sends  resource  offers  to  frameworks  

•  Frameworks  select  which  offers  to  accept  and  which  tasks  to  run  

25

Push  task  scheduling  to  frameworks

Hadoop    JobTracker  

MPI    JobTracker  

8CPU,  8GB  

Hadoop  Executor  

MPI  executor  

task  1  

task  1  

8CPU,  16GB  

16CPU,  16GB  

Hadoop  Executor  task  2  

Alloca/on  Module  

S1   <8CPU,8GB>  S2   <8CPU,16GB>  S3   <16CPU,16GB>  

S1   <6CPU,4GB>  S2   <4CPU,12GB>  S1   <2CPU,2GB>  

Mesos  Architecture:  Example  

26

(S1:<8CPU,  

8GB>,

 S2:<8CPU,  

16GB>)

(task1:[S1

:<2CPU,

4GB>];

 task2:[S2

:<4CPU,

4GB>])

S1:<8CPU,8GB>

S2:<8CPU,16GB>

S3:<16C

PU,

16GB>

Slaves  continuously  send  status  updates  about  resources

Pluggable  scheduler  to pick  framework  to   send  an  offer  to

Framework  scheduler  selects  resources  and  

provides  tasks

Framework  executors  launch  tasks  and  may  persist  across  tasks

task  1:<2CPU,4GB>

task  2:<4CPU,4GB>

(S1:<6CPU,4GB>,    

S3:<16CPU,16GB>)

([task1:S1:<4CPU,2GB])

Slave  S1  

Slave  S2  

Slave  S3  

Mesos  Master  task1:<4CPU,2GB>  

twi9er  applications/services

if  you  build  it  …  they  will  come

let’s  build  a  url  shortner  (t.co)!

development  lifecycle

1.  gather  requirements

2. write  a  bullet-­‐‑proof  service  (server) ‣  load  test ‣  capacity  plan ‣  allocate  &  configure  machines ‣  package  artifacts ‣  write  deploy  scripts ‣  setup  monitoring ‣  other  boring  stuff  (e.g.,  sarbanes-­‐‑oxley)

3.  resume  reading  timeline  (waiting  for  machines  to  get  allocated)

Page 8: lec25-cloud computing with mesos€¦ · 11/30/11 2 “cloudcomputing:” • infinitescale… “cloudcomputing:” • alwaysavailable… challengesinthecloudenvironment: • cheapnodesfail

11/30/11

8

development  lifecycle  with  mesos

1.  gather  requirements

2. write  a  bullet-­‐‑proof  service  (server) ‣  load  test ‣  capacity  plan ‣  allocate  &  configure  machines ‣  package  artifacts ‣  write  deploy  configuration  scripts ‣  setup  monitoring ‣  other  boring  stuff  (e.g.,  sarbanes-­‐‑oxley)

3.  resume  reading  timeline

t.co

•  launch  on  mesos! CRUD  via  command  line: $ scheduler create t_co t_co.mesos

Creating job t_co

OK (4 tasks pending for job t_co)

t.co

•  launch  on  mesos! CRUD  via  command  line: $ scheduler create t_co t_co.mesos

Creating job t_co

OK (4 tasks pending for job t_co)

tasks represent shards

t.co

cluster  

Scheduler  

Executor   Executor  

Executor  Executor    

task  1 task  5

task  3 task  7 task  4

task  2 task  6

$ scheduler create t_co t_co.mesos

Page 9: lec25-cloud computing with mesos€¦ · 11/30/11 2 “cloudcomputing:” • infinitescale… “cloudcomputing:” • alwaysavailable… challengesinthecloudenvironment: • cheapnodesfail

11/30/11

9

t.co

•  is  it  running?  (“top”  via  a  browser)

what  it  means  for  devs?

•  write  your  service  to  be  run  anywhere  in  the  cluster

•  anticipate  ‘kill  -­‐‑9’ •  treat  local  disk  like  /tmp

bad  practices  avoided

•  machines  fail;  force  programmers  to  focus  on  shared-­‐‑nothing  (stateless)  service  shards  and  clusters,  not  machines – hard-­‐‑coded  machine  names  (IPs)  considered  harmful

– manually  installed  packages/files  considered  harmful

– using  the  local  filesystem  for  persistent  data  considered  harmful

level  of  indirection  #ftw

nginx t.co

Mesos  

@DEVOPS_BORAT

Need  replace  server!

Page 10: lec25-cloud computing with mesos€¦ · 11/30/11 2 “cloudcomputing:” • infinitescale… “cloudcomputing:” • alwaysavailable… challengesinthecloudenvironment: • cheapnodesfail

11/30/11

10

level  of  indirection  #ftw

nginx t.co

Mesos  

@DEVOPS_BORAT

Need  replace  server!

level  of  indirection  #ftw

example  from  operating  systems?

isolation

Executor  task  1 task  5

what happens when task 5 executes: while (true) {}

isolation

•  leverage  linux  kernel  containers

task  1  (t.co) task 2 (nginx)

CPU

CPU

CPU RAM RAM

container 1 container 2

Page 11: lec25-cloud computing with mesos€¦ · 11/30/11 2 “cloudcomputing:” • infinitescale… “cloudcomputing:” • alwaysavailable… challengesinthecloudenvironment: • cheapnodesfail

11/30/11

11

software  dependencies

1.  package  everything  into  a  single  artifact 2.  download  it  when  you  run  your  task

(might  be  a  bit  expensive  for  some  services,  working  on  next  generation  solution)

t.co  +  malware what  if  a  user  clicks  a  link  that  takes  them  some  place  bad?

let’s  check  for  malware!

t.co  +  malware

•  a  malware  service  already  exists  …  but  how  do  we  use  it?

cluster  

Scheduler  

Executor   Executor  

Executor  Executor    

task  1 task  5

task  3 task  1 task  4

task  2 task  6

t.co  +  malware

•  a  malware  service  already  exists  …  but  how  do  we  use  it?

cluster  

Scheduler  

Executor   Executor  

Executor  Executor    

task  1 task  5

task  3 task  1 task  4

task  2 task  6

Page 12: lec25-cloud computing with mesos€¦ · 11/30/11 2 “cloudcomputing:” • infinitescale… “cloudcomputing:” • alwaysavailable… challengesinthecloudenvironment: • cheapnodesfail

11/30/11

12

t.co  +  malware

•  a  malware  service  already  exists  …  but  how  do  we  use  it?

cluster  

Scheduler  

Executor   Executor  

Executor  Executor    

task  1 task  5

task  3 task  1 task  4

task  2 task  6

how  do  we  name  the  malware  service?

naming  part  1

‣  service  discovery  via  ZooKeeper  

‣     zookeeper.apache.org

‣  servers  register,  clients  discover

‣  we  have  a  Java  library  for  this

‣     twi9er.github.com/commons

naming  part  2

‣  naïve  clients  via  proxy

naming

•  PIDs •  /var/local/myapp/pid

Page 13: lec25-cloud computing with mesos€¦ · 11/30/11 2 “cloudcomputing:” • infinitescale… “cloudcomputing:” • alwaysavailable… challengesinthecloudenvironment: • cheapnodesfail

11/30/11

13

t.co  +  malware

•  okay,  now  for  a  redeploy!  (CRUD)

$ scheduler update t_co t_co.config

Updating job t_co

Restarting shards ... Getting status ...

Failed Shards = []

...

rolling  updates  …

Updater {Job, Update Config}

Restart Shards Healthy?

Rollback

Complete?

Finish {Success, Failure}

Yes

Yes No

No

datacenter  operating  system

     Mesos +  Twi9er  specific  scheduler +  service  proxy  (naming) +  updater +  dependency  manager datacenter  operating  system  (private  cloud)

Thanks!