Expect the unexpected: Prepare for failures in microservices

Expect the unexpected: Anticipate and prepare for failures in microservices. Bhakti Mehta, @bhakti_mehta

Transcript of Expect the unexpected: Prepare for failures in microservices

Page 1: Expect the unexpected: Prepare for failures in microservices

Expect the unexpected: Anticipate and prepare for failures in microservices

Bhakti Mehta

@bhakti_mehta

Page 2: Expect the unexpected: Prepare for failures in microservices

Introduction

• Senior Software Engineer at Blue Jeans Network
• Worked at Sun Microsystems/Oracle for 13 years
• Committer to numerous open source projects including GlassFish Application Server

Page 3: Expect the unexpected: Prepare for failures in microservices

My recent book

Page 4: Expect the unexpected: Prepare for failures in microservices

Previous book

Page 5: Expect the unexpected: Prepare for failures in microservices

Blue Jeans Network

Page 6: Expect the unexpected: Prepare for failures in microservices

Blue Jeans Network

• Video conferencing in the cloud
• Customers in all segments
• Millions of users
• Interoperable
• Video sharing, content sharing
• Mobile friendly
• Solutions for large scale events

Page 7: Expect the unexpected: Prepare for failures in microservices

What you will learn

• Microservices architecture
• Challenges at scale
• Lessons learned, tips and practices to prevent cascading failures
• Resilience planning at various stages
• Real world examples

Page 8: Expect the unexpected: Prepare for failures in microservices

Top level architecture

[Architecture diagram. Labels: Customer A, Customer B, Internet, SIP / H.323, HTTP / HTTPS, Proxy layer, Media Node, Connector Node, Web Server, Middleware services, Cache, Service discovery, Messaging, DB]

Page 9: Expect the unexpected: Prepare for failures in microservices

Microservices architecture

Page 10: Expect the unexpected: Prepare for failures in microservices

Path to microservices

• Advantages
  – Simplicity
  – Isolation of problems
  – Scale up and scale down
  – Easy deployment
  – Clear separation of concerns
  – Heterogeneity and polyglotism

Page 11: Expect the unexpected: Prepare for failures in microservices

Microservices

• Disadvantages
  – Not a free lunch!
  – Distributed systems are prone to failures
  – Eventual consistency
  – More effort in deployment and release management
  – Testing is harder when services evolve independently (regression tests, etc.)

Page 12: Expect the unexpected: Prepare for failures in microservices

Monoliths to microservices

Page 13: Expect the unexpected: Prepare for failures in microservices

Resilient system

• Processes transactions, even when there are transient impulses or persistent stresses
• Functions even when component failures disrupt normal processing
• Accepts that failures will happen
• Designs for crumple zones

Page 14: Expect the unexpected: Prepare for failures in microservices

Kinds of failures

• Challenges at scale
• Integration point failures
  – Network errors
  – Semantic errors
  – Slow responses
  – Outright hang
  – GC issues

Page 15: Expect the unexpected: Prepare for failures in microservices

   

Page 16: Expect the unexpected: Prepare for failures in microservices
Page 17: Expect the unexpected: Prepare for failures in microservices

Anticipate failures at scale

• Anticipate growth
• Design for the next order of magnitude
• Design for 10x, plan to rewrite for 100x

Page 18: Expect the unexpected: Prepare for failures in microservices

   

Page 19: Expect the unexpected: Prepare for failures in microservices

Resiliency planning Stage 1

• When developing code
  – Avoid cascading failures
    • Circuit breaker
    • Timeouts
    • Retry
    • Bulkhead
    • Cache optimizations
  – Avoid malicious clients
    • Rate limiting

Page 20: Expect the unexpected: Prepare for failures in microservices

Resiliency planning Stage 2

• Planning for dealing with failures before deploy
  – Load test
  – A/B test
  – Longevity

Page 21: Expect the unexpected: Prepare for failures in microservices

Resiliency planning Stage 3

• Watching out for failures after deploy
  – Health check
  – Metrics

Page 22: Expect the unexpected: Prepare for failures in microservices

   

Page 23: Expect the unexpected: Prepare for failures in microservices

Cascading failures

Caused by chain reactions. For example:
• One node in a load-balanced group fails
• The others need to pick up its work
• Eventually performance can degenerate

Page 24: Expect the unexpected: Prepare for failures in microservices

Cascading failures with aggregation

Page 25: Expect the unexpected: Prepare for failures in microservices

Cascading failure with aggregation

Page 26: Expect the unexpected: Prepare for failures in microservices

 

Page 27: Expect the unexpected: Prepare for failures in microservices

Timeouts

• Clients may prefer a response
  – failure
  – success
  – job queued for later

All aggregation requests to microservices should have reasonable timeouts set

Page 28: Expect the unexpected: Prepare for failures in microservices

Types of Timeouts

• Connection timeout
  – Max time allowed to establish a connection before an error is raised
• Socket timeout
  – Max time of inactivity between two packets once the connection is established
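The two timeout types map directly onto the JDK's `java.net.Socket` API. The following is a minimal sketch, not code from the talk: the class and method names are hypothetical, and the timeout values are left to the caller.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

class SocketTimeouts {
    /**
     * Opens a connection and reads one byte, applying both timeout types.
     * Returns the byte read, or -1 on end of stream.
     */
    static int readFirstByte(String host, int port, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        try (Socket socket = new Socket()) {
            // Connection timeout: max time to establish the connection.
            // A SocketTimeoutException is thrown if it elapses first.
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            // Socket (read) timeout: max inactivity while waiting for data.
            socket.setSoTimeout(readTimeoutMs);
            InputStream in = socket.getInputStream();
            return in.read(); // SocketTimeoutException if no data arrives in time
        }
    }
}
```

Both failures surface as `java.net.SocketTimeoutException`, so a caller can treat "could not connect in time" and "connected but the peer went quiet" uniformly or tell them apart by where the exception was raised.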

     

Page 29: Expect the unexpected: Prepare for failures in microservices

Timeouts pattern

• Timeouts and retries go together
• Transient failures can be remedied with fast retries
• However, network problems can last for a while, so immediate retries are likely to fail as well

Page 30: Expect the unexpected: Prepare for failures in microservices

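Timeouts in code

In JAX-RS (ClientProperties is Jersey-specific; values are in milliseconds):

import javax.ws.rs.client.Client;
import javax.ws.rs.client.ClientBuilder;
import org.glassfish.jersey.client.ClientProperties;

Client client = ClientBuilder.newClient();
client.property(ClientProperties.CONNECT_TIMEOUT, 5000); // connection timeout
client.property(ClientProperties.READ_TIMEOUT, 5000);    // read (socket) timeout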

Page 31: Expect the unexpected: Prepare for failures in microservices

Retry pattern

• Retry on network failures, timeouts or server errors
• Helps with transient network errors such as dropped connections or server failover

Page 32: Expect the unexpected: Prepare for failures in microservices

Retry pattern

• If one of the services is slow or malfunctioning and other services keep retrying, the problem becomes worse
• Solution
  – Exponential backoff
  – Circuit breaker pattern
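Exponential backoff can be sketched with the JDK alone. This is an illustrative sketch rather than the speaker's code: the `Retry` class, its method name and its parameters are hypothetical.

```java
import java.util.concurrent.Callable;

class Retry {
    /**
     * Calls the task up to maxAttempts times, doubling the wait after each failure.
     * Rethrows the last exception if every attempt fails.
     */
    static <T> T withBackoff(Callable<T> task, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // back off before the next attempt
                    delay *= 2;          // waits grow as d, 2d, 4d, ...
                }
            }
        }
        throw last;
    }
}
```

Because the wait doubles each time, a struggling service sees progressively less retry traffic. Real deployments often add a cap and some random jitter to the delay so that many clients do not retry in lockstep.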

Page 33: Expect the unexpected: Prepare for failures in microservices

Circuit breaker pattern

Circuit breaker: a circuit breaker is an electrical device used in an electrical panel that monitors and controls the amount of amperes (amps) being sent through the wiring

Page 34: Expect the unexpected: Prepare for failures in microservices

Circuit breaker pattern

• Safety device
• If a power surge occurs in the electrical wiring, the breaker will trip
• Flips from "On" to "Off" and shuts off electrical power from that breaker

Page 35: Expect the unexpected: Prepare for failures in microservices

Circuit breaker

• Netflix Hystrix follows the circuit breaker pattern
• If a service's error rate exceeds a threshold, it will trip the circuit breaker and block requests for a specific period of time
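The trip-and-block behaviour can be illustrated with a small hand-rolled class. This sketch is not Hystrix and does not use its API. As a simplification it counts consecutive failures instead of computing an error rate, and it holds a lock while the protected call runs; all names are hypothetical.

```java
import java.util.function.Supplier;

class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private int consecutiveFailures = 0;
    private long openedAt = -1; // -1 means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Runs the action, or returns the fallback at once while the circuit is open. */
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        long now = System.currentTimeMillis();
        if (openedAt >= 0 && now - openedAt < openMillis) {
            return fallback.get(); // open: block the request
        }
        try {
            T result = action.get(); // closed, or a trial call after the open period
            consecutiveFailures = 0;
            openedAt = -1;           // success closes the circuit
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = now;      // trip the breaker
            }
            return fallback.get();
        }
    }

    synchronized boolean isOpen() {
        return openedAt >= 0 && System.currentTimeMillis() - openedAt < openMillis;
    }
}
```

While the breaker is open the failing service receives no traffic at all, which gives it time to recover and keeps callers from piling up behind it.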

Page 36: Expect the unexpected: Prepare for failures in microservices

Bulkhead  

Page 37: Expect the unexpected: Prepare for failures in microservices

Bulkhead

• Avoids chain reactions by isolating failures
• Helps prevent cascading failures

Page 38: Expect the unexpected: Prepare for failures in microservices

Bulkhead

• An example of a bulkhead could be isolating the database dependencies per service
• Similarly, other infrastructure components such as the cache infrastructure can be isolated
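The slides describe bulkheads at the infrastructure level (a database or cache per service). The same idea can be applied inside one process by capping how many concurrent calls each dependency may consume. The sketch below is one possible in-process variant, assuming a semaphore per dependency; the `Bulkhead` class is hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

class Bulkhead {
    private final Semaphore permits;

    Bulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    /** Runs the action if a permit is free; otherwise rejects at once with the fallback. */
    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get(); // compartment full: fail fast instead of queueing
        }
        try {
            return action.get();
        } finally {
            permits.release();
        }
    }
}
```

With one `Bulkhead` instance per dependency (for example one for the database and one for the cache), a slow database can exhaust only its own permits and leaves the threads that serve cache-backed requests untouched.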

Page 39: Expect the unexpected: Prepare for failures in microservices

Rate Limiting

• Restricts the number of requests that can be made by a client
• A client can be identified by the access token used
• Additionally, clients can be identified by IP address

Page 40: Expect the unexpected: Prepare for failures in microservices

Rate Limiting

• With JAX-RS, rate limiting can be implemented as a filter
• This filter checks the access count for a client and accepts the request if it is within the limit
• Otherwise it returns a 429 (Too Many Requests) error
• Code at https://github.com/bhakti-mehta/samples/tree/master/ratelimiting
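The counting logic behind such a filter can be sketched independently of JAX-RS. This is not the code from the repository linked above. It is a hypothetical fixed-window counter in which the class name, the window scheme and the explicit clock parameter are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class FixedWindowRateLimiter {
    static final int OK = 200;
    static final int TOO_MANY_REQUESTS = 429;

    private final int limitPerWindow;
    private final long windowMillis;
    private final Map<String, Integer> counts = new HashMap<>();
    private long windowStart = 0;

    FixedWindowRateLimiter(int limitPerWindow, long windowMillis) {
        this.limitPerWindow = limitPerWindow;
        this.windowMillis = windowMillis;
    }

    /**
     * Records one request for the client (an access token or IP address) and
     * returns the HTTP status the filter should answer with.
     */
    synchronized int check(String clientId, long nowMillis) {
        if (nowMillis - windowStart >= windowMillis) {
            counts.clear();          // a new window starts: forget old counts
            windowStart = nowMillis;
        }
        int used = counts.merge(clientId, 1, Integer::sum);
        return used <= limitPerWindow ? OK : TOO_MANY_REQUESTS;
    }
}
```

In a JAX-RS application a `ContainerRequestFilter` could call `check` with the caller's token and `System.currentTimeMillis()`, and abort the request with status 429 when the limit is exceeded. Passing the time in explicitly keeps the class easy to test.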

Page 41: Expect the unexpected: Prepare for failures in microservices

Cache optimizations

• Stores response information related to requests in temporary storage for a specific period of time
• Ensures that the server is not burdened with processing those requests in future when responses can be fulfilled from the cache

Page 42: Expect the unexpected: Prepare for failures in microservices

Cache optimizations

Getting from first level cache

Getting from second level cache

Getting from the DB
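The lookup order on this slide (first level cache, then second level cache, then the DB) can be sketched as follows. The class is hypothetical: the second level is represented by a plain `Map` standing in for a shared cache, the database by a function, and expiry is left out for brevity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class TwoLevelCache<K, V> {
    private final Map<K, V> firstLevel = new ConcurrentHashMap<>(); // in-process cache
    private final Map<K, V> secondLevel;                            // stand-in for a shared cache
    private final Function<K, V> database;                          // stand-in for the DB query

    TwoLevelCache(Map<K, V> secondLevel, Function<K, V> database) {
        this.secondLevel = secondLevel;
        this.database = database;
    }

    /** Looks in the first level, then the second level, then the database. */
    V get(K key) {
        V value = firstLevel.get(key);
        if (value != null) {
            return value;
        }
        value = secondLevel.get(key);
        if (value == null) {
            value = database.apply(key);
            if (value == null) {
                return null; // not found anywhere; nothing to cache
            }
            secondLevel.put(key, value);
        }
        firstLevel.put(key, value); // promote so the next lookup is local
        return value;
    }
}
```

Each miss populates the faster levels on the way back, so repeated requests for the same key stop reaching the database.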

Page 43: Expect the unexpected: Prepare for failures in microservices

Dealing with latencies in response

• Have a timeout for the aggregation service
• Dispatch requests in parallel and collect responses
• Associate a priority with all the responses collected
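Parallel dispatch with a per-call timeout can be sketched with `CompletableFuture`. The `Aggregator` class below is hypothetical, requires Java 9 or later for `completeOnTimeout`, and substitutes a caller-supplied fallback for any call that is slow or fails. Response priorities are not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class Aggregator {
    /**
     * Calls all services in parallel and returns their responses in order.
     * A call that exceeds the timeout or throws contributes the fallback value.
     */
    static List<String> aggregate(List<Supplier<String>> services, long timeoutMs,
                                  String fallback) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (Supplier<String> service : services) {
            futures.add(CompletableFuture.supplyAsync(service)
                    .completeOnTimeout(fallback, timeoutMs, TimeUnit.MILLISECONDS)
                    .exceptionally(e -> fallback));
        }
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> future : futures) {
            results.add(future.join()); // each wait is bounded by the timeout
        }
        return results;
    }
}
```

Because every call starts before any result is awaited, the total latency is roughly that of the slowest call or the timeout, whichever is shorter, and the aggregate response can still be returned with partial data.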

Page 44: Expect the unexpected: Prepare for failures in microservices

Handling partial failures: best practices

• One service calls another which can be slow or unavailable
• Never block indefinitely waiting for the service
• Try to return partial results
• Provide a caching layer and return cached data

 

Page 45: Expect the unexpected: Prepare for failures in microservices

Asynchronous Patterns

• Pattern to deal with long running jobs
• Some resources may take a long time to provide results
• The client does not need to wait for the response
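One way to avoid making the client wait is to accept the job, hand back an identifier immediately and let the client poll for the outcome. The `JobService` below is a hypothetical in-memory sketch of that flow; the status strings and the pool size are assumptions.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class JobService {
    private final ExecutorService workers = Executors.newFixedThreadPool(2);
    private final Map<String, Future<String>> jobs = new ConcurrentHashMap<>();

    /** Queues the job and returns its id at once, without waiting for the result. */
    String submit(Callable<String> job) {
        String id = UUID.randomUUID().toString();
        jobs.put(id, workers.submit(job));
        return id;
    }

    /** Returns "PENDING", "DONE: <result>", "FAILED" or "UNKNOWN". */
    String status(String id) {
        Future<String> future = jobs.get(id);
        if (future == null) {
            return "UNKNOWN";
        }
        if (!future.isDone()) {
            return "PENDING";
        }
        try {
            return "DONE: " + future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "FAILED";
        } catch (ExecutionException e) {
            return "FAILED";
        }
    }

    void shutdown() {
        workers.shutdown();
    }
}
```

In an HTTP API, `submit` would typically sit behind a request that answers 202 Accepted with a status URL, and `status` behind a request to that URL.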

Page 46: Expect the unexpected: Prepare for failures in microservices

Reactive programming model

• Use reactive programming constructs such as CompletableFuture in Java 8, or ListenableFuture
• RxJava

Page 47: Expect the unexpected: Prepare for failures in microservices

Asynchronous API

• Reactive patterns
• Message passing
  – Akka actor model
• Message queues
  – Communication between services via shared message queues
  – WebSockets

Page 48: Expect the unexpected: Prepare for failures in microservices

Logging

• Complex distributed systems introduce many points of failure
• Logging helps link events/transactions between the various components that make up an application or a business service
• ELK stack
• Splunk, syslog
• Loggly
• LogEntries

Page 49: Expect the unexpected: Prepare for failures in microservices

Logging best practices

• Use a detailed, consistent pattern across service logs
• Obfuscate sensitive data
• Identify the caller or initiator as part of the logs
• Do not log payloads by default
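A small helper can enforce several of these practices at once. The sketch below is illustrative only: the field names, the key=value layout and the `requestId` correlation field are assumptions, not a format prescribed in the talk.

```java
class LogFormat {
    /** Masks all but the last four characters of a sensitive value. */
    static String mask(String secret) {
        if (secret == null || secret.length() <= 4) {
            return "****";
        }
        return "****" + secret.substring(secret.length() - 4);
    }

    /** Builds a line with the same fields in the same order for every service. */
    static String line(String service, String requestId, String caller, String event) {
        return String.format("service=%s requestId=%s caller=%s event=%s",
                service, requestId, caller, event);
    }
}
```

For example, `LogFormat.line("billing", "req-7", LogFormat.mask("abcd1234efgh"), "charge.created")` identifies the caller by a masked token and carries no payload. Propagating the same `requestId` through every service is what lets a log aggregator link the events of one transaction.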

Page 50: Expect the unexpected: Prepare for failures in microservices

Best practices when designing APIs for mobile clients

  – Avoid chattiness
  – Use the aggregator pattern

Page 51: Expect the unexpected: Prepare for failures in microservices

Resilience planning Stage 2

• Before deploy
  – Load testing
  – Longevity testing
  – Capacity planning

Page 52: Expect the unexpected: Prepare for failures in microservices

Load testing

• Ensure that you test for load on APIs
  – JMeter
• Plan for longevity testing

Page 53: Expect the unexpected: Prepare for failures in microservices

Capacity Planning

• Anticipate growth
• Design for handling exponential growth

Page 54: Expect the unexpected: Prepare for failures in microservices

Resilience planning Stage 3

• After deploy
  – Health check
  – Metrics and monitoring
  – Phased rollout of features

Page 55: Expect the unexpected: Prepare for failures in microservices

   

 

Page 56: Expect the unexpected: Prepare for failures in microservices

Health Check

• Memory
• CPU
• Threads
• Error rate
• If any of the checks exceeds a threshold, send an alert
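Most of these checks can be read from the JVM's standard management beans. The sketch below is a hypothetical helper that returns the names of the checks that exceeded their thresholds. It uses the system load average as a rough CPU signal and leaves out the error rate, which depends on application-specific counters.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class HealthCheck {
    /** Returns the names of the checks whose threshold was exceeded. */
    static List<String> check(double maxHeapRatio, int maxThreads, double maxLoadAverage) {
        List<String> alerts = new ArrayList<>();

        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when the maximum is undefined
        if (max > 0 && (double) heap.getUsed() / max > maxHeapRatio) {
            alerts.add("memory");
        }

        int threads = ManagementFactory.getThreadMXBean().getThreadCount();
        if (threads > maxThreads) {
            alerts.add("threads");
        }

        // Negative when the platform cannot report a load average.
        double load = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
        if (load >= 0 && load > maxLoadAverage) {
            alerts.add("cpu");
        }
        return alerts;
    }
}
```

A scheduler could run `check` periodically and pass any non-empty result to whatever alerting channel is in use.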

Page 57: Expect the unexpected: Prepare for failures in microservices

   

Page 58: Expect the unexpected: Prepare for failures in microservices

Metrics

• Response times, throughput
  – Identify slow running DB queries
• GC rate and pause duration
  – Garbage collection can cause slow responses
• Monitor unusual activity
• Third party library metrics
  – For example, Couchbase hits
  – atop

Page 59: Expect the unexpected: Prepare for failures in microservices

Metrics

• Load average
• Uptime
• Log sizes

Page 60: Expect the unexpected: Prepare for failures in microservices

Monitoring

[Diagram. Labels: Monitoring server, Production Environment, Checks, Alerts, Email]

Page 61: Expect the unexpected: Prepare for failures in microservices

Monitoring Stack

• Application: log aggregation framework
• OS / application code: software analytics tool that monitors performance
• Network, server: Collectd / Graphite

Page 62: Expect the unexpected: Prepare for failures in microservices

Rollout of new features

• Phase the rollout of new features
• Have a way to turn features off if they are not behaving as expected
• Alerts and more alerts!

Page 63: Expect the unexpected: Prepare for failures in microservices

Real time examples

• Netflix's Simian Army induces failures of services and even datacenters during the working day to test both the application's resilience and monitoring
• Latency Monkey to simulate slow running requests
• WireMock to mock services
• Saboteur to create deliberate network mayhem

Page 64: Expect the unexpected: Prepare for failures in microservices

Takeaway

• Inevitability of failures
  – Expect systems will fail
  – Failure prevention

Page 65: Expect the unexpected: Prepare for failures in microservices

           

Page 66: Expect the unexpected: Prepare for failures in microservices

References

• https://commons.wikimedia.org/wiki/File:Bulkhead_PSF.png
• https://en.wikipedia.org/wiki/Circuit_breaker#/media/File:Four_1_pole_circuit_breakers_fitted_in_a_meter_box.jpg
• https://www.flickr.com/photos/skynoir/ Beer in hand: skynoir/Flickr/Creative Commons License

Page 67: Expect the unexpected: Prepare for failures in microservices

Questions

• Twitter: @bhakti_mehta
• Email: bhakti@bluejeans.com