
  • Computational Sprinting

    Arun Raghavan*, Yixin Luo+, Anuj Chandawalla+, Marios Papaefthymiou+, Kevin P. Pipe+#,

    Thomas F. Wenisch+, Milo M. K. Martin*

    University of Pennsylvania, Computer and Information Science*

    University of Michigan, Electrical Eng. and Computer Science+
    University of Michigan, Mechanical Engineering#

  • This work is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License

    • You are free:
      • to Share: to copy, distribute, display, and perform the work
      • to Remix: to make derivative works

    • Under the following conditions:
      • Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
      • Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.

    • For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to:

      http://creativecommons.org/licenses/by-sa/3.0/us/

    • Any of the above conditions can be waived if you get permission from the copyright holder.

    • Apart from the remix rights granted under this license, nothing in this license impairs or restricts the author's moral rights.


  • Overview of My (Other) Research

    • Multicore memory systems
      • Adaptive cache coherence protocols
      • Memory consistency: specification & implementation
      • “Why On-chip Cache Coherence is Here to Stay”, Communications of the ACM, July 2012

    • Transactional memory
      • Semantics (what does “atomic” really mean?)
      • Extending transaction sizes & handling overflow
      • Conflict-avoiding hardware via repair (true & false sharing)

    • Hardware support for security
      • Goal: C/C++ as safe and secure as Java
      • Hardware/compiler co-design to provide memory safety


  • Talk Overview: Computational Sprinting

    • Computational Sprinting
      • Unsustainable power for short, intense bursts of compute

    • Feasibility study [HPCA’12]
      • Explored thermal, electrical, and architectural feasibility
      • Simulation results:
        • Significant responsiveness improvements in short bursts
        • With same dynamic energy consumption

    • Preliminary results with sprinting on prototype-proxy
      • Characterize real energy/performance behavior
      • Sprinting can improve energy efficiency due to race to halt
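Two claims above can be illustrated with a deliberately simple energy model: dynamic energy stays the same, and race to halt can improve efficiency. This is a sketch under stated assumptions, not the model used in the study. It assumes perfect parallel speedup, a fixed per-core dynamic power, a hypothetical static power paid only while the chip is awake, and a lower hypothetical idle power once the task finishes. All names and numbers (`PowerModel`, `task_energy_j`, the wattages) are illustrative.

```python
from dataclasses import dataclass


@dataclass
class PowerModel:
    # All values are hypothetical, chosen only for illustration.
    core_dynamic_w: float  # dynamic power of one active core (W)
    awake_static_w: float  # static/uncore power while the chip is awake (W)
    idle_w: float          # power in a low-power idle state after the task (W)


def task_energy_j(model: PowerModel, work_core_s: float, cores: int, window_s: float) -> float:
    """Energy (J) to run a task on `cores` cores and then idle until `window_s`.

    Assumes perfect parallel speedup, so runtime = work / cores.
    """
    if cores < 1:
        raise ValueError("need at least one core")
    runtime_s = work_core_s / cores
    if runtime_s > window_s:
        raise ValueError("task does not finish within the window")
    # cores * (work / cores) cancels: dynamic energy does not depend on core count.
    dynamic_j = model.core_dynamic_w * cores * runtime_s
    # Static power is paid only while awake, so finishing sooner saves it.
    static_j = model.awake_static_w * runtime_s
    idle_j = model.idle_w * (window_s - runtime_s)
    return dynamic_j + static_j + idle_j


if __name__ == "__main__":
    model = PowerModel(core_dynamic_w=1.0, awake_static_w=0.5, idle_w=0.05)
    for n in (1, 4, 16):
        print(f"{n:2d} cores: {task_energy_j(model, 1.0, n, 2.0):.4f} J")
```

With these made-up numbers the dynamic term stays at 1 J for every core count. The awake-static term shrinks as the sprint gets wider, and that shrinkage is the race-to-halt effect. If idle power equaled awake static power, the benefit would disappear in this model.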

  • Computational Sprinting and Dark Silicon

    • A Problem: “Dark Silicon” a.k.a. “The Utilization Wall”
      • Increasing power density; can’t use all transistors all the time
      • Cooling constraints limit mobile systems

    • One approach: Use few transistors for long durations
      • Specialized functional units [Accelerators, GreenDroid]
      • Targeted towards sustained compute, e.g., media playback

    • Our approach: Use many transistors for short durations
      • Computational Sprinting by activating many “dark cores”
      • Unsustainable power for short, intense bursts of compute
      • Responsiveness for bursty/interactive applications

    • Our goal: responsiveness of a 16 W chip in a 1 W platform

    Is this feasible?
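The goal above implies a fixed power ratio. As a first-order illustration, assume that chip power scales roughly linearly with the number of active cores. That assumption is not a result from the talk.

```latex
\frac{P_{\text{sprint}}}{P_{\text{sustainable}}} = \frac{16\ \text{W}}{1\ \text{W}} = 16,
\qquad
f_{\text{active}} = \frac{P_{\text{sustainable}}}{P_{\text{sprint}}} = \frac{1}{16} \approx 6.25\%
```

Under that assumption, only about one sixteenth of the chip could stay active indefinitely. A sprint deliberately runs at roughly 16 times the sustainable power for a limited time.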

  • Sprinting Challenges and Opportunities

    • Thermal challenges
      • How to extend sprint duration and intensity? Latent heat from phase change material close to the die

    • Electrical challenges
      • How to supply peak currents? Ultracapacitor/battery hybrid
      • How to ensure power stability? Ramped activation (~100 μs)

    • Architectural challenges
      • How to control sprints? Thermal resource management
      • How do applications benefit from sprinting?
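A crude energy-balance estimate shows why latent heat helps. This sketch makes three assumptions:
  • a single lumped thermal mass
  • a phase change material (PCM) that melts entirely within the allowed temperature headroom
  • heat leaving the package at a constant rate equal to the sustainable (TDP) power

All parameter names and values are hypothetical, not measurements from the study.

```python
import math


def sprint_duration_s(
    sprint_power_w: float,
    sustainable_power_w: float,
    heat_capacity_j_per_k: float,
    temp_headroom_k: float,
    pcm_mass_g: float = 0.0,
    pcm_latent_heat_j_per_g: float = 0.0,
) -> float:
    """Rough sprint duration before the thermal limit is reached.

    Heat budget = sensible heat (C * dT) + latent heat of the PCM (m * L).
    The budget is drained by the power in excess of what the package can shed.
    """
    excess_w = sprint_power_w - sustainable_power_w
    if excess_w <= 0:
        return math.inf  # not a sprint: this power level is sustainable
    sensible_j = heat_capacity_j_per_k * temp_headroom_k
    latent_j = pcm_mass_g * pcm_latent_heat_j_per_g
    return (sensible_j + latent_j) / excess_w


if __name__ == "__main__":
    # Hypothetical numbers: 16 W sprint on a 1 W platform, 20 K of headroom.
    base = sprint_duration_s(16.0, 1.0, heat_capacity_j_per_k=0.5, temp_headroom_k=20.0)
    with_pcm = sprint_duration_s(16.0, 1.0, 0.5, 20.0, pcm_mass_g=0.1, pcm_latent_heat_j_per_g=100.0)
    print(f"without PCM: {base:.2f} s, with PCM: {with_pcm:.2f} s")
```

Melting absorbs heat without raising temperature. The PCM term therefore adds to the budget without consuming any of the temperature headroom, and in this toy example it doubles the estimated sprint.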

    10.2x responsiveness for vision workloads via a 16-core sprint within a 1 W TDP
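The ramped activation listed under electrical challenges can be pictured as bringing cores online in a staggered sequence rather than all at once. Staggering spreads the current increase over the ramp window. This sketch assumes evenly spaced activations and a fixed, hypothetical per-core current. The ~100 μs window is the only number taken from the slide, and the helper names are illustrative.

```python
def activation_times_us(num_cores: int, ramp_us: float = 100.0) -> list[float]:
    """Evenly stagger core activations across the ramp window (times in microseconds)."""
    if num_cores < 1:
        raise ValueError("need at least one core")
    if num_cores == 1:
        return [0.0]
    step_us = ramp_us / (num_cores - 1)
    return [i * step_us for i in range(num_cores)]


def average_slew_a_per_us(num_cores: int, per_core_current_a: float, ramp_us: float = 100.0) -> float:
    """Average rate of current increase when the cores are ramped over ramp_us."""
    if ramp_us <= 0:
        raise ValueError("an instantaneous step has unbounded slew in this model")
    return num_cores * per_core_current_a / ramp_us


if __name__ == "__main__":
    times = activation_times_us(16)
    print(", ".join(f"{t:.1f}" for t in times))
    # Hypothetical 0.5 A per core: 8 A total spread over ~100 us.
    print(f"{average_slew_a_per_us(16, 0.5):.3f} A/us")
```

Each activation adds only one core's current. The step the supply must absorb at any instant is therefore roughly 1/16 of the full sprint current. The trade-off is that full sprint throughput arrives about 100 μs later.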


  • Outline

    • Motivation: “Dark Silicon” and interactive apps
    • Computational Sprinting
      • Feasibility Study
    • Performance Evaluation
      • Simulation results
      • Characterization of a real system
    • Conclusion

  • Power Density Trends for Sustained Compute

    [Figure: power vs. time and temperature vs. time for sustained compute, with the thermal limit Tmax marked; annotation: > 10x]

    How to meet the thermal limit despite the power density increase?
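The thermal limit can be stated with a first-order model. Assume a single lumped thermal resistance R_th from die to ambient, at steady state. These symbols are illustrative and do not come from the talk.

```latex
T_{ss} = T_{amb} + P\,R_{th} \le T_{max}
\quad\Longrightarrow\quad
P \le \frac{T_{max} - T_{amb}}{R_{th}}
```

At fixed die area and fixed cooling, a k-fold increase in power density multiplies the temperature rise T_ss − T_amb by k. That leaves three levers: lower R_th (better cooling), reduce area, or reduce the fraction of the chip that is active.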

  • Option 1: Enhance Cooling?

    Mobile devices limited to passive cooling

    [Figure: temperature vs. time, with the thermal limit Tmax marked]

  • Option 2: Decrease Chip Area?

    Reduces cost, but sacrifices benefits from Moore’s law

  • Option 3: Decrease Active Fraction?

    How do we extract application performance from this “dark silicon”?

  • Accelerator Cores?

    • Heterogeneous cores [Conservation Cores ASPLOS’10, GreenDroid IEEE Comm., QsCores MICRO’11]
      • Activate different parts of chip based on application
    • Mobile chips already employ accelerators

    NVIDIA Tegra 2 (49 mm²), Apple A5 (122 mm²)

  • Design for Responsiveness

    • Observation: today, design for sustained performance
    • But, consider emerging interactive mobile apps… [Clemons+ DAC’11, Hartl+ ECV’11, Girod+ IEEE Signal Processing’11]
      • Intense compute bursts in response to user input, then idle
      • Humans demand sub-second response times [Doherty+ IBM TR ’82, Yan+ DAC’05, Shye+ MICRO’09, Blake+ ISCA’10]

    Peak performance during bursts limits what applications can do

  • Computational Sprinting

    Designing for Responsiveness

  • Parallel Computational Sprinting

    [Figure: power and temperature vs. time, with the thermal limit Tmax marked]

  • Parallel Computational Sprinting

    [Figure: power and temperature vs. time, with the thermal limit Tmax marked]

    Effect of thermal capacitance
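Thermal capacitance is what makes a short sprint possible: the die heats toward a steady state that it never reaches during the sprint. This sketch assumes a lumped RC thermal model. The parameter names and all values of R, C, and the temperatures are hypothetical. Solving C dT/dt = P − (T − T_amb)/R from a starting temperature T_0 gives the time to reach T_max. The code computes that closed form and cross-checks it with a forward-Euler simulation.

```python
import math


def time_to_tmax_s(power_w, r_k_per_w, c_j_per_k, t_amb_c, t_start_c, t_max_c):
    """Time for a lumped RC thermal model to heat from t_start_c to t_max_c.

    Model: C * dT/dt = P - (T - T_amb) / R, whose solution approaches
    T_ss = T_amb + P * R exponentially with time constant tau = R * C.
    """
    t_ss = t_amb_c + power_w * r_k_per_w
    if t_start_c >= t_max_c:
        return 0.0  # already at or above the limit
    if t_ss <= t_max_c:
        return math.inf  # sustainable: steady state never exceeds the limit
    tau_s = r_k_per_w * c_j_per_k
    return tau_s * math.log((t_start_c - t_ss) / (t_max_c - t_ss))


def simulate_time_to_tmax_s(power_w, r_k_per_w, c_j_per_k, t_amb_c, t_start_c, t_max_c,
                            dt_s=1e-4, t_end_s=100.0):
    """Forward-Euler cross-check of time_to_tmax_s (inf if not reached by t_end_s)."""
    temp, t = t_start_c, 0.0
    while t < t_end_s:
        if temp >= t_max_c:
            return t
        temp += dt_s * (power_w - (temp - t_amb_c) / r_k_per_w) / c_j_per_k
        t += dt_s
    return math.inf


if __name__ == "__main__":
    # Hypothetical package: R = 25 K/W makes 1 W sustainable between 25 C and 50 C.
    for c in (1.0, 2.0):
        print(f"C = {c} J/K: 16 W sprint lasts {time_to_tmax_s(16.0, 25.0, c, 25.0, 25.0, 50.0):.2f} s")
    print("1 W forever:", time_to_tmax_s(1.0, 25.0, 1.0, 25.0, 25.0, 50.0))
```

In this model, doubling C doubles the sprint length. With C = 0 the limit is reached immediately, which is why thermal capacitance matters for sprinting.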
