Big Data for Better Datacenters

Post on 14-Jun-2015

Despite recent advances in operations management, data center management today remains largely a black art. Administrators have limited visibility into their data center operations, yet they must make important operations management decisions every day. A typical data center generates about a billion data points every day. A lot of insight could be gathered from this data, but because of its volume and scale, on-premise software solutions collect only a limited subset of it, which restricts them to a very narrow view of the data center. We at CloudPhysics have taken a different approach to this problem. We created an analytics platform in the cloud that provides the ability to query, slice and dice, and mash up the data with multiple data sources. This approach not only yields incredible insights but also solves many of the teething operational management issues that have not been solved before. In this talk we give an overview of data center metadata and provide details on how CloudPhysics handles this data at scale using its platform.

Transcript of Big Data for Better Datacenters

Big Data for Better Datacenters

Balaji Parimi (@vimAPIGuru)

Krishna Raj Raja (@esxtopGuru)

Strata 2014, Santa Clara

World Today

The Irony

Analytics

1990s

Networking becomes mainstream

No effective means to pool storage and compute capacity

100s of silos to manage

[Diagram: separate silos, each with its own Application, OS, Compute, and Storage, connected by the Network]

2000s

Virtualization becomes mainstream

Improved resource pooling

Increased utilization

[Diagram: VMs running on Hypervisors over Compute, with shared Storage and Network]

2010s

Cloud has become mainstream

Centralized management

[Diagram: Software Defined Datacenter, with VMs running over Compute, Storage, and Network]

70% of all the x86 OS instances run in virtual machines, 80% by 2016

- Gartner (2012)

Data Centers Have Evolved But IT Operations Has Not

IT Operations Today

Retrofitted

Tedious

Paralyzing

Lot of Guesswork!

IT Operations Should Be Data-Driven: "Intuition and Guesswork" Should Be Replaced By "Analytics and Predictive Intelligence"

Datacenter Efficiency

Statistically Significant Machine Metadata Available for Data Scientists

Half Million Private Clouds

Can Private Clouds Match The Efficiency?

Data Scientists + Machine Metadata

Operational Metadata Per Day

Private Cloud: 500 Virtual Machines, 50 Servers

1 Billion Data Points

25 GB

60 million tweets, 650 Million Users

2.5 billion pieces of content, 1 Billion Users

Quadrillion datapoints (10^15)

500,000 Datacenters

50 Million VMs

Massive Scale
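The scale figures on these slides can be cross-checked with a little arithmetic. The sketch below is only a back-of-the-envelope estimate: it assumes decimal gigabytes, and it assumes every datacenter (or every VM) produces data at the same rate as the 500-VM private cloud example, which the slides do not state.

```python
# Figures quoted on the slides.
POINTS_PER_DAY_PER_CLOUD = 1_000_000_000   # 1 billion data points per day
BYTES_PER_DAY_PER_CLOUD = 25 * 10**9       # 25 GB per day (decimal GB assumed)
VMS_PER_CLOUD = 500
DATACENTERS = 500_000
TOTAL_VMS = 50_000_000
QUADRILLION = 10**15

# Per-VM rate and average size of one data point in the example cloud.
points_per_vm_per_day = POINTS_PER_DAY_PER_CLOUD // VMS_PER_CLOUD
bytes_per_point = BYTES_PER_DAY_PER_CLOUD / POINTS_PER_DAY_PER_CLOUD

# Assumption A: every datacenter is as large as the 500-VM example.
fleet_points_per_day_by_dc = POINTS_PER_DAY_PER_CLOUD * DATACENTERS
# Assumption B: every one of the 50 million VMs emits at the example per-VM rate.
fleet_points_per_day_by_vm = points_per_vm_per_day * TOTAL_VMS

days_to_quadrillion_by_dc = QUADRILLION / fleet_points_per_day_by_dc
days_to_quadrillion_by_vm = QUADRILLION / fleet_points_per_day_by_vm

print(points_per_vm_per_day, bytes_per_point)
print(days_to_quadrillion_by_dc, days_to_quadrillion_by_vm)
```

Under either assumption, the combined fleet would accumulate a quadrillion data points within days rather than years.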

Challenges

[Diagram: a Data Center with VMs running on four Servers, alongside a Management Server (Control Plane)]

Management Driven by Collective Intelligence

(Collect)

(Predict/Alert)

Data Model

Entities: ORG, MGMT SERVER, CLUSTER, SERVER, DISK STORE, STORAGE CLUSTER, VDISK, VM, VNIC, VSWITCH, VPORTGROUP

Relationships: One to Many
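A one-to-many entity model like this one can be sketched as a simple tree. The code below is an illustration only: the `Entity` class, the helper `count_by_kind`, and the particular parent-child edges (ORG to MGMT SERVER to CLUSTER to SERVER to VM to VDISK/VNIC) are assumptions made for the example, since the transcript lists the entity names but not how the slide connects them.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Entity:
    """One node in a one-to-many hierarchy (hypothetical representation)."""
    kind: str
    name: str
    children: List["Entity"] = field(default_factory=list)

    def add(self, child: "Entity") -> "Entity":
        """Attach a child to this entity and return the child."""
        self.children.append(child)
        return child

    def walk(self) -> Iterator["Entity"]:
        """Yield this entity and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def count_by_kind(root: Entity, kind: str) -> int:
    """Count entities of a given kind anywhere under root."""
    return sum(1 for entity in root.walk() if entity.kind == kind)


# Assumed edges, for illustration only.
org = Entity("ORG", "example-org")
mgmt = org.add(Entity("MGMT SERVER", "mgmt-1"))
cluster = mgmt.add(Entity("CLUSTER", "cluster-1"))
server = cluster.add(Entity("SERVER", "server-1"))
for i in range(3):
    vm = server.add(Entity("VM", f"vm-{i}"))
    vm.add(Entity("VDISK", f"vm-{i}-disk-0"))
    vm.add(Entity("VNIC", f"vm-{i}-nic-0"))

print(count_by_kind(org, "VM"))
```

Keeping every entity in one generic node type makes it easy to roll metrics up from VM to server to cluster, at the cost of losing per-kind type checking.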

Types of Data Streams

CONFIGURATION

PERFORMANCE

TASK & EVENTS

EXTERNAL DATA SOURCES

On-Demand Analytics Engine

Prediction and Simulation Engine

Pre-Alerts and Notification

Distributed Datastore

Complex Stream Processing

Collector, Data Analyst, Administrator, Partner

Data Stream

Extract / Transform

Persist

Model

Stream Mashup

Prediction

Alerting

Job Scheduling

Resource Management
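One way to picture the stream mashup stage is as a join between two time-ordered streams. The sketch below is not the CloudPhysics implementation; it assumes a configuration stream and a performance stream keyed by VM, and tags each performance sample with the most recent configuration seen for that VM. The function name `mashup` and the tuple layouts are hypothetical.

```python
import heapq
from typing import Dict, Iterable, Iterator, Tuple

ConfigEvent = Tuple[int, str, dict]   # (timestamp, vm_id, configuration)
PerfSample = Tuple[int, str, float]   # (timestamp, vm_id, cpu_percent)


def mashup(config_stream: Iterable[ConfigEvent],
           perf_stream: Iterable[PerfSample]) -> Iterator[dict]:
    """Enrich each performance sample with the latest known configuration.

    Both inputs must already be sorted by timestamp.
    """
    latest_config: Dict[str, dict] = {}
    # Tag 0 = configuration, 1 = performance, so that on a timestamp tie
    # the configuration change is applied before the sample is emitted.
    merged = heapq.merge(
        ((ts, 0, vm, payload) for ts, vm, payload in config_stream),
        ((ts, 1, vm, payload) for ts, vm, payload in perf_stream),
        key=lambda event: (event[0], event[1]),
    )
    for ts, tag, vm, payload in merged:
        if tag == 0:
            latest_config[vm] = payload
        else:
            yield {"ts": ts, "vm": vm, "cpu_pct": payload,
                   "config": latest_config.get(vm)}


config = [(0, "vm-1", {"vcpus": 2}), (20, "vm-1", {"vcpus": 4})]
perf = [(10, "vm-1", 95.0), (30, "vm-1", 40.0), (30, "vm-2", 10.0)]
rows = list(mashup(config, perf))
for row in rows:
    print(row)
```

Here the sample at time 10 is paired with the 2-vCPU configuration and the sample at time 30 with the 4-vCPU one, while `vm-2` has no known configuration and gets `None`.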

Enabling Private Clouds

Collective Intelligence

Use Cases

Predicting Out of Disk Space Event

[Chart: Datastore Capacity (0% to 100%) by month, April through August, for Datastore 1, Datastore 2, and Datastore 3]

Disk Space Usage is Hard to Predict

Predicting Out of Disk Space

[Chart: Probability (0% to 100%) of a datastore being 100% Full, 90% Full, or 80% Full within < 1 Month, 6 Months, and 1 Year]

Probability Distribution Using Monte Carlo Simulation
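A Monte Carlo estimate of this kind can be sketched in a few lines. The version below is a minimal illustration, not the method used in the talk: it assumes future daily growth can be resampled from past daily changes (a simple bootstrap), and the function names, the sample history, and the capacity figures are all made up for the example.

```python
import random
from typing import List, Sequence


def simulate_peak_usage(current_gb: float,
                        daily_deltas_gb: Sequence[float],
                        horizon_days: int,
                        runs: int = 2000,
                        seed: int = 0) -> List[float]:
    """Simulate many possible futures; return the peak usage of each run."""
    rng = random.Random(seed)
    peaks = []
    for _ in range(runs):
        used = peak = current_gb
        for _ in range(horizon_days):
            # Resample one observed daily change; usage cannot go below zero.
            used = max(0.0, used + rng.choice(daily_deltas_gb))
            peak = max(peak, used)
        peaks.append(peak)
    return peaks


def probability_of_reaching(peaks: Sequence[float],
                            capacity_gb: float,
                            fraction_full: float) -> float:
    """Fraction of simulated runs whose peak reached the given fill level."""
    threshold = capacity_gb * fraction_full
    return sum(peak >= threshold for peak in peaks) / len(peaks)


# Hypothetical history of daily changes in used space (GB).
history = [5.0, -2.0, 0.0, 12.0, 3.0, -1.0, 8.0]
peaks = simulate_peak_usage(current_gb=600.0, daily_deltas_gb=history,
                            horizon_days=180)
for fraction in (0.8, 0.9, 1.0):
    p = probability_of_reaching(peaks, capacity_gb=1000.0,
                                fraction_full=fraction)
    print(f"{fraction:.0%} full within 180 days: {p:.1%}")
```

Because all three probabilities are read from the same set of simulated peaks, the chance of reaching 80% full is never lower than the chance of reaching 100% full.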

Discovering Causality

1. Virtual Machine 18
2. Virtual Machine 23
3. Virtual Machine 108

Predicting Potential Outage

Predicting SSD Performance Impact

[Chart: Latency Reduction (0% to 100%) versus Solid State Drive Cache Per VM (16 GB, 32 GB, 64 GB, 128 GB) for VM 1, VM 2, and VM 3, with a Target Latency line; annotated values: 62 GB and 24 GB]

Predicting Configuration Outliers

Predicting Performance Outliers

[Chart: CPU Co-efficient (0 to 90) by Virtual Machine # (0 to 12) for Organization A, Organization B, and Organization C]
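Flagging performance outliers across a population of VMs can be illustrated with a robust statistic. The slides do not say which technique is used, so the sketch below swaps in a common one as an assumption: the modified z-score based on the median absolute deviation (MAD), with the conventional 3.5 cutoff. The function name and the sample values are hypothetical.

```python
from statistics import median
from typing import Dict, List


def mad_outliers(values: Dict[str, float], threshold: float = 3.5) -> List[str]:
    """Return the names whose modified z-score exceeds the threshold."""
    xs = list(values.values())
    med = median(xs)
    mad = median([abs(x - med) for x in xs])
    if mad == 0:
        # All values (or most of them) are identical; nothing can be ranked.
        return []
    return sorted(
        name for name, x in values.items()
        if 0.6745 * abs(x - med) / mad > threshold
    )


# Hypothetical CPU metric per VM, pooled across organizations.
cpu_metric = {
    "vm-1": 20.0, "vm-2": 22.5, "vm-3": 21.0,
    "vm-4": 19.0, "vm-5": 88.0, "vm-6": 23.0,
}
print(mad_outliers(cpu_metric))  # ['vm-5']
```

The median-based score is less sensitive to the outliers themselves than a mean and standard deviation would be, which matters when a few VMs are far from the rest.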

Some Closing Thoughts

Towards a New World

Thank You

Balaji Parimi (@vimAPIGuru)

Krishna Raj Raja (@esxtopGuru)