Isilon.08 Gallagher-Isilon.08 Final

Transcript of Isilon.08 Gallagher-Isilon.08 Final

  • 1 Copyright 2013 EMC Corporation. All rights reserved.

    Taking Command of Big Data: Analytics and Storage Solutions for High Impact Business Insight

    Ryan Peterson Director, Solutions Architecture Isilon Storage Division

  • 2 Copyright 2013 EMC Corporation. All rights reserved.

    Roadmap Information Disclaimer

    EMC makes no representation and undertakes no obligations with regard to product planning information, anticipated product characteristics, performance specifications, or anticipated release dates (collectively, "Roadmap Information").

    Roadmap Information is provided by EMC as an accommodation to the recipient solely for purposes of discussion and without intending to be bound thereby.

    Roadmap Information is EMC Restricted Confidential and is provided under the terms, conditions and restrictions defined in the EMC Non-Disclosure Agreement in place with your organization.

  • 3 Copyright 2013 EMC Corporation. All rights reserved.

    Agenda

    Quick Review of Isilon Key Features

    Quick Review of Hadoop

    Lessons Learned: Common Misconceptions

    Hadoop Technology Review

    Hadoop Technology Challenges

    Lessons Learned: Seeing Hadoop Differently

    Case Study Example

    Resources

  • 4 Copyright 2013 EMC Corporation. All rights reserved.

    The Unstructured Data Challenge

    [Chart: exabytes (0 to 90) by year, 2009 to 2014]

    File based: 61.8% CAGR

    Block based: 23.7% CAGR

    By 2013, 80% of all storage capacity sold will be for unstructured data

    Source: Scale Out Storage in the Content Driven Enterprise: Unleashing the Value of Information Assets, IDC White Paper

  • 5 Copyright 2013 EMC Corporation. All rights reserved.

    EMC Isilon Scale-Out NAS

    Single file system, single volume, global namespace for simplicity and ease of use

    Scales to over 20 PB

    Stripes data across all nodes for high resiliency and up to N+4 data protection

    Robust data backup and disaster recovery options

    Unmatched efficiency with > 80% storage utilization and automated storage tiering

    World's fastest NAS with over 100 GB/s throughput, 1.6M SPECsfs ops

    Integrated support for industry-standard protocols including NFS, SMB, HTTP, FTP, and HDFS for operational flexibility

    Native HDFS and HDFS 2.0 support

  • 6 Copyright 2013 EMC Corporation. All rights reserved.

    Hadoop: Finding Your Gold Nugget of Data

  • 7 Copyright 2013 EMC Corporation. All rights reserved.

    Hadoop

    Created 6+ years ago

    Software platform designed to analyze massive amounts of unstructured data

    Two core components: Hadoop Distributed File System (HDFS) for storage and MapReduce for compute

    Now a top-level Apache project backed by a large open source development community
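The compute half of that pairing can be illustrated with a minimal, single-process sketch of the MapReduce model. This is not Hadoop's API; the function names are hypothetical and the word-count job is only the customary teaching example. In a real cluster the map and reduce calls would run in parallel on many nodes against data held in HDFS.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence over in-memory records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data", "big storage"])
print(counts)
```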

  • 8 Copyright 2013 EMC Corporation. All rights reserved.

    Hadoop Lessons Learned: Common Misperceptions

    Hadoop is a complete solution

    Hadoop is a share-nothing architecture

    Hadoop is a mainstream technology

    Hadoop is only for Data Scientists

    Hadoop is only good with DAS

    HDFS is a robust file system

    Hadoop is an Engineering Exercise

  • 9 Copyright 2013 EMC Corporation. All rights reserved.

    Isilon HDFS interface

    Isilon supports the HDFS interfaces for the NameNode and DataNode to host metadata and data

    Underlying filesystem is OneFS

    As simple as pointing the HDFS clients to the DNS name of the Isilon cluster!
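As a rough illustration of what "pointing the HDFS clients" can look like, the sketch below generates the client-side `core-site.xml` property that names the default file system. The host name `isilon.example.com` and port 8020 are assumptions for illustration, not values from the slides. `fs.defaultFS` is the Hadoop 2 property name; Hadoop 1 clients use `fs.default.name` for the same purpose.

```python
import xml.etree.ElementTree as ET


def core_site_xml(cluster_dns_name, port=8020, property_name="fs.defaultFS"):
    """Build a minimal core-site.xml body pointing HDFS clients at one DNS name.

    cluster_dns_name and port are placeholders; use the values for your cluster.
    """
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = property_name
    ET.SubElement(prop, "value").text = f"hdfs://{cluster_dns_name}:{port}"
    return ET.tostring(configuration, encoding="unicode")


xml_text = core_site_xml("isilon.example.com")
print(xml_text)
```

Because every client resolves the same DNS name, no per-node NameNode or DataNode addresses need to appear in the client configuration.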

  • 10 Copyright 2013 EMC Corporation. All rights reserved.

    Technology Review

  • 11 Copyright 2013 EMC Corporation. All rights reserved.

    Technology Review

    [Diagram: NameNode, Secondary NameNode, Job Tracker, and DataNode / Task Tracker]

  • 12 Copyright 2013 EMC Corporation. All rights reserved.

    NameNode

    Manages the file system namespace

    Stores all the metadata in RAM: filenames, owners, group, access info

    Knows associated blocks

    Manages block replication
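A toy model of those responsibilities, assuming two in-memory maps (path to file metadata, block to DataNodes). The class and field names are hypothetical and much simpler than the real NameNode's data structures.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    """Per-file metadata held in memory: owner, group, access mode, block list."""
    owner: str
    group: str
    mode: int
    block_ids: list = field(default_factory=list)


class ToyNameNode:
    def __init__(self):
        self.namespace = {}        # path -> FileEntry
        self.block_locations = {}  # block id -> set of DataNode names

    def add_file(self, path, owner, group, mode, blocks):
        """blocks maps each block id to the DataNodes holding a replica."""
        self.namespace[path] = FileEntry(owner, group, mode, list(blocks))
        for block_id, nodes in blocks.items():
            self.block_locations[block_id] = set(nodes)

    def locate(self, path):
        """Answer a client lookup: which DataNodes hold each block of a file."""
        entry = self.namespace[path]
        return [(b, sorted(self.block_locations[b])) for b in entry.block_ids]

    def under_replicated(self, target=3):
        """Blocks with fewer replicas than the target need re-replication."""
        return sorted(b for b, nodes in self.block_locations.items()
                      if len(nodes) < target)


nn = ToyNameNode()
nn.add_file("/logs/a.log", "hadoop", "analytics", 0o644,
            {"blk_1": ["dn1", "dn2", "dn3"], "blk_2": ["dn2", "dn3"]})
locations = nn.locate("/logs/a.log")
needs_copy = nn.under_replicated()
```

Since both maps live only in this one object, losing it loses every file-to-block mapping even though the blocks themselves still sit on the DataNodes.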

  • 13 Copyright 2013 EMC Corporation. All rights reserved.

    Secondary NameNode

    Manages edit log and check-pointing of NameNode metadata

    Does NOT provide NameNode failover

    Is not a backup or hot standby for the NameNode
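Check-pointing can be pictured as replaying the edit log onto the last saved namespace image and then starting a fresh log. The sketch below is a simplified model with hypothetical names and a two-operation edit format; it is not the on-disk format Hadoop uses.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to a namespace dict (path -> metadata)."""
    op, path = edit[0], edit[1]
    if op == "create":
        namespace[path] = edit[2]
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fsimage, edit_log):
    """Merge the edit log into a copy of the image; return the new image and an empty log."""
    new_image = dict(fsimage)
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []


image = {"/a": "meta-a"}
edits = [("create", "/b", "meta-b"), ("delete", "/a")]
new_image, remaining_edits = checkpoint(image, edits)
```

The merge keeps the edit log short so a NameNode restart replays less, but it produces a periodic copy of the metadata rather than a live standby, which is why it offers no failover.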

  • 14 Copyright 2013 EMC Corporation. All rights reserved.

    Job Tracker

    Manages all jobs submitted to the cluster

    Tracks and reports the status of jobs and tasks

    Provides job queuing functionality

  • 15 Copyright 2013 EMC Corporation. All rights reserved.

    DataNode / Task Tracker

    Stores blocks of files on top of the native host OS file system (e.g. EXT3, ZFS)

    Serves read/write requests from the clients

    Performs block creation, deletion, and replication

    Same block can be stored on multiple DataNodes for redundancy
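A small sketch of how a file becomes blocks and how each block ends up on several DataNodes. The round-robin placement here is an assumption chosen for brevity; actual HDFS placement is rack-aware. The 64 MB block size and the function names are likewise illustrative.

```python
import math


def split_into_blocks(file_size_mb, block_size_mb=64):
    """Return the size of each block; only the last block may be partial."""
    count = math.ceil(file_size_mb / block_size_mb)
    return [min(block_size_mb, file_size_mb - i * block_size_mb)
            for i in range(count)]


def place_replicas(block_count, datanodes, replication=3):
    """Assign each block to `replication` distinct DataNodes, round-robin."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    return {
        block: [datanodes[(block + r) % len(datanodes)]
                for r in range(replication)]
        for block in range(block_count)
    }


blocks = split_into_blocks(200)
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
```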

  • 16 Copyright 2013 EMC Corporation. All rights reserved.

    Technology Challenges

  • 17 Copyright 2013 EMC Corporation. All rights reserved.

    Hadoop Technology Challenges

    Traditional Hadoop NameNode Architecture and Data Resiliency

    Data Protection and Version Control with Hadoop

    Manual Import and Export of Data

    Scalability of Traditional Hadoop Infrastructure

    Protocol Support

    Time to Results

  • 18 Copyright 2013 EMC Corporation. All rights reserved.

    Traditional NameNode Architecture

    [Diagram: NameNode]

    NameNode provides location details of all stored information

    When the NameNode map is lost or damaged, data location information no longer exists

    No automatic recovery of NameNode = downtime

    Even with NameNode failover due out soon in Hadoop, manual recovery is required

  • 19 Copyright 2013 EMC Corporation. All rights reserved.

    Distributed (Clustered) NameNode When Using Isilon

    [Diagram: Clustered NameNode]

    Metadata is stored across systems the same way as standard file metadata

    Built-in clustered redundancy across many nodes

    Clustering the NameNode on Isilon allows for the failure protection level Isilon already provides

  • 20 Copyright 2013 EMC Corporation. All rights reserved.

    Snapshot/Version Control

    Before:

    Traditional HDFS does not have replication

    No snapshotting of data

    Loss of version control

    Not designed for mission-critical use

    After:

    Full SnapshotIQ™ integration identifies changes

    Multi-threaded, multi-node scale-out replication

    Improved RPO/RTO for business continuity

    Geo-replicated Hadoop!

  • 21 Copyright 2013 EMC Corporation. All rights reserved.

    Traditional Share-Nothing Hadoop

    [Diagram: existing virtualized data center with existing primary storage holding unstructured data (copy 1), alongside a share-nothing Hadoop infrastructure whose nodes are labeled 2, 3, 4]

    Hadoop on a Stick (R=3) means 5 data copies ($$$$)

    Data has to copy to the Hadoop cluster before analysis can begin (Time to Results)

    How long would it take to copy all of your data to another storage platform?

    How would you maintain data consistency when a file changes on your primary storage?

  • 22 Copyright 2013 EMC Corporation. All rights reserved.

    Isilon Share-Everything Hadoop

    [Diagram: existing virtualized data center with new Hadoop compute nodes using the native HDFS protocol to reach unstructured data (copy 1) on existing primary storage]

    Start using Hadoop NOW with unused processing and RAM available in your VMware environment

    No replication required (use your existing data)

    Access to the same data via NAS and HDFS protocols

    Time to results extremely fast using already existing data with NO COPIES or wasted $$$$

  • 23 Copyright 2013 EMC Corporation. All rights reserved.

    Protocol Support

    [Diagram: servers]

    Before:

    HDFS is not visible to Windows, Unix, Linux, Apple, or any other file system natively

    Big Data is only used for Big Data

    After:

    Inherent multi-protocol support in Isilon allows ubiquitous access to all file systems including Hadoop

    Big Data is actual data!

  • 24 Copyright 2013 EMC Corporation. All rights reserved.

    Time-to-Results

    Data Copy Analysis

    [Diagram: existing primary storage copied over the data center network to Hadoop on a Stick]

    Have you ever copied 100 TB from primary storage to a Hadoop system?

    How long does it take to copy 100 TB from one place to another over a 10 Gb/s link? More than 24 hours

    In-Place Analysis

    [Diagram: Hadoop processing nodes connected over the data center network to existing primary storage]

    Reading relevant data for analysis
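The copy-time figure can be checked with simple arithmetic, reading the slide's link speed as 10 gigabits per second and 100 TB as 100 x 10^12 bytes (both assumptions). The `efficiency` parameter is a hypothetical knob for protocol overhead and contention; 0.8 is an illustrative value, not a measurement.

```python
def transfer_hours(data_bytes, link_bits_per_second, efficiency=1.0):
    """Hours needed to move data_bytes over a link at the given usable fraction."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = data_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 3600


TB = 10**12
line_rate_hours = transfer_hours(100 * TB, 10e9)                  # ideal line rate
realistic_hours = transfer_hours(100 * TB, 10e9, efficiency=0.8)  # assumed 80% usable

print(f"{line_rate_hours:.1f} h at line rate, {realistic_hours:.1f} h at 80% efficiency")
```

Even at an ideal line rate the copy takes about 22 hours, so any real-world overhead pushes it past a full day before analysis can start.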

  • 25 Copyright 2013 EMC Corporation. All rights reserved.

    Dependent Scaling

    [Chart: Compute and Storage plotted against axes labeled "Required performance/capacity" and "Required Hadoop Cluster Nodes"]

    Traditional Hadoop HDFS:

    Storage-to-compute ratio is fixed

    Scaling compute means scaling capacity

    Difficult to provide QoS

    Compute upgrade is a forklift

    Isilon HDFS:

    Scale compute independent of storage

    Achieve optimal performance balance even as workloads evolve

    No data migrations, ever!

    Add new performance as hardware evolves
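The difference between a fixed storage-to-compute ratio and independent scaling can be sketched as a node-count calculation. All figures below (400 cores and 200 TB required, 16 cores and 24 TB per node) are hypothetical and chosen only to make the effect visible.

```python
import math


def coupled_nodes(cores_needed, tb_needed, cores_per_node, tb_per_node):
    """Fixed ratio: every node adds both, so the larger requirement sets the count."""
    return max(math.ceil(cores_needed / cores_per_node),
               math.ceil(tb_needed / tb_per_node))


def decoupled_nodes(cores_needed, tb_needed, cores_per_node, tb_per_node):
    """Independent scaling: size compute nodes and storage nodes separately."""
    return (math.ceil(cores_needed / cores_per_node),
            math.ceil(tb_needed / tb_per_node))


coupled = coupled_nodes(400, 200, cores_per_node=16, tb_per_node=24)
compute_only, storage_only = decoupled_nodes(400, 200, cores_per_node=16, tb_per_node=24)
surplus_tb = coupled * 24 - 200  # capacity bought only to get the cores
```

With these numbers the coupled design needs 25 nodes and carries 400 TB of capacity nobody asked for, while the decoupled design needs 25 compute nodes and 9 storage nodes sized to the actual requirement.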

  • 26 Copyright 2013 EMC Corporation. All rights reserved.

    Independent Scaling

    [Chart: Compute and Storage plotted against axes labeled "Required performance/capacity" and "Required Hadoop Cluster Nodes"]

    Traditional Hadoop HDFS:

    Storage-to-compute ratio is fixed

    Scaling compute means scaling capacity

    Difficult to provide QoS

    Compute upgrade is a forklift

    Isilon HDFS:

    Scale compute independent of storage

    Achieve optimal performance balance even as workloads evolve

    No data migrations, ever!

    Add new performance as hardware evolves

  • 27 Copyright 2013 EMC Corporation. All rights reserved.

    Hadoop Lessons Learned: See Hadoop Differently

    Hadoop can be inexpensive

    Hadoop can be easy to deploy

    Hadoop can use my existing data

    Hadoop NameNode data can be protected

    Hadoop data can have uptime guarantees

    HDFS is better as a protocol than as a file system

    Isilon addresses many Hadoop challenges

  • 28 Copyright 2013 EMC Corporation. All rights reserved.

    Return Path Captures Competitive Advantage with Hadoop Analytics and EMC Isilon

    Challenge:

    Data growing 25-50 terabytes per year

    Limited performance and capacity to support intensive Hadoop analytics

    Disparate systems lacked performance and capacity

    Solution:

    X-series

    SmartPools, SmartConnect, SmartQuotas, InsightIQ

    Results:

    Enables unconstrained access to email data for analysis

    Reduces shared storage data center footprint by 30 percent

    Improves availability and reliability for Hadoop analytics

    Savings of $350,000 from lower power, cooling, and maintenance

    Applications:

    Hadoop, internally developed email intelligence solutions

    "Isilon serves NFS data across multiple product suites and makes it easily accessible to our Hadoop analytics team. That's a significant business enabler, allowing Return Path to develop customer solutions much faster."

    DIZ CARTER, VP Infrastructure Operations

  • 29 Copyright 2013 EMC Corporation. All rights reserved.

    For More Information

    EMC.com:

    EMC Isilon Scale-Out NAS: http://www.emc.com/isilon

    Scale-Out Storage Solutions for Hadoop: http://www.emc.com/big-data/scale-out-storage-hadoop.htm

    Solution Brief: EMC Big Data Storage and Analytics Solution

    White Paper: Hadoop on EMC Isilon Scale-Out NAS

    Analyst Report: EMC's Enterprise Hadoop Solution, Enterprise Strategy Group, 2012

    Email me: [email protected]

  • 30 Copyright 2013 EMC Corporation. All rights reserved.

    Related Sessions

    Session Name | Date | Time

    Isilon Scale-Out NAS Overview and Future Directions | Monday 5/6; Wednesday 5/8 | 1-2pm; 8:30-9:30am

    Protecting & Backing Up the Isilon Cluster at Enterprise Scale | Tuesday 5/7; Thursday 5/9 | 10-11am; 8:30-9:30am

    Get Better Insight into Your Isilon Cluster with Tools that Help You Manage Your Performance & Capacity | Tuesday 5/7; Thursday 5/9 | 10-11am; 11:30am-12:30pm

    Birds of a Feather | Date | Time

    Online File Sharing & Collaboration Opportunities and Challenges in Deploying with On-Premise Storage | Tuesday, 5/7 | 1-2pm

    Hadoop Opportunities and Challenges in Deploying with an Enterprise Infrastructure | Wednesday, 5/8 | 1-2pm

  • 31 Copyright 2013 EMC Corporation. All rights reserved.

    Stop by the EMC ISILON Booth #124 and Wednesday Keynote for a chance to win

    Weigh your current Big Data at the EMC ISILON booth and get a t-shirt.

    Join one of our theater presentations and receive a FREE drink ticket at the Captains Lab.

    Discover the future of enterprise storage at Isilon's Keynote: Wednesday, April 8, 11:30 AM, Venetian Ballroom

    Drawing immediately following for a 3D Printer (Makerbot)
