Big Data - Mini Workshop

  • 8/10/2019 Big Data - Mini Workshop

    1/29

    David Tarrant @davetaz

    Big Data? Big Opportunities?

    2/29

    Provide some practical steps to manage big open data projects.

    3/29

    Define Big Data

    Describe different approaches to managing big data projects

    Apply enterprise tools to analyse a big dataset

    O

    4/29

    WHAT IS BIG DATA?

    5/29

    Big Data

    Datasets that are too large and complex to manipulate with standard methods or tools.

    6/29

    Excel

    Workbook WAS limited to 65,536 rows (2^16)

    64-bit operating system addressing limit is 2^64 addresses

    Highest address: 18,446,744,073,709,551,615 (2^64 - 1)
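The two limits on this slide are plain powers of two, which a few lines of Python can confirm. This is only an illustrative sketch; the variable names are made up here and do not come from the slides.

```python
# Row limit of the legacy Excel worksheet: a 16-bit row index.
excel_row_limit = 2 ** 16

# Number of distinct addresses a 64-bit pointer can express,
# and the highest address (counting starts at zero).
address_count_64 = 2 ** 64
highest_address_64 = address_count_64 - 1

print(f"{excel_row_limit:,}")     # 65,536
print(f"{highest_address_64:,}")  # 18,446,744,073,709,551,615
```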

    7/29

    What is big data?

    Volume

    Velocity

    Variety

    Veracity

    8/29

    What is big data?

    Volume

    Velocity

    Variety

    Veracity

    We create around 4 zettabytes of data a day.

    That's 4 sextillion bytes per day, at 1 sextillion bytes per zettabyte (128-bit OS required)
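To see why a byte count on this scale no longer fits in a 64-bit number, the check below takes the slide's 4 zettabyte figure at face value and assumes the decimal (SI) zettabyte of 10^21 bytes. It is a rough sketch of the arithmetic, not a statement about any particular operating system.

```python
ZETTABYTE = 10 ** 21  # assumption: decimal (SI) zettabyte, i.e. 1 sextillion bytes

daily_bytes = 4 * ZETTABYTE  # the slide's figure, taken at face value
max_64_bit = 2 ** 64 - 1     # largest value a 64-bit unsigned integer can hold

bits_needed = daily_bytes.bit_length()
exceeds_64_bit = daily_bytes > max_64_bit

print(bits_needed)      # 72
print(exceeds_64_bit)   # True
```

Seventy-two bits is more than a 64-bit word can hold, so the next conventional word size up is 128 bits.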

    9/29

    What is big data?

    Volume

    Velocity

    Variety

    Veracity

    The data is created quicker than we can curate its storage.

    10/29

    What is big data?

    Volume

    Velocity

    Variety

    Veracity

    The data is continuously changing in structure, format and detail.

    11/29

    What is big data?

    Volume

    Velocity

    Variety

    Veracity

    The data quality is highly variable and affected by changing perceptions of truth and fact.

    12/29

    Big Data

    Taken collectively, all digital data is big data. Looking at a facet might reveal that you are looking at a dataset that only conforms to one or two of the Vs.

    Can you name a dataset that shows the characteristics of all 4 Vs?

    13/29

    A few more Vs

    Value and Viability

    More data does not mean better results.

    In fact often entirely the opposite is true.

    Sample selection is critical to all good statistical studies.

    Not being able to control selection may lead to an incorrect conclusion.
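A small simulation can illustrate the point that more data does not mean better results. Everything in this sketch is hypothetical: the population, the cut-off of 55 that stands in for uncontrolled selection, and the sample size of 500 are chosen only for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 values centred on 50.
population = [rng.gauss(50, 10) for _ in range(100_000)]
true_mean = statistics.fmean(population)

# Large but uncontrolled sample: only values above 55 get recorded.
biased_sample = [x for x in population if x > 55]

# Small but randomly selected sample.
random_sample = rng.sample(population, 500)

biased_error = abs(statistics.fmean(biased_sample) - true_mean)
random_error = abs(statistics.fmean(random_sample) - true_mean)

print("biased sample size:", len(biased_sample), "error:", round(biased_error, 2))
print("random sample size:", len(random_sample), "error:", round(random_error, 2))
```

The uncontrolled sample is far larger, yet its estimate of the mean is further from the truth than the estimate from the small random sample.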

    14/29

    Conclusion

    The majority of datasets are large.

    Lots of rows with lots of joins that can be processed, if you know how to exploit the computing power available.

    15/29

    Define Big Data

    Discuss the different approaches to managing big data projects

    Apply enterprise tools to process a large dataset

    O

    16/29

    Download the data, buy a Mac Pro

    12 cores

    64 GB RAM

    7,500

    17/29

    Download the data, build a cluster

    80 cores

    4 TB RAM

    50,000+

    http://browser.primatelabs.com/geekbench3/913858

    18/29

    Take the data to the cloud

    32 cores

    244 GB RAM

    $0.3293 per hour
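The hourly price makes the cost of a cloud run easy to estimate. The sketch below uses the rate quoted on the slide; the function name and the two workloads (an eight-hour day and a 30-day month of continuous use) are assumptions made for illustration.

```python
HOURLY_RATE_USD = 0.3293  # price quoted on the slide


def rental_cost(hours: float, rate: float = HOURLY_RATE_USD) -> float:
    """Cost in USD of renting the instance for the given number of hours."""
    return hours * rate


# Hypothetical workloads, chosen only for illustration.
one_working_day = rental_cost(8)
one_month_nonstop = rental_cost(24 * 30)

print(f"{one_working_day:.2f}")    # 2.63
print(f"{one_month_nonstop:.2f}")  # 237.10
```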


    19/29

    Separate your pipelines

    20/29

    Compute

    Compute

    Data

    21/29

    Queries: Google speed

    22/29

    Data

    23/29

    Compute Compute Compute

    Compute Compute

    Web Server

    24/29

    Compute Compute Compute

    Compute Compute

    Web Server

    MAP

    REDUCE
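The map and reduce steps on this slide can be sketched in a few lines of Python. This is a toy word count under stated assumptions, not the workshop's own code: the function names are invented here, the sample text is made up, and a thread pool on one machine stands in for the separate compute nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count words in one chunk, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


def word_count(chunks, workers=5):
    # Each worker stands in for one "Compute" box on the slide.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(reduce_counts, partials, Counter())


# Hypothetical input, already split into chunks of lines.
chunks = [
    ["big data big opportunities"],
    ["open data", "big open data projects"],
]
totals = word_count(chunks)
print(totals["big"], totals["data"])  # 3 3
```

Because each chunk is mapped without reference to any other, the map step can be spread over as many workers as are available; only the reduce step needs to see all the partial results.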

    25/29

    Map-reduce services

    26/29

    Define Big Data

    Discuss the different approaches to managing big data projects

    Apply enterprise tools to process a large dataset

    O

    27/29

    Exercise

    Visualising 6m rows of data in Socrata.
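For a dataset of this size it can help to let the platform aggregate before anything is downloaded. The sketch below builds a query URL for Socrata's SODA API using its `$select`, `$group` and `$limit` parameters. The domain, dataset identifier and column name are hypothetical placeholders, not the workshop's dataset, and no request is actually sent.

```python
from urllib.parse import urlencode


def grouped_count_url(domain, dataset_id, column, limit=1000):
    """Build a SODA query URL that counts rows per value of `column`."""
    params = {
        "$select": f"{column}, count(*) AS row_count",
        "$group": column,
        "$limit": limit,
    }
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params)}"


# Hypothetical domain, dataset ID and column, for illustration only.
url = grouped_count_url("data.example.org", "abcd-1234", "category")
print(url)
```

The response to such a query holds one row per distinct value rather than millions of raw rows, which is a far more practical input for a chart.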

    28/29

    Compute

    Compute

    Data

    29/29

    David Tarrant @davetaz

    Thank you