Iasi code camp 20 april 2013 testing big data-anca sfecla - embarcadero

Testing Big Data

Prepared by: Anca Andreea Sfecla, Quality Assurance Manager, Embarcadero Technologies Romania

@ CODECAMP 2013, 20th April 2013

Prepared by Anca Sfecla, QAM - Embarcadero Technologies

What is Big Data?

• “Big Data is the frontier of a firm’s ability to store, process, and access all the data it needs to operate effectively, make decisions, reduce risks, and serve customers.” - Forrester Research

• “Big data creates a new layer in the economy which is all about information, turning information, or data, into revenue. In 2013, big data is forecast to drive $34 billion of IT spending.” - Gartner Research

Big Data Characteristics

• Volume

• Variety

• Velocity

• Value

Big Data Success Stories

• Detecting infections in premature infants up to 24 hours before they exhibit symptoms

• Reducing the cost of sequencing a genome from $10,000 to less than $100

• Predicting flu outbreaks by analyzing massive numbers of Google searches related to flu symptoms

EDW versus Big Data

EDW | Big Data
Clean data | Unclean data
Gigabytes to Terabytes (1000 GB) | Petabytes (1000 TB) to Exabytes (1000 PB)
Simplified, structured | Complex, semi-structured or unstructured
Data from relational database | Data from non-relational flat file storage
Centralized data | Distributed data
Structured database schema | Customized-instant schema, generated

Big Data Solutions

Microsoft Big Data Solution

Big Data Processing using Hadoop Framework

Big Data Architecture

[Diagram: data sources (Web Logs, Streaming Data, Social Data, Transactional Data (RDBMS)); Data Load using Sqoop; Hadoop (Pig, Hive, MapReduce (Job Execution), HBase (NoSQL DB), HDFS (Hadoop Distributed File System)); Processed Data; ETL Process; Enterprise Data Warehouse; Big Data Analytics]

1. Pre-Hadoop Processing

Possible problems

• incorrect data captured from source systems

• incorrect storage of data

• incomplete or incorrect replications
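
The checks for this stage can be illustrated with a small comparison between what a source system produced and what landed in storage. This is a minimal sketch, assuming records are available as Python dicts with a hypothetical key field `id`; `fingerprint` and `compare_datasets` are illustrative names and not part of any Hadoop tool.

```python
import hashlib
import json


def fingerprint(record):
    """Return a stable SHA-256 digest of one record (a dict)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare_datasets(source, loaded, key="id"):
    """Compare source records with loaded records by key and by content."""
    src = {r[key]: fingerprint(r) for r in source}
    dst = {r[key]: fingerprint(r) for r in loaded}
    return {
        "missing": sorted(src.keys() - dst.keys()),     # never replicated
        "unexpected": sorted(dst.keys() - src.keys()),  # not in the source
        "altered": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }


# Hypothetical sample data: record 3 was not loaded, record 2 was changed.
source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
loaded = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}]
print(compare_datasets(source, loaded))
# {'missing': [3], 'unexpected': [], 'altered': [2]}
```

On real volumes the same idea would typically run per partition or on a sample, since holding every fingerprint in memory does not scale.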

2. Map-Reduce process validation

Possible problems

• coding issues in map-reduce jobs

• jobs that work correctly on a standalone node but incorrectly on multiple nodes

• incorrect aggregations, incorrect node configurations and incorrect output format
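
The standalone-versus-multi-node problem can be illustrated without a cluster. The sketch below is a pure-Python simulation, not Hadoop code: it assumes a word-count style job, and `map_phase`, `reduce_phase` and `run_job` are hypothetical names. It runs the same job over one input split and over several, then checks that the result does not depend on the split count.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit (word, 1) pairs, as a word-count mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reduce_phase(pairs):
    """Sum the values for each key, as a word-count reducer would."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(lines, num_splits):
    """Simulate the job over num_splits input splits (stand-ins for nodes)."""
    splits = [lines[i::num_splits] for i in range(num_splits)]
    # Each split is mapped and pre-aggregated independently.
    partials = [reduce_phase(map_phase(split)) for split in splits]
    # The partial results are merged in a final reduce step.
    merged = (item for partial in partials for item in partial.items())
    return reduce_phase(merged)


lines = ["big data is big", "testing big data", "data is fast"]
single = run_job(lines, num_splits=1)
multi = run_job(lines, num_splits=3)
assert single == multi, "result depends on how the input is split"
print(single)
```

A job whose reduce logic is not associative (for example, averaging partial averages) would fail this comparison, which is the kind of defect that only shows up on multiple nodes.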

3. Data Extract and Load Process

Possible problems

• incorrectly applied transformation rules

• incomplete data extract from HDFS

• incorrect load of HDFS files into analysis tools
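
A test for this stage can re-apply the transformation rule to the extracted rows and compare the outcome with what was loaded. The following is a minimal sketch under stated assumptions: the rule in `transform` (trim and upper-case the name, convert cents to units) is invented for illustration, the field names are hypothetical, and rows are assumed to arrive in the same order on both sides.

```python
def transform(row):
    """Hypothetical rule: normalize the name and convert cents to units."""
    return {"name": row["name"].strip().upper(), "amount": row["amount_cents"] / 100}


def validate_load(extracted, loaded):
    """Return a list of human-readable problems found in the loaded rows."""
    problems = []
    if len(extracted) != len(loaded):
        problems.append(f"row count: extracted {len(extracted)}, loaded {len(loaded)}")
    for index, (src, dst) in enumerate(zip(extracted, loaded)):
        expected = transform(src)
        if expected != dst:
            problems.append(f"row {index}: expected {expected}, got {dst}")
    return problems


# Hypothetical sample: the second row was loaded without the upper-case rule.
extracted = [{"name": " alice ", "amount_cents": 1250}, {"name": "bob", "amount_cents": 300}]
loaded = [{"name": "ALICE", "amount": 12.5}, {"name": "bob", "amount": 3.0}]
for problem in validate_load(extracted, loaded):
    print(problem)
```

The row-count check covers incomplete extracts, while the per-row comparison covers incorrectly applied rules.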

Reports testing

Possible problems

• report definitions not set as per requirement

• report data issues

• layout and format issues
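
Report data issues can be caught by recomputing the reported figures from the underlying warehouse rows. This is a minimal sketch, assuming a hypothetical report of revenue per region; the field names `region` and `revenue` and both function names are illustrative only.

```python
from collections import defaultdict


def expected_totals(rows):
    """Recompute revenue per region from warehouse rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += row["revenue"]
    return dict(totals)


def report_mismatches(report, rows):
    """Return {region: (reported, expected)} for figures that disagree."""
    expected = expected_totals(rows)
    regions = report.keys() | expected.keys()
    return {
        region: (report.get(region), expected.get(region))
        for region in sorted(regions)
        if report.get(region) != expected.get(region)
    }


# Hypothetical sample: the report overstates US revenue by 10.
rows = [
    {"region": "EU", "revenue": 100},
    {"region": "EU", "revenue": 50},
    {"region": "US", "revenue": 70},
]
report = {"EU": 150, "US": 80}
print(report_mismatches(report, rows))
# {'US': (80, 70)}
```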

Non-Functional Testing

Possible problems

• imbalance in input splits

• redundant sorts

• moving most of the aggregation computations to the Reduce process

• node failures

• data corruption
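
Imbalance in input splits is one of the easier items to quantify. The sketch below assumes split sizes (in MB, say) have already been collected from job logs; the skew measure and the 1.5 threshold are illustrative assumptions, not values from the talk.

```python
from statistics import mean


def split_skew(split_sizes):
    """Return the largest split size divided by the mean split size."""
    return max(split_sizes) / mean(split_sizes)


def is_imbalanced(split_sizes, threshold=1.5):
    """Flag a job whose largest split exceeds threshold times the mean."""
    return split_skew(split_sizes) > threshold


# Hypothetical split sizes for two jobs.
balanced = [128, 128, 120, 130]
skewed = [128, 16, 8, 512]
print(is_imbalanced(balanced), is_imbalanced(skewed))
# False True
```

A skewed job tends to finish only when its largest split does, so a check like this can serve as an early performance warning.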

New to the tester

• Semi-structured and unstructured data

• Immense volumes of dynamic, complex data

• Test environment

• Big Data ecosystem

• Pure programming tools

• Non-SQL queries

Testing Big Data

• Big

• Fast

• Complex

• Rewarding

Q&A

Thank you!

Please fill in your evaluation form.