BIG DATA TESTING

BIG DATA TESTING By QA InfoTech


Page 1: BIG  DATA TESTING

BIG DATA TESTING

By QA InfoTech

Page 2: BIG  DATA TESTING

Scenario

Page 3: BIG  DATA TESTING

OMG!! Did he just ask me to catch rats in a place full of snakes?

Page 4: BIG  DATA TESTING

Agenda

1. What is Big Data
2. Characteristics of Big Data
3. Meaning of BIG DATA to “US”
4. Hadoop
6. Submitting a MapReduce Job

Page 5: BIG  DATA TESTING

What is BIG DATA?

• ‘Big Data’ is similar to ‘small data’, but bigger in size

• Big Data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques.

• Walmart handles more than 1 million customer transactions every hour.

• Facebook handles 40 billion photos from its user base.

• Decoding the human genome originally took 10 years to process; now it can be achieved in one week.

Page 6: BIG  DATA TESTING

Three Characteristics of Big Data: the 3 Vs

• Volume: data quantity

• Velocity: data speed

• Variety: data types

Page 7: BIG  DATA TESTING

What Does BIG DATA TESTING Mean to Testers?

Take these 3 perspectives into consideration:

• Data

• Infrastructure

• Validation Tools

Page 8: BIG  DATA TESTING

Now the question is: what technology is needed for handling BIG DATA?

1. HADOOP

Page 9: BIG  DATA TESTING

Hadoop & Its Components

• Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity hardware. Essentially, it accomplishes two tasks: massive data storage and faster processing.

Source: http://www.trieuvan.com/apache/hadoop/common/

Page 10: BIG  DATA TESTING

How is Hadoop Helping?

• HDFS: a Java-based distributed file system that can store all kinds of data

• MapReduce: a programming model for processing large data sets in parallel

• YARN: a resource management framework for scheduling and handling resource requests from distributed applications
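The MapReduce model described above can be illustrated without a cluster. The following is a minimal single-process sketch in plain Java, not Hadoop code: the class name `MiniMapReduce`, the method `maxByKey`, and the "year,temperature" input lines are all hypothetical and made up for illustration. It shows the three steps the model relies on: map each record to a (key, value) pair, group the pairs by key, and reduce each group to a single value.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process illustration of the MapReduce idea; this is NOT the Hadoop API.
final class MiniMapReduce {

    // Map step: turn one "year,temperature" record into a (key, value) pair.
    static Map.Entry<String, Integer> map(String record) {
        String[] fields = record.split(",");
        return Map.entry(fields[0].trim(), Integer.parseInt(fields[1].trim()));
    }

    // Shuffle + reduce: group values by key and keep the maximum per key.
    static Map<String, Integer> maxByKey(List<String> records) {
        Map<String, Integer> result = new TreeMap<>();
        for (String record : records) {
            Map.Entry<String, Integer> pair = map(record);
            // merge() applies the reduce function (Integer::max) to values sharing a key.
            result.merge(pair.getKey(), pair.getValue(), Integer::max);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample records, for illustration only.
        List<String> records = List.of("1949,111", "1950,0", "1949,78", "1950,22", "1950,-11");
        System.out.println(maxByKey(records)); // prints {1949=111, 1950=22}
    }
}
```

On a real cluster the map calls run in parallel on many nodes and the framework does the grouping; the sketch only keeps the data flow visible.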

Page 11: BIG  DATA TESTING


This is our Input File: Input Sampleset.txt

Page 12: BIG  DATA TESTING


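MapReduce Program for Max Temperature: Driver Class

// Job.getInstance() replaces the deprecated new Job() constructor.
Job job = Job.getInstance();
job.setJarByClass(MaxTemperatureDriver.class);
job.setJobName("Max Temperature");

FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

job.setMapperClass(MaxTemperatureMapper.class);
job.setReducerClass(MaxTemperatureReducer.class);

// Not shown on the slide, but needed for the job to run: declare the output
// types (the defaults do not match Text/IntWritable) and submit the job.
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
System.exit(job.waitForCompletion(true) ? 0 : 1);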
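The driver reads the input and output paths from `args[0]` and `args[1]`, so it fails with an unhelpful exception if either is missing. One possible guard is sketched below in plain Java; the class `DriverArgsCheck`, its `validate` method, and the usage message are hypothetical and not part of the slides or of Hadoop.

```java
// Hypothetical helper for a driver's main(); not part of the Hadoop API.
final class DriverArgsCheck {

    // Returns null when the arguments are usable, otherwise a usage message.
    static String validate(String[] args) {
        if (args == null || args.length != 2) {
            return "Usage: MaxTemperatureDriver <input path> <output path>";
        }
        return null;
    }

    public static void main(String[] args) {
        String error = validate(args);
        if (error != null) {
            System.err.println(error);
            System.exit(2);
        }
        System.out.println("input=" + args[0] + ", output=" + args[1]);
    }
}
```

Keeping the check in a small static method means it can be tested without starting a job.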

Page 13: BIG  DATA TESTING


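Mapper Class

@Override
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    String line = value.toString();
    String year = line.substring(15, 19);
    int airTemperature;
    if (line.charAt(87) == '+') { // before Java 7, parseInt rejected a leading plus sign
        airTemperature = Integer.parseInt(line.substring(88, 92));
    } else {
        airTemperature = Integer.parseInt(line.substring(87, 92));
    }
    // The slide's code stops here; emitting the pair below is an assumed
    // completion, chosen to match the reducer's (Text, IntWritable) input.
    context.write(new Text(year), new IntWritable(airTemperature));
}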
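For testing, the mapper's parsing step can be checked on its own, without Hadoop. The sketch below copies the fixed-width offsets from the mapper (year at positions 15 to 18, sign at 87, digits at 88 to 91) into plain Java. The class `TemperatureRecordParser` and the `syntheticRecord` helper are hypothetical; the zero-padded record is made up and is not the real layout of the input file.

```java
// Plain-Java sketch of the mapper's parsing step; no Hadoop classes involved.
final class TemperatureRecordParser {

    // Year occupies positions 15-18 (the substring end index is exclusive).
    static String parseYear(String line) {
        return line.substring(15, 19);
    }

    // Position 87 holds the sign, positions 88-91 hold four digits.
    static int parseTemperature(String line) {
        if (line.charAt(87) == '+') {
            return Integer.parseInt(line.substring(88, 92)); // skip the plus sign
        }
        return Integer.parseInt(line.substring(87, 92)); // keep the minus sign
    }

    // Hypothetical helper: builds a synthetic 93-character record padded with zeros.
    static String syntheticRecord(String year, String signedTemp) {
        StringBuilder sb = new StringBuilder("0".repeat(93));
        sb.replace(15, 19, year);       // four characters, e.g. "1950"
        sb.replace(87, 92, signedTemp); // sign plus four digits, e.g. "+0022"
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = syntheticRecord("1950", "-0011");
        System.out.println(parseYear(line) + " " + parseTemperature(line)); // prints 1950 -11
    }
}
```

Records shorter than 92 characters would throw `StringIndexOutOfBoundsException` in both this sketch and the mapper, which makes short or malformed lines a useful negative test case.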

Page 14: BIG  DATA TESTING


Reducer Class

@Override
public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
    int maxValue = Integer.MIN_VALUE;
    for (IntWritable value : values) {
        maxValue = Math.max(maxValue, value.get());
    }
    context.write(key, new IntWritable(maxValue));
}
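The reducer's fold can also be verified in isolation. The sketch below repeats the same loop in plain Java, with `Integer` standing in for `IntWritable`; the class `MaxReducerSketch` and the sample values are hypothetical. It shows why the maximum starts at `Integer.MIN_VALUE`: starting at 0 would give a wrong answer for a year with only negative readings.

```java
import java.util.List;

// Plain-Java sketch of the reducer's max logic, using Integer instead of IntWritable.
final class MaxReducerSketch {

    // Same fold as the reducer: start from the smallest int and keep the larger value.
    static int max(Iterable<Integer> values) {
        int maxValue = Integer.MIN_VALUE;
        for (int value : values) {
            maxValue = Math.max(maxValue, value);
        }
        return maxValue;
    }

    public static void main(String[] args) {
        // All-negative readings: seeding with 0 instead of MIN_VALUE would wrongly return 0.
        System.out.println(max(List.of(-50, -11, -78))); // prints -11
        System.out.println(max(List.of(0, 22, -11)));    // prints 22
    }
}
```

An empty group returns `Integer.MIN_VALUE` in this sketch, so a test suite may want to confirm how the real job treats keys with no valid values.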