An introduction to Test Driven Development on MapReduce

What is TDD

• A test-first development approach: developers first write test cases that capture the failure cases, then improve the system until the tests pass and it reaches an acceptable state.

Why it is difficult in Hadoop

• Hadoop is a distributed framework designed to run on a large cluster with terabytes of data.

• Mimicking the behavior of a Hadoop cluster in a test is very hard.

The Best Practice

• Golden Rule of Programming

Always abstract your business logic away from the framework code. This makes it much easier to unit test.

Example

public class StockMeanReducer extends Reducer <Text,DoubleWritable,Text,DoubleWritable>{

private DoubleWritable writable = new DoubleWritable();

@Override

public void reduce(final Text stockText, Iterable<DoubleWritable> values, Context context) throws IOException, InterruptedException

{

double total = 0;

int count = 0;

for(DoubleWritable stockPrice : values)

{

total += stockPrice.get();

count++;

}

writable.set(total / count);

context.write(stockText, writable);

}

}

The best approach – Abstraction

public class StockMeanReducer2 extends Reducer <Text,DoubleWritable,Text,DoubleWritable>

{

private DoubleWritable writable = new DoubleWritable();

private final StockMean stockMean = new StockMean();

@Override

public void reduce(final Text stockText, Iterable<DoubleWritable> values, Context context) throws

IOException, InterruptedException

{

stockMean.reset();

for(DoubleWritable stockPrice : values)

{

stockMean.add(stockPrice.get());

}

writable.set(stockMean.calculate());

context.write(stockText, writable);

}

}

The best approach – Abstraction (contd.)

public class StockMean

{

private double total = 0;

private int instance = 0;

public void add(final double value)

{

this.total += value;

++this.instance;

}

public double calculate()

{

return total / (double) instance;

}

public void reset()

{

this.total = 0;

this.instance = 0;

}

}
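
Because StockMean has no Hadoop dependency, it can be exercised with an ordinary test. The following is a minimal sketch in plain Java, assuming no test framework is available; the class name StockMeanCheck and the helper check are hypothetical, and StockMean is repeated (with a renamed parameter) only so the snippet stands alone.

```java
// StockMean as on the slide, repeated so that this sketch compiles on its own.
class StockMean {
    private double total = 0;
    private int instance = 0;

    void add(final double value) {
        this.total += value;
        ++this.instance;
    }

    double calculate() {
        return total / (double) instance;
    }

    void reset() {
        this.total = 0;
        this.instance = 0;
    }
}

// Hypothetical stand-in for a JUnit test class; no Hadoop classes are involved.
class StockMeanCheck {
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        StockMean mean = new StockMean();

        mean.add(500);
        mean.add(100);
        check(mean.calculate() == 300.0, "mean of 500 and 100 should be 300");

        // reset() must clear state left over from the previous key.
        mean.reset();
        mean.add(42.5);
        check(mean.calculate() == 42.5, "reset should discard earlier values");

        // With no values, 0.0 / 0 is NaN rather than an exception.
        mean.reset();
        check(Double.isNaN(mean.calculate()), "mean of no values is NaN");

        System.out.println("all checks passed");
    }
}
```

The last check documents an edge case worth deciding on early: a key with no values yields NaN, which a reducer would write out unless it is guarded against.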

Testing Map Reduce Jobs

• Best practices are fine, but I still need to test the code inside my mapper and reducer. What should I do?

Introduction to MRUnit

• MRUnit is a Map Reduce unit testing framework.

• Developed by Cloudera, later open-sourced, and currently in the Apache Incubator.

• Built on top of the Mockito mock object framework.

• It is a generic framework that you can use with both JUnit and TestNG.

MRUnit – Testing Mapper

Components in the diagram: Unit Test, MapDriver (MRUnit), Mapper, mock OutputCollector

(1) Set up and execute test

(2) Call Map method with key / value

(3) Map output is captured

(4) Compare the expected outputs
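
The capture-and-compare idea behind these steps can be sketched without any framework. This is an illustration only, not how MRUnit is implemented: the class CapturingDriverSketch and its methods are hypothetical, and a plain callback stands in for Hadoop's Context. The filtering rule mirrors the sample mapper used in this deck.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

class CapturingDriverSketch {

    // Map logic written against a callback instead of Hadoop's Context
    // (an assumption made to keep the sketch standalone).
    static void map(String key, double value, BiConsumer<String, Double> output) {
        if (key == null) return;
        if (key.equalsIgnoreCase("xyz")) return;
        output.accept(key, value);
    }

    // Steps (2) and (3): call map with one key / value and capture what it emits.
    static List<String> run(String key, double value) {
        List<String> captured = new ArrayList<>();
        map(key, value, (k, v) -> captured.add(k + "=" + v));
        return captured;
    }

    public static void main(String[] args) {
        // Step (4): the caller compares the captured output with its expectation.
        System.out.println(run("rahul", 1.0)); // prints [rahul=1.0]
        System.out.println(run("xyz", 50.0));  // prints []
    }
}
```

The point of the design is that the mapper never knows it is under test: it writes to whatever collector it is handed, and the driver decides whether that collector is real or a capturing stand-in.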

Sample Mapper

public class StockMeanMapper extends

Mapper<Text,DoubleWritable,Text,DoubleWritable>

{

@Override

protected void map(Text key, DoubleWritable value, Context

context)

throws IOException, InterruptedException

{

if(key == null) return;

if(key.toString().equalsIgnoreCase("xyz")) return;

context.write(key, value);

}

}

Mapper Unit Test

public class StockMeanMapperTest {

private Mapper<Text,DoubleWritable,Text,DoubleWritable> mapper;

private MapDriver<Text,DoubleWritable,Text,DoubleWritable> driver;

@Before

public void setUp()

{

mapper = new StockMeanMapper();

driver = new MapDriver<Text,DoubleWritable,Text,DoubleWritable>(mapper);

}

@Test

public void testPositiveConditionStockMeanMapper() throws IOException

{

List<Pair<Text, DoubleWritable>> results = driver.withInput(new Text("rahul"), new DoubleWritable(1))

.withOutput(new Text("rahul"), new DoubleWritable(1))

.run();

assertEquals(1, results.size());

}

}

MRUnit – Testing Reducer

Components in the diagram: Unit Test, ReduceDriver (MRUnit), Reducer, mock OutputCollector

(1) Set up and execute test

(2) Call Reduce method with key / value

(3) Reduce output is captured

(4) Compare the expected outputs

Sample Reducer

The reducer under test is StockMeanReducer2, exactly as listed under "The best approach – Abstraction" above.

Reducer Unit Test

public class StockMeanReducerTest {

private ReduceDriver<Text,DoubleWritable,Text,DoubleWritable> driver;

private Reducer<Text,DoubleWritable,Text,DoubleWritable> reducer;

@Before

public void setup()

{

reducer = new StockMeanReducer2();

driver = new ReduceDriver<Text,DoubleWritable,Text,DoubleWritable>(reducer);

}

@Test

public void testStockPositive() throws IOException

{

Pair<Text,DoubleWritable> assertPair = new Pair<Text,DoubleWritable>(new Text("ananth"), new DoubleWritable(300));

List<Pair<Text,DoubleWritable>> results = driver.withInput(new Text("ananth"),

Arrays.asList(new DoubleWritable(500), new DoubleWritable(100)))

.run();

assertEquals(assertPair, results.get(0));

}

}

MRUnit – Testing MapReduce

Components in the diagram: Unit Test, MapReduceDriver (MRUnit), MapDriver, Mapper, Shuffle, ReduceDriver, Reducer

(1) Set up and execute test

(2) Call Map method with key / value

(3) MRUnit performs its own in-memory shuffle phase

(4) Call Reduce method with key / value

(5) Compare the expected outputs
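
To make the in-memory shuffle step concrete, here is a rough sketch of what grouping map output by key looks like. It is not MRUnit's code: the class InMemoryShuffleSketch and its methods are hypothetical, plain String and Double stand in for Text and DoubleWritable, and a TreeMap is assumed as a simple way to present keys to the reduce step in sorted order.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class InMemoryShuffleSketch {

    // Shuffle: collect all values emitted for the same key, with keys kept sorted.
    static Map<String, List<Double>> shuffle(List<Map.Entry<String, Double>> mapOutput) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (Map.Entry<String, Double> pair : mapOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce: one call per key, here computing the mean as the sample reducer does.
    static Map<String, Double> reduceToMean(Map<String, List<Double>> grouped) {
        Map<String, Double> means = new LinkedHashMap<>();
        for (Map.Entry<String, List<Double>> entry : grouped.entrySet()) {
            double total = 0;
            for (double value : entry.getValue()) {
                total += value;
            }
            means.put(entry.getKey(), total / entry.getValue().size());
        }
        return means;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Double>> mapOutput = new ArrayList<>();
        mapOutput.add(new AbstractMap.SimpleEntry<String, Double>("rahul", 400.0));
        mapOutput.add(new AbstractMap.SimpleEntry<String, Double>("ananth", 300.0));
        mapOutput.add(new AbstractMap.SimpleEntry<String, Double>("ananth", 100.0));

        System.out.println(reduceToMean(shuffle(mapOutput))); // prints {ananth=200.0, rahul=400.0}
    }
}
```

This also explains why an end-to-end test can assert on an ordered list of results: the grouped keys reach the reduce step in sorted order regardless of the order in which the inputs were supplied.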

MapReduce Unit Test

public class StockMeanMapReduceTest

{

private Mapper<Text,DoubleWritable,Text,DoubleWritable> mapper;

private Reducer<Text,DoubleWritable,Text,DoubleWritable> reducer;

private MapReduceDriver<Text,DoubleWritable,Text,DoubleWritable,Text,DoubleWritable> driver;

@Before

public void setup()

{

mapper = new StockMeanMapper();

reducer = new StockMeanReducer2();

driver = new MapReduceDriver<Text,DoubleWritable,Text,DoubleWritable,Text,DoubleWritable>(mapper,reducer);

}

MapReduce Unit Test – Contd..

@Test

public void testPositive() throws IOException

{

Pair<Text,DoubleWritable> inputPair1 = new Pair<Text,DoubleWritable>(new Text("ananth"), new DoubleWritable(300));

Pair<Text,DoubleWritable> inputPair2 = new Pair<Text,DoubleWritable>(new Text("ananth"), new DoubleWritable(100));

Pair<Text,DoubleWritable> inputPair3 = new Pair<Text,DoubleWritable>(new Text("rahul"), new DoubleWritable(400));

// "xyz" is dropped by StockMeanMapper, so it must not appear in the results.

Pair<Text,DoubleWritable> inputPair4 = new Pair<Text,DoubleWritable>(new Text("xyz"), new DoubleWritable(50));

Pair<Text,DoubleWritable> assertPair1 = new Pair<Text,DoubleWritable>(new Text("ananth"), new DoubleWritable(200));

Pair<Text,DoubleWritable> assertPair2 = new Pair<Text,DoubleWritable>(new Text("rahul"), new DoubleWritable(400));

List<Pair<Text,DoubleWritable>> assertPair = Arrays.asList(assertPair1,assertPair2);

List<Pair<Text,DoubleWritable>> results = driver.withInput(inputPair1).withInput(inputPair2).withInput(inputPair3).withInput(inputPair4).run();

assertEquals(assertPair, results);

}

}

Wait, there is one more thing!!!

• Hadoop is all about data.

• We can’t always assume that data will be 100% perfect.

• So is unit testing with MRUnit and mock objects enough?

Hadoop LocalFileSystem

• The Hadoop API provides LocalFileSystem, which enables you to read data from your local file system and test your MapReduce jobs.

• Best practice is to take a sample of your real data, load it into the local file system, and test against it.

• LocalFileSystem works only on Linux-based systems.
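
Testing against a data sample mostly pays off when the sample contains imperfect records. As a rough illustration, the sketch below writes a few lines to a temporary local file and parses them while skipping bad records. It uses plain java.nio rather than Hadoop's LocalFileSystem, and the "symbol,price" line format, the class SampleDataSketch, and its methods are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path; // java.nio Path, not Hadoop's org.apache.hadoop.fs.Path
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SampleDataSketch {

    // Parse one "symbol,price" line (hypothetical format); null marks a bad record.
    static Double parsePrice(String line) {
        String[] fields = line.split(",");
        if (fields.length != 2 || fields[0].trim().isEmpty()) {
            return null;
        }
        try {
            return Double.parseDouble(fields[1].trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Read a local sample file and keep only the prices that parsed cleanly.
    static List<Double> readPrices(Path sample) throws IOException {
        List<Double> prices = new ArrayList<>();
        for (String line : Files.readAllLines(sample, StandardCharsets.UTF_8)) {
            Double price = parsePrice(line);
            if (price != null) {
                prices.add(price);
            }
        }
        return prices;
    }

    public static void main(String[] args) throws IOException {
        Path sample = Files.createTempFile("stock-sample", ".csv");
        Files.write(sample,
                Arrays.asList("ananth,300", "rahul,abc", "broken line", "ananth,100"),
                StandardCharsets.UTF_8);

        System.out.println(readPrices(sample)); // prints [300.0, 100.0]
        Files.delete(sample);
    }
}
```

Whether a bad record should be skipped, counted, or fail the job is a design decision; the sample data is what forces that decision to be made and tested.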

How can I test LocalFileSystem in Windows? – A little hack

public class WindowsLocalFileSystem extends LocalFileSystem

{

public WindowsLocalFileSystem()

{

super();

}

@Override

public boolean mkdirs (

final Path path,

final FsPermission permission)

throws IOException

{

final boolean result = super.mkdirs(path);

this.setPermission(path, permission);

return result;

}

Hack Contd..

@Override

public void setPermission (

final Path path,

final FsPermission permission)

throws IOException

{

try {

super.setPermission(path, permission);

}

catch (final IOException e) {

System.err.println("Can't help it, hence ignoring IOException setting permission for path \"" + path +

"\": " + e.getMessage());

}

}

}

How to use it?

public class StockMeanDriver extends Configured implements Tool

{

/**

* @param args

* @throws Exception

*/

public static void main(String[] args) throws Exception {

// Pass the command-line arguments through and surface the job's exit code.

System.exit(ToolRunner.run(new StockMeanDriver(), args));

}

How to use it – contd..

@Override

public int run(String[] arg0) throws Exception

{

Configuration conf = getConf();

conf.set("fs.default.name", "file:///");

conf.set("mapred.job.tracker", "local");

conf.set("fs.file.impl", "org.intellipaat.training.hadoop.fs.WindowsLocalFileSystem");

conf.set("io.serializations","org.apache.hadoop.io.serializer.JavaSerialization," +

"org.apache.hadoop.io.serializer.WritableSerialization");

Job job = new Job(conf,"Stock Mean");

job.setJarByClass(StockMeanDriver.class);

job.setMapperClass(StockMeanMapper2.class);

job.setReducerClass(StockMeanReducer2.class);

job.setMapOutputKeyClass(Text.class);

job.setMapOutputValueClass(DoubleWritable.class);

job.setOutputKeyClass(Text.class);

job.setOutputValueClass(DoubleWritable.class);

job.setInputFormatClass(TextInputFormat.class);

job.setOutputFormatClass(TextOutputFormat.class);

FileInputFormat.addInputPath(job, new Path("input"));

FileOutputFormat.setOutputPath(job, new Path("output"));

// Report job failure through the return code instead of always returning 0.

return job.waitForCompletion(true) ? 0 : 1;

}

}