An introduction to Test Driven Development on MapReduce

What is TDD

• A test-first development approach: developers write test cases that capture the failing cases first, then improve the system until it reaches an acceptable state.

Why it is difficult in Hadoop

• Hadoop is a distributed framework designed to run on a large cluster with terabytes of data

• Mimicking the behavior of a Hadoop cluster is very hard

The Best Practice

• Golden Rule of Programming

Always abstract your business logic. This makes it easier to unit test.


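import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Business logic (the mean) is mixed into the reducer, so it can only be tested through Hadoop.
public class StockMeanReducer extends Reducer<Text,DoubleWritable,Text,DoubleWritable>
{
    private DoubleWritable writable = new DoubleWritable();

    @Override
    public void reduce(final Text stockText, Iterable<DoubleWritable> values, Context context) throws IOException, InterruptedException
    {
        double total = 0;
        int count = 0;
        for (DoubleWritable stockPrice : values)
        {
            total += stockPrice.get();
            count++; // without this increment the mean would divide by zero
        }
        writable.set(total / count);
        context.write(stockText, writable);
    }
}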


The best approach – Abstraction

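import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// The reducer only moves data; the calculation lives in StockMean.
public class StockMeanReducer2 extends Reducer<Text,DoubleWritable,Text,DoubleWritable>
{
    private DoubleWritable writable = new DoubleWritable();
    private final StockMean stockMean = new StockMean();

    @Override
    public void reduce(final Text stockText, Iterable<DoubleWritable> values, Context context) throws IOException, InterruptedException
    {
        stockMean.reset(); // the helper is reused across keys
        for (DoubleWritable stockPrice : values)
        {
            stockMean.add(stockPrice.get());
        }
        writable.set(stockMean.calculate());
        context.write(stockText, writable);
    }
}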



The best approach – Abstraction – contd..

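// Plain Java helper with no Hadoop dependency, so it can be unit tested directly.
public class StockMean
{
    private double total = 0;
    private int instance = 0;

    // Adds one price to the running total.
    public void add(final double total)
    {
        this.total += total;
        ++this.instance;
    }

    // Returns the mean of all prices added since the last reset.
    public double calculate()
    {
        return total / (double) instance;
    }

    public void reset()
    {
        this.total = 0;
        this.instance = 0;
    }
}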
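Because StockMean has no Hadoop dependency, it can be checked with plain Java before any reducer exists. The sketch below is only an illustration: it carries its own minimal copy of StockMean so that it compiles on its own, and StockMeanCheck with its check helper is a hypothetical harness, not part of the original deck. In a real project the same checks would normally live in a JUnit or TestNG test.

```java
// Minimal copy of the StockMean helper so the example compiles on its own.
class StockMean {
    private double total = 0;
    private int instance = 0;

    void add(final double value) {
        total += value;
        instance++;
    }

    double calculate() {
        return total / (double) instance;
    }

    void reset() {
        total = 0;
        instance = 0;
    }
}

// Hypothetical harness: plain Java checks, no Hadoop and no test framework.
class StockMeanCheck {
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        StockMean mean = new StockMean();
        mean.add(300);
        mean.add(100);
        check(mean.calculate() == 200.0, "mean of 300 and 100 should be 200");

        // reset() must clear both the total and the count.
        mean.reset();
        mean.add(400);
        check(mean.calculate() == 400.0, "state must not leak across reset()");

        // With no input the division is 0.0 / 0.0, which is NaN rather than an exception.
        mean.reset();
        check(Double.isNaN(mean.calculate()), "empty input yields NaN");
        System.out.println("all StockMean checks passed");
    }
}
```

The last check documents an edge case worth deciding on early: an empty group produces NaN, which a reducer would silently write out.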



Testing MapReduce Jobs

• Best practices are fine, but I still need to test the code inside my mapper and reducer. What should I do?

Introduction to MRUnit

• MRUnit is a MapReduce unit testing framework.

• Developed by Cloudera, later open-sourced, and currently in the Apache Incubator.

• Built on top of the Mockito mock object framework

• It is a generic framework that you can use with both JUnit and TestNG

MRUnit – Testing Mapper

Diagram components: Unit Test, MRUnit, Mock OutputCollector

(1) Set up and execute test

(2) Call Map method with key / value

(3) Map output is captured

(4) Compare with the expected outputs
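The four steps above can be sketched in plain Java to show what a map driver does on your behalf. This is a hand-rolled stand-in and not the MRUnit API: MiniMapDriver, MapFunction, and filterMap are hypothetical names, and the example mapper (pass records through, drop the key "xyz") is an assumption chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical stand-in for a map driver; this is NOT the MRUnit API.
class MiniMapDriver {

    // Shape of the code under test: one input record in, zero or more records out.
    interface MapFunction {
        void map(String key, double value, BiConsumer<String, Double> collector);
    }

    // Steps (2) and (3): call map once and capture everything it emits.
    static List<String> run(MapFunction mapper, String key, double value) {
        List<String> captured = new ArrayList<>();
        mapper.map(key, value, (k, v) -> captured.add(k + "=" + v));
        return captured;
    }

    // Example mapper (an assumption for this sketch): pass records through, drop key "xyz".
    static void filterMap(String key, double value, BiConsumer<String, Double> collector) {
        if (key == null || key.equalsIgnoreCase("xyz")) {
            return;
        }
        collector.accept(key, value);
    }

    public static void main(String[] args) {
        // Step (1): set up the test and run it against two inputs.
        List<String> kept = run(MiniMapDriver::filterMap, "rahul", 1.0);
        List<String> dropped = run(MiniMapDriver::filterMap, "xyz", 50.0);

        // Step (4): compare the captured output with the expected output.
        if (!kept.equals(Arrays.asList("rahul=1.0")) || !dropped.isEmpty()) {
            throw new AssertionError("unexpected map output: " + kept + " " + dropped);
        }
        System.out.println("map output matched expectations");
    }
}
```

The collector lambda plays the role of the mock output collector: the mapper writes to it exactly as it would write to a real context, and the test inspects what arrived.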

Sample Mapper

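import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Passes records through unchanged, dropping null keys and the key "xyz".
public class StockMeanMapper extends Mapper<Text,DoubleWritable,Text,DoubleWritable>
{
    @Override
    protected void map(Text key, DoubleWritable value, Context context)
        throws IOException, InterruptedException
    {
        if (key == null) return;
        if (key.toString().equalsIgnoreCase("xyz")) return;
        context.write(key, value);
    }
}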



Mapper Unit Test

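import static org.junit.Assert.assertEquals;

import java.io.IOException;
import java.util.List;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.types.Pair;
import org.junit.Before;
import org.junit.Test;

public class StockMeanMapperTest {
    private Mapper<Text,DoubleWritable,Text,DoubleWritable> mapper;
    private MapDriver<Text,DoubleWritable,Text,DoubleWritable> driver;

    @Before
    public void setUp()
    {
        mapper = new StockMeanMapper();
        driver = new MapDriver<Text,DoubleWritable,Text,DoubleWritable>(mapper);
    }

    @Test
    public void testPositiveConditionStockMeanMapper() throws IOException
    {
        // The mapper passes the record through, so input and expected output are the same pair.
        List<Pair<Text, DoubleWritable>> results = driver.withInput(new Text("rahul"), new DoubleWritable(1))
            .withOutput(new Text("rahul"), new DoubleWritable(1))
            .run();
        assertEquals(1, results.size());
    }
}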



MRUnit – Testing Reducer

Diagram components: Unit Test, MRUnit, Mock OutputCollector

(1) Set up and execute test

(2) Call Reduce method with key / value

(3) Reduce output is captured

(4) Compare with the expected outputs

Sample Reducer

public class StockMeanReducer2 extends Reducer<Text,DoubleWritable,Text,DoubleWritable>
{
    private DoubleWritable writable = new DoubleWritable();
    private final StockMean stockMean = new StockMean();

    @Override
    public void reduce(final Text stockText, Iterable<DoubleWritable> values, Context context) throws IOException, InterruptedException
    {
        stockMean.reset(); // the helper is reused across keys
        for (DoubleWritable stockPrice : values)
        {
            stockMean.add(stockPrice.get());
        }
        writable.set(stockMean.calculate());
        context.write(stockText, writable);
    }
}



Reducer Unit Test

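import static org.junit.Assert.assertEquals;

import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.apache.hadoop.mrunit.types.Pair;
import org.junit.Before;
import org.junit.Test;

public class StockMeanReducerTest {
    private ReduceDriver<Text,DoubleWritable,Text,DoubleWritable> driver;
    private Reducer<Text,DoubleWritable,Text,DoubleWritable> reducer;

    @Before
    public void setup()
    {
        reducer = new StockMeanReducer2();
        driver = new ReduceDriver<Text,DoubleWritable,Text,DoubleWritable>(reducer);
    }

    @Test
    public void testStockPositive() throws IOException
    {
        Pair<Text,DoubleWritable> assertPair = new Pair<Text,DoubleWritable>(new Text("ananth"),
            new DoubleWritable(300));
        // NOTE: the first input value was lost in the source; 500 is assumed here
        // so that the mean of the inputs matches the expected 300.
        List<Pair<Text,DoubleWritable>> results = driver.withInput(new Text("ananth"),
            Arrays.asList(new DoubleWritable(500),
                new DoubleWritable(100)))
            .run();
        assertEquals(assertPair, results.get(0));
    }
}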



MRUnit – Testing MapReduce

Diagram components: Unit Test, MRUnit, Mapper

(1) Set up and execute test

(2) Call Map method with key / value

(3) MRUnit performs its own in-memory shuffle phase

(4) Call Reduce method with key / value

(5) Compare with the expected outputs
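The steps above can be sketched as a tiny in-memory pipeline to make the shuffle phase concrete. This is a plain Java illustration and not MRUnit itself: MiniMapReduce and its methods are hypothetical names, and the filter and mean logic are assumptions modeled on the stock examples in this deck.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory pipeline; a stand-in for what a MapReduce test driver does.
class MiniMapReduce {

    // Map phase: keep every record except null keys and those keyed "xyz".
    static boolean keep(String key) {
        return key != null && !key.equalsIgnoreCase("xyz");
    }

    // Shuffle phase: group the mapped values by key; TreeMap keeps the keys sorted.
    static Map<String, List<Double>> shuffle(String[] keys, double[] values) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            if (keep(keys[i])) {
                grouped.computeIfAbsent(keys[i], k -> new ArrayList<>()).add(values[i]);
            }
        }
        return grouped;
    }

    // Reduce phase: one mean per key.
    static Map<String, Double> reduce(Map<String, List<Double>> grouped) {
        Map<String, Double> means = new TreeMap<>();
        for (Map.Entry<String, List<Double>> entry : grouped.entrySet()) {
            double total = 0;
            for (double price : entry.getValue()) {
                total += price;
            }
            means.put(entry.getKey(), total / entry.getValue().size());
        }
        return means;
    }

    public static void main(String[] args) {
        String[] keys = {"ananth", "ananth", "rahul", "xyz"};
        double[] values = {300, 100, 400, 50};
        // prints {ananth=200.0, rahul=400.0}
        System.out.println(reduce(shuffle(keys, values)));
    }
}
```

Grouping by key between the two phases is the part a mapper-only or reducer-only test never exercises, which is why the combined driver is worth a separate test.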

MapReduce Unit Test

import static org.junit.Assert.assertEquals;

import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.types.Pair;
import org.junit.Before;
import org.junit.Test;

public class StockMeanMapReduceTest
{
    private Mapper<Text,DoubleWritable,Text,DoubleWritable> mapper;
    private Reducer<Text,DoubleWritable,Text,DoubleWritable> reducer;
    private MapReduceDriver<Text,DoubleWritable,Text,DoubleWritable,Text,DoubleWritable> driver;

    @Before
    public void setup()
    {
        mapper = new StockMeanMapper();
        reducer = new StockMeanReducer2();
        driver = new MapReduceDriver<Text,DoubleWritable,Text,DoubleWritable,Text,DoubleWritable>(mapper,reducer);
    }

MapReduce Unit Test – Contd..

    @Test
    public void testPositive() throws IOException
    {
        Pair<Text,DoubleWritable> inputPair1 = new Pair<Text,DoubleWritable>(new Text("ananth"), new DoubleWritable(300));
        Pair<Text,DoubleWritable> inputPair2 = new Pair<Text,DoubleWritable>(new Text("ananth"), new DoubleWritable(100));
        Pair<Text,DoubleWritable> inputPair3 = new Pair<Text,DoubleWritable>(new Text("rahul"), new DoubleWritable(400));
        // The mapper drops "xyz", so this record must not appear in the output.
        Pair<Text,DoubleWritable> inputPair4 = new Pair<Text,DoubleWritable>(new Text("xyz"), new DoubleWritable(50));

        Pair<Text,DoubleWritable> assertPair1 = new Pair<Text,DoubleWritable>(new Text("ananth"), new DoubleWritable(200));
        Pair<Text,DoubleWritable> assertPair2 = new Pair<Text,DoubleWritable>(new Text("rahul"), new DoubleWritable(400));
        List<Pair<Text,DoubleWritable>> assertPair = Arrays.asList(assertPair1, assertPair2);

        List<Pair<Text,DoubleWritable>> results = driver.withInput(inputPair1)
            .withInput(inputPair2)
            .withInput(inputPair3)
            .withInput(inputPair4)
            .run();
        assertEquals(assertPair, results);
    }
}


Wait, there is one more thing!!!

• Hadoop is all about data.

• We can’t always assume that data will be 100% perfect.

• So is MRUnit unit testing with mock objects enough?
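One way to prepare for imperfect data is to keep record parsing in a small helper that rejects malformed lines instead of throwing. The sketch below is only an illustration under stated assumptions: the "symbol,price" text format and the RecordParser name are hypothetical, since the deck does not specify an input format.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical record format "symbol,price"; the deck does not specify one.
class RecordParser {

    // Returns the parsed price, or null when the line is malformed.
    static Double parsePrice(String line) {
        if (line == null) {
            return null;
        }
        String[] fields = line.split(",");
        if (fields.length != 2 || fields[0].trim().isEmpty()) {
            return null; // missing symbol or missing price
        }
        try {
            double price = Double.parseDouble(fields[1].trim());
            if (Double.isNaN(price) || price < 0) {
                return null; // parses, but is not a usable price
            }
            return price;
        } catch (NumberFormatException e) {
            return null; // price is not a number
        }
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("ananth,300", "rahul,", "ananth,abc", ",400", "rahul,400");
        int rejected = 0;
        for (String line : sample) {
            if (parsePrice(line) == null) {
                rejected++;
            }
        }
        System.out.println(rejected + " of " + sample.size() + " lines rejected");
    }
}
```

A mapper built on such a helper can skip or count bad records, and the helper itself can be tested against lines copied from real data.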

Hadoop LocalFileSystem

• The Hadoop API provides LocalFileSystem, which enables you to read data from your local file system and test your MapReduce jobs.

• Best practice is to take a sample of your real data, load it into the local file system, and test against it.

• LocalFileSystem works only on Linux-based systems.
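Taking a sample can be as simple as keeping every n-th line of an export and writing it to a local file that the tests read. The sketch below assumes line-oriented text data; SampleExtractor, everyNth, and writeSample are hypothetical names, and every-n-th-line sampling is just one possible strategy, not something the deck prescribes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: keeps every n-th line of an export as a local test fixture.
class SampleExtractor {

    // Returns lines 0, n, 2n, ... of the input.
    static List<String> everyNth(List<String> lines, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> sample = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += n) {
            sample.add(lines.get(i));
        }
        return sample;
    }

    // Reads the source file, samples it, and writes the sample to the target file.
    static Path writeSample(Path source, Path target, int n) throws IOException {
        List<String> lines = Files.readAllLines(source, StandardCharsets.UTF_8);
        return Files.write(target, everyNth(lines, n), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Temp files stand in for the real export and the local fixture.
        Path source = Files.createTempFile("stocks-full", ".txt");
        Path target = Files.createTempFile("stocks-sample", ".txt");
        Files.write(source, Arrays.asList("a,1", "b,2", "c,3", "d,4", "e,5"), StandardCharsets.UTF_8);
        writeSample(source, target, 2);
        System.out.println(Files.readAllLines(target, StandardCharsets.UTF_8));
    }
}
```

A fixed stride keeps the fixture reproducible between runs; if the real data is ordered in a way that correlates with the stride, a seeded random sample may be a better fit.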

How can I test LocalFileSystem in Windows? – A little hack

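import java.io.IOException;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class WindowsLocalFileSystem extends LocalFileSystem
{
    public WindowsLocalFileSystem()
    {
        super();
    }

    // Create the directory first, then apply the permission through the tolerant setter below.
    @Override
    public boolean mkdirs(final Path path, final FsPermission permission)
        throws IOException
    {
        final boolean result = super.mkdirs(path);
        this.setPermission(path, permission);
        return result;
    }

Hack Contd..

    // Setting permissions fails on Windows; log the failure and carry on.
    @Override
    public void setPermission(final Path path, final FsPermission permission)
        throws IOException
    {
        try {
            super.setPermission(path, permission);
        }
        catch (final IOException e) {
            System.err.println("Can't help it, hence ignoring IOException setting permission for path \"" + path +
                "\": " + e.getMessage());
        }
    }
}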




How to use it?

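import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class StockMeanDriver extends Configured implements Tool
{
    /**
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception
    {
        ToolRunner.run(new StockMeanDriver(), args);
    }

How to use it – contd..

    @Override
    public int run(String[] arg0) throws Exception
    {
        Configuration conf = getConf();
        // Use the local file system and the local job runner instead of a cluster.
        // The key on the next line was lost in the source; "fs.default.name" is assumed.
        conf.set("fs.default.name", "file:///");
        conf.set("mapred.job.tracker", "local");
        // The class name was lost in the source; the WindowsLocalFileSystem class above is assumed.
        conf.set("fs.file.impl", WindowsLocalFileSystem.class.getName());
        // The serializer list was lost in the source; Hadoop's stock serializers are assumed.
        conf.set("io.serializations", "org.apache.hadoop.io.serializer.JavaSerialization," +
            "org.apache.hadoop.io.serializer.WritableSerialization");

        Job job = new Job(conf, "Stock Mean");
        // The job wiring was lost in the source; these calls are assumed from the classes shown earlier.
        job.setJarByClass(StockMeanDriver.class);
        job.setMapperClass(StockMeanMapper.class);
        job.setReducerClass(StockMeanReducer2.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        // An input format that yields Text/DoubleWritable records is also required;
        // the original choice cannot be recovered from the source.

        FileInputFormat.addInputPath(job, new Path("input"));
        FileOutputFormat.setOutputPath(job, new Path("output"));

        // Report job failure through the exit code instead of always returning 0.
        return job.waitForCompletion(true) ? 0 : 1;
    }
}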
