Hadoop Summit 2013: Continuous Integration on Top of Hadoop


This talk was presented at Hadoop Summit 2013 in San Jose.


Page 1

Continuous Integration on Top of Hadoop

Wisely Chen & Neal Lee

Tuesday, June 11, 2013

Page 2

Agenda

• Who I am

• Problem

• Solution

• Demo

• Q&A

Page 3

Who I am

• Wisely Chen

• Release manager for the Yahoo![Taiwan] shopping and data teams

• Loves promoting open-source technology in Taiwan

• Ruby and Rails: Coscup 2006, Ubisunrise 2007, OSDC 2007

• Puppet: PHPConf 2012, RubyConf 2012

• Release practice: Webconf 2013, Coscup 2012

Page 4

Who I am

• Neal Lee (@neal_lee)

• Data Engineer at Yahoo![Taiwan]

• Aims to build an easy-to-use, self-service BI platform connected to Hadoop.

Page 5

Story 1

Page 6

Another Story

Page 7

Yet Another Story

Page 8

Solution

Page 9

One click

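• Manually commit code to SCM

• And you're DONE; the rest is automatic:

• Automatic unit testing

• Automatic push to the beta grid for performance testing

• Automatic push to the production grid

• Automatic triggering of the code on the production grid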

Page 10

This feeling is 爽 (awesome)!

Page 11

Continuous Integration

Page 12

Continuous Integration

A software engineering practice:

• Maintain a code repository

• Automate the build

• Make the build self-testing

• Everyone commits to the baseline every day

• Every commit should be built

• Test in a clone of the production environment

• Make it easy to get the latest deliverables

• Everyone can see the results of the latest build

• Automate deployment

Page 13

We focus on

A software engineering practice:

• Maintain a code repository

• Automate the build

• Make the build self-testing

• Everyone commits to the baseline every day

• Every commit should be built

• Test in a clone of the production environment

• Make it easy to get the latest deliverables

• Everyone can see the results of the latest build

• Automate deployment

Page 14

CI flow

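Environments: DEV, Alpha, Beta Grid, Prod Grid

1. People commit code to SCM
2. SCM notifies the CI master
3. CI master calls a CI slave
4. CI slave executes the local unit tests
5. Deploy to the Beta grid
6. CI master calls a CI slave
7. CI slave executes the performance test
8. Deploy to the Prod grid
9. git tag
10. CI master calls the Prod grid
11. CI executes Pig
12. Notify the user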
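The flow above can be sketched as an ordered list of stages that stops at the first failure and notifies the user; this appears to be what the shorter variants on the next slides illustrate. This is a minimal Python sketch under that assumption, not the authors' implementation: `run_pipeline`, the stage names, and the placeholder `lambda` actions are hypothetical stand-ins for the real unit-test runner, deploy scripts, performance test, `git tag`, and Pig run. Steps 1 to 3 (commit, notify CI, call) are what start the pipeline, so they are not modeled.

```python
from typing import Callable, List, Tuple

# A stage is (name, action); the action returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage], notify: Callable[[str], None]) -> bool:
    """Run stages in order, stop at the first failure, and always notify."""
    for name, action in stages:
        if not action():
            notify(f"FAILED at: {name}")  # later stages are never run
            return False
    notify("SUCCESS: all stages passed")
    return True


if __name__ == "__main__":
    # Hypothetical placeholder stages mirroring steps 4 to 11 of the flow.
    stages: List[Stage] = [
        ("local unit test", lambda: True),       # step 4
        ("deploy to Beta grid", lambda: True),   # step 5
        ("performance test", lambda: True),      # step 7
        ("deploy to Prod grid", lambda: True),   # step 8
        ("git tag", lambda: True),               # step 9
        ("run Pig on Prod grid", lambda: True),  # step 11
    ]
    run_pipeline(stages, notify=print)           # step 12: notify user
```

Passing `notify` in as a function keeps the sketch independent of how users are actually told (mail, chat, a dashboard).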

Page 15

CI flow

1. People commit code to SCM
2. SCM notifies the CI master
3. CI master calls a CI slave
4. CI slave executes the local unit tests
5. Notify the user

The performance-test and Pig stages are not numbered on this slide: the flow ends after the unit test.

Page 16

CI flow

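1. People commit code to SCM
2. SCM notifies the CI master
3. CI master calls a CI slave
4. CI slave executes the local unit tests
5. Deploy to the Beta grid
6. CI master calls a CI slave
7. CI slave executes the performance test
8. Notify the user

The Pig stage is not numbered on this slide: the flow ends after the performance test.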

Page 17

CI flow

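1. People commit code to SCM
2. SCM notifies the CI master
3. CI master calls a CI slave
4. CI slave executes the local unit tests
5. Deploy to the Beta grid
6. CI master calls a CI slave
7. CI slave executes the performance test
8. Deploy to the Prod grid
9. Notify the user

The Pig stage is not numbered on this slide: the flow ends after the production deploy.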

Page 18

Unit Test

[Figure: the full CI flow diagram from Page 14, shown here for step 4, CI slave exec localUnitTest]

Page 19

PigUnit

• A simple xUnit framework for Pig scripts

• No cluster setup is required in local mode

• Unit testing, regression testing, and rapid prototyping on the fly
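PigUnit tests run a Pig script in local mode on a small inline input and compare an alias against an expected output (in Java, through `PigTest.assertOutput`). The sketch below shows the same xUnit pattern with Python's `unittest`; it is an analogy, not PigUnit itself, and `count_by_key` is a hypothetical stand-in for a `GROUP ... BY` plus `COUNT` Pig script.

```python
import unittest
from collections import Counter
from typing import Iterable, List, Tuple


def count_by_key(rows: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Hypothetical stand-in for a Pig script such as:
    grouped = GROUP data BY key;
    counts  = FOREACH grouped GENERATE group, COUNT(data);
    """
    counts = Counter(key for key, _ in rows)
    return sorted(counts.items())  # sorted so the output order is deterministic


class CountByKeyTest(unittest.TestCase):
    def test_counts_each_key(self):
        # Small inline input and expected output, as in a PigUnit test.
        data = [("yahoo", 10), ("hadoop", 7), ("yahoo", 3)]
        self.assertEqual(count_by_key(data), [("hadoop", 1), ("yahoo", 2)])

    def test_empty_input(self):
        self.assertEqual(count_by_key([]), [])


if __name__ == "__main__":
    unittest.main(exit=False)  # exit=False: keep the interpreter running afterwards
```

Because the input and expected output live inside the test, it needs no cluster and runs in seconds on a CI slave, which is what makes step 4 of the flow cheap.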

Page 20

Using PigUnit

• Write the code

• Write a PigUnit test case

• Run the PigUnit test locally

• Push to the grid

• Run Pig on the grid

• Get the right result!

Page 21

Unit tests are living documentation

• A unit test is runnable, living documentation

• Passing the test cases shows that previous requirements are still met
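Python's `doctest` module is one concrete form of runnable documentation: the examples in a docstring are executed and compared with the documented output, so each documented requirement doubles as a regression test. The `parse_click` function and its tab-separated log format are hypothetical, chosen only for illustration.

```python
import doctest


def parse_click(line):
    r"""Parse a tab-separated click-log line into (user, url, latency_ms).

    Requirement 1: a well-formed line yields a tuple.
    >>> parse_click("u1\thttp://tw.yahoo.com\t120")
    ('u1', 'http://tw.yahoo.com', 120)

    Requirement 2: a malformed line is skipped (None), not a crash.
    >>> parse_click("broken line") is None
    True
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3 or not fields[2].isdigit():
        return None
    user, url, latency_ms = fields
    return user, url, int(latency_ms)


if __name__ == "__main__":
    # Executes every docstring example; a failure means the documented
    # requirement and the code have drifted apart.
    doctest.testmod()
```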

Page 22

Performance Test

[Figure: the full CI flow diagram from Page 14, shown here for step 7, the performance test]

Page 23

Vaidya

• Rule-based performance diagnosis of MapReduce jobs

• Extensible framework

• You can add your own rules

• Compose complex rules from existing rules
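The rule-based idea can be sketched as small predicates over a job's statistics, plus a combinator that builds a complex rule out of existing ones. This is not Vaidya's API: the `Rule` class, the `all_of` combinator, the counter names `map_tasks` and `input_bytes`, and the thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Counters = Dict[str, float]  # hypothetical per-job statistics


@dataclass
class Rule:
    name: str
    fires: Callable[[Counters], bool]  # True means "problem detected"
    advice: str


def all_of(name: str, advice: str, *rules: Rule) -> Rule:
    """Compose a complex rule that fires only when every existing rule fires."""
    return Rule(name, lambda c: all(r.fires(c) for r in rules), advice)


def diagnose(counters: Counters, rules: List[Rule]) -> List[str]:
    """Return one line of advice per rule that fires for this job."""
    return [f"{r.name}: {r.advice}" for r in rules if r.fires(counters)]


# Illustrative rules; the counter names and thresholds are assumptions.
MB = 1024 * 1024
many_maps = Rule("many_maps", lambda c: c["map_tasks"] > 1000, "too many map tasks")
small_splits = Rule(
    "small_splits",
    lambda c: c["input_bytes"] / max(c["map_tasks"], 1) < 16 * MB,
    "each map reads less than 16 MB",
)
small_files = all_of(
    "small_files", "combine small input files into larger splits", many_maps, small_splits
)

if __name__ == "__main__":
    job = {"map_tasks": 5000, "input_bytes": 10 * 1024 * MB}
    for line in diagnose(job, [many_maps, small_files]):
        print(line)
```

Adding a rule is just another `Rule(...)`; composing one is a call to `all_of`, which mirrors the "write complex rules using existing rules" point without touching the diagnosis loop.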

Page 24

CI toolset

[Figure: the CI flow diagram annotated with its toolset: CI, SCM, Vaidya, and BASH]

Page 25

CI is flexible

• MapReduce jobs can use MRUnit

• Hive scripts can use hive_test

• Pig scripts can use PigUnit
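One way to keep the pipeline platform-agnostic is a registry that maps each changed file to its unit-test framework, so supporting a new platform means adding one entry. The registry, the helper names, and the file extensions below are assumptions; the framework names follow the slide, with MRUnit as Apache's unit-testing library for MapReduce.

```python
from pathlib import PurePath
from typing import Dict, Iterable, Optional, Set

# Hypothetical registry: source-file extension -> unit-test framework.
# Supporting another platform means adding one entry here.
TEST_FRAMEWORKS: Dict[str, str] = {
    ".java": "MRUnit",    # MapReduce jobs
    ".hql": "hive_test",  # Hive scripts (extensions are an assumption)
    ".q": "hive_test",
    ".pig": "PigUnit",    # Pig scripts
}


def framework_for(path: str) -> Optional[str]:
    """Return the test framework for a changed file, or None if unsupported."""
    return TEST_FRAMEWORKS.get(PurePath(path).suffix.lower())


def frameworks_for(paths: Iterable[str]) -> Set[str]:
    """Collect the distinct frameworks a commit needs, skipping unsupported files."""
    return {f for f in map(framework_for, paths) if f is not None}
```

A CI job could call `frameworks_for` on the files touched by a commit and run each framework's suite once.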

Page 26

GitHub triggers CI
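A GitHub push can trigger CI through a webhook: GitHub POSTs a JSON payload to the CI server, which decides whether to start a build. A minimal sketch, assuming builds run only for pushes to a `master` branch and that the webhook was configured with a shared secret. Current GitHub signs the raw body with HMAC-SHA256 in the `X-Hub-Signature-256` header, and the push payload carries `ref`, `after` (the new head SHA), and `deleted`; the function names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Optional


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check X-Hub-Signature-256: "sha256=" + hex HMAC-SHA256 of the raw body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)  # constant-time compare


def commit_to_build(body: bytes, branch: str = "master") -> Optional[str]:
    """Return the new head SHA for a push to `branch`, or None to skip the build."""
    payload = json.loads(body)
    if payload.get("deleted") or payload.get("ref") != f"refs/heads/{branch}":
        return None  # branch deletion or a push to another branch
    return payload.get("after")
```

A webhook handler would call `verify_signature` first, reject the request on a mismatch, and otherwise queue a build for the SHA returned by `commit_to_build`.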

Page 27

CI testing build pipeline

Page 28

Testing Trend

Page 29

DEMO

Page 30

Conclusion

• Automated testing will save your life

• CI will boost your productivity

• This process can be applied to any platform

Page 31

Thank you, everyone! (謝謝大家)
