Kettle Lightning Talk · kettle3.s3.amazonaws.com/Kettle-at-FOSDEM-2008.pdf · 2008-02-25

Kettle Lightning Talk. Matt Casters, Chief Architect Pentaho Data Integration, Kettle Project Founder. Brussels, FOSDEM 2008, Saturday Feb 23rd, 2008


Page 1: Kettle Lightning Talk

Matt Casters

Chief Architect Pentaho Data Integration

Kettle Project Founder

Brussels, FOSDEM 2008, Saturday Feb 23rd, 2008

Page 2: Agenda

Business Intelligence

Kettle

User Community

Developer Community

Useful links

Page 3: Business Intelligence

Wikipedia:

...technologies, applications, and practices for the collection, integration, analysis, and presentation of business information.

http://en.wikipedia.org/wiki/Business_intelligence

Page 4: Business Intelligence

From source systems …

to the data warehouse …

to reports …

to analyses …

to dashboard reports …

to better information

Page 5: Kettle: typical use cases

Load data from text files and store it into a database

Export data from databases to text files or to other databases

Data migration between database applications

Exploration of data in existing databases (tables, views, etc.)

Information improvement using lookups

Data cleaning

Application integration

Data warehouse population 

...
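The first use case above (text file in, database table out) can be sketched outside Kettle in a few lines. This is only an illustration of the idea, not how Kettle implements it: the function `load_rows`, the `customers` table, the semicolon delimiter, and the sample records are all assumptions made for the example.

```python
import csv
import io
import sqlite3


def load_rows(text_stream, connection, table="customers"):
    """Read semicolon-delimited text and insert each record into a table.

    The table name and its three columns are hypothetical.
    """
    reader = csv.DictReader(text_stream, delimiter=";")
    connection.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER, name TEXT, city TEXT)"
    )
    # Light cleaning on the way in: cast the id, trim stray whitespace.
    rows = [(int(r["id"]), r["name"].strip(), r["city"].strip()) for r in reader]
    connection.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    connection.commit()
    return len(rows)


# Hypothetical input; a real load would open a text file instead.
SAMPLE = "id;name;city\n1;Ann ;Brussels\n2;Bob;Ghent\n"
conn = sqlite3.connect(":memory:")
loaded = load_rows(io.StringIO(SAMPLE), conn)
```

In Kettle the same flow would be drawn graphically rather than hand-coded; the sketch only shows the shape of the work being automated.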

Page 6: Kettle

Kettle

Extraction

Transportation

Transformation

Loading

Environment

Page 7: Kettle: Kettle

Kettle is a recursive acronym

For more information: see this presentation**

** if you read this slide more than 2 times, you can stop

Page 8: Kettle: Extraction

Extract data from:

25+ database types: MySQL, PostgreSQL, SQLite, ..., Oracle, SQL Server, etc.

Text files

XML files

XLS files

Xbase files (dBase, FoxPro, etc.)

File system information

Generated data

MS Access files

LDAP

Geo-data

...

Page 9: Kettle: Transportation

Transportation of data

Engine based data transfer (no code generator)

Very flexible pathways: splitting, partitioning, merging, joining, duplicating, clustering (MPP)
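To make some of the pathway terms above concrete, here is a minimal sketch of splitting, duplicating, partitioning, and merging over in-memory rows. It is not Kettle's engine or API; the function names, the round-robin rule for splitting, and the CRC32-based partitioning rule are assumptions chosen for illustration.

```python
from itertools import chain
from zlib import crc32


def split_round_robin(rows, targets):
    """Splitting: hand rows out to the targets in turn."""
    out = [[] for _ in range(targets)]
    for i, row in enumerate(rows):
        out[i % targets].append(row)
    return out


def duplicate(rows, copies):
    """Duplicating: every target receives every row."""
    rows = list(rows)
    return [list(rows) for _ in range(copies)]


def partition(rows, key, partitions):
    """Partitioning: rows with the same key value always land together."""
    out = [[] for _ in range(partitions)]
    for row in rows:
        slot = crc32(str(row[key]).encode("utf-8")) % partitions
        out[slot].append(row)
    return out


def merge(streams):
    """Merging: concatenate several streams back into one."""
    return list(chain.from_iterable(streams))
```

The practical difference: splitting balances load without regard to content, while partitioning guarantees that all rows for one key value reach the same target, which matters when a later step aggregates or looks up by that key.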

Page 10: Kettle: Transformation

Flexibly transform data

Looking up data: databases, files, memory, ...

Calculating

Scripting: JavaScript, SQL, RegExp

Splitting

Mapping

Selecting

Filtering

Pivoting

...
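Three of the operations listed above (looking up, calculating, filtering) can be sketched as small functions chained over a stream of rows. This is a generic illustration, not Kettle code: the helper names `lookup`, `calculate`, and `keep`, and the order and country sample data, are hypothetical.

```python
def lookup(rows, reference, key, target, default=None):
    """Looking up: add a field by finding each row's key in a reference table."""
    for row in rows:
        enriched = dict(row)  # copy so the input rows stay untouched
        enriched[target] = reference.get(row[key], default)
        yield enriched


def calculate(rows, target, formula):
    """Calculating: add a field derived from the other fields of the row."""
    for row in rows:
        result = dict(row)
        result[target] = formula(row)
        yield result


def keep(rows, predicate):
    """Filtering: pass on only the rows that satisfy the predicate."""
    return (row for row in rows if predicate(row))


# Hypothetical sample data.
orders = [
    {"order": 1, "country": "BE", "qty": 2, "price": 10.0},
    {"order": 2, "country": "FR", "qty": 1, "price": 99.0},
    {"order": 3, "country": "XX", "qty": 5, "price": 1.0},
]
countries = {"BE": "Belgium", "FR": "France"}

stream = lookup(orders, countries, key="country", target="country_name")
stream = calculate(stream, "total", lambda r: r["qty"] * r["price"])
stream = keep(stream, lambda r: r["country_name"] is not None)
result = list(stream)
```

Each function consumes and produces rows one at a time, so the chain processes a row end to end before reading the next one, which is loosely the row-streaming style an engine-based tool relies on.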

Page 11: Kettle: Loading

Load data into a target format

Database loads

Data warehouse population

Partitioned loading

Parallel loading

Clustering

Page 12: Kettle: Environment

Full GUI called “Spoon” to edit every option in Kettle: Drag & Drop, Debugger, Rich GUI

Command line tools: execute jobs, execute transformations

Web server: clustering, remote execution

Programming API for Java

Plugin ecosystem

...

Page 13: User community

Paying Pentaho customers

Large and small corporations

All possible sectors

Lone rangers & hobbyists

All regions on Earth

Meet on our forum: 18,000 posts in 2 years

Use our JIRA case tracking system

Download more than 10,000 copies of Kettle per month

http://www.ohloh.net/projects/3624?p=Kettle

http://www.softpedia.com/progClean/Kettle-Clean-80094.html

Page 14: Developer Community

24 committers contributed code in the last 12 months

Hundreds of others sent in patches, i18n, bug reports, docs

We receive daily contributions

We have translators for 8 locales (such as fr_FR, it_IT, es_ES, etc.)

We have open discussions on our developer list

ANYONE CAN CONTRIBUTE!!

Page 15: Useful links

Our homepage: http://kettle.pentaho.org

Our Forum: http://forums.pentaho.org/forumdisplay.php?f=69

Our case tracker: http://jira.pentaho.org/browse/PDI

Our wiki: http://wiki.pentaho.org/display/EAI/Latest+Pentaho+Data+Integration+%28aka+Kettle%29+Documentation

Our IRC Channel: ##pentaho (on Freenode)

Developers mailing list: http://groups.google.com/group/kettle-developers

My humble blog: http://www.ibridge.be

My coordinates: mcasters at pentaho dot org