Integrating Big Data - Meetup


Copyright © 2015 KNIME.com AG

Integrating Big Data is as easy as 1,2,3 … 4!

Tobias Kötter

KNIME.com


Variety, Volume, Velocity


Variety:

• integrating heterogeneous data (and tools)

Volume:

• from small files...

• ...to distributed data repositories (Hadoop)

• bring the tools to the data

Velocity:

• from distributing computationally heavy computations...

• ...to real-time scoring of millions of records/sec.


Every Minute…


IoT


The Challenge


Energy Usage Prediction from Smart Meters Data

• Read Smart Meter Energy Data

• Clean Up and Aggregate total Energy Usage by hour, day, week, month, year

• Calculate Behavioral Measures for each Smart Meter

• Cluster Smart Meters with Similar Behavior (k-Means)

• Predict Energy Usage in Clustered Smart Meters (Auto-Regressive Time Series Prediction)
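The aggregation step in the list above can be sketched in a few lines of Python. This is only an illustration under assumptions: the sample readings, the `(meter_id, timestamp, kWh)` layout and the `aggregate` helper are hypothetical and are not taken from the KNIME workflows.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample readings: (meter_id, ISO timestamp, kWh).
READINGS = [
    ("m1", "2015-01-05T08:00:00", 0.5),
    ("m1", "2015-01-05T08:30:00", 0.7),
    ("m1", "2015-01-05T19:00:00", 1.2),
    ("m1", "2015-01-06T08:00:00", 0.4),
    ("m2", "2015-01-05T08:00:00", 2.0),
]


def aggregate(readings, key_fn):
    """Sum kWh per (meter_id, key_fn(timestamp))."""
    totals = defaultdict(float)
    for meter_id, stamp, kwh in readings:
        ts = datetime.fromisoformat(stamp)
        totals[(meter_id, key_fn(ts))] += kwh
    return dict(totals)


# The same helper serves every granularity; only the time key changes.
by_hour = aggregate(READINGS, lambda ts: ts.strftime("%Y-%m-%d %H"))
by_day = aggregate(READINGS, lambda ts: ts.strftime("%Y-%m-%d"))
by_month = aggregate(READINGS, lambda ts: ts.strftime("%Y-%m"))
```

Doing this in memory is fine for small files; the later slides move the same kind of aggregation into the database.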


Workflow 1

Workflow 2

Workflow 3


Workflow 1: PrepareData


~ 2 days


Big Data


Big Data Support

• KNIME Big Data Access Nodes

– in-database processing

– preconfigured connectors

• Big Data Platforms

– HDFS, Hive, Impala, HP Vertica, Hortonworks, ParStream, any big data platform really!

• Spark MLlib integration (coming soon)

• Streaming Executor (coming soon)


Virtual Machines

• Hortonworks:

http://hortonworks.com/products/hortonworks-sandbox/

• Cloudera:

http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms.html

• VirtualBox

https://www.virtualbox.org/

• VMware Player

http://www.vmware.com/


Accessing Big Data: Database Connector

Generic Database Connector

– Can connect to any JDBC source

– Register new JDBC driver via preferences page


Register JDBC Driver

Open KNIME and go to File -> Preferences

Increase connection timeout for long-running retrieval operations


Accessing Big Data: Dedicated Connectors

Dedicated pre-configured connectors

– Bundling necessary JDBC drivers

– Easy to use

– DB specific behavior/capability

Some dedicated connectors are part of the KNIME Analytics Platform, some belong to the commercial KNIME Big Data Extension

works for most Hadoop Hive installations, including Hortonworks

free


Dedicated Connector


Accessing Big Data: Dedicated Connectors


Data Table Selection


In-Database Processing


Manipulation

• Filter rows and columns

• Join tables/queries

• Sort your data

• Write your own query

• Aggregate* your data

Similar settings as GroupBy node

Similar settings as Joiner node

* Database GroupBy node exposes DB-specific aggregation methods
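To make the idea of in-database manipulation concrete, here is a minimal Python sketch in which filtering, joining, aggregating and sorting all happen in one SQL statement executed by the database, so only the small result travels back to the client. It assumes an in-memory SQLite database as a stand-in for Hive or Impala; the table and column names are hypothetical, and this is not the SQL that the KNIME nodes generate.

```python
import sqlite3

# Assumption: SQLite stands in for a big data platform such as Hive or Impala.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meters (meter_id TEXT, region TEXT);
CREATE TABLE readings (meter_id TEXT, kwh REAL);
INSERT INTO meters VALUES ('m1', 'north'), ('m2', 'south');
INSERT INTO readings VALUES ('m1', 1.0), ('m1', 3.0), ('m2', 5.0), ('m2', -1.0);
""")

# Filter rows (WHERE), join tables (JOIN), aggregate (GROUP BY) and
# sort (ORDER BY) inside the database instead of in the client.
QUERY = """
SELECT m.region, SUM(r.kwh) AS total_kwh
FROM readings r
JOIN meters m ON m.meter_id = r.meter_id
WHERE r.kwh >= 0
GROUP BY m.region
ORDER BY total_kwh DESC
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```

Only the aggregated rows are fetched, which is the point of "bring the tools to the data".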


Adding SQL Queries for Average Measures


Average Monthly Values
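One plausible reading of "average monthly values" is the mean of each meter's monthly totals. The sketch below computes that with a nested query. It is an assumption-laden illustration: SQLite replaces the real platform, `strftime` is SQLite-specific (Hive and Impala use other date functions), and the `readings` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in database
conn.execute("CREATE TABLE readings (meter_id TEXT, ts TEXT, kwh REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("m1", "2015-01-10 08:00:00", 10.0),
        ("m1", "2015-01-20 08:00:00", 20.0),
        ("m1", "2015-02-10 08:00:00", 50.0),
        ("m2", "2015-01-10 08:00:00", 4.0),
    ],
)

# Inner query: total usage per meter and month.
# Outer query: average of those monthly totals per meter.
AVG_MONTHLY = """
SELECT meter_id, AVG(month_kwh) AS avg_monthly_kwh
FROM (
    SELECT meter_id, strftime('%Y-%m', ts) AS month, SUM(kwh) AS month_kwh
    FROM readings
    GROUP BY meter_id, month
) AS monthly
GROUP BY meter_id
ORDER BY meter_id
"""
avg_monthly = conn.execute(AVG_MONTHLY).fetchall()
print(avg_monthly)
```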


Import Data from Database into KNIME


< 30 min


New Big Data Platform?

No problem! Just change the connector node!


Other Useful Database Nodes

• Drop table

– missing table handling

– cascade option

• Execute any SQL statement e.g. DDL

• Manipulate existing queries

Executes several queries separated by ; and a new line
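The "separated by ; and a new line" rule can be sketched as follows. This is a guess at the behavior described by the callout, not KNIME's implementation: the `split_statements` helper is hypothetical, and SQLite is used only so the example runs on its own.

```python
import re
import sqlite3


def split_statements(script):
    """Split a script on a semicolon that ends a line; drop empty fragments."""
    parts = re.split(r";[ \t]*\r?\n", script.strip() + "\n")
    return [part.strip() for part in parts if part.strip()]


SCRIPT = """DROP TABLE IF EXISTS notes;
CREATE TABLE notes (note TEXT);
INSERT INTO notes VALUES ('a; b');
"""

statements = split_statements(SCRIPT)

conn = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in database
for statement in statements:
    conn.execute(statement)  # run each statement in order

stored = conn.execute("SELECT note FROM notes").fetchall()
```

Requiring the line break after the semicolon keeps a semicolon inside a value such as `'a; b'` from splitting the statement.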


KNIME Big Data Extension


• KNIME Big Data Access Nodes

– preconfigured connectors

– HDFS File Handling

– Hive/Impala Loader

• Big Data Platforms

– HDFS, Hive, Impala, HP Vertica, Hortonworks, ParStream, SAP HANA (upcoming), Teradata (upcoming), …

• Spark MLlib integration (coming soon)

• Streaming Executor (coming soon)


HDFS File Handling

• KNIME & Extensions -> KNIME File Handling Nodes

• HDFS Connection and HDFS File Permission nodes


Hive/Impala Loader


• Upload a KNIME data table to Hive/Impala


KNIME Big Data Extension: Download and Install

KNIME.com Extension Store

License Required!

Installation Instructions

http://tech.knime.org/installation-instructions

Product Description

http://www.knime.org/knime-big-data-extension


License on KNIME Store

http://tech.knime.org/knime-store

30-day trial license available with special Promotion [email protected]


Thank You!

[email protected]

• We are hiring

– Java Hadoop/BigData developers!

– Senior Software Engineer - Web Development

• Whitepaper “KNIME opens the Doors to Big Data”

http://www.knime.org/files/big_data_in_knime_1.pdf

• Blog Post “Integrating Big Data is as Easy as 1,2,3,4”

http://www.knime.org/blog/integrating-big-data-is-as-easy-as-1-2-3-4