Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

31
Data at Scales & the Values of Starting Small Aldrin Piri - @aldrinpiri DataWorks Summit 2017 – Munich

Transcript of Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

Page 1: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

Data at Scales & the Values of Starting SmallAldrin Piri - @aldrinpiriDataWorks Summit 2017 – Munich

Page 2: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Key: 'Apache NiFi’ Value: 'PMC Member'Key: 'Work’ Value: ’Sr. Member of Technical Staff @ Hortonworks'Key: 'Working with NiFi Since’ Value: '2010’

Page 3: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

AgendaApache NiFi: A Primer

Apache MiNiFi

Architecture

Apache NiFi: The Ecosystem

Community

Page 4: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Byte Scales for Data

SI Prefix- 100

kilo 103

mega 106

giga 109

tera 1012

peta 1015

exa 1018

zetta 1021

yotta 1024

“Big Data”

”everything else”

Greek for

Page 5: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

The Problem at HandProducers A.K.A Things

AnythingAND

Everything

Internet!

Consumers• User• Storage• System• …More Things

Page 6: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Use Case: Courier Service

Physical Store

Gateway Server

Mobile Devices

Registers

Server Cluster

Distribution Center

Kafka

Core Data Center at HQ

Server Cluster

Others

Storm / Spark / Flink / Apex

Kafka

Storm / Spark / Flink / Apex

On Delivery Routes

Trucks Deliverers

Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/Deliverer: Rigo Peter, https://thenounproject.com/rigo/Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/

NiFi NiFi NiFi NiFi NiFi NiFi

Gathering data from disparate sources

NiFi

Page 7: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Apache NiFi: A Primer

Key Features and Principles

• Guaranteed delivery• Data buffering

- Backpressure- Pressure release

• Prioritized queuing• Flow specific QoS

- Latency vs. throughput- Loss tolerance

• Data provenance

• Recovery/recording a rolling log of fine-grained history

• Visual command and control

• Flow templates• Pluggable/multi-role

security• Designed for extension• Clustering

Page 8: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

NiFi is based on Flow Based Programming (FBP)FBP Term NiFi Term DescriptionInformation Packet

FlowFile Each object moving through the system.

Black Box FlowFile Processor

Performs the work, doing some combination of data routing, transformation, or mediation between systems.

Bounded Buffer

Connection The linkage between processors, acting as queues and allowing various processes to interact at differing rates.

Scheduler Flow Controller

Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.

Subnet Process Group

A set of processes and their connections, which can receive and send data via ports. A process group allows creation of entirely new component simply by composition of its components.

Page 9: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

FlowFiles are like HTTP dataHTTP Data FlowFile

HTTP/1.1 200 OKDate: Sun, 10 Oct 2010 23:26:07 GMTServer: Apache/2.2.8 (CentOS) OpenSSL/0.9.8gLast-Modified: Sun, 26 Sep 2010 22:04:35 GMTETag: "45b6-834-49130cc1182c0"Accept-Ranges: bytesContent-Length: 13Connection: closeContent-Type: text/html

Hello world!

Standard FlowFile AttributesKey: 'entryDate’ Value: 'Fri Jun 17 17:15:04 EDT 2016'Key: 'lineageStartDate’ Value: 'Fri Jun 17 17:15:04 EDT 2016'Key: 'fileSize’ Value: '23609'FlowFile Attribute Map ContentKey: 'filename’Value: '15650246997242'Key: 'path’ Value: './’

Binary Content *

Header

Content

Page 10: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

FlowFiles & Data Agnosticism

NiFi is data agnostic! But, NiFi was designed understanding that users

can care about specifics and provides tooling

to interact with specific formats, protocols, etc.

ISO 8601 - http://xkcd.com/1179/

Robustness principle

Be conservative in what you do, be liberal in what you accept from others“

Page 11: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Page 12: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Apache MiNiFi

NiFi lives in the data center. Give it an enterprise server or a cluster of them.

MiNiFi lives as close to where data is born and is a guest on that device or system

“Let me get the key parts of NiFi close to where data begins and provide bidirectional data transfer"

Page 13: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Apache MiNiFi

Limited computing capability Limited power/network Restricted software library/platform availability No UI Physically inaccessible Not frequently updated Competing standards/protocols Scalability Privacy & Security

Realities of computing outside the comforts of the data center

Page 14: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

MiNiFi: Precedent from NiFi

Provides the semantics between two NiFi components across network boundaries– A custom protocol for inter-NiFi communication– Secure, Extensible, Load Balanced & Scalable Delivery to Cluster

Extracted out to a client library which powers integration into popular frameworks like Apache Spark, Apache Storm, Apache Flink, and Apache Apex

Attributes and the FlowFile format maintained

A quick look at NiFi Site to Site

https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site

Page 15: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

MiNiFi: Precedent from NiFi

Fine-grained, event level access of interactions with FlowFiles– CREATE, RECEIVE, FETCH, SEND, DOWNLOAD, DROP, EXPIRE, FORK, JOIN …

Captures the associated attributes/metadata at the time of the event

A map of a FlowFile’s journey and how they relate to other FlowFiles in a system– MiNiFi enables us to get more and further illuminate the map of data processing

A deeper dive into provenance

http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#data-provenance

Page 16: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

MiNiFi: Precedent from NiFi

RECEIVE event

Page 17: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Apache MiNiFi

The feedback loop is longer and not guaranteed– Removal of Web Server and UI

Declarative configuration– Lends itself well to CM processes– Extensible interface to support varying formats

• Currently provided in YAML

Reduced set of bundled components

Departures from NiFi in getting the right fit

Page 18: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Apache MiNiFi: Scoping

Go small: Java – Write once, run anywhere*– Feature parity and reuse of core NiFi libraries

Go smaller: C++ – Write once**, run anywhere

Go smallest: Write n-many times, run anywhere Language libraries to support tagging, FlowFile format, Site to Site protocol, and provenance generation without a processing framework– Mobile: Android & iOS– Language SDKs

Provide all the key principles of NiFi in varying, smaller footprints

Page 19: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

WHAT IS THIS!?

A NiFi FOR ANTS!?!

Page 20: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

MiNiFi: Use Case - Connected Car

Outside vehicle’s network firewall

On telematics layer

VEHICLE NETWORK FIREWALL

TRANSMIT

EXECUTE FILTER

PRIORITIZE

PARSE

LISTEN

ROUTE

Page 21: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Connecting the DropsSOURCES REGIONAL

INFRASTRUCTURECORE

INFRASTRUCTURE

Page 22: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Use Case: Courier Service with Apache NiFi & MiNiFi

Physical Store

Gateway Server

Mobile Devices

Registers

Server Cluster

Distribution Center

Kafka

Core Data Center at HQ

Server Cluster

Others

Storm / Spark / Flink / Apex

Kafka

Storm / Spark / Flink / Apex

On Delivery Routes

Trucks Deliverers

Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/Deliverer: Rigo Peter, https://thenounproject.com/rigo/Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/

Client Libraries

Client Libraries

MiNiFi

MiNiFiNiFi NiFi NiFi NiFi NiFi NiFi

Client Libraries

Gathering data from disparate sources

Page 23: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Page 24: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Apache NiFi: The Ecosystem

Site-to-Site in MiNiFi instances provides machine-to-machine (M2M) communication– Data arrives to NiFi in a transparent manner allowing integration to existing flows

Similar attention to extensibility in both Java and C++ clients allows agents to fit the needs of your organization

Reduced footprint allows NiFi functionality to aid in production of high fidelity data, more closely attributable and tracked from where it is generated

We’ve provided a framework to extend the reach of data ingest

Page 25: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Apache NiFi: The Ecosystem

Enter the MiNiFi Command & Control – Provide tooling to map the UX of

interactive command and control in NiFi to the design and deploy approach of MiNiFi

But more instances complicate my operational management!

https://cwiki.apache.org/confluence/display/MINIFI/MiNiFi+Command+and+Control

Page 26: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Apache NiFi: The Ecosystem

Configuration Management of Flows & Versioning– The evolution of templates to better support SDLC functions– https://cwiki.apache.org/confluence/display/NIFI/Configuration+Management+of+Flows

Extension Repositories– Publish & Share extension bundles (NARs)– https://cwiki.apache.org/confluence/display/NIFI/Extension+Repositories+%28aka+Extension+Regis

try%29+for+Dynamically-loaded+Extensions

Variable Registry– Initial framework support & file-based implementation– https://cwiki.apache.org/confluence/display/NIFI/Variable+Registry

Building on efforts for reusable components in the community

Page 27: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Why Apache NiFi & MiNiFi?

Moving data is multifaceted in its challenges and these are present in different contexts at varying scopes– Think of our courier example and organizations like it: inter vs intra, domestically, internationally

Provide common tooling and extensions that are commonly needed but be flexible for extension– Leverage existing libraries and expansive Java ecosystem for functionality– Allow organizations to integrate with their existing infrastructure

Empower folks managing your infrastructure to make changes and reason about issues that are occurring– Data Provenance to show context and data’s journey– User Interface/Experience a key component

Page 28: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Learn more and join us!

Apache NiFi sitehttps://nifi.apache.org

Subproject MiNiFi sitehttps://nifi.apache.org/minifi/

Subscribe to and collaborate [email protected]@nifi.apache.org

Submit Ideas or Issueshttps://issues.apache.org/jira/browse/NIFIhttps://issues.apache.org/jira/browse/MINIFI

Follow us on Twitter@apachenifi

Page 29: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Apache NiFi Crash CourseThursday, 6 April

11:15 AM – 1:45PM, Room 12• Learn more about NiFi, the community, and work through a hands-on lab

• Seats available on a first come, first served basis

• Make sure you are in possession of the latest version of VirtualBox

• More details: https://tinyurl.com/nifi-cc-munich17

Page 30: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

30 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Learn, Share at Birds of a FeatherIOT, STREAMING & DATA FLOW

Thursday, April 65:50 pm, Room 5

Page 31: Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi

31 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Thank You