Apache NiFi - TNG · Apache NiFi: 21st century open-source data flows · München, 2018-05-18 · Frank Thiele

Apache NiFi
21st century open-source data flows

München, 2018-05-18
Frank Thiele

Content
- Introduction
- Technology
- Functionality
- Live demo
- Summary

Apache NiFi
- Based on NSA project NiagaraFiles
- Automation of data flows between applications
- Available under Apache License since 2014
- Development taken over by Hortonworks (2015)
- Release v1.0.0, August 2016
- Latest v1.6.0, April 2018

Apache NiFi – What is it?
- ... comparable to Ab Initio or Talend Data Integration
- ... or Apache Camel

Data Flow Management Platform
- ... more than a classical ETL Framework

It models your data flow

... and more complex data flows

Apache NiFi – Use cases
- Workflow modeling with data flows
- Connects different technology stacks
- Mass data processing
- Centralization of complex data flows
- Accountability of data flows
- IoT, Telecommunication, Banking, ...
- ETL – Extract, Transform, Load

Concepts
- Flow-based programming
- FlowFiles, processors, controllers, groups and connections
- Pass-by-reference between processors
- Dedicated repositories as storage
- Happy and error paths are modeled equivalently
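
The last point in the list above can be illustrated with a small Python sketch. It is not NiFi code: `parse_int` and `run` are hypothetical names, and FlowFiles are reduced to plain dictionaries. The idea it shows is that a processor only names the relationship a FlowFile should follow, so the failure path is wired exactly like the success path.

```python
from collections import deque


def parse_int(flowfile):
    """Hypothetical processor: returns the name of the relationship to use."""
    try:
        flowfile["attributes"]["value"] = str(int(flowfile["content"]))
        return "success"
    except ValueError:
        flowfile["attributes"]["error"] = "not an integer"
        return "failure"


def run(processor, inputs):
    """Put every FlowFile into the queue of the relationship the processor chose."""
    queues = {"success": deque(), "failure": deque()}
    for flowfile in inputs:
        queues[processor(flowfile)].append(flowfile)
    return queues


inputs = [
    {"content": b"42", "attributes": {}},
    {"content": b"oops", "attributes": {}},
]
queues = run(parse_int, inputs)
print(len(queues["success"]), len(queues["failure"]))
```

Both queues are ordinary connections here, so a downstream step could consume the failure queue in the same way as the success queue.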

Flow-based programming

FlowFiles
- The data bucket of NiFi
- Meta data container
  - UUID
  - Attributes
  - Content link
- Best Practice: Modify and read attributes, not content
- Transferred from one processor to another – by reference
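
A minimal Python sketch of this structure, under stated assumptions: `ContentLink`, `FlowFile`, and `tag_source` are hypothetical names chosen for illustration, and the fields of `ContentLink` are invented. It shows a FlowFile as pure metadata (UUID, attributes, content link) that is handed on by reference while only an attribute changes.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass(frozen=True)
class ContentLink:
    """Hypothetical pointer to where the content bytes are stored."""
    container: str
    offset: int
    length: int


@dataclass
class FlowFile:
    """Metadata only: a UUID, attributes, and a link to the content."""
    content_link: ContentLink
    attributes: dict = field(default_factory=dict)
    uuid: str = field(default_factory=lambda: str(uuid4()))


def tag_source(flowfile: FlowFile) -> FlowFile:
    """Toy processor: changes an attribute and never touches the content."""
    flowfile.attributes["source"] = "sensor-a"
    return flowfile


original = FlowFile(ContentLink("container-0001", offset=0, length=1024))
handed_over = tag_source(original)

# Same object and same content link: nothing was copied on the way.
print(handed_over is original, handed_over.content_link is original.content_link)
```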

Repositories – FlowFile
- Storage for meta data
- Write-ahead log for data consistency/persistence
- Snapshots/Checkpoints for restoring
- Copy-on-write
- Swapping supported
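
The combination of write-ahead log and checkpoints can be sketched as follows. This is a toy in-memory model, not NiFi's implementation; the class and method names are assumptions made for the example. Each update is logged before it is applied, a checkpoint folds the log into a snapshot, and recovery loads the snapshot and replays the remaining log.

```python
import json


class FlowFileRepository:
    """Toy in-memory write-ahead log with snapshots (illustration only)."""

    def __init__(self):
        self.snapshot = {}   # state at the last checkpoint
        self.log = []        # updates recorded since that checkpoint
        self.state = {}      # live view: uuid -> attributes

    def update(self, uuid, attributes):
        # Write ahead: record the change before applying it.
        self.log.append(json.dumps({"uuid": uuid, "attributes": attributes}))
        self.state[uuid] = dict(attributes)

    def checkpoint(self):
        # Fold the current state into a new snapshot and start a fresh log.
        self.snapshot = {k: dict(v) for k, v in self.state.items()}
        self.log = []

    def recover(self):
        # Restore: load the snapshot, then replay the logged updates in order.
        state = {k: dict(v) for k, v in self.snapshot.items()}
        for line in self.log:
            record = json.loads(line)
            state[record["uuid"]] = record["attributes"]
        return state


repo = FlowFileRepository()
repo.update("ff-1", {"filename": "a.csv"})
repo.checkpoint()
repo.update("ff-1", {"filename": "a.csv", "route": "success"})
repo.update("ff-2", {"filename": "b.csv"})

repo.state = {}              # simulate losing the in-memory state
recovered = repo.recover()
print(sorted(recovered))
```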

Repositories – Content
- Storage for actual data of a FlowFile
- Can be scaled over partitions and cluster nodes
- Content is immutable (write-once, copy-on-write)
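
Write-once content with copy-on-write can be sketched like this. The store below is a hypothetical, in-memory stand-in with invented names; the point is only that a "modification" writes a new claim and leaves the old bytes untouched.

```python
class ContentRepository:
    """Toy append-only store: a claim is written once and never changed."""

    def __init__(self):
        self._claims = []

    def write(self, data: bytes) -> int:
        self._claims.append(bytes(data))
        return len(self._claims) - 1      # the claim id acts as the content link

    def read(self, claim_id: int) -> bytes:
        return self._claims[claim_id]


def to_upper(repo: ContentRepository, claim_id: int) -> int:
    """Copy-on-write: the modified content is stored under a new claim."""
    return repo.write(repo.read(claim_id).upper())


repo = ContentRepository()
before = repo.write(b"hello")
after = to_upper(repo, before)
print(repo.read(before), repo.read(after))
```

Because the old claim still exists after the change, earlier versions of the content remain readable for as long as the store keeps them.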

Repositories – Provenance
- Storage for history of the FlowFiles
- Can be scaled over partitions and cluster nodes
- Any FlowFile can be viewed at any point in time
- Apache Lucene Index
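
A rough sketch of the idea, assuming a simple append-only event list with a per-FlowFile index in place of the Lucene index. All class names are hypothetical, and the event types, component names, and timestamps are illustrative values only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceEvent:
    flowfile_uuid: str
    event_type: str      # illustrative labels, e.g. "RECEIVE"
    component: str
    timestamp: int


class ProvenanceLog:
    """Toy append-only event history with a per-FlowFile index."""

    def __init__(self):
        self._events = []
        self._index = {}     # flowfile uuid -> positions in _events

    def record(self, event: ProvenanceEvent) -> None:
        self._index.setdefault(event.flowfile_uuid, []).append(len(self._events))
        self._events.append(event)

    def lineage(self, flowfile_uuid: str, until: int):
        """Events of one FlowFile up to a point in time, in recording order."""
        hits = (self._events[i] for i in self._index.get(flowfile_uuid, []))
        return [e for e in hits if e.timestamp <= until]


log = ProvenanceLog()
log.record(ProvenanceEvent("ff-1", "RECEIVE", "GetFile", 100))
log.record(ProvenanceEvent("ff-2", "RECEIVE", "GetFile", 105))
log.record(ProvenanceEvent("ff-1", "ATTRIBUTES_MODIFIED", "UpdateAttribute", 110))
log.record(ProvenanceEvent("ff-1", "SEND", "PutFile", 120))

print([e.event_type for e in log.lineage("ff-1", until=110)])
```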

Flexibility
- High number of available processors (ca. 250)
- Freely programmable processors for customizations
- Many programming languages supported in addition to Java (e.g. Python, Lua, JS)
- Template support
- Configuration changes of the stream without downtime
- NiFi Registry for configuration management

NiFi Registry
- Since v1.5 (January 2018)
- Configuration management

NiFi Registry UI

NiFi Registry integration

Further advantages
- Open source (active development by Hortonworks)
- Paradigm shift from file and content to events and attributes
- Advanced backpressure control
- Multi-tenancy
- Unit and integration tests
- Reduced I/O overhead
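
Backpressure from the list above can be sketched as a queue with a threshold between two processing steps. This is a simplified model with hypothetical names (`Connection`, `run_source`) and an object-count threshold as the only limit: once the queue is full, the upstream side stops adding items until the downstream side has consumed some.

```python
from collections import deque


class Connection:
    """Toy queue between two processors with an object-count threshold."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.queue = deque()

    def is_full(self) -> bool:
        return len(self.queue) >= self.threshold


def run_source(connection: Connection, items) -> list:
    """The upstream side only produces while the connection has room."""
    deferred = []
    for item in items:
        if connection.is_full():
            deferred.append(item)      # not produced yet; retried later
        else:
            connection.queue.append(item)
    return deferred


conn = Connection(threshold=3)
deferred = run_source(conn, ["a", "b", "c", "d", "e"])
print(len(conn.queue), deferred)

conn.queue.popleft()                   # downstream consumes one item
deferred = run_source(conn, deferred)
print(len(conn.queue), deferred)
```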

Live demo
- UI Overview
- Stream example
- Feature examples
  - Data provenance
  - Back pressure
  - Error handling
  - ETL with Jolt

UI overview

Stream example

Processor configuration

Data provenance

Data provenance – Graph

Data provenance – Details

Data provenance – Attributes

Data provenance – Content 1

Data provenance – Content 2

Back pressure

Error handling

ETL with JOLT

Apache NiFi – Why again?
- Used in many production systems (e.g. NSA, Slovak Telekom, ...)
- Solves many things already (UI, Tracing, Transactional, ...)
- Easily customizable
- Open Source
  - Reduce your OPEX
  - Can be extended by YOU
  - Is extended and fixed by the community
- It scales
- Cloud ready

Apache NiFi – Use cases
- Workflow modeling with data flows
- Reduce latency of your data
- Centralization of complex data flows
- Big Data and BI data flows
- Integration of new/different technologies
- Accountability and lineage
- Complex Event Processing
- *ETL*

Contact
Frank Thiele ([email protected])