Apache Kafka
Chapter 1

201601

© Copyright 2010-2016 Cloudera. All rights reserved. Not to be reproduced or shared without prior written consent from Cloudera.

Apache Kafka

In this chapter you will learn

§ What Kafka is and what advantages it offers
§ About the high-level architecture of Kafka
§ What several use cases for Kafka are
§ How to create topics, publish messages, and read messages from the command line and in Java code

Chapter Topics

Apache Kafka

§ Overview
§ Use Cases
§ Messages, Topics, and Partitions
§ Producers and Consumers
§ Message Ordering Guarantees
§ Using the Java API
§ Essential Points
§ Hands-On Exercise: Using Kafka from the Command Line

What is Apache Kafka?

§ Apache Kafka is a distributed commit log service
  – Widely used for data ingest
  – Offers scalability, performance, reliability, and flexibility
  – Conceptually similar to a publish-subscribe messaging system
§ Originally created at LinkedIn, but now an open source Apache project
  – Donated to the Apache Software Foundation in 2012
  – Graduated from the Apache Incubator in 2013
  – Included as part of Cloudera Labs in 2014
  – Supported by Cloudera for production use with CDH in 2015

Characteristics of Kafka

§ Scalable
  – Kafka is a distributed system that supports multiple nodes
§ Fault-tolerant
  – Data is persisted to disk and replicated throughout the cluster
§ High throughput
  – Each broker can process hundreds of thousands of messages per second*
§ Low latency
  – Data is delivered in a fraction of a second
§ Flexible
  – Decouples the production of data from its consumption

* Using modest hardware, with messages of a typical size

High-Level Architecture: Terminology

§ Messages represent arbitrary user-defined content
  – For example, application events or sensor readings
§ A node running the Kafka service is called a broker
  – A production cluster typically has many Kafka brokers
  – Kafka also depends on the ZooKeeper service for coordination
§ Producers push messages to a broker
  – The producer assigns a topic, or category, to each message
§ Consumers pull messages from a Kafka broker
  – They read only messages in relevant topics

High-Level Architecture: Example

[Figure: Producers #1, #2, and #3 publish login_failure and call_placed messages to a Kafka cluster of six brokers; Consumers #1 and #2 read login_failure and call_placed messages from the cluster.]


Why Kafka?

§ Kafka is used for a variety of use cases, such as
  – Log aggregation
  – Messaging
  – Web site activity tracking
  – Operational metrics
  – Stream processing
  – Event sourcing
§ A subset of these could also be done with Flume
  – For example, aggregating Web server log data into HDFS
§ Kafka often becomes a better choice as use case complexity grows

Common Kafka Use Cases (1)

§ Distributed message bus / central data pipeline
  – Enables highly scalable EAI, SOA, CEP, and microservice architectures
  – Decouples services with a standardized message abstraction
  – Supports multiple message client languages with high throughput
§ Log aggregation
  – Kafka can collect logs from multiple services
  – Logs can be made available to multiple consumers, such as Hadoop and Apache Solr

EAI: Enterprise Application Integration
SOA: Service-Oriented Architecture
CEP: Complex Event Processing

Common Kafka Use Cases (2)

§ Web site activity tracking
  – Web application sends events such as page views and searches to Kafka
  – Events become available for real-time processing, dashboards, and offline analytics in Hadoop
§ Alerting and reporting on operational metrics
  – Kafka producers and consumers occasionally publish their message counts to a special Kafka topic
  – A service compares counts and sends an alert upon detecting data loss
§ Stream processing
  – A framework such as Spark Streaming reads data from a topic, processes it, and writes processed data to a new topic where it becomes available for users and applications
  – Kafka's strong durability helps to facilitate this use case


Messages

§ Messages in Kafka are variable-size byte arrays
  – Allows for serialization of data in any format your application requires
  – Common formats include strings, JSON, and Avro
§ There is no explicit limit on message size
  – Optimal performance usually occurs with messages of a few KB in size
  – We recommend that you do not exceed 1 MB per message
§ Kafka retains all messages for a defined time period
  – This period can be set on a global or per-topic basis
  – Messages will be retained regardless of whether they were read
  – They are discarded automatically after the retention period is exceeded
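The retention rule described above can be pictured with a small in-memory model. This is only a conceptual sketch: the class name RetentionSketch, its methods, and the per-message timestamps are assumptions made for illustration, and Kafka's actual cleanup works on log files on disk rather than on individual messages in memory. The point it shows is that expiry depends only on age, not on whether any consumer has read the message.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of time-based retention; not Kafka's implementation.
class RetentionSketch {

    // A stored message together with the time (in ms) at which it was appended.
    private static final class Entry {
        final long timestampMs;
        final String value;

        Entry(long timestampMs, String value) {
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    private final Deque<Entry> log = new ArrayDeque<>();
    private final long retentionMs;

    RetentionSketch(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    // Appends a message; callers are assumed to pass non-decreasing timestamps.
    void append(long nowMs, String value) {
        log.addLast(new Entry(nowMs, value));
    }

    // Discards every message older than the retention period, whether or not
    // it was ever read, and returns how many messages were discarded.
    int expire(long nowMs) {
        int removed = 0;
        while (!log.isEmpty() && nowMs - log.peekFirst().timestampMs > retentionMs) {
            log.removeFirst();
            removed++;
        }
        return removed;
    }

    // Returns the number of messages currently retained.
    int size() {
        return log.size();
    }

    public static void main(String[] args) {
        RetentionSketch topic = new RetentionSketch(1000);
        topic.append(0, "first");
        topic.append(500, "second");
        System.out.println("discarded: " + topic.expire(1200));
        System.out.println("retained: " + topic.size());
    }
}
```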

Topics

§ There is no explicit limit on the number of topics
  – Kafka works better with a few large topics than many small ones
§ A topic can be created explicitly or simply by publishing to the topic
  – Controlled by the auto.create.topics.enable property
  – We recommend that topics be created explicitly

Topic Partitioning

§ Each topic is divided into some number of partitions*
  – Partitioning improves scalability and throughput
§ A topic partition is an ordered and immutable sequence of messages
  – New messages are appended to the partition as they are received
  – Each message is assigned a unique sequential ID known as an offset

* Note that this is unrelated to partitioning in HDFS, MapReduce, or Spark

[Figure: Producers A and B append messages to Partitions 0, 1, and 2. Each partition is a row of offsets starting at 0, running from older messages to newer messages over time.]
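The append-and-offset behavior of a partition can be sketched in a few lines of plain Java. The class PartitionLog and its methods are hypothetical names chosen for this illustration and are not part of the Kafka API; the sketch assumes a single writer and keeps everything in memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of one topic partition; not Kafka's implementation.
class PartitionLog {

    private final List<String> messages = new ArrayList<>();

    // Appends a message to the end of the log and returns the offset assigned to it.
    long append(String message) {
        messages.add(message);
        return messages.size() - 1;
    }

    // Returns the message stored at the given offset; existing entries never change.
    String read(long offset) {
        return messages.get((int) offset);
    }

    // Returns the offset that the next appended message will receive.
    long nextOffset() {
        return messages.size();
    }

    public static void main(String[] args) {
        PartitionLog partition = new PartitionLog();
        long first = partition.append("login_failure,alice");
        long second = partition.append("login_failure,bob");
        System.out.println(first + " -> " + partition.read(first));
        System.out.println(second + " -> " + partition.read(second));
    }
}
```

Because messages are only ever added at the end, an offset identifies the same message for every reader of the partition.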

Replication

§ Each partition can be replicated across a configurable number of brokers*
  – Doing so is recommended, as it provides fault tolerance
§ Each broker acts as a leader for some partitions and a follower for others
  – Followers passively replicate the leader
  – If the leader fails, a follower will automatically become the new leader

* Note that this is unrelated to HDFS replication

[Figure: Brokers A, B, and C each hold a replica of Partitions 0, 1, and 2. A legend distinguishes leader replicas from follower replicas.]

Starting the Kafka Broker

§ In production, you will likely start Kafka via Cloudera Manager
  – In this class, we must start it manually on the VM
§ Since Kafka depends on ZooKeeper, we must start that service first

$ sudo service zookeeper-server start

§ We can then start the Kafka service

$ sudo service kafka-server start

Creating Topics from the Command Line

§ Kafka includes a convenient set of command line tools
  – These are helpful for exploring and experimentation
§ The kafka-topics command offers a simple way to create Kafka topics
  – Provide the topic name of your choice, such as device_status
  – You must also specify the ZooKeeper connection string for your cluster

$ kafka-topics --create \
    --zookeeper localhost:2181 \
    --replication-factor 1 \
    --partitions 1 \
    --topic device_status

Displaying Topics from the Command Line

§ Use the --list parameter to list all topics

$ kafka-topics --list \
    --zookeeper localhost:2181


Producer Recap

§ Producers publish messages to Kafka topics
  – They communicate with a broker, not a consumer

[Figure: Producers #1, #2, and #3 publish login_failure and call_placed messages to a Kafka cluster of six brokers.]

Selecting the Partition

§ A producer is responsible for selecting partitions for messages it publishes
  – This is primarily done to balance the load across all partitions
  – The producer writes messages to a partition in order
  – A pluggable Partitioner class selects the partition for each message

[Figure: Producers A and B append messages to Partitions 0, 1, and 2. Each partition is a row of offsets starting at 0, running from older messages to newer messages over time.]
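One common way to choose a partition is to hash a message key, so that messages with the same key always land in the same partition while different keys spread across all partitions. The sketch below illustrates that idea only; KeyHashPartitioner and partitionFor are hypothetical names, the class does not implement Kafka's Partitioner interface, and Kafka's built-in partitioner uses its own hashing scheme rather than String.hashCode.

```java
// Hypothetical key-hash partition selection; not Kafka's built-in partitioner.
class KeyHashPartitioner {

    // Maps a message key to a partition number in the range [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        // Math.floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 3;
        String[] keys = {"alice", "bob", "alice"};
        for (String key : keys) {
            // The two "alice" messages are always assigned the same partition.
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```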

Aside: Message Batches Increase Throughput and Latency

§ Producers can collect multiple messages to write to a partition
  – This reduces the number of requests made to brokers
  – Such requests sent to brokers contain one batch per partition
§ Batching is controlled through properties set for the producer
  – The default is to send messages immediately
  – Batch size is configurable, as is the max time to wait before sending

[Figure: Producers A and B each send a batch of several messages, which are appended to the ends of Partitions 0, 1, and 2.]
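As a rough illustration, the two batching settings mentioned above correspond to the producer properties batch.size (bytes to collect per partition batch) and linger.ms (maximum time to wait before sending). The helper class below is hypothetical and uses only java.util.Properties; the value 16384 matches the batch size used later in this chapter, while the 5 ms wait is an arbitrary value chosen for the example.

```java
import java.util.Properties;

// Hypothetical helper that builds batching-related producer properties.
class BatchingConfigExample {

    // Returns properties that ask the producer to batch up to batchSizeBytes per
    // partition and to wait at most lingerMs for more messages before sending.
    static Properties batchingProperties(int batchSizeBytes, long lingerMs) {
        Properties props = new Properties();
        props.put("batch.size", Integer.toString(batchSizeBytes));
        props.put("linger.ms", Long.toString(lingerMs));
        return props;
    }

    public static void main(String[] args) {
        // A linger.ms of 0 would send messages immediately; 5 ms is an example value.
        Properties props = batchingProperties(16384, 5);
        System.out.println("batch.size = " + props.getProperty("batch.size"));
        System.out.println("linger.ms = " + props.getProperty("linger.ms"));
    }
}
```

A longer wait generally allows fuller batches and fewer requests, at the cost of added delay for each message.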

Messages are Replicated

§ The producer is configured with a list of one or more brokers
  – It asks the first available broker for the leader of the desired partition
§ The producer then sends the message to the leader
  – The leader writes the message to its local log
  – Each follower then writes the message to its own log
  – After acknowledgements from followers, the message is committed

[Figure: Brokers A, B, and C each hold a replica of Partitions 0, 1, and 2. Numbered steps 1 to 3 trace a message from the producer to the replicas of Partition 0.]

Creating a Producer from the Command Line (1)

§ You can create a producer using the kafka-console-producer tool
§ Specify one or more brokers in the --broker-list option
  – Each broker consists of a hostname, a colon, and a port number
  – If specifying multiple brokers, separate them with commas
  – In our case there is one broker: localhost:9092
§ You must also provide the name of the topic
  – We will publish messages to the topic named device_status

$ kafka-console-producer \
    --broker-list localhost:9092 \
    --topic device_status

Creating a Producer from the Command Line (2)

§ You may see a few log messages in the terminal after the producer starts
§ It will then accept input in the terminal window
  – Each line you type will be a message sent to the topic
§ Until you have configured a consumer for this topic, you'll see no other output from Kafka

Consumer Recap

§ A consumer reads messages that were published to Kafka topics
  – Consumers communicate with a broker, not a producer
§ Consumer actions do not affect other consumers
  – For example, using the Kafka command line tool to "tail" the contents of a topic does not change what is consumed by other consumers
§ Consumers can come and go without impact on the cluster or other consumers

[Figure: Consumers #1 and #2 read login_failure and call_placed messages from a Kafka cluster of six brokers.]

Creating a Consumer from the Command Line

§ You can create a consumer with the kafka-console-consumer tool
§ This requires the ZooKeeper connection string for your cluster
  – Unlike creating a producer, which instead required a list of brokers
§ The command also requires a topic name
  – In our case, we will use device_status
§ You can use --from-beginning to read all available messages
  – Otherwise, it would read only new messages

$ kafka-console-consumer \
    --zookeeper localhost:2181 \
    --topic device_status \
    --from-beginning

Writing File Contents to Topics via the Command Line

§ Using UNIX pipes or redirection, you can read input from files
  – The data can then be sent to a topic using the command line producer
§ This example shows how to read input from a file named alerts.txt
  – Each line in this file becomes a separate message in our topic
§ This technique can be an easy way to integrate with existing programs

$ cat alerts.txt | kafka-console-producer \
    --broker-list localhost:9092 \
    --topic device_status

How does Kafka differ from traditional message models?

§ Messaging has two traditional models
  – Queuing
  – Publish-subscribe
§ With queuing, a pool of consumers may read from a server and each message goes to one of them
§ In publish-subscribe, the message is broadcast to all consumers
§ A Kafka consumer group is a consumer abstraction that generalizes both of these models

Kafka Consumer Group Operation

§ Each message published to a topic is delivered to one consumer instance within each subscribing consumer group
§ Consumer instances can be in separate processes or on separate machines
§ The diagram below depicts a Kafka cluster with two brokers (servers)
  – The brokers are hosting four partitions, P0-P3
  – Consumer group A has two consumer instances and group B has four

[Figure: A Kafka cluster in which Broker 1 hosts P0 and P3 and Broker 2 hosts P1 and P2. Consumer Group A contains C1 and C2; Consumer Group B contains C3, C4, C5, and C6.]

Kafka Consumer Group Configurations

§ Kafka functions like a traditional queue when
  – All consumer instances belong to the same consumer group
  – In this case, a given message is received by one consumer
§ Kafka functions like traditional publish-subscribe when
  – Each consumer instance belongs to a different consumer group
  – In this case, all messages are broadcast to all consumers
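The delivery rule (one recipient per subscribing group) can be modeled with a short sketch. Everything here is illustrative: ConsumerGroupDelivery and recipients are hypothetical names, the group memberships are taken from the earlier diagram (group A with C1 and C2, group B with C3 to C6), and picking a member by "partition modulo group size" is an assumption standing in for Kafka's real partition assignment.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of consumer group delivery; not part of the Kafka API.
class ConsumerGroupDelivery {

    // For a message in the given partition, returns the single member of each
    // group that receives it. The modulo rule is an illustrative assumption.
    static Map<String, String> recipients(Map<String, List<String>> groups, int partition) {
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> group : groups.entrySet()) {
            List<String> members = group.getValue();
            result.put(group.getKey(), members.get(partition % members.size()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        groups.put("A", Arrays.asList("C1", "C2"));
        groups.put("B", Arrays.asList("C3", "C4", "C5", "C6"));
        // Every message reaches exactly one consumer in A and one consumer in B.
        for (int partition = 0; partition < 4; partition++) {
            System.out.println("P" + partition + " -> " + recipients(groups, partition));
        }
    }
}
```

With a single group the model behaves like a queue, since each message has one recipient. With one consumer per group it behaves like publish-subscribe, since every consumer receives every message.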

Using "Logical Subscribers"

§ In between the two extremes of queuing or publish-subscribe lies a balanced solution
  – A topic can have one consumer group for each "logical subscriber"
§ In this approach, each consumer group is composed of many consumer instances
  – This provides scalability and fault tolerance
  – Amounts to publish-subscribe semantics where the subscriber is a cluster of consumers instead of a single process


Traditional Message Ordering

§ A traditional queue retains messages in order on the server
  – The server hands out messages to consumers in the order they are stored
§ In some message systems, messages delivered to consumers asynchronously may arrive out of order at different consumers
  – Message order is effectively lost in the presence of parallel consumption
§ The workaround is to allow only one process to consume from a queue
  – This is the "exclusive consumer" approach
  – There is no parallelism

Kafka Ordering

§ Partitions within Kafka topics make it possible to provide a consumer group with
  – Message ordering guarantees
  – Load balancing
§ Partitions are assigned to consumers in a consumer group
  – Each partition is consumed by exactly one consumer in the group
  – The consumer of a partition is the only reader of that partition and consumes the data in order
§ The number of consumers cannot exceed the number of partitions
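The assignment of partitions to the consumers of one group can be sketched as follows. This is a simplified round-robin model under stated assumptions: PartitionAssignment and assign are hypothetical names, and Kafka's actual assignment strategy may distribute partitions differently. It shows why each partition has exactly one reader within the group and why a consumer beyond the partition count has nothing to read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical round-robin assignment of partitions within one consumer group.
class PartitionAssignment {

    // Gives every partition to exactly one consumer, cycling through the consumers.
    static Map<String, List<Integer>> assign(int numPartitions, List<String> consumers) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Four partitions shared by two consumers.
        System.out.println(assign(4, Arrays.asList("C1", "C2")));       // {C1=[0, 2], C2=[1, 3]}
        // Two partitions and three consumers: C3 is left without a partition.
        System.out.println(assign(2, Arrays.asList("C1", "C2", "C3"))); // {C1=[0], C2=[1], C3=[]}
    }
}
```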

Kafka Ordering Tip

§ Kafka only provides a total order over messages within a partition, not between different partitions in a topic
§ Per-partition ordering combined with the ability to partition data by key is sufficient for most applications
§ Some applications require total ordering for a given topic
  – Accomplish this by creating just one partition for the topic
  – Note that this means only one consumer process is allowed

Kafka Guarantees

§ Messages sent by a producer to a particular topic partition will be appended in the order they are sent
  – For example, if message M1 is sent by the same producer as message M2, and M1 is sent first, then
    – M1 will have a lower offset than M2
    – M1 will appear earlier in the log than M2
§ A consumer sees messages in the order in which they are stored in the log
§ For a topic with replication factor N, up to N-1 server failures can occur without losing any messages committed to the log


Kafka Java API: Producer

§ Kafka's Java API allows you to easily create producers and consumers
  – Your code can send messages to a topic using a producer
  – Your code can also read messages sent to a topic using a consumer
§ The next three slides show sample code for a simple producer that sends a message to a topic

Simple Producer (1): Import Statements and Class Declaration

package com.loudacre.example;

import java.util.Properties;
import java.util.concurrent.Future;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerExample {

    public static void main(String[] args) {

Note: file continues on next slide

Simple Producer (2): Producer Properties Configuration

        Properties props = new Properties();

        // This is a comma-delimited list of brokers to contact
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // This specifies that the write will only be committed
        // after all brokers with replicas have acknowledged it
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        // # of bytes to collect in message batch before sending
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);

        // Specifies classes used for message serialization
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
            StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
            StringSerializer.class.getName());

Note: file continues on next slide

Simple Producer (3): Message Creation and Publication

        // Create a Producer using our configuration properties
        Producer<String, String> producer =
            new KafkaProducer<String, String>(props);

        // Specify the topic and value for the message
        String topic = "app_events";
        String value = "CART_ADD,alice,0872584";

        // Create and send the message
        ProducerRecord<String, String> message =
            new ProducerRecord<String, String>(topic, value);
        producer.send(message);

        // Close the producer once we no longer need it
        producer.close();
    }
}

Kafka Java API: Consumer

§ The next few slides provide sample code for a simple consumer
  – This consumer reads messages posted to the selected topic

High-Level Consumer (1): Imports and Class Declaration

package com.loudacre.example;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.serializer.Decoder;
import kafka.serializer.StringDecoder;

public class ConsumerExample {

    public static void main(String[] args) {

Note: file continues on next slide

High-Level Consumer (2): Property Configuration

        // Define required properties and configure the consumer
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");
        props.put("group.id", "example");
        ConsumerConfig cfg = new ConsumerConfig(props);
        ConsumerConnector consumer =
            Consumer.createJavaConsumerConnector(cfg);

        // Prepare to subscribe to app_events with one thread
        String topic = "app_events";
        Map<String, Integer> tpx = new HashMap<String, Integer>();
        tpx.put(topic, Integer.valueOf(1));

        // Set up the message decoder and subscribe to the topic
        Decoder<String> dec = new StringDecoder(null);
        Map<String, List<KafkaStream<String, String>>> sm =
            consumer.createMessageStreams(tpx, dec, dec);

Note: file continues on next slide

High-Level Consumer (3): Message Processing

        // Get our topic's stream and iterate over its messages
        for (KafkaStream<String, String> str : sm.get(topic)) {
            ConsumerIterator<String, String> i = str.iterator();

            // Process each incoming message
            while (i.hasNext()) {
                String message = i.next().message();
                System.out.println("Message was: " + message);
            }
        }
    }
}


Essential Points

§ Nodes running the Kafka service are called brokers
§ Producers publish messages to categories called topics
§ Messages in a topic are read by consumers
  – Multiple consumer instances can belong to a consumer group
  – Kafka retains messages for a defined (but configurable) amount of time
  – Consumers maintain an offset to track which messages they have read
§ Topics are divided into partitions for performance and scalability
  – These partitions are replicated for fault tolerance



Hands-On Exercise: Using Kafka from the Command Line

§ In this exercise, you will use Kafka's command line utilities to create a new topic, publish messages to the topic with a producer, and read messages from the topic with a consumer
  – Please refer to the Hands-On Exercise Manual for instructions

Integrating Flume and Kafka
Chapter 2

201601

Integrating Flume and Kafka

In this chapter you will learn

§ What to consider when choosing between Flume and Kafka for a use case
§ How Flume and Kafka can work together
§ How to configure a Flume source that reads from a Kafka topic
§ How to configure a Flume sink that publishes to a Kafka topic

Chapter Topics

Integrating Flume and Kafka

§ Overview
§ Use Cases
§ Configuration
§ Tips for Deployment
§ Essential Points
§ Hands-On Exercise: Using Kafka as a Flume Sink
§ Hands-On Exercise: Using Kafka as a Flume Source

Should I Use Kafka or Flume?

§ Both Flume and Kafka are widely used for data ingest
  – Although these tools differ, their functionality has some overlap
  – Some use cases could be implemented with either Flume or Kafka
§ How do you determine which is a better choice for your use case?

Characteristics of Flume

§ Flume is efficient at moving data from a single source into Hadoop
  – It offers sinks that write to HDFS, an HBase table, or a Solr index
  – Easily configured to support common scenarios, without writing code
  – Can also process and transform data during the ingest process

Characteristics of Kafka

§ Kafka is a publish-subscribe messaging system
  – It offers more flexibility for connecting multiple systems
  – Provides better durability and fault tolerance than Flume
  – Typically requires writing code for producers and/or consumers
  – No direct support for processing messages or loading into Hadoop

Flafka = Flume + Kafka

§ Both systems have strengths and limitations
§ You don't necessarily have to choose between them
  – It is possible to use both when implementing your use case
§ Flafka is the informal name for Flume-Kafka integration
  – It uses a Flume agent to read messages from, or write messages to, Kafka
§ It is implemented as a Kafka source and sink for Flume
  – These components ship with Flume, starting with CDH 5.2.0
  – A Kafka channel also now ships with Flume, starting with CDH 5.3.0


Using Flume as a Kafka Producer

§ By using the Kafka sink, Flume can publish messages to a topic
§ In this example, an application uses Flume to publish application events
  – The application sends data to the Flume source when events occur
  – The event data is buffered in the channel until it is taken by the sink
  – Since we use a Kafka sink, the events are published to a specified topic
  – Any Kafka consumer can then read messages for application events

[Figure: Application → Flume agent (netcat source → memory channel → Kafka sink) → Kafka cluster of three brokers → Consumer.]

Using Flume as a Kafka Consumer

§ By using the Kafka source, Flume can read messages from a topic
  – It can then write them to your destination of choice using a Flume sink
§ In this example, the producer sends messages to Kafka brokers
  – The Flume agent uses a Kafka source, which acts as a consumer
  – The Kafka source reads messages in a specified topic
  – The message data is buffered in the channel until it is taken by the sink
  – The sink then writes the data into HDFS

[Figure: Producer → Kafka cluster of three brokers → Flume agent (Kafka source → memory channel → HDFS sink) → Hadoop cluster.]


Configuration: Using Flume as a Kafka Producer (1)

§ The table below describes several properties of the Kafka sink

Name        Description
type        Must be set to org.apache.flume.sink.kafka.KafkaSink
brokerList  Comma-separated list of brokers (format host:port) to contact
topic       The topic in Kafka to which the messages will be published
batchSize   How many messages to process in one batch

[Figure: Application → Flume agent (netcat source → memory channel → Kafka sink) → Kafka cluster of three brokers → Consumer.]

Configuration: Using Flume as a Kafka Producer (2)

§ This is the Flume configuration for the example on the previous slide

# Define names for the source, channel, and sink
agent1.sources = source1
agent1.channels = channel1
agent1.sinks = sink1

# Define the properties of our source, which receives event data
agent1.sources.source1.type = netcat
agent1.sources.source1.bind = localhost
agent1.sources.source1.port = 44444
agent1.sources.source1.channels = channel1

# Define the properties of our channel
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 10000
agent1.channels.channel1.transactionCapacity = 1000

Note: file continues on next slide

Configuration: Using Flume as a Kafka Producer (3)

§ The remaining portion of the configuration file sets up the Kafka sink

# Define our Kafka sink, which publishes to the app_events topic
agent1.sinks.sink1.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.sink1.topic = app_events
agent1.sinks.sink1.brokerList = localhost:9092
agent1.sinks.sink1.batchSize = 20
agent1.sinks.sink1.channel = channel1

Configuration: Using Flume as a Kafka Consumer (1)

§ The table below describes several properties of the Kafka source

Name              Description
type              org.apache.flume.source.kafka.KafkaSource
zookeeperConnect  ZooKeeper connection string (e.g., localhost:2181)
groupId           Unique ID to use for the consumer group (default: flume)
topic             Name of Kafka topic from which messages will be read

[Figure: Producer → Kafka cluster of three brokers → Flume agent (Kafka source → memory channel → HDFS sink) → Hadoop cluster.]

©Copyright2010-2016Cloudera.Allrightsreserved.NottobereproducedorsharedwithoutpriorwriEenconsentfromCloudera. 02-16

Configuration: Using Flume as a Kafka Consumer (2)

§ This is the Flume configuration for the example on the previous slide
– It defines a source for reading messages from a Kafka topic

# Define names for the source, channel, and sink
agent1.sources = source1
agent1.channels = channel1
agent1.sinks = sink1

# Define a Kafka source that reads from the calls_placed topic
agent1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
agent1.sources.source1.zookeeperConnect = localhost:2181
agent1.sources.source1.topic = calls_placed
agent1.sources.source1.groupId = flume
agent1.sources.source1.channels = channel1

Note: file continues on next slide

Configuration: Using Flume as a Kafka Consumer (3)

§ The remaining portion of the configuration file sets up the channel and the HDFS sink

# Define the properties of our channel
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 10000
agent1.channels.channel1.transactionCapacity = 1000

# Define the sink that writes call data to HDFS
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/training/calls_placed
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.hdfs.fileSuffix = .csv
agent1.sinks.sink1.channel = channel1

Chapter Topics

Integrating Flume and Kafka

§ Overview

§ Use Cases

§ Configuration

§ Tips for Deployment

§ Essential Points

§ Hands-On Exercise: Using Kafka as a Flume Sink

§ Hands-On Exercise: Using Kafka as a Flume Source

Use Kafka for Custom Producers and Consumers

§ Kafka has a significantly smaller ecosystem of ready-made producers and consumers than Flume
– Use Kafka if you're prepared to implement producers and consumers

§ Use Flume if its sources and sinks match your requirements
– Flume has many built-in sources and sinks from which to choose
– Using them requires only configuration, not writing code

Use Flume for Filtering and Transforming Data

§ Flume can process data in-flight using interceptors
– These can be very useful for filtering or transforming data

§ Kafka requires an external stream processing system
– Spark Streaming is a popular choice
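As an illustration of in-flight filtering, the fragment below sketches how an interceptor might be attached to the source from the earlier examples. It uses Flume's built-in regex_filter interceptor; the interceptor name i1 and the pattern ^DEBUG are assumptions chosen for this sketch, not part of the course configuration.

```properties
# Attach one interceptor, named i1 here, to the source
agent1.sources.source1.interceptors = i1

# regex_filter keeps or drops events whose body matches the pattern
agent1.sources.source1.interceptors.i1.type = regex_filter
agent1.sources.source1.interceptors.i1.regex = ^DEBUG

# true drops matching events; false (the default) keeps only matching events
agent1.sources.source1.interceptors.i1.excludeEvents = true
```

With these settings, events whose body begins with DEBUG are discarded before they reach the channel, so they are never written to the sink.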

Use Kafka for High Availability

§ Both Kafka and Flume are reliable systems that can guarantee no data loss

§ However, Flume does not replicate events
– As a result, if a node with the Flume agent crashes, you will lose access to the events in the channel until you recover the disks
– This is true even when using the file channel

§ Use Kafka if you need an ingest pipeline with very high availability
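For reference, a file channel might be configured roughly as follows. This is a sketch only: the two directory paths are hypothetical and should be replaced with locations on the agent's local disks.

```properties
# Persist events to local disk instead of holding them in memory
agent1.channels.channel1.type = file

# Hypothetical paths, for illustration only
agent1.channels.channel1.checkpointDir = /var/lib/flume/checkpoint
agent1.channels.channel1.dataDirs = /var/lib/flume/data
```

Events in a file channel survive an agent restart, but they exist only on that one node's disks. This is why they remain unavailable while the node is down, as noted above.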

A Flume Agent Can Write to Multiple Sinks

§ You can configure a Flume agent to use multiple channels
– Each channel sends data to an associated sink

§ This can be used to write data to HDFS and Kafka simultaneously

[Diagram: a Flume agent whose source feeds two channels; one channel feeds a Kafka sink that publishes to a Kafka topic on a cluster of three brokers, and the other feeds an HDFS sink that writes data to HDFS. An application is also shown.]
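A two-channel layout of this kind could be sketched as below. The component names (channel2, kafkaSink, hdfsSink) and the HDFS path are assumptions made for illustration; the source's own properties (type, bind, port) are omitted and would follow the earlier netcat example.

```properties
agent1.sources = source1
agent1.channels = channel1 channel2
agent1.sinks = kafkaSink hdfsSink

# The source lists both channels; the replicating selector (the default)
# copies each event to every listed channel
agent1.sources.source1.channels = channel1 channel2
agent1.sources.source1.selector.type = replicating

agent1.channels.channel1.type = memory
agent1.channels.channel2.type = memory

# First channel drains to Kafka
agent1.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.kafkaSink.topic = app_events
agent1.sinks.kafkaSink.brokerList = localhost:9092
agent1.sinks.kafkaSink.channel = channel1

# Second channel drains to HDFS (hypothetical path)
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.hdfs.path = /user/training/app_events
agent1.sinks.hdfsSink.hdfs.fileType = DataStream
agent1.sinks.hdfsSink.channel = channel2
```

Note the asymmetry in the property names: a source takes a list in "channels", while each sink takes exactly one "channel".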


Essential Points

§ Flume and Kafka are distinct systems with different designs
– You must weigh the advantages and disadvantages of each when selecting the best tool for your use case

§ Flume and Kafka can be combined with Flafka
– This is the informal name for Flume components for Kafka integration
– You can read messages from a topic using a Kafka source
– You can publish messages to a topic using a Kafka sink

Bibliography

The following offer more information on topics discussed in this chapter

§ Flafka: Apache Flume Meets Apache Kafka for Event Processing
– http://tiny.cloudera.com/kmc02a

§ Designing Fraud-Detection Architecture That Works Like Your Brain Does
– http://tiny.cloudera.com/kmc02b


Hands-On Exercise: Using Kafka as a Flume Sink (Flafka)

§ In this exercise, you will use Flume's Kafka sink to publish data received by a Flume agent to a Kafka topic
– Please refer to the Hands-On Exercise Manual for instructions


Hands-On Exercise: Using Kafka as a Flume Source (Flafka)

§ In this exercise, you will use Flume's Kafka source to read data published to a Kafka topic and write it to a directory in HDFS
– Please refer to the Hands-On Exercise Manual for instructions