Spark Streaming Info
Uploaded by doug-chang
Spark Streaming
Much easier than Storm. Replaces Storm spouts/bolts with Akka Actors.
Better API (makes time part of the API) and better integration. Versions used: Hadoop 2.3 / Spark 0.9.1.
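To make "time is part of the API" concrete, here is a plain-Scala model with no Spark dependency. It assumes the usual micro-batch picture: a DStream is a sequence of batches, one per batch interval, and a windowed reduction (the idea behind Spark's reduceByKeyAndWindow) combines the most recent batches. The object MicroBatchModel, the Event type and the batch arithmetic are hypothetical and only illustrate the concept; they are not Spark's implementation.

```scala
// Conceptual model only, not Spark code: groups timestamped records into
// fixed batch intervals and computes per-batch and windowed counts.
object MicroBatchModel {
  // Hypothetical timestamped record, e.g. a hashtag seen at time tMs.
  final case class Event(tMs: Long, tag: String)

  // Assign each event to the batch starting at (tMs / intervalMs) * intervalMs.
  def toBatches(events: Seq[Event], intervalMs: Long): Map[Long, Seq[String]] =
    events
      .groupBy(e => (e.tMs / intervalMs) * intervalMs)
      .map { case (start, es) => start -> es.map(_.tag) }

  // Count within one batch, analogous to map((_, 1)).reduceByKey(_ + _) on one RDD.
  def countBatch(tags: Seq[String]): Map[String, Int] =
    tags.groupBy(identity).map { case (t, ts) => t -> ts.size }

  // Count over the last `windowBatches` batches ending at the batch starting at
  // endStart, i.e. a window of windowBatches * intervalMs.
  def windowCount(batches: Map[Long, Seq[String]], intervalMs: Long,
                  endStart: Long, windowBatches: Int): Map[String, Int] = {
    val starts = (0 until windowBatches).map(i => endStart - i * intervalMs)
    countBatch(starts.flatMap(s => batches.getOrElse(s, Seq.empty)))
  }
}
```

With a 1000 ms interval, events at 100 ms and 900 ms fall into batch 0, and a two-batch window ending at batch 2000 only sees batches 1000 and 2000.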
sbt setup
Create a separate sbt project; sbt run includes the jars and sets the class path.
Batch and streaming quick start: http://spark.apache.org/docs/latest/quick-start.html
Create a project directory and add the dependencies, written in sbt's Scala form of Maven coordinates:
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.3.0"
scalaVersion := "2.10.3"
Manage the sbt and Scala versions locally.
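A complete build.sbt along these lines might look as follows. It is a sketch assuming the Spark 0.9.1 / Scala 2.10 / Hadoop 2.3.0 combination named on these slides; the project name and version are placeholders, and the Kafka and Twitter artifacts are only needed for the corresponding demos. The blank lines between settings are required by sbt releases older than 0.13.7.

```scala
// build.sbt (sketch): placeholder name/version, versions taken from the slides
name := "spark-streaming-demo"

version := "0.1"

scalaVersion := "2.10.3"

// %% appends the Scala binary version (_2.10) to the Spark artifact names
libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-streaming"         % "0.9.1",
  "org.apache.spark"  %% "spark-streaming-kafka"   % "0.9.1",
  "org.apache.spark"  %% "spark-streaming-twitter" % "0.9.1",
  "org.apache.hadoop" %  "hadoop-client"           % "2.3.0"
)
```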
Maven setup
Run the demo using Maven/Eclipse. This is easier, since Maven Central makes it simple to find jars/artifacts. Add the external libs to the local repo with Maven by running mvn package in the Spark source distribution.
Eclipse: add the Scala Nature and configure it as a Maven project.
Demo
Connect to the Twitter stream and process it. Test the Twitter4j connection with Java first by printing out a Twitter stream.
Batch mode: end with sc.stop(). Real-time streaming: block with stream.awaitTermination(). DStreams, like Scala Streams, use lazy evaluation.
Create a stream using #::, analogous to the recursive List operator ::, e.g. (#iphone,1) #:: (#android,3) #:: (#apple,10). Unlike a List, head and tail behave differently: the head is a val (evaluated eagerly), while the tail is evaluated lazily.
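The head/tail difference can be observed directly with the standard library's Stream (the Scala 2.10-era type; it is deprecated from Scala 2.13 in favor of LazyList, whose head is also lazy). This sketch logs each element as it is evaluated; the object LazyStreamDemo and its tag helper are hypothetical names used only for illustration.

```scala
import scala.collection.mutable.ListBuffer

object LazyStreamDemo {
  // Builds the slide's three-element stream. Each element passes through tag,
  // which records it in log, so we can see when it is actually evaluated.
  def build(log: ListBuffer[String]): Stream[(String, Int)] = {
    def tag(t: (String, Int)): (String, Int) = { log += t._1; t }
    // #:: evaluates its left operand (the head) immediately,
    // but takes the rest of the stream (the tail) by name.
    tag(("#iphone", 1)) #:: tag(("#android", 3)) #:: tag(("#apple", 10)) #:: Stream.empty
  }
}
```

Right after build returns, only "#iphone" has been logged; forcing the stream (for example with toList) evaluates the remaining elements in order.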
Spark Streams
StreamingContext.start() starts the scheduler.
JobScheduler.scala: starts the JobGenerator and runs the generated jobs in a thread pool.
JobGenerator.scala: starts the event actor and the checkpoint writer for each thread.
Storage: a DStream's receiver appends incoming data to a BlockGenerator.
BlockGenerator.scala: the Spark BlockGenerator uses 2 threads (a block-interval timer and a block-pushing thread). On termination, it waits for the block-pushing thread to join.
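The two-thread storage design can be illustrated with a simplified model in plain Scala. The class SimpleBlockGenerator and its methods are hypothetical names for this sketch, not Spark's source: one thread cuts the records received so far into a block every interval and queues it, a second thread pushes queued blocks, and stop() halts the timer and then waits for the push thread to join so that no queued block is lost.

```scala
import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}
import scala.collection.mutable.ArrayBuffer

// Simplified, hypothetical model of a two-thread block generator.
class SimpleBlockGenerator(intervalMs: Long, pushBlock: Vector[String] => Unit) {
  private val buffer = ArrayBuffer.empty[String] // records since the last cut
  private val blocksForPushing = new LinkedBlockingQueue[Vector[String]]()
  @volatile private var stopping = false
  @volatile private var timerDone = false

  // Called by the receiver for each incoming record.
  def addData(record: String): Unit = buffer.synchronized { buffer += record }

  // Move the buffered records into a new block and queue it for pushing.
  private def cutBlock(): Unit = {
    val block = buffer.synchronized { val b = buffer.toVector; buffer.clear(); b }
    if (block.nonEmpty) blocksForPushing.put(block)
  }

  // Thread 1: cut a block every intervalMs, plus one final cut after stop().
  private val timerThread = new Thread(new Runnable {
    def run(): Unit = {
      while (!stopping) { Thread.sleep(intervalMs); cutBlock() }
      cutBlock()
      timerDone = true
    }
  })

  // Thread 2: push blocks until the timer has finished and the queue is drained.
  private val pushThread = new Thread(new Runnable {
    def run(): Unit =
      while (!timerDone || !blocksForPushing.isEmpty) {
        val block = blocksForPushing.poll(10, TimeUnit.MILLISECONDS)
        if (block != null) pushBlock(block)
      }
  })

  def start(): Unit = { timerThread.start(); pushThread.start() }

  // Stop the timer first, then wait for the push thread to join.
  def stop(): Unit = { stopping = true; timerThread.join(); pushThread.join() }
}
```

In this model a receiver would call addData for each record, and pushBlock stands in for storing a finished block. Joining the push thread last is what guarantees that records added just before stop() are still delivered.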
Kafka Streaming Demo
KafkaUtils/consumer connection, using the I0Itec connection library (zkclient). It needs more features and testing for faults. Read the source to see how to fill out the parameters. Start ZooKeeper, start a producer, define a topic, etc.
Send data from the producer.
![Page 7: Spark Streaming Info](https://reader036.fdocuments.us/reader036/viewer/2022072000/55d4dc6abb61ebb1238b4570/html5/thumbnails/7.jpg)
Demo output: messages typed into the console producer arriving at the Spark consumer.
Producer/Executor
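The groupId in the consumer call (the third argument to KafkaUtils.createStream, "1" below) was set to match the broker.id in the server config file. Kafka does not require the two to match: broker.id identifies the broker, while the group ID names the consumer group.
// five receiver-based input streams on "testtopic", all in consumer group "1"
val kafkaInputs = (1 to 5).map { _ =>
  KafkaUtils.createStream(stream, "localhost:2181", "1", Map("testtopic" -> 1))
}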
Producer
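Call stream.awaitTermination() to block the main thread indefinitely, so the application keeps running and you can see what you enter into the producer. Start with a single receiver: in local mode each receiver occupies one of the local[n] threads, so local[2] leaves one thread for processing.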
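import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// local[2]: one thread for the Kafka receiver, one for processing; 1-second batches
val stream = new StreamingContext("local[2]", "TestObject", Seconds(1))
val kafkaMessages = KafkaUtils.createStream(stream, "localhost:2181", "1", Map("testtopic" -> 1))

// To add 5 more receivers, uncomment and raise the master to at least local[7]
// (six receivers plus one processing thread):
// val kafkaInputs = (1 to 5).map { _ =>
//   KafkaUtils.createStream(stream, "localhost:2181", "1", Map("testtopic" -> 1))
// }

kafkaMessages.print()     // print the first ten elements of each batch
stream.start()            // start receiving and processing
stream.awaitTermination() // block until the context is stopped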