Webinar: MongoDB Connector for Spark

Tweet using #MongoDBWebinar Follow @blimpyacht & @mongodb MongoDB Connector For Spark @blimpyacht

Transcript of Webinar: MongoDB Connector for Spark

Page 1: Webinar: MongoDB Connector for Spark


MongoDB Connector For Spark

@blimpyacht

Page 2

Page 3

Distributed Data

[Diagram: HDFS]

Page 4

Distributed Resources

[Diagram: Spark Stand Alone, YARN, Mesos, HDFS]

Page 5

Distributed Processing

[Diagram: Spark and Hadoop, with Spark Stand Alone, YARN, Mesos, HDFS]

Page 6

Domain Specific Languages

[Diagram: Hive and Pig added to Spark, Hadoop, Spark Stand Alone, YARN, Mesos, HDFS]

Page 7

[Diagram: Spark SQL, Spark Shell and Spark Streaming added alongside Hive and Pig, with Spark, Hadoop, Spark Stand Alone, YARN, Mesos, HDFS]

Page 8

[Diagram: Spark SQL, Spark Shell, Spark Streaming, Hive, Pig, Spark, Hadoop, Spark Stand Alone, YARN, Mesos; HDFS no longer shown]

Page 9

[Diagram: Spark SQL, Spark Shell and Spark Streaming on Spark, with Stand Alone, YARN and Mesos]

Page 10

Page 11

[Diagram: Driver Application with the Spark Connector, a Master, and two Worker Nodes, each running an executor]

Page 12

[Diagram: the data is parallelized into four partitions]
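The parallelize step can be sketched in plain Scala, without Spark. The helpers `slicePositions` and `toPartitions` are hypothetical; they follow the same even-slicing arithmetic that Spark's `parallelize` applies to a local sequence, cutting it into `numSlices` contiguous pieces whose sizes differ by at most one.

```scala
// Hypothetical helper: [start, end) index ranges that split `length`
// elements into `numSlices` contiguous, nearly equal slices.
def slicePositions(length: Int, numSlices: Int): Seq[(Int, Int)] = {
  require(numSlices > 0, "numSlices must be positive")
  (0 until numSlices).map { i =>
    val start = ((i.toLong * length) / numSlices).toInt
    val end = (((i + 1).toLong * length) / numSlices).toInt
    (start, end)
  }
}

// Cut a local collection into partitions using those ranges.
def toPartitions[A](data: Seq[A], numSlices: Int): Seq[Seq[A]] =
  slicePositions(data.length, numSlices).map { case (start, end) =>
    data.slice(start, end)
  }

val partitions = toPartitions(1 to 10, 4)
partitions.foreach(p => println(p.mkString("[", ", ", "]")))
// prints [1, 2], [3, 4, 5], [6, 7] and [8, 9, 10] on separate lines
```

Each of the four slices would then be handed to a separate task, which is what the four boxes on the slide represent.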

Page 13

[Diagram: each of the four partitions is transformed]

Page 14

Transformations
filter( func )
union( otherDataset )
intersection( otherDataset )
distinct( [numPartitions] )
map( func )
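Transformations such as `filter` and `map` are lazy in Spark: they only record what should happen, and no data is processed until an action runs. A Spark-free way to see the same behavior is a Scala collection view. The sketch below is an analogy only (the counter `mapCalls` is hypothetical instrumentation), not Spark code.

```scala
// Counts how many times the mapping function actually runs.
var mapCalls = 0

val source = (1 to 10).toVector

// Like Spark transformations, calls on a view only describe the work;
// no element is filtered or mapped yet.
val pipeline = source.view
  .filter(_ % 2 == 0)
  .map { n => mapCalls += 1; n * 10 }

val callsBeforeForcing = mapCalls // 0: nothing has been evaluated

// Forcing the view (here with toList) plays the role of an action.
val result = pipeline.toList
println(s"calls before: $callsBeforeForcing, after: $mapCalls, result: $result")
```

The mapping function runs only for the five even numbers, and only once the view is forced.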

Page 15

[Diagram: a second transformation is applied to each of the four partitions]

Page 16

[Diagram: Parallelize → Transform → Transform → Action for each of the four partitions]

Page 17

Actions
collect()
count()
first()
take( n )
reduce( function )
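Actions return values to the driver, typically by computing a partial result in each partition and then combining those results. The sketch below models the five actions as plain Scala functions over a list of partitions; the function names mirror Spark's but are hypothetical stand-ins, not the RDD API.

```scala
// Four partitions, as produced by splitting 1 to 10 into four slices.
val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2), Seq(3, 4, 5), Seq(6, 7), Seq(8, 9, 10))

// count(): each partition counts its own elements; the results are summed.
def count[A](parts: Seq[Seq[A]]): Long = parts.map(_.size.toLong).sum

// collect(): every element is brought back to the driver, in partition order.
def collect[A](parts: Seq[Seq[A]]): Seq[A] = parts.flatten

// take(n): partitions are scanned in order until n elements are found.
def take[A](parts: Seq[Seq[A]], n: Int): Seq[A] =
  parts.iterator.flatMap(_.iterator).take(n).toList

// first(): equivalent to take(1), failing if there are no elements.
def first[A](parts: Seq[Seq[A]]): A = take(parts, 1).head

// reduce(f): each non-empty partition is reduced locally, then the partial
// results are reduced again. f must be associative and commutative because
// partial results may be combined in any order.
def reduce[A](parts: Seq[Seq[A]])(f: (A, A) => A): A =
  parts.filter(_.nonEmpty).map(_.reduce(f)).reduce(f)

println(count(partitions))         // 10
println(reduce(partitions)(_ + _)) // 55
```

The two-level shape of `reduce` is why Spark's documentation asks for an associative and commutative function.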

Page 18

[Diagram: Parallelize → Transform → Transform → Action → Result for each of the four partitions]

Page 19

[Diagram: Parallelize → Transform → Transform → Action → Result for each of the four partitions, labeled Lineage]
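Spark records the lineage of each partition (where its data came from and which transformations produced it), so a lost partition can be recomputed instead of restored from a copy. The sketch below is a simplified model under that idea: the `Lineage` case class and the "lost partition" are hypothetical and do not reflect Spark's internal classes.

```scala
// Simplified model of a partition's lineage: the source data plus the ordered
// transformations applied to it (hypothetical type, not Spark's internals).
final case class Lineage(source: Seq[Int], steps: List[Seq[Int] => Seq[Int]]) {
  // Replaying every step against the source rebuilds the partition from scratch.
  def compute(): Seq[Int] = steps.foldLeft(source)((data, step) => step(data))
}

val sourcePartitions = Seq(Seq(1, 2), Seq(3, 4, 5), Seq(6, 7), Seq(8, 9, 10))
val steps = List[Seq[Int] => Seq[Int]](xs => xs.map(_ * 2), xs => xs.filter(_ > 5))
val lineages = sourcePartitions.map(p => Lineage(p, steps))

// Results held in memory, keyed by partition index.
val cached: Map[Int, Seq[Int]] =
  lineages.indices.map(i => i -> lineages(i).compute()).toMap

// Simulate losing partition 2 (for example, its executor fails).
val afterFailure = cached - 2

// Only the missing partition is rebuilt from its lineage; the rest are reused.
val restored = lineages.indices.map(i => afterFailure.getOrElse(i, lineages(i).compute()))
```

Because the steps are deterministic, replaying them yields exactly the partition that was lost.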

Page 20

[Diagram: Parallelize → Transform → Transform → Action for each of the four partitions]

Page 21

Page 22

[Diagram: Parallelize → Transform → Transform → Action → Result for each of the four partitions]

Page 23

Using the Connector

Page 24

https://github.com/mongodb/mongo-spark

Page 25

Page 26

http://spark.apache.org/docs/latest/

Page 27

Page 28

Page 29

{
  "_id" : ObjectId("578be1fe1fe699f2deb80807"),
  "user_id" : 196,
  "movie_id" : 242,
  "rating" : 3,
  "timestamp" : 881250949
}
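A typed view of the sample rating document helps when working with it in Scala. The `MovieRating` case class below is hypothetical, `_id` is left out, and `timestamp` is assumed to be Unix epoch seconds; the document itself does not state the unit.

```scala
import java.time.Instant

// Hypothetical case class mirroring the sample document's fields.
// Assumption: `timestamp` holds Unix epoch seconds.
final case class MovieRating(userId: Int, movieId: Int, rating: Int, timestamp: Long) {
  def ratedAt: Instant = Instant.ofEpochSecond(timestamp)
}

val sample = MovieRating(userId = 196, movieId = 242, rating = 3, timestamp = 881250949L)
println(sample.ratedAt) // prints 1997-12-04T15:55:49Z
```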

Page 30

./bin/spark-shell \
  --conf "spark.mongodb.input.uri=mongodb://127.0.0.1/movies.movie_ratings" \
  --conf "spark.mongodb.output.uri=mongodb://127.0.0.1/movies.user_recommendations" \
  --packages org.mongodb.spark:mongo-spark-connector_2.10:1.0.0

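The input and output URIs in the spark-shell command name their target as `database.collection` after the host (`movies.movie_ratings` and `movies.user_recommendations`). As a rough illustration, the hypothetical helper `namespaceOf` below pulls that namespace apart; it is not the connector's parser, and it ignores credentials, replica-set host lists and other URI features.

```scala
// Hypothetical helper: extract (database, collection) from a simple URI of the
// form mongodb://host[:port]/database.collection[?options].
// Database names cannot contain '.', so the first dot separates the two parts.
def namespaceOf(uri: String): Option[(String, String)] = {
  val prefix = "mongodb://"
  if (!uri.startsWith(prefix)) None
  else {
    val afterHosts = uri.drop(prefix.length).dropWhile(_ != '/').drop(1)
    val path = afterHosts.takeWhile(_ != '?')
    path.indexOf('.') match {
      case -1 => None
      case i  => Some((path.take(i), path.drop(i + 1)))
    }
  }
}

println(namespaceOf("mongodb://127.0.0.1/movies.movie_ratings")) // Some((movies,movie_ratings))
```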
Page 31

Page 32

Page 33

Page 34

import com.mongodb.spark._
import com.mongodb.spark.rdd.MongoRDD
import org.bson.Document

// Load the collection named by spark.mongodb.input.uri as a MongoRDD[Document]
val rdd = sc.loadFromMongoDB()

// Print the first ten documents
for ( doc <- rdd.take( 10 ) ) println( doc )

Page 35

Read Config

Write Config
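Read Config and Write Config most likely refer to the connector's `ReadConfig` and `WriteConfig` classes, which let a single read or write override the defaults supplied with `--conf`. The sketch below does not use the connector; it models only the override rule with plain maps, and the option values (such as the `movie_ratings_archive` collection) are hypothetical.

```scala
// Session-wide defaults, as passed with --conf when starting spark-shell.
val sessionDefaults = Map(
  "uri" -> "mongodb://127.0.0.1/movies.movie_ratings",
  "readPreference.name" -> "primary"
)

// Hypothetical per-read overrides: another collection and read preference.
val overrides = Map(
  "collection" -> "movie_ratings_archive",
  "readPreference.name" -> "secondaryPreferred"
)

// Right-hand entries win on key collisions: overridden settings are replaced,
// and every setting the overrides do not mention is inherited.
val effective = sessionDefaults ++ overrides
effective.foreach { case (k, v) => println(s"$k = $v") }
```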

Page 36

Aggregation Filters: $match | $project | $group

Page 37

[Diagram: a collection of JSON documents]

Page 38

Page 39

// Run a $match stage inside MongoDB so that only matching documents are loaded into Spark
val aggRdd = rdd.withPipeline( Seq( Document.parse( "{ $match: { Country: \"USA\" } }" ) ) )
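A pipeline passed to `withPipeline` runs inside MongoDB, so Spark receives only the documents (or grouped rows) it produces. To show what a `$match` followed by a `$group` computes, the sketch below emulates both stages over in-memory records. The `Rating` case class, the sample values and the pipeline in the comment are hypothetical illustrations, not output from the connector.

```scala
// Hypothetical in-memory stand-in for documents in movies.movie_ratings.
final case class Rating(userId: Int, movieId: Int, rating: Int)

val ratings = Seq(
  Rating(1, 10, 5), Rating(2, 10, 4), Rating(3, 10, 2),
  Rating(1, 20, 4), Rating(2, 20, 3), Rating(3, 30, 5)
)

// Emulates the pipeline:
// [ { $match: { rating: { $gte: 4 } } },
//   { $group: { _id: "$movie_id", count: { $sum: 1 } } } ]

// $match: keep only documents that satisfy the predicate.
val matched = ratings.filter(_.rating >= 4)

// $group: one output row per distinct movie, counting the matching ratings.
val highRatingCounts: Map[Int, Int] =
  matched.groupBy(_.movieId).map { case (movieId, rs) => movieId -> rs.size }

println(highRatingCounts)
```

Filtering as early as possible matters more with the connector than in this sketch, because every document removed by `$match` is one fewer document sent over the network to Spark.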

Page 40

Spark SQL + DataFrames

Page 41

RDD + Schema = DataFrame

Page 42

Page 43

[Diagram: a collection of JSON documents]

$sample
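Documents in one collection need not share a structure, so a DataFrame schema has to be inferred. In outline, the connector does this by reading a random sample of documents (the `$sample` on this slide) and deriving field types from them. The sketch below shows the idea in plain Scala; the documents are hypothetical, and the rule that conflicting types fall back to `"string"` is an assumption made for illustration, not necessarily the connector's rule.

```scala
import scala.util.Random

// Hypothetical documents: field name -> value, loosely mimicking BSON documents.
// The third one stores `rating` as a string, which creates a type conflict.
val docs: Seq[Map[String, Any]] = Seq(
  Map("user_id" -> 196, "movie_id" -> 242, "rating" -> 3, "timestamp" -> 881250949L),
  Map("user_id" -> 7, "movie_id" -> 11, "rating" -> 5, "timestamp" -> 881251000L),
  Map("user_id" -> 9, "movie_id" -> 11, "rating" -> "4", "timestamp" -> 881251060L)
)

// Name of the type used for one value.
def typeName(v: Any): String = v match {
  case _: Int    => "int"
  case _: Long   => "long"
  case _: String => "string"
  case _         => "unknown"
}

// Infer a schema from a random sample of `n` documents (fixed seed for
// repeatability). Assumption: a field seen with several types becomes "string".
def inferSchema(all: Seq[Map[String, Any]], n: Int, seed: Long): Map[String, String] = {
  val sample = new Random(seed).shuffle(all).take(n)
  sample
    .flatMap(_.toSeq)
    .groupBy { case (field, _) => field }
    .map { case (field, pairs) =>
      val types = pairs.map { case (_, value) => typeName(value) }.distinct
      field -> (if (types.size == 1) types.head else "string")
    }
}

println(inferSchema(docs, docs.size, seed = 42L))
```

The trade-off is visible even at this scale: a sample that happens to skip the third document would report `rating` as `int`, so a larger sample catches more irregular documents at the cost of a slower start.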

Page 44

Page 45

Data Locality

[Diagram: mongos]

Page 46

Courses and Resources

Page 47

https://university.mongodb.com/courses/M233/about

Page 48

https://databricks.com/blog

Page 49

MongoDB Connector Highlights

DataSource

Page 50

Use Cases

1. Real-Time

Page 51

Use Cases

2. Batch

Page 52

THANKS! @blimpyacht