Hugfr SPARK & RIAK -20160114_hug_france
![Page 1: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/1.jpg)
SPARK & RIAK
INTRODUCTION TO THE SPARK-RIAK-CONNECTOR
LATERALTHOUGHTS
![Page 2: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/2.jpg)
Me, Myself & I
Associate at LateralThoughts.com
Scala, Java, Python Developer
Data Engineer @ Axa & Carrefour
Apache Spark Trainer with Databricks
![Page 3: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/3.jpg)
And the Other One …
Director Sales @ Basho Technologies
(Basho make Riak)
Formerly at MySQL France
Co-Founder MariaDB
Funny Accent
![Page 4: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/4.jpg)
Quick Introduction …
2011 Creators of Riak
Riak KV: NoSQL key value database
Riak S2: Large Object Storage
2015 New Products
Basho Data Platform: Integrated NoSQL databases, caching, in-memory analytics, and search
Riak TS: NoSQL Time Series database
120+ employees
Global Offices: Seattle (HQ), Washington DC, London, Paris, Tokyo
300+ Enterprise customers, 1/3 of the Fortune 50
![Page 5: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/5.jpg)
![Page 6: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/6.jpg)
PRIORITIZED NEEDS
High Availability – Critical Data
High Scale – Heavy Reads & Writes
Geo Locality – Multiple Data Centers
Operational Simplicity – Resources Don’t Scale as Clusters
Data Accuracy – Write Conflict Options
RIAK S2 USE CASES
Large Object Store
Content Distribution
Web & Cloud Services
Active Archives
RIAK KV USE CASES
User Data
Session Data
Profile Data
Real-time Data
Log Data
RIAK TS USE CASES
IoT/Devices
Financial/Economic
Scientific Observations
Log Data
![Page 7: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/7.jpg)
The Evolution of NoSQL
Unstructured Data Platforms
Multi-Model Solutions
Point Solutions
![Page 8: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/8.jpg)
Basho Data Platform …
![Page 9: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/9.jpg)
ABOUT SPARK & RIAK
![Page 10: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/10.jpg)
Spark & Riak
Disclaimer: the following presentation uses:
Spark v1.5.2
Spark-Riak-Connector v1.1.0
![Page 11: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/11.jpg)
Pre-Requisites
To use the Spark Riak Connector, as of now, you need to build it yourself:
Clone https://github.com/basho/spark-riak-connector
`git checkout v1.1.0`
`mvn clean install`
![Page 12: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/12.jpg)
Bootstrapped project
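The project layout shown on this slide is not reproduced in the transcript. As a rough sketch only, an sbt build that picks up the locally installed connector might look like the fragment below. The Maven coordinates `com.basho.riak` / `spark-riak-connector` and the Scala version are assumptions, not taken from the slides; check them against the `pom.xml` of the tag you built.

```scala
// build.sbt (sketch; coordinates and Scala version are assumptions)
name := "spark-riak-example"

// Spark 1.5.2 is published for Scala 2.10 by default.
scalaVersion := "2.10.6"

// `mvn clean install` puts the connector in the local Maven repository.
resolvers += Resolver.mavenLocal

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"           % "1.5.2" % "provided",
  "com.basho.riak"   %  "spark-riak-connector" % "1.1.0"
)
```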
![Page 13: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/13.jpg)
Reading from Riak
Connect to a Riak KV Cluster from Spark
Query it:
Full Scan
Using Keys
Using secondary indexes (2i)
![Page 14: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/14.jpg)
Connecting to Riak
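The connection code on this slide did not survive in the transcript. A minimal sketch, assuming the connector reads its node list from the `spark.riak.connection.host` Spark property and that a Riak KV node listens on the default protocol buffers port on localhost, could be:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RiakConnection {
  // Assumed address of one Riak KV node (protocol buffers port).
  val riakHost = "127.0.0.1:8087"

  // The connector is configured through ordinary Spark properties.
  val conf: SparkConf = new SparkConf()
    .setAppName("spark-riak-example")
    .setMaster("local[*]")
    .set("spark.riak.connection.host", riakHost)

  // Lazy, so Spark only starts when the context is first used.
  lazy val sc: SparkContext = new SparkContext(conf)
}
```

The application name, master URL and host value are placeholders for illustration; on a real cluster the master would normally come from `spark-submit`.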
![Page 15: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/15.jpg)
Loading data from Riak
On your Spark Context, you can use:
riakBucket[V](bucketName: String): RiakRDD[V]
riakBucket[V](bucketName: String, bucketType: String): RiakRDD[V]
riakBucket[K, V](bucketName: String, convert: (Location, RiakObject) => (K, V)): RiakRDD[(K, V)]
…
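To make the three signatures above concrete, here is a hedged sketch of how each might be called. The bucket name `users` and the `toPair` converter are hypothetical; the `Location` and `RiakObject` types are assumed to be the ones from the Riak Java client, and the `RiakRDD` import path is an assumption about the connector's package layout.

```scala
import org.apache.spark.SparkContext
import com.basho.riak.client.core.query.{Location, RiakObject}
import com.basho.riak.spark._            // implicits adding riakBucket to SparkContext
import com.basho.riak.spark.rdd.RiakRDD  // assumed package of RiakRDD

object RiakBuckets {
  // Hypothetical bucket name, for illustration only.
  val Bucket = "users"

  // Values only, bucket name alone.
  def values(sc: SparkContext): RiakRDD[String] =
    sc.riakBucket[String](Bucket)

  // Values only, with the bucket type given explicitly.
  def typedValues(sc: SparkContext): RiakRDD[String] =
    sc.riakBucket[String](Bucket, "default")

  // Converter from the raw Riak location and object to a (key, value) pair.
  val toPair: (Location, RiakObject) => (String, String) =
    (location, obj) => (location.getKeyAsString, obj.getValue.toStringUtf8)

  // Key/value pairs built by the converter.
  def pairs(sc: SparkContext): RiakRDD[(String, String)] =
    sc.riakBucket[String, String](Bucket, toPair)
}
```

None of these calls selects any data yet; each returns a `RiakRDD` that still needs a query applied to it.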
![Page 16: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/16.jpg)
add a query, otherwise…
![Page 17: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/17.jpg)
Find all:
Find by key(s):
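The two queries named on this slide were shown as images. A sketch of what they could look like, assuming the connector's `queryAll` and `queryBucketKeys` methods on `RiakRDD` and a bucket whose values can be read as strings:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import com.basho.riak.spark._ // implicits adding riakBucket to SparkContext

object RiakQueries {
  // Full scan: every value stored in the bucket.
  def findAll(sc: SparkContext, bucket: String): RDD[String] =
    sc.riakBucket[String](bucket).queryAll()

  // Only the values stored under the given keys.
  def findByKeys(sc: SparkContext, bucket: String, keys: String*): RDD[String] =
    sc.riakBucket[String](bucket).queryBucketKeys(keys: _*)
}
```

For example, `RiakQueries.findByKeys(sc, "users", "alice", "bob").collect()` would fetch two objects, where the bucket and key names are made up for illustration. A full scan touches every object in the bucket, so key or index queries are usually preferable on large buckets.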
![Page 18: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/18.jpg)
Implicits that will give you the riak* methods
![Page 19: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/19.jpg)
Reading from Riak
Using case classes
Using Secondary Indexes
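As an illustration of both points, the sketch below reads values into a case class and filters with secondary indexes. It assumes the stored values are JSON documents whose fields match the case class, and that the connector exposes `query2iRange` and `query2iKeys`; the `User` class, the bucket name and the index names are all hypothetical.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import com.basho.riak.spark._ // implicits adding riakBucket to SparkContext

object RiakIndexQueries {
  // Hypothetical shape of the JSON values stored in the bucket.
  case class User(name: String, email: String)

  // Objects whose integer index "age" lies between 18 and 30.
  def byAgeRange(sc: SparkContext): RDD[User] =
    sc.riakBucket[User]("users").query2iRange("age", 18L, 30L)

  // Objects whose index "city" equals one of the listed values.
  def byCities(sc: SparkContext): RDD[User] =
    sc.riakBucket[User]("users").query2iKeys("city", "Paris", "Lyon")
}
```

Secondary indexes have to be written alongside each object when it is stored, so these queries only return objects that were indexed at write time.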
![Page 20: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/20.jpg)
Basic I/O
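The I/O code from this slide is missing from the transcript. A minimal write-then-read sketch, assuming the connector's `saveToRiak` method on RDDs and a hypothetical bucket name, might be:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import com.basho.riak.spark._ // implicits adding riakBucket and saveToRiak

object RiakSave {
  // Hypothetical bucket and sample values, for illustration only.
  val Bucket = "test-bucket"
  val sample: Seq[String] = Seq("first value", "second value")

  // Write the sample values to the bucket.
  def save(sc: SparkContext): Unit =
    sc.parallelize(sample).saveToRiak(Bucket)

  // Read everything back from the same bucket.
  def load(sc: SparkContext): RDD[String] =
    sc.riakBucket[String](Bucket).queryAll()
}
```

How keys are chosen for the stored values depends on the connector's write mapper, so check the connector documentation before relying on a particular key scheme.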
![Page 21: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/21.jpg)
Mapping Objects - Buckets
![Page 22: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/22.jpg)
![Page 23: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/23.jpg)
Adding fields during save
![Page 24: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/24.jpg)
Spark Riak Connector - Roadmap
Better Integration with Riak TS
Enhanced DataFrames - based on Riak TS Schema APIs
Server-side aggregations and grouping - using TS SQL commands
Speed
Data Locality (partition RDDs according to replication in the cluster) - launch Spark executors on the same nodes where the data resides.
Better mapping from vnodes to Spark workers using coverage plan
Better support for Riak data types (CRDT) and Search queries
Today requires using Java Riak client APIs
Spark Streaming
Provide example and sample integration with Apache Kafka
Improve reliability using Riak for checkpoints and WAL
Add examples and documentation for Python support
DRAFT
![Page 25: Hugfr SPARK & RIAK -20160114_hug_france](https://reader036.fdocuments.us/reader036/viewer/2022081414/58a197c01a28ab97118b5e39/html5/thumbnails/25.jpg)
Thank you
@ogirardot
https://github.com/ogirardot/spark-riak-example
https://speakerdeck.com/ogirardot/spark-and-riak-introduction-to-the-spark-riak-connector
@mcarney23
fr.basho.com