IBM Bluemix Paris meetup - Big Data & Analytics in the Cloud - Epitech - 201611094
© 2015 IBM Corporation
Get Data, Build Apps and Analyze Data Using IBM Bluemix Data and Analytics (Session 6748)
Eric Cattoir - @CattoirEric, Yves Debeer - @yvesdebeer, Bert Waltniel - @BertWaltniel
Innovation is the new currency
“Two guys in a Starbucks can have access to the same computing power as a Fortune 500 company.”
Jim Deters, Founder, Galvanize
To really disrupt, a business should focus on building differentiation and rent the rest
Devs can quickly compose apps with new APIs and digital services to add features and increase engagement in areas like:
• Analytics, cognition
• Mobile, location
• Internet of Things
• Social engagement
• Identity
• Reviews
• Travel
• Messaging …
• Their company’s private APIs and services
Bluemix started as a public PaaS
Bluemix started with a major focus on developer productivity in the public cloud.
Figure: Infrastructure as a Service vs. Platform as a Service on IBM SoftLayer; each stack (Code, Data, Runtime, Middleware, OS, Virtualization, Servers, Storage, Networking) is split into customer-managed and service-provider-managed layers.
We listened. Now we’re evolving to become even more flexible.
Capabilities in Bluemix now span PaaS and IaaS and can be delivered as a public, dedicated, or on-premises* implementation.
Figure: Infrastructure as a Service vs. Platform as a Service on IBM SoftLayer (customer-managed vs. service-provider-managed layers of the Code-to-Networking stack).
How does Bluemix work?
Bluemix is underpinned by three key open compute technologies: Cloud Foundry, Docker, and OpenStack. It extends each of these with a growing number of services, robust DevOps tooling, integration capabilities, and a seamless developer experience.
• Flexible Compute Options to Run Apps / Services: Instant Runtimes, Containers, Virtual Machines
• Platform Deployment Options that Meet Your Workload Requirements: Bluemix Public and Bluemix Dedicated (powered by IBM SoftLayer), Bluemix Local* (in your data center)
• DevOps Tooling
• Your Own Hosted Apps / Services
• Integration and API Mgmt
• Catalog of Services that Extend Apps’ Functionality: Web, Data, Mobile, Analytics, Cognitive, IoT, Security, Yours
• Always focused on what’s next
*Bluemix Local coming Summer 2015
Bluemix is built on IBM SoftLayer
Figure: map of Bluemix Public locations (Dallas, London (now)) and SoftLayer data centers.
A different kind of data center
• Every location designed, built, and operated to the same standardized, “pod” based spec
• 24/7 on-site security and rigorous controls
• Expanding to 40 data centers worldwide
Global network of networks
• Public, private, and management networks all separate
• More than 2,000 Gbps between data centers and network points of presence (PoPs)
• Unmetered inbound public bandwidth and fully unmetered bandwidth between data centers
Entirely automated
• SoftLayer API controls everything: more than 3,000 documented methods and 180 distinct services
• Bare metal and virtualized servers in the same platform
The highest performing cloud infrastructure available.
Sign up in minutes. Pay for what you use.
Cloud-based pricing models to serve developer needs.
Friction-free adoption
• 30-day trial (no credit card required) - designed to allow testing of an entire application on the platform
• Free tier for every service - encourages experimentation with new services for applications already running on Bluemix
Multiple Commitment Models
• Pay-as-you-go - optimized for flexibility, no term commitment
• Subscription - term-based, optimized for cost, discounted from pay-as-you-go rates
Self Service
• Zero to coding in less than 5 minutes
• Credit card over the web in many countries, or through your IBM rep
Business Agility through Data
Traditional Data Analytics: Requirements Based • Top-Down Design • Integration and Reuse • Competence Centers • Better Decisions • Enterprise Focus
Agile Data Analytics: Opportunity-Oriented • Experimentation • Throwaway • Hackathons • Business Innovation • Functional Focus
Agility and Elasticity through Cloud
• Elastic provisioning
• Pay-as-you-go
• Manage high-volume external data sources
• Self-service through a browser
• SQL / NoSQL – unstructured data
• Access data anywhere, anytime
• Leverage current cloud apps
Work with Cloud Data Services in Bluemix
Figure: data lifecycle (Get Data → Put Data to Work → Analyze Data → Gain Insight / Interact / Visualize → Iterate), built from services such as DataWorks, Apache Kafka, Streams, Cloudant, Redis, Postgres, DB2, MongoDB, RethinkDB, Object Storage, Graph DB, dashDB, Notebooks and Predictive Analytics, fed by internal & external data sources (sensors, Internet, social media, customer conversations, back office applications) and serving your own data & analytics applications.
Example: Health Management Platform
Figure: the same lifecycle applied to health data: internal & external sources (clinical & wearable device sensors, Fitbit and Jawbone device data, lab results, patient conversations, health results from an RDBMS) flow through Streams, DataWorks, Cloudant, dashDB and Notebooks across the Get Data, Put Data to Work, Analyze Data and Gain Insight / Interact / Visualize stages.
Get Data from Your Own or Public Data Sources
• Import data into data services, e.g. dashDB, Cloudant, Mongo, …, through their respective load tools
• Create connections to diverse data sources, on-premises or in the cloud, for use in analytics, e.g. Notebooks
• Load data from diverse sources into cloud data services in context, powered by DataWorks
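As a minimal sketch of what a load tool does for Cloudant under the hood, the Scala snippet below builds a request for the CouchDB-style `_bulk_docs` endpoint, which accepts a JSON body of the form {"docs": [...]}. It assumes Java 11+ (`java.net.http`). The object name, the hand-rolled JSON escaping and the example database name are illustrative assumptions, not part of any IBM tool, and the request is only built here, not sent.

```scala
import java.net.URI
import java.net.http.HttpRequest

// Hypothetical helper, for illustration only; not an IBM or Cloudant client library.
object CloudantBulkLoad {
  // Minimal JSON string escaping (quotes, backslashes, common whitespace escapes only)
  def jsonString(s: String): String = {
    val escaped = s.flatMap {
      case '"'  => "\\\""
      case '\\' => "\\\\"
      case '\n' => "\\n"
      case '\r' => "\\r"
      case '\t' => "\\t"
      case c    => c.toString
    }
    "\"" + escaped + "\""
  }

  // Body of a _bulk_docs request: {"docs":[{...},{...}]}, one JSON object per document.
  // Fields are given as ordered (key, value) pairs so the output order is predictable.
  def bulkDocsBody(docs: Seq[Seq[(String, String)]]): String =
    docs.map { fields =>
      fields.map { case (k, v) => jsonString(k) + ":" + jsonString(v) }.mkString("{", ",", "}")
    }.mkString("{\"docs\":[", ",", "]}")

  // Builds (but does not send) the POST request; send it with java.net.http.HttpClient
  def bulkDocsRequest(baseUrl: String, db: String, authHeader: String, body: String): HttpRequest =
    HttpRequest.newBuilder(URI.create(s"${baseUrl.stripSuffix("/")}/$db/_bulk_docs"))
      .header("Content-Type", "application/json")
      .header("Authorization", authHeader)
      .POST(HttpRequest.BodyPublishers.ofString(body))
      .build()
}
```

Batching documents into one `_bulk_docs` call is what makes bulk loading cheaper than one request per document; a real load tool would also handle per-document errors in the response.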
Get Data from Bluemix’s Analytics Exchange
• Explore available data sets
• Find interesting data
• Access data from Bluemix apps
• Analyze data in
  ▪ Apache Spark & Notebooks
  ▪ dashDB
  ▪ Watson Analytics
  ▪ …
Build Applications using Bluemix Data Services
Connect your applications to use data in Bluemix
• Select the database service instance you want to use and pick your plan (upper right)
• Get service credentials to use in your code (lower right)
• Use the APIs, passing the credentials you obtained
  ▪ from Node.js, Liberty, or other apps on Bluemix
  ▪ or from apps running on other platforms or devices
• Manage from the context of Bluemix, under your Bluemix login
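Passing the credentials usually means turning the username and password into an HTTP Basic `Authorization` header, as Cloudant's HTTP API accepts. The Scala sketch below assumes Java 11+ and a hypothetical `ServiceCredentials` holder filled from the service's credentials panel. For apps running on Bluemix (Cloud Foundry), the same credentials are also exposed through the `VCAP_SERVICES` environment variable, but parsing that JSON is left out here.

```scala
import java.net.URI
import java.net.http.HttpRequest
import java.nio.charset.StandardCharsets
import java.util.Base64

// Hypothetical holder for the url/username/password fields copied from
// the service's "Service Credentials" panel.
final case class ServiceCredentials(url: String, username: String, password: String)

object CredentialsExample {
  // HTTP Basic authentication: "Basic " + Base64("username:password")
  def basicAuthHeader(c: ServiceCredentials): String = {
    val token = s"${c.username}:${c.password}".getBytes(StandardCharsets.UTF_8)
    "Basic " + Base64.getEncoder.encodeToString(token)
  }

  // Builds (but does not send) an authenticated GET request for a path on the service
  def buildRequest(c: ServiceCredentials, path: String): HttpRequest =
    HttpRequest.newBuilder(URI.create(c.url.stripSuffix("/") + path))
      .header("Authorization", basicAuthHeader(c))
      .GET()
      .build()
}
```

For example, `CredentialsExample.buildRequest(creds, "/_all_dbs")` would list the databases of a Cloudant instance once sent with `java.net.http.HttpClient`. Keep credentials out of source code and read them from the environment instead.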
Analyze Data with IBM Analytics for Spark
• Go to Work with Data -> Analytics and create a new service instance
• Interactive Notebooks
  ▪ Use Python or Scala with Spark
  ▪ Associate an Object Storage instance for accessing & uploading data
  ▪ Connect to data sources, e.g. files, Cloudant, dashDB, on-premises DBs, …
• Spark Submit
  ▪ Download Apache Spark Submit
  ▪ Develop your own Spark jobs
  ▪ Run and monitor your jobs
Example: New York Accidents Analytics
• NY City public data
  ▪ Accident data from NYPD
  ▪ Road condition data
  ▪ Weather history data
• We created a set of Notebooks to
  ▪ Cleanse data to get it into proper shape for visualization and analytics
  ▪ Visualize data to better understand its content and structure
  ▪ Analyze data to identify patterns and correlations
  ▪ Predict future incident likelihoods from the data
  ▪ Visualize insights from descriptive and predictive analytics
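The cleansing step can be sketched in plain Scala (2.13+): parse each raw row, drop malformed ones, then aggregate. The three-column layout `date,borough,persons_injured` and all names below are hypothetical simplifications, not the actual NYPD data schema or the notebooks' code.

```scala
import scala.util.Try

// Hypothetical, simplified record; the real NYPD collision data has many more columns.
final case class Accident(date: String, borough: String, injured: Int)

object Cleanse {
  // Returns None for malformed rows: wrong column count, empty borough,
  // or a non-numeric injury count. Borough names are normalized to upper case.
  def parse(line: String): Option[Accident] =
    line.split(",", -1).map(_.trim) match {
      case Array(date, borough, injured) if borough.nonEmpty =>
        Try(injured.toInt).toOption.map(n => Accident(date, borough.toUpperCase, n))
      case _ => None
    }

  // Cleanse, then aggregate: number of accidents per borough
  def countsByBorough(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(parse).groupBy(_.borough).map { case (b, as) => b -> as.size }
}
```

Normalizing values such as borough names during cleansing keeps "Brooklyn" and "brooklyn" from being counted as two different places in later visualizations.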
Resilient Distributed Datasets (RDDs)
• A collection of elements that Spark works on in parallel
• May be kept in memory or on disk
• Applications can also explicitly tell Spark to cache an RDD, which is great for iterative algorithms
• An RDD records its lineage: the source ("raw") data plus the functions used to compute it
• Fault tolerance: if any partition of an RDD is lost, it is automatically recomputed using the transformations that originally created it
• An RDD can be built from a Java collection or from an external dataset (local FS, HDFS, HBase, …)
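To make lineage and recomputation concrete, here is a toy, single-machine model in plain Scala. It is NOT Spark's API. `ToyDataset`, `fromSource` and `dropCache` are hypothetical names. Each dataset stores a recipe (its parent plus a function) rather than only a result, so a lost result can be rebuilt from the source.

```scala
// Toy, single-machine model of RDD lineage; NOT Spark's API.
final class ToyDataset[A] private (compute: () => Seq[A]) {
  private var persisted = false
  private var cache: Option[Seq[A]] = None

  // Transformations: only record how to derive the new dataset from this one
  def map[B](f: A => B): ToyDataset[B] = new ToyDataset[B](() => collect().map(f))
  def filter(p: A => Boolean): ToyDataset[A] = new ToyDataset[A](() => collect().filter(p))

  // Ask for the result to be kept in memory after its first computation
  def persist(): this.type = { persisted = true; this }

  // Action: returns the cached result if present, otherwise recomputes it
  // from the lineage (parent dataset + function)
  def collect(): Seq[A] = cache.getOrElse {
    val result = compute()
    if (persisted) cache = Some(result)
    result
  }

  // Simulates losing the in-memory copy, e.g. when an executor fails
  def dropCache(): Unit = { cache = None }
}

object ToyDataset {
  // Source dataset; `data` is re-evaluated every time it has to be recomputed
  def fromSource[A](data: => Seq[A]): ToyDataset[A] = new ToyDataset[A](() => data)
}
```

Spark does the same bookkeeping per partition and across machines, which is why a lost partition can be rebuilt without replicating the data itself.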
Working with RDDs: Transformations and Actions
• Transformations are lazy: they do not compute their results right away. Instead, they are added to the RDD's lineage, which lets Spark
  ▪ optimize the required calculations
  ▪ recover from lost data partitions
• Examples: map(func), filter(func), union(otherRDD), join(otherRDD), groupByKey()
• Actions are executed immediately, and trigger execution of all prior transformations on an RDD
• Examples: reduce(func), collect(), saveAsSequenceFile(path)
• func is a Java, Scala or Python function that you write
• Call persist() on an RDD if you plan to reuse it later
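The same lazy/eager split can be observed without Spark using Scala 2.13+ collection views. This is an analogy, not Spark itself: `map` and `filter` on a view are only recorded, and forcing the view (here with `toList`) plays the role of an action. `LazyPipeline` and `squaresOfEvens` are hypothetical names.

```scala
import scala.collection.mutable.ArrayBuffer

object LazyPipeline {
  // Log of the elements the map function was actually applied to
  val touched: ArrayBuffer[Int] = ArrayBuffer.empty[Int]

  // Like Spark transformations, filter/map on a view are recorded, not run
  def squaresOfEvens(data: Seq[Int]): scala.collection.View[Int] =
    data.view.filter(_ % 2 == 0).map { x => touched += x; x * x }
}
```

Forcing the same view twice runs the pipeline twice, because a view keeps no results. Materializing it once (e.g. `val kept = view.toList`) is the collection analogue of calling `persist()` on an RDD you plan to reuse.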
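Spark in Action – Word Count in Scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  // Lower-case the line, strip punctuation, split on whitespace, drop empty tokens
  def tokenize(text: String): Array[String] =
    text.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "").split("\\s+").filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)
    // Each element of this RDD is one line of the input document
    val file = sc.textFile("swift://fileContainer.spark/input.txt")
    // Transformations (lazy): nothing is computed yet
    val words = file.flatMap(line => tokenize(line))
    val wordMap = words.map(x => (x, 1))
    val wordCounts = wordMap.reduceByKey(_ + _)
    // Action: triggers execution of all the transformations above
    wordCounts.saveAsTextFile("swift://fileContainer.spark/output.txt")
    sc.stop()
  }
}
// Adapted from Word Count example on http://spark-project.org/examples/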
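To check the logic without a Spark cluster, the same flatMap → map → reduceByKey pipeline can be sketched on plain Scala 2.13+ collections, where `groupMapReduce` stands in for `reduceByKey(_ + _)`. `LocalWordCount` is a hypothetical name, and the tokenizer is the same one used in the Spark job.

```scala
object LocalWordCount {
  // Same tokenizer as the Spark job: lower-case, strip punctuation, split on whitespace
  def tokenize(text: String): Array[String] =
    text.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "").split("\\s+").filter(_.nonEmpty)

  // flatMap lines into words, map each word to (word, 1), then sum the 1s per word
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => tokenize(line).toSeq)
      .map(w => (w, 1))
      .groupMapReduce(_._1)(_._2)(_ + _)
}
```

The difference is scale: here everything runs in one JVM, while Spark partitions the lines across executors and shuffles the (word, 1) pairs so that each word is summed in one place.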
Combine Services: Analytics of Twitter Data
Figure: Reader (Node.js) → Topic (Kafka) → Spark Streaming Notebook → Watson Tone Analyzer → Results (Cloudant DB) → REST API (Node.js) → Insights App (Node.js)
• Message Hub provides an elastic, high-velocity message queue
• A Node.js reader receives the Twitter data stream and writes it to a topic
• Algorithms in Scala detect tweets of interest
• Watson enriches tweets with tone & sentiment info
• Cloudant stores insight data with HADR at scale
• The Insights App lets users explore and interact with the results
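What "detect tweets of interest" means is not specified here; one simple possibility is keyword matching. The Scala sketch below assumes a hypothetical `Tweet` shape and keyword rule. A predicate like this could be passed to the `filter` method of a Spark Streaming DStream, but it is shown on a plain `Seq` so it runs on its own.

```scala
// Hypothetical, simplified tweet; a real Twitter stream record has many more fields.
final case class Tweet(author: String, text: String)

object TweetsOfInterest {
  // Builds a predicate: a tweet is "of interest" if any word of its text
  // (case-insensitive; '#', '@' and punctuation act as separators)
  // is one of the tracked keywords.
  def isOfInterest(keywords: Set[String]): Tweet => Boolean = {
    val tracked = keywords.map(_.toLowerCase)
    t => t.text.toLowerCase.split("\\W+").exists(w => tracked.contains(w))
  }

  def select(keywords: Set[String], tweets: Seq[Tweet]): Seq[Tweet] =
    tweets.filter(isOfInterest(keywords))
}
```

Building the predicate once, with the keywords already lower-cased, avoids repeating that work for every tweet in a high-velocity stream.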
Streaming Analytics using multiple Data Sources
Figure: the Twitter analytics pipeline (Reader (Node.js) → Topic (Kafka) → Spark Streaming Notebook → Watson Tone Analyzer → Results (Cloudant DB) → REST API (Node.js) → Insight App (Node.js)), extended with additional Node.js readers feeding further Kafka topics, including stock quotes, plus Predictive Analytics, an Alert Gen (Node.js) component, and the Push Service.
Conclusion
• You can achieve greater business agility and faster insights through cloud-based innovation, without upfront investment
• IBM Cloud Data Services provide open, cloud-based data and analytics services that enable fast cloud-based innovation
• Bluemix - Data & Analytics at https://bluemix.net/data features and integrates cloud data services, enabling you to
  ▪ Get data from your own or public data sources
  ▪ Build applications using cloud data & analytics services
  ▪ Analyze data with Spark & Notebooks at https://bluemix.net/data/analytics, Hadoop, dashDB, …
  ▪ Combine and integrate cloud data & analytics services with each other, as well as with other Bluemix services, e.g. through the new Message Hub based on Apache Kafka