Data Culture Series - Keynote & Panel - Reading - 12th May 2015


DATA CULTURE SERIES – 12th May 2015, Reading

• Who is using Data to drive the future of their business?

• Who is using Predictive Analytics / Machine Learning yet to change their business model?

UK Business Lead for BI & Advanced Analytics

Jon Woodward: Connect & Follow

@JLWoodward

www.linkedin.com/in/jonathanwoodward

#DataCulture

SQL Server

PowerBI

AzureML

Hadoop

DataFactory

DocumentDB

Search

EventHub

Stream Analytics

Revolution R

Azure DW

Azure Data Lake

Industry transformation driving opportunities

2015… We have reached a Tipping Point

• 50% of organizations will consider cloud deployment
• 50% of new licence spend will be for Data Discovery & Analytics
• 50% of BI & Analytics spend will be driven by the Business
• 50% of users will be touched by BI and Analytics

Core to Vision

Start Justin

Digital Work & Life Experiences

Data…Driving the Experience

UK Economy - data dividend

The Microsoft data platform: Mobile, Reports, Natural language query, Dashboards, Applications, Streaming, Relational, Internal & external, Non-relational, NoSQL, Orchestration, Machine learning, Modeling, Information management, Complex event processing

Data Culture Series

Data Culture Exec

Session

Data Culture

Summit

4 events – final event 14th May, London; CXO Level – Invite only

10 events; 1000 customers; Power User, Analyst, Architect, Developer, DBA, Data Scientist

Final 2 events this fiscal (Reading, London)

Data Culture Summit

Date Location

12 May READING Data Culture series

19 May LONDON Data Culture series

Summer Break

Date Location

September 16th/17th London 2 Day Data Culture Event

Nov London Future Decoded

Jan TBC 2 Day Data Culture Event

Value of Data

IoT

Business Apps

CMO, CFO, Sales

Business Case for Data

CDO, CIO, CTO

Architect Level

Data Platform Workshop

Modernising your Data Platform

Data Developer

Multi-track - Hands-on

BI, Advanced Analytics, IoT, Data Services, Big Data

Dashboard in a Day

Analyst

Hands on BI


Microsoft Data Culture - UK

Time

10:00 – 10:30 Intro – Jon Woodward

10:30 – 11:30 Keynote: Ric Howe – Data Platform Update (Build & Ignite); Neil Miller – Revolution Analytics Overview

11:30 – 12:30 Immersion Tracks – Overview

12:30 – 13:15 Lunch & Expo

13:15 – 15:00 Immersion Hands-on

15:00 – 15:15 Break & Expo

15:15 – 16:30 Immersion Hands-on

16:30 – 17:00 Panel and Close

Panel: Microsoft, Hortonworks, KPMG, Revolution

RIC HOWE: Data Platform Update – from Build & Ignite

Microsoft Azure Data Lake

• HDFS compatible
• Unlimited storage, petabyte files
• Optimised for massive throughput
• High-frequency, low-latency, near-real-time
• Native format

Azure Data Warehouse

• A mashup of Azure SQL v12 and PDW
• Uses Azure Data Lake for storage*
• Features PolyBase: can connect to Azure HDInsight, but also to Cloudera and Hortonworks clusters, in the cloud or on-premises
• Compute and storage scale separately (compare to Amazon Redshift)
• Integrations: Azure Data Factory, Power BI

Azure Machine Learning – new APIs

Face, Speech, Vision, Text, Recommendations, Churn, and more

Example: Face API (How-Old.net)
• Pattern recognition
• Give it a photo and it will guess gender and age
• Integrated with Bing Images to make finding photos simple

Always Encrypted: help protect data at rest and in motion, on-premises & cloud

dbo.Patients as the trusted app sees it (plaintext):
Name | SSN | Country
Jane Doe | 243-24-9812 | USA
Jim Gray | 198-33-0987 | USA
John Smith | 123-82-1095 | USA

dbo.Patients as stored in SQL Server (SSN column held as ciphertext):
Name | SSN | Country
Jane Doe | 1x7fg655se2e | USA
Jim Gray | 0x7ff654ae6d | USA
John Smith | 0y8fj754ea2c | USA

Query flow: the trusted app issues SELECT Name FROM Patients WHERE SSN=@SSN with @SSN='198-33-0987'. On the client side, the enhanced ADO.NET library uses the Column Master Key and the Column Encryption Key to encrypt the parameter, so SQL Server receives @SSN=0x7ff654ae6d, matches it against the ciphertext column, and returns the result set: Name = Jim Gray.
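The reason SQL Server can answer WHERE SSN=@SSN without seeing a plaintext SSN is that deterministic encryption always maps the same input and key to the same ciphertext. The Python sketch below illustrates only that property and makes loud assumptions: the real feature uses AEAD_AES_256_CBC_HMAC_SHA_256 and the client can decrypt values, whereas here a keyed HMAC-SHA256 is a one-way stand-in, and `COLUMN_KEY`, `deterministic_token`, `server_select_name` and `client_select_name` are hypothetical names. The patient rows are taken from the slide.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets

# Hypothetical column encryption key. In this sketch it lives only on the
# "client" side; the "server" list below never sees it or any plaintext SSN.
COLUMN_KEY = secrets.token_bytes(32)


def deterministic_token(plaintext: str, key: bytes = COLUMN_KEY) -> str:
    """Same key + same plaintext -> same token, so equality still works.

    HMAC-SHA256 is a one-way stand-in for deterministic encryption here;
    unlike the real feature, the token cannot be decrypted.
    """
    return hmac.new(key, plaintext.encode("utf-8"), hashlib.sha256).hexdigest()


# "Server-side" copy of dbo.Patients: the SSN column holds tokens only.
PATIENTS = [
    {"name": "Jane Doe", "ssn": deterministic_token("243-24-9812"), "country": "USA"},
    {"name": "Jim Gray", "ssn": deterministic_token("198-33-0987"), "country": "USA"},
    {"name": "John Smith", "ssn": deterministic_token("123-82-1095"), "country": "USA"},
]


def server_select_name(ssn_token: str) -> list[str]:
    """Plays the role of SELECT Name FROM Patients WHERE SSN=@SSN on the server:
    it compares opaque tokens and never handles a plaintext SSN."""
    return [row["name"] for row in PATIENTS if row["ssn"] == ssn_token]


def client_select_name(ssn: str) -> list[str]:
    """Plays the role of the client driver: transform the parameter, then query."""
    return server_select_name(deterministic_token(ssn))


if __name__ == "__main__":
    print(client_select_name("198-33-0987"))  # ['Jim Gray']
```

The same determinism is also the trade-off: identical plaintexts produce identical ciphertexts, which is why the real feature offers a separate randomized mode for columns that never need equality lookups.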

PolyBase: query relational and non-relational data with T-SQL

[Diagram: a single T-SQL query spans SQL Server and Hadoop. Relational rows in SQL Server (Name, DOB, State: Jim Gray, 11/13/58, WA; Ann Smith, 04/29/76, ME) are combined with non-relational quote data held in Hadoop (e.g. $658.39).]
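The outcome PolyBase provides, one SQL statement joining relational rows with file-based data, can be mimicked in a few lines of Python. This is a conceptual sketch under stated assumptions: `sqlite3` stands in for SQL Server, an in-memory CSV string stands in for a file in HDFS, and the helper `load_delimited_as_table` and the quote data are hypothetical (names and the $658.39 figure are borrowed from the slide). Unlike PolyBase, which reads external data at query time, this sketch copies the file into a temporary table first.

```python
from __future__ import annotations

import csv
import io
import sqlite3

# Hypothetical delimited file standing in for non-relational data in HDFS.
QUOTES_CSV = "name,quote\nJim Gray,658.39\n"


def load_delimited_as_table(conn: sqlite3.Connection, table: str, text: str) -> None:
    """Expose delimited text as a temporary table (all columns typed TEXT).

    `table` and the header names are trusted identifiers in this sketch;
    they are quoted but not otherwise validated.
    """
    header, *rows = list(csv.reader(io.StringIO(text)))
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TEMP TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)


def demo() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    # Relational side: a regular table, as it might live in SQL Server.
    conn.execute("CREATE TABLE customers (name TEXT PRIMARY KEY, dob TEXT, state TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [("Jim Gray", "11/13/58", "WA"), ("Ann Smith", "04/29/76", "ME")],
    )
    # Non-relational side, made queryable alongside it.
    load_delimited_as_table(conn, "quotes", QUOTES_CSV)
    # One SQL statement spans both sources.
    return conn.execute(
        """
        SELECT c.name, c.state, CAST(q.quote AS REAL)
        FROM customers AS c
        JOIN quotes AS q ON q.name = c.name
        """
    ).fetchall()


if __name__ == "__main__":
    print(demo())  # [('Jim Gray', 'WA', 658.39)]
```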

Built-in advanced analytics: in-database analytics at massive scale

• Data Scientist: interact directly with data
• Data Developer/DBA: manage data and analytics together
• Built-in to SQL Server

[Diagram: Relational Data, Analytic Library and T-SQL Interface inside SQL Server, with Extensibility through R Integration and new R scripts from the Microsoft Azure Marketplace]

Example solutions:
• Sales forecasting
• Warehouse efficiency
• Predictive maintenance
• Credit risk protection
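The "in-database analytics" idea, doing the per-row work where the data lives and returning only results, can be illustrated with a least-squares trend line fitted from SQL aggregates, so only five numbers leave the database. This is a minimal sketch assuming `sqlite3` as a stand-in for SQL Server and a hypothetical `monthly_sales` table; it is not the R integration shown on the slide. It uses the standard closed form: slope = (n·Σxy − Σx·Σy) / (n·Σx² − (Σx)²) and intercept = (Σy − slope·Σx) / n.

```python
from __future__ import annotations

import sqlite3


def fit_trend_in_database(conn: sqlite3.Connection, table: str, x: str, y: str) -> tuple[float, float]:
    """Least-squares line y = intercept + slope * x, computed from aggregates.

    The database does the per-row work; Python only receives five numbers.
    `table`, `x` and `y` are trusted identifiers in this sketch (no user input).
    """
    n, sx, sy, sxy, sxx = conn.execute(
        f"SELECT COUNT(*), SUM({x}), SUM({y}), SUM({x} * {y}), SUM({x} * {x}) FROM {table}"
    ).fetchone()
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_sales (month INTEGER, units INTEGER)")
# Hypothetical data that lies exactly on units = 2 * month + 1.
conn.executemany("INSERT INTO monthly_sales VALUES (?, ?)", [(m, 2 * m + 1) for m in range(1, 6)])

slope, intercept = fit_trend_in_database(conn, "monthly_sales", "month", "units")
print(slope, intercept)  # 2.0 1.0
```

A sales forecast for the next month is then just intercept + slope * 6, computed without ever pulling the raw rows out of the database.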

Enhanced Analysis & Reporting Services: scalable on-premises BI solutions & new modern reports

Internet Explorer, Firefox, Safari, Chrome, Edge

Stretch SQL Server into Microsoft Azure: securely stretch cold tables to Azure with remote query processing

[Diagram: an app sends its query to SQL Server as usual. Customer data and product data stay on-premises, while cold Order History rows (Name, Date, Item, protected with Always Encrypted) are stretched to Microsoft Azure, where the query is processed remotely.]
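The Stretch pattern, where cold rows move to remote storage while queries keep seeing one logical table, can be sketched with two SQLite databases. Everything here is an assumption for illustration and does not reflect how SQL Server implements the feature: an attached in-memory database plays the part of Azure, a TEMP view with UNION ALL plays the part of remote query processing, and `stretch_cold_rows`, the `order_history` columns and the cutoff date are hypothetical.

```python
import sqlite3

SCHEMA = "(order_id INTEGER PRIMARY KEY, order_date TEXT, item TEXT)"

conn = sqlite3.connect(":memory:")
# The attached database stands in for the remote (Azure) side.
conn.execute("ATTACH DATABASE ':memory:' AS remote")
conn.execute(f"CREATE TABLE main.order_history {SCHEMA}")
conn.execute(f"CREATE TABLE remote.order_history {SCHEMA}")

# Hypothetical orders; ISO dates compare correctly as strings.
conn.executemany(
    "INSERT INTO main.order_history VALUES (?, ?, ?)",
    [(1, "2013-01-15", "widget"), (2, "2014-06-01", "gadget"), (3, "2015-04-30", "gizmo")],
)


def stretch_cold_rows(conn: sqlite3.Connection, cutoff: str) -> None:
    """Move rows older than `cutoff` to the remote table in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO remote.order_history "
            "SELECT * FROM main.order_history WHERE order_date < ?",
            (cutoff,),
        )
        conn.execute("DELETE FROM main.order_history WHERE order_date < ?", (cutoff,))


stretch_cold_rows(conn, "2015-01-01")

# In SQLite, a view that reads from another attached database must be a TEMP view.
conn.execute(
    "CREATE TEMP VIEW all_orders AS "
    "SELECT * FROM main.order_history UNION ALL SELECT * FROM remote.order_history"
)

local = conn.execute("SELECT COUNT(*) FROM main.order_history").fetchone()[0]
remote = conn.execute("SELECT COUNT(*) FROM remote.order_history").fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM all_orders").fetchone()[0]
print(local, remote, total)  # 1 2 3
```

The application keeps querying `all_orders` and does not need to know which rows are hot and which were stretched.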

Connect live to on-premises data: live connectivity to SQL Server Analysis Services

Interactive query

Data platform: AZURE STREAM ANALYTICS, HDINSIGHT, POWER BI 2.0, AZURE SQL DATABASE, AZURE MACHINE LEARNING, SQL SERVER 2014, DOCUMENT DB, AZURE SQL DATA WAREHOUSE, REVOLUTION R, AZURE DATA FACTORY, AZURE DATA LAKE

Connect and Follow

NEIL MILLER: Revolution Analytics Overview – Bigger Data with R

BIGGER DATA?

R

Revolution R Enterprise

Revolution Analytics Proprietary

(A zettabyte is 10^21 bytes: a 1 followed by 21 zeros)

(40 zettabytes = 40,000,000,000,000,000,000,000 bytes)

(= 3 million books per person)
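The byte count quoted above can be checked directly: a zettabyte (SI prefix "zetta") is 10^21 bytes, so 40 zettabytes is 4 × 10^22 bytes. A short Python check (the books-per-person comparison is not verified here):

```python
ZETTABYTE = 10 ** 21  # bytes (SI/decimal prefix "zetta")

assert str(ZETTABYTE).count("0") == 21  # "a 1 followed by 21 zeros"

forty_zb = 40 * ZETTABYTE
print(f"{forty_zb:,}")  # 40,000,000,000,000,000,000,000
```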

Volume

Variety

Velocity


But…

• Wider data sets (many more variables / features)

• Real-time scoring (streaming data in fast…)

THE PERFECT STORM

+ Computing Power

+ Bigger Data

+ Pace of Business

+ Customer Expectations

+ Data Science

+ Computer Science

+ Management Science

Better Business Decisions

Better Business Outcomes


- Robert Gentleman & Ross Ihaka, 1993

- Version 1.0 in 2000

- 3.0+ Million Global Users

- 6200+ “Packages”

- R in Universities = New Talent

- Open Source = Access To Innovation

- Programming Agility

- Huge range of predictive analytics

We love R!


Source: www.rexeranalytics.com

Revolution R Enterprise solves these problems!


OUR COMPANY

The leading provider of advanced analytics software and services based on open source R, since 2007

OUR PRODUCTS

REVOLUTION R: The enterprise-grade predictive analytics application platform based on the R language


[Diagram: the Revolution R Enterprise stack. Layers: language interpreter and standard R; algorithm suites; development & deployment tooling; big data distributed execution platform. Components: R + CRAN, RevoR, DistributedR, ConnectR, ScaleR, DevelopR, DeployR.]

Revolution R Enterprise: Big Data, Big Analytics Ready

– Enterprise readiness
– High performance analytics
– Multi-platform architecture
– Data source integration
– Development tools and integration tools

Enterprise Technical Support


Benchmark: 22 years of US flight data (124m rows, 29 variables); linear regression model of arrival delay as a function of day-of-week. Tests run on a 4-core machine with 16GB RAM and a 500GB SSD.

File Name | Compressed File Size (MB) | No. Rows | Open Source R (secs) | Revolution R (secs)
Tiny | 0.3 | 1,235 | 0.001 | 0.05
V. Small | 0.4 | 12,353 | 0.21 | 0.05
Small | 1.3 | 123,534 | 0.03 | 0.03
Medium | 10.7 | 1,235,349 | 1.94 | 0.08
Large | 104.5 | 12,353,496 | 60.69 | 0.42
Big (full) | 12,960.0 | 123,534,969 | Out of memory | 4.89
V. Big | 25,919.7 | 247,069,938 | Out of memory | 9.49
Huge | 51,840.2 | 494,139,876 | Out of memory | 18.92
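Open source R fails on the largest files because it needs every row in memory at once. A common way around this, shown here as a minimal Python sketch rather than Revolution R's actual ScaleR implementation, is to stream the data in chunks and keep only running sufficient statistics. For this particular model (an intercept plus dummy variables for one categorical predictor, day-of-week), ordinary least squares reduces to group means: the intercept is the mean delay on the base day, and each coefficient is that day's mean minus the base-day mean. `read_chunks`, `fit_delay_by_weekday` and the sample rows are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import islice
from typing import Iterable, Iterator


def read_chunks(rows: Iterable[tuple[int, float]], chunk_size: int) -> Iterator[list[tuple[int, float]]]:
    """Yield lists of at most `chunk_size` (day_of_week, arrival_delay) rows."""
    it = iter(rows)
    while chunk := list(islice(it, chunk_size)):
        yield chunk


def fit_delay_by_weekday(chunks: Iterable[list[tuple[int, float]]], base_day: int = 1):
    """OLS of delay ~ day-of-week (dummy coded) in a single pass over chunks.

    Only per-day sums and counts are kept, so memory use depends on the
    number of distinct days, not on the number of rows.
    """
    sums: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for chunk in chunks:
        for day, delay in chunk:
            sums[day] += delay
            counts[day] += 1
    means = {day: sums[day] / counts[day] for day in counts}
    intercept = means[base_day]
    coefficients = {day: mean - intercept for day, mean in sorted(means.items()) if day != base_day}
    return intercept, coefficients


# Hypothetical (day_of_week, arrival_delay_minutes) rows.
rows = [(1, 10.0), (2, 20.0), (1, 14.0), (3, 5.0), (2, 22.0)]
intercept, coefficients = fit_delay_by_weekday(read_chunks(rows, chunk_size=2))
print(intercept, coefficients)  # 12.0 {2: 9.0, 3: -7.0}
```

In practice `rows` would be a lazy reader over a file, so the full data set is never materialised; a general multi-predictor model would accumulate XᵀX and Xᵀy per chunk instead of group sums.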


DistributedR, ScaleR, ConnectR, DeployR

• In the Cloud
• Workstations & Servers: Desktops, Server
• Clustered Systems: Microsoft HPC, Linux
• EDW: Teradata
• Hadoop: Hortonworks, Cloudera, MapR

+ HDInsight
+ SQL Server vNext
+ Azure ML
+ Power BI


In-database analytics at massive scale

• Data Scientist: interact directly with data
• Data Developer/DBA: manage data and analytics together

[Diagram: SQL Server with Relational Data, an Analytic Library of native functions and a T-SQL Interface, plus Extensibility through R Integration (coming!)]

Example solutions:
• Fraud detection
• Sales forecasting
• Warehouse efficiency
• Predictive maintenance

Benefits:
• Faster deployment of ML models
• Faster performance (move compute to the data)
• Improved scalability

In-DB analytic scenarios:
• Real-time fraud detection
• Customer churn analysis
• Product recommendations

Microsoft Confidential. Preliminary Information. Dates and capabilities subject to change. Microsoft makes no warranties, express or implied.
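"Faster deployment of ML models" and real-time fraud detection both rest on scoring rows where they are stored rather than exporting them. As a hedged illustration only, the sketch below registers a Python scoring function with SQLite via `sqlite3.Connection.create_function`, so the model runs inside the query's WHERE clause. SQLite stands in for SQL Server, and the `txns` table, the `fraud_score` function and its logistic-regression weights are all hypothetical, not a trained model.

```python
import math
import sqlite3

# Hypothetical logistic-regression weights; a real model would be trained first.
INTERCEPT = -4.0
W_AMOUNT = 0.002
W_FOREIGN = 1.5


def fraud_score(amount: float, is_foreign: int) -> float:
    """Probability-like score in (0, 1) from a logistic model."""
    z = INTERCEPT + W_AMOUNT * amount + W_FOREIGN * is_foreign
    return 1.0 / (1.0 + math.exp(-z))


conn = sqlite3.connect(":memory:")
# Make the model callable from SQL, so scoring happens where the data lives.
conn.create_function("fraud_score", 2, fraud_score)

conn.execute("CREATE TABLE txns (id INTEGER PRIMARY KEY, amount REAL, is_foreign INTEGER)")
conn.executemany(
    "INSERT INTO txns VALUES (?, ?, ?)",
    [(1, 25.0, 0), (2, 3000.0, 1), (3, 120.0, 0)],
)

flagged = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM txns WHERE fraud_score(amount, is_foreign) > 0.5 ORDER BY id"
    )
]
print(flagged)  # [2]
```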


• Assemble and standardize all of a marketer’s data into a Hadoop cluster
• Apply the rigor of a medical researcher with patented methodology
• Know whom to reach
• Identify and attribute the revenue drivers


More info at:

http://www.revolutionanalytics.com/content/datasong%E2%80%99s-big-data-analytics-platform-marketing-optimization-helps-clients-understand

Pipeline: Pre-processing (crop and align to fixed size) → Feature extraction → Features → Ensemble of models (SVM, Random Forests and Neural Networks) → Logistic Regression used to assess pass / fail → Pass or Fail

More info at: http://info.revolutionanalytics.com/30apr15-iot-and-the-manufacturing-floor.html
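One way to read the final step of this pipeline is stacking: the SVM, random forest and neural network each produce a score per part, and a logistic regression trained on those scores makes the pass / fail call. The stdlib-only Python sketch below shows just that meta-learner, fitted by plain batch gradient descent on the mean log-loss. The base-model scores, learning rate and epoch count are hypothetical, and the base models themselves are out of scope.

```python
from __future__ import annotations

import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stacker(scores: list[list[float]], labels: list[int], lr: float = 0.5, epochs: int = 2000):
    """Fit logistic-regression weights over base-model scores by gradient descent."""
    n_models = len(scores[0])
    weights = [0.0] * n_models
    bias = 0.0
    m = len(labels)
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for x, y in zip(scores, labels):
            # Gradient of the log-loss for one example: (prediction - label) * input.
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j in range(n_models):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


def predict_pass(weights: list[float], bias: float, x: list[float], threshold: float = 0.5) -> bool:
    """True if the stacked model's pass probability reaches `threshold`."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) >= threshold


# Hypothetical [svm, random_forest, neural_net] scores per part; 1 = pass, 0 = fail.
train_scores = [
    [0.9, 0.8, 0.85], [0.7, 0.9, 0.8], [0.8, 0.75, 0.9],
    [0.2, 0.1, 0.3], [0.3, 0.25, 0.1], [0.1, 0.2, 0.2],
]
train_labels = [1, 1, 1, 0, 0, 0]

weights, bias = fit_stacker(train_scores, train_labels)
print(predict_pass(weights, bias, [0.85, 0.8, 0.9]))  # True
print(predict_pass(weights, bias, [0.15, 0.2, 0.1]))  # False
```

In a real stacking setup the meta-learner is trained on out-of-fold base-model predictions, so it does not simply learn to trust models that have overfit the training images.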


Hadoop: Edge & Data Node, 2 x Data Node

• Use new 3rd-party data sources of categorical data to automatically create new variables (features), e.g. consumer spend across various categories, locations etc.
• Split and analyze features in parallel to measure predictive quality for credit risk and default
• Champion / Challenger: select the top ‘n’ new features and compare them against existing features in credit risk models
• Introduce new “Golden” features once proven to enhance the model
• The legacy solution took several months to code, with a 6-week run process (with manual intervention). Unsuitable for production runs!
• Revolution code implemented in Hadoop using massively parallel machine learning to automate feature selection
• 6 weeks of processing reduced to a < 24-hour automated execution process

Diagram labels: External 3rd Party Data Sources; Customer Credit Process
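The feature-screening step in this case study (score many candidate features in parallel, keep the top 'n' as challengers) can be sketched as follows. This is a hedged illustration, not the Revolution/Hadoop code: absolute Pearson correlation with the default flag is an assumed stand-in for "predictive quality", a thread pool stands in for the cluster's parallel workers (in CPython, CPU-heavy scoring would need processes or a cluster for real parallelism), and the feature names and values are hypothetical.

```python
from __future__ import annotations

import math
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from statistics import fmean


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def score_feature(item: tuple[str, list[float]], target: list[float]) -> tuple[str, float]:
    """Return (feature name, |correlation with target|)."""
    name, values = item
    return name, abs(pearson(values, target))


def select_top_features(candidates: dict[str, list[float]], target: list[float], n: int) -> list[str]:
    """Score every candidate in parallel and return the `n` best names."""
    with ThreadPoolExecutor() as pool:
        scored = list(pool.map(partial(score_feature, target=target), candidates.items()))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:n]]


# Hypothetical customers: 1 = defaulted, 0 = did not.
defaulted = [0, 0, 1, 1, 0, 1]
candidates = {
    "spend_travel": [1, 2, 9, 8, 1, 7],
    "spend_grocery": [5, 4, 5, 4, 5, 4],
    "spend_fuel": [3, 1, 2, 3, 1, 2],
}
print(select_top_features(candidates, defaulted, n=2))  # ['spend_travel', 'spend_fuel']
```

The selected names would then enter the Champion / Challenger comparison against the existing credit-risk features.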


What next?

• If you are using R (or SAS, SPSS, Matlab…) today and need scale, speed and support, and want to get on the road to Microsoft Advanced Analytics, come and talk to us!

[email protected]

[email protected]

• More info? www.revolutionanalytics.com


Tracks Rooms


Panel

Ric Howe Microsoft

Tim Marston Hortonworks

Andrew Morgan KPMG

Simon Field Revolution Analytics

Ric H (Microsoft) Q: With all the recent announcements at //Build and Ignite, what aspects should we be most excited about?

Andrew M (KPMG) Q: Does Data Science really require Data Scientists?

Tim M (Hortonworks) Q: If Hadoop is the answer to Big Data, where is it heading… what is the future vision?

Simon F (Revolution) Q: With the decades of investment in SAS, why are companies moving to R, and why should we?

Get Hands on…

Trial: Revolution R Open
Trial: Hadoop (HDInsight, HDP)
Trial: SQL Server 2014
Trial: PowerBI
Trial: Machine Learning (AzureML)

Community Events

SQL Saturday – Manchester, July 25th
SQL Relay, October 12–22nd – 8 locations
PASS BA*, London, November
PASS Summit* – US, October 27–30th
SQL Saturday – Edinburgh, June 13th

* See Jen Stirrup for discount

IoT Next Steps

• Contact our Azure advisory team at [email protected]
• Developer showcases, demos and deep technical learning with MS experts
• http://aka.ms/tdoazure
• Connect with Microsoft and others in IoT.
• Contact [email protected]
• http://aka.ms/iotworkshop
• Download the Hands On Lab
• June 11th, Reading: 1-day IoT & Data “Hackathon”, hands-on learning
• Waitlist registrants receive 1st priority for next event

Come Back for more…

Date Location

16 September READING

10 November LONDON

27 November READING

3 December LONDON

27 January LONDON

24 February LEEDS

24 March EDINBURGH

8 April BIRMINGHAM

12 May READING

19 May LONDON

Get Ready for Sept…


THANK YOU
