CloudML talk at DevFest Madurai 2016

Post on 14-Jan-2017


Transcript of CloudML talk at DevFest Madurai 2016

● BigData
● BigQuery
● CloudML
● Cloud API

Karthik Padmanabhan
Developer Relations

@karthik_padman

Big data and machine learning at Google

BigQuery: Anything you can ask in SQL
Cloud Dataflow: Parallel processing, batch and stream
Cloud ML: Machine learning, neural networks


Open source: Apache Beam (Cloud Dataflow), TensorFlow (Cloud ML)

Pre-trained models: Vision API, Speech API, Translate API

Cloud Dataflow demo

Vision API demo

Tensorflow demo

Photo credit: Matt Chan

Photo credit: isaiah115 on Flickr

Google Research Publications


Open Source Implementations

Bigtable

Flume

Dremel

Managed Cloud Versions

Bigtable → Bigtable
Flume → Dataflow
Dremel → BigQuery

BigQuery demo

Google BigQuery

02 Count some stuff

SELECT count(word)
FROM [publicdata:samples.shakespeare]

Words in Shakespeare

SELECT sum(requests) as total
FROM [fh-bigquery:wikipedia.pagecounts_20150212_01]

Wikipedia hits over 1 hour

SELECT sum(requests) as total
FROM [fh-bigquery:wikipedia.pagecounts_201505]

Wikipedia hits over 1 month

Several years of Wikipedia data

SELECT sum(requests) as total
FROM [fh-bigquery:wikipedia.pagecounts_201105],
  [fh-bigquery:wikipedia.pagecounts_201106],
  [fh-bigquery:wikipedia.pagecounts_201107],

...

SELECT SUM(requests) AS total
FROM TABLE_QUERY(
  [fh-bigquery:wikipedia],
  'REGEXP_MATCH(table_id, r"pagecounts_2015[0-9]{2}$")')

Several years of Wikipedia data

How about a RegExp

SELECT SUM(requests) AS total
FROM TABLE_QUERY(
  [fh-bigquery:wikipedia],
  'REGEXP_MATCH(table_id, r"pagecounts_2015[0-9]{2}$")')
WHERE (REGEXP_MATCH(title, '.*[dD]inosaur.*'))
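The query above uses two regular expressions: one picks tables by table_id, the other filters rows by title. A rough Python sketch of that two-step selection follows. The table ids and rows are invented for illustration, and it assumes REGEXP_MATCH behaves like a partial (search-style) match; it does not call BigQuery.

```python
import re

# Hypothetical table ids, invented for illustration only.
TABLE_IDS = [
    "pagecounts_201412",
    "pagecounts_201501",
    "pagecounts_201505",
    "pagecounts_20150212_01",
    "pagelinks_201501",
]

# Same patterns as in the query on the slide.
TABLE_PATTERN = re.compile(r"pagecounts_2015[0-9]{2}$")
TITLE_PATTERN = re.compile(r".*[dD]inosaur.*")


def select_tables(table_ids):
    """Keep table ids matched by the table pattern (assumed search semantics)."""
    return [t for t in table_ids if TABLE_PATTERN.search(t)]


def total_requests(rows):
    """Sum requests over rows whose title matches the dinosaur pattern."""
    return sum(r["requests"] for r in rows if TITLE_PATTERN.search(r["title"]))


# Hypothetical rows, invented for illustration only.
ROWS = [
    {"title": "Dinosaur", "requests": 10},
    {"title": "List_of_dinosaurs", "requests": 5},
    {"title": "Bird", "requests": 7},
]

print(select_tables(TABLE_IDS))
print(total_requests(ROWS))
```

The trailing `$` is what keeps the hourly table `pagecounts_20150212_01` out of the monthly selection.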

03 How did it do that? o_O

Qualities of a good RDBMS

● Inserts & locking
● Indexing
● Cache
● Query planning

Storing data


Table

Columns

Disks
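The labels on this slide (Table, Columns, Disks) suggest that a table is stored column by column rather than row by row. A minimal sketch of that idea, assuming a simple in-memory layout with made-up rows; it is not BigQuery's actual storage format.

```python
# Hypothetical rows, invented for illustration only.
ROWS = [
    {"title": "Jennifer_Lopez", "language": "en", "requests": 12},
    {"title": "Dinosaur", "language": "en", "requests": 30},
    {"title": "Madurai", "language": "ta", "requests": 8},
]


def to_columns(rows):
    """Turn a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_column(columns, name):
    """Aggregate one column; the other columns are never touched."""
    return sum(columns[name])


COLUMNS = to_columns(ROWS)
print(sorted(COLUMNS))
print(sum_column(COLUMNS, "requests"))
```

With this layout a query such as sum(requests) only has to read the requests column, which is the point of storing columns separately.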

Reading data: Life of a BigQuery

SELECT sum(requests) as sum
FROM (
  SELECT requests, title
  FROM [fh-bigquery:wikipedia.pagecounts_201501]
  WHERE (REGEXP_MATCH(title, '[Jj]en.+'))
)

Life of a BigQuery

[Diagram: query execution tree with one Root Mixer (M) at the top, two Mixers (M) below it, four Leaf nodes (L) below those, and Storage at the bottom. The slides build up these annotations step by step.]

Storage: SELECT requests, title (5.4 Bil)
Leaf: WHERE (REGEXP_MATCH(title, '[Jj]en.+')) (5.8 Mil)
Mixer: SELECT sum(requests)
Root Mixer: SELECT sum(requests)
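The Root Mixer, Mixer, Leaf, and Storage layers can be imitated in a few lines of Python. This is only a sketch of the tree-aggregation idea under simplifying assumptions: the shards and their contents are invented, each leaf scans one shard, and the fan-in of two is arbitrary.

```python
import re

# Same title filter as the query on the slide.
TITLE_PATTERN = re.compile(r"[Jj]en.+")

# Hypothetical storage shards of (requests, title) pairs, invented for illustration.
SHARDS = [
    [(3, "Jennifer_Aniston"), (9, "Cat")],
    [(4, "jenga"), (1, "Dog")],
    [(2, "Bird")],
    [(5, "Jenkins_(software)"), (6, "Ben")],
]


def leaf(shard):
    """Scan one shard, apply the WHERE filter, and return a partial sum."""
    return sum(requests for requests, title in shard if TITLE_PATTERN.search(title))


def mixer(partials):
    """Combine partial sums coming from the layer below."""
    return sum(partials)


def run_query(shards, fan_in=2):
    """Leaves feed mixers in groups of fan_in; the root mixer combines the mixers."""
    partials = [leaf(shard) for shard in shards]
    mixed = [mixer(partials[i:i + fan_in]) for i in range(0, len(partials), fan_in)]
    return mixer(mixed)  # root mixer


print(run_query(SHARDS))
```

Because sum is associative, each layer only passes a single number upward, so very little data travels between layers even when the leaves scan a lot of rows.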

04 Something Useful: Use Wikipedia data to pick a movie

1. Wikipedia edits
2. ???
3. Movie recommendation

Follow the edits


Same editor

select title, id, count(id) as edits
from [publicdata:samples.wikipedia]
where title contains 'Hackers'
  and title contains '(film)'
  and wp_namespace = 0
group by title, id
order by edits
limit 10

Pick a great movie

select title, id, count(id) as edits
from [publicdata:samples.wikipedia]
where contributor_id in (
    select contributor_id
    from [publicdata:samples.wikipedia]
    where id=264176
      and contributor_id is not null
      and is_bot is null
      and wp_namespace = 0
      and title CONTAINS '(film)'
    group by contributor_id)
  and wp_namespace = 0
  and id != 264176
  and title CONTAINS '(film)'
group each by title, id
order by edits desc
limit 100

Find edits in common

Discover the most broadly popular films

select id from (
  select id, count(id) as edits
  from [publicdata:samples.wikipedia]
  where wp_namespace = 0
    and title CONTAINS '(film)'
  group each by id
  order by edits desc
  limit 20)

Edits in common, minus broadly popular

select title, id, count(id) as edits
from [publicdata:samples.wikipedia]
where contributor_id in (
    select contributor_id
    from [publicdata:samples.wikipedia]
    where id=264176
      and contributor_id is not null
      and is_bot is null
      and wp_namespace = 0
      and title CONTAINS '(film)'
    group by contributor_id)
  and wp_namespace = 0
  and id != 264176
  and title CONTAINS '(film)'
  and id not in (
    select id from (
      select id, count(id) as edits
      from [publicdata:samples.wikipedia]
      where wp_namespace = 0
        and title CONTAINS '(film)'
      group each by id
      order by edits desc
      limit 20))
group each by title, id
order by edits desc
limit 100
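The logic of the last query (films edited by the same people who edited the seed film, minus the broadly popular ones) can be sketched in plain Python. The edit log below is invented, film titles stand in for page ids, and the bot and namespace filters from the SQL are left out for brevity.

```python
from collections import Counter

# Hypothetical edit log of (contributor_id, film) pairs, invented for illustration.
EDITS = [
    ("a", "Hackers"), ("b", "Hackers"),
    ("a", "WarGames"), ("b", "WarGames"),
    ("a", "Sneakers"),
    ("a", "Titanic"), ("b", "Titanic"), ("c", "Titanic"), ("d", "Titanic"),
    ("c", "Tron"),
]


def recommend(edits, seed_film, popular_cutoff):
    """Rank films by edits from the seed film's editors, skipping the most-edited films."""
    # Editors who touched the seed film (the inner "contributor_id in" subquery).
    seed_editors = {contributor for contributor, film in edits if film == seed_film}
    # The most-edited films overall (the "broadly popular" subquery).
    overall = Counter(film for _, film in edits)
    popular = {film for film, _ in overall.most_common(popular_cutoff)}
    # Edits in common, minus the seed film and the broadly popular films.
    common = Counter(
        film
        for contributor, film in edits
        if contributor in seed_editors and film != seed_film and film not in popular
    )
    return common.most_common()


print(recommend(EDITS, "Hackers", popular_cutoff=1))
```

Dropping the broadly popular films matters because almost every active editor has touched them, so they would otherwise dominate every recommendation.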

Interesting challenges await

The plan

01

02

03

04

05

A (very) brief overview of machine learning

Vision API

Speech API

Natural Language API

Tears (of joy)

Confidential & ProprietaryGoogle Cloud Platform 51

Machine Learning is using many examples to answer questions


Why the sudden explosion in machine learning?


Google Cloud is The Datacenter as a Computer


So what's special?

● Sound → Text

● Pixels → Meaning

Understanding the real world is hard


How can we make it easier?


Cloud Speech API
Cloud Vision API

Speech API

● Speech-to-text transcription in over 80 languages

● Supports streaming and non-streaming recognition

● Filters inappropriate content

● Demo!

{
  "labelAnnotations": [
    { "mid": "/m/0c9ph5", "description": "Flower", "score": 98 },
    { "mid": "/m/05s2s", "description": "Plant", "score": 93 },
    { "mid": "/m/03bmqb", "description": "Flora", "score": 83 },
    { "mid": "/m/0k3b9", "description": "Hydrangea", "score": 81 }
  ]
}

Label Detection
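A response like the one above is easy to post-process. The sketch below parses the same labelAnnotations payload (with the values shown on the slide) and keeps labels above a score threshold; the threshold of 90 and the helper name are arbitrary choices for illustration.

```python
import json

# Label detection response, values copied from the slide.
RESPONSE = """
{
  "labelAnnotations": [
    {"mid": "/m/0c9ph5", "description": "Flower", "score": 98},
    {"mid": "/m/05s2s", "description": "Plant", "score": 93},
    {"mid": "/m/03bmqb", "description": "Flora", "score": 83},
    {"mid": "/m/0k3b9", "description": "Hydrangea", "score": 81}
  ]
}
"""


def labels_above(response_text, min_score):
    """Return label descriptions whose score is at least min_score."""
    data = json.loads(response_text)
    return [
        annotation["description"]
        for annotation in data.get("labelAnnotations", [])
        if annotation["score"] >= min_score
    ]


print(labels_above(RESPONSE, 90))
```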


{

"landmarkAnnotations" : [

{

"boundingPoly" : {

"vertices" : [

{

"x" : 52,

"y" : 25

},

...

]

},

"mid" : "\/m\/0b__kbm",

"score" : 0.4231607,

"description" : "The Wizarding World of Harry Potter",

"locations" : [

{

"latLng" : {

"longitude" : -81.471261,

"latitude" : 28.473

}

}

]

}

]

}

Landmark Detection


{
  ...
  "itemListElement": [
    {
      "@type": "EntitySearchResult",
      "result": {
        "@id": "kg:/m/0b__kbm",
        "name": "The Wizarding World of Harry Potter",
        ...
        "detailedDescription": {
          "articleBody": "The Wizarding World of Harry Potter is a themed area spanning two theme parks – Islands of Adventure and Universal Studios Florida – at the Universal Orlando Resort in Orlando, Florida, USA.\n",
          ...

Knowledge Graph sidebar

GET https://kgsearch.googleapis.com/v1/entities:search?ids=%2Fm%2F0b__kbm&key={API_KEY}
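The mid returned by landmark detection ("/m/0b__kbm") is the same id used in the Knowledge Graph lookup, only URL-encoded. A small sketch of building that request URL with the standard library; it makes no network call, and "API_KEY" is a placeholder, not a real key.

```python
from urllib.parse import urlencode

# Endpoint as shown on the slide.
KG_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def entity_search_url(mid, api_key):
    """Build the entity search URL for one machine id (mid)."""
    # urlencode percent-encodes the slashes in the mid as %2F.
    return KG_ENDPOINT + "?" + urlencode({"ids": mid, "key": api_key})


print(entity_search_url("/m/0b__kbm", "API_KEY"))
```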

"faceAnnotations" : [

{

"headwearLikelihood" : "VERY_UNLIKELY",

"surpriseLikelihood" : "VERY_UNLIKELY",

"rollAngle" : 8.5484314,

"angerLikelihood" : "VERY_UNLIKELY",

"detectionConfidence" : 0.9996134,

"joyLikelihood" : "VERY_LIKELY",

"panAngle" : 18.178885,

"sorrowLikelihood" : "VERY_UNLIKELY",

"tiltAngle" : -12.244568,

"underExposedLikelihood" : "VERY_UNLIKELY",

"blurredLikelihood" : "VERY_UNLIKELY",

"landmarks" : [

{

"type" : "LEFT_EYE",

"position" : {

"x" : 268.25815,

"y" : 491.55255,

"z" : -0.0022390306

}

},

...

Face Detection


{

"type" : "RIGHT_EYE",

"position" : {

"x" : 418.42868,

"y" : 508.22632,

"z" : 49.302765

}

},

{

"type" : "MIDPOINT_BETWEEN_EYES",

"position" : {

"x" : 359.86551,

"y" : 500.2868,

"z" : -7.9241152

}

},

{

"type" : "NOSE_TIP",

"position" : {

"x" : 358.51404,

"y" : 611.80286,

"z" : -31.350466

}

},

...
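The likelihood fields in a face annotation are ordered categories rather than numbers, so comparing them needs an explicit ordering. A minimal sketch, using the values from the slide and assuming the ordering UNKNOWN < VERY_UNLIKELY < UNLIKELY < POSSIBLE < LIKELY < VERY_LIKELY; the helper name is made up.

```python
# Assumed ordering of likelihood values, lowest to highest.
LIKELIHOOD_ORDER = [
    "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY",
]

# Face annotation fields copied from the slide (landmarks omitted).
FACE = {
    "headwearLikelihood": "VERY_UNLIKELY",
    "surpriseLikelihood": "VERY_UNLIKELY",
    "rollAngle": 8.5484314,
    "angerLikelihood": "VERY_UNLIKELY",
    "detectionConfidence": 0.9996134,
    "joyLikelihood": "VERY_LIKELY",
    "panAngle": 18.178885,
    "sorrowLikelihood": "VERY_UNLIKELY",
    "tiltAngle": -12.244568,
    "underExposedLikelihood": "VERY_UNLIKELY",
    "blurredLikelihood": "VERY_UNLIKELY",
}


def likely_attributes(face, threshold="LIKELY"):
    """Return attribute names whose likelihood is at or above the threshold."""
    cutoff = LIKELIHOOD_ORDER.index(threshold)
    suffix = "Likelihood"
    found = []
    for key, value in face.items():
        if key.endswith(suffix) and LIKELIHOOD_ORDER.index(value) >= cutoff:
            found.append(key[: -len(suffix)])
    return sorted(found)


print(likely_attributes(FACE))
```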


How about some meaning in those words?


Natural Language API

Three methods:

1. Analyze entities - Montreal is a city in Canada

2. Analyze sentiment - I love Montreal

3. Analyze syntax - Michelle Obama is married to Barack Obama
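The three methods take the same kind of input: a document holding plain text. The sketch below only builds the request URL and JSON body for each method and sends nothing. The endpoint paths and field names reflect my understanding of the v1 REST interface and should be treated as assumptions to check against the API reference.

```python
import json

# Assumed base URL and method names for the v1 REST interface.
NL_BASE = "https://language.googleapis.com/v1/documents:"
METHODS = {
    "entities": "analyzeEntities",
    "sentiment": "analyzeSentiment",
    "syntax": "analyzeSyntax",
}


def build_request(kind, text):
    """Return (url, json_body) for one analysis method; nothing is sent."""
    url = NL_BASE + METHODS[kind]
    body = {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }
    return url, json.dumps(body)


# Example sentences taken from the slide.
for kind, text in [
    ("entities", "Montreal is a city in Canada"),
    ("sentiment", "I love Montreal"),
    ("syntax", "Michelle Obama is married to Barack Obama"),
]:
    print(build_request(kind, text)[0])
```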

https://cloud.google.com/nl


Free tears!


● Vision API - 1,000 requests / month

● Speech API - 60 minutes / month

● Natural Language API - 5,000 units / month (1 unit = 1,000 Unicode characters)
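The unit definition above makes it possible to estimate how far the Natural Language API free tier goes. A small sketch, assuming that partial units are rounded up and that every request costs at least one unit; both are assumptions, not something the slide states.

```python
import math

FREE_UNITS_PER_MONTH = 5000  # from the slide
UNIT_SIZE = 1000             # Unicode characters per unit, from the slide


def nl_units(text):
    """Estimate units for one document (assumes rounding up, minimum one unit)."""
    return max(1, math.ceil(len(text) / UNIT_SIZE))


def fits_free_tier(texts):
    """Check whether a batch of documents stays within the monthly free units."""
    return sum(nl_units(t) for t in texts) <= FREE_UNITS_PER_MONTH


print(nl_units("I love Montreal"))
print(nl_units("a" * 1001))
```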

Free tiers!

Thank you!

@karthik_padman

Resources:

Speech API: cloud.google.com/speech

Vision API: cloud.google.com/vision

Natural Language API: cloud.google.com/nl

Thank you!

Karthik Padmanabhan
Developer Relations
Google Cloud Platform
@karthik_padman

Try BigQuery: bigquery.google.com
Cloud API
CloudML

Slides:

About you

● Game developers?
● Data people?
● Students?
● Not techies at all?