Google at the Big Data Russia conference

description

Presentation by Google Russia at the Big Data Russia conference (http://bigdatarussia.ru/).

Transcript of Google at the Big Data Russia conference

Page 1: Google at the Big Data Russia conference

Google confidential │ Do not distribute

Big Data with Google Cloud Platform
Focus on insight, not infrastructure

Daniel Bergqvist, Solution Engineer, Big Data Technologies
Olga Strelova, Cloud Platform Sales, Tel: +7 495 734-71-41, [email protected]

Page 2: Google at the Big Data Russia conference

Why Big Data?

Page 3: Google at the Big Data Russia conference

Big Data is driving Big Value

Used data from telematic sensors in over 46K vehicles to:
● Reduce daily routes by 85 million miles
● Save 8.4 million gallons of fuel
● Save over $30 million in miles cut/driver/day

Created Snapshot device to collect data on driving habits and user behavior in real-time

Calculated applicable discount to driver’s monthly premium based on their individual behavior

Analyzed the activity of their entire customer base (over 7M customers and 19B images)

Uncovered trends that improved customer acquisition, retention and value through optimized marketing

Page 4: Google at the Big Data Russia conference

Trends

Increasing Digitization of Human & Economic Activity

Falling Costs of Storage & Computing

Increasing Pace of Innovation

Page 5: Google at the Big Data Russia conference

Opportunities with Big Data

1. Recognize and seize market trends before your competitors

2. Capture business value from information

3. Create a smarter, learning organization

Page 6: Google at the Big Data Russia conference

Big Data remains inaccessible

Big Data is Hard
● Complex technical infrastructure to support distributed computing
● Requires specialized expertise
● Time consuming

Big Data is Expensive
● Storage costs scale with larger datasets
● Computing resources must be provisioned for peak loads
● Personnel are expensive

Page 7: Google at the Big Data Russia conference

Google is making Big Data accessible

Big Data is Hard → Easy
● No complex data architecture required
● Use the technical and product skillsets you already have
● Query within seconds and get real-time results

Big Data is Expensive → Affordable
● Pay on-demand for only the resources you use
● Take advantage of falling prices & Moore’s Law
● Reduce infrastructure management burden

Page 8: Google at the Big Data Russia conference

Where did these come from?

Page 9: Google at the Big Data Russia conference

Cloud Storage, Cloud SQL, Cloud Datastore

To organize the world’s information and make it universally accessible and useful

Page 10: Google at the Big Data Russia conference

Page 11: Google at the Big Data Russia conference

Google Services in Numbers

Search: 1B Searches/Month; >25% of F500 (GSA)
Android: 1.5M+ activations per day; 900M+ devices
YouTube: 100 hours of video uploaded per minute
G+: 500M+ accounts; 135M+ active in stream
Apps: 500M+ Gmail
Chrome: 310M+ browser users
Maps & Earth: 1B+ downloads; 200M+ mobile; 10M+ activations on iOS
Cloud Platform: 4.75M+ apps; 250K+ developers

Page 12: Google at the Big Data Russia conference

Page 13: Google at the Big Data Russia conference

Google is a pioneer in Big Data

Technologies shown on the timeline: GFS, MapReduce, Big Table, Dremel, Colossus, Flume, Spanner, MillWheel
Timeline axis: 2002, 2004, 2006, 2008, 2010, 2012, 2013

Page 14: Google at the Big Data Russia conference

We help you manage the entire lifecycle of Big Data

Stages: Capture, Store, Process, Analyze
Store: Cloud Storage, Cloud SQL, Cloud Datastore
Other products shown: Pub/Sub, Dataflow, BigQuery, Open Source Tools

Page 15: Google at the Big Data Russia conference

Computing Patterns

Our Big Data products

Cloud Dataflow
• Successor to MapReduce and based on Google technologies, including Flume and MillWheel
• Fully managed service
• Create data pipelines that ingest, transform and analyze in batch or streaming mode
• Takes care of deploying, maintaining and scaling infrastructure

BigQuery
• Interactive analysis of large-scale datasets, providing real-time insights
• Run fast SQL queries against virtually limitless datasets in seconds
• Full visibility and control with pricing: only pay for querying and storage
• No complex data architecture required

Cloud Pub/Sub
• Event management system that simplifies analytics application architecture
• Connect your services with reliable, many-to-many asynchronous messaging
• Guarantees that messages will be delivered whether or not all consumers are online
• Provides a single global ingestion point, not dependent on zone or regional availability
• Scales to what you need with no wasted capacity

Open Source Tools
• Run Hadoop and other FOSS on Cloud Platform; take advantage of performance, ease of use and cost efficiency
• Using cloud resources eliminates capital costs and reduces administration time
• With a single command, start a cluster running Hadoop, Hive, Pig, Spark or Shark to get up and running quickly and without worrying about configuration hassles
• Using GCP storage products allows you to take advantage of accessing data within any Hadoop deployment
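
The Pub/Sub bullets above describe many-to-many asynchronous messaging in which a message still reaches consumers that were offline when it was published. A minimal in-memory sketch of that idea follows. It is not the Cloud Pub/Sub API: the `Topic` class and its `subscribe`, `publish` and `pull` methods are hypothetical names chosen for illustration, and a real service adds acknowledgements, persistence and scaling that are not modelled here.

```python
from collections import defaultdict, deque


class Topic:
    """Toy in-memory topic (hypothetical): each subscription keeps its own backlog."""

    def __init__(self):
        # subscription name -> queue of messages not yet pulled
        self._backlogs = defaultdict(deque)

    def subscribe(self, name):
        # Touching the key creates an empty backlog for this subscription.
        self._backlogs[name]

    def publish(self, message):
        # Fan out: every subscription receives its own copy, so a consumer
        # that is offline now finds the message waiting when it returns.
        for backlog in self._backlogs.values():
            backlog.append(message)

    def pull(self, name, max_messages=10):
        # Return up to max_messages pending messages, oldest first.
        backlog = self._backlogs[name]
        batch = []
        while backlog and len(batch) < max_messages:
            batch.append(backlog.popleft())
        return batch
```

Any number of publishers can call `publish`, and each subscriber drains its own backlog with `pull` at its own pace, which is the decoupling the slide refers to.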

Page 16: Google at the Big Data Russia conference

Let's look at specific examples

Page 17: Google at the Big Data Russia conference

1. Marketing Analytics

Using Google Cloud Platform for marketing analytics enables a deeper understanding of how marketing investments are performing

What Cloud Platform offers:

● Easily micro-segment by looking for discrete patterns in large sets of customer data

● Measure campaigns by combining multiple datasets that can track campaigns across channels and users across stages of the buying funnel

● Market-mix modeling to optimize spend across channels

● Identify patterns and trends in real time to improve customer acquisition and ROI

The Technology

Integration between Google Analytics Premium and BigQuery allows for data mashups, analysis of user interaction across multiple devices, and complex queries at lightning speed to gain deeper, broader insights

Cloud Dataflow helps you ingest and analyze data from live campaigns, existing CRM tools, and any other data sources you need

Open Source Tools and Connectors allow you to harness the power of many open-source tools such as Hadoop and Spark to provide flexibility when analyzing campaign data

BigQuery enables interactive analysis of unlimited amounts of data, allowing you to seize opportunities and optimize in a timely manner, thereby increasing acquisition and ROI

Page 18: Google at the Big Data Russia conference

Boosting Sales While Improving Shopping Experience

Home furnishing retailer Rooms To Go simplifies the consumer shopping experience by offering completely designed room packages.

Page 19: Google at the Big Data Russia conference

2. Sensor Data & IoT

Using Google Cloud Platform for sensor data & IoT enables use of diffuse data sources to optimize large-scale systems & improve production processes

What Cloud Platform offers:

● Scalable, reliable platform for capturing and managing IoT data

● Ability to run analytics (streaming and historical) over this data

● Improve customer experiences based on faster responses to events

● Cost-effective storage needed to process vast amounts of data

The Technology

Google Cloud Storage, Cloud SQL, and Datastore provide scalable and secure ways to store data

Pub/Sub provides a reliable system for event collection and management

Dataflow lets you filter, aggregate and enrich data for both streaming and batch analysis under one API

BigQuery allows for interactive analysis of unlimited data to uncover trends in large databases and across all customers in order to improve customer experience
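
The "filter, aggregate and enrich" steps named above, applied the same way to batch and streaming input, can be sketched in plain Python. This is an illustration only and does not use the Dataflow SDK: the record fields (`device`, `temp_c`), the `DEVICE_SITE` lookup table and all function names are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical device metadata used for enrichment.
DEVICE_SITE = {"dev-1": "plant-a", "dev-2": "plant-b"}


def filter_valid(readings):
    """Filter: drop readings that lack a numeric temperature."""
    for r in readings:
        if isinstance(r.get("temp_c"), (int, float)):
            yield r


def enrich(readings):
    """Enrich: attach the site name for each device, if known."""
    for r in readings:
        yield {**r, "site": DEVICE_SITE.get(r["device"], "unknown")}


def mean_temp_by_site(readings):
    """Aggregate: mean temperature per site."""
    totals = defaultdict(lambda: [0.0, 0])  # site -> [sum, count]
    for r in readings:
        totals[r["site"]][0] += r["temp_c"]
        totals[r["site"]][1] += 1
    return {site: total / count for site, (total, count) in totals.items()}


def pipeline(readings):
    # Accepts any iterable: a list (batch) or a generator (stream).
    return mean_temp_by_site(enrich(filter_valid(readings)))
```

Because every stage consumes an iterable, the same `pipeline` call works on a stored list and on a generator that yields readings as they arrive. A real streaming system would also need windowing to emit results before the stream ends, which this sketch leaves out.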

Page 20: Google at the Big Data Russia conference

Connected Equipment/Devices

Lennox International Inc. is an American company. Through its subsidiaries, it is a provider of climate control products for the heating, ventilation, air conditioning, and refrigeration markets in housing and commercial sectors around the world.

Goal: Capture detailed product performance data and ambient conditions from the installed units for better innovation and customer service

● Innovation: Identifying areas for product improvements and new designs

● Customer Delight: Proactively providing energy settings advice to customers based on usage, weather conditions, etc.

● Customer Service: Predictive maintenance to avoid major breakdowns

● Cost Savings: Better understanding of failure points feeding back into better design, helping reduce warranty and replacement costs

Page 21: Google at the Big Data Russia conference

3. Log Data

Using Google Cloud Platform for Log Data enables easy management of massive log files, constantly ingesting real-time data with much shorter response times

What Cloud Platform offers:

● Better management of massive log files

● An efficient platform for capturing, managing and analyzing IoT infrastructure

● The ability to continuously identify customer trends and take timely actions

The Technology

BigQuery handles log files of massive volume, constantly ingesting real-time data with much shorter response times

Pub/Sub provides a fully managed service for reliable event ingestion, distribution and notifications, which automatically scales to what you need with no wasted capacity

Dataflow is a pipeline management system that allows you to examine a real-time stream of data as well as compare it to historical data in order to capture significant patterns and activities

Apps running in Compute Engine and App Engine benefit from advanced log analytics based on data streaming with real-time alerts
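
Comparing a real-time stream against historical data to raise alerts, as described above, can be sketched as a simple threshold check. This is one possible approach, not the method used by Dataflow or by any Google product: the function names, the per-minute error counts and the default of three standard deviations are all assumptions for illustration.

```python
from statistics import mean, pstdev


def is_anomalous(current, history, k=3.0):
    """Flag `current` if it exceeds the historical mean by more than k standard deviations."""
    if len(history) < 2:
        return False  # not enough history to judge
    baseline = mean(history)
    spread = pstdev(history)
    return current > baseline + k * spread


def alerts(stream, history, k=3.0):
    """Yield (minute, count) pairs from the live stream that look unusual."""
    for minute, count in stream:
        if is_anomalous(count, history, k):
            yield minute, count
```

Here `history` would hold past per-minute error counts taken from stored logs, and `stream` would yield the live counts as they are computed.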

Page 22: Google at the Big Data Russia conference

Motorola

Architecture diagram components: Phones, App Engine, Compute Engine, Cloud Storage, Hadoop MapReduce Workflows, BigQuery Workflows, BigQuery Storage, BigQuery
BigQuery consumers: Business Analysts, Applications, Visualizations

Page 23: Google at the Big Data Russia conference

4. SaaS

Using Google Cloud Platform for SaaS enables ease of management for analytics

What Cloud Platform offers:

● Ease of integration with open source tools

● A platform to capture, process and analyze large-scale analytics without needing to worry about building a complex infrastructure

● Technology that scales and requires minimal administration

● The most cost-effective, fastest way to store and analyze data

The Technology

Connectors and Tools for Hadoop data sources allow you to easily install different open source processing frameworks such as Spark, Shark, Hive and Pig to take advantage of interoperability and portability within all these frameworks as well as other Google Cloud Platform products under one system

Dataflow takes care of ingestion, transformation and analysis of data, providing real-time access to application and consumer data across a set of devices

Compute Engine allows you to easily scale up and down depending on your workload. Also, per-minute billing lets you pay for exactly what you use, and sustained-use discounts automatically reward you for running steady-state workloads

BigQuery provides a 99.9% uptime SLA and you only pay for the storage you need and queries you run, giving you full visibility and control

Cloud Storage and BigQuery require no hardware or software, eliminating capital expenditure and the need to build complex infrastructure
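
The per-minute billing point above comes down to simple arithmetic, sketched here against a model that rounds usage up to whole hours. The hourly rate is a made-up placeholder, not a Google price, and the sketch ignores sustained-use discounts and any minimum charge.

```python
import math

HOURLY_RATE = 0.10  # hypothetical price per VM-hour, for illustration only


def cost_per_minute_billing(minutes, hourly_rate=HOURLY_RATE):
    """Charge for exactly the minutes used."""
    return minutes * hourly_rate / 60


def cost_per_hour_billing(minutes, hourly_rate=HOURLY_RATE):
    """Charge for every started hour (usage rounded up to whole hours)."""
    return math.ceil(minutes / 60) * hourly_rate
```

Under these assumptions, a 90-minute job is charged for 1.5 hours with per-minute billing and for 2 full hours with hourly rounding, so the saving grows with the number of short-lived machines.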

Page 24: Google at the Big Data Russia conference

Streak - CRM in email

Managing millions of interactions and recommendations/day with Prediction API and BigQuery

Page 25: Google at the Big Data Russia conference

5. Traditional Hadoop Workloads

Using Google Cloud Platform for Hadoop Workloads enables an easy and effective way to unlock the power of the Apache Hadoop framework

What Cloud Platform offers:

● Quick startup times

● Unmatched value with per-minute billing to optimize for scale and speed

● Agility to mix and match data with multiple open source software and cloud services without worrying about configuration

● Greater stability for running Hadoop

● Flexibility and control of resizing your cluster depending on workload

● An easy way to leverage the Hadoop framework without worrying about investing in costly infrastructure and administration

The Technology

Compute Engine virtual machines start in seconds

bdutil allows you to easily deploy and use the best tools from the open-source ecosystem. With a single command, you can start a cluster running Hadoop, Hive, Pig, Spark or Shark to get up and running quickly without worrying about configuration hassles

Cloud Storage frees you from the burden of investing in complex disks and machines and provides flexibility to scale up and down when needed

Connectors provide access to Cloud Storage, BigQuery and Datastore, which allow you to turn down your cluster without losing any of your data and take advantage of accessing your data within any of your Hadoop deployments

Page 26: Google at the Big Data Russia conference

Cdiscount.com

France's largest e-commerce site, Cdiscount.com, is using Compute Engine because it's 15x faster than their on-premises data warehouse.

Page 27: Google at the Big Data Russia conference

Google probably processes more information than any company on the planet and tends to have to invent tools to cope with the data. As a result its technology runs a good five to 10 years ahead of the competition.

Bloomberg Businessweek, June 2014

Page 28: Google at the Big Data Russia conference