Data Pipeline & Data Lake (in Google Cloud)

Megazone Google Cloud Team, Jung hoo Park: Data Pipeline & Data Lake (in Google Cloud)

Transcript of Data Pipeline & Data Lake (in Google Cloud)

Page 1

Megazone Google Cloud Team

Jung hoo Park

Data Pipeline & Data Lake

(in Google Cloud)

Page 2

Agenda

● What is a Data Pipeline?

● Data Engineering in Google Cloud

Page 3

01 What is a Data Pipeline?

Page 4
Page 5

Conceptual Data Pipeline

Application: PC Game, Mobile Game, Media Contents Delivery Service, etc.

ETL / ELT: Ingest, Processing, Governance

Streaming / Batch

Data Analyst / Data Scientist: Data Wrangling, Visualization, Report & Dashboard, Model Training, Model Serving
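
As a rough illustration of the stages named on this slide, the following pure-Python sketch pushes a few events through an ingest step, a processing step, and a report step. The event fields (`source`, `user`, `score`) and all function names are hypothetical, and nothing here calls a Google Cloud API.

```python
import json
from collections import defaultdict

# Hypothetical raw events as they might arrive from game clients.
RAW_EVENTS = [
    '{"source": "pc_game", "user": "a", "score": 10}',
    '{"source": "mobile_game", "user": "b", "score": 7}',
    'not valid json',
    '{"source": "pc_game", "user": "c", "score": 5}',
]


def ingest(lines):
    """Ingest: parse each raw line, skipping records that are not valid JSON."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def process(events):
    """Processing: total the score per source."""
    totals = defaultdict(int)
    for event in events:
        totals[event["source"]] += event["score"]
    return dict(totals)


def report(totals):
    """Report: render one line per source, sorted by source name."""
    return [f"{source}: {total}" for source, total in sorted(totals.items())]


totals = process(ingest(RAW_EVENTS))
lines = report(totals)
```

This treats the input as a finite batch. A streaming pipeline would emit partial results continuously instead of once at the end.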

Page 6
Page 7

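Provisioning a typical data processing environment

Server purchase, server setup, OS install, OS configuration, OS optimization, OS debugging, provisioning

Reconfiguration

Scaling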

Page 8
Page 9

02 Google Cloud Service for Data Pipeline

Page 10

Stackdriver Logging

Cloud Pub/Sub

Cloud Storage

Cloud IoT Core

Cloud Datastore

Cloud Bigtable

Cloud Dataproc

Cloud Dataflow

BigQuery

Cloud Datalab

Ingest Process Store Analyze

Cloud Spanner

Visualize

BigQuery

Cloud Datalab

Data Studio

BigQuery Streaming API

3rd Party

Transfer Appliance

Cloud Datalab

Transfer Service

Cloud SQL

Cloud Dataprep

Cloud Dataprep

Cloud Composer

Page 11

Stackdriver Logging

Cloud Pub/Sub

Cloud Storage

Cloud IoT Core

Cloud Datastore

Cloud Bigtable

Cloud Dataflow

BigQuery

Cloud Datalab

Ingest Process Store Analyze

Cloud Spanner

Visualize

BigQuery

Cloud Datalab

Data Studio

BigQuery Streaming API

3rd Party

Transfer Appliance

Cloud Datalab

Transfer Service

Cloud SQL

Cloud Dataprep

Cloud Dataprep

Cloud Composer

Cloud Dataproc

Page 12

Ingest

Page 13

Cloud Pub/Sub

Global by default

No provisioning, auto-everything

Exactly-once processing

Seek and replay
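
To make "seek and replay" concrete, here is a toy in-memory model. It is not the Cloud Pub/Sub client library: `ToySubscription`, its methods, and the integer publish times are all hypothetical. In the real service a seek targets a timestamp or a snapshot on a subscription. The sketch only shows the idea of rewinding a delivery cursor over a retained log.

```python
class ToySubscription:
    """In-memory stand-in for a subscription over a retained message log."""

    def __init__(self, log):
        self._log = log    # list of (publish_time, payload), in publish order
        self._cursor = 0   # index of the next message to deliver

    def pull(self, max_messages=10):
        """Return up to max_messages payloads and advance the cursor."""
        batch = self._log[self._cursor:self._cursor + max_messages]
        self._cursor += len(batch)
        return [payload for _, payload in batch]

    def seek(self, timestamp):
        """Move the cursor to the first message published at or after timestamp."""
        self._cursor = next(
            (i for i, (t, _) in enumerate(self._log) if t >= timestamp),
            len(self._log),
        )


log = [(1, "login"), (2, "purchase"), (3, "logout")]
sub = ToySubscription(log)

first_pass = sub.pull()   # consumes every retained message
sub.seek(2)               # rewind to publish time 2
replayed = sub.pull()     # messages from time 2 onward are delivered again
```

Replay of this kind is useful when a downstream consumer had a bug and needs to reprocess messages it already received.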

Page 14

Storage Transfer Service

Centralized job management

High-performance copies

Transfer data from cloud to cloud

Data security

Transfer data from bucket to bucket

Page 15

Processing

Page 16

Cloud Dataflow

Dataflow SQL

Dataflow Template

Streaming Engine

Inline Monitoring

Dataflow Shuffle

Auto-Scaling

Page 17

Cloud Composer

Integration

Python Language

Multi-Cloud

Fully-Managed

Hybrid

Open Source
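
Cloud Composer is a managed Apache Airflow service, where a workflow is written in Python as a graph of dependent tasks. The sketch below does not use Airflow. It only uses the standard-library `graphlib` module (Python 3.9+) to show how task dependencies determine a valid run order. The task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks it depends on.
dependencies = {
    "load_to_warehouse": {"transform"},
    "transform": {"extract_game_logs", "extract_media_logs"},
    "extract_game_logs": set(),
    "extract_media_logs": set(),
}


def run_order(deps):
    """Return one valid execution order in which every dependency runs first."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
```

An orchestrator adds scheduling, retries, and monitoring on top of this ordering, which the sketch leaves out.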

Page 18

Store & Analyze

Page 19

Cloud Storage

Pub/Sub Notifications for Cloud Storage

Customer-managed encryption keys

Object Version Management

Cloud Audit Logs with Cloud Storage

Retention Policies

Object Lifecycle Management
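
As an example of what a lifecycle policy can look like, the sketch below builds a configuration in the JSON shape Cloud Storage accepts for lifecycle rules (`rule`, `action`, `condition`). The 30-day and 365-day thresholds are arbitrary assumptions. The `action_for` helper is a hypothetical, simplified evaluator that handles only the `age` condition and lets `Delete` win over `SetStorageClass`. It is there to show what the rules mean, not how the service is implemented.

```python
import json

# Assumed policy: move objects to Nearline after 30 days, delete after 365.
lifecycle_config = {
    "rule": [
        {
            "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
            "condition": {"age": 30},
        },
        {
            "action": {"type": "Delete"},
            "condition": {"age": 365},
        },
    ]
}


def action_for(age_days, config):
    """Pick the action for an object of the given age (simplified evaluator)."""
    matched = [
        rule["action"]
        for rule in config["rule"]
        if age_days >= rule["condition"]["age"]
    ]
    # Delete takes precedence when several rules match.
    for action in matched:
        if action["type"] == "Delete":
            return "Delete"
    for action in matched:
        if action["type"] == "SetStorageClass":
            return "SetStorageClass:" + action["storageClass"]
    return None


# Serialized form, e.g. for saving to a file that is applied to a bucket.
lifecycle_json = json.dumps(lifecycle_config, indent=2)
```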

Page 20

Google Cloud Storage Class

Page 21

BigQuery

Foundation for AI & BI

Big data ecosystem integration

Petabyte Scale

Geo-expansion

Data Transfer Service

Serverless

Page 22

03 Demo

Page 23

Architecture: Demo Scenario

Real-Time Events → Cloud Pub/Sub → Cloud Dataflow → BigQuery
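
The transcript does not include the demo code, so the following is only a sketch of the kind of transform that could sit between the message queue and the warehouse table. It decodes JSON message payloads into row dictionaries and sets aside anything malformed. The column names in `COLUMNS` and both function names are hypothetical, and the code uses only the Python standard library, not the Dataflow or BigQuery APIs.

```python
import json

# Hypothetical destination columns; the demo's real table schema is not given.
COLUMNS = ("event_id", "user_id", "event_type")


def to_row(message_bytes):
    """Decode one message payload into a row dict, or None if it is unusable."""
    try:
        event = json.loads(message_bytes.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    if not isinstance(event, dict) or any(c not in event for c in COLUMNS):
        return None
    # Keep only the expected columns and drop any extra fields.
    return {c: event[c] for c in COLUMNS}


def transform(messages):
    """Split messages into loadable rows and rejected raw payloads."""
    rows, rejected = [], []
    for message in messages:
        row = to_row(message)
        if row is None:
            rejected.append(message)
        else:
            rows.append(row)
    return rows, rejected


messages = [
    b'{"event_id": 1, "user_id": "a", "event_type": "login", "extra": true}',
    b'{"event_id": 2, "user_id": "b"}',   # missing event_type
    b'\xff\xfe',                          # not valid UTF-8
]
rows, rejected = transform(messages)
```

Keeping rejected payloads in a separate list, rather than dropping them, makes it possible to inspect or reprocess them later.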

Page 24

Thank you