Megazone Google Cloud Team
Jung hoo Park
Data Pipeline & Data Lake
(in Google Cloud)
Agenda
● What is a Data Pipeline?
● Data Engineering in Google Cloud
01 What is a Data Pipeline?
Conceptual Data Pipeline
Application: PC Game, Mobile Game, Media Contents Delivery Service, etc.
ETL / ELT
Data Analyst
Data Scientist
Ingest
Processing
Governance
Data Wrangling
Visualization
Report & Dashboard
Model Training
Model Serving
Streaming
Batch
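The conceptual pipeline above can be illustrated with a small sketch. It is not taken from the slides: it assumes a hypothetical `source,user,action` line format and uses only the Python standard library to show how the same Ingest and Processing stages can feed either a Batch run (one final report) or a Streaming run (a report updated per event).

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List

Event = Dict[str, str]


def ingest(raw_lines: Iterable[str]) -> Iterator[Event]:
    """Ingest: parse 'source,user,action' lines, skipping malformed ones."""
    for line in raw_lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 3 and all(parts):
            yield {"source": parts[0], "user": parts[1], "action": parts[2]}


def process(events: Iterable[Event]) -> Iterator[Event]:
    """Processing: normalize fields so downstream counts are consistent."""
    for e in events:
        yield {**e, "source": e["source"].lower(), "action": e["action"].lower()}


def run_batch(raw_lines: List[str]) -> Counter:
    """Batch: consume a bounded input and return one final report."""
    return Counter(e["source"] for e in process(ingest(raw_lines)))


def run_streaming(raw_lines: Iterable[str]) -> Iterator[Counter]:
    """Streaming: emit an updated running report after every event."""
    totals: Counter = Counter()
    for e in process(ingest(raw_lines)):
        totals[e["source"]] += 1
        yield Counter(totals)  # copy, so earlier snapshots stay unchanged
```

Both modes reuse the same `ingest` and `process` functions; only the way results are emitted differs, which is roughly the distinction the Streaming and Batch labels draw.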
Provisioning a Typical Data Processing Environment
Purchase servers, configure servers, install OS, configure OS, optimize OS, debug OS, provision
Reconfigure
Scale
02 Google Cloud Service for Data Pipeline
Ingest, Process, Store, Analyze, Visualize
Stackdriver Logging
Cloud Pub/Sub
Cloud Storage
Cloud IoT Core
Cloud Datastore
Cloud Bigtable
Cloud Dataproc
Cloud Dataflow
BigQuery
Cloud Datalab
Cloud Spanner
Data Studio
BigQuery Streaming API
3rd Party
Transfer Appliance
Transfer Service
Cloud SQL
Cloud Dataprep
Cloud Composer
Ingest
Cloud Pub/Sub
Global by default
No provisioning, auto-everything
Exactly-once processing (with Cloud Dataflow)
Seek and replay
Storage Transfer Service
Centralized job management
High-performance copies
Transfer data from cloud to cloud
Data security
Transfer data from bucket to bucket
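As a rough illustration of the Ingest step with Cloud Pub/Sub, the sketch below publishes one event using the `google-cloud-pubsub` Python client. The project ID, topic ID, event fields, and the `source` attribute are hypothetical, and the JSON encoding is an assumption rather than something the slides prescribe.

```python
import json
from typing import Any, Dict


def encode_event(event: Dict[str, Any]) -> bytes:
    """Serialize an event as compact UTF-8 JSON; Pub/Sub message data must be bytes."""
    return json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")


def publish_event(project_id: str, topic_id: str, event: Dict[str, Any]) -> str:
    """Publish one event and return the server-assigned message ID."""
    # Imported lazily so encode_event works without the client library installed.
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    # Message attributes must be strings; "source" is a hypothetical attribute.
    future = publisher.publish(
        topic_path,
        encode_event(event),
        source=str(event.get("source", "unknown")),
    )
    return future.result()  # blocks until the publish is acknowledged
```

A call such as `publish_event("my-project", "game-events", {"user": "u1", "source": "mobile"})` would require credentials and an existing topic; both names here are placeholders.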
Processing
Cloud Dataflow
Dataflow SQL
Dataflow Template
Streaming Engine
Inline Monitoring
Dataflow Shuffle
Auto-Scaling
Cloud Composer
Integration
Python Language
Multi-Cloud
Fully-Managed
Hybrid
Open Source
Store & Analyze
Cloud Storage
Pub/Sub Notifications for Cloud Storage
Customer-managed encryption keys
Object Versioning
Cloud Audit Logs with Cloud Storage
Retention Policies
Object Lifecycle Management
Cloud Storage Classes
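Object Lifecycle Management and storage classes work together: a lifecycle rule can move objects to a colder class or delete them once they reach a given age. The sketch below builds such a configuration in the JSON shape Cloud Storage lifecycle configurations use. The helper name and the 30-day and 365-day thresholds are illustrative assumptions, not values from the slides.

```python
import json
from typing import Any, Dict


def build_lifecycle_config(nearline_after_days: int, delete_after_days: int) -> Dict[str, Any]:
    """Return a lifecycle configuration with one SetStorageClass and one Delete rule."""
    if not 0 < nearline_after_days < delete_after_days:
        raise ValueError("expected 0 < nearline_after_days < delete_after_days")
    return {
        "lifecycle": {
            "rule": [
                {
                    # Move objects to the Nearline class once they are old enough.
                    "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                    "condition": {"age": nearline_after_days},
                },
                {
                    # Delete objects once they pass the retention window.
                    "action": {"type": "Delete"},
                    "condition": {"age": delete_after_days},
                },
            ]
        }
    }


# Hypothetical thresholds, printed as JSON for inspection.
print(json.dumps(build_lifecycle_config(30, 365), indent=2))
```

Keeping the thresholds as parameters makes it easy to generate different policies per bucket, for example a shorter window for raw event dumps than for curated data.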
BigQuery
Foundation for AI & BI
Big data ecosystem integration
Petabyte Scale
Geo-expansion
Data Transfer Service
Serverless
03 Demo
Architecture: Demo Scenario
Real-Time Events → Cloud Pub/Sub → Cloud Dataflow → BigQuery
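The demo code itself is not part of the transcript. A minimal sketch of a pipeline with this shape, written with the Apache Beam Python SDK (which Cloud Dataflow runs), might look like the following. The table schema, field names, and JSON message format are assumptions made for illustration.

```python
import json
from typing import Any, Dict

# Hypothetical schema for the destination BigQuery table.
BQ_SCHEMA = "user:STRING,action:STRING,ts:TIMESTAMP"


def parse_event(message: bytes) -> Dict[str, Any]:
    """Decode a Pub/Sub payload (assumed to be JSON) into a row matching BQ_SCHEMA."""
    record = json.loads(message.decode("utf-8"))
    return {
        "user": str(record["user"]),
        "action": str(record["action"]),
        "ts": record["ts"],
    }


def run(topic: str, table: str) -> None:
    """Build and run the streaming pipeline; requires apache-beam[gcp]."""
    # Imported lazily so parse_event can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadEvents" >> beam.io.ReadFromPubSub(topic=topic)
            | "ParseJson" >> beam.Map(parse_event)
            | "WriteRows" >> beam.io.WriteToBigQuery(
                table,
                schema=BQ_SCHEMA,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )
```

Here `topic` would be a full topic path such as `projects/<project>/topics/<topic>` and `table` a reference such as `<project>:<dataset>.<table>`; running on Cloud Dataflow additionally needs runner, project, and region options that are omitted from this sketch.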
Thank you