© 2019 Snowflake Inc. All Rights Reserved INTRODUCTION TO SNOWFLAKE BEST PRACTICES GRAHAM MOSSMAN, SALES ENGINEER


INTRODUCTION TO SNOWFLAKE BEST PRACTICES

GRAHAM MOSSMAN, SALES ENGINEER


AGENDA


• Virtual Warehouse Management

• Cost Management

• Business Unit Chargebacks


VIRTUAL WAREHOUSE MANAGEMENT


VIRTUAL WAREHOUSE MANAGEMENT

Considerations

• Key SLAs and challenges with meeting SLAs

• Data load and transformation workloads

• Reporting, ad hoc analysis, and data science workloads

• Cost management

Agenda

• Sizes and approach to right-sizing

• Scaling up vs. scaling out

• Automating suspend/resume, sizing, and multi-cluster scale-out

• Aligning with workload patterns, environments, roles, and chargeback needs

• Monitoring workload patterns


WAREHOUSE SIZES

Size       Servers / Cluster   Credits / Hour   Notes
X-Small    1                   1                Default size when created using CREATE WAREHOUSE.
Small      2                   2
Medium     4                   4
Large      8                   8
X-Large    16                  16               Default size for warehouses created in the web UI.
2X-Large   32                  32
3X-Large   64                  64
4X-Large   128                 128


RIGHT-SIZING

• Start with a sizable, single query workload

• Keep in mind the 1-minute billing minimum

• Linear performance improvements are cost neutral

• Step back one warehouse size when performance is no longer linear

• Workload patterns will determine best size

• Best to start undersized, increase over time as workload patterns are better understood
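The cost-neutrality point above can be checked with a little arithmetic. The following is a minimal Python sketch, not an official billing formula: it assumes per-second billing once the 1-minute minimum has been met, and the run times are hypothetical figures chosen for illustration.

```python
# Credits per hour by warehouse size, taken from the WAREHOUSE SIZES table.
CREDITS_PER_HOUR = {
    "X-Small": 1, "Small": 2, "Medium": 4, "Large": 8,
    "X-Large": 16, "2X-Large": 32, "3X-Large": 64, "4X-Large": 128,
}

MINIMUM_BILLED_SECONDS = 60  # the 1-minute billing minimum


def credits_used(size: str, runtime_seconds: float) -> float:
    """Estimate credits billed for one run of a warehouse of the given size.

    Assumption: billing is per second after the first minute.
    """
    billed_seconds = max(runtime_seconds, MINIMUM_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600


# Hypothetical workload that takes 40 minutes on a Small warehouse.
small = credits_used("Small", 40 * 60)
# Linear improvement: double the size, half the run time, same cost.
medium = credits_used("Medium", 20 * 60)
# Non-linear improvement: double the size again, but only 25% faster.
large = credits_used("Large", 15 * 60)

print(f"Small: {small:.2f}  Medium: {medium:.2f}  Large: {large:.2f}")
```

In this made-up case Medium costs the same as Small and finishes sooner, while Large costs more, so stepping back to Medium would be the right-sized choice.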


SCALING UP VS. OUT

Scaling Up (X-Small → 4X-Large)

• Improves individual query performance

• Improves data load performance concurrency for numerous files (dozens to 1000’s) when loading a given table

• Programmatically resize a warehouse throughout the load window as workload patterns change

Scaling Out (multi-cluster warehouse)

• Improves level of session/query concurrency

• Set cluster size based on typical minimal workload; auto-scaling will kick in during periods of increased query activity to meet demand and avoid queueing

• Cluster MUST be large enough for largest query

The numbers in the grid are the Snowflake credits consumed for an hour’s worth of compute
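The grid itself did not survive in this transcript. As a rough stand-in, the Python sketch below rebuilds a size-by-cluster-count grid of hourly credits. It assumes the grid's axes were warehouse size and number of running clusters, and that every running cluster consumes the per-hour credits of its size; both the layout and the function names are illustrative.

```python
# Servers (and credits per hour) per cluster, from the WAREHOUSE SIZES table.
CREDITS_PER_CLUSTER_HOUR = {
    "X-Small": 1, "Small": 2, "Medium": 4, "Large": 8,
    "X-Large": 16, "2X-Large": 32, "3X-Large": 64, "4X-Large": 128,
}


def hourly_credits(size: str, clusters: int = 1) -> int:
    """Credits for one hour with `clusters` clusters of `size` running."""
    return CREDITS_PER_CLUSTER_HOUR[size] * clusters


def credit_grid(sizes, max_clusters):
    """Map each size to its hourly credits for 1..max_clusters clusters."""
    return {
        size: [hourly_credits(size, n) for n in range(1, max_clusters + 1)]
        for size in sizes
    }


for size, row in credit_grid(["Small", "Medium", "Large"], 4).items():
    print(f"{size:<8}", row)
```

Under these assumptions a Medium warehouse scaled out to two clusters and a single-cluster Large warehouse both consume 8 credits per hour, so the choice between them depends on whether the bottleneck is concurrency or individual query size.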


AUTOMATING SUSPEND/RESUME

Auto Suspend/Resume

• On-demand, end-user workloads

• Suspend idle time setting should take into account data caching

Programmatic Suspend/Resume

• Scheduled jobs where process orchestration is controlled

• Programmatically resume at the start of processing and suspend at the end of processing to avoid idle time costs
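One way to picture the programmatic pattern is a wrapper that brackets a job with resume and suspend statements. This is a minimal Python sketch under stated assumptions: `execute` stands for whatever callable sends a SQL statement to Snowflake (for example a cursor's execute method), and the warehouse, table, and stage names are hypothetical.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def running_warehouse(execute: Callable[[str], None], warehouse: str) -> Iterator[None]:
    """Resume the warehouse before the job and suspend it afterwards."""
    execute(f"ALTER WAREHOUSE {warehouse} RESUME IF SUSPENDED")
    try:
        yield
    finally:
        # Suspend even when the job raises, so a failure does not leave
        # the warehouse running idle.
        execute(f"ALTER WAREHOUSE {warehouse} SUSPEND")


# Stand-in for a real connection: record the statements instead of running them.
issued = []
with running_warehouse(issued.append, "ETL_WH"):
    issued.append("COPY INTO my_table FROM @my_stage")  # hypothetical job step

for statement in issued:
    print(statement)
```

A real orchestration job would likely also need to handle the case where the warehouse has already been suspended by the time the final statement runs.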


ALIGNING WITH WORKLOADS

Separation by workload pattern:

• Environments: DEV / TEST / PROD

• Overlapping ELT workflows

• Consumer types: reporting, ad hoc analysis, data science

• Business Units for cost tracking: marketing data science, R&D data science, etc.

Additional considerations:

● Data load performance is a function of the number of files and available threads for concurrency

● Query concurrency is better optimized with multi-cluster warehouses vs a larger single cluster

● Resource monitors should be used in order to adequately govern credit usage

[Figure: separate virtual warehouses for Data science, ETL, Dev/QA, and BI/Visualization (auto scaling)]


ALIGNING WITH WORKLOADS - EXAMPLE

● Should reflect units of workload management

○ ETL

○ BI / Dashboards

○ Ad hoc Reporting

○ Data Science

[Figure: example architecture around a Prod DB. Workload labels: Continuous Loading (4TB/day) from S3 with a <5min SLA; Data Loads & Transformation; Reporting (Segmented); Ad hoc Analysis. Virtual warehouses shown: Medium, Large, 2X-Large, and X-Large - Multi-Cluster.]


MONITORING WORKLOADS

● The Web UI provides a visual representation of usage activity for a virtual warehouse within the last 14 days

● The WAREHOUSE_LOAD_HISTORY table function in INFORMATION_SCHEMA provides a queryable representation of usage activity for a virtual warehouse within the last 14 days

○ Excessive idle periods can be identified where the AVG_RUNNING column is 0, indicating auto suspend idle time may need to be shortened or handled programmatically

○ Excessive queuing can be identified with the AVG_QUEUED_LOAD column, indicating a possible need to resize or enable multi-clustering

● Create a process to capture daily deltas into a user table for maintaining longer periods of history and to query across all virtual warehouses at once with a single SQL statement for trend analysis

Warehouse Load Over Time is available in the Web UI by clicking on the warehouse name
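The two checks described above (idle intervals and queued intervals) can be sketched in a few lines of Python. This is an illustration only: the rows are made-up samples shaped like WAREHOUSE_LOAD_HISTORY output, and the function name and queue threshold are assumptions rather than anything Snowflake provides.

```python
def classify_intervals(rows, queue_threshold=0.0):
    """Split load-history rows into idle and queued interval start times.

    An interval counts as idle when AVG_RUNNING is 0, and as queued when
    AVG_QUEUED_LOAD exceeds the (assumed) threshold.
    """
    idle = [r["START_TIME"] for r in rows if r["AVG_RUNNING"] == 0]
    queued = [r["START_TIME"] for r in rows if r["AVG_QUEUED_LOAD"] > queue_threshold]
    return idle, queued


# Hypothetical rows for one warehouse.
sample_rows = [
    {"START_TIME": "09:00", "AVG_RUNNING": 0.0, "AVG_QUEUED_LOAD": 0.0},
    {"START_TIME": "09:05", "AVG_RUNNING": 1.8, "AVG_QUEUED_LOAD": 0.0},
    {"START_TIME": "09:10", "AVG_RUNNING": 3.2, "AVG_QUEUED_LOAD": 1.5},
]

idle, queued = classify_intervals(sample_rows)
print("idle intervals:", idle)
print("queued intervals:", queued)
```

Many idle intervals would point toward a shorter auto-suspend setting or programmatic suspension; many queued intervals would point toward resizing or multi-clustering.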


COST MANAGEMENT


COST MANAGEMENT

Considerations

• Compute Costs

• Storage Costs

• Service Costs

• Data Transfer (Egress) Costs

• Monitoring & Alerting

Agenda

● Resources Incurring Costs
● Compute
  ○ Viewing Usage
  ○ Resource Monitors
● Storage
  ○ Time Travel & Fail-Safe
  ○ Viewing Usage
● Services
  ○ Non-warehouse compute
● Data Egress


RESOURCES INCURRING COSTS

[Figure: resources within an Account grouped by cost type. Resources: Virtual Warehouses, Databases, Schemas, Tables (Permanent, Temp/Transient), Materialized Views, Automatic Clustering Service, Stages (Internal), Pipes, Cross-Region Extract Egress. Cost types: Compute Costs, Storage Costs, Service Costs, Pass-through Costs.]


VIEWING COMPUTE USAGE

● Virtual Warehouses

• Web UI
  • Billing & Usage page (under Account)
• INFORMATION_SCHEMA table function
  • WAREHOUSE_METERING_HISTORY
• ACCOUNT_USAGE share views
  • WAREHOUSE_METERING_HISTORY


RESOURCE MONITORS

• Align with team-by-team warehouse separation for granular cost governance

• Set at account level if team-by-team quotas are not needed

• Leverage tiered triggers with escalating actions (e.g., Notify > Notify > Suspend)

• Enable notifications using ACCOUNTADMIN role and set e-mail address
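The tiered-trigger idea can be illustrated with a small Python sketch that reports which actions would have fired at a given level of usage. The thresholds (50%, 75%, 100%) and the quota are hypothetical values picked to match the Notify > Notify > Suspend example; this models the logic only and does not create a resource monitor.

```python
# Hypothetical escalation ladder: (percent of quota, action).
TRIGGERS = [(50, "NOTIFY"), (75, "NOTIFY"), (100, "SUSPEND")]


def fired_actions(credit_quota, credits_used, triggers=TRIGGERS):
    """Return the (threshold, action) pairs reached at the current usage."""
    percent_used = 100 * credits_used / credit_quota
    return [(threshold, action) for threshold, action in triggers
            if percent_used >= threshold]


# With an assumed monthly quota of 1000 credits:
print(fired_actions(1000, 100))
print(fired_actions(1000, 800))
print(fired_actions(1000, 1000))
```

Spacing the notification thresholds well below the suspend threshold gives a team time to react before its warehouses stop.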


STORAGE FUNDAMENTALS



TIME TRAVEL STORAGE

• High churn detected with a ratio such as TIME_TRAVEL_BYTES / ACTIVE_BYTES from the TABLE_STORAGE_METRICS view

• For Enterprise (or higher), retention period can be up to 90 days; verify retention period on all large or high-churn tables

• Reduce retention period if data can be regenerated/reloaded and time/effort to do so is within acceptable boundaries/SLAs

• Use periodic zero-copy-cloning (snapshots) instead of time travel to provide longer retention period at discrete points in time (daily, weekly, etc)
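The churn ratio from the first bullet can be computed as follows. This Python sketch works on made-up rows shaped like TABLE_STORAGE_METRICS output; the table names, byte counts, and the threshold of 1.0 are assumptions for illustration, and a sensible threshold will depend on the workload.

```python
def high_churn_tables(rows, ratio_threshold=1.0):
    """Return (table, ratio) pairs whose time-travel/active ratio meets the threshold.

    Tables with no active bytes but retained time-travel bytes are reported
    with an infinite ratio, since the division is undefined for them.
    """
    flagged = []
    for row in rows:
        active = row["ACTIVE_BYTES"]
        time_travel = row["TIME_TRAVEL_BYTES"]
        if active == 0:
            if time_travel > 0:
                flagged.append((row["TABLE_NAME"], float("inf")))
            continue
        ratio = time_travel / active
        if ratio >= ratio_threshold:
            flagged.append((row["TABLE_NAME"], ratio))
    # Highest churn first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical metrics for three tables.
sample = [
    {"TABLE_NAME": "ORDERS", "ACTIVE_BYTES": 100, "TIME_TRAVEL_BYTES": 250},
    {"TABLE_NAME": "DIM_DATE", "ACTIVE_BYTES": 100, "TIME_TRAVEL_BYTES": 1},
    {"TABLE_NAME": "STG_EVENTS", "ACTIVE_BYTES": 0, "TIME_TRAVEL_BYTES": 500},
]

result = high_churn_tables(sample)
for name, ratio in result:
    print(name, ratio)
```

Tables at the top of such a list are the first candidates for a shorter retention period or for conversion to transient tables.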

Areas Of Focus

• Dimensional Tables

• Persistent Staging Areas

• Materialized Relationships, Derivations, Other Business Rules


FAIL-SAFE STORAGE

• Permanent tables follow full CDP lifecycle; temp/transient tables NEVER use fail-safe

• Utilize temp tables for session-specific intermediate results in complex data processing workflow

• Temporary tables are dropped (and storage released) as soon as session ends

• Utilize transient tables for staging where frequent truncate/reload operations occur

• Consider designating databases/schemas as transient to simplify table creation

Areas Of Focus

• Staging Tables

• Intermediate Result Tables

• Work Areas for Developers, Analysts & Data Scientists

• Reporting Tool Materialized Results


VIEWING STORAGE USAGE

● Tables
  ○ Active/Current Storage
  ○ Time Travel Storage
  ○ Fail-Safe Storage
● Materialized Views
● Internal Stage

• Web UI
  • Billing & Usage page (under Account)
  • Tables (under Databases)
• SHOW TABLES / MATERIALIZED VIEWS
• INFORMATION_SCHEMA views
  • TABLES
  • TABLE_STORAGE_METRICS
• INFORMATION_SCHEMA table function
  • STAGE_STORAGE_USAGE_HISTORY for daily storage by internal stage
• ACCOUNT_USAGE share views
  • TABLE_STORAGE_METRICS for active, time travel and fail-safe storage
  • STAGE_STORAGE_USAGE_HISTORY for daily storage by internal stage


VIEWING SERVICES USAGE

● Automatic Clustering
● Materialized Views
● Snowpipe

• Web UI
  • Billing & Usage page (under Account)
  • Special warehouse entry per service:
    ■ AUTOMATIC_CLUSTERING
    ■ MATERIALIZED_VIEW_MAINTENANCE
    ■ SNOWPIPE
• INFORMATION_SCHEMA table function
  • AUTOMATIC_CLUSTERING_HISTORY
  • MATERIALIZED_VIEW_REFRESH_HISTORY
  • PIPE_USAGE_HISTORY for Snowpipe data loading usage
• ACCOUNT_USAGE share views
  • PIPE_USAGE_HISTORY


VIEWING DATA EGRESS

● Data exits cloud provider region
  ○ To another region within the same cloud provider
  ○ To different cloud provider
● Data Export via COPY INTO
● Data Replication (in preview)

• Web UI
  • Billing & Usage page (under Account)
• INFORMATION_SCHEMA table function
  • DATA_TRANSFER_HISTORY table function for data transfer events across an entire account
• ACCOUNT_USAGE share views
  • DATA_TRANSFER_HISTORY
• For customers under capacity contracts, this is a pass-through charge; on-demand customers pay a small markup for egress charges.


BUSINESS UNIT CHARGEBACKS


BUSINESS UNIT CHARGEBACKS

Agenda

• Designing for Cost Allocations

• Snowflake Shared Database

• Allocating Chargebacks

Considerations

• Business Units Supported

• Teams Incurring Costs

• Granularity of Chargebacks


DESIGNING FOR CHARGEBACKS

Conceptual layer defined with naming conventions and governed with RBAC

[Figure: one Account divided into Business Unit 1 and Business Unit 2. Each business unit contains the same resources: Virtual Warehouses, Databases, Schemas, Tables (Permanent, Temp/Transient), Materialized Views, Automatic Clustering Service, Stages (Internal), Pipes, and Cross-Region Extract Egress. An additional label reads "Data Science Virtual Warehouses".]


SNOWFLAKE SHARED DATABASE

• ACCOUNT_USAGE Schema

• READER_ACCOUNT_USAGE Schema

ACCOUNT_USAGE

• Warehouse, Storage, Transfer, and most Information Schema views

• Includes records for dropped objects

• Retention time of 1 year

• Data latency of 45 min to 3 hours

READER_ACCOUNT_USAGE

• Similar views for Reader Account usage (Warehouse, Query History, Load History)


ALLOCATING CHARGEBACKS

• Separate compute and storage resources between each relevant business unit or cost center

• Use well defined naming conventions to name warehouses and databases according to the owning business units

• Govern resource use with role based access control (RBAC)

• Use the SNOWFLAKE shared database to develop custom reporting to automate tracking

• Business Units & Cost Centers

• Warehouses & Databases

• RBAC

• Reporting
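To make the naming-convention approach concrete, here is a minimal Python sketch that rolls warehouse credits up to business units. It assumes a hypothetical convention in which the warehouse name starts with the business unit code followed by an underscore, and it uses made-up rows shaped like WAREHOUSE_METERING_HISTORY output (WAREHOUSE_NAME, CREDITS_USED).

```python
from collections import defaultdict


def credits_by_business_unit(metering_rows, separator="_"):
    """Sum CREDITS_USED per business unit, taken from the warehouse name prefix."""
    totals = defaultdict(float)
    for row in metering_rows:
        # Assumed convention: <UNIT>_<WORKLOAD>_WH, e.g. MKT_DS_WH.
        unit = row["WAREHOUSE_NAME"].split(separator, 1)[0]
        totals[unit] += row["CREDITS_USED"]
    return dict(totals)


# Hypothetical metering rows for three warehouses.
rows = [
    {"WAREHOUSE_NAME": "MKT_DS_WH", "CREDITS_USED": 12.5},
    {"WAREHOUSE_NAME": "MKT_BI_WH", "CREDITS_USED": 7.5},
    {"WAREHOUSE_NAME": "RND_DS_WH", "CREDITS_USED": 30.0},
]

totals = credits_by_business_unit(rows)
for unit, credits in sorted(totals.items()):
    print(unit, credits)
```

The same grouping could be applied to databases for storage chargebacks, provided the naming convention is enforced consistently.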


SNOWFLAKE PROFESSIONAL SERVICES


WHY ENGAGE WITH SNOWFLAKE PROFESSIONAL SERVICES

Identify New Use Cases

Reveal additional use cases for modern data analytics & data sharing for even greater benefits

Shorten Time to Value

Achieve project outcomes faster and deliver data-driven insights and ROI sooner than you expected

Efficient Consumption

Guidance and knowledge transfer to help utilize Snowflake fully and efficiently

Best Practices

Migration Readiness

Package Offerings:

Role Based Security

Snowflake 360

Custom Packages

Technical Account Manager


VISIT THE SERVICE HUBs!

Hubs: LODGE COMMUNITY, PROFESSIONAL SERVICES, CUSTOMER SUCCESS, CUSTOMER SUPPORT

Technical Resources

Learn all the content and ways to get help. Find everything from blogs and articles to ideas and announcements.

Learn About Best Practices

Come learn the tips our team has identified across our customer base.

Optimize Snowflake

Learn about our available service offerings and how we can help optimize your Snowflake implementation.

Discuss Issues You’ve Encountered

Chat with Support Engineers, live, about issues you’re having and get advice on potential resolutions.

Provide Feedback

Already a community member? Tell us what is working and what is not in the Lodge.

Proactive Support

We are using Snowflake on Snowflake to get proactive about helping customers solve problems before they become bigger issues.

Learn From Other Customers

Share with us your use case and learn what others are doing with similar needs.

Tailored Solutions

Already engaged with a partner? Let’s work together. Experiencing problems or just not sure where to start? We can design a solution to help.


Questions?


Thank You