What's New with AWS Lake Formation

Transcript of What's New with AWS Lake Formation

Page 1: What's New with AWS Lake Formation

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Sanjay Srivastava - Product manager, AWS Lake Formation

Mert Hocanin – Big data architect, AWS Lake Formation

August 2021

What's New with AWS Lake Formation: Securing and Governing Your Data Lake

Page 2: What's New with AWS Lake Formation

What is a data lake?

A data lake is a centralized, curated, and secured repository that stores all your data, both in its original form and prepared for analysis.

A data lake enables you to break down data silos and combine different types of analytics and ML to gain insights and guide better business decisions.

Page 3: What's New with AWS Lake Formation

The Lake House Approach

Scalable data lakes

Purpose-built data services

Automated data movement

Central governance

Performant and cost-effective

Non-relational databases

Machine learning

Data warehousing

Log analytics

Big data processing

Relational databases

Governed storage

Amazon S3

Page 4: What's New with AWS Lake Formation

Amazon Aurora

Amazon DynamoDB

Amazon Elasticsearch Service

Amazon SageMaker

Amazon EMR

Amazon Redshift

AWS Lake Formation

Build a secure data lake in days

Build data lakes quickly: Move, store, update, and catalog your data faster. Automatically organize and optimize your data.

Easily discover and share data: Catalog all of your data assets and easily share datasets between consumers.

Simplify security management: Centrally define and enforce security, governance, and auditing policies.

Page 5: What's New with AWS Lake Formation

Lake Formation - Recap

Page 6: What's New with AWS Lake Formation

Components of the security and governance layer

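Data is organized in Apache Hive style tables

S3://IOTDeviceData/region=Americas/year=2018/month=Nov/day=30/data1.csv

S3 bucket: IOTDeviceData. S3 partitions: region=Americas/year=2018/month=Nov/day=30. S3 object: data1.csv.

[Figure: partition tree for region Americas, year 2018, months Oct., Nov., Dec., days 29, 30, 1]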
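The Hive-style layout on this slide can be illustrated with a small parser. This is a minimal sketch using only the Python standard library; the function name `parse_hive_path` is an assumption made for illustration and is not part of any AWS SDK.

```python
from urllib.parse import urlparse


def parse_hive_path(uri: str) -> dict:
    """Split an S3 URI into bucket, Hive-style partitions, and object name."""
    parsed = urlparse(uri)
    bucket = parsed.netloc
    parts = [p for p in parsed.path.split("/") if p]
    partitions = {}
    for segment in parts[:-1]:
        key, sep, value = segment.partition("=")
        if sep:  # only key=value path segments are treated as partitions
            partitions[key] = value
    return {"bucket": bucket, "partitions": partitions, "object": parts[-1]}


# Path taken from the slide.
info = parse_hive_path(
    "S3://IOTDeviceData/region=Americas/year=2018/month=Nov/day=30/data1.csv"
)
print(info["bucket"])
print(info["partitions"])
print(info["object"])
```

Each `key=value` path segment becomes a partition column, which is what lets a catalog expose the folder tree as a table.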

Page 7: What's New with AWS Lake Formation

Components of the security and governance layer

Data is organized in Apache Hive style tables

Data catalogs provide database and table abstraction

S3 objects:

S3://IOTDeviceData/region=Americas/year=2018/month=Nov/day=30/data1.csv

S3://IOTDeviceData/region=Americas/year=2018/month=Nov/day=30/data2.csv

S3://IOTDeviceData/region=Americas/year=2018/month=Nov/day=30/data3.csv…

AWS Glue Data Catalog: Database 1, Database 2

[Figure: partition tree for region Americas, year 2018, months Oct., Nov., Dec., days 29, 30, 1]

Page 8: What's New with AWS Lake Formation

Components of the security and governance layer

Data is organized in Apache Hive style tables

Data catalogs provide database and table abstraction (AWS Glue Data Catalog: Database 1, Database 2)

Lake Formation provides an authorization layer over the Glue Data Catalog

Fine-grained access controls: Database, Table, Columns, Rows (preview)
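A column-level grant of the kind listed on this slide can be sketched as a request payload. The dictionary shape below follows the boto3 `lakeformation` client's `grant_permissions` call as I understand it. The helper name, role ARN, database, table, and column names are all hypothetical.

```python
def column_grant_request(principal_arn, database, table, columns):
    """Build keyword arguments shaped like a Lake Formation GrantPermissions call."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


# All identifiers below are hypothetical placeholders.
request = column_grant_request(
    "arn:aws:iam::111122223333:role/analyst",
    "database_1",
    "iot_device_data",
    ["region", "year"],
)
print(request)

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("lakeformation").grant_permissions(**request)
```

The payload is built separately from the call, so it can be inspected or reviewed before anything is granted.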

Page 9: What's New with AWS Lake Formation

Centralized security and governance layer

AWS Lake Formation

Amazon Redshift Spectrum

AWS Glue ETL

Amazon Athena

Amazon EMR

Partner solutions

Data is organized in Apache Hive style tables

Data catalogs provide database and table abstraction

AWS Glue Data Catalog: Database 1, Database 2

[Figure: S3 objects in the partition tree for region Americas, year 2018, months Oct., Nov., Dec., days 29, 30, 1]

Page 10: What's New with AWS Lake Formation

Secure data sharing

Page 11: What's New with AWS Lake Formation

To share data across accounts you were . . .

[Figure: producer and consumer accounts, shown twice]

Page 12: What's New with AWS Lake Formation

Data sharing made simple with Lake Formation

Share entire database

Share multiple tables

Share columns & rows

Page 13: What's New with AWS Lake Formation

AWS Lake Formation cross-account sharing

Producer: grant resources. Producer GRANTs permissions to consumers.

Share: shared resources (Sales, Opportunities).

Consumer: create resource links (eu_sales, eu_opps in the EU account). Consumer “soft links” shared resources.

Permissions on shared resources: data lake admin delegates access to users.

Analytic engines use resource links: Amazon Athena, AWS Glue, Amazon EMR, Amazon Redshift.

[Figure: Data catalog and AWS Lake Formation in both the producer and consumer accounts]
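The consumer-side "resource link" step can be sketched as a request payload. The shape below follows the AWS Glue `create_table` call with a `TargetTable` entry, as I understand the boto3 `glue` client. The helper name, account ID, and database names are hypothetical; `eu_sales` and `Sales` echo the labels on the slide.

```python
def resource_link_request(link_name, local_database, producer_account_id,
                          shared_database, shared_table):
    """Build keyword arguments shaped like a Glue CreateTable call for a resource link."""
    return {
        "DatabaseName": local_database,
        "TableInput": {
            "Name": link_name,
            # The link stores no data; it points at the table shared by the producer.
            "TargetTable": {
                "CatalogId": producer_account_id,
                "DatabaseName": shared_database,
                "Name": shared_table,
            },
        },
    }


# Hypothetical account ID and database names.
link = resource_link_request(
    link_name="eu_sales",
    local_database="eu_analytics",
    producer_account_id="111122223333",
    shared_database="sales_db",
    shared_table="Sales",
)
print(link)

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("glue").create_table(**link)
```

Analytic engines in the consumer account would then query `eu_sales` as if it were a local table, while permissions stay governed by the producer's grants.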

Page 14: What's New with AWS Lake Formation

LF supports common data sharing topologies

Centralized, hub and spoke

Across orgs and companies (e.g., vendors, suppliers, aggregators, distributors)

Data mesh

Organization

Page 15: What's New with AWS Lake Formation

Lake Formation – What’s new

Page 16: What's New with AWS Lake Formation

Why row-level security?

Share a large fact table across groups and departments

Today, this requires multiple redacted datasets

Admins see all records

Doctors and nurses see their patients’ records

Store managers see their store’s records

Regional managers see all stores’ records

Page 17: What's New with AWS Lake Formation

AWS Lake Formation row-level security

Row-level security permissions

Read APIs uniformly enforce granular compliance policies

Row filter expressions are “WHERE” clauses in PartiQL

Supports many S3-based table formats, open and managed: Governed, Amazon Redshift data shares, Apache Hive, Apache Iceberg, Apache Hudi, Delta Lake, . . .

Easy to audit permissions and access

Page 18: What's New with AWS Lake Formation

AWS Lake Formation cell-level security

Cell-level security permissions build on row-level security

Restrict column access based on row predicates

Reuse data filters to scale governance

Data filter = set of columns along with a row expression: “SELECT” columns (“*” or “Column1, Column2”) and a “WHERE” clause in PartiQL

Mask out restricted data with multiple data filter grants

Data filter US-Non-Sensitive: Select * where country=US

Data filter UK-Non-Sensitive: Select * except IP where country=UK

Source table columns: Country, IP (rows: UK, UK, US, US)

Effective access with masked IP column: the IP column shows ******** for the UK rows, while the US rows are returned under the US filter (Select *).
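The combined effect of the two data filter grants on this slide can be illustrated with a small in-memory simulation. This is a conceptual sketch only, not how Lake Formation is implemented. The `DataFilter` class, the `effective_rows` function, and the IP values are made up for illustration, and the masking string mirrors the slide's picture.

```python
from dataclasses import dataclass
from typing import Callable

MASK = "********"


@dataclass(frozen=True)
class DataFilter:
    """A named row predicate plus the columns it withholds (hypothetical model)."""
    name: str
    row_predicate: Callable[[dict], bool]
    excluded_columns: frozenset = frozenset()


def effective_rows(rows, granted_filters):
    """Return the rows a principal sees, masking cells no matching filter exposes."""
    result = []
    for row in rows:
        matching = [f for f in granted_filters if f.row_predicate(row)]
        if not matching:
            continue  # no granted filter covers this row
        visible = set()
        for f in matching:
            visible |= set(row) - f.excluded_columns
        result.append({c: (v if c in visible else MASK) for c, v in row.items()})
    return result


# Made-up sample data; the IP values come from documentation address ranges.
rows = [
    {"country": "UK", "ip": "198.51.100.7"},
    {"country": "UK", "ip": "198.51.100.8"},
    {"country": "US", "ip": "203.0.113.5"},
    {"country": "US", "ip": "203.0.113.6"},
]
filters = [
    DataFilter("US-Non-Sensitive", lambda r: r["country"] == "US"),
    DataFilter("UK-Non-Sensitive", lambda r: r["country"] == "UK", frozenset({"ip"})),
]
masked = effective_rows(rows, filters)
for row in masked:
    print(row)
```

Granting both filters to one principal yields the union of what each filter exposes, which is how the slide arrives at UK rows with a masked IP next to fully visible US rows.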

Page 19: What's New with AWS Lake Formation

Demo: Lake Formation cell-level security

Setup: ‘customer’ table with sensitive data for US and Canada.

- Compliance requires that the US analyst sees only US rows and the Canada analyst sees only Canada rows.

Table rows by country: Canada, Canada, US, US

US-Analyst: Select * where country=US

Canada-Analyst: Select * except address where country=Canada

Summary: create named data filters specifying row and column permissions, then grant permissions on the named data filters
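The two steps in the summary (create named data filters, then grant on them) can be sketched as request payloads. The shapes follow the boto3 `lakeformation` client's `create_data_cells_filter` and `grant_permissions` calls as I understand the generally available API. The preview API shown in this talk may have differed. The helper names, account ID, database name, filter names, and role ARN are hypothetical.

```python
def data_cells_filter(catalog_id, database, table, name, row_filter,
                      excluded_columns=()):
    """Build a TableData structure shaped like CreateDataCellsFilter input."""
    return {
        "TableCatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": name,
        "RowFilter": {"FilterExpression": row_filter},
        # An empty exclusion list means all columns are included.
        "ColumnWildcard": {"ExcludedColumnNames": list(excluded_columns)},
    }


def filter_grant_request(principal_arn, table_data):
    """Build a GrantPermissions-shaped request on a named data filter."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "DataCellsFilter": {
                "TableCatalogId": table_data["TableCatalogId"],
                "DatabaseName": table_data["DatabaseName"],
                "TableName": table_data["TableName"],
                "Name": table_data["Name"],
            }
        },
        "Permissions": ["SELECT"],
    }


# Hypothetical account ID, database, filter names, and role ARN.
us_filter = data_cells_filter(
    "111122223333", "sales_db", "customer", "us-rows", "country='US'")
canada_filter = data_cells_filter(
    "111122223333", "sales_db", "customer", "canada-rows-no-address",
    "country='Canada'", excluded_columns=["address"])
canada_grant = filter_grant_request(
    "arn:aws:iam::111122223333:role/canada-analyst", canada_filter)
print(canada_grant)

# With boto3 installed and credentials configured, these could be sent as:
#   lf = boto3.client("lakeformation")
#   lf.create_data_cells_filter(TableData=canada_filter)
#   lf.grant_permissions(**canada_grant)
```

Because the filter is a named object, the same filter can be granted to many principals without restating the row and column rules.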

Page 20: What's New with AWS Lake Formation

Lake Formation: Why LF tag-based access control?

Difficult to scale permissions as the number of resources and principals increases

Tight coupling: permissions cannot be granted before resources are created

Policy explosion: every resource added requires a permissions update

Page 21: What's New with AWS Lake Formation

Lake Formation: Tag-based access control

Classify data using LF Tag ontology

Grant principals access on LF tags independently

Tag databases, tables, and columns as resources are created

Scale management of a large number of resources easily with the LF-tag hierarchy

Table: Sales

Item    Region  Email Address      Sales Price
141414  West    [email protected]  $65.00
124141          [email protected]  $41.50
135355  East    [email protected]  $54.10
423514  East    [email protected]  $81.43

Table: Mktg

Email Address      First Name  Last Name  Channel   Acq Costs
[email protected]  Andy        McDowell   Facebook  $2.00
[email protected]  Kathy       Bates      Facebook  $1.75
[email protected]  Jenna       Bush       adwords   $1.40

Scale enforcement easily

Sales Mgr: Dept=Sales, PII != true

Executive: PII=true

Marketing Exec: Dept=MKTG, PII=true
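The decoupling that tag-based access control provides can be illustrated with a small in-memory model. This is a conceptual sketch, not Lake Formation's evaluation logic. The resource names, tag assignments, and grants are hypothetical, and "PII != true" is modeled here as the tag value "false".

```python
def is_allowed(grant_expression, resource_tags):
    """True when every tag key in the grant matches one of its allowed values."""
    return all(
        resource_tags.get(key) in allowed
        for key, allowed in grant_expression.items()
    )


# Hypothetical column-level tag assignments using the Dept and PII tag keys.
resources = {
    "sales.item": {"Dept": "Sales", "PII": "false"},
    "sales.email_address": {"Dept": "Sales", "PII": "true"},
    "mktg.first_name": {"Dept": "MKTG", "PII": "true"},
}

# Grants are expressed on tags, never on individual resources.
grants = {
    "sales_mgr": {"Dept": {"Sales"}, "PII": {"false"}},
    "marketing_exec": {"Dept": {"MKTG"}, "PII": {"true"}},
}


def visible_resources(principal):
    """List the resources whose tags satisfy the principal's grant."""
    expression = grants[principal]
    return sorted(
        name for name, tags in resources.items() if is_allowed(expression, tags)
    )


# A resource created later only needs tags; no grant has to change.
resources["sales.region"] = {"Dept": "Sales", "PII": "false"}

print(visible_resources("sales_mgr"))
print(visible_resources("marketing_exec"))
```

The new `sales.region` column becomes visible to `sales_mgr` purely through its tags, which is the point of granting on tags independently of resources.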

Page 22: What's New with AWS Lake Formation

Demo: Lake Formation Tag-based access control

Setup: Database ‘Sales’ with ‘customer’ table containing PII data.

- Compliance requires data analysts to have access to non-PII data.

Table: Customer (columns: Country, Address, DOB)

Business-Analyst: Tag PII=false

Summary: create an ontology, then decouple resource creation and access grants to scale governance

Page 23: What's New with AWS Lake Formation

Challenges in managing your data

Continuous updates

• Complex ETL

• Delays in data freshness

• Expensive, brittle & error-prone

Inconsistent performance

• Over-scan data

• Lots of small files

• Partition updates

• Management overhead

Complying with regulations

• Difficult to find a needle in a very large haystack

Page 24: What's New with AWS Lake Formation

Introducing Governed Tables

New type of S3 table

ACID transactions

• Metadata and data

• Multiple operations

• Many tables, many users, various engines

No lock-in

• Retain control over data

• Remains in your S3 buckets

• Open file formats: Parquet, CSV, JSON, . . .

Import and export

• Popular table formats: Apache Hudi, Delta Lake, Apache Iceberg

Time travel

• Access a version of the data lake at an earlier point in time

Page 25: What's New with AWS Lake Formation

Reading Governed Tables with Query acceleration

PartiQL

In Preview

Page 26: What's New with AWS Lake Formation

Writing to governed tables: manifest transactions and read

In Preview

Page 27: What's New with AWS Lake Formation

Catalog transactions: create/update/delete tables with transaction semantics

BeginTransaction

Create Table / Delete Table / Update Table

Update data..

CommitTransaction
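The BeginTransaction ... CommitTransaction sequence can be illustrated with a context manager that commits on success and cancels on failure. This is a minimal sketch against an in-memory stand-in, not an AWS API. `InMemoryCatalog` and its method names are assumptions, and it deliberately omits conflict detection between concurrent transactions.

```python
from contextlib import contextmanager


class InMemoryCatalog:
    """Stand-in for a transactional catalog client (hypothetical, not an AWS API)."""

    def __init__(self):
        self.tables = {}
        self._pending = {}
        self._next_id = 0

    def start_transaction(self):
        self._next_id += 1
        tx = f"tx-{self._next_id}"
        self._pending[tx] = dict(self.tables)  # private working copy
        return tx

    def create_table(self, tx, name, schema):
        self._pending[tx][name] = schema

    def delete_table(self, tx, name):
        del self._pending[tx][name]  # raises KeyError for an unknown table

    def commit_transaction(self, tx):
        self.tables = self._pending.pop(tx)  # publish all changes at once

    def cancel_transaction(self, tx):
        self._pending.pop(tx)  # discard all changes


@contextmanager
def transaction(catalog):
    """Commit when the block succeeds; cancel and re-raise when it fails."""
    tx = catalog.start_transaction()
    try:
        yield tx
    except Exception:
        catalog.cancel_transaction(tx)
        raise
    else:
        catalog.commit_transaction(tx)


catalog = InMemoryCatalog()

with transaction(catalog) as tx:
    catalog.create_table(tx, "orders", ["id", "amount"])

try:
    with transaction(catalog) as tx:
        catalog.create_table(tx, "returns", ["id"])
        catalog.delete_table(tx, "missing")  # fails, so nothing is published
except KeyError:
    pass

print(sorted(catalog.tables))
```

The second block creates a table and then fails, and the partially applied change never becomes visible. That all-or-nothing behavior is what the slide means by transaction semantics.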

In Preview

Page 28: What's New with AWS Lake Formation

Governed Tables: Storage optimizer

Automatic Compaction

In Preview

Page 29: What's New with AWS Lake Formation

Transactions make data lakes trustworthy

“. . . Transactional ETL processes are an important part of how we ensure data integrity and . . . required additional development time and complexity. We’re excited about AWS Lake Formation Transactions’ ability to simplify our ETL and reduce the overall effort needed to produce trustworthy data in our data lake.”

Rob Hruska, Engineering Director, Hudl

Page 30: What's New with AWS Lake Formation

Demo – Governed tables

Governed tables make it easy to perform transactional reads and writes.

In this demo we:

- Write to governed tables from Glue and Python

- Read from Athena, EMR, and a Python script

Simple and easy

No cluster to set up, no Spark runtime required

Page 31: What's New with AWS Lake Formation

Transactions, row-level security, and acceleration

New AWS Lake Formation update and access APIs to S3 data lakes

Accelerates access to S3 data lakes

Row-level security and updates

Open and public APIs: build your own application

Integrations

Page 32: What's New with AWS Lake Formation

AWS Lake Formation new feature availability

Lake Formation tag-based access control (GA) | Governed tables & row/cell-level security (Preview)

Northern Virginia Northern Virginia

Oregon Oregon

Ohio

Ireland

Tokyo

Seoul

Singapore

Sydney

Page 33: What's New with AWS Lake Formation

Resources to get you started

https://aws.amazon.com/blogs/big-data/category/analytics/aws-lake-formation/

• Part 1: Getting started with governed tables

• Part 2: Creating a governed table for streaming data sources

• Part 3: Using ACID transactions on governed tables

• Part 4: Implement cell-level and row-level security

• Part 5: Securing data lakes with row-level access control

• Easily manage your data lake at scale using LF tag-based access control

Page 34: What's New with AWS Lake Formation

Sign up for preview

AWS Lake Formation: Transactions, row-level security, and acceleration

Accelerate and govern access to your Amazon S3 data lake

Sign up here: https://aws.amazon.com/lake-formation/preview/