Evolution of a modern cloud-based data lake

Viacheslav Inozemtsev

O’Reilly Software Architecture Conference

Berlin, 07.11.2019

viacheslav.inozemtsev@zalando.de

OUTLINE

Introduction of Zalando

Why data lake?

What is a data lake?

Evolution of the data lake

Future

Who am I?

Viacheslav Inozemtsev

● 2010 - Specialist in Applied Mathematics and Computer Science at the Tomsk State University, Russia
● 2014 - Master of Computer Science at the University of Bonn, Germany
● Total of 8 years of experience as Data and Software Engineer
● Last 3 years at Zalando

Introduction of Zalando

● 29M active customers
● 14000 employees
● 2000 engineers and data scientists
● 200 engineering teams
● Variety of major data systems
    ○ Messaging Bus
    ○ BI Data Warehouse
    ○ Google Analytics platform
    ○ Custom datasets

Why data lake?

● To enable sharing of the data so that it is:
    ○ easy to publish and consume
    ○ compliant
    ○ secure
    ○ cost-efficient

What is a data lake?

● Central exchange system
    ○ connects producers and consumers of the data
    ○ defends data from malicious misuse or accidental leaking
    ○ open vs hide!

● Central big data system
    ○ provides as much different data as possible
    ○ has to be fast and easy to use
    ○ scale vs performance!

Evolution of the data lake

Evolution of the data lake - Stage 1: Inception

Evolution of the data lake - Stage 2: Optimization

Evolution of the data lake - Stage 3: Revolution

https://martinfowler.com/articles/data-monolith-to-mesh.html

Evolution of the data lake - centralized

Evolution of the data lake - federated

What is a data lake?

● Federated exchange system
    ○ connects producers and consumers of the data
    ○ defends data from malicious misuse or accidental leaking
    ○ open vs hide!
    ○ resolved by centralized governance

● Federated big data system
    ○ provides as much different data as possible
    ○ has to be fast and easy to use
    ○ scale vs performance!
    ○ resolved by decentralized ownership
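One way to picture the split between centralized governance and decentralized ownership is the minimal Python sketch below. It is only an illustration under stated assumptions: the `Dataset` and `Governance` classes, the classification labels, the role names, and the team names are all hypothetical and do not come from the talk or from Zalando's actual systems.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class Dataset:
    """A dataset published by the team that produces it (hypothetical model)."""
    name: str
    owner_team: str      # decentralized ownership: the producing team owns the data
    classification: str  # hypothetical labels, e.g. "internal" or "confidential"


@dataclass
class Governance:
    """One central rule set that applies to every team's datasets (hypothetical)."""
    # Maps a classification label to the roles that are allowed to read it.
    read_rules: Dict[str, Set[str]] = field(default_factory=dict)

    def can_read(self, role: str, dataset: Dataset) -> bool:
        # Unknown classifications are denied by default ("hide" wins over "open").
        return role in self.read_rules.get(dataset.classification, set())


# A single central policy, shared by all teams.
governance = Governance(read_rules={
    "internal": {"analyst", "engineer"},
    "confidential": {"engineer"},
})

# Each dataset is owned by a different team, but checked against the same rules.
orders = Dataset("orders", owner_team="checkout", classification="confidential")
clicks = Dataset("clicks", owner_team="web", classification="internal")

print(governance.can_read("analyst", clicks))
print(governance.can_read("analyst", orders))
```

In this sketch the teams decide what to publish and how to label it, while the access decision is made in one place, so the "open vs hide" trade-off is settled by policy rather than by each producer separately.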

Future

● Make data lake fully federated
● Abstract physical storage into logical layer of datasets
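The idea of a logical layer over physical storage could look roughly like the Python sketch below. It is an assumption-laden illustration, not the design presented in the talk: `DatasetCatalog`, `PhysicalLocation`, the dataset name `sales.orders`, and the `s3://example-bucket/...` paths are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PhysicalLocation:
    """Where the bytes actually live (hypothetical fields)."""
    uri: str  # hypothetical storage path
    fmt: str  # file format, e.g. "parquet"


class DatasetCatalog:
    """Maps logical dataset names to their current physical location."""

    def __init__(self) -> None:
        self._locations: Dict[str, PhysicalLocation] = {}

    def register(self, logical_name: str, location: PhysicalLocation) -> None:
        # Re-registering a name repoints it, e.g. after a storage migration.
        self._locations[logical_name] = location

    def resolve(self, logical_name: str) -> PhysicalLocation:
        try:
            return self._locations[logical_name]
        except KeyError:
            raise LookupError(f"unknown dataset: {logical_name}") from None


catalog = DatasetCatalog()
catalog.register(
    "sales.orders",
    PhysicalLocation("s3://example-bucket/orders/v1/", "parquet"),
)

# The owning team moves the data; consumers keep using the same logical name.
catalog.register(
    "sales.orders",
    PhysicalLocation("s3://example-bucket/orders/v2/", "parquet"),
)

print(catalog.resolve("sales.orders").uri)
```

Consumers in this sketch depend only on the logical name, so the owning team can change the storage layout without coordinating with every reader.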

www.zalando.com

jobs.zalando.com/tech