cloud-based data lake Evolution of a modern · O’Reilly Software Architecture Conference Berlin,...

Evolution of a modern cloud-based data lake · Viacheslav Inozemtsev · O’Reilly Software Architecture Conference Berlin, 07.11.2019 · viacheslav.inozemtsev@zalando.de


Evolution of a modern cloud-based data lake

Viacheslav Inozemtsev

O’Reilly Software Architecture Conference

Berlin, 07.11.2019

viacheslav.inozemtsev@zalando.de


OUTLINE

Introduction of Zalando

Why data lake?

What is a data lake?

Evolution of the data lake

Future

Who am I?

Viacheslav Inozemtsev

● 2010 - Specialist in Applied Mathematics and Computer Science at the Tomsk State University, Russia

● 2014 - Master of Computer Science at the University of Bonn, Germany
● Total of 8 years of experience as a Data and Software Engineer
● Last 3 years at Zalando


Introduction of Zalando


● 29M active customers
● 14,000 employees
● 2,000 engineers and data scientists
● 200 engineering teams
● Variety of major data systems
  ○ Messaging Bus
  ○ BI Data Warehouse
  ○ Google Analytics platform
  ○ Custom datasets


Why data lake?


● To enable sharing of the data so that it is:
  ○ easy to publish and consume
  ○ compliant
  ○ secure
  ○ cost-efficient


What is a data lake?


● Central exchange system
  ○ connects producers and consumers of the data
  ○ protects the data from malicious misuse and accidental leaks
  ○ open vs. hide!

● Central big data system
  ○ provides as many different datasets as possible
  ○ has to be fast and easy to use
  ○ scale vs. performance!


Evolution of the data lake

Evolution of the data lake - Stage 1: Inception

Evolution of the data lake - Stage 2: Optimization

Evolution of the data lake - Stage 3: Revolution

https://martinfowler.com/articles/data-monolith-to-mesh.html

Evolution of the data lake - centralized

Evolution of the data lake - federated


What is a data lake?

● Federated exchange system
  ○ connects producers and consumers of the data
  ○ protects the data from malicious misuse and accidental leaks
  ○ open vs. hide!
  ○ resolved by centralized governance

● Federated big data system
  ○ provides as many different datasets as possible
  ○ has to be fast and easy to use
  ○ scale vs. performance!
  ○ resolved by decentralized ownership
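The slides name the two mechanisms (centralized governance, decentralized ownership) but do not show how they fit together. What follows is a minimal Python sketch, assuming a simple in-memory catalog: each domain team may only publish datasets it owns, while every read passes through one shared policy that decides, by data classification, what a consumer may see. All names here (`Dataset`, `GovernancePolicy`, `FederatedCatalog`, the classification and clearance labels) are hypothetical illustrations, not Zalando's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """A dataset published into the lake (hypothetical model)."""
    name: str
    owner_team: str      # decentralized ownership: the domain team that produces the data
    classification: str  # illustrative labels such as "public", "internal", "personal"


@dataclass
class GovernancePolicy:
    """Centralized governance: one rule set applied to every dataset in the lake."""
    # Maps a consumer's clearance level to the classifications it may read.
    allowed: dict[str, set[str]]

    def can_read(self, clearance: str, dataset: Dataset) -> bool:
        return dataset.classification in self.allowed.get(clearance, set())


@dataclass
class FederatedCatalog:
    """Hypothetical catalog: teams publish their own data, reads pass the shared policy."""
    policy: GovernancePolicy
    datasets: dict[str, Dataset] = field(default_factory=dict)

    def publish(self, team: str, dataset: Dataset) -> None:
        # A team may only publish datasets it owns ...
        if dataset.owner_team != team:
            raise PermissionError(f"{team} cannot publish on behalf of {dataset.owner_team}")
        # ... and may not overwrite a dataset already owned by another team.
        existing = self.datasets.get(dataset.name)
        if existing is not None and existing.owner_team != team:
            raise PermissionError(f"{dataset.name} is owned by {existing.owner_team}")
        self.datasets[dataset.name] = dataset

    def read(self, clearance: str, name: str) -> Dataset:
        dataset = self.datasets[name]
        # Every read, whichever team owns the data, is checked against the same policy.
        if not self.policy.can_read(clearance, dataset):
            raise PermissionError(f"clearance {clearance!r} may not read {name}")
        return dataset


if __name__ == "__main__":
    policy = GovernancePolicy(allowed={
        "analyst": {"public", "internal"},
        "privacy-officer": {"public", "internal", "personal"},
    })
    catalog = FederatedCatalog(policy)
    catalog.publish("checkout", Dataset("orders", "checkout", "internal"))
    catalog.publish("customer", Dataset("addresses", "customer", "personal"))
    print(catalog.read("analyst", "orders").owner_team)  # prints "checkout"
```

Keeping a single policy object shared by all teams is what makes governance central in this sketch; ownership stays federated because only the owning team can publish or replace its datasets.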


Future


● Make the data lake fully federated
● Abstract physical storage into a logical layer of datasets
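The deck states this goal without detailing a design. One way to picture a logical layer of datasets is a registry that maps stable logical names to physical storage locations, so that data can be migrated between storage systems without consumers changing how they refer to it. The following is a minimal Python sketch under that assumption; `LogicalDatasetLayer`, `PhysicalLocation`, the storage labels, and the example paths are hypothetical and not part of Zalando's system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalLocation:
    """Where a dataset's bytes actually live (hypothetical model)."""
    storage: str  # illustrative labels, e.g. "s3" or "warehouse"
    path: str


class LogicalDatasetLayer:
    """Hypothetical registry hiding physical storage behind stable logical dataset names."""

    def __init__(self) -> None:
        self._locations: dict[str, PhysicalLocation] = {}

    def register(self, logical_name: str, location: PhysicalLocation) -> None:
        if logical_name in self._locations:
            raise ValueError(f"{logical_name} is already registered")
        self._locations[logical_name] = location

    def move(self, logical_name: str, new_location: PhysicalLocation) -> None:
        # Migrating storage only updates the mapping; consumers keep using the same name.
        if logical_name not in self._locations:
            raise KeyError(logical_name)
        self._locations[logical_name] = new_location

    def resolve(self, logical_name: str) -> PhysicalLocation:
        # Consumers ask for a logical name and never hard-code a physical path.
        return self._locations[logical_name]


if __name__ == "__main__":
    layer = LogicalDatasetLayer()
    layer.register("sales.orders", PhysicalLocation("s3", "s3://example-bucket/orders/"))
    layer.move("sales.orders", PhysicalLocation("warehouse", "analytics.orders"))
    print(layer.resolve("sales.orders").storage)  # prints "warehouse"
```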


Evolution of a modern cloud-based data lake

Viacheslav Inozemtsev

O’Reilly Software Architecture Conference

Berlin, 07.11.2019

viacheslav.inozemtsev@zalando.de

www.zalando.com

jobs.zalando.com/tech