Evolution of a modern cloud-based data lake
Viacheslav Inozemtsev
O’Reilly Software Architecture Conference
Berlin, 07.11.2019
OUTLINE
Introduction of Zalando
Why data lake?
What is a data lake?
Evolution of the data lake
Future
Who am I?
Viacheslav Inozemtsev
● 2010 - Specialist in Applied Mathematics and Computer Science at Tomsk State University, Russia
● 2014 - Master of Computer Science at the University of Bonn, Germany
● 8 years of total experience as a data and software engineer
● Last 3 years at Zalando
Introduction of Zalando
● 29M active customers
● 14000 employees
● 2000 engineers and data scientists
● 200 engineering teams
● Variety of major data systems
○ Messaging Bus
○ BI Data Warehouse
○ Google Analytics platform
○ Custom datasets
Why data lake?
● To enable sharing of data so that it is:
○ easy to publish and consume
○ compliant
○ secure
○ cost-efficient
What is a data lake?
● Central exchange system
○ connects producers and consumers of the data
○ protects data from malicious misuse or accidental leaks
○ open vs hide!
● Central big data system
○ provides as much diverse data as possible
○ has to be fast and easy to use
○ scale vs performance!
Evolution of the data lake
Evolution of the data lake - Stage 1: Inception
Evolution of the data lake - Stage 2: Optimization
Evolution of the data lake - Stage 3: Revolution
https://martinfowler.com/articles/data-monolith-to-mesh.html
Evolution of the data lake - centralized
Evolution of the data lake - federated
What is a data lake?
● Federated exchange system
○ connects producers and consumers of the data
○ protects data from malicious misuse or accidental leaks
○ open vs hide!
○ resolved by centralized governance
● Federated big data system
○ provides as much diverse data as possible
○ has to be fast and easy to use
○ scale vs performance!
○ resolved by decentralized ownership
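To make the split between "centralized governance" and "decentralized ownership" more concrete, here is a minimal sketch in Python. It is a hypothetical model, not Zalando's implementation: the `Dataset` and `Catalog` classes, the `contains_pii` flag and the `pii_approved` parameter are all illustrative assumptions. Each domain team publishes and owns its own datasets, while one shared access rule is applied to every read, no matter who owns the data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """A dataset published by one domain team (hypothetical model)."""
    name: str
    owner_team: str      # decentralized ownership: the producing team owns it
    contains_pii: bool   # classification that the central policy relies on


@dataclass
class Catalog:
    """Hypothetical federated catalog: teams register their own datasets,
    but every read goes through one shared governance check."""
    datasets: dict = field(default_factory=dict)

    def publish(self, dataset: Dataset) -> None:
        # Any domain team may publish; the catalog only records ownership.
        if dataset.name in self.datasets:
            raise ValueError(f"dataset {dataset.name!r} already registered")
        self.datasets[dataset.name] = dataset

    def can_read(self, dataset_name: str, team: str, pii_approved: bool) -> bool:
        # Centralized governance: the same rule applies to every dataset,
        # regardless of which team owns it.
        ds = self.datasets[dataset_name]
        if team == ds.owner_team:
            return True           # owners can always read their own data
        if ds.contains_pii:
            return pii_approved   # sensitive data stays hidden unless approved
        return True               # non-sensitive data is open to everyone


catalog = Catalog()
catalog.publish(Dataset("orders", owner_team="checkout", contains_pii=True))
catalog.publish(Dataset("article_views", owner_team="web-tracking", contains_pii=False))

print(catalog.can_read("article_views", team="pricing", pii_approved=False))  # True
print(catalog.can_read("orders", team="pricing", pii_approved=False))         # False
print(catalog.can_read("orders", team="checkout", pii_approved=False))        # True
```

In this sketch the "open vs hide" tension is settled by a single central rule, while the "scale vs performance" pressure is spread across teams that each publish and maintain their own datasets.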
Future
● Make the data lake fully federated
● Abstract physical storage into a logical layer of datasets
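A logical layer of datasets can be pictured as a registry that maps stable dataset names to wherever the files physically live. The Python sketch below is hypothetical: `LogicalDatasetLayer`, `PhysicalLocation`, the dataset name `checkout.orders` and the `s3://example-lake-...` URIs are illustrative assumptions, not details from the talk. The point it shows is that consumers resolve data by logical name, so the owning team can move or re-encode the physical storage without breaking them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalLocation:
    """Where the files of a dataset actually live (hypothetical)."""
    uri: str          # e.g. an object-store prefix
    file_format: str  # e.g. "json" or "parquet"


class LogicalDatasetLayer:
    """Maps stable logical dataset names to physical storage.
    Consumers only ever refer to the logical name."""

    def __init__(self) -> None:
        self._locations: dict[str, PhysicalLocation] = {}

    def register(self, name: str, location: PhysicalLocation) -> None:
        self._locations[name] = location

    def relocate(self, name: str, new_location: PhysicalLocation) -> None:
        # Storage can move (new bucket, new format) without breaking consumers.
        if name not in self._locations:
            raise KeyError(name)
        self._locations[name] = new_location

    def resolve(self, name: str) -> PhysicalLocation:
        return self._locations[name]


layer = LogicalDatasetLayer()
layer.register("checkout.orders",
               PhysicalLocation("s3://example-lake-raw/checkout/orders/", "json"))
# The owning team later converts the data; consumers keep using the same name.
layer.relocate("checkout.orders",
               PhysicalLocation("s3://example-lake-curated/checkout/orders/", "parquet"))
print(layer.resolve("checkout.orders").file_format)  # parquet
```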
www.zalando.com
jobs.zalando.com/tech