Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load...

54
Languages, Systems and Paradigms for Big Data Dario COLAZZO

Transcript of Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load...

Page 1: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Languages, Systems and Paradigms for

Big Data Dario COLAZZO

Page 2: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Plan

• Introduction to Big Data

• MapReduce

• Hadoop and its ecosystem,

• RDD et Spark

• Pig Latin & Hive

Page 3: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Modalités de contrôle

• Projet :

• analytics via MapReduce + Spark

• experimental evaluation on the cluster @Dauphine

• report

• Written exam

• Final grade = 0.7*Ex + 0.3*Pr

Page 4: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

About me• PhD in Computer Science at Università di Pisa

• Post-docs: Università di Venezia, Université Paris-Sud

• MdC at Université Paris-Sud, LRI-INRIA Saclay

• Professor at Univ. Paris-Dauphine since September 2013

• Research and teaching topics:

• languages and systems for databases

• focus on Web data (XML, RDF, JSON)

• expressive and efficient processing of Big Data

• Publications : http://dblp.uni-trier.de/pers/hd/c/Colazzo:Dario

Page 5: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Big data and Data science: an introduction

Page 6: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Contents

• What?

• Where?

• Why?

• How?

Contents

2

y What?y Where?y Why?y How?

Page 7: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Some numbers

Page 8: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

An introductory videoA short video

3(http://www.intel.it/content/www/it/it/big-data/big-data-101-animation.html)http://www.intel.fr/content/www/fr/fr/big-data/big-data-101-animation.html?

Page 9: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

How many, how much?How many data in the world?

44 Zettabytes by 2020

800 Terabytes, 2000

160 Exabytes, 2006 (1EB = 1018B)

4.5 Zettabytes, 2012 (1ZB = 1021B)

How much is a zettabyte?

1,000,000,000,000,000,000,000 bytes

A stack of 1TB hard disks that is 25,400 km high

How many data in a day?

7 TB, Twitter

10 TB, Facebook

90% of world's data:

generated over last two years!

640K ought to be enough for anybody.

Page 10: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Data grows fast!Data grows fast!

Page 11: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

How much fast?Growth rate

8http://www.sec.gov/Archives/edgar/data/51143/000110465913015636/a13-6155_18k.htm

Page 12: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Proliferation of data sourcesProliferation of data sources

Page 13: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Proliferation of devicesGlobal Internet device forecast

Page 14: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Users generate dataUser-generated content

Page 15: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Internet of ThinksInternet of Things

Internet of ThingsInternet of Things

Page 16: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Types of dataData types

16

Page 17: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

What is more important?• The ‘Big’?

• The ‘Data’?

• Both?

• Neither

What organisations do with data

What is more important?

19

y The “Big”y The “Data” y Bothy Neither

What organizations do with big data

"Data is not information, information is not knowledge, knowledge is not understanding, understanding is not wisdom"

Cliff Stoll

“Data is not information, information is not knowledge,

knowledge is not understanding, understanding is not wisdom”

Cliff Stoll

Page 18: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Not just a matter of VolumeThe four "V’s" of Big Data

17

y Not just a matter of volume..

Page 19: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Big Data: V4 +ValueVolume:Terabyte(1012), Petabyte(1015), Exabyte(1018), Zettabyte (1021)

Variety: Structured, semi-structured, unstructured; Text, image, audio, video, record

Velocity: Periodic, Near Real Time, Real Time

Veracity: Quality of the data can vary greatly

Value: Big data can generate huge competitive advantages

Big Data: V4+VALUE

21

y Volume:Terabyte(1012), Petabyte(1015), Exabyte(1018), Zettabyte (1021)

y Variety: Structured, semi-structured, unstructured; Text, image, audio, video, record

y Velocity: Periodic, Near Real Time, Real Timey Veracity: Quality of the data can vary greatlyy Value: Big data can generate huge competitive advantages

Page 20: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

What’s new?

22

The wide availability of data allows us to apply more sophisticated models and you get much more accurate results than in the past!

It is a capital mistake to theorize before one has data

The bigger the data set you have, the more accurate the predictions about the future will be

AnthonyGoldbloom

The wide availability of data allows us to apply more sophisticated

models and you get much more accurate results than in the past!

Page 21: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Bigger = Smarter?YES

algorithms work better

tolerate errors

discover the ‘long tail’ and ‘corner case’

BUT

more heterogeneity

data grows very fast

still need humans to ask right questions

Bigger = Smarter?y YES!y algorithms work much bettery tolerate errorsy discover the "long tail" and "corner cases"

y BUT:y more heterogeneityy data grows faster than energy on chipy still need humans to ask right questions

23

Page 22: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Why now?• Because we have large amount of data

• Data originates already in digital form

• ~40% growth per year

• Because we can

• 500$ for a drive in which to store all the music of the world

• 40 years of Moore's Law

• large computational resources

• 64% of organizations have invested in big data in 2013

• 34 billion $ invested in big data in 2013

• “Because we reached dead end with logic”

Why now?

25

y Because we have datay Data born already in digital form y 40% of data growth per year

y Because we can y 500$ for a drive in which to store all the music of the world y 40 years of Moore's Law o large computational resources y 64% of organizations have invested in big data in 2013 y 34 billion $ invested in big data in 2013

y “Because we reached dead end with logic”

Why now?

25

y Because we have datay Data born already in digital form y 40% of data growth per year

y Because we can y 500$ for a drive in which to store all the music of the world y 40 years of Moore's Law o large computational resources y 64% of organizations have invested in big data in 2013 y 34 billion $ invested in big data in 2013

y “Because we reached dead end with logic”

Why now?

25

y Because we have datay Data born already in digital form y 40% of data growth per year

y Because we can y 500$ for a drive in which to store all the music of the world y 40 years of Moore's Law o large computational resources y 64% of organizations have invested in big data in 2013 y 34 billion $ invested in big data in 2013

y “Because we reached dead end with logic”

Page 23: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

bigger=smarter, an example

Google Translate you collect snippets of translations

you match sentences to snippets

you continuously debug your system

Why does it work? there are tons of snippets on the Web

the accuracy improves as the training set grows

A simple example of bigger=smarter

26

y Google Translate y you collect snippets of translationsy you match sentences to snippetsy you continuously debug your system

y Why does it work?y there are tons of snippets on the Weby the accuracy improves as the training

set grows

Page 24: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Other success storiesMore success stories

Page 25: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Other casesCrime Prevention in Los Angeles

Diagnosis and treatment of genetic diseases

Investments in the financial sector

Generation of personalized advertising

Astronomical discoveries

...

Page 26: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Use casesUse cases

30

Today’s Challenge New Data What’s Possible

HealthcareExpensive office visits

Remote patientmonitoring

Preventive care, reduced hospitalization

ManufacturingIn-person support Product sensors Automated diagnosis,

support

Location-Based ServicesBased on position Real time location data Geo-advertising, traffic,

local search

Public SectorStandardized services Citizen surveys Tailored services,

cost reductions

RetailOne size fits all

marketingSocial media Sentiment analysis

segmentation

Page 27: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

The Big Data ProcessThe big data process

31

Acquisition

Extraction

Integration

Analysis

Interpretation

Decision

Goal: to make effective strategic decisions exploiting the availability of big data

Page 28: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Big Data in actionBig Data in action

32

Requires:y selectiony filteringy Metadata

generation y managing

provenance

Acquisition

Extraction

Integration

Analysis

Interpretation

Decision

Page 29: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Big Data in actionBig Data in action

33

Requires: y transformation y normalization y cleaning y aggregation y error handling

Acquisition

Extraction

Integration

Analysis

Interpretation

Decision

Page 30: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Big Data in actionBig Data in action

34

Requires:y standardization y conflict

management y reconciliation y mapping definition

Acquisition

Extraction

Integration

Analysis

Interpretation

Decision

Page 31: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Big Data in actionBig Data in action

35

Requires:y exploration y data mining y machine learning y visualization

Acquisition

Extraction

Integration

Analysis

Interpretation

Decision

Page 32: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Big Data in actionBig Data in action

36

Requires:y Knowledge of the

domain y Knowledge of the

provenancey Identification of

patterns of interest y Flexibility of the

process

Acquisition

Extraction

Integration

Analysis

Interpretation

Decision

Page 33: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Big Data in actionBig Data in action

37

Requires:y managerial skillsy continuous

improvement of the process

Acquisition

Extraction

Integration

Analysis

Interpretation

Decision

Page 34: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

A simple exampleProblem: The sale of lollipops is going down! Acquisition:

Sales by customer region and time Surveys of users Social networks

Extraction: Data loading from receipts Automatic reading of questionnaires Data extraction from twitter

Integration: On the basis of user types

Analysis : lollipops bought by people older than 25 lollipops preferred by people younger than 10

Interpretation: Moms believe: lollipops = bad teeth Boys and girls believe that lollipops are for babies

Decision: We make lollipops without sugar We ask dentists to advertise our lollipops We make commercials targeted to boys and girls

A simple example of a big data process

38

y Problem: The sale of lollipops is going down! y Acquisition:

y Sales by customer, region and time y Surveys of users y Social networks

y Extraction: y Data loading from receipts y Automatic reading of questionnaires y Data extraction from twitter

y Integration: y On the basis of user types

y Analysis: y lollipops bought by people older than 25y lollipops preferred by people younger than 10

y Interpretation: y Moms believe: lollipops = bad teethy Boys and girls believe that lollipops are for babies

y Decision: y We make lollipops without sugar y We ask dentists to advertise our lollipops y We make commercials targeted to boys and girls

Page 35: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Risks and Challenges Performance, performance, performance!

Data grows faster than energy on chip

Efficiency

Scalability

Effectiveness

Heterogeneity

Flexibility

Privacy

Costs

Risks and Challenges of Big Data

39

y Performance, performance, performance!y Data grows faster than energy on chipy Efficiencyy Scalability

y Effectivenessy Heterogeneityy Flexibility y Privacyy Costs

Page 36: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Effectiveness : a failure story Google Flu Trends:

over-estimated the relevance of flu for 100 of 108 weeks

‘’Google would not comment on thisyear’s difficulties. But several researchers suggest that the problems may be due to widespread media coverage of this year’s severe US flu season, including the declaration of a public-health emergency by New York state last month. The press reports may have triggered many flu-related searches by people who were not ill.’’ Nature (http://www.nature.com/news/when-google-got-flu-wrong-1.12413)

Effectiveness: a failure storyy Google Flu Trendsy over-estimated the prevalence of flu for 100 out of 108 weeks

Page 37: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Risk of bad interpretationsRisks of bad interpretation

Page 38: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Privacy: unpleasant drawbacksAOL search data leak (NYT, 8/9/2006)

Anonymous Netflix vs IMDb database (Wired, 12/13/2007)

Why Johnny Can’t Browse The Internet In Peace (Forbes, 8/1/2012)

How Companies Learn Your Secrets (N Y T, 16/2/2012)

Privacy: Unpleasant drawbacks

46

y AOL search data leak (NYT, 8/9/2006)y Anonymous Netflix vs IMDb database (Wired, 12/13/2007)y Why Johnny Can’t Browse The Internet In Peace (Forbes,

8/1/2012)y How Companies Learn Your Secrets (NYT, 16/2/2012)

Page 39: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

PerformancePerformance: taming Big Data..

Page 40: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Distribution and parallelism Distributed Architecture

Clusters of computers that work together to a common goal

Scale out not up!

Fault- tolerance

Resource replication

Eventual consistency

Distributed processing

Shared-nothing model

New programming paradigms

Distribution of resources and services

48

y Distributed Architecture y Clusters of computers that work together to a common goaly Scale out not up!

y Fault- tolerance y Resource replicationy Eventual consistency

y Distributed processingy Shared-nothing modely New programming paradigms

Distribution of resources and services

48

y Distributed Architecture y Clusters of computers that work together to a common goaly Scale out not up!

y Fault- tolerance y Resource replicationy Eventual consistency

y Distributed processingy Shared-nothing modely New programming paradigms

Page 41: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

The Big Data landscapeThe Big Data Landscape

49

Page 42: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

The new software stack

New programming environments designed to get their parallelism not from a supercomputer, but from computing clusters

Bottom of the stack: distributed file system (DFS)

We have a winner!

The New Software Stacky New programming environments designed to get their parallelism

not from a supercomputer but from computing clustersy Bottom of the stack: distributed file system (DFS)y We have a winner!

50

I think there is a world market for about five computers.

Page 43: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

On top of Hadoop Hundreds of different (high-level) programming solutions

Two main scenarios:

Analytics (mainly batch)

collecting, transforming, and modelling data with the goal of discovering useful information and supporting decision-making

Real-time processing

processing data and returning the results sufficiently quickly to affect the environment at that time

e-commerce, search engines, booking,…

On the top of Hadoopy Hundreds of different (high-level)

programming solutions

y Two main scenarios:y Analytics (mainly batch)y collecting, transforming, and modeling data with the goal of discovering useful

information and supporting decision-making

y Real-time processingy processing data and returning the results sufficiently quickly to affect the

environment at that timey e-commerce, search engines, booking, …

Page 44: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

The Big Data flow

52

The Big Data flow

ETL

Real TimeStreams

Distributed File System (HDFS)

NoSQL(HBase,

Cassandra,MongoDB)

New SQL(Oracle,VoltDB,

Teradata)

Real TimeProcessing

Analytics (Vertica,

Cloudera,Greenplum)

BatchProcessing

Page 45: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Techniques for Big Data analysis

Extract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems Cloud computing Analytics

Data mining Association rule learning Classification Cluster analysis Regression

Machine learning Supervised learning Unsupervised learning

Crowdsourcing …..

Techniques for big data analysis

53

y Extract, transform, and load (ETL)y Data fusion and data integrationy Distributed file systemy NoSQL database systemsy Cloud computingy Analytics

y Data miningy Association rule learningy Classification y Cluster analysisy Regression

y Machine learningy Supervised learningy Unsupervised learning

y Crowdsourcing y …

Page 46: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Goal of AnalyticsGoals of analytics

Page 47: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Data scientist: a brand new profession

Data Scientist: The Sexiest Job of the 21st Century [Harward Business Review 2013]

Data scientist? A guide to 2015's hottest profession [Mashable 2015]

Data scientist: a brand new profession

57

y Data Scientist: The Sexiest Job of the 21st Century [HarwardBusiness Review 2013]

y Data scientist? A guide to 2015's hottest profession [Mashable 2015]

Page 48: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Skills of data scientistsSkills of data scientists

58

Page 49: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Conclusions We live in the era of Big Data

Wide range of availability in different areas

Big opportunities to solve big problems

They can create value

The challenge is how to manage and use them

New technologies are needed

Methodological aspects are important

A rapidly evolving area

Data scientists: the current hottest profession in IT

Conclusions

61

y We live in the era of Big Datay Wide range of availability in different areas y Big opportunities to solve big problemsy They can create value y The challenge is how to manage and use themy New technologies are neededy Methodological aspects are important y A rapidly evolving areay Data scientists: the current hottest profession in IT

Page 50: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

So, let us face big data projects…So, let us face big data projects..

62

Page 51: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

…with a Bruce Willis’ attitude!..with a Bruce Willis attitude!

63

Page 52: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

References

"Big Data: The next frontier for innovation, competition, and productivity". McKinsey&Company, 2012.

"Challenges and Opportunities with Big Data". A community white paper developed by leading researchers across the United States, 2012.

"Taming The Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics". Bill Franks, John Wiley & Sons, 2012.

Page 53: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Questions?

Photo credit: Jimmy Lin

Page 54: Languages, Systems and Paradigms for Big Datacolazzo/BD/partie1.pdfExtract, transform, and load (ETL) Data fusion and data integration Distributed file system NoSQL database systems

Thanks and credits for this introductory part

Jimmy Lin (University of Maryland)

Riccardo Torlone (Università di Roma)