04/12/2023Copyright © 2011 Flytxt B.V. All rights reserved
RDBMS and Hadoop - Co-existence or competition
Ram Mohan
Session Agenda
◦ Introduction to RDBMS
◦ What is Hadoop and Map-Reduce
◦ Hadoop and RDBMS – A comparison
◦ Co-Existence – Practical Example – Master Website
◦ Q&A
Relational DBMS
Based on the principles of relational mathematics
Data is represented as the rows and columns of a table
Relational Terminology
◦ Tuple (Row)
◦ Attribute (Column)
◦ Relation (Table)
Integrity Constraints
◦ Primary Key
◦ Foreign Key
◦ Alternate Key
ACID Test
◦ Atomicity
◦ Consistency
◦ Isolation
◦ Durability
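To make the integrity constraints and the atomicity part of the ACID test concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customer and account tables, their columns, and the transfer helper are hypothetical names chosen for illustration; they are not from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"                                # primary key
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"  # foreign key
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO customer VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO account VALUES (10, 1, 100), (11, 1, 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            # If this debit breaks the CHECK constraint, the credit above is undone too.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {account id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

A transfer that would overdraw the source account fails as a whole, so the balances stay exactly as they were before the call. An insert into account that names a non-existent customer is rejected by the foreign key.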
Normalization
Normalization: the process of removing data redundancy by decomposing the relations in a database.
Denormalization: carefully introducing redundancy to improve query performance.
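As a rough illustration of the two definitions, the sketch below decomposes a flat list of rows into two relations and then joins them back. The order data and the function names are made up for this example.

```python
# Denormalized form: the customer's city is repeated on every order row.
orders_flat = [
    {"order_id": 1, "customer": "alice", "city": "London", "item": "nut"},
    {"order_id": 2, "customer": "alice", "city": "London", "item": "bolt"},
    {"order_id": 3, "customer": "bob", "city": "Paris", "item": "screw"},
]


def normalize(rows):
    """Decompose into customers (customer -> city) and orders without the city."""
    customers = {}
    orders = []
    for row in rows:
        customers[row["customer"]] = row["city"]  # each city is stored once
        orders.append({k: row[k] for k in ("order_id", "customer", "item")})
    return customers, orders


def denormalize(customers, orders):
    """Join the two relations back into the flat, redundant form."""
    return [
        {
            "order_id": o["order_id"],
            "customer": o["customer"],
            "city": customers[o["customer"]],
            "item": o["item"],
        }
        for o in orders
    ]


customers, orders = normalize(orders_flat)
```

The normalized form stores each city once, so changing a customer's city is a single write. The denormalized form avoids the lookup at read time, which is the query-performance trade-off the slide refers to.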
Relational DBMS
Example Data

S#  SNAME  STATUS  CITY
S1  Smith  20      London
S2  Jones  10      Paris
S3  Blake  30      Paris

P#  PNAME  COLOR  WEIGHT  CITY
P1  Nut    Red    12      London
P2  Bolt   Green  17      Paris
P3  Screw  Blue   17      Rome
P4  Screw  Red    14      London

S#  P#  QTY
S1  P1  300
S1  P2  200
S1  P3  400
S2  P1  300
S2  P2  400
S3  P2  200
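The example data can be loaded into an in-memory SQLite database and queried, as in the sketch below. The table names S, P and SP and the column names sno and pno are assumptions, since `S#` and `P#` are not valid unquoted SQL identifiers. The two query helpers are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S (sno TEXT PRIMARY KEY, sname TEXT, status INTEGER, city TEXT);
CREATE TABLE P (pno TEXT PRIMARY KEY, pname TEXT, color TEXT, weight INTEGER, city TEXT);
CREATE TABLE SP (sno TEXT REFERENCES S(sno), pno TEXT REFERENCES P(pno), qty INTEGER,
                 PRIMARY KEY (sno, pno));
""")
conn.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", [
    ("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"), ("S3", "Blake", 30, "Paris"),
])
conn.executemany("INSERT INTO P VALUES (?, ?, ?, ?, ?)", [
    ("P1", "Nut", "Red", 12, "London"), ("P2", "Bolt", "Green", 17, "Paris"),
    ("P3", "Screw", "Blue", 17, "Rome"), ("P4", "Screw", "Red", 14, "London"),
])
conn.executemany("INSERT INTO SP VALUES (?, ?, ?)", [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
    ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200),
])
conn.commit()


def suppliers_of_red_parts(conn):
    """Names of suppliers that supply at least one red part (three-table join)."""
    rows = conn.execute(
        "SELECT DISTINCT S.sname FROM S"
        " JOIN SP ON SP.sno = S.sno"
        " JOIN P ON P.pno = SP.pno"
        " WHERE P.color = 'Red' ORDER BY S.sname"
    )
    return [name for (name,) in rows]


def total_qty_by_supplier(conn):
    """(supplier name, total quantity supplied), ordered by supplier number."""
    return conn.execute(
        "SELECT S.sname, SUM(SP.qty) FROM S"
        " JOIN SP ON SP.sno = S.sno"
        " GROUP BY S.sno, S.sname ORDER BY S.sno"
    ).fetchall()
```

With the rows above, only part P1 is both red and supplied by anyone, so the first query returns Jones and Smith. The second returns totals of 900, 700 and 200 for Smith, Jones and Blake.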
Five computers & a 640k ;-)
Moore’s Law
"I think there is a world market for about five computers"
Thomas Watson, Chairman of the board of IBM, 1943
"640k ought to be enough for anybody"
Attributed to Bill Gates in 1981
The Big Data Challenges
The number of data sources and the amount of data to analyze are growing exponentially
Stale data exists because DW solutions cannot ingest the vast amounts of data fast enough
Lack of performance for advanced analytics and complex queries
The number of users and their concurrency are increasing rapidly
Hadoop Architecture
Hadoop – HDFS (Hadoop Distributed File System)
Reliably stores petabytes of replicated data across thousands of nodes
◦ Data is divided into 64 MB blocks, and each block is replicated three times
Master/Slave architecture
◦ The master NameNode holds the block locations
◦ Each slave DataNode manages blocks on its local FS
Built on local commodity hardware
◦ No RAID required
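The block and replication figures on this slide translate into simple arithmetic, sketched below. The function name and the megabyte units are choices made for this example. The 64 MB block size and the replication factor of three are the values quoted on the slide, and real clusters may be configured differently.

```python
import math

BLOCK_SIZE_MB = 64  # block size quoted on the slide
REPLICATION = 3     # replicas per block quoted on the slide


def hdfs_footprint(file_size_mb):
    """Return (blocks, block replicas, raw MB stored) for one file."""
    blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    replicas = blocks * REPLICATION
    # A partial final block occupies only its actual size, so raw storage
    # is the file size times the replication factor.
    raw_mb = file_size_mb * REPLICATION
    return blocks, replicas, raw_mb
```

For example, a 1 GB (1024 MB) file splits into 16 blocks, which are stored as 48 block replicas taking 3072 MB of raw disk across the cluster.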
Map-Reduce Model
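The model can be sketched in a few lines of plain Python with a word count as the example job. This is a single-process illustration of the map, shuffle and reduce steps under assumed function names, not Hadoop's own API. A real job distributes these steps across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each map call sees only one line and each reduce call sees only one key, both steps can be run in parallel on different nodes, which is where the built-in scaling comes from.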
Hadoop – Limitations
Is not intended for realtime querying
Does not support random access
Significant learning curve
Provides barebones functionality out of the box, but scaling is built-in and inexpensive
Where SQL Makes Life Easy
Joining
◦ In a single query, get all products in an order with their product information
Secondary Indexing
◦ Get CustomerId by e-mail
Referential Integrity
Realtime Analysis
Millions are trained in SQL and relational data modelling
RDBMS provides tremendous functionality, but is extremely difficult and costly to scale
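The joining and secondary-indexing cases might look like the following sketch with Python's sqlite3 module. The schema, the index name, the sample rows and the helper names are all hypothetical and only meant to show the shape of the two queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
CREATE INDEX idx_customer_email ON customer(email);  -- secondary index on e-mail
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price INTEGER);
CREATE TABLE order_item (order_id INTEGER,
                         product_id INTEGER REFERENCES product(id),
                         qty INTEGER);
INSERT INTO customer VALUES (1, 'ann@example.com'), (2, 'raj@example.com');
INSERT INTO product VALUES (1, 'Nut', 5), (2, 'Bolt', 7), (3, 'Screw', 3);
INSERT INTO order_item VALUES (100, 1, 10), (100, 3, 2), (101, 2, 1);
""")


def customer_id_by_email(conn, email):
    """Look up a customer id by e-mail; the index avoids a full table scan."""
    row = conn.execute("SELECT id FROM customer WHERE email = ?", (email,)).fetchone()
    return row[0] if row else None


def products_in_order(conn, order_id):
    """One query returns every product in the order with its product details."""
    return conn.execute(
        "SELECT p.name, p.price, oi.qty FROM order_item oi"
        " JOIN product p ON p.id = oi.product_id"
        " WHERE oi.order_id = ? ORDER BY p.id",
        (order_id,),
    ).fetchall()
```

Both lookups are single declarative statements. Getting the same results from a plain Map-Reduce job would mean writing and running a batch job for each.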
Master Website – A Practical Example
Master Website – RDBMS Use Cases
Profile information provided during sign-up
Generated intelligence, i.e. the output of the analytic jobs
Online purchase track records and account management
Reporting tools
Master Website – Hadoop Use Cases
Generating intelligence from the continuous stream of data
◦ Wall posts on Facebook
Adding new tags to the old logs available, to meet new requirements
A Practical Example – Facebook Architecture
THANK YOU