BIG DATA Analysis for page ranking using Map Reduce

Big Data Analysis for Page Ranking using Map/Reduce

R.Renuka, R.Vidhya Priya, III B.Sc., IT, S.F.R.College for Women, Sivakasi.


Overview

– Introduction

– What is Big Data?

– Why Big Data?

– 4 V’s of Big Data

– Big Data Analytics Technologies

– Map/Reduce

– Applications

– Case Study

– Conclusion

Introduction

Data have outgrown the storage and processing capabilities of a single host.

Two fundamental challenges:

– how to store and work with voluminous data sizes, and

– how to understand data and turn it into a competitive advantage.

What is Big Data?

‘Big data’ is similar to ‘small data’, but bigger.

Bigger data requires different approaches: techniques, tools and architectures.

The aim is to solve new problems, and old problems in a better way.

The Blind men and the Elephant

Why Big Data?

Key enablers for the growth of “Big Data” are:

Increase of Processing Power

Increase of Storage Capacities

Availability of Data

4 V’s of Big Data

Big Data Analytics Technologies

Hadoop

Platfora

WibiData

Pig

Hive

MapReduce

NoSQL databases

Column-oriented databases

Hadoop

Hadoop is a distributed file system and data processing engine.

Hadoop has two components:

– The Hadoop Distributed File System (HDFS)

– The MapReduce programming model

Map / Reduce

A high-level, abstracted framework for distributed processing of large datasets

Fault tolerance, parallelization

Computation consists of two phases: Map and Reduce

A master-slave architecture

Computation occurs on multiple slave nodes

It tries to provide data locality as much as possible.

MR model

Map

– Process a key/value pair to generate intermediate key/value pairs

Reduce

– Merge all intermediate values associated with the same key

Users implement an interface of two primary methods:

1. Map: (key1, val1) → (key2, val2)

2. Reduce: (key2, [val2]) → [val3]
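
The two methods above can be illustrated with a small in-memory simulation. This is only a sketch under assumptions: the names map_fn, reduce_fn and run_mapreduce are hypothetical, word counting is chosen as the example task, and everything runs in one process, whereas a real framework such as Hadoop distributes the map and reduce calls across nodes.

```python
from collections import defaultdict

def map_fn(key, value):
    """Map: (key1, val1) -> list of (key2, val2).
    Here (document id, text) becomes one (word, 1) pair per word."""
    return [(word.lower(), 1) for word in value.split()]

def reduce_fn(key, values):
    """Reduce: (key2, [val2]) -> [val3].
    Here the counts collected for one word are summed."""
    return [sum(values)]

def run_mapreduce(records, map_fn, reduce_fn):
    # Map phase: apply map_fn to every input record.
    intermediate = []
    for key, value in records:
        intermediate.extend(map_fn(key, value))

    # Shuffle phase: group intermediate values by key.
    groups = defaultdict(list)
    for key2, value2 in intermediate:
        groups[key2].append(value2)

    # Reduce phase: merge all values associated with the same key.
    return {key2: reduce_fn(key2, values)
            for key2, values in sorted(groups.items())}

docs = [("d1", "big data is big"), ("d2", "data is data")]
result = run_mapreduce(docs, map_fn, reduce_fn)
print(result)
```

The grouping step between the two phases is what lets each reduce call see every intermediate value for its key, regardless of which map call produced it.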

Applications

Homeland Security

Finance

Smarter Healthcare

Multi-channel sales

Telecom

Manufacturing

Traffic Control

Trading Analytics

Fraud and Risk

Log Analysis

Search Quality

Retail

Case Study
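
Since the talk is about page ranking with Map/Reduce, one iteration of a simplified PageRank can be sketched in the same map/reduce style. This is an illustration under assumptions, not the authors' case study: the three-page link graph, the damping factor of 0.85, the function names, and the requirement that every page has at least one outgoing link are all hypothetical choices made for the example.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor, a commonly used value

def pagerank_map(page, state):
    """Map: (page, (rank, links)) -> intermediate pairs.
    Passes the link list through and sends an equal share
    of the page's rank to every page it links to."""
    rank, links = state
    pairs = [(page, ("links", links))]
    for target in links:
        pairs.append((target, ("rank", rank / len(links))))
    return pairs

def pagerank_reduce(page, values, num_pages):
    """Reduce: (page, [values]) -> (new_rank, links).
    Sums the incoming rank shares and applies damping."""
    links, incoming = [], 0.0
    for kind, payload in values:
        if kind == "links":
            links = payload
        else:
            incoming += payload
    new_rank = (1 - DAMPING) / num_pages + DAMPING * incoming
    return (new_rank, links)

def pagerank_iteration(graph):
    # Map and shuffle: group intermediate values by page.
    groups = defaultdict(list)
    for page, state in graph.items():
        for key, value in pagerank_map(page, state):
            groups[key].append(value)
    # Reduce: compute the new rank of every page.
    n = len(graph)
    return {page: pagerank_reduce(page, values, n)
            for page, values in groups.items()}

# Hypothetical link graph: A -> B, C; B -> C; C -> A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
graph = {page: (1.0 / len(links), out) for page, out in links.items()}
for _ in range(20):
    graph = pagerank_iteration(graph)
ranks = {page: rank for page, (rank, _) in graph.items()}
print(ranks)
```

Each iteration is one Map/Reduce job whose output feeds the next job; in this small graph, page C ends up ranked above page B because it receives rank from both A and B.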

Conclusion

Real-time big data isn’t just a process for storing petabytes or exabytes of data in a data warehouse. It’s about the ability to make better decisions and take meaningful actions at the right time.

Queries?