Facebook, Twitter, Google


Transcript of Facebook, Twitter, Google

2

2

[email protected]

90%

Facebook, Twitter, Google

Hadoop

Hadoop

inter-disciplinary

Hadoop

1

3V: Volume, Variety, Velocity

Edd Dumbill

O'Reilly

Meta Group

Gartner

IBM: V (Veracity)

Oracle: V (Value)

01

Value

IT

01

(Data analyst or data scientist)

Veracity:

Velocity:

Volume:

Variety:

Value

(The sexiest job of the 21st century)

11 Data science Venn diagram

02

Hacking Skills *

software

Math & statistics knowledge*

Substantive expertise*

12

Numeric, categorical data

2 Storage

data warehouse (DW)

data

numeric: discrete, continuous

categorical: nominal, ordinal
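
The variable types named above (numeric split into discrete and continuous, categorical split into nominal and ordinal) can be illustrated with a small Python sketch. The function name `variable_type` and the `ordered_levels` parameter are hypothetical, and the rules are rough heuristics assumed for illustration, not a standard classification procedure.

```python
def variable_type(values, ordered_levels=None):
    """Guess the type of a variable from its values (illustrative heuristic only)."""
    is_number = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if is_number:
        # Whole numbers are treated as discrete counts, anything else as continuous.
        if all(float(v).is_integer() for v in values):
            return "numeric/discrete"
        return "numeric/continuous"
    # Categories with a known ranking are ordinal, otherwise nominal.
    if ordered_levels is not None and set(values) <= set(ordered_levels):
        return "categorical/ordinal"
    return "categorical/nominal"


children = variable_type([0, 2, 1, 3])
height = variable_type([1.62, 1.80, 1.75])
rating = variable_type(["low", "high"], ordered_levels=["low", "medium", "high"])
colour = variable_type(["red", "blue", "green"])
```

In practice the type of a variable is decided from what it measures rather than from its raw values; the sketch only makes the four categories concrete.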

SQL (Structured Query Language)

NoSQL

DW

cloud computing

3

Define the goal

Collect the data

Prepare the data

Select the variables and determine their types

Select the analysis methods

Performance and evaluation

Identify the appropriate model

Apply the model

Working with the data calls for a specific methodology in order to build a predictive model, and Figure (04) shows the steps followed. The first stage of the analysis is to define the nature of the problem precisely. Neglecting this starting point leads to several fallacies on the one hand and wastes considerable time on the other, because it usually forces the problem to be diagnosed again, followed by another pass through the stages shown in Figure (04), each time with the aim of reaching the best model.
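
The workflow above (define the goal, collect and prepare the data, choose variables and methods, evaluate, pick the best model, apply it) can be sketched in plain Python. Everything in this sketch is an assumption made for illustration: the toy data, the goal of predicting `y` from `x`, the two candidate models, and the function names are hypothetical and do not come from the source.

```python
def collect_data():
    # Hypothetical observations as (x, y) pairs, e.g. spend and sales.
    return [(1, 3.1), (2, 4.9), (3, 7.2), (4, 9.0),
            (5, 10.8), (6, 13.1), (7, 15.0), (8, 17.2)]


def prepare_data(rows):
    # Drop incomplete rows, then hold out the last quarter for evaluation.
    clean = [(x, y) for x, y in rows if x is not None and y is not None]
    cut = int(len(clean) * 0.75)
    return clean[:cut], clean[cut:]


def fit_mean(train):
    # Baseline candidate: always predict the average of y.
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    # Second candidate: least-squares straight line y = intercept + slope * x.
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mean_abs_error(model, rows):
    # Evaluation step: average absolute prediction error on held-out rows.
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)


def build_best_model():
    train, test = prepare_data(collect_data())
    candidates = {"mean": fit_mean(train), "line": fit_line(train)}
    scores = {name: mean_abs_error(m, test) for name, m in candidates.items()}
    best = min(scores, key=scores.get)  # keep the model with the lowest error
    return best, candidates[best], scores


best_name, best_model, scores = build_best_model()
prediction = best_model(9)  # final step: apply the chosen model to a new case
```

If the evaluation is unsatisfactory, the loop described in the text corresponds to going back to `collect_data` or `prepare_data`, adding candidates, and running `build_best_model` again.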

SAP Business Intelligence (BI)

04

Figure 05: Big data analysis methodology

4

Apache

Hadoop

servers

Hadoop

06

Hadoop

41 Hadoop

Linux, Linux-based systems, Mac OS

Petabytes

Hadoop and related technologies

Hadoop

Hadoop File System

MapReduce

42 Hadoop

Hadoop Distributed File System (HDFS)*

mega, giga, peta

(write-once, read-many-times)

clustering

MapReduce*

Ruby, Python, Java, C++

MapReduce: Map-task, Reduce-task

Lisp

Key-value

map

Reduce, map

HDFS, Reduce

Python: tuples, lists and dictionaries
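
The map task, the reduce task and the key-value pairs mentioned above can be illustrated with a word count written in plain Python using only tuples, lists and dictionaries. This is a single-process sketch of the idea under assumed names (`map_task`, `shuffle`, `reduce_task`, `map_reduce` are hypothetical); it does not use Hadoop or its API, and a real job would run the tasks in parallel over data stored in HDFS.

```python
from collections import defaultdict


def map_task(line):
    # Map: emit one (key, value) tuple per word, here (word, 1).
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group all values that share the same key, as the framework would do
    # between the map phase and the reduce phase.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_task(key, values):
    # Reduce: combine the list of values for one key into a single result.
    return key, sum(values)


def map_reduce(lines):
    pairs = [pair for line in lines for pair in map_task(line)]
    grouped = shuffle(pairs)
    return dict(reduce_task(key, values) for key, values in grouped.items())


counts = map_reduce(["big data big value", "data velocity"])
```

Here `counts` maps each word to its number of occurrences, so `counts["big"]` is 2 and `counts["velocity"]` is 1.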

Hadoop 06

Hadoop

Hadoop

R, SQL, Hadoop

Data mining algorithms for predictive analytics

R packages to create jobs in HDFS

R + Hadoop = RHadoop

R connector to Hadoop

Mahout for machine learning

R + Hive = RHive

SQL + Hadoop = Sqoop

HADOOP

5

x

x

08

structured

Extract, transform and load (ETL)

DW, Business Intelligence (BI)
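
The ETL step that feeds structured data into a data warehouse for BI queries can be sketched with the Python standard library. The sample CSV, the `sales` table and the function names are hypothetical, and an in-memory SQLite database stands in for the data warehouse purely for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an operational system.
RAW_CSV = """customer,amount,currency
alice,10.50,usd
bob,,usd
carol,7.25,USD
"""


def extract(text):
    # Extract: read the raw rows as dictionaries keyed by column name.
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    # Transform: drop rows without an amount and normalise names and codes.
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue
        cleaned.append(
            (row["customer"].title(), float(row["amount"]), row["currency"].upper())
        )
    return cleaned


def load(rows, connection):
    # Load: write the cleaned rows into the warehouse table.
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, currency TEXT)"
    )
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    connection.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for the DW
load(transform(extract(RAW_CSV)), warehouse)

# A BI-style query over the loaded data.
total = warehouse.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
row_count = warehouse.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

Keeping the three stages as separate functions mirrors the extract, transform, load order and makes each stage easy to replace, for example by a different source or a real warehouse connection.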

Unstructured and semi-structured

NoSQL

Hadoop (HDFS)

MapReduce, Hadoop

I&T

BIBLIOGRAPHY

1 Feinleib David, Big Data Bootcamp, Springer Science+Business Media New York, 2014.

2 Baaziz Abdelkader, Quoniam Luc, "How to use big data technologies to optimize operations in upstream petroleum industry", International Journal of Innovation, Sep. 2013.

3 Katal A, Wazid M, Goudar R H, "Big Data: Issues, Challenges, Tools and Good Practices", IEEE, 2013.

4 Grus Joel, Data Science from Scratch: First Principles with Python, O'Reilly Media, 2015.

5 Lian Duan & Ye Xiong, "Big data analytics and business analytics", Journal of Management Analytics, May 2015.

6 Vezzoso Simonetta, "Competition policy in a world of big data", Research Handbook on Digital Transformations, 2016.

7 Sastry, Hanumanth Sistla, and M. S. Prasad Babu, "Big data and predictive analytics in ERP systems for automating decision making process", IEEE, 2014.

8 Mohanty Soumendra, Jagadeesh Madhu, Srivatsa Harsha, Big Data Imperatives, Springer Science+Business Media New York, 2013.

9 Holmes Alex, Hadoop in Practice, Manning Publications, 2012.

10 Tom White, Hadoop: The Definitive Guide, 3rd ed., O'Reilly Media, 2012.

11 Kenny Petter, Business Problems and Data Science Solutions, Springer Science+Business Media New York, 2014.

12 Biswas Sanjib, Sen Jaydip, "A proposed architecture for big data driven supply chain analytics".