© 2012 IBM Corporation IBM Security Systems 1 © 2013 IBM Corporation 1 Ecommerce Antoine...
© 2012 IBM Corporation
IBM Security Systems
© 2013 IBM Corporation
Ecommerce
Antoine Harfouche
Big Data Analytics Lecture Series
Adapted from Kalapriya Kannan, IBM Research Labs, July 2013
What is the aim of the course?
Focus is on “Systems” and applications for cloud-based storage and processing of BIG DATA.
+Big Data - Definition
+Big Data - Analytics
+Big Data - Storage (HDFS)
+Big Data - Computing (Map/Reduce)
+Big Data - Database (HBase)
+Big Data - Graph DB (Titan)
+Big Data - Streaming (Storm)
“Learning is not just restricted to listening, it is actively asking relevant questions”
Aim
Get convinced about “Big Data”
Understand why we need a different paradigm.
Ascertain with confidence the need to look at data computing in a different way.
Realize the potential of big data
–All of you are skilled enough to get into it.
What we will not do
–Do research on why things have evolved into the current trends.
–Try to be hands-on (but not guaranteed)
What are we going to understand?
What is Big Data?
Why did we land up there?
To whom does it matter?
Where is the money?
Are we ready to handle it?
What are the concerns?
Tools and Technologies
–Is Big Data <=> Hadoop?
Simple to start
What is the maximum file size you have dealt with so far?
– Movies/Files/Streaming video that you have used?
– What have you observed?
What is the maximum download speed you get?
Simple computation
– How much time to just transfer.
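The simple computation can be sketched in a few lines of Python. The file sizes and the 20 Mbit/s link speed below are hypothetical figures chosen for illustration, not numbers from the lecture, and the calculation ignores protocol overhead and latency.

```python
def transfer_time_seconds(file_size_bytes: float, speed_bits_per_second: float) -> float:
    """Time to move a file over a link, ignoring protocol overhead and latency."""
    return file_size_bytes * 8 / speed_bits_per_second


GB = 10**9    # bytes in a gigabyte (decimal)
TB = 10**12   # bytes in a terabyte (decimal)
MBPS = 10**6  # bits per second in one megabit per second

# Hypothetical figures: a 4 GB movie and a 1 TB data set over a 20 Mbit/s link.
movie_seconds = transfer_time_seconds(4 * GB, 20 * MBPS)
dataset_seconds = transfer_time_seconds(1 * TB, 20 * MBPS)

print(f"4 GB movie: {movie_seconds / 60:.1f} minutes")
print(f"1 TB data set: {dataset_seconds / 86400:.1f} days")
```

Under these assumptions the movie takes under half an hour, while the terabyte takes several days just to move, before any processing starts.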
What is big data?
“Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few.
This data is “big data.”
Huge amount of data
There are huge volumes of data in the world:
+From the beginning of recorded time until 2003, we created 5 billion gigabytes (5 exabytes) of data.
+In 2011, the same amount was created every two days.
+In 2013, the same amount of data is created every 10 minutes.
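To make these figures comparable, they can be turned into data-creation rates. The sketch below assumes decimal units (1 exabyte = 10^18 bytes) and takes the slide's numbers at face value.

```python
EXABYTE = 10**18  # bytes, decimal definition (assumption)

amount = 5 * EXABYTE  # "5 billion gigabytes", as quoted above

seconds_2011 = 2 * 24 * 3600  # the amount was created every two days
seconds_2013 = 10 * 60        # the amount is created every 10 minutes

rate_2011 = amount / seconds_2011  # bytes per second
rate_2013 = amount / seconds_2013  # bytes per second

print(f"2011: {rate_2011 / 10**12:.1f} TB/s")
print(f"2013: {rate_2013 / 10**15:.1f} PB/s")
print(f"Growth factor: {rate_2013 / rate_2011:.0f}x")
```

Going from "every two days" to "every 10 minutes" is a 288-fold increase in the rate of data creation.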
Big data spans three dimensions: Volume, Velocity and Variety
Volume: Enterprises are awash with ever-growing data of all types, easily amassing terabytes—even petabytes—of information.
– Turn 12 terabytes of Tweets created each day into improved product sentiment analysis
– Convert 350 billion annual meter readings to better predict power consumption
Velocity: Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value.
– Scrutinize 5 million trade events created each day to identify potential fraud
– Analyze 500 million daily call detail records in real-time to predict customer churn faster
– The latest I have heard: even a 10-nanosecond delay is too much.
Variety: Big data is any type of data - structured and unstructured data such as text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together.
– Monitor 100s of live video feeds from surveillance cameras to target points of interest
– Exploit the 80% data growth in images, video and documents to improve customer satisfaction
Finally….
‘Big Data’ is similar to ‘small data’, but bigger
…but having bigger data requires different approaches:
Techniques, tools, architecture… with an aim to solve new problems
Or old problems in a better way
To whom does it matter?
Research community
Business community: new tools, new capabilities, new infrastructure, new business models, etc.
On sectors
Financial Services..
What do revenues look like?
The Social Layer in an Instrumented Interconnected World
2+ billion people on the Web by end 2011
30 billion RFID tags today (1.3B in 2005)
4.6 billion camera phones worldwide
100s of millions of GPS-enabled devices sold annually
76 million smart meters in 2009… 200M by 2014
12+ TBs of tweet data every day
25+ TBs of log data every day
? TBs of data every day
What does Big Data trigger?
From “Big Data and the Web: Algorithms for Data Intensive Scalable Computing”, Ph.D Thesis, Gianmarco
BIG DATA is not just HADOOP
Manage & store huge volume of any data: Hadoop File System, MapReduce
Manage streaming data: Stream Computing
Analyze unstructured data: Text Analytics Engine
Structure and control data: Data Warehousing
Integrate and govern all data sources: Integration, Data Quality, Security, Lifecycle Management, MDM
Understand and navigate federated big data sources: Federated Discovery and Navigation
Types of tools typically used in Big Data Scenario
Where is the processing hosted?
–Distributed server/cloud
Where is the data stored?
–Distributed storage (e.g., Amazon S3)
What is the programming model?
–Distributed processing (MapReduce)
How is the data stored and indexed?
–High-performance schema-free database
What operations are performed on the data?
–Analytic/semantic processing (e.g., RDF/OWL)
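The MapReduce programming model mentioned above can be illustrated with a single-process word count. This is only a sketch of the idea in plain Python, not Hadoop's actual API: the function names `map_phase`, `shuffle` and `reduce_phase` are hypothetical, and a real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Because each call to `map_phase` looks at one document and each call to `reduce_phase` looks at one key, both steps can be spread over many workers, which is what makes the model suitable for distributed processing.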
When dealing with Big Data is hard
When the operations on data are complex:
–E.g., simple counting is not a complex problem.
–Modeling and reasoning with data of different kinds can get extremely complex.
Good news with big data:
–Often, because of the vast amount of data, modeling techniques can get simpler (e.g., smart counting can replace complex model-based analytics)…
–…as long as we deal with the scale.
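One way to picture "smart counting" is next-word prediction from bigram counts: instead of a hand-built language model, simply count which word follows which. The sketch below is an illustration chosen for this note, not an example from the lecture; the tiny corpus and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_counts(sentences):
    """Count, for every word, how often each other word directly follows it."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    candidates = following.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical toy corpus; with real big data this would be billions of sentences.
corpus = [
    "big data needs new tools",
    "big data needs new ideas",
    "big data is growing",
]
model = train_bigram_counts(corpus)
print(predict_next(model, "data"))
```

The method itself is trivial; its quality comes from the amount of data counted, which is why handling the scale becomes the real problem.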
Why Big-Data?
Key enablers for the appearance and growth of ‘Big-Data’ are:
+Increase in storage capabilities
+Increase in processing power
+Availability of data