Español Mario Nemirovsky… · 4 In March 2012, the Obama administration announced the big data...

Español Mario Nemirovsky


Page 1:

Español Mario Nemirovsky

Page 2:

English Mario Nemirovsky

Page 3:

Silicon Valley version Mario Nemirovsky

Page 4:

In March 2012, the Obama administration announced the big data research and development initiative.

The leading IT companies, such as SAG, Oracle, IBM, Microsoft, SAP and HP, have spent more than $15 billion on buying data management and analytics software.

This industry on its own is worth more than $100 billion.

Page 5:

1. How big is big data?

2. Where does the data come from?

3. Where is it stored?

4. How is it analyzed?

5. How is it visualized? (later)

6. Who needs it?

Page 6:

Google was processing 20 PB a day in 2008

Wayback Machine had 3 PB +100 TB/month (3/2009)

Facebook has 2.5 PB of user data + 15 TB/day (4/2009)

eBay has 6.5 PB of user data + 50 TB/day (5/2009)

640K ought to be enough for anybody.

Page 7:

Large Hadron Collider in 2012

40 000 000 000 000 B/s (40 TB/s)

An Airbus A380 generates 640 TB per flight

Twitter generates 12 TB of data per day

The New York Stock Exchange generates 1 TB of data every day

Walmart alone had 30 billion RFID sensors in 2012
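
The quoted figures mix per-second and per-day rates, so a common scale helps when comparing them. The sketch below converts a few of them (plus the Google figure from the previous slide) to bytes per second. It assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and a 24-hour day; the helper name `per_second` is illustrative only.

```python
TB = 10**12  # decimal terabyte (assumption; the slides do not say which unit)
PB = 10**15  # decimal petabyte (same assumption)
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(total_bytes, seconds):
    """Average rate in bytes per second over the given period."""
    return total_bytes / seconds


rates = {
    "Large Hadron Collider (40 TB/s)": per_second(40 * TB, 1),
    "Google (20 PB/day, 2008)": per_second(20 * PB, SECONDS_PER_DAY),
    "Twitter (12 TB/day)": per_second(12 * TB, SECONDS_PER_DAY),
    "NYSE (1 TB/day)": per_second(1 * TB, SECONDS_PER_DAY),
}

# Print the sources from fastest to slowest, in GB/s.
for name, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {rate / 10**9:,.2f} GB/s")
```

On these assumptions, 20 PB a day averages out to roughly 231 GB every second.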

Page 8:

The Model of Generating/Consuming Data has Changed


Old Model: Few companies are generating data, all others are consuming data

New Model: all of us are generating data, and all of us are consuming data

Page 9:

Lots of data is being collected

◦ Web data, e-commerce

◦ Department/grocery stores

◦ Bank/Credit Card

◦ Social Network

◦ Health

◦ Genetics

Big Data Everywhere!

Page 10:
Page 11:

Source: Avanade Global Survey: The Business Impact of Big Data, November 2010

Page 12:

Science

◦ Databases from astronomy, genomics, environmental data, transportation data, …

Humanities and Social Sciences

◦ Scanned books, historical documents, social interactions data, new technology like GPS, …

Business & Commerce

◦ Corporate sales, stock market transactions, census, …

Entertainment

◦ Hollywood movies, MP3 files, …

Medicine

◦ MRI & CT scans, patient records, …

Page 13:

HP envisions 1 trillion sensors in use around the world

There are many types of sensors: temperature, pressure, level, humidity, speed, motion, distance, light, or presence/absence

Page 14:

IoT: “expansion of connectivity” using IP networking of “things” into public and private IP networks, linking computing and storage resources, and also people

The “Industrialization” of IP networks reaches domains previously characterized by application specific, often non-IP networks

“Smart Objects” include

Sensors: measure power quality/voltage/…, pressure, mechanical constraints, video, pollution, gas/water/… leaks, motion

Actuators: act on devices (e.g. turn on/off an engine, a light, close a valve, or even trigger a complex set of actions)

Smart tags (RFID)

Organized into: vehicles, intelligent traffic controls and lighting elements, industrial automation, healthcare, etc.

Rodolfo Milito

Page 15:

Today’s Dominant Endpoints: a person behind every device

Dominant Endpoints in 2025: devices clustered in systems

◦ Industrial Automation

◦ Healthcare

◦ Intelligent Buildings

◦ Precision Agriculture

◦ Transportation and Connected Vehicles

Rodolfo Milito

Page 16:

Non-trivial extension of Cloud Computing from the Core to the Edge that enables a whole new wave of services and applications

Virtualization, multi-tenancy, & some distinctive features

fog = cloud close to the ground

Suites of Use Cases

- (Mobile) Content Delivery

• Low-latency apps (gaming, streaming, augmented reality ...)

- Geo-distributed apps

• Sensor/actuator networks, Smart Cities

- Large-scale distributed control systems

• Connected Vehicle, Intelligent Transportation, Smart Grid

Fog is the platform where the Internet meets the physical world

Rodolfo Milito

Page 17:

Grid Data Latency Hierarchy (taken from Jeff Taft)

Multiple uses of same datum (latency requirements/destinations)

Fog/Cloud interplay

Rodolfo Milito

Page 18:

Where is it stored?

What makes big data different? Why isn't saving/moving/copying big data as simple as using the tools we already have?

Page 19:

Big Data Store

• Difficult/slow transfers

• Expensive storage/backup

• Difficult to share and publish
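
To get a feel for why transfers are slow at this scale, the following sketch estimates how long moving a dataset over a network link would take. The function name `transfer_days`, the 80% link-efficiency default, and the link speeds are illustrative assumptions, not figures from the talk.

```python
def transfer_days(size_bytes, link_bits_per_second, efficiency=0.8):
    """Days needed to move size_bytes over a link.

    efficiency is the assumed fraction of the nominal line rate
    that is actually usable for payload.
    """
    usable_bps = link_bits_per_second * efficiency
    seconds = size_bytes * 8 / usable_bps
    return seconds / 86_400


PB = 10**15  # decimal petabyte (assumption)

for label, bps in [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)]:
    print(f"1 PB over {label}: {transfer_days(PB, bps):,.1f} days")
```

Under these assumptions, a single petabyte takes more than three months on a 1 Gbit/s link, which is why copying big data with everyday tools stops being practical.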

Page 20:
Page 21:

The process of examining large amounts of data of various types to uncover hidden patterns, unknown correlations, and other valuable information.

Page 22:

Predictive Power of Big Data Analytics in Healthcare

Analysis Of Farm Soil

Improving Oil and Gas Operations

Retailers are Using Big Data Analytics to Outperform Others

…..

Page 23:

Need for immediate response time

– Can't afford the latency of sending data up and back down the chain

Closed-loop control

– Controlling physical systems cannot depend on the speed and availability of resources back at the data center (e.g. a smart traffic light system)

Privacy and data-ownership considerations

– Regulatory and business concerns may not allow moving the data

Improved scale and aggregate throughput via parallelism

– Data sources are often naturally distributed

Avoid sending unnecessary data

– Offload centralized resources that would otherwise have to filter through volumes of uninteresting/useless data.
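
One simple way an edge node might avoid sending unnecessary data is a deadband filter: forward a reading only when it differs enough from the last value that was forwarded. This is a minimal sketch of that idea, not a method described in the talk; the function name, threshold, and sample readings are made up for illustration.

```python
def deadband_filter(readings, threshold):
    """Yield only readings that differ from the last forwarded value
    by more than threshold; everything else stays at the edge."""
    last_sent = None
    for value in readings:
        if last_sent is None or abs(value - last_sent) > threshold:
            last_sent = value
            yield value


# Hypothetical temperature samples from one sensor.
temperatures = [20.0, 20.1, 20.05, 21.5, 21.6, 19.0, 19.1]
forwarded = list(deadband_filter(temperatures, threshold=0.5))
print(forwarded)  # [20.0, 21.5, 19.0]
```

Here seven samples shrink to three, and the data center never has to filter out the near-duplicates.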

Rodolfo Milito

Page 24:

Analysis Sensing

Action

Data Centers (Central or Distributed)

Emerging Footprint for Distributed Intelligence & Processing

Mobility NGN Cloud Security Video

Core

Multi-Service Edge

Edge (Embedded Systems and Sensors)

• “Data at Rest” aggregated collection and storage
• “Data at Rest” ETL and Analytics for Structured & Unstructured Data
• “What if” Analytics
• Predictive Analytics
• Streaming/CEP Analytics
• Applications
• Visualization & Reporting

• Networked Data Collection
• Processing at the Edge
• Streaming ETL (e.g. Filtering, Transformation, Aggregation)
• Streaming/CEP Analytics
• Real-time Alerts and Actions
• Applications Execution
• Localized Visualization & Reporting

• Networked Data Collection
• Processing at the Edge
• Streaming ETL (e.g. Cleansing, Filtering, Transformation, Aggregation)
• “Skinny” Streaming/CEP Analytics, Alerts and Actions
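
The aggregation and alerting steps listed for the edge can be illustrated with fixed (tumbling) time windows: events are grouped per window, averaged, and flagged when the average crosses a limit. This is a sketch under stated assumptions; for brevity it works on a finite list rather than an unbounded stream, and the names, window size, and alert limit are hypothetical.

```python
from statistics import mean


def tumbling_windows(events, window_seconds):
    """Group (timestamp, value) events into fixed, non-overlapping windows."""
    windows = {}
    for timestamp, value in events:
        window_start = (timestamp // window_seconds) * window_seconds
        windows.setdefault(window_start, []).append(value)
    return windows


def summarize(events, window_seconds, alert_above):
    """Return one (window_start, average, alert) record per window."""
    grouped = tumbling_windows(events, window_seconds)
    return [
        (start, mean(values), mean(values) > alert_above)
        for start, values in sorted(grouped.items())
    ]


# Hypothetical (seconds, reading) events from one sensor.
events = [(0, 10.0), (3, 12.0), (11, 30.0), (14, 34.0), (21, 11.0)]
summary = summarize(events, window_seconds=10, alert_above=25.0)
for record in summary:
    print(record)
```

Only the three summary records, and in particular the one alert, would need to travel upstream instead of all five raw events.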

Rodolfo Milito

Page 25:

• It is not just lots of data (structured?)

• It is not just exponential growth of data

• It is new ways of making sense over data that require changes to existing architectures.

• Big Data, the term, in its current use, implies many other things, like:

• Apache Hadoop Framework

• Commodity hardware leveraging Moore’s law

• Infinite scalability

• No data temples

Page 26:

No single standard definition…

“Big Data” is unstructured data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it…

Page 27:

Data Volume

Data volume is increasing exponentially

Page 28:

Various formats, types, and structures

Text, numerical, images, audio, video, sequences, time series, social media data, multi-dim arrays, etc…

Static data vs. streaming data

Page 29:

Data is being generated fast and needs to be processed fast

Online Data Analytics

Page 30:

how we can capture the most important data as it happens and deliver that to the right people in real-time

how we can store the data

how we can analyze and understand it given its size and our computational capacity

other challenges from privacy and security to access

Page 31:

Greater than the challenges are the opportunities

We can

◦ extract insight and knowledge

◦ identify trends

◦ use the data to improve productivity

◦ gain competitive advantage

◦ create substantial value for the world economy

Big data provides an opportunity to find insight in new and emerging types of data.

Argentina can take advantage of these opportunities

Page 32:

Discovery of useful, possibly unexpected, patterns in data

Non-trivial extraction of implicit, previously unknown and potentially useful information from data

Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns

Page 33:

Aggregation and Statistics

◦ Data warehouse and OLAP

Indexing, Searching, and Querying

◦ Keyword-based search

◦ Pattern matching (XML/RDF)

Knowledge discovery

◦ Data Mining

◦ Statistical Modeling
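
Keyword-based search is commonly built on an inverted index, which maps each word to the documents that contain it. The sketch below is a toy version of that structure, assuming whitespace tokenization and AND semantics for multi-word queries; the document texts and function names are invented for illustration.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical mini-corpus.
documents = {
    1: "soil sensors on the farm",
    2: "traffic sensors in the city",
    3: "farm irrigation and soil moisture",
}
index = build_index(documents)
print(sorted(search(index, "soil farm")))  # [1, 3]
```

Lookups touch only the index entries for the query words, so the cost does not grow with the number of unrelated documents.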

Page 34:

Security

Finance

Smarter Healthcare

Telecom

Manufacturing

Traffic Control

Trading Analytics

Fraud and Risk

Precision Agriculture

Search Quality

Retail: Churn, NBO

Multi-channel sales

Page 35:

• HealthCare
- Deep Analytics (pattern recognition)
- Assisted Living, Home Care, Athletics Apps

• Precision Agriculture

• Oil and Gas

• Transportation

• Smart Cities
- Smart Traffic Lights
- Pollution Monitoring
- Infrastructure Health Monitoring

• Connected Vehicle & Rail

• Smart Grid

• Retail Industry

Rodolfo Milito

Page 36:

[Figure: precision agriculture network diagram. Labels: Mobile Endpoints (Tractors, Implements); Fixed Endpoints (Environmental Sensors – Water, Nitrogen), IPv6 enabled, 802.15.4g/e RF Mesh; Endpoint IPv6 Stack, 802.11 Wi-Fi; 802.11 Wi-Fi or Ethernet LAN; Aggregation Point (e.g., Farm House); Small Cell Cellular; Macro Cell Cellular; Satellite; IP WAN Backhaul (Cellular, Broadband, Ethernet, Serial); Internet / Cloud / VPN]

Rodolfo Milito

Page 37:

Opportunities in Precision Farming

Category: Intelligent Irrigation System
Requirements: Sensor network and access; Edge and Core integration (sensor information + weather forecast)
Pluses: Better yields; water savings; sensors can also measure soil conditions
Minuses: Cost of deployment
Comments: Wi-Fi infrastructure helps

Category: Produce Tracking
Requirements: Tagging & tracking system
Pluses: Provenance guarantees

Rodolfo Milito
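
The irrigation case combines edge data (soil sensors) with core data (a weather forecast). A decision rule along these lines could look like the sketch below. Every number in it (target moisture, minutes per percentage point, rain cutoff) is a hypothetical placeholder rather than an agronomic recommendation, and the function name is made up.

```python
def irrigation_minutes(soil_moisture_pct, rain_forecast_mm,
                       target_moisture_pct=35.0, minutes_per_pct=2.0,
                       rain_skip_mm=5.0):
    """Suggest a watering time from a soil reading (edge) and a
    rain forecast (core). All default thresholds are illustrative."""
    if rain_forecast_mm >= rain_skip_mm:
        return 0.0  # enough rain expected, skip this cycle
    deficit = max(0.0, target_moisture_pct - soil_moisture_pct)
    return deficit * minutes_per_pct


print(irrigation_minutes(25.0, 0.0))  # dry soil, no rain expected
print(irrigation_minutes(25.0, 8.0))  # dry soil, rain expected
print(irrigation_minutes(40.0, 0.0))  # soil already above target
```

The water saving comes from the second and third cases, where no irrigation is scheduled at all.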

Page 38:

Rodolfo Milito

Page 39:

[Figure: IoT for Smart & Connected Communities; Smart Traffic Light System. Layers: Services, Operation, Infrastructure, End Points. Labels: Smart Water; Structural Health; Intelligent Transportation; Environmental Monitoring; Safety & Security; Public Lighting; Public Cloud (Subscription Based Services); Private Cloud (Security, ITS, Lighting, Water); Ethernet; WiFi, 802.11p, Wave2M, Low Power RF, PLC, 802.15.4, etc.; NMS; S+CC Service Delivery Platform]

Rodolfo Milito

Page 40:

Layers: Operation, Infrastructure, End Points, Services

Public Cloud

Subscription-based Services

Private (OEM) Cloud

Data Center/Virtual Servers

Enterprise Cloud

Enterprise Video, Voice, Data

V2V Communication (802.11p)

Electrical Charging Network Charging Stations,

Other Services (802.11p ?)

Mobile WiFi Offload Wi-Fi Hotspots, 802.11u, 3G/4G

Consumer Network Home/Dealership Wi-Fi Hotspots, Femtocells

Mobile SP 1 Mobile SP 1 Communications Service Providers, “Fog”

VNO Policy Enforcement, Flow-based Management, DPI Software

DSRC Roadside Infrastructure 802.11p (V2I)

Mobile SP 1 Mobile SP 1

Energy Service Providers

(Smart Grid)

V2I/Upstream Communication (Wi-Fi, 3G/4G, 802.11p, etc.)

Rodolfo Milito

Page 41:

Travelers Centers

Vehicles Field


Rodolfo Milito

Page 42:

Roadside multi-purpose equipment based on convergence of routing, computing and wireless technologies

Distributed, multi-tenancy computing model

Supporting multiple wireless technologies

Located with other traffic control equipment

Purpose - Managed Service

◦ Regulate traffic (Traffic Router – cars, IP packets, same)

◦ Collect tolls and taxes (per-transaction fee collection)

◦ E-Commerce support

◦ Content delivery

◦ Traffic sensor management (e.g., Sensys)

Rodolfo Milito

Page 43:

Big Data Integration is Multidisciplinary

Less than 10% of the data world is genuinely relational

Meaningful data integration in the real, messy, schema-less and complex Big Data world of databases and the semantic web, using multidisciplinary and multi-technology methods

The Billion Triple Challenge

The Web of Data contains 31 billion RDF triples, of which 446 million are RDF links, 13 billion are government data, 6 billion geographic data, 4.6 billion publication and media data, and 3 billion life science data

BTC 2011, Sindice 2011

Demonstrate the Value of Semantics: let data integration drive DBMS technology

Large volumes of heterogeneous data, like linked data and RDF
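
An RDF triple is a (subject, predicate, object) statement, and an RDF link is a triple that connects resources in different datasets. The sketch below models triples as plain Python tuples and matches them against a pattern with wildcards. It is a toy illustration, not a real triple store, and the `ex:` and `other:` resource names are invented.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Hypothetical data: two statements inside one dataset and one
# owl:sameAs link pointing at a resource in another dataset.
triples = [
    ("ex:farm1", "ex:locatedIn", "ex:regionA"),
    ("ex:sensor7", "ex:installedAt", "ex:farm1"),
    ("ex:farm1", "owl:sameAs", "other:farm-001"),
]

about_farm = match(triples, subject="ex:farm1")
links = match(triples, predicate="owl:sameAs")
print(len(about_farm), links)
```

Because every statement has the same three-part shape, data from unrelated sources can be merged by simply concatenating triples, which is what makes this model attractive for schema-less integration.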

Page 44:

Page 45:

Jobs

- The U.S. could face a shortage by 2018 of 140,000 to 190,000 people with "deep analytical talent" and of 1.5 million people capable of analyzing data in ways that enable business decisions. (McKinsey & Co)

- The Big Data industry is worth more than $100 billion and is growing at almost 10% a year (roughly twice as fast as the software business)
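
To see what "almost 10% a year" implies, the compound-growth arithmetic can be worked through as below. This is plain extrapolation under the assumption that the rate stays constant, not a forecast; the function names are illustrative.

```python
import math


def projected_value(current, annual_growth, years):
    """Value after compounding a constant annual growth rate."""
    return current * (1 + annual_growth) ** years


def doubling_time(annual_growth):
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


# $100 billion growing at 10% a year (both figures rounded from the slide).
print(f"After 5 years: ${projected_value(100.0, 0.10, 5):.1f} billion")
print(f"Doubling time: {doubling_time(0.10):.1f} years")
```

At a constant 10% a year, the industry would double in a little over seven years.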

Page 46:

From the 2008 paper “Big-Data Computing: Creating revolutionary breakthroughs in commerce, science, and society”:

◦ Just as search engines have transformed how we access information, other forms of big-data computing can and will transform the activities of companies, scientific researchers, medical practitioners, and our nation's defense and intelligence operations.

In 2012, the Obama administration announced the Big Data Research and Development Initiative

Page 47:

Let's catch the wave, Argentina! You can be a leader in Big Data

What we must do to get on board the train…

Page 48:
Page 49:

- Government

In 2012, the Obama administration announced the Big Data Research and Development Initiative

84 different big data programs spread across six departments

- Private Sector

- Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data

- Facebook handles 40 billion photos from its user base.

- Falcon Credit Card Fraud Detection System protects 2.1 billion active accounts worldwide

- Science

- Large Synoptic Survey Telescope will generate 140 terabytes of data every 5 days.

- Large Hadron Collider: 13 petabytes of data produced in 2010

- Medical computation like decoding human Genome

- Social science revolution

- New way of science (Microscope example)

Page 50:

◦ Crawler: ingestion processes of the data

Custom processing

Highly specialized

◦ Users accessing and using data: transaction processing (storage processing, GFS), capture through interaction

Spanner

◦ Processing and analysis

MapReduce, Hadoop (batch mode)

Machine learning

Smart querying

Required many engineering teams to solve this …
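
The batch-mode MapReduce model mentioned above has three steps: map each input record to key/value pairs, group the pairs by key, and reduce each group. The sketch below runs those steps in a single Python process on a word-count task. It only mimics the programming model and does not use the Hadoop API; the function names and input lines are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.lower().split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all the counts emitted for one word."""
    return key, sum(values)


lines = ["big data big value", "data everywhere"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

In a real cluster the map and reduce calls would run on many machines in parallel, since each line and each key can be processed independently.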

Page 51:

Data from everywhere ◦ You should not care where from

Medical – health: genome, genetic map and tracking

Consumer related: Kmart, Target, Walmart

Auto industry: car status

Page 52:

Internet plays a key role

Enterprise, health, retail, government, financial

New DB, new storage

What is new

◦ 3V: volume, velocity, variety

◦ 4S: source, size, speed, structure

◦ Typical data operations were Create, Read, Update, Delete (CRUD); now they are Create, Replicate, Append (no delete, just append), Process
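
The shift from update/delete to append can be illustrated with a small append-only store: new versions are added to the end of a log, nothing is overwritten, and the current state is derived by processing the log. This is a minimal in-memory sketch; the class and method names are hypothetical, and `replicate` merely copies the log to stand in for a remote replica.

```python
class AppendOnlyStore:
    """Keeps every version of a record; the log is only ever appended to."""

    def __init__(self):
        self._log = []  # (key, value) entries in arrival order

    def append(self, key, value):
        """Create: add a new entry; older entries are never modified."""
        self._log.append((key, value))

    def latest(self, key):
        """Process: scan the log backwards for the newest value of a key."""
        for entry_key, value in reversed(self._log):
            if entry_key == key:
                return value
        return None

    def history(self, key):
        """Process: every value ever recorded for a key, oldest first."""
        return [value for entry_key, value in self._log if entry_key == key]

    def replicate(self):
        """Replicate: return an independent copy of the log."""
        replica = AppendOnlyStore()
        replica._log = list(self._log)
        return replica


store = AppendOnlyStore()
store.append("car42", "ok")
store.append("car42", "low oil")
replica = store.replicate()
store.append("car42", "serviced")
print(store.latest("car42"), replica.latest("car42"), store.history("car42"))
```

Because old entries are never changed, replicas can be brought up to date by shipping only the new tail of the log.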

Page 53:

Retailing, financial, healthcare, data from video, IoT

Hadoop is the leader in 2 key elements
◦ Distributed file system
◦ MapReduce

Big Data on the Cloud

Opportunities
◦ Farmers: weather, crop failures
◦ Pandemics
◦ Health care: 150B in savings

IoT: Cisco predicts 4.8 billion terabytes of traffic in 2015