MIS 3500 Instructor: Bob Travica Newer DB Topics 2015.

MIS 3500 Instructor: Bob Travica

Newer DB Topics 2015

2

Big Data

3 big V:

Volume: terabytes (12 zeroes), petabytes (15 zeroes)

Variety: Social media, communications, sensors everywhere*, Internet of Things, video feeds, GPS… Implication: various formats

Velocity: wired and wireless continuous feeds

3

Goals and Uses

Goals:

Integrate data on the same object across sources (Customer, Citizen etc.; spatial mashups)

Analysis: Existing patterns, Predictive analysis

Application domains:

Monitoring for business & other purposes (sensors)

Marketing (relationship marketing, sentiment analysis in social media…)

Energy grid management

Transportation networks management

Health (analysis of cancer cell behavior and of patient vital signs)

Science (human genome)

Policy analysis (United Nations’ system for predicting social problems)

4

Big Data Tasks

5

Machine-generated data (sensors); automatic creation and transfer *

Home appliances (security, energy consumption, heating, food, entertainment)

Monitoring/Control (cars, athletic equipment, machinery, appliances)*

Example: Smart power grid**

Smart meter; Internet & Wi-Fi connectivity

6

Technologies

Hadoop (framework for file system and processing of large datasets on server clusters)*

Machine learning – automated construction of models to fit data (instead of hypothesis testing as with DW and Analytics)

Open source

Notable developers: Yahoo!, Facebook, Google, Microsoft

Microsoft Azure-based

Hadoop
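The machine learning bullet on this slide says a model is constructed automatically to fit the data, rather than stated up front and tested. A minimal sketch of that idea in Python, assuming a straight-line model fitted by ordinary least squares; the function name fit_line and the smart-meter readings are made up for illustration and do not come from the slides:

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept to the points by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together around their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours elapsed vs. cumulative kWh from a smart meter.
hours = [1, 2, 3, 4, 5]
kwh = [2.1, 4.0, 6.2, 7.9, 10.1]

slope, intercept = fit_line(hours, kwh)
print(f"kWh is roughly {slope:.2f} * hours + {intercept:.2f}")
```

Nobody proposes the slope in advance; it is derived from the readings, which is the contrast the slide draws with hypothesis testing.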

7

[Figure labels: DATA, PROCESSING]
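The processing side of Hadoop is commonly associated with the MapReduce model. The slide does not name it, so treating it as the model here is an assumption. The sketch below imitates the map, shuffle and reduce steps in a single Python process; it does not use any Hadoop API, and the function names are hypothetical:

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(records):
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


counts = word_count(["big data big clusters", "data on clusters"])
print(counts)
```

On a real cluster the map and reduce steps would run in parallel on many servers, close to where the file blocks are stored; here everything runs in one process to keep the data flow visible.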

8

A database for Big Data

Distributed, non-relational, scalable

Based on Google’s BigTable *

Row Key (reversed URL)

Time Stamp

Column Key – “Anchor” (Family) + URLpart (Qualifier)

"com.cnn.www" t9 anchor:cnnsi.com = "CNN"

"com.cnn.www" t8 anchor:my.look.ca = "CNN.com"

Row Key

Time Stamp

Column Key – “Contents” + keyword in tagged content

"com.cnn.www" t6 contents:html = "<html>… "

"com.cnn.www" t5 contents:html = "<html>… "

"com.cnn.www" t3 contents:html = "<html>… "

Column qualifiers are the referencing sites. DATA are the link texts (“CNN*”) those sites use to cite the page.

DATA are webpages, compressed. There can be any number of unbounded Contents columns.

All columns put together make a “BigTable”.
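The addressing scheme on this slide (row key, column key as family:qualifier, time stamp) can be sketched as a small in-memory map in Python. This is an illustration under stated assumptions, not how BigTable or any real product is implemented: the class TinyBigTable and the helper row_key are hypothetical names, and the time stamps t3 to t9 are represented as plain integers.

```python
from collections import defaultdict


def row_key(host):
    """Reverse a host name so pages of one domain sort together."""
    return ".".join(reversed(host.split(".")))


class TinyBigTable:
    """Toy store mapping (row key, column key, time stamp) to a value."""

    def __init__(self):
        # (row_key, "family:qualifier") -> [(timestamp, value), ...], newest first
        self._cells = defaultdict(list)

    def put(self, row, column, timestamp, value):
        versions = self._cells[(row, column)]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row, column, timestamp=None):
        """Return the newest value, or the newest one at or before timestamp."""
        for ts, value in self._cells.get((row, column), []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def columns(self, row, family):
        """List the column keys of one family present in a row."""
        prefix = family + ":"
        return sorted(c for (r, c) in self._cells
                      if r == row and c.startswith(prefix))


table = TinyBigTable()
row = row_key("www.cnn.com")  # gives "com.cnn.www"
table.put(row, "anchor:cnnsi.com", 9, "CNN")
table.put(row, "anchor:my.look.ca", 8, "CNN.com")
for ts in (3, 5, 6):
    # Placeholder page bodies; the slide elides the real HTML.
    table.put(row, "contents:html", ts, f"<html>... (version t{ts})")

print(table.get(row, "anchor:cnnsi.com"))
print(table.get(row, "contents:html", timestamp=5))
print(table.columns(row, "anchor"))
```

The anchor family grows by one column per referencing site, so a row can hold any number of columns without a schema change, which is the non-relational property the slide points to.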

9

NoSQL – Not Only SQL

10

Modern Database environments

11

Modern Database environments