E6885 Network Science Lecture 11: Graphical … Network Science Lecture 11: Graphical Models and...



© 2011 Columbia University

E6885 Network Science Lecture 11:

Graphical Models and Analysis

E 6885 Topics in Signal Processing -- Network Science

Ching-Yung Lin, Dept. of Electrical Engineering, Columbia University

November 28th, 2011

Course Structure

Class  Date      Topics Covered
1      09/12/11  Overview – Social, Information, and Cognitive Network Analysis
2      09/19/11  Network Representations and Characteristics
3      09/26/11  Network Partitioning, Clustering, and Use Case
4      10/03/11  Network Visualization, Sampling and Estimation
5      10/10/11  Network Modeling
6      10/17/11  Network Topology Inference
7      10/24/11  Dynamic Networks -- I
8      10/31/11  Dynamic Networks -- II
9      11/14/11  Final Project Proposal Presentation
10     11/21/11  Information Diffusion in Networks
11     11/28/11  Graphical Models and Analysis
12     12/05/11  Cognitive Networks and Economy Issues in Networks
13     12/12/11  Large-Scale Network Processing System
14     12/19/11  Final Project Presentation


Graphical Routing for Large-Scale Video Streams

Create breakthrough technology to enable large-scale production, analysis and management of information and knowledge from thousands of disparate input sources, using an autonomic system that continuously adapts processing to satisfy the demands of a changing environment, content, and requests.

What is Large-scale?

• 10 Gbit/s continuous feed coming into the system
• Types of data
  – Speech, text, moving images, still images, coded application data, machine-to-machine binary communication
• System mechanisms
  – Telephony: 9.6 Gbit/s (including VoIP)
  – Internet
    ◦ Email: 250 Mbit/s (about 500 pieces per second)
    ◦ Dynamic web pages: 50 Mbit/s
    ◦ Instant messaging: 200 Kbit/s
    ◦ Static web pages: 100 Kbit/s
    ◦ Transactional data: TBD
  – TV: 40 Mbit/s (equivalent to about 10 stations)
  – Radio: 2 Mbit/s (equivalent to about 20 stations)
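As a quick sanity check on the feed rates listed for this slide, the sketch below adds up the per-source rates and estimates the average email size implied by "250 Mbit/s, about 500 pieces per second". The dictionary keys and function names are illustrative, decimal prefixes are assumed (1 Gbit = 1000 Mbit), and the transactional-data feed is left out because the slide gives no figure for it.

```python
# Feed rates from the slide, in Mbit/s (decimal prefixes assumed).
# "Transactional data: TBD" has no figure on the slide, so it is omitted.
FEEDS_MBIT_PER_S = {
    "telephony": 9600.0,
    "email": 250.0,
    "dynamic_web": 50.0,
    "instant_messaging": 0.2,
    "static_web": 0.1,
    "tv": 40.0,
    "radio": 2.0,
}


def total_feed_mbit_per_s(feeds):
    """Sum of all listed feed rates, in Mbit/s."""
    return sum(feeds.values())


def average_email_size_bytes(rate_mbit_per_s=250.0, emails_per_s=500.0):
    """Average size of one email implied by the rate and the message count."""
    bits_per_email = rate_mbit_per_s * 1_000_000 / emails_per_s
    return bits_per_email / 8


if __name__ == "__main__":
    print(f"total: {total_feed_mbit_per_s(FEEDS_MBIT_PER_S):.1f} Mbit/s")
    print(f"average email: {average_email_size_bytes():.0f} bytes")
```

Under these assumptions the listed sources add up to roughly 9.94 Gbit/s, in line with the 10 Gbit/s headline figure, and an average email comes to about 62.5 KB.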

Example IP Packet Stream Instantiation

[Figure: dataflow graph. Node labels: ip, ntp, udp, tcp, http, ftp, rtp, rtsp, video, audio. Other labels: Inputs; Dataflow Graph; per PE rates: 200-500 MB/s, ~100 MB/s, 10 MB/s]

Semantic MM Routing and Filtering

[Figure: dataflow graph. Node labels: ip, ntp, udp, tcp, http, ftp, rtp, rtsp, sess video, sess audio, keywords, id. Stage labels: Packet content analysis; Interest Routing; Advanced content analysis; Interest Filtering; Interested MM streams. Other labels: Inputs; Dataflow Graph; per PE rates: 200-500 MB/s, ~100 MB/s, 10 MB/s]

Dense-Information-Grinding

• Fast and just-in-time multimedia (video, speaker, and closed caption) stream classification in a large-scale environment.
• Explores the tradeoffs between resource requirements (e.g., algorithmic complexity, streaming rate) and classification accuracy.

Filtering Multimedia from Networks

• Fast and just-in-time multimedia (video, speaker, and closed caption) stream classification in a large-scale environment.
• Explores the tradeoffs between resource requirements (e.g., algorithmic complexity, streaming rate) and classification accuracy.

Traditional Workflow Approach – Database Mining

• Decoding
  – Input data to raw data
• Front-End
  – Feature extraction from raw data
• Algorithm (e.g., speech recognition)
  – Uses extracted features

[Figure: pipeline. Input data → Decoder → Front End → Algorithm → Annotated data]

Objective

• Input data X, queries q, resource R
  – Y(X | q): relevant information
  – Y’(X | q, R) ⊆ Y(X | q): achievable subset given R
• Configure the parameters of the processing elements to maximize relevant information:
  – Y’’(X | q, R) > Y’(X | q, R), under the resource constraint.
• This requires resource-efficient algorithms for:
  – Classification, routing and filtering of signal-oriented data (audio, video and, possibly, sensor data)

[Figure: block diagram with labels X, X’, R, Y(X|q), Y’’(X|q,R)]

An Example of Audio/Video Filtering

[Figure: filtering pipeline. Stage labels: MPEG-1 video encoding; Shot segmentation, key frame extraction; Sequence filtering; Region Segmentation; Region filtering; Feature Extraction; Concept Classification. Filter labels: Filtering based on texture/motion properties; Filtering based on foreground, background properties; Filtering based on certain concept-level features; Filtering based on complex concepts. Other label: Not read]

• Reductions can be done through both syntax-level transcoding/filtering and semantic-level filtering.
• New research focuses on semantic concept filtering.

Audio Classification/Routing

• Media data:
  – Speech
• Media format:
  – GSM FR 06.10
• Sample query (i.e., application):
  – GSM audio - male

[Figure: test-bed setup. Component labels: Wav files; GSM Codec / RTP Streamer; GSM Audio over RTP/UDP; Intranet; GSM DFE; SVM/GSM Model; SVM-based Classification; GSM Networked Player (two instances). Output labels: Male, IP Port 1234; Female, IP Port 2346; Silence. Hosts: Office Desktop: 9.2.10.85; Laptop T30: 9.2.89.55]

Video Concept Classification/Routing

• Objective:
  – Build a moderate number of real-time concept classifiers, e.g., Human, Outdoors, Face, Indoors, etc.
• Test-bed corpus:
  – NIST TRECVID 2003 corpus: 128 hours of MPEG-1 videos, of which 62 hours were manually annotated by 23 groups worldwide using the IBM VideoAnnEx Collaborative Annotation System.
  – 38 hours of video for training; 24 hours (partitioned into 6-, 6-, and 12-hour sets) for internal evaluation; 65 hours for NIST benchmarking.

[Figure: test-bed setup. Component labels: MPEG-1 files; RTP Streamer; MPEG-1 Video over RTP/UDP; Intranet; Compressed-Domain Feature Extraction; Concept Model; Classification; VideoLAN Networked Player (two instances). Output labels: Outdoors, IP Port 1234; Indoors, IP Port 2346; Outer Space. Hosts: Office Desktop: 9.2.10.85; Laptop T40: 9.2.89.55; Office Desktop: 9.2.64.86]

Prior Art: Supervised Learning for Building Generic Concept Detectors

[Figure: training pipeline. Labels: Training Video Corpus; Shot Segmentation; Region Segmentation; Semi-Manual Annotation; Lexicon; Feature Extraction; Building Classifier Detectors]

Regions
• Object (motion, camera registration)
• Background (5 lg regions / shot)

Features
• Color:
  – Color histograms (72 dim, 512 dim), Auto-Correlograms (72 dim)
• Structure & Shape:
  – Edge orientation histogram (32 dim), Dudani Moment Invariants (6 dim), Aspect ratio of bounding box (1 dim)
• Texture:
  – Co-occurrence texture (48 dim), Coarseness (1 dim), Contrast (1 dim), Directionality (1 dim), Wavelet (12 dim)
• Motion:
  – Motion vector histogram (6 dim)

Supervised learning: Classification and Fusion
• Support Vector Machines (SVM)
• Ensemble Fusion
• Other Fusions (Hierarchical, SVM, Multinet, etc.)

IBM Research Video Concept Detection Systems (Aug. 2003)

[Figure: system diagram, by the IBM TRECVID 2003 Team. Annotation and Data Preparation: Videos (128 hours MPEG-1); Collaborative Annotation; Region Extraction. Feature Extraction: CH, WT, EH, MI, CT, MV, TAM, CLG, AUD, CC, VW; SD/A; Filtering. Uni-models, Multi-Modality Fusion and Multi-Model Fusion: EF, EF2, MLP, DMF17, DMF64, MN, ONT, V1, V2. Post-processing: BOU, BOF, BOBO, yielding the Best Uni-model Ensemble Run, the Best Multi-modal Ensemble Run and the Best-of-the-Best Ensemble Run. Legend: training only; training and testing]

Prior Art: Classification Steps

[Figure: processing chain. MPEG-1 sequence; Shot Segmentation; Region Segmentation; Feature Extraction; Classification; Detectors; MPEG-7 XML; XML Rendering. Example detectors: "Text-overlay", "Outdoors", "People". Applications attached to the stages:
• Key-Frame Extraction, Story Boards, Video Editing
• Content-Based Retrieval, Relevance Feedback
• Model-based applications
• Object-based compression, manipulation]

Novel Visual Semantic Concept Filters

• 100 visual filters were built (vs. 64 in 2003).
• Concepts are arranged hierarchically.
• Based on novel compressed-domain sliced features.
• Neither segmentation nor fusion is used.

VideoDIG Demo – configurable and scalable video semantic concept detection/filtering

[Figure: system diagram.
Sources: TV broadcast, VCR, DVD discs, Video File Database, Webcam.
(Client) Feature Extraction PEs & Display Modules: Encoding; MPEG-1/2; GOP Extraction; Shot Segmentation; Feature Extraction; CDS Features; Display Controller (On/Off).
(Server) Concept Detection PEs: PE1: 9.2.63.66:1220; PE2: 9.2.63.67; PE3: 9.2.63.68; PE4: 9.2.63.66:1235; PE5: 9.2.63.66:1240; PE6: 9.2.63.66; PE7: 9.2.63.66; PE100: 9.2.63.66. Concept labels: Face, Outdoors, Indoors, Female, Male, Airplane, Chair, Clock.
Control Modules: Resource Constraints; User Interests.
Link-rate labels: 60 Mbps; 1.5 Mbps; 320 Kbps; 22.4 Kbps; 2.8 Kbps; Meta-data 600 bps]

Main Process

• This feature set is extracted as follows:
  1. Parse the MPEG-1/2 packets and find the beginning of an I-frame, or the I-frame closest to a pre-specified shot keyframe.
  2. Use the VLC maps to map variable-length codes to the DCT-domain coefficients.
  3. Within an MPEG slice or a union of slices, truncate selected DCT coefficients and calculate the histogram of these DCT coefficients.
  4. Form a feature vector for the frame from the histogram coefficients of multiple slices.
• In the above procedure, no multiplication operation is required to obtain the feature vectors. Only addition is needed to build the histograms.
• In a typical situation, we partition a frame into three slices and use histograms of 3 DCT coefficients (1 DC and the 2 lowest-frequency AC coefficients) on all three YCbCr channels. This forms a 576-dimensional feature vector.
• After feature extraction, an SVM is used for model training and classification.
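Steps 3 and 4 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the selected quantized DCT coefficients of each 8x8 block have already been decoded (steps 1 and 2), and the function name `slice_histogram_features`, the clamp range `clip`, and the bin width set by `shift` are all hypothetical choices. It handles a single channel and does not attempt to reproduce the 576-dimensional layout quoted on the slide.

```python
def slice_histogram_features(block_rows, num_slices=3, clip=64, shift=4):
    """Build per-slice histograms of selected DCT coefficients (one channel).

    block_rows: list of rows of blocks; each block is a tuple of selected
                integer DCT coefficients, e.g. (DC, AC01, AC10).
    clip/shift: hypothetical truncation range and bin width (2**shift).
    """
    num_bins = (2 * clip) >> shift
    num_coeffs = len(block_rows[0][0])
    rows_per_slice = max(1, len(block_rows) // num_slices)
    features = []
    for s in range(num_slices):
        start = s * rows_per_slice
        # The last slice absorbs any leftover rows.
        end = len(block_rows) if s == num_slices - 1 else start + rows_per_slice
        for c in range(num_coeffs):
            hist = [0] * num_bins
            for row in block_rows[start:end]:
                for block in row:
                    # Truncate the coefficient, then bin it with an add and a
                    # shift; no per-coefficient multiplication is involved.
                    v = min(max(block[c], -clip), clip - 1)
                    hist[(v + clip) >> shift] += 1
            features.extend(hist)
    return features


if __name__ == "__main__":
    rows = [
        [(0, 0, 0), (63, -64, 16)],
        [(100, -100, 5), (0, 0, 0)],
        [(-1, 1, 2), (15, 16, 17)],
    ]
    print(len(slice_histogram_features(rows)))
```

To cover all YCbCr channels, the same function would be applied to each channel and the three outputs concatenated.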

Three Measurements of Multimedia Semantic Concept Detectors

• Concept Model Accuracy
• Concept Model Efficiency
• Concept Coverage

Performance Metric -- Precision-Recall Curve and Average Precision

• Example:
  – Finding relevant shots that include overlay text

[Figure: "Precision vs. Recall − Text Overlay"; x-axis: Recall (0 to 1); y-axis: Precision (0 to 1); legend as extracted: IBMR2OM1SYS1L2RAD1; annotation: "Example of critical points"; inset showing the Retrieved Set and the Real Set]

• Average Precision:

  $AP = \frac{1}{N_{GT}} \sum_{N_i} P@N_i$

  where the N_i are the ranks of the relevant retrieved shots, P@N_i is the precision at rank N_i, and N_{GT} is the number of relevant shots in the ground truth.

• Average Precision approximates the area under the precision-recall (P-R) curve.
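The Average Precision definition on this slide (the mean of the precision values taken at the rank of each relevant retrieved shot, normalized by the number of ground-truth relevant shots) can be computed as in the short sketch below. The function name and the boolean-list input format are illustrative assumptions.

```python
def average_precision(ranked_relevance, num_ground_truth):
    """AP = (1 / N_GT) * sum of precision@N_i over ranks N_i of relevant shots.

    ranked_relevance: list of booleans, one per retrieved shot in rank order,
                      True if the shot at that rank is relevant.
    num_ground_truth: total number of relevant shots in the ground truth.
    """
    if num_ground_truth <= 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / num_ground_truth


if __name__ == "__main__":
    # Relevant shots at ranks 1 and 3, with 2 relevant shots in total.
    print(average_precision([True, False, True, False], 2))
```

Relevant shots that are never retrieved contribute zero to the sum, which is why the normalization uses the ground-truth count rather than the number of hits.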

Reference – Comparison of Precision at Top 100 and Average Precisions (NIST TRECVID 2003 Official Result)

[Figure: two bar charts comparing "Best IBM" and "Best Non-IBM" per concept. Top chart: Precision @ Top 100 (0 to 100). Bottom chart: Official NIST Average Precision Values (0 to 1). Concepts: Outdoors, NS_Face, People, Building, Road, Vegetation, Animal, Female_Speech, Car/Truck/Bus, Aircraft, NS_Monologue, Non-Studio_Setting, Sports_Event, Weather, Zoom_In, Physical_Violence, Madeleine_Albright, Mean]

Progress on Novel Concept Filters for Distillery MM Semantic Routing

• Baseline: (state-of-the-art) IBM TREC ’03 Visual Detectors

[Figure: three axes labeled Concept Model Accuracy, Concept Model Efficiency, and Concept Coverage, annotated with: 56% increase; 21% increase; more than 10 times increase on classification]

Performance Comparison of the Compressed-Domain Detectors and IBM 2003 Visual Detectors

[Figure: bar chart of Average Precision (0 to 0.9) per concept for two series, "IBM 2003 Visual Models" and "New CDS Models". Concepts: Human, Face, Outdoors, Studio_Setting, Indoors, Male_Face, Weather_News, Sport_Event, Sky, Female_Face, Nature_Vegetation, Crowd, Mountain, People_Event, Water_Body, People, Vehicle (Transportation), Person, Rock, Building, Airplane, Graphics, Snow, Tree, Land, Bill_Clinton, Beach, Car, Road, Riot, Flower, Desert, Cloud, Cityscape, Animal, Truck, Cartoon, Podium, Smoke, Physical_Violence, Train, Fire]

• Improved Mean Average Precision: 21.48%
• Improved efficiency:
  – More than 2M multiplication operations per key frame are needed to generate features for IBM-03 classification.
  – No multiplication operations are needed for the new CDS models, only 6K addition operations.


Demo -- Novel Semantic Concept Filters

Complexity Analysis on SVM classifiers

• Support Vector Machine
  – Largest-margin hyperplane in the projected feature space
  – With good kernel choices, all operations can be done in the low-dimensional input feature space
• Complexity c: operations (multiplications, additions) required for classification
• SVM classifier:

  $f(x) = \sum_{i=1}^{S} a_i \cdot k(x, x_i) + b$

  where S is the number of support vectors and k(.,.) is a kernel function, e.g.,

  $k(x, x_i) = e^{-r \lVert x - x_i \rVert^2}$

  $c \propto S \cdot D$, where D is the dimensionality of the feature vector.

[Figure: separating hyperplane with the support vectors marked]
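To make the cost claim concrete, here is a minimal sketch of the kernel SVM decision function and the S·D operation count from this slide. It assumes a Gaussian RBF kernel with a squared Euclidean distance and width parameter `r`; the function names and the toy numbers are illustrative and not taken from the lecture.

```python
import math


def rbf_kernel(x, xi, r):
    """Assumed Gaussian RBF kernel: exp(-r * ||x - xi||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-r * squared_distance)


def svm_decision(x, support_vectors, coeffs, b, r):
    """f(x) = sum_i a_i * k(x, x_i) + b over the S support vectors."""
    return sum(a * rbf_kernel(x, sv, r)
               for a, sv in zip(coeffs, support_vectors)) + b


def classification_cost(num_support_vectors, dim):
    """Operation count proportional to S * D (one pass over every
    dimension of every support vector per classified sample)."""
    return num_support_vectors * dim


if __name__ == "__main__":
    svs = [[0.0, 0.0], [1.0, 0.0]]
    print(svm_decision([0.0, 0.0], svs, [1.0, -1.0], b=0.5, r=1.0))
    print(classification_cost(num_support_vectors=2, dim=576))
```

Because each kernel evaluation touches all D dimensions and there are S of them per sample, reducing either the feature dimension or the number of support vectors lowers the cost proportionally.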

Complexity-Accuracy Curves for Adaptive PE Operations – Feature Dimension Reduction

[Figure: Accuracy -- Average Precision (0 to 0.9) plotted against Complexity -- Feature Dimension Ratio (0 to 1)]

AP      Feature Dimension Ratio  Texture  Color  Slice
0.0123  0.037037037              1        1      1
0.07    0.074074074              2        1      1
0.0173  0.074074074              1        2      1
0.0318  0.074074074              1        1      2
0.07    0.111111111              3        1      1
0.0314  0.111111111              1        3      1
0.3219  0.111111111              1        1      3
0.1065  0.148148148              2        2      1
0.3457  0.148148148              2        1      2
0.0699  0.148148148              1        2      2
0.1065  0.222222222              3        2      1
0.1684  0.222222222              2        3      1
0.3457  0.222222222              3        1      2
0.1241  0.222222222              1        3      2
0.6581  0.222222222              2        1      3
0.427   0.222222222              1        2      3
0.5235  0.296296296              2        2      2
0.1684  0.333333333              3        3      1
0.6581  0.333333333              3        1      3
0.4685  0.333333333              1        3      3
0.5235  0.444444444              3        2      2
0.5822  0.444444444              2        3      2
0.7757  0.444444444              2        2      3
0.5822  0.666666667              3        3      2
0.7757  0.666666667              3        2      3
0.7861  0.666666667              2        3      3
0.7861  1                        3        3      3

• Experimental results for the Weather_News detector
• Model selection based on the model validation set
• E.g., for Feature Dimension Ratio 0.22 (the best selection of features is 3 slices, 1 color, 2 texture selections), the accuracy is decreased by 17%.
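The feature dimension ratio in this table equals the product of the texture, color, and slice selections divided by 27 (the full 3 x 3 x 3 configuration), and model selection amounts to picking the most accurate configuration that fits a complexity budget. The sketch below illustrates that selection on a subset of the Weather_News rows; the function names and the tie-breaking rule (prefer the cheaper configuration) are assumptions.

```python
# (texture, color, slice, average_precision): a subset of the table rows.
WEATHER_NEWS_RESULTS = [
    (1, 1, 1, 0.0123),
    (1, 1, 3, 0.3219),
    (2, 1, 2, 0.3457),
    (2, 1, 3, 0.6581),
    (1, 2, 3, 0.4270),
    (2, 2, 2, 0.5235),
    (2, 2, 3, 0.7757),
    (2, 3, 3, 0.7861),
    (3, 3, 3, 0.7861),
]


def feature_dimension_ratio(texture, color, slices):
    """Fraction of the full 3 x 3 x 3 feature set that is used."""
    return texture * color * slices / 27


def best_under_budget(results, max_ratio):
    """Most accurate configuration whose ratio fits the budget.

    Ties in accuracy are broken in favor of the cheaper configuration.
    Returns None when nothing fits.
    """
    feasible = [r for r in results
                if feature_dimension_ratio(r[0], r[1], r[2]) <= max_ratio]
    if not feasible:
        return None
    return max(feasible,
               key=lambda r: (r[3], -feature_dimension_ratio(r[0], r[1], r[2])))


if __name__ == "__main__":
    print(best_under_budget(WEATHER_NEWS_RESULTS, 0.25))
```

With a budget of 0.25 this picks the configuration with 2 texture selections, 1 color selection, and 3 slices, matching the example quoted on the slide.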

Complexity-Accuracy Curves for Adaptive PE Operations – Reduction on the Number of Support Vectors

[Figure: Accuracy -- Average Precision (0 to 0.9) plotted against Complexity -- Number of Support Vectors (as a ratio, 0 to 1)]

• Proposed novel reduction methods:
  – Ranked Weighting
  – P/N Cost Reduction
  – Random Selection
  – Support Vector Clustering and Centralization
• Experimental results on the Weather_News detectors show that complexity can be reduced to 50% at the cost of a 14% decrease in accuracy.
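The slide names the reduction methods without describing them. As one plausible reading of "Ranked Weighting", and only as an assumption for illustration, the sketch below ranks support vectors by the magnitude of their coefficients a_i and keeps the top fraction. The function name and the ranking criterion are hypothetical and may differ from the method actually used.

```python
import math


def reduce_by_ranked_weight(support_vectors, coeffs, keep_fraction):
    """Keep the support vectors with the largest |a_i| (assumed criterion).

    Returns the reduced (support_vectors, coeffs) in their original order.
    At least one support vector is always kept.
    """
    keep = max(1, math.ceil(len(coeffs) * keep_fraction))
    ranked = sorted(range(len(coeffs)),
                    key=lambda i: abs(coeffs[i]), reverse=True)
    kept = sorted(ranked[:keep])
    return ([support_vectors[i] for i in kept],
            [coeffs[i] for i in kept])


if __name__ == "__main__":
    svs = [[0.0], [1.0], [2.0], [3.0]]
    a = [0.1, -2.0, 0.5, 1.5]
    print(reduce_by_ranked_weight(svs, a, keep_fraction=0.5))
```

Halving the number of support vectors halves the per-sample classification cost, since that cost scales with the number of kernel evaluations. The accuracy impact has to be measured on a validation set.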

Speech and Text-based Topic Detection

• Unsupervised learning from WordNet: example of topic-related keywords

Query concept: Weather News
  weather, weather condition, atmospheric condition
    => cold weather
    => fair weather, sunshine
    => hot weather

[Figure labels: Weather condition; Atmospheric condition; Cold weather; Fair weather; Weather forecast; Warm weather; Speech-based Retrieval]

Topic keywords (stemmed):
  Weather news: weather, atmospheric, cold weather, freeze, frost, sunshine, tempers, scorcher, sultry, thaw, warm, downfall, hail, rain
  Building: building, edifice, walk-up, butchery, apart build, tenement, architecture, call center, sanctuary, bathhouse
  Animal: animal, beast, brute, critter, darter, peeper, creature, fauna, microorganism, poikilotherm
  Airplane: airplane, aeroplane, plane, airline, airbus, twin-aisle airplane, amphibian, biplane, passenger, fighter aircraft

Concept Detection based on Automatic Speech Recognition

[Figure: "Concept Detection based on ASR"; bar chart of Average Precision (0 to 0.7) per concept for four series: Baseline (HJN), Supervised, Revised Supervised, WordNet. Concepts: Airplane, Animal, Building, Face, Madeleine_Albright, Outdoors, Nature_Vegetation, People, Physical_Violence, Road, Sport_Event, Weather_News, MAP]

WordNet:
weather, weather condition, atmospheric condition, cold weather, freeze, frost, fair weather, sunshine, temperateness, hot weather, scorcher, sultriness, thaw, thawing, warming, precipitation, downfall, fine spray, hail, rain, rainfall, monsoon, rainstorm, line storm, equinoctial storm, thundershower, downpour, cloudburst, …..

Supervised:
rain 334, temperatur 327, continu 287, weather 284, forecast 269, coast 254, shower 254, storm 245, california 242, northern 226, thunderstorm 225, snow 224, west 208, move 206, lake 195, southeast 187, northeast 175, northwest 172, heavi 170, eastern 161, plain 158, southern 157, expect 153, rocki 153, high 151, warm 147, gulf 143, headlin 142, look 142, great 140, caribbean 140, todai 139, seventi 137, east 135, eighti 134, thirti 133, hawaii 129, meteorologist 125, midwest 125
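A keyword-based concept detector over ASR transcripts can be as simple as counting how many transcript tokens fall into the topic's keyword list. The sketch below is a hypothetical scoring rule, not the lecture's exact method: it assumes the transcript is already tokenized and stemmed, uses the first few stemmed keywords from the supervised list above, and exposes the number of active keywords as the complexity knob.

```python
# First few stemmed keywords from the supervised Weather_News list above.
WEATHER_KEYWORDS = [
    "rain", "temperatur", "continu", "weather",
    "forecast", "coast", "shower", "storm",
]


def topic_score(transcript_tokens, keywords, num_keywords):
    """Fraction of transcript tokens that match the top-N keywords.

    transcript_tokens: list of already stemmed, lower-cased tokens.
    num_keywords: how many keywords (in list order) are active; using fewer
                  keywords is cheaper but may lower accuracy.
    """
    if not transcript_tokens:
        return 0.0
    active = set(keywords[:num_keywords])
    hits = sum(1 for token in transcript_tokens if token in active)
    return hits / len(transcript_tokens)


if __name__ == "__main__":
    tokens = ["rain", "and", "storm", "forecast"]
    print(topic_score(tokens, WEATHER_KEYWORDS, 1))
    print(topic_score(tokens, WEATHER_KEYWORDS, 8))
```

Shots would then be ranked by this score and evaluated with Average Precision. Sweeping `num_keywords` traces out a complexity-accuracy curve.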

Complexity-Accuracy on Concept Detection based on Automatic Speech Recognition

[Figure: "Tradeoff of Complexity and Accuracy"; x-axis: Number of Expanded Keywords (1 to 99); y-axis: Average Precision (0 to 0.7); one curve per concept: Face, Madeleine_Albright, Outdoors, Nature_Vegetation, People, Physical_Violence, Road, Sport_Event, Weather_News]

Summary

• On-demand video routing
• Novel processing-efficient visual classifiers:
  – The first system with 100 concept classifiers.
  – Improved accuracy over state-of-the-art classifiers.
  – Improved classification efficiency over state-of-the-art classifiers.
• Novel technologies for adaptive classifiers:
  – Feature space dimension reduction.
  – Support vector reduction.
• Testing and training on a large data set
• Demos

Ongoing Work

• Further study the inherent accuracy-complexity tradeoffs between processing-efficient representations of signal-oriented data and classification techniques.
• Study the tradeoffs among classification accuracy, complexity, and noisy inputs.
• Create breakthrough technologies for highly scalable, adaptive, self-organizing, and intelligent management of high-volume media data streams.

Healthcare Networks – example on Sleep Monitoring

• Knowing a person’s long-term sleep pattern is important.
• Current sleep quality monitoring is usually conducted at:
  – Clinics, with specialized, complicated devices such as PSG.
  – Home, using accelerometers (Actigraph) to record limb movements.
• Major drawbacks:
  – Because the sleeping environment is different, a subject’s clinical sleep quality measurements may be affected by other factors that decrease their reliability.
  – Long-term measurement of sleep quality is difficult through clinical measurement.
  – Actigraph provides only a single-modality measurement. Wearing a specific device may be considered intrusive.
  – Subjective reports (e.g., sleep diary or PSQI) may not be reliable.

Our Goals

• Objective measurements:
  – Develop simple (wireless) multimodality sensors for long-term sleep logging at home.
• Early diagnoses based on machine cognition:
  – Instead of simply recording the signals, we are interested in developing inference techniques for:
    ◦ Sleep pattern detection
    ◦ Sleep quality detection
    ◦ Sleep disorder detection
    ◦ Sleep-related disease detection

Many questionnaire items can be answered by automatic multimodality sensing

• Pittsburgh Sleep Quality Index (PSQI): a self-rated questionnaire (1 week / 1 month)
• 19 individual items generate 7 component scores; their sum yields one global score.
• Example items:
  -- How many hours of actual sleep did you get at night?
  -- How often have you had trouble sleeping because you…
       Have to get up to use the bathroom?
       Cough or snore loudly?
       Had bad dreams? …

[Figure: first page of the PSQI questionnaire]
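The aggregation described on this slide (7 component scores summed into one global score) is simple enough to show directly. The range check below relies on the standard PSQI scoring convention, which is not stated on the slide, that each component is scored from 0 to 3, giving a global score between 0 and 21; the function name is illustrative.

```python
def psqi_global_score(component_scores):
    """Sum the 7 PSQI component scores into the global score.

    Assumes the standard PSQI convention that each component score is an
    integer from 0 (no difficulty) to 3 (severe difficulty).
    """
    if len(component_scores) != 7:
        raise ValueError("PSQI has exactly 7 component scores")
    if any(score not in (0, 1, 2, 3) for score in component_scores):
        raise ValueError("each component score must be 0, 1, 2, or 3")
    return sum(component_scores)


if __name__ == "__main__":
    print(psqi_global_score([1, 2, 0, 1, 1, 0, 2]))
```

An automated system would fill in as many of the 7 components as it can infer from sensor data and leave the rest to the questionnaire.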

Some Sleep Activity Measure Metrics May Be Inferred by Multimodality Sensors

• Many of the 7 component scores in PSQI can be filled in automatically via audio-visual monitoring:
  – Subjective Sleep Quality
  – Sleep Latency
  – Sleep Duration
  – Habitual Sleep Efficiency
  – Sleep Disturbances
  – Use of Sleep Medication -- may be inferred by observing abnormal patterns
  – Daytime Dysfunction -- needs additional wearable sensors
• Sleep-related diseases
  --- sleep apnea, restless legs syndrome…; they often show several symptoms during sleep. These symptoms may be observable through audio-visual signals.

Our Current Status

• What we have done:
  – Developed visual, audio, heartbeat and infrared sensors for sleep monitoring
  – Inference of a person’s sleep pattern by sleep/awake detection
  – Preliminary inference of sleep quality
  – Logging of the sleep situation
• What we may do next:
  – Early detection and long-term monitoring of sleep-related diseases
  – Validation of the effectiveness of simple multimodality sensors with a rigorous field study
  – Daytime wearable sensors to monitor dysfunctions caused by sleep disorders
  – Others…

Outline

• Introduction
• Using simple multimodality sensors to infer sleep condition (sleep vs. awake) and preliminary sleep quality measurement
• Experiments and results
• Conclusion and future work

Using simple-multimodality sensors for sleep monitoring

• Overall outline

[Figure: inputs feeding an inference engine. Input labels: Brightness; Temperature; Humidity; Window curtain; Music volume; Heartbeat; audio input; Video?]

• Focus first on sleep/awake detection, then extend the result to preliminary sleep quality inference.

Using simple-multimodality sensors for sleep condition inference

• Data modalities:
  -- physiological (heart rate), motion, sound
• System for asleep-awake detection

Data modalities and corresponding sensors

• Heart rate
  -- sensor: Garmin Forerunner 301
     http://www.garmin.com
• Motion
  -- sensor: infrared webcam
     http://shop.store.yahoo.com/insidecomputer/6inniusb35we.html
• Sound
  -- sensor: laptop + audio-recording software

Feature extraction (heart-rate part)

• Extracted features
  -- power spectrum
  -- wavelet coefficients

Feature extraction (motion part)

• Normalized motion amplitude histogram (motion estimation (ME) between consecutive I-frames)
  -- Block size = 8x8, search range (SR) = 5 in both x and y directions
  -- Define 6 bins as
       max(abs(Δx), abs(Δy)) < 1
       max(abs(Δx), abs(Δy)) < 2
• Non-motion ratio: the 1st bin of the normalized motion amplitude histogram
• Extracted feature
  -- Fourier transform coefficients of the non-motion ratio

[Figure: example normalized motion amplitude histogram; x-axis: bins 1 to 6; y-axis: 0 to 0.4]
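The motion feature can be sketched as below, assuming the block motion vectors between consecutive I-frames are already available as integer (Δx, Δy) pairs with a search range of 5. Bin k (counting from 0) is taken to hold vectors with k ≤ max(|Δx|, |Δy|) < k + 1, which is one reading of the bin definitions on the slide. The function names are illustrative.

```python
def motion_amplitude_histogram(motion_vectors, num_bins=6):
    """Normalized histogram of max(|dx|, |dy|) over all block motion vectors.

    Bin k collects vectors whose amplitude lies in [k, k + 1); amplitudes at
    or above the last bin edge are placed in the last bin.
    """
    hist = [0] * num_bins
    for dx, dy in motion_vectors:
        amplitude = max(abs(dx), abs(dy))
        hist[min(int(amplitude), num_bins - 1)] += 1
    total = len(motion_vectors)
    if total == 0:
        return [0.0] * num_bins
    return [count / total for count in hist]


def non_motion_ratio(motion_vectors):
    """First histogram bin: the fraction of blocks with no detected motion."""
    return motion_amplitude_histogram(motion_vectors)[0]


if __name__ == "__main__":
    vectors = [(0, 0), (0, 0), (1, -2), (5, 0)]
    print(motion_amplitude_histogram(vectors))
    print(non_motion_ratio(vectors))
```

Computing `non_motion_ratio` for each frame pair gives a time series. The slide's final feature is the set of Fourier transform coefficients of that series over an analysis window.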

Feature extraction (audio part)

• Extracted features
  -- Amplitude
  -- Fourier transform coefficients
  -- Mel-frequency cepstral coefficients (MFCC)
• For each data modality, windows of different lengths and with different overlaps are applied to extract the data for analysis.

Classifier (1)

• SVM (support vector machine)
  --- linearly separable patterns
• Discriminant function

  $f(x) = w^T x + b$

Page 24: E6885 Network Science Lecture 11: Graphical … Network Science Lecture 11: Graphical Models and Analysis E 6885 Topics in Signal Processing -- Network Science Ching-Yung Lin, Dept.

24

© 2011 Columbia University47 E6885 Network Science – Lecture 11: Graphical Models and Analysis

Classifier (2)

• SVM

--- nonlinearly separable patterns

• Kernel vs. nonlinear transform

$K(x, x_i) = \phi(x)^T \phi(x_i)$

• Discriminant function

$f(x) = \sum_{i=1}^{S} a_i K(x, x_i) + b$
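The linear and kernel discriminant functions from the two classifier slides can be evaluated as in the sketch below. The coefficients, support vectors, and the RBF kernel are illustrative assumptions (the slides do not say which kernel was used), and no training is shown.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_discriminant(w, b, x):
    """f(x) = w^T x + b."""
    return dot(w, x) + b


def rbf_kernel(x, xi, gamma=0.5):
    """One common kernel choice (an assumption here): exp(-gamma * ||x - xi||^2)."""
    return math.exp(-gamma * sum((a - c) ** 2 for a, c in zip(x, xi)))


def kernel_discriminant(coeffs, support_vectors, b, x, kernel):
    """f(x) = sum_i a_i K(x, x_i) + b over the S support vectors."""
    return sum(a * kernel(x, xi) for a, xi in zip(coeffs, support_vectors)) + b


# Hypothetical trained parameters.
support_vectors = [(1.0, 2.0), (-1.0, 0.5)]
coeffs = [0.5, -0.25]
bias = 0.1
x_new = (2.0, -1.0)

f_rbf = kernel_discriminant(coeffs, support_vectors, bias, x_new, rbf_kernel)

# With the plain inner product as kernel, the kernel form reduces to the
# linear form with w = sum_i a_i x_i.
w = tuple(sum(a * xi[d] for a, xi in zip(coeffs, support_vectors)) for d in range(2))
f_linear = linear_discriminant(w, bias, x_new)
f_kernel_linear = kernel_discriminant(coeffs, support_vectors, bias, x_new, dot)
```

The sign of the discriminant gives the predicted class; which sign maps to sleep and which to awake is a labeling convention.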


Multimodalities fusion

• Ensemble fusion

• Gaussian normalization:

$f(x) = \frac{x - \mu_x}{\sigma_x}$

• Combiner function:

--- maxima: $f(x) = \max(x_1, x_2, \ldots)$

--- average: $f(x) = \sum_{i=1}^{N} w_i x_i$
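A minimal sketch of this fusion step: each classifier's scores are normalized to zero mean and unit variance, then combined per sample by a maximum or a weighted average. Function names, the equal default weights, and the sample scores are assumptions for illustration.

```python
import statistics


def gaussian_normalize(scores):
    """Map scores to (x - mean) / std; a constant list maps to all zeros."""
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_max(per_classifier):
    """Maxima combiner: per-sample maximum over the classifiers."""
    return [max(column) for column in zip(*per_classifier)]


def fuse_average(per_classifier, weights=None):
    """Average combiner: per-sample weighted sum, equal weights by default."""
    n = len(per_classifier)
    if weights is None:
        weights = [1.0 / n] * n
    return [
        sum(w * s for w, s in zip(weights, column))
        for column in zip(*per_classifier)
    ]


# Hypothetical raw classifier outputs for three time windows.
hr_scores = gaussian_normalize([1.0, 2.0, 3.0])
motion_scores = gaussian_normalize([10.0, 30.0, 20.0])

fused_max = fuse_max([hr_scores, motion_scores])
fused_avg = fuse_average([hr_scores, motion_scores])
```

Normalizing first matters because the raw outputs of the heart-rate and motion classifiers may live on different scales, which would otherwise let one modality dominate the combiner.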


Outline

• Introduction

• Using simple multimodality sensors for sleep condition (sleep vs. awake) inference and preliminary sleep quality measurement

• Experiments and results

• Conclusion and future work


Data Collection

• 28 days of HR, motion, and sound data, along with completed PSQI (Pittsburgh Sleep Quality Index) questionnaires

• Example data


Inference results of sleep-awake detection (1)

• 6 days are held out as a validation set; the remaining 20 days are randomly partitioned into training and testing sets

• Audio data is excluded here…

• Motion data is a strong and dominant indicator for sleep/awake detection…

| Average performance \ Classifier | Heart-rate only | Motion only | Ensemble fusion (maxima) | Ensemble fusion (average) |
|---|---|---|---|---|
| Accuracy | 0.6173 | 0.9764 | 0.9169 | 0.9359 |
| Miss rate | 0.3644 | 0.0694 | 0.0174 | 0.0357 |
| FA (false alarm) rate | 0.5484 | 0.0312 | 0.5905 | 0.2283 |
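The three reported measures can be computed from per-window labels as sketched below. The slides do not state which class counts as the detection target; treating "sleep" as the positive class is an assumption, as are the function name and the sample labels.

```python
def detection_rates(truth, predicted, positive="sleep"):
    """Return (accuracy, miss rate, false-alarm rate).

    miss rate = positives predicted as negative / all positives
    FA rate   = negatives predicted as positive / all negatives
    """
    pairs = list(zip(truth, predicted))
    positives = [p for t, p in pairs if t == positive]
    negatives = [p for t, p in pairs if t != positive]
    if not positives or not negatives:
        raise ValueError("both classes must appear in the ground truth")
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    miss_rate = sum(p != positive for p in positives) / len(positives)
    fa_rate = sum(p == positive for p in negatives) / len(negatives)
    return accuracy, miss_rate, fa_rate


# Hypothetical ground truth and classifier output for six windows.
truth = ["sleep", "sleep", "sleep", "sleep", "awake", "awake"]
predicted = ["sleep", "sleep", "sleep", "awake", "sleep", "awake"]
accuracy, miss_rate, fa_rate = detection_rates(truth, predicted)
```

Because sleep windows usually far outnumber awake windows in an overnight recording, a classifier can reach high accuracy while still having a high false-alarm rate, which is why the slides report all three numbers.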


Inference results of sleep-awake detection (2)

• Modified experiment in which the subject is inactive during the awake time

• Multimodality fusion can improve the classification results in some situations

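| Average performance \ Classifier | Heart-rate only | Motion only | Ensemble fusion (maxima) | Ensemble fusion (average) |
|---|---|---|---|---|
| Accuracy | 0.7 | 0.9294 | 0.9309 | 0.8410 |
| Miss rate | 0.3076 | 0.0053 | 0.0360 | 0.1317 |
| FA (false alarm) rate | 0 | 0.9722 | 0.475 | 0.4143 |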


Comparison of video and Actigraph for sleep-awake detection

• Example

[Figure: two panels over time in minutes (0 to 220): non-motion ratio (0 to 1) and motion counts (0 to 600)]

• Performance with the same method applied

| Average performance (5 days) \ Device | Actigraph | Video sensor |
|---|---|---|
| Miss rate | 0.0622 | 0.0331 |
| Accuracy | 0.9244 | 0.9383 |
| FA (false alarm) rate | 0.1956 | 0.2928 |


Preliminary results of automatic sleep quality indexing (1)

• 3 objective component scores in PSQI

--- sleep latency: the time spent in bed before falling asleep

--- sleep duration: the total time spent in bed

--- habitual sleep efficiency: time asleep / total time in bed

• From our sleep-awake inference results, count only the detected awake time that precedes the detected sleep period
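Given a per-epoch sequence of inferred states, the three objective quantities can be derived as in this sketch. It follows the definitions as worded on the slide (duration is time in bed); the one-minute epoch length, the function name, and the sample night are assumptions.

```python
def sleep_indices(states, minutes_per_epoch=1):
    """Return (latency, duration, efficiency) from "sleep"/"awake" labels.

    latency    = awake time before the first detected sleep epoch
    duration   = total time in bed covered by the recording
    efficiency = time asleep / total time in bed
    """
    states = list(states)
    if not states:
        raise ValueError("need at least one epoch")
    first_sleep = states.index("sleep") if "sleep" in states else len(states)
    latency = first_sleep * minutes_per_epoch
    duration = len(states) * minutes_per_epoch
    asleep = states.count("sleep") * minutes_per_epoch
    return latency, duration, asleep / duration


# Hypothetical night: 20 min awake, 180 asleep, 10 awake, 90 asleep.
night = ["awake"] * 20 + ["sleep"] * 180 + ["awake"] * 10 + ["sleep"] * 90
latency, duration, efficiency = sleep_indices(night)
```

Only the awake run before the first sleep epoch contributes to latency; the later 10-minute awakening lowers efficiency but does not affect latency.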


Preliminary results of automatic sleep quality indexing (2)

• Example results

| Inference items (vs. subjective ground truth) \ Example testing day | Sep. 10 | Aug. 24 |
|---|---|---|
| Estimated PSQI (vs. subjective PSQI) | >=4 (5) | >=3 (3) |
| Habitual sleep efficiency (vs. subjective sleep efficiency) | 90.8% (89.9%) | 94.4% (93.4%) |
| Sleep duration (vs. self-recorded sleep duration) | 217 minutes (217 minutes) | 304 minutes (304 minutes) |
| Awake time before sleep (vs. subjective sleep latency) | 20 minutes (22 minutes) | 17 minutes (20 minutes) |

• Provides a preliminary, automatic score range for subjective sleep quality measurement


Extended work (1)

• Because of privacy concerns, people may be unwilling to use video

• Use an inexpensive PIR (passive infrared) sensor to detect motion


Extended Work (2)

• PIR sensor, wireless TX and RX (ZigBee communication)


Extended Work (3)

• Visualization of example data

[Figure: two panels over time in seconds (0 to 12000): "PIR DATA 2005-11-25" (ON-OFF, 0 to 20) and "Actigraph DATA 2005-11-25" (motion counts, 0 to 600)]


Conclusion and future work (1)

• A novel, economical system (multimodality sensors with machine learning methods) for sleep-awake detection

• Detection accuracy of about 0.8 to 0.9 using HR and video sensors

• Explores the possibility of using a simple video sensor rather than the costly Actigraph (>= $1000 USD)

• Applies the sleep-awake inference result to an automatic, preliminary indexing for subjective sleep quality assessment

• Replaces the video sensor with a PIR sensor for motion information acquisition (data collection is ongoing…)


Conclusion and future work (2)

• Bottlenecks

-- hardware limitations (e.g., noisy HR data)

-- data collection (different subjects, better procedure…)

-- ground truth for more meaningful evaluation

-- a better approach for sleep quality measurement (post-sleep inventory?)

-- meaningful and valuable issues (e.g., sleep log)?

• Near-term future work

-- distributed system (in progress…)

-- use of audio data for snoring detection, disturbance detection, etc.

-- behavior of HRV (heart-rate variability)

-- thorough measurement of sleep quality via simple sensors


Information Spreading in Context

Task S1.3


Understanding Info Routing

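Topic-based information flow prediction

Information spreading in context – attributes of people and information

[Figure: example spreading tree with A on the top level, B and C on the second level, and D, E, F on the third level]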



Information Flow in Dynamic Probabilistic Complex Network [Lin ’07]

• [Assumption] Each edge can be represented by a four-state S-D-A-R (Susceptible-Dormant-Active-Removed) Markov model. Each node can be represented by a three-state S-A-I (Susceptible-Active-Informed) model.

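$$P(t + \delta t) \triangleq f(M, Q(t), P(t)), \qquad Q(t + \delta t) \triangleq g(P(t + \delta t), Q(t), P(t))$$

where

$$P(t) \triangleq \begin{bmatrix} p_{1,1}(t) & p_{2,1}(t) & \cdots & p_{N,1}(t) \\ p_{1,2}(t) & p_{2,2}(t) & \cdots & p_{N,2}(t) \\ \vdots & \vdots & \ddots & \vdots \\ p_{1,N}(t) & p_{2,N}(t) & \cdots & p_{N,N}(t) \end{bmatrix}, \qquad Q(t) \triangleq \begin{bmatrix} q_1(t) \\ q_2(t) \\ \vdots \\ q_N(t) \end{bmatrix}$$

and

$$p_{i,j}(t) \triangleq \begin{bmatrix} \sigma_{i,j} \\ \psi_{i,j} \\ \mu_{i,j} \\ \rho_{i,j} \end{bmatrix} = \begin{bmatrix} \Pr(y_{i,j}(t) = S) \\ \Pr(y_{i,j}(t) = D) \\ \Pr(y_{i,j}(t) = A) \\ \Pr(y_{i,j}(t) = R) \end{bmatrix}, \qquad q_i(t) \triangleq \begin{bmatrix} \lambda_i \\ \eta_i \\ \nu_i \end{bmatrix} = \begin{bmatrix} \Pr(x_i(t) = S) \\ \Pr(x_i(t) = A) \\ \Pr(x_i(t) = I) \end{bmatrix}$$

$$\sigma_{i,j} + \psi_{i,j} + \mu_{i,j} + \rho_{i,j} = 1, \qquad \lambda_i + \eta_i + \nu_i = 1$$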


Edges are Markov State Machines, Nodes are not

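• State transitions of edges: S-D-A-R model (Susceptible, Dormant, Active, and Removed). This captures how the state of an edge changes over time.

[Figure: edge state-transition diagram over the states S, D, A, R, labeled with a trigger, transition probabilities α, β, γ, and self-loop probabilities 1 − α, 1 − β, 1 − γ, and 1]

• States of nodes: S-A-I model (Susceptible, Active, and Informed). A trigger occurs when the start node of the edge changes from state S to state I.

[Figure: node view (states S, A, I, with the trigger marked between S and I), edge view, and network view]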
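The edge model can be illustrated by propagating a state-probability vector through a small Markov chain. The exact wiring of the slide's diagram is not recoverable from the text, so the left-to-right chain S → D → A → R with forward probabilities α, β, γ and an absorbing R state is an assumption, and the trigger is not modeled (the edge is treated as already triggered).

```python
def sdar_transition_matrix(alpha, beta, gamma):
    """Row-stochastic matrix over the states (S, D, A, R).

    Assumed wiring: S -> D with alpha, D -> A with beta, A -> R with gamma,
    each state otherwise stays put, and R is absorbing.
    """
    for prob in (alpha, beta, gamma):
        if not 0.0 <= prob <= 1.0:
            raise ValueError("transition probabilities must lie in [0, 1]")
    return [
        [1 - alpha, alpha, 0.0, 0.0],
        [0.0, 1 - beta, beta, 0.0],
        [0.0, 0.0, 1 - gamma, gamma],
        [0.0, 0.0, 0.0, 1.0],
    ]


def step(p, matrix):
    """One update p(t + 1) = p(t) * matrix for a row vector p."""
    n = len(p)
    return [sum(p[i] * matrix[i][j] for i in range(n)) for j in range(n)]


# Start in S with certainty and iterate; values of 0.5 are placeholders.
matrix = sdar_transition_matrix(0.5, 0.5, 0.5)
p = [1.0, 0.0, 0.0, 0.0]
for _ in range(200):
    p = step(p, matrix)
```

Each row of the matrix sums to 1, so the components of `p` keep summing to 1 at every step, matching the normalization of the edge-state probabilities.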


An example: Topic Detection and Key People Detection of “California Power” in Enron

[Figure (a): "Topic Analysis for Topic 61"; popularity (0 to 0.018) over time, Jan-00 to Oct-01]

Key People: Jeff_Dasovich 0.249863, James_Steffes 0.139212, Richard_Shapiro 0.096179, Mary_Hain 0.078131, Richard_Sanders 0.052866, Steven_Kean 0.044745, Vince_Kaminski 0.035953

Key Words: power 0.089361, California 0.088160, electrical 0.087345, price 0.055940, energy 0.048817, generator 0.035345, market 0.033314, until 0.030681

The “California Energy Crisis” event occurred in exactly this time period. The key people were active in this event, except Vince_Kaminski …


Information Flows Modeling and Visualization

• Modeling needs to go down to the level of individual influential and informative nodes

• Nodes can be ranked by their influence in the network and the novelty of information they contribute


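Topic-based information flow prediction

Information Spreading in Context – Attributes of People and Information

[Figure: example spreading tree with A on the top level, B and C on the second level, and D, E, F on the third level]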


How does a specific piece of information spread?

• A scientific problem

Viral marketing: José Luis Iribarren and E. Moro, Phys. Rev. Lett. 2009

Chain letters: D. Liben-Nowell and J. Kleinberg, Proc. Natl. Acad. Sci. 2008


Limitations of Prior Research

• The underlying social networks?

• The attributes of the individuals involved?

• The attributes of the information: what is it about?

• How can models go beyond just graphs?


Our Focus – People’s Daily Activities

• Emails of 8000+ IBM employees

• 2000+ Fw (forward) threads

• Information about the individuals, e.g., performance, department, job role

• Content of the emails

Slide annotations: social network; ensemble of trees; individual characteristics; what the information is about

[Figure: forwarding example among A, B, and C: Aug 5, 09:30:12 “date request”, then Aug 5, 09:53:00 “Fw: date request”]

Study via the IBM SmallBlue dataset


Research Focus

• How do individual characteristics and the underlying social network affect information spreading?

• How does information spreading depend on the information itself?

• Is there a generic model for the tree structures of all the spreading processes?


Routing – Time Effect

• Time factor of email forwarding

Routing – Network Factor

[Figure: forwarding example among A, B, and C: Aug 5, 09:30:12 “date request”, then Aug 5, 09:53:00 “Fw: date request”; annotation: more likely]


Routing – Content Effect

• Information is more likely to be passed on by spreaders if the content is dissimilar to the spreaders’ expertise.

• Non-experts → experts

Routing – Departmental Context

[Figure: forwarding example among A, B, and C: Aug 5, 09:30:12 “date request”, then Aug 5, 09:53:00 “Fw: date request”]

Social bottleneck: information experiences delays in inter-department flows


Routing – Organization Effect

• How do organizational hierarchy and boundaries affect forwarding behavior?

Routing – High vs. Low Performers

[Figure: forwarding example among A, B, and C: Aug 5, 09:30:12 “date request”, then Aug 5, 09:53:00 “Fw: date request”; annotations: individual difference, revenue generated, faster]


Observations

• Ultra-shallow trees: almost 95% of trees have depth 2, and trees with more than 4 hops are absent.

• Stage dependency: the branching factor (the number of children each node has) depends on the distance from the root.

[Figure: example tree with A on the top level, B and C on the second level, and D, E, F on the third level; caption: size, width, and depth of trees]


Networks are heterogeneous

[Figure: the example tree (A at depth d=0; B, C at d=1; D, E, F at d=2) drawn four times to illustrate depth, branching factor, width (width=3), and size (size=6)]
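The four tree parameters can be computed with one level-by-level traversal, as in the sketch below. The parent-child edges of the example tree are an assumption (the slides show only which nodes sit at each level), chosen so that the result matches the stated width=3 and size=6.

```python
def tree_metrics(children, root):
    """Return size, depth, width, and mean branching factor per depth.

    `children` maps a node to the list of nodes it forwarded to.
    """
    level_sizes = []
    branching_by_depth = {}
    frontier = [root]
    while frontier:
        depth = len(level_sizes)
        level_sizes.append(len(frontier))
        next_frontier = []
        for node in frontier:
            next_frontier.extend(children.get(node, []))
        # Average number of children per node at this depth.
        branching_by_depth[depth] = len(next_frontier) / len(frontier)
        frontier = next_frontier
    return {
        "size": sum(level_sizes),
        "depth": len(level_sizes) - 1,
        "width": max(level_sizes),
        "branching_by_depth": branching_by_depth,
    }


# Assumed wiring of the example tree: A -> B, C; B -> D, E; C -> F.
example_tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
metrics = tree_metrics(example_tree, "A")
```

Here width is taken as the largest number of nodes on any single level, which is one common reading of the term.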


Tree Parameter Distributions


[Figure: example tree (A; B, C; D, E, F) next to a plot of size versus width]

In our data, the size of the tree is determined by the width

Efficient


Mystery & Observation 1: ultra shallow

Ultra Shallow


Branching Factor Decay

branching factor k’

Mystery & Observation 2: the branching factors decay as depth increases


Publication

• We find that the social and organizational context significantly impacts to whom and how fast people forward information.

• Yet the structures within spreading processes can be well captured by a simple stochastic branching model, indicating surprising independence of context.

D. Wang, Z. Wen, H. Tong, C.-Y. Lin, C. Song, and A.-L. Barabási, Information Spreading in Context, WWW 2011, Hyderabad, India, March 2011
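To make the idea of a stochastic branching model with a depth-dependent branching factor concrete, here is a minimal simulation sketch. It is not the model from the cited paper: the Poisson offspring distribution, the mean values, and the function names are all assumptions chosen for illustration.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson(mean) variate with Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_level_sizes(mean_children_by_depth, rng):
    """Grow one tree and return the number of nodes on each level.

    mean_children_by_depth[d] is the assumed mean branching factor of a
    node at depth d; growth stops once a level has no children.
    """
    levels = [1]  # the root
    for mean in mean_children_by_depth:
        born = sum(sample_poisson(mean, rng) for _ in range(levels[-1]))
        if born == 0:
            break
        levels.append(born)
    return levels


# Hypothetical decaying branching factors for depths 0, 1, and 2.
rng = random.Random(0)
trees = [simulate_level_sizes([2.0, 0.5, 0.1], rng) for _ in range(1000)]
shallow_share = sum(len(levels) - 1 <= 2 for levels in trees) / len(trees)
```

With branching factors that decay quickly with depth, most simulated trees stop within a couple of hops, which is the qualitative behavior the two observations describe.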