E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021....

50
© 2021 CY Lin, Columbia University E6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analysis 1 E6895 Advanced Big Data Analytics Lecture 5: Massive Data Processing Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science February 12th, 2021

Transcript of E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021....

Page 1: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analysis1

E6895 Advanced Big Data Analytics Lecture 5:

Massive Data Processing

Ching-Yung Lin, Ph.D.

Adjunct Professor, Dept. of Electrical Engineering and Computer Science

February 12th, 2021

Page 2: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics2

Massive Stream Analysis Challenges

Page 3: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics3

Example IP Packet Stream Instantiation

ip http

ntp

udp

tcp ftp

rtp

rtsp

video

audio

Inputs Dataflow Graph

By IBM Dense Information Gliding Team

Page 4: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics4

Semantic MM Filtering

200-500MB/s ~100MB/sper PE rates

10 MB/s

Inputs Dataflow Graph

ip http

ntp

udp

tcp ftp

rtp

rtsp

sessvideo

sessaudio Interest Routing

keywords id

Packet content analysis

Advanced content analysis

Interest Filtering

Interested MM streams

Page 5: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics5

Resource-Accuracy Trade-Offs

Configurable Parameters of Processing Elements to maximize relevant information: Y’’(X | q, R) > Y’(X | q, R),

with resource constraint. Required resource-efficient algorithms for:

Classification, routing and filtering of signal-oriented data: (audio, video and, possibly, sensor data)

X

R

Y(X|q)Y’’(X|q,R)X’

▪ Input data X – Queries q – Resource R – Y(X | q): Relevant information – Y’(X | q, R) ` Y(X | q): Achievable subset given R

Page 6: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics6

Example: Distributed Video Signal Understanding (Lin et al.)

Face

Outdoors

Indoors

PE1: 9.2.63.66: 1220

PE2: 9.2.63.67

PE3: 9.2.63.68

Female

Male

Airplane

Chair

Clock

PE4: 9.2.63.66:1235

PE5: 9.2.63.66: 1240

PE6: 9.2.63.66

PE7: 9.2.63.66

PE100: 9.2.63.66

(Server) Concept Detection Processing Elements

CDS Features

(Distributed Smart Sensors) Block diagram of the smart sensors

Meta- data

600 bps

Control Modules

Resource Constraints

User Interests

Display and Information Aggregation

Modules

1.5 Mbps

MPEG- 1/2 GOP

ExtractionEvent

Extraction 2.8 Kbps

Feature Extraction320

Kbps22.4 Kbps

Encod- ing

Sensor 1Sensor 2

Sensor NSensor 3

TV broadcast, VCR, DVD discs, Video File Database, Webcam

Smart Cam

Page 7: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics7

Semantic Concept Filters

E.g.:

Page 8: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics8

Complexity Reduction Introduction

• Objective: Real-time classification of instances using Support Vector Machines (SVMs)

• Computationally efficient and reasonably accurate solutions • Techniques capable of adjusting tradeoff between accuracy and speed based on

available computational resources

Page 9: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics9

SVM formulation

SVM

Page 10: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics10

Decision

Page 11: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics11

Problems

Page 12: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics12

Example

Page 13: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics13

Naïve Approach I – Feature Dimension ReductionA

ccur

acy

-- A

vera

ge P

reci

sion

0

0.2

0.4

0.6

0.8

Complexity -- Feature Dimension Ratio

0 0.25 0.5 0.75 1

Slice Color TextureFeature Dimension

Ratio AP

3 3 3 1 0.7861

3 3 2 0.666666667 0.7861

3 2 3 0.666666667 0.7757

2 3 3 0.666666667 0.5822

3 2 2 0.444444444 0.7757

2 3 2 0.444444444 0.5822

2 2 3 0.444444444 0.5235

3 3 1 0.333333333 0.4685

3 1 3 0.333333333 0.6581

1 3 3 0.333333333 0.1684

2 2 2 0.296296296 0.5235

3 2 1 0.222222222 0.427

3 1 2 0.222222222 0.6581

2 3 1 0.222222222 0.1241

2 1 3 0.222222222 0.3457

1 3 2 0.222222222 0.1684

1 2 3 0.222222222 0.1065

2 2 1 0.148148148 0.0699

2 1 2 0.148148148 0.3457

1 2 2 0.148148148 0.1065

3 1 1 0.111111111 0.3219

1 3 1 0.111111111 0.0314

1 1 3 0.111111111 0.07

2 1 1 0.074074074 0.0318

1 2 1 0.074074074 0.0173

▪ Experimental Results for Weather_News Detector

▪ Model Selection based on the Model Validation Set

▪ E.g., for Feature Dimension Ratio 0.22, (the best selection of features are: 3 slices, 1 color, 2 texture selections), the accuracy is decreased by 17%.

Page 14: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics14

Naïve Approach II – Reduction on the Number of Support

Acc

urac

y - A

vera

ge P

reci

sion

0

0.225

0.45

0.675

0.9

Complexity -- Number of Support Vectors

0 0.25 0.5 0.75 1

▪ Proposed Novel Reduction Methods: – Ranked Weighting – P/N Cost Reduction – Random Selection – Support Vector Clustering and Centralization

▪ Experimental Results on Weather_News Detectors show that complexity can be at 50% for the cost of 14% decrease on accuracy

Page 15: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics15

Weighted Clustering Approach

Page 16: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics16

Cluster center weight (contd.)

Page 17: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics17

Using the weights

Page 18: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics18

Experiments

• Datasets • TREC video datasets (2003 and

2005) • 576 features per instance • > 20000 test instances

overall • MNist handwritten digit dataset

(RBF kernel) • 576 features • 60000 training instances,

10000 test instances

• Performance metrics • Speedup achieved over

evaluation with all support vectors

• Average precision achieved

Page 19: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics19

Results (Mnist 0-4)Av

erag

e Pr

ecis

ion

0.95

0.9625

0.975

0.9875

1

Speedup Ratio

0 75 150 225 300

01234

Page 20: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics20

Results (Mnist 5-9)Av

erag

e Pr

ecis

ion

0.7

0.775

0.85

0.925

1

Speedup Ratio

1 10 100 1000

56789

Page 21: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics21

Results (TREC 2003)

Aver

age

Prec

isio

n

0

0.225

0.45

0.675

0.9

ConceptHuman Outdoors Sport-Event Crowd People-Event

AP_fastAP_original

Spee

dup

1

10

100

1000

10000

Concept

Human Studio-Setting Crowd

Page 22: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics22

Summary of Complexity Reduction

Page 23: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics23

Acceleration of Neural Networks

Page 24: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics24

Neuron Importance Score Propagation (NISP, Yu et al 2018)

Page 25: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics25

Methods for Running CNNs on Mobile Devices

Compression (pruning) of CNN

Speeding up CNN +

Sending CNN jobs to cloud

Page 26: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics26

Thinking Differently

• All existing methods can be viewed as approximations of an overly-redundant CNN, but do we really need such a CNN as the starting point?

× × × × × × CNN Sliming!

Page 27: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics27

Slim CNN

• Slim CNN leads to:

• less storage space

• less memory usage

• less computation

• less power consumption

Page 28: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics28

Feature Selection on CNN

• CNNs can be viewed as a set of "overly-redundant" feature extractors

features

Page 29: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics29

A method for Pruning Redundant Neurons and Kernels of

Apply thermal

ExtractCNNResponses

MeasuretheImportanceof

FeatureExtractors

PruneModel Fine-tuningA pre-trained CNN

Page 30: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics30

A method for Pruning Redundant Neurons and Kernels of Deep Convolutional Neural Networks (NISP)

• Intractable

• Inconsistent

ExtractResponsesofaHigh-level

Layer

MeasuretheImportanceof

FeatureExtractors

Back-propagatetheImportance&PruneModel

Fine-tuningA pre-trained CNN

FC layers

…Inputlayers

Response

Forward Propagation

Important Score Back Propagation and Pruning

……ResponseResponse

tractable

consistent

Page 31: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics31

Fine-tuning the Pruned Model

• Our method outperforms the baselines in three aspects

• Very small accuracy loss at the beginning ==> retains the most important neurons

• Converges much faster than baselines

• For LeNet on MIST, our method only decreases 0.02% top-1 accuracy with a running ratio of 50% as compared to the pre-pruned network.

Page 32: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics32

Fine-tuning the Pruned Model

• The pruned model consists of important feature extractors, but will suffer loss of accuracy due to loss of redundant features

• Good starting point on the learning curve due to feature selection

• Fine-tuning the pruned model with a lower learning rate to recover the performance

Page 33: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Massive Data Analytics33

Stream Analysis using Spark

Page 34: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2018 CY Lin, Columbia UniversityE6893 Big Data Analytics – Lecture 5: Big Data Analytics Algorithms34

Spark ML Classification and Regression

Page 35: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Big Data Analytics Algorithms35

Spark Streaming

Page 36: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Big Data Analytics Algorithms36

Spark Streaming

Page 37: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Big Data Analytics Algorithms37

Spark Streaming

https://www.edureka.co/blog/spark-streaming/

Page 38: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Big Data Analytics Algorithms38

Spark Streaming

Page 39: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Big Data Analytics Algorithms39

Spark Streaming Example

Page 40: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Big Data Analytics Algorithms40

Spark Streaming Example

Page 41: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Big Data Analytics Algorithms41

Discretized Streams

Page 42: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Big Data Analytics Algorithms42

Discretized Streams

Page 43: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Big Data Analytics Algorithms43

Discretized Streams

https://www.edureka.co/blog/spark-streaming/

Page 44: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Big Data Analytics Algorithms44

DStream Transforms

https://www.edureka.co/blog/spark-streaming/

Page 45: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Big Data Analytics Algorithms45

Output DStreams

https://www.edureka.co/blog/spark-streaming/

Page 46: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Big Data Analytics Algorithms46

DStreams Caching

https://www.edureka.co/blog/spark-streaming/

Page 47: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Big Data Analytics Algorithms47

DStreams Example — Twitter Sentiment Analysis

https://www.edureka.co/blog/spark-streaming/

Page 48: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Big Data Analytics Algorithms48

DStreams Example — Twitter Sentiment Analysis

https://www.edureka.co/blog/spark-streaming/

Page 49: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2021 CY Lin, Columbia UniversityE6895 Advanced Big Data Analytics – Lecture 5: Big Data Analytics Algorithms49

DStreams Example — Twitter Sentiment Analysis

https://www.edureka.co/blog/spark-streaming/

All the tweets are categorized into Positive, Neutral and Negative according to the sentiment of the contents of the tweets

Page 50: E6895 Advanced Big Data Analytics Lecture 5: Massive Data …cylin/course/bigdata/EECS... · 2021. 2. 13. · • Objective: Real-time classification of instances using Support Vector

© 2018 CY Lin, Columbia UniversityE6893 Big Data Analytics – Lecture 5: Big Data Analytics Algorithms50

Questions?