Big Data Better Life

Big Data. Better Life. Jiebo Luo Department of Computer Science University of Rochester


Page 1: Big Data Better Life

Big Data. Better Life.

Jiebo Luo, Department of Computer Science, University of Rochester

Page 2: Big Data Better Life

Rochester


The most important part of US

Page 3: Big Data Better Life

The 5 Vs of Big Data

April 10, 2017


Page 4: Big Data Better Life

Why Big Image/Multimodal Data?

• Big Image Data is the Biggest – An estimated 4 trillion photographs in the world

Facebook alone reports 6 billion photo uploads per month. Every minute 72 hours of video are uploaded to YouTube. Cisco estimates that visual data (photos and video) will account for over 85% of total internet traffic.

• Big Image Data is Harder – Visual data is difficult to handle

Unlike text, which is clean, segmented, compact, one-dimensional, and indexable, visual content is noisy, unsegmented, high-entropy, and often multi-dimensional.

– The solution may be to add more and more data: once we put a lot of data in the system, even basic distance metrics (applied to patches) start making a lot of sense

– Contextual non-image data can help -> multimodal data analytics

• Big Image/Multimodal Data Analytics Can Be Rewarding!
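The claim that, with enough data, even basic distance metrics on patches become meaningful can be illustrated with brute-force nearest-neighbor patch matching. This is a minimal sketch, not the author's method: it assumes grayscale images stored as nested lists, uses the sum of squared differences (SSD) as the "basic distance metric", and all function names are hypothetical.

```python
# Minimal sketch, assuming grayscale images stored as lists of rows of numbers.
# Brute-force nearest-neighbor patch matching under sum of squared differences (SSD).
# Function names are illustrative, not from the original work.

def extract_patches(image, size):
    """Return (row, col, flattened_patch) for every size x size patch, stride 1."""
    height, width = len(image), len(image[0])
    patches = []
    for r in range(height - size + 1):
        for c in range(width - size + 1):
            flat = [image[r + i][c + j] for i in range(size) for j in range(size)]
            patches.append((r, c, flat))
    return patches


def ssd(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_patch(query, image, size):
    """Return (distance, row, col) of the patch in `image` closest to the flattened `query`."""
    best = None
    for r, c, flat in extract_patches(image, size):
        d = ssd(query, flat)
        if best is None or d < best[0]:
            best = (d, r, c)
    return best


image = [[0, 0, 0, 0],
         [0, 9, 8, 0],
         [0, 7, 6, 0],
         [0, 0, 0, 0]]
print(nearest_patch([9, 8, 7, 6], image, 2))  # (0, 1, 1): exact match at row 1, col 1
```

A linear scan like this is only practical for small collections; at the scale described above, an approximate nearest-neighbor index would normally replace it.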

Page 5: Big Data Better Life

Big Data

Machine Learning

Computer Vision

Data Mining

Surveillance video analytics

Surgery video analysis

3D scene model

Augmented photography

Image geolocation

Visual recognition using weakly labeled big image data (people, object, action, event, activity)

Social media summarization

Media-driven recommendation

Cultural influence on social media

Crowd-sourced learning

Data analytics for healthcare

Nowcasting and forecasting

Multimodal sentiment/affect analysis

Deep user profiling & demographics

Individual or group behaviors

Wisdom of social multimedia

Non-contact sensing

Make Computers See. Let Data Speak.

Medical image analysis

Page 6: Big Data Better Life

Ancient Medicine: Look, Listen, Question, Feel

Page 7: Big Data Better Life

Sensing from a Distance

[John is holding a gun to his head]

Terminator: You cannot self‐terminate.

John Connor: No, you can't. I can do anything I want. I'm a human being, not some god‐damn robot.

Terminator: [correcting him] Cybernetic organism.

John Connor: Whatever! Either we go, and save her Dad, or so much for the Great John Connor. Because your future, my destiny, I want no part in it, I never did.

Terminator: Based on your pupil dilation, skin temperature, and motor functions, I calculate an 83% probability that you will not pull the trigger.

Page 8: Big Data Better Life

Tackling Mental Health

• Motivation

– Mental health is a significant problem on the rise, with reports of anxiety, stress, depression, suicide, and violence

– Mental illness has been and remains a major cause of disability, dysfunction, and even violence and crime

• Challenges

– Traditional methods of monitoring mental health are expensive, intrusive, and often geared toward serious mental disorders

– These methods do not scale to a large population of varying demographics, and are not designed for those in the early stages of developing mental health problems

• Opportunities

– Advances in computer vision and machine learning, coupled with the widespread use of the Internet and adoption of social media, are opening doors to a new approach to tackling mental health using physically noninvasive, low-cost multimodal sensors already present in people’s daily lives

Page 9: Big Data Better Life

Tackling Mental Health via Multimodal Sensing

Dawei Zhou, Jiebo Luo, Vincent Silenzio*, Yun Zhou, Jile Hu, Glenn Currier*, Henry Kautz, AAAI-2015

Page 10: Big Data Better Life

Innovation

• Extracting fine-grained psycho-behavioral signals that reflect the mental state of the subject from imagery unobtrusively captured by the webcams built into most mobile devices (laptops, tablets, and smartphones). We develop robust computer vision algorithms to monitor real-time psycho-behavioral signals, including the heart rate, eye blink rate, pupil variations, head movements, and facial expressions of the users.

• Analyzing effects from personal social media stream data, which may reveal the mood and sentiment of its users. We measure the mood and emotion of the subject from the social media posted by the subject as a prelude to assessing the effects of social contacts and context within such media.

• Establishing the connection between mental health and multimodal signals extracted unobtrusively from social media and webcams using machine learning methods.
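The slides do not describe how heart rate is recovered from webcam imagery. One widely used approach, remote photoplethysmography, tracks the small periodic color change of facial skin: average the green channel over a face region in every frame, then find the dominant frequency in a plausible heart-rate band. This sketch assumes face detection and per-frame averaging already happened upstream; the function name, the 0.7 to 4 Hz band, and the synthetic test signal are illustrative choices, not details of the AAAI-2015 system.

```python
import math


def estimate_heart_rate_bpm(green_means, fps, lo_hz=0.7, hi_hz=4.0):
    """Estimate heart rate from per-frame mean green-channel values of a face region.

    Assumes face detection and region averaging were done upstream. Uses a plain
    DFT (no external libraries) and returns the strongest frequency inside
    [lo_hz, hi_hz], converted to beats per minute, or None if no bin falls in the band.
    """
    n = len(green_means)
    mean = sum(green_means) / n
    x = [v - mean for v in green_means]  # remove the DC (average brightness) component
    best_bpm, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        freq_hz = k * fps / n
        if not lo_hz <= freq_hz <= hi_hz:
            continue  # skip frequencies outside a plausible heart-rate band
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = freq_hz * 60.0, power
    return best_bpm


# Synthetic 10 s trace at 30 fps: a 1.2 Hz pulse plus a stronger 0.2 Hz illumination drift.
fps = 30
trace = [100 + 0.5 * math.sin(2 * math.pi * 1.2 * t / fps)
         + 2.0 * math.sin(2 * math.pi * 0.2 * t / fps) for t in range(10 * fps)]
print(round(estimate_heart_rate_bpm(trace, fps)))  # 72
```

The band limit is what lets the weak pulse win over the stronger slow drift. A naive DFT keeps the example dependency-free; a real pipeline would typically add detrending, band-pass filtering, and an FFT library.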

Page 11: Big Data Better Life

Multimodal (Weak) Signals

[Figure: multimodal weak signals for positive, neutral, and negative moods. Panels: head movement rate vs. time (min); pupil diameter vs. time (min); mouse wheel, mouse moving distance, mouse click, and key stroke activity.]

Page 12: Big Data Better Life

Pattern Classification and Mining


Page 13: Big Data Better Life

Experiments

• Experiment I – 27 participants (16 females and 11 males), including undergraduate students, PhD students, and faculty, with different backgrounds in terms of education, income, and discipline. Participants' ages range from 19 to 33, consistent with the age of the primary users of social media.

• Experiment II – 5 depression patients (2 severe/suicidal and 3 moderate) and a control group of 5 healthy users, in FaceTime sessions with doctors.


Table 3. Leave-One-Subject-Out Test for Experiment 1.

Class      TP rate  FP rate  Prec.  Rec.  F-1   AUC
Negative   0.89     0.08     0.82   0.89  0.84  0.95
Neutral    0.56     0.13     0.67   0.56  0.59  0.79
Positive   0.78     0.17     0.76   0.78  0.75  0.91

Table 4. Leave-One-Subject-Out Test for Experiment 2 (patients vs. control, by mood).

           Positive mood  Negative mood  Neutral mood
Precision  0.814          0.817          0.813
Recall     0.674          0.738          0.717
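Tables 3 and 4 report leave-one-subject-out results: each participant's data is held out in turn, a model is trained on everyone else, and predictions on the held-out person are pooled. This sketch shows that protocol and the one-vs-rest metrics in the tables (TP rate equals recall, as the matching columns in Table 3 suggest). The study's classifier is not specified here, so the trainer below is a hypothetical stand-in and the data are toy values.

```python
def leave_one_subject_out(samples):
    """Yield (held_out_subject, train, test); samples are (subject_id, features, label) tuples."""
    for subject in sorted({s[0] for s in samples}):
        train = [s for s in samples if s[0] != subject]
        test = [s for s in samples if s[0] == subject]
        yield subject, train, test


def evaluate_loso(samples, train_fn):
    """Pool test predictions over all folds. train_fn(train) must return predict(features)."""
    y_true, y_pred = [], []
    for _, train, test in leave_one_subject_out(samples):
        predict = train_fn(train)  # the model never sees the held-out subject
        for _, features, label in test:
            y_true.append(label)
            y_pred.append(predict(features))
    return y_true, y_pred


def per_class_metrics(y_true, y_pred, cls):
    """One-vs-rest precision, recall (= TP rate), and F-1 for class `cls`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def threshold_model(train):
    """Hypothetical stand-in trainer that ignores its data; a real model would be fit here."""
    return lambda x: "pos" if x > 0 else "neg"


# Toy data: one scalar feature per sample.
samples = [("s1", -1.0, "neg"), ("s1", 2.0, "pos"),
           ("s2", -0.5, "neg"), ("s2", 0.5, "neg"),
           ("s3", 1.0, "pos")]
y_true, y_pred = evaluate_loso(samples, threshold_model)
print(per_class_metrics(y_true, y_pred, "pos"))  # precision 2/3, recall 1.0, F-1 0.8
```

Splitting by subject rather than by sample keeps one person's recordings out of both training and test sets at once, which would otherwise inflate the scores.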

Page 14: Big Data Better Life

Deployment

• Physically non-invasive

– Detecting emotional information using both online social media and passive sensors

– No specialized wired or wireless invasive sensors

– Can potentially enhance the effectiveness and quality of new services delivered online or via mobile devices in current depression patient care.

– Can incorporate other sensors

• Mobile app

– Self-awareness

– Self-management

– Informed intervention

Page 15: Big Data Better Life

Moving beyond

• Other domains

– Screening police officers, pilots, …

• Difficult conversations

– Discussing suicidal ideation and attempts in a clinical or home environment

– Physician training

– Other interview scenarios

• Monitoring interactions

– Body gestures

– Timing of interactions

– Self-awareness

– Mutual awareness

Page 16: Big Data Better Life

Deep Learning for Image Sentiment Analysis

Convolutional Neural Network for Image Sentiment Analysis

• Domain-transfer learning

• Boosted learning using noisy labels

Users who post many image tweets are more likely to express positive sentiment.

Main Contributors: Quanzeng You, Hailin Jin*, Jianchao Yang*, Jianbo Yuan and Jiebo Luo, AAAI 2015

0.5 million weakly labeled images

Page 17: Big Data Better Life

Progressive CNN

Page 18: Big Data Better Life

Progressive CNN
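The slides name the idea (progressive training with noisy labels) but not the procedure. One plausible reading, assumed here, is: train a CNN on the weakly labeled images, re-score those training images, keep only the ones whose positive/negative prediction is decisive, and fine-tune on that cleaner subset. This sketch covers only the selection step; the margin rule, the threshold, and the sample scores are hypothetical.

```python
def select_confident(predictions, threshold):
    """Keep ids whose binary sentiment prediction is decisive.

    predictions: list of (sample_id, p_positive), with p_negative = 1 - p_positive.
    A sample is kept when |p_positive - p_negative| >= threshold.
    """
    kept = []
    for sample_id, p_pos in predictions:
        margin = abs(p_pos - (1.0 - p_pos))
        if margin >= threshold:
            kept.append(sample_id)
    return kept


# Hypothetical scores from a CNN first trained on the weakly labeled images.
scores = [("img_a", 0.95), ("img_b", 0.55), ("img_c", 0.10), ("img_d", 0.50)]
print(select_confident(scores, threshold=0.5))  # ['img_a', 'img_c']
```

Ambiguous images such as img_b and img_d are the ones most likely to carry wrong weak labels, so dropping them before fine-tuning trades data volume for label quality.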

Page 19: Big Data Better Life

Experiments

Page 20: Big Data Better Life

Examples of Top Ranked Images

• Positive examples and negative examples – left to right: PCNN, CNN, Sentribute, Sentibank, GCH, LCH, GCH+BoW, LCH+BoW

Page 21: Big Data Better Life

Joint Visual-Textual Sentiment Analysis

• Cross-modality Consistent Regression

Page 22: Big Data Better Life

Building a Large Scale Dataset for Image Emotion Recognition

• We started from 3+ million weakly labeled images of different emotions and ended up with a data set manually labeled on Amazon Mechanical Turk (AMT) that is 30 times as large as the largest publicly available visual emotion data set at the time.

• We also performed extensive benchmarking on this large data set using state-of-the-art methods, including CNNs, and established a nontrivial baseline for further research by the community.

Main Contributors: Quanzeng You, Hailin Jin*, Jianchao Yang*, Jianbo Yuan and Jiebo Luo, AAAI 2016
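The slide says the weakly labeled images were manually labeled on AMT but does not spell out the protocol. A common way to turn several crowd judgments into one decision is an agreement threshold; this sketch assumes each image is shown with its weak emotion label to five workers and is kept when at least three agree. Both numbers are illustrative assumptions.

```python
def filter_by_agreement(votes, min_agree=3):
    """Return sorted image ids with at least `min_agree` 'agree' votes.

    votes: dict mapping image_id -> list of booleans, one per worker, True if the
    worker agrees that the image evokes its weakly assigned emotion.
    """
    return sorted(img for img, v in votes.items() if sum(v) >= min_agree)


votes = {
    "img1": [True, True, True, False, False],
    "img2": [True, False, False, False, True],
    "img3": [True, True, True, True, True],
}
print(filter_by_agreement(votes))  # ['img1', 'img3']
```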

Page 23: Big Data Better Life

Fine-Grained User Profiling from Multiple Social Multimedia Platforms

Main Contributors: Quanzeng You, Sumit Bhatia*, Tong Sun* and Jiebo Luo

User Expression & Behavior

(demographics & interests)

Page 24: Big Data Better Life


User Demographics & Interests

Page 25: Big Data Better Life

Computerized Identification of Autism

Main Contributors: Tristram Smith, Jiebo Luo

ASD (Autism Spectrum Disorder) is now estimated to occur in approximately 1 in 68 individuals

Currently, no laboratory test for ASD exists, and the process of diagnosing the disorder is highly complex and labor-intensive, requiring extensive expertise

Few centers offer ASD diagnostic evaluations, and these centers have lengthy waiting lists, ranging from 2 to 12 months for an initial appointment.

Waiting is not only stressful for children with ASD and their families, but it also delays their access to early intervention services, which have been shown to improve outcomes dramatically in many

Develop a computerized natural language processing system to identify ASD from de-identified, semi-structured patient records.

Items of a standard format or template

a. Parent intake questionnaire

b. Teacher questionnaire

c. Child Behavior Checklist

d. Children’s Sleep Habits Questionnaire

e. Autism Diagnostic Observation Schedule

Unstructured items

a. Records provided by primary care providers and early intervention or preschool service providers, scanned into the electronic medical record

b. Phone intake by social worker (unstructured text entered directly into the medical record)

c. Clinician report (unstructured text entered directly into the electronic record)
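The slide calls for natural language processing to identify ASD from records but does not name a model. As a swapped-in baseline, not the project's method, this sketch trains a bag-of-words multinomial Naive Bayes classifier with Laplace smoothing. The tokenizer, labels, and example snippets are invented for illustration and are not real patient records.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case whitespace tokenizer; real clinical text would need more preprocessing."""
    return text.lower().split()


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with add-alpha (Laplace) smoothing."""
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "priors": {c: n / len(labels) for c, n in Counter(labels).items()},
        "word_counts": word_counts,
        "totals": {c: sum(cnt.values()) for c, cnt in word_counts.items()},
        "vocab": vocab,
        "alpha": alpha,
    }


def predict_nb(model, doc):
    """Return the class with the highest log-posterior; words unseen in training are ignored."""
    v = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, prior in model["priors"].items():
        score = math.log(prior)
        counts, total = model["word_counts"][label], model["totals"][label]
        for tok in tokenize(doc):
            if tok in model["vocab"]:
                score += math.log((counts[tok] + alpha) / (total + alpha * v))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy snippets for illustration only, not real patient records.
docs = ["limited eye contact repetitive behaviors",
        "delayed speech repetitive play",
        "typical play good eye contact",
        "age appropriate speech typical"]
labels = ["asd", "asd", "non_asd", "non_asd"]
model = train_nb(docs, labels)
print(predict_nb(model, "repetitive behaviors noted"))  # asd
```

Real records mix standardized questionnaires, scanned documents, and free text, so a practical system would likely need document-type-specific preprocessing and far more training data than this toy.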

Page 26: Big Data Better Life

Understanding the Pulse of Our Society

• Social interactions and social activities
• Public health surveillance
• Web sentiment analysis and trend prediction
• Cyber terrorism, extremism, and activism
• Fads and infectious ideas
• Marketing intelligence analytics
• Traffic and human mobility patterns
• Human and environment
• Social unrest, protest and riot

Page 27: Big Data Better Life

Social Multimedia‐based Prediction of Elections

Prediction for the swing states in the 2012 US Presidential Election

Social images can act like a prism to reveal split public opinions

Main Contributors: Quanzeng You, Liangliang Cao*, Junhuan Zhu, John R. Smith* and Jiebo Luo, IEEE Trans. Multimedia, 2015

Competitive Vector Auto Regression

Textual and Visual Sentiment

Negative Campaign

Page 28: Big Data Better Life

Fine‐Grained Analysis of the 2016 Election

Page 29: Big Data Better Life

America Tweets China: Analysis of State and Individual Characteristics Regarding Attitudes towards China

Main Contributors: Yu Wang and Jiebo Luo, IEEE Big Data Conference, 2015

Page 30: Big Data Better Life

Correlation coefficients of textual sentiment and visual sentiment

News Media versus Social Media


Main Contributors: Quanzeng You and Jiebo Luo

Examples of Image Tweets

Page 31: Big Data Better Life

Home Location from Tweets & Urban Computing

Bad headache, no school today!

Page 32: Big Data Better Life
Page 33: Big Data Better Life
Page 34: Big Data Better Life
Page 35: Big Data Better Life

Correlations between Health and Other Factors

Page 36: Big Data Better Life

Towards Lifestyle Understanding: Predicting Home and Vacation Locations from User’s Online Photo Collections


Page 37: Big Data Better Life

• 1,000 Flickr users from the following populous areas throughout the continental US

– Chicago, Boston, Austin, Columbus, Washington DC, Denver, Houston, Los Angeles, Salt Lake City, the greater NYC Area, the Bay Area, Phoenix, San Antonio, and Seattle

• 423,047 geotagged photos in a one-year time span

Data

Page 38: Big Data Better Life

Temporal (accuracy 66%):

• High check-in rate (accuracy 58%)

• Time prevalence: photos at home can be taken at any time of day, any day of the week, and any month of the year

• Monthly rate: more photos taken at home during December

Visual (accuracy 64%):

• Kitchen/living room/bedroom scenes…

30,000 manually labeled real-life photos + CNN classifier

Home Location Prediction

Page 39: Big Data Better Life

Visual and temporal features provide complementary information

The fused home predictor achieves a high accuracy of 71% with a 70.7-meter error distance

Home Prediction Result
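The home-prediction result combines temporal and visual evidence and reports an error distance in meters. This sketch shows one hypothetical way to do both: a weighted late fusion of per-candidate scores, followed by the haversine great-circle distance between the predicted and true home. The fusion weight, candidate coordinates, and scores are assumptions; the slides do not say how the predictors were fused or whether the 70.7-meter figure is a mean or a median.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fuse_scores(temporal, visual, w=0.5):
    """Late fusion: weighted average of per-candidate scores (dict: location -> score)."""
    candidates = set(temporal) | set(visual)
    return {loc: w * temporal.get(loc, 0.0) + (1 - w) * visual.get(loc, 0.0)
            for loc in candidates}


# Hypothetical candidate home locations (lat, lon) with scores from each predictor.
temporal = {(43.1566, -77.6088): 0.6, (43.1300, -77.6300): 0.4}
visual = {(43.1566, -77.6088): 0.3, (43.1300, -77.6300): 0.7}
fused = fuse_scores(temporal, visual)
predicted = max(fused, key=fused.get)
true_home = (43.1305, -77.6295)
print(predicted, round(haversine_m(*predicted, *true_home)))  # error distance in meters
```

Here the visual evidence overturns the temporal predictor's top choice, which is the kind of complementarity the slide describes.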

Page 40: Big Data Better Life
Page 41: Big Data Better Life

35 categories of vacation photos from the SUN/Places database (ocean, basilica, forest, canyon, harbor, desert, …)

For each (user, loc), compute a 35-dimensional vector representing the probability that this location belongs to category i

Naive Bayes Classifier

Precision = 0.73, Recall = 0.59, F-score = 0.65

Deep Network
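For each (user, location) pair, the slide builds a 35-dimensional category-probability vector before Naive Bayes classification. This sketch assumes, hypothetically, that the vector is the average of per-photo scene-classifier outputs at that location, and uses three toy categories instead of 35. It also checks the reported F-score as the harmonic mean of the reported precision and recall.

```python
def location_category_vector(photo_probs):
    """Average per-photo scene-category probability vectors for one (user, location) pair.

    photo_probs: list of equal-length probability vectors (35 entries in the slide's setup).
    """
    n = len(photo_probs)
    return [sum(p[i] for p in photo_probs) / n for i in range(len(photo_probs[0]))]


def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


# Toy 3-category example (e.g. ocean, forest, desert) for two photos at one location.
print(location_category_vector([[0.2, 0.8, 0.0], [0.6, 0.2, 0.2]]))
print(round(f_score(0.73, 0.59), 2))  # 0.65, consistent with the reported result
```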

Page 42: Big Data Better Life

Vacation locations can be multiple (vacation 1, vacation 2, …)

• Spatial: away from home, say >100 miles

• Temporal: 

1) once or twice a year

2) burst of lots of photos within a few days

3) peak season & off season

• Visual: natural scenes, beach scenes, building scenes, etc. 

Vacation Location Prediction
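The vacation rules above combine a spatial threshold with temporal cues. This sketch implements two of them under stated assumptions: the >100-mile rule from the slide, and a "burst" rule that groups photo timestamps (in days) separated by at most one day and keeps groups with many photos in a short span. The gap, count, and span thresholds are hypothetical, and the seasonal and visual cues are not modeled.

```python
def find_bursts(days, max_gap=1.0, min_photos=10, max_span=7.0):
    """Split sorted photo timestamps (in days) into runs separated by gaps > max_gap.

    Returns (start, end, count) for runs with at least min_photos photos spanning at most
    max_span days, i.e. "a burst of lots of photos within a few days".
    """
    if not days:
        return []
    runs, current = [], [days[0]]
    for d in days[1:]:
        if d - current[-1] <= max_gap:
            current.append(d)
        else:
            runs.append(current)
            current = [d]
    runs.append(current)
    return [(r[0], r[-1], len(r)) for r in runs
            if len(r) >= min_photos and r[-1] - r[0] <= max_span]


def is_vacation_candidate(miles_from_home, days, min_miles=100.0, **burst_kwargs):
    """Spatial rule (far from home) combined with the temporal burst rule."""
    return miles_from_home > min_miles and bool(find_bursts(days, **burst_kwargs))


days = [10.0, 10.2, 10.5, 11.0, 11.3, 11.9, 12.4, 12.8, 13.1, 13.5, 200.0, 300.0]
print(find_bursts(days))                 # [(10.0, 13.5, 10)]
print(is_vacation_candidate(350, days))  # True
print(is_vacation_candidate(20, days))   # False
```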

Page 43: Big Data Better Life

Spatiotemporal: AUC = 0.781, precision = recall = 0.468

Visual: AUC = 0.787, precision = recall = 0.507

Fused vacation predictor: AUC = 0.854, precision = recall = 0.594

Vacation Location Prediction Results
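Each predictor above is reported with an AUC and a single value for precision = recall. This sketch computes AUC as the probability that a randomly chosen positive location outscores a randomly chosen negative one. Precision and recall coincide, for example, when exactly as many locations are predicted positive as there are true vacation locations; whether the authors used that cut-off is an assumption. Scores and labels are toy values.

```python
def auc(scores, labels):
    """ROC AUC as the probability that a random positive outscores a random negative.

    labels are 1 (vacation location) or 0; ties count as half a win. O(P * N) for clarity.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def precision_at_r(scores, labels):
    """Precision when the top-R candidates are predicted positive, R = number of positives.

    With exactly R predictions, precision and recall are both hits / R.
    """
    r = sum(labels)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(y for _, y in ranked[:r])
    return hits / r


scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(auc(scores, labels))             # 0.75
print(precision_at_r(scores, labels))  # 0.5
```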

Page 44: Big Data Better Life

Using Social Multimedia to Solve Social Problems

Main Contributors: Ran Pang, Jiebo Luo, and Henry Kautz

Drinking Levels among Youth

The CDC 2011 Youth Risk Behavior Survey found that among high school students, during the past 30 days:

• 39% drank some amount of alcohol.
• 22% binge drank.
• 8% drove after drinking alcohol.
• 24% rode with a driver who had been drinking alcohol.

Consequences of Underage Drinking

• School problems, such as higher absence and poor or failing grades.
• Social problems, such as fighting, physical and sexual assault.
• Legal problems, such as arrest for driving or physically hurting someone while drunk.
• Physical problems, such as hangovers or illnesses.
• Unwanted, unplanned, and unprotected sexual activity.
• Higher risk for suicide and homicide.
• Alcohol-related car crashes and other unintentional injuries.
• Abuse of other drugs.

Page 45: Big Data Better Life

Using Social Multimedia to Solve Social Problems

Main Contributors: Ran Pang, Jiebo Luo, and Henry Kautz

Social Multimedia

Visual Data

Textual Data

Computer Vision

NLP

User Demographics

User Activities

Behavior Patterns

Page 46: Big Data Better Life

Time Pattern of Underage Alcohol Use

Main Contributors: Ran Pang, Jiebo Luo, and Henry Kautz

NYC

ALL

Page 47: Big Data Better Life

Brand Influence in Underage Alcohol Use

Main Contributors: Ran Pang, Jiebo Luo, and Henry Kautz

Page 48: Big Data Better Life

EXPERIMENTS (3): Youth Exposure to Alcohol Media

              Vodka 1  Vodka 2  Champagne  Beer 1  Beer 2
Young Male    6.43%    6.79%    6.10%      13.21%  10.95%
Adult Male    29.69%   42.16%   24.27%     52.41%  51.91%
Young Female  19.76%   15.12%   19.49%     11.58%  12.17%
Adult Female  44.12%   35.93%   50.14%     22.79%  24.97%

Page 49: Big Data Better Life

• Mining deeper-level patterns in terms of factors such as family income, rural vs. urban, coastal vs. heartland regions, as well as social influence by peers in the social networks

• Combining the proposed approach with surveys, which can be used to verify the findings from social media data mining.

• Applying this methodology to other social problems that involve youth, such as tobacco, drugs, teen pregnancy, unsafe sex, unsafe driving, obesity, stress, and depression.

ONGOING DIRECTIONS

Page 50: Big Data Better Life

Drug Image Classification

• Fine-tuned CNN
• Starting with the pre-trained VGG Net
• Fine-tuned CNN features + SVM

• Using noisy data downloaded from Google

• Fine-tuned data statistics
• Instagram photos

Label   Pills  Bottle  Weed  Total (drug)  Non-drug
Count   2421   1233    675   4329          12253

Main Contributors: Xitong Yang, Meredith McCarron, Lacey Kelly, Jiebo Luo

Page 51: Big Data Better Life

● Cafe: WineBar

● Neighborhood: Washington Heights

● Downtown Manhattan: 5th ave

● Barber Shop: El Vaye Barber & Beauty Salon

● Apartment and Neighborhood

● Private Club: Brentwood Country Club

● Restaurant (a lot): House of Blues Los Angeles

● Los Angeles International Airport (LAX)

Page 52: Big Data Better Life

Drug Use Patterns from Instagram

Main Contributors: Yiheng Zhou, Numair Sani, Jiebo Luo

Page 53: Big Data Better Life

Understanding Pets and Happiness

Main Contributors: Yucheng Wu, Ran Pang, Jiebo Luo

Social support is critical for psychological and physical well‐being, reflecting the centrality of belongingness in our lives. Human interactions often provide people with considerable social support, but can pets also fulfill one's social needs? 

Studies of a community sample found that pet owners fared better on several well-being measures (e.g., greater self-esteem, more exercise) and individual-difference measures (e.g., greater conscientiousness, less fearful attachment).

We intend to verify such findings at a larger scale and potentially at a fine granularity, through social multimedia. We will set up an experimental group and a control group.

Page 54: Big Data Better Life

The Totem Pole of Happiness

Main Contributors: Xuefeng Peng, Kevin Chi, Jiebo Luo

Page 55: Big Data Better Life

From Catwalk to Main Street

Main Contributors: Kezhen Chen, Kuan-Ting Chen*, Peizhong Cong, Winston Hsu, Jiebo Luo (MM ’15 Grand Challenge)

Motivations

• A growing number of people pay attention to fashion, and the general public tends to emulate what big-city residents and celebrities wear

• Investigating fashion trends is of great interest to the industry and academia because of the potential for boosting many emerging applications, such as clothing recommendation, advertising by clothing brand association, etc. 

Approach

1. Constructing a large dataset from the New York Fashion Shows and New York street chic in order to understand the likely clothing fashion trends in New York

2. Utilizing a learning‐based approach to discover fashion attributes as the representative characteristics of fashion trends, and

3. Comparing the analysis results from the New York Fashion Shows and street‐chic images to verify whether the fashion shows have actual influence on the public in New York City.

Page 56: Big Data Better Life

From Catwalk to Main Street

Main Contributors: Kezhen Chen, Kuan-Ting Chen*, Peizhong Cong, Jiebo Luo

Page 57: Big Data Better Life


Two Most Important Social Signals (IMO)

• User
• Sentiment

Events

Page 58: Big Data Better Life

Project Janus

Visual Intelligence & Social Multimedia Analytics

Interest, Personality, Sentiment, Behavior, Life, Work, Health, Happiness.

Page 59: Big Data Better Life


A Selfie is Worth a Thousand Words: Mining Deep Personal Patterns behind User Selfie Behaviors

WWW 2017

Page 60: Big Data Better Life


When Fashion Meets Big Data: Discriminative Mining of Best Selling Clothing Features

WWW 2017

Page 61: Big Data Better Life


Page 62: Big Data Better Life

Big cities vs. Small cities

1. Different mobility patterns?

2. Exciting vs. Routine?

3. Stressful vs. Relaxed?

4. Fast vs. Slow?

Geo-tagged social media makes it possible to understand various life styles in different cities at scale

Page 63: Big Data Better Life

Data‐Driven Lifestyle Patterns

Page 64: Big Data Better Life

Data‐Driven Lifestyle Patterns

Page 65: Big Data Better Life

Human Mobility and Human-Environment Interaction

Main Contributors: Yuncheng Li, Jifei Huang, and Jiebo Luo (ICIMCS 2015)

Geotagged tweets

[Figure panels: morning and evening rush hours; haze; dehazed]

Page 66: Big Data Better Life

Experiments: Metrics

• Spearman correlation coefficients

– Rank correlation

• Haze level

– Ordinal data

– Sign is irrelevant

• The metric

– Absolute Spearman coefficients
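The metric above is the absolute Spearman coefficient. Spearman's rho is the Pearson correlation of the two rank vectors, with tied values sharing the average rank, which suits ordinal haze levels; taking the absolute value reflects the note that the sign is irrelevant. The pairing with tweet counts and all numbers below are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither series is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def abs_spearman(x, y):
    """Absolute Spearman coefficient: |Pearson correlation of the rank vectors|."""
    return abs(pearson(average_ranks(x), average_ranks(y)))


haze_level = [0, 1, 1, 2, 3]         # ordinal haze levels (hypothetical)
tweet_volume = [50, 45, 40, 30, 10]  # hypothetical per-day tweet counts
print(average_ranks(haze_level))     # [1.0, 2.5, 2.5, 4.0, 5.0]
print(abs_spearman(haze_level, tweet_volume))
```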

Page 67: Big Data Better Life

When Do Luxury Cars Hit the Road?

Page 68: Big Data Better Life

Promoting STEM Education via Social Media

Main Contributors: Lee Murphy, Kelly He, Jiebo Luo

The data show both that many sectors of the U.S. economy are facing a shortage of STEM talent and that foreign‐born STEM workers currently in the workforce are complementing, not displacing, their U.S. counterparts.

Page 69: Big Data Better Life

Promoting STEM Education via Social Media

Main Contributors: Lee Murphy, Kelly He, Jiebo Luo

Building Powerful Connections: Identify college students among Twitter users

Identify those who are “STEM champions”

Identify diverse STEM role models from LinkedIn

Match a given STEM potential to STEM champions and/or Role Models by proximity (minimum degrees of separation) 
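Matching by minimum degrees of separation amounts to a shortest-path hop count in an unweighted social graph, which breadth-first search computes. This sketch assumes an undirected adjacency dictionary; the names, the graph, and the ranking helper are hypothetical, since the slides do not describe how the graph is built from Twitter and LinkedIn.

```python
from collections import deque


def degrees_of_separation(graph, source, target):
    """Breadth-first search over an undirected adjacency dict; returns hop count or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr == target:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None  # target is unreachable


def best_matches(graph, student, champions):
    """Rank reachable champions by degrees of separation from the student (closest first)."""
    dists = {c: degrees_of_separation(graph, student, c) for c in champions}
    reachable = [(d, c) for c, d in dists.items() if d is not None]
    return [c for _, c in sorted(reachable)]


graph = {
    "alice": ["bob"],
    "bob": ["alice", "carol", "dan"],
    "carol": ["bob", "erin"],
    "dan": ["bob"],
    "erin": ["carol"],
    "frank": [],
}
print(best_matches(graph, "alice", ["erin", "dan", "frank"]))  # ['dan', 'erin']
```

BFS visits nodes in order of increasing hop count, so the first time it reaches the target the distance is already minimal; weighted ties (e.g. interaction strength) would call for Dijkstra's algorithm instead.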

Page 70: Big Data Better Life

• Analyzing grocery shopping baskets

– Wegmans Shopper’s Club member data

– Mining rich data from shopping baskets

• Nutrition break-down and tracking
• Recommendation: organic foods; healthy alternatives
• Customer profiling; lifestyle changes
• Health management (linked to social media, exercise apps)

• Analyzing family food images

– Undocumented immigrant workers who are illiterate

– Daily capture of food intake images

– Nutrition analysis and healthy eating suggestions

Using Big Consumer Data to Promote Healthy Lifestyles

PIs: Jiebo Luo (PhD), Henry Kautz (PhD), Karen Stein (PhD/RN)

Page 71: Big Data Better Life

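Big Data. Robust Intelligence. Better Lives.

Robust Intelligence

Big Data

Better Life

Better informed. Better served. Better lived.

Big data, big image data

Better knowledge, better decisions

Clothing, food, housing, transportation; consumption, learning, work, friendship; emotion, happiness, health, education (衣,食,住,行; 用,学,作,友; 情,乐,健,育)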