Big Data Better Life

Post on 12-Apr-2017

Big Data. Better Life.

Jiebo Luo
Department of Computer Science
University of Rochester

Rochester

The most important part of US

The 5 Vs of Big Data

April 10, 2017

Why Big Image/Multimodal Data?

• Big Image Data is the Biggest
– An estimated 4 trillion photographs in the world
– Facebook alone reports 6 billion photo uploads per month. Every minute, 72 hours of video are uploaded to YouTube. Cisco estimates that visual data (photos and video) will account for over 85% of total internet traffic.

• Big Image Data is Harder
– Visual data is difficult to handle: unlike text, which is clean, segmented, compact, one-dimensional, and indexable, visual content is noisy, unsegmented, high-entropy, and often multi-dimensional.
– The solution may be to add more and more data. Once we put a lot of data in the system, even basic distance metrics (applied on patches) start making a lot of sense.
– Contextual non-image data can help -> multimodal data analytics

• Big Image/Multimodal Data Analytics Can Be Rewarding!

[Diagram: Big Data, Machine Learning, Computer Vision, and Data Mining, under the themes "Make Computers See" and "Let Data Speak", with the following research topics]

• Surveillance video analytics
• Surgery video analysis
• Medical image analysis
• Non-contact sensing
• 3D scene model
• Augmented photography
• Image geolocation
• Visual recognition using weakly labeled big image data (people, object, action, event, activity)
• Social media summarization
• Media-driven recommendation
• Cultural influence on social media
• Crowd-sourced learning
• Data analytics for healthcare
• Nowcasting and forecasting
• Multimodal sentiment/affect analysis
• Deep user profiling & demographics
• Individual or group behaviors
• Wisdom of social multimedia

Ancient Medicine: Look, Listen, Question, Feel

Sensing from a Distance

[John is holding a gun to his head]
Terminator: You cannot self-terminate.
John Connor: No, you can't. I can do anything I want. I'm a human being, not some god-damn robot.
Terminator: [correcting him] Cybernetic organism.
John Connor: Whatever! Either we go, and save her Dad, or so much for the Great John Connor. Because your future, my destiny, I want no part in it, I never did.
Terminator: Based on your pupil dilation, skin temperature, and motor functions, I calculate an 83% probability that you will not pull the trigger.

Tackling Mental Health

• Motivation
– Mental health is a significant problem on the rise, with reports of anxiety, stress, depression, suicide, and violence
– Mental illness has been and remains a major cause of disability, dysfunction, and even violence and crime

• Challenges
– Traditional methods of monitoring mental health are expensive, intrusive, and often geared toward serious mental disorders
– These methods do not scale to a large population of varying demographics, and are not particularly designed for those in the early stages of developing mental health problems

• Opportunities
– Advances in computer vision and machine learning, coupled with the widespread use of the Internet and adoption of social media, are opening doors for a new approach to tackling mental health using physically noninvasive, low-cost multimodal sensors already in people’s daily lives

Tackling Mental Health Via Multimodal Sensing
Dawei Zhou, Jiebo Luo, Vincent Silenzio*, Yun Zhou, Jile Hu, Glenn Currier*, Henry Kautz, AAAI-2015

Innovation

• Extracting fine-grained psycho-behavioral signals that reflect the mental state of the subject from imagery unobtrusively captured by the webcams built in most mobile devices (laptops, tablets, and smartphones). We develop robust computer vision algorithms to monitor real-time psycho-behavioral signals including the heart rate, eye blink rate, pupil variations, head movements, and facial expressions of the users.

• Analyzing effects from personal social media stream data, which may reveal the mood and sentiment of its users. We measure the mood and emotion of the subject from the social media posted by the subject as a prelude to assessing the effects of social contacts and context within such media.

• Establishing the connection between mental health and multimodal signals extracted unobtrusively from social media and webcams using machine learning methods.

Multimodal (Weak) Signals

[Figure: head movement rate vs. time (min) for positive, neutral, and negative moods]

[Figure: pupil diameter vs. time (min) for positive, neutral, and negative moods]

[Figure: mouse wheel, mouse moving distance, mouse click, and key stroke counts for positive, neutral, and negative moods]

Pattern Classification and Mining

Experiments

• Experiment I
– 27 participants (16 females and 11 males), including undergraduate students, PhD students, and faculty, with different backgrounds in terms of education, income, and disciplines. The age of the participants ranges from 19 to 33, consistent with the age of the primary users of social media.

• Experiment II
– 5 depression patients (2 severe/suicidal and 3 moderate) and a control group of five normal users. FaceTime with doctors.

Table 3. Leave-One-Subject-Out Test for Experiment 1.

          TP    FP    Prec.  Rec.  F-1   AUC
Negative  0.89  0.08  0.82   0.89  0.84  0.95
Neutral   0.56  0.13  0.67   0.56  0.59  0.79
Positive  0.78  0.17  0.76   0.78  0.75  0.91
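A leave-one-subject-out test holds out all samples from one participant at a time and trains on everyone else, so a model is never evaluated on a person it has already seen. The sketch below shows only the splitting step in plain Python; the function name and the example subject IDs are hypothetical, and the slides do not say how the authors implemented it.

```python
from collections import defaultdict


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), once per subject."""
    by_subject = defaultdict(list)
    for index, subject in enumerate(subject_ids):
        by_subject[subject].append(index)
    for held_out in sorted(by_subject):
        test = by_subject[held_out]
        # Training data is every sample that does not belong to the held-out subject.
        train = sorted(
            i for subject, indices in by_subject.items()
            if subject != held_out
            for i in indices
        )
        yield held_out, train, test


# Hypothetical example: six recorded sessions from three participants.
subjects = ["p1", "p1", "p2", "p3", "p3", "p2"]
splits = list(leave_one_subject_out(subjects))
for held_out, train, test in splits:
    print(held_out, train, test)
```

Per-class metrics such as those in the table would then be computed on the pooled or averaged held-out predictions.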

Table 4. Leave-One-Subject-Out Test for Experiment 2.

           Patients vs. control  Patients vs. control  Patients vs. control
           in positive mood      in negative mood      in neutral mood
Precision  0.814                 0.817                 0.813
Recall     0.674                 0.738                 0.717

Deployment

• Physically non-invasive
– Detecting emotional information using both online social media and passive sensors
– No specialized wired or wireless invasive sensors
– Can potentially enhance the effectiveness and quality of new services delivered online or via mobile devices in current depression patient care.
– Can incorporate other sensors

• Mobile app
– Self-awareness
– Self-management
– Informed intervention

Moving beyond

• Other domains
– Screening police, pilots, …

• Difficult conversations
– Discussing suicide ideation and attempt in a clinical or home environment
– Physician training
– Other interview scenarios

• Monitoring interactions
– Body gestures
– Timing of interactions
– Self-awareness
– Mutual-awareness

Deep Learning for Image Sentiment Analysis

Convolutional Neural Network for Image Sentiment Analysis
• Domain-transfer learning
• Boosted learning using noisy labels

Users who post many image tweets are more likely to have positive sentiments.

Main Contributors: Quanzeng You, Hailin Jin*, Jianchao Yang*, Jianbo Yuan and  Jiebo Luo, AAAI 2015

0.5 Million Weakly labelled Images

Progressive CNN

Experiments

Examples of Top Ranked Images

• Positive examples and negative examples
– Left to right: PCNN, CNN, Sentribute, Sentibank, GCH, LCH, GCH+BoW, LCH+BoW

Joint Visual-Textual Sentiment Analysis
• Cross-modality Consistent Regression

Building a Large Scale Dataset for Image Emotion Recognition

• We started from 3+ million weakly labeled images of different emotions and ended up with an AMT manually labeled data set that is 30 times as large as the current largest publicly available visual emotion data set. 

• We also performed extensive benchmarking analyses on this large data set using state-of-the-art methods, including CNNs, and established a nontrivial baseline for further research by the community.

Main Contributors: Quanzeng You, Hailin Jin*, Jianchao Yang*, Jianbo Yuan and  Jiebo Luo, AAAI 2016

Fine-Grained User Profiling from Multiple Social Multimedia Platforms
Main Contributors: Quanzeng You, Sumit Bhatia*, Tong Sun* and Jiebo Luo

User Expression & Behavior

(demographics & interests)

User Demographics & Interests

Computerized Identification of Autism
Main Contributors: Tristram Smith, Jiebo Luo

ASD (Autism Spectrum Disorder) is now estimated to occur in approximately 1 in 68 individuals

Currently, no laboratory test for ASD exists, and the process of diagnosing the disorder is highly complex and labor-intensive, requiring extensive expertise

Few centers offer ASD diagnostic evaluations, and these centers have lengthy waiting lists, ranging from 2-12 months for an initial appointment.

Waiting is not only stressful for children with ASD and their families, but it delays their access to early intervention services, which have been shown to improve outcomes dramatically in many

Develop a computerized system of processing natural language to identify ASD from de-identified, semi-structured patient records.

Items of a standard format or template

a. Parent intake questionnaire

b. Teacher questionnaire

c. Child Behavior Checklist

d. Children’s Sleep Habits Questionnaire

e. Autism Diagnostic Observation Schedule

Unstructured items

a. Records provided by primary care providers and early intervention or preschool service providers, scanned into the electronic medical record

b. Phone intake by social worker (unstructured text entered directly into the medical record)

c. Clinician report (unstructured text entered directly into the electronic record)

Understanding the Pulse of Our Society
• Social interactions and social activities
• Public health surveillance
• Web sentiment analysis and trend prediction
• Cyber terrorism, extremism, and activism
• Fads and infectious ideas
• Marketing intelligence analytics
• Traffic and human mobility patterns
• Human and environment
• Social unrest, protest and riot

Social Multimedia‐based Prediction of Elections

Prediction for the swing states  in 2012 US Presidential Election

Social images can act like a prism to reveal split public opinions

Main Contributors: Quanzeng You, Liangliang Cao*, Junhuan Zhu, John R. Smith* and Jiebo Luo, IEEE Trans. Multimedia, 2015

Competitive Vector Auto Regression

Textual and Visual Sentiment

Negative Campaign

Fine‐Grained Analysis of the 2016 Election

America Tweets China: Analysis of State and Individual Characteristics Regarding Attitudes towards China

Main Contributors: Yu Wang and Jiebo Luo, IEEE Big Data Conference, 2015

Correlation coefficients of textual sentiment and visual sentiment

News Media versus Social Media

Users who post many image tweets are more likely to have positive sentiments.

Main Contributors: Quanzeng You and Jiebo Luo

Examples of Image Tweets

Home Location from Tweets & Urban Computing

Bad headache, no school today!

Correlations between Health and Other Factors

Towards Lifestyle Understanding: Predicting Home and Vacation Locations from User’s Online Photo Collections

• 1000 Flickr users from the following populous areas throughout the continental US
– Chicago, Boston, Austin, Columbus, Washington DC, Denver, Houston, Los Angeles, Salt Lake City, the greater NYC Area, the Bay Area, Phoenix, San Antonio, and Seattle

• 423,047 geotagged photos in a one-year time span

Data

Temporal (accuracy 66%):
• High check-in rate (accuracy 58%)
• Time prevalence: photos at home can be taken at any time of day, any day of the week, and any month of the year
• Monthly rate: more photos taken at home during December

Visual (accuracy 64%):
• Kitchen/living room/bedroom scenes…
• 30,000 manually labeled real-life photos + CNN classifier

Home Location Prediction

The visual and temporal features provide complementary information to each other

Fused home predictor achieves a high accuracy of 71% with 70.7-meter error distance

Home Prediction Result

35 categories of vacation photos from the SUN/Places Database (ocean, basilica, forest, canyon, harbor, desert, …)

For each (user, loc), compute a 35-dimensional vector whose i-th entry is the probability that this location is of category i
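The slides do not say how per-photo classifier outputs are combined into one vector per (user, loc) pair. A minimal sketch, assuming each photo already has a probability distribution over the scene categories and that the location vector is simply their average; the function name is hypothetical, and three categories are used in the example only to keep it short.

```python
def location_category_vector(photo_probabilities, num_categories=35):
    """Average per-photo category probabilities into one vector for a location.

    photo_probabilities: one list of length num_categories per photo,
    e.g. the softmax output of a scene classifier (an assumption here).
    """
    if not photo_probabilities:
        raise ValueError("at least one photo is required")
    totals = [0.0] * num_categories
    for probabilities in photo_probabilities:
        if len(probabilities) != num_categories:
            raise ValueError("each photo needs one probability per category")
        for i, p in enumerate(probabilities):
            totals[i] += p
    count = len(photo_probabilities)
    return [total / count for total in totals]


# Hypothetical example with 3 categories instead of 35.
vector = location_category_vector(
    [[0.5, 0.5, 0.0], [1.0, 0.0, 0.0]],
    num_categories=3,
)
print(vector)  # [0.75, 0.25, 0.0]
```

Vectors of this kind could then serve as the input features of a classifier such as Naive Bayes.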

Naive Bayes Classifier

Precision = 0.73, Recall = 0.59, f score = 0.65
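The reported F score is consistent with the usual F1 definition, the harmonic mean of precision and recall. A quick check in Python (the function name is ours, not from the slides):

```python
def f_score(precision, recall):
    """F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


naive_bayes_f = f_score(0.73, 0.59)
print(round(naive_bayes_f, 2))  # 0.65
```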

Deep Network

Vacation locations can be multiple (vacation 1, vacation 2, …)
• Spatial: away from home, say >100 miles
• Temporal:
1) once or twice a year
2) burst of lots of photos within a few days
3) peak season & off season
• Visual: natural scenes, beach scenes, building scenes, etc.
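The spatial and burst cues above can be sketched as a simple candidate filter. This is an illustration only: the 100-mile threshold comes from the slide, while the burst window (5 days), the minimum photo count (10), and all function names are assumptions, and the actual system learns from features rather than applying hard rules.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def is_vacation_candidate(home, location, photo_days,
                          min_miles=100.0, burst_days=5, min_photos=10):
    """Spatial cue: far from home. Temporal cue: many photos within a few days.

    home, location: (lat, lon) tuples; photo_days: day-of-year of each photo.
    """
    if haversine_miles(*home, *location) <= min_miles:
        return False
    days = sorted(photo_days)
    # Slide a window over the sorted days: any run of min_photos photos
    # spanning fewer than burst_days days counts as a burst.
    for start in range(len(days) - min_photos + 1):
        if days[start + min_photos - 1] - days[start] < burst_days:
            return True
    return False


# Hypothetical user living in Rochester, NY with a photo burst in Orlando, FL.
home = (43.1566, -77.6088)
orlando = (28.5383, -81.3792)
burst = [200] * 6 + [201] * 6
print(is_vacation_candidate(home, orlando, burst))
```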

Vacation Location Prediction

Spatiotemporal: AUC = 0.781, precision = recall = 0.468
Visual: AUC = 0.787, precision = recall = 0.507
Fused vacation predictor: AUC = 0.854, precision = recall = 0.594

Vacation Location Prediction Results

Using Social Multimedia to Solve Social Problems
Main Contributors: Ran Pang, Jiebo Luo, and Henry Kautz

Drinking Levels among Youth
The CDC 2011 Youth Risk Behavior Survey found that among high school students, during the past 30 days:
• 39% drank some amount of alcohol.
• 22% binge drank.
• 8% drove after drinking alcohol.
• 24% rode with a driver who had been drinking alcohol.

Consequences of Underage Drinking
• School problems, such as higher absence and poor or failing grades.
• Social problems, such as fighting, physical and sexual assault.
• Legal problems, such as arrest for driving or physically hurting someone while drunk.
• Physical problems, such as hangovers or illnesses.
• Unwanted, unplanned, and unprotected sexual activity.
• Higher risk for suicide and homicide.
• Alcohol-related car crashes and other unintentional injuries.
• Abuse of other drugs.

Using Social Multimedia to Solve Social Problems
Main Contributors: Ran Pang, Jiebo Luo, and Henry Kautz

Social Multimedia

Visual Data

Textual Data

Computer Vision

NLP

User Demographics

User Activities

Behavior Patterns

Time Pattern of Underage Alcohol Use
Main Contributors: Ran Pang, Jiebo Luo, and Henry Kautz

NYC

ALL

Brand Influence in Underage Alcohol Use
Main Contributors: Ran Pang, Jiebo Luo, and Henry Kautz

              Vodka 1  Vodka 2  Champagne  Beer 1  Beer 2
Young Male    6.43%    6.79%    6.10%      13.21%  10.95%
Adult Male    29.69%   42.16%   24.27%     52.41%  51.91%
Young Female  19.76%   15.12%   19.49%     11.58%  12.17%
Adult Female  44.12%   35.93%   50.14%     22.79%  24.97%

EXPERIMENTS (3): Youth Exposure to Alcohol Media

• Mining deeper level patterns in terms of factors such as family income, rural vs. urban, coastal vs. heartland regions, as well as social influence by peers in the social networks

• Combining the proposed approach with surveys, which can be used to verify the findings from social media data mining.

• Applying this methodology to other social problems that involve youth, such as tobacco, drugs, teen pregnancy, unsafe sex, unsafe driving, obesity, stress, and depression.

ONGOING DIRECTIONS

Drug Image Classification

• Fine-tuned CNN
• Starting with the pre-trained VGG Net
• Fine-tuned CNN features + SVM

• Using noisy data downloaded from Google

• Fine-tuned data statistics
• Instagram photos

Label  Pills  Bottle  Weed  Total  Non-drug
#      2421   1233    675   4329   12253

Main Contributors: Xitong Yang, Meredith McCarron, Lacey Kelly,  Jiebo Luo

● Cafe: WineBar

● Neighborhood: Washington Heights

● Downtown Manhattan: 5th ave

● Barber Shop: El Vaye Barber & Beauty Salon

● Apartment and Neighbourhood

● Private Club: Brentwood Country Club

● Restaurant (a lot): House of Blues Los Angeles

● Los Angeles International Airport (LAX)

Drug Use Patterns from Instagram
Main Contributors: Yiheng Zhou, Numair Sani, Jiebo Luo

Understanding Pets and Happiness
Main Contributors: Yucheng Wu, Ran Pang, Jiebo Luo

Social support is critical for psychological and physical well‐being, reflecting the centrality of belongingness in our lives. Human interactions often provide people with considerable social support, but can pets also fulfill one's social needs? 

Studies found in a community sample that pet owners fared better on several well‐being (e.g., greater self‐esteem, more exercise) and individual‐difference (e.g., greater conscientiousness, less fearful attachment) measures. 

We intend to verify such findings at a larger scale and potentially at a fine granularity, through social multimedia. We will set up an experimental group and a control group.

The Totem Pole of Happiness
Main Contributors: Xuefeng Peng, Kevin Chi, Jiebo Luo

From Catwalk to Main Street
Main Contributors: Kezhen Chen, Kuan-Ting Chen*, Peizhong Cong, Winston Hsu, Jiebo Luo (MM ‘15 Grand Challenge)

Motivations
• In modern times, a growing number of people pay attention to fashion, and the masses tend to emulate what large-city residents and celebrities wear

• Investigating fashion trends is of great interest to the industry and academia because of the potential for boosting many emerging applications, such as clothing recommendation, advertising by clothing brand association, etc.

Approach
1. Constructing a large dataset from the New York Fashion Shows and New York street chic in order to understand the likely clothing fashion trends in New York

2. Utilizing a learning-based approach to discover fashion attributes as the representative characteristics of fashion trends, and

3. Comparing the analysis results from the New York Fashion Shows and street-chic images to verify whether the fashion shows have actual influence on the public in New York City.

From Catwalk to Main Street
Main Contributors: Kezhen Chen, Kuan-Ting Chen*, Peizhong Cong, Jiebo Luo

Two Most Important Social Signals (IMO)

• User
• Sentiment

Events

Project Janus

Visual Intelligence & Social Multimedia Analytics

Interest, Personality, Sentiment, Behavior, Life, Work, Health, Happiness.

A Selfie is Worth a Thousand Words: Mining Deep Personal Patterns behind User Selfie Behaviors

WWW 2017

When Fashion Meets Big Data: Discriminative Mining of Best Selling Clothing Features

WWW 2017

Big cities vs. Small cities

1. Different mobility patterns?

2. Exciting vs. Routine?

3. Stressful vs. Relaxed?

4. Fast vs. Slow?

Geo-tagged social media makes it possible to understand various life styles in different cities at scale

Data‐Driven Lifestyle Patterns

Human Mobility and Human-Environment Interaction
Main Contributors: Yuncheng Li, Jifei Huang, and Jiebo Luo (ICIMCS 2015)

Geotagged tweets

[Figures: morning and evening rush hours; haze image vs. dehazed image]

Experiments: Metrics

• Spearman correlation coefficients
– Rank correlation

• Haze level:
– Ordinal data
– Sign is irrelevant

• The metric:
– Absolute Spearman coefficients
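The metric above can be sketched in plain Python: rank both series (averaging ranks over ties, which ordinal haze levels will have), take the Pearson correlation of the ranks, and drop the sign. Function names and the example values are illustrative, not taken from the paper.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def abs_spearman(x, y):
    """Absolute Spearman coefficient: strength of monotonic association only."""
    return abs(pearson(average_ranks(x), average_ranks(y)))


# Hypothetical data: ordinal haze levels and a score that falls as haze rises.
haze_levels = [1, 2, 3, 4]
scores = [40.0, 30.0, 20.0, 10.0]
print(abs_spearman(haze_levels, scores))
```

Because the sign is discarded, a feature that decreases monotonically with haze scores as well as one that increases with it.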

When Do Luxury Cars Hit the Road?

Promoting STEM Education via Social Media
Main Contributors: Lee Murphy, Kelly He, Jiebo Luo

The data show both that many sectors of the U.S. economy are facing a shortage of STEM talent and that foreign‐born STEM workers currently in the workforce are complementing, not displacing, their U.S. counterparts.

Promoting STEM Education via Social Media
Main Contributors: Lee Murphy, Kelly He, Jiebo Luo

Building Powerful Connections:

Identify college students among Twitter users

Identify those who are “STEM champions”

Identify diverse STEM role models from LinkedIn

Match a given STEM potential to STEM champions and/or Role Models by proximity (minimum degrees of separation) 
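Matching by minimum degrees of separation amounts to a shortest-path search over the social graph. A minimal sketch using breadth-first search, assuming the graph is available as an adjacency dictionary; the graph, the node names, and both function names are hypothetical.

```python
from collections import deque


def degrees_of_separation(graph, source):
    """Breadth-first search: hop count from source to every reachable user."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def closest_mentors(graph, student, mentors):
    """Return the champions or role models at the minimum distance from student."""
    distances = degrees_of_separation(graph, student)
    reachable = [(distances[m], m) for m in mentors if m in distances]
    if not reachable:
        return []
    best = min(distance for distance, _ in reachable)
    return sorted(m for distance, m in reachable if distance == best)


# Hypothetical follower graph (undirected, stored as adjacency lists).
graph = {
    "student": ["a", "b"],
    "a": ["student", "champion1"],
    "b": ["student", "c"],
    "c": ["b", "rolemodel1"],
    "champion1": ["a"],
    "rolemodel1": ["c"],
}
matches = closest_mentors(graph, "student", {"champion1", "rolemodel1"})
print(matches)  # ['champion1']
```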

• Analyzing grocery shopping baskets
– Wegmans Shopper’s Club member data
– Mining rich data from shopping baskets
• Nutrition break-down and tracking
• Recommendation: organic foods; healthy alternatives
• Customer profiling; lifestyle changes
• Health management (linked to social media, exercise apps)

• Analyzing family food images
– Undocumented immigrant workers with illiteracy
– Daily capture of food intake images
– Nutrition analysis and healthy eating suggestions

Using Big Consumer Data to Promote Healthy Lifestyles
PIs: Jiebo Luo (PhD), Henry Kautz (PhD), Karen Stein (PhD/RN)

Big Data. Robust Intelligence. Better Lives.

Robust Intelligence

Big Data

Better Life

Better informed, better served, better lived

Big data, big image data

Better knowledge, better decision

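Clothing, food, housing, transportation; consumption, learning, work, friendship; emotion, entertainment, health, child-rearing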