Big Data Better Life
Big Data. Better Life.
Jiebo Luo
Department of Computer Science
University of Rochester
The most important part of US
The 5 Vs of Big Data
April 10, 2017
Why Big Image/Multimodal Data?
• Big Image Data is the Biggest
– An estimated 4 trillion photographs in the world
– Facebook alone reports 6 billion photo uploads per month. Every minute, 72 hours of video are uploaded to YouTube. Cisco estimates that visual data (photos and video) will account for over 85% of total internet traffic.
• Big Image Data is Harder
– Visual data is difficult to handle
– Unlike text, which is clean, segmented, compact, one-dimensional, and indexable, visual content is noisy, unsegmented, high-entropy, and often multi-dimensional.
– The solution may be to add more and more data
– Once we put a lot of data in the system, even basic distance metrics (applied on patches) start making a lot of sense
– Contextual non-image data can help -> multimodal data analytics
• Big Image/Multimodal Data Analytics Can Be Rewarding!
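The point about basic distance metrics on patches can be illustrated with a minimal sketch. It is not taken from the talk: patches are assumed to be flattened lists of pixel intensities, the toy values are made up, and the function names `ssd` and `nearest_patch` are hypothetical.

```python
def ssd(a, b):
    """Sum of squared differences between two equal-length flattened patches."""
    if len(a) != len(b):
        raise ValueError("patches must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_patch(query, database):
    """Return (index, distance) of the database patch closest to the query."""
    if not database:
        raise ValueError("database is empty")
    best = min(range(len(database)), key=lambda i: ssd(query, database[i]))
    return best, ssd(query, database[best])


# Toy 2x2 patches, flattened row by row (illustrative values only).
database = [
    [0, 0, 0, 0],
    [10, 10, 10, 10],
    [0, 10, 0, 10],
]
index, distance = nearest_patch([1, 9, 0, 10], database)
# index is 2 and distance is 2: the striped patch is the closest match.
```

With only three patches the match is coarse; the slide's argument is that the same brute-force lookup becomes more meaningful as the database grows.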
Big Data
Machine Learning
Computer Vision
Data Mining
Surveillance video analytics
Surgery video analysis
3D scene model
Augmented photography
Image geolocation
Visual recognition using weakly labeled big image data (people, object, action, event, activity)
Social media summarization
Media-driven recommendation
Cultural influence on social media
Crowd-sourced learning
Data analytics for healthcare
Nowcasting and forecasting
Multimodal sentiment/affect analysis
Deep user profiling & demographics
Individual or group behaviors
Wisdom of social multimedia
Non-contact sensing
Make Computers See
Let Data Speak
Medical image analysis
Ancient Medicine: Look, Listen, Question, Feel
Sensing from a Distance
[John is holding a gun to his head]
Terminator: You cannot self-terminate.
John Connor: No, you can't. I can do anything I want. I'm a human being, not some god-damn robot.
Terminator: [correcting him] Cybernetic organism.
John Connor: Whatever! Either we go, and save her Dad, or so much for the Great John Connor. Because your future, my destiny, I want no part in it, I never did.
Terminator: Based on your pupil dilation, skin temperature, and motor functions, I calculate an 83% probability that you will not pull the trigger.
Tackling Mental Health
• Motivation
– Mental health is a significant problem on the rise with reports of anxiety, stress, depression, suicide, and violence
– Mental illness has been and remains a major cause of disability, dysfunction, and even violence and crime
• Challenges
– Traditional methods of monitoring mental health are expensive, intrusive, and often geared toward serious mental disorders
– These methods do not scale to a large population of varying demographics, and are not particularly designed for those in the early stages of developing mental health problems
• Opportunities
– Advances in computer vision and machine learning, coupled with the widespread use of the Internet and adoption of social media, are opening doors for a new approach to tackling mental health using physically noninvasive, low-cost multimodal sensors already in people’s daily lives
Tackling Mental Health Via Multimodal Sensing
Dawei Zhou, Jiebo Luo, Vincent Silenzio*, Yun Zhou, Jile Hu, Glenn Currier*, Henry Kautz, AAAI-2015
Innovation
• Extracting fine-grained psycho-behavioral signals that reflect the mental state of the subject from imagery unobtrusively captured by the webcams built into most mobile devices (laptops, tablets, and smartphones). We develop robust computer vision algorithms to monitor real-time psycho-behavioral signals including the heart rate, eye blink rate, pupil variations, head movements, and facial expressions of the users.
• Analyzing effects from personal social media stream data, which may reveal the mood and sentiment of its users. We measure the mood and emotion of the subject from the social media posted by the subject as a prelude to assessing the effects of social contacts and context within such media.
• Establishing the connection between mental health and multimodal signals extracted unobtrusively from social media and webcams using machine learning methods.
Multimodal (Weak) Signals
[Figure: head movement rate over time (min), pupil diameter over time (min), and mouse wheel, mouse moving distance, mouse click, and key stroke activity, each shown for positive, neutral, and negative moods]
Pattern Classification and Mining
Experiments
• Experiment I
– 27 participants (16 females and 11 males), including undergraduate students, PhD students, and faculty members, with different backgrounds in terms of education, income, and disciplines. The ages of the participants range from 19 to 33, consistent with the age of the primary users of social media.
• Experiment II
– 5 depression patients (2 severe/suicidal and 3 moderate) and a control group of five normal users. FaceTime with doctors.
Table 3. Leave-One-Subject-Out Test for Experiment 1.
            TP rate   FP rate   Prec.   Rec.   F-1    AUC
Negative    0.89      0.08      0.82    0.89   0.84   0.95
Neutral     0.56      0.13      0.67    0.56   0.59   0.79
Positive    0.78      0.17      0.76    0.78   0.75   0.91
Table 4. Leave-One-Subject-Out Test for Experiment 2.
            Patients vs. control   Patients vs. control   Patients vs. control
            in positive mood       in negative mood       in neutral mood
precision   0.814                  0.817                  0.813
recall      0.674                  0.738                  0.717
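The leave-one-subject-out protocol named in Tables 3 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the helper names are hypothetical, the subject IDs and mood labels are toy values, and no classifier is trained here.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_idx, test_idx), holding out one subject per fold."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def precision_recall_f1(y_true, y_pred, positive):
    """Per-class precision, recall, and F1, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: five samples recorded from three subjects.
subject_ids = ["s1", "s1", "s2", "s3", "s3"]
splits = list(leave_one_subject_out(subject_ids))

# Toy predictions for one class ("negative" mood).
p, r, f1 = precision_recall_f1(
    ["negative", "negative", "positive", "neutral"],
    ["negative", "positive", "positive", "neutral"],
    positive="negative",
)
```

Holding out every sample of one subject at a time keeps a person's data from appearing in both training and test sets, which is why this protocol suits per-person behavioral signals.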
Deployment
• Physically non-invasive
– Detecting emotional information using both online social media and passive sensors
– No specialized wired or wireless invasive sensors
– Can potentially enhance the effectiveness and quality of new services delivered online or via mobile devices in current depression patient care.
– Can incorporate other sensors
• Mobile app
– Self-awareness
– Self-management
– Informed intervention
Moving beyond
• Other domains
– Screening police, pilots, …
• Difficult conversations
– Discussing suicide ideation and attempt in a clinical or home environment
– Physician training
– Other interview scenarios
• Monitoring interactions
– Body gestures
– Timing of interactions
– Self-awareness
– Mutual-awareness
Deep Learning for Image Sentiment Analysis
Convolutional Neural Network for Image Sentiment Analysis
• Domain-transfer Learning
• Boosted Learning using Noisy Labels
Users who like to post many image tweets are more likely to have positive sentiments.
Main Contributors: Quanzeng You, Hailin Jin*, Jianchao Yang*, Jianbo Yuan and Jiebo Luo, AAAI 2015
0.5 Million Weakly labelled Images
Progressive CNN
Experiments
Examples of Top Ranked Images
• Positive examples and negative examples
– left to right: PCNN, CNN, Sentribute, Sentibank, GCH, LCH, GCH+BoW, LCH+BoW
Joint Visual-Textual Sentiment Analysis
• Cross-modality Consistent Regression
Building a Large Scale Dataset for Image Emotion Recognition
• We started from 3+ million weakly labeled images of different emotions and ended up with an AMT manually labeled data set that is 30 times as large as the current largest publicly available visual emotion data set.
• We also performed extensive benchmarking analyses on this large data set using the state of the art methods including CNNs, and established a nontrivial baseline for further research by the community
Main Contributors: Quanzeng You, Hailin Jin*, Jianchao Yang*, Jianbo Yuan and Jiebo Luo, AAAI 2016
Fine-Grained User Profiling from Multiple Social Multimedia Platforms
Main Contributors: Quanzeng You, Sumit Bhatia*, Tong Sun* and Jiebo Luo
User Expression & Behavior
(demographics & interests)
User Demographics & Interests
Computerized Identification of Autism
Main Contributors: Tristram Smith, Jiebo Luo
ASD (Autism Spectrum Disorder) is now estimated to occur in approximately 1 in 68 individuals.
Currently, no laboratory test for ASD exists, and the process of diagnosing the disorder is highly complex and labor-intensive, requiring extensive expertise.
Few centers offer ASD diagnostic evaluations, and these centers have lengthy waiting lists, ranging from 2 to 12 months for an initial appointment.
Waiting is not only stressful for children with ASD and their families, but it also delays their access to early intervention services, which have been shown to improve outcomes dramatically in many
Develop a computerized system of processing natural language to identify ASD from de-identified, semi-structured patient records.
Items of a standard format or template
a. Parent intake questionnaire
b. Teacher questionnaire
c. Child Behavior Checklist
d. Children’s Sleep Habits Questionnaire
e. Autism Diagnostic Observation Schedule
Unstructured items
a. Records provided by primary care providers and early intervention or preschool service providers, scanned into the electronic medical record
b. Phone intake by social worker (unstructured text entered directly into the medical record)
c. Clinician report (unstructured text entered directly into the electronic record)
Understanding the Pulse of Our Society
• Social interactions and social activities
• Public health surveillance
• Web sentiment analysis and trend prediction
• Cyber terrorism, extremism, and activism
• Fads and infectious ideas
• Marketing intelligence analytics
• Traffic and human mobility patterns
• Human and environment
• Social unrest, protest and riot
Social Multimedia‐based Prediction of Elections
Prediction for the swing states in 2012 US Presidential Election
Social images can act like a prism to reveal split public opinions
Main Contributors: Quanzeng You, Liangliang Cao*, Junhuan Zhu, John R. Smith* and Jiebo Luo, IEEE Trans. Multimedia, 2015
Competitive Vector Auto Regression
Textual and Visual Sentiment
Negative Campaign
Fine‐Grained Analysis of the 2016 Election
America Tweets China: Analysis of State and Individual Characteristics Regarding Attitudes towards China
Main Contributors: Yu Wang and Jiebo Luo, IEEE Big Data Conference, 2015
Correlation coefficients of textual sentiment and visual sentiment
News Media versus Social Media
Users who like to post many image tweets are more likely to have positive sentiments.
Main Contributors: Quanzeng You and Jiebo Luo
Examples of Image Tweets
Home Location from Tweets & Urban Computing
Bad headache, no school today!
Correlations btw. Health and Other Factors
Towards Lifestyle Understanding: Predicting Home and Vacation Locations from User’s Online Photo Collections
• 1000 Flickr users from the following populous areas throughout the continental US
– Chicago, Boston, Austin, Columbus, Washington DC, Denver, Houston, Los Angeles, Salt Lake City, the greater NYC Area, the Bay Area, Phoenix, San Antonio, and Seattle
• 423,047 geotagged photos in a one-year time span
Data
Temporal (accuracy 66%):
• High check-in rate (accuracy 58%)
• Time prevalence: photos at home can be taken at any time of day, any day of the week, and any month of the year
• Monthly rate: more photos taken at home during December
Visual (accuracy 64%):
• Kitchen/living room/bedroom scenes…
30,000 manually labeled real-life photos + CNN classifier
Home Location Prediction
The visual and temporal features provide complementary information to each other
Fused home predictor achieves a high accuracy of 71% with 70.7-meter error distance
Home Prediction Result
35 categories of vacation photos from the SUN/Places Database (ocean, basilica, forest, canyon, harbor, desert, …)
For each (user, loc), compute a 35-dimensional vector whose i-th entry is the probability that this location is of category i
Naive Bayes Classifier
Precision = 0.73, Recall = 0.59, F-score = 0.65
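The reported F-score is consistent with the harmonic mean of the stated precision and recall, as the short check below shows. The second helper is an assumption: the slide does not say how the 35-dimensional location vector is computed, so averaging per-photo category probabilities is only one plausible reading, and `location_vector` is a hypothetical name.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


def location_vector(photo_probs):
    """Average per-photo category probabilities into one vector per location.

    Assumption for illustration: each photo at a (user, loc) pair already has
    a probability vector over scene categories from some classifier.
    """
    if not photo_probs:
        raise ValueError("need at least one photo")
    n = len(photo_probs)
    dims = len(photo_probs[0])
    return [sum(p[i] for p in photo_probs) / n for i in range(dims)]


reported = f_score(0.73, 0.59)  # rounds to 0.65, matching the slide

# Two photos, two categories (toy values; the talk uses 35 categories).
vec = location_vector([[0.25, 0.75], [0.75, 0.25]])
```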
Deep Network
Vacation locations can be multiple (vacation 1, vacation 2, …)
• Spatial: away from home, say >100 miles
• Temporal:
1) once or twice a year
2) burst of lots of photos within a few days
3) peak season & off season
• Visual: natural scenes, beach scenes, building scenes, etc.
Vacation Location Prediction
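The spatial cue (away from home by more than roughly 100 miles) can be sketched with a great-circle distance. This is an illustration under assumptions rather than the method from the talk: the haversine formula and the mean Earth radius are standard, but the function names, the default threshold, and the approximate city coordinates are chosen here for the example.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def is_vacation_candidate(home, place, min_miles=100.0):
    """Spatial test only: is the place farther from home than the threshold?"""
    return haversine_miles(home[0], home[1], place[0], place[1]) > min_miles


# Approximate coordinates, for illustration only.
rochester = (43.1566, -77.6088)
new_york = (40.7128, -74.0060)
far_enough = is_vacation_candidate(rochester, new_york)
```

The temporal and visual cues listed on the slide would still be needed to separate a vacation from, say, a business trip; this covers only the distance filter.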
Spatiotemporal: AUC = 0.781, precision = recall = 0.468
Visual: AUC = 0.787, precision = recall = 0.507
Fused vacation predictor: AUC = 0.854, precision = recall = 0.594
Vacation Location Prediction Results
Using Social Multimedia to Solve Social Problems
Main Contributors: Ran Pang, Jiebo Luo, and Henry Kautz
Drinking Levels among Youth
The CDC 2011 Youth Risk Behavior Survey found that among high school students, during the past 30 days:
• 39% drank some amount of alcohol.
• 22% binge drank.
• 8% drove after drinking alcohol.
• 24% rode with a driver who had been drinking alcohol.
Consequences of Underage Drinking
• School problems, such as higher absence and poor or failing grades.
• Social problems, such as fighting, physical and sexual assault.
• Legal problems, such as arrest for driving or physically hurting someone while drunk.
• Physical problems, such as hangovers or illnesses.
• Unwanted, unplanned, and unprotected sexual activity.
• Higher risk for suicide and homicide.
• Alcohol-related car crashes and other unintentional injuries.
• Abuse of other drugs.
Social Multimedia
Visual Data
Textual Data
Computer Vision
NLP
User Demographics
User Activities
Behavior Patterns
Time Pattern of Underage Alcohol Use
Main Contributors: Ran Pang, Jiebo Luo, and Henry Kautz
NYC
ALL
Brand Influence in Underage Alcohol Use
Main Contributors: Ran Pang, Jiebo Luo, and Henry Kautz
               Vodka 1   Vodka 2   Champagne   Beer 1   Beer 2
Young Male     6.43%     6.79%     6.10%       13.21%   10.95%
Adult Male     29.69%    42.16%    24.27%      52.41%   51.91%
Young Female   19.76%    15.12%    19.49%      11.58%   12.17%
Adult Female   44.12%    35.93%    50.14%      22.79%   24.97%
EXPERIMENTS (3): Youth Exposure to Alcohol Media
• Mining deeper level patterns in terms of factors such as family income, rural vs. urban, coastal vs. heartland regions, as well as social influence by peers in the social networks
• Combining the proposed approach with surveys, which can be used to verify the findings from social media data mining.
• Applying this methodology to other social problems that involve youth, such as tobacco, drugs, teen pregnancy, unsafe sex, unsafe driving, obesity, stress, and depression.
ONGOING DIRECTIONS
Drug Image Classification
• Fine-tuned CNN
• Starting with the pre-trained VGG Net
• Fine-tuned CNN features + SVM
• Using noisy data downloaded from Google
• Fine-tuned data statistics
• Instagram photos
label   pills   bottle   weed   total   Non-drug
#       2421    1233     675    4329    12253
Main Contributors: Xitong Yang, Meredith McCarron, Lacey Kelly, Jiebo Luo
● Cafe: WineBar
● Neighborhood: Washington Heights
● Downtown Manhattan: 5th ave
● Barber Shop: El Vaye Barber & Beauty Salon
● Apartment and Neighbourhood
● Private Club: Brentwood Country Club
● Restaurant (a lot): House of Blues Los Angeles
● Los Angeles International Airport (LAX)
Drug Use Patterns from Instagram
Main Contributors: Yiheng Zhou, Numair Sani, Jiebo Luo
Understanding Pets and Happiness
Main Contributors: Yucheng Wu, Ran Pang, Jiebo Luo
Social support is critical for psychological and physical well‐being, reflecting the centrality of belongingness in our lives. Human interactions often provide people with considerable social support, but can pets also fulfill one's social needs?
Studies found in a community sample that pet owners fared better on several well‐being (e.g., greater self‐esteem, more exercise) and individual‐difference (e.g., greater conscientiousness, less fearful attachment) measures.
We intend to verify such findings at a larger scale and potentially at a fine granularity, through social multimedia. We will set up an experimental group and a control group.
The Totem Pole of Happiness
Main Contributors: Xuefeng Peng, Kevin Chi, Jiebo Luo
From Catwalk to Main Street
Main Contributors: Kezhen Chen, Kuan-Ting Chen*, Peizhong Cong, Winston Hsu, Jiebo Luo (MM ’15 Grand Challenge)
Motivations
• In modern times, a growing number of people pay attention to fashion, and the general public tends to emulate what large-city residents and celebrities wear
• Investigating fashion trends is of great interest to the industry and academia because of the potential for boosting many emerging applications, such as clothing recommendation, advertising by clothing brand association, etc.
Approach
1. Constructing a large dataset from the New York Fashion Shows and New York street chic in order to understand the likely clothing fashion trends in New York
2. Utilizing a learning-based approach to discover fashion attributes as the representative characteristics of fashion trends
3. Comparing the analysis results from the New York Fashion Shows and street-chic images to verify whether the fashion shows have actual influence on the public in New York City.
Two Most Important Social Signals (IMO)
• User
• Sentiment
Events
Project Janus
Visual Intelligence & Social Multimedia Analytics
Interest, Personality, Sentiment, Behavior, Life, Work, Health, Happiness.
A Selfie is Worth a Thousand Words: Mining Deep Personal Patterns behind User Selfie Behaviors
WWW 2017
When Fashion Meets Big Data: Discriminative Mining of Best Selling Clothing Features
WWW 2017
Big cities vs. Small cities
1. Different mobility patterns?
2. Exciting vs. Routine?
3. Stressful vs. Relaxed?
4. Fast vs. Slow?
Geo-tagged social media makes it possible to understand various lifestyles in different cities at scale
Data‐Driven Lifestyle Patterns
Human Mobility and Human-Environment Interaction
Main Contributors: Yuncheng Li, Jifei Huang, and Jiebo Luo (ICIMCS 2015)
Geotagged tweets
Morning and evening rush hours Haze Dehazed
Experiments: Metrics
• Spearman correlation coefficients
– rank correlation
• Haze level:
– ordinal data
– sign is irrelevant
• The metric:
– absolute Spearman coefficients
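The metric above can be sketched in plain Python. This is a minimal illustration, assuming ties are handled with average ranks (a common convention for ordinal data such as haze levels; the slides do not specify it). The function names are hypothetical and the inputs in the example are made up.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def abs_spearman(x, y):
    """Absolute Spearman coefficient: |Pearson correlation of the ranks|."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return abs(pearson(average_ranks(x), average_ranks(y)))


# Toy example: ordinal haze levels vs. a hypothetical per-day measurement.
haze_level = [1, 2, 3, 4]
measurement = [40, 30, 20, 10]
score = abs_spearman(haze_level, measurement)  # perfectly monotone, so 1.0
```

Taking the absolute value treats a strongly decreasing relationship as just as informative as a strongly increasing one, which matches the "sign is irrelevant" note.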
When Do Luxury Cars Hit the Road?
Promoting STEM Education via Social Media
Main Contributors: Lee Murphy, Kelly He, Jiebo Luo
The data show both that many sectors of the U.S. economy are facing a shortage of STEM talent and that foreign‐born STEM workers currently in the workforce are complementing, not displacing, their U.S. counterparts.
Building Powerful Connections:
Identify college students among Twitter users
Identify those who are “STEM champions”
Identify diverse STEM role models from LinkedIn
Match a given STEM potential to STEM champions and/or Role Models by proximity (minimum degrees of separation)
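Matching by minimum degrees of separation can be sketched as a breadth-first search over a social graph. This is an illustration under assumptions, not the project's implementation: the graph is a plain adjacency dictionary, all node names are invented, and the helper names are hypothetical.

```python
from collections import deque


def degrees_of_separation(graph, source, targets):
    """Breadth-first search from source; return {target: hops} for reachable targets."""
    targets = set(targets)
    dist = {source: 0}
    queue = deque([source])
    found = {}
    while queue and len(found) < len(targets):
        node = queue.popleft()
        if node in targets:
            found[node] = dist[node]
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return found


def closest_matches(graph, student, candidates):
    """Candidates at the minimum number of hops from the student (sorted by name)."""
    hops = degrees_of_separation(graph, student, candidates)
    if not hops:
        return []
    best = min(hops.values())
    return sorted(c for c, h in hops.items() if h == best)


# Toy undirected follower graph (made-up nodes).
graph = {
    "student": ["friend_a", "friend_b"],
    "friend_a": ["student", "champion_x"],
    "friend_b": ["student", "friend_c"],
    "friend_c": ["friend_b", "role_model_y"],
    "champion_x": ["friend_a"],
    "role_model_y": ["friend_c"],
}
hops = degrees_of_separation(graph, "student", ["champion_x", "role_model_y"])
match = closest_matches(graph, "student", ["champion_x", "role_model_y"])
```

Breadth-first search visits nodes in order of hop count, so the first time a candidate is reached its distance is already minimal.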
• Analyzing grocery shopping baskets
– Wegmans Shopper’s Club member data
– Mining rich data from shopping baskets
• Nutrition break-down and tracking
• Recommendation: organic foods; healthy alternatives
• Customer profiling; lifestyle changes
• Health management (linked to social media, exercise apps)
• Analyzing family food images
– Undocumented immigrant workers with illiteracy
– Daily capture of food intake images
– Nutrition analysis and healthy eating suggestions
Using Big Consumer Data to Promote Healthy Lifestyles
PIs: Jiebo Luo (PhD), Henry Kautz (PhD), Karen Stein (PhD/RN)
Big Data. Robust Intelligence. Better Lives.
Robust Intelligence
Big Data
Better Life
Better informed
Better served
Better lived
Big data
Big image data
Better knowledge
Better decision
Clothing, food, housing, transportation; consumption, learning, work, friendship; emotion, joy, health, upbringing