Trust, Influence, and Noise: Implications for Safety Surveillance
Bill Rand
Asst. Prof. of Marketing and Computer Science
Director of the Center for Complexity in Business
Data Science
• Data
> Large and rich sources of data of all types
> Social media, GIS, loyalty cards, CRM, open-source mainstream media
• Science
> Developing theories of how and why people interact
> Hypothesis creation, first principles of consumer behavior
• Storytelling
> Explaining the science of the data to others
> Analysis, Visualization, Modeling, Simulation
http://www.rhsmith.umd.edu/ccb/
Cutting through the Noise
• Opportunity: Social media is a great marketing channel.
• Challenge: However, there is a lot of noise, and it's not apparent which users we should be paying attention to for monitoring.
• Solution: Identify properties that are indicative of future conversations.
Influence
• Influential users are ones who are able to reach a lot of users quickly with their messaging.
• How do you identify influentials?
Trust
• Trust is a measure of how much one user believes the content of another user.
• How does trust evolve on social media?
• Does understanding trust help you in modeling conversations?
[Figure: AUC versus Threshold (0 to 5000), one panel per classifier: SVM, Random forest, Deep learning. Specifications compared in each panel: Static, Dynamic, Baseline, Baseline+static, Baseline+dynamic, Past scores.]
Different Methods for Identification
• Baseline – How many messages do they generate?
• Past Scores – How many conversations have they created before?
• Static – How many friends?
• Dynamic – What are the dynamics of conversations?
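One way to compare identification methods like these is to treat each one as a score per user and measure how well that score ranks users who later start conversations, summarized as AUC. The sketch below is illustrative only and is not the analysis behind the slides: the field names, the toy user records, and the label `started_conversation` are all assumptions, and the Dynamic specification is left out because the slides do not define its features.

```python
def auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen positive
    example scores higher than a randomly chosen negative one.
    Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy records, one per user (all values invented for illustration).
users = [
    {"messages": 120, "past_conversations": 9, "friends": 300, "started_conversation": 1},
    {"messages": 15, "past_conversations": 0, "friends": 900, "started_conversation": 0},
    {"messages": 60, "past_conversations": 4, "friends": 150, "started_conversation": 1},
    {"messages": 80, "past_conversations": 1, "friends": 50, "started_conversation": 0},
]
labels = [u["started_conversation"] for u in users]

# Baseline, Past Scores, and Static, each used alone as a ranking score.
for feature in ("messages", "past_conversations", "friends"):
    print(feature, auc([u[feature] for u in users], labels))
```

An AUC of 0.5 means the score ranks users no better than chance, so a method is only useful to the extent that it clears that level.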
Identifying Trends on Social Media
• To identify trends, you need to establish a baseline, but how do you establish that baseline?
• What matters?
– Subject
– Geography
– Time
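A minimal sketch of one possible baseline, assuming message counts are bucketed by subject, region, and hour of day, and that a trend means a count well above the historical mean for that bucket. The function names, the threshold `k`, and the floor on the standard deviation are assumptions made for illustration, not a method described in the slides.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_baseline(history):
    """history: iterable of (subject, region, hour_of_day, count) tuples.
    Returns {(subject, region, hour_of_day): (mean, std)}."""
    buckets = defaultdict(list)
    for subject, region, hour, count in history:
        buckets[(subject, region, hour)].append(count)
    return {key: (mean(vals), pstdev(vals)) for key, vals in buckets.items()}


def is_trending(baseline, subject, region, hour, count, k=3.0):
    """Flag a count that exceeds the bucket mean by more than k standard
    deviations. The std is floored at 1.0 so that a bucket with no
    observed variance does not flag every small increase."""
    key = (subject, region, hour)
    if key not in baseline:
        return False  # no history for this bucket, so no judgment is possible
    mu, sigma = baseline[key]
    return count > mu + k * max(sigma, 1.0)
```

Keying the baseline on all three dimensions keeps a term that is always busy in one city at one time of day from looking like a trend everywhere else.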
[Figure: Sandy - "Near NYC" Most Common (TF/IDF) Terms, hourly from 10/27/12 5:00 to 10/28/12 6:59]
Terms: hurricane, hurricanesandy, frankenstorm, storm, nyc, apocalypse, ny, food, water
[Figure: Sandy - "Far NYC" Most Common (TF/IDF) Terms, hourly from 10/27/12 5:00 to 10/28/12 6:59]
Terms: hurricane, hurricanesandy, storm, coast, weather, hit, beach, rain, wind
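As a rough illustration of how TF/IDF ranks terms for groups of messages such as these, the sketch below scores each term by its frequency within one group multiplied by the log of how rare it is across groups. The slides do not say which TF/IDF variant was used or what counted as a document, so the plain `tf * log(N / df)` form, the function name, and the toy tokens are all assumptions.

```python
import math
from collections import Counter


def tfidf_top_terms(docs, top_n=3):
    """docs: dict mapping a label (e.g. a region) to a list of tokens.
    Returns the top_n terms per label, ranked by TF-IDF (ties broken
    alphabetically)."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    result = {}
    for label, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[label] = [term for term, _ in ranked[:top_n]]
    return result


# Toy tokens invented for illustration; not the data behind the figures.
docs = {
    "near": ["hurricane", "food", "water", "food"],
    "far": ["hurricane", "rain", "wind", "rain"],
}
top = tfidf_top_terms(docs, top_n=2)
```

With this variant, a term that appears in every group gets a weight of zero, so the ranking favors terms that set one group apart from the others.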
Inferring Geolocation in Social Media Data
• Geolocation in social media can be inferred from three different types of data:
– Geoencoded Data
– User-described Location
– Ambient Geography
• Ambient Geography is the use of references in natural language text to help determine the location being referenced
• We are developing a Bayesian modeling framework to constantly update a user’s most probable location based on their social media activity
• Among the many benefits, we plan to use this tool to help verify the accuracy of social media content, since the proximity of a user to an event can help assess their credibility
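The idea of continually updating a user's most probable location can be sketched as a discrete Bayesian update over a small set of candidate locations. This is not the framework under development; the candidate cities, the likelihood values, and the function name are hypothetical, and a real model would need to estimate the likelihoods from data.

```python
def update_location_belief(prior, likelihood):
    """One Bayesian update: the posterior for each location is proportional
    to prior(location) * P(evidence | location).
    prior and likelihood are dicts keyed by location name."""
    unnormalized = {loc: p * likelihood.get(loc, 0.0) for loc, p in prior.items()}
    total = sum(unnormalized.values())
    if total == 0:
        # Evidence is impossible under every candidate; keep the prior.
        return dict(prior)
    return {loc: v / total for loc, v in unnormalized.items()}


# Start with no preference among three hypothetical candidate locations.
belief = {"NYC": 1 / 3, "Boston": 1 / 3, "Miami": 1 / 3}

# Each entry is an assumed P(evidence | location) for one observed post,
# e.g. a geotag, a profile location string, or a place mentioned in text.
evidence_stream = [
    {"NYC": 0.6, "Boston": 0.3, "Miami": 0.1},
    {"NYC": 0.7, "Boston": 0.2, "Miami": 0.1},
]
for likelihood in evidence_stream:
    belief = update_location_belief(belief, likelihood)

most_probable = max(belief, key=belief.get)
```

Because each posterior becomes the next prior, the three types of data listed above can be folded in one observation at a time, in whatever order they arrive.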
Challenges and Opportunities
• Challenges
– We need better methods to automatically assess the quality and impact of social media content
– The failure of Google Flu Trends indicates that the solution is not in big data analysis unguided by theory
– There is a selection bias in who uses social media to talk about health; we need to account for this bias
• Opportunities
– These tools will have more resolution as we move into the future
– New methods of filtering and content analysis will improve the overall results
– Combining multiple signals about quality of content will improve surveillance
• In the end, we need to cut through the noise
[email protected]
@billrand
ter.ps/ccb
ter.ps/ccbssrn