Citizen Sensing, Social Media Analytics, and Applications
-
Citizen Sensor Data Mining, Social Media Analytics and
Development Centric Web Applications. Tutorial at
Semantic Technology Conference, San Francisco, CA.
Karthik Gomadam, Accenture Technology Labs, San Jose
Amit Sheth, Kno.e.sis @ Wright State University
Selvam Velmurugan, eMoksha, Kiirti
Monday, June 6, 2011
-
Lu Chen (Sentiment Analysis)
Meena Nagarajan (Content Analysis)
Ashutosh Jadhav (Event Analysis)
Hemant Purohit (People & Network Analysis)
Pavan Kapanipathi (Real Time Web)
Selvam Velmurugan (Kiirti, eMoksha NGOs)
Pramod Anantharam (Social & Sensor Web)
Amit Sheth (Semantic Web)
-
A Quick Word
Much of the work discussed in this tutorial is primarily the doctoral research of Dr. Meena Nagarajan, currently at IBM Almaden. It also includes current work done at the Kno.e.sis Center at Wright State University.
-
Outline
Citizen Sensing: Role, Enablers, Apps
Systematic Study of Social Media
Citizen Sensing @ Real-time
Emerging Research Areas: Spam and Trust in Social Media, Mobile Social Computing
Research Application: Twitris
Tutorial part 2
-
Citizen Sensing
Everyday users of Web 2.0 and social networks: citizens of an Internet- or Web-enabled social community
Observations and information reported by citizens => Citizen Sensing
Human-in-the-loop (participatory) sensing + Web 2.0 + mobile computing = emergence of citizen-sensor networks
-
Social Signals
The activity of observing, reporting, and disseminating information via text, audio, video and built-in device sensors (and smart devices)
Creating social signals through aggregation, enhancement, analysis, visualization, and interpretation
Immense potential to disseminate information quickly and in real time
-
Enablers: Mobile Devices & Ubiquitous Connectivity
Mobile device fast emerging as our primary tool
Redefines the way we engage with people, information, etc.
Global, ubiquitous, always available
Sense where you are, how you are,
-
Enablers: Mobile Devices & Ubiquitous Connectivity
Mobile Platforms Hit Critical Mass
Over 5 billion users
1B+ with internet-connected mobile devices (2010)
Smartphones > Notebooks + Netbooks (2010E)
500K+ mobile phone applications
74% of mobile phone users (2.4B) worldwide texted (2007)
-
Enablers: Web 2.0 & Social Media
500M+ Facebook users
100M+ Twitter users, 85M+ tweets/day
Internet users: 1.8 billion
Content dissemination medium, even for traditional media (@cnn, @nytimes)
-
Enablers: Web 2.0 & Social Media
Types of UGC: Twitter (text/microblogs), Facebook (multimedia), YouTube (videos), Flickr (images), Blogs (text), Ping (social network for music)
-
Citizen Sensors in Action
Iran election
Haiti earthquake
US healthcare debate
-
Revolution 2.0 Political/Social Activism
"If you want to liberate a government, give them the internet." - Wael Ghonim (Egyptian social activist)
When Blitzer asked "Tunisia, then Egypt, what's next?", Ghonim replied succinctly: "Ask Facebook."
-
Citizen Journalism
Twitter Journalism
-
Social Media Influence: Intelligence, News & Analysis
Many media companies use Facebook and Twitter as news-delivery platforms. Many individuals rely on them as a news source. News is increasingly social.
-
Business Intelligence: Trend Spotting, Forecasting, Brand Tracking and Crisis Management
Sysomos: http://www.sysomos.com/
Trendspotting: http://trendspotting.com
Simplify: http://simplify360.com/
Shoutlet: http://www.shoutlet.com/
Reputation (Defender): http://www.reputationdefender.com/
-
Development (Education, Health, eGov)
LiveMocha (http://www.livemocha.com/): Online language learning tool with social engagement, bridging the gap!
Soliya (http://www.soliya.net/): Dialogue between students from diverse backgrounds across the globe using the latest multimedia technologies
Project Einstein (http://digital-democracy.org/what-we-do/programs/): A photography-based digital penpal program connecting youths in refugee camps to the world
-
Development (Education, Health, eGov)
PatientsLikeMe (http://mashable.com/2010/07/13/social-media-health-trends/)
TrialX (http://trialx.com)
Image: http://www.dragonsearchmarketing.com/blog/social-media-development-through-visual-aids-tools/
-
Why People-Content-Network metadata?
-
Dimensions of Systematic Study of Social Media
Spatio - Temporal - Thematic
+
People - Content - Network
-
Social Information Processing
"Who says what, to whom, why, to what extent and with what effect?" [Lasswell]
Network: social structure emerges from the aggregate of relationships (ties)
People: poster identities, the active effort of accomplishing interaction
Content: studying the content of communication
-
Studying Online Human Social Dynamics
How does the (semantics or style of) content fit into the observations made about the network?
Often, the three-dimensional dynamic of people, content and link structure is what shapes the social dynamic.
-
Studying Online Human Social Dynamics
Example: how do the topic of discussion, the emotional charge of a conversation, the presence of an expert, and the connections between participants together explain information propagation in a social network?
-
Metadata/Annotations
Metadata: an organized way to study
types
creation/extraction and storage
use
-
The Anatomy of a Tweet
-
People Metadata: Variety of Self-expression Modes on Multiple Social Media Platforms
Explicit information from user profiles: user names, pictures, videos, links, demographic information, group memberships...
Often is not updated
Implicit information from user attention metadata: page views, Facebook 'Likes', comments; Twitter 'Follows', retweets, replies...
-
People Metadata: Various Levels
Demographic
Network
Activity
Interests
-
People Metadata: Continued
User Demographic Metadata
User-id
Screen/Display-name of user
Real name of user
Location
Profile Creation Date
User description
User Bio
URL
Interest Level Metadata
Author type: trustee/donor, journalist, blogger, scientist etc.
Favorite tweets
Types of lists subscribed
Style of writing: personality indicator
No. of Followees
Author type trend of Followees
-
People Metadata: Continued
Web Presence: User affiliations; KLOUT Score, an influence measure (www.klout.com)
Activity Level Metadata
Age of the profile
Frequency of posts
Timestamp of last status
No. of Posts
No. of Lists/groups created
No. of Lists/groups subscribed
Influence Level Metadata (Inferring People Metadata from Network level Information)
No. of Followers: normal, influential
No. of Mentions
No. of Retweets/Forwards
No. of Replies
No. of Lists/groups following
No. of people following back
Authority & Hub Scores
-
Content Metadata
Content Independent metadata: date, location, author etc.
Content Dependent metadata
  Direct content-based metadata
    Explicit/Mentioned content metadata: named entities in content
    Implicit/Inferred content metadata: related named entities from knowledge sources
  Indirect content-based metadata (external metadata): context inferred from URLs in content (images, links to articles, FourSquare check-ins etc.)
-
Content Independent Metadata
For tweets
Published date and time
Location (where the tweet was generated from)
Tweet posting method (smartphone, twitter.com, clients for Twitter)
Author information
-
Content Independent Metadata
For text messages
Published date and time
Origin location
Recipient
Carrier information
-
Content Dependent Metadata (Tweet)
Direct content-based metadata
Indirect content-based metadata (external metadata)
-
Network Metadata
Connections/Relationships (foundation for the network) matter!
Structure Level Metadata
Community size
Community growth rate
Largest Strongly Connected Component size
Weakly Connected Components & max. size
Average degree of separation
Clustering coefficient
Relationship Level Metadata
Type of relationship
Relationship strength
User homophily based on certain characteristic (e.g., location, interest etc.)
Reciprocity: mutual relationship
Active community/ties
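A minimal sketch of two of the network-level measures listed above, reciprocity and the clustering coefficient, computed over a toy "follows" graph. The user names and edges are hypothetical, and the definitions used here (share of directed edges that are mutual; share of a node's neighbor pairs that are connected, ignoring direction) are common textbook choices, not necessarily the ones used in the tutorial.

```python
from itertools import combinations

# Hypothetical directed "follows" edges: (follower, followee).
follows = {
    ("alice", "bob"), ("bob", "alice"),
    ("alice", "carol"), ("carol", "bob"),
    ("dave", "alice"),
}

def reciprocity(edges):
    """Fraction of directed edges whose reverse edge also exists."""
    mutual = sum(1 for (u, v) in edges if (v, u) in edges)
    return mutual / len(edges)

def undirected_neighbors(edges):
    """Neighbor sets per node, ignoring edge direction."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs

def clustering_coefficient(edges, node):
    """Share of a node's neighbor pairs that are themselves connected."""
    nbrs = undirected_neighbors(edges)
    k = len(nbrs[node])
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(nbrs[node], 2) if b in nbrs[a])
    return linked / (k * (k - 1) / 2)

rec = reciprocity(follows)
cc_alice = clustering_coefficient(follows, "alice")
print(rec, cc_alice)
```

Plain sets and dictionaries keep the sketch dependency-free; a graph library would be the more practical choice for real network sizes.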
-
Metadata: Creation, Extraction and Storage
-
Metadata Creation & Extraction
Extracted Metadata: directly visible information from the user profile, tweet content & community structure
Created Metadata: after processing information in the user profile, content and/or network structure
-
An Example
Length: 144 characters; General topic: Egypt protest
This poor {sentiment_expression: {target: Lara Logan, polarity: negative}} woman! RT @THR CBS News' {entity: {type=News Agency}} Lara Logan {entity: {type=Person}} Released From Hospital {entity: {type=Location}} After Egypt {entity: {type=Country}} Assault {type=topic} http://bit.ly/dKWTY0 {external_URL}
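One way to hold the annotations from this example is as stand-off records kept apart from the tweet text. The sketch below is only an illustration: the `Annotation` class, its field names, and the choice of "This poor" as the sentiment span are assumptions, not the tutorial's storage format.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    text: str    # surface span in the tweet
    kind: str    # e.g. "entity", "sentiment_expression", "topic"
    attrs: dict  # type, target, polarity, ...

tweet = ("This poor woman! RT @THR CBS News' Lara Logan Released From "
         "Hospital After Egypt Assault http://bit.ly/dKWTY0")

annotations = [
    Annotation("This poor", "sentiment_expression",
               {"target": "Lara Logan", "polarity": "negative"}),
    Annotation("CBS News", "entity", {"type": "News Agency"}),
    Annotation("Lara Logan", "entity", {"type": "Person"}),
    Annotation("Hospital", "entity", {"type": "Location"}),
    Annotation("Egypt", "entity", {"type": "Country"}),
    Annotation("Assault", "topic", {}),
    Annotation("http://bit.ly/dKWTY0", "external_URL", {}),
]

def with_offsets(text, anns):
    """Attach (start, end) character offsets by locating each span in the text."""
    out = []
    for a in anns:
        start = text.find(a.text)
        out.append((start, start + len(a.text), a))
    return out

spans = with_offsets(tweet, annotations)
entities = [a.text for _, _, a in spans if a.kind == "entity"]
print(entities)
```

Keeping offsets alongside the labels lets the original text stay untouched while several annotators (entity, sentiment, topic) add their own layers.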
-
Why Semantic Web is a standard for social metadata?
Rich Snippets, RDFa, Open Graph: Semantic Web based social data standards
Relationships/connections play a central role
Relationships as first-class objects are important
-
Semantic Web: A Very Short Primer
Representation: RDF (relationships as first-class objects), OWL
Representing Knowledge and Agreements: nomenclature, taxonomy, folksonomy, ontology
-
Semantic Web: A Very Short Primer
Annotation: RDFa, XLink, model reference
Web of Data: Linked Open Data
Querying: SPARQL; Rules: SWRL, RIF
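To make "relationships as first-class objects" concrete, here is a small sketch that stores social metadata as subject-predicate-object triples and answers a query by pattern matching, loosely in the spirit of a SPARQL basic graph pattern. It deliberately uses plain Python tuples rather than an RDF library, and every identifier in it (`user:alice`, `mentions`, and so on) is hypothetical.

```python
# Triples as (subject, predicate, object); all names are hypothetical.
triples = [
    ("user:alice", "follows", "user:bob"),
    ("user:bob", "follows", "user:alice"),
    ("user:alice", "posted", "tweet:1"),
    ("tweet:1", "mentions", "entity:Egypt"),
    ("entity:Egypt", "type", "Country"),
]

def match(pattern, store):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in store
            if all(p is None or p == v for p, v in zip(pattern, t))]

def countries_mentioned_by(user, store):
    """Chain three patterns: user posted ?t, ?t mentions ?e, ?e type Country."""
    found = []
    for _, _, tweet in match((user, "posted", None), store):
        for _, _, ent in match((tweet, "mentions", None), store):
            if match((ent, "type", "Country"), store):
                found.append(ent)
    return found

print(countries_mentioned_by("user:alice", triples))
```

Because the relationship name is data rather than a column name, people, content and network metadata can share one store and be joined in a single query.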
-
How to save and use metadata?
Store metadata as data and use standard database techniques
Use filtering and clustering, summarization, statistics - implicit semantics
-
How to save and use metadata?
Use explicit semantics and Semantic Web standards and technologies
semantics = meaning
richer representation, support for relationships, context
supports use of background knowledge
better integration, powerful analysis
Semantics - the implicit, the formal and the powerful
Social metadata on the Web
-
Metadata Extraction from Informal Text
Meena Nagarajan, Understanding User-Generated Content on Social Media, Ph.D. Dissertation, Wright State University, 2010
-
Characteristics of Text on Social Media
-
The Formality of Text
-
Content Analysis - Typical Sub-tasks
Recognize key entities mentioned in content: Information Extraction (entity recognition, anaphora resolution, entity classification..)
Discovery of semantic associations between entities
Topic classification, aboutness of content: What is the content about?
Intention analysis: Why did they share this content?
-
Content Analysis - Typical Sub-tasks
Sentiment analysis: What opinions are people conveying via the content?
Author profiling: What can we infer about the author from the content they post?
Context (external to content) extraction: URL extraction, analyzing external content
-
Research Efforts, Contributions in this space..
Examining usefulness of multiple context cues for text mining algorithms
Compensating for informal, highly variable language, lack of context
Using context cues: document corpus, syntactic and structural cues, social medium, external domain knowledge
In this talk, highlighting sample metadata creation tasks: NER, Key Phrase Extraction, Intention, Sentiment/Opinion Mining
-
Part 1. NER, Key Phrase Extraction
Named Entity Recognition: "I loved the hangover!"
Key Phrase Extraction
-
Multiple Context Cues Utilized for NER in Blogs and MySpace
-
Multiple Context Cues Utilized for Keyphrase Extraction from Twitter, Facebook and MySpace
-
Focus, Impact
Techniques focus on relatively less explored content aspects on social media platforms
Combination of top-down, bottom-up analysis for informal text
Statistical NLP, ML algorithms over large corpora
Models and rich knowledge bases in a domain
-
NAMED ENTITY RECOGNITION
-
NAMED ENTITY RECOGNITION
I loved your music Yesterday!
It was THE HANGOVER of the year.. lasted forever..
So I went to the movies.. bad choice picking GI Jane.. worse now
-
NAMED ENTITY RECOGNITION
Identifying and classifying tokens
-
NER in prior work vs. NER for Informal Text
-
Cultural Named Entities
NER focus in this work: Cultural Named Entities
Artifacts of culture: names of books, music albums, films, video games, etc.
Common words in a language: The Lord of the Rings, Lips, Crash, Up, Wanted, Today, Twilight, Dark Knight
-
Characteristics of Cultural Entities
Varied senses, several poorly documented: Merry Christmas covered by 60+ artists; Star Trek: movies, TV series, media franchise.. and cuisines!
Changing contexts with recent events: The Dark Knight reference to Obama, health care reform
Unrealistic expectations: comprehensive sense definitions, enumeration of contexts, labeled corpora for all senses ..
NER: relaxing the closed-world sense assumptions
-
A Spot and Disambiguate Paradigm
NER is generally a sequential prediction problem
NER system that achieves 90.8 F1 score on the CoNLL-2003 NER shared task (PER, LOC, ORGN entities) [Lev Ratinov, Dan Roth]
Focus of approach: Spot and Disambiguate paradigm
Starting off with a dictionary or list of entities we want to spot
-
A Spot and Disambiguate Paradigm
Spot, then disambiguate in context (natural language, domain knowledge cues)
Binary classification: Is this mention of "the hangover" in a sentence referring to a movie?
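The two stages can be sketched as a dictionary spotter followed by a yes/no decision per spot. The version below is only a stand-in: the title list and cue words are hypothetical, and the hand-written cue check takes the place of the trained binary classifier the tutorial describes.

```python
import re

# Hypothetical dictionary of movie titles to spot (lowercased).
MOVIE_TITLES = {"the hangover", "twilight", "up"}
# Hypothetical context cues suggesting the movie sense.
MOVIE_CUES = {"movie", "movies", "film", "scene", "scenes", "cinema", "dvd"}

def spot(text, dictionary):
    """Stage 1: dictionary entries that occur in the text as whole words."""
    lowered = text.lower()
    return [entry for entry in sorted(dictionary)
            if re.search(r"\b" + re.escape(entry) + r"\b", lowered)]

def is_movie_mention(text):
    """Stage 2: binary decision from context cues (placeholder for a learned model)."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return bool(tokens & MOVIE_CUES)

s1 = "I am watching Pattinson scenes in Twilight for the nth time."
s2 = "I spent a romantic evening watching the Twilight by the bay.."

for sentence in (s1, s2):
    for title in spot(sentence, MOVIE_TITLES):
        print(title, "->", is_movie_mention(sentence))
```

Spotting first keeps the expensive decision limited to sentences that contain a candidate at all, which is the point of chaining the two steps.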
-
Algorithmic Contributions: Supervised Algorithms
Examples:
I am watching Pattinson scenes in Twilight for the nth time.
I spent a romantic evening watching the Twilight by the bay..
I love Lily's song
-
Multiple Senses in the Same Domain
-
Algorithm Preliminaries
Problem definition: cultural entity identification: music albums, tracks, e.g. Smile (Lily Allen), Celebration (Madonna)
Corpus: MySpace comments
Context-poor utterances: "Happy 25th Lilly, Alfie is funny"
-
Algorithm Preliminaries
Goal: Semantic annotation of music named entities (w.r.t. MusicBrainz)
-
Using a Knowledge Resource for NER is not straight-forward..
-
Approach Overview
Scoped relationship graphs
Using context cues from the content, webpage title, URL: "new Merry Christmas tune"
Reduce potential entity spot size: new albums/songs
Generate candidate entities
Spot and disambiguate
-
Sample Real-world Constraints
Career restrictions: "release your third album already.."
Recent album restrictions: "I loved your new album.."
Artist age restrictions: "happy 25th rihanna, loved alfie btw.."
etc.
-
Non-Music Mentions
Challenge 1: Several senses in the same domain
Scoping relationship graphs narrows possible senses
Solves the named entity identification problem partially
Challenge 2: Non-music mentions
"Got your new album Smile. Loved it!"
"Keep your SMILE on!"
-
Using Language Features to eliminate incorrect mentions..
Syntactic features: POS tags, typed dependencies.. (example here)
Word-level features: capitalization, quotes
Domain-level features
-
Supervised Learners
-
Hand Labeling - Fairly Subjective
1800+ spots in MySpace user comments from artist pages
"Keep your SMILE on!": good spot, bad spot, inconclusive?
4-way annotator agreements
Madonna: 90% agreement
Rihanna: 84% agreement
Lily Allen: 53% agreement
-
Dictionary Spotter + NLP Step
Daniel Gruhl, Meena Nagarajan, Jan Pieper, Christine Robson, Amit Sheth, Context and Domain Knowledge Enhanced Entity Spotting in Informal Text, The 8th International Semantic Web Conference, 2009: 260-276
-
NER on Social Media Text using Domain Knowledge
Highlights issues with using domain knowledge for an IE task
Two-stage approach: chaining NL learners over results of domain-model-based spotters
Improves accuracy up to a further 50%
Allows the more time-intensive NLP analytics to run on less than the full set of input data
-
BBC SoundIndex (IBM Almaden): Pulse of the Online Music
Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth: Multimodal Social Intelligence in a Real-Time Dashboard System, special issue of the VLDB Journal on "Data Management and Mining for Social Networks and Social Media", 2010
Check http://www.almaden.ibm.com/cs/projects/iis/sound/
-
http://www.almaden.ibm.com/cs/projects/iis/sound/
The Vision
-
Several Insights
Only 4% negative sentiments; perhaps ignore the Sentiment Annotator on this data source?
Ignoring spam can change the ordering of popular artists
Trending popularity of artists
Trending topics in artist pages
-
Predictive Power of Data
Billboard's Top 50 Singles chart during the week of Sept 22-28 '07 vs. MySpace popularity charts.
User study indicated 2:1 and up to 7:1 (younger age groups) preference for the MySpace list.
Challenging traditional polling methods!
-
Key Phrase Extraction
-
Key Phrase Extraction: Example
Key phrases extracted from prominent discussions on Twitter around the 2009 Health Care Reform debate and 2008 Mumbai Terror Attack on one day
-
Key Phrase Extraction from SM Text
Different from Information Extraction
Extracting vs. assigning key phrases; focus: key phrase extraction
Prior work focus: extracting phrases that summarize a document (a news article, a web page, a journal article, a book..)
Focus: summarize multiple documents (UGC) around the same event/topic of interest
-
Key Phrase Extraction on SM Content
Focus: Summarizing social perceptions via key phrase extraction
Preserving/isolating the social behind the social data
What is said in Egypt vs. the USA should be viewed in isolation
-
Key Phrase Extraction on SM Content
Accounting for redundancy, variability, off-topic content
"Met up with mom for lunch, she looks lovely as ever, good genes .. Thanks Nike, I love my new Gladiators .. smooth as a feather. I burnt all the calories of Italian joy in one run.. if you are looking for good Italian food on Main, Buca is the place to go."
-
Social and Cultural Logic in SMC
Thematic components: similar messages convey similar ideas
Space, time metadata: role of community and geography in communication
Poster attributes: age, gender, socio-economic status reflect similar perceptions
-
Feature Space (common to several efforts)
Focus: n-grams, spatio-temporal metadata (social components)
Syntactic cues: in quotes, italics, bold; in document headers; phrases collocated with acronyms
-
Feature Space (common to several efforts)
Document and structural cues: two-word phrases, appearing in the beginning of a document, frequency, presence in multiple similar documents etc.
Linguistic cues: stemmed form of a phrase, phrases that are simple and compound nouns in sentences etc.
-
Key Phrase Extraction: Overview
"President Obama in trying to regain control of the health-care debate will likely shift his pitch in September"
1-grams: President, Obama, in, trying, to, regain, ...
2-grams: President Obama, Obama in, in trying, trying ...
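The n-gram lists on this slide can be reproduced with a few lines of code. This is a minimal sketch that assumes whitespace tokenization; the tutorial does not say which tokenizer was used.

```python
def ngrams(text, n):
    """All contiguous n-token phrases of the text, in order."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = ("President Obama in trying to regain control of the health-care "
            "debate will likely shift his pitch in September")

unigrams = ngrams(sentence, 1)
bigrams = ngrams(sentence, 2)
print(unigrams[:6])
print(bigrams[:3])
```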
-
A descriptor is an n-gram weighted by:
Thematic importance: TFIDF, stop words, noun phrases
Redundancy: statistically discriminatory in nature
Variability: contextually important
Spatial importance (local vs. global popularity)
Temporal importance (always popular vs. currently trending)
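A rough sketch of two of these weights follows: a smoothed TF-IDF score for thematic importance, and a local-to-global popularity ratio for spatial importance. The smoothing, the ratio, and the toy posts are all assumptions made for illustration; the slides do not give the exact formulas, and temporal importance is left out here.

```python
import math
from collections import Counter

def tfidf(term, doc_tokens, corpus):
    """Term frequency in one document times a smoothed inverse document frequency."""
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for d in corpus if term in d)
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1
    return tf * idf

def spatial_importance(term, local_docs, global_docs):
    """Share of local documents containing the term, relative to the global share."""
    local = sum(term in d for d in local_docs) / len(local_docs)
    glob = sum(term in d for d in global_docs) / len(global_docs)
    return local / glob if glob else 0.0

# Hypothetical tokenized posts from one region and from elsewhere.
us_posts = [["health", "care", "debate"], ["obama", "health", "care"], ["lunch", "today"]]
other_posts = [["cricket", "match", "today"], ["health", "tips"]]
all_posts = us_posts + other_posts

score_health = tfidf("health", us_posts[0], all_posts)
score_debate = tfidf("debate", us_posts[0], all_posts)
local_bias = spatial_importance("care", us_posts, all_posts)
print(score_health, score_debate, local_bias)
```

A ratio above 1 means the phrase is more popular locally than globally, which is the kind of signal that lets what is said in one place be viewed in isolation from another.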
-
Eliminating Off-topic Content [WISE2009]
Frequency-based heuristics will not eliminate off-topic content that is ALSO POPULAR
-
Approach Overview
"Yeah i know this a bit off topic but the other electronics forum is dead right now. im looking for a good camcorder, somethin not to large that can record in full HD only ones so far that ive seen are sonys"
"Canon HV20. Great little cameras under $1000."
-
Approach Overview
Assume one or more seed words (from domain knowledge base): C1 - ['camcorder']
Extracted key words/phrases: C2 - ['electronics forum', 'hd', 'camcorder', 'somethin', 'ive', 'canon', 'little camera', 'canon hv20', 'cameras', 'offtopic']
Gradually expand C1 by adding phrases from C2 that are strongly associated with C1
Mutual Information based algorithm [WISE2009]
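The expansion step can be sketched as a loop that moves a candidate from C2 into C1 whenever it is positively associated with some phrase already in C1. The sketch uses pointwise mutual information over a tiny, hypothetical background collection as a stand-in for the mutual-information measure in [WISE2009]; the threshold and the data are assumptions.

```python
import math

def pmi(a, b, docs):
    """Pointwise mutual information of two phrases over a collection of phrase sets."""
    n = len(docs)
    pa = sum(a in d for d in docs) / n
    pb = sum(b in d for d in docs) / n
    pab = sum(a in d and b in d for d in docs) / n
    if pab == 0:
        return float("-inf")
    return math.log(pab / (pa * pb))

def expand(seeds, candidates, docs, threshold=0.0):
    """Move candidates into the seed set while any is associated with a current seed."""
    c1 = list(seeds)
    c2 = [c for c in candidates if c not in seeds]
    changed = True
    while changed:
        changed = False
        for cand in list(c2):
            if max(pmi(cand, s, docs) for s in c1) > threshold:
                c1.append(cand)
                c2.remove(cand)
                changed = True
    return c1, c2

# Hypothetical background documents, each reduced to the phrases it contains.
docs = [
    {"camcorder", "hd", "canon hv20"},
    {"camcorder", "canon hv20", "cameras"},
    {"camcorder", "hd"},
    {"electronics forum", "offtopic"},
    {"somethin", "ive"},
    {"ive", "offtopic"},
]
candidates = ["electronics forum", "hd", "camcorder", "somethin", "ive",
              "canon hv20", "cameras", "offtopic"]

on_topic, off_topic = expand(["camcorder"], candidates, docs)
print(on_topic, off_topic)
```

Because association is measured against the seed set rather than raw frequency, a popular but unrelated phrase such as 'offtopic' stays in C2.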
-
Key Phrases and Aboutness: Evaluations
Are the key phrases we extracted topical and good indicators of what the content is about?
If they are, they should act as effective index/search phrases and return relevant content
Evaluation application: Targeted Content Delivery
-
Targeted Content Delivery - Evaluations
12K posts from MySpace and Facebook Electronics forums
Baseline phrases: Yahoo Term Extractor
Our method phrases: key phrase extraction, elimination
Targeted content from Google AdSense
-
Targeted Content for all content vs. extracted key phrases
-
User Studies and Results
-
Impact and Contributions
TFIDF + social contextual cues yield more useful phrases that preserve social perceptions
Corpus + seeds from a domain knowledge base eliminate off-topic phrases effectively
-
Intention Mining
-
Targeted Content Delivery via Intention Mining
On social networks
Use case for this talk: targeted content = content-based advertisements; target = user profiles
Content-based advertisements (CBAs): well-known monetization model for online content
-
Circa. 2009 Content-based Ads
-
Circa 2009 - Ads on Profiles
-
What is going on here
Interests do not translate to purchase intents
Interests are often outdated..
Intents are rarely stated on a profile..
Cases that do seem to work
New store openings, sales
Highly demographic-targeted ads
-
Intents in User
-
Content Ads Outside Profiles
-
Targeted Content-based Advertising
Non-trivial
Non-policed content: brand image, unfavorable sentiments
People are there to network: user attention to ads is not guaranteed
Informal, casual nature of content: people are sharing experiences and events; main message overloaded with off-topic content
-
Targeted Content-based Advertising
"I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and i have a video project due tomorrow for merrill lynch :(( all i need to do is simple: Extract several scenes from a clip, insert captions, transitions and thats it. really. omgg i cant figure out anything!! help!! and i got food poisoning from eggs. its not fun. Pleasssse, help? :("
Learning from Multi-topic Web Documents for Contextual Advertisement, Zhang, Y., Surendran, A. C., Platt, J. C., and Narasimhan, M., KDD 2008
-
Preliminary Results in
Identifying intents behind user posts on social networks: identify content with monetization potential
Identifying keywords for advertising in user-generated content: considering interpersonal communication & off-topic chatter
-
Investigations
User studies
Hard to compare activity-based ads to s.o.t.a.
Impressions to clickthroughs
How well are we able to identify monetizable posts
How targeted are ads generated using our keywords vs. entire user-generated content
-
Identifying Monetizable Intents
Scribe intent is not the same as Web search intent [1]
People write sentences, not keywords or phrases
Presence of a keyword does not imply navigational/transactional intents
"am thinking of getting X" (transactional)
"I like my new X" (information sharing)
"what do you think about X" (information seeking)
[1] B. J. Jansen, D. L. Booth, and A. Spink, Determining the informational, navigational, and transactional intent of web queries, Inf. Process. Manage., vol. 44, no. 3, 2008.
-
From X to Action Patterns
Action patterns surrounding an entity
How questions are asked and not topic words that indicate what the question is about
"where can I find a chotto psp cam"
User post also has an entity
-
Conceptual Overview: Bootstrapping to learn IS patterns
Set of user posts from SNSs
Not annotated for presence or absence of any intent
-
Bootstrapping to learn IS patterns
Generate a universal set of n-gram patterns; freq > f
S = set of all 4-grams; freq > 3
-
Bootstrapping to learn IS patterns
Generate set of candidate patterns from seed words (why, when, where, how, what)
Sc = all 4-grams in S that extract seed words
-
Bootstrapping to learn IS patterns
User picks 10 seed patterns from Sc
Sis = "does anyone know how", "where do I find", "someone tell me where"
-
Bootstrapping to learn IS patterns
Gradually expand Sis by adding Information Seeking patterns from Sc
-
Bootstrapping to learn IS patterns
For every pis in Sis, generate a set of filler patterns
-
Bootstrapping to learn IS patterns
.* anyone know how
does .* know how
does anyone .* how
does anyone know .*
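The first bootstrapping steps (the universal set S, the candidate set Sc, and the filler patterns for one seed pattern) can be sketched as below. The posts and the frequency threshold are toy values chosen so the example runs on three sentences; the slides use freq > 3 on a real corpus, and the function names are hypothetical.

```python
from collections import Counter

SEED_WORDS = {"why", "when", "where", "how", "what"}

def four_grams(post):
    """All contiguous 4-token sequences of a lowercased post."""
    tokens = post.lower().split()
    return [tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3)]

def universal_set(posts, min_freq):
    """S: all 4-grams whose corpus frequency exceeds min_freq."""
    counts = Counter(g for p in posts for g in four_grams(p))
    return {g for g, c in counts.items() if c > min_freq}

def candidate_set(s):
    """Sc: 4-grams in S that contain at least one seed word."""
    return {g for g in s if SEED_WORDS & set(g)}

def filler_patterns(pattern):
    """Replace each word of the pattern in turn with a wildcard."""
    words = pattern.split()
    return [" ".join(words[:i] + [".*"] + words[i + 1:]) for i in range(len(words))]

posts = [
    "does anyone know how to fix this",
    "does anyone know how long it takes",
    "i like my new phone a lot",
]
S = universal_set(posts, min_freq=1)
Sc = candidate_set(S)
fillers = filler_patterns("does anyone know how")
print(Sc)
print(fillers)
```

Each filler pattern is then matched against Sc to find substitutions such as "does someone know how", which are scored before being added to Sis.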
-
Extracting and Scoring Patterns
does * know how
does someone know how: functional compatibility (impersonal pronouns), empirical support 1/3
does somebody know how: functional compatibility (impersonal pronouns), empirical support 0; pattern retained
does john know how: pattern discarded
-
Extracting and Scoring Patterns
Sc = {does anyone know how, where do I find, someone tell me where}
pis = "does anyone know how"
-
Expanding the Pattern Pool
Functional properties / communicative functions of words
From a subset of LIWC [1]
cognitive mechanical (e.g., if, whether, wondering, find): "I am thinking about getting X"
adverbs (e.g., how, somehow, where)
impersonal pronouns (e.g., someone, anybody, whichever): "Someone tell me where can I find X"
[1] Linguistic Inquiry Word Count, LIWC, http://liwc.net
-
Over iterations, single-word substitutions, functional usage and empirical support conservatively expand Sis
Details in [WISE2009] for..
Infusing new patterns and seed words
Stopping conditions
-
Sample Extracted Patterns
-
Information Seeking patterns generated offline
Information seeking intent score of a post: extract and compare patterns in posts with extracted patterns
Transactional intent score of a post: LIWC Money dictionary, 173 words and word forms indicative of transactions, e.g., trade, deal, buy, sell, worth, price etc.
Identifying Monetizable Posts
Monday, June 6, 2011
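As a rough illustration of a dictionary-based transactional intent score, the sketch below counts how many tokens of a post appear in a money lexicon. The slide does not define the score, so normalizing by post length is an assumption, and `MONEY_WORDS` is a hypothetical stand-in holding only the six example words named above, not the 173-entry LIWC Money dictionary.

```python
import re

# Hypothetical stand-in for the LIWC Money dictionary: only the example
# words named on the slide, not the real 173-entry list.
MONEY_WORDS = {"trade", "deal", "buy", "sell", "worth", "price"}


def transactional_score(post: str, lexicon: set[str] = MONEY_WORDS) -> float:
    """Fraction of tokens in the post that appear in the money lexicon (assumed scoring)."""
    tokens = re.findall(r"[a-z']+", post.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in lexicon)
    return hits / len(tokens)


score = transactional_score("Looking to buy a camcorder, what is a fair price")
print(score)
```

A real system would also need to handle word forms (e.g., "buying", "prices"), which the LIWC dictionary covers and this toy lexicon does not.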
-
Identifying keywords in monetizable posts
Plethora of work in this space
Off-topic noise removal is our focus
"I NEED HELP WITH SONY VEGAS PRO 8!! Ugh and i have a video project due tomorrow for merrill lynch :(( all i need to do is simple: Extract several scenes from a clip, insert captions, transitions and thats it. really. omgg i cant figure out anything!! help!! and i got food poisoning from eggs. its not fun. Pleasssse, help? :("
Keywords for Advertising
Monday, June 6, 2011
-
Topical hints: C1 - ['camcorder']
Keywords in post: C2 - ['electronics forum', 'hd', 'camcorder', 'somethin', 'ive', 'canon', 'little camera', 'canon hv20', 'cameras', 'offtopic']
Move strongly related keywords from C2 to C1 one-by-one
Relatedness determined using information gain
Using the Web as a corpus, domain independent
Conceptual Overview (also see slides 88,89)
Monday, June 6, 2011
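The one-by-one move from C2 to C1 can be sketched as a greedy loop with a pluggable relatedness function. The original work computes relatedness with information gain over the Web as a corpus; that measure is not reproduced here. The `toy_relatedness` function and the `threshold` value are hypothetical placeholders used only so the loop can run.

```python
from typing import Callable

C1 = ["camcorder"]
C2 = ["electronics forum", "hd", "camcorder", "somethin", "ive", "canon",
      "little camera", "canon hv20", "cameras", "offtopic"]

# Hypothetical stand-in for Web-based information gain: two keywords count
# as related only if both belong to a hand-picked topical set.
TOPICAL = {"camcorder", "canon hv20", "little camera", "hd", "cameras", "canon"}


def toy_relatedness(a: str, b: str) -> float:
    return 1.0 if a in TOPICAL and b in TOPICAL else 0.0


def expand_keywords(c1: list[str], c2: list[str],
                    relatedness: Callable[[str, str], float],
                    threshold: float = 0.5) -> list[str]:
    """Greedily move the most related candidate from c2 into c1 until none qualifies."""
    selected = list(c1)
    candidates = [k for k in c2 if k not in selected]
    while candidates:
        scored = [
            (sum(relatedness(k, s) for s in selected) / len(selected), k)
            for k in candidates
        ]
        best_score, best = max(scored)
        if best_score < threshold:
            break  # remaining candidates are treated as off-topic
        selected.append(best)
        candidates.remove(best)
    return selected


result = expand_keywords(C1, C2, toy_relatedness)
print(result)
```

Keywords left behind in C2 when the loop stops are the ones treated as off-topic chatter.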
-
C1 - ['camcorder']
C2 - ['electronics forum', 'hd', 'camcorder', 'somethin', 'ive', 'canon', 'little camera', 'canon hv20', 'cameras', 'offtopic']
Informative words: ['camcorder', 'canon hv20', 'little camera', 'hd', 'cameras', 'canon']
Off-topic Chatter
Monday, June 6, 2011
-
Keywords from 60 monetizable user posts
Monetizable intent, at least 3 keywords in content
45 MySpace Forums, 15 Facebook Marketplace
30 graduate students
10 sets of 6 posts each
Each set evaluated by 3 randomly selected users
Monetizable intents? All 60 posts voted as unambiguously information seeking in intent
Evaluations - User Study
Monday, June 6, 2011
-
Google AdSense ads for user post vs. extracted topical keywords
1. Effectiveness of using topical keywords
Monday, June 6, 2011
-
Instructions User Study
Monday, June 6, 2011
-
Users picked ads relevant to the post
At least 50% inter-evaluator agreement
For the 60 posts: total of 144 ad impressions, 17% of ads picked as relevant
For the topical keywords: total of 162 ad impressions, 40% of ads picked as relevant
Result - 2X Relevant Impressions
Monday, June 6, 2011
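The "2X" figure follows from the two reported shares. The short calculation below is only a worked check of that arithmetic using the numbers on this slide; the helper name `relevant_ads` is hypothetical.

```python
def relevant_ads(impressions: int, relevant_share: float) -> float:
    """Approximate number of ads judged relevant, given total impressions and share."""
    return impressions * relevant_share


post_share, keyword_share = 0.17, 0.40  # shares reported on the slide
lift = keyword_share / post_share

print(f"relevant ads from full posts:       ~{relevant_ads(144, post_share):.0f}")
print(f"relevant ads from topical keywords: ~{relevant_ads(162, keyword_share):.0f}")
print(f"lift in relevant share: {lift:.2f}x")
```

The ratio of the two shares comes to roughly 2.35, which the slide rounds down to 2X.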
-
User's profile information: interests, hobbies, TV shows.. (non-demographic information)
Submit a post: looking to buy and why (induced noise)
Ads that generate interest, captured attention
2. Profile Ads vs. Activity Ads
Monday, June 6, 2011
-
Using profile ads: total of 56 ad impressions, 7% of ads generated interest
Using authored posts: total of 56 ad impressions, 43% of ads generated interest
Using topical keywords from authored posts: total of 59 ad impressions, 59% of ads generated interest
Result - 8X Generated Interest
Monday, June 6, 2011
-
User studies small and preliminary, clearly suggest:
Monetization potential in user activity
Improvement for Ad programs in terms of relevant impressions
Evaluations based on forum, marketplace
Verbose content
Status updates, notes, community and event memberships
One size may not fit all
To note
Monday, June 6, 2011
-
A world between relevant impressions and click throughs
Objectionable content, vocabulary impedance, Ad placement, network behavior
In a pipeline of other community efforts
No profile information taken into account
Cannot custom send information to Google AdSense
To note
Monday, June 6, 2011
-
SENTIMENT / OPINION MINING
Monday, June 6, 2011
-
Two main types of information we can learn from user-generated content: fact vs. opinion
Much of what we read in social media (e.g., blogs, Twitter, Facebook) is a mix of facts and opinions.
For example: "Latest news: Mobile web services not working in #Bahrain and Internet is extremely slow #feb14 {fact}... looks like they "learned" from #Egypt {opinion}"
Content Analysis: Sentiment Analysis/Opinion Mining
Monday, June 6, 2011
-
Sentiment Analysis Motivation
Which movie should I see?
What do customers complain about?
Why do people oppose health care reform?
Monday, June 6, 2011
-
Example: How awful that many #Egyptian artifacts are in danger of being destroyed. What Zahi Hawass must be thinking #jan25 (read in the tone of "what were YOU thinking")
Sentiment Analysis: Tasks
Monday, June 6, 2011
-
Sentiment Analysis: Tasks
Classification: overall sentiment polarity: positive/neutral/negative
Example: "How awful that many #Egyptian artifacts are in danger of being destroyed." Overall polarity is negative.
Target-specific sentiment polarity: positive/neutral/negative
Example: for target "egyptian artifacts", polarity is "negative"; for target "Zahi Hawass", polarity is "neutral"
Monday, June 6, 2011
-
Sentiment Analysis: Tasks
Identification & Extraction: opinion, opinion holder, opinion target
Example: opinion="awful", opinion holder="the author", target="egyptian artifacts are in danger"
Opinion="must be thinking", opinion holder="the author", target="Zahi Hawass"
Monday, June 6, 2011
-
Classification:
Supervised: labeled training data; features (differ from traditional topic classification tasks); learning strategies
Unsupervised: lexicon-based approach; bootstrapping
Sentiment Analysis: Approaches
Monday, June 6, 2011
-
Sentiment Analysis: Approaches
Identification & Extraction: utilizing the relations between opinion and opinion target, proximity, syntactic dependency, co-occurrence and prepared patterns/rules
Monday, June 6, 2011
-
Sentiment Analysis: From Tweets to polls
Lexicon-based approach for sentiment analysis of tweets:
Subjective lexicon from OpinionFinder (Wilson et al., 2005)
Within topic tweets, count messages containing these positive and negative words defined by the lexicon
Corpus: 0.7 billion tweets, Jan 2008 to Oct 2009; 1.5 billion tweets, Jan 2008 to May 2010
Monday, June 6, 2011
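The counting step described above can be sketched as follows. The `POSITIVE` and `NEGATIVE` sets are tiny hypothetical stand-ins for the OpinionFinder subjectivity lexicon, and reporting the ratio of the two counts as a score is an assumption made for this illustration; the slide itself only says that messages are counted.

```python
import re

# Hypothetical stand-ins for the OpinionFinder lexicon entries.
POSITIVE = {"good", "great", "hope"}
NEGATIVE = {"bad", "awful", "fear"}


def sentiment_counts(tweets: list[str], topic: str):
    """Count on-topic tweets containing positive / negative lexicon words."""
    pos = neg = 0
    for text in tweets:
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        if topic not in tokens:
            continue  # only tweets that mention the topic keyword are counted
        if tokens & POSITIVE:
            pos += 1
        if tokens & NEGATIVE:
            neg += 1  # a tweet may count as both positive and negative
    ratio = pos / neg if neg else None
    return pos, neg, ratio


tweets = [
    "Great jobs report today",
    "jobs outlook is awful",
    "Good news on jobs, but I fear layoffs",
    "lovely weather",
]
print(sentiment_counts(tweets, "jobs"))
```

Computed per day and smoothed over a window, such counts give a time series that can be compared with poll data.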
-
Sentiment Analysis: From Tweets to polls
B. O'Connor, R. Balasubramanyan, B. R. Routledge, and N. A. Smith. From Tweets to polls: Linking text sentiment to public opinion time series. In Intl. AAAI Conference on Weblogs and Social Media, Washington, D.C., 2010.
Monday, June 6, 2011
-
Corpus: 2.89 million tweets referring to 24 movies released over a period of three months
Sentiment Analysis Classifier:
DynamicLMClassifier provided by LingPipe linguistic analysis package
Thousands of workers from the Amazon Mechanical Turk to assign sentiments (positive, negative, neutral) for a large random sample of tweets
Train the classifier using an n-gram model
Sentiment Analysis: Predicting the Future With Social Media
S. Asur and B. Huberman. Predicting the Future With Social Media. 2010. http://arxiv.org/abs/1003.5699
Monday, June 6, 2011
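To give a feel for n-gram based sentiment classification, here is a much simplified sketch: a bag of character n-grams per class with add-one smoothing. It is not LingPipe's DynamicLMClassifier (a Java component) and makes no attempt to match its behavior; the class name `NgramClassifier` and the tiny training examples are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NgramClassifier:
    """Toy per-class character n-gram model with add-one smoothing."""

    def __init__(self, n: int = 3):
        self.n = n
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.totals = Counter()             # label -> total n-grams seen
        self.vocab = set()

    def _ngrams(self, text: str) -> list[str]:
        padded = " " * (self.n - 1) + text.lower()
        return [padded[i:i + self.n] for i in range(len(padded) - self.n + 1)]

    def train(self, text: str, label: str) -> None:
        for gram in self._ngrams(text):
            self.counts[label][gram] += 1
            self.totals[label] += 1
            self.vocab.add(gram)

    def classify(self, text: str) -> str:
        size = len(self.vocab) + 1  # +1 leaves probability mass for unseen n-grams

        def score(label: str) -> float:
            return sum(
                math.log((self.counts[label][g] + 1) / (self.totals[label] + size))
                for g in self._ngrams(text)
            )

        return max(self.counts, key=score)


clf = NgramClassifier()
clf.train("loved it, great movie", "positive")
clf.train("hated it, awful movie", "negative")
print(clf.classify("great film, loved it"))
```

In the study described above, the labels came from Mechanical Turk annotations and included a neutral class, which would simply be a third label here.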
-
Observations:
The opinions may not contribute toward the given target (1,2,3,6)
The subjectivity and polarity of opinion clues are domain-dependent (5,7)
Single words are not enough (4,7,8)
Simple lexicon-based method doesn't work.
Sentiment Analysis: Target-specific opinion identification & Classification of Tweets - Unsupervised Approach
Monday, June 6, 2011
-
General subjective lexicon: commonly used subjective lexicon + popular slang learned from Urban Dictionary
Domain-dependent sentiment lexicon: learned from domain-specific corpus, bootstrapping
More than words (word/phrase/pattern): n-gram + statistical model
Sentiment Analysis: Target-specific opinion identification & Classification of Tweets - Unsupervised Approach
Monday, June 6, 2011
-
Sentiment Analysis: Target-specific opinion identification & Classification of Tweets - Unsupervised Approach
Target-specific opinion identification/extraction: shallow syntactic analysis, rules + proximity
Monday, June 6, 2011
-
URL Extraction is for Tweets
FourSquare in Facebook, Twitter
What is it in other mediums/SMS?
Content Analysis: Context Extraction, Utilization
Monday, June 6, 2011
-
Resolution
Semantic Context Relevance
Content Analysis: URL extraction
Monday, June 6, 2011
-
Personality Signals: blogs, style of writing
Psychometric analysis of content
Sample study: Gendered writing styles online
Author Categorization: Using Content to derive additional People metadata
Monday, June 6, 2011
-
Interesting questions to ask:
Who are the most popular people* in the network?
Who are the most influential people in the network?
Who are the most active people in the network?
What are the types of people in communities of the network?
Who are the bridges between communities in the network?
People Analysis: Using Network to derive People metadata
Monday, June 6, 2011
-
By Link Analysis Algorithms: HITS [K-99] & variants, PageRank [BP-97] & variants etc..
Links not sufficient! Million Follower Fallacy [C-10]
People Analysis: Influence
Source : informing-arts
Monday, June 6, 2011
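As an illustration of link-analysis based influence, the sketch below runs a basic PageRank power iteration on a tiny follower graph. The graph, the node names, and the damping value of 0.85 are illustrative assumptions; an edge points from a follower to the account being followed.

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Basic PageRank by power iteration; graph maps node -> list of nodes it links to."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = graph.get(n, [])
            if targets:
                share = damping * rank[n] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for t in nodes:
                    new_rank[t] += damping * rank[n] / len(nodes)
        rank = new_rank
    return rank


# Hypothetical follower graph: ann and bob follow cat, cat follows ann.
follows = {"ann": ["cat"], "bob": ["cat"], "cat": ["ann"]}
ranks = pagerank(follows)
print(sorted(ranks.items(), key=lambda kv: -kv[1]))
```

As the slide warns, link structure alone is not sufficient: a high score here reflects being followed, not necessarily the ability to make others act.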
-
People Analysis: Influence
Flavor of Context Analysis (activity level)
Popularity NOT = Influence! Influence & Passivity [RGAH-10]
Interest Similarity: TwitterRank: Reciprocity & Homophily [WLJH-10]
Klout Score - True Reach, Amplification [Klout]
Monday, June 6, 2011
-
Blogger, Scientist, Journalist, Artist, Trustee, Company X in Domain Y.. Multiple types and affiliations!
User interest mining: Key Phrase Extraction followed by semantic association on user bio, tweets, lists, favorite posts
Twitter Study [BCDMJNRM-09]
People Analysis: User types & Affiliation
Source: kahunainstitute.com
Monday, June 6, 2011
-
Semantic analysis of profile description
Web Presence: Use of Web & Knowledge bases (Wikipedia, Blogs) to build context for user types
Entity Spotting & Extraction, followed by Semantic Association and Similarity with user-type context
People Analysis: User types & Affiliation
Monday, June 6, 2011
-
People Analysis: Social Engagement
Frequency Distribution Analysis of user activity posting, retweet, reply, mentions, lists etc.
Source: http://www.syscomminternational.com/
Monday, June 6, 2011
-
Network Analysis
Interesting questions to ask:
How communities form around topics - growth & evolution
What are the effects of presence of influential participants in the communities
What are the effects of content nature (or sentiment, opinions) flowing in network on the community life
What is the community structure: degree of separation and sub-communities
Foundation of network: Nodes, Connections/Relationships
Monday, June 6, 2011
-
Network Analysis: Methods
Source: http://www.kudos-dynamics.com/
Network Structure metrics: Centrality, Connected Component, Avg. Degree, Clustering Coefficient, Avg. Path Length, Bridge, Cohesion, Prestige, Reciprocity
Important Literature: [AB-02, WS-98, BW-00; NW-06, WF-92, MW-10]
Monday, June 6, 2011
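A few of the structure metrics named above can be computed directly from an adjacency map. The sketch below uses a small hypothetical undirected graph and the standard textbook definitions of degree centrality, average degree, and local clustering coefficient; it is an illustration, not code from the tutorial.

```python
from itertools import combinations

# Hypothetical undirected network as an adjacency map (node -> set of neighbors).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}


def degree_centrality(adj: dict, node: str) -> float:
    """Degree divided by the maximum possible degree (n - 1)."""
    return len(adj[node]) / (len(adj) - 1)


def average_degree(adj: dict) -> float:
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def local_clustering(adj: dict, node: str) -> float:
    """Fraction of pairs of neighbors that are themselves connected."""
    nbrs = adj[node]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (len(nbrs) * (len(nbrs) - 1))


print(degree_centrality(adj, "a"), average_degree(adj), local_clustering(adj, "a"))
```

For graphs of realistic size, a dedicated library would normally be used instead of hand-written loops.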
-
Community Discovery, growth, evolution
Based on relationship types (e.g., signed network), geography/location based etc.
Hierarchical clustering algorithms: top-down, bottom-up
Modularity Maximization [NW-06]
Algorithms comparison survey [B-06]
Network Analysis: Algorithms
Monday, June 6, 2011
-
Graph Partitioning & Traversal
Best time-complexity & reachability
Follow Greedy paths
K-way multilevel Partitioning, Bron-Kerbosch, K-plex, K-core or N-cliques, DFS, BFS, MST
Network Analysis: Algorithms
"We dream in Graph and We analyze in Matrix" - Barry Wellman, INSNA
Monday, June 6, 2011
-
Network Analysis: Methods
Network Modeling Approaches: Random graph model (Erdos-Renyi model); Small-world model (Small World Phenomenon); Scale-free model (led to Power-Law degree distribution)
Social Network Analysis methods: Centrality (Degree, Eigenvector, Betweenness, Closeness); Clusters (Cliques and extensions, Communities)
Source: http://www.kudos-dynamics.com/
Monday, June 6, 2011
-
Information Flow: Diffusion
Maximizing Spread (Opinion, Innovation, Recommendation)
Outbreak Detection (e.g., disease)
Social Network: No info about user action
Understanding dynamics is challenging!
Power Law distribution [LAH-07]
Factors impacting flow: Sampling strategy, user Homophily, content nature [CLSCK-10, NPS-10]
Network Analysis: Diffusion & Homophily
Monday, June 6, 2011
-
Querying
Monday, June 6, 2011
-
NWB (Network WorkBench)
Truthy
Graph-tool
Orange
Pajek
Tulip
http://en.wikipedia.org/wiki/social_network_analysis_software
Analysis & Visualization Tools
Source: http://truthy.indiana.edu/
Monday, June 6, 2011
-
Event Detection
Monday, June 6, 2011
-
Citizen Sensing in Real-time
Monday, June 6, 2011
-
People can't wait for Information
500 years ago: Single life time
20 years ago: Next day or two (Television, Newspapers)
Presently: Minutes are not considered fast enough (Digital media, Social media)
Real-Time Motivation
Monday, June 6, 2011
-
Is Real-Time the future of Web?
Social Media for Real-Time Web. Examples:
Disaster Management: Ushahidi
Real-Time Markets
Brand Tracking: Twarql
Movie reviews
Real-Time Social Media
Monday, June 6, 2011
-
Scenario
Journalist
The Guardian, Feb 2010
Monday, June 6, 2011
-
Information Overload: Can we aggregate, organize and collectively analyze data?
Real Time: Can we deliver the data as it is generated?
Challenges
Monday, June 6, 2011
-
Expressive description of Information need: Using SPARQL (instead of traditional keyword search)
Flexibility on the point of view: Ability to "slice and dice" the data in several dimensions: thematic, spatial, temporal, sentiment etc..
Streaming data with Background Knowledge: Enables automatic evolution and serendipity
Scalable Real-Time delivery: Using sparqlPuSH (SFSW'10)
A Semantic Web Approach
Monday, June 6, 2011
-
Concept Feed
Monday, June 6, 2011
-
Architecture
Monday, June 6, 2011
-
Social Sensor Server
Monday, June 6, 2011
-
Named Entity Recognition
2 Million Entities from DBpedia
Load as Trie for efficiency
N-grams matched
Example: Obama, Barack Obama
Metadata Extractions (Social Sensor Server)
Monday, June 6, 2011
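The trie-based spotting described above can be sketched as a token-level trie with longest-match lookup, so that "Barack Obama" is preferred over "Obama" when both would match. This is a minimal illustration with a hypothetical three-entry entity list rather than the 2 million DBpedia entities; the function names are assumptions.

```python
import re

END = None  # sentinel key marking the end of an entity name; never equals a token


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def build_trie(entities: list[str]) -> dict:
    """Build a token-level trie; each complete name is stored under the END key."""
    root: dict = {}
    for name in entities:
        node = root
        for token in tokenize(name):
            node = node.setdefault(token, {})
        node[END] = name
    return root


def spot_entities(text: str, trie: dict) -> list[str]:
    """Scan left to right, keeping the longest entity match at each position."""
    tokens = tokenize(text)
    found, i = [], 0
    while i < len(tokens):
        node, match, end = trie, None, i
        for j in range(i, len(tokens)):
            node = node.get(tokens[j])
            if node is None:
                break
            if END in node:
                match, end = node[END], j + 1
        if match is not None:
            found.append(match)
            i = end
        else:
            i += 1
    return found


trie = build_trie(["Obama", "Barack Obama", "Ohio"])
print(spot_entities("Barack Obama spoke in Ohio; Obama left.", trie))
```

Lookup cost depends on the length of the tweet and of the longest entity name, not on the number of entities loaded.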
-
URL, HashTag Extraction: Regex extraction
Resolution
URL Resolution: Follows HTTP redirects for resolution
HashTag Resolution: Tagdef, Tagal, WTHashTag.com
Metadata Extractions (Social Sensor Server)
Monday, June 6, 2011
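A minimal sketch of the regex extraction step is shown below. The two patterns are illustrative assumptions, not the expressions used in the actual system, and they ignore edge cases such as trailing punctuation. Resolution (following HTTP redirects, or looking up a hashtag definition) would be a separate network step and is not shown.

```python
import re

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_metadata(tweet: str) -> dict[str, list[str]]:
    """Extract URLs and hashtags from a tweet with simple regular expressions."""
    urls = URL_RE.findall(tweet)
    # Remove URLs first so that a '#fragment' inside a URL is not read as a hashtag.
    text_without_urls = URL_RE.sub(" ", tweet)
    hashtags = HASHTAG_RE.findall(text_without_urls)
    return {"urls": urls, "hashtags": hashtags}


meta = extract_metadata(
    "Fingers crossed for the upcoming #hcrvote http://example.org/a#frag"
)
print(meta)
```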
-
Other Metadata provided by Twitter
User profile: User Name, Location, Time etc..
Tweet: RT, reply etc..
Metadata Extractions (Social Sensor Server)
Monday, June 6, 2011
-
RDF Annotation: Common RDF/OWL Vocabularies
FOAF (foaf-project.org): Friend of a Friend
SIOC (sioc-project.org): Semantically Interlinked Online Communities
OPO (online-presence.net): Online Presence Ontology
MOAT (moat-project.org): Meaning Of A Tag
Structured Data (Social Sensor Server)
Monday, June 6, 2011
-
A snippet of the annotation
rdf:type sioct:MicroblogPost ;
sioc:content "Fingers crossed for the upcoming #hcrvote"
sioc:has_creator ; foaf:maker ;
moat:taggedWith dbpedia:Healthcare_reform .
geonames:locatedIn dbpedia:Ohio .
Structured Data (Social Sensor Server)
Monday, June 6, 2011
-
Virtuoso to store triples
Queries formulated by the users are stored
SPARQL protocol over HTTP to access RDF from the store
Combine data from tweet with the background knowledge in the RDF store
Semantic Publisher
Monday, June 6, 2011
-
Distribution Hub: PUSH Model (PubSubHubbub protocol); pushes the tweets to the Application Server
Application Server: Delivers data to the Clients; RSS-enabled Concept feeds
Application Server & Distribution Hub
Monday, June 6, 2011
-
[Figure: graph pattern over Background Knowledge (e.g. DBpedia) with variables ?tweet, ?competitor and ?category, the resource dbpedia:IPad, and edges moat:taggedWith and skos:subject; categories shown: category:Wi-Fi, category:Touchscreen; competitors shown: HP TabletPC, IPhone]
Example tweet: "@anonymized Lorem ipsum bla bla this is an example tweet"
Brand Tracking - Example
Monday, June 6, 2011
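One plausible reading of the brand-tracking figure is: find tweets tagged with a resource that shares a category with dbpedia:IPad. The sketch below evaluates that pattern over a handful of in-memory triples instead of a SPARQL endpoint. The edge wiring is an assumption about the figure, and identifiers such as `tweet:1` and `dbpedia:HP_TabletPC` are hypothetical.

```python
# Toy triple set standing in for tweets annotated against background knowledge.
TRIPLES = {
    ("dbpedia:IPad", "skos:subject", "category:Wi-Fi"),
    ("dbpedia:IPad", "skos:subject", "category:Touchscreen"),
    ("dbpedia:IPhone", "skos:subject", "category:Touchscreen"),
    ("dbpedia:HP_TabletPC", "skos:subject", "category:Wi-Fi"),
    ("dbpedia:Ohio", "skos:subject", "category:States"),
    ("tweet:1", "moat:taggedWith", "dbpedia:IPhone"),
    ("tweet:2", "moat:taggedWith", "dbpedia:Ohio"),
}


def competitor_tweets(triples: set, brand: str) -> list[tuple[str, str]]:
    """Return (tweet, competitor) pairs where the competitor shares a category with brand."""
    categories = {o for s, p, o in triples if s == brand and p == "skos:subject"}
    competitors = {
        s for s, p, o in triples
        if p == "skos:subject" and o in categories and s != brand
    }
    return sorted(
        (s, o) for s, p, o in triples
        if p == "moat:taggedWith" and o in competitors
    )


matches = competitor_tweets(TRIPLES, "dbpedia:IPad")
print(matches)
```

The point of the example is that the competitor list is never typed in by the user: it falls out of the background knowledge.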
-
1242 articles from NYTimes
Around 800,000 tweets
President Obama lays out plan for health care reform in speech to Joint Session of Congress (10th Sept, Timeline.com)
Obama taking an active role in health talks in pursuing his proposed overhaul of health care system. (13th Aug, NYTimes)
Monday, June 6, 2011
-
Twarql on Linked Open Data
Monday, June 6, 2011
-
Emerging Research Areas
Monday, June 6, 2011
-
Reasons for spamming include:
Gaining Popularity: Use of popular topic related keywords (e.g. hashtags of trending topics) to propagate something off topic.
Launching malicious attacks: Phishing attacks, virus, malware etc.
Misleading the masses: Propagating false information [MM-10].
Spam in Social Networks
Monday, June 6, 2011
-
Spam in Social Networks
Gaining popularity using trending keywords: This tweet uses #Cairo but refers to a fashion website.
Egypt Protests
Monday, June 6, 2011
-
Spam detection
Content-based features: content size, URL type, spam words
Metadata-based features: account information, behavior
Network-based features: provenance (e.g. content from a reliable source)
Spam in Social Networks
Monday, June 6, 2011
-
Reputation, Policy, Evidence, and Provenance used to derive trustworthiness.
Illustrative examples of online cues used for trust assessment:
Wikipedia: article size, number of references, author, edit history, age of the article, edit frequency etc.
Product Reviews: number of helpful, very helpful ratings, author expertise, sentiments in comments received for a review etc.
Trust in Social Networks
Monday, June 6, 2011
-
We propose trust ontology [AHTS-10] that:
Captures semantics of trust.
Enables representation and reasoning with trust.
Semantics of Trust specifies, for a given trustor and trustee, the following features:
Type - Type of trust relationship.
Scope - Context of the trust relationship.
Value - Quantifies the trust relationship.
Trust in Social Networks
Monday, June 6, 2011
-
Gleaning primitive (edge) trust: Trust value between two nodes is quantified using numbers, e.g., [0,1] or [-1,1], or partial ordering [TAHS-09].
Gleaning composite (path) trust: Propagation via chaining and aggregation (transitivity)
Some popular algorithms for trust computation: Eigentrust, Spreading Activation, SUNNY etc.
Trust in Social Networks
Monday, June 6, 2011
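Chaining and aggregation can be illustrated with a small sketch. Here chaining multiplies edge trust values along a path and aggregation takes the best path; both operator choices are assumptions made for this example, not the definitions used by Eigentrust, SUNNY, or the cited trust ontology. Edge values are taken to lie in [0,1], and the graph is hypothetical.

```python
def composite_trust(graph: dict, source: str, target: str,
                    on_path: frozenset = frozenset()) -> float:
    """Best path trust from source to target: product along a path, max across paths."""
    if source == target:
        return 1.0
    best = 0.0
    for neighbor, edge_trust in graph.get(source, {}).items():
        if neighbor in on_path or neighbor == source:
            continue  # skip nodes already on the current path to avoid cycles
        via = edge_trust * composite_trust(graph, neighbor, target, on_path | {source})
        best = max(best, via)
    return best


# Hypothetical trust network: edge value = how much the trustor trusts the trustee.
trust = {
    "alice": {"bob": 0.9, "carol": 0.5},
    "bob": {"dave": 0.8},
    "carol": {"dave": 0.9},
}
print(composite_trust(trust, "alice", "dave"))
```

With these operators, trust can only decrease along a chain, which matches the intuition that a referral is worth less than direct experience.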
-
Machine sensor observations are quantitative in nature, while human observations can be both qualitative and quantitative.
Benefits of combining observations from humans and machine sensors: complementary evidence, corroborative evidence
Integrating Social And Sensor Networks
Monday, June 6, 2011
-
Applications of integrating heterogeneous sensor observations:
Situation Awareness by using human observations to interpret machine sensor observations.
Enhancing trustworthiness using corroborative evidence.
Integrating Social And Sensor Networks
Monday, June 6, 2011
-
Instant Discovery: Geo-tagging and location-aware services, in combination with search, have made discovery a two-way street.
Compressed Expression: Mobile makes social networking even more compelling
Outsourced Memory: Cloud-based servers to store all of their mobile applications and databases
Mobile Social Computing
Monday, June 6, 2011
-
Mobile Social Computing
Automated Decisions: Smart apps help us make faster decisions, or even make decisions for us
Peer Power: Mobiles can create social movements based on peer influence
Monday, June 6, 2011
-
Personalized Branding: advertising is rapidly becoming personalized based on individual's needs and preferences
Mobiles in social development: becoming an integral part of development
Coordination in disaster situations
Health care delivery, especially in developing countries
Elections and other forms of political expression
Mobile Social Computing (Cont.)
Monday, June 6, 2011
-
Research Application: Twitris
Monday, June 6, 2011
-
1. Information Overload
Multiple events around us
WHAT to be aware of
Multiple storylines about same event!!
Twitris - Motivation
Monday, June 6, 2011
-
2. Evolution of Citizen Observation with location and time
Twitris - Motivation
Monday, June 6, 2011
-
3. Semantics of Social perceptions
What is being said about an event (theme)
Where (spatial)
When (temporal)
Twitris lets you browse citizen reports using social perceptions as the fulcrum
Twitris - Motivation
Monday, June 6, 2011
-
Facilitates understanding of multi-dimensional social perceptions over SMS, Tweets, multimedia Web content, electronic news media
Twitris: Semantic Social Web Mash-up
Monday, June 6, 2011
-
Twitris: Architecture
Monday, June 6, 2011
-
Twitris: Functional Overview
Monday, June 6, 2011
-
Twitris: Event Summarization 1
Monday, June 6, 2011
-
Sentiment Analysis using statistical and machine learning techniques
Twitris: Event Summarization 2
Monday, June 6, 2011
-
Entity-relationship graph
using semantically annotated DBpedia entities mentioned in the tweets
Twitris: Event Summarization 3
Monday, June 6, 2011
-
http://twitris.knoesis.org/
http://knoesis1.wright.edu/sidfot/
Twitris: Demo, Quick Show
Monday, June 6, 2011
-
Twitris: Ongoing work
Monday, June 6, 2011
-
Domain models to enhance understanding of the content
Twitris: Knowledge-Enabled Computing
Monday, June 6, 2011
-
Great role in military and NGO rescue operations during emergencies: Haiti and Chile Earthquakes
Twitris: Coordination
Monday, June 6, 2011
-
Coordinating needs and resources in disaster situation
Analyze SMS and Web reports from disaster location
Use domain models for efficient and timely coordination
Twitris: Coordination
Monday, June 6, 2011
-
Modeling relationships between social behavior, roles, social and cultural values, etc.
Twitris: Socio-Cultural-Behavior Model as Lens
Monday, June 6, 2011
-
"We simply do not have enough genes to program the brain fully in advance, we must work together, extending and supporting our own intelligence with social prosthetic systems that make up for our missing cognitive and emotional capacities: Evolution has allowed our brains to be configured during development so that we are plug compatible with other humans, so that others can help us extend ourselves." - Harvard "Group Brain Project"
Collaboration
Monday, June 6, 2011
-
Open Source: Linux, Apache, ...
Social Networks: Facebook, Twitter, ...
Crowd Sourcing: Wikipedia, Kiva, Ushahidi, Kiirti, SwiftRiver, Sahana...
Collaborative Governance: Peer-to-Patent, ...
Beginnings
Monday, June 6, 2011
-
http://gomadam.org/tutorial
@namelessnerd
Monday, June 6, 2011
-
Facebook + Twitter: Iran post-election protests; Tunisia, Egypt, Libya, Bahrain, ...
Ushahidi: Kenya Violence; India, Lebanon, Afghanistan, and Sudan elections; Haiti Earthquake; Pakistan Floods
Popular Initiatives
Monday, June 6, 2011
-
Kiirti: BBMP election monitoring; Bangalore AutoWatch
Popular Initiatives
Monday, June 6, 2011
-
FixOurCity allows citizens to report, view and discuss civic issues in their locality.
FixOurCity Process Flow
Monday, June 6, 2011
-
Built on top of FixMyCity open-source codebase
Stage I
Report by Area/Ward and Street
Integration with Google Map
Displays Ward member name/contact details
Select category of issue, description and severity
Confirmation through email to avoid misuse
FixOurCity Backend
Monday, June 6, 2011
-
Stage II/III
Normalize incoming reports to official wards and categories
Integration with Corporation website to allow auto-forwarding and updating of reports
FixOurCity Backend
Monday, June 6, 2011
-
Information Collection: SMS (FrontlineSMS, Clickatell), Email, Web
Visualization/Interactive Mapping: Timeline, Category, Geo-spatial
Alerts: Geo-spatial
Admin: User Management, Report Moderation / Creation, Site Statistics
Ushahidi Features
Monday, June 6, 2011
-
Enables filtering and verification of real-time data from channels like Twitter, SMS, Email and RSS feeds.
SwiftRiver Architecture - I
Monday, June 6, 2011
-
Kiirti allows you to set up your own instance of the Ushahidi Platform without having to install it on your own web server. And, it provides pre-integrated Voice and SMS reporting capabilities within India.
Kiirti Features
Monday, June 6, 2011
-
Kiirti - Flywheel of Engagement
Monday, June 6, 2011
-
Sahana: a Free and Open Source Disaster Management system. A web based collaboration tool that addresses the common coordination problems during a disaster between Government groups, the civil society (NGOs) and the victims themselves.
Sahana Features
Monday, June 6, 2011
-
Requests Management: Tracks requests for aid and matches them against donors who have pledged aid.
Volunteer Management: Manage volunteers by capturing their skills, availability and allocation.
Sahana Features
Monday, June 6, 2011
-
Missing Persons Registry: Report and Search for Missing Persons.
Disaster Victim Identification.
Shelter Registry: Tracks the location, distribution, capacity and breakdown of victims in Shelters.
Sahana Features
Monday, June 6, 2011
-
Hospital Management System: Hospitals can share information on resources & needs.
Organization Registry: "Who is doing What & Where". Allows relief agencies to coordinate their activities.
Ticketing: Master Message Log to process incoming reports & requests.
Delphi Decision Maker: Supports the decision making of large groups of Experts.
Sahana Features
Monday, June 6, 2011
-
Mapping: Situation Awareness & Geospatial Analysis.
Messaging: Sends & Receives Alerts via Email & SMS.
Document Library: A library of digital resources, such as Photos & Office documents.
Sahana Features
-
Peer to Patent is a historic initiative by the United States Patent and Trademark Office (USPTO) that opens the patent examination process to public participation for the first time. The online system aims to improve the quality of issued patents by enabling the public to supply the USPTO with information relevant to assessing the claims of pending patent applications.
Peer to Patent
-
Twitris 2.0 is a Semantic Web application that facilitates understanding of social perceptions through semantics-based processing of massive amounts of event-centric data. It addresses challenges in large-scale processing of social data while preserving spatio-temporal-thematic properties.
Twitris Architecture
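One way to picture "preserving spatio-temporal-thematic properties" is to keep every post bucketed by where, when, and about what it was written before any aggregation. The sketch below is an illustration under assumed inputs, not Twitris code: the post dictionary keys (`location`, `time`, `theme`, `text`) and the one-day time window are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def group_by_stt(posts):
    """Bucket posts by (location, day, theme) and count terms per bucket.

    Each post is assumed to be a dict with the hypothetical keys
    'location' (str), 'time' (datetime), 'theme' (str) and 'text' (str).
    """
    buckets = defaultdict(Counter)
    for post in posts:
        day = post["time"].date()                  # temporal slice: one day
        key = (post["location"], day, post["theme"])
        # Naive whitespace tokenization; real systems would do far more.
        buckets[key].update(post["text"].lower().split())
    return buckets


posts = [
    {"location": "Chennai", "time": datetime(2011, 6, 6, 9, 0),
     "theme": "flood", "text": "Water rising near station"},
    {"location": "Chennai", "time": datetime(2011, 6, 6, 15, 30),
     "theme": "flood", "text": "Need water and food"},
    {"location": "Mumbai", "time": datetime(2011, 6, 6, 10, 0),
     "theme": "traffic", "text": "Roads blocked downtown"},
]
buckets = group_by_stt(posts)
```

Because the key keeps all three dimensions, term counts for one place, day and theme never mix with those of another.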
-
Online Dispute Resolution: 30M+ pending cases in India's courts
Public Policy Reviews
Crisis Management
Effective Local Governance
Future Possibilities