Searching for Credible Information via Social Media Mining
Huan Liu
Data Mining and Machine Learning Lab, Arizona State University
Thanks to Former and Current PhD Students of DMML
• Reza Zafarani, Asst. Prof., Syracuse U.
• Xia Hu, Asst. Prof., Texas A&M U.
• Magdiel Galan, Intel
• Shamanth Kumar, Castlight Health
• Pritam Gundecha, IBM Research Almaden
• Jiliang Tang, Asst. Prof., MSU
• Huiji Gao, LinkedIn
• Ali Abbasi, Machine Zone
• Salem Alelyani, Asst. Prof., King Khalid U.
• Xufei Wang, LinkedIn
• Geoffrey Barbier, AFRL
• Lei Tang, Clari
• Zheng Zhao, Google
• Nitin Agarwal, Chair Prof., UALR
• Sai Moturu, PostDoc, MIT Media Lab
• Lei Yu, Assoc. Prof., Binghamton U., NY
• Robert Trevino, AFRL
• Yunzhong Liu, LeEco, US
• Somnath Shahapurkar, FICO
• Fred Morstatter
• Isaac Jones
• Suhas Ranganath
• Suhang Wang
• Tahora Nazer
• Jundong Li
• Liang Wu
• Ghazaleh Beigi
• Kai Shu
• Justin Sampson
Arizona State University Data Mining and Machine Learning Lab | Searching for Credible Information | BJUT 2016
False, Misleading, and Inaccurate Information
• Spam
• Fraud
• Fake news
• Rumor
• Urban legend
• Gossip
• Information can be true, false, or uncertain
• Big Data: the 6th "V" everyone should know about
  – Vulnerability
  – Social media has all 6 V's
Disinformation (purposeful)
Misinformation (unintentional) & disinformation
Spam in Social Media
• Unwanted content generated by spamming users as comments, chats, or fake requests, used to promote products or spread malicious information
  – Fake reviews
  – Malicious links
  – Fake requests
Fraud (Scam) in Social Media
• Social media fraud is defrauding and/or taking advantage of social media users through social media services
  – Swindling money
  – Stealing personal information
Fake News Websites and Social Media
• Fake news websites deliberately publish hoaxes, propaganda, and disinformation to drive traffic, and social media exacerbates the problem
• Fake news, inflamed by social media, can affect domestic politics because resources to check the veracity of claims are limited
  – It is easy to "like" and "share," but checking takes effort, even though it is just a few clicks away (effort asymmetry)
• Fake news + social media → cyberwarfare
Fake News Is Rampant in Social Media
• Fake news spreads on social media
  – Spreads rapidly
  – Evolves fast
• Crossover to other networks
• Modified content
Fake News Can Cause Real Harm
• Pizzagate: fake news stories from Reddit led to a real shooting
• A false rumor erased $136 billion in 10 minutes
Source: "Fake News Onslaught Targets Pizzeria as Nest of Child-Trafficking," New York Times, 2016
Rumors
• Wikipedia: "A tall tale of explanations circulating from person to person and pertaining to an object, event, or issue in public concern."
• Rumors can be true or false
  – False rumor
Gossip in Social Media
• Gossip is idle chat and rumor about the personal and/or private affairs of others
• Social media allows for faster, larger-scale, and more convenient idle chat
  – Celebrity: "Obamas moving to Asheville"
  – Friends: people "are much more likely to gossip when a story unites a familiar person with an interesting scenario."
Familiarity with Interest Breeds Gossip: Contributions of Emotion, Expectation, and Reputation, PLoS ONE, 2014
Urban Legend in Social Media
• Fictional stories with macabre elements rooted in local popular culture
  – On social media, they develop faster and spread wider
• Urban legend of Feng Shui
• In summary, it is imperative to study credibility checking
On Credibility Checking
• Study different types of credibility and the need for different data and information sources in credibility checking
  – We don't have to reinvent the wheel in social media mining and can "stand on the shoulders of giants"
  – Machines differ from humans in credibility checking
• About credibility checking
  – Types of credibility (social sciences, psychology, CS)
  – Aspects of credibility checking
  – Components of credibility checking in social media
Four Types of Credibility
• Presumed credibility (general assumptions)
  – "Our friends usually tell the truth"
• Reputed credibility (based on third parties' reports)
  – For instance, prestigious awards or official titles
• Surface credibility (simple inspection)
  – "People judge a book by its cover"
• Experienced credibility (first-hand experience)
  – "Time can tell" (路遥知马力，日久见人心: a long road tests a horse's strength; time reveals a person's heart)
• Any new type to explore in social media?
Aspects of Credibility Checking (CC)
• Can we turn CC into a problem that is easier for users or Amazon Mechanical Turk workers (without much expertise) to check?
• Issues about credibility checking measures
  – Reputation and history (time)
  – Accuracy and relevance
  – Transparency and integrity (consistency)
  – Response from independent sources (consistency)
• Implication or impact assessment
  – Not every piece of fake news is disastrous
  – "Warn or not to warn": how to balance?
Components in Credibility Checking in Social Media
Is a news item/post fake? Yes, No, or Uncertain
• Recipients: expertise, experience, background, occupation
• Senders: reputation, length of online presence, social networks
• Source of information: provenance, reputation, curation/editing, length
• Content: writing style, topics, URLs, multimedia
• Network context: topic thread (outlier detection), retweets, replies, comments
• Crowdsourcing (fact-checking sites, e.g., Snopes)
• Ground truth (multifaceted, gold standard)
Searching for Credible Information
Figure: credible data vs. spam, bots (automatically generated content), fake news, and rumor
• A unique challenge
  – Ground truth
• Additional challenges
  – Credibility verification
  – Dynamic change
  – Timeliness
• Alternative approaches
  – Rumor detection
  – Spam detection
  – Bot detection
  – Inferring distrust
Using Social Media for Credibility Checking
• Velocity and volume
  – 6,000 tweets per second, about 500 million per day on Twitter (6,000 × 86,400 s ≈ 518 million)
  – 55 million status updates and 300 million photos per day on Facebook
• Variety
  – Geospatial, textual, pictorial, temporal, and social dimensions
  – Cross-modality (e.g., geotagged pictures)
• Veracity
  – Truthfulness and accuracy of information
• Use big data, multi-source information, and social networks to compensate for a lack of expertise (以其之矛还其之盾: use its own spear against its own shield)
A decent breakdown of all things real and fake news: http://imgur.com/7xHaUXf
Rumor Detection
• Rumor: unverified and relevant information that circulates in the context of ambiguity
• Goal: detect emerging rumors with minimal information, as early as possible
  – If intervention is not feasible, get an early warning or be prepared
• Challenges
  – How to overcome the lack of information in a single tweet?
  – How to detect rumors in their formative stage?
Insufficient Information in a Single Tweet
• A single tweet could be damaging, but without context it contains little information for detection
• Treat batches of tweets as "conversations"
  – Based on keyword similarities
  – Based on reply chains
• Aggregate conversations
  – Shared hashtags
  – Common links
  – Cosine similarity
Figure: conversations of 1 to 9 tweets vs. 10+ tweets, marking the point of acceptable accuracy
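The aggregation bullets above can be illustrated with a small sketch. It is an assumption-laden illustration, not the lab's implementation: each tweet is taken to be a dict with hypothetical `id`, `hashtags`, and `urls` fields, and tweets that share a hashtag or a link (directly or through a chain of other tweets) are merged into one conversation with union-find. Cosine similarity is not used here.

```python
from collections import defaultdict


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


def aggregate_conversations(tweets):
    """Group tweets that share a hashtag or a link, transitively."""
    parent = {t["id"]: t["id"] for t in tweets}
    first_seen = {}  # token (hashtag or link) -> id of first tweet using it
    for t in tweets:
        tokens = {"#" + h.lower() for h in t.get("hashtags", [])}
        tokens |= {"url:" + u for u in t.get("urls", [])}
        for tok in tokens:
            if tok in first_seen:
                union(parent, first_seen[tok], t["id"])
            else:
                first_seen[tok] = t["id"]
    groups = defaultdict(list)
    for t in tweets:
        groups[find(parent, t["id"])].append(t["id"])
    # Largest conversations first; ties broken by first tweet id.
    return sorted(groups.values(), key=lambda g: (-len(g), g[0]))
```

For example, a tweet tagged `#Flood`, a second tagged `#flood` with a link, and a third carrying only that link end up in one conversation, while an unrelated `#sports` tweet stays alone.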
Detection of Emerging Rumors
• Emergent detection: link the first tweet of a rumor with those already posted
• Standard rumor classification is not effective for small conversations
  – Lack of network and statistical data
  – Data sparsity issues
• Implicit linking works effectively for detecting small rumor cascades
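One way to picture linking a newly posted tweet to tweets already posted is a bag-of-words cosine similarity with a cutoff. This is a hedged sketch: the tokenizer, the `link_new_tweet` helper, and the 0.3 `threshold` are hypothetical choices for illustration, and the implicit links used in the cited work may be defined differently.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    # Lowercased word counts from a deliberately simple tokenizer.
    return Counter(re.findall(r"[a-z0-9#]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def link_new_tweet(new_text, posted, threshold=0.3):
    """Return ids of already-posted tweets linked to the new tweet.

    posted: dict mapping tweet id -> text.
    threshold: hypothetical similarity cutoff for creating a link.
    """
    new_vec = bag_of_words(new_text)
    scores = {tid: cosine(new_vec, bag_of_words(txt))
              for tid, txt in posted.items()}
    linked = [tid for tid, s in scores.items() if s >= threshold]
    return sorted(linked, key=lambda tid: -scores[tid])
```

With `posted = {1: "Explosion reported near the stadium #breaking", 2: "Great pasta recipe tonight"}`, the question "Is there an explosion near the stadium?" links only to tweet 1, so even a one-tweet cascade gains context from earlier posts.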
Bot Detection
• Bots
  – Innocuous: relay information from official sources
  – Malicious: spread rumors and false information
• Goal: remove bots from social media data with high recall
  – Why?
• Challenges
  – Acquiring ground truth
  – Increasing recall without significantly reducing precision
Bots in Social Media
• Bots on Twitter
  – Twitter claims 5% of 230M users are bots*
  – One study found 20M bot accounts = 9%**
  – 24% of all tweets are generated by bots***
• 5-11% of Facebook accounts are fake****
* http://blogs.wsj.com/digits/2014/03/21/new-report-spotlights-twitters-retention-problem/
** http://www.nbcnews.com/technology/1-10-twitter-accounts-fake-say-researchers-2D11655362
*** https://sysomos.com/inside-twitter/most-active-twitter-user-data
**** http://thenextweb.com/facebook/2014/02/03/facebook-estimates-5-5-11-2-accounts-fake/
Finding Ground Truth
• Three states of a Twitter user
  – Active
  – Suspended
  – Deleted
• Idea
  – Use these states as labels
  – Two snapshots of each user are taken
Figure: Status on Twitter as a labeling mechanism (the initial crawl finds a seed set of users and crawls profile, network, ...; each user is later observed as active, suspended, or deleted)
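The two-snapshot idea can be sketched as a lookup from each user's later status to a training label. The slide does not say how states map to labels, so the mapping below is an assumption: suspended accounts are treated as positives (likely malicious), active accounts as negatives, and deleted accounts as ambiguous and dropped. The names `STATUS_TO_LABEL` and `label_users` are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical mapping from the status seen in the second snapshot to a label:
# suspended -> 1 (likely malicious), active -> 0, deleted -> None (excluded).
STATUS_TO_LABEL: Dict[str, Optional[int]] = {
    "active": 0,
    "suspended": 1,
    "deleted": None,
}


def label_users(first_crawl: Dict[str, dict],
                second_status: Dict[str, str]) -> Dict[str, int]:
    """Label users from the initial crawl by their status in a later snapshot.

    Users missing from the second snapshot, or with an ambiguous status,
    receive no label.
    """
    labels: Dict[str, int] = {}
    for user_id in first_crawl:
        status = second_status.get(user_id)
        label = STATUS_TO_LABEL.get(status) if status is not None else None
        if label is not None:
            labels[user_id] = label
    return labels
```

Dropping deleted accounts is a conservative design choice: a user may delete an account for many benign reasons, so that state says little about whether the account was a bot.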
Ground Truth: Honeypots
• Act as obvious bot accounts
• Attract other bot accounts
• Bots are identified when they follow our account
• Assumption: real users do not follow bots
Honeypots: Logic
• Post "luring" content
  – Post content that will be seen: trending topics, hashtags, "famous" tweets...
• Maintain network connections
  – "Follow back", retweets
  – Fame begets fame
• Promote other honeypots
  – Retweet each other's tweets
  – Mention each other
Honeypot accounts (loop):
  – Choose a honeypot, h
  – 10%: h retweets a random honeypot; 90%: sample a random tweet, t
  – Given t: 30%, h retweets t; 70%, h copies t
  – Record h's new friends and follow the new friends
  – Wait 10 s and repeat
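The honeypot loop can be written as a short simulation using the probabilities from the diagram (10/90 and 30/70, then a 10-second wait). This is a sketch, not the deployed system: `api` is a hypothetical client object (not a real Twitter library) exposing `retweet`, `post`, `sample_tweet`, `latest_tweet`, `new_followers`, and `follow`, and "new friends" is interpreted as accounts that newly followed h.

```python
import random
import time


def honeypot_step(honeypots, api, rng, recorded):
    """One pass of the loop; `api` is a hypothetical client object."""
    h = rng.choice(honeypots)                       # choose honeypot h
    others = [x for x in honeypots if x != h]
    if others and rng.random() < 0.10:              # 10%: promote a peer honeypot
        api.retweet(h, api.latest_tweet(rng.choice(others)))
    else:                                           # 90%: sample a random tweet t
        t = api.sample_tweet()
        if rng.random() < 0.30:                     # 30%: h retweets t
            api.retweet(h, t["id"])
        else:                                       # 70%: h copies t's text
            api.post(h, t["text"])
    for user in api.new_followers(h):               # record h's new friends
        if user not in recorded[h]:
            recorded[h].add(user)
            api.follow(h, user)                     # and follow them back


def run(honeypots, api, steps, seed=0, wait_seconds=10.0):
    """Repeat the loop and return the accounts each honeypot recorded,
    which are candidate bots under the honeypot assumption."""
    rng = random.Random(seed)
    recorded = {h: set() for h in honeypots}
    for _ in range(steps):
        honeypot_step(honeypots, api, rng, recorded)
        time.sleep(wait_seconds)                    # wait 10 s between actions
    return recorded
```

A seeded `random.Random` makes runs reproducible, and passing `wait_seconds=0` lets the loop be exercised against a fake client in tests.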
BoostOR
• Based on AdaBoost
• Tries to increase recall without a drastic decrease in precision
• Iteratively updates the weight of instances:
  – Unchanged if correctly classified
  – Decreased if false negative
  – Increased if false positive
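A single reweighting round that follows the three rules exactly as listed on the slide can be sketched as below. The multiplicative `factor` and the renormalization step are hypothetical choices made for illustration; the actual BoostOR update (its constants, and how it combines weak learners) is not given here and may differ.

```python
def boostor_update(weights, y_true, y_pred, factor=2.0):
    """Reweight instances using the three rules listed on the slide.

    Labels: 1 = bot (positive), 0 = human (negative).
    factor: hypothetical scaling constant, not taken from the original method.
    """
    new = []
    for w, t, p in zip(weights, y_true, y_pred):
        if t == p:                 # correctly classified: unchanged
            new.append(w)
        elif t == 1 and p == 0:    # false negative: decreased
            new.append(w / factor)
        else:                      # false positive: increased
            new.append(w * factor)
    total = sum(new)
    return [w / total for w in new]  # renormalize so weights sum to 1
```

Starting from uniform weights `[0.25] * 4` with `y_true = [1, 1, 0, 0]` and `y_pred = [1, 0, 1, 0]`, the false positive ends up with four times the weight of the false negative after renormalization.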
Trust-Distrust Prediction
• Goal
  – Trust and distrust relations can play an important role in helping online users collect reliable information
  – Finding trustworthy users and reliable information is of significant importance
  – How to predict trust relations between users?
• Challenges
  – Trust relations are extremely sparse
  – Distrust relations are even sparser than trust relations
  – Finding substitute features indicative of trust and distrust
Trust and Emotions
• According to psychology, users' emotions can be strong indicators of trust and distrust relations
• Emotional information is more available than trust/distrust information
• There exists a correlation between emotions and trust/distrust relations
Modeling Emotional Information
• Users with positive (negative) emotions are more likely to establish trust (distrust) relations
• Users with high positive (negative) emotion strengths are more likely to establish trust (distrust) relations
• The Emotional Trust/Distrust framework (ETD)
  – Low-rank matrix factorization
  – Emotional information regularization
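To make "low-rank matrix factorization plus emotional regularization" concrete, here is a toy sketch in plain Python. It is not the ETD objective from [BeigiSDM'16]. The regularizer is a hypothetical stand-in for the two hypotheses above: it pushes user i's average predicted outgoing relation, u_i · v̄, toward that user's emotion score. The loss being minimized by batch gradient descent is ½Σ(u_i·v_j − G_ij)² over observed links, plus (λ/2)Σ_i(u_i·v̄ − e_i)², plus (μ/2)(‖U‖² + ‖V‖²). All parameter names and defaults are illustrative.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def etd_sketch(n, observed, emotion, k=2, lam=0.5, mu=0.01,
               lr=0.05, epochs=500, seed=0):
    """Factorize G ~ U V^T with a hypothetical emotion regularizer.

    observed: dict (i, j) -> +1 (trust) or -1 (distrust)
    emotion:  per-user emotion scores in [-1, 1] (positive > 0)
    Returns a function predicting the relation score of pair (i, j).
    """
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n)]
    for _ in range(epochs):
        # Gradient of the L2 penalty (mu / 2) * (|U|^2 + |V|^2).
        gU = [[mu * x for x in row] for row in U]
        gV = [[mu * x for x in row] for row in V]
        # Reconstruction loss on observed trust/distrust links.
        for (i, j), g in observed.items():
            err = dot(U[i], V[j]) - g
            for f in range(k):
                gU[i][f] += err * V[j][f]
                gV[j][f] += err * U[i][f]
        # Emotion regularizer: u_i . v_bar should match emotion[i].
        v_bar = [sum(V[j][f] for j in range(n)) / n for f in range(k)]
        for i in range(n):
            err = lam * (dot(U[i], v_bar) - emotion[i])
            for f in range(k):
                gU[i][f] += err * v_bar[f]
                for j in range(n):
                    gV[j][f] += err * U[i][f] / n
        # Simultaneous gradient step on both factors.
        for i in range(n):
            for f in range(k):
                U[i][f] -= lr * gU[i][f]
                V[i][f] -= lr * gV[i][f]
    return lambda i, j: dot(U[i], V[j])
```

The regularizer supplies a training signal for every user, including those with few or no observed links. This mirrors the slide's point that emotional information is more available than trust or distrust relations.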
Studying Bias in Social Media Data
• Twitter shares its data
  – "Firehose" feed: 100%, costly
  – "Streaming API" feed: 1%, free
• We usually obtain data via sampling
  – Is the data sampled from the Streaming API representative of the true activity on Twitter's Firehose?
• Challenges
  – How can we determine whether the sample is biased when we do not have access to the whole data?
  – How can we obtain an unbiased sample?
Twitter's Streaming API vs. Firehose
• Data from the Firehose and the Streaming API were collected over a specific period of time to perform the analysis
• More than 90% of all geotagged tweets are available via the Streaming API, and there is no significant difference in location distribution
• Based on in-degree centrality and betweenness centrality in user-user retweet networks, the Streaming API finds ~50% of the key users
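The "key users" comparison can be illustrated by ranking users by in-degree in each retweet network and measuring how many of the Firehose's top-k users the sample also places in its top k. This is a hedged sketch: defining key users as the top k by in-degree is an assumption, betweenness centrality is omitted, and the edge format `(retweeter, original_author)` is hypothetical.

```python
from collections import Counter


def top_k_by_in_degree(retweets, k):
    """retweets: iterable of (retweeter, original_author) edges."""
    in_degree = Counter(author for _, author in retweets)
    # Highest in-degree first; ties broken by user id for determinism.
    ranked = sorted(in_degree, key=lambda u: (-in_degree[u], u))
    return set(ranked[:k])


def key_user_coverage(firehose_edges, sample_edges, k):
    """Fraction of the Firehose's top-k users also in the sample's top k."""
    full = top_k_by_in_degree(firehose_edges, k)
    sample = top_k_by_in_degree(sample_edges, k)
    return len(full & sample) / len(full) if full else 0.0
```

A coverage of 0.5 from this function would correspond to the "~50% of the key users" finding, under the in-degree definition assumed here.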
Mitigating Bias in Twitter's Streaming API
Can we find bias without the Firehose?
Estimating bias from the Streaming API:
  – Obtain the trend of a hashtag from the Sample API and the Streaming API
  – Bootstrap the Sample API to obtain confidence intervals
  – Mark regions where the Streaming API falls outside the confidence intervals
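The three estimation steps above can be sketched with a percentile bootstrap. Assumptions, all hypothetical: each time bin of Sample API data is a list of 0/1 flags (1 = the tweet contains the hashtag), the "trend" is the share of tweets containing the hashtag in each bin, and `n_boot`/`alpha` are illustrative defaults. Bins must be non-empty.

```python
import random


def bootstrap_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of 0/1 flags."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def flag_biased_bins(sample_bins, streaming_share, **kw):
    """Indices of time bins where the Streaming API's hashtag share falls
    outside the bootstrap interval computed from the Sample API.

    sample_bins[b]: 0/1 flags for Sample API tweets in bin b
    streaming_share[b]: hashtag share observed in the Streaming API in bin b
    """
    flagged = []
    for b, (values, share) in enumerate(zip(sample_bins, streaming_share)):
        lo, hi = bootstrap_ci(values, **kw)
        if not lo <= share <= hi:
            flagged.append(b)
    return flagged
```

Bootstrapping the Sample API shows how much the trend could vary by chance alone. Only bins where the Streaming API falls outside that range are marked as possibly biased, so no Firehose access is needed.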
Mitigating bias:
  – Leverage multiple crawlers to maximize data for each query
  – Round-robin splitting
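Round-robin splitting can be sketched as dealing a query's keywords across several crawlers and merging their streams afterwards. The sketch assumes that each Streaming API connection is capped separately, so splitting keywords raises the total volume returned for the query. The helper names and the tweet `id` field are hypothetical.

```python
def round_robin_split(keywords, n_crawlers):
    """Assign query keywords to crawlers in round-robin order."""
    buckets = [[] for _ in range(n_crawlers)]
    for i, kw in enumerate(keywords):
        buckets[i % n_crawlers].append(kw)
    return buckets


def merge_streams(streams):
    """Merge tweets from several crawlers, dropping duplicates by id."""
    seen = set()
    merged = []
    for stream in streams:
        for tweet in stream:
            if tweet["id"] not in seen:
                seen.add(tweet["id"])
                merged.append(tweet)
    return merged
```

Deduplication matters because a tweet that matches keywords assigned to two different crawlers is delivered by both connections.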
Time-Critical Information in Crisis Response
• Social media is used to request immediate assistance during a crisis
• Time-critical posts demand immediate attention
• Addressing these queries promptly can help in emergency response
• How can these posts be distinguished from others?
• What is required in finding time-critical responses?
  – Users with expertise or knowledge
  – Fast response
  – Relevant answers
Finding Time-Critical Responses
• Many questions asked during a crisis should be attended to immediately
• Many responders are busy
• How can we find a prompt responder who can provide a relevant answer?
• Challenges of identifying prompt responders
  – How do we estimate users' reply times to identify prompt responders?
  – Timeliness and relevance: how do we integrate timeliness with relevance to rank candidate responders?
Information Seeking in Social Media
• Social media is used to request help during a crisis
• Addressing these queries promptly can help in emergency response
Identifying Candidate Responders
• Timeliness
  – A user can respond more quickly if she is available soon after the question is posted; this can be estimated from her previous posting times
  – A user responds to questions faster if she has replied promptly to similar questions in the past
• Relevance
  – Users whose previous content is similar to the question have higher relevance, and their responses are more likely to be relevant answers
• Timeliness and relevance are integrated by combining the ranking scores
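Combining timeliness and relevance can be sketched as a weighted sum of two scores. Everything here is an illustrative assumption rather than the published method. Timeliness is the share of a user's past posts made in the question's hour or the hour after; relevance is the cosine similarity between the question and the user's past text. The weighted sum with a hypothetical `alpha` is one simple way to combine the scores. The "replied promptly to similar questions" signal is not modeled.

```python
import math
import re
from collections import Counter


def _vec(text):
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def timeliness(post_hours, question_hour):
    """Share of past posts made in the question's hour or the next hour
    (a hypothetical availability estimate from previous posting times)."""
    if not post_hours:
        return 0.0
    window = {question_hour % 24, (question_hour + 1) % 24}
    return sum(h in window for h in post_hours) / len(post_hours)


def rank_responders(question, question_hour, users, alpha=0.5):
    """users: dict name -> {"hours": [...], "history": "past text"}.
    Score = alpha * timeliness + (1 - alpha) * relevance."""
    q = _vec(question)
    scores = {}
    for name, u in users.items():
        t = timeliness(u["hours"], question_hour)
        r = _cosine(q, _vec(u["history"]))
        scores[name] = alpha * t + (1 - alpha) * r
    return sorted(scores, key=lambda n: (-scores[n], n))
```

Raising `alpha` favors users who are likely to be online. Lowering it favors users whose past content matches the question. In a crisis setting, the right balance is an empirical question.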
Searching for Credible Information (recap)
Figure: credible data vs. spam, bots (automatically generated content), fake news, and rumor
• A unique challenge: ground truth
• Additional challenges: credibility verification, dynamic change, timeliness
• Alternative approaches: rumor detection, spam detection, bot detection, inferring distrust
以其之矛还其之盾 (use its own spear against its own shield)
Thank You All
• Professor Yang's kind invitation and warm hospitality
• Funding support from ONR, NSF, ARO, among others
• DMML Lab former and current members, and Liang Wu for helping with the preparation of this presentation
Search for "Huan Liu" for more information about DMML
H. Liu, F. Morstatter, J. Tang, and R. Zafarani. "The good, the bad, and the ugly: uncovering novel research opportunities in social media mining," in Trends of Data Science, International Journal of Data Science and Analytics, Springer International Publishing Switzerland, September 2016. DOI 10.1007/s41060-016-0023-0
Repositories and Recent Books
• scikit-feature: an open-source feature selection repository in Python
• Social Computing Repository
http://dmml.asu.edu/smm/
References
1. [BeigiSDM'16] Ghazaleh Beigi, Jiliang Tang, Suhang Wang, and Huan Liu. "Exploiting Emotional Information for Trust/Distrust Prediction." SIAM International Conference on Data Mining (SDM 2016), May 5-7, 2016, Miami, Florida.
2. [MorstatterASONAM'16] Fred Morstatter, Liang Wu, Tahora H. Nazer, Kathleen M. Carley, and Huan Liu. "A New Approach to Bot Detection: Striking the Balance Between Precision and Recall." IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2016), August 18-21, 2016, San Francisco, CA.
3. [MorstatterWWW'14] Fred Morstatter, Jürgen Pfeffer, and Huan Liu. "When Is It Biased? Assessing the Representativeness of Twitter's Streaming API." WWW Web Science 2014.
4. [MorstatterICWSM'13] Fred Morstatter, Jürgen Pfeffer, Huan Liu, and Kathleen M. Carley. "Is the Sample Good Enough? Comparing Data from Twitter's Streaming API with Twitter's Firehose." ICWSM 2013.
5. [SampsonCIKM'16] Justin Sampson, Fred Morstatter, Liang Wu, and Huan Liu. "Leveraging the Implicit Structure within Social Media for Emergent Rumor Detection." Short paper, ACM International Conference on Information and Knowledge Management (CIKM 2016), October 24-28, 2016, Indianapolis, Indiana.
6. [SampsonICDM'15] Justin Sampson, Fred Morstatter, Reza Zafarani, and Huan Liu. "Real-Time Crisis Mapping Using Language Distribution." Demo. In Proceedings of the IEEE International Conference on Data Mining (ICDM 2015), November 14-17, 2015, Atlantic City, NJ.