Kevin Teh Insight presentation
![Page 1: Kevin teh insight presentation](https://reader034.fdocuments.us/reader034/viewer/2022051400/557ddca4d8b42a124f8b4fde/html5/thumbnails/1.jpg)
Disambiguating Twitter Search
Kevin Teh
[email protected]
Insight Data Science Fellows Program
March 2013
Tuesday, February 26, 13
![Page 2: Kevin teh insight presentation](https://reader034.fdocuments.us/reader034/viewer/2022051400/557ddca4d8b42a124f8b4fde/html5/thumbnails/2.jpg)
That’s not the python that I meant...
![Page 3: Kevin teh insight presentation](https://reader034.fdocuments.us/reader034/viewer/2022051400/557ddca4d8b42a124f8b4fde/html5/thumbnails/3.jpg)
The solution? cluster-pluck.
![Page 4: Kevin teh insight presentation](https://reader034.fdocuments.us/reader034/viewer/2022051400/557ddca4d8b42a124f8b4fde/html5/thumbnails/4.jpg)
cluster-pluck disambiguates Twitter search in real time
![Page 5: Kevin teh insight presentation](https://reader034.fdocuments.us/reader034/viewer/2022051400/557ddca4d8b42a124f8b4fde/html5/thumbnails/5.jpg)
It works in Spanish too!
![Page 6: Kevin teh insight presentation](https://reader034.fdocuments.us/reader034/viewer/2022051400/557ddca4d8b42a124f8b4fde/html5/thumbnails/6.jpg)
![Page 7: Kevin teh insight presentation](https://reader034.fdocuments.us/reader034/viewer/2022051400/557ddca4d8b42a124f8b4fde/html5/thumbnails/7.jpg)
Tools
300,000 Tweets
User
Filter
Word Filter Web Application
![Page 8: Kevin teh insight presentation](https://reader034.fdocuments.us/reader034/viewer/2022051400/557ddca4d8b42a124f8b4fde/html5/thumbnails/8.jpg)
Algorithm
read query and download corpus of 1500 tweets
select potentially meaningful words
count words
cluster candidates into groups
assign tweets to clusters
filter out common words
rank remaining words by rate of capitalization and select top 10
rank remaining words by number of occurrences and select top 10
link two candidates if their relative proportion of co-occurrence is greater than 0.25
rank connected components by total occurrences and take top 3
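
The steps on the Algorithm slide can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the cluster-pluck source: the stop list `COMMON_WORDS`, the function names, the choice to take the union of the two top-10 lists, the definition of "relative proportion of co-occurrence" (tweets containing both words divided by tweets containing the rarer word), and the rule for assigning tweets (largest word overlap) are all guesses made for illustration. The corpus is assumed to be already downloaded as a list of strings.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop list; the slides do not say which common words were filtered.
COMMON_WORDS = {"the", "an", "and", "or", "of", "to", "in", "is", "it",
                "my", "on", "for", "that", "this", "with", "rt"}
WORD_RE = re.compile(r"[A-Za-z][A-Za-z']+")


def tokenize(tweet):
    """Split a tweet into alphabetic words of two or more characters."""
    return WORD_RE.findall(tweet)


def select_candidates(tweets, query, top_n=10):
    """Count words, drop common ones, keep the top words by two rankings."""
    counts = Counter()       # occurrences of each lowercased word
    capitalized = Counter()  # occurrences written with a leading capital
    for tweet in tweets:
        for word in tokenize(tweet):
            key = word.lower()
            counts[key] += 1
            if word[0].isupper():
                capitalized[key] += 1
    # Assumption: the query word itself is excluded along with common words.
    remaining = [w for w in counts
                 if w not in COMMON_WORDS and w != query.lower()]
    by_caps = sorted(remaining,
                     key=lambda w: (capitalized[w] / counts[w], counts[w]),
                     reverse=True)[:top_n]
    by_count = sorted(remaining, key=lambda w: counts[w], reverse=True)[:top_n]
    # Assumption: the two top-N lists are merged into one candidate set.
    return set(by_caps) | set(by_count), counts


def cluster_candidates(tweets, candidates, counts, threshold=0.25, top_k=3):
    """Link co-occurring candidates and return the largest components."""
    tweet_words = [{w.lower() for w in tokenize(t)} for t in tweets]
    containing = {c: sum(1 for ws in tweet_words if c in ws)
                  for c in candidates}
    neighbours = {c: set() for c in candidates}
    for a, b in combinations(sorted(candidates), 2):
        both = sum(1 for ws in tweet_words if a in ws and b in ws)
        rarer = min(containing[a], containing[b])
        # Assumed definition of the relative proportion of co-occurrence.
        if rarer and both / rarer > threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)
    # Connected components of the link graph via depth-first search.
    components, seen = [], set()
    for start in sorted(candidates):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    components.sort(key=lambda comp: sum(counts[w] for w in comp),
                    reverse=True)
    return components[:top_k]


def assign_tweets(tweets, clusters):
    """Label each tweet with the index of the best-overlapping cluster."""
    labels = []
    for tweet in tweets:
        words = {w.lower() for w in tokenize(tweet)}
        best, best_overlap = None, 0
        for index, cluster in enumerate(clusters):
            overlap = len(words & cluster)
            if overlap > best_overlap:
                best, best_overlap = index, overlap
        labels.append(best)  # None when the tweet matches no cluster
    return labels
```

Calling `select_candidates`, then `cluster_candidates`, then `assign_tweets` on a list of tweets for a query such as "python" yields up to three word clusters and one cluster index (or `None`) per tweet. The 0.25 threshold and the top-10 and top-3 cut-offs come from the slide; everything else is a placeholder that would need tuning against real data.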
![Page 9: Kevin teh insight presentation](https://reader034.fdocuments.us/reader034/viewer/2022051400/557ddca4d8b42a124f8b4fde/html5/thumbnails/9.jpg)
Kevin Teh
[email protected]
Math PhD -- May ’13
Topic: Noncommutative Geometry (Whatever that is)
B.A.Sc. -- April ’07
Engineering Science (Whatever that is)
![Page 10: Kevin teh insight presentation](https://reader034.fdocuments.us/reader034/viewer/2022051400/557ddca4d8b42a124f8b4fde/html5/thumbnails/10.jpg)