X-Post
Creating a Cross Posting Facilitator for Technology Communities
Hacker News & Stack Overflow
WS3 Group 3
Anca Dumitrache, Fabio Benedetti, Seyi Feyisetan
Introduction
● Stack Overflow: questions and answers on technology
● Hacker News: news for technology enthusiasts
● similar to Hacker News: Reddit, Slashdot
● similar to Stack Overflow: Quora
Goals
1. develop a methodology to compare online technology communities
2. use the vocabulary of one social community (e.g. Stack Overflow) to describe the other (e.g. Hacker News)
3. topic recommendation: newsworthy cross posting across communities
Topic recommendation
Pipeline
Approach
1. data gathering:
  ○ sources: Hacker News + Stack Overflow
  ○ fixed timeframe: September 2013
  ○ method: web scraping with Python, R
2. data processing:
  ○ linking: named entity extraction with term matching using the tags vocabulary from Stack Overflow
  ○ cleanup: only keep posts with tech-related topics
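The linking and cleanup steps can be illustrated with a minimal Python sketch. It assumes the tag vocabulary is available as a set of lowercase strings; the sample tags, the function names `extract_tags` and `is_tech_related`, and the tokenisation rule are illustrative assumptions, not the project's actual implementation.

```python
import re

# Hypothetical sample; the real vocabulary would be the full Stack Overflow tag list.
TAG_VOCABULARY = {"python", "javascript", "c++", "node.js", "android"}


def extract_tags(title, vocabulary=TAG_VOCABULARY):
    """Return the vocabulary tags that occur as whole tokens in a post title."""
    # Keep characters that appear inside tags such as "c++", "c#" or "node.js".
    tokens = re.findall(r"[a-z0-9+#.\-]+", title.lower())
    # Drop trailing sentence punctuation so "python." still matches "python".
    tokens = [token.rstrip(".-") for token in tokens]
    return {token for token in tokens if token in vocabulary}


def is_tech_related(title, vocabulary=TAG_VOCABULARY):
    """Cleanup rule (assumed): keep a post only if it matches at least one tag."""
    return bool(extract_tags(title, vocabulary))
```

Plain token matching misses multi-word entities (Stack Overflow writes them with hyphens, e.g. `ruby-on-rails`) and cannot tell ambiguous terms apart, so a sketch like this is only a starting point for the linking step.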
Future development
1. data processing:
  ○ crowdsourced disambiguation of entities
2. training:
  ○ use a priori observations of cross posting as training data
  ○ possible features:
    i. co-occurring tags
    ii. frequency of tags
    iii. number of points in a post
    iv. number of comments in a post
    v. time...
3. evaluation:
  ○ crowdsourced ranking of recommendation relevance
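As a rough illustration of the first two candidate features, the sketch below counts tag frequencies and co-occurring tag pairs. The post structure (a dict with a `tags` list) and the function name `tag_features` are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def tag_features(posts):
    """Count tag frequencies and co-occurring tag pairs.

    `posts` is assumed to be a list of dicts, each with a 'tags' list.
    """
    tag_freq = Counter()
    pair_freq = Counter()
    for post in posts:
        # Deduplicate and sort so each pair has one canonical ordering.
        tags = sorted(set(post["tags"]))
        tag_freq.update(tags)
        pair_freq.update(combinations(tags, 2))
    return tag_freq, pair_freq
```

Points, comment counts, and time could be added as further per-post features in the same loop once their format in the scraped data is known.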
Results
Topic overlap
Trending topics
Frequency overlap
Frequency overlap (zoomed in)
Findings
1. small set of overlapping topics over the two social machines
   (but better NER could identify more links)
2. Stack Overflow has a more diverse range of topics than Hacker News
   (although the vocabulary likely introduces bias)
3. different frequently discussed topics on both social machines
   (although a set of outliers does exist)
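One way to quantify the topic overlap in finding 1 is the Jaccard index over the two tag sets. This measure is an assumption chosen for illustration; the slides do not state which overlap metric was used.

```python
def topic_overlap(tags_a, tags_b):
    """Return the shared topics and the Jaccard index of two tag collections."""
    a, b = set(tags_a), set(tags_b)
    shared = a & b
    union = a | b
    # Jaccard index: size of the intersection divided by size of the union.
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard
```

A value near 0 would correspond to the "small set of overlapping topics" reported above, while a value near 1 would mean the two communities discuss almost the same topics.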
Future Work
● add more data sources such as Reddit, Slashdot
● gather data over a larger timeframe
● fine-tune our Named Entity Recogniser
● expand the vocabulary used to describe the communities (and publish as Linked Data)
● use crowdsourcing for tag disambiguation and output evaluation
Conclusion
Preliminary studies show that:
● we can use Stack Overflow tags as a vocabulary to understand online technology communities
● we can identify a feature set to compare these communities
● the gap between trending topics in the two communities is large enough to support the use case of a topic recommendation system
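The recommendation use case could be sketched as follows: rank the topics whose share of discussion in one community exceeds their share in the other. The scoring rule (difference of relative frequencies) and the name `recommend_topics` are assumptions for illustration, not the trained model proposed under future development.

```python
from collections import Counter


def recommend_topics(source_counts, target_counts, top_n=5):
    """Suggest topics that trend in the source community but not in the target.

    Both arguments map a tag to the number of posts mentioning it.
    """
    src_total = sum(source_counts.values()) or 1
    tgt_total = sum(target_counts.values()) or 1
    gaps = {}
    for tag, count in source_counts.items():
        # Gap between the tag's share of discussion in each community.
        gap = count / src_total - target_counts.get(tag, 0) / tgt_total
        if gap > 0:
            gaps[tag] = gap
    # Largest gap first; ties are broken alphabetically for stable output.
    return sorted(gaps, key=lambda tag: (-gaps[tag], tag))[:top_n]


# Example with made-up counts (not data from the study).
hacker_news = Counter({"bitcoin": 8, "python": 2})
stack_overflow = Counter({"python": 9, "bitcoin": 1})
candidates = recommend_topics(hacker_news, stack_overflow)
```

With these made-up counts, `candidates` contains only `bitcoin`, the topic that is prominent on the source side and rare on the target side.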