Automatic Detection of Tags for Political Blogs
Khairun-nisa Hassanali, Vasileios Hatzivassiloglou
The University of Texas at Dallas
1. Summary
More than 22.6 million Americans maintain web sites with regularly updated commentary (blogs), of which at least 38,500 are specifically dedicated to politics.
A tool for automatically tagging political blog posts is introduced.
Political blogs differ from other blogs in that they often revolve around named entities (politicians, organizations and places). Tagging of political blog posts therefore benefits from basic named entity recognition.
Tag identification using a hybrid approach (statistical and grammatical) yields better results.
Sood et al. report a precision/recall of 13.11%/22.83%, whereas Wang and Davidson report a precision/recall of 45.25%/23.24%. Our recall is higher, perhaps because of the domain.
7. Experimental Results
8. Conclusion
5. Tag Detection using Support Vector Machines
Training of SVM classifiers
Input: Blog URLs
Collect data from several blogs that tag data
Preprocess data: parse HTML and rectify errors
Divide data into posts and index them by their tags
Train the SVMs on the training data
Output: One classifier for each tag
Detection of Tags
Input: Blog URL
Collect data from the blog
Preprocess data: parse HTML and rectify errors
Divide data into posts
Run all the classifiers on each post
Output: Top five tags associated with each post
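The detection stage above (run every per-tag classifier on a post, keep the five best) can be sketched as follows. This is a minimal illustration, not the poster's implementation: it assumes each trained classifier is linear and reduces to a word-weight table plus a bias, and the names `decision_value`, `top_tags` and all weights are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: each per-tag classifier is linear, stored as (word weights, bias).
# The weights below are invented for illustration, not learned from real posts.
LinearModel = Tuple[Dict[str, float], float]


def decision_value(tokens: List[str], model: LinearModel) -> float:
    """Linear decision value w.x + b, with x the bag-of-words counts of the post."""
    weights, bias = model
    return sum(weights.get(tok, 0.0) for tok in tokens) + bias


def top_tags(tokens: List[str], classifiers: Dict[str, LinearModel], k: int = 5) -> List[str]:
    """Run all per-tag classifiers on one post and return the k best-scoring tags."""
    scored = [(decision_value(tokens, model), tag) for tag, model in classifiers.items()]
    # Highest decision value first; ties are broken alphabetically by tag.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [tag for _, tag in scored[:k]]


classifiers = {
    "health care": ({"insurance": 1.2, "medicare": 1.5}, -0.5),
    "elections": ({"ballot": 1.4, "campaign": 0.9}, -0.5),
    "economy": ({"jobs": 1.1, "deficit": 1.3}, -0.5),
}
post = "the campaign promised jobs and more jobs before the ballot".split()
print(top_tags(post, classifiers, k=2))
```

With only three toy classifiers the default `k=5` simply returns all three tags in score order.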
Many blogs tag their posts
Tags are representative of the topics discussed
Training data was collected from “Daily Kos” and “Red State”
100,000 posts from Daily Kos (2003-2010)
70,000 posts from Red State (2007-2010)
A total of 787,780 tags
Used Joachims' SVMlight
Use the same SVM based approach with new features based on grammatical knowledge
Proper Nouns are frequently topics
Place a higher weight on proper and common nouns
Identifying entities referred to by different names
Barack Obama, Obama and Barack Hussein Obama refer to the same person
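One simple way to group name variants such as those above is to map each mention to the longest mention whose words contain it. The sketch below is a crude stand-in for real co-reference resolution, shown only to make the idea concrete; the function name `merge_aliases` is hypothetical, and the heuristic would wrongly merge, for example, "Obama" with "Michelle Obama" if both appeared.

```python
from typing import Dict, List, Set, Tuple


def merge_aliases(mentions: List[str]) -> Dict[str, str]:
    """Map every mention to a canonical name using token-subset matching."""
    canonical: List[Tuple[Set[str], str]] = []  # (lowercased token set, name)
    mapping: Dict[str, str] = {}
    # Visit longer names first so that shorter variants can attach to them.
    for name in sorted(set(mentions), key=lambda m: (-len(m.split()), m)):
        tokens = set(name.lower().split())
        for canon_tokens, canon_name in canonical:
            if tokens <= canon_tokens:  # every word also occurs in the longer name
                mapping[name] = canon_name
                break
        else:
            # No longer name covers this mention, so it becomes canonical itself.
            canonical.append((tokens, name))
            mapping[name] = name
    return mapping


aliases = merge_aliases(["Barack Obama", "Obama", "Barack Hussein Obama", "Nancy Pelosi"])
for mention, entity in sorted(aliases.items()):
    print(mention, "->", entity)
```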
Extraction of Tag Nouns
Input: Blog URL
Fetch data from blog
Preprocess data and segment into posts
Perform shallow parsing
Extract noun phrases
Output: Top scoring nouns
Extraction of Tag Entities using Named Entity Recognition and Co-reference Resolution
Input: Blog URL
Fetch data from blog
Preprocess data and segment into posts
Perform co-reference resolution
Extract entities
Output: Top scoring entities
Fig. 1: Tag Detection using Support Vector Machines
Fig. 2: Tag Detection using Grammatical Techniques
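The "top scoring nouns" step, with proper nouns weighted more heavily than common nouns, might look like the sketch below. It assumes the post has already been part-of-speech tagged with Penn Treebank tags by some external shallow parser; the weight values and the name `top_scoring_nouns` are hypothetical, since the poster does not state its scoring function.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical weights: proper nouns count double, common nouns count once.
POS_WEIGHTS = {"NNP": 2.0, "NNPS": 2.0, "NN": 1.0, "NNS": 1.0}


def top_scoring_nouns(tagged: List[Tuple[str, str]], k: int = 5) -> List[str]:
    """Score each noun by its weighted frequency and return the k best."""
    scores: Counter = Counter()
    for word, tag in tagged:
        weight = POS_WEIGHTS.get(tag)
        if weight is not None:  # skip everything that is not a noun
            scores[word] += weight
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in ranked[:k]]


# Pre-tagged toy sentence; a real pipeline would obtain the tags from a parser.
tagged_post = [
    ("Senate", "NNP"), ("passed", "VBD"), ("the", "DT"), ("bill", "NN"),
    ("after", "IN"), ("Senate", "NNP"), ("leaders", "NNS"),
    ("amended", "VBD"), ("the", "DT"), ("bill", "NN"),
]
print(top_scoring_nouns(tagged_post, k=2))
```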
3. The Larger Problem
Given multiple texts from two or more blogs/political sources, answer the following questions:
On which subjects do the texts, taken as a whole for each source, agree or disagree?
How similar are the sources’ positions?
What makes them agree/disagree?
Difficult to associate an attitude with a specific topic/subject
Many clues are implicit and appear to require deep semantic analysis
Tags can serve as a basis for bringing together posts about the same topic
Compiling a profile for each political entity: What it talks about and what its position is
Organizing groups of sources according to perspective
Tags for political blogs are detected automatically
Tags are representative of topics
Significant topics are automatically identified using SVM and other NLP techniques
9. Future Work
Political Profile is a summary of a political entity’s (politician, political group) stance on different issues
Extract the top scoring topics along with the “entities’ sentiments” (attitudes towards topic) and select representative sentences that voice sentiments towards these topics
Aggregate information across texts according to specific criteria (poster, source, time) and quantitatively compare signatures and identify which topics are responsible for the differences
2. Political Blogs
6. Tag Detection using Grammatical Techniques
4. Why are Tags Needed?
Method            Precision   Recall    F-Score
Single Word SVM   27.30%      60.30%    37.60%
+ Stemming        26.10%      59.50%    36.30%
+ Proper Nouns    36.50%      56.80%    44.40%
Named Entities    48.40%      49.10%    48.70%
All Combined      21.10%      65%       31.90%
Manual Scoring    67.00%      75%       70.80%
Fig. 3: Results on Daily Kos
Method            Precision   Recall    F-Score
Single Word SVM   19.00%      30.00%    23.30%
+ Stemming        22.00%      30.20%    25.50%
+ Proper Nouns    46.30%      54.00%    49.90%
Named Entities    60.10%      41.50%    49.10%
All Combined      20.30%      65.70%    31.00%
Manual Scoring    47.00%      62.00%    53.50%
Fig. 4: Results on Red State
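The F-Score column in both tables is consistent with the harmonic mean of precision and recall (balanced F1). The short check below recomputes two rows from the reported precision and recall; the function name `f_score` is only illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Named Entities on Daily Kos: reported F-Score 48.70%
print(round(f_score(48.4, 49.1), 1))  # 48.7
# Manual Scoring on Red State: reported F-Score 53.50%
print(round(f_score(47.0, 62.0), 1))  # 53.5
```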
2681 posts from Daily Kos and 571 posts from Red State
Compared tags to original tags of blog post
Manually evaluated relevance of tags on a small portion of test set
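Comparing detected tags to a post's original tags amounts to a set comparison per post. The sketch below shows one plausible way to compute per-post precision and recall; exact string matching and the name `precision_recall` are assumptions, and the poster does not say how per-post scores were averaged over the test set.

```python
from typing import Iterable, Tuple


def precision_recall(predicted: Iterable[str], original: Iterable[str]) -> Tuple[float, float]:
    """Precision and recall of predicted tags against a post's original tags."""
    predicted_set, original_set = set(predicted), set(original)
    hits = len(predicted_set & original_set)  # tags found in both sets
    precision = hits / len(predicted_set) if predicted_set else 0.0
    recall = hits / len(original_set) if original_set else 0.0
    return precision, recall


# Toy post: five detected tags, four original tags, two in common.
detected = ["obama", "health care", "senate", "economy", "taxes"]
original = ["obama", "health care", "public option", "reform"]
print(precision_recall(detected, original))  # (0.4, 0.5)
```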