We the News
Investigating Blog Punditry
IS256 Applied Natural Language Processing
IS290-6 Web-based services
Yiming Liu, Kevin Lim, Olga Amuzinskaya
Conceptual Outline
NLP analyzer:
  Summarizes the blog authors' reactions to a news event
  Attempts to extract “interesting” opinions from the blogosphere
  A component of an overall blog retrieval, analysis, and output framework
Point/counterpoint formulation and presentation
Core Value Proposition
Blogs are interesting in many ways
  But sometimes not for their “truth value”
  Often because they are hugely personal and opinionated
Extracting core terms from news stories and bringing together professionally and non-professionally generated news, analysis, opinions, and pictures
  Putting interesting and relevant pieces of information together
NLP Analyzer: summarization
The goal is to pick up the “reactive”, opinion-infused summary sentences:
  "Gore's right, there is a catastrophic climate change"
  vs. "Wear less layers, idiot"
Emotional content and affect: a proxy for “opinion”.
Hypothesis: Highly affective sentences are more likely to convey what the authors' core opinions are.
Conceptual Architecture: Retrieval
[Architecture diagram. Components: NewsAdaptor, TermExtractor, BlogAdaptor, PhotosAdaptor, XMLWriter, and NLP Analyzer, coordinated by Orchestration. Labels on the connections: news feeds, REST, articles, terms, search terms, blogs, photos, Python data structure, XML.]
Common Data Format: XML
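The slide showed the shared format only as an image, so the actual schema cannot be recovered from this transcript. As a rough illustration of how components might exchange data through a common XML document, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. All element and attribute names (`collection`, `post`, `sentence`, `topic`, `source`, `url`) and the example URL are hypothetical.

```python
import xml.etree.ElementTree as ET


def write_collection(topic, posts):
    """Serialize posts (dicts with 'source', 'url', 'sentences') to an XML string.

    Every element and attribute name here is a hypothetical placeholder,
    not the project's real schema.
    """
    root = ET.Element("collection", {"topic": topic})
    for post in posts:
        post_el = ET.SubElement(
            root, "post", {"source": post["source"], "url": post["url"]}
        )
        for text in post["sentences"]:
            ET.SubElement(post_el, "sentence").text = text
    return ET.tostring(root, encoding="unicode")


def read_sentences(xml_text):
    """Return every sentence string found in a serialized collection."""
    root = ET.fromstring(xml_text)
    return [s.text for s in root.iter("sentence")]


xml_text = write_collection(
    "IE7",
    [
        {
            "source": "blog",
            "url": "http://example.com/post",  # placeholder URL
            "sentences": ["Fortunately, I use Firefox for most things."],
        }
    ],
)
```

A writer/reader pair like this would let the retrieval side and the analyzer side agree on one format while staying independent of each other.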
Conceptual Architecture: Summarizer
[Architecture diagram. Components: XMLReader, Simple classifier, Naïve Bayes classifier, and XMLWriter inside the NLP Analyzer, driven by requests from Orchestration. Inputs: news collection, news topic, gold standard, and per-topic training and testing collections. Scoring features shown: emotional opinions, curse words, capitalization, exclamations. Output: classified sentences.]
Gold standard / training set
Obtained training data from Technorati and other blog search engines
  Formatted into the shared XML data format
  Manually picked summary sentences out of the text
Retrieved blogs relevant to 3 topics:
  Elections 2006
  Inconvenient Truth
  IE7
Summarizer
Multinomial Naïve Bayes classifier
Applied scorers to evaluate blog features:
  curse words, bonus cue words, exclamation points, imperative sentences, emotional words, pleasure words, capitalization,
  strong words, search term, negation words, partisan labels, sentence positions, pronouns, valence of words
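To make the idea of per-feature scorers concrete, here is a minimal sketch of four of the listed features (exclamation points, capitalization, curse words, negation words) as small Python functions that each map a sentence to a count. The word lists, the function names, and the choice to return raw counts are all assumptions for illustration; the slides do not describe how the project's scorers were implemented.

```python
import re

# Hypothetical mini-lexicons; the project's actual word lists are not given.
CURSE_WORDS = {"idiot", "idiots", "turd"}
NEGATION_WORDS = {"not", "no", "never", "don't"}

TOKEN_RE = re.compile(r"[A-Za-z']+")


def tokens(sentence):
    """Split a sentence into alphabetic tokens (apostrophes kept)."""
    return TOKEN_RE.findall(sentence)


def score_exclamations(sentence):
    """Count exclamation points."""
    return sentence.count("!")


def score_capitalization(sentence):
    """Count fully upper-case words of at least two letters."""
    return sum(1 for t in tokens(sentence) if len(t) >= 2 and t.isupper())


def score_lexicon(sentence, lexicon):
    """Count tokens that appear in the given lower-case word list."""
    return sum(1 for t in tokens(sentence) if t.lower() in lexicon)


def extract_features(sentence):
    """Run every scorer and collect the counts in one dictionary."""
    return {
        "exclamations": score_exclamations(sentence),
        "capitalization": score_capitalization(sentence),
        "curse_words": score_lexicon(sentence, CURSE_WORDS),
        "negation_words": score_lexicon(sentence, NEGATION_WORDS),
    }


features = extract_features("Wear less layers, idiot")
```

Keeping each scorer as an independent function makes it cheap to add or drop a feature and rerun the evaluation.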
Classifiers
Comparison
  Baseline
  Multinomial Naïve Bayes
Struggled with SVM
  Focused on getting better scorers and data set instead of working on SVM
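For readers unfamiliar with the technique, the following is a minimal from-scratch sketch of a multinomial Naïve Bayes classifier with add-alpha (Laplace) smoothing, operating on dictionaries of feature counts. It is not the project's code: the class name, the `summary`/`other` labels, and the toy training data are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over dictionaries of non-negative feature counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # add-alpha (Laplace) smoothing constant

    def fit(self, samples, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for counts, label in zip(samples, labels):
            self.feature_counts[label].update(counts)  # adds the counts
            self.vocab.update(counts)
        return self

    def log_posterior(self, counts, label):
        """Unnormalized log P(label | counts)."""
        total = sum(self.feature_counts[label].values())
        denom = total + self.alpha * len(self.vocab)
        prior = self.class_counts[label] / sum(self.class_counts.values())
        score = math.log(prior)
        for feature, n in counts.items():
            if feature in self.vocab:  # ignore features never seen in training
                num = self.feature_counts[label][feature] + self.alpha
                score += n * math.log(num / denom)
        return score

    def predict(self, counts):
        return max(self.class_counts, key=lambda c: self.log_posterior(counts, c))


# Toy, made-up training data: feature counts per sentence and a label.
samples = [
    {"exclamations": 2, "curse_words": 1},
    {"curse_words": 2, "capitalization": 1},
    {"pronouns": 2},
    {"pronouns": 1, "negation_words": 1},
]
labels = ["summary", "summary", "other", "other"]
model = MultinomialNB().fit(samples, labels)
```

Working in log space avoids numeric underflow when many feature counts are multiplied together.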
A sample ranking
Election:
Terrorists are cheering because Democrats have been championing their cause since 2003 … Islamic throat-cutting fascists know that a Democrat win is a win for Islamic throat-cutting fascists. (correct)
How miserable is your political party when you have the enemy of your country cheering for your victory [sic] … (correct)
Yesterday was a victory for all of you useful idiots who claim to be smarter than everyone else and a victory for the terrorists who played you like idiots against your own government. (miss)
As we improved, a hit or miss became an arbitrary thing.
Machine vs. human summarization
Election:
  Machine: ...Democrats ... will have won a stunning 73 % of Senate seats ... .
  Human: Enjoy!
Inconvenient Truth:
  Machine: You don't have to be a fan of Gore , or his politics, to find his message about global warming worth considering .
  Human: An Inconvenient Truth is a powerful film that makes you think about the topic of conservation.
IE7:
  Machine: Fortunately, I use Firefox for most things, so I still have web access.
  Human: Yes , I know it is hard to imagine incompetence at Microsoft , but I have to bring up the latest turd from Redmond that has bee foisted upon an unsuspecting population : Internet Explorer 7 Or should I say Internet Destroyer 7 ... .
Cross-Validation results
Election:
  Accuracy: retrieved 25 of actual 26, out of 335 possible
  Recall: 0.77
  Precision: 0.80
Inconvenient Truth:
  Accuracy: retrieved 10 of actual 18, out of 137 possible
  Recall: 0.56
  Precision: 0.67
IE7:
  Accuracy: retrieved 12 of actual 21, out of 88 possible
  Recall: 0.38
  Precision: 0.67
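As a reminder of how these two measures are defined, here is a small sketch that computes precision and recall from counts or from sets of sentence identifiers. The slide does not state the number of correctly retrieved sentences, so the value 20 in the example is an assumption; it was chosen only because, together with 25 retrieved and 26 gold-standard sentences, it happens to reproduce the Election figures.

```python
def precision_recall(true_positives, retrieved, relevant):
    """Precision = TP / retrieved; recall = TP / relevant (gold standard)."""
    precision = true_positives / retrieved if retrieved else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall


def precision_recall_from_sets(selected, gold):
    """Same measures, computed from collections of sentence identifiers."""
    selected, gold = set(selected), set(gold)
    return precision_recall(len(selected & gold), len(selected), len(gold))


# 25 retrieved and 26 gold sentences come from the Election row;
# the true-positive count of 20 is an assumed value, not from the slide.
precision, recall = precision_recall(20, 25, 26)
```

Rounded to two decimals, the assumed counts give a precision of 0.80 and a recall of 0.77.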
NLP Analyzer: demo run on test set
Demo: http://harbinger.sims.berkeley.edu/~k7lim/ANLPWebservice/affectservice.wordy.xml
Challenges
Full-text extraction:
  resolve dependency on blog formats
Informality of bloggers:
  smart quotes, ellipses, etc., which require special handling
  our segmenter fails to segment sentences that don't have capitalization
Stemmers are hard to obtain (bottleneck):
  morphy is slow
  Porter is terrible
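One way the informality problems could be approached is sketched below: replace smart quotes and the single-character ellipsis with plain ASCII, then split on sentence-final punctuation without requiring the next sentence to start with a capital letter. This is an illustrative assumption, not the project's segmenter. The function names are hypothetical, and treating `...` as a non-boundary is a deliberate simplification that will miss sentences which genuinely end in an ellipsis.

```python
import re

# Map typographic punctuation to plain ASCII equivalents.
SMART_PUNCTUATION = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2026": "...",  # single-character ellipsis
}

# Split on whitespace that follows ., ! or ? but not a three-dot ellipsis.
# No capital letter is required after the boundary.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])(?<!\.\.\.)\s+")


def normalize(text):
    """Replace smart quotes and ellipsis characters with ASCII."""
    for smart, plain in SMART_PUNCTUATION.items():
        text = text.replace(smart, plain)
    return text


def segment(text):
    """Normalize the text, then split it into sentences."""
    parts = SENTENCE_BOUNDARY.split(normalize(text).strip())
    return [p for p in parts if p]


sentences = segment("ie7 broke my pc\u2026 again. i use firefox now! it\u2019s fine")
```

A rule like this over-splits on abbreviations such as "e.g. this", so a real segmenter would likely need an exception list on top of it.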
Future work: The Automatic Pundit
Point/Counterpoint formulation and presentation:
  automatic agent that can advocate the core arguments on behalf of each side of a given issue
This would require classification of summaries into positive/negative valences…
  …and more accurate summaries…
Questions?