Knowledge Representation. Computational Journalism week 8
jonathan-stray
![Page 1: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/1.jpg)
Frontiers of Computational Journalism
Columbia Journalism School
Week 7: Knowledge Representation
November 6, 2015
![Page 2: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/2.jpg)
Unstructured data
![Page 3: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/3.jpg)
Structured data
![Page 4: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/4.jpg)
Everyblock.com circa 2009
![Page 5: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/5.jpg)
Connected China. Reuters, 2013
![Page 6: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/6.jpg)
Article Metadata
headline
photo
photo caption
byline
photo credit
publication date
dateline
article body
related articles
![Page 7: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/7.jpg)
Schema.org news markup
Overall type of the object on this page, in HTML head
Headline, dateline, date as additions to div/span properties
Byline expressed as nested object (using itemscope) of type schema.org/Person
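A minimal sketch of reading this kind of markup back out, assuming a made-up page fragment. The `SAMPLE_HTML` string, the `ItempropParser` class, and the headline and reporter name are all hypothetical and only illustrate the idea of `itemprop` attributes on div/span elements with a nested Person object. It uses only Python's standard `html.parser` and ignores most of the microdata rules.

```python
from html.parser import HTMLParser

# Hypothetical article fragment, not taken from any real site.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/NewsArticle">
  <h1 itemprop="headline">Example headline</h1>
  <span itemprop="author" itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">Example Reporter</span>
  </span>
</div>
"""

class ItempropParser(HTMLParser):
    """Collect (itemprop, text) pairs for simple text-valued properties."""

    def __init__(self):
        super().__init__()
        self.current = None   # itemprop waiting for its text
        self.props = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # A tag with itemscope opens a nested object, so it has no text value itself.
        if "itemprop" in attrs and "itemscope" not in attrs:
            self.current = attrs["itemprop"]

    def handle_data(self, data):
        if self.current and data.strip():
            self.props.append((self.current, data.strip()))
            self.current = None

parser = ItempropParser()
parser.feed(SAMPLE_HTML)
print(parser.props)
```

The nested `name` property belongs to the Person object; a fuller parser would keep track of that nesting instead of flattening it.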
![Page 8: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/8.jpg)
Driving application: “rich snippets”
Schema.org covers not just news but music, restaurants, people, organizations, reviews, offers...
Snippets, and better searchability generally, are the motivation for Google, Yahoo, and Bing to push schema.org.
![Page 9: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/9.jpg)
Additional metadata from indexing team
In database, but doesn't necessarily make it to HTML.
![Page 10: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/10.jpg)
News application: content navigation
Articles about “Syria” on NYT topic page
More reliable than simple text search (because the relevance algorithm knows a story is “about” Syria).
![Page 11: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/11.jpg)
Ontologies
What objects and relations are available?
Often represented as a class hierarchy. Arrows = “is_a” relation
![Page 12: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/12.jpg)
(Part of) a real ontology, from Cyc
![Page 13: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/13.jpg)
Every big news org has their own big ontology :(
topics, people, organizations, places...
![Page 14: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/14.jpg)
Yaaay Linked Data!
Triples of (subject relation object), each a URL or literal
<urn:x-states:New%20York> <http://purl.org/dc/terms/alternative> "NY"
<http://dbpedia.org/resource/Columbia_University> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/CollegeOrUniversity>
Abbreviations possible with many formats...
<http://dbpedia.org/resource/Columbia_University> rdf:type ns6:CollegeOrUniversity
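A small sketch of the two triples above held as plain Python tuples, with prefix abbreviation. The `PREFIXES` table and the `abbreviate` helper are hypothetical names, and mapping `ns6` to `http://schema.org/` is an assumption read off the slide's abbreviated form. No RDF library is used.

```python
# Each triple is (subject, relation, object); every element is a URL or a literal.
TRIPLES = [
    ("urn:x-states:New%20York", "http://purl.org/dc/terms/alternative", "NY"),
    ("http://dbpedia.org/resource/Columbia_University",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://schema.org/CollegeOrUniversity"),
]

# Assumed prefix table; ns6 -> schema.org is inferred from the slide.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ns6": "http://schema.org/",
}

def abbreviate(term):
    """Return prefix:local if the term starts with a known namespace, else the term unchanged."""
    for prefix, namespace in PREFIXES.items():
        if term.startswith(namespace):
            return prefix + ":" + term[len(namespace):]
    return term

for subject, relation, obj in TRIPLES:
    print(abbreviate(subject), abbreviate(relation), abbreviate(obj))
```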
![Page 15: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/15.jpg)
![Page 16: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/16.jpg)
![Page 17: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/17.jpg)
![Page 18: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/18.jpg)
NYT ontology available as LOD
owl:sameAs makes this interoperable
![Page 19: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/19.jpg)
NYT API can return linked data
{ "title": "Syria's Rebels Open Talks on Forging United Political Front",
"body": "BEIRUT, Lebanon — Syria’s fractious opposition groups began negotiations in Doha, Qatar, on Sunday to forge a more unified front to reshape the political landscape in a bloody conflict that claims more than 100 lives virtually every day. Given the scant prospects that any attempt to restructure the opposition will succeed — the",
"dbpedia_resource_url": [ "http://dbpedia.org/resource/Hillary_Rodham_Clinton", "http://dbpedia.org/resource/Bashar_al-Assad"],
"facet_terms": "CLINTON, HILLARY RODHAM ASSAD, BASHAR AL- SYRIA DOHA (QATAR) SYRIAN NATIONAL COUNCIL STATE DEPARTMENT WAR AND REVOLUTION DEFENSE AND MILITARY FORCES"}
![Page 20: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/20.jpg)
Objects and relations in text?
names, dates, places, verbs.
![Page 21: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/21.jpg)
Named Entity Recognition
Extract subjects, objects, from text. Also, resolve pronouns if possible.
"Gov. Andrew M. Cuomo on Wednesday gave a sea wall the nod. Because of the recent history of powerful storms hitting the area, he said, elected officials have a responsibility to consider new and innovative plans to prevent similar damage in the future."
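A toy sketch of the extraction step, assuming a hand-written lookup table (a "gazetteer") of known names. Real NER systems use trained statistical models and also handle pronouns; this only shows the shape of the output. The `GAZETTEER` entries, the labels, and the `tag_entities` helper are hypothetical.

```python
# Hypothetical gazetteer: known entity strings and their types.
GAZETTEER = {
    "Andrew M. Cuomo": "PERSON",
    "Wednesday": "DATE",
}

def tag_entities(text):
    """Return (entity, label) pairs for gazetteer entries found in the text, in reading order."""
    found = []
    for name, label in GAZETTEER.items():
        start = text.find(name)
        if start != -1:
            found.append((start, name, label))
    return [(name, label) for _, name, label in sorted(found)]

sentence = "Gov. Andrew M. Cuomo on Wednesday gave a sea wall the nod."
print(tag_entities(sentence))
```

A lookup table cannot recognize names it has never seen, which is one reason trained models are used in practice.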
![Page 22: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/22.jpg)
NER state of the art
• Commercial: Google Knowledge Graph
• Academic: Stanford NER library
![Page 23: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/23.jpg)
Next level of understanding: verbs
“The water that made rivers of Avenues C and D receded on Tuesday, and the East Village was a mixture of disaster and nonchalance. A group of young men in pajama pants and shorts threw a football on East 12th Street, while workers pumped the basement of CHP Hardware on Avenue C and Eighth Street.”
subject verb object
![Page 24: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/24.jpg)
Knowledge Representation in AI (a crazy brief introduction)
Classic "symbolic" paradigm represents knowledge as statements in mathematical logic. Many variations. Most are subsets or modifications of standard first order logic (FOL). Mathematical representation of human knowledge is a very old dream! (Greeks, Leibniz, GOFAI...)
![Page 25: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/25.jpg)
Leibniz, 1685
The only way to rectify our reasonings is to make them as tangible as those of the Mathematicians, so that we can find our error at a glance, and when there are disputes among persons, we can simply say: Let us calculate [calculemus], without further ado, to see who is right.
![Page 26: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/26.jpg)
Predicates and Relations
Predicate: asserts that object belongs to a class
vehicle(schoolbus)
bird(tweety)
straight_gangsta(emily_bell)
Relation: asserts relationship between objects
is_a(car, vehicle)
higher_rank(general, colonel)
capital(paris, france)
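One possible way to store such assertions, sketched with plain Python sets. The `predicates`, `relations`, and `holds` names are hypothetical; the facts are the ones on the slide.

```python
# Predicates are stored as (class_name, object); relations as (relation_name, subject, object).
predicates = {("vehicle", "schoolbus"), ("bird", "tweety")}
relations = {
    ("is_a", "car", "vehicle"),
    ("higher_rank", "general", "colonel"),
    ("capital", "paris", "france"),
}

def holds(name, *args):
    """True if the predicate or relation has been asserted."""
    fact = (name, *args)
    return fact in predicates or fact in relations

print(holds("bird", "tweety"))
print(holds("capital", "paris", "france"))
print(holds("capital", "lyon", "france"))
```

Note that `holds` returning False only means the fact was never asserted, not that it is known to be false.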
![Page 27: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/27.jpg)
Inference
General rules
a ∧ (a => b) => b
p ∨ !p
Domain specific inferences
is_a(car, vehicle) ∧ can_move(vehicle) => can_move(car)
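A minimal sketch of the domain-specific inference above, applied repeatedly until nothing new can be derived (forward chaining). The `infer_can_move` function and the extra `taxi` fact are hypothetical additions for illustration; only the car/vehicle rule comes from the slide.

```python
def infer_can_move(is_a, can_move):
    """Apply is_a(x, c) and can_move(c) => can_move(x) until no new facts appear."""
    derived = set(can_move)
    changed = True
    while changed:
        changed = False
        for child, parent in is_a:
            if parent in derived and child not in derived:
                derived.add(child)
                changed = True
    return derived

# ("taxi", "car") is an assumed extra fact, added to show chaining through two steps.
is_a = {("car", "vehicle"), ("taxi", "car")}
can_move = {"vehicle"}

print(sorted(infer_can_move(is_a, can_move)))
```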
![Page 28: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/28.jpg)
News as relations between entities
“Alice attended the wedding”
attended(alice, wedding)
“IBM was founded in 1917.”
founded(IBM, 1917)
“Hurricane Sandy hit New York”
hit(hurricane_sandy, New_York)
Encode facts as relation(subject, object), also written (subject relation object)
![Page 29: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/29.jpg)
Things we could do with this
Question answering
“The granddaughter of which actor starred in E.T.?”
(?x acted-in “E.T.”) (?y is-a actor) (?x granddaughter-of ?y)
Inference
(bob brother-of alice) (alice mother-of lucy) => (bob uncle-of lucy)
Answer questions using inference
“How many executives of publicly-traded Canadian companies died in car crashes?”
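A sketch of how the `?x` / `?y` patterns above could be matched against a list of triples. The `match` and `query` helpers and the contents of `TRIPLES` are assumptions for illustration; this is a naive nested search, not how any particular triple store works.

```python
def match(pattern, triple, bindings):
    """Unify one pattern with one triple; return the extended bindings, or None on mismatch."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):          # variable: must agree with any earlier binding
            if new.get(p, t) != t:
                return None
            new[p] = t
        elif p != t:                   # constant: must be identical
            return None
    return new

def query(patterns, triples, bindings=None):
    """Yield every variable binding that satisfies all patterns at once."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    for triple in triples:
        extended = match(patterns[0], triple, bindings)
        if extended is not None:
            yield from query(patterns[1:], triples, extended)

# Illustrative facts, hand-entered for this sketch.
TRIPLES = [
    ("drew_barrymore", "acted-in", "E.T."),
    ("henry_thomas", "acted-in", "E.T."),
    ("john_barrymore", "is-a", "actor"),
    ("drew_barrymore", "granddaughter-of", "john_barrymore"),
    ("bob", "brother-of", "alice"),
    ("alice", "mother-of", "lucy"),
]

# Question answering: the three patterns from the slide.
answers = list(query(
    [("?x", "acted-in", "E.T."), ("?y", "is-a", "actor"), ("?x", "granddaughter-of", "?y")],
    TRIPLES,
))
print(answers)

# Inference: brother-of followed by mother-of gives uncle-of.
uncles = [
    (b["?a"], "uncle-of", b["?c"])
    for b in query([("?a", "brother-of", "?b"), ("?b", "mother-of", "?c")], TRIPLES)
]
print(uncles)
```

Adding the derived `uncle-of` triples back into `TRIPLES` would let later queries use them, which is the "answer questions using inference" step.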
![Page 30: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/30.jpg)
Problems
Not all subjects are simple.
“Over a hundred guests attended the wedding”
attended(num_guests, wedding)
greater_than(num_guests, 100)
Some relations have multiple parts.
“Hurricane Sandy hit New York on Monday”
hit(sandy, New_York, monday)
![Page 31: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/31.jpg)
Standard inference doesn’t allow defaults
“All birds fly”
bird(tweety)
bird(?x) => flies(?x)
=> flies(tweety)
But, “penguins and chickens don’t fly”
bird(?x) & !penguin(?x) & !chicken(?x) => flies(?x)
Now we can’t guess that tweety flies
bird(tweety) => flies(tweety)? We don’t know!
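A small sketch of why the refined rule stops firing. It assumes a knowledge base that records both asserted facts and explicitly known negations, and treats everything else as unknown. The `flies_strict` function and the `facts` / `negated` sets are hypothetical.

```python
def flies_strict(x, facts, negated):
    """bird(x) & !penguin(x) & !chicken(x) => flies(x).

    Returns True only when every condition is known; None means the rule cannot fire.
    """
    if ("bird", x) not in facts:
        return None
    if ("penguin", x) in negated and ("chicken", x) in negated:
        return True
    return None  # we never learned that tweety is not a penguin or a chicken

facts = {("bird", "tweety")}

# All we know is bird(tweety): no conclusion.
print(flies_strict("tweety", facts, negated=set()))

# Only after both negations are asserted explicitly does the rule fire.
known_not = {("penguin", "tweety"), ("chicken", "tweety")}
print(flies_strict("tweety", facts, negated=known_not))
```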
![Page 32: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/32.jpg)
Standard mathematical logic doesn’t deal well with exceptions
Some people don’t have a last name.
Sometimes an election isn’t decided on election day.
Is a trash can used as a flower pot still a trash can?
Is a broken car still a vehicle if it can't move?
![Page 33: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/33.jpg)
Relations from sentence parsing
“The water that made rivers of Avenues C and D receded on Tuesday, and the East Village was a mixture of disaster and nonchalance. A group of young men in pajama pants and shorts threw a football on East 12th Street, while workers pumped the basement of CHP Hardware on Avenue C and Eighth Street.”
subject verb object
![Page 34: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/34.jpg)
Relation extraction systems
• Commercial: IBM's DeepQA (Watson)
• Academic: Open IE project
![Page 35: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/35.jpg)
Ontology explosions
(water made rivers of Avenues C and D)
(East Village was a mixture of disaster and nonchalance)
(group of young men in pajama pants and shorts threw football)
(workers pumped the basement of CHP Hardware)
Do we have all of these in the ontology?
![Page 36: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/36.jpg)
“General Question Answering”
Precision/recall tradeoff. State of the art is IBM’s DeepQA
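A sketch of the tradeoff for a question-answering system that only answers when its confidence clears a threshold. The `candidates` list of (confidence, is_correct) pairs is made-up data and `precision_recall` is a hypothetical helper; here recall is taken relative to the questions the system could have answered correctly.

```python
def precision_recall(candidates, threshold):
    """Precision and recall when answering only candidates with confidence >= threshold."""
    answered = [ok for conf, ok in candidates if conf >= threshold]
    total_correct = sum(1 for _, ok in candidates if ok)
    correct = sum(answered)  # True counts as 1
    precision = correct / len(answered) if answered else 0.0
    recall = correct / total_correct if total_correct else 0.0
    return precision, recall

# Hypothetical system output: (confidence, whether the answer was right).
candidates = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.2, False)]

for threshold in (0.7, 0.3):
    print(threshold, precision_recall(candidates, threshold))
```

Raising the threshold answers fewer questions but gets more of them right; lowering it does the opposite.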
![Page 37: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/37.jpg)
DeepQA use of structured data
“Watson can also use detected relations to query a triple store and directly generate candidate answers. Due to the breadth of relations in the Jeopardy domain and the variety of ways in which they are expressed, however, Watson’s current ability to effectively use curated databases to simply ‘look up’ the answers is limited to fewer than 2 percent of the clues.”
- Ferrucci et al., “Building Watson”
![Page 38: Knowledge Representation. Computational Journalism week 8](https://reader034.fdocuments.us/reader034/viewer/2022051020/5695d0541a28ab9b02920c69/html5/thumbnails/38.jpg)
Wall Street is high on Molson Coors Brewing (TAP), expecting it to report earnings that are up 17.5% from a year ago when it reports its third quarter earnings on Wednesday, November 7, 2012. The consensus estimate is $1.34 per share, up from earnings of $1.14 per share a year ago. The consensus estimate has dipped over the past month, from $1.35, but it’s still up from the consensus estimate of $1.19 three months ago. For the fiscal year, analysts are expecting earnings of $3.89 per share. Revenue is projected to eclipse the year-earlier total of $954.4 million by 31%, finishing at $1.25 billion for the quarter. For the year, revenue is projected to roll in at $4.04 billion. The company’s net income has declined in the last two quarters. The company posted profit falling by 52.8% in the second quarter. This is after it reported a profit decline in the first quarter by 4.1%.
Automatic story generation, by Narrative Science