Post on 20-Jul-2020
Natural Language ProcessingLecture 2: Applications of NLP
Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate after consistently trailing in third place.In the latest primary, held in Florida yesterday, Edwards gained only 14% of the vote, with Hillary Clinton polling 50% and Barack Obama on 33%. A reported 1.5m voters turned out to vote.
Information Extraction (From 10,000 Feet)
• Input: text, empty relational database• Output: populated relational database
State Party Candidate Fraction
FL D Edwards 0.14
FL D Clinton 0.50
FL D Obama 0.33
Named Entity Recognition
Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate after consistently trailing in third place.In the latest primary, held in Florida yesterday, Edwardsgained only 14% of the vote, with Hillary Clinton polling 50% and Barack Obama on 33%. A reported 1.5m voters turned out to vote.
Reference Resolution
Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate after consistently trailing in third place.In the latest primary, held in Florida yesterday, Edwardsgained only 14% of the vote, with Hillary Clinton polling 50% and Barack Obama on 33%. A reported 1.5m voters turned out to vote.
Coreference Resolution
Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate after consistently trailing in third place.In the latest primary, held in Florida yesterday, Edwardsgained only 14% of the vote, with Hillary Clinton polling 50% and Barack Obama on 33%. A reported 1.5m voters turned out to vote.
Relation Detection
Senator John Edwards is to drop out of the race to become the Democratic party's presidential candidate after consistently trailing in third place.In the latest primary, held in Florida yesterday, Edwardsgained only 14% of the vote, with Hillary Clinton polling 50% and Barack Obama on 33%. A reported 1.5m voters turned out to vote.
member_of
John Edwards Democratic party
Hillary Clinton Democratic party
Barack Obama Democratic party
NER
Encoding the NER Problem...PresidentDonaldTrumpmetwithlocalleadersandfederal responders shortlyafterlandingatanAirForcebaseinCarolina,PuertoRico...
...B-PERI-PERI-PER
OOOOOOOOOOOO
B-ORGI-ORG
OO
B-LOCO
B-LOCI-LOC
...
President Donald Trump met with local leaders and federal responders shortly after landing at an Air Force base in Carolina, Puerto Rico, for what was supposed to be a briefing on the situation on the island. Instead, Trump turned it into an opportunity to congratulate himself and the federal government's response to the disaster…. He downplayed throughout his remarks how dire things are in Puerto Rico, where more than half of the people don't have power, running water, or cellphone service two weeks after Hurricane Maria, a Category 4 storm, tore through the island.
Some Named Entity Types
• Different annotation schemes for NER use different types• Common types include• PER—person• ORG—Organization• LOC—Location• GPE—Geopolitical Entity• FAC—Facility• NAT—Natural phenomenon
• These are only tagged when they are proper names
Evaluating an NER System
Correct named entity tokens
Hypothesized named entity
tokens
NER System Building Process
Test Data
Evaluate Precision and Recall
Relation Extraction
Examples of Relations
Seeding Tuples
ExamplesBrad is married to Angelina.Bill is married to Hillary.Hillary is married to Bill.Hillary is the wife of Bill.
SeedsX is married to YX is the wife of Y
Find all X,Y where these templates existExtract the X,Ys, find all other X … Y
Bootstrapping Relations
Information Retrieval
The Vector Space Model
• Each document is a |Σ|-dimensional vector di:
• Given a query q, represent it in the same way.
• Similarity of vectors ⇒ relevance:
• Twists: tf-idf
Evaluating Information Retrieval
precision
recall
Question Answering
Question Answering Architecture
SLP p. 793
Your Project (Perhaps)
SLP p. 793
QuestionKnown Document
Evaluating QA
Some General Tools
• Supervised classification• Feature vector representations• Bootstrapping• Evaluation:• Precision and recall (and their curves)• Mean reciprocal rank