Copyright 2011 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
Enabling Networked Knowledge
Named Entity Recognition: Fallacies, Challenges & Opportunities
Authors: Mónica Marrero, Julián Urbano, Sonia Sánchez-Cuadrado, Jorge Morato, Juan Miguel Gómez-Berbís
Presented by: Bianca Pereira
Alchemy API Raises $2M
“Alchemy, which launched in 2009, processes 3 billion
API calls per month. It is used in 36 countries (…)”
http://semanticweb.com/alchemy-api-raises-2m_b35276
“FOX can generate RDF out of natural language with
improved accuracy. FOX has been shown to be up to
15% more accurate than other frameworks, including
commercial software.”
http://semanticweb.com/aksw-announces-federated-knowledge-extraction_b21399
“There are many open-source and commercial products
out there that attempt to determine sentiment in
tweets, but what is interesting to find out is what entity
is that sentiment attached to.”
http://semanticweb.com/introducing-semanticweb-com-innovation-spotlight-series-with-pingar_b30106
“DBPedia Spotlight’s ability (…) to support (…) faceted
browsing, customized web feeds (…) enrich blog
content.”
“Many (…) relationship extraction algorithms rely on
entity identification beforehand(…)”
http://semanticweb.com/the-spotlight%E2%80%99s-on-dbpedia_b17942
“People and places (…) are only a small part of this
wider project (…) around entities that Bing embarked on
a while back.”
http://techcrunch.com/2013/03/21/bing-just-got-a-lot-smarter-now-knows-more-about-people-and-places/
Agenda
What is a (Named) Entity?
Named Entity Recognition evolution
Named Entity Recognition evaluation
Conclusions
How is it related to my PhD?
Named Entity Recognition
What is Named Entity Recognition?
“Identification of mentions of real-world entities in a natural language text.”
(my words)
Named Entity Recognition
The term “named entity” was coined for the Named
Entity task at the 6th Message Understanding
Conference (MUC-6).
“Unique identifiers of entities (organizations, persons,
locations), times (dates, times), and quantities
(monetary values, percentages).”
(http://cs.nyu.edu/faculty/grishman/NEtask20.book_2.html#HEADING1)
Named Entity Recognition
At the next conference (MUC-7), the definition changed slightly.
“Named Entities (NE) were defined as proper names
and quantities of interest. Person, organization, and
location names were marked as well as dates, times,
percentages, and monetary amounts.”
(http://www.itl.nist.gov/iaui/894.02/related_projects/muc/proceedings/muc_7_proceedings/overview.html)
MUC-7 Results
The results for the MUC-7 Named Entity task were very promising.
(http://www.itl.nist.gov/iaui/894.02/related_projects/muc/proceedings/muc_7_proceedings/marsh_slides.pdf)
Challenges
There were no more Message Understanding Conferences…
Challenges
But there were other challenges:
Automatic Content Extraction (ACE – 1999)
Computational Natural Language Learning (CoNLL – 2002)
INEX Entity Ranking Track (2007)
TREC Entity Track (2009)
TAC Knowledge Base Population (TAC-KBP – 2009)
…
ACE
“Recognition of entities, not just names. In the ACE
entity detection and tracking (EDT) task, all mentions
of an entity, whether a name, a description, or a
pronoun, are to be found and collected into
equivalence classes based on reference to the same
entity.”
(Doddington et al. 2004)
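The EDT requirement of collecting mentions into equivalence classes amounts to clustering mentions that refer to the same entity. A minimal sketch, assuming the pairwise same-entity links have already been produced by some earlier step (the mentions and `links` below are hypothetical, and ACE does not prescribe this union-find procedure):

```python
from collections import defaultdict


def group_mentions(mentions, links):
    """Collect mentions into equivalence classes from pairwise links.

    mentions: mention strings (names, descriptions or pronouns).
    links: (i, j) index pairs asserting that mentions i and j refer
           to the same entity (assumed to come from an earlier step).
    Returns the classes ordered by the position of their first mention.
    """
    parent = list(range(len(mentions)))

    def find(i):
        # Walk up to the class representative, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in links:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    classes = defaultdict(list)
    for idx, mention in enumerate(mentions):
        # Dicts keep insertion order, so classes appear in text order.
        classes[find(idx)].append(mention)
    return list(classes.values())


mentions = ["Richard Nixon", "the president", "he", "Washington", "the capital"]
links = [(0, 1), (1, 2), (3, 4)]
print(group_mentions(mentions, links))
# [['Richard Nixon', 'the president', 'he'], ['Washington', 'the capital']]
```

Union-find is used here only because it merges transitive links cheaply; any clustering that respects the links would produce the same classes.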
CoNLL
“Named entities are phrases that contain names of
persons, organizations, locations, times and
quantities. (…) We will concentrate on four types of
named entities: persons, locations, organizations and
names of miscellaneous entities that do not belong
to the previous three groups. (…)”
(http://www.clips.ua.ac.be/conll2002/ner/)
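Shared-task data of this kind is commonly distributed one token per line with BIO-style tags (`B-PER`, `I-PER`, `O`, and so on for the four types), and scoring works on the entity spans recovered from those tags. A minimal decoding sketch, assuming `B-`/`I-`/`O` tags over the CoNLL types PER, LOC, ORG and MISC; the sentence is made up for illustration:

```python
def bio_to_spans(tokens, tags):
    """Turn per-token BIO tags into (type, start, end, text) entity spans.

    `end` is exclusive. An I- tag that does not continue an open entity of
    the same type is treated as the start of a new entity (a lenient choice
    made for this sketch).
    """
    spans = []
    start, current_type = None, None

    def close(end):
        # Emit the currently open entity, if any.
        if current_type is not None:
            spans.append((current_type, start, end, " ".join(tokens[start:end])))

    for i, tag in enumerate(tags):
        if tag == "O":
            close(i)
            start, current_type = None, None
            continue
        prefix, entity_type = tag.split("-", 1)
        if prefix == "B" or entity_type != current_type:
            close(i)
            start, current_type = i, entity_type
        # Otherwise an I- tag simply extends the open entity.
    close(len(tags))
    return spans


tokens = ["John", "Smith", "works", "for", "Acme", "Corp", "in", "Dublin", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC", "O"]
print(bio_to_spans(tokens, tags))
# [('PER', 0, 2, 'John Smith'), ('ORG', 4, 6, 'Acme Corp'), ('LOC', 7, 8, 'Dublin')]
```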
INEX Entity Ranking Track
“(…) entities (such as countries, people and dates)
requires the estimation of relevance of items (i.e.,
instances of entities) (…) we restricted candidate
items to those entities that have their own
Wikipedia article.”
(De Vries et al. 2007)
TREC Entity Track
“A web entity is uniquely identifiable by one of its
primary homepages. Real-world entities can be
represented by multiple homepages.”
(Balog et al. 2009)
TAC-KBP
“The tasks will be structured by having participants
process a list of target entities. The list will contain
entity types of Person, Organization and Geo-Political
Entity.”
(http://apl.jhu.edu/~paulmac/kbp/090601-KBPTaskGuidelines.pdf)
What is a Named Entity?
Proper nouns
Water? Whale? Twelve o’clock?
Rigid designator
Richard Nixon (✓) vs. President of the United States (✗)
Unique identifier
“(…) virtually everything could be referred to uniquely,
depending on the context or the previous knowledge of
the receiver, although a unique identifier for one receiver
might not be so for another one, either because of lack of
shared knowledge or the ambiguity of the context.”
Purpose and domain of application
Evaluation
As the definition changes, so does the evaluation.
Each challenge has different:
types of Named Entity to identify
identification and annotation criteria
valid boundaries of a Named Entity
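To see how boundary criteria alone change the numbers, the sketch below scores the same predictions under two matching rules: exact type and boundary match, and a lenient rule that accepts any overlapping span of the right type. The spans and both rules are illustrative assumptions, not the official scorer of any challenge listed above.

```python
def score(gold, predicted, match):
    """Precision, recall and F1 of predicted spans against gold spans.

    Spans are (entity_type, start, end) tuples with an exclusive `end`.
    Each gold span can be credited to at most one prediction.
    """
    unmatched_gold = list(gold)
    true_positives = 0
    for p in predicted:
        for g in unmatched_gold:
            if match(g, p):
                unmatched_gold.remove(g)  # safe: we stop iterating right after
                true_positives += 1
                break
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def exact(g, p):
    # Strict criterion: same type and identical boundaries.
    return g == p


def overlap(g, p):
    # Lenient criterion: same type and at least one shared token.
    return g[0] == p[0] and g[1] < p[2] and p[1] < g[2]


gold = [("PER", 0, 2), ("ORG", 4, 6), ("LOC", 7, 8)]
predicted = [("PER", 1, 2), ("ORG", 4, 6), ("LOC", 7, 8)]  # person span cut short
print(score(gold, predicted, exact))    # P, R and F1 all about 0.667
print(score(gold, predicted, overlap))  # (1.0, 1.0, 1.0)
```

The tool output is identical in both calls; only the definition of a valid boundary differs.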
Other problems
How can we evaluate current tools that use different definitions of Named Entity?
Using only Person, Organization and Place.
Using only those tools which work with numbers and dates.
Using current annotated corpora (and seeing what happens).
How can we choose the best tool?
It depends on the application.
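The first option above requires mapping each tool's label set onto a shared Person/Organization/Place schema and discarding everything else before scoring. A minimal sketch; the tool names and label mappings in `LABEL_MAPS` are hypothetical:

```python
# Hypothetical mappings from two tools' labels to a shared schema.
LABEL_MAPS = {
    "tool_a": {"PERSON": "PER", "ORGANIZATION": "ORG", "LOCATION": "LOC",
               "GPE": "LOC"},
    "tool_b": {"Person": "PER", "Company": "ORG", "City": "LOC",
               "Country": "LOC", "Date": "DATE"},
}
COMMON_TYPES = {"PER", "ORG", "LOC"}


def normalize(spans, tool):
    """Map a tool's (label, start, end) spans to the shared schema.

    Spans whose label has no mapping, or maps outside COMMON_TYPES,
    are dropped so that every tool is judged on the same entity types.
    """
    mapping = LABEL_MAPS[tool]
    normalized = []
    for label, start, end in spans:
        common = mapping.get(label)  # None for unknown labels
        if common in COMMON_TYPES:
            normalized.append((common, start, end))
    return normalized


output_b = [("Person", 0, 2), ("Company", 4, 6), ("Date", 9, 10), ("Drug", 11, 12)]
print(normalize(output_b, "tool_b"))
# [('PER', 0, 2), ('ORG', 4, 6)]
```

The same normalization has to be applied to the gold annotations. Choices such as folding `GPE` into `LOC` or dropping dates are themselves assumptions that change what is being measured, which is one reason the best tool depends on the application.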
Conclusions
Is NER really solved?
Content Validity
– The evaluation reflects the needs of the real user.
External Validity
– The experiments can be generalized to other populations and experimental settings.
Convergent Validity
– The results agree with other results, theoretical or experimental.
Conclusion Validity
– The conclusions drawn from the results are justified.
“There is not enough evidence to support the statement that NER is solved: it rather suggests the opposite”
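One rough, quantitative check related to external and convergent validity is whether the ranking of tools on one corpus agrees with their ranking on another. The sketch below computes Kendall's tau between two such rankings; the tool names and F1 scores are invented for illustration and are not results from the paper.

```python
from itertools import combinations


def kendall_tau(scores_x, scores_y):
    """Kendall's tau (tau-a) between two rankings of the same tools.

    scores_x, scores_y: dicts mapping tool name -> effectiveness score.
    Returns +1 if both rank the tools identically and -1 if the order is
    fully reversed; tied pairs count as neither concordant nor discordant.
    """
    tools = sorted(scores_x)
    if len(tools) < 2 or set(tools) != set(scores_y):
        raise ValueError("need the same two or more tools in both rankings")
    concordant = discordant = 0
    for a, b in combinations(tools, 2):
        product = (scores_x[a] - scores_x[b]) * (scores_y[a] - scores_y[b])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(tools) * (len(tools) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical F1 scores of four tools on two annotated corpora.
corpus_1 = {"tool_a": 0.91, "tool_b": 0.87, "tool_c": 0.80, "tool_d": 0.75}
corpus_2 = {"tool_a": 0.62, "tool_b": 0.70, "tool_c": 0.58, "tool_d": 0.66}
print(kendall_tau(corpus_1, corpus_2))  # 0.0
```

A tau near 1 would mean the corpora agree on which tools are better; a value of 0, as in this invented example, means a ranking obtained on either corpus alone would not carry over to the other.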
Conclusions
What about…
genes and diseases?
entities identified by the same name as their classes (ambulance, airplane, and so on)?
entities identified by their attributes and descriptions?
entities…
Conclusions
What is an entity?
My PhD thesis
How is it related to my PhD topic?
Entity Linking is the identification and disambiguation of
entities using a background knowledge base.
Entity Recognition is the first step.
What is an entity?
And more: what is an entity in different domains?
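A toy sketch of the two steps named above, under heavily simplified assumptions: the knowledge base `KB` is a hypothetical in-memory dictionary, recognition is a longest-match lookup of known names, and disambiguation picks the candidate whose description shares the most words with the text. It is not the method of any particular Entity Linking system.

```python
import re

# Hypothetical toy knowledge base: surface form -> candidate entities.
KB = {
    "Richard Nixon": [
        {"id": "Richard_Nixon", "description": "37th president of the united states"},
    ],
    "Washington": [
        {"id": "George_Washington", "description": "first president of the united states"},
        {"id": "Washington,_D.C.", "description": "capital city of the united states"},
    ],
}


def recognize(text):
    """Step 1 (recognition): find known surface forms, longest match first."""
    mentions, taken = [], set()
    for form in sorted(KB, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(form) + r"\b", text):
            chars = set(range(m.start(), m.end()))
            if not chars & taken:  # skip mentions nested in a longer one
                taken |= chars
                mentions.append((m.start(), m.end(), form))
    return sorted(mentions)


def disambiguate(text, form):
    """Step 2 (disambiguation): pick the candidate whose KB description
    shares the most words with the text."""
    context = set(re.findall(r"\w+", text.lower()))
    best = max(KB[form], key=lambda c: len(context & set(c["description"].split())))
    return best["id"]


def link(text):
    return [(text[s:e], disambiguate(text, form)) for s, e, form in recognize(text)]


print(link("Richard Nixon gave a speech in Washington, the capital city."))
# [('Richard Nixon', 'Richard_Nixon'), ('Washington', 'Washington,_D.C.')]
```

Any mention that step 1 misses can never be linked in step 2, which is why the definition of "entity" used for recognition constrains the whole pipeline.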
References
(Balog et al. 2009)
Balog, Krisztian, et al. “Overview of the TREC 2009 Entity Track.” 2009.
(Doddington et al. 2004)
Doddington, George, et al. “The Automatic Content Extraction (ACE) Program – Tasks, Data, and Evaluation.” Proceedings of LREC. Vol. 4. 2004.
(De Vries et al. 2007)
De Vries, Arjen P., et al. “Overview of the INEX 2007 Entity Ranking Track.” Focused Access to XML Documents. Springer Berlin Heidelberg, 2008. 245-251.