LiNE, NATIONAL SEMINAR ORGANIZED BY KULISAA, 15.01.2015
ENTITY SEARCH ON THE WEB
Tanmay Mondal, MSLIS student, DRTC
Bangalore, Karnataka-560059
Information
• The amount of information held in various databases is huge
• It is growing exponentially
• A document collection is the set of all web pages indexed by search engines
TIME FACTOR
• A traditional information extraction approach scans every document in the collection
• Search options: OR, AND, *, NOT, "..."
• Finding the right information at the right time is time-consuming for users
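Boolean search options like these can be evaluated against an inverted index instead of scanning every document. The following is a minimal Python sketch under simplifying assumptions: the three-document collection, whitespace tokenization and the helper names (`build_index`, `q_and`, `q_or`, `q_not`) are hypothetical, and phrase ("...") and wildcard (*) queries are not covered.

```python
from collections import defaultdict

# Hypothetical three-document collection, for illustration only.
DOCS = {
    1: "Ranganathan founded DRTC in Bangalore",
    2: "DRTC offers library science courses",
    3: "IBM research lab in Bangalore",
}


def build_index(docs):
    """Inverted index: lower-cased term -> set of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def scan(docs, term):
    """Traditional approach: read every document to find the term."""
    return {doc_id for doc_id, text in docs.items() if term.lower() in text.lower().split()}


INDEX = build_index(DOCS)
ALL_IDS = set(DOCS)


def postings(term):
    return INDEX.get(term.lower(), set())


def q_and(*terms):
    """Documents containing every term."""
    result = set(ALL_IDS)
    for term in terms:
        result &= postings(term)
    return result


def q_or(*terms):
    """Documents containing at least one term."""
    result = set()
    for term in terms:
        result |= postings(term)
    return result


def q_not(term):
    """Documents that do not contain the term."""
    return ALL_IDS - postings(term)


print(q_and("DRTC", "Bangalore"))          # {1}
print(q_or("IBM", "library"))              # {2, 3}
print(q_and("Bangalore") & q_not("IBM"))   # {1}
```

The index is built once, so each query only touches the postings of its own terms rather than every document in the collection.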
Named entity types:
• Person, Location, Organization, Nationality, Religion, Product
• Phone Number, Email Address/URL
• Distance, Date, Time, Money, Generic Number
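Pattern-like entity types in this list (dates, money amounts, email addresses, URLs) can often be spotted with regular expressions, whereas types such as Person or Organization usually need dictionaries or trained models. A minimal Python sketch; the patterns are simplified assumptions (dates are assumed to be written DD.MM.YYYY, as on the title slide) and will miss many real-world formats.

```python
import re

# Simplified, hypothetical patterns; production recognizers are far more robust.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "URL": re.compile(r"https?://[^\s,]+"),
    "DATE": re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b"),   # DD.MM.YYYY only
    "MONEY": re.compile(r"(?:Rs\.?|€|\$)\s?\d+(?:,\d{3})*(?:\.\d+)?"),
}


def find_entities(text):
    """Return (entity type, matched text) pairs in order of appearance."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), label, match.group()))
    return [(label, value) for _, label, value in sorted(hits)]


sample = ("Seminar on 15.01.2015, fee Rs. 500, contact info@example.org "
          "or see http://www.geonames.org/")
print(find_entities(sample))
```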
Example query for specific information: “Countries where I can pay in Euro”
Results: Germany, Spain, Italy, etc.
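To answer a query like this, an entity search engine needs entities with typed attributes rather than pages of text. The sketch below assumes the query has already been interpreted as "entities of type Country whose currency attribute is Euro"; that interpretation step, the `Country` class and the five-row table are illustrative assumptions, and the table is not a complete list of Euro countries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Country:
    name: str
    currency: str


# Tiny hand-made entity table (illustrative sample, not a complete list).
COUNTRIES = [
    Country("Germany", "Euro"),
    Country("India", "Indian Rupee"),
    Country("Spain", "Euro"),
    Country("Japan", "Yen"),
    Country("Italy", "Euro"),
]


def countries_using(currency):
    """Answer with entity names directly, not with a list of documents."""
    return [c.name for c in COUNTRIES if c.currency == currency]


print(countries_using("Euro"))   # ['Germany', 'Spain', 'Italy']
```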
OUR QUERY
When people use retrieval systems, they are often not searching for documents or text passages
Named entities (persons, organizations, locations, products, ...) play a central role in answering such information needs
At least 20-30% of the queries submitted to web search engines consist simply of entity names
About 71% of web search queries contain named entities
Entity Search
An entity is any object or thing that can be uniquely identified in the real world
Entity search presents a ranked list of entities directly, rather than a list of web pages
It matches search queries more closely by using a database containing hundreds of millions of “entities” (people, places, organizations) and their semantic relations
Entities are everywhere
Entity & its facets
An entity must be distinguished from other entities.
The type of an entity is the generic class into which the entity is classified.
An attribute is a property (predicate) associated with an entity.
A value is the value of an attribute for a given entity.
A relation links an entity to other entities and provides more information about it.
Example: Prof. S.R. Ranganathan is a person; IBM is an organization.
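The facets above map naturally onto a small record type. A minimal Python sketch; the `Entity` class, the identifiers `e1` to `e3` and the `REGISTRY` lookup are hypothetical, chosen only to show how type, attribute/value pairs and relations fit together.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    entity_id: str                                  # distinguishes it from all other entities
    name: str
    entity_type: str                                # generic class, e.g. "Person"
    attributes: dict = field(default_factory=dict)  # attribute -> value
    relations: list = field(default_factory=list)   # (relation name, target entity_id)


ranganathan = Entity(
    "e1", "S.R. Ranganathan", "Person",
    attributes={"known_for": "Colon Classification"},
    relations=[("founded", "e2")],
)
drtc = Entity("e2", "DRTC", "Organization", attributes={"city": "Bangalore"})
ibm = Entity("e3", "IBM", "Organization")

REGISTRY = {e.entity_id: e for e in (ranganathan, drtc, ibm)}


def related(entity):
    """Resolve each relation's target ID to the related entity's name."""
    return [(rel, REGISTRY[target].name) for rel, target in entity.relations]


print(ranganathan.entity_type, ranganathan.attributes["known_for"])
print(related(ranganathan))   # [('founded', 'DRTC')]
```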
Popular Entity Search:
• Product search: various products such as books, electronics, clothes, etc.
• People search: experts, friends, profiles of famous persons, etc.
• Location search: addresses, businesses, government offices, etc.
• And many more searches based on entities alone...
Main Tasks of an Entity Search Engine (ESE)
Entity Retrieval: entity search engines return a ranked list of the entities most relevant to a user query
Entity Relationship / Fact Mining and Navigation: discovers interesting relationships and facts about the entities associated with the query
Prominence Ranking: detects the popularity of an entity and enables users to browse entities in different categories
Entity Description Retrieval: retrieves a description block for each entity; information about an object on a web page is generally grouped together as an object block
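Entity retrieval and prominence ranking can be illustrated together. The sketch below uses a simple evidence-aggregation scheme, assumed here for illustration and not taken from any particular engine: an entity earns one point for each query term that appears in a document mentioning it, and ties are broken by prominence, approximated as the number of documents that mention the entity. The annotated corpus and the function name `rank_entities` are hypothetical.

```python
from collections import Counter

# Hypothetical corpus: each document's text plus the entities annotated in it.
CORPUS = [
    ("colon classification library science", ["S.R. Ranganathan"]),
    ("library science research centre bangalore", ["DRTC", "S.R. Ranganathan"]),
    ("computer research lab bangalore", ["IBM"]),
    ("library automation software", ["DRTC"]),
]

# Prominence: how many documents mention each entity (a crude popularity signal).
PROMINENCE = Counter(e for _, entities in CORPUS for e in entities)


def rank_entities(query):
    """Return entity names ranked by aggregated query evidence, then prominence."""
    terms = set(query.lower().split())
    scores = Counter()
    for text, entities in CORPUS:
        overlap = len(terms & set(text.split()))
        if overlap:
            for e in entities:
                scores[e] += overlap
    # Sort by relevance score, then by prominence, then by name for a stable order.
    return sorted(scores, key=lambda e: (-scores[e], -PROMINENCE[e], e))


print(rank_entities("library science"))   # ['S.R. Ranganathan', 'DRTC']
print(rank_entities("Bangalore"))         # ['DRTC', 'S.R. Ranganathan', 'IBM']
```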
FEATURES OF ENTITY SEARCH ENGINES (ESE)
• An ESE provides explicit and easily understandable information
• It gathers and aggregates information from different sources and keeps it in one place
• It extracts entities in a structured form
• It understands the meaning of the user's query
• It provides the most useful information about entities
• It shows important relations with other entities
• Duplication of entities can be avoided
• It is more structured than document-based search
• Entities can be searched by category
• Users don’t need to visit different sites for a particular entity
• It helps users retrieve pinpointed answers without wasting much time
• It provides the sources of information for detailed, document-level information
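One common way to surface relations between entities is co-occurrence: entities mentioned together in many documents are likely to be related. A minimal Python sketch, assuming documents already carry entity annotations (the sample lists and the helper names are hypothetical); co-occurrence only suggests that some relation exists, not what kind of relation it is.

```python
from collections import Counter
from itertools import combinations

# Hypothetical entity annotations, one list per document.
DOC_ENTITIES = [
    ["S.R. Ranganathan", "DRTC", "Bangalore"],
    ["DRTC", "Bangalore"],
    ["IBM", "Bangalore"],
    ["S.R. Ranganathan", "DRTC"],
]


def cooccurrence(docs):
    """Count how many documents mention each unordered pair of entities."""
    pairs = Counter()
    for entities in docs:
        for a, b in combinations(sorted(set(entities)), 2):
            pairs[(a, b)] += 1
    return pairs


def related_to(entity, docs, top_k=2):
    """Entities that most often co-occur with `entity`, strongest first."""
    scores = Counter()
    for (a, b), count in cooccurrence(docs).items():
        if a == entity:
            scores[b] += count
        elif b == entity:
            scores[a] += count
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_k]]


print(related_to("DRTC", DOC_ENTITIES))   # ['Bangalore', 'S.R. Ranganathan']
print(related_to("IBM", DOC_ENTITIES))    # ['Bangalore']
```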
Different ESEs
• Entity Relationship Query (http://idir.uta.edu/erq/)
• EntityCube (http://entitycube.research.microsoft.com/)
• Okkam Entity Name System (http://api.okkam.org/)
• Yatedo (http://www.yatedo.com/)
• DBpedia (http://dbpedia.org/About)
• WorldCat (https://www.worldcat.org/)
• GeoNames (http://www.geonames.org/)
• Wolfram|Alpha (http://www.wolframalpha.com/)
• Geneview (http://bc3.informatik.hu-berlin.de/)
• Sindice (http://sindice.com/)
• IMDb (http://www.imdb.com)
OKKAM ENS
Entity identifiers should not be multiplied beyond necessity
Every entity (individual, instance, “thing”) is assigned a global identifier, ideally a unique one
A repository of more than 7.5 million entities in a structured form
Sources of information:
1. Wikipedia: provides lists of different types of entities
2. GeoNames: contains over a million geographical names
3. OkkamDBManager: databases such as those of extranets, online shops or publishing houses
4. OkkamManualEntry: new entities are inserted manually
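The principle that identifiers should not be multiplied beyond necessity can be sketched as a registry that reuses an existing identifier whenever a name is already known and mints a new one only otherwise. This is a toy Python illustration, not the Okkam API: the `EntityNameSystem` class, its name normalization and the `urn:example:` identifier scheme are hypothetical, and real entity name systems match on much richer profiles than names alone.

```python
import uuid


class EntityNameSystem:
    """Toy registry: one global identifier per entity, shared by all its names."""

    def __init__(self):
        self._by_name = {}    # normalized name -> identifier
        self._profiles = {}   # identifier -> profile attributes

    @staticmethod
    def _normalize(name):
        # "S.R. Ranganathan" and "S. R. Ranganathan" both become "s r ranganathan".
        return " ".join(name.lower().replace(".", " ").split())

    def lookup(self, name):
        return self._by_name.get(self._normalize(name))

    def register(self, name, aliases=(), **profile):
        """Reuse a known identifier if any name matches; otherwise mint a new one."""
        names = [name, *aliases]
        ident = next((self.lookup(n) for n in names if self.lookup(n)), None)
        if ident is None:
            ident = f"urn:example:{uuid.uuid4()}"   # hypothetical identifier scheme
            self._profiles[ident] = {}
        for n in names:
            self._by_name[self._normalize(n)] = ident
        self._profiles[ident].update(profile)
        return ident

    def profile(self, ident):
        return dict(self._profiles[ident])


ens = EntityNameSystem()
a = ens.register("S.R. Ranganathan", aliases=["Shiyali Ramamrita Ranganathan"], type="Person")
b = ens.register("S. R. Ranganathan")   # same entity, so no new identifier is minted
c = ens.register("IBM", type="Organization")
print(a == b, a == c, ens.profile(b))
```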
Wolfram|Alpha
Wolfram|Alpha is an engine for computing answers and providing knowledge
It generates output by performing computations on its own internal knowledge base, instead of searching the web and returning links
It is an online service that answers factual queries directly by computing the answer
Its goal: to make all systematic knowledge immediately computable and accessible to everyone
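The difference between computing an answer and returning links can be shown with a question such as the distance between two cities (Distance is one of the entity types listed earlier). A minimal Python sketch, not how Wolfram|Alpha is implemented: the two-entry knowledge base `KB` holds approximate city-centre coordinates, and the straight-line (great-circle) distance is computed with the standard haversine formula on a spherical Earth of mean radius 6371 km.

```python
from math import asin, cos, radians, sin, sqrt

# Internal knowledge base: approximate city-centre coordinates (latitude, longitude) in degrees.
KB = {
    "Bangalore": (12.9716, 77.5946),
    "Chennai": (13.0827, 80.2707),
}

EARTH_RADIUS_KM = 6371.0   # mean Earth radius; the sphere is an approximation


def distance_km(city_a, city_b):
    """Great-circle distance between two known cities, via the haversine formula."""
    lat1, lon1 = map(radians, KB[city_a])
    lat2, lon2 = map(radians, KB[city_b])
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


print(f"Bangalore to Chennai (straight line): {distance_km('Bangalore', 'Chennai'):.0f} km")
```

The answer is derived from stored facts at query time rather than looked up on a web page, which is the distinction drawn above; road distance would be longer than this straight-line figure.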
My Library
Entities are for use
Each entity has its own attributes and relations
Every entity has its importance
Entity search saves time in finding entities
Entities are growing rapidly