Knowledge acquisition using automated techniques
![Page 1: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/1.jpg)
Methods of Knowledge Extraction
Deepti Aggarwal
SIEL|SERL, IIIT-Hyderabad, India
![Page 2: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/2.jpg)
Agenda
Introduction to the Web as a knowledge repository
Automated extraction techniques (input sources, extracted structures, input pre-processing, extraction methods, output generation)
Issues with automated extraction
![Page 3: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/3.jpg)
What is knowledge?
A familiarity with someone or something, gained through experience
Includes facts, information, descriptions, skills
![Page 4: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/4.jpg)
Types of Knowledge
Explicit Knowledge
Always present explicitly in records
Objective facts having a definite answer
E.g., Hyderabad is the capital of A.P.
Implicit Knowledge
Not present explicitly for analysis
Cultural beliefs with subjective judgments
E.g., Hyderabad is the best city in India to live in.
![Page 5: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/5.jpg)
How has knowledge been represented over time?
From public library to global library
![Page 6: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/6.jpg)
How is knowledge represented on the Web?
Millions of documents, blogs, forums, and social networks scattered across the Web
Diverse topics, in different formats, from diverse people in diverse languages, with different points of view
![Page 7: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/7.jpg)
Benefits of knowledge extraction over the Web
Question answering systems
Search engines
Validating knowledge
Tracking a particular piece of information
Predicting markets, polls etc.
Community advertisements
Explicit knowledge
Implicit knowledge
![Page 8: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/8.jpg)
Problems with knowledge acquisition over the Web
Abundance of data
Relevance of information
Personalized retrieval
![Page 9: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/9.jpg)
Possible approaches
Manual filtering
Automated techniques
Combination of both
![Page 10: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/10.jpg)
Automated Extraction
![Page 11: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/11.jpg)
Working of automated extraction systems
Input sources
Extraction system: defining output structures, input pre-processing, extraction methods, output processing
Database of all facts, relations
![Page 12: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/12.jpg)
Input sources
Types
![Page 13: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/13.jpg)
Input sources
Web documents
News articles
Blogs
Social network activity (user profiles, posts, comments)
Sentence-level parsing required.
![Page 14: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/14.jpg)
Defining the structures of output
Named entities and their relations
![Page 15: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/15.jpg)
Output structures
Named entities
Named entity relations
![Page 16: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/16.jpg)
1. Named Entity: Definition
It is an atomic element in a body of text.
Types: person, organization, location etc.
Different named entities, when linked together, form a relation.
![Page 17: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/17.jpg)
1. Named Entity: An example
Sachin Tendulkar was born in Bombay.
Sachin Tendulkar: NE of type ‘Person’
Bombay: NE of type ‘Location’
![Page 18: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/18.jpg)
2. Named Entity Relationship: Structure
Subject - Relation - Object
Subject: NE of any type
Relation: verb, adjective, adverb
Object: NE of any type
![Page 19: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/19.jpg)
2. Named Entity Relationship: An Example
Sachin Tendulkar was born in Bombay
Subject: Sachin Tendulkar
Relation: was born in
Object: Bombay
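The Subject - Relation - Object structure maps naturally onto a small record type. The following is a minimal Python sketch; the class names `NamedEntity` and `Triple` and their field names are illustrative assumptions, not something the slides define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamedEntity:
    text: str      # surface form, e.g. "Sachin Tendulkar"
    ne_type: str   # e.g. "Person", "Location"


@dataclass(frozen=True)
class Triple:
    subject: NamedEntity
    relation: str
    obj: NamedEntity

    def as_tuple(self):
        """Flatten the relation to plain strings for storage or comparison."""
        return (self.subject.text, self.relation, self.obj.text)


fact = Triple(
    subject=NamedEntity("Sachin Tendulkar", "Person"),
    relation="was born in",
    obj=NamedEntity("Bombay", "Location"),
)
print(fact.as_tuple())
```

Freezing the dataclasses makes each fact hashable, so extracted relations can be de-duplicated in a set.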
![Page 20: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/20.jpg)
Co-referencing
Sachin was born in Bombay. He is a ...
Sachin Tendulkar …. Mr. Tendulkar … Master Blaster ...
![Page 21: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/21.jpg)
Input pre-processing
Libraries
![Page 22: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/22.jpg)
NLP libraries
Split each sentence into tokens (words, digits) using a tokenizer
Recognize language constructs (nouns, verbs, pronouns) using a part-of-speech tagger
Example: Sachin/NNP Tendulkar/NNP was/VBD born/VBN in/IN Bombay/NNP
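To make the two steps concrete, here is a toy sketch in plain Python. It is not how an NLP library works internally: real taggers are statistical, whereas this one looks tokens up in a hand-written lexicon (`TOY_LEXICON`, a hypothetical name) that covers only the example sentence.

```python
import re

# Hypothetical hand-written lexicon covering only the example sentence.
# Tags follow the Penn Treebank convention used in the example above.
TOY_LEXICON = {
    "Sachin": "NNP", "Tendulkar": "NNP", "was": "VBD",
    "born": "VBN", "in": "IN", "Bombay": "NNP",
}


def tokenize(sentence):
    """Split a sentence into number, word, and punctuation tokens."""
    return re.findall(r"\d+(?:\.\d+)?|\w+|[^\w\s]", sentence)


def tag(tokens, lexicon=TOY_LEXICON, default="NN"):
    """Look each token up in the lexicon; unknown tokens get `default`."""
    return [(tok, lexicon.get(tok, default)) for tok in tokens]


tokens = tokenize("Sachin Tendulkar was born in Bombay")
tagged = tag(tokens)
print(" ".join(f"{tok}/{pos}" for tok, pos in tagged))
```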
![Page 23: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/23.jpg)
NLP libraries (contd.)
Link the individual constituents of a sentence into a parse tree using a parser
Identify the types of named entities using a named entity recognizer
Example: Sachin Tendulkar/PERSON was born in Bombay/LOCATION
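The simplest possible named entity recognizer is a lookup against a list of known names (a gazetteer). The sketch below assumes a hypothetical two-entry `GAZETTEER` and does not handle overlapping names; library recognizers use trained models instead.

```python
# Hypothetical gazetteer: entity surface forms mapped to entity types.
GAZETTEER = {
    "Sachin Tendulkar": "PERSON",
    "Bombay": "LOCATION",
}


def recognize_entities(sentence, gazetteer=GAZETTEER):
    """Return (start, end, text, type) for each gazetteer entry found, in text order."""
    found = []
    for name, ne_type in gazetteer.items():
        start = sentence.find(name)
        while start != -1:
            found.append((start, start + len(name), name, ne_type))
            start = sentence.find(name, start + len(name))
    return sorted(found)


def annotate(sentence, gazetteer=GAZETTEER):
    """Append /TYPE after each recognized entity, as in the example above."""
    out, cursor = [], 0
    for start, end, name, ne_type in recognize_entities(sentence, gazetteer):
        out.append(sentence[cursor:start])
        out.append(f"{name}/{ne_type}")
        cursor = end
    out.append(sentence[cursor:])
    return "".join(out)


print(annotate("Sachin Tendulkar was born in Bombay"))
```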
![Page 24: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/24.jpg)
NLP libraries (contd.)
Identify all co-references and replace them with the actual entity using a co-reference resolution tool
Identify the specific meaning of a word using word sense disambiguation
External vocabularies: MindNet, DBpedia, WordNet
E.g., contextual meaning of ‘crane’: noun (bird), verb (lift/move)
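Both steps can be illustrated with lookup tables. This is a deliberately simplified sketch: the `ALIASES` and `SENSES` tables are hypothetical stand-ins for what a co-reference resolver and an external vocabulary would supply, and choosing a sense from the part-of-speech tag alone is an assumption made for brevity.

```python
import re

# Hypothetical alias table; a real co-reference resolver would infer these links.
ALIASES = {
    "Mr. Tendulkar": "Sachin Tendulkar",
    "Master Blaster": "Sachin Tendulkar",
}

# Hypothetical sense inventory keyed by (word, first two letters of the POS tag).
SENSES = {
    ("crane", "NN"): "bird",
    ("crane", "VB"): "lift/move",
}


def resolve_aliases(text, aliases=ALIASES):
    """Replace each known alias with its canonical entity name."""
    # Longest aliases first, so a shorter alias never splits a longer one.
    ordered = sorted(aliases, key=len, reverse=True)
    pattern = "|".join(re.escape(alias) for alias in ordered)
    return re.sub(pattern, lambda m: aliases[m.group(0)], text)


def sense_of(word, pos_tag, senses=SENSES):
    """Pick a sense from the word and its coarse part of speech, or None."""
    return senses.get((word.lower(), pos_tag[:2]))


print(resolve_aliases("Master Blaster was born in Bombay."))
print(sense_of("crane", "NN"), sense_of("crane", "VBD"))
```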
![Page 25: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/25.jpg)
Extraction methods
![Page 26: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/26.jpg)
Extracting relationships among NEs: Standard process
1. Identify named entities within a sentence.
2. Find the verb or adjective that connects the identified named entities.
3. Connect them together to form a relation.
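The three steps above can be sketched as a single function. The entity list and the set of connecting words are hypothetical inputs standing in for the output of a named entity recognizer and a part-of-speech tagger.

```python
# Hypothetical inputs: entity names as a recognizer might return them, and a
# tiny set of connecting words standing in for a real part-of-speech tagger.
ENTITIES = ["Sachin Tendulkar", "Bombay"]
CONNECTING_WORDS = {"born", "lives", "plays"}


def extract_relation(sentence, entities=ENTITIES, connectors=CONNECTING_WORDS):
    # Step 1: identify the named entities present in the sentence, in text order.
    present = sorted((sentence.find(e), e) for e in entities if e in sentence)
    if len(present) < 2:
        return None
    (s_pos, subject), (o_pos, obj) = present[0], present[1]
    # Step 2: find a connecting word between the two entities.
    between = sentence[s_pos + len(subject):o_pos].split()
    relation = next((w for w in between if w in connectors), None)
    if relation is None:
        return None
    # Step 3: connect them to form a relation.
    return (subject, relation, obj)


print(extract_relation("Sachin Tendulkar was born in Bombay"))
```

Returning `None` when fewer than two entities or no connecting word is found keeps the sketch from emitting partial relations.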
![Page 27: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/27.jpg)
Extracting relationships among NEs: Required process
1. Identify part-of-speech constructs: noun, verb, adjective etc.
2. Determine co-references, acronyms and abbreviations.
3. Connect them together to form a relationship.
![Page 28: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/28.jpg)
Extraction Methods
Natural Language Processing: rule-based.
Based on sentence structure
E.g., for English, a rule can be “noun-verb-noun”
Machine Learning: supervised and unsupervised learning.
Features are detected from the training data
E.g., to extract instances of medical diseases, the system is trained on the symptoms of each given disease.
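As an illustration of the rule-based approach, the “noun-verb-noun” rule can be applied to part-of-speech tagged tokens. This is a sketch under assumptions: input is a list of (token, Penn Treebank tag) pairs, adjacent nouns or verbs are merged into one chunk, and other words such as prepositions are skipped. The function names are illustrative.

```python
def collapse(tagged):
    """Merge consecutive tokens of the same coarse class (noun, verb, other)."""
    def coarse(pos):
        if pos.startswith("NN"):
            return "noun"
        if pos.startswith("VB"):
            return "verb"
        return "other"

    chunks = []
    for token, pos in tagged:
        kind = coarse(pos)
        if chunks and chunks[-1][0] == kind:
            chunks[-1] = (kind, chunks[-1][1] + " " + token)
        else:
            chunks.append((kind, token))
    return chunks


def noun_verb_noun(tagged):
    """Return (noun, verb, noun) triples, skipping 'other' chunks such as prepositions."""
    chunks = [c for c in collapse(tagged) if c[0] != "other"]
    triples = []
    for (k1, t1), (k2, t2), (k3, t3) in zip(chunks, chunks[1:], chunks[2:]):
        if (k1, k2, k3) == ("noun", "verb", "noun"):
            triples.append((t1, t2, t3))
    return triples


example = [("Sachin", "NNP"), ("Tendulkar", "NNP"), ("was", "VBD"),
           ("born", "VBN"), ("in", "IN"), ("Bombay", "NNP")]
print(noun_verb_noun(example))
```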
![Page 29: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/29.jpg)
Extraction Methods (contd.)
Other methods: vocabulary-based systems, context-based clustering.
E.g., maintaining a mapping file of all countries and their nationalities helps to determine the nationality of a person when their birthplace is known.
Hybrid:
NLP-based libraries pre-process the input data; a machine learning approach then extracts the relations, using an external vocabulary such as WordNet.
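The vocabulary-based idea of a country-to-nationality mapping can be sketched with two dictionaries. The entries below are a hypothetical excerpt inlined for brevity, and the city-to-country table is an added assumption needed to get from a birthplace to a country. Birthplace is only a heuristic for nationality, so the result is best treated as a candidate fact.

```python
# Hypothetical mapping "file", inlined here as dicts for brevity.
CITY_TO_COUNTRY = {"Bombay": "India", "Hyderabad": "India"}
COUNTRY_TO_NATIONALITY = {"India": "Indian", "Australia": "Australian"}


def infer_nationality(birth_place):
    """Guess a nationality from a birthplace given as a city or a country."""
    # If the place is not a known city, treat it as a country name.
    country = CITY_TO_COUNTRY.get(birth_place, birth_place)
    return COUNTRY_TO_NATIONALITY.get(country)


print(infer_nationality("Bombay"))
```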
![Page 30: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/30.jpg)
Output generation
![Page 31: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/31.jpg)
Types of output systems
1. Identify all mentions of named entities and their relations.
E.g., from a given corpus, extract all named entity relations.
2. Identify missing relations in a database.
E.g., given a database, extract the missing attributes of given entities from the corpus.
3. Link various entities within a database.
E.g., given a database, link two entities together with some relation extracted from the corpus.
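The second type of output system, filling in missing attributes, can be sketched as follows. The database layout (a dict of entity records with `None` for unknown attributes) and the relation name `born_in` are assumptions made for the example.

```python
def fill_missing(database, triples):
    """Return a copy of `database` with empty attributes filled from extracted triples."""
    filled = {entity: dict(attrs) for entity, attrs in database.items()}
    for subject, relation, obj in triples:
        attrs = filled.get(subject)
        # Only fill attributes that the record declares but leaves empty.
        if attrs is not None and relation in attrs and attrs[relation] is None:
            attrs[relation] = obj
    return filled


database = {"Sachin Tendulkar": {"born_in": None, "type": "Person"}}
triples = [("Sachin Tendulkar", "born_in", "Bombay")]
completed = fill_missing(database, triples)
print(completed)
```

Working on a copy leaves the original records untouched, which makes it easier to review extracted values before accepting them.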
![Page 32: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/32.jpg)
Working of automated extraction systems
Input sources
Extraction system: defining output structures, input pre-processing, extraction methods, output processing
Database of all facts, relations
![Page 33: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/33.jpg)
Issues with automated extraction
Accuracy, running time, dependency
![Page 34: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/34.jpg)
Issue 1: Challenges of language structure
Co-reference resolution
Ambiguous, complex sentences
Abbreviations
Acronyms
![Page 35: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/35.jpg)
See an example…
“Tom called his father last night. They talked for an hour. He said he would be home the next day.”
Who does ‘He’ refer to: Tom or his father?
![Page 36: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/36.jpg)
“You see sir, I can talk English, I can walk English, I can laugh English, I can run English, because English is such a funny language.”
Amitabh in Namak Halal
![Page 37: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/37.jpg)
Issue 2: Accuracy
Named entity detection: 90%; relationship extraction: 50-70%.
Noise is introduced at each step.
E.g., disambiguating the word ‘crane’ with WordNet introduces contextual errors, which then decrease the accuracy of rule-based relationship extraction.
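A rough way to see how noise compounds is to multiply per-stage accuracies. This sketch assumes errors at each stage are independent, which real pipelines do not guarantee. Only the 0.90 entity detection figure comes from the slide; the 0.80 rule accuracy is a made-up value for illustration.

```python
from math import prod


def pipeline_accuracy(stage_accuracies):
    """Probability that every stage is correct, assuming independent errors."""
    return prod(stage_accuracies)


# A relation needs two correctly detected entities (0.90 each, from the slide).
both_entities = pipeline_accuracy([0.90, 0.90])
# Adding a relation rule with a hypothetical accuracy of 0.80.
end_to_end = pipeline_accuracy([0.90, 0.90, 0.80])
print(round(both_entities, 3), round(end_to_end, 3))
```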
![Page 38: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/38.jpg)
Issue 3: Efficiency
Feature detection steps are expensive.
They can require days of computation.
![Page 39: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/39.jpg)
Issue 4: Dependency
On external vocabulary sources, like Wikipedia, WordNet, MindNet etc.
Maintenance and updating of vocabulary sources is manual: costly and requires expertise.
Limited size produces context-based noise
Domain-dependent: medical domain
Corpus-dependent: Wikipedia, news corpus
Relation-specific: Date and Place-of-event
![Page 40: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/40.jpg)
Issue 5: Problem with implicit knowledge extraction
Community knowledge is learned and shared.
No one can be an expert.
Cultural competence and perception of workers are fed into a system as variables.
Cultural Consensus Theory provides models to include such variables into the system.
![Page 41: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/41.jpg)
Can we do better?
Can we seek human intelligence to improve the accuracy of automated techniques?
![Page 44: Knowledge acquisition using automated techniques](https://reader033.fdocuments.us/reader033/viewer/2022061123/54702ce8af79598a778b456e/html5/thumbnails/44.jpg)
Thank you
Questions?