Alexandria ACM SC | Introduction to Natural Language Processing
Transcript of Alexandria ACM SC | Introduction to Natural Language Processing
-
8/22/2019 Alexandria ACM SC | Introduction to Natural Language Processing
Ahmad M. Bakr
Computer and Systems Engineering Department
Faculty of Engineering, Alexandria University, Egypt
Introduction to Natural Language Processing
-
Agenda
Introduction.
Basic text processing techniques.
Information Retrieval.
Sentiment Analysis.
Named Entity Recognition.
Question Answering.
Relation Extraction.
-
Introduction
NLP is a branch of artificial intelligence that deals with analyzing, understanding, and generating the languages that humans use naturally, so that people can interface with computers in those languages.
Natural language processing aims to teach computers to understand the way humans learn and use language.
-
Introduction
Speech processing: get flight information or book a hotel over the phone.
Information extraction: discover the names of people and the events they participate in from a document.
Machine translation: translate a document from one human language into another.
Question answering: find answers to natural language questions in a text collection or database.
Summarization: generate a short biography of Noam Chomsky from one or more news articles.
-
Text Processing
Text processing is the manipulation of text, especially the transformation of text from one format to another.
Usually the transformation is from plain text (a set of paragraphs) to a form that is easy to include in calculations.
The Vector Space Model (VSM) is one of the forms used by applications to represent a document as a vector of its words:
d_j = (w_1, w_2, w_3, ..., w_n)
Each word is assigned a weight (e.g., TF-IDF):
Weight = Term Frequency * 1 / (Document Frequency)
In practice the inverse document frequency is usually computed as log(N / Document Frequency), where N is the number of documents in the collection.
The similarity between two documents can be calculated as the similarity between the vectors of these documents.
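The weighting and similarity steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the talk: it uses the simplified weight TF * 1/DF from the slide, and it assumes cosine similarity as the vector similarity measure, which the slide does not name. The function names and the three sample documents are made up for the example.

```python
import math
from collections import Counter

def tf_idf_vectors(documents):
    """Return one {word: weight} dict per document, with weight = TF * 1/DF."""
    tokenized = [doc.lower().split() for doc in documents]
    # Document frequency: the number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # term frequency inside this document
        vectors.append({word: count / df[word] for word, count in tf.items()})
    return vectors

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts.

    Cosine is an assumption here; the slide only says "similarity".
    """
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical sample documents.
docs = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "stock prices fell sharply",
]
vectors = tf_idf_vectors(docs)
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
print(sim_related, sim_unrelated)
```

Documents that share words get a similarity above zero, while documents with no words in common get exactly zero.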
-
Information Retrieval
Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources.
-
Information Retrieval
Information is usually indexed to speed up queries.
The inverted index is one of the primary ways to index text based on its words: it maps each word to the documents that contain it.
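A minimal sketch of an inverted index, assuming whitespace tokenization and treating a multi-word query as "all words must appear". The function names and sample documents are illustrative and not taken from the slides.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical sample documents.
docs = [
    "natural language processing",
    "information retrieval systems",
    "language models for retrieval",
]
index = build_inverted_index(docs)
print(search(index, "language"))
print(search(index, "language retrieval"))
```

A query only touches the posting sets of its own words, so there is no need to scan every document.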
-
Information Retrieval
Can we use an inverted index to search for the sentence "A B C"?
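One way to explore this question is the small experiment below. It assumes a word-level inverted index that stores only document ids; the documents and the helper `contains_phrase` are made up for the illustration. Intersecting the posting sets finds every document that contains the three words, but it cannot tell "a b c" from "c b a", so word order has to be checked by some other means.

```python
from collections import defaultdict

# Hypothetical documents: only the first contains the phrase "a b c".
docs = ["a b c", "c b a", "a b"]

# Word-level inverted index: word -> ids of the documents containing it.
index = defaultdict(set)
for doc_id, text in enumerate(docs):
    for word in text.split():
        index[word].add(doc_id)

def contains_phrase(tokens, phrase):
    """True if the list `phrase` occurs as a contiguous run inside `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

phrase = ["a", "b", "c"]

# The index alone only says which documents contain all three words.
candidates = index["a"] & index["b"] & index["c"]

# Word order must be verified separately, here by rescanning each candidate.
exact = {d for d in candidates if contains_phrase(docs[d].split(), phrase)}
print(candidates, exact)
```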
-
Information Retrieval
Document Index Graph
-
Sentiment Analysis
Sentiment analysis, or opinion mining, refers to the application of natural language processing, computational linguistics, and text analytics to identify and extract subjective information in source materials.
-
Sentiment Analysis
Techniques:
Maintaining a list of words for each class.
Example: "This is a nice movie", "This is a bad movie".
Using classifiers that are trained on sentences from each class.
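The first technique, a word list per class, can be sketched as follows. The word lists are tiny hypothetical lexicons chosen for the example, and the "neutral" fallback is an assumption for sentences that match neither list.

```python
# Hypothetical word lists; a real lexicon would be much larger.
POSITIVE_WORDS = {"nice", "good", "great", "excellent"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "awful"}

def classify_sentiment(sentence):
    """Label a sentence by counting the words it shares with each class list."""
    tokens = [token.strip(".,!?") for token in sentence.lower().split()]
    positive_hits = sum(token in POSITIVE_WORDS for token in tokens)
    negative_hits = sum(token in NEGATIVE_WORDS for token in tokens)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"  # assumed fallback when neither class dominates

print(classify_sentiment("This is a nice movie"))
print(classify_sentiment("This is a bad movie"))
```

A word list cannot handle negation such as "not nice", which is one reason to prefer the trained classifiers of the second technique.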
-
Named Entity Recognition
NER is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, and locations, expressions of time, quantities, monetary values, percentages, etc.
-
Named Entity Recognition
Approaches:
Database-based recognition (e.g., WordNet)
Rule-based models
Statistical models (e.g., HMM and Maximum Entropy)
-
Named Entity Recognition
Wikipedia-based NER
-
Named Entity Recognition
Wikipedia-based NER
Index all page titles.
Two-phase algorithm:
Phase one: given a text, search for all titles that occur in it.
Phase two: score the candidate titles.
What factors should the scoring formula consider?
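The two phases can be sketched as below. Everything concrete in the sketch is an assumption made for illustration: `TITLES` is a tiny stand-in for an index of Wikipedia page titles, and the scoring function simply prefers longer titles. The slide leaves the scoring formula as an open question, so this placeholder is not the answer intended by the talk.

```python
# Hypothetical stand-in for an index of Wikipedia page titles.
TITLES = {"alexandria", "alexandria university", "university", "egypt"}

def find_candidates(text, titles, max_len=3):
    """Phase one: collect every word n-gram of the text that is an indexed title."""
    tokens = text.lower().split()
    candidates = []
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(tokens):
                break
            phrase = " ".join(tokens[start:end])
            if phrase in titles:
                candidates.append((start, end, phrase))
    return candidates

def score(candidate):
    """Phase two (placeholder): longer titles score higher. This is an assumption."""
    start, end, _phrase = candidate
    return end - start

def resolve(candidates):
    """Keep the best-scoring candidates whose token spans do not overlap."""
    chosen = []
    for cand in sorted(candidates, key=score, reverse=True):
        if all(cand[1] <= c[0] or cand[0] >= c[1] for c in chosen):
            chosen.append(cand)
    return sorted(chosen)

text = "He studied at Alexandria University in Egypt"
candidates = find_candidates(text, TITLES)
entities = [phrase for _start, _end, phrase in resolve(candidates)]
print(entities)
```

With this placeholder score, "alexandria university" wins over the shorter overlapping titles "alexandria" and "university". Other factors a scoring formula could weigh include capitalization in the original text and the surrounding context.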
-
Question Answering
What is Question Answering?
QA is a computer science discipline within the fields of information retrieval and natural language processing (NLP). It is concerned with building systems that automatically answer questions posed by humans in a natural language.
-
Question Answering
A QA implementation, usually a computer program, may construct its answers by querying a structured database of knowledge or information, usually a knowledge base. More commonly, QA systems can pull answers from an unstructured collection of natural language documents.
-
Question Answering
-
Question Answering
Question Classification
The question classifier module determines the type of the question and the type of the expected answer.
Examples:
1) "Who discovered x-rays?" should be classified into the type human (individual).
2) "Where is Alexandria located?" should be classified into the type place.
Rule-based approaches.
Using classifiers trained on the possible question types.
The question is put in the form of a parse tree to capture the relationships between its entities (i.e., subjects, objects, etc.).
The main purpose of the parse tree is to understand the question and the links between its entities.
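A rule-based question classifier can be as simple as the sketch below, which looks only at the leading question word. The first two rules come from the examples above; the "when" and "how many" rules and the "unknown" fallback are assumptions added for illustration. The sketch does not build a parse tree.

```python
# (question prefix, answer type) pairs, checked in order.
RULES = [
    ("who", "human (individual)"),  # from the example above
    ("where", "place"),             # from the example above
    ("when", "time"),               # assumed extra rule
    ("how many", "quantity"),       # assumed extra rule
]

def classify_question(question):
    """Return the expected answer type based on how the question starts."""
    normalized = question.lower().strip()
    for prefix, answer_type in RULES:
        if normalized.startswith(prefix + " "):
            return answer_type
    return "unknown"  # assumed fallback when no rule matches

print(classify_question("Who discovered x-rays?"))
print(classify_question("Where is Alexandria located?"))
```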
-
Question Answering
Query Formulation
Apply text processing techniques to form a query from the question.
Techniques such as:
Stemming (Swimming -> Swim)
Adding synonyms (USA -> United States of America)
Giving weights to the words of the question (nouns take higher weights)
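The three techniques can be combined into a small query builder, sketched below under several assumptions. The stemmer is a deliberately crude suffix stripper that handles the slide's example but mangles other words (for instance, it turns "calling" into "cal"). The stopword, noun, and synonym tables are hypothetical hand-written stand-ins for a real stopword list, part-of-speech tagger, and thesaurus. The weights 2.0 and 1.0 are arbitrary.

```python
# Hypothetical lookup tables standing in for real linguistic resources.
STOPWORDS = {"who", "what", "where", "is", "the", "for", "of", "in", "a"}
NOUNS = {"swimming", "medals", "usa"}
SYNONYMS = {"usa": ["united states of america"]}

def naive_stem(word):
    """Strip "ing" and undo a doubled final consonant (swimming -> swim)."""
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        if len(word) >= 2 and word[-1] == word[-2]:
            word = word[:-1]
    return word

def formulate_query(question):
    """Turn a question into a {term: weight} query."""
    query = {}
    for token in question.lower().rstrip("?").split():
        if token in STOPWORDS:
            continue
        weight = 2.0 if token in NOUNS else 1.0  # nouns take higher weights
        query[naive_stem(token)] = weight
        for synonym in SYNONYMS.get(token, []):
            query[synonym] = weight              # synonyms inherit the weight
    return query

query = formulate_query("Who won swimming medals for the USA?")
print(query)
```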
-
Question Answering
Search the knowledge base
The main target is to identify the paragraphs that possibly contain answers to the user's question.
The knowledge base is usually indexed.
Answer Extraction
Parse the candidate paragraphs to extract sentences with possible answers.
Construct the parse tree of the matched sentences.
The parse tree gives insights about the relationships between the entities of a candidate sentence.
Rank the possible answers based on their relevance to the question.
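The final ranking step can be illustrated with a much simpler relevance measure than the parse-tree analysis described above. This sketch assumes a weighted query given as a {term: weight} dict and scores each candidate sentence by the total weight of the query terms it contains. It does not build parse trees, and the query and sentences are made up for the example.

```python
def rank_sentences(query_weights, sentences):
    """Return (score, sentence) pairs sorted from most to least relevant.

    Relevance is the summed weight of the query terms found in the sentence,
    a simple stand-in for the parse-tree-based comparison in the slides.
    """
    scored = []
    for sentence in sentences:
        tokens = set(sentence.lower().replace(".", "").split())
        score = sum(weight for term, weight in query_weights.items()
                    if term in tokens)
        scored.append((score, sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored

# Hypothetical weighted query for the question "Who discovered x-rays?".
query_weights = {"discovered": 1.0, "x-rays": 2.0}
candidate_sentences = [
    "Alexandria is a city in Egypt.",
    "X-rays are used in medical imaging.",
    "Wilhelm Roentgen discovered x-rays in 1895.",
]
ranked = rank_sentences(query_weights, candidate_sentences)
for score, sentence in ranked:
    print(score, sentence)
```

The sentence that matches both query terms ranks first. A full system would then use the expected answer type (here, a person) to pull the answer out of that sentence.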