NLP
Transcript of NLP
NLP• Natural language processing– Combines AI and linguistics– A component in HCI in that we want a more “human” ability to
communicate with computers• Primarily revolves around– NLU – natural language understanding– NLG – natural language generation
• But also encompasses– Automated summarization/classification of articles and
information extraction– Machine translation– Question answering– Information retrieval– And to lesser extents, speech recognition and optical character
recognition
Understanding Language• NLU is not merely a matter of mapping words to
meanings, instead we need to• stemming/morphological segmentation• part of speech (POS) tagging (identifying grammatical role for
given word)• syntactic parsing• word sense disambiguation (WSD) (identifying the proper
meaning for the words in a sentence)• named entity recognition• identifying the underlying meaning of a sentence• apply context (previous sentences) to understanding of current
sentence• resolve references within and across sentences• apply worldly knowledge (discourse, pragmatics)• represent the meaning in some useful/operable way
NLU Problems• Words are ambiguous (different grammatical roles, different
meanings)• Sentences can be vague because of the use of references (“it
happened again”), and the assumption of worldly knowledge (“can you answer the phone” is not meant as a yes/no question)
• The same statement can have different meanings– “where is the water?”
– to a plumber, we might be referring to a leak– to a chemist we might be referring to pure water – to a thirsty person, we might be referring to potable water
• NLU reasoner may never be complete– new words are added, words take on new meanings, new expressions
are created (e.g., “my bad”, “snap”)• There are many ways to convey one meaning
Fun Headlines• Hospitals are Sued by 7 Foot Doctors• Astronaut Takes Blame for Gas in Spacecraft• New Study of Obesity Looks for Larger Test Group• Chef Throws His Heart into Helping Feed Needy• Include your Children when Baking Cookies• Iraqi Head Seeks Arms• Juvenile Court to Try Shooting Defendant• Kids Make Nutritious Snacks• British Left Waffles on Falkland Islands• Red Tape Holds Up New Bridges• Clinton Wins on Budget, but More Lies Ahead• Ban on Nude Dancing on Governor’s Desk
Ways to Not Solve This Problem• Simple machine translation– we do not want to perform a one-to-one mapping of
words in a sentence to components of a representation• this approach was tried in the 1960s with language translation
from Russian to English– “the spirit is willing but the flesh is weak” → “the vodka is good but
the meat is rotten”– “out of sight out of mind” → “blind idiot”
• Use dictionary meanings– we cannot derive a meaning by just combining the
dictionary meanings of words together– similar to the above, concentrating on individual word
translation or meaning is not the same as full statement understanding
What Is Needed to Solve the Problem• Since language is (so far) only used between humans, language
use can take advantage of the large amounts of knowledge that any person might have– thus, to solve NLU, we need access to a great deal and large variety of
knowledge• Language understanding includes recognizing many forms of
patterns– combining prefix, suffix and root into words– identifying grammatical categories for words– identifying proper meanings for words– identifying references from previous messages– identifying worldly context (pragmatics)
• Language use implies intention– we have to also be able to identify the message’s context and often,
communication is intention based• “do you know what time it is?” should not be answered with yes or no
Restricted Domains• Early attempts at NLU limited the dialog to a specific
domain with reduced vocabulary and syntactic structures– LUNAR – a front end to a database on lunar rocks– SABRE – reservation system (uses a speech recognition front
end and a database backend)– SHRDLU – a blocks world system that permitted NLU input
for commands and questions• what is sitting on the red block?• what shapes is the blue block on the table?• place the green pyramid on the red brick
• Aside from the reduced complexity with limited vocabulary/syntax, we can also derive a useful representation for the domain• in general though, what representation do we use?
MARGIE, SAM & PAM• In the 70s, Roger Schank presented his CD theory as an
underlying representation for language• MARGIE would input words and build a structure from
each sentence– This structure is composed almost entirely from keywords and
not from syntax, pulling up case frames from a case grammar for a given word (we cover case grammars later)
– The structure would give MARGIE a memory for question answering and prediction
• SAM would map sentences onto underlying scripts for storage, to reason about typical actions and activities
• PAM would take input from a story and store input sentences in CDs to reason over the story plot and characters
NLU Through Mapping• The typical NLU solution is by mapping from primitive
components of language up through worldly knowledge (similarly to SR mappings)– prosody – intonation/rhythm of an utterance– phonology – identifying and combining speech sounds into
phonemes/syllables/words– morphology – breaking words into root, prefix and suffix– syntax – identifying grammatical roles of words, grammatical
categories of clauses– semantics – applying or identifying meaning for each word, each
phrase, the sentence, and beyond– discourse/pragmatics – taking into account references, types of speech,
speech acts, beliefs, etc– world knowledge – understanding the statement within the context of
the domain• the first two only apply to speech recognition
• Many approaches to each mapping
The Process Pictorially
Morphology• In many languages, we can gain knowledge about a
word by looking at the prefix and suffix attached to the root, for instance in English:– an ‘s’ usually indicates plural, which means the word is a
noun– adding ‘-ed’ makes a verb past tense, so words ending in ‘ed’
are often verbs– we add ‘-ing’ to verbs– we add de-, non-, im-, or in- to words
• Many other languages have similar clues to the word’s POS through the prefix/suffix
• Morphology by itself is insufficient to tag a word’s POS, morphology provides additional clues for both POS and the semantic analysis of the word
Morphological Analysis• Two basic processes– stemming – breaking the word down to its root by
simple removal of a prefix or suffix• may not be easy to do as some words have letters that are the
same as a prefix or suffix but are not, such as defense (de- is not a prefix) as opposed to decrease or dehumidify
• often used when the suffix/prefix is not needed such as when doing a keyword search where only the root is desired
– lemmatization – obtaining the root (known as the lemma) of the word through a more proper morphological analysis of the word (combined with knowledge of the vocabulary)• is, are, am → be
• There are many approaches for both processes
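To make the stemming/lemmatization distinction concrete, here is a minimal Python sketch; the suffix list, minimum-stem length, and tiny lemma table are illustrative assumptions, not a real stemmer such as Porter's:

```python
# Crude suffix-stripping stemmer: no vocabulary knowledge, so it will
# happily produce non-words such as "bounced" -> "bounc".
SUFFIXES = ["ing", "ed", "s"]          # checked longest-first

def stem(word):
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[:-len(suf)]    # strip the suffix, keep the rest
    return word

# Lemmatization needs vocabulary knowledge: irregular forms are looked up
# in a (toy) table first, then we fall back to suffix stripping.
LEMMAS = {"is": "be", "are": "be", "am": "be", "went": "go"}

def lemmatize(word):
    return LEMMAS.get(word, stem(word))

print(stem("walking"))     # walk
print(lemmatize("are"))    # be
```

The minimum-stem-length check is one simple guard against the false-prefix/suffix problem the slide mentions, but a real system needs dictionary knowledge to avoid it reliably.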
Approaches• Dictionary lookup – store all word stems, prefixes and
suffixes – most words have no more than 4 prefix/suffixes so our
dictionary is not increased by more than a factor of 4• Translate the vocabulary (including history of words)
into a finite-state transducer– follow the FST to a terminal node based on matching letters
• Hand-coded rules – which will combine stems + prefix/suffixes with the location
of the word in a sentence (its POS, surrounding words)• Statistical approaches (trained using supervised
learning)
Syntactic Analysis• Given a sentence, our first task is to determine the grammatical
roles of each word of the sentence– alternatively, we want to identify if the sentence is syntactically correct or
incorrect• The process is one of parsing the sentence and breaking the
components into categories and subcategories– “The big red ball bounced high”– break this into noun phrase and verb phrase, break the noun phrase into
article, adjective(s), noun, etc• Generate a parse tree of the parse• Syntactic parsing is computationally complex because words can
take on multiple roles– we generally tackle this problem in a bottom-up manner (start with the
words) but an alternative is top-down where we start with the grammar and use it to generate the sentence
– both forms will result in our parse tree
POS Tagging• Before we do a syntactic parse, we must identify the
grammatical role (POS) of each word• As many words can take on multiple roles, we need to
use some form of context to fully identify each word’s POS– for instance, “can” has roles as
• a noun (a container)• an auxiliary verb (as in “I can climb that mountain”)• a verb (as in “We canned the sliced peaches”)• it can also be part of a proper noun (the dance, the can-can)• how about the sentence: “We can can the can”
– we might try to generate all possible combinations of tags and select the best grouping (most logical grouping) or use some statistical or rule-based means
Rule-based POS• The oldest approach and possibly the one that will
lead to the most accurate results• But also the approach that takes the most effort– Start with a dictionary– Specify rules based on an analysis of linguistic features
of the given word• a rule will contain conditions to test surrounding words
(context) to determine a word’s POS• some rules may be word-specific, others can be generic• example: if a word can be a noun or verb and follows “the”
then select noun
• Rules can also be modified or learned using supervised learning techniques
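The example rule from the slide can be coded directly; the lexicon entries and the alphabetical fallback below are toy assumptions, not a real rule set:

```python
# Toy lexicon: word -> set of possible POS tags (assumed entries).
LEXICON = {"the": {"det"}, "can": {"noun", "verb"}, "run": {"noun", "verb"}}

def rule_based_tag(words):
    """Apply the slide's example rule: if a word can be a noun or a verb
    and it follows "the", select noun. Otherwise fall back to an
    arbitrary (alphabetically first) tag from the lexicon."""
    tags = []
    for i, w in enumerate(words):
        possible = LEXICON.get(w, {"unknown"})
        if possible == {"noun", "verb"} and i > 0 and words[i - 1] == "the":
            tags.append("noun")
        else:
            tags.append(sorted(possible)[0])
    return tags

print(rule_based_tag(["the", "can"]))   # ['det', 'noun']
```

A production rule-based tagger would have hundreds of such context conditions, some word-specific and some generic, as the slide describes.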
POS by Transformation (Brill tagging)• First, an initial POS selection is made for each word in the
sentence (perhaps the most common role for a given word, or the first one in the list or even random)
• Second, transformational rules are applied to correct tags that fail because of some conditional test such as “change from modal to noun if previous tag is a determiner”– we can also pre-tag any word that has 100% certainty (e.g., a word that
has only 1 grammatical role, or a word already tagged by a human)• Supervised learning can be used to derive or expand on the rules
(otherwise, rules must be hand-coded as with the rule-based approach) although unsupervised learning can also be applied but this leads to a reduced accuracy– aside from high accuracy, this approach does not risk overfitting the
data that stochastic/HMM approaches might, it is also easier to interpret the output over stochastic/HMM
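The two Brill stages (initial assignment, then correction rules) can be sketched in a few lines; the tag inventory, the "noun" fallback, and the single rule are toy assumptions:

```python
def brill_tag(words, most_common, rules):
    """Transformation-based (Brill) tagging sketch: start from each word's
    most common tag, then apply correction rules of the form
    (from_tag, to_tag, required_previous_tag)."""
    tags = [most_common.get(w, "noun") for w in words]   # initial guess
    for frm, to, prev in rules:
        for i in range(1, len(tags)):
            if tags[i] == frm and tags[i - 1] == prev:
                tags[i] = to                              # apply correction
    return tags

# Toy data: suppose "can" is most often a modal, but the rule
# "change from modal to noun if previous tag is a determiner" fixes it.
most_common = {"the": "det", "can": "modal"}
rules = [("modal", "noun", "det")]
print(brill_tag(["the", "can"], most_common, rules))   # ['det', 'noun']
```

In the real algorithm the rules themselves are learned from a corpus by greedily picking whichever transformation repairs the most errors.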
Statistical POS• The most common POS approach is the HMM
– the model consists of possible sequences of grammatical components for the given word sequence • e.g., Det – Adj – N – aux – V – Prep – Det – Adj – N
• A corpus is used to train the transition probabilities – we are interested in sequences longer than trigrams because grammatical
context can carry well beyond just 3 words but most HMM approaches are limited to bigrams and trigrams
• Emission probabilities are the likelihood that a word will take on a particular role– these probabilities must be generated through supervised learning (marked
up corpus)• although unsupervised learning may also provide reasonable probabilities using the
E-M algorithm
• Notice the independence assumption – the HMM does not take into account such decisions as two nouns in a row or 10 adjectives in a row
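A bigram-HMM tagger reduces to the Viterbi algorithm over transition and emission tables; every probability below is an invented toy value, not trained from a corpus:

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Bigram-HMM POS tagging: find the most likely tag sequence.
    Unknown words get a tiny emission probability instead of zero."""
    V = [{t: math.log(start_p[t]) + math.log(emit_p[t].get(words[0], 1e-9))
          for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        V.append({}); back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: V[i - 1][p] + math.log(trans_p[p][t]))
            V[i][t] = (V[i - 1][prev] + math.log(trans_p[prev][t])
                       + math.log(emit_p[t].get(words[i], 1e-9)))
            back[i][t] = prev
    seq = [max(tags, key=lambda t: V[-1][t])]     # best final tag
    for i in range(len(words) - 1, 0, -1):        # follow backpointers
        seq.append(back[i][seq[-1]])
    return list(reversed(seq))

TAGS = ["det", "noun", "verb"]
START = {"det": 0.8, "noun": 0.1, "verb": 0.1}
TRANS = {"det":  {"det": 0.05, "noun": 0.9, "verb": 0.05},
         "noun": {"det": 0.1,  "noun": 0.3, "verb": 0.6},
         "verb": {"det": 0.5,  "noun": 0.3, "verb": 0.2}}
EMIT = {"det": {"the": 1.0},
        "noun": {"can": 0.5, "ball": 0.5},
        "verb": {"can": 0.3, "runs": 0.7}}

print(viterbi(["the", "can"], TAGS, START, TRANS, EMIT))  # ['det', 'noun']
```

Note how the independence assumption shows up in the code: each step consults only the previous tag, so nothing stops the model from emitting ten adjectives in a row.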
Maximum Entropy POS• The drawbacks of the HMM (no context in terms of
history of POSs in the sentence and only 2-3 word transition probabilities) can be resolved by adding a history– the ME approach adds context
• Templates (much like the transformational approach with rules) are predefined where discriminating features are learned – the approach is to compute the most probable path
through the best matching templates (maximize the “entropy”) under some constraints
– features for a template typically consider a window of up to 2 words behind and 2 words ahead of the current word
Other Supervised Learning Methods• SVMs – train an SVM for a given grammatical role, use a
collection of SVMs and vote, resistant to overfitting unlike stochastic approaches
• NNs/Perceptrons – require less training data than HMMs and are computationally quick
• Nearest-neighbor on trained classifiers• Fuzzy set taggers – use fuzzy membership functions for
the POS for a given word and a series of rules to compute the most likely tags for the words
• Ontologies – we can draw information from ontologies to provide clues (such as word origins or unusual uses of a word) – useful when our knowledge is incomplete or some words are unknown (not in the dictionary/rules/HMM)
Unsupervised POS• This is the most challenging approach because we
must learn grammatical roles with little or no marked up data sets– But this is the most convenient because marking up a
data set for supervised learning is very time consuming• One approach is to use a small marked up data set
for initial training and then bootstrap up through unsupervised training that clusters around the concepts learned in the marked up data– Approaches include neural networks, rule induction,
data mining-like clustering, decision trees and Bayesian approaches
Syntactic Parsing• With all words tagged, we then put together the sentence
structure through parsing– if POS has selected 1 tag per word, we still have variability
with the parse of the sentence– consider: The man saw the boy with a telescope
• the prepositional phrase “with a telescope” could modify “saw” (how the man saw the boy) or “the boy” (he saw the boy who has or owns a telescope)
– Put the block in the box on the table• does “in the box” modify the block or “on the table”?
• As with POS, there are many approaches to parsing– the result of parsing is the grouping of words into larger
constituent groups, which are hierarchical so this forms a parse tree
Parse Tree Example• A parse tree for a
simple sentence is shown to the left– notice how the NP
category can be in multiple places
– similarly, a NP or a VP might contain a PP, which itself will contain a NP
• Our parsing algorithm must accommodate this by recursion
Context-free Grammars (CFG)• A formal way to describe the syntactic forms for legal
sentences in a language– the CFG is defined as G=(S, N, S, R) where S is the start state,
R are a set of production rules that map nonterminal symbols (N) into other nonterminal symbols and terminal symbols (S)
– rules are “rewrite rules” which rewrite a more general set of symbols to a more specific set• for instance, NP Det Adj* Noun and Det the | a(n)
• A parse tree for a sentence denotes the mappings of the nonterminal symbols through the rules selected into nonterminal/terminal symbols
• CFGs can be used to build parsers (to perform syntactic analysis and derive parse trees)
Parsing by Dynamic Programming• Also known as chart parsing, which can be top-down or bottom-
up in nature depending on the order of “prediction” and “scan”• Parse left to right in the sentence by selecting a rule in the
grammar that matches the current word’s POS• Apply the rule and keep track of where we are with a dot
(initial, middle, end/complete)– the chart is a data structure, a simple table that is filled in as
processing occurs, using dynamic programming• The chart parsing algorithm consists of three parts:
– prediction: select a rule whose LHS matches the current state, this triggers a new row in the chart
– scan: apply the rule and match it against the sentence to see if we are using an appropriate rule
– complete: once we reach the end of a rule, we complete the given row and return recursively
Example• Simple example of the sentence “Mary runs”• Processing through the grammar:
– S → . N V predict: N V– N → . mary predict: mary– N → mary . scanned: mary – S → N . V completed: N; predict: V – V → . runs predict: runs– V → runs . scanned: runs – S → N V . completed: V, completed: S
• The chart:– S0: [($ --> . S), start
(S --> . Noun Verb)] predictor– S1: [(Noun --> mary .), scanner
(S --> Noun . Verb)] completer– S2: [(Verb --> runs .), scanner
(S --> Noun Verb .), completer
($ --> S .)] completer
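The predict/scan/complete cycle for this tiny grammar can be run mechanically; the sketch below is a compact Earley-style chart parser that only reports acceptance (it does not build the parse tree), with the grammar encoding assumed:

```python
# Grammar for "mary runs": S -> N V ; N -> mary ; V -> runs
GRAMMAR = {
    "S": [["N", "V"]],
    "N": [["mary"]],
    "V": [["runs"]],
}

def earley(words, start="S"):
    # A state is (lhs, rhs, dot, origin); chart[i] holds states at position i.
    chart = [[] for _ in range(len(words) + 1)]
    def add(i, state):
        if state not in chart[i]:
            chart[i].append(state)
    add(0, ("$", (start,), 0, 0))                  # dummy start state
    for i in range(len(words) + 1):
        for lhs, rhs, dot, origin in chart[i]:     # list grows as we iterate
            if dot < len(rhs) and rhs[dot] in GRAMMAR:         # predict
                for prod in GRAMMAR[rhs[dot]]:
                    add(i, (rhs[dot], tuple(prod), 0, i))
            elif dot < len(rhs):                               # scan
                if i < len(words) and words[i] == rhs[dot]:
                    add(i + 1, (lhs, rhs, dot + 1, origin))
            else:                                              # complete
                for l2, r2, d2, o2 in chart[origin]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(i, (l2, r2, d2 + 1, o2))
    return any(s == ("$", (start,), 1, 0) for s in chart[len(words)])

print(earley(["mary", "runs"]))   # True
print(earley(["runs", "mary"]))   # False
```

The completed dummy state in the final chart column plays the role of the `($ --> S .)` row in the worked example above.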
Parsing by TNs• A transition network is a finite state automata whose edges are
grammatical classifications• A recursive transition network is the same, but can be
recursive– we use the RTN because of the recursive nature of natural
languages• Given a grammar, we can automatically generate an RTN by
just “unfolding” rules that have the same LHS non-terminal into a single graph
• Use the RTN by starting with a sentence and following the edge that matches the grammatical role of the current word in our parse – this is a bottom-up parsing– we have a successful parse if we reach a state that is a terminating
state– since we traverse the RTN recursively, if we get stuck in a deadend,
we have to backtrack and try another route
Example Grammar and RTN
S → NP VP
S → NP Aux VP
NP → NP1 Adv | Adv NP1
NP1 → Det N | Det Adj N | Pron | That S
N → Noun | Noun Rel
etc…
RTN Output• The parse tree below shows the decomposition
of a sentence S (John hit the ball) into constituents and those constituents into further constituents until we reach the leaves (words)– the actual output of an RTN parser is a nested chain
of constituents and words, generated from the recursive descent through the chart parsing or RTN
[S [NP (N John)]
   [VP [V hit]
       [NP (Det the) (N ball)]]]
Augmented Transition Networks• The RTN only provides the constituent hierarchy but
while parsing, we could potentially obtain useful information for semantic analysis– we can augment each of the RTN links to have code that
annotates constituent elements with more information such as • is the NP plural?• what is the verb’s tense?• what might a reference refer to?
• We use objects to describe each word where objects have additional variables to denote singular/plural, tense, root (lemma), possibly prefix/suffix information, etc (see the next slide)
• This is an ATN, which makes the transition to semantic analysis somewhat easier
ATN Example
• Each word is tagged by the ATN to include its part of speech (lowest level constituent) along with other information, perhaps obtained through morphological analysis
An ATN Generated Parse Tree
Statistical Parsing• The parsing model consists of two parts– Generative component to generate the constituent layout for
the sentence• This may create a parse tree or a dependency structure (see below)
where we annotate words by their dependencies between each other (these are functional roles which are not quite the same as grammatical roles)
– Evaluative component to rank the output of the generative component• the evaluator may or may not provide probabilities but all it needs
to do is rank all of the outputs
Probabilistic CFGs• Here, the generator is a CFG as we had with the non-
probabilistic approaches• The evaluator computes for each CFG rule its likelihood
and each possible parse for the sentence is merely the probability of applying the sequence of rules that caused that parse to occur– We need training data to acquire the probabilities for each rule
(although unsupervised training is also possible but less accurate)
This approach assumes independence of rules, which is not true, and so accuracy can suffer
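The evaluator side of a PCFG is just a product over rule probabilities; the rule table below uses invented values, as might be estimated from a treebank:

```python
import math

# Hypothetical rule probabilities (assumed values, not trained).
RULE_PROB = {
    ("S",  ("NP", "VP")):    1.0,
    ("NP", ("Det", "Noun")): 0.6,
    ("NP", ("Noun",)):       0.4,
    ("VP", ("V", "NP")):     0.7,
    ("VP", ("V",)):          0.3,
}

def parse_prob(rules):
    """Probability of a parse = product of its rule probabilities;
    summed in log space to avoid underflow on long sentences."""
    return math.exp(sum(math.log(RULE_PROB[r]) for r in rules))

# "John hit the ball": S -> NP VP, NP -> Noun, VP -> V NP, NP -> Det Noun
p = parse_prob([("S", ("NP", "VP")), ("NP", ("Noun",)),
                ("VP", ("V", "NP")), ("NP", ("Det", "Noun"))])
print(round(p, 3))   # 0.168
```

Because each rule's probability is multiplied in independently of where it fires, this code exhibits exactly the independence assumption criticized above.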
History-Based Model• Here, we compute the probability of a sequence of tags
– This uses a history that includes the probability of a sequence of tags both before the given word and after
– Example: Last week Marks bought Brooks.
Note that this approach does not constrain the generative model to fit an actual grammar (that is, sentences do not have to be grammatically correct)
We will need a great deal of data to compute the needed probabilities and since it is unlikely that we will have such a detailed set, we will have to smooth the data that we do have to fit this model
PCFG Transformations• The main drawback of PCFG is not retaining a history• We can annotate a generated constituent by specifying
what non-terminal led to this particular item such as – NP^S (S → NP …) – NP^VP (VP → Aux V NP)
• notice that we are not encoding the whole history like the previous approach but we also need far fewer probabilities here (bigrams)
• Another approach is to provide more finely detailed grammar rules (have more categories) whereby each rule only maps to 1 or 2 items terminal/non-terminals on the right hand side– again limiting the number of probabilities by using bi and
trigrams
Discriminative Models• Here we compute the probability of an entire parse as a whole,
that is P(parse | words)– a local model attempts to find the most probable parse by
concentrating on the most probable parse of each word (thus, a local discrimination)• this uses a history based approach like those we just talked about except that
we do not need exact (or precise) probabilities since we are ranking our choices
– a global model computes the probability of each parse and selects the largest one• this approach has the advantage of being able to incorporate any type of
language feature we like such as specialized rules that are not available in the dataset
• the biggest disadvantage is its computational complexity
• With this approach, we are not tying together the generative and evaluation models so we can use any form of generative model
Semantic Analysis• Now that we have parsed the sentence, how do we
ascribe a meaning to the sentence?– the first step is to determine the meaning of each word
(WSD)– next, we attempt to combine the word meanings into some
representation that conveys the meaning of the utterance• this second step is made easier if our target representation is a
command such as a database query (as found in LUNAR) or an OS command (as found in DWIM – do what I mean)• in general though, this becomes very challenging
– what form of representation should the sentence be stored in? – how do we disambiguate when words have multiple meanings?– how do we handle references to previous sentences?– what if the sentence should not be taken literally?
Word Sense Disambiguation• Even if we have a successful parse, a word might
have multiple meanings that could alter our interpretation– Consider the word tank
• a vat/container (noun)• a transparent receptacle for fish (noun)• a vehicle/weapon (noun)• a jail cell (noun)• to fill (as in a car tank) (verb)• to drink to the point of being drunk (verb)• to lose a game on purpose (verb)
–We also have idiomatic meanings when applied with other words like tank top and think tank
Semantic Grammars• In a restricted domain and restricted grammar, we
might combine the syntactic parsing with words in the lexicon– this allows us not only to find the grammatical roles of the
words but also their meanings• the RHS of our rules could be the target representations
rather than an intermediate representation like a parse• S → I want to ACTION OBJECT | ACTION OBJECT | please
ACTION OBJECT• ACTION → print | save | …• print → lp• OBJECT → filename | programname | …• filename → get_lexical_name( )
• This approach is not useful in a general NLU case
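A semantic grammar of this kind maps straight to code, since the right-hand sides are the target commands themselves; `print → lp` follows the slide's rules, while `save → cp` and the parsing strategy are assumed for illustration:

```python
# Target representation per ACTION; "save -> cp" is an assumed extension.
ACTIONS = {"print": "lp", "save": "cp"}

def interpret(sentence):
    """S -> 'I want to' ACTION OBJECT | ACTION OBJECT | 'please' ACTION OBJECT.
    Returns the target (shell-like) command, or None if no rule matches."""
    words = sentence.lower().split()
    if words[:3] == ["i", "want", "to"]:        # strip the carrier phrase
        words = words[3:]
    elif words[:1] == ["please"]:
        words = words[1:]
    if len(words) == 2 and words[0] in ACTIONS:
        return f"{ACTIONS[words[0]]} {words[1]}"
    return None

print(interpret("I want to print report.txt"))   # lp report.txt
print(interpret("where is the water?"))          # None
```

The second call shows why this is domain-bound: anything outside the tiny grammar simply fails, which is acceptable for a command front end but not for general NLU.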
Word Sense Disambiguation• We need to seek clues in the use of the word to help
figure out its word sense– Consider the word plant as a noun:
manufacturing/processing versus life form• since both are nouns, knowing POS is not enough• an adjective may or may not help: nuclear plant versus tropical
plant but if the nuclear plant is in the tropics then we might be misled
• on the other hand, knowing that one sense of the word means a living thing while the other sense is a structure (building) or is used in manufacturing or processing might help
– How much of the sentence (and preceding sentences) might we need to examine before we obtain this word’s sense?
Features for Word Sense Disambiguation• To determine a word’s sense, we look at the word’s POS,
the surrounding words’ POS’ and what those words are• Statistical analysis can help tie a word to a meaning– “Pesticide” immediately preceding plant indicates a
processing/manufacturing plant but “pesticide” anywhere else in the sentence would primarily indicate a life form plant
– The word “open” on either side of plant (within a few words) is equally probable for either sense of the word plant• the window size for comparison is usually the same sentence
although it has been shown that context up to 10,000 words away could still impact another word!
– For “pen”, a trigram analysis might help, for instance “in the pen” would be the child’s structure, “with the pen” would probably be the writing utensil
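The "pen" trigram cue can be sketched as a lookup on the two words preceding the target; the cue table and sense labels are illustrative assumptions:

```python
# Trigram cues: the two words before "pen" hint at its sense (toy table).
CUES = {"in the": "enclosure", "with the": "writing utensil"}

def pen_sense(words, i):
    """Return the sense of words[i] ('pen') from its left trigram context."""
    left = " ".join(words[max(0, i - 2):i]).lower()
    return CUES.get(left, "unknown")

s = "the pig is in the pen".split()
print(pen_sense(s, 5))   # enclosure
t = "she signed with the pen".split()
print(pen_sense(t, 4))   # writing utensil
```

A real system would weight many such features statistically over a much wider window rather than matching exact trigrams.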
Rule-based/Frame-based• We could encode for every word its possible senses by
means of rules that help interpret the surrounded words– One early attempt (1975) was to provide a flowchart of rules
for each of the 1815 words in the system’s vocabulary which included looking for clues among morphology, collocations, POS and exact word matches
• Or we could annotate the words with frames that list expectations– e.g., plant: living thing: expect to see words about living
things, structure: expect to see words about structures– we could annotate words with semantic subject codes (EC
for economic/finance, AU for automotive)
Semantic Markers• One approach is through semantic markers– we affix such markers to nouns only and then look at verbs
and other words to determine the proper sense• Example: I will meet you at the diamond– diamond can be
• an abstract object (the geometric shape)• a physical object (a gem stone, usually small)• a location (a baseball diamond)
– look for clues in the sentence that we are referring to an abstract object, physical object, location
– the phrase “at the” indicates a location– this could be erroneous, we might be talking about meeting
up at the exhibit of a large diamond at some museum
Case Grammars• Rather than tying the semantics to the nouns of the
sentence, we will tie roles to a verb and then look to fill in the roles with the words in the sentence– for instance, does this verb have an agent? an object?
an instrument?• to open: [Object (Instrument) (Agent)]• we expect when something is open to know what was
opened (a door, a jar, a window, a bank vault) and possibly how it was opened (with a door knob, with a stick of dynamite) and possibly who opened it (the bank robber, the wind, etc)
– semantic analysis becomes a problem of filling in the blanks – finding which word(s) in the sentence should be filled into Object or Instrument or Agent
Case Grammar Roles• Agent – instigator of the action• Instrument – cause of the event or object used in the event
(typically inanimate)• Dative – entity affected by the action (typically animate)• Factitive – object or being resulting from the event• Locative – place of the event• Source – place from which something moves• Goal – place to which something moves• Beneficiary – being on whose behalf the event occurred (typically
animate)• Time – time the event occurred• Object – entity acted upon or that is changed
– To kill: [agent instrument (object) (dative) {locative time}]– To run: [agent (locative) (time) (source) (goal)]– To want: [agent object (beneficiary)]
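Filling in the blanks of a case frame can be sketched directly from the notation above, where parentheses mark optional roles; the candidate role assignments are assumed to come from earlier syntactic/semantic analysis:

```python
def fill_case_frame(frame_spec, candidates):
    """frame_spec: list of (role, required) pairs for a verb.
    candidates: role -> word mappings proposed by earlier analysis.
    Raises if a required role (e.g. Object for 'open') cannot be filled."""
    frame = {}
    for role, required in frame_spec:
        if role in candidates:
            frame[role] = candidates[role]
        elif required:
            raise ValueError(f"missing required role: {role}")
    return frame

# to open: [Object (Instrument) (Agent)] -- parentheses mean optional
OPEN = [("object", True), ("instrument", False), ("agent", False)]

# "The robber opened the vault with dynamite"
print(fill_case_frame(OPEN, {"agent": "robber", "object": "vault",
                             "instrument": "dynamite"}))
```

The hard part, deciding *which* noun phrase fills which role, is exactly the semantic analysis problem the slide describes; this sketch assumes that mapping is already given.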
Probabilistic WSD• All of the previous approaches required that humans
create the rules/templates/annotations– this is both time consuming and impractical if we want to
cover all of English or hand annotate tens of thousands of sentences
• Instead, we could learn the rules or do probabilistic reasoning (e.g., HMM trained through bigrams or trigrams)– for a supervised form of learning, we will probably need
hundreds to thousands of annotated sentences (we need enough data to handle the variability we will find in the differing word senses)
– We can also use partially supervised or unsupervised approaches
Supervised WSD• Given the corpus where words are annotated with
appropriate features, we could use– Decision trees and decision lists (a decision list is a binary
function which seeks out features in the data of a class and returns whether the data fits the class or not, we could provide such a function for each word’s sense)
– Naïve Bayesian classifiers– Nearest neighbor– SVMs (with boosting in many cases)
• When there is plenty of data, simpler approaches are often successful (e.g., NBC) and when there is highly discriminative data, decision trees/lists work well, when there is sparse data, SVMs and boosting tend to work the best
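A Naïve Bayesian WSD classifier is short enough to sketch in full; the training examples for "plant" are toy data, and add-one smoothing is one common choice among several:

```python
import math
from collections import Counter

def train(examples):
    """examples: (sense, context_words) pairs with hand-annotated senses."""
    priors, counts, vocab = Counter(), {}, set()
    for sense, ctx in examples:
        priors[sense] += 1
        wc = counts.setdefault(sense, Counter())
        for w in ctx:
            wc[w] += 1
            vocab.add(w)
    return priors, counts, vocab

def classify(model, ctx):
    """Pick argmax_s P(s) * prod_w P(w|s), with add-one smoothing."""
    priors, counts, vocab = model
    total = sum(priors.values())
    def logp(sense):
        wc, n = counts[sense], sum(counts[sense].values())
        return (math.log(priors[sense] / total)
                + sum(math.log((wc[w] + 1) / (n + len(vocab))) for w in ctx))
    return max(priors, key=logp)

# Toy senses of "plant", annotated by typical context words.
model = train([("factory", ["pesticide", "nuclear", "workers"]),
               ("life form", ["tropical", "green", "leaves"])])
print(classify(model, ["nuclear", "pesticide"]))   # factory
```

With only two training examples this is a caricature, but it shows why NBC works well when data is plentiful: every context word contributes an independent vote.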
Other Approaches• There may not be sufficient annotated data for
supervised training– Lightly (or minimally) supervised training where we have
an enumerated list of word sense memberships (rather than fully annotated sentences) or by using other sources of knowledge (e.g., WordNet) to provide class information can be used• a variation is known as iterative bootstrapping where a small
hand annotated collection of data is used for training and then untrained data is annotated via what has been learned to enlarge the training data, adding annotations/corrections as needed
– Unsupervised clustering can also be applied to determine for a given word various possible senses of that word (what it doesn’t do is necessarily define those senses)
Discourse Processing• Because a sentence is not a stand-alone entity, to fully
understand a statement, we must unite it with previous statements– anaphoric references
• Bill went to the movie. He thought it was good.– parts of objects
• Bill bought a new book. The last page was missing.– parts of an action
• Bill went to New York on a business trip. He left on an early morning flight.
– causal chains• There was a snow storm yesterday. The schools were closed today
– illocutionary force• It sure is cold in here.
Handling References• How do we track references? – consider the following paragraph:• Bill went to the clothing store. A sales clerk asked him if he
could help. Bill said that he needed a blue shirt to go with his blue hair. The clerk looked in the back and found one for him. Bill thanked him for his help.• 2nd sentence: him, he• 3rd sentence: he, his• 4th sentence: one, him• 5th sentence: him, his
• How do we determine what these words refer to? Can we automate this?– is it as simple as tying a reference to the most recent
occurrence of a noun?
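The naive "most recent noun" strategy is easy to code, and easy to break; in the sketch below (noun and pronoun lists are assumed) it mis-binds "He" to "movie", which is exactly why real resolvers need agreement checks and world knowledge:

```python
PRONOUNS = {"he", "him", "his", "she", "her", "it"}

def resolve(tokens, nouns):
    """Bind each pronoun to the most recently seen noun (naive heuristic).
    Returns {token_index: antecedent}."""
    last_noun, refs = None, {}
    for i, w in enumerate(tokens):
        if w in nouns:
            last_noun = w
        elif w.lower() in PRONOUNS:
            refs[i] = last_noun
    return refs

tokens = "Bill went to the movie . He thought it was good".split()
print(resolve(tokens, {"Bill", "movie"}))
# {6: 'movie', 8: 'movie'} -- wrong for 'He', which should resolve to Bill
```

"it" happens to resolve correctly here while "He" does not, so recency alone is clearly not enough.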
Pragmatics• To fully understand NL statements, we need to bring in
worldly knowledge– it sure is cold in here – this is not a statement, it is a polite request to
turn the heat up– do you know what time it is – is not a yes/no question
• Other forms of statements requiring pragmatics– speech acts – the statement itself is the action, as in “you are under
arrest”– understanding and modeling beliefs – a statement may be made
because someone has a false belief, so the listener must adjust from analyzing the sentence to analyzing the sentence within a certain context
– conversational postulates – adding such factors as politeness, appropriateness, political correctness to our speech
– idioms – often what we say is based on colloquialisms and slang – “my bad” shouldn’t be interpreted literally
Final Representational Form• Even when (if) we are able to decipher the
meaning of a statement, how do we store it?– We saw earlier a semantic network-based approach– Also, CDs, filling in scripts, filling in templates, filling
in frames• The final representational form will be based on
the goal of our NLP system– Today, many approaches are using NLP to automate
the construction of ontologies– Another use is to classify a document for indexing by a
search engine so there is no “final” form of the paper other than obtaining its class and lists of keywords
Template Based Information Extraction• Similar to case grammars, an approach is to provide
templates of information and then extract the information from the given document– specifically, once a page has been identified as being relevant
to a topic, a summary of this text can be created by excerpting text into a template
– in the example on the next slide• a web page has been identified as a job ad• the job ad template is brought up and information is filled in by
identifying such target information as “employer”, “location city”, “skills required”, etc
– identifying the right items for extraction is partially based on keyword matching and partially based on using the tags provided by previous syntactic and semantic parsing• for instance, the verb “hire” will have an agent (contact person or
employer) and object (hiree)
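The keyword-matching half of template filling can be sketched with simple regular-expression cues; the slot names echo the slide, but the patterns and the sample ad are invented for illustration:

```python
import re

# Hypothetical job-ad template: slot -> regex cue (invented patterns).
TEMPLATE = {
    "employer":        r"\b(?:at|with|[Jj]oin)\s+([A-Z][A-Za-z]+)",
    "location city":   r"\bin\s+([A-Z][A-Za-z]+)",
    "skills required": r"[Ss]kills?:\s*([^.]+)",
}

def extract(text):
    """Fill template slots by excerpting matching text from the document."""
    slots = {}
    for slot, pattern in TEMPLATE.items():
        m = re.search(pattern, text)
        if m:
            slots[slot] = m.group(1).strip()
    return slots

ad = "Join Acme in Boston. Skills: Python, SQL."
print(extract(ad))
# {'employer': 'Acme', 'location city': 'Boston', 'skills required': 'Python, SQL'}
```

The other half, using parse tags (e.g. the agent and object of "hire") to pick the right excerpts, would sit on top of the syntactic and semantic analysis described earlier rather than on raw regexes.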
Application Areas• MS Word – spell checker/corrector, grammar checker,
thesaurus• WordNet• Search engines (more generically, information retrieval
including library searches)• Database front ends• Question-answering systems within restricted domains• Automated documentation generation• News categorization/summation• Information extraction• Machine translation– for instance, web page translation
• Language composition assistants – help non-native speakers with the language
• On-line dictionaries
Information Retrieval• Originally, this was limited to queries for library
references– “find all computer science textbooks that discuss
abduction” translated into a DB query and submitted to a library DB
• Today, it is found in search engines– take an NLU input and use it to search for the referenced
items• Not only do we need to perform NLU, we also have
to understand the context of the request and disambiguate what a word might mean– do a Google search on abduction and see what you find– simple keyword matching isn’t good enough
NLG, Machine Translation• NLG: given a concept to relate, translate it into a legal
statement– like NLU, a mapping process, but this time in reverse
• much more straightforward than NLU because ambiguity is not present
• but there are many ways to say something, a good NLG will know its audience and select the proper words through register (audience context)
• a sophisticated NLG will use reference and possibly even parts of speech
• Machine Translation:– this is perhaps the hardest problem in NLP becomes it must
combine NLU and NLG– simple word-to-word translation is insufficient– meaning, references, idioms, etc must all be taken care of– current MT systems are highly inaccurate