Morphological Analysis

Language processing (HUL455) MORPHOLOGICAL ANALYSIS -JINIA RAO & ASHISH KASHYAP


Page 1: Morphological Analysis

Language processing (HUL455)

MORPHOLOGICAL ANALYSIS

-JINIA RAO & ASHISH KASHYAP

Page 2: Morphological Analysis

CONTENTS

• Morphology & its types.

• Approaches to Morphology

• Morpheme based morphology

• Morphological Analysis and its need.

• Morphological Generation and Analysis using Paradigms

• Problems in Morphological Analysis.

• Bibliography.

Page 3: Morphological Analysis

MORPHOLOGY

• The study of word formation – how words are built up from smaller pieces.

• Identification, analysis, and description of the structure of a given language's MORPHEMES and other linguistic units, such as root words, affixes, parts of speech, intonations and stresses, or implied context.

Page 4: Morphological Analysis

Examples

• Washing= wash + ing

• Browser= browse + er

• Rats= rat + s
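The three segmentations above can be reproduced with a minimal sketch. The root lexicon `ROOTS`, the suffix list `SUFFIXES`, and the function name `segment` are illustrative assumptions, not part of the slides; a real analyzer would need a far larger lexicon and proper spelling rules.

```python
# Hypothetical toy lexicon and suffix list, chosen to cover the slide's examples.
ROOTS = {"wash", "browse", "rat"}
SUFFIXES = ["ing", "er", "s"]

def segment(word):
    """Return every (root, suffix) split whose root is in ROOTS."""
    word = word.lower()
    results = []
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            # Try the bare stem and the stem with a restored final "e"
            # (browse + er is spelled "browser", not "browseer").
            for root in (stem, stem + "e"):
                if root in ROOTS:
                    results.append((root, suffix))
    return results

print(segment("Washing"))  # [('wash', 'ing')]
print(segment("Browser"))  # [('browse', 'er')]
print(segment("Rats"))     # [('rat', 's')]
```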

Page 5: Morphological Analysis

Types of Morphology

• Inflectional morphology: modification of a word to express different grammatical categories such as number or tense. Examples: cats, men.

• Derivational morphology: creation of a new word from an existing word, often changing its grammatical category. Examples: happiness, brotherhood.

Page 6: Morphological Analysis

APPROACHES TO MORPHOLOGY

There are three principal approaches to morphology:

• Morpheme based morphology

• Lexeme based morphology

• Word based morphology

Page 7: Morphological Analysis

Morpheme-based morphology

• Word forms are analyzed as arrangements of morphemes.

• Morpheme: the smallest linguistic unit with a grammatical function.

Page 8: Morphological Analysis

Lexeme based Morphology

• Lexeme-based morphology usually takes what is called an "item-and-process" approach.

• Instead of analyzing a word form as a set of morphemes arranged in sequence, a word form is said to be the result of applying rules that alter a word-form or stem in order to produce a new one

Page 9: Morphological Analysis

Word based Morphology

• Word-based morphology is (usually) a word-and-paradigm approach.

• Instead of stating rules to combine morphemes into word forms, or to generate word forms from stems, word-based morphology states generalizations that hold between the forms of inflectional paradigms

Page 10: Morphological Analysis

MORPHOLOGICAL ANALYSIS

• Analyzing words into their linguistic components (morphemes).

• Ambiguity: a word form can have more than one analysis.

flies → fly + VERB + 3SG

flies → fly + NOUN + PLU

Page 11: Morphological Analysis

Expected Output

Input      Morphologically analyzed output
Cats       Cat + N + PL
Cat        Cat + N + SG
Cities     City + N + PL
Geese      Goose + N + PL
Goose      Goose + N + SG  OR  Goose + V
Gooses     Goose + V + 3SG
Merging    Merge + V + PresPart
Caught     Catch + V + PastPart
Caught     Catch + V + Past
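The expected outputs above, including the ambiguous cases, can be approximated with a small rule-plus-lexicon sketch. Everything named here (`ROOTS`, `IRREGULAR`, `NO_REGULAR_PLURAL`, `RULES`, `analyze`) is an assumption made for illustration, and the rules cover only the listed examples; this is not how any particular NLP system does it.

```python
# Toy data: these entries are illustrative assumptions, not a real lexicon.
ROOTS = {
    "cat": {"N"}, "city": {"N"}, "fly": {"N", "V"},
    "goose": {"N", "V"}, "merge": {"V"}, "catch": {"V"},
}
# Irregular forms are listed whole instead of being derived by rule.
IRREGULAR = {
    "geese": ["goose+N+PL"],
    "caught": ["catch+V+PastPart", "catch+V+Past"],
}
# Nouns whose regular -s plural is blocked by an irregular one.
NO_REGULAR_PLURAL = {"goose"}

# (surface suffix, text restored on the root, category, feature)
RULES = [
    ("ies", "y", "N", "PL"),
    ("ies", "y", "V", "3SG"),
    ("s", "", "N", "PL"),
    ("s", "", "V", "3SG"),
    ("ing", "", "V", "PresPart"),
    ("ing", "e", "V", "PresPart"),
]

def analyze(word):
    """Return all analyses of a word form; more than one means ambiguity."""
    word = word.lower()
    analyses = list(IRREGULAR.get(word, []))
    # The bare form may itself be a root (singular noun or base verb).
    for cat in sorted(ROOTS.get(word, ())):
        analyses.append(f"{word}+N+SG" if cat == "N" else f"{word}+V")
    # Strip each matching suffix and check the restored root in the lexicon.
    for suffix, restored, cat, feature in RULES:
        if not word.endswith(suffix):
            continue
        root = word[: -len(suffix)] + restored
        if cat not in ROOTS.get(root, ()):
            continue
        if (cat, feature) == ("N", "PL") and root in NO_REGULAR_PLURAL:
            continue
        analyses.append(f"{root}+{cat}+{feature}")
    return analyses

print(analyze("flies"))   # ['fly+N+PL', 'fly+V+3SG']
print(analyze("gooses"))  # ['goose+V+3SG']
```

Listing irregular forms separately while deriving regular ones by rule keeps the lexicon small, which is the motivation given on the next slide.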

Page 12: Morphological Analysis

NEED FOR MORPHOLOGICAL ANALYSIS

• An exhaustive lexicon that lists every word form wastes memory.

• Such a lexicon fails to capture linguistic generalizations, which are necessary to understand an unknown word.

• Listing every form is especially problematic for morphologically rich and productive languages.

Page 13: Morphological Analysis

MORPHOLOGICAL ANALYSIS USING PARADIGMS

• Most NLP systems use simple linguistic theories for morphological analysis.

• The paradigm-based approach is widely used in such systems.

Page 14: Morphological Analysis

• Words are related to each other by analogical rules.

• Words can be categorized based on the pattern they fit into.

• Applicable both to existing words and to new ones.

• Applying a pattern different from the one used so far gives rise to a new word form.

• Example: older replacing elder.

Page 15: Morphological Analysis

Procedure and Algorithm

• A language expert provides tables of word forms covering the words of the entire language.

• Each root follows the pattern (or paradigm) implicit in its table when generating its word forms.

• Examples

Page 16: Morphological Analysis

Continued..

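Word forms table for the root LADAKAA ("boy"):

              CASE
Number        Direct      Oblique
Singular      LADAKAA     LADAKE
Plural        LADAKE      LADAKON

Paradigm table. Each entry shows the number of characters to be deleted from the end of the root and the suffix to be added (the long vowel AA counts as one character):

              CASE
Number        Direct      Oblique
Singular      (0, Ø)      (1, e)
Plural        (1, e)      (1, ON)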

Page 17: Morphological Analysis

Continued…

The table can be expressed in terms of an algorithm, which is as follows:-

ALGORITHM 1: Forming paradigm table

PURPOSE: To form paradigm table from word forms table for a root

INPUT: Root r, Words forms table WFT (with labels for rows and columns)

OUTPUT: Paradigm table PT

ALGORITHM:

1. Create an empty table PT of the same dimensionality, size and labels as the word forms table WFT

Page 18: Morphological Analysis

Continued…

2. For every entry w in WFT, do

if w = r

then store “(0, Ø)” in the corresponding position in PT

else begin

let i be the position of the first character at which w and r differ

store (size(r)-i+1, suffix(i,w)) at the corresponding position in PT

end

3. Return PT
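Algorithm 1 can be sketched in Python as follows. The function name `form_paradigm_table`, the use of a dictionary keyed by (number, case) labels, and the spelling `ladakA` (a single `A` standing for the long vowel, so that exactly one character is deleted) are assumptions made for this illustration. Python indexes from 0, so the deletion count is `len(root) - i` rather than size(r)-i+1.

```python
def form_paradigm_table(root, word_forms):
    """Build a paradigm table from a word forms table for one root.

    word_forms maps a (number, case) label to a word form; the result maps
    the same labels to (characters to delete, suffix to add) pairs.
    """
    paradigm = {}
    for label, form in word_forms.items():
        if form == root:
            paradigm[label] = (0, "")  # the slide's (0, Ø) entry
            continue
        # i is the 0-based index of the first character where root and form differ.
        i = 0
        while i < min(len(root), len(form)) and root[i] == form[i]:
            i += 1
        paradigm[label] = (len(root) - i, form[i:])
    return paradigm

# Word forms table from the slide, in an assumed one-letter-per-vowel spelling.
WFT = {
    ("singular", "direct"): "ladakA",
    ("singular", "oblique"): "ladake",
    ("plural", "direct"): "ladake",
    ("plural", "oblique"): "ladakon",
}
for label, entry in form_paradigm_table("ladakA", WFT).items():
    print(label, entry)
```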

Page 19: Morphological Analysis

Generation of a Word Form

ALGORITHM 2: Generating a word form

PURPOSE: To generate a word form given a root and desired feature values.

INPUT: Root r, Feature values FV

USES: Paradigm tables, Dictionary of roots DR, Dictionary of indeclinable words DI

OUTPUT: Word w

ALGORITHM:

1. If root r belongs to DI then return (the word stored in DI for r, irrespective of FV)

Page 20: Morphological Analysis

Continued…

2. let p = paradigm type of r as obtained from DR

3. let PT = paradigm table for p.

4. let (n,s) = entry in PT for feature values FV

5. w := r minus n characters at the end

6. w := w plus suffix s

END ALGORITHM
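A minimal Python sketch of Algorithm 2 follows. The dictionary contents are assumptions for illustration only: `ghodA` ("horse") is taken to share the paradigm of `ladakA`, `aur` ("and") stands in for an indeclinable word, and the single `A` stands for the long vowel so that each entry deletes exactly one character.

```python
# Paradigm tables, keyed by paradigm type (here named after a model root).
PARADIGMS = {
    "ladakA": {
        ("singular", "direct"): (0, ""),
        ("singular", "oblique"): (1, "e"),
        ("plural", "direct"): (1, "e"),
        ("plural", "oblique"): (1, "on"),
    }
}
# DR: dictionary of roots, mapping each root to its paradigm type (assumed entries).
ROOT_DICT = {"ladakA": "ladakA", "ghodA": "ladakA"}
# DI: dictionary of indeclinable words (assumed entry).
INDECLINABLES = {"aur": "aur"}

def generate(root, features):
    """Generate the word form of root for the given (number, case) features."""
    # Step 1: indeclinable words ignore the feature values.
    if root in INDECLINABLES:
        return INDECLINABLES[root]
    # Steps 2-4: look up the paradigm type, its table, and the (n, s) entry.
    paradigm_type = ROOT_DICT[root]
    n, suffix = PARADIGMS[paradigm_type][features]
    # Steps 5-6: drop n characters from the end, then append the suffix.
    # Slicing with len(root) - n also handles n == 0 correctly.
    return root[: len(root) - n] + suffix

print(generate("ghodA", ("plural", "oblique")))    # ghodon
print(generate("ladakA", ("singular", "direct")))  # ladakA
```

Because the paradigm table is shared, adding a new root only requires one entry in the root dictionary.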

Page 21: Morphological Analysis

PROBLEMS IN MORPHOLOGICAL ANALYSIS

• False Analysis

• Productivity

• Bound base morphemes

Page 22: Morphological Analysis

False analysis

Words such as hospitable, sizeable.

• They don’t have the meaning “to be able”

• They cannot take the suffix -ity to form a noun

• Analyzing them as words containing the suffix -able leads to a false analysis
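The problem can be illustrated with a short sketch. The function names and the exception set `FALSE_ABLE` are hypothetical; listing exceptions by hand is only one possible way to guard against false analyses, not a method prescribed by the slides.

```python
def naive_able_analysis(word):
    """Strip -able from any word that ends in it, with no further checks."""
    if word.endswith("able"):
        return (word[:-4], "able")
    return (word,)

# Hand-listed words that merely end in the letters "able" (assumed list).
FALSE_ABLE = {"hospitable", "sizeable"}

def guarded_able_analysis(word):
    """Same as the naive analysis, but leaves listed exceptions unanalyzed."""
    if word in FALSE_ABLE:
        return (word,)
    return naive_able_analysis(word)

print(naive_able_analysis("readable"))      # ('read', 'able')
print(naive_able_analysis("hospitable"))    # ('hospit', 'able'), a false analysis
print(guarded_able_analysis("hospitable"))  # ('hospitable',)
```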

Page 23: Morphological Analysis

PRODUCTIVITY

• Property of a morphological process to give rise to new formations on a systematic basis.

Exceptions (-able normally attaches to verbs, but these words are built on nouns):

• Peaceable

• Actionable

• Companionable

Page 24: Morphological Analysis

Bound Base Morphemes

• Occur only in a particular complex word.

• Do not have independent existence.

• Words such as feasible, malleable

• -able has the regular meaning “be able”

• -ity form is possible

• Base words don’t exist independently

Diagram: base (nonexistent) + morpheme (known) = compound

Page 25: Morphological Analysis

REFERENCES

• “Linguistics, An Introduction to Language and Communication” by Adrian Akmajian, Richard A. Demers, Ann K. Farmer and Robert M. Harnish (5th Edition)

• SPEECH and LANGUAGE PROCESSING, An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition by Daniel Jurafsky and James H. Martin (Second Edition)

• “Natural Language Processing- a Paninian perspective” by Akshar Bharati, Vineet Chaitanya, Rajeev Sangal.

Page 26: Morphological Analysis

THANK YOU!!!