Text Pattern Formation For Information Extraction

Lidia M. Pivovarova Saint-Petersburg State University The Ph.D. advisor: prof. Valery Sh. Rubashkin NLDB 2008

Transcript of Text Pattern Formation For Information Extraction

Page 1: Text Pattern Formation For Information Extraction

Lidia M. Pivovarova, Saint-Petersburg State University

The Ph.D. advisor: prof. Valery Sh. Rubashkin

NLDB 2008

Page 2: Text Pattern Formation For Information Extraction

FACTORS: a system designed to monitor the underlying characteristics of a subject domain

Page 3: Text Pattern Formation For Information Extraction

General System Description

[Diagram. Components: Texts; The Ontology; Morph. analyzer; Semantic analyzer; Search Patterns; Situation State. Processing label: lemmatization, part-of-speech tagging, semantic mark-up.]

Page 4: Text Pattern Formation For Information Extraction

The Factors

Factors: the required information aspects (~100 factors).

Factors:
- qualitative, e.g. social tension, investment attractiveness, level of sovereignty, human rights activity
- quantitative, e.g. the number of unemployed, the average salary, the inflation level, the amount of imports

Page 5: Text Pattern Formation For Information Extraction

Numerical values

Qualitative factors: very small, small, less than average, average, more than average, large, very large.

Quantitative factors: the number + <unit>, e.g.
- an average salary -> monetary unit (ruble, $, …)
- the number of unemployed -> no units
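The two kinds of values above could be represented roughly as follows. This is only a sketch: the class names `QualitativeValue` and `QuantitativeValue` and the sample numbers are assumptions made for illustration and do not come from the slides.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class QualitativeValue(IntEnum):
    """The seven-step scale listed for qualitative factors (ordering assumed)."""
    VERY_SMALL = 1
    SMALL = 2
    LESS_THAN_AVERAGE = 3
    AVERAGE = 4
    MORE_THAN_AVERAGE = 5
    LARGE = 6
    VERY_LARGE = 7


@dataclass(frozen=True)
class QuantitativeValue:
    """A number plus an optional unit; unit is None for unitless factors."""
    number: float
    unit: Optional[str] = None


# Illustrative values only; the figures are made up.
tension = QualitativeValue.LARGE
salary = QuantitativeValue(25000, "ruble")
unemployed = QuantitativeValue(1200)  # the number of unemployed has no unit
```

Using an ordered enumeration keeps comparisons such as "large is more than average" trivial, while the quantitative case stays a plain number with an optional unit.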

Page 6: Text Pattern Formation For Information Extraction

The Patterns

Qualitative factors -> “factor + numerical value” patterns, e.g. Social tension <-- spontaneous meeting (large)

Quantitative factors -> “only factor” patterns, e.g. The number of unemployed <-- become unemployed

Search algorithm:
1) find a pattern
2) find a number + unit
3) if not found, find words such as large, small, increase, decrease, etc.
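The three search steps might be sketched as below. This is a simplified illustration, not the system's actual implementation: the function `search`, the cue-word tuple, the unit list in the regular expression, and plain substring matching (in place of lemmatized matching) are all assumptions.

```python
import re
from typing import List, Optional

# Cue words taken from the slide; the real list ("etc.") is not given.
CUE_WORDS = ("large", "small", "increase", "decrease")

# Hypothetical unit list, for illustration only.
NUMBER_WITH_UNIT = re.compile(
    r"(\d+(?:\.\d+)?)\s*(rubles?|euros?|\$)?", re.IGNORECASE
)


def search(text: str, pattern_words: List[str]) -> Optional[dict]:
    lowered = text.lower()

    # Step 1: find a pattern (here: every pattern phrase occurs in the text).
    if not all(word.lower() in lowered for word in pattern_words):
        return None

    # Step 2: find a number, optionally followed by a unit.
    match = NUMBER_WITH_UNIT.search(text)
    if match:
        return {
            "kind": "number",
            "number": float(match.group(1)),
            "unit": match.group(2),
        }

    # Step 3: otherwise fall back to cue words such as "large" or "increase".
    for cue in CUE_WORDS:
        if re.search(rf"\b{cue}", lowered):
            return {"kind": "cue", "word": cue}

    # The pattern was found, but no value could be attached to it.
    return {"kind": "pattern_only"}
```

For instance, `search("About 500 workers become unemployed", ["become unemployed"])` takes the step 2 branch, while a text with no digits falls through to the cue words of step 3.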

Page 7: Text Pattern Formation For Information Extraction

Pattern Formation Process

A pattern is a set of words and ontology concepts.

The ontology provides:
- pattern generalization
- synonym accumulation
- information about units

Pattern formation: the user marks a relevant fragment in a text or chooses a concept from the ontology.
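One way to picture a pattern as "a set of words and ontology concepts" is the sketch below. The toy ontology (`PARENT`), the synonym table (`SYNONYMS`), and the functions `is_a` and `matches` are hypothetical and only illustrate how generalization and synonym accumulation could work; the slides do not describe the actual data structures.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical toy ontology: concept -> parent concept (None marks a root).
PARENT: Dict[str, Optional[str]] = {
    "meeting": "public_event",
    "strike": "public_event",
    "public_event": None,
}

# Hypothetical synonym table: surface word -> ontology concept.
SYNONYMS: Dict[str, str] = {
    "meeting": "meeting",
    "rally": "meeting",
    "strike": "strike",
    "walkout": "strike",
}


def is_a(concept: Optional[str], ancestor: str) -> bool:
    """Return True if `concept` equals `ancestor` or descends from it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def matches(words: Set[str], concepts: Set[str], tokens: Iterable[str]) -> bool:
    """A fragment matches if it contains every pattern word and, for every
    pattern concept, at least one token whose concept generalizes to it."""
    lowered = [t.lower() for t in tokens]
    if not words.issubset(lowered):
        return False
    token_concepts = [SYNONYMS[t] for t in lowered if t in SYNONYMS]
    return all(
        any(is_a(tc, pc) for tc in token_concepts) for pc in concepts
    )
```

With this setup, a pattern built from the word "spontaneous" and the concept `public_event` would match "spontaneous rally" as well as "spontaneous meeting", because "rally" is recorded as a synonym and "meeting" generalizes to `public_event`.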

Page 8: Text Pattern Formation For Information Extraction

Example

As is known, the European Union strictly demanded that Latvia close both generating units of the Ignalinskaya nuclear power station. It also promised to remit 3 billion euros for this goal.

Factors:
- The EU pressure on Latvia.
- The financial aid of the EU to Latvia.