Text Processing for Procedural Question Answering
![Page 1: Text Processing for Procedural Question Answering](https://reader033.fdocuments.us/reader033/viewer/2022052412/558cf59bd8b42a830f8b474d/html5/thumbnails/1.jpg)
Text Processing for Procedural Question Answering
Ongoing work for the TextCoop project
ILPL group, presentation by Estelle Delpech
![Page 2: Text Processing for Procedural Question Answering](https://reader033.fdocuments.us/reader033/viewer/2022052412/558cf59bd8b42a830f8b474d/html5/thumbnails/2.jpg)
Text Processing for Procedural Question Answering
I. INTRODUCTION : GLOBAL ARCHITECTURE
II. CLUES TO IDENTIFY TITLES / INSTRUCTIONAL COMPOUNDS
III. THE WHOLE PROCESS
IV. MAIN ISSUES
V. DEMO
![Page 3: Text Processing for Procedural Question Answering](https://reader033.fdocuments.us/reader033/viewer/2022052412/558cf59bd8b42a830f8b474d/html5/thumbnails/3.jpg)
I. INTRODUCTION : GLOBAL ARCHITECTURE
![Page 4: Text Processing for Procedural Question Answering](https://reader033.fdocuments.us/reader033/viewer/2022052412/558cf59bd8b42a830f8b474d/html5/thumbnails/4.jpg)
A global architecture (Surdeanu & Pasca)
How to…?
Goal
Task
TEXT PROCESSING
![Page 5: Text Processing for Procedural Question Answering](https://reader033.fdocuments.us/reader033/viewer/2022052412/558cf59bd8b42a830f8b474d/html5/thumbnails/5.jpg)
TEXT PROCESSING for Procedural QA : Identification of task structure
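Pipeline (from the diagram) : .html → PRE-PROCESSING → SEGMENTER → TEXT GRAMMAR → DATABASE
PRE-PROCESSING : HTML cleaning, MS tagging
SEGMENTER : identification of terminal symbols (Title, Instructional Compound)
TEXT GRAMMAR : Xbar analysis of task structure
Task structure nodes : TASK, spec, G', complement, Goal, Pre-requisite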
![Page 6: Text Processing for Procedural Question Answering](https://reader033.fdocuments.us/reader033/viewer/2022052412/558cf59bd8b42a830f8b474d/html5/thumbnails/6.jpg)
II. CORPUS OBSERVATION : WHAT CLUES TO IDENTIFY
- INSTRUCTIONAL COMPOUNDS ?
- TITLES ?
![Page 7: Text Processing for Procedural Question Answering](https://reader033.fdocuments.us/reader033/viewer/2022052412/558cf59bd8b42a830f8b474d/html5/thumbnails/7.jpg)
1. Clues for Instructional Compounds Identification
Definition : kernel instructions linked to various clauses by rhetorical or logical relations.
Identification in two steps :
1. Detect presence of instructions : expression of obligation
2. Find instructional compound boundaries, e.g. connectors…
Example :
Fixing the first wall plate (or shelf bracket)
We are going to mark the first wall plate (or bracket) for drilling. First, position the face plate so one screw lines up with the mark on the wall you made in the last step and place the level on top of the face plate to ensure it is level. Second, you should mark the wall in the next screw hole, again by turning the screw until it bites into the wall (see fig 1.3). It is advised that you mark any remaining screw holes while keeping the wall plate firmly in position. Now you have to choose a suitable drill bit (masonry or the right type for the surface). It should be the same width as the wall plug to be used. Get to hand one of the wall plugs, and place it against the tip of the drill bit (see fig 1.4). Finally, place a piece of masking tape on the drill bit to use as a guide, this will ensure you don't drill too deep.
![Page 8: Text Processing for Procedural Question Answering](https://reader033.fdocuments.us/reader033/viewer/2022052412/558cf59bd8b42a830f8b474d/html5/thumbnails/8.jpg)
1. Clues for Instructional Compounds Identification
Presence of instructions : Morpho-lexical patterns
- shall Adv* base form verb
- Have to Adv* base form verb
- ## Op? adv* base form verb
- it be adv* (necessary|compulsory) that
Examples :
- You should pre-heat the oven
- You have to pre-heat the oven
- Do not pre-heat the oven
- It is better that you pre-heat the oven
Compound boundaries : Morpho-lexical patterns
- ## to Adv* base form verb .* ,
- (##|Conj) (if|then|after)
Examples :
- [To cook the cake, pre-heat the oven] [and then start peeling …
- [If you want to cook the cake, pre-heat the oven.] [If you don’t want to cook …
HTML tags (typo-disposition) : <p> </p> <li> </li>
Example :
- <li> [ Pre-heat the oven … ]</li>
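The morpho-lexical patterns above are written over part-of-speech tags (Adv*, base form verb). As a rough illustration only, the sketch below approximates some of them with plain regular expressions over raw text. The adverb approximation `ADV`, the extra modals (`should`, `must`), the extra adjectives (`better`, `advised`) and the function name `has_instruction` are assumptions made for this example, not part of the presented system. Bare imperatives ("Pre-heat the oven") are not covered, since detecting them needs a tagger.

```python
import re

# Crude stand-in for "Adv*": the real patterns rely on POS tags,
# which a regex over raw text cannot see (assumption for this sketch).
ADV = r"(?:\w+ly\s+|not\s+|never\s+|always\s+)*"

INSTRUCTION_PATTERNS = [
    # "shall Adv* base form verb" (should/must added as an assumption)
    re.compile(rf"\b(?:shall|should|must)\s+{ADV}\w+", re.I),
    # "have to Adv* base form verb"
    re.compile(rf"\bha(?:ve|s)\s+to\s+{ADV}\w+", re.I),
    # "it be adv* (necessary|compulsory) that" (better/advised added as an assumption)
    re.compile(
        r"\bit\s+(?:is|was|be)\s+(?:\w+\s+)?"
        r"(?:necessary|compulsory|better|advised)\s+that\b",
        re.I,
    ),
    # "## Op? ... base form verb": only the negated imperative is covered here
    re.compile(r"^(?:do\s+not\s+|don't\s+)\w+", re.I),
]


def has_instruction(sentence: str) -> bool:
    """Return True if the sentence matches one of the approximate patterns."""
    s = sentence.strip()
    return any(p.search(s) for p in INSTRUCTION_PATTERNS)


if __name__ == "__main__":
    for s in ["You should pre-heat the oven", "The oven is hot"]:
        print(s, "->", has_instruction(s))
```

A tagger-based version would replace the `\w+` placeholders with a check that the next token is tagged as a base form verb.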
![Page 9: Text Processing for Procedural Question Answering](https://reader033.fdocuments.us/reader033/viewer/2022052412/558cf59bd8b42a830f8b474d/html5/thumbnails/9.jpg)
2. Titles identification : About the HTML encoding of titles
The <hn> tag cannot be used as a single clue for title identification
HTML encoding is free, the code can be underspecified (css)
Corpus observation :
- 80 % of titles are encoded with <b>
- 57 % of <b> tags encode titles
- 64 % of <h> tags encode titles
- the coding varies from one web site to another
We had to find some other clues …
![Page 10: Text Processing for Procedural Question Answering](https://reader033.fdocuments.us/reader033/viewer/2022052412/558cf59bd8b42a830f8b474d/html5/thumbnails/10.jpg)
2. Clues for Title Identification
Some helpful visual clues :
- Short sequence of words
- Spaced from the rest of the text
- Emphasized
[Figure labels on counter-examples : not emphasized, not a title, not short]
![Page 11: Text Processing for Procedural Question Answering](https://reader033.fdocuments.us/reader033/viewer/2022052412/558cf59bd8b42a830f8b474d/html5/thumbnails/11.jpg)
2. Clues for Title Identification
Linguistic clues :
- Rarely contains a tensed verb
- Can be a single question
Textual environment clues :
- Occurs between two paragraphs of text
- Occurs between a title and a paragraph of text
No single clue, but a bundle of clues
![Page 12: Text Processing for Procedural Question Answering](https://reader033.fdocuments.us/reader033/viewer/2022052412/558cf59bd8b42a830f8b474d/html5/thumbnails/12.jpg)
III. THE WHOLE PROCESS
PRE-PROCESSING : HTML cleaning, MS tagging
SEGMENTER : identification of terminal symbols (Title, Instructional Compound)
![Page 13: Text Processing for Procedural Question Answering](https://reader033.fdocuments.us/reader033/viewer/2022052412/558cf59bd8b42a830f8b474d/html5/thumbnails/13.jpg)
1. HTML Cleaning module
Raw HTML code → HTML cleaning → main typo-dispositional information
- Emphasis tags : <h> <b> <u> <i>
- Text chunks tags : <p> <div> <ol> <ul>
- Subdivision tags : <br> <li>
The output of the HTML Cleaning module is :
- a list of text chunks, corresponding more or less to paragraph breaks
- their corresponding typo-dispositional structure
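To make the three tag families concrete, here is a minimal sketch of such a cleaning step built on Python's standard `html.parser`. It is an illustration under assumptions, not the module presented here: the class name `ChunkExtractor`, the function `clean_html`, the dictionary layout of a chunk and the expansion of `<h>` into `h1`..`h6` are all invented for the example.

```python
from html.parser import HTMLParser

# Tag families from the slide; <h> is expanded to h1..h6 (assumption).
EMPHASIS = {"b", "u", "i", "h1", "h2", "h3", "h4", "h5", "h6"}
CHUNK = {"p", "div", "ol", "ul"}
SUBDIVISION = {"br", "li"}


class ChunkExtractor(HTMLParser):
    """Split HTML into text chunks and record their typo-dispositional clues."""

    def __init__(self):
        super().__init__()
        self.chunks = []          # finished chunks
        self._open_emphasis = []  # emphasis tags currently open
        self._new_chunk()

    def _new_chunk(self):
        self._current = {"text": "", "emphasis": set(), "subdivisions": 0}

    def _flush(self):
        # Normalise whitespace and keep the chunk only if it has text.
        self._current["text"] = " ".join(self._current["text"].split())
        if self._current["text"]:
            self.chunks.append(self._current)
        self._new_chunk()

    def handle_starttag(self, tag, attrs):
        if tag in CHUNK:
            self._flush()
        elif tag in SUBDIVISION:
            self._current["subdivisions"] += 1
            self._current["text"] += " "
        elif tag in EMPHASIS:
            self._open_emphasis.append(tag)

    def handle_endtag(self, tag):
        if tag in CHUNK:
            self._flush()
        elif tag in EMPHASIS and tag in self._open_emphasis:
            self._open_emphasis.remove(tag)

    def handle_data(self, data):
        if data.strip():
            self._current["emphasis"].update(self._open_emphasis)
        self._current["text"] += data

    def close(self):
        super().close()
        self._flush()


def clean_html(html: str) -> list[dict]:
    parser = ChunkExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


if __name__ == "__main__":
    page = "<p><b>Fixing the plate</b></p><ul><li>Mark the wall.</li><li>Drill.</li></ul>"
    for chunk in clean_html(page):
        print(chunk)
```

Real pages would also need handling of CSS-based emphasis, which this sketch ignores.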
![Page 14: Text Processing for Procedural Question Answering](https://reader033.fdocuments.us/reader033/viewer/2022052412/558cf59bd8b42a830f8b474d/html5/thumbnails/14.jpg)
2. Clues Collection module
MS tagging (TreeTagger) → clues collection
[Figure : each chunk is described by STRUCTURE (tags), TEXT and CLUES]
Clues :
- Nb of instructions
- Instructions types
- Nb of goals
- Nb of words
- Nb of sentences
- Nb of questions
- Nb of tensed verbs
The output of the Clues Collection module is the list of text chunks with :
- their corresponding typo-dispositional structure
- text with tagged instructions, goals, connectors
- linguistic information
This information is used for :
- titles identification
- instructional compounds identification
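As an illustration of what collecting these clues might look like, the sketch below counts words, sentences, questions and tensed verbs from a chunk that has already been tagged. The input format (a list of `(token, tag)` pairs), the `Clues` dataclass and the Penn-style tag set in `TENSED_TAGS` are assumptions for this example; the tagset actually produced by the tagger may differ and should be substituted. Instruction and goal counts are left out because they depend on the pattern matching step.

```python
import re
from dataclasses import dataclass

# Assumed Penn-style tags for finite verbs; adjust to the tagger's real tagset.
TENSED_TAGS = frozenset({"VBD", "VBP", "VBZ"})
SENTENCE_END = {".", "!", "?"}


@dataclass
class Clues:
    nb_words: int
    nb_sentences: int
    nb_questions: int
    nb_tensed_verbs: int


def collect_clues(tagged, tensed_tags=TENSED_TAGS) -> Clues:
    """Compute surface clues for one chunk given (token, pos_tag) pairs."""
    words = [tok for tok, _ in tagged if re.search(r"\w", tok)]
    ends = [tok for tok, _ in tagged if tok in SENTENCE_END]
    # A chunk without final punctuation (typical of titles) still counts
    # as one sentence.
    ends_open = bool(tagged) and tagged[-1][0] not in SENTENCE_END
    return Clues(
        nb_words=len(words),
        nb_sentences=len(ends) + (1 if ends_open else 0),
        nb_questions=sum(tok == "?" for tok in ends),
        nb_tensed_verbs=sum(tag in tensed_tags for _, tag in tagged),
    )


if __name__ == "__main__":
    question = [("How", "WRB"), ("do", "VBP"), ("I", "PRP"),
                ("fix", "VB"), ("it", "PRP"), ("?", ".")]
    print(collect_clues(question))
```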
![Page 15: Text Processing for Procedural Question Answering](https://reader033.fdocuments.us/reader033/viewer/2022052412/558cf59bd8b42a830f8b474d/html5/thumbnails/15.jpg)
3. Processing each chunk : text or title ?
Identification of unambiguous titles :
- short chunk
- spaced from the rest of the text
- with emphasis
- a single question
Identification of unambiguous paragraphs of text :
- long chunk
- no emphasis
- subdivided
- more than 1 instruction
- presence of tensed verbs
[Figure : TEXT CHUNKS with their TYPE, initially « unknown » for every chunk; after this step the types are title, text, ambiguous, text, text, title, ambiguous, text, ambiguous, ambiguous]
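The first pass can be sketched as a three-way decision over the collected clues. The slide lists the clues but not how they are combined, so the combination below, the threshold `SHORT_MAX_WORDS` and the field names of `Chunk` are assumptions chosen for illustration.

```python
from dataclasses import dataclass

SHORT_MAX_WORDS = 10  # hypothetical threshold for a "short" chunk


@dataclass
class Chunk:
    nb_words: int
    emphasized: bool
    spaced: bool
    subdivided: bool
    nb_instructions: int
    nb_tensed_verbs: int
    is_single_question: bool = False


def classify(chunk: Chunk) -> str:
    """Label a chunk 'title', 'text' or 'ambiguous' (assumed rule combination)."""
    short = chunk.nb_words <= SHORT_MAX_WORDS
    # Unambiguous text: long, subdivided, or carrying several instructions.
    if not short or chunk.subdivided or chunk.nb_instructions > 1:
        return "text"
    # Unambiguous title: a short single question ...
    if chunk.is_single_question:
        return "title"
    # ... or a short, spaced, emphasized chunk without tensed verbs.
    if chunk.emphasized and chunk.spaced and chunk.nb_tensed_verbs == 0:
        return "title"
    # Everything else (e.g. short chunk with no emphasis) stays ambiguous.
    return "ambiguous"


if __name__ == "__main__":
    heading = Chunk(nb_words=6, emphasized=True, spaced=True,
                    subdivided=False, nb_instructions=0, nb_tensed_verbs=0)
    print(classify(heading))
```

Chunks left as `ambiguous` are then resolved from their textual environment.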
![Page 16: Text Processing for Procedural Question Answering](https://reader033.fdocuments.us/reader033/viewer/2022052412/558cf59bd8b42a830f8b474d/html5/thumbnails/16.jpg)
3. Ambiguous chunks : text or title ?
Short chunks with no emphasis
Instruction-like short chunks
Use of textual environment clues :
1. Identify unambiguous titles / paragraphs of text
2. Disambiguate the remaining chunks
![Page 17: Text Processing for Procedural Question Answering](https://reader033.fdocuments.us/reader033/viewer/2022052412/558cf59bd8b42a830f8b474d/html5/thumbnails/17.jpg)
3. Ambiguous chunks : text or title ?
Disambiguation using textual environment clues :
- a series of ambiguous paragraphs becomes text
- an ambiguous paragraph between two paragraphs of text becomes a title
[Figure : TEXT CHUNKS before disambiguation (title, text, ambiguous, text, text, title, ambiguous, text, ambiguous, ambiguous) and after disambiguation (every chunk labelled title or text)]
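The two environment rules can be sketched as a single pass over the sequence of chunk labels. The function name `disambiguate` is hypothetical, and so is the fallback: the slide does not say what happens to an isolated ambiguous chunk that is not between two text chunks, so this sketch labels it text.

```python
def disambiguate(labels: list[str]) -> list[str]:
    """Resolve 'ambiguous' labels from their neighbours.

    Rules taken from the slide:
      - a series of ambiguous chunks becomes text
      - a single ambiguous chunk between two text chunks becomes a title
    Assumption of this sketch: any other ambiguous chunk becomes text.
    """
    out = list(labels)
    i = 0
    while i < len(labels):
        if labels[i] != "ambiguous":
            i += 1
            continue
        # Find the end of the current run of ambiguous chunks.
        j = i
        while j < len(labels) and labels[j] == "ambiguous":
            j += 1
        before = labels[i - 1] if i > 0 else None
        after = labels[j] if j < len(labels) else None
        if j - i == 1 and before == "text" and after == "text":
            new_label = "title"
        else:
            new_label = "text"
        out[i:j] = [new_label] * (j - i)
        i = j
    return out


if __name__ == "__main__":
    print(disambiguate(["text", "ambiguous", "text", "ambiguous", "ambiguous"]))
```

Neighbours are read from the original labels, so a chunk resolved earlier in the pass does not influence later decisions.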
![Page 18: Text Processing for Procedural Question Answering](https://reader033.fdocuments.us/reader033/viewer/2022052412/558cf59bd8b42a830f8b474d/html5/thumbnails/18.jpg)
OUTPUT EXAMPLE
[Figure : annotated output page showing the MAIN GOAL and MAIN TASK, with the goals and tasks identified inside it]
![Page 19: Text Processing for Procedural Question Answering](https://reader033.fdocuments.us/reader033/viewer/2022052412/558cf59bd8b42a830f8b474d/html5/thumbnails/19.jpg)
IV. Main issues : noise in web pages
« noise » of web pages (advertisements, lists of links, navigation help...) interferes with compounds / title identification :
- short sequence
- emphasis
- linguistic form : base form verb at the beginning of a sentence, typical of a title or an instruction
but it is a list of links !!
[Figure labels on the noisy page : titles, instruction, titles]
![Page 20: Text Processing for Procedural Question Answering](https://reader033.fdocuments.us/reader033/viewer/2022052412/558cf59bd8b42a830f8b474d/html5/thumbnails/20.jpg)
IV. Main issues : refining goal/titles identification
- only sub-goals / sub-tasks relations are identified : what about the hierarchy task / sub-task(s) ?
- what about the head title / main goal ?
  - the head title is not always the 1st identified title (noise)
  - sometimes there is no head title
- what if the action is implicit ?
  - ex : « the room and the bed », implicit : how to clean the room and the bed
- some ideas :
  - choose a title that has vocabulary in common with instructions
  - identify action verbs in relation with the nouns of the title
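The first idea, choosing a title that shares vocabulary with the instructions, could be prototyped as a simple word-overlap score. This is only a sketch of the idea: the stop-word list, the tokenisation and the function names `content_words` and `pick_head_title` are assumptions, and no stemming is applied.

```python
import re

# Tiny illustrative stop-word list (assumption, not from the presentation).
STOPWORDS = frozenset({
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
    "how", "your", "you", "it", "is",
})


def content_words(text: str) -> set[str]:
    """Lower-cased alphabetic tokens minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def pick_head_title(titles: list[str], instructions: list[str]):
    """Return the title sharing most vocabulary with the instructions.

    Returns None when no title shares any word, which covers the
    'sometimes there is no head title' case.
    """
    vocab = set()
    for instruction in instructions:
        vocab |= content_words(instruction)

    def overlap(title: str) -> int:
        return len(content_words(title) & vocab)

    best = max(titles, key=overlap, default=None)
    if best is None or overlap(best) == 0:
        return None
    return best


if __name__ == "__main__":
    titles = ["Related links", "Fixing the wall plate", "Contact us"]
    steps = ["Position the face plate on the wall.",
             "Mark the wall in the next screw hole."]
    print(pick_head_title(titles, steps))
```

Because noisy titles such as link lists rarely share words with the instructions, they score zero here; ties are resolved in favour of the earliest title.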
![Page 21: Text Processing for Procedural Question Answering](https://reader033.fdocuments.us/reader033/viewer/2022052412/558cf59bd8b42a830f8b474d/html5/thumbnails/21.jpg)
V. DEMO