A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer...
Transcript of A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer...
1
A Co
mpu
ter
Syst
em f
or A
utom
atic
ally
A
Com
pute
r Sy
stem
for
Aut
omat
ical
ly
Iden
tifyi
ng T
ext
Stru
ctur
e in
Writ
ing
Iden
tifyi
ng T
ext
Stru
ctur
e in
Writ
ing
Laur
ence
Ant
hony
and
Geo
rge
V.
Laur
ence
Ant
hony
and
Geo
rge
V. L
ashk
iaLa
shki
aD
ept.
of C
ompu
ter
Scie
nce,
Fac
ulty
of E
ngin
eerin
gD
ept.
of C
ompu
ter
Scie
nce,
Fac
ulty
of E
ngin
eerin
gO
kaya
ma
Univ
. of S
cien
ce, 1
Oka
yam
a Un
iv. o
f Sci
ence
, 1-- 1
1 Ri
dai
Rida
i -- chocho ,
Oka
yam
a, O
kaya
ma
anth
ony
anth
ony @
ice.
@ic
e.ou
sou
s .ac
..a
c.jp
jp
la
shki
ala
shki
a @ic
e.@
ice.
ous
ous .
ac.
.ac.
jpjp
http
://a
ntpc
1.ic
e.ht
tp:/
/ant
pc1.
ice.
ous
ous .
ac.
.ac.
jpjp
2
Pres
enta
tion
Out
line
Pres
enta
tion
Out
line
Back
grou
ndBa
ckgr
ound
Rese
arch
Aim
Rese
arch
Aim
Syst
em D
esig
nSy
stem
Des
ign
Appl
icat
ion
to R
esea
rch
Abst
ract
sAp
plic
atio
n to
Res
earc
h Ab
stra
cts
Res
ults
Res
ults
Conc
lusi
ons
Conc
lusi
ons
3
Back
grou
ndBa
ckgr
ound
Impo
rtan
ce o
f Te
xt S
truc
ture
Impo
rtan
ce o
f Te
xt S
truc
ture
Swal
es (
1981
, 199
0),
Swal
es (
1981
, 199
0),
Carr
ell
Carr
ell (
1982
)(1
982)
Hin
ds (
1982
, 198
3),
Hin
ds (
1982
, 198
3),
Hoe
y H
oey
(199
4), W
inte
r (1
994)
(199
4), W
inte
r (1
994)
Stud
ies
on T
ext
Stru
ctur
eSt
udie
s on
Tex
t St
ruct
ure
TITL
ESTI
TLES
--D
udle
yD
udle
y --Ev
ans
(199
4), A
ntho
ny (
2001
)Ev
ans
(199
4), A
ntho
ny (
2001
)A
BST
RA
CTS
AB
STR
AC
TS--
Ayer
s (1
993)
, Pos
tegu
illo
(199
6)Ay
ers
(199
3), P
oste
guill
o (1
996)
INST
RO
DU
CTI
ON
SIN
STR
OD
UC
TIO
NS
--Sw
ales
(19
90),
Ant
hony
(19
99)
Swal
es (
1990
), A
ntho
ny (
1999
)D
ISC
USS
ION
SD
ISC
USS
ION
S--
Hop
kins
& D
udle
yH
opki
ns &
Dud
ley --
Evan
s (1
988)
Evan
s (1
988)
PA
TEN
TSP
ATE
NTS
--Ba
zerm
an (
1994
)Ba
zerm
an (
1994
)G
RA
NT
PR
OP
OSA
LSG
RA
NT
PR
OP
OSA
LS--
Conn
or &
Co
nnor
& M
aura
nen
Mau
rane
n (1
999)
(199
9)LE
GA
L W
RIT
ING
LEG
AL
WR
ITIN
G--
Bhat
ia
Bhat
ia (
1993
)(1
993)
4
Back
grou
ndBa
ckgr
ound
Prob
lem
s w
ith A
naly
zing
Tex
t St
ruct
ure
Prob
lem
s w
ith A
naly
zing
Tex
t St
ruct
ure
A la
rge
corp
us o
f te
xt d
ata
A la
rge
corp
us o
f te
xt d
ata
(The
tex
t da
ta m
ust
(The
tex
t da
ta m
ust
‘‘ ACU
RAT
ELY
ACU
RAT
ELY ’’
repr
esen
t w
hat
repr
esen
t w
hat
we
hope
to
stud
y)w
e ho
pe t
o st
udy)
A lo
t of
res
earc
h tim
eA
lot
of r
esea
rch
time
(Tim
e to
ana
lyze
a lo
t of
tex
ts)
(Tim
e to
ana
lyze
a lo
t of
tex
ts)
Goo
d va
lidat
ion
and
relia
bilit
y te
sts
Goo
d va
lidat
ion
and
relia
bilit
y te
sts
Mos
t Te
xt S
truc
ture
Stu
dies
are
Mos
t Te
xt S
truc
ture
Stu
dies
are
‘‘ Sm
all S
cale
Smal
l Sca
le’’
5
Back
grou
ndBa
ckgr
ound
Swal
es (
1981
: p.
13)
Swal
es (
1981
: p.
13)
““ In
effe
ct, t
he d
isco
urse
ana
lyst
labe
ls
In e
ffect
, the
dis
cour
se a
naly
st la
bels
so
met
hing
as
X an
d th
en b
egin
s to
see
X
som
ethi
ng a
s X
and
then
beg
ins
to s
ee X
oc
curr
ing
all o
ver
the
plac
eoc
curr
ing
all o
ver
the
plac
e ””
6
Back
grou
ndBa
ckgr
ound
Hen
ry e
t al
. (20
01)
Hen
ry e
t al
. (20
01)
40 A
pplic
atio
n Le
tter
s40
App
licat
ion
Lett
ers
Taro
neTa
rone
et a
l. (2
000)
et a
l. (2
000)
2 Ph
ysic
s Res
earc
h Ar
ticle
s2
Phys
ics
Res
earc
h Ar
ticle
s
Conn
or e
t al
. (19
99)
Conn
or e
t al
. (19
99)
34 G
rant
Pro
posa
ls34
Gra
nt P
ropo
sals
Will
iam
s (1
999)
Will
iam
s (1
999)
5 M
edic
al R
esea
rch
Artic
les
5 M
edic
al R
esea
rch
Artic
les
Anth
ony
(199
9)An
thon
y (1
999)
12 C
ompu
ter
Scie
nce
Rese
arch
Art
icle
12
Com
pute
r Sc
ienc
e Re
sear
ch A
rtic
le
Intr
oduc
tions
Intr
oduc
tions
7
Res
earc
h Ai
mRes
earc
h Ai
m
Dev
elop
a C
ompu
ter
Syst
em t
o Pr
oces
s D
evel
op a
Com
pute
r Sy
stem
to
Proc
ess
and
Anal
yze
Text
Str
uctu
re A
utom
atic
ally
and
Anal
yze
Text
Str
uctu
re A
utom
atic
ally
––A A
‘‘ Lea
rnin
g Sy
stem
Lear
ning
Sys
tem
’’fo
r te
xt s
truc
ture
for
text
str
uctu
re
Easy
to
Proc
ess
a La
rge
Corp
us o
f Te
xt D
ata
Easy
to
Proc
ess
a La
rge
Corp
us o
f Te
xt D
ata
Fast
Fast
The
anal
ytic
pro
cess
is c
lear
ly d
efin
edTh
e an
alyt
ic p
roce
ss is
cle
arly
def
ined
Easy
to
test
the
rel
iabi
lity
and
valid
ity
Easy
to
test
the
rel
iabi
lity
and
valid
ity
8
Syst
em D
esig
nSy
stem
Des
ign
‘‘ Uns
uper
vise
d Le
arni
ngU
nsup
ervi
sed
Lear
ning
’’ vs.
.vs
..‘‘ S
uper
vise
d Le
arni
ngSu
perv
ised
Lea
rnin
g ’’??
In U
nsup
ervi
sed
Lear
ning
,In
Uns
uper
vise
d Le
arni
ng,
Giv
e th
e sy
stem
tex
t ex
ampl
esG
ive
the
syst
em t
ext
exam
ples
Tell
the
syst
em w
hat
Tell
the
syst
em w
hat
‘‘ feat
ures
feat
ures
’’ to
look
at
to lo
ok a
tLe
t th
e sy
stem
fin
d a
mod
el (
set
of c
lass
es)
by
Let
the
syst
em f
ind
a m
odel
(se
t of
cla
sses
) by
de
finin
g a
rela
tion
betw
een
the
feat
ures
and
the
de
finin
g a
rela
tion
betw
een
the
feat
ures
and
the
ex
ampl
esex
ampl
esCl
assi
fy n
ew t
ext
exam
ples
by
com
paris
on w
ith
Clas
sify
new
tex
t ex
ampl
es b
y co
mpa
rison
with
fe
atur
es in
eac
h cl
ass
feat
ures
in e
ach
clas
s
9
Uns
uper
vise
d Le
arni
ngU
nsup
ervi
sed
Lear
ning
Giv
e th
e sy
stem
tex
t ex
ampl
esG
ive
the
syst
em t
ext
exam
ples
Text
1:
Onc
e up
on a
tim
e, t
here
was
a u
gly
duck
ling.
Text
1:
Onc
e up
on a
tim
e, t
here
was
a u
gly
duck
ling.
Text
2:
It li
ved
on a
lake
.Te
xt 2
: It
live
d on
a la
ke.
Text
3:
One
day
, the
litt
le b
ird t
urne
d in
to a
sw
an.
Text
3:
One
day
, the
litt
le b
ird t
urne
d in
to a
sw
an.
Text
4:
It li
ved
happ
ily, e
ver,
aft
er.
Text
4:
It li
ved
happ
ily, e
ver,
aft
er.
Tell
the
syst
em w
hat
Tell
the
syst
em w
hat
‘‘ feat
ures
feat
ures
’’ to
look
at
to lo
ok a
tAl
l wor
ds e
xcep
t ar
ticle
s, N
o pu
nctu
atio
nAl
l wor
ds e
xcep
t ar
ticle
s, N
o pu
nctu
atio
n
Def
ine
a re
latio
n be
twee
n fe
atur
es a
nd
Def
ine
a re
latio
n be
twee
n fe
atur
es a
nd
exam
ples
exam
ples
Clas
s 1
Clas
s 1
--on
ce, u
pon,
tim
e, t
here
, was
, ugl
y, d
uckl
ing
once
, upo
n, t
ime,
the
re, w
as, u
gly,
duc
klin
gCl
ass
2 Cl
ass
2 --
one,
day
, litt
le, b
ird, t
urne
d, in
to, s
wan
one,
day
, litt
le, b
ird, t
urne
d, in
to, s
wan
Clas
s 3
Clas
s 3
--it,
live
d, o
n, la
ke, h
appi
ly, e
ver,
aft
erit,
live
d, o
n, la
ke, h
appi
ly, e
ver,
aft
er
10
Uns
uper
vise
d Le
arni
ngU
nsup
ervi
sed
Lear
ning
Uns
uper
vise
d le
arni
ng s
yste
m m
odel
s of
ten
Uns
uper
vise
d le
arni
ng s
yste
m m
odel
s of
ten
DO
NO
T m
atch
our
mod
els
DO
NO
T m
atch
our
mod
els
Clas
sify
new
tex
t ex
ampl
esCl
assi
fy n
ew t
ext
exam
ples
Onc
e up
on a
tim
e, t
here
wer
e 3
bear
s (B
EG)
Onc
e up
on a
tim
e, t
here
wer
e 3
bear
s (B
EG)
The
3 be
ars
lived
in a
big
hou
se. (
MID
)Th
e 3
bear
s liv
ed in
a b
ig h
ouse
. (M
ID)
They
all
stay
ed in
the
hou
se h
appi
ly e
ver
afte
r. (
END
)Th
ey a
ll st
ayed
in t
he h
ouse
hap
pily
eve
r af
ter.
(EN
D)
The
syst
em w
ill d
ecid
e Th
e sy
stem
will
dec
ide
……Cl
ass
1 (m
atch
ing
Clas
s 1
(mat
chin
g ‘‘ o
nce
once
’’ , , ‘‘ u
pon
upon
’’ , , ‘‘ ti
me
time ’’
, , ‘‘ th
ere
ther
e ’’))
Clas
s 3
(mat
chin
g Cl
ass
3 (m
atch
ing
‘‘ live
dliv
ed’’ ))
Clas
s 3
(mat
chin
g Cl
ass
3 (m
atch
ing
‘‘ hap
pily
happ
ily’’ , ,
‘‘ eve
rev
er’’ , ,
‘‘ aft
eraf
ter ’’ ))
11
Syst
em D
esig
nSy
stem
Des
ign
‘‘ Uns
uper
vise
d Le
arni
ngU
nsup
ervi
sed
Lear
ning
’’ vsvs..
‘‘ Sup
ervi
sed
Lear
ning
Supe
rvis
ed L
earn
ing ’’
??In
Sup
ervi
sed
Lear
ning
,In
Sup
ervi
sed
Lear
ning
,G
ive
the
syst
em a
str
uctu
re m
odel
(se
t of
cla
sses
)G
ive
the
syst
em a
str
uctu
re m
odel
(se
t of
cla
sses
)G
ive
the
syst
em e
xam
ples
of
the
mod
elG
ive
the
syst
em e
xam
ples
of
the
mod
elTe
ll th
e sy
stem
wha
t Te
ll th
e sy
stem
wha
t ‘‘ fe
atur
esfe
atur
es’’ t
o lo
ok a
tto
look
at
Def
ine
a re
latio
n be
twee
n th
e cl
asse
s an
d th
e D
efin
e a
rela
tion
betw
een
the
clas
ses
and
the
feat
ures
feat
ures
Clas
sify
new
tex
t ex
ampl
es b
y co
mpa
ring
its
Clas
sify
new
tex
t ex
ampl
es b
y co
mpa
ring
its
feat
ures
with
tho
se in
eac
h cl
ass
feat
ures
with
tho
se in
eac
h cl
ass
12
Supe
rvis
ed L
earn
ing
Supe
rvis
ed L
earn
ing
Giv
e th
e sy
stem
a s
truc
ture
mod
elG
ive
the
syst
em a
str
uctu
re m
odel
(set
of
clas
ses)
(set
of
clas
ses)
Clas
s 1:
BEG
INN
ING
Clas
s 1:
BEG
INN
ING
Clas
s 2:
MID
DLE
Clas
s 2:
MID
DLE
Clas
s 3:
EN
DCl
ass
3: E
ND
Giv
e th
e sy
stem
exa
mpl
es o
f th
e m
odel
Giv
e th
e sy
stem
exa
mpl
es o
f th
e m
odel
BEG
: O
nce
upon
a t
ime,
the
re w
as a
ugl
y du
cklin
g.BE
G:
Onc
e up
on a
tim
e, t
here
was
a u
gly
duck
ling.
MID
: It
live
d on
a la
ke.
MID
: It
live
d on
a la
ke.
MID
: O
ne d
ay, t
he li
ttle
bird
tur
ned
into
a s
wan
.M
ID:
One
day
, the
litt
le b
ird t
urne
d in
to a
sw
an.
END
: It
live
d ha
ppily
, eve
r, a
fter
.EN
D:
It li
ved
happ
ily, e
ver,
aft
er.
Tell
the
syst
em w
hat
Tell
the
syst
em w
hat
‘‘ feat
ures
feat
ures
’’ to
look
at
to lo
ok a
tAl
l wor
ds e
xcep
t ar
ticle
s...,
No
punc
tuat
ion
All w
ords
exc
ept
artic
les.
.., N
o pu
nctu
atio
n
13
Supe
rvis
ed L
earn
ing
Supe
rvis
ed L
earn
ing
Def
ine
a re
latio
n be
twee
n cl
asse
s an
d fe
atur
esD
efin
e a
rela
tion
betw
een
clas
ses
and
feat
ures
Clas
s 1
(BEG
) Cl
ass
1 (B
EG)
--on
ce, u
pon,
tim
e, t
here
, was
, ug
ly, du
cklin
gon
ce, u
pon,
tim
e, t
here
, was
, ug
ly, du
cklin
gCl
ass
2 (M
ID)
Clas
s 2
(MID
) --
it, li
ved,
on,
lake
, one
, da
y, li
ttle
, bi
rd, t
urne
d,
it, li
ved,
on,
lake
, one
, da
y, li
ttle
, bi
rd, t
urne
d,
into
, sw
anin
to, s
wan
Clas
s 3
(EN
D)
Clas
s 3
(EN
D)
--liv
ed, h
appi
ly, e
ver,
aft
erliv
ed, h
appi
ly, e
ver,
aft
er
Clas
sify
new
tex
t ex
ampl
esCl
assi
fy n
ew t
ext
exam
ples
Onc
e up
on a
tim
e, t
here
wer
e 3
bear
s (B
EG)
Onc
e up
on a
tim
e, t
here
wer
e 3
bear
s (B
EG)
The
3 be
ars
lived
in a
big
hou
se. (
MID
)Th
e 3
bear
s liv
ed in
a b
ig h
ouse
. (M
ID)
They
all
lived
in t
he h
ouse
hap
pily
eve
r af
ter.
(EN
D)
They
all
lived
in t
he h
ouse
hap
pily
eve
r af
ter.
(EN
D)
The
syst
em w
ill d
ecid
eTh
e sy
stem
will
dec
ide ……
Clas
s 1
(BEG
) (m
atch
ing
Clas
s 1
(BEG
) (m
atch
ing
‘‘ onc
eon
ce’’ , ,
‘‘ upo
nup
on’’ , ,
‘‘ tim
etim
e ’’, , ‘‘ th
ere
ther
e ’’))
Clas
s 2
(MID
) (m
atch
ing
Clas
s 2
(MID
) (m
atch
ing
‘‘ live
dliv
ed’’ ))
Clas
s 3
(EN
D)
(mat
chin
g Cl
ass
3 (E
ND
) (m
atch
ing
‘‘ live
dliv
ed’’ , ,
‘‘ hap
pily
happ
ily’’ , ,
‘‘ eve
rev
er’’ , ,
‘‘ aft
eraf
ter ’’ ))
14
Supe
rvis
ed L
earn
ing
Supe
rvis
ed L
earn
ing
Prob
lem
sPr
oble
ms
We
need
a
We
need
a ‘‘ g
ood
good
’’ mod
el o
f st
ruct
ure
mod
el o
f st
ruct
ure
––Bu
t th
ere
are
man
y m
odel
s of
str
uctu
re in
the
lite
ratu
reBu
t th
ere
are
man
y m
odel
s of
str
uctu
re in
the
lite
ratu
re
We
need
a s
et o
f W
e ne
ed a
set
of
‘‘ labe
led
exam
ples
labe
led
exam
ples
’’––
But
man
y sy
stem
s w
ork
wel
l with
onl
y a
few
labe
led
But
man
y sy
stem
s w
ork
wel
l with
onl
y a
few
labe
led
exam
ples
exam
ples
We
need
a
We
need
a ‘‘ g
ood
good
’’ set
of
feat
ures
set
of f
eatu
res
––Bu
t la
ngua
ge c
onta
ins
a LO
T of
red
unda
nt w
ords
!Bu
t la
ngua
ge c
onta
ins
a LO
T of
red
unda
nt w
ords
!(e
.g. a
, the
, of, in
, (e
.g. a
, the
, of, in
, ……
.).)––
Build
ing
a lis
t of
fea
ture
s by
han
d is
infe
asib
leBu
ildin
g a
list
of f
eatu
res
by h
and
is in
feas
ible
We
need
a
We
need
a ‘‘ g
ood
good
’’ rel
atio
n be
twee
n th
e cl
asse
s an
d re
latio
n be
twee
n th
e cl
asse
s an
d th
e fe
atur
es??
?th
e fe
atur
es??
?––
In p
ract
ice,
ver
y si
mpl
e re
latio
nshi
ps a
re e
ffec
tive!
In p
ract
ice,
ver
y si
mpl
e re
latio
nshi
ps a
re e
ffec
tive!
15
Appl
icat
ion
of S
yste
m t
o Ap
plic
atio
n of
Sys
tem
to
Res
earc
h Ab
stra
cts
Res
earc
h Ab
stra
cts
Giv
e th
e sy
stem
a s
truc
ture
mod
el:
Giv
e th
e sy
stem
a s
truc
ture
mod
el:
‘‘ Mod
ifie
dM
odif
ied ’’
CAR
S M
odel
(Sw
ales
, 199
0: A
ntho
ny, 1
999)
CAR
S M
odel
(Sw
ales
, 199
0: A
ntho
ny, 1
999)
Mov
e 1
Mov
e 1
Esta
blis
hing
Es
tabl
ishi
ng
1.1
1.1
Cla
imin
g c
entr
alit
yC
laim
ing
cen
tral
ity
a Te
rrito
ry
a Te
rrito
ry
1.2
1.2
Mak
ing
top
ic g
ener
aliz
atio
ns
Mak
ing
top
ic g
ener
aliz
atio
ns
1.3
1.3
Rev
iew
ing
item
s of
pre
viou
s re
sear
chR
evie
win
g it
ems
of p
revi
ous
rese
arch
Mov
e 2
Mov
e 2
Esta
blis
hing
Es
tabl
ishi
ng
2.1
A2
.1A
Cou
nte
r cl
amin
gC
oun
ter
clam
ing
a ni
che
a ni
che
2.1
B
2.1
B
Ind
icat
ing
a g
apIn
dic
atin
g a
gap
2.1
C
2.1
C
Qu
esti
on r
aisi
ng
Qu
esti
on r
aisi
ng
2.1
D
2.1
D
Con
tin
uin
g a
tra
diti
onC
onti
nu
ing
a t
radi
tion
Mov
e 3
Mov
e 3
Occ
upyi
ng
Occ
upyi
ng
3.1
A
3.1
A
Ou
tlin
ing
purp
ose
Ou
tlin
ing
purp
ose
the
nich
eth
e ni
che
3.1
B
3.1
B
An
nou
nci
ng
pres
ent
rese
arch
An
nou
nci
ng
pres
ent
rese
arch
3.2
3.2
An
nou
nci
ng
prin
cipa
l fin
din
gsA
nn
oun
cin
g pr
inci
pal f
ind
ings
3.3
3.3
Eval
uat
ion
of
rese
arch
Eval
uat
ion
of
rese
arch
3.4
3.4
Ind
icat
ing
RA
str
uct
ure
Ind
icat
ing
RA
str
uct
ure
16
Appl
icat
ion
of S
yste
m t
o Ap
plic
atio
n of
Sys
tem
to
Res
earc
h Ab
stra
cts
Res
earc
h Ab
stra
cts
Giv
e th
e sy
stem
exa
mpl
es o
f th
e m
odel
Giv
e th
e sy
stem
exa
mpl
es o
f th
e m
odel
100
Abst
ract
s (I
EEE
Tran
s. o
n PD
S) d
ivid
ed in
to10
0 Ab
stra
cts
(IEE
E Tr
ans.
on
PDS)
div
ided
into
692
labe
led
69
2 la
bele
d ‘‘ S
teps
Uni
tsSt
eps
Uni
ts’’ (
only
exa
mpl
es f
rom
6 c
lass
es)
(onl
y ex
ampl
es f
rom
6 c
lass
es)
554
Step
Uni
ts (
80%
) us
ed f
or
554
Step
Uni
ts (
80%
) us
ed f
or ‘‘ t
rain
ing
trai
ning
’’ the
sys
tem
the
syst
em13
8 St
ep U
nits
(20
%)
used
for
13
8 St
ep U
nits
(20
%)
used
for
‘‘ tes
ting
test
ing ’’
the
syst
emth
e sy
stem
Tell
the
syst
em w
hat
Tell
the
syst
em w
hat
‘‘ feat
ures
feat
ures
’’ to
look
at
to lo
ok a
tAl
l wor
d cl
uste
rs u
p to
5 w
ords
long
All w
ord
clus
ters
up
to 5
wor
ds lo
ngPo
sitio
n of
ste
p un
it in
abs
trac
t (i.
e. 1
st, 2
nd, 3
rd,
Posi
tion
of s
tep
unit
in a
bstr
act
(i.e.
1st
, 2nd
, 3rd
, ……
))
(Red
uce
(Red
uce
‘‘ Noi
seN
oise
’’ in
Feat
ures
)in
Fea
ture
s)Ran
k w
ords
by
Ran
k w
ords
by
‘‘ impo
rtan
ceim
port
ance
’’ usi
ng:
usin
g:ra
w f
requ
ency
, ra
w f
requ
ency
, in
form
atio
n ga
inin
form
atio
n ga
inU
se o
nly
high
ran
ked
wor
dsU
se o
nly
high
ran
ked
wor
ds
17
Appl
icat
ion
of S
yste
m t
o Ap
plic
atio
n of
Sys
tem
to
Res
earc
h Ab
stra
cts
Res
earc
h Ab
stra
cts
Ther
e w
ere
man
y pe
ople
in t
he p
ark.
Ther
e w
ere
man
y pe
ople
in t
he p
ark.
––1
wor
d1
wor
dth
ere/
wer
e/ m
any/
peo
ple/
in/
the/
par
kth
ere/
wer
e/ m
any/
peo
ple/
in/
the/
par
k
––2
wor
ds2
wor
dsth
ere
wer
e/ w
ere
man
y /
man
y pe
ople
/th
ere
wer
e/ w
ere
man
y /
man
y pe
ople
/pe
ople
in /
in t
he /
the
par
kpe
ople
in /
in t
he /
the
par
k
––3
wor
ds3
wor
dsth
ere
wer
e m
any
/ w
ere
man
y pe
ople
/ th
ere
wer
e m
any
/ w
ere
man
y pe
ople
/ m
any
peop
le in
/ p
eopl
e in
the
/m
any
peop
le in
/ p
eopl
e in
the
/in
the
par
kin
the
par
k
––……
18
Appl
icat
ion
of S
yste
m t
o Ap
plic
atio
n of
Sys
tem
to
Res
earc
h Ab
stra
cts
Res
earc
h Ab
stra
cts
Ther
e w
ere
man
y pe
ople
in t
he p
ark.
Ther
e w
ere
man
y pe
ople
in t
he p
ark.
––1
wor
d1
wor
dth
ere
ther
e / / w
ere
wer
e / / m
any
man
y / / pe
ople
peop
le/ /
inin/ /
the
the /
/ pa
rkpa
rk
––2
wor
ds2
wor
dsth
ere
wer
eth
ere
wer
e / w
ere
man
y /
/ w
ere
man
y /
man
y pe
ople
man
y pe
ople
//pe
ople
in /
in t
he /
the
par
kpe
ople
in /
in t
he /
the
par
k
––3
wor
ds3
wor
dsth
ere
wer
e m
any
ther
e w
ere
man
y/
wer
e m
any
peop
le/
/ w
ere
man
y pe
ople
/ m
any
peop
le in
/ p
eopl
e in
the
/m
any
peop
le in
/ p
eopl
e in
the
/in
the
par
kin
the
par
k
––……
19
Appl
icat
ion
of S
yste
m t
o Ap
plic
atio
n of
Sys
tem
to
Res
earc
h Ab
stra
cts
Res
earc
h Ab
stra
cts
Def
ine
a re
latio
n be
twee
n fe
atur
es a
nd m
odel
Def
ine
a re
latio
n be
twee
n fe
atur
es a
nd m
odel
––U
se p
roba
bilit
y of
wor
ds (
feat
ures
) be
ing
in e
ach
clas
sU
se p
roba
bilit
y of
wor
ds (
feat
ures
) be
ing
in e
ach
clas
sPr
obab
ility
= F
requ
ency
of
Wor
d/To
tal n
umbe
r of
wor
dsPr
obab
ility
= F
requ
ency
of
Wor
d/To
tal n
umbe
r of
wor
ds
Clas
s 1
(Cla
imin
g Ce
ntra
lity)
Clas
s 1
(Cla
imin
g Ce
ntra
lity)
Clas
s 2
(Mak
ing
topi
c ge
nera
lizat
ions
)Cl
ass
2 (M
akin
g to
pic
gene
raliz
atio
ns)
Clas
s 3
(Ind
icat
ing
a ga
p)
Clas
s 3
(Ind
icat
ing
a ga
p)
Clas
s 4
(Out
linin
g pu
rpos
e)Cl
ass
4 (O
utlin
ing
purp
ose)
Clas
s 5
(Ann
ounc
ing
prin
cipa
l fin
ding
s)
Clas
s 5
(Ann
ounc
ing
prin
cipa
l fin
ding
s)
Clas
s 6
(Eva
luat
ion
of r
esea
rch)
Cl
ass
6 (E
valu
atio
n of
res
earc
h)
Clas
s 1:
W
ord
1 Cl
ass
1:
Wor
d 1
prob
prob
Wor
d 2
Wor
d 2
prob
prob
Wor
d 3
Wor
d 3
prob
prob
. . ……
Clas
s 2:
W
ord
1 Cl
ass
2:
Wor
d 1
prob
pr
ob
Wor
d 2
Wor
d 2
prob
prob
Wor
d 3
Wor
d 3
prob
prob
. . ……
Clas
s 3:
Cl
ass
3:
Wor
d 1
Wor
d 1
prob
pr
ob
Wor
d 2
Wor
d 2
prob
prob
Wor
d 3
Wor
d 3
prob
prob
. . ……
Clas
s 4:
W
ord
1Cl
ass
4:
Wor
d 1
prob
prob
Wor
d 2
Wor
d 2
prob
prob
Wor
d 3
Wor
d 3
prob
prob
. . ……
Clas
s 5:
W
ord
1Cl
ass
5:
Wor
d 1
prob
prob
Wor
d 2
Wor
d 2
prob
prob
Wor
d 3
Wor
d 3
prob
prob
. . ……
Clas
s 6:
W
ord
1Cl
ass
6:
Wor
d 1
prob
prob
Wor
d 2
Wor
d 2
prob
prob
Wor
d 3
Wor
d 3
prob
prob
. . ……
20
Appl
icat
ion
of S
yste
m t
o Ap
plic
atio
n of
Sys
tem
to
Res
earc
h Ab
stra
cts
Res
earc
h Ab
stra
cts
Clas
sify
new
tex
t ex
ampl
esCl
assi
fy n
ew t
ext
exam
ples
For
each
new
tex
t, c
hoos
e cl
ass
with
hig
hest
Fo
r ea
ch n
ew t
ext,
cho
ose
clas
s w
ith h
ighe
st
prob
abili
ty o
f ha
ving
wor
ds (
feat
ures
)pr
obab
ility
of
havi
ng w
ords
(fe
atur
es)
e.g.
New
Tex
t on
ly h
as f
eatu
res
3, 8
, 10
e.g.
New
Tex
t on
ly h
as f
eatu
res
3, 8
, 10
Clas
s 1
P= p
.cla
ss 1
x
Clas
s 1
P= p
.cla
ss 1
x p
. f3
x p.
f8
x p.
f10
p.
f3
x p.
f8
x p.
f10
=
1.5
= 1
.5Cl
ass
2 P=
Cl
ass
2 P=
p.c
lass
2 x
p.
clas
s 2
x ......
= 1
.8=
1.8
Clas
s 3
P= p
.cla
ss 3
x
Clas
s 3
P= p
.cla
ss 3
x .
.....=
2.7
= 2
.7Cl
ass
4 P=
p.c
lass
4 x
Cl
ass
4 P=
p.c
lass
4 x
......
= 2
.3=
2.3
Clas
s 5
P=
Clas
s 5
P= p
.cla
ss 5
x
p.cl
ass
5 x
......=
1.8
= 1
.8Cl
ass
6 P=
p.c
lass
6 x
Cl
ass
6 P=
p.c
lass
6 x
......
= 1
.2=
1.2
Choo
se C
lass
3Ch
oose
Cla
ss 3
21
Res
ults
Res
ults
Clas
sific
atio
n Ac
cura
cy (
Ove
rall)
Clas
sific
atio
n Ac
cura
cy (
Ove
rall)
554
Step
Uni
ts u
sed
for
554
Step
Uni
ts u
sed
for
‘‘ trai
ning
trai
ning
’’ the
sys
tem
the
syst
em13
8 St
ep U
nits
use
d fo
r 13
8 St
ep U
nits
use
d fo
r ‘‘ te
stin
gte
stin
g ’’th
e sy
stem
the
syst
em
No.
of F
eatu
res
Acc
urac
y(R
aw F
requ
ency
)A
ccur
acy
(Inf
orm
atio
n G
ain)
2208
(all)
56 %
-10
0051
%70
%70
056
%70
%50
059
%69
%30
059
%69
%10
054
%-
Not
e: R
ando
m g
uess
ing
has
an a
ccur
acy
of 1
6.66
% (
NO
T 50
%!)
Not
e: R
ando
m g
uess
ing
has
an a
ccur
acy
of 1
6.66
% (
NO
T 50
%!)
Choo
sing
the
mos
t co
mm
on c
lass
= 2
6%
Choo
sing
the
mos
t co
mm
on c
lass
= 2
6%
22
Res
ults
Res
ults
Clas
sific
atio
n Ac
cura
cy (
Each
Ste
p U
nit)
Clas
sific
atio
n Ac
cura
cy (
Each
Ste
p U
nit)
Num
ber
of f
eatu
res
= 7
00N
umbe
r of
fea
ture
s =
700
Rank
ed b
y Ra
nked
by
info
rmat
ion
gain
info
rmat
ion
gain
mea
sure
mea
sure
Accu
racy
(ov
eral
l) =
70%
Accu
racy
(ov
eral
l) =
70%
Clas
sSt
ep 1.
1St
ep 1.
2St
ep 2.
1bSt
ep 3
.1bSt
ep 3.
2St
ep 3.
3St
ep 1.
12
(43
%)
40
01
0St
ep 1.
20
17 (7
7 %
)0
04
1St
ep 2.
1b0
21
(17
%)
02
1St
ep 3.
1b0
00
34 (9
2 %
)3
0St
ep 3.
20
20
225
(66
%)
9St
ep 3.
30
10
28
17 (6
1 %
)
Not
e: C
lass
ifica
tions
cor
resp
ond
with
CAR
S M
odel
N
ote:
Cla
ssifi
catio
ns c
orre
spon
d w
ith C
ARS
Mod
el ‘‘ m
oves
mov
es’’
(Acc
urac
y=88
% w
hen
usin
g (A
ccur
acy=
88%
whe
n us
ing
‘‘ sec
ond
opin
ion
seco
nd o
pini
on’’ ))
The
syst
em m
akes
the
sam
e m
ista
kes
as h
uman
s.Th
e sy
stem
mak
es t
he s
ame
mis
take
s as
hum
ans.
23
Res
ults
Res
ults
Clas
sific
atio
n Ac
cura
cyCl
assi
ficat
ion
Accu
racy
(For
diff
eren
t da
ta s
ets)
(For
diff
eren
t da
ta s
ets)
Num
ber
of f
eatu
res
= 7
00 (
Rank
ed b
y in
form
atio
n ga
in)
Num
ber
of f
eatu
res
= 7
00 (
Rank
ed b
y in
form
atio
n ga
in)
––D
ata
Set
1 Ac
cura
cy =
70%
Dat
a Se
t 1
Accu
racy
= 7
0%––
Dat
a Se
t 2
Accu
racy
= 6
9%D
ata
Set
2 Ac
cura
cy =
69%
––D
ata
Set
3 Ac
cura
cy =
69%
Dat
a Se
t 3
Accu
racy
= 6
9%
Clas
sific
atio
n Ac
cura
cyCl
assi
ficat
ion
Accu
racy
(For
diff
eren
t da
ta s
ets)
(For
diff
eren
t da
ta s
ets)
(Usi
ng 1
st a
nd 2
nd r
anke
d cl
assi
ficat
ion
(Usi
ng 1
st a
nd 2
nd r
anke
d cl
assi
ficat
ion
--‘‘ S
econ
d O
pini
onSe
cond
Opi
nion
’’ ) )
Num
ber
of f
eatu
res
= 7
00 (
Rank
ed b
y in
form
atio
n ga
in)
Num
ber
of f
eatu
res
= 7
00 (
Rank
ed b
y in
form
atio
n ga
in)
––D
ata
Set
1 Ac
cura
cy =
88%
Dat
a Se
t 1
Accu
racy
= 8
8%––
Dat
a Se
t 2
Accu
racy
= 8
6%D
ata
Set
2 Ac
cura
cy =
86%
––D
ata
Set
3 Ac
cura
cy =
86%
Dat
a Se
t 3
Accu
racy
= 8
6%
24
Res
ults
Res
ults
A A ‘‘ W
indo
ws
Win
dow
s ’’In
terf
ace
Inte
rfac
eTo
ena
ble
rese
arch
ers
to u
se t
he s
yste
m it
nee
ds
To e
nabl
e re
sear
cher
s to
use
the
sys
tem
it n
eeds
to
be
easi
ly a
cces
sibl
e vi
a a
to b
e ea
sily
acc
essi
ble
via
a ‘‘ w
indo
ws
win
dow
s ’’in
terf
ace
inte
rfac
e
A A ‘‘ w
indo
ws
win
dow
s ’’sy
stem
has
bee
n bu
ilt u
sing
the
sy
stem
has
bee
n bu
ilt u
sing
the
pr
ogra
mm
ing
lang
uage
PER
L 5.
6 an
d PE
RL/
prog
ram
min
g la
ngua
ge P
ERL
5.6
and
PERL
/ Tk
Tk––
The
syst
em o
ffer
s su
gges
tions
abo
ut t
he s
truc
ture
of
The
syst
em o
ffer
s su
gges
tions
abo
ut t
he s
truc
ture
of
new
tex
tsne
w t
exts
––Th
e st
ruct
ure
sugg
estio
ns c
an b
e ed
ited/
corr
ecte
dTh
e st
ruct
ure
sugg
estio
ns c
an b
e ed
ited/
corr
ecte
d––
The
new
tex
ts c
an b
e ad
ded
to t
he d
atab
ase
of t
rain
ing
The
new
tex
ts c
an b
e ad
ded
to t
he d
atab
ase
of t
rain
ing
exam
ple
text
sex
ampl
e te
xts
––Th
e sy
stem
can
Th
e sy
stem
can
‘‘ rel
earn
rele
arn ’’
the
stru
ctur
e an
d im
prov
e ov
er
the
stru
ctur
e an
d im
prov
e ov
er
time
time
25
Conc
lusi
ons
Conc
lusi
ons
A co
mpu
ter
syst
em w
as d
evel
oped
to
A co
mpu
ter
syst
em w
as d
evel
oped
to
anal
yze
text
str
uctu
rean
alyz
e te
xt s
truc
ture
Lear
ning
met
hod:
Le
arni
ng m
etho
d: ‘‘ S
uper
vise
d Le
arni
ngSu
perv
ised
Lea
rnin
g ’’Tr
aini
ng e
xam
ples
: 55
4Tr
aini
ng e
xam
ples
: 55
4Te
stin
g ex
ampl
e: 1
38Te
stin
g ex
ampl
e: 1
38Ac
cura
cy 7
0% (
88%
whe
n us
ing
seco
nd o
pini
on)
Accu
racy
70%
(88
% w
hen
usin
g se
cond
opi
nion
)
Syst
em e
rror
s ar
e si
mila
r to
tho
se m
ade
Syst
em e
rror
s ar
e si
mila
r to
tho
se m
ade
by h
uman
sby
hum
ans
The
accu
racy
nee
ds t
o be
impr
oved
The
accu
racy
nee
ds t
o be
impr
oved
Curr
ently
wor
king
on
bett
er f
eatu
re s
elec
tion
Curr
ently
wor
king
on
bett
er f
eatu
re s
elec
tion
26
Conc
lusi
ons
Conc
lusi
ons
The
syst
em r
uns
in a
Th
e sy
stem
run
s in
a ‘‘ w
indo
ws
win
dow
s ’’en
viro
nmen
ten
viro
nmen
tTh
e sy
stem
off
ers
The
syst
em o
ffer
s ‘‘ s
ugge
stio
nssu
gges
tions
’’ whi
ch c
an
whi
ch c
an
be e
dite
d by
the
use
rbe
edi
ted
by t
he u
ser
The
The
‘‘ win
dow
sw
indo
ws ’’
inte
rfac
e ne
eds
to b
e in
terf
ace
need
s to
be
enha
nced
enha
nced
I ho
pe t
o m
ake
a co
mpl
ete
envi
ronm
ent
to h
elp
I ho
pe t
o m
ake
a co
mpl
ete
envi
ronm
ent
to h
elp
rese
arch
ers
solv
e m
any
rese
arch
ers
solv
e m
any
‘‘ sup
ervi
sed
lear
ning
supe
rvis
ed le
arni
ng’’
prob
lem
spr
oble
ms
––M
ove
anal
ysis
, Tex
t ca
tego
rizat
ion,
Aut
hor
auth
entic
ity
Mov
e an
alys
is, T
ext
cate
goriz
atio
n, A
utho
r au
then
ticity
et
c.et
c.
27
A Co
mpu
ter
Syst
em f
or A
utom
atic
ally
A
Com
pute
r Sy
stem
for
Aut
omat
ical
ly
Iden
tifyi
ng T
ext
Stru
ctur
e in
Writ
ing
Iden
tifyi
ng T
ext
Stru
ctur
e in
Writ
ing
Laur
ence
Ant
hony
and
Geo
rge
V.
Laur
ence
Ant
hony
and
Geo
rge
V. L
ashk
iaLa
shki
aD
ept.
of C
ompu
ter
Scie
nce,
Fac
ulty
of E
ngin
eerin
gD
ept.
of C
ompu
ter
Scie
nce,
Fac
ulty
of E
ngin
eerin
gO
kaya
ma
Univ
. of S
cien
ce, 1
Oka
yam
a Un
iv. o
f Sci
ence
, 1-- 1
1 Ri
dai
Rida
i -- chocho ,
Oka
yam
a, O
kaya
ma
anth
ony
anth
ony @
ice.
@ic
e.ou
sou
s .ac
..a
c.jp
jp
la
shki
ala
shki
a @ic
e.@
ice.
ous
ous .
ac.
.ac.
jpjp
http
://a
ntpc
1.ic
e.ht
tp:/
/ant
pc1.
ice.
ous
ous .
ac.
.ac.
jpjp