A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer...

27
1 A Computer System for Automatically A Computer System for Automatically Identifying Text Structure in Writing Identifying Text Structure in Writing Laurence Anthony and George V. Laurence Anthony and George V. Lashkia Lashkia Dept. of Computer Science, Faculty of Engineering Dept. of Computer Science, Faculty of Engineering Okayama Univ. of Science, 1 Okayama Univ. of Science, 1 - - 1 1 Ridai Ridai - - cho cho , Okayama , Okayama anthony anthony @ice. @ice. ous ous .ac. .ac. jp jp lashkia lashkia @ice. @ice. ous ous .ac. .ac. jp jp http://antpc1.ice. http://antpc1.ice. ous ous .ac. .ac. jp jp

Transcript of A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer...

Page 1: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

1

A Co

mpu

ter

Syst

em f

or A

utom

atic

ally

A

Com

pute

r Sy

stem

for

Aut

omat

ical

ly

Iden

tifyi

ng T

ext

Stru

ctur

e in

Writ

ing

Iden

tifyi

ng T

ext

Stru

ctur

e in

Writ

ing

Laur

ence

Ant

hony

and

Geo

rge

V.

Laur

ence

Ant

hony

and

Geo

rge

V. L

ashk

iaLa

shki

aD

ept.

of C

ompu

ter

Scie

nce,

Fac

ulty

of E

ngin

eerin

gD

ept.

of C

ompu

ter

Scie

nce,

Fac

ulty

of E

ngin

eerin

gO

kaya

ma

Univ

. of S

cien

ce, 1

Oka

yam

a Un

iv. o

f Sci

ence

, 1-- 1

1 Ri

dai

Rida

i -- chocho ,

Oka

yam

a, O

kaya

ma

anth

ony

anth

ony @

ice.

@ic

e.ou

sou

s .ac

..a

c.jp

jp

la

shki

ala

shki

a @ic

e.@

ice.

ous

ous .

ac.

.ac.

jpjp

http

://a

ntpc

1.ic

e.ht

tp:/

/ant

pc1.

ice.

ous

ous .

ac.

.ac.

jpjp

Page 2: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

2

Pres

enta

tion

Out

line

Pres

enta

tion

Out

line

Back

grou

ndBa

ckgr

ound

Rese

arch

Aim

Rese

arch

Aim

Syst

em D

esig

nSy

stem

Des

ign

Appl

icat

ion

to R

esea

rch

Abst

ract

sAp

plic

atio

n to

Res

earc

h Ab

stra

cts

Res

ults

Res

ults

Conc

lusi

ons

Conc

lusi

ons

Page 3: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

3

Back

grou

ndBa

ckgr

ound

Impo

rtan

ce o

f Te

xt S

truc

ture

Impo

rtan

ce o

f Te

xt S

truc

ture

Swal

es (

1981

, 199

0),

Swal

es (

1981

, 199

0),

Carr

ell

Carr

ell (

1982

)(1

982)

Hin

ds (

1982

, 198

3),

Hin

ds (

1982

, 198

3),

Hoe

y H

oey

(199

4), W

inte

r (1

994)

(199

4), W

inte

r (1

994)

Stud

ies

on T

ext

Stru

ctur

eSt

udie

s on

Tex

t St

ruct

ure

TITL

ESTI

TLES

--D

udle

yD

udle

y --Ev

ans

(199

4), A

ntho

ny (

2001

)Ev

ans

(199

4), A

ntho

ny (

2001

)A

BST

RA

CTS

AB

STR

AC

TS--

Ayer

s (1

993)

, Pos

tegu

illo

(199

6)Ay

ers

(199

3), P

oste

guill

o (1

996)

INST

RO

DU

CTI

ON

SIN

STR

OD

UC

TIO

NS

--Sw

ales

(19

90),

Ant

hony

(19

99)

Swal

es (

1990

), A

ntho

ny (

1999

)D

ISC

USS

ION

SD

ISC

USS

ION

S--

Hop

kins

& D

udle

yH

opki

ns &

Dud

ley --

Evan

s (1

988)

Evan

s (1

988)

PA

TEN

TSP

ATE

NTS

--Ba

zerm

an (

1994

)Ba

zerm

an (

1994

)G

RA

NT

PR

OP

OSA

LSG

RA

NT

PR

OP

OSA

LS--

Conn

or &

Co

nnor

& M

aura

nen

Mau

rane

n (1

999)

(199

9)LE

GA

L W

RIT

ING

LEG

AL

WR

ITIN

G--

Bhat

ia

Bhat

ia (

1993

)(1

993)

Page 4: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

4

Back

grou

ndBa

ckgr

ound

Prob

lem

s w

ith A

naly

zing

Tex

t St

ruct

ure

Prob

lem

s w

ith A

naly

zing

Tex

t St

ruct

ure

A la

rge

corp

us o

f te

xt d

ata

A la

rge

corp

us o

f te

xt d

ata

(The

tex

t da

ta m

ust

(The

tex

t da

ta m

ust

‘‘ ACU

RAT

ELY

ACU

RAT

ELY ’’

repr

esen

t w

hat

repr

esen

t w

hat

we

hope

to

stud

y)w

e ho

pe t

o st

udy)

A lo

t of

res

earc

h tim

eA

lot

of r

esea

rch

time

(Tim

e to

ana

lyze

a lo

t of

tex

ts)

(Tim

e to

ana

lyze

a lo

t of

tex

ts)

Goo

d va

lidat

ion

and

relia

bilit

y te

sts

Goo

d va

lidat

ion

and

relia

bilit

y te

sts

Mos

t Te

xt S

truc

ture

Stu

dies

are

Mos

t Te

xt S

truc

ture

Stu

dies

are

‘‘ Sm

all S

cale

Smal

l Sca

le’’

Page 5: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

5

Back

grou

ndBa

ckgr

ound

Swal

es (

1981

: p.

13)

Swal

es (

1981

: p.

13)

““ In

effe

ct, t

he d

isco

urse

ana

lyst

labe

ls

In e

ffect

, the

dis

cour

se a

naly

st la

bels

so

met

hing

as

X an

d th

en b

egin

s to

see

X

som

ethi

ng a

s X

and

then

beg

ins

to s

ee X

oc

curr

ing

all o

ver

the

plac

eoc

curr

ing

all o

ver

the

plac

e ””

Page 6: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

6

Back

grou

ndBa

ckgr

ound

Hen

ry e

t al

. (20

01)

Hen

ry e

t al

. (20

01)

40 A

pplic

atio

n Le

tter

s40

App

licat

ion

Lett

ers

Taro

neTa

rone

et a

l. (2

000)

et a

l. (2

000)

2 Ph

ysic

s Res

earc

h Ar

ticle

s2

Phys

ics

Res

earc

h Ar

ticle

s

Conn

or e

t al

. (19

99)

Conn

or e

t al

. (19

99)

34 G

rant

Pro

posa

ls34

Gra

nt P

ropo

sals

Will

iam

s (1

999)

Will

iam

s (1

999)

5 M

edic

al R

esea

rch

Artic

les

5 M

edic

al R

esea

rch

Artic

les

Anth

ony

(199

9)An

thon

y (1

999)

12 C

ompu

ter

Scie

nce

Rese

arch

Art

icle

12

Com

pute

r Sc

ienc

e Re

sear

ch A

rtic

le

Intr

oduc

tions

Intr

oduc

tions

Page 7: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

7

Res

earc

h Ai

mRes

earc

h Ai

m

Dev

elop

a C

ompu

ter

Syst

em t

o Pr

oces

s D

evel

op a

Com

pute

r Sy

stem

to

Proc

ess

and

Anal

yze

Text

Str

uctu

re A

utom

atic

ally

and

Anal

yze

Text

Str

uctu

re A

utom

atic

ally

––A A

‘‘ Lea

rnin

g Sy

stem

Lear

ning

Sys

tem

’’fo

r te

xt s

truc

ture

for

text

str

uctu

re

Easy

to

Proc

ess

a La

rge

Corp

us o

f Te

xt D

ata

Easy

to

Proc

ess

a La

rge

Corp

us o

f Te

xt D

ata

Fast

Fast

The

anal

ytic

pro

cess

is c

lear

ly d

efin

edTh

e an

alyt

ic p

roce

ss is

cle

arly

def

ined

Easy

to

test

the

rel

iabi

lity

and

valid

ity

Easy

to

test

the

rel

iabi

lity

and

valid

ity

Page 8: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

8

Syst

em D

esig

nSy

stem

Des

ign

‘‘ Uns

uper

vise

d Le

arni

ngU

nsup

ervi

sed

Lear

ning

’’ vs.

.vs

..‘‘ S

uper

vise

d Le

arni

ngSu

perv

ised

Lea

rnin

g ’’??

In U

nsup

ervi

sed

Lear

ning

,In

Uns

uper

vise

d Le

arni

ng,

Giv

e th

e sy

stem

tex

t ex

ampl

esG

ive

the

syst

em t

ext

exam

ples

Tell

the

syst

em w

hat

Tell

the

syst

em w

hat

‘‘ feat

ures

feat

ures

’’ to

look

at

to lo

ok a

tLe

t th

e sy

stem

fin

d a

mod

el (

set

of c

lass

es)

by

Let

the

syst

em f

ind

a m

odel

(se

t of

cla

sses

) by

de

finin

g a

rela

tion

betw

een

the

feat

ures

and

the

de

finin

g a

rela

tion

betw

een

the

feat

ures

and

the

ex

ampl

esex

ampl

esCl

assi

fy n

ew t

ext

exam

ples

by

com

paris

on w

ith

Clas

sify

new

tex

t ex

ampl

es b

y co

mpa

rison

with

fe

atur

es in

eac

h cl

ass

feat

ures

in e

ach

clas

s

Page 9: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

9

Uns

uper

vise

d Le

arni

ngU

nsup

ervi

sed

Lear

ning

Giv

e th

e sy

stem

tex

t ex

ampl

esG

ive

the

syst

em t

ext

exam

ples

Text

1:

Onc

e up

on a

tim

e, t

here

was

a u

gly

duck

ling.

Text

1:

Onc

e up

on a

tim

e, t

here

was

a u

gly

duck

ling.

Text

2:

It li

ved

on a

lake

.Te

xt 2

: It

live

d on

a la

ke.

Text

3:

One

day

, the

litt

le b

ird t

urne

d in

to a

sw

an.

Text

3:

One

day

, the

litt

le b

ird t

urne

d in

to a

sw

an.

Text

4:

It li

ved

happ

ily, e

ver,

aft

er.

Text

4:

It li

ved

happ

ily, e

ver,

aft

er.

Tell

the

syst

em w

hat

Tell

the

syst

em w

hat

‘‘ feat

ures

feat

ures

’’ to

look

at

to lo

ok a

tAl

l wor

ds e

xcep

t ar

ticle

s, N

o pu

nctu

atio

nAl

l wor

ds e

xcep

t ar

ticle

s, N

o pu

nctu

atio

n

Def

ine

a re

latio

n be

twee

n fe

atur

es a

nd

Def

ine

a re

latio

n be

twee

n fe

atur

es a

nd

exam

ples

exam

ples

Clas

s 1

Clas

s 1

--on

ce, u

pon,

tim

e, t

here

, was

, ugl

y, d

uckl

ing

once

, upo

n, t

ime,

the

re, w

as, u

gly,

duc

klin

gCl

ass

2 Cl

ass

2 --

one,

day

, litt

le, b

ird, t

urne

d, in

to, s

wan

one,

day

, litt

le, b

ird, t

urne

d, in

to, s

wan

Clas

s 3

Clas

s 3

--it,

live

d, o

n, la

ke, h

appi

ly, e

ver,

aft

erit,

live

d, o

n, la

ke, h

appi

ly, e

ver,

aft

er

Page 10: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

10

Uns

uper

vise

d Le

arni

ngU

nsup

ervi

sed

Lear

ning

Uns

uper

vise

d le

arni

ng s

yste

m m

odel

s of

ten

Uns

uper

vise

d le

arni

ng s

yste

m m

odel

s of

ten

DO

NO

T m

atch

our

mod

els

DO

NO

T m

atch

our

mod

els

Clas

sify

new

tex

t ex

ampl

esCl

assi

fy n

ew t

ext

exam

ples

Onc

e up

on a

tim

e, t

here

wer

e 3

bear

s (B

EG)

Onc

e up

on a

tim

e, t

here

wer

e 3

bear

s (B

EG)

The

3 be

ars

lived

in a

big

hou

se. (

MID

)Th

e 3

bear

s liv

ed in

a b

ig h

ouse

. (M

ID)

They

all

stay

ed in

the

hou

se h

appi

ly e

ver

afte

r. (

END

)Th

ey a

ll st

ayed

in t

he h

ouse

hap

pily

eve

r af

ter.

(EN

D)

The

syst

em w

ill d

ecid

e Th

e sy

stem

will

dec

ide

……Cl

ass

1 (m

atch

ing

Clas

s 1

(mat

chin

g ‘‘ o

nce

once

’’ , , ‘‘ u

pon

upon

’’ , , ‘‘ ti

me

time ’’

, , ‘‘ th

ere

ther

e ’’))

Clas

s 3

(mat

chin

g Cl

ass

3 (m

atch

ing

‘‘ live

dliv

ed’’ ))

Clas

s 3

(mat

chin

g Cl

ass

3 (m

atch

ing

‘‘ hap

pily

happ

ily’’ , ,

‘‘ eve

rev

er’’ , ,

‘‘ aft

eraf

ter ’’ ))

Page 11: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

11

Syst

em D

esig

nSy

stem

Des

ign

‘‘ Uns

uper

vise

d Le

arni

ngU

nsup

ervi

sed

Lear

ning

’’ vsvs..

‘‘ Sup

ervi

sed

Lear

ning

Supe

rvis

ed L

earn

ing ’’

??In

Sup

ervi

sed

Lear

ning

,In

Sup

ervi

sed

Lear

ning

,G

ive

the

syst

em a

str

uctu

re m

odel

(se

t of

cla

sses

)G

ive

the

syst

em a

str

uctu

re m

odel

(se

t of

cla

sses

)G

ive

the

syst

em e

xam

ples

of

the

mod

elG

ive

the

syst

em e

xam

ples

of

the

mod

elTe

ll th

e sy

stem

wha

t Te

ll th

e sy

stem

wha

t ‘‘ fe

atur

esfe

atur

es’’ t

o lo

ok a

tto

look

at

Def

ine

a re

latio

n be

twee

n th

e cl

asse

s an

d th

e D

efin

e a

rela

tion

betw

een

the

clas

ses

and

the

feat

ures

feat

ures

Clas

sify

new

tex

t ex

ampl

es b

y co

mpa

ring

its

Clas

sify

new

tex

t ex

ampl

es b

y co

mpa

ring

its

feat

ures

with

tho

se in

eac

h cl

ass

feat

ures

with

tho

se in

eac

h cl

ass

Page 12: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

12

Supe

rvis

ed L

earn

ing

Supe

rvis

ed L

earn

ing

Giv

e th

e sy

stem

a s

truc

ture

mod

elG

ive

the

syst

em a

str

uctu

re m

odel

(set

of

clas

ses)

(set

of

clas

ses)

Clas

s 1:

BEG

INN

ING

Clas

s 1:

BEG

INN

ING

Clas

s 2:

MID

DLE

Clas

s 2:

MID

DLE

Clas

s 3:

EN

DCl

ass

3: E

ND

Giv

e th

e sy

stem

exa

mpl

es o

f th

e m

odel

Giv

e th

e sy

stem

exa

mpl

es o

f th

e m

odel

BEG

: O

nce

upon

a t

ime,

the

re w

as a

ugl

y du

cklin

g.BE

G:

Onc

e up

on a

tim

e, t

here

was

a u

gly

duck

ling.

MID

: It

live

d on

a la

ke.

MID

: It

live

d on

a la

ke.

MID

: O

ne d

ay, t

he li

ttle

bird

tur

ned

into

a s

wan

.M

ID:

One

day

, the

litt

le b

ird t

urne

d in

to a

sw

an.

END

: It

live

d ha

ppily

, eve

r, a

fter

.EN

D:

It li

ved

happ

ily, e

ver,

aft

er.

Tell

the

syst

em w

hat

Tell

the

syst

em w

hat

‘‘ feat

ures

feat

ures

’’ to

look

at

to lo

ok a

tAl

l wor

ds e

xcep

t ar

ticle

s...,

No

punc

tuat

ion

All w

ords

exc

ept

artic

les.

.., N

o pu

nctu

atio

n

Page 13: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

13

Supe

rvis

ed L

earn

ing

Supe

rvis

ed L

earn

ing

Def

ine

a re

latio

n be

twee

n cl

asse

s an

d fe

atur

esD

efin

e a

rela

tion

betw

een

clas

ses

and

feat

ures

Clas

s 1

(BEG

) Cl

ass

1 (B

EG)

--on

ce, u

pon,

tim

e, t

here

, was

, ug

ly, du

cklin

gon

ce, u

pon,

tim

e, t

here

, was

, ug

ly, du

cklin

gCl

ass

2 (M

ID)

Clas

s 2

(MID

) --

it, li

ved,

on,

lake

, one

, da

y, li

ttle

, bi

rd, t

urne

d,

it, li

ved,

on,

lake

, one

, da

y, li

ttle

, bi

rd, t

urne

d,

into

, sw

anin

to, s

wan

Clas

s 3

(EN

D)

Clas

s 3

(EN

D)

--liv

ed, h

appi

ly, e

ver,

aft

erliv

ed, h

appi

ly, e

ver,

aft

er

Clas

sify

new

tex

t ex

ampl

esCl

assi

fy n

ew t

ext

exam

ples

Onc

e up

on a

tim

e, t

here

wer

e 3

bear

s (B

EG)

Onc

e up

on a

tim

e, t

here

wer

e 3

bear

s (B

EG)

The

3 be

ars

lived

in a

big

hou

se. (

MID

)Th

e 3

bear

s liv

ed in

a b

ig h

ouse

. (M

ID)

They

all

lived

in t

he h

ouse

hap

pily

eve

r af

ter.

(EN

D)

They

all

lived

in t

he h

ouse

hap

pily

eve

r af

ter.

(EN

D)

The

syst

em w

ill d

ecid

eTh

e sy

stem

will

dec

ide ……

Clas

s 1

(BEG

) (m

atch

ing

Clas

s 1

(BEG

) (m

atch

ing

‘‘ onc

eon

ce’’ , ,

‘‘ upo

nup

on’’ , ,

‘‘ tim

etim

e ’’, , ‘‘ th

ere

ther

e ’’))

Clas

s 2

(MID

) (m

atch

ing

Clas

s 2

(MID

) (m

atch

ing

‘‘ live

dliv

ed’’ ))

Clas

s 3

(EN

D)

(mat

chin

g Cl

ass

3 (E

ND

) (m

atch

ing

‘‘ live

dliv

ed’’ , ,

‘‘ hap

pily

happ

ily’’ , ,

‘‘ eve

rev

er’’ , ,

‘‘ aft

eraf

ter ’’ ))

Page 14: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

14

Supe

rvis

ed L

earn

ing

Supe

rvis

ed L

earn

ing

Prob

lem

sPr

oble

ms

We

need

a

We

need

a ‘‘ g

ood

good

’’ mod

el o

f st

ruct

ure

mod

el o

f st

ruct

ure

––Bu

t th

ere

are

man

y m

odel

s of

str

uctu

re in

the

lite

ratu

reBu

t th

ere

are

man

y m

odel

s of

str

uctu

re in

the

lite

ratu

re

We

need

a s

et o

f W

e ne

ed a

set

of

‘‘ labe

led

exam

ples

labe

led

exam

ples

’’––

But

man

y sy

stem

s w

ork

wel

l with

onl

y a

few

labe

led

But

man

y sy

stem

s w

ork

wel

l with

onl

y a

few

labe

led

exam

ples

exam

ples

We

need

a

We

need

a ‘‘ g

ood

good

’’ set

of

feat

ures

set

of f

eatu

res

––Bu

t la

ngua

ge c

onta

ins

a LO

T of

red

unda

nt w

ords

!Bu

t la

ngua

ge c

onta

ins

a LO

T of

red

unda

nt w

ords

!(e

.g. a

, the

, of, in

, (e

.g. a

, the

, of, in

, ……

.).)––

Build

ing

a lis

t of

fea

ture

s by

han

d is

infe

asib

leBu

ildin

g a

list

of f

eatu

res

by h

and

is in

feas

ible

We

need

a

We

need

a ‘‘ g

ood

good

’’ rel

atio

n be

twee

n th

e cl

asse

s an

d re

latio

n be

twee

n th

e cl

asse

s an

d th

e fe

atur

es??

?th

e fe

atur

es??

?––

In p

ract

ice,

ver

y si

mpl

e re

latio

nshi

ps a

re e

ffec

tive!

In p

ract

ice,

ver

y si

mpl

e re

latio

nshi

ps a

re e

ffec

tive!

Page 15: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

15

Appl

icat

ion

of S

yste

m t

o Ap

plic

atio

n of

Sys

tem

to

Res

earc

h Ab

stra

cts

Res

earc

h Ab

stra

cts

Giv

e th

e sy

stem

a s

truc

ture

mod

el:

Giv

e th

e sy

stem

a s

truc

ture

mod

el:

‘‘ Mod

ifie

dM

odif

ied ’’

CAR

S M

odel

(Sw

ales

, 199

0: A

ntho

ny, 1

999)

CAR

S M

odel

(Sw

ales

, 199

0: A

ntho

ny, 1

999)

Mov

e 1

Mov

e 1

Esta

blis

hing

Es

tabl

ishi

ng

1.1

1.1

Cla

imin

g c

entr

alit

yC

laim

ing

cen

tral

ity

a Te

rrito

ry

a Te

rrito

ry

1.2

1.2

Mak

ing

top

ic g

ener

aliz

atio

ns

Mak

ing

top

ic g

ener

aliz

atio

ns

1.3

1.3

Rev

iew

ing

item

s of

pre

viou

s re

sear

chR

evie

win

g it

ems

of p

revi

ous

rese

arch

Mov

e 2

Mov

e 2

Esta

blis

hing

Es

tabl

ishi

ng

2.1

A2

.1A

Cou

nte

r cl

amin

gC

oun

ter

clam

ing

a ni

che

a ni

che

2.1

B

2.1

B

Ind

icat

ing

a g

apIn

dic

atin

g a

gap

2.1

C

2.1

C

Qu

esti

on r

aisi

ng

Qu

esti

on r

aisi

ng

2.1

D

2.1

D

Con

tin

uin

g a

tra

diti

onC

onti

nu

ing

a t

radi

tion

Mov

e 3

Mov

e 3

Occ

upyi

ng

Occ

upyi

ng

3.1

A

3.1

A

Ou

tlin

ing

purp

ose

Ou

tlin

ing

purp

ose

the

nich

eth

e ni

che

3.1

B

3.1

B

An

nou

nci

ng

pres

ent

rese

arch

An

nou

nci

ng

pres

ent

rese

arch

3.2

3.2

An

nou

nci

ng

prin

cipa

l fin

din

gsA

nn

oun

cin

g pr

inci

pal f

ind

ings

3.3

3.3

Eval

uat

ion

of

rese

arch

Eval

uat

ion

of

rese

arch

3.4

3.4

Ind

icat

ing

RA

str

uct

ure

Ind

icat

ing

RA

str

uct

ure

Page 16: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

16

Appl

icat

ion

of S

yste

m t

o Ap

plic

atio

n of

Sys

tem

to

Res

earc

h Ab

stra

cts

Res

earc

h Ab

stra

cts

Giv

e th

e sy

stem

exa

mpl

es o

f th

e m

odel

Giv

e th

e sy

stem

exa

mpl

es o

f th

e m

odel

100

Abst

ract

s (I

EEE

Tran

s. o

n PD

S) d

ivid

ed in

to10

0 Ab

stra

cts

(IEE

E Tr

ans.

on

PDS)

div

ided

into

692

labe

led

69

2 la

bele

d ‘‘ S

teps

Uni

tsSt

eps

Uni

ts’’ (

only

exa

mpl

es f

rom

6 c

lass

es)

(onl

y ex

ampl

es f

rom

6 c

lass

es)

554

Step

Uni

ts (

80%

) us

ed f

or

554

Step

Uni

ts (

80%

) us

ed f

or ‘‘ t

rain

ing

trai

ning

’’ the

sys

tem

the

syst

em13

8 St

ep U

nits

(20

%)

used

for

13

8 St

ep U

nits

(20

%)

used

for

‘‘ tes

ting

test

ing ’’

the

syst

emth

e sy

stem

Tell

the

syst

em w

hat

Tell

the

syst

em w

hat

‘‘ feat

ures

feat

ures

’’ to

look

at

to lo

ok a

tAl

l wor

d cl

uste

rs u

p to

5 w

ords

long

All w

ord

clus

ters

up

to 5

wor

ds lo

ngPo

sitio

n of

ste

p un

it in

abs

trac

t (i.

e. 1

st, 2

nd, 3

rd,

Posi

tion

of s

tep

unit

in a

bstr

act

(i.e.

1st

, 2nd

, 3rd

, ……

))

(Red

uce

(Red

uce

‘‘ Noi

seN

oise

’’ in

Feat

ures

)in

Fea

ture

s)Ran

k w

ords

by

Ran

k w

ords

by

‘‘ impo

rtan

ceim

port

ance

’’ usi

ng:

usin

g:ra

w f

requ

ency

, ra

w f

requ

ency

, in

form

atio

n ga

inin

form

atio

n ga

inU

se o

nly

high

ran

ked

wor

dsU

se o

nly

high

ran

ked

wor

ds

Page 17: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

17

Appl

icat

ion

of S

yste

m t

o Ap

plic

atio

n of

Sys

tem

to

Res

earc

h Ab

stra

cts

Res

earc

h Ab

stra

cts

Ther

e w

ere

man

y pe

ople

in t

he p

ark.

Ther

e w

ere

man

y pe

ople

in t

he p

ark.

––1

wor

d1

wor

dth

ere/

wer

e/ m

any/

peo

ple/

in/

the/

par

kth

ere/

wer

e/ m

any/

peo

ple/

in/

the/

par

k

––2

wor

ds2

wor

dsth

ere

wer

e/ w

ere

man

y /

man

y pe

ople

/th

ere

wer

e/ w

ere

man

y /

man

y pe

ople

/pe

ople

in /

in t

he /

the

par

kpe

ople

in /

in t

he /

the

par

k

––3

wor

ds3

wor

dsth

ere

wer

e m

any

/ w

ere

man

y pe

ople

/ th

ere

wer

e m

any

/ w

ere

man

y pe

ople

/ m

any

peop

le in

/ p

eopl

e in

the

/m

any

peop

le in

/ p

eopl

e in

the

/in

the

par

kin

the

par

k

––……

Page 18: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

18

Appl

icat

ion

of S

yste

m t

o Ap

plic

atio

n of

Sys

tem

to

Res

earc

h Ab

stra

cts

Res

earc

h Ab

stra

cts

Ther

e w

ere

man

y pe

ople

in t

he p

ark.

Ther

e w

ere

man

y pe

ople

in t

he p

ark.

––1

wor

d1

wor

dth

ere

ther

e / / w

ere

wer

e / / m

any

man

y / / pe

ople

peop

le/ /

inin/ /

the

the /

/ pa

rkpa

rk

––2

wor

ds2

wor

dsth

ere

wer

eth

ere

wer

e / w

ere

man

y /

/ w

ere

man

y /

man

y pe

ople

man

y pe

ople

//pe

ople

in /

in t

he /

the

par

kpe

ople

in /

in t

he /

the

par

k

––3

wor

ds3

wor

dsth

ere

wer

e m

any

ther

e w

ere

man

y/

wer

e m

any

peop

le/

/ w

ere

man

y pe

ople

/ m

any

peop

le in

/ p

eopl

e in

the

/m

any

peop

le in

/ p

eopl

e in

the

/in

the

par

kin

the

par

k

––……

Page 19: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

19

Appl

icat

ion

of S

yste

m t

o Ap

plic

atio

n of

Sys

tem

to

Res

earc

h Ab

stra

cts

Res

earc

h Ab

stra

cts

Def

ine

a re

latio

n be

twee

n fe

atur

es a

nd m

odel

Def

ine

a re

latio

n be

twee

n fe

atur

es a

nd m

odel

––U

se p

roba

bilit

y of

wor

ds (

feat

ures

) be

ing

in e

ach

clas

sU

se p

roba

bilit

y of

wor

ds (

feat

ures

) be

ing

in e

ach

clas

sPr

obab

ility

= F

requ

ency

of

Wor

d/To

tal n

umbe

r of

wor

dsPr

obab

ility

= F

requ

ency

of

Wor

d/To

tal n

umbe

r of

wor

ds

Clas

s 1

(Cla

imin

g Ce

ntra

lity)

Clas

s 1

(Cla

imin

g Ce

ntra

lity)

Clas

s 2

(Mak

ing

topi

c ge

nera

lizat

ions

)Cl

ass

2 (M

akin

g to

pic

gene

raliz

atio

ns)

Clas

s 3

(Ind

icat

ing

a ga

p)

Clas

s 3

(Ind

icat

ing

a ga

p)

Clas

s 4

(Out

linin

g pu

rpos

e)Cl

ass

4 (O

utlin

ing

purp

ose)

Clas

s 5

(Ann

ounc

ing

prin

cipa

l fin

ding

s)

Clas

s 5

(Ann

ounc

ing

prin

cipa

l fin

ding

s)

Clas

s 6

(Eva

luat

ion

of r

esea

rch)

Cl

ass

6 (E

valu

atio

n of

res

earc

h)

Clas

s 1:

W

ord

1 Cl

ass

1:

Wor

d 1

prob

prob

Wor

d 2

Wor

d 2

prob

prob

Wor

d 3

Wor

d 3

prob

prob

. . ……

Clas

s 2:

W

ord

1 Cl

ass

2:

Wor

d 1

prob

pr

ob

Wor

d 2

Wor

d 2

prob

prob

Wor

d 3

Wor

d 3

prob

prob

. . ……

Clas

s 3:

Cl

ass

3:

Wor

d 1

Wor

d 1

prob

pr

ob

Wor

d 2

Wor

d 2

prob

prob

Wor

d 3

Wor

d 3

prob

prob

. . ……

Clas

s 4:

W

ord

1Cl

ass

4:

Wor

d 1

prob

prob

Wor

d 2

Wor

d 2

prob

prob

Wor

d 3

Wor

d 3

prob

prob

. . ……

Clas

s 5:

W

ord

1Cl

ass

5:

Wor

d 1

prob

prob

Wor

d 2

Wor

d 2

prob

prob

Wor

d 3

Wor

d 3

prob

prob

. . ……

Clas

s 6:

W

ord

1Cl

ass

6:

Wor

d 1

prob

prob

Wor

d 2

Wor

d 2

prob

prob

Wor

d 3

Wor

d 3

prob

prob

. . ……

Page 20: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

20

Appl

icat

ion

of S

yste

m t

o Ap

plic

atio

n of

Sys

tem

to

Res

earc

h Ab

stra

cts

Res

earc

h Ab

stra

cts

Clas

sify

new

tex

t ex

ampl

esCl

assi

fy n

ew t

ext

exam

ples

For

each

new

tex

t, c

hoos

e cl

ass

with

hig

hest

Fo

r ea

ch n

ew t

ext,

cho

ose

clas

s w

ith h

ighe

st

prob

abili

ty o

f ha

ving

wor

ds (

feat

ures

)pr

obab

ility

of

havi

ng w

ords

(fe

atur

es)

e.g.

New

Tex

t on

ly h

as f

eatu

res

3, 8

, 10

e.g.

New

Tex

t on

ly h

as f

eatu

res

3, 8

, 10

Clas

s 1

P= p

.cla

ss 1

x

Clas

s 1

P= p

.cla

ss 1

x p

. f3

x p.

f8

x p.

f10

p.

f3

x p.

f8

x p.

f10

=

1.5

= 1

.5Cl

ass

2 P=

Cl

ass

2 P=

p.c

lass

2 x

p.

clas

s 2

x ......

= 1

.8=

1.8

Clas

s 3

P= p

.cla

ss 3

x

Clas

s 3

P= p

.cla

ss 3

x .

.....=

2.7

= 2

.7Cl

ass

4 P=

p.c

lass

4 x

Cl

ass

4 P=

p.c

lass

4 x

......

= 2

.3=

2.3

Clas

s 5

P=

Clas

s 5

P= p

.cla

ss 5

x

p.cl

ass

5 x

......=

1.8

= 1

.8Cl

ass

6 P=

p.c

lass

6 x

Cl

ass

6 P=

p.c

lass

6 x

......

= 1

.2=

1.2

Choo

se C

lass

3Ch

oose

Cla

ss 3

Page 21: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

21

Res

ults

Res

ults

Clas

sific

atio

n Ac

cura

cy (

Ove

rall)

Clas

sific

atio

n Ac

cura

cy (

Ove

rall)

554

Step

Uni

ts u

sed

for

554

Step

Uni

ts u

sed

for

‘‘ trai

ning

trai

ning

’’ the

sys

tem

the

syst

em13

8 St

ep U

nits

use

d fo

r 13

8 St

ep U

nits

use

d fo

r ‘‘ te

stin

gte

stin

g ’’th

e sy

stem

the

syst

em

No.

of F

eatu

res

Acc

urac

y(R

aw F

requ

ency

)A

ccur

acy

(Inf

orm

atio

n G

ain)

2208

(all)

56 %

-10

0051

%70

%70

056

%70

%50

059

%69

%30

059

%69

%10

054

%-

Not

e: R

ando

m g

uess

ing

has

an a

ccur

acy

of 1

6.66

% (

NO

T 50

%!)

Not

e: R

ando

m g

uess

ing

has

an a

ccur

acy

of 1

6.66

% (

NO

T 50

%!)

Choo

sing

the

mos

t co

mm

on c

lass

= 2

6%

Choo

sing

the

mos

t co

mm

on c

lass

= 2

6%

Page 22: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

22

Res

ults

Res

ults

Clas

sific

atio

n Ac

cura

cy (

Each

Ste

p U

nit)

Clas

sific

atio

n Ac

cura

cy (

Each

Ste

p U

nit)

Num

ber

of f

eatu

res

= 7

00N

umbe

r of

fea

ture

s =

700

Rank

ed b

y Ra

nked

by

info

rmat

ion

gain

info

rmat

ion

gain

mea

sure

mea

sure

Accu

racy

(ov

eral

l) =

70%

Accu

racy

(ov

eral

l) =

70%

Clas

sSt

ep 1.

1St

ep 1.

2St

ep 2.

1bSt

ep 3

.1bSt

ep 3.

2St

ep 3.

3St

ep 1.

12

(43

%)

40

01

0St

ep 1.

20

17 (7

7 %

)0

04

1St

ep 2.

1b0

21

(17

%)

02

1St

ep 3.

1b0

00

34 (9

2 %

)3

0St

ep 3.

20

20

225

(66

%)

9St

ep 3.

30

10

28

17 (6

1 %

)

Not

e: C

lass

ifica

tions

cor

resp

ond

with

CAR

S M

odel

N

ote:

Cla

ssifi

catio

ns c

orre

spon

d w

ith C

ARS

Mod

el ‘‘ m

oves

mov

es’’

(Acc

urac

y=88

% w

hen

usin

g (A

ccur

acy=

88%

whe

n us

ing

‘‘ sec

ond

opin

ion

seco

nd o

pini

on’’ ))

The

syst

em m

akes

the

sam

e m

ista

kes

as h

uman

s.Th

e sy

stem

mak

es t

he s

ame

mis

take

s as

hum

ans.

Page 23: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

23

Res

ults

Res

ults

Clas

sific

atio

n Ac

cura

cyCl

assi

ficat

ion

Accu

racy

(For

diff

eren

t da

ta s

ets)

(For

diff

eren

t da

ta s

ets)

Num

ber

of f

eatu

res

= 7

00 (

Rank

ed b

y in

form

atio

n ga

in)

Num

ber

of f

eatu

res

= 7

00 (

Rank

ed b

y in

form

atio

n ga

in)

––D

ata

Set

1 Ac

cura

cy =

70%

Dat

a Se

t 1

Accu

racy

= 7

0%––

Dat

a Se

t 2

Accu

racy

= 6

9%D

ata

Set

2 Ac

cura

cy =

69%

––D

ata

Set

3 Ac

cura

cy =

69%

Dat

a Se

t 3

Accu

racy

= 6

9%

Clas

sific

atio

n Ac

cura

cyCl

assi

ficat

ion

Accu

racy

(For

diff

eren

t da

ta s

ets)

(For

diff

eren

t da

ta s

ets)

(Usi

ng 1

st a

nd 2

nd r

anke

d cl

assi

ficat

ion

(Usi

ng 1

st a

nd 2

nd r

anke

d cl

assi

ficat

ion

--‘‘ S

econ

d O

pini

onSe

cond

Opi

nion

’’ ) )

Num

ber

of f

eatu

res

= 7

00 (

Rank

ed b

y in

form

atio

n ga

in)

Num

ber

of f

eatu

res

= 7

00 (

Rank

ed b

y in

form

atio

n ga

in)

––D

ata

Set

1 Ac

cura

cy =

88%

Dat

a Se

t 1

Accu

racy

= 8

8%––

Dat

a Se

t 2

Accu

racy

= 8

6%D

ata

Set

2 Ac

cura

cy =

86%

––D

ata

Set

3 Ac

cura

cy =

86%

Dat

a Se

t 3

Accu

racy

= 8

6%

Page 24: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

24

Res

ults

Res

ults

A A ‘‘ W

indo

ws

Win

dow

s ’’In

terf

ace

Inte

rfac

eTo

ena

ble

rese

arch

ers

to u

se t

he s

yste

m it

nee

ds

To e

nabl

e re

sear

cher

s to

use

the

sys

tem

it n

eeds

to

be

easi

ly a

cces

sibl

e vi

a a

to b

e ea

sily

acc

essi

ble

via

a ‘‘ w

indo

ws

win

dow

s ’’in

terf

ace

inte

rfac

e

A A ‘‘ w

indo

ws

win

dow

s ’’sy

stem

has

bee

n bu

ilt u

sing

the

sy

stem

has

bee

n bu

ilt u

sing

the

pr

ogra

mm

ing

lang

uage

PER

L 5.

6 an

d PE

RL/

prog

ram

min

g la

ngua

ge P

ERL

5.6

and

PERL

/ Tk

Tk––

The

syst

em o

ffer

s su

gges

tions

abo

ut t

he s

truc

ture

of

The

syst

em o

ffer

s su

gges

tions

abo

ut t

he s

truc

ture

of

new

tex

tsne

w t

exts

––Th

e st

ruct

ure

sugg

estio

ns c

an b

e ed

ited/

corr

ecte

dTh

e st

ruct

ure

sugg

estio

ns c

an b

e ed

ited/

corr

ecte

d––

The

new

tex

ts c

an b

e ad

ded

to t

he d

atab

ase

of t

rain

ing

The

new

tex

ts c

an b

e ad

ded

to t

he d

atab

ase

of t

rain

ing

exam

ple

text

sex

ampl

e te

xts

––Th

e sy

stem

can

Th

e sy

stem

can

‘‘ rel

earn

rele

arn ’’

the

stru

ctur

e an

d im

prov

e ov

er

the

stru

ctur

e an

d im

prov

e ov

er

time

time

Page 25: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

25

Conc

lusi

ons

Conc

lusi

ons

A co

mpu

ter

syst

em w

as d

evel

oped

to

A co

mpu

ter

syst

em w

as d

evel

oped

to

anal

yze

text

str

uctu

rean

alyz

e te

xt s

truc

ture

Lear

ning

met

hod:

Le

arni

ng m

etho

d: ‘‘ S

uper

vise

d Le

arni

ngSu

perv

ised

Lea

rnin

g ’’Tr

aini

ng e

xam

ples

: 55

4Tr

aini

ng e

xam

ples

: 55

4Te

stin

g ex

ampl

e: 1

38Te

stin

g ex

ampl

e: 1

38Ac

cura

cy 7

0% (

88%

whe

n us

ing

seco

nd o

pini

on)

Accu

racy

70%

(88

% w

hen

usin

g se

cond

opi

nion

)

Syst

em e

rror

s ar

e si

mila

r to

tho

se m

ade

Syst

em e

rror

s ar

e si

mila

r to

tho

se m

ade

by h

uman

sby

hum

ans

The

accu

racy

nee

ds t

o be

impr

oved

The

accu

racy

nee

ds t

o be

impr

oved

Curr

ently

wor

king

on

bett

er f

eatu

re s

elec

tion

Curr

ently

wor

king

on

bett

er f

eatu

re s

elec

tion

Page 26: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

26

Conc

lusi

ons

Conc

lusi

ons

The

syst

em r

uns

in a

Th

e sy

stem

run

s in

a ‘‘ w

indo

ws

win

dow

s ’’en

viro

nmen

ten

viro

nmen

tTh

e sy

stem

off

ers

The

syst

em o

ffer

s ‘‘ s

ugge

stio

nssu

gges

tions

’’ whi

ch c

an

whi

ch c

an

be e

dite

d by

the

use

rbe

edi

ted

by t

he u

ser

The

The

‘‘ win

dow

sw

indo

ws ’’

inte

rfac

e ne

eds

to b

e in

terf

ace

need

s to

be

enha

nced

enha

nced

I ho

pe t

o m

ake

a co

mpl

ete

envi

ronm

ent

to h

elp

I ho

pe t

o m

ake

a co

mpl

ete

envi

ronm

ent

to h

elp

rese

arch

ers

solv

e m

any

rese

arch

ers

solv

e m

any

‘‘ sup

ervi

sed

lear

ning

supe

rvis

ed le

arni

ng’’

prob

lem

spr

oble

ms

––M

ove

anal

ysis

, Tex

t ca

tego

rizat

ion,

Aut

hor

auth

entic

ity

Mov

e an

alys

is, T

ext

cate

goriz

atio

n, A

utho

r au

then

ticity

et

c.et

c.

Page 27: A Computer System for Automatically Identifying Text Structure in … · 2014-09-01 · A Computer System for Automatically Identifying Text Structure in Writing. Laurence Anthony

27

A Co

mpu

ter

Syst

em f

or A

utom

atic

ally

A

Com

pute

r Sy

stem

for

Aut

omat

ical

ly

Iden

tifyi

ng T

ext

Stru

ctur

e in

Writ

ing

Iden

tifyi

ng T

ext

Stru

ctur

e in

Writ

ing

Laur

ence

Ant

hony

and

Geo

rge

V.

Laur

ence

Ant

hony

and

Geo

rge

V. L

ashk

iaLa

shki

aD

ept.

of C

ompu

ter

Scie

nce,

Fac

ulty

of E

ngin

eerin

gD

ept.

of C

ompu

ter

Scie

nce,

Fac

ulty

of E

ngin

eerin

gO

kaya

ma

Univ

. of S

cien

ce, 1

Oka

yam

a Un

iv. o

f Sci

ence

, 1-- 1

1 Ri

dai

Rida

i -- chocho ,

Oka

yam

a, O

kaya

ma

anth

ony

anth

ony @

ice.

@ic

e.ou

sou

s .ac

..a

c.jp

jp

la

shki

ala

shki

a @ic

e.@

ice.

ous

ous .

ac.

.ac.

jpjp

http

://a

ntpc

1.ic

e.ht

tp:/

/ant

pc1.

ice.

ous

ous .

ac.

.ac.

jpjp