MIDST and IMAGES-M

MIDST and IMAGES-M Masao Yokota Fukuoka Institute of Technology


Page 1: MIDST and IMAGES-M

MIDST and IMAGES-M

Masao Yokota Fukuoka Institute of Technology

Page 2: MIDST and IMAGES-M

Background & motivation

Intelligent systems should be more human-friendly, considering:

• Floods of multimedia information
• Increase of highly matured societies
• Development of robots for practical use
• Other factors

Solution

Integrated Multimedia Understanding System IMAGES-M

Page 3: MIDST and IMAGES-M

IMAGES-M

Speech Processing Unit (SPU)
Action Data Processing Unit (APU)
Text Processing Unit (TPU)
Picture Processing Unit (PPU)
Sensory Data Processing Unit (SDPU)
Knowledge Base (KB)
Inference Engine (IE)

Page 4: MIDST and IMAGES-M

Demonstration of IMAGES-M: Collaboration of TPU and PPU

(Phase 1) Text to Picture translation

Input : Text

Output : Pictorial interpretation

(Phase 2) Q-A about Picture by Text

Input : Query Text

Output: Answer Text

Page 5: MIDST and IMAGES-M

Input text (Japanese / English / Chinese):

The lamp above the chair is small. The red pot is 1m to the left of the chair. The blue big box is 3m to the right of the chair.

[Figure: output picture]

Page 6: MIDST and IMAGES-M

[Figure: input picture]

Output text:

The octagon is to the upper right of the triangle.
The octagon is above the quadrangle.
The triangle is to the lower left of the octagon.
・ ・ ・

Page 7: MIDST and IMAGES-M

Input sentence: Taro ga kubi wo furu (=Taro shakes his head).

Output animation:

Page 8: MIDST and IMAGES-M

Cross-reference between picture and text

Q: Where does Miwadai-dori (美和台通り) meet National Route 495 (国道495号線)?

A: They meet at the Shimowajiro intersection (下和白交差点).

Page 9: MIDST and IMAGES-M

Integrated Multimedia Understanding based on Lmd

[Figure: Picture, Animation, Text, Action, Speech, Sensory data, …… connected through an Intermediate Representation]

Descriptive power and computability of the Meta Language Lmd for Intermediate Representation

Page 10: MIDST and IMAGES-M

Mental Image Directed Semantic Theory (MIDST), proposed by Yokota, M.

• Information processing by intelligent entities = Mental image processing
• Mental images
  Sensory images = Sensations coded by sensors
  Conceptual images = Sensory images processed by brains (e.g., word concepts)

Page 11: MIDST and IMAGES-M

Multimedia Description Language Lmd based on Mental Image Directed Semantic Theory (MIDST)

• Syntax: Many-sorted predicate logic with a special predicate constant L called “Atomic Locus”

• Semantics: Interpretation in association with an omnisensual mental image model, the so-called “Loci in attribute spaces”

Page 12: MIDST and IMAGES-M

Omnisensual Mental Image Model

Coded sensations: Loci in attribute spaces (LOCATION, SHAPE, COLOR)

Sensation (= Sensory event) = Spatio-temporal distribution of stimuli.

Page 13: MIDST and IMAGES-M

Atomic Locus

L(x,y,p,q,a,g,k)

[Figure: two diagrams of an atomic locus, labelled with ti, tj, p, q, x, y and a]

g = Gt : temporal event
g = Gs : spatial event

“Matter ‘x’ causes Attribute ‘a’ of Matter ‘y’ to keep or change its value temporally or spatially over a time interval, where the value ‘p’ and ‘q’ are relative to Standard ‘k’.”
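The seven-place reading above can be sketched as a small record type. This is only an illustration: the class name `AtomicLocus`, the enum `EventType` and the helper methods are hypothetical names, not part of IMAGES-M, and the gloss is a loose paraphrase of the quoted reading.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    TEMPORAL = "Gt"  # the attribute value keeps or changes over time
    SPATIAL = "Gs"   # the attribute value keeps or changes over space


@dataclass(frozen=True)
class AtomicLocus:
    """One atomic locus L(x, y, p, q, a, g, k) (illustrative container only)."""
    x: str        # matter that causes the event
    y: str        # matter whose attribute is affected
    p: str        # attribute value at the beginning of the locus
    q: str        # attribute value at the end of the locus
    a: str        # attribute, e.g. "A12"
    g: EventType  # Gt or Gs
    k: str        # standard the values are relative to

    def formula(self) -> str:
        """Render the locus in the L(...) notation used on the slides."""
        return f"L({self.x},{self.y},{self.p},{self.q},{self.a},{self.g.value},{self.k})"

    def gloss(self) -> str:
        """Paraphrase the locus in plain English."""
        verb = "keep" if self.p == self.q else "change"
        how = "temporally" if self.g is EventType.TEMPORAL else "spatially"
        return (f"{self.x} causes {self.a} of {self.y} to {verb} its value "
                f"{how} ({self.p} -> {self.q}, relative to {self.k})")


locus = AtomicLocus("x", "y", "p", "q", "a", EventType.TEMPORAL, "k")
print(locus.formula())
print(locus.gloss())
```

Making the record immutable (`frozen=True`) lets loci be used as set members or dictionary keys, which is convenient when formulas are compared.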

Page 14: MIDST and IMAGES-M

Terms of Atomic Locus L(1,2,3,4,5,6,7)

Term | Type name | Semantic role
1 | Matter | Event Causer (EC)
2 | Matter | Attribute Carrier (AC)
3 | Attribute Value | Beginning of Locus
4 | Attribute Value | Ending of Locus
5 | Attribute | Domain of Attribute Value
6 | Event Type | Relation between AC and FAO
7 | Standard | Unit, Origin, Scale, etc. for Values

Page 15: MIDST and IMAGES-M

(S1) The bus runs from Tokyo to Osaka.
(∃x, y, k) L(x, y, Tokyo, Osaka, A12, Gt, k) ∧ bus(y)

(S2) The road runs from Tokyo to Osaka.
(∃x, y, k) L(x, y, Tokyo, Osaka, A12, Gs, k) ∧ road(y)

A12 : Physical Location

Event types

[Figure: temporal event and spatial event between Tokyo and Osaka, with the AC and the FAO marked]

Page 16: MIDST and IMAGES-M

Attributes

Table 1 Attributes

Page 17: MIDST and IMAGES-M

Standards

Table 2 Standards

Categories of standards | Remarks
Rigid Standard | Objective standards such as those denoted by measuring units (meter, gram, etc.).
Species Standard | The attribute value ordinary for a species. A short train is ordinarily longer than a long pencil.
Proportional Standard | ‘Oblong’ means that the width of a physical object is greater than its height.
Individual Standard | Much money for one person can be too little for another.
Purposive Standard | A room large enough for a person to sleep in must be too small for jogging.
Declarative Standard | The origin of an order such as ‘next’ must be declared explicitly, as in ‘next to him’.

Page 18: MIDST and IMAGES-M

Tempo-logical connectives

χ1 Κi χ2 ⇔ (χ1 Κ χ2) ∧ τi(χ1, χ2)

Κi : tempo-logical connective
χj : locus
Κ : binary logical connective (i.e., ∧, ∨, ⊃, ≡)
∧ : ‘AND’
τi : temporal relation between loci such as ‘before’, ‘during’, etc.

Page 19: MIDST and IMAGES-M

Definition of τi

The durations of χ1 and χ2 are [t11, t12] and [t21, t22], respectively.
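A tempo-logical connective pairs a logical connective with a temporal relation between the two durations [t11, t12] and [t21, t22]. The sketch below shows the idea for 'AND' only. The relation table is an assumption: the slide's own definition table is not reproduced in this transcript, so the four relations follow common interval-algebra usage, and the names `TEMPORAL_RELATIONS` and `tempo_logical_and` are hypothetical.

```python
from typing import Callable, Dict, Tuple

Interval = Tuple[float, float]  # (start, end) with start <= end

# Assumed relation table; names follow common interval-algebra usage.
TEMPORAL_RELATIONS: Dict[str, Callable[[Interval, Interval], bool]] = {
    "equals": lambda a, b: a[0] == b[0] and a[1] == b[1],
    "before": lambda a, b: a[1] < b[0],
    "meets":  lambda a, b: a[1] == b[0],
    "during": lambda a, b: b[0] < a[0] and a[1] < b[1],
}


def tempo_logical_and(holds1: bool, d1: Interval,
                      holds2: bool, d2: Interval,
                      relation: str) -> bool:
    """True if both loci hold AND their durations satisfy the named relation."""
    return holds1 and holds2 and TEMPORAL_RELATIONS[relation](d1, d2)


# Two loci that both hold, the first ending before the second starts.
print(tempo_logical_and(True, (0.0, 1.0), True, (2.0, 3.0), "before"))
```

Keeping the temporal relation as a separate lookup makes it easy to swap in the actual definition table once it is available.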

Page 20: MIDST and IMAGES-M

Conceptualization of sensory events

[Figure: sensory events (Event 1, …, Event N) in which x and y move in Location (A12) over Time are conceptualized and then formalized as a locus formula]

… L(x,x,p,q,A12,Gt,k) Π L(x,y,p,q,A12,Gt,k) ∧ x≠y ∧ p≠q …

A12 : Location

Page 21: MIDST and IMAGES-M

SAND and CAND

(∃x, y, p1, p2, k) L(x, x, p1, p2, A12, Gt, k) • (L(x, x, p2, p1, A12, Gt, k) Π L(x, y, p2, p1, A12, Gt, k)) ∧ x≠y ∧ p1≠p2

Π : Simultaneous AND (SAND)
• : Consecutive AND (CAND)

[Figure: image of ‘x fetches y’, with the loci of x and y in A12 between p1 and p2 plotted against times t1, t2 and t3]

Page 22: MIDST and IMAGES-M

A13: Direction

Page 23: MIDST and IMAGES-M

Description of Discrete Spatial Relations

The square is between the circle and the triangle. The circle, square and triangle are in a line.

(∃u,x,y,z) (L(z,u,x,y,A12,Gs) • L(z,u,y,z,A12,Gs)) Π L(z,u, , ,A13,Gs) ∧ isr(u) ∧ C(x) ∧ S(y) ∧ T(z)

isr: imaginary space region

[Figure: x, y and z arranged in a line within region u]

Page 24: MIDST and IMAGES-M

Description of spatial events associated with temporal loci in attribute spaces

The road runs 10km straight east from A to B, and after a while, at C it meets the street with the sidewalk.

(∃x,y,z,p,q) (L(_,x,A,B,A12,Gs,_) Π L(_,x,0,10km,A17,Gs,_) Π L(_,x,Point,Line,A15,Gs,_) Π L(_,x,East,East,A13,Gs,_)) • (L(_,x,p,C,A12,Gs,_) Π L(_,y,q,C,A12,Gs,_) Π L(_,z,y,y,A12,Gs,_)) ∧ road(x) ∧ street(y) ∧ sidewalk(z) ∧ p≠q

[Figure: map sketch with north (N) marked, showing the road from A to B (10km), point C, the street and the sidewalk]

Page 25: MIDST and IMAGES-M

Event Patterns about Location (A12)

[Figure: six locus diagrams in A12, labelled return, meet, separate, carry, start and stop]

Page 26: MIDST and IMAGES-M

Event Patterns about Color(A32)

Page 27: MIDST and IMAGES-M

Word meaning description Mw = [Cp : Up] (Cp : Concept Part, Up : Unification Part)

Mw(red) = [ Cp : ARG(Gov,X) ]
Cp: Color of X is red. Up: The ‘governor’ is X.

Mw(box) = [ Cp : ___ ]
Cp: Shape of Y is like this. Up is ‘empty’.

[Figure: ‘red’ (X) and ‘box’ (Y) combined into ‘red box’ (Y)]

Page 28: MIDST and IMAGES-M

Surface Structure: The robot carries the book.

Surface Dependency Structure: carries, with Dep1 = robot (the) and Dep2 = book (the)

Conceptual Structure:

(∃x, y, p1, p2, k) L(x, x, p1, p2, A12, Gt, k) Π L(x, y, p1, p2, A12, Gt, k) ∧ robot(x) ∧ book(y) ∧ x≠y ∧ p1≠p2

Mutual projection between surface and conceptual structures using word meaning descriptions and surface dependency structures.

Page 29: MIDST and IMAGES-M

Example(1): ‘carry (verb)’

Dep1 CARRY Dep2.

Mw(carry) = [(∃x,y,p1,p2,k) L(x,x,p1,p2,A12,Gt,k) Π L(x,y,p1,p2,A12,Gt,k) ∧ x≠y ∧ p1≠p2 : ARG(Dep.1,x); ARG(Dep.2,y);]
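How the unification part links a dependency structure to the concept part can be sketched as follows. This is a toy illustration under stated assumptions: the concept part is held as a plain string (with Π and ∧ written as in the ‘fetch’ formula on the paraphrasing slide), and `MW_CARRY` and `conceptual_structure` are hypothetical names rather than IMAGES-M internals.

```python
from typing import Dict

# Word meaning description Mw = [concept part : unification part].
# The concept part is kept as a plain string purely for illustration.
MW_CARRY = {
    "concept": ("(∃x,y,p1,p2,k) L(x,x,p1,p2,A12,Gt,k) Π L(x,y,p1,p2,A12,Gt,k)"
                " ∧ x≠y ∧ p1≠p2"),
    "unify": {"Dep1": "x", "Dep2": "y"},  # ARG(Dep.1,x); ARG(Dep.2,y)
}


def conceptual_structure(verb_meaning: Dict, dependents: Dict[str, str]) -> str:
    """Attach each dependent noun's predicate to the variable named by ARG."""
    noun_predicates = [
        f"{noun}({verb_meaning['unify'][slot]})"
        for slot, noun in sorted(dependents.items())
    ]
    return " ∧ ".join([verb_meaning["concept"], *noun_predicates])


# "The robot carries the book.": Dep1 = robot, Dep2 = book
result = conceptual_structure(MW_CARRY, {"Dep1": "robot", "Dep2": "book"})
print(result)
```

A real implementation would unify terms rather than concatenate strings; the sketch only shows which variable each dependent is bound to.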

Page 30: MIDST and IMAGES-M

Example(2): ‘desk (noun)’

Mw(desk) = [(λx) desk(x) : ___ ;]

where

(λx) desk(x) ⇔ (λx) (… L*(_,x,/,/,A29,Gt,_) ∧ … ∧ L*(_,x,/,/,A39,Gt,_) …)

‘At any time, a desk has no taste (A29), …, no vitality (A39), …’

Page 31: MIDST and IMAGES-M

Fundamental Semantic Processing on texts by IMAGES-M

Detection of

• Semantic anomalies

• Semantic ambiguities

• Paraphrase relations

Page 32: MIDST and IMAGES-M

Postulates about the world

X ∧ Y* .⊃. X Π Y, where Y* denotes that Y holds true over any time-interval.

L(x,y,p,q,a,g,k) Π L(z,y,r,s,a,g,k) .⊃. p=r ∧ q=s

Page 33: MIDST and IMAGES-M

Detection of Semantic Anomalies by using postulates

(Postulate 1) L(x,y,p1,q1,a,g,k) Π L(z,y,p2,q2,a,g,k) .⊃. p1=p2 ∧ q1=q2

‘A matter never has different values of an attribute at the same time.’

Page 34: MIDST and IMAGES-M

Example(1)

Tom stays with the guest from Spain.

[Figure: two dependency structures, D1 and D2]

M(stay) = [(∃x, y, p1, p2, k) L(x, y, p1, p2, A12, Gt, k) ∧ x≠y ∧ p1=p2 : …… ]

M(from) = [(∃x, y, p1, p2, k) L(x, y, p1, p2, A12, Gt, k) ∧ p1≠p2 : …… ]

D2 violates Postulate 1.

Page 35: MIDST and IMAGES-M

Example(2)

I drank the coffee on the desk, which was sweet.

[Figure: two dependency structures, D1 and D2]

D1 violates Postulate 1:

L(x,y,sweet,sweet,A29,Gt,k) ∧ desk(y)
⇒ L(x,y,sweet,sweet,A29,Gt,k) Π L(z,y,/,/,A29,Gt,k)
⇒ ‘sweet’ = /
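The anomaly check in Example(2) can be sketched as a direct test of Postulate 1 on two loci. This is a minimal sketch, not the IMAGES-M inference engine: loci are plain tuples, "/" stands for "no value" as in the desk concept, and the caller is assumed to have already established that the two loci are simultaneous. The names `Locus` and `violates_postulate_1` are hypothetical.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    x: str  # event causer
    y: str  # attribute carrier
    p: str  # value at the beginning of the locus
    q: str  # value at the end of the locus
    a: str  # attribute, e.g. "A29" (taste)
    g: str  # event type, "Gt" or "Gs"
    k: str  # standard


def violates_postulate_1(l1: Locus, l2: Locus) -> bool:
    """True if two simultaneous loci give one matter two different values.

    Assumes l1 and l2 are already known to be simultaneous.
    """
    same_slot = (l1.y, l1.a, l1.g, l1.k) == (l2.y, l2.a, l2.g, l2.k)
    return same_slot and (l1.p, l1.q) != (l2.p, l2.q)


# Reading D1: 'sweet' is attached to the desk, but a desk has no taste.
asserted = Locus("x", "desk", "sweet", "sweet", "A29", "Gt", "k")
from_concept = Locus("z", "desk", "/", "/", "A29", "Gt", "k")
# Other reading: 'sweet' is attached to the coffee, so nothing clashes.
coffee_taste = Locus("x", "coffee", "sweet", "sweet", "A29", "Gt", "k")

print(violates_postulate_1(asserted, from_concept))      # True
print(violates_postulate_1(coffee_taste, from_concept))  # False
```

The event causer is deliberately ignored in the comparison, since Postulate 1 allows different causers x and z for the same attribute carrier y.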

Page 36: MIDST and IMAGES-M

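Detection of Semantic Ambiguities

Tom follows Jim with the stick.

[Figure: two dependency structures, D1 and D2, with Pr(D1) and Pr(D2) showing different arrangements of Jim (J), Tom (T) and the stick (s)]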

Page 37: MIDST and IMAGES-M

Paraphrasing based on understanding

(Input) The girl fetches the book from the village to the town.

(Output) The girl goes to the village from the town, and then carries the book from the village to the town.

(∃x1,x2,p1,p2,k) L(x1,x1,p1,p2,A12,Gt,k) • (L(x1,x1,p2,p1,A12,Gt,k) Π L(x1,x2,p2,p1,A12,Gt,k)) ∧ girl(x1) ∧ book(x2) ∧ town(p1) ∧ village(p2)

Page 38: MIDST and IMAGES-M

Why is cross-media translation (CMT) important?

Problem: I have one chair, one flower-pot, one box, one lamp and one cat in my room. The chair is 1m to the right of the flower-pot. The flower-pot is 4m to the left of the box. The red lamp hangs above the chair. The black cat lies under the chair.

Page 39: MIDST and IMAGES-M

Systematic CMT

Explicit algorithms for:

(C1) translating source representations into target ones for contents describable by both source and target media.

(C2) filtering out contents that are describable by the source medium but not by the target one.

(C3) supplementing default contents, that is, contents that need to be described in target representations but are not explicitly described in source representations.

(C4) replacing default contents by definite ones given in the subsequent context.

Page 40: MIDST and IMAGES-M

Realization of systematic CMT

Algorithms for:

(C1) translating source representations into target ones for contents describable by both source and target media: APRs

(C2) filtering out contents that are describable by the source medium but not by the target one: APRs

(C3) supplementing default contents, that is, contents that need to be described in target representations but are not explicitly described in source representations: X Y Z

(C4) replacing default contents by definite ones given in the subsequent context: only to memorize the processing history

Page 41: MIDST and IMAGES-M

Formalization of cross-media translation

Y(Smt) = ψ(X(Sms))

In the case of text-to-picture CMT,

Sms = All the attributes in the previous table.
Smt = Visual attributes marked by * in the previous table.
ψ is defined by a set of APRs shown in the next table.

Page 42: MIDST and IMAGES-M

CMT between Text and Picture

Text = The omnisensual world specified by Sms
Text Meaning Representation = X(Sms)
Picture Meaning Representation = Y(Smt)
Picture = The visual world specified by Smt
ψ = APRs and default reasoning

Page 43: MIDST and IMAGES-M

Attribute Paraphrasing Rules (APRs)

Table 4 Attribute paraphrasing rules for text-to-picture translation

APRs | Correspondences of attributes (Text : Picture) | Value conversion schema (Text → Picture) | Interpretations of the schema
APR-01 | A12 : A12 | p → p’ | ‘position’ into 2D coordinates (within the display area).
APR-02 | {A12, A13, A17} : A12 | {p, d, l} → p’+l’d’ | {‘position’, ‘direction’, ‘distance’} into 2D coordinates.
APR-03 | {A11, A10} : A11 | {s, v} → v’s’ | {‘shape’, ‘volume’} into a set of outlines of the object.
APR-04 | A32 : A32 | c → c’ | ‘color’ into 3D coordinates of the color solid.
APR-05 | {A12, A44} : A12 | {pa, m} → {pa’, pb’} | {‘position’, ‘topology’} into a pair of 2D coordinates.

For example, APR-02 applies to a sentence such as “The box is 3 meters to the left of the chair.”
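The APR-02 schema p’+l’d’ can be worked through for that sentence. The sketch below is only an illustration under assumptions not given on the slide: 2D display coordinates with x growing rightwards and y growing upwards, unit vectors for each direction word, and a hypothetical scale `pixels_per_metre`; `apr_02` is likewise a made-up name.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]

# Assumed picture-side direction vectors (x grows rightwards, y grows upwards).
DIRECTIONS: Dict[str, Point] = {
    "left": (-1.0, 0.0),
    "right": (1.0, 0.0),
    "above": (0.0, 1.0),
    "below": (0.0, -1.0),
}


def apr_02(reference: Point, direction: str, distance_m: float,
           pixels_per_metre: float = 50.0) -> Point:
    """Place an object at p' + l'd' relative to the reference position p'."""
    dx, dy = DIRECTIONS[direction]           # d': direction as a unit vector
    length = distance_m * pixels_per_metre   # l': distance in display units
    return (reference[0] + length * dx, reference[1] + length * dy)


chair = (400.0, 100.0)             # hypothetical display position of the chair
box = apr_02(chair, "left", 3.0)   # "The box is 3 meters to the left of the chair."
print(box)                         # (250.0, 100.0)
```

With the assumed scale of 50 display units per metre, 3 meters becomes 150 units, so the box lands 150 units to the left of the chair at the same height.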

Page 44: MIDST and IMAGES-M

S1 = There is a hard cubic object.

P1 =
shape=cube (C1)
hardness=indescribable (C2)
color=default (C3)
volume=default (C3)

S2 = The object is large and red.

P2 =
color=red (C4)
volume=large (C4)
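The way C1 to C4 apply to S1 and S2 can be traced with a short sketch. It is an illustration under assumptions, not the system's algorithm: sentence meanings are given as attribute-to-value dictionaries, and the sets `VISUAL` and `REQUIRED` as well as the function `translate` are hypothetical.

```python
from typing import Dict, Optional, Tuple

VISUAL = {"shape", "color", "volume"}    # assumed picture-describable attributes
REQUIRED = ("shape", "color", "volume")  # assumed attributes a picture must fix

Picture = Dict[str, Tuple[str, str]]     # attribute -> (value, rule applied)


def translate(text: Dict[str, str],
              previous: Optional[Picture] = None) -> Picture:
    """Tag every attribute with the CMT step (C1-C4) that handled it."""
    picture: Picture = {}
    for attr, value in text.items():
        if attr not in VISUAL:
            picture[attr] = ("indescribable", "C2")  # filtered out
        elif previous and previous.get(attr, ("", ""))[1] == "C3":
            picture[attr] = (value, "C4")            # replaces an earlier default
        else:
            picture[attr] = (value, "C1")            # direct translation
    if previous is None:
        for attr in REQUIRED:
            picture.setdefault(attr, ("default", "C3"))  # supplement defaults
    return picture


# S1 = There is a hard cubic object.
p1 = translate({"hardness": "hard", "shape": "cube"})
# S2 = The object is large and red.
p2 = translate({"volume": "large", "color": "red"}, previous=p1)
print(p1)
print(p2)
```

Passing the previous picture meaning into the second call is what allows C4 to recognise that color and volume were only defaults after S1.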

Page 45: MIDST and IMAGES-M

Discussions and conclusions

• The cross-references between texts in several languages (Japanese, Chinese, Albanian and English) and pictorial patterns like maps were successfully implemented on our intelligent system IMAGES-M.

• To the best of our knowledge, no other system can perform cross-media reference as seamlessly as ours.

Page 46: MIDST and IMAGES-M

Future work

• Automatic acquisition of word meanings from sensory data.

• Human-robot communication by natural language in real environments.

• etc.