Introduction to Visual Question Answering

speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2...

沈昇勳 Sheng-syun Shen

Outline

● Classical Question Answering

● End-to-End Visual Question Answering

● Attention Model on Question Answering

● Libraries and Toolkits

Classical Question Answering

Question Answering

One of the oldest NLP tasks.

Apple Siri

Types of Questions in QA Systems

● Factoid questions

○ Where is Apple Computer based ?

○ How many calories are there in two slices of apple pie ?

● Complex (Narrative) questions

○ In children with an acute febrile illness, what is the efficacy of acetaminophen in reducing fever ?

Approaches for Solving QA

● IR-based approaches (Information Retrieval)

○ TREC; IBM Watson; Google

● Knowledge-based and Hybrid approaches

○ Apple Siri; Wolfram Alpha

IR-based Factoid QA

IR-based Factoid QA

● Question processing

○ Detect question type, answer type

○ Formulate queries to send to a search engine

● Passage retrieval

○ Retrieve ranked documents

○ Break into suitable passages and rerank

● Answer processing

○ Extract candidate answers

○ Rank candidates
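The query formulation and passage reranking steps above can be sketched as follows. This is a minimal illustration, not the method of any particular system: the stopword list, the overlap score, and the function names `formulate_query` and `rank_passages` are all assumptions made for the example.

```python
import re

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "in", "to",
             "what", "who", "where", "how", "many", "there"}


def formulate_query(question):
    """Keep the content words of the question as query keywords."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    return [t for t in tokens if t not in STOPWORDS]


def rank_passages(question, passages):
    """Order passages by how many query keywords they contain (toy score)."""
    keywords = set(formulate_query(question))
    scored = []
    for passage in passages:
        words = set(re.findall(r"[a-z0-9]+", passage.lower()))
        scored.append((len(keywords & words), passage))
    # Python's sort is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored]
```

For the question "Where is Apple Computer based ?", the keywords are `apple`, `computer` and `based`, so a passage mentioning all three is ranked above one that only mentions `apple`.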

IR-based Factoid QA | Question Processing

● Answer type detection

Decide the named entity type (person, place) of the answer

● Query formulation

Choose query keywords for the IR system

● Question type classification

Is this a definition question, a math question, or a list question ?

IR-based Factoid QA | Question Processing

Answer type detection : Named entities

● Who founded Virgin Airlines ?

○ PERSON

● What Canadian city has the largest population ?

○ CITY
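A rule-based sketch of answer type detection that reproduces the two examples above. The rules and the `LOCATION`, `NUMBER` and `UNKNOWN` labels are hypothetical additions for illustration; practical systems usually learn this mapping from labeled questions.

```python
import re

# Hypothetical hand-written rules: (pattern, answer type).
RULES = [
    (re.compile(r"^who\b", re.IGNORECASE), "PERSON"),
    (re.compile(r"^(what|which)\b.*\bcity\b", re.IGNORECASE), "CITY"),
    (re.compile(r"^where\b", re.IGNORECASE), "LOCATION"),
    (re.compile(r"^how (many|much)\b", re.IGNORECASE), "NUMBER"),
]


def detect_answer_type(question):
    """Return the answer type of the first matching rule, else UNKNOWN."""
    text = question.strip()
    for pattern, answer_type in RULES:
        if pattern.search(text):
            return answer_type
    return "UNKNOWN"
```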

End-to-End Visual Question Answering

Visual QA may contain some sub-problems...

● Object detection

● Image segmentation

● Some Question Answering techniques

○ Question type classification

○ Answer type detection

Is there a banana in the picture ?

(A) Yes. (B) No.

End-to-End Visual QA

Can directly predict answers from questions and images

Proposed approach

[Figure: Neural Network; Feature Vector : Question; Feature Vector : Answer]

Proposed approach

[Figure: VQA Result; Multiple Choices; Similarity]
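One way to read the multiple-choice step is to compare the network's output vector with the feature vector of each choice and pick the most similar one. The sketch below assumes cosine similarity as the measure and that every choice has already been turned into a vector; `choose_answer` and the vectors in the usage note are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def choose_answer(predicted, choices):
    """Return the label of the choice whose vector is closest to the prediction.

    predicted: answer feature vector produced by the network
    choices:   dict mapping a label such as "A" to that choice's feature vector
    """
    return max(choices, key=lambda label: cosine_similarity(predicted, choices[label]))
```

For instance, `choose_answer([0.9, 0.1], {"A": [1.0, 0.0], "B": [0.0, 1.0]})` selects `"A"`.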

Extract Feature Vectors | Word Embedding

To understand sentences or documents, we need to model them as fixed-length vector representations.

Basic Representation Method :

Bag-of-words model / N-hot encoding

○ Each document is represented by a set of keywords

○ A pre-selected set of index terms can be used to

summarize the document contents

Extract Feature Vectors | Word Embedding

Bag-of-words model / N-hot encoding

Definition

The pre-selected vocabulary V = {k1,...,ki} is the set of all distinct index terms in the collection

Examples

V = {John,game,to,likes,watch}

Sentence 1: John likes to watch movies. Mary likes movies too.

S1 = [1,0,1,2,1]

Sentence 2: John also likes to watch football games.

S2 = [1,1,1,1,1]
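The two example vectors can be reproduced with a short counting routine. It is a sketch under one assumption not stated on the slide: a token ending in "s" that is not itself in the vocabulary is matched against its form without the "s", so that "games" counts toward "game".

```python
import re

VOCAB = ["John", "game", "to", "likes", "watch"]


def bag_of_words(text, vocab=VOCAB):
    """Count how often each vocabulary term occurs in the text."""
    index = {term: i for i, term in enumerate(vocab)}
    counts = [0] * len(vocab)
    for token in re.findall(r"[A-Za-z]+", text):
        # Assumed normalization: try the token without a trailing "s".
        if token not in index and token.endswith("s"):
            token = token[:-1]
        if token in index:
            counts[index[token]] += 1
    return counts


s1 = bag_of_words("John likes to watch movies. Mary likes movies too.")
s2 = bag_of_words("John also likes to watch football games.")
```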

Extract Feature Vectors | Word Embedding

Bag-of-words model / N-hot encoding

Property

Simple and Powerful

Problems :

loses the ordering of the words

ignores the semantics of the words

Father = [0 0 0 0 0 1 0 0 … 0 0 0 0]

Mother = [0 0 1 0 0 0 0 0 … 0 0 0 0]

The cosine similarity between these two terms = 0 ?!
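The zero similarity follows from the one-hot encoding: two different words never share a non-zero position, so their dot product is 0. A small check, where the vocabulary size of 12 is a hypothetical stand-in for the elided vector length:

```python
import math


def one_hot(index, size):
    """Vector of zeros with a single 1 at the given position."""
    vec = [0] * size
    vec[index] = 1
    return vec


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


VOCAB_SIZE = 12  # hypothetical; the slide elides the full vector length
father = one_hot(5, VOCAB_SIZE)  # 1 at position 5, as on the slide
mother = one_hot(2, VOCAB_SIZE)  # 1 at position 2, as on the slide
```

`cosine_similarity(father, mother)` is 0 even though the two words are closely related, which is the problem word embeddings address.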

Extract Feature Vectors | Word Embedding

Word embedding can solve these problems :

● Words are represented as DENSE, FIXED-LENGTH vectors.

● Semantic and syntactic information is preserved.

Extract Feature Vectors | Word Embedding

Using this technique, we can then represent phrases or sentences by :

● Averaging word vectors

● Adopting sentence embedding : https://cs.stanford.edu/~quocle/paragraph_vector.pdf
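A minimal sketch of the averaging option. The 3-dimensional vectors in `TOY_EMBEDDINGS` are made up for illustration; in practice they would come from a trained model such as Word2Vec or GloVe.

```python
# Hypothetical toy embeddings; real vectors have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "john": [0.2, 0.1, 0.7],
    "likes": [0.5, 0.3, 0.1],
    "movies": [0.2, 0.8, 0.1],
}


def average_embedding(tokens, embeddings, dim):
    """Average the vectors of all known tokens; unknown tokens are skipped."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


sentence_vector = average_embedding(["john", "likes", "movies"], TOY_EMBEDDINGS, dim=3)
```

Averaging yields a fixed-length vector for a sentence of any length, but like bag-of-words it discards word order.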

Extract Feature Vectors | Image Embedding

Using a pre-trained CNN model, we can classify images

Extract Feature Vectors | Image Embedding

We can also represent images in vector form by feeding them into the pre-trained CNN models

Proposed approach

[Figure: Neural Network; Feature Vector : Question; Feature Vector : Answer]

Proposed approach

Proposed approach

References for implementation :

● https://avisingh599.github.io/deeplearning/visual-qa/

● http://www.cs.toronto.edu/~mren/imageqa/

● https://www.d2.mpi-inf.mpg.de/sites/default/files/iccv15-neural_qa.pdf

Variations

● BOW

“Blind” model. BOW+logistic regression

● LSTM

Another “Blind” model.

● IMG

CNN features plus the question type, without the question sentence.

Attention Model on Question Answering

Discussion

How to use image information precisely ?

Reference Paper

Xu, Huijuan, and Kate Saenko (UMass Lowell). Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering. arXiv preprint arXiv:1511.05234 (2015).

Samples in this paper

Proposed Methodology

Proposed Methodology

CNN features :

extract the last convolutional layer of GoogLeNet

Text features :

represent the question with word embeddings, as in the reference paper

Proposed Methodology

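Proposed Methodology | Attention Level

Sentence (Question) Attention

Attention Matrix : $W_A$

$C \in \mathbb{R}^{L}$, $S \in \mathbb{R}^{L \times M}$, $W_A \in \mathbb{R}^{M \times N}$, $Q \in \mathbb{R}^{N}$, $W_{att} \in \mathbb{R}^{L}$, $W_E \in \mathbb{R}^{M \times N}$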
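The listed shapes are consistent with the following reading, which is an assumption and not something the slide states: the correlation vector is C = S·W_A·Q (one score per image region), the attention weights are W_att = softmax(C), and the attended evidence is the W_att-weighted sum of the rows of S·W_E. The sketch below implements that reading in plain Python; the function names are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(S, W_A, W_E, Q):
    """Question-guided attention over L image regions (assumed formulation).

    S: L x M region features, W_A and W_E: M x N, Q: length-N question vector.
    Returns (C, W_att, S_att) with lengths L, L and N.
    """
    C = matvec(S, matvec(W_A, Q))        # W_A Q has length M, S (W_A Q) has length L
    W_att = softmax(C)                   # one weight per region, summing to 1
    n_dims = len(Q)
    # Project every region into the N-dimensional evidence space: S W_E is L x N.
    evidence = [[sum(row[m] * W_E[m][n] for m in range(len(W_E)))
                 for n in range(n_dims)] for row in S]
    # Weighted sum of the projected regions.
    S_att = [sum(w * ev[n] for w, ev in zip(W_att, evidence)) for n in range(n_dims)]
    return C, W_att, S_att
```

With L = 3 regions and identity matrices for `W_A` and `W_E`, regions whose features align with `Q` receive the larger attention weights.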

Attention Analysis

Object Presence

Attention Analysis

Absolute Position Recognition

With / without attention : 100% vs 75%

Attention Analysis

Relative Position Recognition

With / without attention : 96% vs 75%

Experimental Result

Libraries and Toolkits

Word Embedding

● Word2Vec

https://code.google.com/p/word2vec/

● GloVe

http://nlp.stanford.edu/projects/glove/

● Sentence2vec

https://github.com/klb3713/sentence2vec

Image Embedding

A pre-extracted feature set is provided :

http://cs.stanford.edu/people/karpathy/deepimagesent/coco.zip

Project web page : http://cs.stanford.edu/people/karpathy/deepimagesent/

( The project is about generating image descriptions. )

Keras

Website and documentation : http://keras.io/

Example :

Keras

Note :

If the input features are too large to fit in memory, you can load them in batches and train batch by batch.

Here is an example :

https://github.com/avisingh599/visual-qa/blob/master/scripts/trainMLP.py
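The batching idea can be sketched independently of Keras. Here `load_batch` is a hypothetical callable that reads samples `start` to `stop` from wherever the features are stored; the generator only decides the batch boundaries, so only one batch is held in memory at a time.

```python
def iterate_batches(num_samples, batch_size, load_batch):
    """Yield one batch at a time instead of loading all features at once.

    load_batch(start, stop) is assumed to return (features, labels)
    for the samples in the half-open range [start, stop).
    """
    for start in range(0, num_samples, batch_size):
        stop = min(start + batch_size, num_samples)
        yield load_batch(start, stop)


def toy_loader(start, stop):
    """Stand-in loader that fabricates features and labels from indices."""
    features = list(range(start, stop))
    labels = [i % 2 for i in range(start, stop)]
    return features, labels


batches = list(iterate_batches(num_samples=5, batch_size=2, load_batch=toy_loader))
# With a compiled Keras model (hypothetical variable `model`), each batch
# could be passed to model.train_on_batch(features, labels).
```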

References

References

● https://web.stanford.edu/class/cs124/lec/qa.pdf

● (Too lazy to write out the rest.)

The End

Thank you for listening.
