Word2VisualVec for Video-To-Text Matching and Ranking

Jianfeng Dong (1), Xirong Li (2), Xiaoxu Wang (2), Qijie Wei (2), Weiyu Lan (2), Cees G. M. Snoek (3)

(1) Zhejiang University

(2) Renmin University of China

(3) University of Amsterdam

Our idea

Project sentences into a video feature space

Match sentences and videos in this space
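A minimal sketch of matching in the video feature space, assuming each candidate sentence has already been projected to a vector of the same dimension as the video feature. Cosine similarity is an assumption here, since the slides do not name the similarity measure, and `rank_sentences` is a hypothetical helper name.

```python
import numpy as np

def rank_sentences(video_vec, sentence_vecs):
    """Rank sentences by cosine similarity to one video vector.

    video_vec:     shape (d,)   video feature
    sentence_vecs: shape (n, d) sentence vectors in the same space
    Returns (order, scores), where order lists sentence indices best-first.
    """
    v = video_vec / np.linalg.norm(video_vec)
    s = sentence_vecs / np.linalg.norm(sentence_vecs, axis=1, keepdims=True)
    scores = s @ v                      # cosine similarity per sentence
    return np.argsort(-scores), scores  # descending by score

# Toy 3-dimensional example (real features are far larger).
video = np.array([1.0, 0.0, 1.0])
sentences = np.array([[0.0, 1.0, 0.0],
                      [2.0, 0.0, 2.0],
                      [1.0, 1.0, 0.0]])
order, scores = rank_sentences(video, sentences)
```

In this toy example the second sentence points in the same direction as the video vector, so it is ranked first.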

Solution: Word2VisualVec

Transform text into a video feature vector

[Diagram: Text → word matrix → pooling → s(q) → h1(q) = σ(W1*s(q)+b1) → σ(W2*h1(q)+b2), compared with the video CNN feature]

J. Dong, X. Li, C. Snoek, Word2VisualVec: Cross-Media Retrieval by Visual Feature Prediction, arXiv:1604.06838, 2016

Word2VisualVec

word2vec + multi-layer perceptron

Minimize Mean Squared Error between text vector and video vector
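The forward pass and training objective can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: pooling is taken to be the mean of the word vectors, σ is taken to be ReLU (the slides name neither), the weights are random rather than trained, and random vectors stand in for real word2vec embeddings and video features. The 500-1000-1024 layer sizes are the ones the slides list for the visual feature.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def word2visualvec(word_vectors, W1, b1, W2, b2, act=relu):
    """Map a sentence (one word vector per row) to a video-space vector."""
    s = word_vectors.mean(axis=0)   # s(q): pooled sentence vector (mean pooling assumed)
    h1 = act(W1 @ s + b1)           # h1(q) = sigma(W1*s(q) + b1)
    return act(W2 @ h1 + b2)        # sigma(W2*h1(q) + b2)

def mse(pred, target):
    """Mean squared error between text vector and video vector."""
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 500, 1000, 1024
W1 = rng.normal(scale=0.01, size=(d_hid, d_in))
b1 = np.zeros(d_hid)
W2 = rng.normal(scale=0.01, size=(d_out, d_hid))
b2 = np.zeros(d_out)

words = rng.normal(size=(7, d_in))   # stand-in for 7 word2vec vectors
video_feature = rng.random(d_out)    # stand-in for a video CNN feature

pred = word2visualvec(words, W1, b1, W2, b2)
loss = mse(pred, video_feature)
```

Training would adjust W1, b1, W2 and b2 to reduce `loss` over sentence-video pairs; the optimizer is left out of this sketch.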

Implementation

Two video features

- Visual: mean pooling over frame-level CNN features extracted by GoogleNet-shuffle [Mettes et al., ICMR16]

- Visual + Audio: GoogleNet-shuffle + bag of quantized MFCC

Word2Vec

- 500-dim, trained on user tags of 30 million Flickr images

Word2VisualVec architecture

- For predicting the visual feature: 500-1000-1024

- For predicting the visual + audio feature: 500-1000-2048

Training set

- MSR-VTT training set of 6,513 videos [Xu et al., CVPR16]

Validation set

- TRECVID 200 training videos
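A small sketch of how the two video features above could be assembled. Mean pooling over frames is stated in the slides; combining visual and audio by concatenation, and a 1024-dim audio vector, are assumptions inferred from the 1024 and 2048 output sizes. `build_video_feature` is a hypothetical name.

```python
import numpy as np

def build_video_feature(frame_features, audio_bow=None):
    """Build a video-level feature.

    frame_features: shape (num_frames, 1024) frame-level CNN features
    audio_bow:      optional bag-of-quantized-MFCC vector (assumed 1024-dim)
    """
    visual = frame_features.mean(axis=0)         # mean pooling over frames
    if audio_bow is None:
        return visual                            # visual-only feature
    return np.concatenate([visual, audio_bow])   # assumed: simple concatenation

frames = np.ones((30, 1024))   # placeholder frame features
audio = np.zeros(1024)         # placeholder audio histogram

visual_only = build_video_feature(frames)
visual_audio = build_video_feature(frames, audio)
```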

Video-to-text results

Adding the audio feature provides some improvement

[Results table for set A and set B]

Word2VisualVec is effective

Video-to-text results

Text → Visual

a man with a beard is wearing glasses

Text → Visual + Audio

man talks into the camera

Text → Visual

soccer players are blocking the ball on a soccer field

Text → Visual + Audio

a soccer player scores a goal on a soccer field

More results at http://lixirong.net/demo/vtt/tv16.html

Video Description Generation

J. Dong, X. Li, W. Lan, Y. Huo, C. Snoek, Early embedding and late reranking for video captioning, ACM Multimedia 2016

Idea: Re-use Video Tags for Captioning

Predicted tags → Generated caption

- (tags not recovered) → a group of people are running in a race track

- people, dancing → people are dancing on a stage

- soccer, player, playing → a soccer player is playing a goal on a soccer field

Our solution

Google’s model [Vinyals et al., CVPR 2015]

models are walking down the runway

models are walking on the runway

a woman is walking down the runway

a woman is dancing

models are walking in a fashion show

models are walking on the ramp

Google’s model for sentence generation

GoogleNet-shuffle

Our solution

fashion

walking

a woman is dancing

Re-encoding by Word2VisualVec

Better initialization by tag embedding

Our solution

a woman is dancing

Maximize tag matches

models are walking in a fashion show

fashion

walking

Rerank sentences by matching with video tags

Re-encoding by Word2VisualVec
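The reranking step can be illustrated with a minimal sketch that orders candidate sentences by how many predicted tags they contain. Scoring purely by tag count is a simplifying assumption; the slides do not say how tag matches are weighed against the sentence generator's own score. `tag_matches` and `rerank` are hypothetical names.

```python
def tag_matches(sentence, tags):
    """Count how many predicted tags occur as words in the sentence."""
    words = set(sentence.lower().split())
    return sum(1 for tag in tags if tag.lower() in words)

def rerank(candidates, tags):
    """Order candidates by tag matches, most matches first.

    sorted() is stable, so ties keep the generator's original order.
    """
    return sorted(candidates, key=lambda s: -tag_matches(s, tags))

# Candidates and tags taken from the slide example.
candidates = ["a woman is dancing", "models are walking in a fashion show"]
tags = ["fashion", "walking"]
best = rerank(candidates, tags)[0]
```

With the tags "fashion" and "walking", the second candidate matches both and moves to the top.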

Heuristics to add ‘where’

Two simple rules to append a ‘where’ description to the end of the generated sentences:

1. Add “on a $sport_name field” if $sport_name appears in the sentence, such as basketball, baseball, and football.

2. Add “on a stage” if “sing” or “dance” appears in the sentence.
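The two rules can be sketched as a small function. The sport list contains only the three examples named on the slide, since the full list is not given, and matching inflected forms such as "dancing" by word prefix is an assumption. `add_where` is a hypothetical name.

```python
# Only the sports named on the slide; the complete list is not given.
SPORTS = ("basketball", "baseball", "football")

def add_where(sentence):
    """Append a 'where' phrase to a generated sentence when a rule fires."""
    words = sentence.lower().split()
    # Rule 1: a sport name appears in the sentence.
    for sport in SPORTS:
        if sport in words:
            return f"{sentence} on a {sport} field"
    # Rule 2: a form of "sing" or "dance" appears (prefix match is assumed;
    # it also fires on unrelated words such as "single").
    if any(w.startswith(("sing", "danc")) for w in words):
        return f"{sentence} on a stage"
    return sentence

print(add_where("people are dancing"))
print(add_where("a man is playing basketball"))
```

A production version would likely also check that the sentence does not already end with a location phrase; that guard is omitted here because the slides do not describe one.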

Description generation results

Adding “where” improves performance

Live demo

http://lixirong.net/demo/vtt

Accepts video files smaller than 10 MB

Conclusion

Word2VisualVec for video-to-text matching in video space

Early embedding and late reranking improve LSTM-based video captioning

Winning results in the VTT task

Xirong Li
