CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability...

CAP 6412 Advanced Computer Vision http://www.cs.ucf.edu/~bgong/CAP6412.html Boqing Gong March 15, 2016





Last week: Spring Break

• ECCV
  • One of the top conferences on computer vision
  • Every other year (2016, 2014, 2012, …)
  • On par with CVPR (every year) and ICCV (every other year: 2015, 2013, …)
  • ~2000 submissions, ~22% acceptance rate

• Double-blind peer review
• Conference attendees vote for Program Chairs (PCs)
• PCs select Area Chairs (ACs) and distribute papers to ACs
• ACs assign each of their papers to at least three reviewers
• Reviewers read and evaluate papers
• Authors respond to reviewers
• Reviewers read author responses and discuss the papers
• ACs meet together and decide whether to accept or reject papers (orals, posters)


Last week: Spring Break

• A typical program of CVPR/ECCV/ICCV
  • Sunday: Tutorials
  • Monday to Thursday: Oral presentations, spotlights, poster presentations
  • Friday and Saturday: Workshops


This week: Vision and Language

Tuesday (03/15)

Fareeha Irfan

[Book2Movie] Tapaswi, Makarand, Martin Bauml, and Rainer Stiefelhagen. "Book2movie: Aligning video scenes with book chapters." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1827-1835. 2015. & Secondary papers

Thursday (03/17) & next Tuesday (03/22)

Shreyas Somashekar

[Visual Genome] Krishna, Ranjay, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen et al. "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations." arXiv preprint arXiv:1602.07332 (2016). & Secondary papers


Assignment 9: Due on 03/22, 12pm

• Review the following paper using the Paper Review Template at http://www.cs.ucf.edu/~bgong/CAP6412/Review.docx.

[Visual Genome] Krishna, Ranjay, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen et al. "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations." arXiv preprint arXiv:1602.07332 (2016).


Next week: DAG-CNN & Transferability

Tuesday (03/22)

Niladri Basu Bal

[DAG-CNN] Yang, Songfan, and Deva Ramanan. "Multi-scale recognition with DAG-CNNs." In Proceedings of the IEEE International Conference on Computer Vision, pp. 1215-1223. 2015. & Secondary papers

Thursday (03/24)

Mert Ozerdem

[Transferability] Yosinski, Jason, Jeff Clune, Yoshua Bengio, and Hod Lipson. “How transferable are features in deep neural networks?” In Advances in Neural Information Processing Systems, pp. 3320-3328. 2014. & Secondary papers


What’s next


What’s next

• Sign up for volunteer presentations at https://docs.google.com/spreadsheets/d/1DxMQ_RVMx8BLmc5gij51dtXI2ZJzktR8EzJJZkVpGW4/edit#gid=0

• Suggest papers you would like to read, share, challenge


Today

• Administrivia
• Recurrent Neural Networks (RNNs) (II)
• OCR in the wild, by Aisha


(Discrete-time) RNN

• Three time steps and beyond

Image credits: Richard Socher


(Discrete-time) RNN

• Three time steps and beyond
  • A layered feedforward net
  • Tied weights for different time steps

• Conditioning (memorizing?) on all previous input

• Cheap to save memory in RAM
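The unrolled RNN on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a vanilla tanh RNN with a linear output layer; the names `rnn_forward`, `W_xh`, `W_hh`, `W_hy` and all sizes are hypothetical, not taken from the slides. It shows the tied weights: the same three matrices are reused at every time step, so the parameter count does not grow with the sequence length, while each hidden state still depends on every earlier input.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, h0):
    """Unroll a vanilla RNN over a sequence xs of shape (T, input_dim).

    The same three weight matrices are reused at every time step
    (tied weights), so the number of parameters does not depend on T.
    """
    h = h0
    hs, ys = [], []
    for x in xs:
        # h_t depends on x_t and h_{t-1}, hence on every earlier input.
        h = np.tanh(W_xh @ x + W_hh @ h)
        hs.append(h)
        ys.append(W_hy @ h)
    return np.stack(hs), np.stack(ys)

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim, T = 4, 8, 3, 5
W_xh = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
W_hy = rng.normal(scale=0.5, size=(output_dim, hidden_dim))
xs = rng.normal(size=(T, input_dim))
hs, ys = rnn_forward(xs, W_xh, W_hh, W_hy, np.zeros(hidden_dim))
print(hs.shape, ys.shape)  # (5, 8) (5, 3)
```

Changing only the first input changes the last hidden state, which is the "conditioning on all previous input" property.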

Image credits: Richard Socher


Detour: Hidden Markov Model

• A probabilistic model of sequences

• Emission probability: $p(x_t \mid z_t)$
• Transition probability: $p(z_t \mid z_{t-1})$
• Initial probability: $p(z_1)$
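As a worked example of these three quantities, assume hidden states z_t in {0, ..., K-1}, observations x_t in {0, ..., V-1}, an initial vector `pi`, a transition matrix `A` with A[i, j] = p(z_t = j | z_{t-1} = i), and an emission matrix `B` with B[k, v] = p(x_t = v | z_t = k). The joint probability of a hidden path and an observation sequence then factorizes as p(z_1) times the product of transitions times the product of emissions. The helper `hmm_joint_prob` and the two-state parameters below are made up for illustration.

```python
import numpy as np

def hmm_joint_prob(z, x, pi, A, B):
    """p(z_1..z_T, x_1..x_T) = p(z_1) * prod_t p(z_t | z_{t-1}) * prod_t p(x_t | z_t)."""
    p = pi[z[0]] * B[z[0], x[0]]              # initial and first emission
    for t in range(1, len(z)):
        p *= A[z[t - 1], z[t]] * B[z[t], x[t]]  # transition, then emission
    return p

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(hmm_joint_prob([0, 0, 1], [0, 0, 1], pi, A, B))  # approximately 0.081648
```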

Image credits: Erik Sudderth


Detour: Hidden Markov Model

• Useful for modeling sequences
• Discrete hidden states, which satisfy the Markov assumption

• Inference and learning (optional)
  • Evaluation: forward probability
  • Decoding: forward-backward algorithm, Viterbi decoding
  • Learning: EM algorithm (Baum-Welch)
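Viterbi decoding from this list can be sketched as follows for a discrete HMM with an initial vector `pi`, a transition matrix `A` (A[i, j] = p(z_t = j | z_{t-1} = i)) and an emission matrix `B` (B[k, v] = p(x_t = v | z_t = k)). This is a minimal log-space version written for illustration; the function name and the toy parameters are assumptions, not part of the slides.

```python
import numpy as np

def viterbi(x, pi, A, B):
    """Most likely hidden-state sequence argmax_z p(z | x) for a discrete HMM.

    Works in log space to avoid numerical underflow on long sequences.
    """
    T, K = len(x), len(pi)
    log_A, log_B = np.log(A), np.log(B)
    delta = np.log(pi) + log_B[:, x[0]]    # best log-prob of a path ending in each state
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A      # scores[i, j]: best path to i, then i -> j
        backptr[t] = np.argmax(scores, axis=0)
        delta = np.max(scores, axis=0) + log_B[:, x[t]]
    # Trace the best path backwards from the best final state.
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(viterbi([0, 0, 1, 1], pi, A, B))  # [0, 0, 1, 1]
```

Replacing `max`/`argmax` by a sum over previous states turns the same recursion into the forward algorithm used for evaluation.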

Image credits: Erik Sudderth


Begin: Slides from Geoffrey Hinton


Hidden Markov Models (computer scientists love them!)

• Hidden Markov Models have a discrete one-of-N hidden state. Transitions between states are stochastic and controlled by a transition matrix. The outputs produced by a state are stochastic.

• We cannot be sure which state produced a given output. So the state is “hidden”.

• It is easy to represent a probability distribution across N states with N numbers.

• To predict the next output we need to infer the probability distribution over hidden states.

• HMMs have efficient algorithms for inference and learning.

(Figure: HMM unrolled over time, producing one output per time step; time →)


A fundamental limitation of HMMs

• Consider what happens when a hidden Markov model generates data.

• At each time step it must select one of its hidden states. So with N hidden states it can only remember log₂(N) bits about what it generated so far.

• Consider the information that the first half of an utterance contains about the second half:

  • The syntax needs to fit (e.g. number and tense agreement).
  • The semantics needs to fit. The intonation needs to fit.
  • The accent, rate, volume, and vocal tract characteristics must all fit.

• All these aspects combined could be 100 bits of information that the first half of an utterance needs to convey to the second half. 2^100 is big!
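The arithmetic behind this argument is easy to check. In the sketch below (helper names are hypothetical), a one-of-N state carries at most log2(N) bits, so carrying 100 bits would need 2^100 distinct hidden states.

```python
import math

def hmm_memory_bits(n_states):
    """Upper bound on the bits a one-of-N hidden state can carry about the past."""
    return math.log2(n_states)

def states_needed(bits):
    """Number of one-of-N hidden states needed to carry `bits` bits."""
    return 2 ** bits

print(hmm_memory_bits(1024))  # 10.0
print(states_needed(100))     # 1267650600228229401496703205376
```

By contrast, a distributed state of 100 binary units already has 2^100 possible configurations.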


Recurrent neural networks

• RNNs are very powerful, because they combine two properties:

  • Distributed hidden state that allows them to store a lot of information about the past efficiently.

  • Non-linear dynamics that allow them to update their hidden state in complicated ways.

• With enough neurons and time, RNNs can compute anything that can be computed by your computer.

(Figure: RNN unrolled over time, with input → hidden → output at each time step; time →)


Do generative models need to be stochastic?

• Linear dynamical systems and hidden Markov models are stochastic models.

• But the posterior probability distribution over their hidden states given the observed data so far is a deterministic function of the data.

• Recurrent neural networks are deterministic.

• So think of the hidden state of an RNN as the equivalent of the deterministic probability distribution over hidden states in a linear dynamical system or hidden Markov model.
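To make the analogy concrete, here is a sketch of the HMM posterior update (the normalized forward algorithm) written as a recurrence belief_t = f(belief_{t-1}, x_t), which has the same shape as an RNN step h_t = f(h_{t-1}, x_t). Given the data, the update is fully deterministic. Function names and toy parameters are assumptions for illustration; as a by-product, the product of the per-step normalizers gives the sequence likelihood used for HMM evaluation.

```python
import numpy as np

def hmm_filter_step(belief, x_t, A, B):
    """Deterministic update of the HMM posterior over hidden states.

    belief: p(z_{t-1} | x_1..x_{t-1}), shape (K,).
    Returns p(z_t | x_1..x_t) and the predictive probability p(x_t | x_1..x_{t-1}).
    """
    predicted = belief @ A                 # p(z_t | x_1..x_{t-1})
    unnormalized = predicted * B[:, x_t]    # multiply by the emission p(x_t | z_t)
    evidence = unnormalized.sum()
    return unnormalized / evidence, evidence

def hmm_filter(x, pi, A, B):
    """Run the normalized forward algorithm; also returns log p(x_1..x_T)."""
    unnormalized = pi * B[:, x[0]]
    belief = unnormalized / unnormalized.sum()
    log_likelihood = np.log(unnormalized.sum())
    beliefs = [belief]
    for x_t in x[1:]:
        # Same role as an RNN step: new_state = f(old_state, new_input).
        belief, evidence = hmm_filter_step(belief, x_t, A, B)
        log_likelihood += np.log(evidence)
        beliefs.append(belief)
    return np.stack(beliefs), log_likelihood

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
beliefs, log_lik = hmm_filter([0, 0, 1], pi, A, B)
print(np.exp(log_lik))  # approximately 0.13623
```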


Recurrent neural networks

• What kinds of behaviour can RNNs exhibit?

• They can oscillate. Good for motor control?
• They can settle to point attractors. Good for retrieving memories?
• They can behave chaotically. Bad for information processing?
• RNNs could potentially learn to implement lots of small programs that each capture a nugget of knowledge and run in parallel, interacting to produce very complicated effects.

• But the computational power of RNNs makes them very hard to train.
• For many years we could not exploit the computational power of RNNs despite some heroic efforts (e.g. Tony Robinson’s speech recognizer).


End: Slides from Geoffrey Hinton


Today

• Administrivia
• Recurrent Neural Networks (RNNs) (II)
• OCR in the wild, by Aisha


Upload slides before or after class

• See “Paper Presentation” on UCF webcourse

• Sharing your slides
  • Refer to the original sources of images, figures, etc. in your slides
  • Convert them to a PDF file
  • Upload the PDF file to “Paper Presentation” after your presentation


Motivation

Books provide rich fine-grained information:

● Appearance: what a character, an object, or a scene looks like
● High-level semantics: what someone is thinking and feeling, and how these states evolve through a story

Aim: Align books with their movie releases in order to provide rich descriptive explanations for visual content that go semantically far beyond the captions available in current datasets.


Overview of the approach in this work

● Evaluate similarity between sentences, using a model trained on a corpus of books

● Similarity between movie clips and sentences in books, via a learned embedding

● Weaving the “context” of sentences into the similarity evaluation using a 3-layer CNN

● Timeline-wise alignment between books and movies using a Conditional Random Field


Demonstrated Results

● Movie/Book Alignment

● Describing a particular movie shot via its corresponding passage in the book

● Given a movie, retrieve its corresponding book from a corpus of books

● Caption a movie clip with a paragraph from “any” book

● Caption MS-COCO images using paragraphs from books


Related Work

● Early work on movie-to-text alignment includes dynamic time warping for aligning movies to scripts with the help of subtitles [5, 4].

● Sankar et al. [28] further developed a system which identified sets of visual and audio features to align movies and scripts without making use of the subtitles.

● Aligning plot synopses to shots in the TV series for story-based content retrieval. This work adopts a similarity function between sentences in plot synopses and shots based on person identities and keywords in subtitles.


Challenges

Plot synopses closely follow the storyline of movies, whereas books are more verbose and their storyline may differ from that of the movie release.

● Parallel to our work: [BOOK2MOVIE]
○ Tapaswi et al. aim to align scenes in movies to chapters in the book.
■ Coarse approach: operates at the chapter level
■ Dataset evaluates on 90 scene-chapter correspondences
■ Matches the presence of characters in a scene to those in a chapter
■ Uses hand-crafted similarity measures between sentences in the subtitles and dialogs in the books

○ This paper:
■ Our approach: sentence/paragraph level
■ Dataset contains 2,070 shot-to-sentence alignments
■ We remove character names to learn semantics and context better
■ CNN-learned similarity function


Dataset and Ground Truth

Annotators find correspondences:

● Mark the exact time in the movie and the line number of the beginning of the matched sentence in the book.

○ If a shot is longer: indicate the ending time.

○ If the description spans multiple lines: indicate the last line.

● Tag each alignment as a ‘visual’, ‘dialogue’, or ‘audio’ match

Total: 11 movie/book pairs, 2070 correspondences


Training the Sentence Similarity Model

R. Kiros, Y. Zhu, R. Salakhutdinov, R. S. Zemel, A. Torralba, R. Urtasun, and S. Fidler. Skip-Thought Vectors. arXiv, 2015.

Overview

Sentence i is encoded. Conditioned on this encoding, the model tries to reconstruct the sentences before and after it.

Motivation

Sentences that have similar surrounding context are likely to be both semantically and syntactically similar.
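A sketch of how training examples for this objective could be formed from running text: each sentence becomes the encoder input, and its two neighbours become the reconstruction targets. The helper name and the example sentences are hypothetical; tokenization and the encoder/decoder networks themselves are omitted.

```python
def skip_thought_triples(sentences):
    """Build (previous, current, next) triples from consecutive sentences.

    The encoder reads `current`; two decoders are trained to reconstruct
    `previous` and `next` from that encoding.
    """
    return [(sentences[i - 1], sentences[i], sentences[i + 1])
            for i in range(1, len(sentences) - 1)]

book = ["I got back home.", "I could see the cat on the steps.",
        "This was strange.", "The door was open."]
for prev, cur, nxt in skip_thought_triples(book):
    print(f"encode {cur!r} -> reconstruct {prev!r} and {nxt!r}")
```

The first and last sentences never serve as encoder inputs here, because each lacks one neighbour.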


“Thought Vectors”

The term was popularized by Geoffrey Hinton (Google).

word vector:

● represents the word’s meaning, relative to others (context)
● linked by grammar

thought vector:

● represents a thought, relative to others
● linked by a chain of reasoning

Goal: Feed enough data (thoughts) to a neural network to enable it to mimic those thoughts


Sentence Embedding Model using Skip Thought

Training Data: Corpus of 11,038 books from the web


Encoder


Decoder


Encoder Decoder


Aligning Book w/ Movie

Inspired by the multimodal neural model in Kiros et al.

Our approach: a learned embedding

Training Data

● Each clip description:
○ Descriptive Video Service dataset used to learn the embedding
○ 94 movies, 54,000 described clips
○ Pre-processing: replace names with the token ‘someone’

● Each movie clip vector:
○ GoogLeNet and hybrid-CNN features, mean-pooled across all frames in the clip
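The two pre-processing steps above can be sketched as follows. This assumes per-frame features have already been extracted and that the pooled GoogLeNet and hybrid-CNN vectors are concatenated; the feature sizes, function names, and example character names are hypothetical.

```python
import re
import numpy as np

def clip_vector(googlenet_frames, hybrid_frames):
    """Mean-pool per-frame CNN features over all frames of a clip.

    Each input has shape (n_frames, feature_dim). Concatenating the two
    pooled vectors is an assumption of this sketch.
    """
    return np.concatenate([googlenet_frames.mean(axis=0), hybrid_frames.mean(axis=0)])

def anonymize(description, character_names):
    """Replace every listed character name with the token 'someone'."""
    pattern = r"\b(?:" + "|".join(map(re.escape, character_names)) + r")\b"
    return re.sub(pattern, "someone", description)

rng = np.random.default_rng(0)
q = clip_vector(rng.random((30, 1024)), rng.random((30, 4096)))  # hypothetical sizes
print(q.shape)  # (5120,)
print(anonymize("Nick opens the door and Amy smiles.", ["Nick", "Amy"]))
# someone opens the door and someone smiles.
```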


Aligning Book w/ Movie

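Vector representation of a description using an LSTM:

$$\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1}) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1}) \\
a_t &= \tanh(W_a x_t + U_a h_{t-1}) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1}) \\
c_t &= i_t \odot a_t + f_t \odot c_{t-1} \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}$$

The states $(i_t, f_t, a_t, o_t, c_t)$ correspond to the input, forget, cell, output, and memory vectors for embedding word $x_t$ of the sentence at time $t$.

$m = h_N$ is the vector representation of a sentence of length $N$.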
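A NumPy sketch of such an LSTM sentence encoder, returning the last hidden state h_N as the sentence vector. It assumes the standard LSTM gates (input, forget, cell input, output), biases omitted for brevity, and random weights in place of trained ones; `lstm_encode` and the parameter layout are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_encode(xs, params):
    """Run an LSTM over word embeddings xs of shape (N, d) and return h_N.

    params maps each gate name ('i', 'f', 'a', 'o') to a (W, U) pair.
    """
    hidden = params["i"][1].shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in xs:
        i = sigmoid(params["i"][0] @ x + params["i"][1] @ h)  # input gate
        f = sigmoid(params["f"][0] @ x + params["f"][1] @ h)  # forget gate
        a = np.tanh(params["a"][0] @ x + params["a"][1] @ h)  # cell input
        o = sigmoid(params["o"][0] @ x + params["o"][1] @ h)  # output gate
        c = i * a + f * c                                      # memory update
        h = o * np.tanh(c)
    return h

rng = np.random.default_rng(0)
d, hidden, N = 6, 10, 7
params = {g: (rng.normal(scale=0.3, size=(hidden, d)),
              rng.normal(scale=0.3, size=(hidden, hidden))) for g in "ifao"}
word_embeddings = rng.normal(size=(N, d))
m = lstm_encode(word_embeddings, params)
print(m.shape)  # (10,)
```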


Aligning Book w/ Movie

Let $q$ be a movie clip vector and $v = W_I q$ its embedding

Scoring function: $s(m, v) = m \cdot v$

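Optimize the pairwise ranking loss:

$$\min_{\theta} \sum_{m} \sum_{k} \max\{0, \alpha - s(m, v) + s(m, v_k)\} + \sum_{v} \sum_{k} \max\{0, \alpha - s(v, m) + s(v, m_k)\}$$

where $\theta$ are the model parameters, $\alpha$ is the margin, $m_k$ is a contrastive (non-descriptive) sentence vector for the clip embedding $v$, and $v_k$ is a contrastive clip vector for the sentence vector $m$.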

Model trained with Stochastic Gradient Descent w/o momentum
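A sketch of this bidirectional hinge loss over a mini-batch: row b of `M` (sentence vectors) describes row b of `V` (clip embeddings), and every other row in the batch serves as a contrastive example. The margin value of 0.2 and the use of in-batch contrastive examples are assumptions of this sketch, not details from the slides.

```python
import numpy as np

def pairwise_ranking_loss(M, V, margin=0.2):
    """Hinge-based ranking loss in both directions over a batch of matched pairs."""
    S = M @ V.T                                              # S[a, b] = s(m_a, v_b) = m_a . v_b
    pos = np.diag(S)                                         # scores of the true pairs
    cost_clip = np.maximum(0.0, margin - pos[:, None] + S)   # sentence m_a vs. contrastive clips
    cost_sent = np.maximum(0.0, margin - pos[None, :] + S)   # clip v_b vs. contrastive sentences
    off_diag = ~np.eye(len(S), dtype=bool)                   # the true pair is not its own contrast
    return cost_clip[off_diag].sum() + cost_sent[off_diag].sum()

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 8))
V = M + 0.1 * rng.normal(size=(4, 8))   # clip embeddings close to their descriptions
print(pairwise_ranking_loss(M, V))
```

The loss is zero once every true pair outscores every contrastive pair by at least the margin.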


Context Aware Similarity

Local-level ambiguity:

Despite being a dark novel, Gone Girl has 15 instances of “I love you”. A match cannot be judged in isolation from its surrounding context.

To compute dialogue similarity:

● BLEU: to find near-identical sentences
● Tf-idf (term frequency–inverse document frequency): to find near-duplicates while down-weighting frequent words
● Skip-thought: to find semantically similar, paraphrased sentences
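As an illustration of the tf-idf measure, here is a small pure-Python sketch that treats each sentence as a document, uses idf = log(n_docs / df), and compares sentences by cosine similarity. This is one common tf-idf variant chosen for illustration; the exact weighting used in the paper is not specified on the slide, and the example sentences are made up.

```python
import math
from collections import Counter

def tfidf_vectors(sentences):
    """Sparse tf-idf vectors (dicts), treating each sentence as a document.

    idf = log(n_docs / df): words that occur in many sentences get low weight.
    """
    docs = [s.lower().split() for s in sentences]
    df = Counter(word for doc in docs for word in set(doc))
    n = len(docs)
    return [{w: tf * math.log(n / df[w]) for w, tf in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

sentences = ["i love you", "i love you too", "the door was open", "you left the door open"]
vecs = tfidf_vectors(sentences)
print(round(cosine(vecs[0], vecs[1]), 3))
print(cosine(vecs[0], vecs[2]))  # 0.0 (no shared words)
```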


Context Aware Similarity

To obtain the similarity, we take into account:

● Individual similarity measures
● A fixed context window, in both the movie and the book

Stack a set of M similarity measures into a tensor S(i, j, m), where

i: indices of sentences in the subtitle

j: indices of sentences in the book

m: individual similarity measures


Context Aware Similarity

M = 9 similarity measures used:

● Visual and sentence embedding
● BLEU1-5
● Tf-idf
● A uniform prior

To predict a combined score(i, j) = f(S(I, J, M)) at each location (i, j) from all similarity measures within the context windows I and J around i and j:

3-layer Convolutional Neural Network, with ReLU nonlinearity and dropout.

Cross-entropy loss optimized during training with the Adam algorithm.
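A NumPy sketch of this step: stack M per-pair similarity matrices into S(i, j, m) and run a 3-layer convolutional network over it, so that each output score sees a local context window of subtitle and book sentences (three 3 x 3 layers give a 7 x 7 neighbourhood). Kernel size 3, the channel widths, the sigmoid output, and the random inputs are all assumptions; dropout is omitted because it only applies during training.

```python
import numpy as np

def conv2d_same(x, w, b):
    """Naive 'same'-padded 2D convolution (cross-correlation).

    x: (H, W, C_in); w: (k, k, C_in, C_out) with odd k; b: (C_out,).
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))   # zero padding
    H, W = x.shape[:2]
    out = np.empty((H, W, w.shape[3]))
    for r in range(H):
        for c in range(W):
            window = xp[r:r + k, c:c + k, :]            # k x k context around (r, c)
            out[r, c] = np.tensordot(window, w, axes=3) + b
    return out

def context_aware_scores(S, layers):
    """Map the similarity tensor S(i, j, m) to one match probability per (i, j)."""
    h = S
    for idx, (w, b) in enumerate(layers):
        h = conv2d_same(h, w, b)
        if idx < len(layers) - 1:
            h = np.maximum(h, 0.0)                      # ReLU between layers
    return 1.0 / (1.0 + np.exp(-h[..., 0]))             # sigmoid, for a cross-entropy loss

rng = np.random.default_rng(0)
n_sub, n_book, M = 12, 20, 9
# Random stand-ins for the M per-pair similarity matrices (embeddings, BLEU, tf-idf, ...).
S = np.stack([rng.random((n_sub, n_book)) for _ in range(M)], axis=-1)
widths = [M, 16, 16, 1]                                  # hypothetical channel counts
layers = [(rng.normal(scale=0.1, size=(3, 3, c_in, c_out)), np.zeros(c_out))
          for c_in, c_out in zip(widths[:-1], widths[1:])]
scores = context_aware_scores(S, layers)
print(S.shape, scores.shape)  # (12, 20, 9) (12, 20)
```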


Context Aware Similarity


Global Movie/Book Alignment

Most scenes follow a timeline.

Dynamic time warping is not suitable, since the storyline can have crossings in time.
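A minimal dynamic time warping sketch shows why: every step of a DTW path moves forward in the movie, in the book, or in both, so the alignment is forced to be monotonic. The toy cost matrix below is hypothetical and encodes a movie that presents book parts in the order 0, 2, 1; DTW cannot match both crossed parts.

```python
import numpy as np

def dtw_path(cost):
    """Dynamic time warping over cost[i, j] (movie unit i vs. book unit j).

    Returns a minimum-cost path from (0, 0) to (n-1, m-1) in which both
    indices never decrease, i.e. a monotonic alignment.
    """
    n, m = cost.shape
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack: every step moves diagonally, up, or left.
    path, i, j = [], n, m
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# Movie shows book parts in the order 0, 2, 1 (a crossing in the storyline).
cost = np.ones((3, 3))
cost[0, 0] = cost[1, 2] = cost[2, 1] = 0.0
print(dtw_path(cost))  # [(0, 0), (1, 1), (2, 2)]
```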


Global Movie/Book Alignment

Movie/book alignment is modelled as inference in a Conditional Random Field

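● Each node y_i: alignment (shot with its subtitle, sentence in the book)
● State space: set of all sentences in the book
● CRF energy:

$$E(y_1, \dots, y_K) = \sum_{i=1}^{K} \phi_u(y_i) + \sum_{i=1}^{K} \sum_{j \in N(i)} \psi_p(y_i, y_j)$$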

K: number of nodes (shots)
N(i): the left and right neighbors of y_i
φ_u(·): unary potential, the output of the CNN
ψ_p(·): pairwise potentials, measured by:

d_s(y_i, y_j): time span between two neighbouring sentences in the subtitles
d_b(y_i, y_j): distance between their states in the book
σ²: a robustness parameter that avoids punishing giant leaps too harshly

The pairwise potential ensures state consistency while accounting for long silences in the movie.
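Because each node's neighbours are only the previous and next shots, the graph is a chain, and one way to find the minimum-energy labelling exactly is min-sum dynamic programming (Viterbi). The sketch below assumes unary energies are already given (for example, -log of the CNN match probability), counts each neighbouring pair once, and uses a hypothetical robust pairwise cost (d_s - d_b)² / ((d_s - d_b)² + σ²) that saturates for large leaps. This is an illustrative stand-in, not the paper's exact potential or inference method; d_s and d_b are assumed to be in comparable units.

```python
import numpy as np

def robust_pairwise(d_s, d_b, sigma2=25.0):
    """HYPOTHETICAL pairwise energy: penalize disagreement between the subtitle
    time span d_s and the book distance d_b, saturating below 1."""
    diff2 = (d_s - d_b) ** 2
    return diff2 / (diff2 + sigma2)

def chain_crf_map(unary, d_s, sigma2=25.0):
    """Exact minimum-energy labelling of a chain CRF by min-sum dynamic programming.

    unary: (K, S) energy of assigning shot i to book sentence s.
    d_s: (K-1,) subtitle time spans between consecutive shots.
    """
    K, S = unary.shape
    pos = np.arange(S)
    d_b = pos[None, :] - pos[:, None]          # d_b[prev, cur]: signed book distance
    best = unary[0].copy()                     # best energy of a labelling ending in each state
    back = np.zeros((K, S), dtype=int)
    for i in range(1, K):
        total = best[:, None] + robust_pairwise(d_s[i - 1], d_b, sigma2)
        back[i] = np.argmin(total, axis=0)
        best = total.min(axis=0) + unary[i]
    labels = [int(np.argmin(best))]
    for i in range(K - 1, 0, -1):
        labels.append(int(back[i, labels[-1]]))
    return labels[::-1]

K, S = 4, 8
unary = np.full((K, S), 5.0)
for i in range(K):
    unary[i, 2 * i] = 0.0                      # shot i matches sentence 2i
unary[2, 7] = -0.3                             # a tempting outlier far ahead in the book
print(chain_crf_map(unary, d_s=np.array([2.0, 2.0, 2.0])))  # [0, 2, 4, 6]
```

The pairwise term outweighs the outlier's small unary advantage, so the consistent timeline wins.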


Evaluation

Model: CNN + CRF
Dataset: 11 books/movies
Training data: 1 book/movie (Gone Girl)
Test data: the remaining 10 movies

A recalled paragraph/shot is counted as correct (matching the ground truth) if:

● The paragraph is at most 3 paragraphs away
● The shot is at most 5 subtitles away

Average precision (AP) is reported at multiple alignment thresholds.
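The tolerance rule and an average-precision computation can be sketched as follows. The AP here is the common non-interpolated definition (mean precision at the rank of each correct prediction, divided by the number of ground-truth correspondences); the exact AP protocol used in the paper may differ, and the predictions below are made up.

```python
def is_correct(pred_par, gt_par, pred_sub, gt_sub, max_par_dist=3, max_sub_dist=5):
    """Correct if at most 3 paragraphs and at most 5 subtitles from the ground truth."""
    return abs(pred_par - gt_par) <= max_par_dist and abs(pred_sub - gt_sub) <= max_sub_dist

def average_precision(correct_sorted, n_ground_truth):
    """Non-interpolated AP for predictions sorted by decreasing confidence."""
    hits, total = 0, 0.0
    for rank, ok in enumerate(correct_sorted, start=1):
        if ok:
            hits += 1
            total += hits / rank               # precision at this correct prediction
    return total / n_ground_truth if n_ground_truth else 0.0

# Hypothetical predictions sorted by confidence: (pred_par, gt_par, pred_sub, gt_sub)
preds = [(10, 11, 40, 42), (3, 9, 5, 5), (20, 18, 70, 66), (7, 7, 12, 12)]
flags = [is_correct(*p) for p in preds]
print(flags)                                   # [True, False, True, True]
print(average_precision(flags, n_ground_truth=4))
```

Reporting AP "at multiple alignment thresholds" would correspond to re-running this with different `max_par_dist` / `max_sub_dist` values.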


Evaluation: Movie/Book Alignment


Evaluation: Describing Movie via Book


Evaluation: Describing Movie via Book


Evaluation: Book Retrieval


Evaluation: Describe Shot via other Books


Evaluation: MSCOCO Image captioning


Thank you.