Compositional Captioning - University of...

Compositional Captioning: Describing Novel Object Categories without Paired Training Data
MLSLP 2016
Lisa Anne Hendricks (1), Subhashini Venugopalan (2), Marcus Rohrbach (1), Raymond Mooney (2), Kate Saenko (3), Trevor Darrell (1)
(1) University of California, Berkeley; (2) University of Texas at Austin; (3) Boston University

Transcript of Compositional Captioning - University of...

Page 1: Compositional Captioning - University of Chicagottic.uchicago.edu/~klivescu/MLSLP2016/hendricks_MLSLP2016.pdfCompositional Captioning: Describing Novel Object Categories without Paired

Compositional Captioning: Describing Novel Object Categories without Paired Training Data

MLSLP 2016

Lisa Anne Hendricks (1), Subhashini Venugopalan (2), Marcus Rohrbach (1), Raymond Mooney (2), Kate Saenko (3), Trevor Darrell (1)

(1) University of California, Berkeley; (2) University of Texas at Austin; (3) Boston University

Page 2:

Visual Description

Berkeley LRCN: A brown bear standing on top of a lush green field.

MS CaptionBot: A large brown bear walking through a forest.

LRCN: Donahue, Jeff et al. CVPR 2015. Microsoft CaptionBot: http://captionbot.ai/

Page 3:

A brown bear walking across a lush green field.

A large brown bear walking through a forest.

A brown bear sitting on top of a green field.

A brown bear walks in the grass in front of trees.

A brown bear walking on a grassy field next to trees.

A large brown bear walking across a lush green field.

Page 4:

Problems with Visual Description

Berkeley LRCN: "A black bear is standing in the grass."

MS CaptionBot: "A bear that is eating some grass."

Ours: "A anteater is standing in the grass."

LRCN: Donahue, Jeff et al. CVPR 2015. CaptionBot: http://captionbot.ai/

Page 5:

We present the Deep Compositional Captioner (DCC), which can compose descriptions about novel objects in context.

Existing Methods

Paired Image-Sentence Data: A green and white bus driving down the street. A brown table with lots of bottles on it.

Deep Compositional Captioner

Unpaired Image Data: bottle, otter, toad, bus

Unpaired Text Data: A bus is a road vehicle designed to carry many passengers. Otters live in a variety of aquatic environments.

Page 6:

DCC Key Insights

1. Effectively train with outside data

Learn image features with unpaired image data

Learn language features with unpaired text data

2. Transfer knowledge between related concepts

giraffe / impala

dress / tutu

cake / scone

[Diagram labels: Previous Word, $f_I$, $f_L$, Multimodal Unit, Predicted Word]

Page 7:

Lexical Classifier

Training Data: Unpaired Image Data

Network: VGG + multilabel loss (sigmoid cross entropy)

Feature: Vector with activations corresponding to scores for visual concepts in an image.

Example scores: Impala: 0.86, Sunny: 0.72, …, Bus: 0.04

[Diagram labels: CNN, Classification Layer, $f_I$]
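The multilabel loss named on this slide can be sketched in a few lines. This is only a minimal illustration of sigmoid cross entropy over independent concept labels, not the authors' training code: the function names are assumptions, and the logits are hypothetical values chosen so that the resulting scores round to the example scores shown on the slide.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_cross_entropy(logits, labels):
    """Multilabel loss: one independent binary cross-entropy term per concept.

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    return sum(
        max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        for z, y in zip(logits, labels)
    )

# Hypothetical logits for three visual concepts (not taken from the slides).
concepts = ["impala", "sunny", "bus"]
logits = [1.8, 0.95, -3.2]
labels = [1, 1, 0]  # concept assumed present (1) or absent (0) in the image

# f_I: the feature vector of per-concept scores.
f_I = [sigmoid(z) for z in logits]
scores = {c: round(s, 2) for c, s in zip(concepts, f_I)}
loss = sigmoid_cross_entropy(logits, labels)
print(scores, round(loss, 3))
```

Because each concept gets its own sigmoid, several concepts can score highly for the same image, which is what lets the feature vector describe an image that is both "impala" and "sunny".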

Page 8:

Language Model

Training Data: Unpaired Text Data

Network: Embed layer + LSTM unit. Model trained to predict a word, $w_t$, given the previous words in a sentence, $w_{0:t-1}$.

Feature: Vector which encodes previous words in the sentence.

[Diagram labels: Previous Word, Embed, LSTM, $f_L$, $W_L$, Predicted Word]

Page 9:

Language Model [Diagram labels: Previous Word, Embed, LSTM, $f_L$, $W_L$, Predicted Word]

Caption Model [Diagram labels: Previous Word, $f_I$, $f_L$, $W_I$, $W_L$, Multimodal Unit, Predicted Word]

Lexical Classifier [Diagram labels: CNN, Classification Layer, $f_I$]

Legend:

Trained with unpaired image data

Trained with paired image-sentence data

Trained with unpaired text data

Page 10:

Multimodal Unit

[Diagram labels: Language Feature $f_L$, Image Feature $f_I$, $W_L$, $W_I$, Multimodal Unit, Predicted Word; previous words "A brown"]

$S(w_t \mid I, w_{0:t-1}) = f_L W_L + f_I W_I + b$

$f_L W_L$ large for: Giraffe, Horse, Couch, …, Standing

$f_I W_I$ large for: Giraffe, Trees, Standing, …, Couch
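The scoring rule of the multimodal unit, a language term plus an image term plus a bias, can be made concrete with a toy example. This is a sketch under stated assumptions: the three-word vocabulary, all weights and feature values are hypothetical, and the softmax used to turn scores into probabilities is an assumption that the slide does not show.

```python
import math

def vec_mat(f, W):
    """Row vector f (length d) times matrix W (d x V) -> list of length V."""
    return [sum(f[i] * W[i][v] for i in range(len(f))) for v in range(len(W[0]))]

def multimodal_scores(f_L, W_L, f_I, W_I, b):
    """S = f_L W_L + f_I W_I + b, one score per vocabulary word."""
    lang = vec_mat(f_L, W_L)
    img = vec_mat(f_I, W_I)
    return [l + i + bias for l, i, bias in zip(lang, img, b)]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy setup (not from the slides).
vocab = ["giraffe", "couch", "standing"]
f_L = [0.5, -0.2]                            # language feature
W_L = [[1.0, 0.8, -0.5], [0.3, -0.4, 0.9]]   # 2 x 3
f_I = [0.9, 0.05, 0.6]                       # image feature (concept scores)
W_I = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]  # 3 x 3
b = [0.0, 0.0, 0.0]

lang_only = vec_mat(f_L, W_L)
scores = multimodal_scores(f_L, W_L, f_I, W_I, b)
probs = softmax(scores)
best = vocab[max(range(len(vocab)), key=lambda v: scores[v])]
print(best, [round(p, 3) for p in probs])
```

In this toy setup the language term alone slightly prefers "couch" after "A brown", while the image term pushes the combined score toward "giraffe", mirroring the two ranked lists on the slide.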


Page 12:

Weight Transfer

Transfer pair chosen using word2vec
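Choosing a transfer pair from word embeddings can be illustrated as a nearest-neighbour search. The slides only say that word2vec is used; the cosine-similarity criterion, the restriction to a list of known candidate words, and the tiny three-dimensional vectors below are all assumptions made for this sketch (real word2vec vectors have hundreds of dimensions).

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def closest_known_word(novel, known_words, embeddings):
    """Return the known word whose embedding is most similar to the novel word."""
    return max(known_words, key=lambda w: cosine(embeddings[novel], embeddings[w]))

# Hypothetical toy embeddings standing in for word2vec vectors.
embeddings = {
    "giraffe": [0.9, 0.1, 0.0],
    "impala":  [0.8, 0.2, 0.1],
    "couch":   [0.0, 0.1, 0.9],
    "bus":     [0.1, 0.9, 0.2],
}

pair = closest_known_word("impala", ["giraffe", "couch", "bus"], embeddings)
print(pair)
```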

Page 13:

Weight Transfer

Transfer pair chosen using word2vec: giraffe, impala

Multimodal Unit with inputs $f_L$ and $f_I$:

$S(w_t = \text{impala} \mid I, w_{0:t-1}) = f_L W_L[:, v_i] + f_I W_I[:, v_i] + b_i$

$S(w_t = \text{giraffe} \mid I, w_{0:t-1}) = f_L W_L[:, v_g] + f_I W_I[:, v_g] + b_g$

[Diagram: columns $W_L[:, v]$ and $W_I[:, v]$ for the transfer pair giraffe / impala; two entries are marked 0.]
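The weight transfer for this pair can be sketched as copying one word's output weights to another. Several things here are assumptions rather than statements from the slides: that giraffe is the word seen in paired data and impala the novel word, that the two 0 entries on the slide correspond to zeroing the cross terms between the two concepts in $W_I$, and all function names and numbers.

```python
def transfer_weights(W_L, W_I, b, known, novel, known_row, novel_row):
    """Copy the output weights of a known word to a novel word, in place.

    known / novel are vocabulary (column) indices. known_row / novel_row are
    the rows of W_I fed by the lexical classifier scores for the two concepts.
    """
    for row in W_L:
        row[novel] = row[known]
    for row in W_I:
        row[novel] = row[known]
    b[novel] = b[known]
    # Assumed reading of the two 0 entries on the slide: the novel word should
    # respond to its own concept score, not to the known concept's score, and
    # the known word should not respond to the novel concept's score.
    W_I[novel_row][novel] = W_I[known_row][known]
    W_I[known_row][novel] = 0.0
    W_I[novel_row][known] = 0.0

# Hypothetical two-word vocabulary and weights.
GIRAFFE, IMPALA = 0, 1
W_L = [[0.7, 0.0], [-0.3, 0.0]]   # language weights, impala column untrained
W_I = [[2.0, 0.0], [0.1, 0.0]]    # rows: giraffe score, impala score
b = [0.5, 0.0]

transfer_weights(W_L, W_I, b, GIRAFFE, IMPALA, known_row=GIRAFFE, novel_row=IMPALA)
print(W_L, W_I, b)
```

After the transfer, the language term treats "impala" exactly like "giraffe", so the novel word inherits the known word's sentence contexts, while the image term decides which of the two is actually in the picture.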

Page 14:

Evaluation

MSCOCO Paired Image-Sentence Data

MSCOCO Unpaired Image Data

MSCOCO Unpaired Text Data

"An elephant galloping in the green grass"
"Two people playing ball in a field"
"A black train stopped on the tracks"
"Someone is about to eat some pizza"

Elephant, Galloping, Green, Grass
People, Playing, Ball, Field
Black, Train, Tracks
Eat, Pizza

"An elephant galloping in the green grass"
"Two people playing ball in a field"
"A black train stopped on the tracks"
"Someone is about to eat some pizza"
"A microwave is sitting on top of a kitchen counter"
"A kitchen counter with a microwave on it"

Kitchen, Microwave

Page 15:

Evaluation

MSCOCO Paired Image-Sentence Data

MSCOCO Unpaired Image Data

MSCOCO Unpaired Text Data

"An elephant galloping in the green grass"
"Two people playing ball in a field"
"A black train stopped on the tracks"
"Someone is about to eat some pizza"

Elephant, Galloping, Green, Grass
People, Playing, Ball, Field
Black, Train, Tracks
Pizza

"An elephant galloping in the green grass"
"Two people playing ball in a field"
"A black train stopped on the tracks"
"Someone is about to eat some pizza"
"A microwave is sitting on top of a kitchen counter"
"A kitchen counter with a microwave on it"

Microwave

Held-out dataset


Page 22:

Results: MSCOCO In-Domain

                            LRCN    DCC (No Transfer)   DCC (Ours)
Efficacy (F1)               0.00    0.00                39.78
Sentence Quality (METEOR)   19.33   19.90               21.00

Comparison of DCC to LRCN and DCC with no transfer.
- High F1 score indicates DCC can describe words outside of paired image-sentence data
- Increased METEOR indicates DCC produces better sentences
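The slides report F1 without defining it. One plausible reading, sketched here as an assumption, is a per-word F1 over images: a caption counts as a positive prediction if it mentions the held-out word, and an image counts as a true positive if the object is really present. The captions, labels and function names below are hypothetical.

```python
import re

def mentions(caption, word):
    """True if the caption contains the word as a whole token."""
    return word in re.findall(r"[a-z]+", caption.lower())

def novel_word_f1(captions, has_object, word):
    """F1 (in percent) for mentioning `word` exactly when the object is present."""
    tp = fp = fn = 0
    for caption, present in zip(captions, has_object):
        said = mentions(caption, word)
        if said and present:
            tp += 1
        elif said and not present:
            fp += 1
        elif present and not said:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)

# Hypothetical generated captions and ground-truth presence of a bus.
captions = [
    "A bus parked on the side of the street.",
    "A dog lying on a couch.",
    "A green and white street sign.",
    "A bus driving down the road.",
]
has_bus = [True, False, True, False]

f1_with_word = novel_word_f1(captions, has_bus, "bus")
# A model that never produces the held-out word gets F1 = 0.
f1_without_word = novel_word_f1(["A street sign."] * 4, has_bus, "bus")
print(f1_with_word, f1_without_word)
```

Under this reading, the 0.00 entries in the table are what one would expect from a model that never emits a held-out word at all.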

Page 23:

Empirical Evaluation

MSCOCO Paired Image-Sentence Data

MSCOCO Unpaired Image Data

MSCOCO Unpaired Text Data

"An elephant galloping in the green grass"
"Two people playing ball in a field"
"A black train stopped on the tracks"
"Someone is about to eat some pizza"

Elephant, Galloping, Green, Grass
People, Playing, Ball, Field
Black, Train, Tracks

"An elephant galloping in the green grass"
"Two people playing ball in a field"
"A black train stopped on the tracks"
"A kitchen counter with a microwave on it"

Out-of-Domain Held Out Dataset

Pizza

"Pepperoni is a popular pizza topping."

"All microwaves use a timer for the cooking time"

Microwave


Page 26:

Results: MSCOCO Out-of-Domain

Model               Unpaired Image Data   Unpaired Text Data   METEOR   F1
LRCN                N/A                   N/A                  19.33    0.00
DCC (No Transfer)   MSCOCO                MSCOCO               19.90    0.00
DCC (Ours)          MSCOCO                MSCOCO               21.00    39.78
DCC (Ours)          ImageNet              MSCOCO               20.71    33.60
DCC (Ours)          ImageNet              Caption Txt          20.66    35.53
DCC (Ours)          ImageNet              Web Corpus           20.66    34.94

DCC performs well when using out-of-domain data to train the lexical classifier and language model.

Page 27:

No transfer: A green and white street sign on a city street.
DCC: A green and white bus parked on the side of the street.

No transfer: A dog lying on a bed with a large brown dog.
DCC: A dog lying on a couch with a large window in the background.

No transfer: Two giraffes are eating grass in the field.
DCC: Two zebra grazing in a green grass field.

No transfer: A white and black cat is sitting on a toilet.
DCC: A white microwave sitting on a brick wall.

Page 28:

DCC can describe over 300 ImageNet visual concepts in diverse contexts.

DCC: A person is holding a gecko in their hand.
Berkeley LRCN: A person holding a piece of food in their hand.
MS CaptionBot: A close up of a person holding a baby.

DCC: A gecko is standing on a branch of a tree.
Berkeley LRCN: A bird is standing on the edge of a rock.
MS CaptionBot: A bird that is standing in the water.

Page 29:

DCC can describe over 300 ImageNet visual concepts in diverse contexts.

A woman in a chiffon tutu.

A white centrifuge is sitting on the table.

A bunch of a lychee are in a market.

A group of people standing around a baobab in a field.

A brown bobcat in a green field.

A close up of a wooden table with a bottle of whisky.

A close up of a scone on a plate.

A black and white photo of a candelabra in a room.

Page 30:

Failure Cases

A woman is riding a unicycle on a unicycle.

A group of people standing around a fox hunting on a field.

Page 31:

Results: Video Description

                                 METEOR   F1
Baseline (No Transfer)           28.80    0.0
+ DCC (ours)                     28.9     6.0
+ ILSVRC Videos (No Transfer)    29.0     0.0
+ DCC (ours) + ILSVRC Videos     29.10    22.2

Page 32:

Novel Object Captioner

"Captioning Images with Diverse Objects", Venugopalan 2016, http://arxiv.org/abs/1606.07770

Page 33:

DCC Issue: Not End-to-End Trainable

Language Model [Diagram labels: Previous Word, Embed, LSTM, $f_L$, $W_L$, Predicted Word]

Caption Model [Diagram labels: Previous Word, $f_I$, $f_L$, $W_I$, $W_L$, Multimodal Unit, Predicted Word]

Lexical Classifier [Diagram labels: CNN, Classification Layer, $f_I$]

Page 34:

NOC Solution: Joint Objective Loss

Image-Specific Loss, Image-Text Loss, Text-Specific Loss

[Diagram: networks built from CNN, Embed and LSTM blocks with Previous Word inputs and Predicted Word outputs, one per loss, combined under a Joint Objective Loss]
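The slide names three loss terms and a joint objective. One way to write the combination, assuming the joint objective is an unweighted sum (the slides do not give weights) and using symbol names chosen here for illustration:

```latex
\mathcal{L}_{\text{joint}}
  = \mathcal{L}_{\text{image}}
  + \mathcal{L}_{\text{image-text}}
  + \mathcal{L}_{\text{text}}
```

Here $\mathcal{L}_{\text{image}}$ stands for the image-specific loss, $\mathcal{L}_{\text{image-text}}$ for the loss on paired image-sentence data, and $\mathcal{L}_{\text{text}}$ for the text-specific loss.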

Page 35:

DCC Issue: Transfer Mechanism

A man is playing racket on a racket.

Page 36:

NOC Solution: Semantic Embedding

[Diagram: a language model drawn as Previous Word, Embed, LSTM, Embed, Predicted Word, and a second version with $W_{glove}$ and $W_{glove}^T$ in place of the Embed layers]

Page 37:

Training

Image-Specific Loss, Image-Text Loss, Text-Specific Loss

[Diagram: networks built from CNN, Embed and LSTM blocks with Previous Word inputs and Predicted Word outputs, one per loss, combined under a Joint Objective Loss]

Page 38:

F1 Scores for NOC and DCC

       Bottle   Bus     Couch   Microwave   Pizza   Racket   Suitcase   Zebra   Average
DCC    4.63     29.79   45.87   28.09       64.59   52.24    13.16      79.88   39.78
NOC    19.02    69.34   33.25   26.46       69.16   62.45    34.65      89.78   50.51
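The Average column can be checked directly from the per-category scores. The short script below uses only the numbers in the table; the variable names are illustrative.

```python
categories = ["Bottle", "Bus", "Couch", "Microwave",
              "Pizza", "Racket", "Suitcase", "Zebra"]
f1 = {
    "DCC": [4.63, 29.79, 45.87, 28.09, 64.59, 52.24, 13.16, 79.88],
    "NOC": [19.02, 69.34, 33.25, 26.46, 69.16, 62.45, 34.65, 89.78],
}

# Mean over the eight held-out categories, rounded as in the table.
averages = {model: round(sum(v) / len(v), 2) for model, v in f1.items()}

# Categories where NOC scores lower than DCC.
noc_lower = [c for c, d, n in zip(categories, f1["DCC"], f1["NOC"]) if n < d]

print(averages, noc_lower)
```

The averages reproduce the table's 39.78 and 50.51, and the comparison shows that NOC's gain is not uniform: it scores below DCC on Couch and Microwave.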

Page 39:

Ablation: Auxiliary Objective

Columns: Contributing Factor, Glove, LM Pretrain, Image Pretrain, Auxiliary Objective, Meteor, F1

Pretraining & Glove: X X X; Meteor 19.80; F1 25.38
Fix Image Model: X X Fixed; Meteor 18.91; F1 39.70
All: X X X X; Meteor 20.69; F1 50.51

Page 40:

Ablation: Glove Embedding

Columns: Contributing Factor, Glove, LM Pretrain, Image Pretrain, Auxiliary Objective, Meteor, F1

Auxiliary Objective: X X; Meteor 15.78; F1 14.41
Glove: X X X; Meteor 19.69; F1 47.02
All: X X X X; Meteor 20.69; F1 50.51

Page 41:

Training with Outside Data

Image Data   Text Data    Meteor   F1
MSCOCO       MSCOCO       20.69    50.51
MSCOCO       Web Corpus   19.15    41.74
ImageNet     Web Corpus   17.55    36.50

Page 42:

Describing ImageNet

A otter is sitting on a rock in the sun.

A large flounder is resting on a rock.

A table with a plate of sashimi and vegetables.

A large glacier with a mountain in the background.

A man is standing on a beach holding a snapper.

A group of people standing around a large white warship.

Page 43:

Errors

A chainsaw is sitting on a chainsaw near a chainsaw.

A volcano view of a volcano in the sun.

Page 44:

Our Team:

Lisa Anne Hendricks
Subhashini Venugopalan
Marcus Rohrbach
Raymond Mooney
Kate Saenko
Trevor Darrell

Existing Methods

Compositional Captioner

A anteater is standing in the grass.

LRCN: A black bear is standing in the grass.
CaptionBot: A bear that is eating some grass.

Paired Image-Sentence Data, Unpaired Image Data, Unpaired Text Data
