CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project...
Transcript of CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project...
![Page 1: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/1.jpg)
Synthesizing images with generative adversarial networks (GANs)
CS5670: Computer VisionGuest Lecture - Jin Sun
Slides from Philipp IsolaWhich face is real?
![Page 2: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/2.jpg)
Announcements
• Project 5 due Friday, 5/10 at 11:59pm
• Course evals (you will receive a few bonus points!)
– https://apps.engineering.cornell.edu/CourseEval
• Final exam in class next Monday, 5/6
– Please arrange yourselves with at least one space between you and the closest person in the same row when you arrive
![Page 3: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/3.jpg)
Motivation: Synthesizing images
![Page 4: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/4.jpg)
Motivation: Synthesizing images
![Page 6: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/6.jpg)
Why Synthesizing Images
“What I cannot create, I do not understand.”
—Richard Feynman
Computer vision is all about understanding the world from images
a busy downtown area with a lot of traffic and buildings.this is a picture of a busy downtown cross walk with several cars in the flow of traffic.cars passing on a street in the city....
Understand
Synthesize
![Page 7: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/7.jpg)
“Fish”
label Y
Learner
image X
Image classification
![Page 8: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/8.jpg)
Learner
image X
“Fish”
label Y
Image classification
![Page 9: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/9.jpg)
Learner
image X
“Fish”
label Y
Image classification
![Page 10: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/10.jpg)
…
Learner
image X
“Fish”
label Y
Image classification
![Page 11: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/11.jpg)
Image prediction (“structured prediction”)
Object labeling
[Long et al. 2015, …]
Edge Detection
[Xie et al. 2015, …]
Style transfer
[Gatys et al. 2016, …][Reed et al. 2014, …]
Text-to-photo
“this small bird has a pink
breast and crown…”
![Page 12: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/12.jpg)
Challenges
1. Output is high-dimensional and structured
2. Uncertainty in mapping; many plausible outputs
3. Lack of supervised training data
“this small bird has a pink
breast and crown…”
![Page 13: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/13.jpg)
A standard CNN for classification
Need new network modules to generate same-sized output!
![Page 14: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/14.jpg)
Deconvolution
Compare to convolution with different params
![Page 15: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/15.jpg)
Deconvolution
Also known as: transpose conv, upconv, ...
![Page 16: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/16.jpg)
UnetA popular network structure to generate same-sized output
![Page 17: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/17.jpg)
Image Colorization
![Page 18: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/18.jpg)
“What should I do” “How should I do it?”
![Page 19: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/19.jpg)
Color information: ab channelsGrayscale image: L channel
Objective function
(loss)
Neural Network
Training data
…
![Page 20: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/20.jpg)
“yellow”
…
![Page 21: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/21.jpg)
…
“black”
![Page 22: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/22.jpg)
…
![Page 23: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/23.jpg)
![Page 24: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/24.jpg)
Input Output Ground truth
Designing loss functions
![Page 25: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/25.jpg)
![Page 26: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/26.jpg)
Color distribution cross-entropy loss with colorfulness enhancing term.
Zhang et al. 2016
[Zhang, Isola, Efros, ECCV 2016]
Designing loss functions
Input Ground truth
![Page 27: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/27.jpg)
![Page 28: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/28.jpg)
Image colorization
Designing loss functions
L2 regression
Super-resolution
[Johnson, Alahi, Li, ECCV 2016]
L2 regression
[Zhang, Isola, Efros, ECCV 2016]
![Page 29: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/29.jpg)
Image colorization
Designing loss functions
Cross entropy objective,
with colorfulness term
Deep feature covariance
matching objective
[Johnson, Alahi, Li, ECCV 2016]
Super-resolution
[Zhang, Isola, Efros, ECCV 2016]
![Page 30: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/30.jpg)
Universal loss?
… …
![Page 31: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/31.jpg)
…
Generated
vs Real(classifier)
[Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-
Farley, Ozair, Courville, Bengio 2014]
“Generative Adversarial Network”
(GANs)
Real photos
Generated images
…
…
![Page 32: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/32.jpg)
Conditional GANs
[Goodfellow et al., 2014]
[Isola et al., 2017]
![Page 33: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/33.jpg)
Generator
[Goodfellow et al., 2014]
![Page 34: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/34.jpg)
G tries to synthesize fake images that fool D
D tries to identify the fakes
Generator Discriminator
real or fake?
[Goodfellow et al., 2014]
![Page 35: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/35.jpg)
fake (0.9)
real (0.1)
[Goodfellow et al., 2014]
![Page 36: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/36.jpg)
G tries to synthesize fake images that fool D:
real or fake?
[Goodfellow et al., 2014]
![Page 37: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/37.jpg)
G tries to synthesize fake images that fool the best D:
real or fake?
[Goodfellow et al., 2014]
![Page 38: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/38.jpg)
Loss Function
G’s perspective: D is a loss function.
Rather than being hand-designed, it is learned.
[Isola et al., 2017]
[Goodfellow et al., 2014]
![Page 39: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/39.jpg)
real or fake?
[Goodfellow et al., 2014]
![Page 40: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/40.jpg)
real!(“Aquarius”)
[Goodfellow et al., 2014]
![Page 41: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/41.jpg)
real or fake pair ?
[Goodfellow et al., 2014]
[Isola et al., 2017]
![Page 42: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/42.jpg)
real or fake pair ?
[Goodfellow et al., 2014]
[Isola et al., 2017]
![Page 43: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/43.jpg)
fake pair
[Goodfellow et al., 2014]
[Isola et al., 2017]
![Page 44: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/44.jpg)
real pair
[Goodfellow et al., 2014]
[Isola et al., 2017]
![Page 45: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/45.jpg)
real or fake pair ?
[Goodfellow et al., 2014]
[Isola et al., 2017]
![Page 46: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/46.jpg)
BW → Color
Input Output Input Output Input Output
Data from [Russakovsky et al. 2015]
![Page 48: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/48.jpg)
Labels → Street Views
Data from [Wang et al, 2018]
![Page 49: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/49.jpg)
Day → Night
Input Output Input Output Input Output
Data from [Laffont et al., 2014]
![Page 50: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/50.jpg)
Edges → Images
Input Output Input Output Input Output
Edges from [Xie & Tu, 2015]
![Page 51: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/51.jpg)
Ivy Tasi @ivymyt
Vitaly Vidmirov @vvid
![Page 52: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/52.jpg)
Image Inpainting
Data from [Pathak et al., 2016]
![Page 53: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/53.jpg)
Pose-guided Generation
Data from [Ma et al., 2018]
![Page 54: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/54.jpg)
Challenges —> Solutions
1. Output is high-dimensional, structured object
2. Uncertainty in mapping; many plausible outputs
3. Lack of supervised training data
“this small bird has a pink
breast and crown…”
—> Use a deep net, D, to analyze output!
—> D only cares about “plausibility”, doesn’t hedge
![Page 55: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/55.jpg)
Challenges —> Solutions
1. Output is high-dimensional, structured object
2. Uncertainty in mapping; many plausible outputs
3. Lack of supervised training data
“this small bird has a pink
breast and crown…”
—> Use a deep net, D, to analyze output!
—> D only cares about “plausibility”, doesn’t hedge
![Page 58: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/58.jpg)
![Page 60: CS5670: Computer Vision - Cornell University · Slides fromPhilipp Isola. Announcements •Project 5 due Friday, 5/10 at 11:59pm ... Real photos Generated images ... Day →Night](https://reader033.fdocuments.us/reader033/viewer/2022060604/6058bdfc289fd13ebb6d5991/html5/thumbnails/60.jpg)
Questions?