Contentack/teaching/1e8/introductory_lectures.pdf · Vision : The Human Visual System (HVS) Light...

1E8 Introduction to Electrical Engineering
Image and Video Processing
Or Electronics is not all about Circuits
Dr. Anil Kokaram
www.mee.tcd.ie/~ack -> Teaching
[email protected]

You have been learning about Resistors, Inductors, Capacitors and now electric circuit design in 1E7 [do NOT miss those labs]
But electronics is more than circuit design
This course in a way shows you that Engineering is more about problem solving than one particular discipline

Content

Introduction to Image and Video Processing
Human Visual Perception
Cleaning Dirty Pictures (Motion Picture Restoration)
Image and Video Compression
Digital Compositing in the Movies

What we’re really doing …

How does DVD/DTV work?
What the hell is MPEG and JPEG exactly?
What is Digital Cinema? Who cares?
What is digital compositing and how do they use it in the movies?
What do people doing research actually do? And what have they done lately anyway? Who cares?

The Digital Image

The basic idea is to represent the continuous colours in a real image with a set of numbers from a fixed range at a subset of locations
The smallest element of a digital picture is called a Pixel (Picture Element)
Typically, a television image is created with 576 lines of 720 pixels each
A film frame is represented with over 2048 lines of 2880 pixels each

The Digital Image

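[Figure: a small patch of a colour image shown as three 3 x 3 arrays of numbers, one per colour plane (the planes are labelled r, g, b in the figure)]

101 109 110
 99  90  94
112 123 108

123 131 141
121 112 118
134 145 132

 38  46  65
 75  66  86
 88  99 100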

The Digital Image

[Figure: image with axes h and k and origin 0; the pixel at site [h,k] has value I([h,k])]
I([50,50])=[40 70 200]
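
The idea of a picture as a grid of numbers can be sketched in a few lines of Python. This is only an illustration: the nested-list layout, the helper names `make_image` and `samples_in_image`, and the choice to treat the first index as the line and the second as the pixel within the line are all assumptions, not something the slides specify.

```python
# A colour image stored as lines of [r, g, b] triples, each value in 0..255.
# The layout and the helper names are illustrative assumptions.

def make_image(lines, pixels_per_line, colour=(0, 0, 0)):
    """Return an image as a list of lines, each a list of [r, g, b] pixels."""
    return [[list(colour) for _ in range(pixels_per_line)] for _ in range(lines)]

def samples_in_image(lines, pixels_per_line, planes=3):
    """Count the numbers needed to store one picture."""
    return lines * pixels_per_line * planes

image = make_image(lines=576, pixels_per_line=720)
image[50][50] = [40, 70, 200]          # I([50,50]) = [40 70 200]

r, g, b = image[50][50]                # the three numbers at one pixel site
tv_samples = samples_in_image(576, 720)        # television picture
film_samples = samples_in_image(2048, 2880)    # film frame
print(r, g, b, tv_samples, film_samples)
```

The film frame needs roughly fourteen times as many numbers as the television picture, which is one reason compression comes up so often later on.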

The Rise of Digital Visual Media

Began in early 1960’s with NASA
Rise of DIP (Digital Image Processing) coincides with availability of good picture reproduction/printing. [Why do DIP if you can’t see the results?]
Last 5 years has seen exponential increase in D visual devices. DV cameras, D cameras
DTV for last 4 years. Free SKY in Ireland
Digital Video (Versatile) Disc allows movies to be played on a CD-like device
Internet streaming video, Real Networks, PacketVideo (video for mobile phones), PDAs

Complex systems

Devices and media require increasingly complex system design
Image and Video compression one of the key technologies that enabled consumer devices like D Cameras
Designer needs to understand compromises to be made in handling visual media
A LOT of electronic circuit design is about making hardware for Digital Video Processing in Mobile Phones these days

Motivation: Impact of Digital Media

TV Services merging with Internet? DVD Replacing VHS. Digital Television works.
Watching TV on your PC
Games consoles = TV = PC = internet access = DVD player [Sony PlayStation II]
Cheap, high quality broadcasting
Equality. Small producers can reach the same audience as large conglomerates
So many “channels” but nothing on!!

Motivation: Impact of Digital Media

Digital TV broadcasters cannot find content
Mobile operators looking for more interesting content for their phone users e.g. through WAP or iMODE
Compelling content in demand
Content creators = producers of movies, editors of live events, character generators
Archives more important
Tools for Digital Movie making more important

Motivation: Automated Restoration

Archive material in demand
DVD re-release important
Pictures in bad shape
Need Automated restoration
A challenge
Compression is improved (Bonus!)
[Figure: examples of degraded pictures: Jitter, Ghosting, Dropout, Dirt and Sparkle]

DIY De-Blotching (To get you thinking …)

Detect then Interpolate
[Figure: frames n-1, n, n+1, marking the area occluded in the next frame and the area uncovered from the previous frame]

A simple example

Some dirty pictures

[Figure: frames n-1, n, n+1]

Simple Detection

Hmmm … not so good. Too many false alarms due to bad motion

A Model for Blotches (How to make Dirty Pictures)

[Figure: dirty picture = (Original x Location of OK pixels) + (Corruption Data x Location of Corruption)]
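
The blotch model, and one very simple way of detecting blotches, can be sketched as follows. This is a hedged illustration, not the detector behind the slides: the function names `corrupt` and `detect`, the 0/1 mask convention, the threshold, and the rule of flagging a pixel that differs from both neighbouring frames (with no motion compensation) are all assumptions made for the example. Frames are grey here, one number per pixel, to keep it short.

```python
def corrupt(original, corruption, blotch_mask):
    """Observed pixel = original where mask is 0, corruption data where mask is 1."""
    return [
        [(1 - m) * o + m * c for o, c, m in zip(o_row, c_row, m_row)]
        for o_row, c_row, m_row in zip(original, corruption, blotch_mask)
    ]

def detect(prev, curr, nxt, threshold):
    """Flag a pixel when it differs from BOTH neighbouring frames by more
    than the threshold. No motion compensation is attempted."""
    return [
        [1 if abs(c - p) > threshold and abs(c - n) > threshold else 0
         for p, c, n in zip(p_row, c_row, n_row)]
        for p_row, c_row, n_row in zip(prev, curr, nxt)
    ]

clean = [[100, 100, 100], [100, 100, 100]]   # the same static scene in frames n-1, n, n+1
dirt = [[255, 255, 255], [255, 255, 255]]    # corruption data
mask = [[0, 1, 0], [0, 0, 0]]                # location of corruption

dirty = corrupt(clean, dirt, mask)           # frame n with one blotched pixel
found = detect(clean, dirty, clean, threshold=50)
print(dirty, found)
```

With a static scene the detector recovers the mask exactly. Once objects move between frames, moving edges also differ from both neighbours, so a rule like this raises false alarms unless the motion is estimated well.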

Better Detection

Use more knowledge: Blotches are flat and chunky

What about reconstructing the Picture?

[Figure: frames n-2, n-1, n, n+1, n+2]

Motion is key for Picture Building

[Figure: frames n-2, n-1, n, n+1, n+2]

But we don’t know the motion

[Figure: frames n-2, n-1, n, n+1, n+2]

Trick is to unify all sources of information

[Diagram labels: Motion, Picture, Blotches; Motion Smoothness, Picture Smoothness, Spatial Smoothness, Blotch Colour]

Needs probabilistic framework. Need to know about probability and statistics

Motivation: Automated Restoration

Removing blotches, lines, noise

Motivation: Robust Video Transmission

Video comms over difficult channels: Wireless, Internet
MPEG4 (Moving Picture Experts Group) partly addresses this
BUT encoder techniques not defined: Object Segmentation/Tracking still a major challenge
AND a few errors can lead to big problems. Error detection/correction/concealment v. important.

Motivation: Robust Video Transmission

[Figure: MP4 Correctly Rec’d (left), MP4 with errors (right)]

Motivation: Robust Video Transmission

[Figure: MP4 with errors (left), Corrected with video processing (right)]

Motivation: Digital Special Fx (Rig Removal)

www.mee.tcd.ie/~sigmedia/postpro/postpro
See paper on www.mee.tcd.ie/~sigmedia
Tool being used in movies now.
Produced by www.thefoundry.co.uk

Motivation: Content Protection

Digital Media easy to copy
Need to protect rights of exploitation
Digital Watermarking allows an i.d. signature to be embedded invisibly into pictures, thus connecting the actual data to an owner or legitimate exploitation chain
Seminal work done in EEE, TCD around 1996

Motivation: Digital Cinema

Digital Cinema possible because of digital projection (Texas Instruments)
Cheaper film <-> digital scanners
New Digital Cameras (Star Wars will use Panasonic HD Cameras)
Allows more control by distributors, better reproduction [no film developing accidents]
BUT one frame approx. 1920 x 3815 = 7MB!
Good Quality Compression needed badly

Motivation: Information management/Retrieval [DVD Example]

Somebody has to generate the DVD Index
Use Storyboard (sometimes not available)
But need FRAME accurate location (24 or 25 frames per second)
Editors do this all the time e.g. news, live events. But have to watch the whole event.
Painful to do by hand; possible but painful

Motivation: Information management/Retrieval

Indexing is part of a BIGGER problem
Want to access digital media in a way relevant to users
You are familiar with text searching on Internet e.g. AltaVista, Google sites
You want to look something up: just type in a keyword combination
Works pretty well if your keywords exist in the documents to be searched i.e. text is OK

Motivation: Information management/Retrieval

Want to be able to search digital media in the same way
But associating keywords to audio or video is hard in general
People want to access the same data for different reasons
In other words: people describe the same audio and video using different keywords

Motivation: Information Retrieval Example

Picture archives like www.bridgeman.co.uk
Book publisher wants a picture from the Manet collection with predominantly red shades, and trees in the background. HUH?
Nobody thought of keywording pictures for predominant colour when they were first put into the database !!!!
So ask the 4 people in cataloguing if they can REMEMBER seeing a picture like that ??

Motivation: Information Retrieval Possible

Can solve these problems by using Signal/Image Processing
Analyse the data itself AS THE NEW QUERY OCCURS
Predominant Colour : easy to automate: remember RGB pixels?
Spotting famous people in video: harder but possible
Biometrics ….
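
To see why predominant colour is easy to automate, here is a minimal sketch. It takes one crude reading of "predominant colour", namely the colour plane with the largest total over the whole picture; that definition, the function name `predominant_colour` and the tiny test picture are all assumptions made for illustration.

```python
def predominant_colour(image):
    """Return 'r', 'g' or 'b': the colour plane with the largest total value.
    The image is a list of lines, each a list of [r, g, b] pixels."""
    totals = [0, 0, 0]
    for line in image:
        for r, g, b in line:
            totals[0] += r
            totals[1] += g
            totals[2] += b
    return "rgb"[totals.index(max(totals))]

# A made-up 2 x 2 picture with mostly red shades.
picture = [
    [[200, 10, 10], [180, 40, 30]],
    [[90, 100, 20], [220, 30, 60]],
]
answer = predominant_colour(picture)
print(answer)
```

Because the answer is computed from the pixels when the query arrives, nobody has to have keyworded the picture for colour in advance.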

Motivation: Information Retrieval Automated Storyboarding? [DVD]

Want to automatically segment the movie into Semantically Meaningful Scenes/Chapters
Hard to make a signal processing algorithm which understands movies (don’t even try…)
The basic building block is the SHOT
Can detect each Shot automatically by detecting cuts

Motivation: Information Retrieval Automated Storyboarding? [DVD]

[Figure: a sequence divided into Shot 1, Shot 2, Shot 3, Shot 4, separated by Cut 1, Cut 2, Cut 3]

Motivation: Shot Cut Detection
Your first Image Processing Algorithm

How to detect cuts automatically?
Cuts = consecutive images that show drastic change between shots
Need to define cuts Mathematically
Remember Digital Images are composed of Pixels
Each pixel is associated with 3 numbers
Any ideas? (see next slide for reminder)

The Digital Image (a reminder)

[Figure: image with axes h and k and origin 0; the pixel at site [h,k] has value I([h,k])]
I([50,50])=[40 70 200]

A Video sequence is just a bunch of frames recorded at regular intervals in time.

[Figure: Frame 1, Frame 2, Frame 3]

Using position vector notation: $I(\mathbf{x})$

Motivation: Shot Cut Detection
Your first Image Processing Algorithm

$e_n = \sum_{\mathbf{x}} |I_n(\mathbf{x}) - I_{n-1}(\mathbf{x})|$

[Plot: frame difference against Frame Number]
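
The frame difference above can be sketched directly in Python. The sum is read as running over every pixel site x; the frames here are grey (one number per pixel) to keep the example short, and the function names, the toy frames and the threshold value are assumptions chosen for illustration.

```python
def frame_difference(curr, prev):
    """e_n: the sum over all pixel sites of |I_n(x) - I_{n-1}(x)|."""
    return sum(
        abs(c - p)
        for c_row, p_row in zip(curr, prev)
        for c, p in zip(c_row, p_row)
    )

def detect_cuts(frames, threshold):
    """Return the frame numbers n (n >= 1) where e_n exceeds the threshold."""
    return [
        n for n in range(1, len(frames))
        if frame_difference(frames[n], frames[n - 1]) > threshold
    ]

dark_a = [[10, 12], [11, 10]]        # two nearly identical frames from one shot
dark_b = [[11, 12], [10, 10]]
bright = [[200, 210], [205, 198]]    # a very different picture: a new shot

frames = [dark_a, dark_b, bright, bright]
errors = [frame_difference(frames[n], frames[n - 1]) for n in range(1, len(frames))]
cuts = detect_cuts(frames, threshold=100)
print(errors, cuts)
```

Within a shot the difference stays small, and at the cut it jumps. Choosing the threshold is the awkward part, since fast motion inside a shot also makes e_n large.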

Motivation: Shot Cut Detection
Your first Image Processing Algorithm

Frame Difference not so good
Histograms better (will show you what these are later)
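
As a rough preview, a histogram just counts how many pixels fall into each range of brightness, ignoring where they are in the picture. The sketch below compares histograms instead of pixels. The bin count, the function names and the toy frames are assumptions, and this shows only one plausible reason histograms behave better (they are less upset by motion within a shot), not necessarily the argument made later in the course.

```python
def histogram(frame, bins=8, levels=256):
    """Count how many pixels of a grey frame fall into each brightness bin."""
    counts = [0] * bins
    for row in frame:
        for value in row:
            counts[value * bins // levels] += 1
    return counts

def histogram_difference(a, b, bins=8):
    """Sum of absolute differences between the two frames' histograms."""
    return sum(abs(x - y) for x, y in zip(histogram(a, bins), histogram(b, bins)))

def pixel_difference(a, b):
    """Plain frame difference, for comparison."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

stripes = [[0, 255, 0, 255]]
shifted = [[255, 0, 255, 0]]     # same content moved one pixel sideways
black = [[0, 0, 0, 0]]           # genuinely different content

motion_pixels = pixel_difference(stripes, shifted)
motion_hist = histogram_difference(stripes, shifted)
cut_hist = histogram_difference(stripes, black)
print(motion_pixels, motion_hist, cut_hist)
```

The one-pixel shift produces a huge pixel difference but no histogram difference at all, while the change of content still shows up in the histograms.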

Image Moments for Sports Parsing

Content Based Analysis for Video from Snooker Broadcasts, H. Denman, N. Rea and A. Kokaram, International Conference on Image and Video Retrieval (CIVR), July 2002, London, UK

Content Retrieval for Snooker at www.mee.tcd.ie/~sigmedia

Rea, Denman, Kokaram

Tracking with a particle filter
Using histograms for matching

Content Based Analysis for Video from Snooker Broadcasts, H. Denman, N. Rea and A. Kokaram, to be published in Computer Vision and Image Understanding Journal (CVIU): Special Issue on Video Retrieval and Summarization

Multimodal Fusion (?) www.mee.tcd.ie/~sigmedia

Dahyot, Rea, Denman, Delacourt, Kokaram

Joint Audio Visual Retrieval for Tennis Broadcasts, R. Dahyot, A. Kokaram, N. Rea and H. Denman, International Conference on Acoustics, Speech, and Signal Processing, 2003

Conferences/Journals/etc

ICASSP, ICIP (Research/Industrial)
NAB, IBC (Industrial)
www.citeseer.org
IEEE Trans IP, Ccts Sys Video Tech, PAMI, SP, SPL, Systems & Cybernetics [H/M Interfacing, Sensors etc]
EURASIP SP, Image Comms.
SIGGRAPH: Cool conference for movie special fx industry
www.howstuffworks.com

Overview

Image and Video Processing useful for more than just compression
Many interesting new areas driven by availability of new devices
Information Retrieval from Digital Media requires even more understanding of Image and Video analysis [MPEG7 on its way]
Let’s have a look at some background …

now for Sampling then Perception

Sampling an audio signal

Original CD Audio 44.1 kHz
Downsampled by 4 = 11.025 kHz (no anti-aliasing)
Downsampled by 8 = 5.5 kHz (no anti-aliasing)
Downsampled by 8 = 5.5 kHz (with anti-aliasing)
Sampling frequency affects the position of the samples in time hence affects frequency content of signal (sort of)
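
The effect of skipping the anti-aliasing step can be imitated with a pure tone. In this sketch the 10 kHz test tone, the function names and the crude moving-average low-pass filter are all assumptions made for illustration; a real anti-aliasing filter would be designed much more carefully.

```python
import math

def downsample(samples, factor):
    """Keep every factor-th sample (no anti-aliasing)."""
    return samples[::factor]

def moving_average(samples, width):
    """Crude low-pass filter: average each run of `width` consecutive samples."""
    return [sum(samples[i:i + width]) / width
            for i in range(len(samples) - width + 1)]

def power(samples):
    """Mean squared value of the signal."""
    return sum(s * s for s in samples) / len(samples)

fs = 44100.0                 # CD sampling rate in Hz
tone_hz = 10000.0            # test tone, above the Nyquist limit after downsampling
signal = [math.sin(2 * math.pi * tone_hz * n / fs) for n in range(4410)]

factor = 4
new_rate = fs / factor       # 11025 Hz
nyquist = new_rate / 2       # 5512.5 Hz: the highest frequency the new rate can carry

aliased = downsample(signal, factor)                               # tone folds down to a false pitch
filtered = downsample(moving_average(signal, factor), factor)      # tone mostly removed first
print(new_rate, nyquist, power(aliased), power(filtered))
```

Without the filter the tone keeps nearly all of its power but reappears at the wrong frequency; with even this crude filter most of it is removed before the samples are thrown away.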

Quantisation of the samples to make a digital signal

Original CD Audio 16 bit 44.1 kHz (65536 levels)
8 bit Quantisation (256 levels)
4 bit Quantisation (16 levels)
2 bit Quantisation (4 levels)
Quantisation introduces NOISE into the digital signal because the accuracy of the digital samples as compared to the analogue signal is affected
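
The link between bits, levels and noise can be checked numerically. This is a sketch under stated assumptions: samples are taken to lie in the range -1 to 1, the quantiser is a plain uniform one that maps each sample to the centre of its step, and the 440 Hz test tone and function names are made up for the example.

```python
import math

def quantise(sample, bits):
    """Map a sample in [-1, 1) to the centre of one of 2**bits equal steps."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(levels - 1, int((sample + 1.0) / step))
    return -1.0 + (index + 0.5) * step

def noise_power(signal, bits):
    """Mean squared difference between the signal and its quantised version."""
    return sum((s - quantise(s, bits)) ** 2 for s in signal) / len(signal)

signal = [0.9 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(4410)]

levels = {bits: 2 ** bits for bits in (16, 8, 4, 2)}
noise = {bits: noise_power(signal, bits) for bits in (16, 8, 4, 2)}
print(levels)
print(noise)
```

Each bit removed halves the number of levels and doubles the step size, so the error between the digital samples and the original grows quickly as the word length shrinks.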

Vision : The Human Visual System (HVS)

Light is focussed onto the retina
Electrical Impulses from the retina are channelled by the optic nerve to the Visual Cortex
The Visual Cortex does a whole bunch of smart things including filtering, object recognition, edge detection.
In `primitive animals’ A LOT of processing happens just behind the retina. Frogs and Rabbits have TEMPLATES for spotting birds of prey.
Our motion sensitivity is better at the periphery of vision than at the centre. [Helps to avoid people sneaking up on you.]

[Figure: the eye, labelled Lens, Pupil, Retina, Optic Nerve, Blind Spot (hence the usual card trick ..)]

Intensity Sensitivity of HVS

next

[Figure: two test patches, Foreground 167 and Foreground 140]
Objects appear to have similar brightness

Spatial Freq. Response
Mach Banding

[Plot: Greyscale (-) and Sensitivity (--) against Column]

Meaning of Spatial Frequency

[Figure: Monitor Display of height h viewed from a distance of 5h (CCIR rec 500)]
$\tan^{-1}(h/(5h)) = \tan^{-1}(1/5)$
768 pels = 11.3 degrees
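
The arithmetic behind the figure can be checked in a few lines of Python. The pels-per-degree and cycles-per-degree values are not on the slide: they are derived here on the assumptions that the 768 pels span the whole 11.3 degrees and that one full cycle of a grating needs at least two pels. The function name is also just for illustration.

```python
import math

def viewing_angle_degrees(picture_height, viewing_distance):
    """Angle subtended by the picture, following the slide's tan^-1(h / 5h)."""
    return math.degrees(math.atan(picture_height / viewing_distance))

angle = viewing_angle_degrees(1.0, 5.0)         # viewing distance of 5 picture heights
pels_per_degree = 768 / angle
max_cycles_per_degree = pels_per_degree / 2     # one cycle = one light pel + one dark pel
print(round(angle, 1), round(pels_per_degree, 1), round(max_cycles_per_degree, 1))
```

Spatial frequency in this sense is measured in cycles per degree of visual angle, so it depends on how far away the viewer sits as well as on the display.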

Frequency Sensitivity

Grating increases in freq. Left to Right
Intensity decreases vertically
Sensitivity given by j.n.d. junction

Colour Spaces

Matlab demo color_wheels and why_hsv

Vision: Your colour sensitivity isn’t great cf intensity

Rods are active at low light levels
Cones allow you to see colour
There are 100 Million Rods and 7 Million Cones in your retina
Hence you SAMPLE luminance space A LOT more finely (at a higher frequency) than COLOUR SPACE
The cells are arranged in a hexagonal pattern .. Hence some suspect that hexagonal arrangements of light sensitive ccts in cameras are a good idea. Better than rectangular anyway.
Fuji cameras claim to use hex grids, while others use normal grids at higher densities of elements

[Figure: the Retina, showing a Rod and a Cone]

Consequences of Colour Sensitivity

[Figure: Original Image, original at full colour resolution]
512 x 512 x 3 = 0.64 MB

Subsampling Colour Planes

2:1 in both directions
[Figure: sampling grid marking which colour samples to Keep and which to Discard]

4:1 Colour Downsampling
OK
512 x 512 + 256 x 256 x 2 = 0.31 MB (1/2 bandwidth of original)

16:1 Colour Downsampling
Still OK
512 x 512 + 128 x 128 x 2 = 0.24 MB (1/3 bandwidth of original)
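
The savings can be checked by counting samples. The sketch below assumes one full-resolution luminance plane plus two colour planes, and it counts samples rather than megabytes because the megabyte figure depends on the units and word length assumed. The function name and its parameters are illustrative.

```python
def samples_needed(luma_size, chroma_size, chroma_planes=2):
    """Samples in one square luminance plane plus the (smaller) colour planes."""
    return luma_size * luma_size + chroma_planes * chroma_size * chroma_size

full = samples_needed(512, 512)            # 512 x 512 x 3, no subsampling
colour_4_to_1 = samples_needed(512, 256)   # colour planes halved in both directions
colour_16_to_1 = samples_needed(512, 128)  # colour planes quartered in both directions

ratio_4_to_1 = colour_4_to_1 / full
ratio_16_to_1 = colour_16_to_1 / full
print(full, colour_4_to_1, colour_16_to_1, ratio_4_to_1, ratio_16_to_1)
```

The 4:1 case comes out at exactly half the original, and the 16:1 case at 0.375, which is roughly the one third quoted above. Most of what remains is the luminance plane, which is why throwing away still more colour buys less and less.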

16:1 Luminance Downsampling
Not good
128 x 128 x 3 = 0.04 MB (1/16 bandwidth of original)

A Noisy Picture
Activity Masking and Weber’s Law

Noise hard to see in Textured areas, easy in flat areas
Noise harder to see in Bright Areas than dark Areas

[Figure: original I(h,k) beside the noisy picture I(h,k)+e(h,k)]

Measuring Picture Quality

Matlab demo snr_mse
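
The Matlab demo itself is not reproduced here. As a stand-in, the following Python sketch computes two common measures of picture quality: the mean squared error (MSE) and the peak signal to noise ratio in decibels. Treating "snr" as the peak variant with a peak value of 255 is an assumption, as are the function names and the tiny test pictures; other definitions of SNR exist.

```python
import math

def mse(original, degraded):
    """Mean squared error between two grey pictures of the same size."""
    count = sum(len(row) for row in original)
    total = sum(
        (o - d) ** 2
        for o_row, d_row in zip(original, degraded)
        for o, d in zip(o_row, d_row)
    )
    return total / count

def psnr_db(original, degraded, peak=255):
    """Peak signal to noise ratio in dB; infinite when the pictures are identical."""
    error = mse(original, degraded)
    return math.inf if error == 0 else 10 * math.log10(peak * peak / error)

clean = [[100, 100], [100, 100]]     # I(h,k)
noisy = [[110, 90], [100, 100]]      # I(h,k) + e(h,k)

error = mse(clean, noisy)
quality = psnr_db(clean, noisy)
print(error, round(quality, 2))
```

A number like this treats every pixel error alike, so it takes no account of the masking effects described earlier: the same MSE can look fine in a textured area and obvious in a flat one.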