2 - 2 - Image and Video Signals (18-56)


Transcript of 2 - 2 - Image and Video Signals (18-56)


Text and grayscale images are certainly two-dimensional signals, where the independent variables represent space; I can call such a signal s(x, y), for example. Color, multispectral, and hyperspectral images, and we'll say a few words about them, have two spatial coordinates, and then one can look at the amplitude in two different ways. One is to say that the amplitude is not a scalar anymore but a vector: a three-by-one vector when it comes to color, since there are three different channels, or a seven-by-one vector, let's say, for a multispectral image. Or they can be viewed as three-dimensional signals with two spatial and one spectral coordinate. Video is a 3D signal: it has two spatial and one temporal coordinate, while a 3D volume has three spatial coordinates, x, y, z. As an example of a four-dimensional signal, I can look at a volume, an x, y, z signal, that changes over time, so time is the fourth independent variable.

Some of the tools that we use to describe signals carry over from 1D to 2D to M-D as a straightforward extension: one just adds one more variable, and everything remains the same in some sense. On the other hand, there are certain results that hold true for one-dimensional signals but cannot be generalized to higher-dimensional signals. Images and videos are clearly the focus of this class; images are two-dimensional signals but can also be three-dimensional, when we talk about multispectral and hyperspectral images, while video is a three-dimensional signal, so we add a dimension.
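To make these dimensionalities concrete, here is a minimal sketch, assuming such signals are stored as NumPy arrays of 8-bit samples; the sizes are arbitrary and not from the lecture:

```python
import numpy as np

H, W = 480, 640   # arbitrary spatial size: rows and columns

gray   = np.zeros((H, W), dtype=np.uint8)           # 2D signal s(x, y): scalar amplitude
color  = np.zeros((H, W, 3), dtype=np.uint8)        # amplitude is a 3x1 vector (R, G, B) per pixel
multi  = np.zeros((H, W, 7), dtype=np.uint8)        # e.g. a seven-band multispectral image
video  = np.zeros((30, H, W, 3), dtype=np.uint8)    # two spatial + one temporal coordinate (30 frames)
volume = np.zeros((64, H, W), dtype=np.uint8)       # 3D volume: x, y, z
vol_t  = np.zeros((10, 64, H, W), dtype=np.uint8)   # 4D: an x, y, z volume changing over time

for name, arr in [("gray", gray), ("color", color), ("multi", multi),
                  ("video", video), ("volume", volume), ("vol_t", vol_t)]:
    print(f"{name:6s} shape={arr.shape}")
```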

So let us look now at some representative examples of signals. Tones are examples of one-dimensional signals. On top you see a sine wave that has only one frequency present in it, at 660 hertz; that's why it's called a pure tone. Let us listen to that.
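As an aside, here is a minimal sketch, assuming NumPy and SciPy are available, of how a 660 Hz pure tone and the square wave discussed next could be synthesized and written to WAV files; the file names are arbitrary:

```python
import numpy as np
from scipy.io import wavfile

fs = 44100                                # sampling rate in Hz
t = np.arange(0, 2.0, 1.0 / fs)           # two seconds of time samples

sine = np.sin(2 * np.pi * 660 * t)        # pure tone: a single frequency at 660 Hz
square = np.sign(sine)                    # square wave at 660 Hz: adds odd harmonics

# Scale to 16-bit integers before writing to disk.
wavfile.write("pure_tone_660.wav", fs, (0.5 * sine * 32767).astype(np.int16))
wavfile.write("square_660.wav", fs, (0.5 * square * 32767).astype(np.int16))
```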


[SOUND] At the bottom, you see a square wave. It also has a frequency of 660 hertz; however, it contains additional frequencies, the so-called harmonics. Let's listen to that as well. [SOUND] It should be clear that the bottom square wave is richer, in the sense of having additional frequencies, than the pure tone, the sine wave. Here we have another example of a one-dimensional signal: it's a piano piece familiar, I assume, to most of you, and this is a synthesized piece; we can listen to that as well. [MUSIC]

Here are some examples of images. On the left, you see a binary image; that is, I only need one bit per pixel, since there are just two values, black and white. Binary images, text images like this one, are used in fax encoding, when I want to transmit such text from point A to point B through a fax machine. In the middle, you see an eight-bit-per-pixel image, while on the right is a 24-bit-per-pixel image. This is a true-color image: I have 2 to the 24 different color values, which is around 16 million different colors to represent such an image.

There are different ways to represent a color image, one of which is shown here, in terms of three different channels: the red, green, and blue channels. This is an RGB decomposition of the image. Each of these channels is a black-and-white image; I use eight bits to represent it, so each of the channels here has 256 different values, and eight bits per channel times three gives 24 bits to represent the color image.
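Here is a minimal sketch of that decomposition, assuming Pillow and NumPy are available and that the color image sits on disk as mandrill.png (the file name is only an assumption):

```python
import numpy as np
from PIL import Image

rgb = np.asarray(Image.open("mandrill.png").convert("RGB"))   # shape (H, W, 3), dtype uint8

r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]   # each channel is an 8-bit grayscale image

print("values per channel:", 2 ** 8)              # 256
print("bits per pixel:", 3 * 8)                   # 24
print("representable colors:", 2 ** 24)           # 16,777,216, roughly 16 million
print("red channel range:", r.min(), "to", r.max())
```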

So, if we look at this image, you notice the nose of the mandrill here is quite red, and therefore we see that the pixel values in the red channel are quite high; red is represented by high values, values closer to 255, while the darker values are closer to zero.


So, the red nose has high values in the red channel and pretty small values in the other two channels. [INAUDIBLE] On the other hand, if I look at the cheeks of the mandrill, this is a variation of blue, not exactly pure blue, so I see high values in the blue channel, smaller values, definitely smaller, in the red channel, and something in the middle in the green channel, right? Actually, this particular color apparently is a combination of green and blue.

The Landsat program is responsible for the acquisition of satellite imagery of the Earth. It started in 1972, and the most recent satellite, Landsat 8, was launched this year. LANDSAT 7 data has eight spectral bands, with spatial resolutions ranging from 15 to 60 meters, and the temporal resolution is 16 days. The main instrument on board LANDSAT 7 is the Enhanced Thematic Mapper Plus, ETM+. The resolution is 30 meters, except band six, which has 60-meter resolution, and band eight, which has 15-meter resolution. Band eight, by the way, is the panchromatic band, so it has high spatial resolution, 15 meters, but low spectral resolution. By the way, this is the visible range of light here, so these are the blue, green, and red channels, while the rest are infrared. LANDSAT data have helped improve our understanding of Earth, and thanks to LANDSAT, today we have a better understanding of things as diverse as coral reefs, tropical deforestation, and Antarctica's glaciers.
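To keep the LANDSAT 7 band layout just described in one place, here it is written out as a small Python table; only what was stated above is recorded, and the variable name is my own:

```python
# LANDSAT 7 ETM+ bands as described above: spectral region and spatial resolution in meters.
ETM_PLUS_BANDS = {
    1: ("visible, blue",  30),
    2: ("visible, green", 30),
    3: ("visible, red",   30),
    4: ("infrared",       30),
    5: ("infrared",       30),
    6: ("infrared",       60),
    7: ("infrared",       30),
    8: ("panchromatic",   15),
}
```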

So here's an example of a LANDSAT 7 image of the city of Amsterdam. Up here you see the blue, green, and red channels, and the rest of the channels are infrared. By the way, hyperspectral images are the ones that have many more bands, up to 200 or 300 bands, but their main characteristic is that the bands are much more closely spaced than in multispectral images.


Very often, in order to be able to appreciate the LANDSAT images, the visible bands are combined. This is what you see in the left image; this is, by the way, the LANDSAT image of the City of London. These natural-looking images are familiar to the human eye. So, the reservoirs here close to Heathrow are shown in dark blue, while the city itself is shown in gray. These images offer very good views of city infrastructure and sediment, and also bathymetry: we can tell how deep the water is over there. However, the bands 1-3 image is not as useful when it comes to distinguishing between different types of vegetation, or between clouds and snow.

Therefore, in the middle image we combine three channels, channels four, two, and three; in other words, one infrared band plus the green and blue channels. This is a widely used combination, especially for studies of vegetation, since different types of plants reflect infrared light in different ways. Another combination is the seven, four, two shown on the right, that is, two infrared bands plus green. This combination is especially useful for geological and agricultural studies, because band seven can help discriminate between various types of rocks and minerals. Bright green here indicates vegetation, while the water appears dark blue or black.
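Here is a minimal sketch of how such band combinations could be assembled, assuming the individual LANDSAT bands have already been loaded into a dictionary of 2D NumPy arrays keyed by band number (the loading and display steps are omitted):

```python
import numpy as np

def composite(bands, idx):
    """Map three LANDSAT bands onto the R, G, B display channels and rescale to 8 bits."""
    stack = np.dstack([bands[i].astype(np.float64) for i in idx])
    stack -= stack.min()
    return (255 * stack / max(stack.max(), 1e-12)).astype(np.uint8)

# bands: dict {band_number: 2D array}, e.g. read from the individual LANDSAT files.
# natural = composite(bands, (3, 2, 1))   # visible bands: red, green, blue
# veg     = composite(bands, (4, 2, 3))   # the four, two, three combination used for vegetation
# geo     = composite(bands, (7, 4, 2))   # two infrared bands plus green, for geology/agriculture
```

Whichever band is mapped to a display channel dominates the apparent color; in the seven, four, two composite, for example, the infrared band four drives the green display channel, which is why vegetation shows up bright green.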

Now, our natural world is three-dimensional, while images are two-dimensional: they represent the projection of a 3D world onto the 2D plane. In order to perceive depth, we need two eyes, so we need two cameras that capture the same scene and therefore emulate the human visual system.


So, here you see on the left what the left camera sees, and on the right what the right camera sees. The difference between these two images is the so-called disparity map: it tells us how each and every pixel moves going from one image to the next. Or, if I use the disparity map, I can map the left image onto the right image. The disparity map relates to the depth in the image. So the two channels, if fused properly, will give us the depth perception, and the fusion could be done through a red and a blue channel, so that I can use color-coded glasses, or I could use glasses with different polarization, and so on.
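Here is a minimal sketch of computing such a disparity map with OpenCV's block-matching stereo routine; the file names and parameter values are assumptions, not part of the lecture:

```python
import cv2

# Left and right views of the same scene, loaded as grayscale images.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block matching: for each pixel, search along the same row in the other image
# for the best match; the horizontal shift of that match is the disparity.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right)   # larger disparity means closer to the cameras

cv2.imwrite("disparity.png",
            cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype("uint8"))
```

The raw output of compute() is a fixed-point disparity value, so it is normalized here purely so it can be saved and viewed as an image.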

Here's another example of a stereo image. In this case these are LANDSAT images; actually, two LANDSAT images were used that were acquired of the same scene one year apart. The satellite does not necessarily pass over exactly the same position, so the two images represent the left and right views of the same scene, and therefore they can be combined to give us the depth perception.

You can also obtain depth information using the Kinect camera. It projects a known pattern onto the scene and infers depth from the deformation of the pattern. So the Kinect combines structured light with two computer-vision techniques, depth from focus and depth from stereo. The Kinect is intended to be used with the Xbox, and therefore it is of interest there not only to compute the depth map but also to infer the body position. So you see here the skeleton of the person in the left image, right? So here is the person, this is the depth map, and the Kinect also provides a visible image, shown here. So you see the visible image and the corresponding depth image, which shows where the person stands, where the door is, and so on. [BLANK_AUDIO] We can see here [SOUND] a short video that shows both the visible image and the depth image acquired by the Kinect camera.


[BLANK_AUDIO] Very often we're interested in capturing the three-dimensional structure of an object. So, instead of using two cameras as in the stereo case, we use many cameras on a specific rig. As you can see here, this particular object is viewed from many different angles.

Here's an example of a video. A video consists of individual frames, and one could argue that a video is nothing else than a collection of images; therefore, if I have an algorithm that is effective in processing an image, a still frame as it's called, I could apply the same algorithm frame after frame and I would be done. However, what is special about video is that these frames are highly correlated, and therefore I can gain in processing such frames if I take this correlation into account.
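Here is a minimal sketch of that frame-by-frame view, and of one way to get a feel for how correlated consecutive frames are; OpenCV is assumed, and the video file name is made up:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("clip.mp4")
prev = None
diffs = []

while True:
    ok, frame = cap.read()          # each frame is just a still image, an H x W x 3 array
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float64)
    if prev is not None:
        # Mean absolute difference between consecutive frames; small values
        # reflect the high temporal correlation that video processing can exploit.
        diffs.append(np.abs(gray - prev).mean())
    prev = gray
cap.release()

print("average frame-to-frame difference:", np.mean(diffs))
```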

Of course, if these frames are displayed at some frame rate, 30 frames per second for example, one can perceive the actual motion in the scene.

Finally, processing in the title of the course means the manipulation of the values of an image or a video by a computer, so that the resulting image is more useful to us or has some desirable properties. For example, the result of processing might be the removal of blur, as is the case here: the input to the system is an aerial photograph that is blurred due to the motion between the camera and the scene, and the output is a sharpened, restored image.
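As an illustration of one classical way to undo such blur, here is a minimal frequency-domain, Wiener-style deconvolution sketch in plain NumPy. It assumes the blur kernel is known, which is a simplification, and it is not necessarily the method used to produce the lecture's example:

```python
import numpy as np

def wiener_deblur(blurred, psf, k=0.01):
    """Deconvolve `blurred` with the known point-spread function `psf`.

    k is a small constant standing in for the noise-to-signal ratio;
    it keeps the division stable where the blur wipes out frequencies.
    """
    H = np.fft.fft2(psf, s=blurred.shape)           # frequency response of the blur
    G = np.fft.fft2(blurred)
    F_hat = np.conj(H) / (np.abs(H) ** 2 + k) * G   # Wiener-style inverse filter
    return np.real(np.fft.ifft2(F_hat))

# Example: a 9-pixel horizontal motion-blur kernel.
psf = np.zeros((1, 9))
psf[0, :] = 1.0 / 9
# restored = wiener_deblur(blurred_image, psf)
```

With k = 0 this reduces to a plain inverse filter, which amplifies noise at the frequencies the blur has almost removed; the constant keeps the estimate stable.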

Now, if an image is input and an image is output, this has been the kind of narrow definition of processing, such as filtering; but the broader meaning of processing that we adopt here is that an image or a video can be input to a system and we're interested in extracting important features from such an image, or we're interested in making decisions based on the image.


The example here shows this: a chest X-ray is the input, and based on the analysis that would be performed, whether there is a malignant tumor or not, for example, certain decisions and actions will be the output of the system.

As time goes on, what we see is that the boundaries between traditionally separate areas become fuzzy; in other words, there is overlap between these traditionally separate areas. So, when it comes to signal processing, and more specifically to the image and video processing that we'll be dealing with in this class, there is overlap between this field and the fields of communications, computer vision, machine learning, and optimization.