A Brief Overview of Computer Vision

Jinxiang Chai

What is Computer Vision?

• Computer vision is the science and technology of machines that see.

• Concerned with the theory behind artificial systems that extract information from images.

• The image data can take many forms, such as a video sequence, views from multiple cameras, or multi-dimensional data from a medical scanner.

Applications

• Robot perception (e.g., industrial robots, autonomous vehicles, autonomous helicopters, humanoid robots).

Honda ASIMO Humanoid Robot

• Face detection

• Face recognition

• Posture/gesture recognition (e.g., hand waving)

• Environment recognition (e.g., obstacles)

Applications


• Detecting events (e.g. for visual surveillance or people counting).

Detecting Events

• Customer tracking and activity analysis

Applications



• Modeling objects or environments

Modeling objects or environments

• Modeling buildings, plants, faces, cars, etc.

Applications




• Interaction (e.g., as the input to a device for computer-human interaction).

Interactions

• Interactions with computers and video games, etc.

Face recognition for automatic login

Computer vision for game interfaces (Sony EyeToy, Microsoft Kinect)

Applications




• Organizing information (e.g., for indexing databases of images and image sequences).

http://en.wikipedia.org/wiki/Computer-human_interaction

Organizing information

• Flickr (www.flickr.com) has 3 billion images.

• YouTube hosts an enormous number of videos.

• We need new ways to search, analyze, and summarize large collections of internet images and videos.

Image Representation

An image is a 2D rectilinear array of pixels

- A width × height array where each entry of the array stores a single pixel

A pixel stores color information

- Luminance pixels: gray-scale images (intensity images), values 0-255, 8 bits per pixel

- Red, green, blue pixels (RGB): color images, values 0-255 per channel, 24 bits per pixel
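The two pixel formats above can be sketched with plain Python lists, stored row-major so that `image[y][x]` is the pixel at column x, row y. This is a minimal illustration, not code from the slides: the helper names (`make_gray`, `make_rgb`, `rgb_to_gray`, `storage_bytes`) are hypothetical, and the RGB-to-luminance weights are the ITU-R BT.601 coefficients, an assumed choice since the slides do not specify a conversion.

```python
# Minimal sketch of gray-scale and RGB images as width x height arrays.
# Helper names are hypothetical; BT.601 luminance weights are an assumption.

def make_gray(width, height, value=0):
    """Gray-scale (intensity) image: one 8-bit value in 0..255 per pixel."""
    return [[value for _ in range(width)] for _ in range(height)]

def make_rgb(width, height, color=(0, 0, 0)):
    """RGB image: three 8-bit channels per pixel, 24 bits per pixel in total."""
    return [[color for _ in range(width)] for _ in range(height)]

def rgb_to_gray(rgb_image):
    """Convert RGB pixels to luminance using BT.601 weights (assumed choice)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]

def storage_bytes(width, height, bits_per_pixel):
    """Raw, uncompressed storage needed for a width x height image."""
    return width * height * bits_per_pixel // 8

img = make_rgb(4, 3, color=(255, 255, 255))   # a 4x3 all-white RGB image
gray = rgb_to_gray(img)
print(gray[0][0])                    # 255
print(storage_bytes(640, 480, 8))    # 307200
print(storage_bytes(640, 480, 24))   # 921600
```

The byte counts make the 8-bit versus 24-bit difference concrete: the same 640 × 480 frame takes three times as much raw storage in RGB.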



Images

• What kinds of information can we obtain from images?

- Edge detection

- Corner & feature detection

- Geometric primitive detection

- Object detection

- Face alignment and recognition

- ……
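The slides list edge detection without naming a particular operator. As one illustration, here is a minimal sketch using the Sobel operator, a classic choice assumed here rather than taken from the slides. It works on a gray-scale image stored as a list of rows; `sobel_magnitude`, `edge_map`, and the threshold value are hypothetical.

```python
import math

# Sobel kernels: horizontal (x) and vertical (y) intensity derivatives.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [ 0,  0,  0],
           [ 1,  2,  1]]

def sobel_magnitude(img):
    """Gradient magnitude at each interior pixel; border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            out[y][x] = math.hypot(gx, gy)
    return out

def edge_map(img, threshold):
    """1 where the gradient magnitude exceeds a (hypothetical) threshold, else 0."""
    return [[1 if m > threshold else 0 for m in row] for row in sobel_magnitude(img)]

# A 5x6 image with a vertical step from dark (0) to bright (255)
# between columns 2 and 3.
img = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
for row in edge_map(img, threshold=100):
    print(row)
```

On this step image the interior rows come out as `[0, 0, 1, 1, 0, 0]`: the two columns straddling the step are marked as edges, and flat regions are not. Corner, primitive, and object detectors typically build on gradients like these, but their details go beyond what the slides cover.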

How about multiple images?

• What can we obtain if we have multiple images?


Two images of the same scene

Structure and motion analysis

• Given two or more images of the same scene or object, estimate camera motion and 3D object structure (e.g., depth).

Unknown camera viewpoints





How to estimate camera parameters?

- where is the camera?

- where is it pointing?

- what are its internal parameters, e.g., focal length?






Camera calibration!
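Camera calibration estimates the quantities asked about above: where the camera is (translation `t`), where it is pointing (rotation `R`), and internal parameters such as focal length. The sketch below shows only the forward pinhole model that calibration fits, not a calibration procedure. It assumes an ideal pinhole camera with square pixels and zero skew; `project` and `matvec` are hypothetical helper names.

```python
# Forward pinhole projection (assumed model): x ~ K [R | t] X.
#   R, t       : extrinsics ("where is the camera / where is it pointing")
#   f, cx, cy  : intrinsics (focal length in pixels, principal point)

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def project(X, R, t, f, cx, cy):
    """Project a 3D world point X = (x, y, z) to pixel coordinates (u, v)."""
    # World coordinates -> camera coordinates.
    Xc = [a + b for a, b in zip(matvec(R, X), t)]
    if Xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division, then scale by focal length and shift by principal point.
    u = f * Xc[0] / Xc[2] + cx
    v = f * Xc[1] / Xc[2] + cy
    return u, v

I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # camera axes aligned with world axes
print(project((0.5, -0.25, 2.0), I, [0, 0, 0], f=800, cx=320, cy=240))  # (520.0, 140.0)
```

Calibration runs this model in reverse: given known 3D points and their observed pixels, it searches for the `R`, `t`, `f`, `cx`, `cy` that make `project` reproduce the observations.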

Structure and motion analysis

• Reconstruct the depth information.

Input images

How to find the depth information of this point?




- find the corresponding point in the right image.
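Finding the corresponding point can be sketched as block matching: compare a small window around the left-image pixel with windows in the right image and keep the best match. This sketch assumes a rectified stereo pair, so the match of left pixel (y, x) lies on the same row at (y, x - d) for some disparity d ≥ 0. It also assumes sum-of-squared-differences (SSD) as the matching cost, which is one common choice. `ssd` and `match_disparity` are hypothetical names, and the caller must keep the window inside the image.

```python
# SSD block matching along one row of a rectified stereo pair (assumed setup).

def ssd(left, right, y, xl, xr, r):
    """Sum of squared differences between two (2r+1)x(2r+1) windows."""
    total = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            diff = left[y + dy][xl + dx] - right[y + dy][xr + dx]
            total += diff * diff
    return total

def match_disparity(left, right, y, x, max_disp, r=1):
    """Return the disparity d in [0, max_disp] with the lowest SSD cost."""
    best_d, best_cost = None, float("inf")
    for d in range(max_disp + 1):
        xr = x - d
        if xr - r < 0:
            break  # window would fall off the left edge of the right image
        cost = ssd(left, right, y, x, xr, r)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

base = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
left = [base[:] for _ in range(3)]
# Right view: the same row pattern shifted 2 pixels to the left.
right = [[base[x + 2] if x + 2 < len(base) else 0 for x in range(len(base))]
         for _ in range(3)]
print(match_disparity(left, right, y=1, x=6, max_disp=4))  # 2
```

Restricting the search to one row is what makes rectification useful: the 2D correspondence problem becomes a 1D search over disparities.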






Input images

Depth images
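Once a disparity d is known for each pixel, depth follows from similar triangles. For a rectified pair with focal length f (in pixels) and baseline B (the distance between camera centers), the standard relation is Z = f · B / d. For example, f = 700 px, B = 0.1 m, and d = 20 px give Z = 3.5 m. The sketch below applies this per pixel to turn a disparity map into a depth image. The function names are hypothetical, and treating non-positive disparity as "infinitely far" is an assumed convention.

```python
# Depth from disparity for a rectified stereo pair: Z = f * B / d.

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in meters from disparity (pixels), focal length (pixels), baseline (m)."""
    if disparity_px <= 0:
        return float("inf")  # assumed convention: no finite depth for d <= 0
    return focal_px * baseline_m / disparity_px

def depth_image(disparity_map, focal_px, baseline_m):
    """Convert a per-pixel disparity map (list of rows) into a depth image."""
    return [[depth_from_disparity(d, focal_px, baseline_m) for d in row]
            for row in disparity_map]

depths = depth_image([[35, 70], [0, 20]], focal_px=700, baseline_m=0.1)
print(depths)
```

Because depth is inversely proportional to disparity, nearby points (large d) are resolved much more finely than distant ones.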


• Reconstruct 3D models from multiple images

Reconstruction results from 23 images

All together (video)

- feature detection

- feature matching (epipolar geometry)

- structure from motion

- stereo reconstruction

- triangulation

- texture mapping

http://www.youtube.com/watch?v=jtyXkuNPpIg
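The "feature matching (epipolar geometry)" step can be illustrated with the epipolar constraint: a true match (p, q) satisfies q̃ᵀ F p̃ = 0 for the 3×3 fundamental matrix F, where p̃ and q̃ are homogeneous pixel coordinates. In practice, the match q must lie on the epipolar line F p̃. This sketch uses that constraint to reject candidate matches. F is hand-written here for an ideal rectified pair (an assumption); a real pipeline would estimate it from the matches themselves. `epipolar_distance`, `filter_matches`, and the `max_dist` value are hypothetical.

```python
import math

def epipolar_distance(F, p_left, p_right):
    """Pixel distance from p_right to the epipolar line F @ p_left."""
    x = (p_left[0], p_left[1], 1.0)                     # homogeneous left point
    a, b, c = (sum(F[i][j] * x[j] for j in range(3)) for i in range(3))
    norm = math.hypot(a, b)
    if norm == 0:
        return math.inf                                 # degenerate line
    return abs(a * p_right[0] + b * p_right[1] + c) / norm

def filter_matches(F, matches, max_dist=1.0):
    """Keep candidate matches consistent with the epipolar geometry."""
    return [(p, q) for p, q in matches if epipolar_distance(F, p, q) <= max_dist]

# Ideal rectified pair (assumed): every epipolar line is the image row with
# the same v, so the constraint reduces to v_right == v_left.
F_RECT = [[0, 0, 0],
          [0, 0, -1],
          [0, 1, 0]]

matches = [((10, 20), (7, 20)), ((10, 20), (7, 25))]
print(filter_matches(F_RECT, matches))   # [((10, 20), (7, 20))]
```

The same constraint is what reduces stereo matching to a 1D search: once F is known, each left point only needs to be compared against pixels on its epipolar line.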

How about video sequences?

• What can we obtain from video?


Optical flow: where are pixels moving to?
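One classic way to answer "where are pixels moving to?" is the Lucas-Kanade method. The slides do not name a method, so this choice is an assumption. It assumes brightness constancy, Iₓ·u + I_y·v + Iₜ = 0, and solves for a single flow vector (u, v) per small window by least squares. This minimal sketch handles one window with central-difference gradients from the first frame. `lucas_kanade_window` is a hypothetical name, and the window plus a one-pixel border must lie inside both frames.

```python
# Single-window Lucas-Kanade optical flow (assumed method, not from the slides).

def lucas_kanade_window(I1, I2, cx, cy, r=2):
    """Estimate the flow (u, v) of the window centered at (cx, cy)."""
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - r, cy + r + 1):
        for x in range(cx - r, cx + r + 1):
            ix = (I1[y][x + 1] - I1[y][x - 1]) / 2.0   # spatial gradient in x
            iy = (I1[y + 1][x] - I1[y - 1][x]) / 2.0   # spatial gradient in y
            it = I2[y][x] - I1[y][x]                   # temporal derivative
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        raise ValueError("ill-conditioned window (aperture problem)")
    # Solve [[sxx, sxy], [sxy, syy]] [u, v]^T = -[sxt, syt]^T by Cramer's rule.
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v

# Synthetic test: a smooth pattern shifted 0.1 pixel to the right.
f = lambda x, y: float(x * x + y * y)
I1 = [[f(x, y) for x in range(20)] for y in range(20)]
I2 = [[f(x - 0.1, y) for x in range(20)] for y in range(20)]
print(lucas_kanade_window(I1, I2, cx=10, cy=10))
```

The estimate comes out close to (0.1, 0) because the method linearizes the image around each pixel. The `det` check reflects the aperture problem: a window whose gradients all point one way cannot pin down motion along the edge.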

How about multiple video sequences?

• Modeling dynamic objects (video)

http://people.csail.mit.edu/jovan/assets/movies/vlasic-2008-ama.mp4

Modeling human motion from video

• Single-view camera

• Interactively construct human motion from video
