EE837, CS867, CE803 Computer Vision -...
Computer Vision
EE837, CS867, CE803
Introduction
Lecture 01 Computer Vision
• Basic linear algebra, probability, and calculus - Required
• Basic data structures/programming knowledge - Required
• Working knowledge of MATLAB - Required
• Knowledge and understanding of basic image processing - Preferable
Prerequisites
• Class slides, research papers, tutorials and supplemental material
• Linda G. Shapiro and George Stockman, “Computer Vision”, Upper Saddle River, NJ: Prentice Hall, 2001.
• David A. Forsyth and Jean Ponce, “Computer Vision: A Modern Approach”, 2nd edition, Prentice Hall, Inc., 2003.
• Richard Szeliski, “Computer Vision: Algorithms and Applications”, Springer, 2011. Made available online by the author: http://szeliski.org/Book/drafts/SzeliskiBook_20100903_draft.pdf
Extra
List of CV books http://homepages.inf.ed.ac.uk/rbf/CVonline/books.htm
Text and Reading
• Camera geometry and basic transformations
• Camera calibration and camera parameter estimation
• Sources, shadows, shading and shape from shading
• Feature Extraction
• Texture Synthesis
• Template Matching and Image Registration
• Segmentation
• Vision based Tracking
• Multiple view geometry
Broader course topics
Homework – Mostly programming assignments: 15%
Midterm/Hourly: 15%
Surprise quizzes/attendance: 10%
Final Project: 30%
Final exam: 30%
Grading policy
• Homework/programming assignments:
  – Reports should be type-written
  – Code and program output are required
• Final Project:
  – Brainstorm on project ideas
  – Project highlights – 10 minutes per group
  – Individual or a group of at most two, with individual roles clearly defined
  – Type-written report of up to 10 pages in CVPR format, with additional pages for commented code as an appendix
  – Project presentation
• Late Policy:
  – No credit for late submissions
Grading policy
• Plagiarism is strictly prohibited
• Cite the source
• Negative marking will be applied where plagiarism is found
Grading policy
• Dr George Stockman Professor Emeritus, Michigan State University
• Dr Mubarak Shah Professor, University of Central Florida
• The Robotics Institute Carnegie Mellon University
Material citations
Any queries?
By appointment only:
Cdr Dr Hammad
PG 111
Preferably Tuesday and Wednesday, 11 AM – Noon
Email: [email protected]
Let's start!
What is an image?
What we see
What a computer sees
Where is the Sun?
Image Processing
• Fourier transform
• Sampling, convolution
• Image enhancement
• Feature detection
What is Computer Vision?
• Inverse Optics
• Intelligent interpretation of Imagery
• Building a visual cortex (the part of the cerebral cortex responsible for processing visual information)
• No matter what your definition is…
Vision is complex… but it is FUN!
Difference between CV and IP
• Image processing: processes the output of sensors. Computer vision: relates the output of the sensors to the real world.
• Image processing: the output is a transformed image. Computer vision: the output is usually a decision.
• Image processing: signal processing. Computer vision: artificial intelligence.
• Defect detection or automatic driving relates to? Enhancing an image relates to?
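The distinction above (image processing returns a transformed image, computer vision usually returns a decision) can be illustrated with a minimal sketch. The tiny grayscale patch, the function names `stretch_contrast` and `has_defect`, and the thresholds are all hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
def stretch_contrast(image):
    """Image processing: image in, transformed image out (linear contrast stretch)."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:                      # flat image: nothing to stretch
        return [row[:] for row in image]
    # Integer arithmetic maps [lo, hi] onto [0, 255].
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in image]


def has_defect(image, dark_threshold=60, max_dark_pixels=1):
    """Computer vision (toy version): image in, decision out."""
    dark = sum(1 for row in image for p in row if p < dark_threshold)
    return dark > max_dark_pixels


# Hypothetical 3x4 grayscale patch with a small dark blemish.
patch = [
    [100, 110, 120, 110],
    [100,  50,  50, 120],
    [110, 120, 150, 100],
]

enhanced = stretch_contrast(patch)   # still an image
decision = has_defect(patch)         # a yes/no answer about the scene

print(enhanced)
print("defect found:", decision)
```

Under this toy framing, enhancing an image sits on the image processing side, while defect detection sits on the computer vision side.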
Lighting
Scene
Camera
Computer
Scene Interpretation
Components of a Computer Vision System
Image acquisition
Video clip
Sequence of images
16 images in succession that show motion
Shape from shading
• Shading can deceive the human visual system
• It changes the perceived 3D shape
• Gradual variation of the shading gives 3D information
(1,0,1) (-1,1,1) (-1,-1,1)
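A minimal sketch of why gradual shading carries 3D information, assuming a Lambertian (matte) surface whose brightness is proportional to the cosine of the angle between the surface normal and the light direction. Treating the three vectors listed above as light-source directions is an assumption, and the function names and sample normals are hypothetical.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def lambertian_intensity(normal, light, albedo=1.0):
    """Brightness of a matte surface patch: albedo * max(0, n . l)."""
    n = normalize(normal)
    l = normalize(light)
    return albedo * max(0.0, sum(a * b for a, b in zip(n, l)))


# Assumed light directions (taken from the slide's three vectors).
lights = [(1, 0, 1), (-1, 1, 1), (-1, -1, 1)]

# Surface normals that tilt gradually from facing the camera (+z) toward +x,
# as they would along the profile of a curved object.
normals = [(math.sin(t), 0.0, math.cos(t)) for t in (0.0, 0.4, 0.8)]

for light in lights:
    shades = [round(lambertian_intensity(n, light), 3) for n in normals]
    print(light, shades)
```

Each light produces a different brightness profile for the same normals, which is the cue that shape-from-shading methods try to invert.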
Shape from texture
3D from Shading
Shape from Shading
Shape from texture
• The same shape (circles) repeated forms a texture
• Circles become ellipses at some places
• This gives a 3D cue
• Texture can be used to recover 3D
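The circle-to-ellipse cue can be sketched numerically. Assuming orthographic projection (an assumption not stated on the slide), a circle on a surface slanted by an angle away from the image plane projects to an ellipse whose minor-to-major axis ratio equals the cosine of that angle. The function name `slant_from_ellipse` is hypothetical.

```python
import math


def slant_from_ellipse(major_axis, minor_axis):
    """Estimate surface slant (degrees) from the axes of a projected circle.

    Assumes orthographic projection, where minor / major = cos(slant).
    """
    if major_axis <= 0 or not 0 <= minor_axis <= major_axis:
        raise ValueError("expected 0 <= minor_axis <= major_axis and major_axis > 0")
    return math.degrees(math.acos(minor_axis / major_axis))


# A texture element that still looks circular lies on a fronto-parallel patch;
# a flattened one lies on a patch slanted away from the camera.
for major, minor in [(10, 10), (10, 8), (10, 5)]:
    print(major, minor, round(slant_from_ellipse(major, minor), 1))
```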
Shape from motion
• From the static dots alone, one cannot tell what the object is
• Humans are able to understand it from the motion
Shape from motion
Optical flow
• Color wheel
• Computing pixel-wise motion
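A minimal sketch of how a color wheel can encode a per-pixel motion vector: direction maps to hue and speed maps to saturation. This particular mapping, the function name `flow_to_rgb`, and the `max_magnitude` parameter are assumptions for illustration; published optical flow visualizations use their own wheel conventions.

```python
import colorsys
import math


def flow_to_rgb(u, v, max_magnitude=10.0):
    """Map a flow vector (u, v) in pixels/frame to an RGB color.

    Hue encodes direction, saturation encodes speed (clipped at max_magnitude).
    """
    magnitude = math.hypot(u, v)
    hue = (math.atan2(v, u) % (2 * math.pi)) / (2 * math.pi)
    saturation = min(magnitude / max_magnitude, 1.0)
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
    return tuple(round(255 * c) for c in (r, g, b))


# No motion is white; opposite directions get opposite hues.
for vector in [(0, 0), (10, 0), (-10, 0), (0, 5)]:
    print(vector, flow_to_rgb(*vector))
```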
Sequence Raw optical flow
Optical Flow
Microsoft Photosynth
• Panorama stitching
• Can capture in amazing resolution and full 3D
• For anyone with a D-SLR (digital single-lens reflex) or a point-and-shoot camera
• https://photosynth.net/preview/about/
Video clip and mosaic
• Stitching images together
Applications of Computer Vision
• Face Recognition
• Object Recognition
• Video Surveillance and Monitoring
• Object detection, tracking and behavior analysis
• Remote Sensing: UAVs
• Robotics
• Computer Graphics
• And more ………….
Face Recognition
• Principal Component Analysis (PCA)
• Fisher Linear Discriminant (FLD)
Face recognition
Facial expression
Surprised Smiling
Detecting driver alertness
Human detection
• Left – UAV image
• Bounding boxes
• We will learn basic techniques for tracking these moving objects
Video surveillance and monitoring
• Automated surveillance systems – Detection and tracking
Object detection Object tracking Object classification
Activity recognition
Airport surveillance
Aerial imagery - UAVs
• Drones: military use
Many prefer to brand the technology as "Unmanned Aerial Systems" (UAS) rather than "drones".
• Aerial surveying of crops
• Acrobatic aerial footage in filmmaking
• Search and rescue operations
• Inspecting power lines and pipelines
• Counting wildlife
• Delivering medical supplies to inaccessible regions
Aerial imagery
Object tracking
Kernel tracking + blob tracking + occlusion
Motion detection
Frame differencing + background modeling + object segmentation
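The three motion detection ingredients named above can be sketched on toy data. This is a simplified illustration, not the pipeline used for the results in these slides: frame differencing thresholds the change between consecutive frames, background modeling is approximated by a running average (an assumed choice), and object segmentation is reduced to a bounding box around the foreground mask. All names, frames, and thresholds are hypothetical.

```python
def difference_mask(reference, frame, threshold=25):
    """1 where the frame differs from the reference by more than threshold."""
    return [[1 if abs(f - r) > threshold else 0 for r, f in zip(rrow, frow)]
            for rrow, frow in zip(reference, frame)]


def update_background(background, frame, alpha=0.1):
    """Running-average background model: B <- (1 - alpha) * B + alpha * frame."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


def bounding_box(mask):
    """(row_min, col_min, row_max, col_max) of the foreground, or None."""
    coords = [(r, c) for r, row in enumerate(mask) for c, m in enumerate(row) if m]
    if not coords:
        return None
    rows, cols = zip(*coords)
    return min(rows), min(cols), max(rows), max(cols)


def make_frame(col):
    """Toy 4x5 frame: dark scene with a bright 2-pixel object in column col."""
    frame = [[10] * 5 for _ in range(4)]
    frame[1][col] = frame[2][col] = 200
    return frame


frames = [make_frame(c) for c in (0, 1, 2)]   # object moves one pixel right per frame

# Frame differencing flags both the old and the new object position.
motion = difference_mask(frames[0], frames[1])

# Background modeling: start from an assumed empty scene, then adapt slowly.
background = [[10.0] * 5 for _ in range(4)]
for frame in frames:
    foreground = difference_mask(background, frame)
    background = update_background(background, frame)

print("pixels flagged by frame differencing:", sum(map(sum, motion)))
print("object box in last frame:", bounding_box(foreground))
```

Comparing against a background model isolates the object's current position, whereas plain frame differencing also marks where the object used to be.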
Camera motion compensation
Feature based + gradient
Event detection and tracking
Aerial imagery – Registration results
Aerial imagery – Detection results
Aerial imagery – Tracking results
Wide area surveillance
Wide area surveillance
Tracking results
Unmanned Ground Vehicle
• Comes under Robot vision
• Google self-driving car
• The system combines information from Google Street View with artificial intelligence software that fuses input from:
  – Video cameras inside the car: identifying pedestrians and moving obstacles
  – LIDAR sensor on top of the vehicle: for the 3D map
  – Radar sensors on the front of the vehicle: position of distant objects
  – Position sensor attached to one of the rear wheels: locates the car's position on the map
Unmanned Ground Vehicle
• Defense Advanced Research Projects Agency (DARPA) urban challenge
Human activity recognition
• Involves
  – Events
  – Actions
  – Activities
• Different datasets available for analysis
Human activity recognition - datasets
• Weizmann action dataset
  – 10 actions
  – 09 actors per action
• KTH Data Set
  – 06 categories
  – 25 actors
  – 04 instances
  – 600 clips
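As a quick consistency check on the figures above, the KTH clip count follows from multiplying categories, actors, and instances. The Weizmann total below is only the nominal count implied by the slide's two numbers, not a figure stated on the slide; the variable names are hypothetical.

```python
# Figures copied from the slide.
kth = {"categories": 6, "actors": 25, "instances": 4, "clips": 600}
weizmann = {"actions": 10, "actors_per_action": 9}

# KTH: every actor performs every category in every instance.
kth_expected = kth["categories"] * kth["actors"] * kth["instances"]

# Weizmann: nominal count implied by one clip per (action, actor) pair.
weizmann_implied = weizmann["actions"] * weizmann["actors_per_action"]

print("KTH clips implied by the slide:", kth_expected)
print("matches stated total:", kth_expected == kth["clips"])
print("Weizmann clips implied by the slide:", weizmann_implied)
```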
Human activity recognition - datasets
• UCF Sports dataset
  – 9 actions
  – 142 videos
• IXMAS multi-view dataset
Bench swing, Dive, Swing, Run, Kick, Lift, Ride, Golf swing, Skate
Human activity recognition - datasets
• UCF 50
Stereo
• A regular camera loses 3D information
• Microsoft Kinect sensor – game changer
• Gives direct 3D information + RGB image
• 50,000 different gestures – the challenge is whether you can identify all or some of these
3D depth sensors
RGB Camera
IR LED Emitter
Array of microphones
Tilt motor
Binocular Stereo
Stereo
• A regular camera loses 3D information
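A minimal sketch of how two cameras recover the depth that a single camera loses. It assumes an idealized rectified stereo pair, where depth equals focal length times baseline divided by disparity. This setup, the function name `depth_from_disparity`, and the numbers are illustrative assumptions, not values from the slides.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth in meters for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero means a point at infinity)")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: 700-pixel focal length, cameras 10 cm apart.
focal_length_px = 700.0
baseline_m = 0.1

# A point that shifts by fewer pixels between the two views is farther away.
for disparity_px in (70.0, 35.0, 7.0):
    depth = depth_from_disparity(focal_length_px, baseline_m, disparity_px)
    print(f"disparity {disparity_px:5.1f} px -> depth {depth:5.2f} m")
```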
Range Scanning and Structured Light
High density crowded scenes
• Tracking required for:
  – Crowd management
  – Public space design
  – Virtual environments
  – Visual surveillance
  – Intelligent environments
  – And more!
High density crowded scenes
• Can we do tracking in this kind of crowd?
Political rallies, religious festivals, marathons, high-density moving objects
• Average chip size 14 x 22 pixels
• 492 frames
• Selected 199 athletes for tracking
• Successfully tracked 143 athletes
High density crowded scenes
• Can we do tracking in this kind of crowd?
• Average chip size 14 x 17 pixels
• 453 frames
• Selected 50 athletes for tracking
Behaviors in crowded scenes
• Can we identify the behavior of the crowd?
Image localization
Location in terms of Longitude (40.4419) Latitude (40.4419)
Input Output
• Image compared with database of images
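Comparing a query image with a database of geotagged images can be sketched as a nearest-neighbor search: the query takes the coordinates of the most similar database entry. The three-number descriptors, the coordinates, and the function name `localize` are hypothetical placeholders, and a real system would use a much richer image descriptor.

```python
import math


def localize(query_descriptor, database):
    """Return (latitude, longitude) of the database image closest to the query."""
    best = min(database,
               key=lambda entry: math.dist(query_descriptor, entry["descriptor"]))
    return best["latitude"], best["longitude"]


# Hypothetical database: each image is summarized by a short descriptor
# (for example, a coarse color histogram) and tagged with made-up coordinates.
database = [
    {"descriptor": (0.8, 0.1, 0.1), "latitude": 10.0, "longitude": 20.0},
    {"descriptor": (0.2, 0.7, 0.1), "latitude": 11.0, "longitude": 21.0},
    {"descriptor": (0.1, 0.2, 0.7), "latitude": 12.0, "longitude": 22.0},
]

query = (0.25, 0.65, 0.10)            # input: descriptor of the query image
print(localize(query, database))      # output: estimated location
```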
Geospatial trajectory extraction
• Sequence of images compared with database
Computer graphics
• CV is used in movies such as Harry Potter, Avatar, and The Matrix
Layer based image composition
• Segmentation method
• Green chroma key screen
• Green and blue differ the most in hue from skin colors
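A minimal sketch of chroma-key segmentation and layer composition: pixels whose hue is close to green are treated as screen and replaced by the background layer, while skin-colored pixels, whose hue is far from green, are kept. The hue tolerance, saturation and brightness limits, sample colors, and function names are assumptions chosen for illustration.

```python
import colorsys


def is_green_screen(pixel, hue_center=1 / 3, hue_tolerance=0.08,
                    min_saturation=0.4, min_value=0.3):
    """True if an (R, G, B) pixel in 0..255 looks like the green screen."""
    r, g, b = (c / 255 for c in pixel)
    hue, saturation, value = colorsys.rgb_to_hsv(r, g, b)
    return (abs(hue - hue_center) <= hue_tolerance
            and saturation >= min_saturation
            and value >= min_value)


def composite(foreground, background):
    """Replace green-screen pixels in the foreground layer with the background layer."""
    return [[bg if is_green_screen(fg) else fg for fg, bg in zip(frow, brow)]
            for frow, brow in zip(foreground, background)]


GREEN = (0, 255, 0)        # screen color
SKIN = (224, 172, 105)     # sample skin tone (hue far from green)
SKY = (30, 60, 200)        # pixel of the new background layer

foreground = [[GREEN, SKIN],
              [SKIN, GREEN]]
background = [[SKY, SKY],
              [SKY, SKY]]

print(composite(foreground, background))
```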
Virtual studio
http://en.wikipedia.org/wiki/Chroma_key
Layer based video composition
• Segmentation method
Industrial robots vs low skilled workers