Akanksha Saran Priyam Parasharasaran/reports/saliency.pdf · 2017. 6. 5. · To analyze video...

1
Extracting Interest Regions from Egocentric Image Streams and Comparison with Existing Saliency Techniques Akanksha Saran Priyam Parashar Estimating Interesting Frames via Optical Flow: Adhering to our assumptions, in order to find the interesting frames we train an SVM to distinguish between minimal motion frames and heavy motion frames. For this we use post-processed optical flow estimation of the video feed. REFERENCES 1. A benchmark of computational models of Saliency to predict Human Fixations, Judd, Tilke; Durand, Frédo; Torralba, Antonio 2. "Graph-Based Visual Saliency", J. Harel, C. Koch, and P. Perona, Proceedings of Neural Information Processing Systems (NIPS), 2006. 3. ”Secrets of optical flow estimation and their principles” Sun, D., Roth, S., and Black, M. J., IEEE Conf. on Computer Vision and Pattern Recog., CVPR, June 2010.. 4. ”A saliency map based on sampling an image into random rectangular regions of interest”, Tadmeri Narayan Vikram, Marko Tscherepanow , Britta Wrede 5. “A Model of Saliency-Based Visual Attention for Rapid Scene Analysis” Laurent Itti, Christof Koch, and Ernst Niebur, IEEE Transactions on Pattern Analysis and Machine Intelligence, VOL. 20, NO. 11, NOVEMBER 1998 FUTURE WORK 1. Ability to detect non-static regions of interest for the user in a similar setting 2. Ability to handle large ego motion around the object of interest ASSUMPTIONS For our implementation, we make the following assumptions and provid a sandboxed definition to the term ‘Regions of Interest’ 1. Interesting Frames : The user is interested in a frame if he/she spends a considerable time looking at the same scene for around 2-4 seconds 2. Region of Interest : The item that the user is interested in and is looking at will always be at the center of the visual feed OBJECTIVE To analyze video streams from first-person view wearable devices like the Google Glass and estimate the regions of interest that the user spent time looking at, say a concert poster, a business card, et cetera. Once a good estimation has been made about the fixation points, we extract out the relevant objects of interest from the image feeds for future user reference. Employing Saliency to Identify the Regions of Interest: We employ two methods complying with our assumptions to get the ROI : 1. Image Gradient based Saliency : We blur out the peripheral part of the image (as per our assumption) and compute the gradient of the central part. High variance given by high gradient magnitude defines the ROI in the frame. 1. DoG Mean-shift: We compute DOG key points (using Principle Curvature) for the interesting frames and cluster them based on the population density using Mean-shift clustering. Next we determine which cluster is closest to the center and assign it as the most salient point of the Region of Interest Extracting the Objects of Interest from Interesting Frames (Segmentation): Finally we extract or segment out the item of interest from the frame based upon our estimation of the ROI. Segmentation techniques used by us : 1. Watershed : This technique was a big fail as it couldn’t really handle our approximations well 2. Varying size Grab-cut: Automatically choosing the center as our salient point, we segment out three varying sized regions from the frame 3. Grab-cut with DoG key points Cluster center: Choosing the center of DoG key points cluster as the salient point, grab-cut was employed giving it a rough initialization of the foreground and background 4. Gradient Dilation : Thresholding the gradient provided us with higher variation regions which we dilated by a fraction of the image size. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 GBVS Gradient based Mean shift Grabcut Ittikoch RCSS Comparison of saliency techniques Area under ROC Curve Similarity with Ground Truth Mean False Alarms 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Centered Grabcut small Centered Grabcut medium Centered Grabcut Large DOG Mean shift Grabcut Gradient based segmentation F-1 measure for different segmentations F1-measure Segmentation for target selection GBVS DoG MShift Gradient Itti Koch RCSS Probability Map (GT) Heat Map (GT) Above we show different comparison statistics between other saliency techniques and our techniques. On the left we report average measures for all methods across seven static frames we found during our experiment. On average, GBVS seems to give the best results followed by our two algorithms, followed by RCSS and then Ittikoch method. DoG Meanshift Grabcut Gradient based segmentation Ground Truth Centered Grabcut medium Centered Grabcut large Centered Grabcut small The best segmentation results in terms of higher F1 measure do not necessarily store the entire object of interest in context for the user. COMPARISON Original Image Visual Saliency ROC Curves Similarity Probability Map Heat Map Optical Flow between two frames demonstrating heavy motion such as walking down a hallway Optical Flow between two frames demonstrating minimal motion near an object of interest, in this case a fire extinguisher in a hall way Computer Vision 16-720 A Final Project Presentation

Transcript of Akanksha Saran Priyam Parasharasaran/reports/saliency.pdf · 2017. 6. 5. · To analyze video...

Page 1: Akanksha Saran Priyam Parasharasaran/reports/saliency.pdf · 2017. 6. 5. · To analyze video streams from first-person view wearable devices like the Google Glass and estimate the

Extracting Interest Regions from Egocentric Image Streams and Comparison with Existing Saliency Techniques

Akanksha Saran Priyam Parashar

Estimating Interesting Frames via Optical Flow:Adhering to our assumptions, in order to find the interesting frames we train an SVM to distinguish between minimal motion frames and heavy motion frames. For this we use post-processed optical flow estimation of the video feed.

REFERENCES1. A benchmark of computational models of Saliency to predict Human Fixations, Judd, Tilke; Durand, Frédo; Torralba, Antonio2. "Graph-Based Visual Saliency", J. Harel, C. Koch, and P. Perona, Proceedings of Neural Information Processing Systems (NIPS), 2006.3. ”Secrets of optical flow estimation and their principles” Sun, D., Roth, S., and Black, M. J., IEEE Conf. on Computer Vision and Pattern Recog., CVPR, June 2010..4. ”A saliency map based on sampling an image into random rectangular regions of interest”, Tadmeri Narayan Vikram, Marko Tscherepanow , Britta Wrede5. “A Model of Saliency-Based Visual Attention for Rapid Scene Analysis” Laurent Itti, Christof Koch, and Ernst Niebur, IEEE Transactions on Pattern Analysis and Machine Intelligence, VOL. 20, NO. 11, NOVEMBER 1998

FUTURE WORK1. Ability to detect non-static regions of interest for the user in

a similar setting2. Ability to handle large ego motion around the object of

interest

ASSUMPTIONSFor our implementation, we make the following assumptions and provid a sandboxed definition to the term ‘Regions of Interest’1. Interesting Frames: The user is interested in a frame if he/she spends a considerable time looking at the same scene for around 2-4 seconds2. Region of Interest: The item that the user is interested in and is looking at will always be at the center of the visual feed

OBJECTIVETo analyze video streams from first-person view wearable devices like the Google Glass and estimate the regions of interest that the user spent time looking at, say a concert poster, a business card, et cetera. Once a good estimation has been made about the fixation points, we extract out the relevant objects of interest from the image feeds for future user reference.

Employing Saliency to Identify the Regions of Interest:We employ two methods complying with our assumptions to get the ROI :1. Image Gradient based Saliency: We blur out the peripheral part of the image (as per our assumption) and compute the gradient of the central part. High

variance given by high gradient magnitude defines the ROI in the frame.

1. DoG Mean-shift: We compute DOG key points (using Principle Curvature) for the interesting frames and cluster them based on the population density using Mean-shift clustering. Next we determine which cluster is closest to the center and assign it as the most salient point of the Region of Interest

Extracting the Objects of Interest from Interesting Frames

(Segmentation):Finally we extract or segment out the item of interest from the frame based upon our estimation of the ROI. Segmentation techniques used by us :1. Watershed: This technique was a big fail as it couldn’t really handle our

approximations well2. Varying size Grab-cut: Automatically choosing the center as our salient

point, we segment out three varying sized regions from the frame

3. Grab-cut with DoG key points Cluster center: Choosing the center of DoGkey points cluster as the salient point, grab-cut was employed giving it a rough initialization of the foreground and background

4. Gradient Dilation: Thresholding the gradient provided us with higher variation regions which we dilated by a fraction of the image size.

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

GBVS Gradient based Mean shiftGrabcut

Ittikoch RCSS

Comparison of saliency techniques

Area under ROC Curve Similarity with Ground Truth Mean False Alarms

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

CenteredGrabcut

small

CenteredGrabcutmedium

CenteredGrabcut

Large

DOG Meanshift Grabcut

Gradientbased

segmentation

F-1 measure for different segmentations

F1-measure

Segmentation

for target selection

GBVS DoG MShift Gradient Itti Koch RCSS

Probability Map(GT)

Heat Map(GT)

Above we show different comparison statistics between other saliency techniques and our techniques. On the left we report average measures for all methods across seven static frames we found during our experiment. On average, GBVS seems to give the best results followed by our two algorithms, followed by RCSS and then Ittikoch method.

DoGMeanshiftGrabcut

Gradient based segmentation

Ground

Truth

Centered Grabcutmedium

Centered Grabcutlarge

Centered Grabcutsmall

The best segmentation results in terms of higher F1 measure do not necessarily store the entire object of interest in context for the user.

COMPARISON

Original Image

Visual Saliency

ROC

Curves

Similarity

Probability Map

Heat Map

Optical Flow between two frames

demonstrating heavy motion such

as walking down a hallway

Optical Flow between two frames demonstrating

minimal motion near an object of interest, in this

case a fire extinguisher in a hall way

Computer Vision

16-720 AFinal Project

Presentation