Visual Summary of Egocentric Photostreams by Representative Keyframes (BSc Ricard Mestre)

Author: Ricard Mestre

Supervisor: Xavier Giró

Date: Tuesday, 17th of February 2015

1

Contents

● Collaboration
● Motivation and goals
● State of the art
● Methodology
● Evaluation
● Conclusions and future work

2

Collaboration

Collaboration with the UB group BCNPCL (Barcelona Perceptual Computing Laboratory)

3

Motivation and goals

● Lifelogging with Narrative Clip

● Up to 2000 images/day

● A visual summary can aid the memory of people affected by Alzheimer's disease

5

Motivation and goals

● Extract a visual summary of a day

○ Clustering strategy for event detection

○ Automatic selection of representative frames

6

State of the art

Chandrasekhar et al., “Efficient retrieval from large-scale egocentric visual data using a sparse graph representation” (CVPR Workshop 2014)

8

State of the art

Lu and Grauman, “Story-driven summarization for egocentric video” (CVPR 2013)

9

Methodology

Feature extraction → Clustering → Division-fusion → Keyframe extraction

11

Feature extraction

● Convolutional Neural Network (CNN) trained on ImageNet

12

Jia et al., “Caffe: Convolutional Architecture for Fast Feature Embedding” (ACM MM 2014)

Clustering

● Obtain separated events

● Agglomerative clustering

14

cutoff parameter

Talavera, E., Dimiccoli, M., Bolaños, M., Aghaei, M., & Radeva, P. (2015). “R-Clustering for Egocentric Video Segmentation”. In 7th Iberian Conference on Pattern Recognition and Image Analysis (ACCEPTED).

Clustering: linkage method

● Different linkage methods

● Our case: average linkage
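As a rough illustration of the clustering step, the sketch below runs average-linkage agglomerative clustering and stops merging once the closest pair of clusters is farther apart than a cutoff. It is a pure-Python toy, not the implementation used in this work: the Euclidean distance, the function names and the cutoff value are assumptions, and the real features would be CNN descriptors rather than 2-D points.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, feats):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(feats[i], feats[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(feats, cutoff):
    """Merge the closest clusters until their distance exceeds `cutoff`.

    Returns a list of clusters, each a sorted list of frame indices.
    """
    clusters = [[i] for i in range(len(feats))]
    while len(clusters) > 1:
        best = None
        # Exhaustive search for the closest pair (fine for a toy example).
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], feats)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > cutoff:
            break  # remaining clusters are too dissimilar to merge
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)


# Toy 2-D "features" (assumed data): two tight groups of frames.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
events = agglomerative(features, cutoff=1.0)
print(events)
```

A lower cutoff yields more, smaller events; a higher one merges everything into a single event.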

15

Division

● Long events with short events inside

● Groundtruth labelling

17

[Figure: segments labelled 1, 2 and 3]

18

Fusion

● Short clusters (fewer than 5 images) are not representative

● Join the short events into larger ones
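One way to picture the fusion step is to walk through the temporally ordered events and attach each event with fewer than 5 images to a neighbour. The sketch below merges a short event into the preceding event, or into the following one when there is no preceding event. That neighbour rule and the function name are assumptions, since the slides do not state how the target event is chosen.

```python
def fuse_short_events(events, min_size=5):
    """Merge events with fewer than `min_size` frames into a neighbour.

    `events` is a temporally ordered list of lists of frame indices.
    Assumed rule (not from the slides): a short event joins the previous
    event; a short leading event joins the next one.
    """
    fused = []
    pending = []  # frames of a short leading event waiting for a host
    for event in events:
        event = pending + list(event)
        pending = []
        if len(event) >= min_size:
            fused.append(event)
        elif fused:
            fused[-1].extend(event)
        else:
            pending = event
    if pending:  # every event was short: keep the frames as one event
        fused.append(pending)
    return fused


# Toy segmentation (assumed data): three short events among two long ones.
events = [[0, 1], [2, 3, 4, 5, 6, 7], [8], [9, 10, 11, 12, 13], [14, 15]]
fused_events = fuse_short_events(events)
print(fused_events)
```

A similarity-based rule (joining the short event to whichever neighbour looks more alike) would be a natural alternative to the fixed "previous event" choice made here.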

19

Example of good segmentation

21

Example of good segmentation

22

Example of bad segmentation

23

Keyframe extraction

● Criterion: visual similarity-based keyframe

● Graph-based approach: similarity graph and adjacency matrix

25

Random walk

● One walker moves along the graph

● The most visited node is the most representative
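A minimal sketch of the random-walk idea, under stated assumptions: the frames of one event are nodes, edge weights are pairwise visual similarities, and the walker steps to a neighbour with probability proportional to the edge weight. Repeating the step many times approximates how often each node is visited. The function name, the iteration count and the toy similarity values are hypothetical, and the sketch assumes every frame has some positive similarity to at least one other frame.

```python
def random_walk_keyframe(similarity, iterations=100):
    """Return (index, visit_probabilities) of the most visited node.

    `similarity` is a square matrix (list of lists) of non-negative
    weights; self-similarities on the diagonal are ignored.
    """
    n = len(similarity)
    # Row-normalise the weights into transition probabilities.
    transition = []
    for i in range(n):
        row = [similarity[i][j] if i != j else 0.0 for j in range(n)]
        total = sum(row)
        transition.append([w / total for w in row])
    # Start uniformly and repeatedly take one step of the walk.
    visits = [1.0 / n] * n
    for _ in range(iterations):
        visits = [
            sum(visits[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    best = max(range(n), key=lambda j: visits[j])
    return best, visits


# Toy similarity matrix for four frames (assumed values).
similarity = [
    [1.0, 0.9, 0.8, 0.2],
    [0.9, 1.0, 0.9, 0.3],
    [0.8, 0.9, 1.0, 0.1],
    [0.2, 0.3, 0.1, 1.0],
]
keyframe, visits = random_walk_keyframe(similarity)
print(keyframe, visits)
```

For a symmetric similarity matrix the visit probabilities settle in proportion to each node's total similarity to the others, so the selected frame is the one most similar to the rest of the event.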

26

Minimum distance

● Adjacency matrix approach

● The frame with the minimum distance is the most representative
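The minimum-distance criterion can be sketched as follows, assuming that "minimum distance" means the smallest summed distance from one frame to all other frames of the same event. That reading, the function name and the toy distance values are assumptions, not details given in the slides.

```python
def min_distance_keyframe(distances):
    """Return the index of the frame with the smallest summed distance.

    `distances` is a square matrix (list of lists) of pairwise distances
    between the frames of one event, with zeros on the diagonal.
    """
    totals = [sum(row) for row in distances]
    return min(range(len(totals)), key=lambda i: totals[i])


# Toy distance matrix for four frames (assumed values).
distances = [
    [0.0, 0.1, 0.2, 0.8],
    [0.1, 0.0, 0.1, 0.7],
    [0.2, 0.1, 0.0, 0.9],
    [0.8, 0.7, 0.9, 0.0],
]
keyframe_index = min_distance_keyframe(distances)
print(keyframe_index)
```

Unlike the random walk, this needs a single pass over the matrix, at the cost of ignoring any structure beyond direct pairwise distances.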

27

Example of summary

28

Example of summary

29

Contents

● Collaboration
● Motivation and goals
● State of the art
● Methodology
● Evaluation
  ○ Database
  ○ Clustering
  ○ Keyframe extraction
● Conclusions and future work

30

Evaluation: Database

● 5 days

● 3 users

● 4005 images

● Groundtruth available

31

Talavera, E., Dimiccoli, M., Bolaños, M., Aghaei, M., & Radeva, P. (2015). “R-Clustering for Egocentric Video Segmentation”. In 7th Iberian Conference on Pattern Recognition and Image Analysis (ACCEPTED).

Contents

● Collaboration
● Motivation and goals
● State of the art
● Methodology
● Evaluation
  ○ Database
  ○ Clustering
    ■ Jaccard index
    ■ Linkage effect
    ■ Relabelling effect
  ○ Keyframe extraction
● Conclusions and future work

32

Evaluation: Clustering

● Jaccard index:
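For reference, the Jaccard index of two sets A and B is the size of their intersection divided by the size of their union, J(A, B) = |A ∩ B| / |A ∪ B|. The sketch below computes it for one predicted event against one groundtruth event, each given as a set of frame indices. How the score is aggregated over all events of a day is not shown in the slides, so the helper name and the toy frame ranges are assumptions.

```python
def jaccard_index(a, b):
    """Intersection size over union size for two collections of frame indices."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Toy events (assumed data).
predicted = range(0, 8)  # frames 0..7
truth = range(2, 10)     # frames 2..9
score = jaccard_index(predicted, truth)
print(score)  # 6 shared frames out of 10 distinct frames
```

A score of 1 means the predicted event matches the groundtruth event exactly, and 0 means they share no frames.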

33

Linkage effect

34

Relabelling effect

35

Contents

● Collaboration
● Motivation and goals
● State of the art
● Methodology
● Evaluation
  ○ Database
  ○ Clustering
  ○ Keyframe extraction
    ■ Blind taste test
    ■ Representative quality of keyframe
    ■ Summary validations
● Conclusions and future work

36

Evaluation: keyframe extraction

● User surveys:

○ Representative quality of keyframe

○ Quality of summary

37

● Methodology: Blind taste test

38

Lu and Grauman, “Story-driven summarization for egocentric video” (CVPR 2013)

Figure: brandchannel.com

Blind taste test: quality of keyframe

39

40

Representative quality of keyframe

41

Do you think that the image on the left/center/right can represent the event?

Example of multi-event segmentation

42

Representative quality of keyframe

43

Which image is more representative of the event, in your opinion?

Blind taste test: quality of summary

44

Summary validations

45

Can this set of images represent the complete day?

Summary validations

46

Which summary is the best, in your opinion?

Conclusions and future work

● New methodology taking into account visual and temporal information

● Keyframe extraction through graph-based approaches

48

Conclusions and future work

● Jaccard index of 0.53 for segmentation

● 88-86% user acceptance of our summaries

● 58% of users chose our summaries as the best option

49

Conclusions and future work

● Temporal information brings important improvements

● First method of summary extraction for high temporal resolution sets

50

Conclusions and future work

● Apply object detection

● Different criteria of representativity

● Clinical application of this work

51

Conclusions and future work

52

Planned submission:

March 30, 2015

Thanks for your attention!

53