CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Juho Kim, Phu Nguyen, Sarah Weir, Philip J. Guo, Robert C. Miller, Krzysztof Z. Gajos

description

Millions of learners today use how-to videos to master new skills in a variety of domains. But browsing such videos is often tedious and inefficient because video player interfaces are not optimized for the unique step-by-step structure of such videos. This research aims to improve the learning experience of existing how-to videos with step-by-step annotations. We first performed a formative study to verify that annotations are actually useful to learners. We created ToolScape, an interactive video player that displays step descriptions and intermediate result thumbnails in the video timeline. Learners in our study performed better and gained more self-efficacy using ToolScape versus a traditional video player. To add the needed step annotations to existing how-to videos at scale, we introduce a novel crowdsourcing workflow. It extracts step-by-step structure from an existing video, including step times, descriptions, and before and after images. We introduce the Find-Verify-Expand design pattern for temporal and visual annotation, which applies clustering, text processing, and visual analysis algorithms to merge crowd output. The workflow does not rely on domain-specific customization, works on top of existing videos, and recruits untrained crowd workers. We evaluated the workflow with Mechanical Turk, using 75 cooking, makeup, and Photoshop videos on YouTube. Results show that our workflow can extract steps with a quality comparable to that of trained annotators across all three domains with 77% precision and 81% recall.

Transcript of CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Page 1: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Juho Kim, Phu Nguyen, Sarah Weir, Philip J. Guo, Robert C. Miller, Krzysztof Z. Gajos

Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Page 2: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

how-to videos online

Page 4: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

learning from how-to videos: limited by video player interfaces

Page 6: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Watching Example

Page 8: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Problem in Watching

It’s difficult to navigate to specific parts you’re interested in.

find

repeat

skip

Page 9: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

How-to Video: Step-by-Step Nature

Apply gradient map

Page 10: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Completeness & detail of step-by-step instructions are integral to task performance. (Eiriksdottir and Catrambone, 2011)

Proactive & random access, semantic indices in instructional videos: better task performance and learner satisfaction. (Zhang et al., 2006)

Interactivity can help overcome the difficulties of perception and comprehension. Stopping, starting and replaying an animation can allow reinspection. (Tversky et al., 2002)

Page 12: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Design Insight: Enable step-by-step navigation with high interactivity

Page 13: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

ToolScape: Step-aware video player

Page 14: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

work-in-progress images

parts with no visual progress

step labels & links

Page 15: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

enhance existing how-to videos with step-level interactivity & annotation

Page 16: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Research Questions

Does step-by-step navigation help learners?

Preliminary user study

How can we annotate an existing how-to video with step-by-step information?

Crowdsourcing annotation workflow

Page 18: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Study: Photoshop Design Tasks

12 novice Photoshop users

manually annotated videos

Page 19: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Baseline ToolScape

Page 20: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

With ToolScape, learners will…

H1. feel more confident about their design skills.

- self-efficacy gain

H2. believe they produced better designs.

- self-rating on designs produced

H3. actually produce better designs.

- external rating on designs produced

Page 21: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

H1. Higher self-efficacy gain with ToolScape

– Four 7-point Likert scale questions

– Mann-Whitney’s U test (Z=2.06, p<0.05), error bar: standard error

[Bar chart, scale 0 to 7: gain of 1.4 for ToolScape and 0.1 for Baseline; each bar also carries the value 3.8]

Page 22: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

H2. Higher self-rating with ToolScape

– One 7-point Likert scale question

– Mann-Whitney’s U test (Z=2.70, p<0.01), error bar: standard error

[Bar chart, scale 0 to 7: ToolScape 5.3, Baseline 3.5]

Page 23: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

H3. External raters rank ToolScape designs higher.

– Ranking: lower is better

– Wilcoxon Signed-rank test (W=317, Z=-2.79, p<0.01, r=0.29), error bar: standard error

– Krippendorff’s alpha = 0.753

[Bar chart, scale 0 to 12: ToolScape 5.7, Baseline 7.3]

Page 24: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Non-sequentially navigating video

Step-level navigation: clicked 8.9 times per task

“It is great for skipping straight to relevant

portions of the tutorial.”

“It was also easier to go back to parts I missed.”

Page 25: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Research Questions

Does step-by-step navigation help learners?

Preliminary user study

How can we annotate an existing how-to video with step-by-step information?

Crowdsourcing annotation workflow

Page 26: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Annotations for Step-Aware Video Player

• step time

• step label

• before/after results

Page 27: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Design Goals for Annotation Method

• domain-independent

• existing videos

• untrained annotators

Page 28: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Crowdsourcing

Page 29: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Multi-stage crowdsourcing workflow

FIND: When & what are the steps?

VERIFY: Vote & improve

EXPAND: Before/after the steps?

Page 30: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Input video

FIND: When & what are the steps?

VERIFY: Vote & improve

EXPAND: Before/after the steps?

Page 34: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Input video

FIND: When & what are the steps?

VERIFY: Vote & improve

EXPAND: Before/after the steps?

Output timeline

Page 35: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Stage 1. FIND candidate steps

Page 36: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Labeling a step

Page 37: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Time-based Clustering
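
The slide only names time-based clustering, so the following is a minimal sketch of one way timestamps from several workers could be merged into candidate steps. The gap rule, the `max_gap` and `min_workers` parameters, and the use of the median are all assumptions made for illustration, not the authors' stated algorithm.

```python
from statistics import median

def cluster_timestamps(times, max_gap=5.0):
    """Group timestamps (seconds); a gap larger than max_gap starts a new cluster."""
    clusters = []
    for t in sorted(times):
        if clusters and t - clusters[-1][-1] <= max_gap:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return clusters

def candidate_steps(times, max_gap=5.0, min_workers=2):
    """Keep clusters supported by at least min_workers marks; report the median time."""
    return [
        median(cluster)
        for cluster in cluster_timestamps(times, max_gap)
        if len(cluster) >= min_workers
    ]

# Hypothetical marks from several workers for one video (seconds).
worker_times = [12.0, 13.5, 14.0, 40.2, 41.0, 75.0]
steps = candidate_steps(worker_times)
print(steps)
```

With this toy input the lone mark at 75.0 seconds is dropped because only one worker reported it, which is one simple way to filter out noise before the Verify stage.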

Page 38: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Stage 2. VERIFY steps by voting/improving

Page 39: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Quality control for Stage 2

• Majority voting

• Breaking ties

– String matching to combine “similar enough” labels

– Longer string: “grate three cups of cheese” > “grate cheese”
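
The voting and tie-breaking rules above can be sketched roughly as follows. This is an illustrative sketch only: the similarity measure (`difflib.SequenceMatcher`), the 0.8 threshold, and the function names are assumptions, since the slides do not say how "similar enough" is computed.

```python
from collections import Counter
from difflib import SequenceMatcher

def similar(a, b, threshold=0.8):
    """Assumed similarity test: character-level ratio above a threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def pick_label(votes, threshold=0.8):
    """Majority vote over labels, merging similar labels and preferring longer strings."""
    groups = []  # each entry is [representative_label, vote_count]
    for label, count in Counter(votes).items():
        for group in groups:
            if similar(label, group[0], threshold):
                group[1] += count
                if len(label) > len(group[0]):
                    group[0] = label  # keep the longer, more detailed wording
                break
        else:
            groups.append([label, count])
    top = max(count for _, count in groups)
    tied = [label for label, count in groups if count == top]
    return max(tied, key=len)  # tie-break: the longer string wins

print(pick_label(["grate cheese", "grate three cups of cheese", "add salt"]))
print(pick_label(["chop onions", "chop onions", "dice the onions finely"]))
```

In the first call every label has one vote, so the longest label is chosen; in the second, the plain majority decides.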

Page 40: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Stage 3. EXPAND with before/after images

Page 41: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Quality control for Stage 3

• Majority voting

• Breaking ties:

– Pixel diff to combine “similar enough” frames

– Choose what’s closer to the step
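
A rough sketch of how the frame-selection rules above might fit together, under stated assumptions: frames are represented as flat lists of grayscale values, "pixel diff" is taken to be the mean absolute difference, and `max_diff` is a hypothetical threshold. None of these details come from the slides.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized grayscale frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def pick_frame(candidates, step_time, max_diff=10.0):
    """candidates: list of (timestamp, pixels), one per worker vote.

    Frames that look alike are pooled into one group; the largest group wins,
    and ties go to the frame closest in time to the step.
    """
    groups = []  # each group is a list of (timestamp, pixels)
    for ts, pixels in candidates:
        for group in groups:
            if mean_abs_diff(pixels, group[0][1]) <= max_diff:
                group.append((ts, pixels))
                break
        else:
            groups.append([(ts, pixels)])
    most_votes = max(len(group) for group in groups)
    tied = [group for group in groups if len(group) == most_votes]
    best = min(
        (frame for group in tied for frame in group),
        key=lambda frame: abs(frame[0] - step_time),
    )
    return best[0]  # timestamp of the chosen frame

# Toy 4-pixel frames: two near-identical dark frames and one bright frame.
votes = [(10.0, [0, 0, 0, 0]), (11.0, [2, 1, 0, 3]), (30.0, [200, 210, 190, 220])]
print(pick_frame(votes, step_time=12.0))
```

Real frames would come from a video decoder and would likely need downscaling first; the toy lists here only keep the example self-contained.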

Page 42: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Evaluation

• Generalizable? 75 Photoshop / Cooking / Makeup videos

• Accurate? Precision and recall against trained annotators’ labels

Page 43: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Across all domains, ~80% precision and recall

Domain      Precision   Recall
Cooking     0.77        0.84
Makeup      0.74        0.77
Photoshop   0.79        0.79
All         0.77        0.81
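
To make the two metrics concrete, here is a small sketch of how precision and recall could be computed for extracted step times against ground-truth step times. The greedy one-to-one matching and the `tolerance` window are assumptions for illustration; the slides do not describe the matching rule used in the evaluation.

```python
def precision_recall(predicted, truth, tolerance=5.0):
    """Match each predicted step time to at most one unused ground-truth time."""
    unmatched = sorted(truth)
    hits = 0
    for p in sorted(predicted):
        match = next((t for t in unmatched if abs(t - p) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)  # each ground-truth step is matched only once
            hits += 1
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical step times in seconds.
p, r = precision_recall(predicted=[10, 31, 50, 90], truth=[12, 30, 70])
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision is the share of extracted steps that correspond to a ground-truth step; recall is the share of ground-truth steps that were found.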

Page 44: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Conceptual Level Differences

• “Now apply the bronzer to your face

evenly”

• “Apply the bronzer to the forehead”

• “Apply the bronzer to the cheekbones”

• “Apply the bronzer to the jawline”

Page 45: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Timing is 2.7 seconds off on average

Ground truth: one step every 17.3 seconds

Page 46: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Cost: $1.07 per minute of video

• 111 HITs / video (3 workers / task)

• $2.50 / video (Find + Verify)

• $4.85 / video (Find + Verify + Expand)

• $0.32 / step (time + label + before/after)
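
The figures above can be cross-checked with a little arithmetic. The derived quantities below (Expand-stage cost, implied video length, implied steps per video) are not stated on the slide; they are back-of-the-envelope values that assume every listed figure is a per-video average.

```python
cost_per_video = 4.85      # Find + Verify + Expand, from the slide
cost_find_verify = 2.50    # Find + Verify only, from the slide
cost_per_minute = 1.07     # from the slide
cost_per_step = 0.32       # from the slide

# Derived values (assumptions, not reported results).
expand_cost = cost_per_video - cost_find_verify
implied_minutes = cost_per_video / cost_per_minute
implied_steps = cost_per_video / cost_per_step

print(f"Expand stage: ${expand_cost:.2f} per video")
print(f"Implied video length: {implied_minutes:.1f} minutes")
print(f"Implied steps per video: {implied_steps:.0f}")
```

Under these assumptions the Expand stage accounts for roughly half of the per-video cost.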

Page 47: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Contributions

• Study: increased interactivity improved task performance & self-efficacy

• Crowd video annotation method & Find-Verify-Expand design pattern

• Evaluation: fully extracted 75 existing videos across 3 domains, with 77% precision and 81% recall

Page 48: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Ongoing Work: Beyond low-level steps

hierarchical solution structure extraction

Catrambone, R. The subgoal learning model: Creating better examples so that students can solve novel problems. Journal of Experimental Psychology: General, 127 (1998).

Page 49: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Ongoing Work: Beyond low-level steps

hierarchical solution structure extraction

Learnersourcing: learners as a crowd

• Motivated, qualified

• Feedback loop between learners & system

Page 50: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Future of How-to Video Learning

What if we had 1000s of fully annotated videos?

• Flexible learning paths with multiple videos

• Step-level search, recommendation

• Patterns from multiple solutions

Page 51: CHI2014 - Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Crowdsourcing Step-by-Step Information Extraction to Enhance Existing How-to Videos

Juho Kim

MIT CSAIL

[email protected]

juhokim.com

Acknowledgement: This work was supported in part by Quanta Computer & the Samsung Fellowship.