RecSys 2016 Talk: Feature Selection For Human Recommenders



Human Computation At Stitch Fix


Heavy and repetitive computation

Large-scale working memory

Large-scale long-term memory

Context sensitivity/nuance

Aesthetic judgements

Relationship building

Novel inferences

Unstructured data


Processes information to make recommendations

Can specify internal mechanisms

Can specify the data being used

Recommendations improve with better features (data)

Needs to be trained and tuned

Comes with internal mechanisms

Can consider the entire world


Determine what they’re processing

Determine what they should be processing

Change/shape what they’re processing


Make more recommendations

Deliver those recommendations

Receive feedback


1: Determining what they’re processing


If someone isn’t attending to something but you’re showing it anyway, you might:

■ Make your worker less efficient (slower)

■ Fatigue them (unnecessary filtering)

■ Lose opportunities for including something more useful

Figure out what your human workers are attending to while they make their recommendations

If they aren’t attending to a feature, then they’re not making recommendations off of it


Exploration


‘Online’ Observations

What you get

○ Ability to reduce the hypothesis space

○ Higher granularity observations

○ Time-dependent observations (when is something considered)


Mouse Tracking

Cheap measure of attention

Non-invasive

Easy widespread deployment


Eye Tracking


[Visual] search patterns lie somewhere between random and systematic…. humans will attempt a more systematic search, but will still suffer from imperfect memory.

(Nickles et al., 2003)


Eye Tracking

Resistant to strategy

Deterministic

Higher accuracy


AREA OF INTEREST (AOI)


Features You Want To Select!
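
One way to turn mouse or gaze samples into a per-feature attention measure is to sum the time spent inside each area of interest (AOI). The sketch below is only an illustration under stated assumptions: the AOI names, the screen rectangles, and the `(timestamp, x, y)` sample format are all hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AOI:
    """Axis-aligned screen rectangle; names and bounds here are hypothetical."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x < self.right and self.top <= y < self.bottom


def dwell_times(samples, aois):
    """Sum the time spent inside each AOI.

    samples: list of (timestamp_seconds, x, y), sorted by time.
    Each sample is credited with the time until the next sample; the
    final sample gets no credit because its duration is unknown.
    """
    totals = {aoi.name: 0.0 for aoi in aois}
    for (t0, x, y), (t1, _, _) in zip(samples, samples[1:]):
        for aoi in aois:
            if aoi.contains(x, y):
                totals[aoi.name] += t1 - t0
                break  # assumes AOIs do not overlap
    return totals


# Hypothetical layout and samples, for illustration only.
aois = [
    AOI("client_note", 0, 0, 400, 200),
    AOI("item_history", 0, 200, 400, 600),
]
samples = [
    (0.0, 50, 50),
    (0.5, 60, 80),
    (1.0, 100, 300),
    (2.5, 900, 900),  # outside every AOI
    (3.0, 100, 100),
]
totals = dwell_times(samples, aois)
```

An AOI with zero dwell time across many workers is a candidate feature that nobody is attending to.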


2: Determining what they should be processing


You’re interested in overall performance and can optimize for whatever is most important to you

■ True hits, false positives, false negatives

■ Processing time

Given the features that they’re using, which ones produce the best recommendations?


The Logic:

○ Workers may vary in what features they use

○ Look for correlations between attention to features and positive metrics

Allows you to learn the optimal features amongst your current candidates
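
The logic above can be sketched as a per-feature correlation between how much each worker attends to a feature and that worker's success metric. This is a minimal illustration, assuming one attention share and one success rate per worker; the feature names and all numbers are made up, and Pearson correlation is just one reasonable choice of measure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one variable is constant
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: one entry per worker.
# attention[feature][i] = share of worker i's viewing time on that feature.
attention = {
    "client_note": [0.10, 0.20, 0.30, 0.40, 0.50],
    "price": [0.20, 0.10, 0.30, 0.10, 0.20],
}
# success[i] = worker i's success rate on some positive metric.
success = [0.50, 0.55, 0.60, 0.65, 0.70]

correlations = {name: pearson(shares, success) for name, shares in attention.items()}

# Rank candidate features from most to least positively associated.
ranked = sorted(correlations.items(), key=lambda item: item[1], reverse=True)
```

A high correlation only flags a candidate; it does not show that attending to the feature causes better recommendations.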


Feature Drop Out Studies


Feature Drop Out Studies A/B



Logic

Show a feature to one cell, and remove it for another

If a positive difference in performance is observed, then that feature promotes better outcomes



Optimal Conditions

A highly controlled “offline” environment

○ Allows for true participant randomization

○ Allows for repeated measures

○ Allows for high “internal validity”


Task-relevant background information (optional)

Ability to provide a response - track accuracy, RT, confidence, etc.

Trial-specific stimuli - use historic data with known outcomes


Correct ~ Condition + (1|participant_id)

Condition differences: Feature promotes better recommendations

No condition differences: Feature either isn’t considered or makes no difference to recommendations if it is
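
The formula above reads as a mixed-effects model with a random intercept per participant. As a simpler stand-in (explicitly not that model), the sketch below compares each participant's accuracy with the feature shown versus hidden and runs an exact sign-flip permutation test on the per-participant differences. It assumes a repeated-measures design where every participant sees both conditions; the condition labels `"shown"` and `"hidden"` and all trial data are hypothetical.

```python
from itertools import product
from statistics import mean


def accuracy_differences(trials):
    """Per-participant accuracy with the feature shown minus hidden.

    trials: iterable of (participant_id, condition, correct), where
    condition is "shown" or "hidden" and correct is 0 or 1.
    """
    per_participant = {}
    for pid, condition, correct in trials:
        cells = per_participant.setdefault(pid, {"shown": [], "hidden": []})
        cells[condition].append(correct)
    return {
        pid: mean(cells["shown"]) - mean(cells["hidden"])
        for pid, cells in per_participant.items()
    }


def sign_flip_p_value(diffs):
    """Exact two-sided sign-flip test on paired differences.

    Under the null hypothesis of no condition effect, each difference is
    equally likely to be positive or negative, so every sign pattern is
    enumerated (feasible only for a small number of participants).
    """
    observed = abs(mean(diffs))
    patterns = list(product((1, -1), repeat=len(diffs)))
    extreme = sum(
        1
        for signs in patterns
        if abs(mean([s * d for s, d in zip(signs, diffs)])) >= observed - 1e-12
    )
    return extreme / len(patterns)


# Hypothetical trials: 4 participants, 2 trials per condition each.
trials = [
    ("p1", "shown", 1), ("p1", "shown", 1), ("p1", "hidden", 1), ("p1", "hidden", 0),
    ("p2", "shown", 1), ("p2", "shown", 1), ("p2", "hidden", 0), ("p2", "hidden", 0),
    ("p3", "shown", 1), ("p3", "shown", 0), ("p3", "hidden", 0), ("p3", "hidden", 0),
    ("p4", "shown", 1), ("p4", "shown", 1), ("p4", "hidden", 1), ("p4", "hidden", 0),
]
diffs = accuracy_differences(trials)
p_value = sign_flip_p_value(list(diffs.values()))
```

With only four participants the smallest achievable two-sided p-value is 2/16, so a real study would need more participants or the full mixed model.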


Further Use Of ‘Online’ Observations

What you get

○ Ability to determine whether there are certain times at which certain features are beneficial

○ Ability to figure out how information is searched for


- Status: loved
- Department: top
- Color: purple

- Status: loved
- Department: dress
- Color: green

- Status: hated
- Department: pants
- Color: orange

- Status: ...
- Department: ...
- Color: ...

Start with a study to determine correlations


Multiple metrics possible

■ Overall trajectories (http://www.eyetracking-r.com/)

■ Saccade patterns

■ Fixation times and locations

Correlate with success

http://www.eyetracking-r.com/


correct ~ fixated_on_loves + fixated_on_color_matches + … + (1|participant_id) …

Factors predict success: Attention to features may promote better recommendations

Factors don’t predict success: Attention to features may make no difference to recommendations


correct ~ condition + … + (1|participant_id)

Follow up with a full experiment to determine whether the behavior actually causes better recommendations

Manipulation congruent with ‘positive’ behaviors


3: Shaping What They’re Processing


Controlled Lab Study: Stitch Fix’s “Styling Lab”

Full A/B Test: Full A/B Test in the live styling environment


Behavior Shaping : Humans :: Tuning : Computer Algorithms

Can be “in the moment”

● UX Changes

● Directed Attention

Can be more sustained

● Training


Change how the information is displayed - exploit human perception (consult UX)


Testing

● Create questions relevant to what you want to train

● Have participants complete them

● Use IRT to determine question difficulty

Training

● Order questions by difficulty

● Have those being trained complete them in that order

● Give feedback on performance along the way

● Reinforce key concepts

Experimental Approach!
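
The testing and training steps above can be sketched as follows. Note the swap: instead of fitting an IRT model, this uses a crude proxy for item difficulty, the smoothed log-odds of an incorrect answer, which tracks a Rasch-style difficulty only roughly. The question IDs and response data are hypothetical.

```python
import math


def difficulty_estimates(responses):
    """Crude per-question difficulty: log-odds of answering incorrectly.

    responses: dict mapping question_id -> list of 0/1 scores, one per
    participant. Adding 0.5 to the count correct and 1 to the total keeps
    the estimate finite when everyone (or no one) answers correctly.
    This is a stand-in for a fitted IRT difficulty parameter, not a
    replacement for one.
    """
    estimates = {}
    for question_id, answers in responses.items():
        p_correct = (sum(answers) + 0.5) / (len(answers) + 1)
        estimates[question_id] = math.log((1 - p_correct) / p_correct)
    return estimates


def training_order(responses):
    """Question IDs sorted from easiest to hardest."""
    estimates = difficulty_estimates(responses)
    return sorted(estimates, key=estimates.get)


# Hypothetical test results: 4 participants per question.
responses = {
    "q_occasion": [0, 0, 0, 1],
    "q_color": [1, 1, 1, 1],
    "q_fit": [1, 0, 1, 0],
}
difficulty = difficulty_estimates(responses)
order = training_order(responses)
```

Trainees would then work through `order` from first to last, receiving feedback after each question.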


This approach is grounded in Cognitive research!

Progressive Alignment prescribes giving people tasks that they’re more likely to succeed at, then progressively making those tasks harder



Questions?
