What Questions Are Worth Answering?
Ehren Reilly, Sr. Product Manager, Content, Ask.com
Sentiment Analysis Innovation Summit
San Francisco, CA
April 25, 2013
Overview
• Our challenge: What queries deserve an editorial answer?
• Our approach to cost-effectively figuring this out
• Advantages of our approach
• Details of our approach
Our Challenge
• Ask.com snapshot:
  o Q&A service combining the power of search, quality editorial content, and content from the web
  o Top 10 US internet site (according to comScore)
  o 100 million unique users globally; 70 million unique users in the US
• Search Q&A:
  o When you come to Ask.com and ask a question, we give you the answer to your question.
  o Ask.com editors create answers to questions that are asked frequently, based on search query data
• Problem: Not every query is suitable for evergreen, static editorial answers
Our Challenge
What type of information does the query deserve?
• Entities & services (people, things, websites, products, media, resources)
  o General web search, shopping search, Wikipedia data, tools, applications
• Dynamic data and frequently-changing facts (e.g., the weather)
  o Data partnerships
• Static, evergreen information, suitable for editorial expert answers
  o Writers, editors, crowd labor, etc. $$
• Extremely detailed/technical answers, which need a long article by a true expert
  o Writers, editors. $$$$
Our Challenge
What type of information does the query want?

Not answer requests:
• Facebook login
• Barack Obama
• Tickets Seattle to Miami
• Olympus Has Fallen
• Sonicare HX6711/02
• Selena Gomez photos
• Philippines map
• German Shepherds
• Chichen Itza timelapse
• Salary calculator

Wants dynamic data answer:
• What time is it in Bangkok
• Dollars to pounds
• SF Giants score
• Weather in Cleveland
• What's my IP address
• Kim Kardashian pregnant?
• NBA assists leader
• Where is Justin Bieber?
• Oldest living person
• Gay marriage states

Wants evergreen expert answer:
• When was the Ming Dynasty
• Tom Cruise baby name
• How long to bake chicken
• What is Renaissance art
• Highest alcohol beer
• Head gasket repair cost
• Abraham Lincoln's wife
• Parachute material
• Most reliable dishwashers
• How to remove hair dye
Our Challenge
[Diagram: Editorial Answerability Spectrum — Navigational, Dynamic Facts, Entities, Shopping, Evergreen Facts]
How do we pick out these answerable evergreen fact queries?
• Let the editors do it themselves?
  o Valuable editorial time wasted considering obvious stuff
  o For crowd editorial labor, conflict of interest creates an "OK" bias
• Crowd labor vetting?
  o Hard to communicate the task
  o Still very costly
• Template-based filters?
  o Coverage is too low
  o Lots of work to develop these
• Machine learning?
  o Very fuzzy problem
  o Target set is a small segment of a huge search space
  o Hard to achieve high accuracy
Our Hybrid Approach
1. Filter out the obvious stuff (e.g., "Facebook.com", "What time is it", "What does 'looking a gift horse in the mouth' mean?")
2. Dedicated classifiers to filter out specific types of non-suitable queries:
  • Duplicates & near-duplicates
  • Navigational
  • Adult / profane / creepy
  • Temporal / dynamic / timely
  • Shopping / product search
  • Wiki / entity exact match
3. Build machine learning "answerability" model for the tricky remaining cases
4. Where the model returns low confidence, send those queries to crowd labor for classification (a minimal sketch of this routing follows the list)
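One way to picture this four-stage routing is as a single dispatch function. The sketch below is illustrative only: the regex rules, the classifier interface, and the 0.3/0.8 confidence thresholds are assumptions, not Ask.com's actual system.

```python
import re

# Stage 1: common-sense rules for the obvious stuff.
OBVIOUS_RULES = [
    re.compile(r"facebook\.com"),           # navigational
    re.compile(r"^what time is it\b"),      # dynamic
    re.compile(r"^what does .+ mean\??$"),  # template-answerable
]

def route_query(query, non_suitable_classifiers, answerability_model,
                low=0.3, high=0.8):
    """Route a query to one of: 'rule_handled', 'not_suitable',
    'send_to_editorial', 'crowd_review'."""
    q = query.strip().lower()

    # 1. Filter out the obvious stuff with cheap rules.
    if any(rule.search(q) for rule in OBVIOUS_RULES):
        return "rule_handled"

    # 2. Dedicated classifiers for specific non-suitable types
    #    (duplicates, navigational, adult, temporal, shopping, entity match).
    if any(clf(q) for clf in non_suitable_classifiers):
        return "not_suitable"

    # 3. ML "answerability" model on the tricky remaining cases.
    p = answerability_model(q)  # probability of being evergreen-answerable
    if p >= high:
        return "send_to_editorial"
    if p <= low:
        return "not_suitable"

    # 4. Model is unsure: send to crowd labor for classification.
    return "crowd_review"
```

For example, `route_query("head gasket repair cost", [], lambda q: 0.9)` comes back as "send_to_editorial", while a query scoring inside the middle band falls through to crowd review.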
Advantages to This Approach
[Funnel diagram, built up over several slides: incoming queries are progressively split into "Evergreen Facts", "Don't Send to Editorial", and "Requires Human Review"]
Advantages to This Approach
• Filtering and partial automation first makes human review much less costly
  o Tasks requiring human scoring reduced by 97%
• Domain of the ML model is narrower than the entire query mix, which improves accuracy
• Making the model better over time
  o Human rating data becomes training data for the algorithm
  o Gradually, the algorithm gets better and you need fewer human ratings (sketched below)
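That feedback loop is essentially active learning. Below is a minimal sketch assuming a scikit-learn text classifier; the seed data, the 0.3–0.8 uncertainty band, and the `crowd_label` stub are hypothetical stand-ins, not the talk's actual stack.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative seed set: 1 = evergreen-answerable, 0 = not.
queries = ["when was the ming dynasty", "how long to bake chicken",
           "facebook login", "weather in cleveland"]
labels = [1, 1, 0, 0]

def crowd_label(query):
    """Stand-in for sending one query to crowd labor for a human rating."""
    return 0  # placeholder judgment

def retrain():
    """Refit the answerability model on all human-labeled data so far."""
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                          LogisticRegression(max_iter=1000))
    model.fit(queries, labels)
    return model

for batch in [["parachute material", "sf giants score"]]:
    model = retrain()
    for q in batch:
        # Probability the query is evergreen-answerable (class 1).
        p = model.predict_proba([q])[0][list(model.classes_).index(1)]
        if 0.3 < p < 0.8:  # model is unsure: spend a human rating
            queries.append(q)
            labels.append(crowd_label(q))
        # Confident cases are routed automatically, no rating spent.
```

As the labeled pool grows, fewer queries land in the uncertain band, so the share of traffic needing human review keeps shrinking.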
Human Rater Biases
• Two very different tasks:
  o Look for attribute X, which occurs in 1% of the data.
  o Look for attribute X, which occurs in 50% of the data.
• The harder you have to look for instances of X, the more things start to look like X.
  o Your sensitivity increases. You get trigger-happy.
Human Rater Biases
Thought experiment: "Listen for any naughty words or phrases."
  Corpus 1: Nationally televised sports color commentary
  Corpus 2: Gangster rap music
• Some words sound bad in the nationally televised sports context, but wouldn’t in the gangster rap context.
• Cognitive psychologists call this the Contrast Effect.
Human Rater Biases
• We gave two sets of crowdsource workers (same agency, same pay rate) the same data, mixed in with two different surrounding data sets:
  o Group A: Raw query file
  o Group B: Filtered with heuristics and templates first
• Of the queries that Group A thought were answerable, Group B thought only 64% were answerable
• Queries where the two groups disagreed were overwhelmingly false positives by Group A, rather than false negatives from Group B:
  o how you spell a word
  o how much does a book of stamps cost
  o is randy fenoli married
  o when does the alabama football game start
  o where to donate old magazines
By removing the noise, we improved the performance of our crowdsourcers.
What Crowdsource Writers Will and Won’t Do for You
• Don't rely on crowdsource workers to self-select which tasks are viable
  o "Only answer the answerable queries" (and we only pay you for what you answer)
  o Writers are biased towards everything being answerable
• Exception: If the task is too big, they are happy to flag those
  o How to repair a transmission
  o History of China
  o US senators all time
  o How does organic chemistry work
Easy Filters: Dynamic
Easy Filters: Dumb
What to Include in Training Data
• Some question patterns are almost universally answerable questions:
  o Who invented [NP]?
  o Where was [person] born?
  o How to [cooking verb] a [food item]
  o What does […] mean?
• We grab these queries using template filters, and don't need ML (a minimal sketch follows below)
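A sketch of what such template filters might look like in Python. The regexes and the cooking-verb list are illustrative assumptions, not the production templates.

```python
import re

COOKING_VERBS = r"(bake|roast|grill|boil|fry)"
TEMPLATES = [
    re.compile(r"^who invented .+"),                      # Who invented [NP]?
    re.compile(r"^where was .+ born\??$"),                # Where was [person] born?
    re.compile(rf"^how to {COOKING_VERBS} (an |a )?.+"),  # How to [cook] a [food]
    re.compile(r"^what does .+ mean\??$"),                # What does [...] mean?
]

def matches_easy_template(query):
    """True if the query fits a near-universally answerable pattern."""
    q = query.strip().lower()
    return any(t.match(q) for t in TEMPLATES)

assert matches_easy_template("Who invented the telephone")
assert matches_easy_template("How to bake a potato")
assert not matches_easy_template("Facebook login")
```

Queries caught this way can go straight to editorial without spending model predictions or human ratings on them.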
• Should we include these in our training data?
• This is an empirical question: does the algorithm perform better or worse if the "easy" data is included in the training data? (see the ablation sketch below)
• In this specific case, the model is more accurate when trained without the "easy" data
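One way to run that empirical test, assuming the same scikit-learn setup as the earlier sketch: train once with the template-matched "easy" queries included and once without, then compare held-out accuracy. All inputs here are assumed pre-labeled query lists.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

def train(queries, labels):
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                          LogisticRegression(max_iter=1000))
    model.fit(queries, labels)
    return model

def easy_data_ablation(hard_q, hard_y, easy_q, easy_y, test_q, test_y):
    """Return (accuracy with easy data, accuracy without easy data)."""
    with_easy = train(hard_q + easy_q, hard_y + easy_y)
    hard_only = train(hard_q, hard_y)
    return (accuracy_score(test_y, with_easy.predict(test_q)),
            accuracy_score(test_y, hard_only.predict(test_q)))
```

If the second number beats the first on a test set drawn from the hard, post-filter query mix, the "easy" data is diluting the model and should stay out, which is what happened in this case.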
Conclusions
• If you have a firehose of data, don't just:
  o Send it to crowdsourcers
  o Try to build an ML model
• Instead, figure out what the “easy” cases are, and deal with those separately, using common sense rules
• Put your crowdsourcing and machine learning efforts on just the hard part of the problem