WTC-2011 Accelerate - Guesstimates Engine · Guesstimates engine A computational engine for...


Guesstimates engine

A computational engine for educated guesses and rough estimations

Anton Antonov

Consultant at Synergis

Slide 1 of 21

What are guesstimates?

Approximate answers to quantitative queries for which we have incomplete or missing information or data.

A guesstimate is considered correct if it is within a factor of ten of the correct answer.
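The acceptance criterion can be written as a one-line check. This is a minimal sketch; the function name `is_correct_guesstimate` is hypothetical.

```python
def is_correct_guesstimate(guess: float, actual: float) -> bool:
    """True when `guess` is within a factor of ten of `actual`,
    i.e. actual / 10 <= guess <= actual * 10 (both must be positive)."""
    if guess <= 0 or actual <= 0:
        raise ValueError("guess and actual must both be positive")
    return actual / 10 <= guess <= actual * 10

# A guess of 300 for a true value of 100 counts as correct; 1,500 does not.
print(is_correct_guesstimate(300, 100), is_correct_guesstimate(1500, 100))
```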

Slide 2 of 21

Examples -- easy questions

What is the average life of a fly?

How many golf balls can fit in a school bus?

Slide 3 of 21

Examples -- harder questions

How many people can cover the state of Illinois?

How many batteries are needed to replace your car engine?

Slide 4 of 21

Examples -- hard questions

How much total extra time would Americans spend driving each year if we lowered the highway speed limit from 65 to 55 mph? Give your answer in lifetimes.

How much food (in kg) does a typical human female consume in her lifetime? How does that compare to her mass?

What is the kinetic energy of the Moon as it orbits the Earth?

Slide 5 of 21

Type and complexity classification

Slide 6 of 21


Sources of fuzziness

Slide 7 of 21

The general idea

We accept that the queries to the Guesstimate Computation Engine (GCE) are going to be fuzzy and imprecise.

We address the fuzziness by giving multiple answers: the user interprets them and picks the right one.

Of course, the system itself would figure out the obvious interpretations, using:

1. Natural Language Processing (NLP), and

2. Dimensional Analysis (DA).

Slide 8 of 21

The general idea -- concept relations

A dashed blue arrow means “the source is used to do the target”.


Slide 9 of 21

The general idea -- using two NLP strategies

A dashed blue arrow means “the source is used to do the target”. A purple arrow denotes an equivalent role.


Slide 10 of 21

High level design

The interconnections between the components of the Guesstimate Computation Engine are shown in this diagram:


Slide 11 of 21

NLP for guesstimating queries 1

The standard approach is to use Natural Language Processing (NLP) of the queries given to the Guesstimate Computation Engine (GCE).

We can use a context-free grammar specification for a reasonably large subset of the queries to the GCE.

Context-free grammars can be specified using Backus-Naur Form (BNF).

A simple context-free grammar example for NLP follows …

Slide 12 of 21

NLP for guesstimating queries 2

Consider the following (simple and restricted) grammar:

<q1> ::= "how many" <objects> <verb> <article> <object> "?"

<q2> ::= "what is" <definition> <preposition> <article> <object> <preposition> <article> <object> "?"

<article> ::= "a" | "an" | "the"

<object> ::= "bus" | "earth" | "car" | "human" | ...

<objects> ::= "golf balls" | "pickles" | "apples" | ...

<preposition> ::= "around" | "of" | "in" | "on"

<verb> ::= "can be put in" | "can cover" | "can be placed on" | ...
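Here is a minimal sketch of a top-down recognizer for the <q1> rule. It expands the rule from left to right, tries the longest alternative first, and does not backtrack. It assumes the listed alternatives are the whole vocabulary, since the grammar's "..." leaves them open. <q2> is left out because <definition> is not expanded in the grammar. The names `parse_q1`, `match_one_of`, and `Q1_RULE` are hypothetical.

```python
from typing import Optional

# Terminal alternatives copied from the grammar. The "..." in the grammar
# means the real lists are open-ended; these are only the listed examples.
ARTICLE = ["a", "an", "the"]
OBJECT = ["bus", "earth", "car", "human"]
OBJECTS = ["golf balls", "pickles", "apples"]
VERB = ["can be put in", "can cover", "can be placed on"]

# Right-hand side of <q1>, expanded top-down from left to right.
Q1_RULE = [
    ("start", ["how many"]),
    ("objects", OBJECTS),
    ("verb", VERB),
    ("article", ARTICLE),
    ("object", OBJECT),
    ("end", ["?"]),
]

def tokenize(query: str) -> list[str]:
    """Lower-case the query and split the question mark off as its own token."""
    return query.lower().replace("?", " ?").split()

def match_one_of(tokens: list[str], pos: int, alternatives: list[str]) -> Optional[tuple[str, int]]:
    """Try the (possibly multi-word) alternatives at `pos`, longest first.
    Returns (matched phrase, next position), or None if none matches."""
    for phrase in sorted(alternatives, key=lambda p: len(p.split()), reverse=True):
        words = phrase.split()
        if tokens[pos:pos + len(words)] == words:
            return phrase, pos + len(words)
    return None

def parse_q1(query: str) -> Optional[dict[str, str]]:
    """Return the words matched by each non-terminal of <q1>, or None
    if the query is not generated by <q1>."""
    tokens = tokenize(query)
    pos = 0
    matched: dict[str, str] = {}
    for symbol, alternatives in Q1_RULE:
        result = match_one_of(tokens, pos, alternatives)
        if result is None:
            return None
        matched[symbol], pos = result
    if pos != len(tokens):  # leftover words the rule does not cover
        return None
    return {k: v for k, v in matched.items() if k not in ("start", "end")}

print(parse_q1("How many golf balls can be put in a bus?"))
print(parse_q1("How many golf balls can be fit in a school bus?"))  # None: verb not in <verb>
```

The second query shows how rigid the grammar is: "can be fit in" is not a listed <verb>, and "school bus" is not a listed <object>, so the query is rejected.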

Slide 13 of 21

Which type of NLP parser?

We need to consider the following question:

Which type of NLP parser should be used: top-down or bottom-up?

Slide 14 of 21

Statistical NLP

The standard approach might be:

(1) too slow;

(2) too hard to develop.

Statistical NLP can be used to build a statistical thesaurus of related terms.

After stemming and stop-word removal, the possible interpretation equations would be derived between the different clusters of terms.
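Here is a toy sketch of these steps, with several assumptions. The corpus and the stop-word list are hypothetical. The stemmer is a crude suffix stripper, not the Porter algorithm. "Related" is taken to mean "co-occurring in at least two documents", and the clusters are the connected components of the resulting co-occurrence graph. A real thesaurus would need a large corpus and a proper association measure.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus; a real thesaurus would be built from a large one.
TOY_CORPUS = [
    "The golf ball landed near the hole",
    "Golf balls are dimpled",
    "He lost three golf balls",
    "The school bus was late",
    "School buses are yellow",
    "The bus stopped at the school",
]

# Small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "the", "at", "in", "of", "on", "is", "are", "was",
              "he", "how", "many", "can", "be", "would", "what"}

def stem(word: str) -> str:
    """Crude suffix stripping (a stand-in, NOT the Porter stemmer):
    buses -> bus, balls -> ball, while bus stays bus."""
    if word.endswith(("ses", "xes")):
        return word[:-2]
    if word.endswith("s") and not word.endswith(("ss", "us")):
        return word[:-1]
    return word

def terms(text: str) -> list[str]:
    """Lower-case and tokenize, remove stop words, then stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]

def build_thesaurus(corpus: list[str], min_count: int = 2) -> dict[str, str]:
    """Map each term to a cluster representative. Terms are linked when they
    co-occur in at least `min_count` documents; clusters are the connected
    components of that co-occurrence graph, found with union-find."""
    pair_counts: Counter = Counter()
    vocabulary: set[str] = set()
    for doc in corpus:
        doc_terms = sorted(set(terms(doc)))
        vocabulary.update(doc_terms)
        pair_counts.update(combinations(doc_terms, 2))

    parent = {t: t for t in vocabulary}

    def find(t: str) -> str:
        while parent[t] != t:
            t = parent[t]
        return t

    for (a, b), count in pair_counts.items():
        if count >= min_count:
            parent[find(a)] = find(b)
    return {t: find(t) for t in vocabulary}

def group_query(query: str, thesaurus: dict[str, str]) -> list[list[str]]:
    """Group the query's terms by thesaurus cluster; unknown terms stay alone."""
    groups: dict[str, list[str]] = {}
    for t in terms(query):
        groups.setdefault(thesaurus.get(t, t), []).append(t)
    return list(groups.values())

thesaurus = build_thesaurus(TOY_CORPUS)
print(group_query("How many golf balls can be fit in a school bus?", thesaurus))
```

On this toy corpus, the query reduces to the groups golf/ball, fit, and school/bus. Those groups are the clusters between which interpretation equations would then be derived.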

Slide 15 of 21


Statistical NLP 2

For the query

“How many golf balls can be fit in a school bus?”

we would know that the words “golf” and “ball” belong to the same set of words in the statistical thesaurus. The same holds for “school” and “bus”.

We don't even need to recognize (or parse) “how many” and “fit”.

We can simply try several ways of fitting the physical dimensions of a “golf ball” into those of a “school bus”.
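Here is a minimal sketch of what "several ways of fitting" could look like. It returns one answer per interpretation instead of a single number. The 42.67 mm golf ball diameter is the regulation minimum of 1.68 in. The bus interior dimensions are hypothetical round numbers. The 0.64 factor is the approximate random close packing density of equal spheres. The function name `fittings` is hypothetical.

```python
import math

# Assumed inputs, for illustration only:
GOLF_BALL_DIAMETER_M = 0.04267      # 1.68 in, the regulation minimum diameter
BUS_INTERIOR_M = (10.0, 2.3, 1.9)   # hypothetical usable interior: length, width, height

def fittings(ball_diameter: float, box: tuple[float, float, float]) -> dict[str, float]:
    """Several interpretations of "how many balls fit in a box"."""
    length, width, height = box
    box_volume = length * width * height
    ball_volume = math.pi / 6 * ball_diameter ** 3  # sphere volume: (pi/6) d^3
    return {
        # Volume ratio: ignores the gaps between spheres, so it overestimates.
        "volume ratio": box_volume / ball_volume,
        # Volume ratio scaled by ~0.64, the random close packing density of spheres.
        "random packing": 0.64 * box_volume / ball_volume,
        # Balls stacked in a cubic grid, counting only whole balls per dimension.
        "cubic grid": (math.floor(length / ball_diameter)
                       * math.floor(width / ball_diameter)
                       * math.floor(height / ball_diameter)),
    }

for name, count in fittings(GOLF_BALL_DIAMETER_M, BUS_INTERIOR_M).items():
    print(f"{name}: about {count:,.0f} golf balls")
```

With these assumptions, the answers range from roughly 0.55 million (cubic grid) to 1.07 million (volume ratio). That is well within a factor of ten, so any of them would count as a correct guesstimate if the assumed dimensions are reasonable.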

Slide 16 of 21

The use of dimensional analysis 1

By using dimensional analysis we can derive interpretations of the queries.

Consider the question “How many humans would cover the state of IL?”

The verb “cover” implies units of area (m²).

The state of IL has a known area (land area ≈ 143,800 km²).

A human has an average height, width, girth, etc.

From these data we can derive interpretations of the question:

1. How many human beings would cover the state of IL if they were lying in a supine position?

2. How many if they were standing?

3. How many if they were lying on their side?
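Here is a minimal sketch of how the three interpretations turn into three numbers. The land area comes from the slide. The body dimensions are hypothetical round numbers. The dimensional check is that area divided by area is dimensionless, so each result is a count. The names are illustrative.

```python
# Land area of Illinois from the slide, converted from km² to m².
ILLINOIS_LAND_AREA_M2 = 143_800 * 1_000_000

# Assumed average body dimensions in metres (hypothetical round numbers).
HEIGHT_M, SHOULDER_WIDTH_M, BODY_DEPTH_M = 1.7, 0.45, 0.25

# Each interpretation picks the two body dimensions that form the footprint (m²).
FOOTPRINTS_M2 = {
    "supine": HEIGHT_M * SHOULDER_WIDTH_M,
    "standing": SHOULDER_WIDTH_M * BODY_DEPTH_M,
    "on their side": HEIGHT_M * BODY_DEPTH_M,
}

def humans_to_cover(area_m2: float) -> dict[str, float]:
    """Covered area divided by footprint area: m² / m² is dimensionless, i.e. a count."""
    return {posture: area_m2 / footprint for posture, footprint in FOOTPRINTS_M2.items()}

for posture, count in humans_to_cover(ILLINOIS_LAND_AREA_M2).items():
    print(f"{posture}: {count:.1e} people")
```

Under these assumptions, the answers are roughly 1.9 × 10¹¹ (supine), 3.4 × 10¹¹ (on their side), and 1.3 × 10¹² (standing). The user would pick the interpretation that was meant.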

Slide 17 of 21

The use of dimensional analysis 2

Consider the question “How many golf balls would fit in a school bus?”

The verb “fit” implies units of volume (m³).

The question would have been easier than the previous one if all school buses had the same dimensions.

We need to determine:

1. the average length of a school bus, and

2. estimates of the school bus's height and width.

We might use a dialog with the user in order to get a length estimate.

The height and width can be estimated from databases of technical and ergonomic requirements for vehicles of that class (or, again, by using dialogs).
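Here is one way the dialog and the database could be combined, as a sketch. `VEHICLE_CLASS_DB`, its placeholder numbers, and `estimate_dimensions` are hypothetical. `ask` can be Python's built-in `input` or any function that returns the user's answer as a string.

```python
from typing import Callable

# Hypothetical stand-in for a database of technical and ergonomic requirements
# per vehicle class. The numbers are placeholder assumptions, not real data.
VEHICLE_CLASS_DB = {
    "school bus": {"width_m": 2.4, "interior_height_m": 1.9},
}

def estimate_dimensions(vehicle_class: str, ask: Callable[[str], str]) -> tuple[float, float, float]:
    """Return (length, width, height) in metres for a vehicle class.

    The length always comes from a dialog with the user via `ask`; the width
    and height come from VEHICLE_CLASS_DB, falling back to the dialog when the
    class is not in the database."""
    length = float(ask(f"Roughly how long is a {vehicle_class}, in metres? "))
    record = VEHICLE_CLASS_DB.get(vehicle_class)
    if record is not None:
        return length, record["width_m"], record["interior_height_m"]
    width = float(ask(f"Roughly how wide is a {vehicle_class}, in metres? "))
    height = float(ask("Roughly how high is its interior, in metres? "))
    return length, width, height
```

At an interactive prompt, `estimate_dimensions("school bus", input)` asks only for the length. For an unknown class, it asks for all three dimensions.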

Slide 18 of 21

Approximations using min and max

Examples:

What is the average length of a bus, in feet?


We can say 12 feet on average.

To come up with an average, we need to know the distribution of bus lengths.

What is the average mpg of an American car?

min mpg: 9

max mpg: 60

average mpg: $\sqrt{9 \times 60} \approx 23$

This is close to the government estimates.
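The same arithmetic can be done in a few lines. The sketch below compares the geometric mean of the min and max with the arithmetic mean. The helper name `geometric_mean_estimate` is hypothetical.

```python
import math

def geometric_mean_estimate(lo: float, hi: float) -> float:
    """Guesstimate of a typical value from only a minimum and a maximum."""
    return math.sqrt(lo * hi)

lo_mpg, hi_mpg = 9, 60  # min and max mpg from the slide
estimate = geometric_mean_estimate(lo_mpg, hi_mpg)
arithmetic = (lo_mpg + hi_mpg) / 2
print(f"geometric: {estimate:.1f} mpg, arithmetic: {arithmetic:.1f} mpg")
# prints "geometric: 23.2 mpg, arithmetic: 34.5 mpg"
```

The arithmetic mean, 34.5 mpg, is pulled toward the rare high-mileage end. The geometric mean stays near the value reported above.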

Slide 19 of 21

Approximations using min and max

The use of power laws -- real-life phenomena tend to follow exponential laws, and many distributions have exponents in them.

So, we are better off using the geometric mean.

If we don't know the distribution -- (1) estimate the min and max, and (2) take $\sqrt{\min \times \max}$ as the average.
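This rule fits well with the factor-of-ten criterion for a correct guesstimate. Here is a short supporting derivation, assuming the true value $x$ lies between the estimated min and max. Let $g$ be the geometric mean and $r$ the ratio of max to min:

$$
g = \sqrt{\min \times \max}, \qquad r = \frac{\max}{\min}
$$

$$
x \in [\min, \max] \implies \frac{1}{\sqrt{r}} = \frac{\min}{g} \le \frac{x}{g} \le \frac{\max}{g} = \sqrt{r}
$$

$$
\left|\log_{10}\frac{x}{g}\right| \le \tfrac{1}{2}\log_{10} r \le 1 \quad \text{whenever } r \le 100
$$

So if the min and max estimates differ by no more than a factor of 100, the geometric mean is within a factor of ten of any value between them. In the mpg example, $r = 60/9 \approx 6.7$, so $\sqrt{9 \times 60}$ is within a factor of about 2.6 of any mpg in that range.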

I would conjecture that when we can use a linear approximation for a quantity, the mean of that quantity can be found in the databases.

Slide 20 of 21

Summary

Roughly speaking, we can have a computational engine for guesstimates that would return several answers for a given query using:

1. Context-free grammar formulation or statistical thesaurus;

2. Combinatorial rules to produce different interpretations;

3. Linear space representation of the equations corresponding to the interpretations;

4. Approximations, dialogs, and databases.

Slide 21 of 21
