
1

Question Answering Techniques and Systems

Mihai Surdeanu (TALP), Marius Paşca (Google - Research)*

TALP Research Center, Dep. Llenguatges i Sistemes Informàtics

Universitat Politècnica de Catalunya, [email protected]

*The work by Marius Paşca (currently [email protected]) was performed as part of his PhD work at Southern Methodist University in Dallas, Texas.

2

Overview
- What is Question Answering?
- A "traditional" system
- Other relevant approaches
- Distributed Question Answering

3

Problem of Question Answering

What is the nationality of Pope John Paul II?
… stabilize the country with its help, the Catholic hierarchy stoutly held out for pluralism, in large part at the urging of Polish-born Pope John Paul II. When the Pope emphatically defended the Solidarity trade union during a 1987 tour of the…

When was the San Francisco fire?
… were driven over it. After the ceremonial tie was removed - it burned in the San Francisco fire of 1906 - historians believe an unknown Chinese worker probably drove the last steel spike into a wooden tie. If so, it was only…

Where is the Taj Mahal?
… list of more than 360 cities around the world includes the Great Reef in Australia, the Taj Mahal in India, Chartres Cathedral in France, and Serengeti National Park in Tanzania. The four sites Japan has listed include…

4

Problem of Question Answering

What is the nationality of Pope John Paul II?
… stabilize the country with its help, the Catholic hierarchy stoutly held out for pluralism, in large part at the urging of Polish-born Pope John Paul II. When the Pope emphatically defended the Solidarity trade union during a 1987 tour of the…

- Natural language question, not keyword queries
- Short text fragment, not URL list

5

Compare with…

Questions: Where is Naxos? What continent is Taormina in? What is the highest volcano in Europe?
Searching the document collection for: Naxos, Taormina, Etna

Document collection: From the Caledonian Star in the Mediterranean - September 23, 1990 (www.expeditions.com):

On a beautiful early morning the Caledonian Star approaches Naxos, situated on the east coast of Sicily. As we anchored and put the Zodiacs into the sea we enjoyed the great scenery. Under Mount Etna, the highest volcano in Europe, perches the fabulous town of Taormina. This is the goal for our morning. After a short Zodiac ride we embarked our buses with local guides and went up into the hills to reach the town of Taormina. Naxos was the first Greek settlement at Sicily. Soon a harbor was established but the town was later destroyed by invaders. [...]

6

Beyond Document Retrieval

Document Retrieval
- Users submit queries corresponding to their information needs.
- The system returns a (voluminous) list of full-length documents.
- It is the responsibility of the users to find the information of interest within the returned documents.

Open-Domain Question Answering (QA)
- Users ask questions in natural language: What is the highest volcano in Europe?
- The system returns a list of short answers: "… Under Mount Etna, the highest volcano in Europe, perches the fabulous town …"
- Often more useful for specific information needs.

7

Evaluating QA Systems

The National Institute of Standards and Technology (NIST) organizes the yearly Text REtrieval Conference (TREC), which has had a QA track for the past 5 years: from TREC-8 in 1999 to TREC-12 in 2003.

The document set
- Newswire text from the LA Times, San Jose Mercury News, Wall Street Journal, NY Times, etc.: over 1M documents now.
- Well formed lexically, syntactically and semantically (reviewed by professional editors).

The questions
- Hundreds of new questions every year; the total is close to 2000 over all TRECs.

Task
- Initially: extract at most 5 answers, long (250 bytes) and short (50 bytes).
- Now: extract only one exact answer.
- Several other sub-tasks added later: definition, list, context.

Metrics
- Mean Reciprocal Rank (MRR): each question is assigned the reciprocal rank of the first correct answer. If the correct answer is at position k, the score is 1/k.
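For concreteness, a minimal sketch of how MRR is computed over a question set (the ranks below are made-up illustrations, not TREC results):

```python
def mean_reciprocal_rank(first_correct_ranks):
    """first_correct_ranks: rank of the first correct answer for each question,
    or None if no correct answer was returned (contributes 0)."""
    scores = [1.0 / r if r is not None else 0.0 for r in first_correct_ranks]
    return sum(scores) / len(scores)

# Example: correct answers found at ranks 1, 3, (none), 2 for four questions.
print(mean_reciprocal_rank([1, 3, None, 2]))  # (1 + 1/3 + 0 + 1/2) / 4 ~= 0.458
```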

8

Overview
- What is Question Answering?
- A "traditional" system
  - SMU ranked first at TREC-8 and TREC-9
  - The foundation of LCC's PowerAnswer system (http://www.languagecomputer.com)
- Other relevant approaches
- Distributed Question Answering

9

QA Block Architecture

[Diagram] Q → Question Processing → Passage Retrieval → Answer Extraction → A
- Question Processing: captures the semantics of the question and selects keywords for passage retrieval (outputs: question semantics, keywords).
- Passage Retrieval: extracts and ranks passages using surface-text techniques; relies on Document Retrieval (output: passages).
- Answer Extraction: extracts and ranks answers using NL techniques.
- Shared resources: WordNet, NER, Parser.

10

Question Processing Flow

[Diagram] Q → Question parsing → Construction of the question representation → Answer type detection / Keyword selection
Outputs: question semantic representation, AT category, keywords.

11

Lexical Terms Examples

Questions are approximated by sets of unrelated words (lexical terms), similar to bag-of-words IR models.

Question (from the TREC QA track) → Lexical terms
- Q002: What was the monetary value of the Nobel Peace Prize in 1989? → monetary, value, Nobel, Peace, Prize
- Q003: What does the Peugeot company manufacture? → Peugeot, company, manufacture
- Q004: How much did Mercury spend on advertising in 1993? → Mercury, spend, advertising, 1993
- Q005: What is the name of the managing director of Apricot Computer? → name, managing, director, Apricot, Computer

12

Question Stems and Answer Type Examples

Identify the semantic category of expected answers.

Question → Question stem → Answer type
- Q555: What was the name of Titanic's captain? → What → Person
- Q654: What U.S. Government agency registers trademarks? → What → Organization
- Q162: What is the capital of Kosovo? → What → City
- Q661: How much does one ton of cement cost? → How much → Quantity

Other question stems: Who, Which, Name, How hot... Other answer types: Country, Number, Product...

13

Building the Question Representation

Built from the question parse tree via a bottom-up traversal with a set of propagation rules.

Q006: Why did David Koresh ask the FBI for a word processor?

Parse: Why/WRB did/VBD David/NNP Koresh/NNP ask/VB the/DT FBI/NNP for/IN a/DT word/NN processor/NN
Constituents: WHADVP, NP, NP, NP, PP, VP, SQ, SBARQ

Propagation rules (a toy sketch follows):
- assign labels to non-skip leaf nodes
- propagate the label of the head child node to the parent node
- link the head child node to the other children nodes

[published in COLING 2000]
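A toy sketch of the propagation idea. The head-child table and the skip list below are hypothetical simplifications for illustration, not the system's actual rules:

```python
# Toy illustration of the bottom-up propagation rules above.
HEAD_OF = {"SBARQ": "SQ", "SQ": "VP", "VP": "VB", "NP": "NN", "PP": "NP", "WHADVP": "WRB"}
SKIP = {"DT", "IN", "VBD"}   # determiners, prepositions, auxiliaries carry no label

def head(node, links):
    """Return the head word of `node`, recording (head, dependent) links."""
    if "word" in node:                                # leaf: {"word": ..., "pos": ...}
        return None if node["pos"] in SKIP else node["word"]
    want = HEAD_OF.get(node["label"])
    child_heads = [(c, head(c, links)) for c in node["children"]]
    # head child: the right-most child matching the table, else any child with a head
    h = next((ch for c, ch in reversed(child_heads)
              if ch and want in (c.get("label"), c.get("pos"))), None)
    if h is None:
        h = next((ch for _, ch in child_heads if ch), None)
    for _, ch in child_heads:                         # link the head to the other children
        if ch and ch != h:
            links.append((h, ch))
    return h

np = {"label": "NP", "children": [
    {"word": "a", "pos": "DT"},
    {"word": "word", "pos": "NN"},
    {"word": "processor", "pos": "NN"},
]}
links = []
print(head(np, links), links)   # processor [('processor', 'word')]
```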

14

Building the Question Representation

From the question parse tree, bottom-up traversal with a set of propagation rules.

Q006: Why did David Koresh ask the FBI for a word processor?

[Diagram] Resulting question representation: the head verb "ask" is linked to "David Koresh", "FBI" and "word processor"; the expected answer type REASON is attached to "ask".

15

Detecting the Expected Answer Type

In some cases, the question stem is sufficient to indicate the answer type (AT):
- Why → REASON
- When → DATE

In many cases, the question stem is ambiguous. Examples:
- What was the name of Titanic's captain?
- What U.S. Government agency registers trademarks?
- What is the capital of Kosovo?

Solution: select additional question concepts (AT words) that help disambiguate the expected answer type. Examples: captain, agency, capital.

16

AT Detection Algorithm

1. Select the answer type word from the question representation.
   - Select the word(s) connected to the question stem; some content-free words are skipped (e.g. "name").
   - From this set, select the word with the highest connectivity in the question representation.
2. Map the AT word into a previously built AT hierarchy.
   - The AT hierarchy is based on WordNet, with some concepts associated with semantic categories, e.g. "writer" → PERSON.
   - Select the AT(s) from the first hypernym(s) associated with a semantic category.
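A minimal sketch of the hypernym walk in step 2, using NLTK's WordNet interface. The mapping from WordNet synsets to semantic categories below is a hypothetical seed set, far smaller than the real AT hierarchy, and word-sense selection is ignored:

```python
# Requires the WordNet data: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

CATEGORY_SEEDS = {             # WordNet synset name -> answer type (illustrative)
    "person.n.01": "PERSON",
    "location.n.01": "LOCATION",
    "organization.n.01": "ORGANIZATION",
}

def answer_type(at_word):
    """Walk the hypernyms of the AT word until a seeded synset is reached."""
    frontier = wn.synsets(at_word, pos=wn.NOUN)
    seen = set()
    while frontier:
        syn = frontier.pop(0)
        if syn.name() in CATEGORY_SEEDS:
            return CATEGORY_SEEDS[syn.name()]
        if syn.name() not in seen:
            seen.add(syn.name())
            frontier.extend(syn.hypernyms())
    return None

print(answer_type("oceanographer"))   # PERSON (oceanographer -> ... -> person)
print(answer_type("capital"))         # depends on sense order; may need word-sense ranking
```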

17

Answer Type Hierarchy

[Diagram] Fragment of the PERSON subtree: scientist / man of science → researcher, oceanographer, chemist; inhabitant / dweller / denizen → American, islander / island-dweller, westerner; performer / performing artist → actor, actress, dancer → ballet dancer, tragedian.

Examples:
- What researcher discovered the vaccine against Hepatitis-B? AT word: researcher → PERSON
- What is the name of the French oceanographer who owned Calypso? AT word: oceanographer → PERSON

18

Evaluation of Answer Type Hierarchy

Controlled variation of the number of WordNet synsets included in the answer type hierarchy. Test on 800 TREC questions.

Hierarchy coverage | Precision score (50-byte answers)
0%  | 0.296
3%  | 0.404
10% | 0.437
25% | 0.451
50% | 0.461

The derivation of the answer type is the main source of unrecoverable errors in the QA system.

19

Keyword Selection

- The AT indicates what the question is looking for, but provides insufficient context to locate the answer in a very large document collection.
- Lexical terms (keywords) from the question, possibly expanded with lexical/semantic variations, provide the required context.

20

Keyword Selection Algorithm

1. Select all non-stop words in quotations
2. Select all NNP words in recognized named entities
3. Select all complex nominals with their adjectival modifiers
4. Select all other complex nominals
5. Select all nouns with adjectival modifiers
6. Select all other nouns
7. Select all verbs
8. Select the AT word (which was skipped in all previous steps)

(A simplified sketch of this ordering follows.)
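A simplified sketch of the priority ordering, assuming the question tokens arrive already annotated by upstream NER/parsing. The token annotations, rule predicates and example encoding are illustrative, not the system's actual data structures:

```python
def select_keywords(tokens, max_heuristic=6):
    """Apply heuristics 1..max_heuristic in order; return the union of hits."""
    heuristics = [
        lambda t: t.get("in_quotes") and not t.get("stopword"),        # 1
        lambda t: t["pos"] == "NNP" and t.get("in_named_entity"),      # 2
        lambda t: t.get("complex_nominal") and t.get("adj_modified"),  # 3
        lambda t: t.get("complex_nominal"),                            # 4
        lambda t: t["pos"].startswith("NN") and t.get("adj_modified"), # 5
        lambda t: t["pos"].startswith("NN"),                           # 6
        lambda t: t["pos"].startswith("VB"),                           # 7
        lambda t: t.get("is_at_word"),                                 # 8
    ]
    selected = []
    for i, rule in enumerate(heuristics[:max_heuristic], start=1):
        for t in tokens:
            if i < 8 and t.get("is_at_word"):
                continue                     # the AT word only enters at step 8
            if rule(t) and t["word"] not in selected:
                selected.append(t["word"])
    return selected

question = [  # "What researcher discovered the vaccine against Hepatitis-B?"
    {"word": "researcher", "pos": "NN", "is_at_word": True},
    {"word": "discovered", "pos": "VBD"},
    {"word": "vaccine", "pos": "NN"},
    {"word": "Hepatitis-B", "pos": "NNP", "in_named_entity": True},
]
print(select_keywords(question, max_heuristic=8))
# ['Hepatitis-B', 'vaccine', 'discovered', 'researcher']
```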

21

Keyword Selection Examples

- What researcher discovered the vaccine against Hepatitis-B? → Hepatitis-B, vaccine, discover, researcher
- What is the name of the French oceanographer who owned Calypso? → Calypso, French, own, oceanographer
- What U.S. government agency registers trademarks? → U.S., government, trademarks, register, agency
- What is the capital of Kosovo? → Kosovo, capital

22

Passage Retrieval

[Diagram] The QA block architecture (Question Processing → Passage Retrieval → Answer Extraction, supported by WordNet, NER, the parser and Document Retrieval), with Passage Retrieval highlighted: it takes the keywords and question semantics and extracts and ranks passages using surface-text techniques.

23

Passage Retrieval Architecture

[Diagram] Keywords → Document Retrieval → Documents → Passage Extraction → passage quality check: if not acceptable, Keyword Adjustment and another retrieval iteration; if acceptable, Passage Scoring → Passage Ordering → Ranked Passages.

24

Passage Extraction Loop

Passage Extraction Component
- Extracts passages that contain all selected keywords
- Passage size is dynamic
- Start position is dynamic

Passage quality and keyword adjustment
- In the first iteration, use the first 6 keyword selection heuristics
- If the number of passages is lower than a threshold, the query is too strict → drop a keyword
- If the number of passages is higher than a threshold, the query is too relaxed → add a keyword

(A schematic sketch of this loop follows.)
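A schematic sketch of the keyword-adjustment loop under assumed thresholds; MIN_PASSAGES, MAX_PASSAGES and the iteration cap are illustrative, and retrieve() stands in for the actual passage extraction component:

```python
MIN_PASSAGES, MAX_PASSAGES, MAX_ITER = 10, 500, 5   # hypothetical thresholds

def retrieval_loop(initial_keywords, all_keywords, retrieve):
    """`retrieve(keywords)` returns passages containing all given keywords.
    `all_keywords` is the full priority-ordered list from keyword selection;
    `initial_keywords` is its prefix produced by the first 6 heuristics."""
    active = list(initial_keywords)
    passages = []
    for _ in range(MAX_ITER):
        passages = retrieve(active)
        if len(passages) < MIN_PASSAGES and len(active) > 1:
            active.pop()                                 # too strict: drop the lowest-priority keyword
        elif len(passages) > MAX_PASSAGES and len(active) < len(all_keywords):
            active.append(all_keywords[len(active)])     # too relaxed: add the next keyword
        else:
            break
    return passages, active
```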

25

Passage Scoring (1/2)

Passages are scored based on keyword windows. For example, if a question has the set of keywords {k1, k2, k3, k4}, and in a passage k1 and k2 are matched twice each, k3 is matched once, and k4 is not matched, four windows are built, one for each combination of the k1 and k2 occurrences together with the single k3 occurrence.

26

Passage Scoring (2/2)

Passage ordering is performed using a radix sort over three scores: largest SameWordSequenceScore, largest DistanceScore, smallest MissingKeywordScore.

- SameWordSequenceScore: the number of words from the question that are recognized in the same sequence in the window
- DistanceScore: the number of words that separate the most distant keywords in the window
- MissingKeywordScore: the number of unmatched keywords in the window

(An equivalent tuple-sort sketch follows.)
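The three-key radix sort yields the same ordering as a lexicographic sort on a score tuple; a minimal sketch with illustrative field names and scores:

```python
def order_passages(windows):
    """Radix-sort equivalent: larger same-word-sequence and distance scores first,
    fewer missing keywords first."""
    return sorted(
        windows,
        key=lambda w: (-w["same_word_sequence"], -w["distance"], w["missing_keywords"]),
    )

windows = [
    {"id": 1, "same_word_sequence": 2, "distance": 5, "missing_keywords": 1},
    {"id": 2, "same_word_sequence": 3, "distance": 2, "missing_keywords": 0},
    {"id": 3, "same_word_sequence": 3, "distance": 4, "missing_keywords": 0},
]
print([w["id"] for w in order_passages(windows)])   # [3, 2, 1]
```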

27

Answer Extraction

[Diagram] The QA block architecture, with Answer Extraction highlighted: it takes the ranked passages and the question semantics and extracts and ranks answers using NL techniques.

28

Ranking Candidate Answers

Q066: Name the first private citizen to fly in space.
- Answer type: Person
- Text passage: "Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in "Raiders of the Lost Ark", plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith..."
- Best candidate answer: Christa McAuliffe

Answer ranking scheme → ranking features

29

Features for Answer Ranking

- relNMW: number of question terms matched in the answer passage
- relSP: number of question terms matched in the same phrase as the candidate answer
- relSS: number of question terms matched in the same sentence as the candidate answer
- relFP: flag set to 1 if the candidate answer is followed by a punctuation sign
- relOCTW: number of question terms matched, separated from the candidate answer by at most three words and one comma
- relSWS: number of terms occurring in the same order in the answer passage as in the question
- relDTW: average distance from the candidate answer to the question term matches

Robust heuristics that work on unrestricted text!

30

Answer Ranking based on Machine Learning

A relative relevance score is computed for each pair of candidates (answer windows):

relPAIR = wSWS * relSWS + wFP * relFP + wOCTW * relOCTW + wSP * relSP + wSS * relSS + wNMW * relNMW + wDTW * relDTW + threshold

If relPAIR is positive, the first candidate of the pair is more relevant. A perceptron model is used to learn the weights [published in SIGIR 2001].

Scores are in the 50% MRR range for short answers and in the 60% MRR range for long answers.
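A minimal sketch of learning these weights with a perceptron, interpreting each rel* term in relPAIR as the difference between the two candidates' feature values (an assumption); the training pairs, epoch count and learning rate are illustrative:

```python
FEATURES = ["relSWS", "relFP", "relOCTW", "relSP", "relSS", "relNMW", "relDTW"]

def rel_pair(weights, threshold, a, b):
    """relPAIR for candidates a, b: weighted sum of feature differences plus threshold."""
    return sum(weights[f] * (a[f] - b[f]) for f in FEATURES) + threshold

def train(pairs, epochs=20, lr=0.1):
    """pairs: list of (better, worse) feature dicts, where `better` is the known
    more relevant candidate; perceptron updates on misranked pairs."""
    weights = {f: 0.0 for f in FEATURES}
    threshold = 0.0
    for _ in range(epochs):
        for better, worse in pairs:
            if rel_pair(weights, threshold, better, worse) <= 0:   # misranked pair
                for f in FEATURES:
                    weights[f] += lr * (better[f] - worse[f])
                threshold += lr
    return weights, threshold
```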

31

Evaluation on the Web

Test on 350 questions from TREC (Q250-Q600), extracting 250-byte answers.

System | Precision score | Questions with a correct answer among top 5 returned answers
Google | 0.29 | 0.44
Answer extraction from Google | 0.44 | 0.57
AltaVista | 0.15 | 0.27
Answer extraction from AltaVista | 0.37 | 0.45

32

System Extension: Answer Justification

Experiments with Open-Domain Textual Question Answering. Sanda Harabagiu, Marius Paşca and Steve Maiorano.

Answer justification using unnamed relations extracted from the question representation and the answer representation (constructed through a similar process).

33

System Extension: Definition Questions

Definition questions ask about the definition or description of a concept:
- Who is John Galt?
- What is anorexia nervosa?

Many "information nuggets" are acceptable answers. Who is George W. Bush?
- … George W. Bush, the 43rd President of the United States…
- … George W. Bush defeated Democratic incumbent Ann Richards to become the 46th Governor of the State of Texas…

Scoring
- Any information nugget is acceptable
- Precision score over all information nuggets

34

Answer Detection with Pattern Matching

For Definition questions.

Question patterns:
- What <be> a <QP>?
- Who <be> <QP>? (example: "Who is Zebulon Pike?")

Answer patterns:
- <QP>, the <AP>
- <QP> (a <AP>)
- <AP HumanConcept> <QP> (example: "explorer Zebulon Pike")

Examples:
- Q386: What is anorexia nervosa? → "cause of anorexia nervosa, an eating disorder..."
- Q358: What is a meerkat? → "the meerkat, a type of mongoose, thrives in..."
- Q340: Who is Zebulon Pike? → "in 1806, explorer Zebulon Pike sighted the..."

(An illustrative regex sketch follows.)
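Two of the answer patterns can be rendered as regular expressions for illustration; the regexes below are hypothetical stand-ins and skip any semantic-class check on <AP>:

```python
import re

def appositive_answers(question_phrase, text):
    """Match '<QP>, the <AP>' and '<QP> (a <AP>)' style appositions."""
    qp = re.escape(question_phrase)
    patterns = [
        rf"{qp},\s+(?:the|a|an)\s+([\w\s-]+?)(?:[,.;]|$)",   # "<QP>, a <AP>"
        rf"{qp}\s+\(\s*(?:a|an)\s+([^)]+)\)",                # "<QP> (a <AP>)"
    ]
    hits = []
    for p in patterns:
        hits += re.findall(p, text, flags=re.IGNORECASE)
    return [h.strip() for h in hits]

print(appositive_answers("meerkat", "the meerkat, a type of mongoose, thrives in..."))
# ['type of mongoose']
```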

35

Answer Detection with Concept Expansion

Enhancement for Definition questions: identify terms that are semantically related to the phrase to define, using WordNet hypernyms (more general concepts).

Question | WordNet hypernym | Detected answer candidate
What is a shaman? | {priest, non-Christian priest} | "Mathews is the priest or shaman"
What is a nematode? | {worm} | "nematodes, tiny worms in soil"
What is anise? | {herb, herbaceous plant} | "anise, rhubarb and other herbs"

[published in AAAI Spring Symposium 2002]

36

Evaluation on Definition Questions

Determine the impact of answer type detection with pattern matching and concept expansion:
- test on the Definition questions from TREC-9 and TREC-10 (approx. 200 questions)
- extract 50-byte answers

Results
- precision score: 0.56
- questions with a correct answer among the top 5 returned answers: 0.67

37

References

- Marius Paşca. High-Performance, Open-Domain Question Answering from Large Text Collections. Ph.D. Thesis, Computer Science and Engineering Department, Southern Methodist University, Dallas, Texas. Defended September 2001.
- Marius Paşca. Open-Domain Question Answering from Large Text Collections. Center for the Study of Language and Information (CSLI Publications, series: Studies in Computational Linguistics), Stanford, California. Distributed by the University of Chicago Press. ISBN (Paperback): 1575864282, ISBN (Cloth): 1575864274. 2003.

38

Overview
- What is Question Answering?
- A "traditional" system
- Other relevant approaches
  - LCC's PowerAnswer + COGEX
  - IBM's PIQUANT
  - CMU's Javelin
  - ISI's TextMap
  - BBN's AQUA
- Distributed Question Answering

39

PowerAnswer + COGEX (1/2)

Automated reasoning for QA: checking that the answer logically entails the question (A → Q) using a logic prover. Facilitates both answer validation and answer extraction.

Both the question and the answer(s) are transformed into logic forms. Example:

Heavy selling of Standard & Poor's 500-stock index futures in Chicago relentlessly beat stocks downward.

Heavy_JJ(x1) & selling_NN(x1) & of_IN(x1,x6) & Standard_NN(x2) & &_CC(x13,x2,x3) & Poor(x3) & 's_POS(x6,x13) & 500-stock_JJ(x6) & index_NN(x4) & futures(x5) & nn_NNC(x6,x4,x5) & in_IN(x1,x8) & Chicago_NNP(x8) & relentlessly_RB(e12) & beat_VB(e12,x1,x9) & stocks_NN(x9) & downward_RB(e12)

40

PowerAnswer + COGEX (2/2)

World knowledge from:
- WordNet glosses converted to logic forms in the eXtended WordNet (XWN) project (http://www.utdallas.edu/~moldovan)
- Lexical chains, e.g. game:n#3 HYPERNYM recreation:n#1 HYPONYM sport:n#1; Argentine:a#1 GLOSS Argentina:n#1
- NLP axioms to handle complex NPs, coordinations, appositions, equivalence classes for prepositions, etc.
- Named-entity recognizer: John Galt → HUMAN

A relaxation mechanism is used to iteratively uncouple predicates and remove terms from the logic forms. The proofs are penalized based on the amount of relaxation involved.

41

IBM's Piquant

- Question processing is conceptually similar to SMU's, but a series of different strategies ("agents") is available for answer extraction. For each question type, multiple agents might run in parallel.
- A reasoning engine and a general-purpose ontology from Cyc are used as a sanity checker.
- Answer resolution: the remaining answers are normalized and a voting strategy is used to select the "correct" (meaning most redundant) answer.

42

Piquant QA Agents

- Predictive Annotation Agent: "predictive annotation" is the technique of indexing named entities and other NL constructs along with lexical terms (Lemur has built-in support for this now). A general-purpose agent, used for almost all question types.
- Statistical Query Agent: derived from a probabilistic IR model, also developed at IBM. Also general-purpose.
- Description Query Agent: generic descriptions (appositions, parenthetical expressions). Applied mostly to definition questions.
- Structured Knowledge Agent: answers from WordNet/Cyc. Applied whenever possible.
- Pattern-Based Agent: looks for specific syntactic patterns based on the question form. Applied when the answer is expected in a well-structured form.
- Dossier Agent: for "Who is X?" questions. A dynamic set of factual questions is used to learn "information nuggets" about persons.

43

Pattern-Based Agent

Motivation: some questions (with or without an AT) indicate that the answer might be in a structured form.

What does Knight Ridder publish? Transitive verb, missing object → Knight Ridder publishes X.

Patterns are generated:
- from a static pattern repository, e.g. recognition of birth and death dates
- dynamically from the question structure

Matching of the expected answer pattern against the actual answer text is not done at word level, but at a higher linguistic level based on full parse trees (see the IE lecture).

44

Dossier Agent

- Addresses "Who is X?" questions.
- Initially generates a series of generic questions: When was X born? What was X's profession?
- Future iterations are dynamically decided based on the previous answers. If X's profession is "writer", the next question is: What did X write?
- A static ontology of biographical questions is used.

45

Cyc Sanity Checker

A post-processing component that rejects insane answers:
- "How much does a grey wolf weigh?" → "300 tons"
- A grey wolf IS-A wolf; the weight of a wolf is known in Cyc.
- Cyc returns: SANE, INSANE, or DON'T KNOW.

Boosts answer confidence when the answer is SANE. Typically called for numerical answer types:
- What is the population of Maryland?
- How much does a grey wolf weigh?
- How high is Mt. Hood?

46

Answer Resolution

- Called when multiple agents are applied to the same question. Distribution of agents: the predictive-annotation and the statistical agents are by far the most common.
- Each agent provides a canonical answer (e.g. a normalized named entity) and a confidence score.
- The final confidence for each candidate answer is computed using an ML model with SVMs.

47

CMU's Javelin

- The architecture combines SMU's and IBM's approaches.
- Question processing is close to SMU's approach.
- The passage retrieval loop is conceptually similar to SMU's, but with an elegant implementation.
- Multiple answer strategies, similar to IBM's system. All of them are based on ML models (k-nearest neighbours, decision trees) that use shallow-text features (close to SMU's).
- Answer voting, similar to IBM's, is used to exploit answer redundancy.

48

Javelin’s Retrieval Strategist

- Implements passage retrieval, including the passage retrieval loop.
- Uses the InQuery IR system (probably Lemur by now).
- The retrieval loop initially requires all keywords in close proximity of each other (stricter than SMU). Subsequent iterations relax the following query terms:
  - proximity for all question keywords: 20, 100, 250, AND
  - phrase proximity for phrase operators: less than 3 words, or PHRASE
  - phrase proximity for named entities: less than 3 words, or PHRASE
  - inclusion/exclusion of the AT word
- Accuracy for TREC-11 queries (how many questions had at least one correct document in the top N documents):
  - top 30 docs: 80%
  - top 60 docs: 85%
  - top 120 docs: 86%

49

ISI’s TextMap: Pattern-Based QA

Examples:
- Who invented the cotton gin?
  - <who> invented the cotton gin
  - <who>'s invention of the cotton gin
  - <who> received a patent for the cotton gin
- How did Mahatma Gandhi die?
  - Mahatma Gandhi died <how>
  - Mahatma Gandhi drowned
  - <who> assassinated Mahatma Gandhi

Patterns are generated from the question form (similar to IBM), learned using a pattern discovery mechanism, or added manually to a pattern repository.

The pattern discovery mechanism performs a series of generalizations from annotated examples:
Babe Ruth was born in Baltimore, on February 6, 1895. → PERSON was born *g* in DATE

50

TextMap: QA as Machine Translation

- In machine translation, one collects translation pairs (s, d) and learns a model of how to transform the source s into the destination d.
- QA is redefined in a similar way: collect question-answer pairs (q, a) and learn a model that computes the probability that a question is generated from a given answer: p(q | parsetree(a)). The correct answer maximizes this probability.
- Only the subsets of the answer parse trees where the answer lies are used for training (not the whole sentence).
- An off-the-shelf machine translation package (GIZA) is used to train the model.

51

TextMap: Exploiting Data Redundancy

Additional knowledge resources are used whenever applicable:
- WordNet glosses (What is a meerkat?)
- www.acronymfinder.com (What is ARDA?)
- etc.

The "known" answers are then simply searched for in the document collection together with the question keywords.

- Google is used for answer redundancy: TREC and the Web (through Google) are searched in parallel.
- The final answer is selected using a maximum entropy ML model.
- IBM introduced redundancy across QA agents; ISI uses data redundancy.

52

BBN's AQUA

Factual system
- Converts both the question and the answer to a semantic form (close to SMU's).
- Machine learning is used to measure the similarity of the two representations.
- Was ranked best at the TREC definition pilot organized before TREC-12.

Definition system
- Conceptually close to SMU's.
- Has pronominal and nominal coreference resolution.
- Uses a (probably) better parser (Charniak).
- Post-ranking of candidate answers using a tf*idf model.

53

Overview
- What is Question Answering?
- A "traditional" system
- Other relevant approaches
- Distributed Question Answering

54

Sequential Q/A Architecture

[Diagram] Question → Question Processing → (keywords) → Paragraph Retrieval → (paragraphs) → Paragraph Scoring → Paragraph Ordering → (accepted paragraphs) → Answer Processing → Answers

55

Sequential Architecture Analysis

Module timing analysis:

Module | % of Task Time | Iterative Task? | Granularity
QP | 1.2% | No | -
PR | 26.5% | Yes | Collection
PS | 2.2% | Yes | Paragraph
PO | 0.1% | No | -
AP | 69.7% | Yes | Paragraph

Analysis conclusions:
- The performance-bottleneck modules have well-specified resource requirements → fit for dynamic load balancing (DLB)
- Iterative tasks are fit for partitioning
- Reduced inter-module communication → effective module migration/partitioning

56

Inter-Question Parallelism (1)

[Diagram] Questions arrive through the Internet/DNS and a local interconnection network; each node (1..N) runs a Q/A Task, a Question Dispatcher and a Load Monitor.

57

Inter-Question Parallelism (2)

Question dispatcher
- Improves upon the DNS "blind" allocation.
- Allocates a new question to the processor p best fit for the average question; processor p minimizes the combined load:
  load_QA(p) = W_CPU^QA * load_CPU(p) + W_DISK^QA * load_DISK(p)
- Recovers from failed questions.

Load monitor
- Updates and broadcasts the local load.
- Receives remote load information.
- Detects system configuration changes.
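A minimal sketch of the dispatcher's choice using the combined load measure above; the weights, load readings and node structure are illustrative:

```python
W_CPU, W_DISK = 0.7, 0.3     # hypothetical QA-specific weights

def qa_load(node):
    """Combined load: weighted sum of CPU and disk load."""
    return W_CPU * node["cpu_load"] + W_DISK * node["disk_load"]

def dispatch(question, nodes):
    """Send the question to the node minimizing the weighted QA load."""
    best = min(nodes, key=qa_load)
    best["queue"].append(question)
    return best["name"]

nodes = [
    {"name": "node1", "cpu_load": 0.8, "disk_load": 0.2, "queue": []},
    {"name": "node2", "cpu_load": 0.3, "disk_load": 0.6, "queue": []},
]
print(dispatch("Where is the Taj Mahal?", nodes))   # node2
```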

58

Intra-Question Parallelism (1)

[Diagram] Question Processing produces the keywords; a Paragraph Retrieval Dispatcher (with a Load Monitor) splits the work across k parallel Paragraph Retrieval + Paragraph Scoring instances; a Paragraph Merging step collects the resulting paragraphs.

59

Intra-Question Parallelism (2)

[Diagram] Paragraph Ordering produces the accepted paragraphs; an Answer Processing Dispatcher (with a Load Monitor) splits them across n parallel Answer Processing instances; Answer Merging collects the unranked answers and Answer Sorting produces the final answers.

60

Meta-Scheduling Algorithm

metaScheduler(task, loadFunction, underloadCondition):
  select all processors p with underloadCondition(p) true
  if none selected, select the processor p with the smallest value of loadFunction(p)   [migration]
  assign to each selected processor p a weight wp based on its current load
  assign to each selected processor p a fraction wp of the global task   [partitioning]

(A sketch follows.)
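A sketch of the meta-scheduler above; the spare-capacity normalization used to turn loads into weights is an assumed choice, and the loads are illustrative:

```python
def meta_scheduler(task_size, processors, load, underloaded):
    """Return {processor: share of task_size}. `load(p)` and `underloaded(p)`
    play the roles of loadFunction and underloadCondition."""
    selected = [p for p in processors if underloaded(p)]
    if not selected:                                   # migration case
        selected = [min(processors, key=load)]
    # weight each selected processor by its spare capacity (1 - load)
    spare = {p: max(1.0 - load(p), 1e-6) for p in selected}
    total = sum(spare.values())
    return {p: task_size * spare[p] / total for p in selected}

loads = {"p1": 0.9, "p2": 0.4, "p3": 0.2}
shares = meta_scheduler(1200, list(loads), loads.get, lambda p: loads[p] < 0.5)
print(shares)   # p2 and p3 split the 1200 paragraphs, p3 getting the larger share
```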

61

Migration Example

[Diagram] Timeline over processors P1, P2, ..., Pn: whole question-answering modules (QP, PR, PS, AP, PO) of a question are migrated to different processors as the load changes.

62

Partitioning Example

[Diagram] Timeline over processors P1, P2, ..., Pn: after QP, the PR, PS and AP stages are partitioned into PR1..PRn, PS1..PSn and AP1..APn running in parallel, followed by a single PO stage.

63

Inter-Question Parallelism: System Throughput

[Chart] Throughput (questions/minute, 0-14) versus the number of processors (4, 8, 12) for the DNS, INTER and DQA protocols.

64

Intra-Question Parallelism

[Chart] Question response time (seconds, 0-160) versus the number of processors (1, 4, 8, 12), broken down into QP, PR + PS, PO, AP and overhead.

65

End

Gràcies! (Thank you!)