Hybrid system architecture overview
![Page 1: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/1.jpg)
Overview of Hybrid Architecture in Project Halo
Jesse Wang, Peter Clark
March 18, 2013
![Page 2: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/2.jpg)
2
Status of Hybrid Architecture
Goals, Modularity, Dispatcher, Evaluation
![Page 3: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/3.jpg)
3
Hybrid System Near Term Goals
• Set up the infrastructure to communicate with existing reasoners
• Reliably dispatch questions and collect answers
• Create related tools and resources
  o Question generation/selection, answer evaluation, report analysis, etc.
• Experiment with ways to choose the answers from available reasoners, acting as a hybrid solver
[Diagram: Dispatcher with the AURA, CYC, and TEQA reasoners]
![Page 4: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/4.jpg)
4
Focus Areas of Hybrid Framework (until mid 2013)
• Modularity: loose coupling, high cohesion, data exchange protocols
• Dispatching: send requests and handle the responses
• Evaluation: ability to get ratings on answers, and report results
![Page 5: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/5.jpg)
5
Hybrid System Core Components
[Diagram labels: Filtered Set of Questions (in Campbell Chapt 7, Find-A-Value); Direct QA: AURA, CYC, TEQA, IR?; SQDB Retrieval: AURA SQs, CYC SQs, TEQA SQs, Hybrid SQs; Evaluation Report]
SQs: suggested questions
SQA: QA with suggested questions
TEQA: Textual Entailment QA
IR: Information Retrieval
Yellow Outline: New or Updated
![Page 6: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/6.jpg)
6
Infrastructure: Dispatchers
[Diagram: Dispatcher connected to AURA, CYC, TEQA, and IR; modes: Live Single QA, Suggested QA, Batch QA]
![Page 7: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/7.jpg)
7
Dispatcher Features
• Asynchronous batch mode and single/experiment mode
• Parallel dispatching to reasoners
  o Very functional UI: live progress indicator, view question file, logs
  o Exception and error handling
• Retry question when server is busy
• Batch service can continue to finish even if the client dies
  o Cancel/stop the batch process also available
• Input and output support both XML and CSV/TSV formats
  o Pipeline support: accept Question-Selector input
• Configurable dispatchers, select reasoners
  o Collect answers and compute basic statistics
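The parallel dispatch and retry-when-busy features can be illustrated with a minimal Python sketch. This is not the Project Halo dispatcher: the `ServerBusy` exception, the `ask` callables, and the stub reasoners are hypothetical stand-ins for whatever client each reasoner actually exposes.

```python
import concurrent.futures
import time


class ServerBusy(Exception):
    """Hypothetical error a reasoner client raises when its server is busy."""


def ask_with_retry(ask, question, retries=3, delay=0.0):
    """Call ask(question); on ServerBusy, retry up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            return ask(question)
        except ServerBusy:
            if attempt == retries:
                return None  # give up and record "no answer"
            time.sleep(delay)


def dispatch(question, reasoners, retries=3):
    """Send one question to every reasoner in parallel and collect answers."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(reasoners)) as pool:
        futures = {
            name: pool.submit(ask_with_retry, ask, question, retries)
            for name, ask in reasoners.items()
        }
        return {name: future.result() for name, future in futures.items()}


def make_stub(answer, busy_times):
    """Hypothetical reasoner that is busy `busy_times` times, then answers."""
    state = {"left": busy_times}

    def ask(question):
        if state["left"] > 0:
            state["left"] -= 1
            raise ServerBusy()
        return answer

    return ask


reasoners = {
    "AURA": make_stub("protein", 0),       # answers immediately
    "CYC": make_stub("polypeptide", 2),    # busy twice, then answers
    "TEQA": make_stub("unused", 10),       # stays busy past the retry limit
}
answers = dispatch("What does ribosome make?", reasoners)
```

Returning `None` for a reasoner that never responds keeps one slow server from failing the whole question, which fits the "collect answers and compute basic statistics" step.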
![Page 8: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/8.jpg)
8
Question-Answering via Suggested Questions
• Similar features to Live/Direct QA
• Aggregate suggested questions’ answers as a solver
• Unique features:
  o Interactively browse suggested questions database
  o Filter on certain facets
  o Using Q/A concepts, question types, etc. to improve relevance
  o Automatic comparison of filtered and non-filtered results by chapters
![Page 9: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/9.jpg)
9
Question and Answer Handling
• Handling and parsing reasoner’s returned results
  o Customized programming
• Information on execution: details and summary
• Report generation
  o Automatic evaluation
• Question Selector
  o Support multiple facets/filters
  o Question banks
  o Dynamic UI to pick questions
  o Hidden tags support
![Page 10: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/10.jpg)
10
Automatic Evaluation: Status as of 2013.3
• Automatic result evaluation features
• Web UI/service to use
• Algorithms to score exact and variable answers
– brevity/clarity
– relevance: correctness + completeness
– overall score
• Generate reports
– Summary & details
– Graph plot
• Improving evaluation result accuracy
• Using: basic text processing tricks (stop words, stemming, trigram similarity, etc.), location of answer, length of answer, bio concepts, counts of concepts, chapters referred, question types, answer type
• Experiments and analysis (several rounds, W.I.P.)
[Chart: User overall vs. AutoEval overall; axis 0 to 120]
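As an illustration of the "basic text processing tricks" named above, here is a small sketch of stop-word removal combined with character-trigram similarity. The stop-word list, the Jaccard formulation, and all function names are assumptions made for the example; the slides do not say how the actual scorer combines these signals.

```python
# Assumed, deliberately tiny stop-word list; a real one would be longer.
STOP_WORDS = {"a", "an", "the", "of", "is", "are", "to", "in", "and", "that"}


def normalize(text):
    """Lowercase, replace punctuation with spaces, and drop stop words."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return " ".join(w for w in cleaned.split() if w not in STOP_WORDS)


def trigrams(text):
    """Set of character trigrams of the normalized text."""
    s = normalize(text)
    return {s[i:i + 3] for i in range(len(s) - 2)}


def trigram_similarity(a, b):
    """Jaccard overlap of the two trigram sets, a value in [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


same = trigram_similarity("The ribosome", "ribosome")
close = trigram_similarity("makes proteins", "make protein")
unrelated = trigram_similarity("ribosome", "nucleus")
```

Character trigrams tolerate small inflection differences ("makes" vs. "make") without a stemmer, which is one reason they are a common cheap relevance signal.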
![Page 11: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/11.jpg)
11
Hybrid Performance
How we evaluate, and how we can improve overall system performance
![Page 12: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/12.jpg)
12
Caveats: Question Generation and Selection
• Generated by a small group of SMEs (senior biology students)
• In natural language, without textbook (only syllabus)
![Page 13: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/13.jpg)
13
Question Set Facets
[Chart: Chapter Distribution; chapters 0, 4, 5, 6, 7, 8, 9, 10, 11, 12; legend label: EV]
[Chart: Question Types]
• FIND-A-VALUE: 46%
• IS-IT-TRUE-THAT: 9%
• HAVE-RELATIONSHIP: 8%
• HOW: 7%
• PROPERTY: 6%
• WHY: 5%
• HOW-MANY: 5%
• WHERE: 5%
• WHAT-DOES-X-DO: 3%
• WHAT-IS-A: 3%
• HAVE-SIMILARITIES: 2%
• X-OR-Y: 2%
• FUNCTION-OF-X: 1%
• HAVE-DIFFERENCES: 1%
![Page 14: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/14.jpg)
14
Caveat: Evaluation Criteria
• We provided a clear guideline, but ratings are still subjective
  o A (4.0) = correct, complete answer, no major weakness
  o B (3.0) = correct, complete answer with small cosmetic issues
  o C (2.0) = partially correct or complete answers, with some big issues
  o D (1.0) = somewhat relevant answer or information, or poor presentation
  o F (0.0) = wrong or irrelevant, conflicting or hard-to-locate answers
• Only 3 users rated the answers, under a tight timeline
[Chart: User Preferences; axis 0 to 3 in steps of 0.5; tick labels 7, 15, 23; series: Aura, Cyc, Text QA]
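A short sketch of how the letter grades could be turned into one number per answer. Averaging across raters is an assumption (the 0.33 steps on the rating axes later in the deck are consistent with three raters being averaged, but the slides do not state the formula); the function names are hypothetical.

```python
# Grade points as defined in the guideline above.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def average_rating(grades):
    """Mean grade points over the raters who graded one answer (assumed)."""
    if not grades:
        raise ValueError("need at least one grade")
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)


def is_good(grades, threshold=3.0):
    """An answer counts as "good" when its average rating is >= threshold."""
    return average_rating(grades) >= threshold


rating = average_rating(["A", "B", "B"])   # (4 + 3 + 3) / 3
```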
![Page 15: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/15.jpg)
15
Evaluation Example
Q: What is the maximum number of different atoms a carbon atom can bind at once?
![Page 16: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/16.jpg)
16
More Evaluation Samples (Snapshot)
![Page 17: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/17.jpg)
17
Reasoner Quality Overview
[Chart: Answer Counts Over Rating; x-axis: rating from 0.00 to 4.00 in steps of 0.33; y-axis: 0 to 160; series: Aura, Cyc, Text QA]
![Page 18: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/18.jpg)
18
Performance Number
[Chart: Reasoner Performance on All Ratings (0..4); Precision, Recall, F1 for Aura, Cyc, Text QA; axis 0.000 to 0.600]
[Chart: Reasoner Performance on "Good" (>= 3.0) Answers; Precision, Recall, F1 for Aura, Cyc, Text QA; axis 0.000 to 0.400]
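The slides report precision, recall, and F1 per reasoner without giving the definitions used. A minimal sketch under one plausible reading: precision is the share of a reasoner's answers rated "good", and recall is the share of all questions that received a "good" answer. Both definitions and the function name are assumptions.

```python
def precision_recall_f1(ratings, total_questions, good=3.0):
    """Compute (precision, recall, F1) for one reasoner.

    ratings: average ratings of the questions the reasoner answered.
    total_questions: size of the whole question set, answered or not.
    """
    answered = len(ratings)
    hits = sum(1 for r in ratings if r >= good)
    precision = hits / answered if answered else 0.0
    recall = hits / total_questions if total_questions else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical reasoner: answered 4 of 10 questions, 2 of them rated good.
p, r, f1 = precision_recall_f1([4.0, 3.0, 2.0, 1.0], total_questions=10)
```

Under this reading a reasoner that answers few questions very well gets high precision but low recall, which is why F1 is the number compared across solvers.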
![Page 19: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/19.jpg)
19
Answers Over Question Types
[Chart: Answer Overall Rating by question type; axis 0.00 to 4.00; series: Text QA, Cyc, Aura]
[Chart: Count of Answered Questions by question type; axis 0 to 20, with one value labeled 36; series: Text QA, Cyc, Aura]
Question types shown in both charts: FIND-A-VALUE, HOW, HOW-MANY, PROPERTY, WHAT-DOES-X-DO, WHAT-IS-A, X-OR-Y, IS-IT-TRUE-THAT, HAVE-DIFFERENCES, HAVE-SIMILARITIES, HAVE-RELATIONSHIP
![Page 20: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/20.jpg)
20
Answer Distribution Over Chapters
[Chart: Answer Distribution Over Chapters; chapters 0, 4, 5, 6, 7, 8, 9, 10, 11, 12; series: Aura, Cyc, Text QA]
[Chart: Answer Quality Over Chapters; chapters 0, 4, 5, 6, 7, 8, 9, 10, 11, 12; axis 0.00 to 4.00; series: Aura, Cyc, Text QA]
![Page 21: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/21.jpg)
21
Answers on Questions with E/V Answer Type
[Chart: Exact/Various Answer Quality for Aura, Cyc, Text QA, and Average; axis 0.00 to 3.00; legend label: EV]
[Chart: Exact/Various Answer Count for Aura, Cyc, Text QA; axis 0 to 50; data labels: 5, 5, 45, 25, 13, 40; legend label: EV]
![Page 22: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/22.jpg)
22
Improve Performance: Hybrid Solver – Combine!
• Random selector (dumbest, baseline)
  o Total questions answered correctly should beat the best solver
• Priority selector (less dumb)
  o Pick reasoner following a good order (e.g. Aura > Cyc > Text QA) *
  o Expected performance: better than best individual
• Trained selector: feature and rule-based selector (smarter)
  o Decision-Tree (CTree…) learning over Q-Type, Chapter, …
  o Expected performance: slightly better than above
• Theoretical best selector: MAX – the upper limit (smartest)
  o Suppose we can always pick the best performing reasoner
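The random, priority, and MAX selectors can be sketched in a few lines of Python. The data layout (one dict of reasoner name to rating per question), the function names, and the sample ratings are hypothetical; the trained decision-tree selector is left out because the slides do not describe its features in enough detail.

```python
import random

PRIORITY = ["Aura", "Cyc", "Text QA"]  # the example order given above


def random_selector(ratings, rng=random):
    """Baseline: pick any reasoner that returned an answer."""
    return rng.choice(sorted(ratings)) if ratings else None


def priority_selector(ratings, order=PRIORITY):
    """Pick the first reasoner in the fixed order that returned an answer."""
    for name in order:
        if name in ratings:
            return name
    return None


def max_selector(ratings):
    """Oracle upper bound: pick the reasoner whose answer was rated best."""
    return max(ratings, key=ratings.get) if ratings else None


def count_good(questions, selector, good=3.0):
    """Number of questions where the selected answer is rated >= good."""
    hits = 0
    for ratings in questions:
        choice = selector(ratings)
        if choice is not None and ratings[choice] >= good:
            hits += 1
    return hits


# Hypothetical ratings: one dict per question, only for reasoners that answered.
questions = [
    {"Aura": 4.0, "Text QA": 2.0},
    {"Cyc": 1.0, "Text QA": 3.0},
    {"Text QA": 3.5},
    {},  # nobody answered
]
by_priority = count_good(questions, priority_selector)
by_max = count_good(questions, max_selector)
by_random = count_good(questions, random_selector)
```

The MAX selector needs the human ratings, so it is only usable after evaluation; it bounds what any real selector could achieve on the same answers.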
![Page 23: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/23.jpg)
24
Performance (F1) with Hybrid Solvers
[Chart: Performance of Solvers on Good Answers (Good: Rating >= 3.0); F1 for Aura, Cyc, Text QA, Random, Priority, D-Tree, Max; axis 0.000 to 0.300]
![Page 24: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/24.jpg)
25
Conclusion
• Each reasoner has its own strengths and weaknesses
  o Some aspects not handled well by AURA & CYC
  o Low-hanging fruit: IS-IT-TRUE-THAT for all, WHAT-IS-A for CYC, …
• Aggregated performance easily beats the best individual (Text QA)
  o Random solver does a good job (F1: mean=0.609): F1_MAX – F1_random ~ 2.5%
• Little room for better performance via answer selection
  o F1_MAX – F1_D-Tree ~ 0.5%
  o Better to focus on MORE and/or BETTER solvers
![Page 25: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/25.jpg)
26
Future and Discussions
![Page 26: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/26.jpg)
27
Near Future Plans
• Include SQDB-based answers as a “Solver”
  o Help alleviate question interpretation problems by reasoners
• Include Information Retrieval-based answers as a “Solver”
  o Help understand the extra power reasoners can have over search
• Improve evaluation mechanism
• Extract more features from questions and answers to enable a better solver, and see how close we can get to the upper limit (MAX)
• Improve question selector to support multiple sources and automatic update/merge of question metadata
• Find ways to handle question bank evolution
![Page 27: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/27.jpg)
28
Get More, Better Reasoners
Machine learning, evidence combination
• Extract and use more features to select best answers
• Evidence collection and weighing
Analytics & tuning
• Easier to explore individual results and diagnose failures
• Support to tune and optimize performance over target question-answer datasets
Inter-solver communication
• Support shared data, shared answers
• Subgoaling
• Allow reasoners to call each other for subgoals
Further Technical Directions (2013.6+)
Open Data
Open Services
Open Environment
![Page 28: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/28.jpg)
29
Open *Data*
Requirements
• Clear Semantics, Common Format (standard), Easy to Access, Persistent (available)
Data Sources
• Question bank, training sets, knowledge base, protocol for intermediate and final data exchange
Open Data Access Layer
• Design and implement protocols and services for data I/O
![Page 29: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/29.jpg)
30
Open *Services*
Two Categories
• Pure machine/algorithm based
• Human computation (social, crowdsourcing)
Requirements
• Communicate with open data, generate metadata
• More reliable, scalable, reusable
Goal: Process and refine data
• Convert raw, noisy, inaccurate data into refined, structured, useful data
![Page 30: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/30.jpg)
31
Open *Environment*
Definition
• AI development environment to facilitate collaboration, efficiency and scalability
Operation
• Like MMPOG, each “player” gets credits: contribution, resource consumption; interests, loans; ratings…
Opportunities
• Self-organized projects, growth potential, encourage collaboration, grand prize
![Page 31: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/31.jpg)
32
Thank You!
For having the opportunity for Q&A
Backup slides next
![Page 32: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/32.jpg)
33
IBM Watson’s “DeepQA” Hybrid Architecture
![Page 33: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/33.jpg)
34
DeepQA Answer Merging And Ranking Module
![Page 34: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/34.jpg)
35
Wolfram Alpha Hybrid Architecture
• Data Curation
• Computation
• Linguistic components
• Presentation
![Page 35: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/35.jpg)
36
![Page 36: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/36.jpg)
37
![Page 37: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/37.jpg)
38
Answer Distribution (Density)
[Chart: Answer Distribution; x-axis: Average User Rating, 0.00 to 4.00 in steps of 0.33; y-axis: Count of Answers, 0 to 16; series: Text QA, Cyc, Aura]
![Page 38: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/38.jpg)
39
Data Table for Answer Quality Distribution
![Page 39: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/39.jpg)
40
Work Performed
• Created web-based dispatcher infrastructure
  o For both Live Direct QA and Live Suggested Questions
  o Batch mode to handle larger amounts
• Built a web UI for UW students to rate answers of questions (HEF)
  o Coherent UI, duplicate removal, queued tasks
• Established automatic ways for result evaluation and comparison
• Applied first versions of file exchange format and protocols
• Employed initial file and data exchange formats and protocols
• Set up faceted browsing and search (retrieval) UI
  o And web services for 3rd party consumption
• Carried out many rounds of relevance studies and analysis
![Page 40: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/40.jpg)
41
First Evaluation via Halo Evaluation Framework
• We sent individual QA result set to UW students for evaluation
• First round hybrid system evaluation:
  o Cyc SQA: 9 best (3 ties), 14 good, 15/60 answered
  o Aura QA: 1 best, 9 good, 14/60 answered
  o Aura SQA: 4 best (3 ties), 7 good, 8/60 answered
  o Text QA: 27 best, 29 good; SQA: 3 best, 5 good, 7/60 answered
  o Best scenario: 41/60 answered
  o Note: Cyc Live was not included
  o * SQA (Answering via suggested questions)
![Page 41: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/41.jpg)
42
Live Direct QA Dispatcher Service
What does ribosome make?
[Screenshot labels: Ask a question; Waiting for answers; Answers returned?]
![Page 42: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/42.jpg)
43
Live Suggested QA Dispatcher Service
![Page 43: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/43.jpg)
44
Batch QA Dispatcher Service
Result automatically downloaded once finished
![Page 44: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/44.jpg)
45
Live solver Service Dispatchers
![Page 45: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/45.jpg)
46
Direct Live QA: What does ribosome make?
![Page 46: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/46.jpg)
47
Direct Live QA: What does ribosome make?
![Page 47: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/47.jpg)
48
Suggested Questions Dispatcher
![Page 48: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/48.jpg)
49
Results for Suggested Question Dispatcher
![Page 49: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/49.jpg)
50
Batch Mode QA Dispatcher
![Page 50: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/50.jpg)
51
Batch QA Progress Bar
Result automatically downloaded once finished
![Page 51: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/51.jpg)
52
Suggested questions database browser
![Page 52: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/52.jpg)
53
Faceted Search on Suggested Questions
![Page 53: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/53.jpg)
54
Tuning the Suggested Question Recommendation
Accomplished
• Indexed suggested questions database
– Concept, question, answers
• Created a web service for uploading new sets of suggested questions
• Extracted chapter information from answer text (TEXT)
• Analyzed question types
– Pattern-based
• Experimented with some basic retrieval criteria
Not Yet Implemented
• Parsing the questions
• More experiments (heuristics) on retrieval/ranking criteria
– manual
• Get SMEs to generate training data to evaluate
– Automatic
• More feature extraction
![Page 54: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/54.jpg)
55
Parsing, Indexing and Ranking
In-place
• New local concept extraction service
• Concept extracted and in index
• Both sentences and paragraphs are in index
• Basic sentence type identified
• Chapter and section information in
• Several ways of ranking evaluated
NYI
• More sentence features
– Content type: Questions, figures, header, regular, review…
– Previous and next concepts
– Count of concepts
– Clauses
– Universal truth
– Relevance or not
• Question parsing
• More refining on ranking
• Learning to Rank ??
![Page 55: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/55.jpg)
56
Browse Hybrid system
![Page 56: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/56.jpg)
57
WIP: Ranking Experiments (Ablation Study)

| Features | Only (Easy) | Without (Easy) | Only (Hard) | W/O (Hard) |
| --- | --- | --- | --- | --- |
| Sentence Text | 139/201 | | 31/146 | |
| Sentence Concept | 79/201 | | 13/146 | |
| Prev/Next Sentence Concept | - | | - | |
| Locality info (Chapter, etc.) | - | | - | |
| Stopword list | - | | - | |
| Stemming comparison | - | | - | |
| Other features (type…) | - | | - | |
| Weighting (variations) | | | | |
![Page 57: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/57.jpg)
58
Automatic Evaluation of IR Results
• Inexpensive, consistent results for tuning
  o Always using human judgments would be expensive and somewhat inconsistent
• Quick turnaround
• With both “easy” and “difficult” question-answer sets
• Validated by UW students to be trustworthy
  o 95% accuracy on average with threshold
![Page 58: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/58.jpg)
59
First UW Students’ Evaluation on AutoEval
• Notations:
  o 0 = right on. 100% is right, 0% is wrong.
  o -1 = false positive. It means we gave it a high score (>50%), but the retrieved text does NOT contain or imply the answer
  o +1 = false negative. It means we gave it a low score (<50%), but the retrieved text actually DOES contain or imply the answer
• We gave each of 4 students
  o 15 questions, 15*5=75 sentences and scores to rank
  o 5 of the questions are the same, 10 are unique to each student
  o 23/45 questions from “hard” set, 22/45 from “easy” set
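The 0 / -1 / +1 notation maps directly onto a small function. This sketch assumes each judged sentence is a pair of (automatic score in [0, 1], human judgment of whether the text contains or implies the answer); the slide does not say how a score of exactly 50% is treated, so here it counts as a low score. The sample pairs are made up.

```python
def judge(auto_score, contains_answer, threshold=0.5):
    """Return 0 (right on), -1 (false positive) or +1 (false negative)."""
    predicted = auto_score > threshold  # assumption: exactly-at-threshold is "low"
    if predicted and not contains_answer:
        return -1
    if not predicted and contains_answer:
        return 1
    return 0


def accuracy(pairs, threshold=0.5):
    """Share of (auto_score, contains_answer) pairs judged 0 (right on)."""
    codes = [judge(score, truth, threshold) for score, truth in pairs]
    return codes.count(0) / len(codes)


# Hypothetical judgments: (automatic score, human says text has the answer).
pairs = [(0.9, True), (0.7, False), (0.2, True), (0.1, False)]
codes_50 = [judge(s, t) for s, t in pairs]
acc_50 = accuracy(pairs, threshold=0.5)
acc_80 = accuracy(pairs, threshold=0.8)
```

Raising the threshold trades false positives for false negatives, which is presumably why the validation is reported at both 50% and 80%.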
![Page 59: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/59.jpg)
60
Results: Auto-Evaluation Validity Verification
[Chart: auto-evaluation validity for students 1 to 4; axis 0 to 1; series: Threshold at 50%, Threshold at 80%]
![Page 60: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/60.jpg)
61
The “Easy” QA set *
• Task: automatically evaluate whether retrieved sentences contain the answer
• Scoring: Max score, Mean Average Precision (MAP)
• Result using Max (with threshold at 80%):
  o 193 regular questions and 8 yes/no questions (via concept overlap)
• Only with sentence text: 139 (69.2%)
• Peter’s test set: 149 (74.1%)
• Peter’s more refined: 158 (78.6%)
• (Lower) Upper bound for IR: 170 (84.2%)
• Jesse’s best: ??
* The evaluation is for the IR portion ONLY, no answer pinpointing
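The two scoring modes can be sketched as follows. Average precision here is normalized by the number of relevant sentences actually retrieved, which is one common variant; whether the original evaluation used this variant, and whether the 80% threshold is inclusive, are assumptions. All names and sample values are hypothetical.

```python
def average_precision(relevant_flags):
    """AP of one ranked list; relevant_flags[i] is True if rank i+1 is relevant."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(relevant_flags, start=1):
        if relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """MAP: mean of the per-question average precision values."""
    return sum(average_precision(flags) for flags in ranked_lists) / len(ranked_lists)


def answered_by_max(score_lists, threshold=0.8):
    """Count questions whose best sentence score reaches the threshold."""
    return sum(1 for scores in score_lists if scores and max(scores) >= threshold)


ap = average_precision([True, False, True])               # (1/1 + 2/3) / 2
map_score = mean_average_precision([[True], [False, True]])
count = answered_by_max([[0.9, 0.1], [0.5], []])
```

Max scoring only asks whether any retrieved sentence is good enough, while MAP also rewards ranking the good sentences near the top.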
![Page 61: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/61.jpg)
62
“Easy” QA Set Auto-Evaluation
[Chart: "Easy" QA set results for Q text Only, Vulcan Basic, Vulcan Refined, BaseIR, Current, Upper Bound; axis 0 to 0.9]
![Page 62: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/62.jpg)
64
Best Upper Bound for Hard Set as of Today
With weighting on Answer Text, Answer Concepts, Question Text, Question Concepts, matching over Sentence Text, Concepts, and Concepts from Previous and Next Sentences, and sentence type…
Comparison with keyword overlap, concept overlap, stopwords removal and smart stemming techniques…
56/146 = 38.4%
![Page 63: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/63.jpg)
66
Sharing the Data and Knowledge
• Information We Want, and each solver may also want
• Everyone’s result
• Everyone’s confidence on results
• Everyone’s supporting evidence
  o From textbook sentences, reviews, homework section, figures…
  o From related web material, e.g. biology Wikipedia
  o From common world knowledge, ParaPara, WordNet, …
• Training data – for offline use
![Page 64: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/64.jpg)
67
More Timeline Details for First Integration
We are in control
• AURA
– Now
• Text
– before 12/7
• Vulcan IR Baseline
– before 12/15
• Initial Hybrid System Output
– Before 12/21
– Without unified data format
– With limited (possibly outdated) suggested questions
Partners
• Cyc
– ? Hopefully before EOY 2012
• JHU
– ?? Hopefully before EOY 2012
• ReVerb
– ??? EOM January 2013
![Page 65: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/65.jpg)
68
Rounds of Improvements
Analysis (evaluation)
• Evaluation with humans
• With each solver + hybrid system
![Page 66: Hybrid system architecture overview](https://reader038.fdocuments.us/reader038/viewer/2022102322/5483b7bdb4af9f42278b4782/html5/thumbnails/66.jpg)
69
OpenHalo
[Diagram labels: Vulcan Hybrid System; CYC QA; SILK QA; Other QA; TEQA; AURA; Data Service; Collaboration]