![Page 1: Im2013vit](https://reader034.fdocuments.us/reader034/viewer/2022042715/559e5b531a28ab35018b45b3/html5/thumbnails/1.jpg)
SKIMMR: Making Knowledge Discovery Easier
Vít Novácek ([email protected])
February 8th, 2013 @ DERI meeting
![Page 2: Im2013vit](https://reader034.fdocuments.us/reader034/viewer/2022042715/559e5b531a28ab35018b45b3/html5/thumbnails/2.jpg)
Introduction SKIMMR Demo Evaluation Conclusions
Outline
- Introduction
- SKIMMR
  - KB Computation
  - KB Utilisation
- Demo
- Evaluation
  - Evaluated Features
  - Evaluation Methodology
- Conclusions
![Page 3: Im2013vit](https://reader034.fdocuments.us/reader034/viewer/2022042715/559e5b531a28ab35018b45b3/html5/thumbnails/3.jpg)
Machine-Aided Skim Reading
Traditional (Skim) Reading
- full reading: deep insights (slow)
- skim reading: superficial overview (quicker)

How Can Automation Help?
- going deep is hard
- large-scale shallow processing is more feasible

What Kind of Automation?
- extraction (text and data mining)
- augmentation (computing more complex content)
- indexing and querying
- presentation of the results

Related Work
- processing: text mining, graph analysis, distributional semantics, fuzzy IR
- presentation: GoPubMed, Textpresso, IVEA, CORAAL, Exhibit, ...

Image source: http://a-pieceofpaper.blogspot.com
![Page 4: Im2013vit](https://reader034.fdocuments.us/reader034/viewer/2022042715/559e5b531a28ab35018b45b3/html5/thumbnails/4.jpg)
Input/Extraction Pipe-Lines
Text Extraction
- preprocessing (tokenization, tagging, shallow parsing)
- NE recognition
- relation extraction
- co-occurrence analysis + statistics (PMI, TF/IDF, ...)

Digesting Linked Data
- graph decomposition
- cluster analysis
- co-occurrence analysis + statistics (PMI, TF/IDF, ...)

Extraction Results
- (s, p, o, r, w) statements
- subject, predicate, object, provenance, weight

Image source: http://atyoursurveys.blogspot.com
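As a rough illustration of the co-occurrence step, the sketch below computes PMI over sentence-level co-occurrences and emits (s, p, o, r, w) tuples. It is only a sketch under stated assumptions: the sentence-level window, the `related_to` predicate name, the positive-PMI cut-off, and the function name `pmi_statements` are all hypothetical and not taken from SKIMMR itself.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_statements(docs):
    """Turn co-occurrences into (s, p, o, r, w) statements.

    docs maps a provenance id to a list of sentences; each sentence is a
    list of already-extracted terms. The predicate name and the PMI > 0
    cut-off are assumptions made for this sketch.
    """
    term_counts = Counter()
    pair_counts = Counter()
    pair_sources = {}
    n_sentences = 0
    for prov, sentences in docs.items():
        for sentence in sentences:
            n_sentences += 1
            terms = sorted(set(sentence))
            term_counts.update(terms)
            for a, b in combinations(terms, 2):
                pair_counts[(a, b)] += 1
                pair_sources.setdefault((a, b), set()).add(prov)

    statements = []
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n_sentences
        p_a = term_counts[a] / n_sentences
        p_b = term_counts[b] / n_sentences
        pmi = math.log2(p_ab / (p_a * p_b))
        if pmi > 0:  # keep only pairs that co-occur more often than chance
            for prov in sorted(pair_sources[(a, b)]):
                statements.append((a, "related_to", b, prov, pmi))
    return statements


docs = {
    "doc1": [["aspirin", "headache"], ["aspirin", "headache"]],
    "doc2": [["coffee", "sleep"], ["coffee", "headache"]],
}
statements = pmi_statements(docs)
```

Each resulting tuple keeps its provenance id, so a weight can always be traced back to the source it came from.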
![Page 5: Im2013vit](https://reader034.fdocuments.us/reader034/viewer/2022042715/559e5b531a28ab35018b45b3/html5/thumbnails/5.jpg)
Computing the Knowledge Base
Distributional Representation
- aggregated co-occurrence/relation statements
- statements → tensor representation
- every element still linked to its provenance
- matrix perspectives of the tensor

Augmentation
- perspectives give rise to emergent patterns like:
  - semantic similarity
  - concept clusters and taxonomies
  - IF-THEN rules
  - concept ordering and relative relevance

Image source: www.bystonline.org
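One way to picture the tensor and its matrix perspectives is the minimal sketch below. It assumes a sparse dictionary keyed by (subject, predicate, object), aggregation by summing weights, and cosine similarity over subject rows; none of these choices is confirmed by the slides, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def build_tensor(statements):
    """Aggregate (s, p, o, r, w) statements into a sparse 3-way tensor.

    Weights of repeated triples are summed (an assumption), and every
    tensor cell keeps the set of provenances it was derived from.
    """
    tensor = defaultdict(float)
    provenance = defaultdict(set)
    for s, p, o, r, w in statements:
        tensor[(s, p, o)] += w
        provenance[(s, p, o)].add(r)
    return dict(tensor), dict(provenance)


def subject_perspective(tensor):
    """Flatten the tensor into a matrix: rows are subjects, columns (p, o)."""
    rows = defaultdict(dict)
    for (s, p, o), w in tensor.items():
        rows[s][(p, o)] = w
    return dict(rows)


def cosine(u, v):
    """Cosine similarity between two sparse row vectors."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


statements = [
    ("aspirin", "treats", "headache", "d1", 0.5),
    ("aspirin", "treats", "headache", "d2", 0.5),
    ("ibuprofen", "treats", "headache", "d3", 1.0),
    ("coffee", "causes", "insomnia", "d4", 1.0),
]
tensor, provenance = build_tensor(statements)
rows = subject_perspective(tensor)
similarity = cosine(rows["aspirin"], rows["ibuprofen"])
```

Terms whose rows point at the same (predicate, object) columns come out as similar, which is the kind of emergent pattern the perspectives are meant to expose.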
![Page 6: Im2013vit](https://reader034.fdocuments.us/reader034/viewer/2022042715/559e5b531a28ab35018b45b3/html5/thumbnails/6.jpg)
Indexing the Knowledge Base
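Term Index (terms × terms)

$$
\begin{array}{c|cccc}
 & T_1 & T_2 & \cdots & T_n \\ \hline
T_1 & w_{1,1} & w_{1,2} & \cdots & w_{1,n} \\
T_2 & w_{2,1} & w_{2,2} & \cdots & w_{2,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
T_n & w_{n,1} & w_{n,2} & \cdots & w_{n,n}
\end{array}
\qquad w_{i,j} \in [0, 1]
$$

Statement Index (terms × statements)

$$
\begin{array}{c|cccc}
 & S_1 & S_2 & \cdots & S_m \\ \hline
T_1 & c_{1,1} & c_{1,2} & \cdots & c_{1,m} \\
T_2 & c_{2,1} & c_{2,2} & \cdots & c_{2,m} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
T_n & c_{n,1} & c_{n,2} & \cdots & c_{n,m}
\end{array}
\qquad c_{i,j} \in \{0, 1\}
$$

Provenance Index (statements × provenances)

$$
\begin{array}{c|cccc}
 & P_1 & P_2 & \cdots & P_q \\ \hline
S_1 & w_{1,1} & w_{1,2} & \cdots & w_{1,q} \\
S_2 & w_{2,1} & w_{2,2} & \cdots & w_{2,q} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
S_m & w_{m,1} & w_{m,2} & \cdots & w_{m,q}
\end{array}
\qquad w_{i,j} \in [0, 1]
$$

Auxiliary Fulltext Index
- user's entry point
- increasing robustness
- "keys": queries
- values: term identifiers
- fairly standard IR: OKAPI BM25F

Image source: http://teptdataservices.blogspot.com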
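The three indices can be pictured as sparse mappings, as in the sketch below. This is an illustration only: using the (s, p, o) triple as the statement identifier, making the term index symmetric, and keeping the maximum weight on repeats are assumptions of this sketch, not documented SKIMMR behaviour.

```python
def build_indices(statements):
    """Build sparse term, statement and provenance indices.

    statements is a list of (s, p, o, r, w) tuples with w in [0, 1].
    Absent entries stand for a zero weight (or c = 0 in the statement
    index). All aggregation choices here are assumptions.
    """
    term_index = {}        # term -> {related term: weight in [0, 1]}
    statement_index = {}   # term -> set of statement ids (c = 1 if present)
    provenance_index = {}  # statement id -> {provenance: weight in [0, 1]}
    for s, p, o, r, w in statements:
        sid = (s, p, o)
        # term x term weights, stored symmetrically
        for a, b in ((s, o), (o, s)):
            row = term_index.setdefault(a, {})
            row[b] = max(row.get(b, 0.0), w)
        # term x statement membership
        for term in (s, o):
            statement_index.setdefault(term, set()).add(sid)
        # statement x provenance weights
        sources = provenance_index.setdefault(sid, {})
        sources[r] = max(sources.get(r, 0.0), w)
    return term_index, statement_index, provenance_index


term_index, statement_index, provenance_index = build_indices([
    ("aspirin", "treats", "headache", "d1", 0.8),
    ("aspirin", "treats", "headache", "d2", 0.4),
    ("coffee", "causes", "insomnia", "d2", 0.6),
])
```

Sparse dictionaries are used instead of dense matrices because most term pairs never co-occur, so most cells would be zero.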
![Page 7: Im2013vit](https://reader034.fdocuments.us/reader034/viewer/2022042715/559e5b531a28ab35018b45b3/html5/thumbnails/7.jpg)
Querying the Knowledge Base
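Initial Result Term Set
- example query: $? \leftrightarrow T_x \;\text{AND}\; (? \leftrightarrow T_y \;\text{OR}\; ? \leftrightarrow T_z)$
- term index look-up:

$$
\begin{aligned}
F_x &= \{(T_1, w_{x,1}), (T_2, w_{x,2}), \dots, (T_n, w_{x,n})\} \\
F_y &= \{(T_1, w_{y,1}), (T_2, w_{y,2}), \dots, (T_n, w_{y,n})\} \\
F_z &= \{(T_1, w_{z,1}), (T_2, w_{z,2}), \dots, (T_n, w_{z,n})\}
\end{aligned}
$$

- combining atomic results: $F_x \cap (F_y \cup F_z)$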
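The combination of atomic results can be sketched with fuzzy set operations. The slides do not say which operators SKIMMR uses; the sketch below assumes the standard min/max (Zadeh) operators, represents each fuzzy set as a dictionary with absent terms meaning weight 0, and uses hypothetical function names.

```python
def fuzzy_and(f, g):
    """Fuzzy intersection: terms present in both sets, minimum weight."""
    return {t: min(w, g[t]) for t, w in f.items() if t in g}


def fuzzy_or(f, g):
    """Fuzzy union: terms present in either set, maximum weight."""
    out = dict(f)
    for t, w in g.items():
        out[t] = max(out.get(t, 0.0), w)
    return out


# Term-index rows for the three query terms (made-up weights)
F_x = {"T1": 0.9, "T2": 0.3, "T3": 0.5}
F_y = {"T1": 0.2, "T4": 0.7}
F_z = {"T1": 0.4, "T2": 0.6}

# ? <-> Tx AND (? <-> Ty OR ? <-> Tz)
result = fuzzy_and(F_x, fuzzy_or(F_y, F_z))
print(result)  # {'T1': 0.4, 'T2': 0.3}
```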
Complete Results
- terms: $R_T = \{(T_1, w^T_1), (T_2, w^T_2), \dots, (T_n, w^T_n)\}$, where $w^T_i$ are the weights resulting from the combination
- statements: $R_S = \{(S_1, w^S_1), (S_2, w^S_2), \dots, (S_m, w^S_m)\}$, where $w^S_i = f_\nu\left(\sum_{j=1}^{n} w^T_j \, c_{j,i}\right)$
- provenances: $R_P = \{(P_1, w^P_1), (P_2, w^P_2), \dots, (P_q, w^P_q)\}$, where $w^P_i = f_\nu\left(\sum_{j=1}^{m} w^S_j \, w_{j,i}\right)$

Image source: http://nuget.org
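The propagation of term weights to statements and then to provenances can be sketched as two weighted sums over the statement and provenance indices. The slides do not define f_ν; this sketch assumes it is a normalisation that divides by the largest score so results stay in [0, 1]. The sparse index layout and all function names are likewise assumptions.

```python
def normalise(scores):
    """Assumed f_nu: divide by the largest score so weights lie in [0, 1]."""
    top = max(scores.values(), default=0.0)
    if top <= 0:
        return {key: 0.0 for key in scores}
    return {key: value / top for key, value in scores.items()}


def rank_statements(term_weights, statement_index):
    """w^S_i = f_nu(sum_j w^T_j * c_{j,i}), with c stored as sets of ids."""
    scores = {}
    for term, w_t in term_weights.items():
        for sid in statement_index.get(term, ()):
            scores[sid] = scores.get(sid, 0.0) + w_t
    return normalise(scores)


def rank_provenances(statement_weights, provenance_index):
    """w^P_i = f_nu(sum_j w^S_j * w_{j,i}) over statement-provenance weights."""
    scores = {}
    for sid, w_s in statement_weights.items():
        for prov, w_sp in provenance_index.get(sid, {}).items():
            scores[prov] = scores.get(prov, 0.0) + w_s * w_sp
    return normalise(scores)


term_weights = {"T1": 0.4, "T2": 0.2}
statement_index = {"T1": {"S1", "S2"}, "T2": {"S1"}}
provenance_index = {"S1": {"P1": 1.0}, "S2": {"P1": 0.5, "P2": 1.0}}

statement_weights = rank_statements(term_weights, statement_index)
provenance_weights = rank_provenances(statement_weights, provenance_index)
```

In this toy input, S1 ranks highest because both result terms occur in it, and P1 ranks highest because it supports both statements.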
![Page 8: Im2013vit](https://reader034.fdocuments.us/reader034/viewer/2022042715/559e5b531a28ab35018b45b3/html5/thumbnails/8.jpg)
Let’s Learn About Some Grim Stuff!
![Page 9: Im2013vit](https://reader034.fdocuments.us/reader034/viewer/2022042715/559e5b531a28ab35018b45b3/html5/thumbnails/9.jpg)
What to Evaluate?
Quality of the Extracted/Computed Content
- "noise-to-signal" ratio
- relevance of results w.r.t. queries
- information value (obvious vs. enlightening)

User Experience
- usability of SKIMMR
  - general
  - domain-specific
- performance benefits (over a base-line)

Image source: http://voguepay.com
![Page 10: Im2013vit](https://reader034.fdocuments.us/reader034/viewer/2022042715/559e5b531a28ab35018b45b3/html5/thumbnails/10.jpg)
How to Evaluate?
Quality of the Extracted/Computed Content
- identification (or creation) of a gold standard
- generalised IR measures
- committee-based annotation of the results

User Experience
- SUS survey
- domain-specific survey
- user performance analysis (SKIMMR vs. base-line)

Image source: http://www.123rf.com
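For the SUS survey, the standard System Usability Scale scoring can be sketched as below: ten items answered on a 1 to 5 scale, odd-numbered items contribute (answer - 1), even-numbered items contribute (5 - answer), and the sum is multiplied by 2.5 to give a score between 0 and 100. The function name is hypothetical, and the slides do not say how the SUS results would be processed in this evaluation.

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten 1-5 answers."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    total = 0
    for item, answer in enumerate(responses, start=1):
        if item % 2 == 1:
            total += answer - 1   # positively worded items
        else:
            total += 5 - answer   # negatively worded items
    return total * 2.5


neutral = sus_score([3] * 10)
print(neutral)  # 50.0
```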
![Page 11: Im2013vit](https://reader034.fdocuments.us/reader034/viewer/2022042715/559e5b531a28ab35018b45b3/html5/thumbnails/11.jpg)
Conclusions and Future Work
Current Status
- machine-aided skim reading notion coined
- basic theoretical background proposed
- a prototype implemented (general and biomedical versions)
  - http://pypi.python.org/pypi/skimmr_gt/0.1-a1
  - http://pypi.python.org/pypi/skimmr_bm/0.1-a1

Next Steps
- evaluation (with a gold standard and sample users)
- dissemination and follow-ups (write-up, proposals)
- back-end extensions:
  - more (complex) types of relations
  - proper APIs (development, web service, ...)
  - database and/or cloud storage
- front-end extensions:
  - smoother transition between the graphs
  - complex querying
  - additional visualisations (trends, focused provenances, ...)

Image source: http://support.pacifichost.com