Domain-Specific Iterative Readability Computation Jin Zhao 13/05/2011.
Iterative Readability Computation for Domain-Specific Resources
![Page 1: Iterative Readability Computation for Domain-Specific Resources](https://reader036.fdocuments.us/reader036/viewer/2022081514/56816618550346895dd96705/html5/thumbnails/1.jpg)
Iterative Readability Computation for Domain-Specific Resources
• By Jin Zhao and Min-Yen Kan, 11/06/2010
![Page 2: Iterative Readability Computation for Domain-Specific Resources](https://reader036.fdocuments.us/reader036/viewer/2022081514/56816618550346895dd96705/html5/thumbnails/2.jpg)
Domain-Specific Resources
• Wikipedia page on modular arithmetic
• Interactivate page on clocks and modular arithmetic
• Domain-specific resources cater for a wide range of audiences.
WING, NUS
![Page 3: Iterative Readability Computation for Domain-Specific Resources](https://reader036.fdocuments.us/reader036/viewer/2022081514/56816618550346895dd96705/html5/thumbnails/3.jpg)
Challenge for a Domain-Specific Search Engine
How to measure readability for domain-specific resources?
![Page 4: Iterative Readability Computation for Domain-Specific Resources](https://reader036.fdocuments.us/reader036/viewer/2022081514/56816618550346895dd96705/html5/thumbnails/4.jpg)
Literature Review
• Heuristic Readability Measures
– Weighted sum of textual feature values
– Examples: Flesch Kincaid Reading Ease, Dale-Chall
– Quick and indicative but oversimplifying
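To make "weighted sum of textual feature values" concrete, here is a minimal Python sketch of the standard Flesch Reading Ease formula, 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). It assumes the slide's "Flesch Kincaid Reading Ease" (FKRE) refers to this formula. The syllable counter is a crude vowel-group heuristic chosen for illustration, and the function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: number of vowel groups (a heuristic, not a dictionary lookup)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease; higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


easy = "The cat sat on the mat. It was a big cat."
hard = "Modular arithmetic generalizes congruence relations over integers."
print(flesch_reading_ease(easy), flesch_reading_ease(hard))
```

The score depends only on sentence length and syllable counts, so it cannot tell a page about the Pythagorean theorem from a page about ring theory when both use similar sentence structure. That is the oversimplification the slide points to.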
![Page 5: Iterative Readability Computation for Domain-Specific Resources](https://reader036.fdocuments.us/reader036/viewer/2022081514/56816618550346895dd96705/html5/thumbnails/5.jpg)
Literature Review
• Natural Language Processing and Machine Learning Approaches
– Extract deep text features and construct sophisticated models for prediction
– Text features: n-grams, height of parse tree, discourse relations
– Models: language model, Naïve Bayes, Support Vector Machine
– More accurate, but require an annotated corpus and ignore domain-specific concepts
![Page 6: Iterative Readability Computation for Domain-Specific Resources](https://reader036.fdocuments.us/reader036/viewer/2022081514/56816618550346895dd96705/html5/thumbnails/6.jpg)
Literature Review
• Domain-Specific Readability Measures
– Derive information about domain-specific concepts from expert knowledge sources
– Examples: wordlist, ontology
– Also improve performance, but knowledge sources are still expensive and not always available
Is it possible to measure readability for domain-specific resources without an expensive corpus or knowledge source?
![Page 7: Iterative Readability Computation for Domain-Specific Resources](https://reader036.fdocuments.us/reader036/viewer/2022081514/56816618550346895dd96705/html5/thumbnails/7.jpg)
Intuitions
• A domain-specific resource is less readable than another if the former contains more difficult concepts
• A domain-specific concept is more difficult than another if the former appears in less readable resources
• Use an iterative computation algorithm to estimate these two scores from each other
• Example:
– Pythagorean theorem vs. ring theory
![Page 8: Iterative Readability Computation for Domain-Specific Resources](https://reader036.fdocuments.us/reader036/viewer/2022081514/56816618550346895dd96705/html5/thumbnails/8.jpg)
Algorithm
• Required Input
– A collection of domain-specific resources (w/o annotation)
– A list of domain-specific concepts
• Graph Construction
– Construct a graph representing resources, concepts and occurrence information
• Score Computation
– Initialize and iteratively compute the readability score of domain-specific resources and the difficulty score of domain-specific concepts
![Page 9: Iterative Readability Computation for Domain-Specific Resources](https://reader036.fdocuments.us/reader036/viewer/2022081514/56816618550346895dd96705/html5/thumbnails/9.jpg)
Graph Construction
• Preprocessing
– Extraction of occurrence information
• Construction steps
– Resource node creation
– Concept node creation
– Edge creation based on occurrence information
Example:
Resource 1: Pythagorean Theorem……triangle… …sine……tangent…
Resource 2: trigonometry...sine… …tangent……triangle…
Concept List: Pythagorean Theorem, tangent, triangle, trigonometry, sine
[Figure: graph linking the resource nodes Resource 1 and Resource 2 to the concept nodes Pythagorean Theorem, triangle, sine, tangent and trigonometry that occur in them]
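The construction steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation. Occurrence extraction is reduced to case-insensitive substring matching, the function name `build_graph` is hypothetical, and the resource texts are stand-ins for the elided example texts on the slide.

```python
from __future__ import annotations


def build_graph(resources: dict[str, str], concepts: list[str]) -> dict[str, set[str]]:
    """Return an undirected adjacency map over resource and concept nodes."""
    graph: dict[str, set[str]] = {name: set() for name in resources}  # resource nodes
    graph.update({concept: set() for concept in concepts})            # concept nodes
    for name, text in resources.items():
        lowered = text.lower()
        for concept in concepts:
            # Assumption: a concept "occurs" if its name is a substring of the text.
            if concept.lower() in lowered:
                graph[name].add(concept)
                graph[concept].add(name)
    return graph


resources = {
    "Resource 1": "Pythagorean Theorem ... triangle ... sine ... tangent",
    "Resource 2": "trigonometry ... sine ... tangent ... triangle",
}
concepts = ["Pythagorean Theorem", "tangent", "triangle", "trigonometry", "sine"]
graph = build_graph(resources, concepts)
print(sorted(graph["Resource 1"]))
```

With these inputs, "sine", "tangent" and "triangle" are linked to both resources, while "Pythagorean Theorem" and "trigonometry" are each linked to one.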
![Page 10: Iterative Readability Computation for Domain-Specific Resources](https://reader036.fdocuments.us/reader036/viewer/2022081514/56816618550346895dd96705/html5/thumbnails/10.jpg)
Score Computation
• Initialization
– Resource Node (FKRE)
– Concept Node (Average score of neighboring nodes)
• Iterative Computation
– All nodes (Current score + average score of neighboring nodes)
• Termination Condition
– The ranking of the resources stabilizes
Example graph: resource nodes w, x, y, z; concept nodes a, b, c

| | w | x | y | z | a | b | c |
|---|---|---|---|---|---|---|---|
| Initialization | 1 | 3 | 3 | 5 | 2 | 3 | 4 |
| Iteration 1 | 3 | 5.5 | 6.5 | 9 | 4 | 6 | 8 |
| Iteration 2 | 7 | 10.5 | 13.5 | 17 | 8.25 | 12 | 15.75 |
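The update rule can be checked against the example numbers with a short Python sketch. Two things here are inferred, not stated on the slide: the edge list (w-a, x-a, x-b, y-b, y-c, z-c), which is the set of links that reproduces the table, and the choice to update all nodes at once from the previous iteration's scores. The function names are hypothetical, and every node is assumed to have at least one neighbor.

```python
def average(values):
    values = list(values)
    return sum(values) / len(values)


def initialize(neighbors, resource_scores):
    """Resources keep their given score; concepts start at the average of their neighbors."""
    scores = dict(resource_scores)
    for node, linked in neighbors.items():
        if node not in scores:
            scores[node] = average(resource_scores[r] for r in linked)
    return scores


def step(neighbors, scores):
    """New score = current score + average score of neighboring nodes (all nodes at once)."""
    return {n: scores[n] + average(scores[m] for m in linked)
            for n, linked in neighbors.items()}


def resource_ranking(scores, resources):
    return sorted(resources, key=lambda r: scores[r])


def run_until_stable(neighbors, resource_scores, max_iterations=100):
    """Iterate until the ranking of the resources stops changing."""
    scores = initialize(neighbors, resource_scores)
    ranking = resource_ranking(scores, resource_scores)
    for _ in range(max_iterations):
        scores = step(neighbors, scores)
        new_ranking = resource_ranking(scores, resource_scores)
        if new_ranking == ranking:
            break
        ranking = new_ranking
    return scores


# Edges inferred from the slide's table, not given explicitly in the source.
edges = [("w", "a"), ("x", "a"), ("x", "b"), ("y", "b"), ("y", "c"), ("z", "c")]
neighbors = {}
for resource, concept in edges:
    neighbors.setdefault(resource, []).append(concept)
    neighbors.setdefault(concept, []).append(resource)

initial = initialize(neighbors, {"w": 1, "x": 3, "y": 3, "z": 5})
first = step(neighbors, initial)
second = step(neighbors, first)
print(initial, first, second, sep="\n")
```

The scores grow without bound under this rule. That does not matter for the termination condition, which only looks at the relative order of the resources.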
![Page 11: Iterative Readability Computation for Domain-Specific Resources](https://reader036.fdocuments.us/reader036/viewer/2022081514/56816618550346895dd96705/html5/thumbnails/11.jpg)
Evaluation
• Goals
– Effectiveness: iterative computation vs. other readability measures in the math domain
– Efficiency: iterative computation with selection of domain-specific resources and concepts in the math domain
– Portability: iterative computation vs. other readability measures in the medical domain
![Page 12: Iterative Readability Computation for Domain-Specific Resources](https://reader036.fdocuments.us/reader036/viewer/2022081514/56816618550346895dd96705/html5/thumbnails/12.jpg)
Effectiveness Experiment
• Corpus
– Collection: 27 math concepts, first 100 search results from Google
– Annotation: 120 randomly chosen webpages, annotated by the first author and 30 undergraduate students using a 7-point readability scale
– Kappa: 0.71, Spearman’s rho: 0.93

| Value | Education Background |
|---|---|
| 1 | Primary |
| 2 | Lower Secondary |
| 3 | Higher Secondary |
| 4 | Junior College (Basic) |
| 5 | Junior College (Advanced) |
| 6 | University (Basic) |
| 7 | University (Advanced) |
![Page 13: Iterative Readability Computation for Domain-Specific Resources](https://reader036.fdocuments.us/reader036/viewer/2022081514/56816618550346895dd96705/html5/thumbnails/13.jpg)
Effectiveness Experiment
• Baseline:
– Heuristic: FKRE
– Supervised learning: Naïve Bayes, Support Vector Machine, Maximum Entropy (binary word features only)
• Metrics:
– Pairwise accuracy
– Spearman’s rho
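The two metrics can be sketched in plain Python. The slides do not define pairwise accuracy, so the definition used here is an assumption: the fraction of resource pairs with different gold labels whose predicted order matches the gold order. Spearman's rho is computed as the Pearson correlation of average ranks. Both functions assume the predicted scores are oriented the same way as the gold scale (higher means harder to read), and the function names are hypothetical.

```python
from itertools import combinations


def pairwise_accuracy(gold, predicted):
    """Share of pairs (with different gold labels) that the prediction orders correctly."""
    agree = total = 0
    for i, j in combinations(range(len(gold)), 2):
        if gold[i] == gold[j]:
            continue  # a tied gold pair has no correct order to check
        total += 1
        if (gold[i] - gold[j]) * (predicted[i] - predicted[j]) > 0:
            agree += 1
    return agree / total if total else 0.0


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in order[start:end + 1]:
            ranks[k] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation of the rank vectors; 0.0 if either input is constant."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


gold = [1, 2, 3, 4, 5]
predicted = [0.2, 0.9, 0.4, 1.5, 2.0]
print(pairwise_accuracy(gold, predicted), spearman_rho(gold, predicted))
```

Pairwise accuracy rewards getting the relative order right for any two pages, which matches how a search engine would use readability to re-rank results. Spearman's rho summarizes agreement over the whole ranking.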
![Page 14: Iterative Readability Computation for Domain-Specific Resources](https://reader036.fdocuments.us/reader036/viewer/2022081514/56816618550346895dd96705/html5/thumbnails/14.jpg)
Effectiveness Experiment
• Results
– FKRE and NB show modest correlation
– SVM and Maxent perform significantly better
– Best performance is achieved by iterative computation

| Method | Pairwise | Spearman |
|---|---|---|
| FKRE | .72 | .48 |
| NB | .72 | .52 |
| SVM | .80 | .70 |
| Maxent | .82 | .67 |
| IC | .85 | .72 |
![Page 15: Iterative Readability Computation for Domain-Specific Resources](https://reader036.fdocuments.us/reader036/viewer/2022081514/56816618550346895dd96705/html5/thumbnails/15.jpg)
Efficiency Experiment
• Corpus/Metrics same as before
• Different selection strategies
– Resource selection at random
– Resource selection by quality
– Concept selection at random
– Concept selection by TF.IDF
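The slides do not say how TF.IDF is computed for concept selection, so the following Python sketch is only one plausible reading. Each concept is scored by its total occurrence count in the collection multiplied by log(N / document frequency), and the top fraction of concepts is kept. The scoring formula, the substring matching, and the names `tfidf_scores` and `select_concepts` are all assumptions for illustration.

```python
import math


def tfidf_scores(resources, concepts):
    """Assumed scoring: collection-wide term frequency times inverse document frequency."""
    n = len(resources)
    scores = {}
    for concept in concepts:
        counts = [text.lower().count(concept.lower()) for text in resources]
        df = sum(1 for c in counts if c > 0)        # number of resources containing the concept
        idf = math.log(n / df) if df else 0.0       # concepts found everywhere get idf 0
        scores[concept] = sum(counts) * idf
    return scores


def select_concepts(resources, concepts, fraction):
    """Keep the highest-scoring fraction of the concept list (at least one concept)."""
    scores = tfidf_scores(resources, concepts)
    keep = max(1, math.ceil(fraction * len(concepts)))
    return sorted(concepts, key=lambda c: scores[c], reverse=True)[:keep]


resources = [
    "number theory and modular arithmetic",
    "number line and triangle",
    "number sense, sine and sine rule",
]
concepts = ["number", "modular arithmetic", "sine", "triangle"]
print(select_concepts(resources, concepts, 0.5))
```

In this toy collection "number" appears in every resource, so its score is zero and it is dropped first. A concept that occurs everywhere says little about which resource is harder, which is one way to read "useless concepts" in this setting.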
![Page 16: Iterative Readability Computation for Domain-Specific Resources](https://reader036.fdocuments.us/reader036/viewer/2022081514/56816618550346895dd96705/html5/thumbnails/16.jpg)
Efficiency Experiment
• Results
– If chosen at random, the more resources/concepts the better
– When chosen by quality, a small set of resources is also sufficient
– Selection by TF.IDF helps to filter out useless concepts
[Figure: resource selection. x-axis 20% to 100%, y-axis 0.5 to 1; series: Quality (Pairwise), Random (Pairwise), Quality (Spearman), Random (Spearman)]
[Figure: concept selection. x-axis 20% to 100%, y-axis 0.5 to 1; series: TF.IDF (Pairwise), Random (Pairwise), TF.IDF (Spearman), Random (Spearman)]
![Page 17: Iterative Readability Computation for Domain-Specific Resources](https://reader036.fdocuments.us/reader036/viewer/2022081514/56816618550346895dd96705/html5/thumbnails/17.jpg)
Portability Experiment
• Corpus
– Collection: 27 medical concepts, first 100 search results from Google
– Annotation: readability of 946 randomly chosen webpages, annotated by the first author on the same readability scale
• Metric/Baseline same as before
![Page 18: Iterative Readability Computation for Domain-Specific Resources](https://reader036.fdocuments.us/reader036/viewer/2022081514/56816618550346895dd96705/html5/thumbnails/18.jpg)
Portability Experiment
• Results
– Heuristic is still the weakest
– Supervised approaches benefit greatly from the larger amount of annotation
– Iterative computation remains competitive
– Limited readability spectrum in medical domain

| Method | Pairwise | Spearman |
|---|---|---|
| FKRE | .63 | .28 |
| NB | .73 | .53 |
| SVM | .82 | .70 |
| Maxent | .76 | .60 |
| IC | .72 | .49 |
| ICS | .75 | .54 |
![Page 19: Iterative Readability Computation for Domain-Specific Resources](https://reader036.fdocuments.us/reader036/viewer/2022081514/56816618550346895dd96705/html5/thumbnails/19.jpg)
Future Work
• Processing
– Noise reduction
• Probabilistic formulation
– Distribution of values, e.g. 70% of webpages highly readable and 30% much less readable
– Correlations between multiple pairs of attributes, e.g. genericity and page type
![Page 20: Iterative Readability Computation for Domain-Specific Resources](https://reader036.fdocuments.us/reader036/viewer/2022081514/56816618550346895dd96705/html5/thumbnails/20.jpg)
Conclusion
• Iterative Computation
– Readability of domain-specific resources and difficulty of domain-specific concepts can be estimated from each other
– Simple yet effective, efficient and portable
• Part of the exploration in Domain-specific Information Retrieval
– Categorization
– Readability
– Text to domain-specific construct linking
![Page 21: Iterative Readability Computation for Domain-Specific Resources](https://reader036.fdocuments.us/reader036/viewer/2022081514/56816618550346895dd96705/html5/thumbnails/21.jpg)
Any questions?
![Page 22: Iterative Readability Computation for Domain-Specific Resources](https://reader036.fdocuments.us/reader036/viewer/2022081514/56816618550346895dd96705/html5/thumbnails/22.jpg)
Related Graph-based Algorithms
• PageRank
– Directed links
– Backlinks indicate popularity/recommendation
• HITS
– Hub and authority score for each node
• SALSA