Jun Liu ([email protected]) Lu Jiang, Zhaohui Wu Qinghua Zheng, Yanan Qian
Jun Liu ([email protected])
Lu Jiang, Zhaohui Wu
Qinghua Zheng, Yanan Qian
SAC 2010, Sierre, Switzerland
April 19, 2023
Motivation and Challenges
Two features of the Preorder Relation
Process of Mining Preorder Relations
Experimental Evaluation
Conclusions
Learning is an incremental process. Understanding a new knowledge unit often relies on understanding certain existing knowledge units.
Preorder relations among the knowledge units help learners avoid the disorientation problem in learning.
Manually annotating the potential preorder relations is very time-consuming and requires annotators who are domain experts.
[Figure: example of preorder relations among geometry knowledge units: Definition of Triangle, Definition of Interior Angle, Definition of Exterior Angle, Triangle Interior Angles Sum Theorem, Triangle Exterior Angle Theorem]
Given a text document set $T = \{t_k\}\ (1 \le k \le m)$ and a knowledge unit set $U = \{u_i\}\ (1 \le i \le n)$ extracted from $T$ as the input, the preorder relation mining process outputs a set $A \subseteq U \times U$.
Each $u_i$ can be further represented as a triplet of (name, type, content).
Name: such as “definition of subnet mask”
Type: such as definition, property or method
Content: the text content of the knowledge unit
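The triplet representation and the output set can be sketched as a small data structure. This is only an illustration: the class name, the example units, their content strings, the choice of "property" as the type of a theorem, and the reading of a pair (a, b) as "a is a precursor of b" are all assumptions, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeUnit:
    """A knowledge unit as a (name, type, content) triplet."""
    name: str     # e.g. "definition of subnet mask"
    type: str     # e.g. "definition", "property" or "method"
    content: str  # text content of the knowledge unit


# Hypothetical units, loosely based on the geometry example.
triangle = KnowledgeUnit(
    "definition of triangle", "definition",
    "A triangle is a polygon with three sides.")
angle_sum = KnowledgeUnit(
    "triangle interior angles sum theorem", "property",
    "The interior angles of a triangle sum to 180 degrees.")

U = {triangle, angle_sum}

# A is a subset of U x U. Assumption: (a, b) means a is a precursor of b.
A = {(triangle, angle_sum)}

all_pairs = {(a, b) for a in U for b in U}
print(A <= all_pairs)
```

Freezing the dataclass makes the units hashable, so they can be stored in sets and used as members of the pairs in A.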
There has been no previous work on mining the relation among the knowledge units.
Ontology learning, KAT and RDC can hardly be applied to mine the preorder relations.
Challenges:
1. Knowledge units expressed in natural language are ambiguous or ill-formed.
2. Knowledge units have far more complex structures than concepts and named entities.
3. Preorder relations have the characteristic of long-distance dependency.
We generated knowledge units (KUs) in the given document set using our extraction method and manually refined the results. We then manually annotated the preorder relations among the extracted KUs.
The annotation work was conducted as follows:
a) Developed a web-based annotation system
b) Hired 24 undergraduates from the CS department
c) Created a set of rules to guide the work
d) Created an experimental data set covering five courses: Computer Network, Advanced Mathematics, Computer Organization and Architecture, Database System and Application, and Geometry (KUs: 5000+; relations: 7000+)
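Locality of the preorder relation: the likelihood that two knowledge units have a preorder relation is inversely proportional to an exponential function of the distance d between them.
Preorder relations can therefore be mined within the same document, or among documents on similar topics.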
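Distribution asymmetry of domain terms: if the knowledge units in one document are precursors of the knowledge units in another, the domain terms are distributed asymmetrically between the two documents.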
[Figure: mining process. Text Set → Text Association Mining → Text Associations → Candidate KU-Pairs Generation → Candidate KU-Pairs → Preorder Relation Identification → Preorder Relations. Features shown: Distribution Asymmetry of Domain Term, Locality of Preorder Relation. Legend: Knowledge Unit (KU), Text.]
Text association mining finds documents on similar topics and then ranks them in pairs.
The clustering process deals with three cases:
1. Two documents ti and tj are put into one cluster;
2. A document ti is put into the cluster S (assume tj in S is closest to ti);
3. Cluster S and cluster S' merge into a new cluster (assume ti in S and tj in S' are the closest document pair).
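The three cases can be illustrated with a small agglomerative sketch. It is not the authors' algorithm: it assumes a precomputed pairwise similarity score, visits pairs from most to least similar, and stops at a hypothetical similarity cut-off (`threshold`), none of which is specified on the slides.

```python
def cluster_by_closest_pairs(similarity, threshold):
    """Group documents by visiting pairs in order of decreasing similarity.

    similarity: dict mapping (ti, tj) -> score (assumed to be given).
    threshold:  hypothetical cut-off below which pairs are ignored.
    """
    cluster_of = {}  # document -> cluster id
    clusters = {}    # cluster id -> set of documents
    next_id = 0
    ordered = sorted(similarity.items(), key=lambda kv: kv[1], reverse=True)
    for (ti, tj), score in ordered:
        if score < threshold:
            break
        ci, cj = cluster_of.get(ti), cluster_of.get(tj)
        if ci is None and cj is None:
            # Case 1: two unclustered documents form a new cluster.
            clusters[next_id] = {ti, tj}
            cluster_of[ti] = cluster_of[tj] = next_id
            next_id += 1
        elif ci is None or cj is None:
            # Case 2: an unclustered document joins an existing cluster.
            cid = cj if ci is None else ci
            doc = ti if ci is None else tj
            clusters[cid].add(doc)
            cluster_of[doc] = cid
        elif ci != cj:
            # Case 3: two clusters merge via their closest document pair.
            merged = clusters.pop(cj)
            for doc in merged:
                cluster_of[doc] = ci
            clusters[ci].update(merged)
    return list(clusters.values())


# Made-up similarity scores for seven hypothetical documents.
sim = {
    ("t1", "t2"): 0.9,  # case 1
    ("t3", "t4"): 0.8,  # case 1
    ("t1", "t5"): 0.7,  # case 2
    ("t2", "t3"): 0.6,  # case 3
    ("t6", "t7"): 0.3,  # below the cut-off, ignored
}
result = cluster_by_closest_pairs(sim, threshold=0.5)
print(result)
```

With these scores, the five documents t1 to t5 end up in one cluster, while t6 and t7 stay unclustered.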
For each pair (ti, tj), set a proper threshold F0 (F0 < 1),
If , ;
If , ;
Once the clustering is finished, the directed graph is also generated.
For each node in , .
For each ,
Three useful features for the classification-based recognition algorithm:
1. Term frequency:
The greater the term frequency, the more likely it is that a candidate KU pair has a preorder relation.
2. Distance:
The likelihood of a preorder relation decays exponentially as the distance grows.
3. Semantic type:
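A feature vector for one candidate KU pair might be assembled as below. The exact feature definitions are not given on the slides, so everything here is an assumption: term frequency is taken as the normalised count of the first unit's name terms in the second unit's content, the distance feature is modelled as exp(-decay * distance), and `pair_features` and `decay` are hypothetical names.

```python
import math


def pair_features(name_terms_i, content_j, type_i, type_j, distance, decay=1.0):
    """Build an illustrative feature dict for a candidate pair (u_i, u_j)."""
    words = content_j.lower().split()
    # Assumed term frequency: share of words in u_j's content that are
    # terms from u_i's name.
    hits = sum(words.count(term.lower()) for term in name_terms_i)
    term_frequency = hits / max(len(words), 1)
    # Assumed distance feature: exponential decay with distance.
    distance_weight = math.exp(-decay * distance)
    return {
        "term_frequency": term_frequency,
        "distance_weight": distance_weight,
        "semantic_type": (type_i, type_j),
    }


near = pair_features(["subnet", "mask"], "A subnet mask splits an IP address",
                     "definition", "method", distance=0)
far = pair_features(["subnet", "mask"], "A subnet mask splits an IP address",
                    "definition", "method", distance=3)
print(near)
print(far["distance_weight"] < near["distance_weight"])
```

A dict of this kind could then be encoded numerically and passed to any of the classifiers compared in the evaluation.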
ID | Course Name | #KUs | #Preorder relations
1 | Computer Network | 889 | 758
2 | Computer Organization and Architecture | 743 | 839
3 | Database System and Application | 1,398 | 1,176
4 | Geometry | 427 | 1,325
ID | #possible pairs | #candidate pairs | retention ratio (%) | #training samples (-) | #training samples (+)
1 | 49,506 | 1,858 | 91.9 | 1,828 | 620
2 | 28,392 | 4,678 | 94.3 | 1,477 | 680
3 | 195,806 | 3,219 | 96.8 | 2,524 | 890
4 | 12,882 | 2,313 | 95.2 | 1,454 | 704
Classifier | Criteria | ID=1 (-) | ID=1 (+) | ID=2 (-) | ID=2 (+) | ID=3 (-) | ID=3 (+)
SVM | precision | 99.3 | 93.3 | 99.4 | 61.6 | 99.1 | 73.3
SVM | recall | 99.5 | 89.6 | 97.2 | 88.3 | 95.6 | 93.5
SVM | F1-score | 99.4 | 91.4 | 98.3 | 72.6 | 97.3 | 82.2
DT (C4.5) | precision | 97.6 | 70.3 | 99.8 | 52.4 | 96.2 | 81.8
DT (C4.5) | recall | 98.0 | 66.4 | 95.5 | 95.7 | 98.0 | 69.8
DT (C4.5) | F1-score | 97.8 | 68.3 | 97.6 | 67.7 | 97.1 | 75.3
NB | precision | 99.4 | 60.8 | 99.7 | 48.8 | 97.2 | 75.5
NB | recall | 95.7 | 92.0 | 94.8 | 94.8 | 96.7 | 77.9
NB | F1-score | 97.5 | 73.2 | 97.2 | 64.4 | 96.9 | 76.7
MLP | precision | 99.5 | 56.3 | 99.7 | 53.3 | 99.3 | 70.8
MLP | recall | 94.7 | 93.6 | 95.7 | 95.2 | 95.0 | 94.6
MLP | F1-score | 97.1 | 70.3 | 97.7 | 68.3 | 97.1 | 81.0
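The F1-scores in the table are the harmonic mean of the corresponding precision and recall. A short check for the SVM results on course 1 is sketched below; the function and variable names are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# SVM, course ID = 1, values taken from the table above.
svm_negative = round(f1_score(99.3, 99.5), 1)
svm_positive = round(f1_score(93.3, 89.6), 1)
print(svm_negative, svm_positive)
```

Both values agree with the reported 99.4 and 91.4.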
The classification results are insensitive to changes in β within a certain range. We set β to 0.4.
Yotta (10^24): a topic-map-based knowledge management system (under construction)
Two features of the preorder relation were discovered: the locality of the preorder relation and the distribution asymmetry of the domain terms.
A classification-based method of mining the preorder relations was proposed.
Future work: extend the method to mine the preorder relations residing in online knowledge repositories such as Wikipedia.
Thank You!
Questions?