Jun Liu (liukeen@mail.xjtu) Lu Jiang, Zhaohui Wu Qinghua Zheng, Yanan Qian

Post on 01-Jan-2016


Mining Preorder Relation between Knowledge Units from Text

Jun Liu (liukeen@mail.xjtu.edu.cn), Lu Jiang, Zhaohui Wu, Qinghua Zheng, Yanan Qian. November 10, 2014. SAC 2010, Sierre, Switzerland.


Jun Liu (liukeen@mail.xjtu.edu.cn)

Lu Jiang, Zhaohui Wu

Qinghua Zheng, Yanan Qian

SAC 2010, Sierre, Switzerland

April 19, 2023

Motivation and Challenges

Two features of the Preorder Relation

Process of Mining Preorder Relations

Experimental Evaluation

Conclusions

Learning is an incremental process. Understanding a new knowledge unit often relies on understanding certain existing knowledge units.

Preorder relations among knowledge units help learners avoid disorientation while learning.

Manually annotating the potential preorder relations is very time-consuming and requires annotators who are domain experts.

[Figure: example of preorder relations among geometry knowledge units. Nodes: Definition of Triangle, Definition of Interior Angle, Definition of Exterior Angle, Triangle Interior Angles Sum Theorem, Triangle Exterior Angle Theorem.]

Preorder Relation

Given a text document set $T = \{t_k\}\ (1 \le k \le m)$ and a knowledge unit set $U = \{u_i\}\ (1 \le i \le n)$ extracted from $T$ as the input, the preorder relation mining process will output a set $A \subseteq U \times U$.

Each $u_i$ can be further represented as a triplet (name, type, content).

Name: such as “definition of subnet mask”

Type: such as definition, property or method

Content: the text content of the knowledge unit
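The definition above can be made concrete with a small data-structure sketch. This is only an illustration under stated assumptions: the class name `KnowledgeUnit`, the helper `is_valid_relation_set`, the example contents, and the choice of "property" as the type of the theorem are all hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeUnit:
    """A knowledge unit as a (name, type, content) triplet."""
    name: str     # e.g. "definition of subnet mask"
    type: str     # e.g. "definition", "property" or "method"
    content: str  # text content of the knowledge unit


# Illustrative knowledge units (contents are made up for this sketch).
triangle = KnowledgeUnit(
    "definition of triangle", "definition",
    "A triangle is a polygon with three sides.")
angle_sum = KnowledgeUnit(
    "triangle interior angles sum theorem", "property",
    "The interior angles of a triangle sum to 180 degrees.")

# U: the knowledge unit set.
U = {triangle, angle_sum}

# A: a set of ordered pairs (u_i, u_j), read as "u_i is a preorder of u_j".
A = {(triangle, angle_sum)}


def is_valid_relation_set(A, U):
    """Check that A is a subset of U x U."""
    return all(a in U and b in U for a, b in A)


print(is_valid_relation_set(A, U))
```

Storing the relation as a set of ordered pairs keeps the direction explicit: `(triangle, angle_sum)` is in `A`, while the reversed pair is not.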

There has been no previous work on mining the relations among knowledge units.

Ontology learning, KAT and RDC can hardly be applied to mine the preorder relations.

Challenges:

1. Knowledge units expressed in natural language are ambiguous or ill-formed.

2. Knowledge units have far more complex structures than concepts and named entities.

3. Preorder relations have the characteristic of long-distance dependency.

Two features of the Preorder Relation

We generated KUs in the given document set by using our extraction method and manually refined the results. Then we manually annotated the preorder relation among the extracted KUs.

The annotation work was conducted as follows:

a) Developed a web-based annotation system

b) Hired 24 undergraduates from the CS department

c) Created a set of rules to guide the work

d) Created the experimental data set, which covers five courses: Computer Network, Advanced Mathematics, Computer Organization and Architecture, Database System and Application, and Geometry (KUs: 5000+; Relations: 7000+)

Locality: the likelihood of a preorder relation between two knowledge units is inversely proportional to an exponential function of the distance d between them.

Preorder relations can therefore be mined within the same document, or across documents on similar topics.

If knowledge units in are precursors of knowledge units in , then .
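The exponential relation to the distance d can be checked on data with a log-linear fit. The sketch below is an assumption-laden illustration: the model form `c * exp(-lam * d)`, the function name `fit_exponential_decay`, and the synthetic counts are all hypothetical. The counts are generated from the model itself and are not data from the study.

```python
import math


def fit_exponential_decay(distances, counts):
    """Least-squares fit of log(count) = log(c) - lam * d.

    Returns (c, lam) for the assumed model count = c * exp(-lam * d).
    """
    xs = list(distances)
    ys = [math.log(count) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic counts generated from 100 * exp(-0.5 * d), for illustration only.
distances = [1, 2, 3, 4, 5]
counts = [100 * math.exp(-0.5 * d) for d in distances]

c, lam = fit_exponential_decay(distances, counts)
print(round(c, 3), round(lam, 3))
```

On real annotations, a roughly straight line of log(count) against d would support the locality feature, and the fitted `lam` would indicate how quickly relations thin out with distance.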

Process of Mining Preorder Relations

[Figure: process of mining preorder relations. Text Set -> Text Association Mining -> Text Associations -> Candidate KU-Pairs Generation -> Candidate KU-Pairs -> Preorder Relation Identification -> Preorder Relations. Features used: Distribution Asymmetry of Domain Term, Locality of Preorder Relation. Legend: Knowledge Unit (KU), Text.]

Text Association Mining aims at finding documents on similar topics and then ranking them in pairs.

The clustering process deals with three cases:

1. Two documents $t_i$ and $t_j$ are put into one cluster;

2. A document $t_i$ is put into the cluster $S$ (assume $t_j$ in $S$ is closest to $t_i$);

3. Cluster $S$ and cluster $S'$ merge into a new cluster (assume $t_i$ in $S$ and $t_j$ in $S'$ are the closest document pair).
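The three cases above can be sketched as a greedy pass over document pairs in order of decreasing similarity. This is a minimal sketch, not the authors' implementation: the function name `cluster_documents`, the stopping threshold `min_sim`, and the precomputed similarity scores are assumptions. The rule that orients each link into a directed edge is not reproduced here, so the links are returned as unordered document pairs.

```python
def cluster_documents(similarities, min_sim):
    """Greedy clustering over document pairs, most similar first.

    similarities: dict mapping (ti, tj) -> similarity score (assumed given).
    min_sim: hypothetical threshold below which pairs are ignored.
    Returns (clusters, links): a list of document sets and the list of
    document pairs that caused a cluster to be created, grown or merged.
    """
    cluster_of = {}  # document -> cluster id
    clusters = {}    # cluster id -> set of documents
    links = []
    next_id = 0
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    for (ti, tj), sim in ranked:
        if sim < min_sim:
            break
        ci, cj = cluster_of.get(ti), cluster_of.get(tj)
        if ci is None and cj is None:
            # Case 1: two unclustered documents form a new cluster.
            clusters[next_id] = {ti, tj}
            cluster_of[ti] = cluster_of[tj] = next_id
            next_id += 1
        elif ci is None or cj is None:
            # Case 2: an unclustered document joins an existing cluster.
            cid = cj if ci is None else ci
            doc = ti if ci is None else tj
            clusters[cid].add(doc)
            cluster_of[doc] = cid
        elif ci != cj:
            # Case 3: two clusters merge via their closest document pair.
            for doc in clusters[cj]:
                cluster_of[doc] = ci
            clusters[ci] |= clusters.pop(cj)
        else:
            continue  # both documents already share a cluster
        links.append((ti, tj))
    return list(clusters.values()), links


# Illustrative similarity scores (made up for this sketch).
sims = {
    ("t1", "t2"): 0.9,
    ("t3", "t4"): 0.8,
    ("t2", "t3"): 0.7,
    ("t1", "t4"): 0.6,
    ("t1", "t5"): 0.2,
}
clusters, links = cluster_documents(sims, min_sim=0.5)
print(clusters, links)
```

Because a link is recorded only when a pair changes the clustering, the links inside each cluster form a tree, which is a natural skeleton for the graph over documents.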

For each pair ($t_i$, $t_j$), set a proper threshold $F_0$ ($F_0 < 1$):

If , ;

If , ;

Once the clustering is finished, the directed graph is also generated.

For each node in , .

For each ,

Three useful features for the classification-based recognition algorithm:

1. Term frequency: the greater the term frequency is, the more likely it is that the candidate KU pair has a preorder relation.

2. Distance: the likelihood of a preorder relation decays exponentially as the distance grows.

3. Semantic type:
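A feature vector built from the three features above could look like the following sketch. The slides do not preserve the exact feature definitions, so everything here is an assumption made for illustration: term frequency is taken as the number of times the name terms of one KU occur in the content of the other, the distance feature uses `exp(-distance)`, the semantic type is the pair of KU types, and `tokenize`, `pair_features` and `STOP_WORDS` are hypothetical names.

```python
import math
import re

# Hypothetical stop-word list so that generic words in a KU name are ignored.
STOP_WORDS = {"definition", "of", "the", "a", "an"}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def pair_features(ku_a, ku_b, distance):
    """Features for the candidate pair (ku_a, ku_b).

    ku_a, ku_b: dicts with "name", "type" and "content" keys.
    distance: distance between the two KUs (unit assumed, e.g. KU positions).
    """
    name_terms = set(tokenize(ku_a["name"])) - STOP_WORDS
    content_terms = tokenize(ku_b["content"])
    term_frequency = sum(1 for term in content_terms if term in name_terms)
    return {
        "term_frequency": term_frequency,
        "distance_weight": math.exp(-distance),
        "semantic_type": (ku_a["type"], ku_b["type"]),
    }


# Illustrative knowledge units (contents are made up for this sketch).
subnet_mask = {
    "name": "definition of subnet mask",
    "type": "definition",
    "content": "A subnet mask separates the network and host parts.",
}
subnetting = {
    "name": "subnetting method",
    "type": "method",
    "content": "Apply the subnet mask to the address; the mask selects the network bits.",
}

features = pair_features(subnet_mask, subnetting, distance=2)
print(features)
```

A dictionary like this can be fed to any of the classifiers compared in the evaluation once the semantic type pair is encoded as a categorical value.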

Experimental Evaluation

ID   Course Name                              #KUs    #Preorder relations
1    Computer Network                         889     758
2    Computer Organization and Architecture   743     839
3    Database System and Application          1,398   1,176
4    Geometry                                 427     1,325

ID   #possible pairs   #candidate pairs   retention ratio (%)   #training samples
                                                                -        +
1    49,506            1,858              91.9                  1,828    620
2    28,392            4,678              94.3                  1,477    680
3    195,806           3,219              96.8                  2,524    890
4    12,882            2,313              95.2                  1,454    704

Classifier   Criteria    ID = 1         ID = 2         ID = 3
                         -      +       -      +       -      +
SVM          precision   99.3   93.3    99.4   61.6    99.1   73.3
             recall      99.5   89.6    97.2   88.3    95.6   93.5
             F1-score    99.4   91.4    98.3   72.6    97.3   82.2
DT (C4.5)    precision   97.6   70.3    99.8   52.4    96.2   81.8
             recall      98.0   66.4    95.5   95.7    98.0   69.8
             F1-score    97.8   68.3    97.6   67.7    97.1   75.3
NB           precision   99.4   60.8    99.7   48.8    97.2   75.5
             recall      95.7   92.0    94.8   94.8    96.7   77.9
             F1-score    97.5   73.2    97.2   64.4    96.9   76.7
MLP          precision   99.5   56.3    99.7   53.3    99.3   70.8
             recall      94.7   93.6    95.7   95.2    95.0   94.6
             F1-score    97.1   70.3    97.7   68.3    97.1   81.0
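The F1-score rows follow from the precision and recall rows through the standard harmonic mean, F1 = 2PR / (P + R). As a worked check, the short sketch below recomputes the SVM positive-class F1-scores from the reported precision and recall; the function name `f1_score` and the dictionary layout are illustrative choices.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# SVM, positive class: data set ID -> (precision, recall) as reported above.
svm_positive = {1: (93.3, 89.6), 2: (61.6, 88.3), 3: (73.3, 93.5)}

for data_id, (precision, recall) in svm_positive.items():
    print(data_id, round(f1_score(precision, recall), 1))
# prints:
# 1 91.4
# 2 72.6
# 3 82.2
```

The recomputed values match the reported SVM F1-scores for the positive class.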

The classification results are insensitive to changes in β within a certain range. We set β to 0.4.

Yotta ($10^{24}$): a topic-map-based knowledge management system (under construction)

Conclusions

Two features of the preorder relation were discovered: the locality of the preorder relation and the distribution asymmetry of the domain terms.

A classification-based method of mining the preorder relations was proposed.

Future work: extend the method to mine the preorder relations residing in an online knowledge repository, Wikipedia.


Thank You!

Questions?