Visual Data Mining: Concepts, Frameworks and Algorithm Development

30
Visual Data Mining: Concepts, Frameworks and Algorithm Development Student: Fasheng Qiu Instructor: Dr. Yingshu Li

description

Visual Data Mining: Concepts, Frameworks and Algorithm Development. Student: Fasheng Qiu Instructor: Dr. Yingshu Li. Outline. Introduction to VDM VDM frameworks Case study of a specific framework VQLBCI Conclusions References. Introduction to VDM. - PowerPoint PPT Presentation

Transcript of Visual Data Mining: Concepts, Frameworks and Algorithm Development

Page 1: Visual Data Mining: Concepts, Frameworks and Algorithm Development

Visual Data Mining: Concepts, Frameworks and Algorithm Development

Student: Fasheng Qiu

Instructor: Dr. Yingshu Li

Page 2: Visual Data Mining: Concepts, Frameworks and Algorithm Development

Outline

• Introduction to VDM• VDM frameworks• Case study of a specific framework VQLBCI• Conclusions• References

Page 3: Visual Data Mining: Concepts, Frameworks and Algorithm Development

Introduction to VDM• The use of visualization techniques to allow data miners to

evaluate, monitor, and guide the inputs, products and process of data mining.

• It can help introduce user insights, preferences, and biases in the earlier stages of the data mining life-cycle to reduce its overall computation complexity and reduce the set of uninteresting patterns in the product.

• The new insights may facilitate the development of better algorithms and processes for data mining.

Page 4: Visual Data Mining: Concepts, Frameworks and Algorithm Development

Why Visual Data Mining

VDM takes advantage of both,• The power of automatic calculations, and• The capabilities of human processing.

– Human perception is involved

Page 5: Visual Data Mining: Concepts, Frameworks and Algorithm Development

Why Visual Data Mining

• It is useful because1. Early user feedback about level of interest can filter out

uninteresting patterns early in the process. 2. User feedback can improve the selection of appropriate

learning algorithms based on the application domain.3. Visual inspection of datasets can provide direct clues

towards interesting patterns in the data.4. Visualization of the data mining process can provide new

insights and may result in the design of better algorithms for data mining.

Page 6: Visual Data Mining: Concepts, Frameworks and Algorithm Development

The current research in VDM• Mainly focus on the framework of integrating the

information visualization paradigms and data mining techniques.

For example,1. A Flexible Approach for Visual Data Mining. 2. A framework for visual data mining of structures.3. A Visual Data Mining Framework for Convenient Identification of

Useful knowledge.4. A Visual Data Mining Framework by loose-coupling of databases and

visualization systems.

Page 7: Visual Data Mining: Concepts, Frameworks and Algorithm Development

A VDM framework-VQLBCI• The following slides present a VDM framework and the

application of the VDM framework towards designing new algorithms that can learn decision trees by manually refining some of the decisions made by well known algorithms such as C4.5.

Page 8: Visual Data Mining: Concepts, Frameworks and Algorithm Development

Components of VQLBCI• The three major components of VQLBCI are Visual

Representations, Computations and Events.

Page 9: Visual Data Mining: Concepts, Frameworks and Algorithm Development

Visual Development of Algorithms

• Most interesting use of visual data mining is the development of new insights and algorithms.

• The figure below shows the ER diagram for learning classification decision trees.

• This model allows the user to monitor the quality and impact of decisions made by the learning procedure.

• Learning procedure can be refined interactively via a visual interface.

Page 10: Visual Data Mining: Concepts, Frameworks and Algorithm Development

ER diagram for the search space of decision tree learning algorithm

Page 11: Visual Data Mining: Concepts, Frameworks and Algorithm Development

General Process• Learning a classification decision tree from a training data set

can be regarded as a process of searching for the best decision tree that meets user-provided goal constraints.

• The problem space of this search process consists of Model Candidates, Model Candidate Generator and Model Constraints.

• Many existing classification-learning algorithms like C4.5 and CDP fit nicely within this search framework. New learning algorithms that fit user’s requirements can be developed by defining the components of the problem space.

Page 12: Visual Data Mining: Concepts, Frameworks and Algorithm Development

General Process• Model Candidate corresponds to the partial classification

decision tree. Each node of the decision tree is a Model Atom.

• Search process is the process of finding a final model candidate such that it meets user goal specifications.

• Model Candidate Generator transforms the current model candidate into a new model candidate by selecting one model atom to expand from the expandable leaf model atoms.

• Model Constraints (used by Model Candidate Generator) provide controls and boundaries to the search space.

Page 13: Visual Data Mining: Concepts, Frameworks and Algorithm Development

Example of Search Process

13

Page 14: Visual Data Mining: Concepts, Frameworks and Algorithm Development

Example of Search Process

Page 15: Visual Data Mining: Concepts, Frameworks and Algorithm Development

Acceptability Constraint• Model Constraints consist of Acceptability constraints,

Expandability constraints and a Data-Entropy calculation function.

• Acceptability constraint predicate specifies when a model candidate is acceptable and thus allows search process to stop. EX:– A1) Total # of expandable leaf model atoms = 0. (Default)– A2) Overall error rate of the model candidate <= acceptable error rate.– A3) Total number of model atoms in the model candidate>= maximal

allowable tree size.

A1 is used in C4.5 and CDP

Page 16: Visual Data Mining: Concepts, Frameworks and Algorithm Development

Expandability Constraint• An Expandability constraint predicate specifies whether a leaf

model atom is expandable or not. EX:– C4.5 uses E1 and E2– CDP uses E2 and E3

Page 17: Visual Data Mining: Concepts, Frameworks and Algorithm Development

Traversal Strategy• Traversal strategy ranks expandable leaf model atoms based

on the model atom attributes. EX:

– Breadth-first– Depth-first– Orders based on other model atom attributes. (Best-first)

Page 18: Visual Data Mining: Concepts, Frameworks and Algorithm Development

Steps in Visual Algorithm Development

• No single algorithm is the best all the time, performance is highly data dependent.

• By changing different predicates of model constraints, users can construct new classification-learning algorithm.

• This enables users to find an algorithm that works the best on a given data set.

• Two algorithms are developed : BF based on Best First search idea and CDP+ which is a modification of CDP

Page 19: Visual Data Mining: Concepts, Frameworks and Algorithm Development

BF• This algorithm is based on the Best-First search idea.• For Acceptability criteria, it includes A1 and A2 with a user

specified acceptable error rate.• The Traversal strategy chosen is T3• In Best-First, expandable leaf model atoms are ranked

according to the decreasing order of the number of misclassified training cases. (local error rate * size of subset training data set)

• The traversal strategy will expand a model atom that has the most misclassified training cases, thus reducing the overall error rate the most.

Page 20: Visual Data Mining: Concepts, Frameworks and Algorithm Development

CDP +• CDP+ is a modification of CDP

• CDP has dynamic pruning using expandability constraint E3.

• Here, the depth is modified according to the size of the training data set of the model atom.

We set b is the branching factor of the decision tree, t is the size of training data set belonging to model atom, T is the whole training data set.

Page 21: Visual Data Mining: Concepts, Frameworks and Algorithm Development

Comparison of different classification learning algorithms

Page 22: Visual Data Mining: Concepts, Frameworks and Algorithm Development

Experiment• The new BF and CDP+ algorithms are compared with the C4.5

and CDP algorithms.• Various metrics are selected to compare the efficiency,

accuracy and size of final decision trees of the classification algorithm.

• The generation efficiency of the nodes is measured in terms of the total number of nodes generated.

• To compare accuracy of the various algorithms, the mean classification error on the test data sets have been computed.

Page 23: Visual Data Mining: Concepts, Frameworks and Algorithm Development

Classification error for 10 data sets

Page 24: Visual Data Mining: Concepts, Frameworks and Algorithm Development

Nodes generated for 10 data sets

Page 25: Visual Data Mining: Concepts, Frameworks and Algorithm Development

Final decision tree size

Page 26: Visual Data Mining: Concepts, Frameworks and Algorithm Development

Results• CDP has accuracy comparable to C4.5 while generating

considerably fewer nodes.

• CDP+ has accuracy comparable to C4.5 while generating considerably fewer nodes.

• CDP+ outperformed CDP in error rate and number of nodes generated.

• Considering all performance metrics together, CDP+ is the best overall algorithm.

• Considering classification accuracy alone, C4.5P is the winner.

Page 27: Visual Data Mining: Concepts, Frameworks and Algorithm Development

Results• Different datasets require different algorithms for best

results.• Diverse user requirements put different constraints on the

final decision tree.• The experiment shows that Visual Data Mining Framework

can help find the most suitable algorithm for a given data set and group of user requirements.

Page 28: Visual Data Mining: Concepts, Frameworks and Algorithm Development

Conclusions• VDM is useful in enhancing the data mining process.

• Different VDM frameworks are provided.

• VDM can help the design of better algorithms.

28

Page 29: Visual Data Mining: Concepts, Frameworks and Algorithm Development

References• Matthias Kreuseler and Heidrun Schumann. A Flexible Approach for Visual Data Mining.

IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 8, NO. 1, JANUARY-MARCH 2002

• M. Kreuseler, T.Nocke, etc. A History Mechanism for Visual Data Mining. IEEE Symposium on Information Visualization 2004 October 10-12, Austin, Texas, USA

• Kaidi Zhao, Bing Liu, etc. A Visual Data Mining Framework for Convenient Identification of Useful Knowledge. Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM’05)

• Hans-J¨org Schulz, Thomas Nocke, etc. A Framework for Visual Data Mining of Structures. Twenty-Ninth Australasian Computer Science Conference (ACSC2006), Hobart, Tasmania, Australia, January 2006.

• Alexander Kort. Visual Data Mining and Zoomable Interfaces. IUI’04, January 13–16, 2004, Madeira, Funchal, Portugal. ACM 1-58113-815-6/04/0001

• Koji Kato, Tomoyuki Shibata. Visual Data Mining using Omni-directional Sensor. IEEE Conference on Multisensor Fusion and Integration for Intelligent Systems 2003

• Daniel A. Keim. Information Visualization and Visual Data Mining. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 7, NO. 1, JANUARY-MARCH 2002

• M.Ganesh, Eui-Hong Han, etc. Visual Data Mining: Framework and Algorithm Development. Technical Report, March 12, 1996

Page 30: Visual Data Mining: Concepts, Frameworks and Algorithm Development

That’s it• Questions? Comments?

Thank you!