Transcript of Frida Pres

  • Slide 1/30

    On Applications of Rough Sets Theory to Knowledge Discovery

    Frida Coaquira
    UNIVERSITY OF PUERTO RICO, MAYAGÜEZ
    [email protected]

  • Slide 2/30

    Introduction

    One goal of Knowledge Discovery is to extract meaningful knowledge.

    Rough Sets theory was introduced by Z. Pawlak (1982) as a mathematical tool for data analysis.

    Rough sets have many applications in the field of Knowledge Discovery: feature selection, the discretization process, data imputation, and the creation of decision rules.

    Rough sets have been introduced as a tool to deal with uncertain knowledge in Artificial Intelligence applications.

  • Slide 3/30

    Equivalence Relation

    Let X be a set and let x, y, and z be elements of X.

    An equivalence relation R on X is a relation on X such that:

    Reflexive Property: xRx for all x in X.

    Symmetric Property: if xRy, then yRx.

    Transitive Property: if xRy and yRz, then xRz.
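    As a quick illustration (my own, not from the slides; the set X and the "same parity" relation are invented), the three properties can be checked mechanically for a finite relation given as a set of pairs:

        # Hypothetical finite example: R relates elements of equal parity.
        X = {0, 1, 2, 3}
        R = {(x, y) for x in X for y in X if x % 2 == y % 2}

        reflexive = all((x, x) in R for x in X)
        symmetric = all((y, x) in R for (x, y) in R)
        transitive = all((x, z) in R for (x, y) in R
                         for (y2, z) in R if y == y2)
        print(reflexive, symmetric, transitive)  # True True True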

  • Slide 4/30

    Rough Sets Theory

    Let $T = (U, A, C, D)$ be a decision system, where U is a non-empty, finite set called the universe, A is a non-empty, finite set of attributes, and C and D are subsets of A, the condition and decision attribute subsets respectively.

    Every attribute $a \in A$ is a function $a : U \to V_a$, where $V_a$ is called the value set of a.

    The elements of U are objects, cases, states, or observations. The attributes are interpreted as features, variables, characteristics, conditions, etc.

  • Slide 5/30

    Indiscernibility Relation

    The indiscernibility relation IND(P) is an equivalence relation.

    Let $P \subseteq A$. The indiscernibility relation IND(P) is defined as follows:

    $IND(P) = \{ (x, y) \in U \times U : a(x) = a(y) \ \text{for all } a \in P \}$
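    A minimal Python sketch of IND(P) (my illustration; the toy table and names are invented): objects that agree on every attribute in P land in the same elementary set.

        from collections import defaultdict

        def ind_classes(U, P, value):
            """Partition U into the elementary sets of IND(P);
            value(x, a) returns the value of attribute a on object x."""
            groups = defaultdict(set)
            for x in U:
                groups[tuple(value(x, a) for a in P)].add(x)
            return list(groups.values())

        table = {'x1': {'a': 1, 'b': 0}, 'x2': {'a': 1, 'b': 0},
                 'x3': {'a': 2, 'b': 1}}
        print(ind_classes(table, ['a', 'b'], lambda x, a: table[x][a]))
        # [{'x1', 'x2'}, {'x3'}]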

  • Slide 6/30

    Indiscernibility Relation

    The indiscernibility relation defines a partition of U.

    Let $P \subseteq A$. U/IND(P) denotes the family of all equivalence classes of the relation IND(P), called elementary sets.

    Two other families of equivalence classes, U/IND(C) and U/IND(D), called the condition and decision equivalence classes respectively, can also be defined.

  • Slide 7/30

    R-lower approximation

    Let $X \subseteq U$ and $R \subseteq C$, where R is a subset of the conditional features. The R-lower approximation set of X is the set of all elements of U which can with certainty be classified as elements of X:

    $\underline{R}X = \bigcup \{ Y \in U/R : Y \subseteq X \}$

    The R-lower approximation set of X is a subset of X.

  • Slide 8/30

    R-upper approximation

    The R-upper approximation set of X is the set of all elements of U such that:

    $\overline{R}X = \bigcup \{ Y \in U/R : Y \cap X \neq \emptyset \}$

    X is a subset of its R-upper approximation set. The R-upper approximation contains all data which can possibly be classified as belonging to the set X.

    The R-boundary set of X is defined as:

    $BN_R(X) = \overline{R}X - \underline{R}X$
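    Both approximations, the boundary, and the accuracy ratio of the next slide reduce to set algebra over the elementary sets U/R; a sketch under the same toy conventions (function names are mine):

        def lower(classes, X):
            """R-lower approximation: union of elementary sets contained in X."""
            return set().union(*(Y for Y in classes if Y <= X))

        def upper(classes, X):
            """R-upper approximation: union of elementary sets that meet X."""
            return set().union(*(Y for Y in classes if Y & X))

        def boundary(classes, X):
            """BN_R(X) = upper(X) - lower(X)."""
            return upper(classes, X) - lower(classes, X)

        def accuracy(classes, X):
            """Card(lower) / Card(upper); 1.0 when the upper set is empty."""
            up = upper(classes, X)
            return len(lower(classes, X)) / len(up) if up else 1.0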

  • Slide 9/30

    Representation of the approximation sets

    If $\underline{R}X = \overline{R}X$ then X is R-definable (the boundary set is empty); if $\underline{R}X \neq \overline{R}X$ then X is rough with respect to R.

    ACCURACY := Card(lower) / Card(upper)

  • Slide 10/30

    Decision Class

    The decision d determines a partition of the universe U:

    $CLASS_T(d) = \{ X_1, \dots, X_{r(d)} \}$

    where $X_k = \{ x \in U : d(x) = k \}$ for $1 \le k \le r(d)$.

    $CLASS_T(d)$ will be called the classification of objects in T determined by the decision d. The set $X_k$ is called the k-th decision class of T.

  • Slide 11/30

    Decision Class

    This information system has 3 classes. We represent the partition: the lower approximation, the upper approximation, and the boundary set.

  • Slide 12/30

    Rough Sets Theory

    Let us consider U = {x1, x2, x3, x4, x5, x6, x7, x8} and the equivalence relation R with the equivalence classes X1 = {x1, x3, x5}, X2 = {x2, x4} and X3 = {x6, x7, x8}, which form a partition.

    Let the classification C = {Y1, Y2, Y3} be such that Y1 = {x1, x2, x4}, Y2 = {x3, x5, x8}, Y3 = {x6, x7}.

    Only Y1 has a non-empty lower approximation, namely $\underline{R}Y_1 = X_2$.
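    That claim can be checked mechanically with the lower function sketched after slide 8 (inlined here so the snippet runs on its own):

        def lower(classes, X):
            return set().union(*(Y for Y in classes if Y <= X))

        classes = [{'x1', 'x3', 'x5'}, {'x2', 'x4'}, {'x6', 'x7', 'x8'}]
        for name, Y in [('Y1', {'x1', 'x2', 'x4'}),
                        ('Y2', {'x3', 'x5', 'x8'}),
                        ('Y3', {'x6', 'x7'})]:
            print(name, lower(classes, Y))
        # Y1 {'x2', 'x4'}  <- equals X2; Y2 and Y3 give the empty set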

  • Slide 13/30

    Positive region and Reduct

    The positive region $POS_R(d)$ of the classification $CLASS_T(d)$ is equal to the union of the lower approximations of all decision classes.

    Reducts are defined as minimal subsets of condition attributes which preserve the positive region defined by the set of all condition attributes, i.e., a subset $R \subseteq C$ is a relative reduct iff:

    1. $POS_R(D) = POS_C(D)$;

    2. for every proper subset $R' \subset R$, condition 1 is not true.
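    A brute-force sketch of both notions (my illustration; exponential in |C|, so only for small attribute sets):

        from collections import defaultdict
        from itertools import combinations

        def partition(U, attrs, value):
            groups = defaultdict(set)
            for x in U:
                groups[tuple(value(x, a) for a in attrs)].add(x)
            return list(groups.values())

        def positive_region(U, attrs, d_classes, value):
            """Union of the lower approximations of all decision classes."""
            return {x for Y in partition(U, attrs, value)
                    for X in d_classes if Y <= X for x in Y}

        def relative_reducts(U, C, d_classes, value):
            """All minimal R with POS_R(D) = POS_C(D)."""
            full, found = positive_region(U, C, d_classes, value), []
            for k in range(1, len(C) + 1):
                for R in combinations(C, k):
                    if (positive_region(U, R, d_classes, value) == full
                            and not any(set(S) <= set(R) for S in found)):
                        found.append(R)
            return found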

  • Slide 14/30

    Dependency coefficient

    The dependency coefficient is a measure of association. The dependency coefficient between condition attributes A and a decision attribute d is defined by the formula:

    $\gamma(A, d) = \dfrac{Card(POS_A(d))}{Card(U)}$

    where Card represents the cardinality of a set.
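    Building directly on the positive_region helper from the previous sketch (the name gamma follows the usual rough-set notation):

        def dependency(U, A_attrs, d_classes, value):
            """gamma(A, d) = Card(POS_A(d)) / Card(U)."""
            return len(positive_region(U, A_attrs, d_classes, value)) / len(U)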

  • Slide 15/30

    Discernibility matrix

    Let U = {x1, x2, x3, ..., xn} be the universe of a decision system. The discernibility matrix is defined by:

    $m_{ij} = \{ a \in C : a(x_i) \neq a(x_j) \wedge \exists d \in D,\ d(x_i) \neq d(x_j) \}, \quad i, j = 1, 2, 3, \dots, n$

    where $m_{ij}$ is the set of all attributes that classify objects $x_i$ and $x_j$ into different decision classes of the U/D partition.

    $CORE(C) = \{ a \in C : m_{ij} = \{a\} \ \text{for some } i, j \}$
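    A sketch of both definitions (my naming; objects are indexed so the matrix can stay lower-triangular, as on the next slides):

        def discernibility_matrix(objs, C, d, value):
            """m[i][j] (j < i): attributes that distinguish x_i from x_j
            when the two objects carry different decisions."""
            n = len(objs)
            m = [[set() for _ in range(n)] for _ in range(n)]
            for i in range(n):
                for j in range(i):
                    if d(objs[i]) != d(objs[j]):
                        m[i][j] = {a for a in C
                                   if value(objs[i], a) != value(objs[j], a)}
            return m

        def core(m):
            """CORE(C): attributes that occur as singleton matrix entries."""
            return {a for row in m for cell in row
                    if len(cell) == 1 for a in cell}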

  • Slide 16/30

    Dispensable feature

    Let R be a family of equivalence relations and let $P \in R$. P is dispensable in R if IND(R) = IND(R - {P}); otherwise P is indispensable in R.

    The set of all indispensable relations in C will be called the core of C:

    $CORE(C) = \bigcap RED(C)$

    where RED(C) is the family of all reducts of C.

  • Slide 17/30

    Small Example

    Let U = {x1, x2, x3, x4, x5, x6, x7} be the universe set, C = {a1, a2, a3, a4} the conditional features set, and D = {d} the decision features set.

          a1  a2  a3  a4   d
    x1     1   0   2   1   1
    x2     1   0   2   0   1
    x3     1   2   0   0   2
    x4     1   2   2   1   0
    x5     2   1   0   0   2
    x6     2   1   1   0   2
    x7     2   1   2   1   1

  • Slide 18/30

    Discernibility Matrix

             x1               x2             x3               x4               x5         x6
    x2       -
    x3       {a2,a3,a4}       {a2,a3}
    x4       {a2}             {a2,a4}        {a3,a4}
    x5       {a1,a2,a3,a4}    {a1,a2,a3}     -                {a1,a2,a3,a4}
    x6       {a1,a2,a3,a4}    {a1,a2,a3}     -                {a1,a2,a3,a4}    -
    x7       -                -              {a1,a2,a3,a4}    {a1,a2}          {a3,a4}    {a3,a4}

  • Slide 19/30

    Example

    Then Core(C) = {a2}.

    The partition produced by the core is U/{a2} = {{x1, x2}, {x5, x6, x7}, {x3, x4}}, and the partition produced by the decision feature d is U/{d} = {{x4}, {x1, x2, x7}, {x3, x5, x6}}.
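    Feeding the table from the Small Example slide into the discernibility-matrix sketch after slide 15 reproduces this result (a continuation of that sketch):

        table = {'x1': (1, 0, 2, 1, 1), 'x2': (1, 0, 2, 0, 1),
                 'x3': (1, 2, 0, 0, 2), 'x4': (1, 2, 2, 1, 0),
                 'x5': (2, 1, 0, 0, 2), 'x6': (2, 1, 1, 0, 2),
                 'x7': (2, 1, 2, 1, 1)}          # columns a1..a4, then d
        objs = sorted(table)
        m = discernibility_matrix(objs, C=[0, 1, 2, 3],
                                  d=lambda x: table[x][4],
                                  value=lambda x, a: table[x][a])
        print(core(m))  # {1}, i.e. attribute a2, as on this slide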

  • Slide 20/30

    Similarity relation

    A similarity relation on the set of objects assigns to every object x the set

    $SIM_T(x) = \{ y \in U : y \ SIM_T \ x \}$

    which contains all objects similar to x.

    Lower approximation: for $X \subseteq U$,

    $\underline{SIM_T}(X) = \{ x \in X : SIM_T(x) \subseteq X \}$

    is the set of all elements of U which can with certainty be classified as elements of X.

    Upper approximation:

    $\overline{SIM_T}(X) = \bigcup_{x \in X} SIM_T(x)$

    SIM-positive region of a partition: let $X_i = \{ x \in U : d(x) = i \}$ for $i = 1, \dots, r(d)$; then

    $POS_{SIM_T}(\{d\}) = \bigcup_{i=1}^{r(d)} \underline{SIM_T}(X_i)$

  • Slide 21/30

    Similarity measures

    Similarity for a numeric attribute a:

    $S_a(v_i, v_j) = 1 - \dfrac{|v_i - v_j|}{a_{max} - a_{min}}$

    A parameterized variant sets $S_a(v_i, v_j) = 1$ if $v_i - \alpha_a \le v_j \le v_i + \beta_a$ and 0 otherwise, where $\alpha_a, \beta_a$ are parameters; this measure is not symmetric.

    Similarity for a nominal attribute:

    $S_a(v_i, v_j) = \dfrac{1}{r(d)} \sum_{k=1}^{r(d)} P(d_k \mid a = v_i) \cdot P(d_k \mid a = v_j)$
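    A sketch of the two measures as reconstructed above, plus one way (an assumption of mine, not stated on the slide) to turn attribute-wise similarity into a SIM_T set via a global threshold:

        def sim_numeric(vi, vj, a_min, a_max):
            """S_a(vi, vj) = 1 - |vi - vj| / (a_max - a_min)."""
            return 1.0 - abs(vi - vj) / (a_max - a_min)

        def sim_nominal(vi, vj, cond_prob, n_classes):
            """Reconstructed nominal measure: mean over decision classes k of
            P(d_k | a = vi) * P(d_k | a = vj); cond_prob(k, v) estimates
            P(d_k | a = v) from the data."""
            return sum(cond_prob(k, vi) * cond_prob(k, vj)
                       for k in range(1, n_classes + 1)) / n_classes

        def sim_set(x, U, sim, theta=0.8):
            """SIM_T(x) under an assumed global threshold theta."""
            return {y for y in U if sim(x, y) >= theta}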


  • Slide 23/30

    Attribute Reduction

    The purpose is to select a subset of attributes from the original set of attributes to use in the rest of the process.

    Selection criterion: the reduct concept. The reduct is the essential part of the knowledge, which defines all basic concepts.

    Other methods are:

    Discernibility matrix (n × n).

    Generate all combinations of attributes and then evaluate the classification power or dependency coefficient (complete search).

  • Slide 24/30

    Discretization Methods

    The purpose is to develop an algorithm that finds a consistent set of cut points which minimizes the number of regions that are consistent. Discretization methods based on Rough Set theory try to find these cut points.

    The underlying decision problem: given a set S of points P1, ..., Pn in the plane R², partitioned into two disjoint categories S1, S2, and a natural number T, is there a consistent set of lines such that the partition of the plane into regions defined by them consists of at most T regions?
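    The slide states the geometric decision problem; as a one-dimensional illustration (my own, not the slide's algorithm), candidate cuts are often placed midway between consecutive attribute values whose objects disagree on the decision:

        def boundary_cuts(values, labels):
            """Midpoints between consecutive distinct values that carry
            different decision labels: the candidate cut points."""
            pairs = sorted(zip(values, labels))
            return [(v1 + v2) / 2
                    for (v1, l1), (v2, l2) in zip(pairs, pairs[1:])
                    if v1 != v2 and l1 != l2]

        print(boundary_cuts([1, 2, 3, 5], ['L', 'L', 'H', 'H']))  # [2.5]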

  • Slide 25/30

    Consistent

    Def. A set of cuts P is consistent with A (or A-consistent) iff $\partial_A = \partial_{A^P}$, where $\partial_A$ and $\partial_{A^P}$ are the general decisions of A and $A^P$ respectively.

    Def. A set $P^{irr}$ of cuts is A-irreducible iff $P^{irr}$ is A-consistent and no proper subfamily P ($P \subset P^{irr}$) is A-consistent.

  • Slide 26/30

    Level of Inconsistency

    Let B be a subset of A, and let

    $L_c = \dfrac{\sum_i Card(\underline{B} X_i)}{Card(U)}$

    where the $X_i$ form a classification of U, with $X_i \cap X_j = \emptyset$ for $i \neq j$ and $X_i \subseteq U$, i = 1, 2, ..., n.

    $L_c$ represents the percentage of instances which can be correctly classified into class $X_i$ with respect to the subset B.
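    A direct transcription into Python (my naming; B_classes is U/IND(B) and X_classes is the classification):

        def level_of_consistency(U, B_classes, X_classes):
            """L_c = sum_i Card(B-lower(X_i)) / Card(U); the X_i are
            pairwise disjoint, so each elementary set is counted once."""
            covered = sum(len(Y) for X in X_classes
                          for Y in B_classes if Y <= X)
            return covered / len(U)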

  • Slide 27/30

    Data Imputation

    The rules of the system should have maximum consistency.

    The set of relevant attributes for x is defined by

    $rel_R(x) = \{ a \in R : a(x) \ \text{is defined} \}$

    and the relation $R_c$ by

    $x \ R_c \ y \iff a(x) = a(y) \ \text{for all } a \in rel_R(x) \cap rel_R(y)$

    x and y are consistent if $x \ R_c \ y$.

    Example: let x = (1, 3, ?, 4), y = (2, ?, 5, 4) and z = (1, ?, 5, 4). Then x and z are consistent ($x \ R_c \ z$), while x and y are not consistent.
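    The example is easy to replay in Python ('?' stands for a missing value, as on the slide; helper names are mine):

        def rel(x):
            """Indices of the attributes that are defined (not '?') on x."""
            return {i for i, v in enumerate(x) if v != '?'}

        def consistent(x, y):
            """x R_c y: x and y agree on every attribute defined in both."""
            return all(x[i] == y[i] for i in rel(x) & rel(y))

        x, y, z = (1, 3, '?', 4), (2, '?', 5, 4), (1, '?', 5, 4)
        print(consistent(x, z), consistent(x, y))  # True False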

  • Slide 28/30

    Decision rules

         F1  F2  F3  F4   D   Rules
    O3    0   0   0   1   L   R1
    O5    0   0   1   3   L   R1
    O1    0   1   0   2   L   R2
    O4    0   1   1   0   M   R3
    O2    1   1   0   2   H   R4

    Rule 1: if (F2 = 0) then (D = L)
    Rule 2: if (F1 = 0) then (D = L)
    Rule 3: if (F4 = 0) then (D = M)
    Rule 4: if (F1 = 1) then (D = H)

    The algorithm should minimize the number of features included in the decision rules.

  • Slide 29/30

    References

    [1] Gediga, G. and Düntsch, I. (2002) Maximum Consistency of Incomplete Data via Non-Invasive Imputation. Artificial Intelligence.

    [2] Grzymala-Busse, J. and Siddhaye, S. (2004) Rough Set Approaches to Rule Induction from Incomplete Data. Proceedings of IPMU 2004, the 10th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems.

    [3] Pawlak, Z. (1995) Rough Sets. Proceedings of the 1995 ACM 23rd Annual Conference on Computer Science.

    [4] Tay, F. and Shen, L. (2002) A Modified Chi2 Algorithm for Discretization. IEEE Transactions on Knowledge and Data Engineering, Vol. 14, No. 3, May/June.

    [5] Zhong, N. (2001) Using Rough Sets with Heuristics for Feature Selection. Journal of Intelligent Information Systems, 16, 199-214, Kluwer Academic Publishers.

  • Slide 30/30

    THANK YOU!