Revisiting FAUST for One-class Classification (one class C)


Page 1

Revisiting FAUST for One-class Classification (one class C):

I. Let x be an unclassified sample. Let D_x be the vector from VoM_C to x. Use UDR to construct the count distribution of D_x o C (down to 2^k intervals for some small k). Use as a cut point the point where the D_x o C count drops below a threshold (e.g., 0) starting from the VoM_C side (or use as the cut point the last precipitous count decrease?).

Classify x according to where D_x o x falls with respect to that cut point. There may be a trick or shortcut that would speed this up markedly! Also, we might classify x "not in C" if it is gapped away from C in the D_x distribution, e.g., counts 2 3 7 6 4 8 0 0 [D_x o x falls here] 0 0 0 4 5 7 8 9.

To classify a large batch X this may be slow, since we'd start over with a new D_x for each x∈X. If we have an unclassified sample set X = {x_1..x_n}, deriving the SPTSs D_{x_k} o C may be doable as a batch (one loop through C)?
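A minimal sketch of I (assuming NumPy arrays stand in for pTrees/SPTSs; the names vom and classify_one, the bin count, and the cut search are illustrative choices, not FAUST code):

```python
import numpy as np

def vom(C):
    """Vector of Medians of class C: the componentwise median."""
    return np.median(C, axis=0)

def classify_one(x, C, n_bins=16, threshold=0):
    """x is in C iff D_x o x falls below the cut point, where the cut is the
    first interval, walking away from VoM_C, whose D_x o C count <= threshold."""
    v = vom(C)
    d = (x - v) / np.linalg.norm(x - v)   # D_x, normalized
    proj_C = C @ d                        # D_x o C (UDR builds these counts in FAUST)
    counts, edges = np.histogram(proj_C, bins=n_bins)
    start = max(np.searchsorted(edges, v @ d) - 1, 0)
    cut = edges[-1]                       # default if no drop is found before the end
    for i in range(start, n_bins):
        if counts[i] <= threshold:
            cut = edges[i]
            break
    return (x @ d) < cut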

II. (I is lazy; II is model-based.) Approximate the entire boundary of C first, then use it as our model-based classifier for X.

C might be approximated as the set of points "inside" the intersection of half-spaces, each of which is the C side of a hyperplane; i.e., get a series of (d,a) pairs, each of which defines a half-space, e.g., {z | d o z > a} (simplest? d_k = e_k). Next, AND the masks X o d > a, giving the X points which get classified into C (the > above will be < for some of the half-spaces). The hard question remaining here is how to determine the series of (d,a) pairs???
1. Choose the next d to be perpendicular to all previous ones (e.g., use as the series e_1, e_2, ..., e_n).
2. Use the diagonals, the e's, mean-to-median, mean-to-furthest, ...
3. Start with {e_i}; add a finer and finer grid of unit vectors until the diameter of the C-approximation is close to the diameter of C.
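A minimal sketch of II, under the assumption that each chosen d contributes two half-spaces (a lower and an upper bound on C o d), so the model is a list of (d, lo, hi) triples; NumPy boolean masks stand in for the pTree AND-masks:

```python
import numpy as np

def fit_halfspaces(C, directions):
    """For each direction d, bound C by the min and max of C o d."""
    return [(d, (C @ d).min(), (C @ d).max()) for d in directions]

def classify_batch(X, model):
    """x is classified into C iff lo <= X o d <= hi for every (d, lo, hi)."""
    mask = np.ones(len(X), dtype=bool)
    for d, lo, hi in model:
        p = X @ d
        mask &= (p >= lo) & (p <= hi)   # the pTree-level AND of half-space masks
    return mask

# e.g., with the simplest series d_k = e_k this is just a bounding box:
# model = fit_halfspaces(C, list(np.eye(C.shape[1])))
```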

III. For very high-value and durable training sets (e.g., 10 years of normal communications known not to be associated with a terrorist plot, because there was no terrorist activity over those 10 years), we might want to build a better model than II by analyzing the corners more carefully.

Let C_1 be the first approximation by circumscribing hyperplanes: for each k, the lower bounding hyperplane L_k = {x | x_k = minX_k} and the upper bounding hyperplane H_k = {x | x_k = maxX_k}. Classify x into C iff x is in C_1, i.e., minX_k ≤ x_k ≤ maxX_k for each k. (We can replace minX_k by the lowest large count change and maxX_k by the highest, or use some other outlier elimination process.)

Does C fill the corners of C_1? (For high dimensions, corners can be huge, and C can have a very different shape in each corner!) We could try to cap each corner with a round cap (r_2, r_4, ...): for each diagonal and sub-diagonal, cap perpendicular to it with the D's: D_12 = e_1 + e_2 (note Y o D_12 = Y_1 + Y_2), D_123 = e_1 + e_2 + e_3 (Y o D_123 = Y_1 + Y_2 + Y_3), etc.

This general method (of enclosing classes with piecewise-linear boundaries perpendicular to sums of dimensional unit vectors and their negatives) may be a good model-based method for multiclass classification as well??!!
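For III, a hedged sketch of generating the cap directions (the diagonals D_12, D_123, ...), which could be fed to the fit_halfspaces sketch above so that each corner is cut off by bounds on the diagonal projections; max_order and the combination scheme are illustrative assumptions:

```python
import itertools
import numpy as np

def diagonal_directions(n, max_order=3):
    """The e_i (giving the C_1 box), then e_i+e_j, e_i+e_j+e_k, ... (corner caps).
    Keeping both min and max bounds per direction covers the negated diagonals."""
    dirs = [np.eye(n)[i] for i in range(n)]
    for order in range(2, max_order + 1):
        for idx in itertools.combinations(range(n), order):
            d = np.zeros(n)
            d[list(idx)] = 1.0            # note Y o D_12 = Y_1 + Y_2, etc.
            dirs.append(d)
    return dirs
```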

Page 2

THESES

Mohammad and Arjun are approaching the deadline for finishing, and since they are working in related areas, I thought I would try to lay out my understanding of what your theses will be (to start the discussion).

Mohammad's thesis might be titled something like "Horizontal Operators and Operations for Mining Vertical Data" and could detail and compare the performance of all the SPTS operators we use and the various implementation methods. Keep in mind that "best" may vary depending upon lots of things, such as the type of data, the type of data mining, the size of the data, the complexity of the data, etc.

Even though I often recommend paper-type theses, it seems that non-paper theses are more valuable (witness how many times I refer to Yue Cui's).

Arjun's thesis could be titled "Performance Evaluation of the FAUST Methodology for Classification, Prediction and Clustering" and will compare the performance of all data mining methods in the FAUST genre (to the others in the FAUST genre, and at least roughly to the other main methods out there). The point should be made up front that for big vertical data there aren't many implementations, and the issue is speed, because applying traditional methods (to the corresponding horizontal version of the data) takes much too long. The comparison to traditional horizontal-data methods can be explained as limited to showing that pTree methods compare favorably to the others on accuracy; with respect to speed, the comparison can be a rough Big-O comparison (and might also bring in the things Dr. Wettstein pointed out to us; see the 1_4_14 notes). Of course, give a reference if you do.

The structure chart for FAUST might be:

FAUST
├── Classification Method (cut point goes where?)
│   ├── Midpt of Means
│   ├── Midpt of VoMs
│   ├── STD ratio of Means
│   └── STD ratio of Medians
├── Clustering Method
│   ├── sequence of D-lines?
│   │   ├── Mean-VoM
│   │   ├── Cycle_diags
│   │   └── Mean-furthest
│   └── cut pt goes where?
│       ├── gap
│       ├── count_change
│       └── others
└── ARM?

(We did some others that I can't recall at the moment.)

Then any of these modules might call any or all of Mohammad's SPTS procedures, and some of my stuff, as well as Dr. Wettstein's procedures.

These procedures include: dot product; add/subtract/multiply/multiply-by-constant of SPTSs; ...

My thinking was that you would performance-analyze the structure-chart stuff above, and Mohammad would detail his 2's complement stuff and then performance-analyze it (and various implementations of it), as well as the other lower-level procedural stuff.

  

Both of you would consider the various dataset types and sizes, and each would probably quote the results of the other.

Page 3

Here's the kind of thing that Md's thesis will detail (essentially on SPTS operations): computing the Squared Euclidean Distance, SED, from a point p, i.e., the SPTS (Y o p)^2, where Y is a set and p is a fixed point in n-space.

Y o p = Σ_{i=1..n} y_i p_i        ED(y,p) = SQRT( Σ_{i=1..n} (y_i - p_i)^2 )

SED(y,p) = Σ_{i=1..n} (y_i - p_i)^2 = Σ_{i=1..n} (Y_i - p_i)(Y_i - p_i) = Σ_{i=1..n} (Y_i Y_i - 2 p_i Y_i + p_i^2) = Σ_{i=1..n} Y_i Y_i - 2 Σ_{i=1..n} p_i Y_i + p o p

Md: I can calculate (Y_i - p_i) using 2's complement and then multiply (Y_i - p_i) by (Y_i - p_i) to get (Y_i - p_i)^2, then add them for i=1..n, which gives me SED (the Squared Euclidean Distance). But if we break it up as Σ_{i=1..n} (Y_i - p_i)^2 = Σ_{i=1..n} (Y_i^2 - 2 Y_i p_i + p_i^2) = Σ_{i=1..n} Y_i^2 - 2 Σ_{i=1..n} Y_i p_i + Σ_{i=1..n} p_i^2, I think we need more multiplications, and multiplication is an expensive operation.

I have a little example comparing these two methods.
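A tiny numeric check of the two SED forms (an illustrative example only; the thesis comparison would count pTree-level operations rather than floating-point ones):

```python
import numpy as np

y = np.array([3, 7, 2])
p = np.array([1, 4, 6])

# Md's way: subtract (2's complement in pTree terms), square, add.
sed_direct = np.sum((y - p) ** 2)                         # n multiplications

# Broken-up way: Sum Y_i^2 - 2 Sum p_i Y_i + p o p.
sed_expanded = np.sum(y * y) - 2 * np.sum(p * y) + p @ p  # ~2n multiplications

assert sed_direct == sed_expanded == 29
```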

Page 4

Improved Oblique FAUST

Cuts are made at count changes, not just at gaps. Count changes reveal the entry or exit of a cluster by the perpendicular hyperplane. This improves Oblique FAUST's ability to cluster big data (compared to cutting only at gaps).

We tried Improved Oblique FAUST on the Spaeth dataset successfully (it produces a full dendrogram of sub-clusterings by recursively taking the dot product with the vector from the Mean to the VoM, the Vector of Medians, and by cutting at each 25% count change in the interval count distribution produced by the UDR procedure with interval widths of 2^3).

We claim that an appropriate count change will reveal cluster boundaries almost always; i.e., almost always a precipitous count decrease will occur as the cut hyperplane exits a cluster and a precipitous count increase will occur as the cut hyperplane enters a cluster. We also claim that Improved Oblique FAUST will scale up for big data, because entering and leaving clusters "smoothly" (without noticeable count change) is no more likely for big data than for small (it's a measure=0 phenomenon).

For the count changes to reveal themselves, it may be necessary in some data settings to look for a change pattern over a distribution window, because entering a round cluster may not produce a large abrupt change in counts but may produce a noticeable change pattern over a window of counts. It may be sufficient for this purpose to just use a naive windowing, in which we stop the UDR count distribution generation process at intervals of width 2^k for some small value of k and look for consecutive count changes in that rough count distribution. This approach appears to be effective and fast.
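A minimal sketch of the naive windowed cutter on a rough UDR count distribution; the p% rule and the example numbers are illustrative:

```python
def count_change_cuts(counts, total, p=0.25):
    """Indices of interval boundaries where the count changes by >= p*total."""
    return [i for i in range(1, len(counts))
            if abs(counts[i] - counts[i - 1]) >= p * total]

# Both the precipitous decrease (cluster exit) and increase (entry) become cuts;
# a gap (a decrease followed by an increase) yields two cuts:
print(count_change_cuts([0, 6, 7, 0, 0, 5, 6, 0], total=20))  # [1, 3, 5, 7]
```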

We built the distribution down to intervals of width 2^3 = 8 for the Spaeth dataset, which has diameter 114. So, for Spaeth, we stopped UDR at interval widths equal to about 7% of the overall diameter (8/114 ≈ .07).

Outliers, especially exterior outliers, can produce a bad diameter estimate. To get a good cluster diameter estimate, we should identify and mask off exterior outliers first (before applying the Pythagorean diameter estimation formula).

Cluster outliers can be identified as singleton sub-clusters that are sufficiently gapped away from the rest of the cluster. Note that a pure outlier or anomaly detection procedure need not use the Improved Oblique FAUST method, since outliers are always surrounded by gaps and do not produce big count changes.

Points furthest from (or just far from) the VOM are high-probability candidates for exterior outliers. These can be identified and then checked for outlier status by creating the SPTS (Y o VOM)^2 and using just the high end of the UDR to mask those candidates. Of course, points that project at the extremes of any dot-product projection set are outlier candidates too.
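A sketch of masking exterior-outlier candidates from the high end of the squared-distance distribution (the quantile stand-in for "the high end of the UDR" and the top fraction are illustrative assumptions):

```python
import numpy as np

def exterior_outlier_candidates(Y, vom, top_fraction=0.02):
    """Mask of points in the extreme high tail of squared distance from the VOM."""
    sq_dist = np.sum((Y - vom) ** 2, axis=1)   # the (Y - VOM) o (Y - VOM) SPTS
    return sq_dist >= np.quantile(sq_dist, 1 - top_fraction)
```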

Page 5

FAUST technology for clustering and classification is built for speed, so that big data can be mined in human time.

Improved Oblique (IO) FAUST places cuts at all large count changes, each of which reveals a cluster boundary almost always (i.e., almost always a large count decrease occurs iff we are exiting a cluster along the cut hyperplane, and a large count increase occurs iff we are entering a cluster). IO FAUST makes a cut at each large count change in the y o d values (a gap is a large decrease followed by a large increase, so gaps are included). IO FAUST is divisive hierarchical clustering, and builds a cluster dendrogram. IO FAUST will scale up, because entering and leaving a cluster "smoothly" (without noticeable count change) is no more likely for large datasets than for small (it's a measure=0 phenomenon). Do we need BARREL FAUST at all now?

A radius estimate for a set Y is SQRT( (width(Y o d)/2)^2 + (max d-barrel radius)^2 ), assuming all outer-edge outliers have been removed.

Density Uniformity (DU) of a sub-cluster might be defined as the reciprocal of the variance of the counts. A cluster dendrogram should have a Density = count/volume label and a Density Uniformity = reciprocal-of-count-variance label on each edge. We can end a dendrogram branch as soon as Density and Density Uniformity are high enough (> thresholds DET and DUT) to save time. We can [quickly] estimate Density as count/(c_n r^n): we have the count, a radius estimate, and n, and c_n is a known constant (e.g., c_2 = π, c_3 = 4π/3, ...). In advance, we decide on a density threshold, DET, and a Density Uniformity threshold, DUT. To choose the "best" clustering, we proceed depth-first until the DET and DUT thresholds are met.
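A sketch of the edge labels, assuming density = count/(c_n r^n) with c_n the unit n-ball volume and DU the reciprocal count variance:

```python
import math
import numpy as np

def unit_ball_volume(n):
    """c_n: volume of the unit n-ball (pi for n=2, 4*pi/3 for n=3, ...)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

def radius_estimate(proj_width, barrel_radius):
    """SQRT((width(Y o d)/2)^2 + (max d-barrel radius)^2), outliers removed first."""
    return math.sqrt((proj_width / 2) ** 2 + barrel_radius ** 2)

def density(count, r, n):
    return count / (unit_ball_volume(n) * r ** n)

def density_uniformity(counts):
    return 1.0 / np.var(counts)
```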

Oblique FAUST code layering? A layer (or object, black box, or procedure) in the code called the CUTTER:
INPUTS: I.1. An SPTS. I.2. Method: cut at? (I.2.a. p% count change; I.2.b. non-uniform thresholds? I.2.c. centers of gaps only). I.3. Return sub-cluster masks (Y/N)? (an expensive step, so we wouldn't want to do it unless the counts are needed).
OUTPUTS: O.1. A pointer to a mask pTree for each new "sub-cluster" (i.e., identifying each set of points separated by consecutive cuts). O.2. The 1-count of each of those mask pTrees.
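A stub matching the CUTTER contract above (a sketch only; the real layer would return pointers to mask pTrees, and the cut points come from whichever Cut_at method is selected):

```python
import numpy as np

def cutter(spts, cut_points, return_masks=True):
    """O.1: a mask per sub-cluster (points between consecutive cuts);
    O.2: the 1-count of each mask."""
    bounds = [-np.inf] + sorted(cut_points) + [np.inf]
    out = []
    for lo, hi in zip(bounds, bounds[1:]):
        mask = (spts >= lo) & (spts < hi)
        out.append((mask if return_masks else None, int(mask.sum())))
    return out
```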

The GRAMMER:
INPUTS: I.1. An existing labeled dendrogram (labeled with, e.g., the unit vector that produced it, the density of each edge sub-cluster, ...), including the tree of pointers to a mask pTree for each node (including the root, which need not be all of the original set). I.2. The new threshold levels (if, e.g., the density threshold is lower than that of the existing dendrogram, GRAMMER prunes it).
OUTPUTS: O.1. The new labeled dendrogram.

TREEMINER UPDATE: Mark has a Hadoop-MapReduce version going with Oblique FAUST to do classification and one-class classification. He uses a Smart Distributed File System which turns tables on their side, so that columns (SPTSs, and therefore bit slices) are MapReduce rows. Each node then has access to a section of rows, so each node gets a section of the original column set. Those columns are also cut into sections.

WHAT IS NEEDED:
1. An Auto-K Clusterer, for when there is no preconceived idea of how many clusters there should be. Improved Oblique FAUST should help.
2. A New Cluster Finder (e.g., for finding anomalies). Improved Oblique FAUST should help. We need to track clusters over time (e.g., in a corpus of documents with new ones coming in). If a new batch of rows (e.g., documents) is added, and IO FAUST has already established a cluster dendrogram from a tree of dot-product vectors, density settings, etc., we just apply those to the new batch. We establish the new dendrogram (or just the new version of the single cluster being watched) by either:
a. establishing a new set of count changes based on the count changes in the new batch and those in the original (count changes in the new batch that are significant enough to be count changes of the composite and, rarely, count decreases of the batch that coincide with count increases of the original and vice versa; however, I don't think this incremental method will work for us!), or
b. redoing UDR from scratch on the composite distribution.

3. A real-time Cluster Analyzer (if I change this parameter, how does this cluster change?). The user should be able to isolate a cluster and use sliders to tune weightings (e.g., rotate the D-line) and to change the Density and DU levels.

Page 6

Choosing a clustering from a DEL- and DUL-labeled dendrogram

The algorithm for choosing the optimal clustering from a labeled dendrogram is as follows: let DET = .4 and DUT = ½.

[Figure: a dendrogram over the points A B C D E F G, each edge labeled with a density (DEL) and a density uniformity (DUL); the labeled edges read DEL=.1 DUL=1/6, DEL=.2 DUL=1/8, DEL=.4 DUL=1, DEL=.5 DUL=½, and DEL=.3 DUL=½, with the remaining edge labels blank.]

Since a full dendrogram is far bigger than the original table, we set threshold(s) and build a partial dendrogram (ending a branch when the threshold(s) are met).

Then a slider for density would work as follows: The user sets the threshold(s); we give the clustering. The user increases the threshold(s); we prune the dendrogram and give the new clustering. The user decreases the threshold(s); we build each branch down further until the new threshold(s) are exceeded and give the new clustering.

We might also want to display the dendrogram to the user and let him select a "root" for further analysis, etc.
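A sketch of the depth-first choice of clustering from the labeled dendrogram (Node is a hypothetical record type, not project code):

```python
class Node:
    def __init__(self, density, du, children=()):
        self.density, self.du, self.children = density, du, list(children)

def choose_clustering(node, det, dut):
    """Depth-first: emit a node (ending the branch) as soon as both
    thresholds are met, or when there is nothing further to split."""
    if (node.density > det and node.du > dut) or not node.children:
        return [node]
    return [leaf for c in node.children
            for leaf in choose_clustering(c, det, dut)]
```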

Page 7

APPLYING CC FAUST TO SPAETH

[Figure: the Spaeth point set on a 16x16 grid; y1=(1,1), y2=(3,1), y3=(2,2), y4=(3,3), y5=(6,2), y6=(9,3), y7=(15,1), y8=(14,2), y9=(15,3), ya=(13,4), yb=(10,9), yc=(11,10), yd=(9,11), ye=(11,11), yf=(7,8). The projection count distribution 1 3 1 0 2 0 6 2 is shown with MA cuts at 7 and 11.]

Density = Count/r^2 labeled dendrogram for LCC FAUST on Spaeth with D = Avg-to-Median, DET = .3:

Y(.15)
├── {y1,y2,y3,y4,y5}(.37)
├── {y6,yf}(.08)
│   ├── {y6}()
│   └── {yf}()
└── {y7,y8,y9,ya,yb,yc,yd,ye}(.07)
    ├── {y7,y8,y9,ya}(.39)
    └── {yb,yc,yd,ye}(1.01)

Continuing with D = AM and DET = .5 splits {y1,y2,y3,y4,y5}(.37) into {y1,y2,y3,y4}(.63) and {y5}(), and {y7,y8,y9,ya}(.39) into {y7,y8,y9}(1.27) and {ya}(); with D = AM and DET = 1, {y1,y2,y3,y4}(.63) splits into {y1,y2,y3}(2.54) and {y4}().

Density = Count/r^2 labeled dendrogram for LCC FAUST on Spaeth with D cycling through the diagonals (nnxx, nxxn, ...), DET = .3:

Y(.15)
├── {y1,y2,y3,y4,y5}(.37)
└── {y6,y7,y8,y9,ya,yb,yc,yd,ye,yf}(.09)
    ├── {y6,y7,y8,y9,ya}(.17)
    │   ├── {y6}()
    │   └── {y7,y8,y9,ya}(.39)
    └── {yb,yc,yd,ye,yf}(.25)
        ├── {yf}()
        └── {yb,yc,yd,ye}(1.01)

Labeled dendrogram for LCC FAUST on Spaeth with D = furthest-to-Avg (the D-line), DET = .3:

Y(.15)
├── {y1,y2,y3,y4,y5}(.37)
├── {y6,yf}(.08)
│   ├── {y6}()
│   └── {yf}()
└── {y7,y8,y9,ya,yb,yc,yd,ye}(.07)
    ├── {y7,y8,y9,ya}(.39)
    └── {yb,yc,yd,ye}(1.01)

Page 8: UDR, the Univariate Distribution Revealer (on Spaeth)

Depth 1 (interval width 64): the high-order bit slice p6 of the yofM values below and its complement p6' give the first split: Ct(p6') = 5 on [0,64) and Ct(p6) = 10 on [64,128).

The Spaeth set Y (15 points):
y1=(1,1)   y2=(3,1)   y3=(2,2)   y4=(3,3)   y5=(6,2)
y6=(9,3)   y7=(15,1)  y8=(14,2)  y9=(15,3)  ya=(13,4)
yb=(10,9)  yc=(11,10) yd=(9,11)  ye=(11,11) yf=(7,8)

yofM: 11 27 23 34 53 80 118 114 125 114 110 121 109 125 83

Bit slices of yofM (p6 = high-order bit, ..., p0 = low-order bit) and their complements:
p6  0 0 0 0 0 1 1 1 1 1 1 1 1 1 1     p6' 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
p5  0 0 0 1 1 0 1 1 1 1 1 1 1 1 0     p5' 1 1 1 0 0 1 0 0 0 0 0 0 0 0 1
p4  0 1 1 0 1 1 1 1 1 1 0 1 0 1 1     p4' 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0
p3  1 1 0 0 0 0 0 0 1 0 1 1 1 1 0     p3' 0 0 1 1 1 1 1 1 0 1 0 0 0 0 1
p2  0 0 1 0 1 0 1 0 1 0 1 0 1 1 0     p2' 1 1 0 1 0 1 0 1 0 1 0 1 0 0 1
p1  1 1 1 1 0 0 1 1 0 1 1 0 0 0 1     p1' 0 0 0 0 1 1 0 0 1 0 0 1 1 1 0
p0  1 1 1 0 1 0 0 0 1 0 0 1 1 1 1     p0' 0 0 0 1 0 1 1 1 0 1 1 0 0 0 0

Depth 2 (interval width 32), refining with p5/p5' within each depth-1 interval:
[0,32): 3    [32,64): 2    [64,96): 2    [96,128): 8

Depth 3 (interval width 16), refining with p4/p4':
[0,16): 1    [16,32): 2    [32,48): 1    [48,64): 1    [64,80): 0    [80,96): 2    [96,112): 2    [112,128): 6

Depth 4 (interval width 8), refining with p3/p3':
[0,8): 0     [8,16): 1     [16,24): 1    [24,32): 1    [32,40): 1    [40,48): 0    [48,56): 1    [56,64): 0
[64,72): 0   [72,80): 0    [80,88): 2    [88,96): 0    [96,104): 0   [104,112): 2  [112,120): 3  [120,128): 3


Pre-compute and enter into the ToC all DT(Y_k), plus those for selected linear functionals (e.g., d = main diagonals, ModeVector, ...). Suggestion: in our pTree-base, every pTree (basic, mask, ...) should be referenced in ToC(pTree, pTreeLocationPointer, pTreeOneCount), and these OneCts should be repeated everywhere (e.g., in every DT). The reason is that the OneCts help us select the pertinent pTrees to access, and in fact are often all we need to know about a pTree to get the answers we are after.

The resulting count tree for yofM, root to leaves:
depth 0:  15
depth 1:  5 10
depth 2:  3 2 2 8
depth 3:  1 2 1 1 0 2 2 6
depth 4:  0 1 1 1 1 0 1 0 0 0 2 0 0 2 3 3

DT(S), the distribution tree of S: let b ≡ BitWidth(S), let h be the depth of a node, and let k be the node offset. Node_{h,k} has a pointer to pTree{x∈S | F(x) ∈ [k·2^(b-h+1), (k+1)·2^(b-h+1))} and its 1-count. UDR, applied to S, a column of numbers in bit-slice format (an SpTS), produces DT(S). (In the Spaeth example, the root at depth h=0 has 1-count 15, and node_{2,3} at depth h=2 covers [96,128).)
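A sketch reproducing the Spaeth count tree above, with plain integers standing in for the bit-sliced SpTS (the real UDR gets each count by AND-ing bit-slice masks, e.g., Ct(p6'), Ct(p6' & p5'), ...):

```python
import numpy as np

yofM = np.array([11, 27, 23, 34, 53, 80, 118, 114, 125, 114, 110, 121, 109, 125, 83])
b = 7                                  # bit width: values lie in [0, 2^7)

def udr_level(vals, b, h):
    """Depth h: 2^h intervals of width 2^(b-h) over [0, 2^b)."""
    counts, _ = np.histogram(vals, bins=2 ** h, range=(0, 2 ** b))
    return counts

for h in range(5):
    print(h, udr_level(yofM, b, h))
# 0 [15]
# 1 [ 5 10]
# 2 [3 2 2 8]
# 3 [1 2 1 1 0 2 2 6]
# 4 [0 1 1 1 1 0 1 0 0 0 2 0 0 2 3 3]
```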