
Geometric margin domain description with instance-specific margins

Adam Gripton
Thursday, 5th May, 2011

Presentation Contents

• High-level motivation
• Development of system
• Exact-centre method
• Dual-optimisation method
• Experimental data collection
• Conclusions

High-level motivation

Task as originally stated:

• Expert-replacement system to deal with Non-Destructive Assay (NDA) data provided by a sponsor for analysis and classification

• Involves automatic feature extraction and inference via a classification step

High-level motivation

Data consignment:

Fissile elements
• Californium-252
• Highly Enriched Uranium
• Weapons-Grade Plutonium

Shielding methods
• Aluminium
• Steel ball
• Steel planar (to detector)
• Lead
• CHON (HE simulant)

Detectors
• Sodium Iodide scintillator (NaI)
• High-Resolution Germanium (semiconductor) spectrometer (HRGS)
• Neutron array counter (N50R)

[Diagram: experimental geometries, with source and shield placed among the NaI, HRGS and neutron detectors]

High-level motivation

Data consignment:

Spectroscopy experiments

High-level motivation

Data consignment:

        τ0          2·τ0        3·τ0        …
BX 0    279384403   138774738   91909165    …
BX 1    1805235     1785515     1770553     …
BX 2    49548       58784       65688       …
…       …           …           …           …

(multiplicity bins run from BX 0 to BX 16)

Neutron multiplicity arrays

High-level motivation

f1      f2      f3      f4      Class
0.1     0.2     -0.3    0.4     1
0.15    x       -0.2    x       1
0.05    0.22    x       x       2
x       x       -0.4    0.401   2
0.08    0.24    -0.5    0.399   3

• Features (columns) based on physically relevant projections of raw experimental data

• Class vector: refers to fissile material or shielding method

• Some data absent, either not measured or not applicable (structurally missing)

High-level motivation

Two principal aims:

1. Devise a novel contribution to the existing literature on classification methods

2. Provide a system for classifying abstract data that is applicable to the provided dataset

Presentation Contents

• High-level motivation
• Development of system
• Exact-centre method
• Dual-optimisation method
• Experimental data collection
• Conclusions

Development of system

Overview

[Diagram: Aim 1 (Novel Contribution) and Aim 2 (Applicability to Dataset) linked to three themes: Multi-Class, Kernel Methods, Missing Data]

Development of system

Overview

[Diagram: the three themes (Multi-Class, Kernel Methods, Missing Data) annotated with the relevant literature]

• SVDD (Tax, Duin) and Multi-Class Hybrid (Lee)
• Geometric SVM (Chechik)

Development of system

Working with Kernels

• “Kernel trick”: ML algorithms that query data values only implicitly, via the dot product

• Replace ⟨x, y⟩ ← k(x, y) to imitate a mapping x → φ(x) such that k(x, y) = ⟨φ(x), φ(y)⟩

• Valid if the Mercer condition holds: the Gram matrix {k(xi, xj)} is positive semidefinite

• Allows analysis in a complex superspace without the need to address its Cartesian form directly
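As a quick illustration of the bullets above, this minimal Python sketch (my own, not from the presentation) verifies numerically that the quadratic kernel k(x, y) = (1 + ⟨x, y⟩)² agrees with the dot product under an explicit degree-2 feature map φ, so an algorithm never needs to construct φ(x) itself:

```python
import numpy as np

def quad_kernel(x, y):
    """Quadratic kernel: k(x, y) = (1 + <x, y>)^2."""
    return (1.0 + np.dot(x, y)) ** 2

def phi(x):
    """Explicit degree-2 feature map for 2-D input matching quad_kernel."""
    x1, x2 = x
    r2 = np.sqrt(2.0)
    return np.array([1.0, r2 * x1, r2 * x2, x1 ** 2, x2 ** 2, r2 * x1 * x2])

x, y = np.array([0.1, 0.2]), np.array([-0.3, 0.4])
assert np.isclose(quad_kernel(x, y), phi(x) @ phi(y))  # same value both ways
```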

Development of system

Support Vector Domain Description

• “One-class classification”

• Fits a sphere around a cluster of data, allowing errors {ξi}

• Extends in kernel space to a more complex boundary

• Hybrid methods: multi-class classification

Development of system

Support Vector Domain Description

Dual formulation allows the centre to be described in kernel space via weighting factors αi:

a = Σi αi φ(xi),  with Σi αi = 1 and 0 ≤ αi ≤ C
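A hedged sketch of the standard consequence of this dual form: distances to the centre a need only kernel evaluations, since ‖φ(x) − a‖² = k(x, x) − 2 Σi αi k(xi, x) + Σi,j αi αj k(xi, xj). (Python with numpy; the variable names are illustrative.)

```python
import numpy as np

def dist_sq_to_centre(x, X, alpha, k):
    """Squared kernel-space distance from phi(x) to a = sum_i alpha_i phi(X[i])."""
    cross = sum(ai * k(xi, x) for ai, xi in zip(alpha, X))
    gram = sum(ai * aj * k(xi, xj)
               for ai, xi in zip(alpha, X)
               for aj, xj in zip(alpha, X))
    return k(x, x) - 2.0 * cross + gram

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
alpha = np.array([1 / 3, 1 / 3, 1 / 3])      # uniform weights: a is the mean
lin = lambda u, v: float(np.dot(u, v))       # linear kernel
print(dist_sq_to_centre(np.array([1.0, 1.0]), X, alpha, lin))  # 8/9
```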

Development of system

Support Vector Domain Description

Values of αi form a partition:
• αi = 0 (strictly inside the sphere)
• 0 < αi < C (on the boundary: the support vectors)
• αi = C (outside the sphere)

Only support vectors determine size and position of sphere
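For concreteness, a minimal sketch (assuming scipy; a dedicated QP solver would normally be used) of solving the SVDD dual, min over α of αᵀKα − αᵀdiag(K) subject to Σ αi = 1 and 0 ≤ αi ≤ C, after which the nonzero αi identify the support vectors:

```python
import numpy as np
from scipy.optimize import minimize

def svdd_dual(K, C=1.0):
    """Solve the SVDD dual with a generic solver (fine for small n)."""
    n = K.shape[0]
    obj = lambda a: a @ K @ a - a @ np.diag(K)          # negated dual objective
    cons = [{"type": "eq", "fun": lambda a: a.sum() - 1.0}]
    res = minimize(obj, np.full(n, 1.0 / n), method="SLSQP",
                   bounds=[(0.0, C)] * n, constraints=cons)
    return res.x

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
alpha = svdd_dual(X @ X.T, C=0.5)                       # linear-kernel Gram
print(np.flatnonzero(alpha > 1e-6))                     # support vector indices
```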

Development of system

Dealing with Missing Data

• Cannot use kernel methods directly with missing features

• Must impute (fill in) or assume a probability distribution for missing values: a pre-processing step

• Missing features describe complex parametric curves in kernel space

• Seek a method which can address incomplete data directly: minimise point-to-line distances
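The point-to-line idea can be made concrete with a small input-space sketch (illustrative only, assuming scipy): a point with one unobserved coordinate is a line of candidate completions, and its distance to a centre is minimised over that free coordinate rather than fixed by imputation first.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def dist_with_missing(x, c):
    """Min distance from centre c to the line of completions of x."""
    present = ~np.isnan(x)
    def completed(t):
        z = np.where(present, x, t)      # fill the missing slot with t
        return float(np.linalg.norm(z - c))
    return minimize_scalar(completed).fun

x = np.array([0.15, np.nan])             # second feature unobserved
print(dist_with_missing(x, c=np.array([0.0, 0.0])))   # 0.15, at t = 0
```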

Development of system

Dealing with Missing Data

• Chechik’s GM-SVM method provides an analogue of the binary SVM for structurally missing data

• Uses two loops of optimisation to replace instance-specific norms with scalings of the full norm

• Questionable applicability to kernel spaces: difficult to choose proper scaling terms, and ultimately equivalent to zero imputation

Development of system

Synopsis for Novel System

• Structurally missing features
• Abstract, context-free
• Domain description (one-class)
• Avoid imputation / probabilistic models
• Kernel extension
• Applicable to provided data

Presentation Contents

• High-level motivation
• Development of system
• Exact-centre method
• Dual-optimisation method
• Experimental data collection
• Conclusions

Exact-centre method

• Seeks a solution in input space only

• Demonstrates the concept of an optimisation-based distance metric

Exact-centre method

• Cannot sample from the entire feature space!

• Selects a centre point a such that φ(a) is the optimal centre (hence solves a slightly different problem)

• Tricky (but possible) to optimise for soft margins
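A minimal sketch of the exact-centre idea (my illustration, not the thesis code; assumes scipy): restrict the centre to φ(a) for an input-space point a, and choose a to minimise the hard-margin radius, using ‖φ(a) − φ(xi)‖² = k(a, a) − 2 k(a, xi) + k(xi, xi).

```python
import numpy as np
from scipy.optimize import minimize

def exact_centre(X, k):
    """Pick input-space a minimising the worst kernel-space distance."""
    def radius_sq(a):
        return max(k(a, a) - 2.0 * k(a, xi) + k(xi, xi) for xi in X)
    # Nelder-Mead copes with the non-smooth max objective
    res = minimize(radius_sq, X.mean(axis=0), method="Nelder-Mead")
    return res.x, res.fun                # centre point a, squared radius

quad = lambda u, v: (1.0 + float(np.dot(u, v))) ** 2   # quadratic kernel
X = np.array([[0.1, 0.2], [0.15, -0.2], [0.05, 0.22]])
print(exact_centre(X, quad))
```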

Exact-centre method

• Always performs at least as well as imputation in linear space w.r.t. sphere volume

• Often underperforms in quadratic space (expected, as the domain is restricted)

Presentation Contents

• High-level motivation
• Development of system
• Exact-centre method
• Dual-optimisation method
• Experimental data collection
• Conclusions

Dual-optimisation method

• Motivated by the desire to search over the entire kernel feature space, to match imputation methods for non-trivial kernel maps

• Takes its lead from the dual formulation of SVDD, where weighting factors αi are appended to the dataset and implicitly describe the centre a

Dual-optimisation method

• a must itself have full features, and therefore so must the “xi” in the sum

• Must therefore provide an auxiliary dataset X* with full features to perform this computation

• Choice of X* is largely arbitrary, but it must span the feature space

• Weighting factors no longer “tied” to the dataset

Dual-optimisation method

Given an initial guess α:

• Need first to produce a full dataset Xa optimally aligned to a, by optimisation over all possible imputations of the incomplete dataset

• Then need to perform a minimax optimisation step on the vector of point-to-centre distances: minα maxi ‖φ(xi) − a(α)‖²

• New candidate α produced at each optimisation step
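A hedged sketch of that nested structure (illustrative names, linear kernel for brevity, constraints on α omitted): the outer loop proposes weights α over an auxiliary complete dataset, the inner step completes each incomplete point as close to the implied centre as possible, and the worst point-to-centre distance is the minimax objective.

```python
import numpy as np
from scipy.optimize import minimize

def best_completion(x, a):
    """Inner step: move the free coordinates of x onto the centre a."""
    return np.where(np.isnan(x), a, x)

def worst_dist(alpha, Xstar, X_inc):
    a = alpha @ Xstar                    # centre implied by the weights
    return max(np.linalg.norm(best_completion(x, a) - a) for x in X_inc)

Xstar = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # auxiliary, complete
X_inc = np.array([[0.1, np.nan], [np.nan, 0.4], [0.2, 0.3]])
res = minimize(lambda al: worst_dist(al, Xstar, X_inc),
               np.full(len(Xstar), 1.0 / len(Xstar)), method="Nelder-Mead")
print(res.x, res.fun)                    # weights alpha, minimax radius
```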

Presentation Contents

• High-level motivation
• Development of system
• Exact-centre method
• Dual-optimisation method
• Experimental data collection
• Conclusions

Experimental data collection

Synthetic Data

Preparatory trials on datasets constructed to exhibit a degree of “structural missingness”:

• 2-D cluster of data with censoring applied to all values |x| > 1

• Two disjoint clusters: one in [f1, f2], one in [f3, f4]

• One common dimension and three other dimensions, each common to one part of the set

Experimental data collection

Structure of comparisons:

Synthetic Data

Kernels:
• Linear: K(x, y) = ⟨x, y⟩
• Quadratic: K(x, y) = (1 + ⟨x, y⟩)²

Margins:
• Hard margin (all points within sphere)
• Soft margin (50% outwith sphere)

Imputation with [zeros, feature means, 3 nearest neighbours]
vs.
our XC and DO methods
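The three baselines are standard; a minimal sketch using scikit-learn (an assumption: any equivalent implementation would do):

```python
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

X = np.array([[0.1, 0.2], [0.15, np.nan], [np.nan, 0.22], [0.08, 0.24]])

zeros = SimpleImputer(strategy="constant", fill_value=0.0).fit_transform(X)
means = SimpleImputer(strategy="mean").fit_transform(X)
knn3 = KNNImputer(n_neighbors=3).fit_transform(X)    # 3 nearest neighbours
```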

Experimental data collection

Structure of comparisons:

Synthetic Data


• Dual-optimisation method on hard margins only

• Particle-Swarm Optimisation (sketched below) also used to provide a cross-validated classification study

• Main study is into the effect on sphere size
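For reference, a compact generic Particle-Swarm Optimisation routine (my sketch; the study's actual PSO settings are not given in the slides):

```python
import numpy as np

def pso(f, dim, n=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise f with a standard global-best swarm, seeded in [-1, 1]^dim."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, (n, dim))         # particle positions
    v = np.zeros_like(x)                         # particle velocities
    pbest, pval = x.copy(), np.array([f(p) for p in x])
    g = pbest[pval.argmin()].copy()              # global best position
    for _ in range(iters):
        r1, r2 = rng.random((2, n, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        fx = np.array([f(p) for p in x])
        better = fx < pval
        pbest[better], pval[better] = x[better], fx[better]
        g = pbest[pval.argmin()].copy()
    return g, float(pval.min())

print(pso(lambda p: float(np.sum(p ** 2)), dim=2))   # converges near (0, 0)
```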

Experimental data collection

Feature Extraction

Four main feature groups selected for analysis:
• Compton edge position (6 features)
• Area under graph up to Compton edge (6)
• Mean multiplicity of neutron data (1)
• Poisson fit on neutron data (9) and chi-squared goodness-of-fit (3)

Total: 25 features

Experimental data collection

Feature Extraction

PCA used on groups of features with identical presence flags to reduce the dataset to 10 principal components, leaving the missingness pattern intact
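A hedged sketch of this step (group boundaries and component counts are illustrative; assumes scikit-learn): each block of features sharing a presence flag is reduced on the rows where it is observed, so rows that lack the block stay missing in the reduced representation.

```python
import numpy as np
from sklearn.decomposition import PCA

def groupwise_pca(X, groups, n_components):
    """Reduce each feature group separately; absent rows stay NaN."""
    blocks = []
    for cols, n_c in zip(groups, n_components):
        block = X[:, cols]
        present = ~np.isnan(block).any(axis=1)   # rows where group observed
        out = np.full((X.shape[0], n_c), np.nan)
        out[present] = PCA(n_components=n_c).fit_transform(block[present])
        blocks.append(out)
    return np.hstack(blocks)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 6))
X[:10, 3:] = np.nan                              # one group absent for 10 rows
Z = groupwise_pca(X, [[0, 1, 2], [3, 4, 5]], [2, 2])
print(Z.shape, bool(np.isnan(Z[:10, 2:]).all())) # (50, 4) True
```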

Presentation Contents

• High-level motivation
• Development of system
• Exact-centre method
• Dual-optimisation method
• Experimental data collection
• Conclusions

Conclusions

• Dual-Opt method generally equalled or surpassed imputation methods in hard-margin cases; the XC method, predictably, did not perform as well in the quadratic case

• Unreasonably small spheres start appearing with a soft-margin classifier, as datapoints with few features start holding too much weight

• Cross-validation study using a joint optimiser shows improvement with the quadratic kernel

Conclusions

• Insight provided into the behaviour of a kernel method with missing data: little of the literature deals with this issue

• A link exists with the Randomised Maximum Likelihood (RML) sampling technique

• Deliberate concentration for now on entirely uninformed methods; scope exists to incorporate information about missing values, where known, to improve efficiency

Conclusions

Caveats

• Sphere size ≠ overall classification accuracy (cf. a delta-function Parzen window), but this is arguably not what we set out to achieve

• Divergent remit: not a catch-all procedure for handling all types of data, but gives insight into how structural missingness can be analysed

Conclusions

Room for Improvement

• Fuller exploration of the PSJO (Particle-Swarm Joint Optimisation) technique to provide an alternative to the auxiliary dataset

• Heavily reliant on optimisation procedures: could be made more efficient than a nested loop

• Extension to the popular radial-basis-function (RBF) kernel

• A more concrete application to the sponsor dataset

Thank you for listening…
