Overview
A Quantum Computation Simulation Language
Anomaly Detection in the Windows Registry
Detecting Splice Sites in Genes
Rotationally Invariant Face Detection
Q-HSK: A Quantum Programming Language and Compiler
Katherine Heller, Krysta Svore, Maryam Kamvar (Al Aho)
What is Q-HSK?
Quantum Computation Simulation Language
Quantum Compiler
Q-HSK enables simplified programming of quantum algorithms with built-in graphics
Many Worlds Interpretation
One formulation of quantum theory
Each universe has a corresponding amplitude (i.e. complex number)
$|\text{amplitude}|^2$ = probability of existence
[Figure: diagram labeled x with universes u1, u2, u3, u4]
Qubits
Quantum analogue of a classical bit
Takes on values 0, 1, or superposition of states:
$|\omega\rangle = \alpha|0\rangle + \beta|1\rangle$ where $|\alpha|^2 + |\beta|^2 = 1$
$|\omega\rangle = \cos(\theta/2)\,|0\rangle + e^{i\phi}\sin(\theta/2)\,|1\rangle$
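As a small illustration of the angle parametrization above, the following Python sketch builds the two amplitudes from θ and φ and checks that the squared magnitudes sum to 1. The function names are hypothetical and are not part of Q-HSK.

```python
import cmath
import math


def qubit_from_angles(theta, phi):
    """Return (alpha, beta) for cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    alpha = complex(math.cos(theta / 2))
    beta = cmath.exp(1j * phi) * math.sin(theta / 2)
    return alpha, beta


def probabilities(alpha, beta):
    """Squared magnitudes of the amplitudes: probabilities of reading 0 and 1."""
    return abs(alpha) ** 2, abs(beta) ** 2


# theta = pi/2 puts equal weight on |0> and |1>; phi only changes the phase.
alpha, beta = qubit_from_angles(math.pi / 2, math.pi / 4)
p0, p1 = probabilities(alpha, beta)
```

The phase φ does not affect the two probabilities, which is why only θ appears in them.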
Quantum Gates
Reversible – all unitary operators ($U^\dagger U = I$)
Universal quantum gates – {U2, XOR}, Toffoli
Some common gates – Hadamard, QFT, CNOT
[Figure: Hadamard gate H with states $|0\rangle$ and $|1\rangle$; $H|0\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$]
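The action of the Hadamard gate on a single qubit can be checked numerically. This is a minimal Python sketch that represents a state as a list of two amplitudes; `apply_gate` is an illustrative helper, not a Q-HSK function.

```python
import math

# Hadamard matrix in the {|0>, |1>} basis.
S = 1 / math.sqrt(2)
H = [[S, S],
     [S, -S]]


def apply_gate(gate, state):
    """Multiply a gate matrix by a state vector of amplitudes."""
    return [sum(gate[r][c] * state[c] for c in range(len(state)))
            for r in range(len(gate))]


zero = [1.0, 0.0]             # the state |0>
plus = apply_gate(H, zero)    # (|0> + |1>) / sqrt(2)
back = apply_gate(H, plus)    # applying H twice returns |0>, since H is unitary and self-inverse
```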
Key Features of the Q-HSK Compiler
Familiar C-style syntax
Matrix operations via CBLAS
Complex and real data types
A quantum type qreg
A graphical view of quantum algorithms
  Lucid representation of quantum qubits, registers, and gates
  Interactive user options (start, stop, pause, change animation rate)
  Detailed text output to trace algorithm
A Simple Example
int main( )
{
    int a, i;
    qreg *q;
    q = create(5);
    i = 0;
    while (i < 5)
    {
        q[i] = (0.0, 0.0);
        i = i + 1;
    }
    q = computeHadamard(q);
    a = Measure(q);
    printf("This is the measure: %d", a);
    return 0;
}
[Figure: register q initialized to 00000, followed by H and M]
Shor’s Algorithm
Factors large numbers
n – number to factorize
x – random number
a – ranges from 0 to q-1
$n^2 \le q \le 2n^2$
r – period of $x^a \pmod{n}$ – exponential time classically
one factor of n is $\gcd(x^{r/2} - 1, n)$ – fast classically
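To make the two classical pieces concrete, here is a minimal Python sketch: a brute-force period search (the step that is exponential classically and that the quantum part replaces) and the fast gcd step. The function names and the small example n = 15, x = 7 are illustrative assumptions, not taken from the slides.

```python
from math import gcd


def classical_period(x, n):
    """Smallest r > 0 with x**r = 1 (mod n), found by brute force.

    Requires gcd(x, n) == 1; this loop is the slow classical step.
    """
    if gcd(x, n) != 1:
        raise ValueError("x and n must be coprime")
    r, value = 1, x % n
    while value != 1:
        value = (value * x) % n
        r += 1
    return r


def factor_from_period(x, r, n):
    """Return gcd(x**(r/2) - 1, n) if it is a nontrivial factor, else None."""
    if r % 2 != 0:
        return None  # an odd period gives no candidate; pick another x
    candidate = gcd(pow(x, r // 2, n) - 1, n)
    return candidate if 1 < candidate < n else None


n, x = 15, 7
r = classical_period(x, n)          # 7, 4, 13, 1 -> period 4
factor = factor_from_period(x, r, n)  # gcd(7**2 - 1, 15) = gcd(48, 15) = 3
```

Reducing $x^{r/2}$ modulo n before taking the gcd does not change the result and keeps the numbers small.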
Graphical Interface
Architecture of Q-HSK Compiler
[Figure: compiler pipeline]
Program.q → Lexical Analyzer (lex.yy.c) → Syntax Analyzer (y.tab.c) → Semantic Analyzer → Translator (translate.c) → Program.cpp
Program.cpp → g++ → Executable
Java Graphics → javac
One Class Support Vector Machines for Detecting Anomalous Windows Registry Accesses
Collaborators: Krysta Svore, Angelos Keromytis, Sal Stolfo
Host Based Intrusion Detection Systems
Microsoft Windows – most often attacked
Current method to combat attacks
  Virus Scanners and Security Patches
Problem: These do not combat unknown attacks so frequent updates are needed
Host based IDS
  Monitor system accesses to detect intrusions
Application of data mining techniques
The Windows Registry and RAD
Windows Registry
  Stores configuration settings for system parameters – security information, programs, etc.
  Programs query the registry for information
Registry Anomaly Detection (RAD)
  audit sensor
  model generator
  anomaly detector
Example registry access record:
  Process: EXPLORER.EXE
  Query: OpenKey
  Key: HKCR\CKSUD\{B41DB860-8EE4-11D2-9906-EA9FADC173CA}\shellex\MayChangeDefaultMenu
  Response: SUCCESS
  ResultValue: NOTFOUND
Probabilistic Anomaly Detection Algorithm
Computes 25 consistency checks:
  $P(X_i)$ and $P(X_i \mid X_j)$
Multinomial with Hierarchical Prior
For observed elements i:
  $P(X = i) = C \cdot \dfrac{N_i + \alpha}{k_0 \alpha + N}$
  where N – total number of observations
  $N_i$ – number of observations of symbol i
  $\alpha$ – “pseudo count” for each observed symbol
  $k_0$ – number of observed symbols
  L – number of possible symbols
For unobserved elements i:
  $P(X = i) = (1 - C) \cdot \dfrac{1}{L - k_0}$
  $C = \dfrac{N}{N + L - k_0}$
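The estimator above can be sketched in a few lines of Python. The function name `make_estimator`, the default pseudo count of 1.0, and the toy observations are assumptions made for illustration; the slides do not give the value of α used in the experiments.

```python
from collections import Counter


def make_estimator(observations, num_possible, alpha=1.0):
    """Build P(X = i) from the formulas above.

    observations: list of observed symbols (N = len(observations))
    num_possible: L, the number of possible symbols
    alpha:        pseudo count per observed symbol (assumed default)
    """
    counts = Counter(observations)
    n_total = len(observations)
    k0 = len(counts)  # number of distinct observed symbols
    c = n_total / (n_total + num_possible - k0)

    def prob(symbol):
        if symbol in counts:
            return c * (counts[symbol] + alpha) / (k0 * alpha + n_total)
        if num_possible == k0:
            return 0.0  # every possible symbol was observed
        return (1 - c) / (num_possible - k0)

    return prob


# Toy data: N = 3, k0 = 2, L = 5, so C = 3 / (3 + 5 - 2) = 0.5.
prob = make_estimator(["a", "a", "b"], num_possible=5)
p_a, p_b, p_unseen = prob("a"), prob("b"), prob("z")
```

With these formulas the observed symbols share total mass C and the L − k₀ unobserved symbols share the remaining 1 − C equally, so the probabilities sum to 1.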
One Class SVMs
Analogous to two class SVM where all data lies in the first class and the origin is sole member of second class
Solve optimization problem to find rule f with maximal margin
  $f(x) = \langle w, x \rangle + b$
Equivalent to solving the dual quadratic programming problem:
  $\min_{\alpha} \ \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j)$  s.t. $0 \le \alpha_i \le \frac{1}{\nu l}$, $\sum_i \alpha_i = 1$
Kernel function projects input vectors into a feature space allowing for non-linear decision boundaries
  $\Phi: X \to \mathbb{R}^N$,  $K(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$
Experiments
Kernels:
  Linear: $K(x, y) = (x \cdot y)$
  Polynomial: $K(x, y) = (x \cdot y + 1)^d$
  Gaussian: $K(x, y) = e^{-\|x - y\|^2 / (2\sigma^2)}$
Feature Vectors:
  Binary
  Frequency-based
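The three kernels listed above are straightforward to write down. The sketch below uses plain Python lists as feature vectors; the default values d = 2 and σ = 1.0 are placeholders, since the slides do not state the parameters used.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, d=2):
    return (dot(x, y) + 1) ** d


def gaussian_kernel(x, y, sigma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2 * sigma ** 2))


x, y = [1.0, 2.0], [3.0, 4.0]
k_lin = linear_kernel(x, y)       # 1*3 + 2*4 = 11
k_poly = polynomial_kernel(x, y)  # (11 + 1)**2 = 144
k_gauss = gaussian_kernel(x, x)   # identical inputs give exp(0) = 1
```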
Results
Sequence Information for the Splicing of Human Pre-mRNA Identified by Support Vector Machine Classification
Collaborators: Xiang Zhang, Ilana Hefter, Christina Leslie, Larry Chasin
What Is Splicing?
[Figure: DNA with Exon1, Intron, Exon2 and the Donor, Branch, and Acceptor sites; in the mRNA, Exon1 and Exon2 are joined]
Pseudo Exons
Consensus Sequences
  Donor Site: MAG|gtragt (M=A/C, r=a/g)
  Acceptor Site: (y)10ncag|G (y=c/t, n=a/c/g/t)
Donor and acceptor sites scored based on closeness to consensus
Identifying Pseudo Exons
Intronic segments
Have high scoring “donor” and “acceptor” sites
We look for discriminative signals in intronic regions near real and pseudo exons
String Kernels
Feature map: number of times each k-length (contiguous) string occurs in sequence
Dimension of feature space is $N^k$
Example:
k=2  Sequence = ACCTGGTG
  AC AA AG AT CA CC CG CT GA GC GG GT TA TC TG TT
  1  0  0  0  0  1  0  1  0  0  1  1  0  0  2  0
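The k=2 example above can be reproduced with a short Python sketch of the feature map and the resulting kernel (the inner product of two count vectors). The names `spectrum_features` and `spectrum_kernel` are illustrative, not from the slides.

```python
from collections import Counter
from itertools import product


def spectrum_features(sequence, k, alphabet="ACGT"):
    """Count every contiguous k-length substring; absent k-mers get 0."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    kmers = ("".join(letters) for letters in product(alphabet, repeat=k))
    return {kmer: counts.get(kmer, 0) for kmer in kmers}


def spectrum_kernel(s, t, k):
    """Inner product of the two k-mer count vectors."""
    fs, ft = spectrum_features(s, k), spectrum_features(t, k)
    return sum(fs[kmer] * ft[kmer] for kmer in fs)


features = spectrum_features("ACCTGGTG", 2)
nonzero = {kmer: n for kmer, n in features.items() if n}
self_similarity = spectrum_kernel("ACCTGGTG", "ACCTGGTG", 2)
```

For a four-letter alphabet and k=2 the feature space has 4² = 16 dimensions, matching the 16 columns in the example.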
Splice Kernels
Hypothesis: False splice sites are intrinsically defective due to bad internal nt combinations
All possible size k internal nt combinations are features
Example (k=2): If the internal combination (3g,5a) occurs, that feature value is 1, otherwise it is 0
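One way to read the feature definition above is as the set of all size-k combinations of (position, nucleotide) pairs within a site. The Python sketch below follows that reading; the 1-based position numbering, the function names, and the example site string are assumptions, since the slides do not say how positions are indexed.

```python
from itertools import combinations


def splice_features(site, k=2):
    """All size-k combinations of (position, nucleotide) present in the site.

    Positions are numbered from 1 along the site (an assumed convention).
    """
    tagged = [(pos, nt) for pos, nt in enumerate(site.lower(), start=1)]
    return set(combinations(tagged, k))


def feature_value(site, combo):
    """1 if every (position, nucleotide) pair in combo occurs in the site, else 0."""
    ordered = tuple(sorted(combo))
    return 1 if ordered in splice_features(site, len(ordered)) else 0


site = "ctgtaagt"  # hypothetical site: g at position 3, a at position 5
has_3g_5a = feature_value(site, [(3, "g"), (5, "a")])
has_3a_5a = feature_value(site, [(3, "a"), (5, "a")])
```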
Recursive Feature Selection
Normal vector to the hyperplane:
  $w = \sum_{i=1}^{m} y_i \alpha_i x_i$
If $|w_j|$ is large, the jth feature is important for SVM discrimination
Approximation due to degree 2 polynomial kernel – calculate $w_{up}$ and $w_{down}$ separately, then eliminate bottom 50% of features for each
Stop when ROC score drops below 90% of original value on untouched test set
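A single elimination round can be sketched as follows. This Python example covers only the linear form of w and one ranking step; it does not implement the degree 2 polynomial approximation, the separate $w_{up}$ and $w_{down}$ vectors, SVM retraining, or the ROC stopping rule. The function names and toy numbers are illustrative.

```python
def weight_vector(alphas, labels, samples):
    """w = sum_i y_i * alpha_i * x_i for a linear decision function."""
    dim = len(samples[0])
    w = [0.0] * dim
    for alpha, y, x in zip(alphas, labels, samples):
        for j in range(dim):
            w[j] += y * alpha * x[j]
    return w


def eliminate_bottom_half(w, feature_ids):
    """Keep the features with the largest |w_j|; drop the bottom 50%."""
    ranked = sorted(feature_ids, key=lambda j: abs(w[j]), reverse=True)
    keep_count = len(ranked) - len(ranked) // 2
    return sorted(ranked[:keep_count])


# Toy support vectors with four features.
alphas = [0.5, 0.5]
labels = [1, -1]
samples = [[1.0, 0.0, 2.0, 0.1],
           [0.0, 0.0, -2.0, 0.1]]

w = weight_vector(alphas, labels, samples)
kept = eliminate_bottom_half(w, [0, 1, 2, 3])
```

In a full procedure this round would be repeated, retraining on the kept features each time, until the stopping criterion is met.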
Results
  Flanks     Splice Sites   Exon Body   ROC     Specificity^a
  US   DS    3'    5'
  CV^b                                  0.609   0.484
  +    –     –     –        –           0.791   0.638
  –    +     –     –        –           0.784   0.618
  +    +     –     –        –           0.855   0.695
  –    –     +     –        –           0.823   0.672
  –    –     –     +        –           0.837   0.698
  –    –     +     +        –           0.907   0.777
  +    +     +     +        –           0.932   0.825
  –    –     –     –        +           0.946   0.841
  +    +     –     –        +           0.984   0.956
  –    –     +     +        +           0.987   0.964
  +    +     +     +        +           0.991   0.976
  Splice Sites   Flanks   Exon Bodies   True positives detected
                                        32/37   35/37   37/37
  -              -        -             1225    1225    1225
  -              +        -             164     259     668
  -              -        +             108     232     383
  +              -        +             58      111     180
  +              +        +             19      53      90
Rotationally Invariant Face Detection Using Multi-Resolution Histograms
Collaborators: Shikher Bisaria, Tony Jebara
Face Detection
Given a picture with faces, how do we determine where the faces are in the image? Which pixels are face pixels?
We would like to determine this with a system that:
  Runs in real time
  Recognizes rotations of faces (e.g. when someone tilts their head to one side)
Gaussian Blurring
Face images are greyscale (.pgms)
Successive levels of blur are obtained by reconvolving previous level of blur images with a 2 dimensional gaussian function
Mathematically equivalent to two passes of a one dimensional gaussian function
  $g(i,j) = \dfrac{1}{2\pi\sigma^2} \sum_m \sum_n e^{-(m^2+n^2)/(2\sigma^2)} \cdot f(i-m, j-n)$
  $\phantom{g(i,j)} = \dfrac{1}{2\pi\sigma^2} \sum_m e^{-m^2/(2\sigma^2)} \cdot \sum_n e^{-n^2/(2\sigma^2)} \cdot f(i-m, j-n)$
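The equivalence between the 2 dimensional convolution and two 1 dimensional passes can be checked numerically. The Python sketch below truncates the sums to a finite radius and treats pixels outside the image as zero; both choices, and the function names, are assumptions made for the example.

```python
import math


def gaussian_weights(sigma, radius):
    """Unnormalized 1D weights e^{-m^2 / (2 sigma^2)} for m in [-radius, radius]."""
    return {m: math.exp(-m * m / (2 * sigma ** 2))
            for m in range(-radius, radius + 1)}


def blur_2d(image, sigma, radius):
    """Direct double sum over m and n (first line of the formula)."""
    rows, cols = len(image), len(image[0])
    w = gaussian_weights(sigma, radius)
    norm = 1 / (2 * math.pi * sigma ** 2)
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            for m, wm in w.items():
                for n, wn in w.items():
                    if 0 <= i - m < rows and 0 <= j - n < cols:
                        total += wm * wn * image[i - m][j - n]
            out[i][j] = norm * total
    return out


def blur_separable(image, sigma, radius):
    """Two 1D passes: first over n (columns), then over m (rows)."""
    rows, cols = len(image), len(image[0])
    w = gaussian_weights(sigma, radius)
    norm = 1 / (2 * math.pi * sigma ** 2)
    temp = [[sum(wn * image[i][j - n] for n, wn in w.items() if 0 <= j - n < cols)
             for j in range(cols)] for i in range(rows)]
    return [[norm * sum(wm * temp[i - m][j] for m, wm in w.items() if 0 <= i - m < rows)
             for j in range(cols)] for i in range(rows)]


image = [[0, 10, 20, 30],
         [5, 50, 80, 10],
         [0, 90, 40, 20]]
direct = blur_2d(image, sigma=1.0, radius=2)
two_pass = blur_separable(image, sigma=1.0, radius=2)
max_difference = max(abs(a - b)
                     for row_a, row_b in zip(direct, two_pass)
                     for a, b in zip(row_a, row_b))
```

The separable version does on the order of 2(2r+1) multiplications per pixel instead of (2r+1)², which matters for a system meant to run in real time.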
Multi-Resolution Histograms
Histogram equalize the image
Concatenate histograms of image together after successive levels of gaussian blurring
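The concatenation step can be sketched as below. This Python example assumes the blurred versions of the image have already been computed and skips histogram equalization; the bin count of 4, the 0..255 intensity range, and the function names are illustrative choices.

```python
def histogram(image, bins=4, max_value=255):
    """Count pixels per intensity bin for a greyscale image (list of rows)."""
    counts = [0] * bins
    for row in image:
        for pixel in row:
            index = min(int(pixel * bins / (max_value + 1)), bins - 1)
            counts[index] += 1
    return counts


def multi_resolution_histogram(blur_levels, bins=4, max_value=255):
    """Concatenate the histograms of each successive blur level."""
    combined = []
    for image in blur_levels:
        combined.extend(histogram(image, bins, max_value))
    return combined


# Two hypothetical blur levels of the same 2x2 image.
level0 = [[0, 255], [128, 64]]
level1 = [[100, 110], [112, 111]]
feature = multi_resolution_histogram([level0, level1])
```

A histogram ignores where pixels are located, so rotating the image leaves each level's histogram largely unchanged, which is the property the approach relies on.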
Average Histograms
Compute average face and non-face multi-resolution histograms from training set
[Figures: Average Non-Face Histogram; Average Face Histogram]
Optimization Problem
  $C(\alpha) = \min_{\alpha} \ \|H_{FAVG} - h_F\|^2 + \|H_{NFAVG} - h_{NF}\|^2$
  where $h_F = \dfrac{1}{\sum_i \alpha_i} \sum_i \alpha_i h_i$
        $h_{NF} = \dfrac{1}{\sum_i (1-\alpha_i)} \sum_i (1-\alpha_i) h_i$
  such that $0 \le \alpha_i \le 1$, $\sum_i \alpha_i = 1$
Let $\beta_i = (1 - \alpha_i)$, $Q_{ij} = \langle h_i, h_j \rangle$
  $(c_\alpha)_i = \langle h_i, H_{FAVG} \rangle \cdot \text{constant}$, $(c_\beta)_i = \langle h_i, H_{NFAVG} \rangle \cdot \text{constant}$
  $= \min_{\alpha,\beta} \ \alpha^T Q \alpha + \dfrac{1}{(N-1)^2} \beta^T Q \beta - 2 c_\alpha^T \alpha - \dfrac{2}{N-1} c_\beta^T \beta$
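The objective in its first form can be evaluated directly, which is useful for checking any solver against it. The Python sketch below uses toy 2-bin histograms; the function names and data are illustrative assumptions.

```python
def weighted_mean(weights, histograms):
    """(1 / sum_i w_i) * sum_i w_i * h_i, computed per histogram bin."""
    total = sum(weights)
    dim = len(histograms[0])
    return [sum(w * h[d] for w, h in zip(weights, histograms)) / total
            for d in range(dim)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cost(alpha, histograms, face_avg, nonface_avg):
    """||H_FAVG - h_F||^2 + ||H_NFAVG - h_NF||^2 for a given alpha."""
    h_face = weighted_mean(alpha, histograms)
    h_nonface = weighted_mean([1 - a for a in alpha], histograms)
    return (squared_distance(face_avg, h_face)
            + squared_distance(nonface_avg, h_nonface))


# Three candidate windows; the first looks like the face average.
histograms = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
face_avg, nonface_avg = [1.0, 0.0], [0.0, 1.0]

good = cost([1.0, 0.0, 0.0], histograms, face_avg, nonface_avg)
bad = cost([0.0, 0.5, 0.5], histograms, face_avg, nonface_avg)
```

Because the weights sum to 1 over N windows, the non-face weights 1 − αᵢ sum to N − 1, which is where the 1/(N−1) factors in the quadratic form come from.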
Solve Using SMO
$$
\alpha_i^{NEW} = \frac{A}{B}, \qquad \text{with all sums taken over } k \neq i,j
$$
$$
\begin{aligned}
A ={}& \tfrac{1}{(N-1)^2} Q_{ii} - \tfrac{1}{(N-1)^2} \textstyle\sum_k \alpha_k\, Q_{jj} + \bigl(1 - \textstyle\sum_k \alpha_k\bigr) Q_{jj} \\
& - \bigl(1 - \textstyle\sum_k \alpha_k\bigr) Q_{ij} + \tfrac{1}{(N-1)^2} \textstyle\sum_k \alpha_k\, Q_{ij} - \tfrac{1}{(N-1)^2} Q_{ij} \\
& - c_{\alpha,i} + c_{\beta,i} + c_{\alpha,j} - c_{\beta,j} + \textstyle\sum_k (\alpha_k Q_{ik}) - \textstyle\sum_k (\alpha_k Q_{jk}) \\
& - \tfrac{1}{(N-1)^2} \textstyle\sum_k (\alpha_k Q_{ik}) + \tfrac{1}{(N-1)^2} \textstyle\sum_k (\alpha_k Q_{jk}) \\
B ={}& Q_{ii} + Q_{jj} - 2 Q_{ij} + \tfrac{1}{(N-1)^2} Q_{ii} + \tfrac{1}{(N-1)^2} Q_{jj} - \tfrac{2}{(N-1)^2} Q_{ij}
\end{aligned}
$$
Bounds for $\alpha_i^{NEW}$:
  L = 0
  $H = 1 - \sum_{k \neq i,j} \alpha_k$
$\alpha_j^{NEW} = \bigl(1 - \sum_{k \neq i,j} \alpha_k\bigr) - \alpha_i^{NEW}$
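The bounds and the update for the second coefficient can be sketched independently of how the unclipped value of the first coefficient is computed. In this Python example, `alpha_i_unclipped` stands for the result of the update formula and is simply passed in; the function name and toy values are illustrative.

```python
def clip_pair_update(alpha, i, j, alpha_i_unclipped):
    """Clip the new alpha_i to [L, H] and set alpha_j so the sum stays 1."""
    rest = sum(a for k, a in enumerate(alpha) if k not in (i, j))
    low, high = 0.0, 1.0 - rest  # L = 0, H = 1 - sum over k != i, j
    alpha_i_new = min(max(alpha_i_unclipped, low), high)
    updated = list(alpha)
    updated[i] = alpha_i_new
    updated[j] = high - alpha_i_new
    return updated


alpha = [0.2, 0.3, 0.5]
too_high = clip_pair_update(alpha, 0, 1, 0.7)   # clipped to H = 0.5
too_low = clip_pair_update(alpha, 0, 1, -0.1)   # clipped to L = 0
inside = clip_pair_update(alpha, 0, 1, 0.25)    # already within bounds
```

Clipping to [L, H] keeps both coefficients in [0, 1] while preserving the constraint that all coefficients sum to 1.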
Results