Pattern Recognition in EEG
Pieter-Jan Kindermans, UGent, Department of Electronics and Information Systems (ELIS)
1
Who is familiar with machine learning?
2
Who is familiar with MATLAB?
3
Who knows how to program?
4
We are
Thibault Verhoeven, Pieter-Jan Kindermans
- Faculty of Engineering and Architecture
- Department of Electronics and Information Systems (ELIS)
- Reservoir Lab (a machine learning group)
- PhD students
- Work on/related to Brain-Computer Interfaces
5
To illustrate basic machine learning principles
6
Outline
- Event-Related Potential classification (the task)
- Machine learning methods (the basic tools)
- Unsupervised classification in BCI (advanced tools)
- The hands-on session (the work)
- Your own data?
7
Event-Related Potential classification (the task)
focus on ERPs in Brain-Computer Interfaces
8
Application: Brain-Computer Interfaces
9
Event-Related Potentials (Oddball paradigm)
10
[Figure: stimuli and the resulting ERP responses, P300 vs. non-P300; x-axis: time (s), 0 to 1]
ERP based BCI
11
General principle behind ERP based BCI
19
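[Figure: the stimuli are presented in the order 1 2 3 | 3 1 2 | 3 2 1, and the EEG response is recorded after each stimulus. One iteration presents every stimulus once; one trial groups several iterations (three in this figure). The question to answer for each trial: which stimulus was attended?]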
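The stimulus/iteration/trial structure can be sketched in code. This is a minimal illustration, not the workshop implementation: it assumes a classifier has already produced one score per stimulus presentation (higher meaning more target-like), and the function name `predict_attended` and the score values are hypothetical.

```python
import numpy as np

def predict_attended(stimulus_order, scores, n_stimuli):
    """Sum the classifier scores per stimulus over all iterations and pick the largest."""
    totals = np.zeros(n_stimuli)
    for stim, score in zip(stimulus_order, scores):
        totals[stim - 1] += score  # stimuli are numbered from 1, as on the slide
    return int(np.argmax(totals)) + 1

# One trial of three iterations, in the order shown on the slide: 1 2 3 | 3 1 2 | 3 2 1
order = [1, 2, 3, 3, 1, 2, 3, 2, 1]
# Hypothetical classifier outputs, one per stimulus presentation
scores = [0.1, 0.9, -0.2, 0.0, -0.3, 0.4, 0.2, 0.7, -0.1]

attended = predict_attended(order, scores, n_stimuli=3)
print("Attended stimulus:", attended)
```

Summing over iterations is one simple way to combine evidence; more iterations per trial give a more reliable decision at the cost of speed.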
ERP variations
All these variations exhibit the same stimulus/iteration structure
- Visual speller
- Auditory (e.g. Amuse, PASS2D)
- Tactile
- ...
20
Example: auditory ERPs
21
[Figure: auditory ERPs for t and nt stimuli at channels Cz (thick line) and F5 (thin line), together with ssAUC(t, nt) for the intervals 130–160 ms, 180–240 ms, 260–280 ms, 300–350 ms and 420–460 ms. Panel A: supervised blocks. Panel B: unsupervised blocks. x-axis: time [ms], −100 to 800; y-axis: [µV], −2 to 2.]
Many differences between subjects
22
[Figure: ERP responses in rows labelled faw, fcb and GA; x-axis: time after stimulus [ms], 100 to 600]
Unfortunately, the raw data looks like this
23
[Figure: four raw EEG traces; x-axis: time, 0 to 2; y-axis: signal, −40 to 40]
ERP Speller: The default approach
1. Record training data (quite boring)
2. Machine learning magic (supervised)
3. Use the BCI
24
Questions?
25
We will build a decoder that discriminates between target and non-target ERP responses.
The decoder is already implemented. If you get bored, you can extend the implementation so that it also predicts the symbols.
26
Machine learning methods (the basic tools)
27
Machine learning rules
- Do not optimise the model on the data used for evaluation
- Keep the model as simple as possible
- Use a proper cost function
- Do not directly interpret the classifier weights
31
Linear Discriminant Analysis
Pictures from Pattern Recognition and Machine Learning (C. Bishop)
32
Linear Discriminant Analysis
34
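$$p(x \mid C_1) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_1)^T \Sigma^{-1} (x - \mu_1)\right)$$
$$p(x \mid C_2) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_2)^T \Sigma^{-1} (x - \mu_2)\right)$$
$$p(C_1) = \pi_{C_1}, \quad 0 \le \pi_{C_1} \le 1$$
$$p(C_2) = 1 - \pi_{C_1}$$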
Linear Discriminant Analysis
35
$$w^T x + w_0 > 0$$
$$w = \Sigma^{-1}(\mu_1 - \mu_2)$$
$$w_0 = -\frac{1}{2}\mu_1^T \Sigma^{-1} \mu_1 + \frac{1}{2}\mu_2^T \Sigma^{-1} \mu_2 + \log\frac{p(C_1)}{p(C_2)}$$
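The decision rule can be written out in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the workshop code: the shared covariance is estimated by pooling both classes, the priors are taken from the class frequencies, and the function names and the synthetic two-dimensional data are made up for illustration.

```python
import numpy as np

def fit_lda(X1, X2):
    """X1, X2: arrays of shape (n_samples, D) holding class C1 and class C2."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Shared covariance: pooled estimate over both classes (assumption of this sketch)
    centered = np.vstack([X1 - mu1, X2 - mu2])
    sigma = centered.T @ centered / (len(centered) - 2)
    sigma_inv = np.linalg.inv(sigma)
    # Priors estimated from the class frequencies (assumption of this sketch)
    prior1 = len(X1) / (len(X1) + len(X2))
    w = sigma_inv @ (mu1 - mu2)
    w0 = (-0.5 * mu1 @ sigma_inv @ mu1
          + 0.5 * mu2 @ sigma_inv @ mu2
          + np.log(prior1 / (1.0 - prior1)))
    return w, w0

def predict_c1(X, w, w0):
    """True where a sample is assigned to C1, i.e. where w^T x + w0 > 0."""
    return X @ w + w0 > 0

# Synthetic data: two Gaussian classes sharing one covariance matrix
rng = np.random.default_rng(0)
cov = np.array([[2.0, 1.0], [1.0, 2.0]])
X1 = rng.multivariate_normal([2.0, 0.0], cov, size=500)
X2 = rng.multivariate_normal([-2.0, 0.0], cov, size=500)

w, w0 = fit_lda(X1, X2)
acc = (predict_c1(X1, w, w0).mean() + (~predict_c1(X2, w, w0)).mean()) / 2
print("weights:", w, "bias:", w0, "accuracy on the fitted data:", acc)
```

The accuracy printed here is measured on the same data the model was fitted on, so it only checks the formulas; it is not an estimate of generalisation.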
Linear Discriminant Analysis
36
[Figure: three scatter plots shown next to the decision rule above; both axes range from −20 to 20]
Linear Discriminant Analysis
37
[Figure: three scatter plots shown next to the decision rule above; both axes range from −10 to 10]
Overfitting and regularisation
38
[Figure: train error and test error as a function of model complexity, with the optimum marked; x-axis: model complexity; y-axis: error]
Regularisation for LDA
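Estimating covariance matrices is difficult, especially in high dimensions. Shrinkage regularisation adds a scaled identity matrix to the estimate:
$$\hat{\Sigma} = \Sigma + \lambda I, \qquad w = \hat{\Sigma}^{-1}(\mu_1 - \mu_2)$$
Effect: as $\lambda$ grows, the weight vector becomes proportional to the difference between the class means.
39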
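The effect of shrinkage can be checked numerically. The sketch below is illustrative only: it assumes the regularisation strength is called `lam`, uses synthetic data with fewer samples than dimensions (so the unregularised covariance estimate is singular), and measures how closely the weight vector aligns with the difference of the class means.

```python
import numpy as np

def shrinkage_lda_weights(X1, X2, lam):
    """Weight vector w = (Sigma + lam * I)^-1 (mu1 - mu2)."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    centered = np.vstack([X1 - mu1, X2 - mu2])
    sigma = centered.T @ centered / len(centered)
    sigma_hat = sigma + lam * np.eye(sigma.shape[0])
    # Solve sigma_hat @ w = (mu1 - mu2) instead of forming the inverse explicitly
    return np.linalg.solve(sigma_hat, mu1 - mu2)

def cosine(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
D = 50
X1 = rng.normal(0.5, 1.0, size=(20, D))   # 40 samples in total, 50 dimensions
X2 = rng.normal(-0.5, 1.0, size=(20, D))
mean_diff = X1.mean(axis=0) - X2.mean(axis=0)

for lam in (0.01, 1.0, 100.0, 1e6):
    w = shrinkage_lda_weights(X1, X2, lam)
    print(f"lambda={lam:g}  cosine(w, mu1 - mu2)={cosine(w, mean_diff):.4f}")
```

In practice the regularisation strength is a hyperparameter, so it has to be chosen on data that is not used for the final evaluation.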
Training and testing
40
[Figure: the data is split into a train/validation part, divided into folds 1 to 5, and a separate test part]
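A three-way split can be sketched as follows. The fractions, the random shuffling and the function name `split_indices` are assumptions chosen for illustration; the slide does not prescribe them.

```python
import numpy as np

def split_indices(n_samples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Return disjoint index arrays for training, validation and testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(test_fraction * n_samples))
    n_val = int(round(val_fraction * n_samples))
    test = order[:n_test]                  # touched only once, for the final score
    val = order[n_test:n_test + n_val]     # used to tune hyperparameters
    train = order[n_test + n_val:]         # used to fit the model
    return train, val, test

train, val, test = split_indices(100)
print(len(train), len(val), len(test))
```

The model is fitted on `train`, hyperparameters are compared on `val`, and `test` is used once at the very end.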
Crossvalidation
42
[Figure: the data is divided into folds 1 to 5; each fold serves once as the test set while the remaining folds form the training set]
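Five-fold cross-validation can be sketched with plain index arithmetic. The nearest-class-mean classifier below is only a stand-in chosen to keep the example short, and the data is synthetic; both are assumptions of this sketch.

```python
import numpy as np

def kfold_indices(n_samples, n_folds=5):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    folds = np.array_split(np.arange(n_samples), n_folds)
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train_idx, test_idx

def nearest_mean_accuracy(X, y, train_idx, test_idx):
    """Toy classifier: assign each test sample to the closer training class mean."""
    X_train, y_train = X[train_idx], y[train_idx]
    mu0 = X_train[y_train == 0].mean(axis=0)
    mu1 = X_train[y_train == 1].mean(axis=0)
    d0 = np.linalg.norm(X[test_idx] - mu0, axis=1)
    d1 = np.linalg.norm(X[test_idx] - mu1, axis=1)
    predictions = (d1 < d0).astype(int)
    return float(np.mean(predictions == y[test_idx]))

# Synthetic data: 100 samples with alternating labels, class 1 shifted by 2
rng = np.random.default_rng(0)
y = np.tile([0, 1], 50)
X = rng.normal(size=(100, 2)) + 2.0 * y[:, None]

scores = [nearest_mean_accuracy(X, y, tr, te) for tr, te in kfold_indices(len(y))]
cv_accuracy = float(np.mean(scores))
print("per-fold accuracy:", scores, "mean:", cv_accuracy)
```

Each score comes from a fold the model has not seen during fitting, so the mean is an estimate of performance on new data as long as no hyperparameter was tuned on these same folds.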
Nested crossvalidation
43
[Figure: outer loop: the data is divided into folds 1 to 5 (train/test). Inner loop: the training part of an outer fold is divided again into subfolds 1 to 4 (train/validation).]
For all the inner folds
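Nested cross-validation can be sketched by combining the two loops. The example below assumes that the hyperparameter being tuned is the shrinkage strength of a regularised LDA with equal class priors; the candidate values, function names and synthetic data are illustrative.

```python
import numpy as np

def folds(idx, k):
    """Split an index array into k parts and yield (rest, held_out) pairs."""
    parts = np.array_split(idx, k)
    for i in range(k):
        rest = np.concatenate([parts[j] for j in range(k) if j != i])
        yield rest, parts[i]

def fit(X, y, lam):
    """Shrinkage LDA; equal class priors are assumed, so the log-prior term is zero."""
    mu1, mu0 = X[y == 1].mean(axis=0), X[y == 0].mean(axis=0)
    centered = np.vstack([X[y == 1] - mu1, X[y == 0] - mu0])
    sigma_hat = centered.T @ centered / len(centered) + lam * np.eye(X.shape[1])
    w = np.linalg.solve(sigma_hat, mu1 - mu0)
    w0 = -0.5 * w @ (mu1 + mu0)
    return w, w0

def accuracy(model, X, y):
    w, w0 = model
    return float(np.mean((X @ w + w0 > 0).astype(int) == y))

def nested_cv(X, y, lambdas, n_outer=5, n_inner=4):
    outer_scores = []
    for train, test in folds(np.arange(len(y)), n_outer):
        # Inner loop: choose lambda using only the outer training part
        mean_val = []
        for lam in lambdas:
            vals = [accuracy(fit(X[tr], y[tr], lam), X[va], y[va])
                    for tr, va in folds(train, n_inner)]
            mean_val.append(np.mean(vals))
        best = lambdas[int(np.argmax(mean_val))]
        # Refit on the full outer training part, score once on the untouched test fold
        outer_scores.append(accuracy(fit(X[train], y[train], best), X[test], y[test]))
    return outer_scores

rng = np.random.default_rng(0)
y = np.tile([0, 1], 60)                               # 120 samples, alternating labels
X = rng.normal(size=(120, 10)) + 0.8 * y[:, None]
outer_scores = nested_cv(X, y, lambdas=[0.01, 0.1, 1.0, 10.0])
print("outer-fold accuracies:", outer_scores)
```

The outer test fold never influences the choice of the hyperparameter, which is exactly the first machine learning rule: do not optimise the model on the data used for evaluation.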
The importance of multivariate interactions
44
[Figure: four raw EEG traces (x-axis: time, 0 to 2; y-axis: signal, −40 to 40) next to the ERP responses for P300 vs. non-P300 (x-axis: time (s), 0 to 1)]
The importance of multivariate interactions
45
[Figure: three scatter plots; both axes range from −10 to 10]
The importance of multivariate interactions
46
[Figure: three scatter plots (both axes from −10 to 10) next to the ERP responses for P300 vs. non-P300 (x-axis: time (s), 0 to 1)]
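A small synthetic example can make the role of multivariate interactions concrete. Everything here is an assumption made for illustration: channel A carries a weak class signal buried in strong background activity, channel B records only that background activity, and LDA with equal priors is fitted on the first half of the data and scored on the second half.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, size=n)                         # 1 = target, 0 = non-target
noise = rng.normal(0.0, 5.0, size=n)                   # strong shared background activity
signal = np.where(y == 1, 1.0, -1.0)
ch_a = signal + noise + rng.normal(0.0, 0.3, size=n)   # signal buried in noise
ch_b = noise + rng.normal(0.0, 0.3, size=n)            # noise only, no class information
X = np.column_stack([ch_a, ch_b])

def fit_lda(X, y):
    """LDA with a pooled covariance and equal class priors."""
    mu1, mu0 = X[y == 1].mean(axis=0), X[y == 0].mean(axis=0)
    centered = np.vstack([X[y == 1] - mu1, X[y == 0] - mu0])
    sigma = centered.T @ centered / len(centered)
    w = np.linalg.solve(sigma, mu1 - mu0)
    return w, -0.5 * w @ (mu1 + mu0)

def holdout_accuracy(cols):
    """Fit on the first half, evaluate on the second half, using only the given columns."""
    half = n // 2
    w, w0 = fit_lda(X[:half][:, cols], y[:half])
    pred = (X[half:][:, cols] @ w + w0 > 0).astype(int)
    return float(np.mean(pred == y[half:])), w

acc_a, _ = holdout_accuracy([0])
acc_b, _ = holdout_accuracy([1])
acc_ab, w_ab = holdout_accuracy([0, 1])
print("channel A alone:", acc_a)
print("channel B alone:", acc_b)
print("both channels:  ", acc_ab, "weights:", w_ab)
```

Channel B is useless on its own, yet the classifier gives it a large negative weight because subtracting it cancels the background activity in channel A. This is also why classifier weights should not be interpreted directly as indicating where the signal of interest originates.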
Error measures
Computing the accuracy is simple: just count how many examples you have classified correctly!
Yes, but …
What if 99% of the samples belong to the non-target class? A model that always predicts non-target then reaches 99% accuracy without ever detecting a target.
52
[Figure: three scatter plots; both axes range from −20 to 20]
Images: Wikipedia
53
Error measures
True positive rate (or sensitivity, recall):
$$\mathrm{TPR} = \frac{TP}{P}$$
True negative rate (or specificity):
$$\mathrm{TNR} = \frac{TN}{N}$$
False positive rate:
$$\mathrm{FPR} = \frac{FP}{N}$$
54
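The three rates can be computed directly from the labels and predictions. This is a minimal sketch; the function name `rates`, the coding of targets as 1 and non-targets as 0, and the example labels are assumptions.

```python
def rates(y_true, y_pred):
    """y_true, y_pred: sequences of 1 (target) and 0 (non-target)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)   # targets predicted as target
    tn = sum(t == 0 and p == 0 for t, p in pairs)   # non-targets predicted as non-target
    fp = sum(t == 0 and p == 1 for t, p in pairs)   # non-targets predicted as target
    n_pos = sum(t == 1 for t in y_true)             # P
    n_neg = sum(t == 0 for t in y_true)             # N
    return {"TPR": tp / n_pos, "TNR": tn / n_neg, "FPR": fp / n_neg}

y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
result = rates(y_true, y_pred)
print(result)
```

Since every non-target is either a true negative or a false positive, FPR always equals 1 − TNR.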
Error measures: balanced accuracy
True positive rate (or sensitivity, recall):
$$\mathrm{TPR} = \frac{TP}{P}$$
True negative rate (or specificity):
$$\mathrm{TNR} = \frac{TN}{N}$$
TPR and TNR can be combined into a balanced accuracy by averaging them: $(\mathrm{TPR} + \mathrm{TNR})/2$.
55
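Balanced accuracy can be illustrated on the imbalanced case raised earlier, where 99% of the samples are non-targets. The sketch assumes targets are coded as 1 and non-targets as 0; the function name is hypothetical.

```python
def balanced_accuracy(y_true, y_pred):
    """Average of the true positive rate and the true negative rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    tpr = tp / sum(t == 1 for t in y_true)
    tnr = tn / sum(t == 0 for t in y_true)
    return (tpr + tnr) / 2

y_true = [1] * 1 + [0] * 99        # 99% of the samples are non-targets
always_non_target = [0] * 100      # a "model" that never predicts a target

accuracy = sum(t == p for t, p in zip(y_true, always_non_target)) / len(y_true)
bal_acc = balanced_accuracy(y_true, always_non_target)
print("accuracy:", accuracy, "balanced accuracy:", bal_acc)
```

The constant predictor looks excellent in terms of plain accuracy but scores only chance level in balanced accuracy, because its true positive rate is zero.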
Error measures: area under curve
56
[Figure: ROC curve; x-axis: false positive rate, 0 to 1; y-axis: true positive rate, 0 to 1]
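The area under the ROC curve equals the probability that a randomly chosen target receives a higher score than a randomly chosen non-target, with ties counted as one half. The sketch below computes it through this pairwise comparison instead of integrating the curve; the scores are made up for illustration.

```python
def auc(scores_target, scores_non_target):
    """Fraction of (target, non-target) pairs ranked correctly; ties count as one half."""
    wins = 0.0
    for s_t in scores_target:
        for s_n in scores_non_target:
            if s_t > s_n:
                wins += 1.0
            elif s_t == s_n:
                wins += 0.5
    return wins / (len(scores_target) * len(scores_non_target))

target = [0.9, 0.6, 0.4]            # hypothetical classifier scores for targets
non_target = [0.5, 0.3, 0.2, 0.1]   # hypothetical classifier scores for non-targets
value = auc(target, non_target)
print("AUC:", value)
```

Unlike accuracy, this measure does not depend on a decision threshold or on the class ratio: 0.5 corresponds to chance and 1.0 to a perfect ranking.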
Questions?
57
The hands-on session (the work)
58
Data
- Visual ERP data from a 6x6 matrix speller
- 1:5 ratio of targets to non-targets
- 15 iterations
- 12 stimuli per iteration
- 64 channels at 240 Hz
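The numbers above fit together as shown in the short calculation below. Several values are assumptions, not taken from the slide: that the 12 stimuli are the 6 rows and 6 columns of the matrix, that 2 of them contain the attended symbol, that the 15 iterations apply to each spelled symbol, and that an epoch lasts 0.6 s.

```python
n_iterations = 15
stimuli_per_iteration = 12    # assumed: 6 rows + 6 columns of the 6x6 matrix
targets_per_iteration = 2     # assumed: the row and the column holding the attended symbol
n_channels = 64
sampling_rate_hz = 240
epoch_seconds = 0.6           # hypothetical epoch length, not given on the slide

epochs_per_symbol = n_iterations * stimuli_per_iteration
target_epochs = n_iterations * targets_per_iteration
non_target_epochs = epochs_per_symbol - target_epochs

samples_per_epoch = round(sampling_rate_hz * epoch_seconds)
features_per_epoch = n_channels * samples_per_epoch

print("epochs per symbol:", epochs_per_symbol)
print("target / non-target epochs:", target_epochs, "/", non_target_epochs)
print("raw features per epoch:", features_per_epoch)
```

Under these assumptions the 1:5 class ratio follows directly, and the raw feature count per epoch is far larger than the number of epochs per symbol, which is the situation in which covariance estimation benefits from shrinkage.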
59
Find the target samples!
60
Feedback
61