A genetic programming approach to the evolution of brain–computer interfaces for 2-D...
Transcript of A genetic programming approach to the evolution of brain–computer interfaces for 2-D...
A genetic programming approach to the evolutionof brain–computer interfaces for 2-Dmouse–pointer control
Riccardo Poli • Mathew Salvaris • Caterina Cinel
Received: 11 July 2011 / Revised: 13 March 2012 / Published online: 1 May 2012
� Springer Science+Business Media, LLC 2012
Abstract We propose the use of genetic programming (GP) as a means to evolve
brain–computer interfaces for mouse control. Our objective is to synthesise com-
plete systems, which analyse electrical brain signals and directly transform them
into pointer movements, almost from scratch, the only input provided by us in the
process being the set of visual stimuli to be used to generate recognisable brain
activity. Experimental results with our GP approach are very promising and com-
pare favourably with those produced by support vector machines.
Keywords Genetic programming � Brain–computer interfaces � Mouse �Support-vector machines
1 Introduction
The keyboard is a primitive device: a plastic case with an array of electro-
mechanical switches on it. The mouse is also primitive being a simple transducer of
musculo-skeletal motion. Wouldn’t it be nice to 1 day be able to dispose of them?
This is precisely the objective of brain–computer interfaces (BCIs).
Over the past few years an increasing number of studies (e.g., [1–5]) have shown
that BCIs are possible. These convert signals generated from the brain into control
commands for devices such as computers, wheel chairs or prostheses. As illustrated
R. Poli (&) � M. Salvaris � C. Cinel
School of Computer Science and Electronic Engineering, University of Essex,
Colchester CO4 3SQ, UK
e-mail: [email protected]
M. Salvaris
e-mail: [email protected]
C. Cinel
e-mail: [email protected]
123
Genet Program Evolvable Mach (2012) 13:377–405
DOI 10.1007/s10710-012-9161-x
in Fig. 1, such systems are often based on the analysis of brain electrical activity
obtained via electroencephalography (EEG)—the recording of the electrical activity
generated by the brain via electrodes on the scalp.
Within EEG-based BCIs, the analysis of P300 waves and other event related
potentials (ERPs) has been particularly effective [1, 6–8]. ERPs are electrical
components of brain activity produced in response to external stimuli. The P300 is a
positive ERP which occurs around 300 ms after the presentation of such stimuli
(hence its name) which is normally generated by the brain when an observer attends
to rare and/or significant stimuli, although we have recently shown that it can also
be generated in other conditions [9]. P300 waves can be used to determine user
intentions in BCIs by relating their presence (or an increase in their amplitude) to
specific external stimuli.
Given the point-and-click nature of most modern user interfaces, an important
application of BCI is controlling 2-D pointer movements. The work presented in
this paper is part of an interdisciplinary research project aimed at developing a
P300-based BCI mouse capable of providing full hands-free 2-D motion control.
Specifically, here we report the results of our efforts to use genetic programming
(GP) [10, 11] as a means to evolve complete and effective BCI mouse controllers.1
The paper is organised as follows. Sect. 2 provides some background of previous
research aimed at developing a BCI mouse. In Sect. 3 we describe the stimuli,
procedure, participants and signal analysis used in our own BCI mouse. Sect. 4
describes the GP system used, its primitives, parameter settings and fitness function.
In Sect. 5 we report our experimental results. We discuss the issue of generalisation
and show evidence that the evolved solutions generalise well in Sect. 6, while we
provide some conclusions in Sect. 7.
2 Previous attempts to develop a BCI mouse
Over the years, there have been some attempts to develop BCI systems aimed at
controlling 2-D pointer movements. Naturally, there are alternative technologies,
such as eye trackers, that can also achieve hands-free pointer control via muscular
EEG
DISPLAY
Fig. 1 A high-level schematic of a brain–computer interface
1 Some of this work has appeared in a much shorter paper published in proceedings of EuroGP 2011 [24].
378 Genet Program Evolvable Mach (2012) 13:377–405
123
transducers. However, these are not available to people who suffer from severe
musculo-skeletal disorders or locked-in syndrome.
The most successful BCI approaches for 2-D pointer control, to date, are those
based on frequency analysis and the detection of l (8–13 Hz) and b (12–30 Hz)
rhythms in EEG [5] and those using cortical electrode arrays (e.g., [12]), i.e.,
electrodes directly placed on the surface of the brain. Both, however, have serious
drawbacks. The former require subjects to spend weeks or months training their
brain to produce and control l and b rhythms. The latter are massively invasive,
requiring surgery for implantation, and can only stay in place for relatively short
periods of time before the electrodes get covered with scar tissue, which renders
them useless. For these reasons they have only been tested in monkeys and also in
human subjects with severe epilepsy (who required the implantation of an electrode
array for treatment).
Much less troublesome are systems based on P300 waves since they require
neither user training nor invasive procedures. Some initial success in developing a
P300-based mouse was reported in [6]. As illustrated in Fig. 2, in that system four
letters were presented near the centre of the screen, labelled as ‘‘N’’, ‘‘S’’, ‘‘E’’ and
‘‘W’’, to represent corresponding directions of motion for the cursor. The letters
were momentarily replaced by an asterisk, in random order. Subjects focused their
attention of one of the letters. When the target letter was replaced by the asterisk, a
pattern of ERPs, including a P300, would be generated. No P300 would be
generated when the other letters were replaced by the asterisk. So, in principle, by
detecting P300 waves, the computer could work out the desired direction of motion.
Target detection was based on the simple method of measuring the amplitude of the
ERPs elicited by the ‘‘asterisking’’ of the four letters (in the time window typically
associated with a P300) and taking the letter with maximum amplitude as the target.
Unfortunately, the system had a slow speed—a cursor movement every 4 s—and
low accuracy in detecting P300s (only about 50 %) leading to many incorrect
movements. Also, pointer motion was in discrete steps, which makes it rather
unnatural.
A first angle of attack on this problem based on machine learning techniques was
reported in [7]. In this system, four arrows were positioned at the margins of the
display (see Fig. 3), to represent the possible directions of motion of a mouse cursor.
These were momentarily highlighted (or flashed) in random order. Users had to
Fig. 2 Schematic of the P300-based BCI mouse developed by Polikoff et al. [6]
Genet Program Evolvable Mach (2012) 13:377–405 379
123
focus on one of the arrows (the target), so that when that arrow was flashed their
brain would generate a P300 wave. In practice, discriminating P300 trials from non-
P300 trials is extremely difficult due to the enormous amount of noise and muscular
artifacts present in EEG. To tackle this, Beverina et al. used independent component
analysis (ICA) [13, 14] as a feature-extraction stage and then applied a support-
vector machine (SVM) [15] for classification. The system used rather long intervals
between the end of a stimulus and the beginning of the following one, to allow for
the complete ERP to be recorded without overlaps. This resulted in the pointer
moving at an excruciatingly slow rate: one movement every 10 s. Also, pointer
motion was in discrete steps, which, again, makes it rather unnatural.
A more responsive P300-based system for the 2-D control of a cursor on a
computer screen was presented in [8, 16]. In this system (see Fig. 4) four randomly-
flashing squares were displayed on the screen to represent four directions of
movement. Users focused their attention on the flashes of the square representing
the desired direction of motion for the cursor. Again flashing of the attended square
produced ERPs—including P300 components—which were larger in amplitude
compared to ERPs produced by the flashing of non-attended squares. To help with
focusing attention on the target stimuli, subjects were given the task of mentally
counting the target flashes.
While this setup is not very different from those reviewed above, the system
presented three important and distinguishing features. Firstly, it used a fast stimulus
flashing rate which allowed the mouse to perform one movement per second.
Secondly, the system heavily relied on a genetic algorithm for the selection of the
Fig. 3 Schematic of the P300-based BCI mouse developed by Beverina et al. [7]
Fig. 4 Schematic of the P300-based analogue BCI mouse developed in [8, 16]
380 Genet Program Evolvable Mach (2012) 13:377–405
123
best EEG channels, time steps and wavelet scales to use as features in the control of
the mouse and for the identification of the parameters of an ERP scoring function
that used them. Thirdly, the system directly used the scores produced by the scoring
function to move the mouse, without ever having to make a P300/non-P300 decision
(e.g., by thresholding). This was simply achieved setting
Dx ¼ most recent RIGHT score�most recent LEFT score
Dy ¼ most recent UP score�most recent DOWN score
where Dx and Dy are the displacements to be applied to the mouse cursor to move it.
So, this system completely dispensed with the problem of detecting P300s (a
notoriously difficult task) by logically behaving as an analogue device, i.e., a device
where the output is a continuous function of the input, as opposed to a binary
classifier. The use of an analogue approach provided the system with more infor-
mation about the brain state, which, in turn, made it a more accurate, gradual and
controllable mouse.
We will use similar ingredients in the work presented here. We should note,
however, that while this system was a substantial technical improvement over
previous systems, it still was not accurate enough for reliable mouse control, as
illustrated in Fig. 5. It was also quite sensitive to artifacts.
A variety of alternatives to this scheme were explored in [17] where 8 different
stimuli (4 for up, down, left and right, and 4 for the 45� diagonal directions) were
Fig. 5 Typical trajectories produced by the analogue BCI mouse described in [8, 16]. Each line stylerepresents a different subject. Each line represents 15 typical consecutive trials (in the validation set) for asubject. For each trajectory we also show the values of three quality measures for a trajectory: w1 assessesthe extension of the motion in the target direction, w2 assesses the average extension of the motionorthogonal to the desired direction at the end of the 15 trials, w3 evaluates the average absolute errorderiving from motion orthogonal to the target direction thereby assessing the extent to which thetrajectory towards the target is convoluted
Genet Program Evolvable Mach (2012) 13:377–405 381
123
used. The best stimulation scheme found is illustrated in Fig. 6. Again subjects
were tasked with counting the number of flashes of target stimuli. A linear SVM
was trained to classify the ERPs produced in response to stimuli into two classes:
target and non-target. Note that the SVM is a binary classifier, where a raw
(continuous) output is converted into binary decisions via thresholding. In [17],
after training, the SVM raw output was used as a score for ERPs: the higher the
output, the more likely an ERP was generated by a target stimulus. The SVM’s
score for each ERP was then turned into a vector pointing in the direction of the
stimulus and with a length proportional to the score. This could be used directly to
determine the pointer’s motion. However, to smooth trajectories these micro-
movements were integrated together using the following analogue integration
strategy:
Dx ¼X
a
(most recent score in direction a)� sin a
Dy ¼X
a
(most recent score in direction a)� cos að1Þ
where a 2 0; p4; p
2; 3
4p; p; 5
4p; 3
2p; 7
4p
� �:
At the fastest rate of stimulus presentation, which also provided the best
information transfer rate [18], this system performed one pointer movement every
100 ms, which is a significant improvement over the previous systems. Also, as in
[8, 16], motion was analogue, and not stepwise.
Despite these successes, in the system just described the trajectories of the BCI-
mouse pointer tended to be quite convoluted and indirect. This is mainly because of
the noise present in EEG signals (which is often bigger than the signal itself), the
presence of eye-blinks and other muscle contractions (which produce artifacts up to
two orders of magnitude bigger than the signal) and the difficulty of keeping a user’s
mind focused on the target stimulus. Eye blinks are particularly problematic in an
analogue system, such as the one described above, because they can lead to very
large artifactual pointer movements that then require time and patience on the part
of the user to be corrected. These are particularly malignant when the integration
strategy in Eq. (1) is used, since an incorrect displacement vector remains in the
calculation until a new score obtained for the same direction replaces it, which can
take up to 15 time steps (and corresponding pointer movements). We show a
Fig. 6 Schematic of the best P300-based protocol found in [17]
382 Genet Program Evolvable Mach (2012) 13:377–405
123
pictorial representation of the type of motion an eye blink can cause in Fig. 7. What
is worse is that in normal conditions we blink 10–20 times per minute, although this
rate is typically reduced somewhat when using a computer. So, the problem of
artifacts is a very real and serious one.
3 Our BCI mouse
Notwithstanding the difficulties highlighted in the previous section, the success of
an analogue approach to BCI mouse design and the benefits accrued in it through the
use a machine learning techniques suggest that there is more mileage in this
approach. In particular, we should note that, although both in [8, 16] and [17]
feature selection was performed using powerful algorithms, only semi-linear
transformations were used to transform EEG signals into mouse movements. Linear
systems have obvious limitations, particularly in relation to noise and artifact
rejection. So, we wondered if GP, with its ability to explore the huge space of
computer programs, could produce even more powerful transformations while
performing feature selection and artifact handling at the same time.
In the rest of this section we will describe the stimuli, procedure, participants and
signal processing used in a BCI mouse where we put this idea to the test, while in
Sect. 4 we will provide details on the GP system used.
Fig. 7 Typical effect of an eyeblink on the trajectory of themouse in the system describedin [17]
Fig. 8 Stimulation protocol used in this paper
Genet Program Evolvable Mach (2012) 13:377–405 383
123
Our new BCI mouse uses a flashing-stimuli protocol with some similarities to the
P300-based BCI mice described in the previous section. More specifically, we used
visual displays showing 8 circles (with a diameter of 1.5 cm) arranged around a
circle at the centre of the display as shown in Fig. 8. We used a black background.
Each circle represented a direction of movement for the mouse cursor. Circles
temporarily changed colour—from grey to either red or green—for a fraction of a
second. We will call this a flash. Stimuli flashed sequentially in clockwise order, as
shown in Fig. 8. The aim was to obtain mouse control by mentally focusing on the
flashes of the stimulus representing the desired direction. Flashes lasted for 100 ms
and the inter-stimulus interval was 0 ms. This meant that the interval between two
target events for the protocol was 800 ms.
Data from 2 healthy participants aged 29 (male, normal eyesight) and 40
(female, corrected to normal eyesight) were used for GP training. These subjects
were chosen because their ERPs present significant morphological differences
and are affected by different types of artifacts. Each session was divided into
sections, which we will call direction runs. All of the data acquired in a
direction run were associated with the same target direction. Each participant
carried out 16 direction runs; this resulted in the 8 possible directions being
carried out twice.
Each direction run started with a blank screen and after 2 s the eight circles
appeared near the centre of the screen. A red arrow then appeared for 1 s pointing to
the target (representing the direction for that epoch). Subjects were instructed to
mentally name the colour of the flashes for that target.2 After 2 s the flashing of the
stimuli started. This stopped after a random number of trials, from 20 to 24, with a
trial consisting of the sequential activation of each of the 8 circles. In other words,
each direction run involved between 20 9 8 = 160 and 24 9 8 = 192 flashes.
After the direction epoch had been completed, subjects were requested to verbally
communicate the colour of the last flash of the target, as a sanity check to ensure
they were indeed focused on the task.
Participants were seated comfortably at approximately 80 cm from an LCD
screen. Data were collected from 64 electrode sites using a BioSemi ActiveTwo
EEG system. The EEG channels were referenced to the mean of the electrodes
placed on either earlobe. The data were initially sampled at 2,048 samples per
second. We then band-pass filtered the data between 0.15 and 30 Hz, and sub-
sampled it to 32 samples per second. Following each flash, an 800 ms window of
the signal in each channel was extracted. We will call it an epoch. Epochs contained
26 data points per channel. Thus each epoch was described by a feature vector of
26 9 64 = 1,664 elements.
Our training set for GP included approximately 2,800 such feature vectors per
subject (16 direction runs per subject, 160–192 epochs per direction run).
2 While focusing attention on the target stimulus is a requirement, the amplitude of the P300 is
significantly enhanced if, additionally, a task is assigned to the occurrence of such a stimulus [25] as long
as task demands are still within a subject’s capabilities. In previous experiments we found that naming
target colours produced the strongest P300s.
384 Genet Program Evolvable Mach (2012) 13:377–405
123
4 GP system and parameter settings
We used a strongly-typed GP system [19] implemented in Python with all numerical
calculations done very efficiently using the Numpy library (which is implemented in
C). Since fitness evaluation in our domain of application is extremely computa-
tionally intensive, we created a parallel implementation which performs fitness
evaluations across multiple CPU cores (via farming).
The system used a steady-state update policy. It evolved a population of 10,000
individuals with tournament selection with a tournament size of 5, a strongly-typed
version of the Grow initialisation method with a maximum initial depth of 4, and
strongly-typed versions of sub-tree crossover and sub-tree mutation. Both were
applied with a 50 % rate and used a uniform selection of crossover/mutation points.
The system used the primitive set shown in Table 1. Program trees were required to
have a Float return type.
With this setup we performed runs of up to 50 generations, manually stopping
them whenever we felt they were unlikely to make further significant progress.
Because of the extreme computational load required by our fitness evaluation and
the complexity of the problem (which forced us to use a relatively large population),
here we only report the results of one run per subject. (Naturally, we performed a
number of smaller runs while developing the fitness function.) We feel this is
reasonable since we are really interested in the output produced by GP—as is the
case in many practical applications of GP— rather than in optimising the process
leading to such output. Each run took approximately 40 CPU days to complete.
Let us now turn to the fitness function we used to guide evolution. Fitness was the
dissimilarity between the ideal trajectory and the actual trajectory produced by a
program averaged over the 16 direction runs. Measuring this required executing
each program for over 2,800 times. Being an error measure, fitness was, naturally,
minimised in our system. We describe its elements below.
The actual trajectory produced by a program in a direction run was obtained by
iteratively evaluating the program, each time feeding the samples of a new epoch
into the Fp1, AF7, etc. terminals (which effectively acted as a sliding window on
the EEG). The output of the program, which, as noted above, was of type Float,
was multiplied by a unit vector representing the direction corresponding to the
stimulus that flashed on the screen. This produced a result of the form ðDx;DyÞwhich was used as a displacement to be applied to the current mouse position. The
process is illustrated in Fig. 9. Note that exactly the same process was used to
convert the values output by the SVM into mouse movements.
The ideal trajectory for each direction run was obtained by sampling at regular
intervals the line segment connecting the origin to a point along the desired
direction, as illustrated in Fig. 10. The point was chosen by projecting the end-point
of the trajectory produced by a linear SVM classifier trained on the same data as GP
in the same way as in [17].3 In this way, when comparing the results obtained by GP
3 As in [17] the SVM classifier was implemented as an ensemble of 6 linear SVMs, with each SVM
trained on a subset of the collected data across all the channels. All channels were used to train the SVMs
since it has been shown that channel selection, when used in conjunction with SVMs, only marginally
improves results [26].
Genet Program Evolvable Mach (2012) 13:377–405 385
123
Table 1 Primitive set used in our application
Primitive Output
type
Input type(s) Functionality
0.5, -0.5, 0.1, -0.1,0, 1,…,25
Float None Floating point constants used for
numeric calculations and as array
indexes (see below)
Fp1, AF7, AF3, F1, F3,F5, F7, FT7, FC5, FC3,FC1, C1, C3, C5, T7,TP7, CP5, CP3, CP1, P1,P3, P5, P7, P9, PO7,PO3, O1, Iz, Oz, POz,Pz, CPz, Fpz, Fp2, AF8,AF4, AFz, Fz, F2, F4,F6, F8, FT8, FC6, FC4,FC2, FCz, Cz, C2, C4,C6, T8, TP8, CP6, CP4,CP2, P2, P4, P6, P8,P10, PO8, PO4, O2
Array None Returns an array of 26 samples from
one of the channels in an epoch.
The samples are of type Float.
Channels are named after
corresponding brain regions—F for
frontal, P for parietal, T for
temporal and O for occipital—as
shown below, C representing the
central line between the parietal
and frontal regions. Two-letter
channels are in intermediate
position between two areas
?, -, *, min, max Float (Float, Float) Standard arithmetic operations plus
maximum and minimum on floats
[, \ Bool (Float, Float) Standard relational operations on
floats
if Float (Bool, Float,
Float)
If-then-else function. If the first
argument evaluates to True, then
the result of evaluating its second
argument is returned. Otherwise the
result of evaluating the third
argument is returned
abs Float Float Returns the absolute value of a Float
mean, median, std,Amin, Amax
Float (Float, Float,
Array)
Given a 26-sample Array and two
floats, treat the floats as indices for
the array by casting them to integer
via truncation and then applying a
modulus 26 operation (if the
indices are identical, one is
incremented by 1). Then compute
the mean, median, standard
deviation, minimum or maximum,
respectively, of the samples in the
Array falling between such indices
(inclusive)
386 Genet Program Evolvable Mach (2012) 13:377–405
123
to those produced by an SVM, we ensured that both had the same ideal trajectories.
The ideal trajectory was sampled in such a way as to have the same number of
samples as the actual trajectory. The comparison between actual and ideal trajectory
was then a matter of measuring the Euclidean distance between pairs of
corresponding points in the two trajectories and taking an average. Notice that
Fig. 9 Illustration of the conversion of the outputs produced by a GP program (or by an SVM) intomouse movements. At t = 0 ms the target is flashed. So, there is a P300 peaking at approximately 300 msfrom the beginning of the epoch. As a result GP (or the SVM) produces a high value and, so, there is along movement in target direction. At time t = 100 ms a non-target is flashed. This produces a low valueand, so, there is a small movement at ?45� from the target. At time t = 200 ms a non-target is flashedwhich produces a low value and a small movement at ?90� from the target. And so on. Note how thetrajectory, although loopy, is generally in the correct direction
Genet Program Evolvable Mach (2012) 13:377–405 387
123
any detours from the ideal line and any slow-downs in the march along it in the
actual trajectory were strongly penalised with our fitness measure.
5 Experimental results
5.1 Fitness plots, size plots, and best programs
Let us start with some standard statistics for fitness and size. Figures 11 and 12
show the dynamics of the median and best program’s fitness and size in our runs,
respectively. As one can see, in both subjects the median fitness, i.e., the typical
distance of trajectories from the ideal trajectory, started off very badly (note, fitness
scales are logarithmic) with values of around 1,000. Since ideal trajectories are
straight and have a typical length of between 400 and 600 units, this implies that on
average initial programs produced trajectories that were extremely convoluted.
Initial best fitnesses, however, were of the order of 100; so they did show some
degree of motion in the correct direction. For both subjects the median fitness then
rapidly approached the best fitness and they both continued to improve throughout
the runs. For approximately 30 generations for subject 1 and 15–20 generations for
subject 2 after the initial transient, fitness improvements followed a negative
exponential (with a fixed time constant). They then slowed down considerably.
Indeed, this lack of improvement is the reason why we stopped the run for subject 2
after 35 generations. Turning to size, we see that after an initial phase of around 10
generations where programs grew modestly, median and best sizes increased more
rapidly, revealing the presence of bloat.
The best evolved programs, expressed as S-expressions, are as follows (note that
we could not indent them because of their very deep structure):
Ideal Trajectory
GPTrajectory
SVMTrajectory
Fig. 10 Ideal and actual trajectories used in the fitness calculation. Dashed lines indicate pairs ofmatching points. Fitness is the average distance between such points across 16 trajectories. The end pointof the ideal trajectory is computed by projecting the end point of the trajectory produced by an SVM-based flash scorer
388 Genet Program Evolvable Mach (2012) 13:377–405
123
The programs are also presented in simplified tree form in Fig. 13 (the shading in
the figure will be explained in Sect. 5.4). The programs were first syntactically
simplified replacing expressions of the form (if X ExpA ExpB) with ExpA or ExpB
when X evaluates to the constant True or False, respectively. This happens if
relational operators have both arguments evaluating to a constant, e.g., ([ 0.5
0.0). Then, starting from the leaves, we replaced sub-trees with their median
output if the replacement influenced fitness by less than a pre-determined
acceptance threshold. We used a per-replacement acceptance threshold of 2 % in
Fig. 13. The simplified versions of the best programs used only 11 out of the 64
electrodes for subject 1 and 8 electrodes for subject 2. (The unsimplified versions
subject 2subject 1
Fig. 11 Plots of the median and best fitness versus generation number in our runs. Note that the run forsubject 2 was stopped after 35 generations
Genet Program Evolvable Mach (2012) 13:377–405 389
123
used 19 electrodes for subject 1 and 21 for subject 2.) This is an advantage over the
SVM, which instead requires all electrodes to be in place.
To evaluate the performance of our best programs we then compared their
outputs (i.e., mouse pointer trajectories) to the outputs produced by the SVM on the
same ERP data (see Sect. 4 for more details on the SVM used). We should note that
the SVM training was orders of magnitude faster than evolving control programs via
GP, taking approximately 1 min per subject.
5.2 Qualitative and statistical analysis of trajectories
Let us start with a qualitative analysis. Figure 14 (top) shows the output produced
by the SVM for each of the direction runs in the training set of each subject (after
the transformation of the SVM scores into ðDx;DyÞ displacements). Ignoring the
small-scales wiggles in the trajectories (which are due to the periodicity of the
flashing stimuli), we see that the SVMs do a reasonably good job at producing
straight trajectories. This feature is the result of the SVM being trained to respond
only when the target stimulus is flashed, and to produce much smaller outputs for all
other (seven) stimuli. A close inspection of the trajectories, however, reveals that
while they are often straight, frequently they do not point exactly in the desired
direction. For example, one of the two trajectories labelled as (-0.7, 0.7) in subject
1 (drawn with a thicker line in Fig. 14 top left) points almost directly upwards
instead of pointing towards the top-left corner of the plot. Also, one of the two
trajectories labelled as (0.7, -0.7) in subject 2 (lower thicker line in Fig. 14 top
right) points almost directly downwards instead of pointing towards the bottom-
right corner of the plot. Finally, one of the two trajectories labelled as (1.0, 0.0) in
subject 2 points (upper thicker line in Fig. 14 top right) towards the top-right corner
of the plot instead of pointing to the right. These biases are the result of the SVM
being a binary classifier. So, while its training algorithm minimises classification
errors, it does not control finely the values of the raw outputs (before thresholding).
subject 2subject 1
Fig. 12 Plots of the median size and the size of the best individual in the population versus generationnumber in our runs
390 Genet Program Evolvable Mach (2012) 13:377–405
123
Fig. 13 Tree-based representation of the best programs evolved in our runs after syntactic and semanticsimplification (see text) with an acceptance threshold of 2 % and the approximate interpretation of thefunctionality of their main sub-trees
Genet Program Evolvable Mach (2012) 13:377–405 391
123
As a result SVMs cannot consider an important fact: in order for overall trajectories
to point in the correct direction, the reduced (but often non-zero) output produced in
the presence of flashes of stimuli that are one or two positions before and one or two
positions after a target stimulus must be symmetric (at least on average). Overall,
because of this phenomenon, the SVM produced trajectories which show
insufficient clustering towards the 8 prescribed directions of motion.
Figure 14 (bottom) shows the corresponding trajectories produced by our best
evolved programs. Qualitatively it is clear that these trajectories are more
convoluted. This is to be expected since, unlike SVM, GP did not try to classify
flashes as targets or non-targets. So, there was no explicit bias towards suppressing
the output in the presence of non-target flashes. There was, however, a strong
pressure in the fitness measure towards ensuring that there was overall motion in the
target direction. Also, there was pressure to ensure that the outputs produced for
non-target flashes were such that they either cancelled out or added up in such a way
SVM subject2subject1
GP
Fig. 14 Graphical representation of the 16 sequences of SVM raw scores (top) and the evolved-programraw scores (bottom) for each of our two subjects. The pairs of numbers labelling trajectory endpoints areunit vectors indicating the desired direction of motion. See text for an explanation of thicker lines
392 Genet Program Evolvable Mach (2012) 13:377–405
123
to contribute to the motion in the target direction, with minimum deviations from
the ideal trajectory. As a result, the trajectories produced by the best programs
evolved by GP were quite close to the ideal line in each of the prescribed directions
of motion. Note, for example, how close the end of the trajectories are to the
corresponding target directions.
To quantitatively verify these observations, we performed statistical comparisons
between the trajectories produced by GP and those produced by SVM.
Let us start with Table 2, which shows the mean, median, standard deviation and
standard error of the mean of the distances between the ideal trajectory and the
actual trajectory recorded in each of the 16 direction runs of each subject. From an
analysis of these data we infer that the best evolved program for subject 1 is, on
average, at least as good as the corresponding SVM, while the best evolved program
for subject 2 is definitely superior to the SVM. Note that for both subjects the
trajectories produced by GP have a much lower standard deviation than those
produced by SVMs. This is an indication that the evolved programs show a much
more consistent behaviour and are likely less sensitive to artifacts than SVMs.
Since for each direction run we have both the trajectory produced by the
SVM and the trajectory produced by an evolved program, we can apply paired-
data statistics to evaluate whether the results obtained by the evolved controllers
are superior to those obtained by the SVM. To test this, we applied the one-
sided Wilcoxon signed rank test for paired data to the 16 trajectories of each
subject. Table 3 reports the p-values obtained. These confirm that the evolved
programs produce trajectories that are significantly better than those produced by
SVM for subject 2 and are approximately on par with SVM in the case of
subject 1.
Table 2 Statistical comparison between SVM and evolved solutions: basic statistics of the distribution
of distances between ideal and actual mouse trajectories
Subject Program Mean Median Standard deviation Standard error
1 GP 50.813 44.367 17.440 4.503
SVM 52.638 43.709 30.550 7.888
2 GP 38.424 37.797 10.030 2.590
SVM 69.766 59.997 34.518 8.913
Boldface indicates superior results
Table 3 Statistical comparison between SVM and evolved solutions: p-values for the one-sided Wil-
coxon signed rank test for paired data
Subject p-value (is GP better than SVM?)
1 0.430
2 0.000
Boldface indicates statistical significance
Genet Program Evolvable Mach (2012) 13:377–405 393
123
5.3 Effects of smoothing trajectories
Naturally, the more pronounced swirls present in the trajectories produced by the
evolved programs may be a distraction for a user. It is, however, easy to remove
them by post-processing the mouse movements ðDxi;DyiÞ produced by the system
via a smoothing function. To study the benefits and drawbacks of this, we have used
an exponential infinite-impulse-response filter of the form:
ðDxsi ;Dys
i Þ ¼ a � ðDxi;DyiÞ þ ð1� aÞ � ðDxsi�1;Dys
i�1Þ;
with a = 1/30 and initialisation ðDxs0;Dys
0Þ ¼ ðDx0;Dy0Þ; that has a smoothing effect
which is effectively equivalent to associating a momentum to the mouse cursor.
Figure 15 shows the resulting trajectories. Clearly smoothing improves signif-
icantly the straightness of the trajectories produced by evolved programs, thereby
removing the potential negative effects of the swirls in the raw trajectories.
Smoothing, of course, does not improve the directional biases in the trajectories
produced by the SVMs, although it makes them more rectilinear.
Smoothing has also some effects on the statistics of the distances from ideal
trajectories. Table 4 reports the mean, median, standard deviation and standard error
of the mean of the distances between the ideal trajectory and the actual trajectory
(after smoothing) for each of the 16 direction runs of each subject. As one can see
by comparing the values in the table with those in Table 2, for subject 1, smoothing
produces a slight worsening of the average performance for GP and a much bigger
one for the SVM. The most likely cause of this is the transient due to the
initialisation of the smoothing filter. Because ðDxs0;Dys
0Þ ¼ ðDx0;Dy0Þ and a is very
small, the initial movement vector ðDx0;Dy0Þ influences the filter’s output for many
time steps. This influence may be positive if ðDx0;Dy0Þ is in the target direction.
However, if ðDx0;Dy0Þ is not in the target direction (which is relatively likely), the
transient provokes unnecessary detours from the ideal trajectory.
So, in a good subject, like subject 1, where trajectories were already relatively
straight, the addition of the smoothing filter may bring no benefits or even be
deleterious in relation to distances from ideal trajectories. The drop in performance
is more limited in the case of GP trajectories, because it is compensated for by the
total elimination of the large swirls which characterised such trajectories and made
them deviate from the ideal line.
For a relatively poor subject, like subject 2, whose raw trajectories present more
artifacts and can thus benefit more from the smoothing, the negative effects of the
transient in the filter’s output are outweighed by the benefits. Indeed, the data for
subject 2 show that, in the presence of smoothing, the mean and median are
marginally improved for the evolved solutions, while for the SVM there is a slight
improvement in the mean but a worsening for the median.
Table 5 reports the p-values obtained when we analysed the smoothed
trajectories using a one-sided Wilcoxon signed rank test for paired data. These
confirm our previous analysis. The evolved program for subject 2 remains
statistically significantly superior to the corresponding SVM. In addition, smoothing
makes the differences between the trajectories produced by GP and those produced
by SVM almost statistically significant also for subject 1.
394 Genet Program Evolvable Mach (2012) 13:377–405
123
In summary, while smoothing either does not (or only marginally) improve
things in relation to our objective measure, it still brings much benefit in terms of
the users not being distracted by the swirls of the raw trajectories.
As a final note on the issue of smoothing, in Sect. 2 we indicated that in [17]
Eq. (1) was used to smooth and integrate the raw trajectories produced by an SVM.
SVM + Smoothing subject 2subject 1
GP + Smoothing
Fig. 15 Trajectories produced by SVM and GP after the application of a low-pass filter
Table 4 Statistical comparison between the trajectories produced by the SVM and GP after low-pass
filtering: basic statistics of the distribution of distances between ideal and actual mouse trajectories
Subject Program Mean Median Standard Deviation Standard Error
1 GP 52.337 47.204 17.888 4.619
SVM 62.652 52.154 27.060 6.987
2 GP 37.453 37.377 9.739 2.515
SVM 68.305 68.191 33.853 8.741
Boldface indicates superior results
Genet Program Evolvable Mach (2012) 13:377–405 395
123
In Fig. 16 we show the trajectories produced by such a procedure applied on the raw
outputs of the SVM for the two subjects in this study (reported at the top of Fig. 14).
Note the considerable similarity between Figs. 15 (top) and 16, the only difference
being the slightly more rugged nature of the former. This suggests that this
procedure is effectively equivalent to a low-pass filtering of trajectories.
5.4 Interpretation of evolved programs
While the results reported in the previous sections are very encouraging, it is also
important to see what lessons can be learnt from the evolved programs themselves.
The programs are reproduced with shadings to aid interpretation in Fig. 13.
Looking at the program evolved for subject 1, we see that its output is simply the
mean of a consecutive block of samples taken from channel P8. The choice of P8 is
quite appropriate, since this and other parietal channels are often the locations where
the P300 has maximum amplitude. The decision to select sample 15 (which
corresponds to 469 ms after the presentation of the stimulus) as one of the two
extremes of the averaging block is also correct, since this sample falls right in the
middle of typical P300 waves. The sample marking the other end of the averaging
block is the result of a more elaborate choice determined by three cascaded if
SVM + Standard Integratorsubject 1 subject 2
Fig. 16 Trajectories produced by the SVM followed by the standard integrator in Eq. (1)
Table 5 Statistical comparison between the trajectories produced by the SVM and GP after low-pass
filtering: p-values for the one-sided Wilcoxon signed rank test for paired data
Subject p-value (is GP better than SVM?)
1 0.106
2 0.001
Boldface indicates statistical significance. Italics indicates a result close to significance
396 Genet Program Evolvable Mach (2012) 13:377–405
123
statements. Effectively this sample can take four different values: 0 (0.5 is truncated
to 0), 4 (125 ms), 10 (313 ms) or 20 (625 ms). Sample 0 is chosen when the
leftmost shaded sub-tree in Fig. 13 (left) returns True. Careful analysis has clarified
that this tree is specialised in detecting near targets, i.e., non-target flashes that
immediately precede or follow the flashing of a target. When not in the presence of
a near target, sample 4 is chosen if the second shaded sub-tree returns True.
Analysis suggests that this sub-tree is a negated detector of eye-blinks in the early
part of the data. If an eye-blink is detected, then control passes to the last if in the
chain which moves away the averaging window from the early samples, forcing the
average to cover either the range 10–15 or the range 15–20 depending on the
likelihood of finding a target-related P300 in the corresponding time windows
(313–469 and 469–625 ms, respectively). This decision is taken by the right-most
shaded sub-tree.
A pseudo-code implementation of this evolved solution is shown in Algorithm 1.
Turning now to the program evolved for subject 2, we see that it too uses the
strategy of returning the average of a particular channel, FC2 in this case. Here the
average is between a fixed sample (13, corresponding to 406 ms) and a second
sample which is computed based on a number of conditions. In this case, the output
is clipped to the value 13. We think this clipping is an eye-blink handling strategy.
Since channel FC2 is relatively frontal, the system needs to prevent eye-blinks from
producing disastrous off course mouse movements. By clipping the output to 13,
eye-blinks (which typically produce huge voltages for many hundreds of
milliseconds, i.e., many stimulus flashes) will cause the pointer trajectory to loop
(following the perimeter of a small regular octagon), thereby never going too far off
course. Note that the choice of sample 13 (406 ms) as one extreme for the windows
where the mean is computed is quite appropriate since this sample, too, falls in the
middle of typical P300s.
The values used as the second extreme of the averaging window depends on the
result of the comparison carried out in the left-most shaded sub-tree in Fig. 13
(right). In this subtree electrode Iz (which is at the base of the skull) has a prominent
Algorithm 1 Pseudo code of mouse-control program evolved for subject 1
1: if NearTarget() then
2: OtherLimit = 0 ms
3: else if !EyeBlink() then
4: OtherLimit = 125 ms
5: else if Target() then
6: OtherLimit = 625 ms
7: else
8: OtherLimit = 313 ms
9: end if
10: return mean(OtherLimit, 469 ms, P8)
Genet Program Evolvable Mach (2012) 13:377–405 397
123
role. The subtree is effectively a detector for neck movements and other muscular
artifacts in the early samples of an epoch. If one such artifact is detected, the second
extreme for the mean is based on the calculation (mean 0.1 308 CP2), which is
internally transformed into (mean 0 22 CP2). Note that sample 22 corresponds to
688 ms. The stronger the effect of the artifact on centro-parietal areas, the more the
resulting sample moves away from the beginning of the epoch, thereby avoiding the
influence of spurious data in the determination of the program output. If no early
muscle artifact is detected then the second extreme of the averaging block is either 1
(32 ms) or 19 (594 ms).
The decision about which sample to use is essentially made by a second artifact
detection subtree (right-most shaded tree in Fig. 13 (right). When activated, this
checks for muscle artifacts over a wider range of samples (including the end of the
epoch). If none is detected, this causes a domino effect involving the five if
statements connecting the subtree to the mean-over-FC2 instruction near the top of
the program, with all such if statements returning 19. Sample 19 is then used as the
second extreme of the averaging window for FC2. In the absence of eye-blinks, the
output of the program is thus the average of FC2 over the range 406–594 ms. This
makes complete sense since the range effectively covers most of the energy in the
P300 wave.
A pseudo-code implementation of the evolved solution is shown in Algorithm 2.
6 Generalisation
There are two main questions in relation to the ability to generalise of the evolved
controllers. The first question is whether a controller evolved for one subject will
work for a different subject. The second question is whether a controller evolved
using training data from one subject would work on test data from the same subject.
We discuss these issues in this section.
Algorithm 2 Pseudo code of mouse-control program evolved for subject 2
1: if EarlyArtifact() then
2: OtherLimit = mean(0 ms, 688 ms, CP2)
3: else
4: if OtherArtifact() then
5: OtherLimit = 32 ms
6: else
7: OtherLimit = 594 ms
8: end if
9: end if
10: RawOutput = mean(OtherLimit, 406 ms, FC2)
11: return min(13, RawOutput)
398 Genet Program Evolvable Mach (2012) 13:377–405
123
6.1 Generalisation across subjects
As we have seen previously, the controllers evolved by our GP system and the
SVMs used for reference are specialised to each specific subject. There is enormous
inter-subject variability in ERPs and artifacts, both in terms of location of activity
on the scalp and the time at which such activity occurs. These are mainly caused by
individual differences on how brains respond to stimuli. As a result, there is little
hope that a controller specialised for one subject will work well for a different
subject. For those familiar with speech recognition, it would be as implausible as
expecting that a system trained with a female soprano voice would properly
recognise the speech of a male bass voice. For these reasons subject-dependency
and lack of generalisation beyond a subject are, literally, ubiquitous in BCI
wherever machine learning technology is adopted. For example, all systems
described in Sect. 2 were trained on a user-by-user basis.
As a demonstration of the kind of performance one may obtain when applying a
controller evolved using the data from one subject to the data of another subject, we
tested subject 1’s controller on the data of subject 2 and vice versa obtaining the
results reported in Tables 6 and 7. As expected, results are very poor compared with
those of the reference SVMs (trained on the test-subject data).
6.2 Generalisation within subjects
The second issue related to generalisation—whether a controller evolved using
training data from a subject works on test data from that subject—is a particularly
important one. The standard approach in machine learning is to divide the data
available into a training set and an independent test set. The prerequisite for this is
Table 6 Statistical comparison between the output of the SVM trained with data from a subject and the
solution evolved using the data from the other subject: basic statistics of the distribution of distances
between ideal and actual mouse trajectories
Subject Program Mean Median Standard deviation Standard error
1 GP 3100.944 807.342 5020.364 1296.252
SVM 52.638 43.709 30.550 7.888
2 GP 1553.989 511.953 3853.857 995.062
SVM 69.766 59.997 34.518 8.913
Boldface indicates superior results
Table 7 Statistical comparison between the output of the SVM trained with data from a subject and the
solution evolved using the data from the other subject: p-values for the one-sided Wilcoxon signed rank
test for paired data
Subject p-value (is SVM trained on a subject’s data better than GP tested on the same data but trained
on the other subject’s data?)
1 0.000
2 0.000
Genet Program Evolvable Mach (2012) 13:377–405 399
123
that, after the split, one is still left with sufficiently large and representative training
and test sets.
In BCI, data sets can never be very large, because their acquisition is a labour
intensive process involving human subjects. When subjects start becoming tired,
their attention levels drop and, as a consequence, their brain signals start changing.
These changes include much reduced or even absent P300s where one would
normally have robust ones, which is naturally a key problem for P300-based BCIs.
Also, as subjects become tired they tend to produce more muscular artifacts, which
are generally a very big problem for BCI. To counteract this, it is common practice
to give subjects regular breaks, offering refreshments, and so on. However, after
about 1 h of experimentation (excluding subject preparation, which takes approx-
imately 30 min) subjects wear out. Degradation of signals is also induced by the
drying out of the gel used to ensure good electrical connection between scalp and
electrodes, which starts becoming a problem after a couple of hours since
deployment.
Because of the limited size of BCI datasets, it is very rarely the case that one can
split a single-subject dataset into two representative training and test sets. This was
the case also for the data acquired from our subjects. Particularly difficult is
ensuring that both sets include a representative set of artifactual epochs (as well as
normal ones), which are rarer but tend to occur in fits. So, to evaluate whether
systems generalise, BCI researchers tend to use a k-fold cross validation approach
with values of k around 10, to ensure each fold has a training set as large as possible.
A problem with these approaches is that they can only effectively be adopted if
the machine learning algorithm is reasonably fast. However, as we explained in
Sect. 4, our GP runs were extremely computationally intensive, making a cross-
validation approach non-viable in our case. For example, an 8-fold cross validation
would have required approximately 280 CPU days per subject.
These are the reasons why we decided to include all of the data available to us in
the training set. In order to evaluate whether the evolved controllers would
generalise to unseen data, for each subject we generated an artificial test set by
suitably modifying (via morphing) the training data. The approach is described in
the next section.
6.2.1 Monte Carlo morphing
In the absence of a separate training set, it is possible to use Monte Carlo
simulations to ‘‘replay’’ the training data in a meaningful but significantly different
way so as to simulate a real testing situation. For example, we used this strategy in
[20]. The key to a successful simulation, however, is that the reply must be
meaningful, i.e., the output of a system must be influenced by the differences
introduced in the replay. In [20] we randomised the order of the replay of epochs,
since the method we were studying was influenced by the specific sequence in
which epochs were presented. The controllers evolved by GP in this paper, however,
produce exactly the same output in the presence of the same input epoch,
irrespective of the order in which epochs are presented to them. So, a different
approach is needed.
400 Genet Program Evolvable Mach (2012) 13:377–405
123
The algorithm we came up with is based on the idea of generating new data by
morphing each original ERPs towards the ERP recorded in another epoch (randomly
selected with the constraint that it must correspond to a flash of a stimulus in the
same direction as that of the original ERP). The morphing is obtained via the
formula
ERPtest set epoch ¼ a� ERPtraining set epoch þ ð1� aÞ � ERPother compatible training set epoch
ð2Þ
where a is a value drawn uniformly at random from the interval [0.0, 1.0]. This
morphing formula was used for the epochs in all 16 direction runs of the training set
of each subject, resulting in a corresponding test set including 16 direction runs of
the same duration as the original, but with significantly different ERP shapes.
Details on the generation of each test direction run are provided in Algorithm 3.
Test sets created with this algorithm are particularly difficult to classify if a
system does not generalise properly. This is for two reasons. Firstly, because of our
choice for the distribution of the stochastic variable a, Eq. (2) typically produces
ERPs which are radically different from the two ‘‘parent’’ ERP. Secondly, the
resulting test sets include approximately twice as many artifactual epochs as the
original. This second point requires a bit more explanation.
The amplitude of the EEG signals recorded during an eye blink, a neck
movement, etc., can be one or two orders of magnitude bigger that ordinary ERPs.
As a result of this and of the fact that E[a] = 0.5, if an artifact-free epoch is
combined with one including artifacts using Eq. (2), the resulting test epoch will
likely show the very large voltages observed in artifactual epochs. For example, let
us consider a typical non-artifactual epoch presenting a P300 and let us imagine that
the voltage recorded at sample 10 (313 ms from stimulus onset) is 6 lV. Let us now
imagine that such epoch is morphed using an a = 0.53 with an epoch that presents a
muscle artifact resulting in a voltage of 300 lV at sample 10. Applying Eq. (2) one
Algorithm 3 Pseudo code of the test set generation algorithm used to verify the generalisation of the
evolved controllers
1: for epoch = 0...numEpochsDirRun-1 do
2: direction = epoch mod 8
3: otherEpoch = (randint(0, numEpochsDirRun - 1) div 8) * 8 ?direction
4: if otherEpoch[= numEpochsDirRun then
5: otherEpoch = otherEpoch - 8
6: end if
7: alpha = random()
8: output[epoch] = alpha * dirEpoch[epoch] ? (1 - alpha) *dirEpoch[otherEpoch]
9: end for
10: return output
Genet Program Evolvable Mach (2012) 13:377–405 401
123
finds that the output epoch will present a voltage of approximately 144 lV at
sample 10, making it indistinguishable from an artifactual epoch.
Naturally, this also happens if one morphs two artifactual epochs together. It is
only when a non-artifactual epoch is combined with a second non-artifactual epoch
that the resulting signals will look non-artifactual. As a consequence, if ptrain is the
proportion of epochs containing artifacts, the proportion of test-set epochs
containing artifacts, ptest, is expected to be:
ptest ¼ 1� Prftest epoch is artifact freeg ¼ 1� ð1� ptrainÞ2 ¼ 2ptrain � p2train
So, for sufficiently small ptrain (e.g., ptrain B0.2, which is certainly the case in our
training sets), we have that ptest % 2ptrain. That is, as indicated above, the fraction ofartifactual epochs in the test set is approximately double that of the training set.
6.2.2 Within-subject generalisation results
The results obtained when applying our evolved detectors to morphed test sets
constructed as described above are summarised in Tables 8 and 9. To provide an
absolute reference, we still compare the performance of evolved programs to that
obtained by the corresponding SVM on the original data.
The first thing to note is that, while GP results are inferior to those obtained in the
training set (see Table 2), they are still very good and nowhere near the no-
generalisation-across-subjects results reported in Table 6. Also, as indicated in
Table 9, while the GP results appear to be slightly inferior to those of the reference
SVM, the difference is actually not statistically significant.
Overall the within-subject generalisation shown by the solutions evolved by GP
is very good, particularly considering that our test set effectively contains twice as
many artifactual epochs as the original training set.
Table 8 Statistical comparison between the output of the SVM trained with data from a subject and the
solution evolved using a test set generated via morphing using data from the same subject: basic statistics
of the distribution of distances between ideal and actual mouse trajectories
Subject Program Mean Median Standard deviation Standard error
1 GP 65.489 63.670 25.542 6.595
SVM 52.638 43.709 30.550 7.888
2 GP 74.281 67.687 35.530 9.174
SVM 69.766 59.997 34.518 8.913
Boldface indicates superior results
Table 9 Statistical comparison between the output of the SVM trained with data from a subject and the
solution evolved using a test set generated via morphing using data from the same subject: p-values for
the one-sided Wilcoxon signed rank test for paired data
Subject p-value (is SVM better than GP?)
1 0.116
2 0.281
402 Genet Program Evolvable Mach (2012) 13:377–405
123
7 Conclusions
Brain–computer interfaces are an exciting research area that, 1 day, will hopefully
turn the dream of effortlessly and efficiently interpreting users’ commands directly
from electrical brain signals into reality. Progress is constantly made in BCI but it is
slowed down by many factors, including the noise present in brain signals, the
variability of EEG signals and ERPs across subjects, muscular artifacts and the
inconsistency and variability of user attention and intentions.
Recent research [8, 16] has shown that genetic algorithms and support vector
machines can be of great help in the selection of the best channels, time steps and/or
filtering kernels to use for the control of an ERP-based BCI mouse. In this paper we
proposed the use of GP as a means to evolve BCI mouse controllers—a very
difficult task that has never been attempted before.
Our objective was to synthesise complete systems, which analyse electroen-
cephalographic signals and directly transform them into pointer movements. The
only input we provided in this process was the set of visual stimuli to be used to
generate recognisable brain activity. Note that there is no high-performance, human-
designed system for ERP-based mouse pointer control in BCI. There simply aren’t
any design guidelines for a domain such as this. This is why all but one of the
systems of this kind reported in the literature use some form of machine learning
(the system that did not use machine learning [6] had very poor performance).
The results of the experiments with our approach reported in this paper show that
GP can produce very effective BCI mouse controllers which include clever
mechanisms for artifact reduction. The picture emerging from our experiments is
that not only has GP been successful in the automated design of a control system for
a BCI mouse, but it has also been able to perform at least on par with (if not better
than) SVMs—which until now have been considered perhaps the best machine-
learning technology available for BCI—using only a fraction of the electrodes
needed by them. Additionally, GP produces controllers that can be understood and
from which we can learn new effective strategies for BCI.
All this suggests that our evolved systems are at, or above, state-of-the-art,
indicating that, perhaps, they qualify for the attribute of human-competitive, in the
sense indicated by Koza (e.g., see [21]).
Naturally, there are also limitations to the techniques reported in this paper.
Firstly, as we discussed in Sect. 6, there is the issue of generalisation of the evolved
controllers. While we are confident that our controllers generalise to unseen data
from the subject they were evolved for, it would be important to be able, 1 day, to
evolve controllers that go beyond subject-dependency typical of BCI. Also, there is
the issue of the long training times that GP requires, which are not required by the
SVM here and in [17] nor were required by the GA in [8, 16].
As a first attempt to address these issues, we have started to explore hybrid
systems where one SVM or an ensemble of SVMs is trained on a per-subject basis,
but where a trajectory integrator which is applicable across subjects is evolved by
GP [22, 23]. In these systems, once evolved, the GP integrator can be used over and
over again. So, only the SVM training is required to get a working system with a
new subject. Results so far with this idea have been promising.
Genet Program Evolvable Mach (2012) 13:377–405 403
123
Naturally, there are many other alternative avenues we hope to be able to explore
in future research. For example, as suggested by one of our reviewers, it would
make sense to encapsulate the trees shown within the shade regions in Fig. 13 and
provide them as primitives to a GP system, thereby making its task much easier (and
hopefully evolution times much shorter). Also, with minor variations of the
principles used in this paper, it would be possible to try 3-D pointer movement and
3-D navigation in virtual environments.
Acknowledgments We would like to thank EPSRC (grant EP/F033818/1) for financial support. We
would also like to thank the reviewers for their very useful comments.
References
1. L.A. Farwell, E. Donchin, Talking off the top of your head: toward a mental prosthesis utilizing
event-related brain potentials. Electroencephalogr. Clin. Neurophysiol. 70, 510–523 (1988)
2. J.R. Wolpaw, D.J. McFarland, G.W. Neat, C.A. Forneris, An EEG-based brain–computer interface
for cursor control. Electroencephalogr. Clin. Neurophysiol. 78, 252–259 (1991)
3. G. Pfurtscheller, D. Flotzinger, J. Kalcher, Brain–computer interface: a new communication device
for handicapped persons. J. Microcomput. Appl. 16(3), 293–299 (1993)
4. N. Birbaumer, N. Ghanayim, T. Hinterberger, I. Iversen, B. Kotchoubey, A. Kbler, J. Perelmouter,
E. Taub, H. Flor, A spelling device for the paralysed. Nature 398, 297–298 (1999)
5. J.R. Wolpaw, D.J. McFarland, Control of a two-dimensional movement signal by a noninvasive
brain-computer interface in humans. Proc. Natl. Acad. Sci. 101(51), 17849–17854 (2004)
6. J.B. Polikoff, H.T. Bunnell, W J. Borkowski Jr., Toward a P300-based computer interface, in
Rehabilitation Engineering and Assistive Technology Society of North America (RESNA’95)(Arlington, Va) (Resna Press, 1995), pp. 178–180
7. F. Beverina, G. Palmas, S. Silvoni, F. Piccione, S. Giove, User adaptive BCIs: SSVEP and P300
based interfaces. PsychNol. J. 1(4), 331–354 (2003)
8. L. Citi, R. Poli, C. Cinel, F. Sepulveda, P300-based BCI mouse with genetically-optimized analogue
control. IEEE Trans. Neural Syst. Rehabil. Eng. 16, 51–61 (2008)
9. M. Salvaris, C. Cinel, R. Poli, Novel sequential protocols for a ERP based BCI mouse. 5th inter-
national IEEE EMBS neural engineering conference, IEEE Press, 2011, pp. 352–355
10. J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection.
(MIT Press, Cambridge, MA, USA, 1992)
11. R. Poli, W. B. Langdon, N. F. McPhee, A field guide to genetic programming (2008), Published via
http://lulu.com and freely available at http://www.gp-field-guide.org.uk (With contributions by
J. R. Koza)
12. J. Donoghue, Connecting cortex to machines: recent advances in brain interfaces. Nat. Neurosci. 5,
1085–1088 (2002)
13. P. Comon, Independent component analysis, a new concept? Signal Process. 36, 287–314 (1994)
14. S. Makeig, A.J. Bell, T.-P. Jung, T.J. Sejnowski, Independent component analysis of electroen-
cephalographic data, in Advances in Neural Information Processing Systems, vol. 8, ed. by D S.
Touretzky, M.C. Mozer, M.E. Hasselmo (The MIT Press, Cambridge, MA, 1996), pp. 145–151
15. C. Cortes, V. Vapnik, Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
16. R. Poli, L. Citi, F. Sepulveda, C. Cinel, Analogue evolutionary brain computer interfaces. IEEE
Comput. Intell. Mag. 4, 27–31 (2009)
17. M. Salvaris, C. Cinel, R. Poli, L. Citi, F. Sepulveda, Exploring multiple protocols for a brain–
computer interface mouse, in Proceedings of 32nd IEEE EMBS Conference, Buenos Aires,
pp. 4189–4192, Sept 2010
18. J. Wolpaw, N. Birbaumer, W. Heetderks, D. McFarland, P. Peckham, G. Schalk, E. Donchin,
L. Quatrano, C. Robinson, T. Vaughan, Brain–computer interface technology: a review of the first
international meeting. Rehabil. Eng. IEEE Trans. 8(2), 164–173 (2000)
19. D.J. Montana, Strongly typed genetic programming. Evol. Comput. 3(2), 199–230 (1995)
404 Genet Program Evolvable Mach (2012) 13:377–405
123
20. L. Citi, R. Poli, C. Cinel, Documenting, modelling and exploiting P300 amplitude changes due to
variable target delays in Donchin’s speller. J. Neural Eng. 7(5), 056006 (2010)
21. J. R. Koza, Human-competitive results produced by genetic programming. Genet. Programm.Evolvable Mach. 11, 251–284, Sept 2010. Tenth Anniversary Issue: Progress in Genetic Program-
ming and Evolvable Machines
22. R. Poli, M. Salvaris, C. Cinel, Evolutionary synthesis of a trajectory integrator for an analogue brain–
computer interface mouse, in Applications of Evolutionary Computation (EvoApplications),vol. 6624 of Lecture Notes in Computer Science, ed. by C.D. Chio, S. Cagnoni, C. Cotta, M. Ebner,
A. Ekart, A. Esparcia-Alcazar, J.J.M. Guervos, F. Neri, M. Preuss, H. Richter, J. Togelius, G.N.
Yannakakis (Springer, Berlin, 2011), pp. 214–223
23. R. Poli, M. Salvaris, C. Cinel, Evolution of an effective brain-computer interface mouse via genetic
programming with adaptive Tarpeian bloat control, in Genetic Programming Theory and Practice IX,
ed. by R. Riolo, E. Vladislavleva, J.H. Moore (Springer, New York, 2011), forthcoming
24. R. Poli, M. Salvaris, C. Cinel, Evolution of a brain–computer interface mouse via genetic pro-
gramming, in Proceedings of Genetic Programming—14th European Conference, EuroGP, no. 6621in Lecture Notes in Computer Science, ed. by S. Silva, J.A. Foster, M. Nicolau, P. Machado,
M. Giacobini (Springer, Berlin, 2011) pp. 203–214
25. W.S. Pritchard, Psychophysiology of P300. Psychol. Bull. 89, 506–540 (1981)
26. A. Rakotomamonjy, V. Guigue, BCI competition III: dataset II—ensemble of SVMs for BCI P300
speller. IEEE Trans. Biomed. Eng. 55, 1147–1154 (2008)
Genet Program Evolvable Mach (2012) 13:377–405 405
123