
B.E. PROJECT ON

Directional Motion Support using Motor Imagery EEG Signals

Submitted By

Manvika Marwah

Mohit Arora

Nakul Gupta

Shaurya Varma

(In partial fulfillment of B.E. (Instrumentation and Control Engg.) degree

of University of Delhi)

Under the Guidance of

Dr. Vijander Singh

Mrs. Asha Rani

DIVISION OF INSTRUMENTATION AND CONTROL ENGINEERING

NETAJI SUBHAS INSTITUTE OF TECHNOLOGY

UNIVERSITY OF DELHI, DELHI

2011


CERTIFICATE

This is to certify that the project entitled, “Directional motion support using motor

imagery EEG signals” by Manvika Marwah, Mohit Arora, Nakul Gupta, Shaurya

Varma is a record of bona fide work carried out by them in the Department of Instrumentation and Control Engg., Netaji Subhas Institute of Technology, New Delhi, under my supervision and guidance, in partial fulfillment of the requirements for the award of the degree of Bachelor of Engineering in Instrumentation and Control Engg., University of Delhi, in the academic year 2010-2011.

Signature Signature

Dr. Vijander Singh    Mrs. Asha Rani

Associate Professor Associate Professor


CERTIFICATE

This is to certify that the project entitled, “Directional motion support using motor

imagery EEG signals” by Manvika Marwah, Mohit Arora, Nakul Gupta, Shaurya

Varma is a record of bona fide work carried out by them in the Department of Instrumentation and Control Engg., Netaji Subhas Institute of Technology, New Delhi, in partial fulfillment of the requirements for the award of the degree of Bachelor of Engineering in the academic year 2010-2011.

Prof. A. P. Mittal

Head of Department (ICE)

N.S.I.T., New Delhi


Acknowledgement

It is a great pleasure for us to acknowledge the assistance and contributions of a few

individuals to the successful completion of our final B.Tech Project.

We would like to acknowledge our project guides, Dr. Vijander Singh and Mrs. Asha Rani,

and our mentor, Ms. Girisha Garg, for their valuable time and inputs during the project. They

always stood by us during the problem sessions and, despite their very busy schedules, always gave priority to our queries and problems. They helped us with various study materials on Brain Computer Interfaces which are otherwise very rare. In particular, we would like to acknowledge the help of Ms. Girisha Garg, without whom the practical application of the theoretical concepts would not have been so clear.

We would also like to thank our parents and finally our friends at NSIT for being very kind

and supportive towards us throughout the course of project completion.

Manvika Marwah, Mohit Arora, Nakul Gupta, Shaurya Varma


Contents

Abstract
Introduction
1. Brain Computer Interface (BCI)
   1.1 Introduction
   1.2 History
   1.3 How does it work?
   1.4 Applications of BCI
       1.4.1 Introduction
       1.4.2 Medical applications
       1.4.3 Human enhancement
       1.4.4 Human manipulation
   1.5 Where is it used?
       1.5.1 Medicine
       1.5.2 Robots
   1.6 BCI Devices
   1.7 Work done
       1.7.1 BCI and the Military
       1.7.2 BCI Innovators
       1.7.3 BCI Classification Competitions
   1.8 Ethical and Moral Implications of BCI
   1.9 BCI Drawbacks
   1.10 Literature Survey
   1.11 Current Projects
       1.11.1 Berlin Brain-Computer-Interface (BBCI)
       1.11.2 Graz Brain-Computer-Interface
2. Electroencephalogram (EEG)
   2.1 Introduction
       2.1.1 Source of EEG activity
   2.2 History of EEG
   2.3 Uses of EEG
       2.3.1 Clinical use
   2.4 Method of recording EEG
   2.5 Limitations
       2.5.1 Normal EEG
       2.5.2 Wave patterns
   2.6 Artifacts
       2.6.1 Biological artifacts
       2.6.2 Environmental artifacts
       2.6.3 Removing Artifacts from EEG
   2.7 Uses of EEG
       2.7.1 EEG and Telepathy
       2.7.2 Games
   2.8 Brain Computer Interface using EEG signals
3. Motor Imagery
   3.1 The effects of motor imagery
4. Independent Component Analysis (ICA)
   4.1 Motivation
   4.2 Introduction
   4.3 Assumptions
   4.4 Ambiguities of ICA
   4.5 What is independence?
       4.5.1 Definition and fundamental properties
       4.5.2 Uncorrelated variables are only partly independent
   4.6 Why Gaussian variables are forbidden
   4.7 Principles of ICA estimation
   4.8 Measures of nongaussianity
       4.8.1 Kurtosis
       4.8.2 Negentropy
   4.9 Maximum Likelihood Estimation
       4.9.1 The likelihood
       4.9.2 The Infomax Principle
   4.10 Preprocessing for ICA
       4.10.1 Centering
       4.10.2 Whitening
5. Joint Approximate Diagonalization of Eigen Matrices (JADE)
   5.1 Joint diagonalization
       5.1.1 The algorithm
   5.2 Motor Imagery EEG classification using JADE
6. Linear Discriminant Analysis (LDA)
   6.1 Linear Discriminant Analysis, two classes
   6.2 LDA example
   6.3 Linear Discriminant Analysis, C classes
       6.3.1 Fisher's LDA generalizes very gracefully for C-class problems
       6.3.2 Derivation
       6.3.3 Notes
   6.4 Limitations of LDA
7. Support Vector Machine (SVM)
   7.1 Motivation
   7.2 Linear SVM
       7.2.1 Primal form
       7.2.2 Dual form
   7.3 Biased and unbiased hyperplanes
   7.4 Limitations of SVM
8. Results
   8.1 Graphical User Interface (GUI)
   8.2 Observations
   8.3 Comparisons
9. Conclusion and Future Scope
10. Appendix
    10.1 Data set
    10.2 Code
        10.2.1 sampleset1_jade.m
        10.2.2 sampleset2_jade.m
        10.2.3 sample_testing.fig
        10.2.4 sample_testing.m
11. References


Abstract

A brain-computer interface (BCI) based on the electroencephalogram (EEG) records the activity of the brain noninvasively and with a high degree of precision. We present an EEG-based BCI which processes the recorded data and gives feedback on the imagined direction of motion (either left or right).

The EEG signals of known direction are recorded and stored as references, whereas the EEG signals for which the direction is to be determined are stored as test signals. The features of these signals are extracted with the help of independent component analysis (ICA). The features of the known signals are stored as a training set. The test signals are then classified against the training set using statistical learning algorithms such as linear discriminant analysis (LDA) and the support vector machine (SVM).

The methodology allows for determining the imagined motion of any paraplegic subject and

thus for actuating motion in vehicles designed specifically for them.


Introduction

When we talk about interfacing with a computer we typically mean typing at a keyboard or

using a mouse. This project investigates using a new communication channel - the EEG. The

EEG, or electroencephalogram, is electrical activity recorded from the scalp and produced by

neurons in the brain. The development of a Brain Computer Interface, or in our case, an

EEG-based communication device, requires the raw EEG signal to be converted into a new

output channel through which the brain can communicate and control its environment.

This project builds on twenty years of research in the brain sciences and on recent

developments in adaptive computing. In the 1970s it was discovered that subtle changes

occur in the EEG when we plan movements. These changes are called Movement-Related

Desynchronizations (or MRDs for short) because when movements are planned the activity

of neurons in the motor cortex becomes desynchronized. But the MRD signals are tiny. They

are rarely bigger than a few tens of microvolts and are often buried beneath other signals. We

therefore need to use advanced pattern recognition methods to detect the MRD signals.

This work involved making recordings from a number of subjects and then analysing the

recorded data at a later date. We refer to this work as Off-line BCI. The work culminated in

the discovery that MRDs also occur when movements are merely imagined.

Much of the progress on this project has depended on technical developments in advanced

pattern recognition. This includes applying Independent Component Analysis (ICA) to

extract the signal features followed by classifying them as left or right imagined movement

with the help of Linear Discriminant Analysis (LDA) or Support Vector Machine (SVM)

algorithm.
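To make this processing chain concrete, the sketch below outlines it in MATLAB. It is only an illustration of the idea, not the project code (which is listed in the Appendix): the random feature matrices stand in for ICA features of real trials, the sizes and labels are arbitrary, and classify (linear discriminant analysis) requires the Statistics Toolbox; an SVM classifier could be substituted in the same place.

% Illustrative outline of the off-line BCI pipeline described above.
% Random numbers stand in for ICA features of real trials; 'classify' (LDA)
% is from the Statistics Toolbox. The sizes and labels are arbitrary.
nTrain = 40; nTest = 10; nFeat = 6;
trainFeatures = randn(nTrain, nFeat);                 % stand-in for ICA features
trainLabels   = [repmat({'left'},nTrain/2,1); repmat({'right'},nTrain/2,1)];
testFeatures  = randn(nTest, nFeat);                  % trials of unknown direction

% LDA assigns each test trial to 'left' or 'right'; an SVM could be used instead.
predicted = classify(testFeatures, trainFeatures, trainLabels, 'linear');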


1. Brain Computer Interface (BCI)

1.1 Introduction

A brain–computer interface (BCI), sometimes called a direct neural interface or a brain–

machine interface (BMI), is a direct communication pathway between the brain and an

external device. It is a technique that allows the brain to directly communicate with a

computer.

At some point in our lives we may have lazed on the couch hoping things could get done just by thinking about them. BCIs allow us to do that. Studies show that patients with access to BCI

technology recover more quickly from serious mental traumas, especially if there is

underlying physical trauma that renders the patient incapable of communicating. By

interfacing with a computer through a direct neural connection, patients report a higher rate of

mental engagement and, ultimately, recovery. BCI technology shows promising signs in both

preventing and delaying the onset of dementia, Alzheimer's and Parkinson's disease in the

elderly. BCI technology will help define the potential of the human race. It holds the promise

of bringing sight to the blind, hearing to the deaf, and the return of normal functionality to the

physically impaired.

1.2 History

Discovering the basics

The history of Brain-Computer-Interfaces (BCI) starts with Hans Berger's discovery of the

electrical activity of the human brain and the development of electroencephalography (EEG). In

1924 Berger was the first one who recorded an EEG from a human brain. By analyzing EEGs

Berger was able to identify different waves or rhythms which are present in the brain, such as the


Alpha Wave (8–12 Hz), also known as Berger's Wave. Berger analyzed the interrelation of alterations in his EEG wave diagrams with brain diseases. EEGs permitted completely new

possibilities for the research of human brain activities.

However, it took until 1970 before the first development steps were taken to use brain

activities for simple communication systems. Research on BCIs began in the 1970s at the

University of California Los Angeles (UCLA) under a grant from the National Science

Foundation, followed by a contract from DARPA .

1.3 How does it work?

The Electric Brain

The reason a BCI works at all is because of the way our brains function. Our brains are filled

with neurons, individual nerve cells connected to one another by dendrites and axons. Every

time we think, move, feel or remember something, our neurons are at work. That work is

carried out by small electric signals that zip from neuron to neuron as fast as 250 mph. The

signals are generated by differences in electric potential carried by ions on the membrane of

each neuron.

Although the paths the signals take are insulated by myelin, some of the electric signal

escapes. Scientists can detect those signals, interpret what they mean and use them to direct a

device of some kind.

It can also work the other way around. For example, researchers could figure out what signals

are sent to the brain by the optic nerve when someone sees the color red. They could rig a

camera that would send those exact signals into someone's brain whenever the camera saw

red, allowing a blind person to "see" without eyes.


The approach is based on an artificial neural network that recognizes and classifies different

brain activation patterns associated with carefully selected mental tasks. By this means a

robust classifier is developed with short classification time.

The electric activity of the brain can be measured using electroencephalography (EEG). In

addition to EEG, the magnetic activity of the brain can be measured with magnetoencephalography (MEG). MEG signals are more localized than EEG signals and thus

give us more information about the brain activity related to, e.g., finger movements. These

signals are studied using time frequency representations (TFRs) and important features from

them are picked out.

1.4 Applications of BCI

1.4.1 Introduction

Apart from being a non-conventional input device for a computer we have found three main

application fields for BCIs and BCI related devices which are more or less controversial:

– Medical applications

– Human enhancement

– Human manipulation

1.4.2 Medical applications

BCIs provide a new, and possibly the only, communication channel for people suffering from severe physical disabilities but having intact cognitive functions. For example, these devices

could help in treating (or rather overcoming) paraplegia or amyotrophia. Somewhat related to

this topic is the field of Neuroprosthetics which deals with constructing and surgically

implanting devices used for replacing damaged areas of the brain and more generally for

neural damages of any kind.


1.4.3 Human enhancement

“Human enhancement describes any attempt (whether temporary or permanent) to overcome

the current limitations of human cognitive and physical abilities, whether through natural or

artificial means.”

BCIs could help facilitate communication systems in Cybernetic Organisms, Brainwave

Synchronization, or even speculative things such as the Exocortex, among others.

Cybernetic Organism describes the enhancement of an organism by means of technology. For

example a BCI could enable the attachment of robotic limbs without the use of the

organism’s original nervous system (as long as the brain is intact).

1.4.4 Human manipulation

The notion that a BCI could allow a two-way communication between a human and a

computer gives rise to more controversial potential uses of such a device. Using such a

communication mechanism one could imagine directly influencing an individual’s thoughts,

decisions, emotions or thinking. And, the mere “reading” of the mind could be put to criminal

use. Brain-computer-interfaces present a new level of technology that could be used to

actively manipulate an individual.

1.5 Where is it used?

BCI is useful in disciplines ranging from medicine to robots. Applications of this technology

range from prostheses to control of robotic UAVs to non-verbal human communication.

Brain Computer Interfaces (BCIs) are intended for enabling both the severely motor disabled

as well as the healthy people to operate electrical devices and applications through conscious

mental activity.


1.5.1 Medicine

BCI research and development has focused on neuroprosthetics applications that aim at

restoring damaged hearing, sight and movement. BCIs are often aimed at assisting,

augmenting or repairing human cognitive or sensory-motor functions.

The most common and oldest way to use a BCI is a cochlear implant. For the average person,

sound waves enter the ear and pass through several tiny organs that eventually pass the

vibrations on to the auditory nerves in the form of electric signals. If the mechanism of the

ear is severely damaged, that person will be unable to hear anything. However, the auditory

nerves may be functioning perfectly well. They just aren't receiving any signals.

A cochlear implant bypasses the nonfunctioning part of the ear, processes the sound waves

into electric signals and passes them via electrodes right to the auditory nerves. The result: A

previously deaf person can now hear. He might not hear perfectly, but it allows him to

understand conversations.

For restoring vision, the principle is the same. Electrodes are implanted in or near the visual

cortex, the area of the brain that processes visual information from the retinas. A pair of

glasses holding small cameras is connected to a computer and, in turn, to the implants. After a

training period similar to the one used for remote thought-controlled movement, the subject

can see. Again, the vision isn't perfect, but refinements in technology have improved it

tremendously since it was first attempted in the 1970s.

1.5.2 Robots

One of the most exciting areas of BCI research is the development of devices that can be

controlled by thoughts. Some of the applications of this technology may seem frivolous, such

as the ability to control a video game by thought. If you think a remote control is convenient,

imagine changing channels with your mind.


However, there's a bigger picture -- devices that would allow severely disabled people to

function independently. For a quadriplegic, something as basic as controlling a computer

cursor via mental commands would represent a revolutionary improvement in quality of life.

But how do we turn those tiny voltage measurements into the movement of a robotic arm?

Early research used monkeys with implanted electrodes. The monkeys used a joystick to

control a robotic arm. Scientists measured the signals coming from the electrodes.

Eventually, they changed the controls so that the robotic arm was being controlled only by

the signals coming from the electrodes, not the joystick.

A more difficult task is interpreting the brain signals for movement in someone who can't

physically move their own arm. With a task like that, the subject must "train" to use the

device. With an EEG or implant in place, the subject would visualize closing his or her right

hand. After many trials, the software can learn the signals associated with the thought of

hand-closing. Software connected to a robotic hand is programmed to receive the "close

hand" signal and interpret it to mean that the robotic hand should close. At that point, when

the subject thinks about closing the hand, the signals are sent and the robotic hand closes.

A similar method is used to manipulate a computer cursor, with the subject thinking about

forward, left, right and back movements of the cursor. With enough practice, users can gain

enough control over a cursor to draw a circle, access computer programs and control a TV. It

could theoretically be expanded to allow users to "type" with their thoughts.

Once the basic mechanism of converting thoughts to computerized or robotic action is

perfected, the potential uses for the technology are almost limitless. Instead of a robotic hand,

disabled users could have robotic braces attached to their own limbs, allowing them to move

and directly interact with the environment. This could even be accomplished without the

"robotic" part of the device. Signals could be sent to the appropriate motor control nerves in


the hands, bypassing a damaged section of the spinal cord and allowing actual movement of

the subject's own hands.

BCI allows person-to-person communication through power of thought.

1.6 BCI Devices

Brain-computer interface technology pioneer Emotiv Systems offers the EPOC neuroheadset. The lightweight EPOC is worn on the head and, being wireless, does not restrict movement in any way. The headset detects and processes conscious thoughts, expressions and non-conscious emotions based on electrical signals around the brain. It opens up a plethora of new applications which can be controlled with our thoughts, expressions and emotions. By integrating the Emotiv EPOC into their games or other applications, developers can dramatically enhance interactivity, gameplay and player enjoyment. Yet another direction enabled by the EPOC is live animation, using the unit's facial recognition sensors to mimic the wearer's facial expressions in an animated avatar.

1.7 Work done

In Asia, BCI technology has been applied to help handicapped individuals write

Chinese characters. Called the P300 Chinese Speller, this program, still in the

prototype phase, promises to change the lives of millions of Chinese people suffering

from paralysis.

At the recent CES 2011 trade show on January 8, tech journalist Evan Ackerman was

the first person to test the prototype of the Hybrid Assisted Limb, a Japanese-

developed robot suit. The suit is designed to help restore mobility to the elderly and

handicapped, as well as to give military personnel superhuman strength.

The XWave iPhone accessory is another recent BCI product release. This headset

plugs directly into compliant iPhones and reads brainwaves.


Japanese researchers furthered development of devices for people suffering from

ALS (amyotrophic lateral sclerosis, where the person loses the ability to initiate

and control all voluntary movements) to both operate robotic limbs and to display

thoughts on a screen. They are currently in the process of securing funding for

additional research and development.

1.7.1 BCI and the Military

Military use of BCI technology is being applied to enhance troop responses to certain orders,

situations and words. Recognizable brainwave patterns communicated between soldiers have

given rise to what might be best described as technology generated telepathy. A soldier need

only think a command to instantly broadcast it to other troops.

A MATLAB-based BCI platform for offline as well as online use for quadriplegics has been

developed.

As BCI technology further advances, brain tissue may one day give way to implanted silicon

chips thereby creating a completely computerized simulation of the human brain that can be

augmented at will. Futurists like Kurzweil predict that from there, superhuman artificial

intelligence won't be far behind.

1.7.2 BCI Innovators

A few companies are pioneers in the field of BCI. Most of them are still in the research stages, though

a few products are offered commercially.

Neural Signals is developing technology to restore speech to disabled people.

NASA has researched a similar system, although it reads electric signals from the

nerves in the mouth and throat area, rather than directly from the brain. They


succeeded in performing a Web search by mentally "typing" the term "NASA" into

Google.

Cyberkinetics Neurotechnology Systems is marketing the BrainGate, a neural

interface system that allows disabled people to control a wheelchair, robotic

prosthesis or computer cursor.

Japanese researchers have developed a preliminary BCI that allows the user to control

their avatar in the online world Second Life.

BCI was taken a step further by Dr Christopher James from the University's Institute of

Sound and Vibration Research. The aim was to expand the current limits of this technology

and show that brain-to-brain (B2B) communication is possible. It could be of benefit such as

helping people with severe debilitating muscle wasting diseases, or with the so-called

'locked-in' syndrome, to communicate and it also has applications for gaming.

His experiment had one person using BCI to transmit thoughts, translated as a series of binary

digits, over the internet to another person whose computer receives the digits and transmits

them to the second user's brain through flashing an LED lamp. While attached to an EEG

amplifier, the first person would generate and transmit a series of binary digits, imagining

moving their left arm for zero and their right arm for one. The second person was also

attached to an EEG amplifier and their PC would pick up the stream of binary digits and flash

an LED lamp at two different frequencies, one for zero and the other one for one. The pattern

of the flashing LEDs is too subtle to be noticed by the second person, but it is picked up by

electrodes measuring the visual cortex of the recipient. The encoded information is then

extracted from the brain activity of the second user and the PC can decipher whether a zero or

a one was transmitted. This shows true brain-to-brain activity.

1.7.3 BCI Classification Competitions


The first Brain-Computer-Interface Competition took place at the Laboratory for Intelligent

Imaging and Neural Computing at Columbia University in 2002.

It was initialized to foster development of machine learning techniques and evaluate different

algorithms for BCI. The competition focused on classification and signal processing

algorithms. Data sets for several different BCI tasks were provided to be analyzed by the

participants.

The competition was a great success. Therefore it was repeated at the University of Graz,

Austria in 2003 and at the Institute for Computer Architecture and Software Technology of

the Fraunhofer Society in 2005.

1.8 Ethical and Moral Implications of BCI

As BCI technology goes mainstream, certain moral and ethical implications are sure to arise.

Who will have access to this potentially society-disrupting technology? Clearly, individuals

equipped with BCI technology will be better positioned to excel in the world, and because

artificial brain augmentation will involve great expense, this technology will likely be

accessible to only those with great wealth.

Lawmakers and scientists must tread lightly where the potential for artificial augmentation of

human intelligence exists. In addition, assuming Kurzweil's predictions are accurate and

superhuman AIs are produced within the next fifty to one hundred years, how will "organic"

humans of average intelligence relate to both transhuman and posthuman intelligence?

Questions like this must be asked sooner rather than later - because later may be too late.

1.9 BCI Drawbacks

Although we already understand the basic principles behind BCIs, they don't work perfectly.

There are several reasons for this.


The brain is incredibly complex. To say that all thoughts or actions are the result of

simple electric signals in the brain is a gross understatement. There are about 100

billion neurons in a human brain . Each neuron is constantly sending and receiving

signals through a complex web of connections. There are chemical processes involved

as well, which EEGs can't pick up on.

The signal is weak and prone to interference. EEGs measure tiny voltage potentials.

Something as simple as the blinking eyelids of the subject can generate much stronger

signals. Refinements in EEGs and implants will probably overcome this problem to

some extent in the future, but for now, reading brain signals is like listening to a bad

phone connection. There's lots of static.

The equipment is less than portable. It's far better than it used to be -- early systems

were hardwired to massive mainframe computers. But some BCIs still require a wired

connection to the equipment, and those that are wireless require the subject to carry a

computer that can weigh around 10 pounds. Like all technology, this will surely

become lighter and more wireless in the future.

1.10 Literature Survey

Wireless Transmission

Current BCI systems rely on wires snaking out from the skull, which affect a person's mobility and leave an opening in the scalp prone to infection. Wireless BCI would be much

more practical and could be implanted in several different areas of the brain to tap into more

neurons. A typical scheme would have electrodes penetrating brain tissue, picking up

neuronal electrical impulses, called spikes. A chip would amplify and process the signals and

transmit them over a broadband RF connection through the skull to a receiver. Then, just as


in wired systems, algorithms would decode these signals into commands for operating a

computer or a robot.

The key requirement for such a system is that it consume very little power to keep the heat

down. Most of the guidelines for implantable devices say that you should not raise the

surrounding tissue temperature by more than 1 °C; otherwise, you'll kill the cells you're trying

to record from. Sending the complex analog impulses as they are would take up too much

bandwidth. So it will be necessary to convert them into a simpler, robust form as close as

possible to that of the neurons. Brown University neuroengineer Arto Nurmikko and his

colleagues were associated with the start-up Cyberkinetics Neurotechnology Systems, which did

the first human clinical trials of an implanted brain-computer interface. Now his team has a

promising wireless interface scheme, which they presented at the IEEE Engineering in

Medicine and Biology Conference (EMBC).

1.11 Current Projects

As already mentioned many projects are going on in the world of BCI research.

Here are some of them presented:

1.11.1 Berlin Brain-Computer-Interface (BBCI)

The Berlin Brain-Computer-Interface is a joint venture of several German research

organizations.

Members are:

The Institute for Computer Architecture and Software Technology of the Fraunhofer

Society

The research group Intelligent Data Analysis (IDA)

The Neurophysics Research Group

The Technical University Berlin


The goal of the project is the development of an EEG-based BCI system. The applications of this system are, on the one hand, computer-supported workplaces in which a cursor is controlled via brain waves and, on the other hand, tools for paralyzed or paraplegic people. The BBCI project aims

to shift the main learning effort to the computer. Therefore robust artificial learning and

signal processing algorithms need to be developed to classify and interpret the brain waves

correctly.

1.11.2 Graz Brain-Computer-Interface

The University of Graz, Austria has also a research project for Brain-Computer-Interfaces.

Its main topics are:

Brain-Computer-Interfaces

o Using EEG signals as input for computers.

Telemonitoring of BCIs

o Remote monitoring and administration of BCI systems

Combining BCI and virtual reality (VR) technology

o Using BCI systems to move in virtual realities

Functional electrical stimulation

o Stimulation of limbs by electrical signals

The Graz Brain-Computer-Interface project is a partner in several international research projects, such as:

Presenccia (European Union)

Eye2It (European Union)

Direct Brain Interface (National Institute of Health, USA)


2. Electroencephalogram (EEG)

2.1 Introduction

Electroencephalogram (EEG) is the recording of electrical activity along the scalp produced

by the firing of neurons within the brain. In clinical contexts, EEG refers to the recording of

the brain's spontaneous electrical activity over a short period of time, usually 20–40 minutes,

as recorded from multiple electrodes placed on the scalp.

2.1.1 Source of EEG activity

The electrical activity of the brain can be described in spatial scales from the currents within

a single dendritic spine to the relatively gross potentials that the EEG records from the scalp. Neurons, or nerve cells, are electrically active cells that are primarily responsible for carrying

out the brain's functions. Neurons create action potentials, which are discrete electrical

signals that travel down axons and cause the release of chemical neurotransmitters at the

synapse, which is an area of near contact between two neurons. This neurotransmitter then

activates a receptor in the dendrite or body of the neuron that is on the other side of the

synapse, the post-synaptic neuron. The neurotransmitter, when combined with the receptor,

typically causes an electric current within the dendrite or body of the post-synaptic neuron.

Thousands of post-synaptic currents from a single neuron's dendrites and body then sum up

to cause the neuron to generate an action potential. This neuron then synapses on other

neurons, and so on.


EEG reflects correlated synaptic activity caused by post-synaptic potentials of cortical

neurons. The ionic currents involved in the generation of fast action potentials may not

contribute greatly to the averaged field potentials representing the EEG. More specifically,

the scalp electrical potentials that produce EEG are generally thought to be caused by the

extracellular ionic currents caused by dendritic electrical activity, whereas the fields

producing magnetoencephalographic signals are associated with intracellular ionic currents.

The electric potentials generated by single neurons are far too small to be picked up by EEG or

MEG. EEG activity therefore always reflects the summation of the synchronous activity of

thousands or millions of neurons that have similar spatial orientation. Because voltage fields

fall off with the square of the distance, activity from deep sources is more difficult to detect

than currents near the skull.

Scalp EEG activity shows oscillations at a variety of frequencies. Several of these oscillations

have characteristic frequency ranges, spatial distributions and are associated with different

states of brain functioning (e.g., waking and the various sleep stages). These oscillations

represent synchronized activity over a network of neurons. The neuronal networks underlying

some of these oscillations are understood.

2.2 History of EEG

A timeline of the history of EEG is given by Swartz. Richard Caton (1842–1926), a physician

practicing in Liverpool, presented his findings about electrical phenomena of the exposed

cerebral hemispheres of rabbits and monkeys in the British Medical Journal in 1875. In 1890,

Polish physiologist Adolf Beck published an investigation of spontaneous electrical activity

of the brain of rabbits and dogs that included rhythmic oscillations altered by light. In 1912,

Russian physiologist Vladimir Vladimirovich Pravdich-Neminsky published the first EEG and the evoked potential of a mammal (the dog).[30] In 1914, Napoleon Cybulski and

Jelenska-Macieszyna photographed EEG-recordings of experimentally induced seizures.


2.3 Uses of EEG

In neurology, the main diagnostic application of EEG is in the case of epilepsy, as epileptic

activity can create clear abnormalities on a standard EEG study. A secondary clinical use of

EEG is in the diagnosis of coma, encephalopathies, and brain death. EEG used to be a first-

line method for the diagnosis of tumors, stroke and other focal brain disorders, but this use

has decreased with the advent of anatomical imaging techniques such as MRI and CT.

2.3.1 Clinical use

A routine clinical EEG recording typically lasts 20–30 minutes (plus preparation time) and

usually involves recording from scalp electrodes. Routine EEG is typically used in the

following clinical circumstances:

• To distinguish epileptic seizures from other types of spells, such as psychogenic non-

epileptic seizures, syncope (fainting), sub-cortical movement disorders and migraine variants.

• To differentiate "organic" encephalopathy or delirium from primary psychiatric syndromes

such as catatonia.

• To serve as an adjunct test of brain death

• To prognosticate, in certain instances, in patients with coma

At times, a routine EEG is not sufficient, particularly when it is necessary to record a patient

while he/she is having a seizure. In this case, the patient may be admitted to the hospital for

days or even weeks, while EEG is constantly being recorded (along with time-synchronized

video and audio recording). A recording of an actual seizure (i.e., an ictal recording, rather

than an interictal recording of a possibly epileptic patient at some period between seizures)

can give significantly better information about whether or not a spell is an epileptic seizure

and the focus in the brain from which the seizure activity emanates.

Epilepsy monitoring is typically done:


• To distinguish epileptic seizures from other types of spells, such as psychogenic non-

epileptic seizures, syncope (fainting), sub-cortical movement disorders and migraine variants.

• To characterize seizures for the purposes of treatment

• To localize the region of brain from which a seizure originates for work-up of possible

seizure surgery

EEG may be used to monitor certain procedures:

• To monitor the depth of anesthesia

• As an indirect indicator of cerebral perfusion in carotid endarterectomy

• To monitor amobarbital effect during the Wada test

EEG can also be used in intensive care units for brain function monitoring:

To monitor for non-convulsive seizures/non-convulsive status epilepticus

If a patient with epilepsy is being considered for resective surgery, it is often necessary to

localize the focus (source) of the epileptic brain activity with a resolution greater than what is

provided by scalp EEG. This is because the cerebrospinal fluid, skull and scalp smear the

electrical potentials recorded by scalp EEG.

2.4 Method of Recording EEG

Encephalographic measurements employ a recording system consisting of:

Electrodes with conductive media

Amplifiers with filters

A/D converter

Recording device

In conventional scalp EEG, the recording is obtained by placing electrodes on the scalp with

a conductive gel or paste, usually after preparing the scalp area by light abrasion to reduce

impedance due to dead skin cells. Many systems typically use electrodes, each of which is


attached to an individual wire. Some systems use caps or nets into which electrodes are

embedded; this is particularly common when high-density arrays of electrodes are needed.

Electrode locations and names are specified by the International 10–20 System for most

clinical and research applications (except when high-density arrays are used). This system

ensures that the naming of electrodes is consistent across laboratories. In most clinical

applications, 19 recording electrodes (plus ground and system reference) are used.


High-density arrays (typically via cap or net) can contain up to 256 electrodes more-or-less

evenly spaced around the scalp. Each electrode is connected to one input of a differential

amplifier (one amplifier per pair of electrodes); a common system reference electrode is

connected to the other input of each differential amplifier. These amplifiers amplify the

voltage between the active electrode and the reference (typically 1,000–100,000 times, or 60–

100 dB of voltage gain). In analog EEG, the signal is then filtered and the EEG signal is

output as the deflection of pens as paper passes underneath. Most EEG systems these days,

however, are digital, and the amplified signal is digitized via an analog-to-digital converter,

after being passed through an anti-aliasing filter. Analog-to-digital sampling typically occurs

at 256–512 Hz in clinical scalp EEG; sampling rates of up to 20 kHz are used in some

research applications

The digital EEG signal is stored electronically and can be filtered for display. Typical settings

for the high-pass filter and a low-pass filter are 0.5-1 Hz and 35–70 Hz, respectively. The

high-pass filter typically filters out slow artifact, such as electrogalvanic signals and

movement artifact, whereas the low-pass filter filters out high-frequency artifacts, such as


electromyographic signals. An additional notch filter is typically used to remove artifact caused by electrical power lines.

A typical adult human EEG signal is about 10 μV to 100 μV in amplitude when measured from the scalp and about 10–20 mV when measured from

subdural electrodes. Since an EEG voltage signal represents a difference between the

voltages at two electrodes, the display of the EEG for the reading encephalographer may be

set up in one of several ways. The representation of the EEG channels is referred to as a

montage.

In digital EEG all signals are typically digitized and stored in a particular (usually referential)

montage; since any montage can be constructed mathematically from any other, the EEG can

be viewed by the electroencephalographer in any display montage that is desired.
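As a rough, self-contained illustration of the digital filtering stage described above, the sketch below band-pass filters a synthetic single-channel signal to 0.5-35 Hz and adds a 50 Hz band-stop for power-line interference. The sampling rate, filter orders and test data are assumptions, and butter/filtfilt require the Signal Processing Toolbox.

% Illustrative EEG display filtering (Signal Processing Toolbox assumed).
% fs, the cutoffs, the filter orders and the synthetic data are placeholders.
fs  = 256;                                   % assumed sampling rate in Hz
t   = (0:fs*10-1)/fs;                        % 10 s of data
eeg = 50e-6*randn(size(t));                  % stand-in for one scalp channel (volts)

[bb, ab] = butter(4, [0.5 35]/(fs/2), 'bandpass');   % 0.5-35 Hz band-pass
eeg_bp   = filtfilt(bb, ab, eeg);                    % zero-phase filtering

[bs, as]  = butter(2, [48 52]/(fs/2), 'stop');       % 50 Hz power-line notch
eeg_clean = filtfilt(bs, as, eeg_bp);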

2.5 Limitations

EEG has several limitations. Most important is its poor spatial resolution. EEG is most

sensitive to a particular set of post-synaptic potentials. It is mathematically impossible to

reconstruct a unique intracranial current source for a given EEG signal, as some currents

produce potentials that cancel each other out. This is referred to as the inverse problem.

However, much work has been done to produce remarkably good estimates of, at least, a

localized electric dipole that represents the recorded currents.

2.5.1 Normal EEG

The EEG is typically described in terms of

(1) Rhythmic activity

(2) Transients.


The rhythmic activity is divided into bands by frequency. To some degree, these frequency

bands

are a matter of nomenclature (i.e., any rhythmic activity between 8–12 Hz can be described as

"alpha"), but these designations arose because rhythmic activity within a certain frequency

range was noted to have a certain distribution over the scalp or a certain biological

significance. Most of the cerebral signal observed in the scalp EEG falls in the range of 1–20

Hz (activity below or above this range is likely to be artifactual, under standard clinical

recording techniques).

Type: Delta. Frequency: up to 4 Hz. Location: frontally in adults, posteriorly in children; high-amplitude waves. Normally seen: in adults in slow-wave sleep; in babies; during some continuous-attention tasks.

Type: Theta. Frequency: 4 to <8 Hz. Location: found in locations not related to the task at hand. Normally seen: in young children; in drowsiness or arousal in older children and adults; in idling.

Type: Alpha. Frequency: 8-13 Hz. Location: posterior regions of the head, both sides, higher in amplitude on the dominant side; central sites (C3-C4) at rest. Normally seen: when relaxed/reflecting or closing the eyes; also associated with inhibition control, seemingly with the purpose of timing inhibitory activity in different locations across the brain.

Type: Beta. Frequency: >13-30 Hz. Location: both sides, symmetrical distribution, most evident frontally; low-amplitude waves. Normally seen: when alert or working; in active, busy or anxious thinking and active concentration.

Type: Gamma. Frequency: 30-100+ Hz. Location: somatosensory cortex. Normally seen: during cross-modal sensory processing (perception that combines two different senses, such as sound and sight); also during short-term memory matching of recognized objects, sounds or tactile sensations (Herrmann, Frund, & Lenz 2009).

Type: Mu. Frequency: 8-13 Hz. Location: sensorimotor cortex. Normally seen: shows the rest state of motor neurons.
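To show how these bands are typically quantified, the following sketch estimates the relative alpha-band (8-13 Hz) power of a single channel with a plain FFT. The sampling rate and the synthetic alpha-plus-noise test signal are assumptions; only core MATLAB functions are used.

% Relative alpha-band power of one channel via the FFT (core MATLAB only).
% The sampling rate and the synthetic alpha-plus-noise signal are placeholders.
fs = 256;                                     % assumed sampling rate in Hz
t  = (0:fs*10-1)/fs;                          % 10 s of data
x  = sin(2*pi*10*t) + 0.5*randn(size(t));     % 10 Hz "alpha" rhythm plus noise

P  = abs(fft(x)).^2;                          % raw power spectrum
f  = (0:numel(x)-1)*fs/numel(x);              % frequency axis for each FFT bin
alphaPower = sum(P(f >= 8 & f <= 13));        % power in the alpha band
totalPower = sum(P(f >= 1 & f <= 40));        % power in the usual EEG range
relativeAlpha = alphaPower/totalPower;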

2.5.2 Wave patterns


2.6 Artifacts

2.6.1 Biological artifacts

Electrical signals that are detected along the scalp by an EEG but originate from non-cerebral sources are called artifacts. EEG data is almost always contaminated by such artifacts. The

amplitude of artifacts can be quite large relative to the size of amplitude of the cortical signals

of interest. This is one of the reasons why it takes considerable experience to correctly

interpret EEGs clinically. Some of the most common types of biological artifacts include:

• Eye-induced artifacts (includes eye blinks, eye movements and extra-ocular muscle activity)

• EKG (cardiac) artifacts

• EMG (muscle activation)-induced artifacts

• Glossokinetic artifacts

The most prominent eye-induced artifacts are caused by the potential difference between the cornea and retina, which is quite large compared to cerebral potentials. When the eyes and eyelids are completely still, this corneo-retinal dipole does not affect the EEG. However, blinks occur several times per minute and eye movements occur several times per second. Eyelid movements, occurring mostly during blinking or vertical eye


movements, elicit a large potential seen mostly in the difference between the

Electrooculography (EOG) channels above and below the eyes.

2.6.2 Environmental artifacts

In addition to artifacts generated by the body, many artifacts originate from outside the body.

Movement by the patient, or even just settling of the electrodes, may cause electrode pops,

spikes originating from a momentary change in the impedance of a given electrode. Poor

grounding of the EEG electrodes can cause significant 50 or 60 Hz artifact, depending on the

local power system’s frequency.

2.6.3 Removing Artifacts from EEG

Severe contamination of EEG activity by eye movements, blinks, muscle, heart and line noise is a

serious problem for EEG interpretation and analysis. Many methods have been proposed to remove

eye movement and blink artifacts from EEG recordings:

Simply rejecting contaminated EEG epochs results in a considerable loss of collected information.

Since many noise sources, including muscle noise, electrode noise and line noise, have no clear reference channels, regression methods cannot be used to remove them.

A new and often preferable alternative is to apply ICA to multichannel EEG recordings and remove a wide variety of artifacts from EEG records by eliminating the contributions of artifactual sources onto the scalp sensors. It has been shown that ICA can effectively detect, separate and remove activity in EEG records from a wide variety of artifactual sources, with results comparing favorably to those obtained using regression.
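The back-projection step described here can be written in a few lines. In the sketch below, X is a channels-by-samples EEG matrix and W is an unmixing matrix assumed to have been estimated beforehand by some ICA routine (for example a JADE implementation); the artifact component indices are also assumed, since in practice they are chosen by inspecting the components. Random placeholders keep the example self-contained.

% Sketch of ICA-based artifact removal by back-projection.
% X (channels x samples) and the unmixing matrix W are assumed to exist already;
% here random placeholders keep the example self-contained.
nCh = 19; nSamp = 2560;
X = randn(nCh, nSamp);            % stand-in for recorded scalp EEG
W = eye(nCh) + 0.01*randn(nCh);   % stand-in for an ICA unmixing matrix

S = W * X;                        % estimated component activations
artifactIdx = [1 3];              % components judged artifactual (assumed)
S(artifactIdx, :) = 0;            % remove their contribution
Xclean = pinv(W) * S;             % back-project remaining components to the scalp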

2.7 Uses of EEG

The EEG has been used for many purposes besides the conventional uses of clinical diagnosis

and conventional cognitive neuroscience. Long-term EEG recordings in epilepsy patients are

used for seizure prediction. Neurofeedback remains an important extension, and in its most


advanced form is also attempted as the basis of brain computer interfaces. The EEG is also

used quite extensively in the field of neuromarketing. There are many commercial products

substantially based on the EEG. Honda is attempting to develop a system to move its Asimo

robot using EEG, a technology it eventually hopes to incorporate into its automobiles.

EEGs have been used as evidence in trials.

2.7.1 EEG and Telepathy

DARPA has budgeted $4 million in 2009 to investigate technology to enable soldiers on the

battlefield to communicate via computer-mediated telepathy. The aim is to analyze neural

signals that exist in the brain before words are spoken.

2.7.2 Games

Recently a few companies have scaled back medical grade EEG technology (and in one case,

NeuroSky, rebuilt the technology from the ground up) to create inexpensive devices based on

EEG. Two of these companies, NeuroSky and OCZ, have even built commercial EEG

devices retailing for under $100.

• In 2007 NeuroSky released the first affordable consumer based EEG along with the game

NeuroBoy. This was also the first large scale EEG device to use dry sensor technology.

• In 2008 OCZ Technology developed a device for use in video games, relying primarily on

electromyography.

• In 2009 Mattel partnered with NeuroSky to release the Mindflex, a game that used an EEG

to steer a ball through an obstacle course. It is by far the best-selling consumer-based EEG to

date.

• In 2010 NeuroSky added blink and electromyography functions to the MindSet.


2.8 Brain Computer Interface using EEG Signals:

Brain-computer interface (BCI) is an emerging technology which aims to convey people's

intentions to the outside world directly from their thoughts. It is especially appealing to

severely paralyzed patients, since motor ability is no longer a prerequisite for this

communication. It also offers a promising tool for normal people to enhance their

communications with computers. It has not only introduced new dimensions in machine control, but researchers around the globe are still exploring the possible uses of such

applications.

BCIs have given a hope where alternative communication channels can be created for the

persons having severe motor disabilities.


3. Motor Imagery

Motor imagery can be defined as a dynamic state during which an individual mentally

simulates a given action. This type of phenomenal experience implies that the subject feels

herself/himself performing the action. It corresponds to the so called internal imagery (or first

person perspective) of sport psychologists.

3.1 The effects of motor imagery

Motor imagery is now widely used as a technique to enhance motor learning and to improve

neurological rehabilitation in patients after stroke. Its effectiveness has been demonstrated in

musicians.

On motor learning: Motor imagery is an accepted procedure in the preparation of

athletes. Such practice usually covers a warming up period, relaxation and

concentration, and then mental simulation of the specific movement.

In neurological rehabilitation: There is some evidence to suggest that motor imagery

provides additional benefits to conventional physiotherapy or occupational therapy.

However, a recent systematic review indicates that there is modest evidence

supporting the additional benefit of motor imagery compared to conventional physiotherapy alone in patients with stroke. These authors concluded that motor imagery appears to be an attractive treatment option, easy to learn and to apply, and that the intervention is neither physically exhausting nor harmful. Therefore, motor imagery may generate additional benefit for patients.


4. Independent Component Analysis (ICA)

4.1 Motivation

Imagine that you are in a room where two people are speaking simultaneously. You have two

microphones, which you hold in different locations. The microphones give you two recorded

time signals, which we could denote by x1(t) and x2(t), with x1 and x2 the amplitudes, and t the

time index. Each of these recorded signals is a weighted sum of the speech signals emitted by

the two speakers, which we denote by s1(t) and s2(t). We could express this as a linear

equation:

x1(t) = a11 s1(t) + a12 s2(t)
x2(t) = a21 s1(t) + a22 s2(t)

where a11, a12, a21 and a22 are some parameters that depend on the distances of the

microphones from the speakers. It would be very useful if you could now estimate the two

original speech signals s1(t) and s2(t), using only the recorded signals x1(t) and x2(t). This is

called the cocktail-party problem. For the time being, we omit any time delays or other extra

factors from our simplified mixing model.
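A toy version of this two-microphone mixing can be written directly from the equations above; the source waveforms and the mixing coefficients are arbitrary illustrative choices, not real speech.

% Toy cocktail-party mixing from the two equations above (arbitrary numbers).
t  = linspace(0, 1, 1000);
s1 = sin(2*pi*5*t);                    % first "speaker"
s2 = sign(sin(2*pi*13*t));             % second "speaker"
A  = [0.7 0.3; 0.4 0.6];               % [a11 a12; a21 a22], unknown in practice
X  = A * [s1; s2];                     % rows are the microphone signals x1(t), x2(t)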

As an illustration, consider the waveforms in Fig. 4.1 and Fig. 4.2. These are, of course, not

realistic speech signals, but suffice for this illustration. The original speech signals could look

something like those in Fig. 4.1 and the mixed signals could look like those in Fig. 4.2. The

problem is to recover the data in Fig. 4.1 using only the data in Fig. 4.2.


Figure 4.1: The original signals.

  

Figure 4.2: The observed mixtures of the source signals in Fig. 4.1.

Figure 4.3: The estimates of the original source signals, estimated using only the observed signals in Fig. 4.2. The original signals were very accurately estimated, up to multiplicative signs.


Actually, if we knew the parameters aij, we could solve the linear equations above by classical methods. The point is, however, that if you don't know the aij, the problem is considerably more difficult.

One approach to solving this problem would be to use some information on the statistical properties of the signals si(t) to estimate the aij. Actually, and perhaps surprisingly, it turns out

that it is enough to assume that s1(t) and s2(t), at each time instant t, are statistically

independent. This is not an unrealistic assumption in many cases, and it need not be exactly

true in practice. The recently developed technique of Independent Component Analysis, or

ICA, can be used to estimate the aij based on the information of their independence, which

allows us to separate the two original source signals s1(t) and s2(t) from their mixtures x1(t)

and x2(t). Fig.4.3 gives the two signals estimated by the ICA method. As can be seen, these

are very close to the original source signals (their signs are reversed, but this has no

significance.)

Independent component analysis was originally developed to deal with problems that are

closely related to the cocktail-party problem. Since the recent increase of interest in ICA, it

has become clear that this principle has a lot of other interesting applications as well.

Consider, for example, electrical recordings of brain activity as given by an

electroencephalogram (EEG). The EEG data consists of recordings of electrical potentials in

many different locations on the scalp. These potentials are presumably generated by mixing

some underlying components of brain activity. This situation is quite similar to the cocktail-

party problem: we would like to find the original components of brain activity, but we can

only observe mixtures of the components. ICA can reveal interesting information on brain

activity by giving access to its independent components.


4.2 Introduction

To rigorously define ICA, we can use a statistical "latent variables" model. Assume that we observe n linear mixtures x1, ..., xn of n independent components:

xj = aj1 s1 + aj2 s2 + ... + ajn sn,   for all j        (1)

We have now dropped the time index t; in the ICA model, we assume that each mixture xj as

well as each independent component sk is a random variable, instead of a proper time signal.

The observed values xj(t), e.g., the microphone signals in the cocktail party problem, are then

a sample of this random variable. Without loss of generality, we can assume that both the

mixture variables and the independent components have zero mean: If this is not true, then

the observable variables xi can always be centered by subtracting the sample mean, which

makes the model zero-mean.

It is convenient to use vector-matrix notation instead of the sums like in the previous equation. Let us denote by x the random vector whose elements are the mixtures x1, ..., xn, and likewise by s the random vector with elements s1, ..., sn. Let us denote by A the matrix with elements aij. Generally, bold lower case letters indicate vectors and bold upper-case letters denote matrices. All vectors are understood as column vectors; thus x^T, the transpose of x, is a row vector. Using this vector-matrix notation, the above mixing model is written as

x = A s        (2)

Sometimes we need the columns of the matrix A; denoting them by ai, the model can also be written as

x = a1 s1 + a2 s2 + ... + an sn        (3)

The statistical model in Eq. (2) is called independent component analysis, or the ICA model. The ICA model is a generative model, which means that it describes how the observed data are generated by a process of mixing the components si. The independent components are latent variables, meaning that they cannot be directly observed. Also the mixing matrix A is assumed to be unknown. All we observe is the random vector x, and we must estimate both A and s using it. This must be done under as general assumptions as possible.

The starting point for ICA is the very simple assumption that the components si are statistically independent. It will be seen below that we must also assume that the independent components have nongaussian distributions. However, in the basic model we do not assume these distributions known (if they are known, the problem is considerably simplified). For simplicity, we also assume that the unknown mixing matrix A is square, but this assumption can sometimes be relaxed. Then, after estimating the matrix A, we can compute its inverse, say W, and obtain the independent components simply by

s = W x        (4)
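As a quick numerical check of Eq. (4), the sketch below mixes two toy sources with a known matrix A and recovers them by inversion. In real ICA the matrix A is of course unknown, and W has to be estimated from x alone; all values here are arbitrary illustrative choices.

% Idealised check of s = W x when the mixing matrix is known (arbitrary values).
t = linspace(0, 1, 1000);
S = [sin(2*pi*5*t); sign(sin(2*pi*13*t))];   % two independent toy sources
A = [0.7 0.3; 0.4 0.6];                      % mixing matrix
X = A * S;                                   % observed mixtures, x = A s
W = inv(A);                                  % here we simply invert the known A
Shat = W * X;                                % recovers the sources exactly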

ICA is very closely related to the method called blind source separation (BSS) or blind signal separation. A "source" means here an original signal, i.e. an independent component, like a speaker in the cocktail-party problem. "Blind" means that we know very little, if anything, about the mixing matrix, and make few assumptions about the source signals. ICA is one method, perhaps the most widely used, for performing blind source separation.

In many applications, it would be more realistic to assume that there is some noise in the

measurements, which would mean adding a noise term in the model. For simplicity, we omit


any noise terms, since the estimation of the noise-free model is difficult enough in itself, and

seems to be sufficient for many applications.

4.3 Assumptions

ICA-based artifact correction can separate and remove a wide variety of artifacts from EEG

data by linear decomposition. The ICA method is based on the assumptions that the time

series recorded on the scalp:

1. are spatially stable mixtures of the activities of temporally independent cerebral and artifactual sources,

2. that the summation of potentials arising from different parts of the brain, scalp, and body is linear at the electrodes, and

3. that propagation delays from the sources to the electrodes are negligible.

The method uses spatial filters derived by the ICA algorithm, and does not require a reference

channel for each artifact source. Once the independent time courses of different brain and

artifact sources are extracted from the data, artifact-corrected EEG signals can be derived by

eliminating the contributions of the artifactual sources.

4.4 Ambiguities of ICA

In the ICA model in Eq. (2), it is easy to see that the following ambiguities hold:

1. We cannot determine the variances (energies) of the independent components. The reason is that, both A and s being unknown, any scalar multiplier in one of the sources si could always be cancelled by dividing the corresponding column ai of A by the same scalar. As a consequence, we may as well fix the magnitudes of the independent components; as they are random variables, the most natural way to do this is to assume that each has unit variance: E{si2} = 1. The matrix A is then adapted in the ICA solution methods to take this restriction into account. Note that this still leaves the ambiguity of the sign: we could multiply an independent component by -1 without affecting the model. This ambiguity is, fortunately, insignificant in most applications.

2. We cannot determine the order of the independent components. The reason is that, again both A and s being unknown, we can freely change the order of the terms in the sum in Eq. (3) and call any of the independent components the first one. Formally, a permutation matrix P and its inverse can be substituted in the model to give x = AP-1Ps. The elements of Ps are the original independent variables sj, but in another order. The matrix AP-1 is just a new unknown mixing matrix, to be solved by the ICA algorithms.

4.5 What is independence?

4.5.1 Definition and fundamental properties

To define the concept of independence, consider two scalar-valued random variables y1 and

y2. Basically, the variables y1 and y2 are said to be independent if information on the value of

y1 does not give any information on the value of y2, and vice versa. Above, we noted that this

is the case with the variables s1, s2 but not with the mixture variables x1, x2.

Technically, independence can be defined by the probability densities. Let us denote by

p(y1,y2) the joint probability density function (pdf) of y1 and y2. Let us further denote by p1(y1)

the marginal pdf of y1, i.e. the pdf of y1 when it is considered alone:

p1(y1) = ∫ p(y1,y2) dy2,   (5)

and similarly for y2. Then we define that y1 and y2 are independent if and only if the joint pdf

is factorizable in the following way:

p(y1,y2)=p1(y1)p2(y2). (6)

This definition extends naturally for any number n of random variables, in which case the

joint density must be a product of n terms.

The definition can be used to derive a most important property of independent random variables. Given two functions, h1 and h2, we always have

E{h1(y1)h2(y2)} = E{h1(y1)}E{h2(y2)}.   (7)

This follows by writing the expectation as an integral of h1(y1)h2(y2)p(y1,y2), factoring the joint pdf using Eq. (6), and separating the double integral into a product of two single integrals.

4.5.2 Uncorrelated variables are only partly independent

A weaker form of independence is uncorrelatedness. Two random variables y1 and y2 are said to be uncorrelated if their covariance is zero:

E{y1y2} - E{y1}E{y2} = 0.   (8)

If the variables are independent, they are uncorrelated, which follows directly from Eq. (7), taking h1(y1) = y1 and h2(y2) = y2.


On the other hand, uncorrelatedness does not imply independence. For example, assume that

(y1,y2) are discrete valued and follow such a distribution that the pair are with probability 1/4

equal to any of the following values: (0,1),(0,-1),(1,0),(-1,0). Then y1 and y2 are uncorrelated,

as can be simply calculated. On the other hand,

E{y12y22} = 0 ≠ 1/4 = E{y12}E{y22},   (9)

so the condition in Eq. (7) is violated, and the variables cannot be independent.

Since independence implies uncorrelatedness, many ICA methods constrain the estimation

procedure so that it always gives uncorrelated estimates of the independent components. This

reduces the number of free parameters, and simplifies the problem.
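The counterexample above can be checked numerically; the following short sketch (illustrative only, with an arbitrary sample size) draws samples from the four-point distribution and estimates the quantities in Eqs. (8) and (9).

vals = [0 1; 0 -1; 1 0; -1 0];                  % the four equiprobable values of (y1,y2)
idx  = randi(4, 1e5, 1);                        % draw 100000 samples
y1 = vals(idx,1);  y2 = vals(idx,2);
cov_y1y2 = mean(y1.*y2) - mean(y1)*mean(y2)     % approx 0: uncorrelated (Eq. 8)
lhs = mean(y1.^2 .* y2.^2)                      % approx 0
rhs = mean(y1.^2) * mean(y2.^2)                 % approx 1/4, so Eq. (7) fails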

4.6 Why Gaussian variables are forbidden

The fundamental restriction in ICA is that the independent components must be nongaussian

for ICA to be possible.

To see why gaussian variables make ICA impossible, assume that the mixing matrix A is orthogonal and the si are gaussian. Then x1 and x2 are gaussian, uncorrelated, and of unit variance. Their joint density is given by

p(x1,x2) = (1/2π) exp( -(x12 + x22)/2 ).   (10)

This distribution is illustrated in Fig. 4.4. The figure shows that the density is completely symmetric. Therefore, it does not contain any information on the directions of the columns of the mixing matrix A. This is why A cannot be estimated.

  

Figure 4.4: The multivariate distribution of two independent gaussian variables.

More rigorously, one can prove that the distribution of any orthogonal transformation of the

gaussian (x1,x2) has exactly the same distribution as (x1,x2), and that x1 and x2 are independent.

Thus, in the case of gaussian variables, we can only estimate the ICA model up to an

orthogonal transformation. In other words, the matrix A is not identifiable for gaussian

independent components. (Actually, if just one of the independent components is gaussian,

the ICA model can still be estimated.)

4.7 Principles of ICA estimation

Nongaussian is independent

Intuitively speaking, the key to estimating the ICA model is nongaussianity. Actually,

without nongaussianity the estimation is not possible at all. This is at the same time probably

the main reason for the rather late resurgence of ICA research: In most of classical statistical

theory, random variables are assumed to have gaussian distributions, thus precluding any

methods related to ICA.


The Central Limit Theorem, a classical result in probability theory, tells that the distribution

of a sum of independent random variables tends toward a gaussian distribution, under certain

conditions. Thus, a sum of two independent random variables usually has a distribution that is

closer to gaussian than any of the two original random variables.

Let us now assume that the data vector x is distributed according to the ICA data model in Eq. (2), i.e. it is a mixture of independent components. For simplicity, let us assume in this section that all the independent components have identical distributions. To estimate one of the independent components, we consider a linear combination of the xi; let us denote this by y = wTx = Σi wixi, where w is a vector to be determined. If w were one of the rows of the inverse of A, this linear combination would actually equal one of the independent components. The question is now: how could we use the Central Limit Theorem to determine w so that it would equal one of the rows of the inverse of A? In practice, we cannot determine such a w exactly, because we have no knowledge of the matrix A, but we can find an estimator that gives a good approximation.

To see how this leads to the basic principle of ICA estimation, let us make a change of variables, defining z = ATw. Then we have y = wTx = wTAs = zTs. Thus y is a linear combination of the si, with weights given by the zi. Since a sum of even two independent random variables is more gaussian than the original variables, zTs is more gaussian than any of the si, and becomes least gaussian when it in fact equals one of the si. In this case, obviously only one of the elements zi of z is nonzero. (Note that the si were here assumed to have identical distributions.)

Therefore, we could take as w a vector that maximizes the nongaussianity of wTx. Such a vector would necessarily correspond (in the transformed coordinate system) to a z which has only one nonzero component. This means that wTx = zTs equals one of the independent components.

Maximizing the nongaussianity of wTx thus gives us one of the independent components. In fact, the optimization landscape for nongaussianity in the n-dimensional space of vectors w has 2n local maxima, two for each independent component, corresponding to si and -si (recall that the independent components can be estimated only up to a multiplicative sign). To find several independent components, we need to find all these local maxima. This is not difficult, because the different independent components are uncorrelated: we can always constrain the search to the space that gives estimates uncorrelated with the previous ones. This corresponds to orthogonalization in a suitably transformed (i.e. whitened) space.

4.8 Measures of nongaussianity

To use nongaussianity in ICA estimation, we must have a quantitative measure of

nongaussianity of a random variable, say y. To simplify things, let us assume that y is

centered (zero-mean) and has variance equal to one.

4.8.1 Kurtosis

The classical measure of nongaussianity is kurtosis or the fourth-order cumulant. The kurtosis

of y is classically defined by

kurt(y) = E{y4} - 3(E{y2})2.   (11)

Actually, since we assumed that y is of unit variance, the right-hand side simplifies to E{y4} - 3. This shows that kurtosis is simply a normalized version of the fourth moment E{y4}. For a gaussian y, the fourth moment equals 3(E{y2})2. Thus, kurtosis is zero for a gaussian


random variable. For most (but not quite all) nongaussian random variables, kurtosis is

nonzero.

Kurtosis can be both positive or negative. Random variables that have a negative kurtosis are

called subgaussian, and those with positive kurtosis are called supergaussian. In statistical

literature, the corresponding expressions platykurtic and leptokurtic are also used.

Supergaussian random variables have typically a ``spiky'' pdf with heavy tails, i.e. the pdf is

relatively large at zero and at large values of the variable, while being small for intermediate

values. A typical example is the Laplace distribution, whose pdf (normalized to unit variance)

is given by

p(y) = (1/√2) exp( -√2 |y| ).   (12)

This pdf is illustrated in Fig. 4.5. Subgaussian random variables, on the other hand, have

typically a ``flat'' pdf, which is rather constant near zero, and very small for larger values of

the variable.

Figure 4.5: The density function of the Laplace distribution, which is a typical

supergaussian distribution. For comparison, the gaussian density is given by a dashed line.

Both densities are normalized to unit variance.


Typically nongaussianity is measured by the absolute value of kurtosis. The square of

kurtosis can also be used. These are zero for a gaussian variable, and greater than zero for

most nongaussian random variables. There are nongaussian random variables that have zero

kurtosis, but they can be considered as very rare.

Kurtosis, or rather its absolute value, has been widely used as a measure of nongaussianity in

ICA and related fields. The main reason is its simplicity, both computational and theoretical.

Computationally, kurtosis can be estimated simply by using the fourth moment of the sample data. Theoretical analysis is simplified because of the following linearity properties: if x1 and x2 are two independent random variables, it holds that

kurt(x1 + x2) = kurt(x1) + kurt(x2)   (13)

and

kurt(αx1) = α4 kurt(x1),   (14)

where α is a scalar. These properties can be easily proven using the definition.
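As an illustration, the following sketch estimates the sample kurtosis of Eq. (11) for a gaussian, a uniform (subgaussian) and a Laplace (supergaussian) variable, all of unit variance; the sample size and the way the variables are generated are arbitrary choices made for this example.

N = 1e5;
g = randn(N,1);                                 % gaussian: kurtosis approx 0
u = (rand(N,1) - 0.5) * sqrt(12);               % uniform, unit variance: approx -1.2
l = sign(randn(N,1)) .* (-log(rand(N,1))) / sqrt(2);   % Laplace, unit variance: approx +3
kurt = @(y) mean(y.^4) - 3*mean(y.^2)^2;        % sample version of Eq. (11), zero-mean y
[kurt(g) kurt(u) kurt(l)]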


To illustrate in a simple example what the optimization landscape for kurtosis looks like, and how independent components could be found by kurtosis minimization or maximization, let us look at a 2-dimensional model x = As. Assume that the independent components s1, s2 have kurtosis values kurt(s1), kurt(s2) respectively, both different from zero. Remember that we assumed that they have unit variances. We seek one of the independent components as y = wTx.

Let us again make the transformation z = ATw. Then we have y = wTx = wTAs = zTs = z1s1 + z2s2. Now, based on the additive property of kurtosis, we have

kurt(y) = z14 kurt(s1) + z24 kurt(s2).

On the other hand, we made the constraint that the variance of y is equal to 1, based on the same assumption concerning s1, s2. This implies a constraint on z: E{y2} = z12 + z22 = 1. Geometrically, this means that the vector z is constrained to the unit circle on the 2-dimensional plane. The optimization problem is now: what are the maxima of the function |kurt(y)| = |z14 kurt(s1) + z24 kurt(s2)| on the unit circle? For simplicity, one may assume that the kurtosis values are of the same sign, in which case the absolute value operators can be omitted. The graph of this function is the ``optimization landscape'' for the problem.

It is not hard to show that the maxima are at the points where exactly one of the elements of the vector z is zero and the other nonzero; because of the unit circle constraint, the nonzero element must be equal to 1 or -1. But these points are exactly the ones where y equals one of the independent components ±si, and the problem has been solved.

In practice we would start from some weight vector w, compute the direction in which the kurtosis of y = wTx is growing most strongly (if kurtosis is positive) or decreasing most strongly (if kurtosis is negative) based on the available sample x(1), ..., x(T) of the mixture vector x, and use a gradient method or one of its extensions to find a new vector w. The example can be generalized to arbitrary dimensions, showing that kurtosis can theoretically be used as an optimization criterion for the ICA problem.
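A minimal sketch of this gradient scheme is given below, assuming X is an n x T matrix of centered and whitened mixtures (see Section 4.10); the step size and iteration count are arbitrary. It only illustrates the kurtosis criterion and is not the algorithm actually used in this project (JADE, Chapter 5).

w = randn(size(X,1),1);  w = w/norm(w);        % random starting vector on the unit sphere
for it = 1:200
    y = w'*X;                                  % current estimate y = w'x
    g = (X*(y.^3)')/size(X,2) - 3*w;           % proportional to the gradient of kurt(y) for whitened X
    w = w + 0.1*sign(mean(y.^4) - 3)*g;        % ascend if kurtosis is positive, descend if negative
    w = w/norm(w);                             % project back onto the unit sphere
end
s_est = w'*X;                                  % one estimated independent component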

However, kurtosis also has some drawbacks in practice when its value has to be estimated

from a measured sample. The main problem is that kurtosis can be very sensitive to outliers.

Its value may depend on only a few observations in the tails of the distribution, which may be

erroneous or irrelevant observations. In other words, kurtosis is not a robust measure of

nongaussianity.

Thus, other measures of nongaussianity might be better than kurtosis in some situations.

Below we shall consider negentropy whose properties are rather opposite to those of kurtosis,

and finally introduce approximations of negentropy that more or less combine the good

properties of both measures.

4.8.2 Negentropy

A second very important measure of nongaussianity is given by negentropy. Negentropy is

based on the information-theoretic quantity of (differential) entropy.

Entropy is the basic concept of information theory. The entropy of a random variable can be

interpreted as the degree of information that the observation of the variable gives. The more

``random'', i.e. unpredictable and unstructured the variable is, the larger its entropy. More

rigorously, entropy is closely related to the coding length of the random variable, in fact,

under some simplifying assumptions, entropy is the coding length of the random variable.

Entropy H is defined for a discrete random variable Y as

H(Y) = -Σi P(Y = ai) log P(Y = ai),   (15)


where the ai are the possible values of Y. This very well-known definition can be generalized

for continuous-valued random variables and vectors, in which case it is often called

differential entropy. The differential entropy H of a random vector y with density f(y) is

defined as

H(y) = -∫ f(y) log f(y) dy.   (16)

A fundamental result of information theory is that a gaussian variable has the largest entropy

among all random variables of equal variance. This means that entropy could be used as a

measure of nongaussianity. In fact, this shows that the gaussian distribution is the ``most

random'' or the least structured of all distributions. Entropy is small for distributions that are

clearly concentrated on certain values, i.e., when the variable is clearly clustered, or has a pdf

that is very ``spiky''.

To obtain a measure of nongaussianity that is zero for a gaussian variable and always

nonnegative, one often uses a slightly modified version of the definition of differential

entropy, called negentropy. Negentropy J is defined as follows:

J(y) = H(ygauss) - H(y),   (17)

where ygauss is a Gaussian random variable with the same covariance matrix as y. Due to the

above-mentioned properties, negentropy is always non-negative, and it is zero if and only if y

has a Gaussian distribution. Negentropy has the additional interesting property that it is

invariant for invertible linear transformations.

The advantage of using negentropy, or, equivalently, differential entropy, as a measure of

nongaussianity is that it is well justified by statistical theory. In fact, negentropy is in some

sense the optimal estimator of nongaussianity, as far as statistical properties are concerned.


The problem in using negentropy is, however, that it is computationally very difficult.

Estimating negentropy using the definition would require an estimate (possibly

nonparametric) of the pdf.

4.9 Maximum Likelihood Estimation

4.9.1 The likelihood

A very popular approach for estimating the ICA model is maximum likelihood estimation,

which is closely connected to the infomax principle. Here we discuss this approach, and show

that it is essentially equivalent to minimization of mutual information. It is possible to

formulate directly the likelihood in the noise-free ICA model, and then estimate the model by a maximum likelihood method. Denoting by W = (w1, ..., wn)T the matrix A-1, the log-likelihood takes the form

L = Σ(t=1..T) Σ(i=1..n) log fi(wiTx(t)) + T log|det W|,   (18)

where the fi are the density functions of the si (here assumed to be known), and the x(t), t = 1, ..., T are the realizations of x. The term log|det W| in the likelihood comes from the classic rule for (linearly) transforming random variables and their densities: in general, for any random vector x with density px and for any invertible matrix W, the density of y = Wx is given by px(W-1y) |det W-1|.

4.9.2 The Infomax Principle

Another related contrast function was derived from a neural network viewpoint. It is based on maximizing the output entropy (or information flow) of a neural network with non-linear outputs. Assume that x is the input to the neural network, whose outputs are of the form gi(wiTx), where the gi are some non-linear scalar functions and the wi are the weight vectors of the neurons. One then wants to maximize the entropy of the outputs:

L2 = H( g1(w1Tx), ..., gn(wnTx) ).   (19)

If the gi are well chosen, this framework also enables the estimation of the ICA model. Indeed, several authors have proved the surprising result that the principle of network entropy maximization, or ``infomax'', is equivalent to maximum likelihood estimation. This equivalence requires that the non-linearities gi used in the neural network are chosen as the cumulative distribution functions corresponding to the densities fi, i.e., gi'(·) = fi(·).

4.10 Preprocessing for ICA

In the preceding sections, we discussed the statistical principles underlying ICA methods.

However, before applying an ICA algorithm on the data, it is usually very useful to do some

preprocessing. In this section, we discuss some preprocessing techniques that make the

problem of ICA estimation simpler and better conditioned.

4.10.1 Centering

The most basic and necessary preprocessing is to center x, i.e. subtract its mean vector m = E{x} so as to make x a zero-mean variable. This implies that s is zero-mean as well, as can be seen by taking expectations on both sides of Eq. (2).

This preprocessing is made solely to simplify the ICA algorithms: it does not mean that the mean could not be estimated. After estimating the mixing matrix A with centered data, we can complete the estimation by adding the mean vector of s back to the centered estimates of s. The mean vector of s is given by A-1m, where m is the mean that was subtracted in the preprocessing.


4.10.2 Whitening

Another useful preprocessing strategy in ICA is to first whiten the observed variables. This means that before the application of the ICA algorithm (and after centering), we transform the observed vector x linearly so that we obtain a new vector z which is white, i.e. its components are uncorrelated and their variances equal unity. In other words, the covariance matrix of z equals the identity matrix:

E{zzT} = I.   (20)

The whitening transformation is always possible. One popular method for whitening is to use the eigenvalue decomposition (EVD) of the covariance matrix E{xxT} = EDET, where E is the orthogonal matrix of eigenvectors of E{xxT} and D is the diagonal matrix of its eigenvalues, D = diag(d1, ..., dn). Note that E{xxT} can be estimated in a standard way from the available sample x(1), ..., x(T). Whitening can now be done by

z = ED-1/2ETx,   (21)

where the matrix D-1/2 is computed by a simple component-wise operation as D-1/2 = diag(d1-1/2, ..., dn-1/2). It is easy to check that now E{zzT} = I.

Whitening transforms the mixing matrix into a new one, Ã. We have

z = ED-1/2ETAs = Ãs.   (22)

The utility of whitening resides in the fact that the new mixing matrix Ã is orthogonal. This can be seen from

E{zzT} = ÃE{ssT}ÃT = ÃÃT = I.   (23)

Here we see that whitening reduces the number of parameters to be estimated. Instead of having to estimate the n2 parameters that are the elements of the original matrix A, we only need to estimate the new, orthogonal mixing matrix Ã. An orthogonal matrix contains n(n-1)/2 degrees of freedom. For example, in two dimensions, an orthogonal transformation is

determined by a single angle parameter. In larger dimensions, an orthogonal matrix contains

only about half of the number of parameters of an arbitrary matrix. Thus one can say that

whitening solves half of the problem of ICA. Because whitening is a very simple and

standard procedure, much simpler than any ICA algorithms, it is a good idea to reduce the

complexity of the problem this way.

It may also be quite useful to reduce the dimension of the data at the same time as we do the

whitening. Then we look at the eigenvalues dj of E{xxT} and discard those that are too small,

as is often done in the statistical technique of principal component analysis. This has often the

effect of reducing noise. Moreover, dimension reduction prevents overlearning, which can

sometimes be observed in ICA.
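The centering and whitening steps described above can be written in a few lines of MATLAB; the sketch below assumes the observations are stored in an n x T matrix X (channels by samples), which is a name chosen only for this illustration, and follows Eq. (21) directly.

X = X - mean(X,2) * ones(1, size(X,2));       % centering: subtract the mean of each channel
C = (X * X') / size(X,2);                     % sample estimate of the covariance E{xx'}
[E, D] = eig(C);                              % eigenvectors E and eigenvalues D
Z = E * diag(diag(D).^(-1/2)) * E' * X;       % whitened data, cov(Z) approx I (Eq. 21)
% Optional dimension reduction: keep only the eigenvectors with large
% eigenvalues before forming the whitening matrix, as described above.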


5. Joint Approximate Diagonalization of Eigen Matrices (JADE)

5.1 Joint Diagonalization

The joint diagonalization of a set of square matrices consists in finding the orthonormal

change of basis which makes the matrices as diagonal as possible. When all the matrices in

the set commute, this can be achieved exactly. When this is not the case, it is always possible

to optimize a joint diagonality criterion. This defines an approximate joint diagonalization.

When the matrices in the set are `almost exactly jointly diagonalizable', this approach also

defines something like the `average eigen-spaces' of the matrix set.

5.1.1 The algorithm

For off-line ICA, we have an algorithm based on the (joint) diagonalization of cumulant matrices, developed by Cardoso, called Joint Approximate Diagonalization of Eigen Matrices (JADE). ``Good'' statistical performance is achieved by involving all the cumulants of orders 2 and 4, while fast optimization is obtained by the device of joint diagonalization.

JADE has been successfully applied to the processing of real data sets, such as found in

mobile telephony and in airport radar, as well as to bio-medical signals (ECG, EEG, multi-electrode neural recordings). It is very efficient when the number of observations is small.

The strongest point of JADE for applications of ICA is that it works off-the-shelf (no parameter tuning). The weakest point of the current implementation is that the number of sources (but not of sensors) is limited in practice (by the available memory) to something like 40 or 50, depending on the computer.


The JADE algorithm can be summarized as follows:

1. Initialization: estimate a whitening matrix W and set Z = WX. The covariance matrix is defined as RX = E{XXT}, where E is the mathematical expectation. Denoting by D the diagonal matrix of its eigenvalues and by H the matrix of corresponding eigenvectors, a whitening matrix is W = HD-1/2HT.

2. Form statistics: estimate a maximal set {QZ} of cumulant matrices. Given an n x 1 random vector Z and any n x n matrix M, the cumulant matrix is given by QZ(M) = E{(ZTMZ)ZZT} - RZ tr(MRZ) - RZMRZ - RZMTRZ.

3. Optimize the orthogonal contrast: find the rotation matrix U such that the cumulant matrices are as diagonal as possible.

4. Separate: estimate the mixing matrix as A = W-1U and the sources as S = UTZ = UTWX.

A short sketch of the cumulant-matrix computation in step 2 is given below.
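The following function is a direct, illustrative implementation of the cumulant-matrix formula in step 2 (saved, for example, as cumulant_matrix.m, a hypothetical file name); it assumes Z is already whitened and is meant only to make the formula concrete, not to reproduce the optimized jadetd implementation used in the Appendix.

function Q = cumulant_matrix(Z, M)
% Sample estimate of Q_Z(M) for an n x T whitened data matrix Z and an n x n matrix M.
    T  = size(Z,2);
    Rz = (Z*Z')/T;                             % approximately the identity after whitening
    Q  = zeros(size(M));
    for t = 1:T
        z = Z(:,t);
        Q = Q + (z'*M*z) * (z*z');             % accumulates the E{(z'Mz) zz'} term
    end
    Q = Q/T - Rz*trace(M*Rz) - Rz*M*Rz - Rz*M'*Rz;
end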

[Flowchart of the JADE algorithm: initialize/whiten (Z = WX), form statistics (cumulant matrices), optimize the orthogonal contrast (rotation U), separate.]

5.2 Motor Imagery EEG classification using JADE

Recently, the mu rhythm modulated by motor imagination has been used as a reliable EEG pattern for brain-computer interface (BCI) systems. For motor-imagery-based BCI, feature extraction and classification are two critical stages. The use of ICA, in the form of the JADE algorithm, provides the mixing-matrix coefficients of the input EEG signals. Using feature patterns based on the total energy of the dynamic mixing coefficients in a certain time window, classification accuracy beyond 85% can be achieved without training. The results demonstrate that this method can be used for extraction and classification of motor imagery EEG.

The use of ICA on EEG motor imagery classification exploits the fact that the energy content

in the mu rhythm wave is different when a person imagines right movement as compared to

the energy content for left-movement imagination. This difference is reflected in the coefficients of the feature matrix computed from the estimated mixing matrix.

The ipsilateral and contralateral areas of the brain are responsible for left and right imagination, so the signals from these areas are of importance to us. Hence, from all the channels of the EEG cap, the electrodes over the ipsilateral and contralateral areas (C3, C4) are selected.

Let the signals recorded at C3 and C4 be XC3 and XC4. These observed signals are mixtures of the original (source) signals S1 and S2, i.e.

[XC3; XC4] = A [S1; S2],  with mixing matrix A = [a11 a12; a21 a22].


By repeated study, it was found that when a person performs the left (right) hand motor imagery task, the values of a11 and a12 are larger (smaller) than the values of a21 and a22. This result may be explained as follows: when the person imagines the hand movement, the mu rhythm is enhanced over the ipsilateral area and suppressed over the contralateral area of the cerebral cortex. Therefore, as the feature patterns for classification, we choose the total energy of the mixing-matrix coefficients, fC3 = a112 + a122 and fC4 = a212 + a222, which can be understood as the sum of ``instantaneous energy'' on the C3 and C4 channels.

A simple classification rule can therefore be proposed by comparing the values of fC3 and fC4: depending on which of the two is higher for a given trial, one can conclude whether the person is imagining a right or a left movement.
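The feature extraction and comparison just described can be sketched as follows, mirroring the Appendix scripts. Here xC3 and xC4 are assumed to be column vectors holding one trial of the C3 and C4 channels, and the left/right assignment in the final branch is only an assumed illustration: in this project the actual decision is learned by the LDA and SVM classifiers from the feature vector [fC3 fC4].

[B, d] = jadetd([xC3, xC4]');           % external JADE implementation (reference [12])
A_hat = inv(B);                         % estimated 2 x 2 mixing matrix, as in the Appendix
fC3 = A_hat(1,1)^2 + A_hat(1,2)^2;      % "energy" associated with C3
fC4 = A_hat(2,1)^2 + A_hat(2,2)^2;      % "energy" associated with C4
if fC3 > fC4
    decision = 'left';                  % assumed mapping for illustration only
else
    decision = 'right';
end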

6. Linear Discriminant Analysis (LDA)

6.1 Linear Discriminant Analysis, two-classes

The objective of LDA is to perform dimensionality reduction while


preserving as much of the class discriminatory information as possible.

Assume we have a set of D-dimensional samples {x(1), x(2), ..., x(N)}, N1 of which belong to class ω1 and N2 to class ω2. We seek to obtain a scalar y by projecting the samples x onto a line

y = wTx.

Of all the possible lines we would like to select the one that maximizes the separability of the

scalars. This is illustrated for the two-dimensional case in the following figures

In order to find a good projection vector, we need to define a measure

of separation between the projections.

The mean vector of each class in the x-space, and its projection in the y-space, are

μi = (1/Ni) Σx∈ωi x   and   mi = (1/Ni) Σy∈ωi y = wTμi.   (6.1)

We could then choose the distance between the projected means as our objective function:

J(w) = |m1 - m2| = |wT(μ1 - μ2)|.   (6.2)

However, the distance between the projected means is not a very good measure since it does

not take into account the standard deviation within the classes.

The solution proposed by Fisher is to maximize a function that represents

the difference between the means, normalized by a measure of the within-

class scatter.

For each class we define the scatter, an equivalent of the variance, as

si2 = Σy∈ωi (y - mi)2,   (6.3)

where the quantity s12 + s22 is called the within-class scatter of the projected examples.

The Fisher linear discriminant is defined as the linear function wTx that maximizes the

criterion function

J(w) = (m1 - m2)2 / (s12 + s22).   (6.4)

Therefore, we will be looking for a projection where examples from the same class are projected very close to each other and, at the same time, the projected means are as far apart as possible.

In order to find the optimum projection w*, we need to express J(w) as an explicit function of

w.

We define a measure of the scatter in the multivariate feature space x, the scatter matrices

Si = Σx∈ωi (x - μi)(x - μi)T   and   SW = S1 + S2,   (6.5)

where SW is called the within-class scatter matrix.


The scatter of the projection y can then be expressed as a function of the scatter matrices in the feature space x:

si2 = wTSiw,   so that   s12 + s22 = wTSWw.   (6.6)

Similarly, the difference between the projected means can be expressed in terms of the means

in the original feature space:

(m1 - m2)2 = wT(μ1 - μ2)(μ1 - μ2)Tw = wTSBw.   (6.7)

The matrix SB is called the between-class scatter. Note that, since SB is the outer product of

two vectors, its rank is at most one.

We can finally express the Fisher criterion in terms of SW and SB as

J(w) = (wTSBw) / (wTSWw).   (6.8)

To find the maximum of J(w) we take the derivative with respect to w and equate it to zero:

d/dw [ (wTSBw)/(wTSWw) ] = 0,  which gives  (wTSWw) SBw - (wTSBw) SWw = 0.   (6.9)

Dividing by wTSWw:

SBw - J(w) SWw = 0,  i.e.  SW-1SBw - J(w) w = 0.   (6.10)

Solving the generalized eigenvalue problem (SW-1SBw = Jw) yields

w* = argmaxw J(w) = SW-1(μ1 - μ2).   (6.11)

This is known as Fisher's Linear Discriminant (1936), although it is not a discriminant but rather a specific choice of direction for the projection of the data down to one dimension.


6.2 LDA example

Compute the Linear Discriminant projection for the following two-dimensional dataset:

X1=(x1,x2)={(4,1),(2,4),(2,3),(3,6),(4,4)}

X2=(x1,x2)={(9,10),(6,8),(9,5),(8,7),(10,8)}

SOLUTION (by hand)

The class statistics are the sample means μ1 = [3.0, 3.6]T and μ2 = [8.4, 7.6]T, together with the class scatter matrices S1 and S2 computed from Eq. (6.5). The within-class scatter is SW = S1 + S2 and the between-class scatter is SB = (μ1 - μ2)(μ1 - μ2)T.

The LDA projection is then obtained as the solution of the generalized eigenvalue problem SW-1SBw = λw, or directly by w* = SW-1(μ1 - μ2), which (up to scale) gives the direction w* ∝ [-0.91, -0.39]T. A short MATLAB sketch reproducing this computation is given below.
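In the sketch, the within-class scatter is formed as the plain sum of the per-class scatter matrices; normalizing by the class sizes would change only the scale of w*, not its direction.

X1 = [4 1; 2 4; 2 3; 3 6; 4 4];                 % class omega_1 samples (rows)
X2 = [9 10; 6 8; 9 5; 8 7; 10 8];               % class omega_2 samples (rows)
mu1 = mean(X1)';  mu2 = mean(X2)';              % class means (Eq. 6.1)
S1 = (X1' - mu1*ones(1,5)) * (X1' - mu1*ones(1,5))';   % scatter of class 1
S2 = (X2' - mu2*ones(1,5)) * (X2' - mu2*ones(1,5))';   % scatter of class 2
Sw = S1 + S2;                                   % within-class scatter (Eq. 6.5)
w  = Sw \ (mu1 - mu2);                          % LDA direction (Eq. 6.11)
w  = w / norm(w)                                % approx [-0.91 -0.39]'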

6.3 Linear Discriminant Analysis, C-classes

6.3.1 Fisher’s LDA generalizes very gracefully for C-class problems

Instead of one projection y, we will now seek (C-1) projections [y1, y2, ..., yC-1] by means of (C-1) projection vectors wi, which can be arranged by columns into a projection matrix W = [w1|w2|...|wC-1], so that y = WTx.

6.3.2 Derivation


The generalization of the within-class scatter is

SW = Σi=1..C Si,  where Si = Σx∈ωi (x - μi)(x - μi)T.   (6.12)

The generalization for the between-class scatter is

SB = Σi=1..C Ni (μi - μ)(μi - μ)T,   (6.13)

where μ is the overall mean of the samples and ST = SB + SW is called the total scatter matrix.

Similarly, we define the mean vectors and scatter matrices for the projected samples y = WTx; from our derivation for the two-class problem, these can be written as WTSWW and WTSBW.

Recall that we are looking for a projection that maximizes the ratio of between-class to

within-class scatter. Since the projection is no longer a scalar (it has C-1 dimensions), we

then use the determinant of the scatter matrices to obtain a scalar objective function:


J(W) = |WTSBW| / |WTSWW|.   (6.14)

And we will seek the projection matrix W* that maximizes this ratio.

It can be shown that the optimal projection matrix W* is the one whose columns are the

eigenvectors corresponding to the largest eigenvalues of the following generalized eigenvalue

problem

 

(SB - λiSW) wi* = 0,  i.e.  SW-1SB wi* = λi wi*.   (6.15)

6.3.3 NOTES

– SB is the sum of C matrices of rank one or less, and the mean vectors are constrained by the overall mean μ (only C-1 of the differences μi - μ are linearly independent). Therefore, SB will be of rank (C-1) or less.

– This means that only (C-1) of the eigenvalues λi will be non-zero.

– The projections with maximum class separability information are the eigenvectors corresponding to the largest eigenvalues of SW-1SB.

– LDA can also be derived as the Maximum Likelihood method for the case of normal class-conditional densities with equal covariance matrices.
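For completeness, the following sketch carries out the C-class computation of Eqs. (6.12)-(6.15); it assumes the samples are stored in an N x D matrix X with an N x 1 label vector lab taking values 1, ..., C, both hypothetical names chosen for this illustration.

classes = unique(lab);  C = numel(classes);  D = size(X,2);
mu = mean(X)';  Sw = zeros(D);  Sb = zeros(D);
for i = 1:C
    Xi  = X(lab == classes(i), :);              % samples of class i (rows)
    mui = mean(Xi)';
    Sw  = Sw + (Xi' - mui*ones(1,size(Xi,1))) * (Xi' - mui*ones(1,size(Xi,1)))';  % Eq. (6.12)
    Sb  = Sb + size(Xi,1) * (mui - mu)*(mui - mu)';                               % Eq. (6.13)
end
[V, L] = eig(Sw \ Sb);                          % generalized eigenvalue problem (Eq. 6.15)
[evs, idx] = sort(diag(L), 'descend');
W = V(:, idx(1:C-1));                           % projection matrix: top C-1 eigenvectors
Y = X * W;                                      % projected samples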


6.4 Limitations of LDA

– LDA produces at most C-1 feature projections. If the classification error estimates establish that more features are needed, some other method must be employed to provide those additional features.

– LDA is a parametric method, since it assumes unimodal Gaussian likelihoods. If the distributions are significantly non-Gaussian, the LDA projections will not be able to preserve any complex structure in the data that may be needed for classification.

– LDA will fail when the discriminatory information is not in the mean but rather in the variance of the data.


7. Support Vector Machine

A support vector machine (SVM) is a concept in computer science for a set of related

supervised learning methods that analyze data and recognize patterns, used for classification and

regression analysis. The standard SVM takes a set of input data and predicts, for each given

input, which of two possible classes the input is a member of, which makes the SVM a non-

probabilistic binary linear classifier. Given a set of training examples, each marked as belonging

to one of two categories, an SVM training algorithm builds a model that assigns new

examples into one category or the other. An SVM model is a representation of the examples

as points in space, mapped so that the examples of the separate categories are divided by a

clear gap that is as wide as possible. New examples are then mapped into that same space and

predicted to belong to a category based on which side of the gap they fall on.

7.1 Motivation

Classifying data is a common task in machine learning. Suppose some given data

points each belong to one of two classes, and the goal is to decide which class a new data

point will be in. In the case of support vector machines, a data point is viewed as a p-

dimensional vector (a list of p numbers), and we want to know whether we can separate such

points with a (p − 1)-dimensional hyperplane. This is called a linear classifier. There

are many hyperplanes that might classify the data. One reasonable choice as the best

hyperplane is the one that represents the largest separation, or margin, between the two

classes. So we choose the hyperplane so that the distance from it to the nearest data point on

each side is maximized. If such a hyperplane exists, it is known as the maximum-margin

hyperplane and the linear classifier it defines is known as a maximum margin

classifier, or - equivalently - the perceptron of optimal stability.


7.2 Linear SVM

We are given some training data D, a set of n points of the form

D = { (xi, yi) | xi ∈ Rp, yi ∈ {-1, 1} },  i = 1, ..., n,

where yi is either 1 or -1, indicating the class to which the point xi belongs. Each xi is a p-dimensional real vector. We want to find the maximum-margin hyperplane that divides the points having yi = 1 from those having yi = -1. Any hyperplane can be written as the set of points x satisfying

w · x - b = 0,

where · denotes the dot product. The vector w is a normal vector: it is perpendicular to the hyperplane. The parameter b/||w|| determines the offset of the hyperplane from the origin along the normal vector w.

We want to choose w and b to maximize the margin, i.e. the distance between the parallel hyperplanes that are as far apart as possible while still separating the data. These hyperplanes can be described by the equations

w · x - b = 1   and   w · x - b = -1.


Note that if the training data are linearly separable, we can select the two hyperplanes of the margin in such a way that there are no points between them, and then try to maximize their distance. By using geometry, we find the distance between these two hyperplanes to be 2/||w||, so we want to minimize ||w||. As we also have to prevent data points from falling into the margin, we add the following constraint: for each i, either

w · xi - b ≥ 1   for xi of the first class,

or

w · xi - b ≤ -1   for xi of the second.

This can be rewritten as

yi (w · xi - b) ≥ 1,  for all 1 ≤ i ≤ n.

We can put this together to get the optimization problem:

Minimize (in w, b)   ||w||

subject to (for any i = 1, ..., n)   yi (w · xi - b) ≥ 1.


7.2.1 Primal form

The optimization problem presented in the preceding section is difficult to solve because it depends on ||w||, the norm of w, which involves a square root. Fortunately it is possible to alter the problem by substituting ||w|| with (1/2)||w||2 (the factor of 1/2 being used for mathematical convenience) without changing the solution (the minimum of the original and the modified problem have the same w and b). This is a quadratic programming (QP) optimization problem. More clearly:

Minimize (in w, b)   (1/2)||w||2

subject to (for any i = 1, ..., n)   yi (w · xi - b) ≥ 1.

One could be tempted to express the previous problem by means of non-negative Lagrange multipliers αi as

min{w,b}  { (1/2)||w||2 - Σi αi [ yi (w · xi - b) - 1 ] },

but this would be wrong. The reason is the following: suppose we can find a family of hyperplanes which divide the points; then all yi (w · xi - b) - 1 ≥ 0. Hence we could find the minimum by sending all αi to +∞, and this minimum would be reached for all the members of the family, not only for the best one, which can be chosen by solving the original problem.

Nevertheless, the previous constrained problem can be expressed as

min{w,b} max{α≥0}  { (1/2)||w||2 - Σi αi [ yi (w · xi - b) - 1 ] },

that is, we look for a saddle point. In doing so, all the points which can be separated as yi (w · xi - b) - 1 > 0 do not matter, since we must set the corresponding αi to zero.

This problem can now be solved by standard quadratic programming techniques and programs. The solution can be expressed as a linear combination of the training vectors:

w = Σi αi yi xi.

Only a few αi will be greater than zero. The corresponding xi are exactly the support vectors, which lie on the margin and satisfy yi (w · xi - b) = 1. From this one can derive that the support vectors also satisfy

b = w · xi - yi,

which allows one to define the offset b. In practice, it is more robust to average over all NSV support vectors:

b = (1/NSV) Σi=1..NSV (w · xi - yi).

7.2.2 Dual form

Writing the classification rule in its unconstrained dual form reveals that the maximum-margin hyperplane, and therefore the classification task, is only a function of the support vectors, the training data that lie on the margin.


Using the fact that ||w||2 = w · w and substituting w = Σi αi yi xi, one can show that the dual of the SVM reduces to the following optimization problem:

Maximize (in αi)

L̃(α) = Σi αi - (1/2) Σi,j αi αj yi yj xiTxj = Σi αi - (1/2) Σi,j αi αj yi yj k(xi, xj)

subject to (for any i = 1, ..., n)   αi ≥ 0,

and to the constraint from the minimization in b:

Σi αi yi = 0.

Here the kernel is defined by k(xi, xj) = xi · xj. The α terms constitute a dual representation for the weight vector in terms of the training set:

w = Σi αi yi xi.
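A small usage sketch of the (older) MATLAB svmtrain/svmclassify routines employed in the Appendix is given below, on a toy linearly separable 2-D problem; the data are generated only for illustration.

training = [randn(20,2) - 2; randn(20,2) + 2];          % two well-separated clouds
group    = [zeros(20,1); ones(20,1)];                   % class labels
svmStruct = svmtrain(training, group, 'Kernel_Function', 'linear');
test      = [-2 -2; 2 2];
predicted = svmclassify(svmStruct, test)                % expected output: [0; 1]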

7.3 Biased and unbiased hyperplanes

For simplicity reasons, sometimes it is required that the hyperplane passes through the origin

of the coordinate system. Such hyperplanes are called unbiased, whereas general hyperplanes

not necessarily passing through the origin are called biased. An unbiased hyperplane can be

enforced by setting b = 0 in the primal optimization problem. The corresponding dual is identical to the dual given above, but without the equality constraint Σi αi yi = 0.


7.4 Limitations of SVM

– The biggest limitation of SVM lies in the choice of the kernel (the best choice of kernel for a given problem is still a research problem).

– A second limitation is speed and size, mostly in training; for large training sets the training is expensive, although the resulting classifier typically uses only a small number of support vectors, thereby minimizing the computational requirements during testing.

– The optimal design of multiclass SVM classifiers is also an open research area.


8. Results

The complete process was tested in MATLAB on two different sample sets obtained from

different sources. JADE algorithm was applied on both of the sample sets and then

classification was done by both LDA and SVM to compare the results.

The information about the two sample sets is given below:

Sample Set #1

Number of samples in the training set: 8

Number of samples in the test set: 8

Sample Set #2

Number of samples in the training set: 140

Number of samples in the test set: 140


8.1 Graphical User Interface (GUI):

The Graphical User Interface for the project is a simple-to-use interface. It has four different sections, one for each combination of classification method (LDA or SVM) and sample set.

Furthermore, each section has two options to execute the process:

1) ‘Continuous Sampling’ – This would classify the test inputs with the training set in

random order.

2) ‘Choose Sample’ – One can choose a particular test input to classify it with the

training set.

A random screenshot of the GUI is shown below:

Fig 8.1: Random GUI Screenshot 1

The GUI displays the classifier's result alongside the actual (true) output, which is used to check the accuracy of the classification method.


If the same test signals are chosen in the two sections for different classification methods, the

accuracy can be compared for each of the methods visually. For example, a random

screenshot with same test signal chosen is shown below:

Fig 8.2: Random GUI Screenshot 2

The complete code for the GUI process can be found in the Appendix.

8.2 Observations:

The accuracy in the results after the application of the two different classification algorithms

on each sample set is shown below.


Sample Set #1:

Results on Application of LDA

Type          linear   diaglinear   quadratic   diagquadratic   mahalanobis
Correct       6        6            5           5               7
Incorrect     2        2            3           3               1
Accuracy (%)  75.00    75.00        62.50       62.50           87.50

Table 8.1: LDA classifiers on Sample Set #1

Results on Application of SVM

Type          Linear  Quadratic  Poly(3)  Poly(5)  Poly(8)  Poly(10)  RBF(1)  RBF(2)  RBF(4)  RBF(10)
Correct       6       6          7        7        7        7         6       5       5       6
Incorrect     2       2          1        1        1        1         2       3       3       2
Accuracy (%)  75.00   75.00      87.50    87.50    87.50    87.50     75.00   62.50   62.50   75.00

Table 8.2: SVM classifiers on Sample Set #1 (polynomial order / RBF parameter in parentheses)


Sample Set #2:

Results on Application of LDA

Type          linear   diaglinear   quadratic   diagquadratic   mahalanobis
Correct       110      110          110         110             112
Incorrect     30       30           30          30              28
Accuracy (%)  78.57    78.57        78.57       78.57           80.00

Table 8.3: LDA classifiers on Sample Set #2

Results on Application of SVM

Type          Linear  Quadratic  Poly(3)  Poly(5)  Poly(8)  Poly(10)  RBF(1)  RBF(2)  RBF(4)  RBF(10)
Correct       110     109        110      117      118      123       117     112     111     111
Incorrect     30      31         30       23       22       17        23      28      29      29
Accuracy (%)  78.57   77.86      78.57    83.57    84.28    87.86     83.57   80.00   79.29   79.29

Table 8.4: SVM classifiers on Sample Set #2 (polynomial order / RBF parameter in parentheses)


8.3 Comparisons

All the observed results are compared in this section: the accuracy of the different types of classifiers within each algorithm, and the average accuracy of the two classification algorithms.

The graph below shows the accuracy comparison amongst different types of classifiers in

LDA algorithm for both Sample Sets.

[Graph: accuracy (40-100%) of the linear, diaglinear, quadratic, diagquadratic and mahalanobis classifiers, for Sample Set #1 and Sample Set #2.]

Fig 8.3: Accuracy of different Classifiers in LDA


The graph below shows the accuracy comparison amongst different types of classifiers in

SVM algorithm for both Sample Sets.

[Graph: accuracy (40-100%) of the Linear, Quadratic, Polynomial, Polynomial (5), Polynomial (8), Polynomial (10), RBF, RBF (4) and RBF (10) classifiers, for Sample Set #1 and Sample Set #2.]

Fig 8.4: Accuracy of different Classifiers in SVM

The graph below shows the average accuracy (average for all different classifiers used) plots

for both LDA & SVM.

[Graph: average accuracy (65-90%) over all classifier types for LDA and SVM, on Sample Set #1 and Sample Set #2.]

Fig 8.5: Average Accuracy Comparison between LDA & SVM


9. Conclusion and Future Scope

The JADE algorithm performed as expected in separating the individual sources from the two EEG channels, C3 and C4, required for the subsequent classification process. The algorithm is applied to both training and test signals to obtain the training and test feature sets, respectively, for classification.

The classification methods used, LDA and SVM, both gave respectable accuracies considering the research done on this problem so far. However, it was noticed that SVM gave slightly better accuracies when using the large training set, in particular for higher polynomial orders (polynomial kernel) and lower RBF parameter values (RBF kernel). In LDA, the mahalanobis classifier gave better accuracies than the other classifiers.

In the future, several improvements to this work are possible:

– The EEG signals for both training and testing can be normalized, filtered and conditioned in order to provide universal classification of any incoming EEG signal.

– Artificial Neural Networks may be applied for classification; they are expected to give equal or better accuracies, but at the same time require a very large number of training signals to be trained efficiently.

– If sufficiently high accuracies are achieved, the complete process can be applied to automate certain tasks depending upon the imagined left and right directions.


10. Appendix

10.1 DataSets

The following two data sets have been used in the project:

SampleSet1.mat SampleSet2.mat

10.2 Code

The files SampleSet1_JADE.m and SampleSet2_JADE.m apply the JADE algorithm to the raw signals and build the training and test feature matrices.

As the JADE algorithm doesn’t already exist in MATLAB, an external implementation for

JADE was used, whose source can be found in the references.

The files sample_testing.fig and sample_testing.m contain the GUI layout and its corresponding code, respectively.


10.2.1 SampleSet1_JADE.m

load SampleSet1

% Build the C3 and C4 channel matrices for the eight training trials
clear c3 c4;
c3(:,1)=LeftBackwardImagined1(:,5);  c3(:,2)=LeftForwardImagined1(:,5);
c3(:,3)=RightBackwardImagined1(:,5); c3(:,4)=RightForwardImagined1(1:7040,5);
c3(:,5)=LeftBackwardImagined2(:,5);  c3(:,6)=LeftForwardImagined2(:,5);
c3(:,7)=RightBackwardImagined2(:,5); c3(:,8)=RightForwardImagined2(1:7040,5);

c4(:,1)=LeftBackwardImagined1(:,6);  c4(:,2)=LeftForwardImagined1(:,6);
c4(:,3)=RightBackwardImagined1(:,6); c4(:,4)=RightForwardImagined1(1:7040,6);
c4(:,5)=LeftBackwardImagined2(:,6);  c4(:,6)=LeftForwardImagined2(:,6);
c4(:,7)=RightBackwardImagined2(:,6); c4(:,8)=RightForwardImagined2(1:7040,6);

% Training feature matrix: apply JADE to each trial and take the energies
% of the estimated mixing-matrix rows as the features fC3 and fC4
for i=1:8
    temp = [c3(:,i), c4(:,i)];
    [c, d] = jadetd(temp');
    c = inv(c);
    fC3 = c(1,1)^2 + c(1,2)^2;
    fC4 = c(2,1)^2 + c(2,2)^2;
    f = [fC3 fC4];
    f_full(i,:) = f;
end

% Build the C3 and C4 channel matrices for the eight test trials
clear c3 c4;
c3(:,1)=Leftbacktest1(:,5);  c3(:,2)=Leftfwdtest1(:,5);
c3(:,3)=Leftbacktest2(:,5);  c3(:,4)=Leftfwdtest2(:,5);
c3(:,5)=Rightbacktest1(:,5); c3(:,6)=Rightfwdtest1(1:7040,5);
c3(:,7)=Rightbacktest2(:,5); c3(:,8)=Rightfwdtest2(1:7040,5);

c4(:,1)=Leftbacktest1(:,6);  c4(:,2)=Leftfwdtest1(:,6);
c4(:,3)=Leftbacktest2(:,6);  c4(:,4)=Leftfwdtest2(:,6);
c4(:,5)=Rightbacktest1(:,6); c4(:,6)=Rightfwdtest1(1:7040,6);
c4(:,7)=Rightbacktest2(:,6); c4(:,8)=Rightfwdtest2(1:7040,6);

% Test feature matrix
for i=1:8
    temp = [c3(:,i), c4(:,i)];
    [c, d] = jadetd(temp');
    c = inv(c);
    fC3 = c(1,1)^2 + c(1,2)^2;
    fC4 = c(2,1)^2 + c(2,2)^2;
    f = [fC3 fC4];
    f_test(i,:) = f;
end

% Class labels of the training trials
y = [1 1 2 2 1 1 2 1]';

10.2.2 SampleSet2_JADE.m

load SampleSet2

% Training feature matrix
for i=1:140
    c3(:,1) = x_train(:,1,i);
    c4(:,1) = x_train(:,3,i);
    temp = [c3, c4];
    [c, d] = jadetd(temp');
    c = inv(c);
    fC3 = c(1,1)^2 + c(1,2)^2;
    fC4 = c(2,1)^2 + c(2,2)^2;
    f = [fC3 fC4];
    f_full(i,:) = f;
end

% Testing feature matrix
for i=1:140
    c3(:,1) = x_test(:,1,i);
    c4(:,1) = x_test(:,3,i);
    temp = [c3, c4];
    [c, d] = jadetd(temp');
    c = inv(c);
    fC3 = c(1,1)^2 + c(1,2)^2;
    fC4 = c(2,1)^2 + c(2,2)^2;
    f = [fC3 fC4];
    f_test(i,:) = f;
end


10.2.3 sample_testing.fig

The following image shows the description of the GUI with tag names of useful elements:


10.2.4 sample_testing.m

function popupmenu1_Callback(hObject, eventdata, handles) SampleSet1_JADE

op=get(handles.popupmenu1,'Value'); disp = strcat( 'Sample No: ', int2str(op) ); class=classify(f_test(op,:),f_full,y,'mahalanobis'); set(handles.ans_str,'String',disp);

disp_op=y_test(op);if(class == disp_op)if(class==1) subplot(1,1,1,'Parent',handles.panel1); imshow('image_right_green.jpeg'); subplot(1,1,1,'Parent',handles.panel2); imshow('image_right_green.jpeg');else subplot(1,1,1,'Parent',handles.panel1); imshow('image_left_green.jpeg'); subplot(1,1,1,'Parent',handles.panel2); imshow('image_left_green.jpeg');endelseif(class==1) subplot(1,1,1,'Parent',handles.panel1); imshow('image_right_red.jpg'); subplot(1,1,1,'Parent',handles.panel2); imshow('image_left_red.jpg');else subplot(1,1,1,'Parent',handles.panel1); imshow('image_left_red.jpg'); subplot(1,1,1,'Parent',handles.panel2); imshow('image_right_red.jpg');endend

function pushbutton1_Callback(hObject, eventdata, handles) SampleSet1_JADE p=randperm(8);for i=1:8 op=p(i); disp = strcat( 'Sample No: ', int2str(op) ); class=classify(f_test(op,:),f_full,y,'mahalanobis'); set(handles.ans_str,'String',disp);

disp_op=y_test(op);if (class == disp_op)if (class==1) subplot(1,1,1,'Parent',handles.panel1); imshow('image_right_green.jpeg'); subplot(1,1,1,'Parent',handles.panel2); imshow('image_right_green.jpeg');else subplot(1,1,1,'Parent',handles.panel1); imshow('image_left_green.jpeg'); subplot(1,1,1,'Parent',handles.panel2); imshow('image_left_green.jpeg');end


elseif (class==1) subplot(1,1,1,'Parent',handles.panel1); imshow('image_right_red.jpg'); subplot(1,1,1,'Parent',handles.panel2); imshow('image_left_red.jpg');else subplot(1,1,1,'Parent',handles.panel1); imshow('image_left_red.jpg'); subplot(1,1,1,'Parent',handles.panel2); imshow('image_right_red.jpg');endend pause(2);end

function pushbutton2_Callback(hObject, eventdata, handles) SampleSet2_JADE p=randperm(140);for i=1:140 op=p(i); disp = strcat( 'Sample No: ', int2str(op) ); class=classify(f_test(op,:),f_full,y_train,'mahalanobis'); set(handles.text17,'String',disp);

disp_op=y_test(op);if(class == disp_op)if(class==1) subplot(1,1,1,'Parent',handles.panel3); imshow('image_right_green.jpeg'); subplot(1,1,1,'Parent',handles.panel4); imshow('image_right_green.jpeg');else subplot(1,1,1,'Parent',handles.panel3); imshow('image_left_green.jpeg'); subplot(1,1,1,'Parent',handles.panel4); imshow('image_left_green.jpeg');endelseif(class==1) subplot(1,1,1,'Parent',handles.panel3); imshow('image_right_red.jpg'); subplot(1,1,1,'Parent',handles.panel4); imshow('image_left_red.jpg');else subplot(1,1,1,'Parent',handles.panel3); imshow('image_left_red.jpg'); subplot(1,1,1,'Parent',handles.panel4); imshow('image_right_red.jpg');endend pause(2);end

function popupmenu2_Callback(hObject, eventdata, handles) SampleSet2_JADE op=get(handles.popupmenu2,'Value'); disp = strcat( 'Sample No: ', int2str(op) ); class=classify(f_test(op,:),f_full,y_train,'mahalanobis'); set(handles.text17,'String',disp);


disp_op=y_test(op);if(class == disp_op)if(class==1) subplot(1,1,1,'Parent',handles.panel3); imshow('image_right_green.jpeg'); subplot(1,1,1,'Parent',handles.panel4); imshow('image_right_green.jpeg');else subplot(1,1,1,'Parent',handles.panel3); imshow('image_left_green.jpeg'); subplot(1,1,1,'Parent',handles.panel4); imshow('image_left_green.jpeg');endelseif(class==1) subplot(1,1,1,'Parent',handles.panel3); imshow('image_right_red.jpg'); subplot(1,1,1,'Parent',handles.panel4); imshow('image_left_red.jpg');else subplot(1,1,1,'Parent',handles.panel3); imshow('image_left_red.jpg'); subplot(1,1,1,'Parent',handles.panel4); imshow('image_right_red.jpg');endend

function pushbutton3_Callback(hObject, eventdata, handles) SampleSet1_JADEfor i=1:8if (y(i)==2) y(i)=0;endend p=randperm(8);

svmStruct = svmtrain(f_full,y,'Kernel_Function','polynomial','polyorder',5);

for i=1:8 op=p(i); disp = strcat( 'Sample No: ', int2str(op) ); class = svmclassify(svmStruct,f_test(op,:)); set(handles.text22,'String',disp);

disp_op=y_test(op);if (disp_op==2) disp_op=0;endif(class == disp_op)if(class==1) subplot(1,1,1,'Parent',handles.panel5); imshow('image_right_green.jpeg'); subplot(1,1,1,'Parent',handles.panel6); imshow('image_right_green.jpeg');else subplot(1,1,1,'Parent',handles.panel5); imshow('image_left_green.jpeg'); subplot(1,1,1,'Parent',handles.panel6);


imshow('image_left_green.jpeg');endelseif(class==1) subplot(1,1,1,'Parent',handles.panel5); imshow('image_right_red.jpg'); subplot(1,1,1,'Parent',handles.panel6); imshow('image_left_red.jpg');else subplot(1,1,1,'Parent',handles.panel5); imshow('image_left_red.jpg'); subplot(1,1,1,'Parent',handles.panel6); imshow('image_right_red.jpg');endend pause(2);end

function popupmenu3_Callback(hObject, eventdata, handles) SampleSet1_JADEfor i=1:8if (y(i)==2) y(i)=0;endend op=get(handles.popupmenu3,'Value') ; disp = strcat( 'Sample No: ', int2str(op) ); svmStruct = svmtrain(f_full,y,'Kernel_Function','polynomial','polyorder',5); class = svmclassify(svmStruct,f_test(op,:)); set(handles.text22,'String',disp);

disp_op=y_test(op);if (disp_op==2) disp_op=0;endif(class == disp_op)if(class==1) subplot(1,1,1,'Parent',handles.panel5); imshow('image_right_green.jpeg'); subplot(1,1,1,'Parent',handles.panel6); imshow('image_right_green.jpeg');else subplot(1,1,1,'Parent',handles.panel5); imshow('image_left_green.jpeg'); subplot(1,1,1,'Parent',handles.panel6); imshow('image_left_green.jpeg');endelseif(class==1) subplot(1,1,1,'Parent',handles.panel5); imshow('image_right_red.jpg'); subplot(1,1,1,'Parent',handles.panel6); imshow('image_left_red.jpg');else subplot(1,1,1,'Parent',handles.panel5); imshow('image_left_red.jpg'); subplot(1,1,1,'Parent',handles.panel6); imshow('image_right_red.jpg');endend


function pushbutton4_Callback(hObject, eventdata, handles) SampleSet2_JADE p=randperm(140);for i=1:140if (y_train(i)==2) y_train(i)=0;endend

svmStruct = svmtrain(f_full,y_train,'Kernel_Function','polynomial','polyorder',5);

for i=1:140 op=p(i); disp = strcat( 'Sample No:', int2str(op) ); class = svmclassify(svmStruct,f_test(op,:)); set(handles.text27,'String',disp);

disp_op=y_test(op);if (disp_op==2) disp_op=0;endif(class == disp_op)if(class==1) subplot(1,1,1,'Parent',handles.panel7); imshow('image_right_green.jpeg'); subplot(1,1,1,'Parent',handles.panel8); imshow('image_right_green.jpeg');else subplot(1,1,1,'Parent',handles.panel7); imshow('image_left_green.jpeg'); subplot(1,1,1,'Parent',handles.panel8); imshow('image_left_green.jpeg');endelseif(class==1) subplot(1,1,1,'Parent',handles.panel7); imshow('image_right_red.jpg'); subplot(1,1,1,'Parent',handles.panel8); imshow('image_left_red.jpg');else subplot(1,1,1,'Parent',handles.panel7); imshow('image_left_red.jpg'); subplot(1,1,1,'Parent',handles.panel8); imshow('image_right_red.jpg');endend pause(2);end

function popupmenu4_Callback(hObject, eventdata, handles) SampleSet2_JADEfor i=1:140if (y_train(i)==2) y_train(i)=0;endend op=get(handles.popupmenu4,'Value'); disp = strcat( 'Sample No:', int2str(op) );


svmStruct = svmtrain(f_full,y_train,'Kernel_Function','polynomial','polyorder',5); class = svmclassify(svmStruct,f_test(op,:)); set(handles.text27,'String',disp);

disp_op=y_test(op);if (disp_op==2) disp_op=0;endif(class == disp_op)if(class==1) subplot(1,1,1,'Parent',handles.panel7); imshow('image_right_green.jpeg'); subplot(1,1,1,'Parent',handles.panel8); imshow('image_right_green.jpeg');else subplot(1,1,1,'Parent',handles.panel7); imshow('image_left_green.jpeg'); subplot(1,1,1,'Parent',handles.panel8); imshow('image_left_green.jpeg');endelseif(class==1) subplot(1,1,1,'Parent',handles.panel7); imshow('image_right_red.jpg'); subplot(1,1,1,'Parent',handles.panel8); imshow('image_left_red.jpg');else subplot(1,1,1,'Parent',handles.panel7); imshow('image_left_red.jpg'); subplot(1,1,1,'Parent',handles.panel8); imshow('image_right_red.jpg');endend


11.References

[1]. Jianfeng hu, Dan Xiao, Zhendung Mu ,”Application of Energy Entropy in Motor

EEG Classification “ International Journal of Digital Content Technology and its

Applications, 2009, vol. 3, pp 4-7.

[2]. Xiaojing Guo, Xiaopie Wu ,”Motor Imagery EEG Classification using Dynamic Mixing

Matrix” Bioinformatics and Biomedical Engineering (iCBBE), 2010 4th International

Conference , 2010, pp 1-4.

[3]. G.Pfurtscheller, C.Neuper, D.Flotzinger,M.Pregenzer,”EEG-based discrimination

between imagination of right and left hand movement,” Electroencephalogr Clin

Neurophysiol , 1997,103, pp 642-651.

[4]. G.Pfurtscheller, C. Neuper, “Motor imagery and direct brain-computer communication,”

Proc IEEE 2001, pp. 1123-1134.

[5]. G.Pfurtscheller, C.Brunner,A.Schlogl, F.H.L. daSilva, “Mu Rhythm (de)synchronization

and EEG single-trial classification of different motor imagery tasks ,” Neuroimage,

2006, vol.31, pp.153-159.

[6]. Taigang He, Gari Clifford, Lionel Tarassenko, “Application of ICA in removal of

artefacts from ECG “,Neural computing and Applications 2002, Mit.edu,pp.3-12.

[7]. Jaimie F. Borisoff, Steve G. Mason, Ali Bashashati, and Gary E. Birch, “Brain–

Computer Interface Design for Asynchronous Control Applications: Improvements to

the LF-ASD Asynchronous Brain Switch” Ieee transactions on biomedical engineering,

vol. 51, no. 6, june 2004 pp.985-992.


[8]. Aapo Hyvärinen and Erkki Oja, "Independent Component Analysis: A Tutorial" (also published in Neural Networks under the title "Independent Component Analysis: Algorithms and Applications"), pp. 4-17.

[9]. Sri, K.S.;   Rajapakse, J.C.,” Extracting EEG rhythms using ICA-R”, IEEE international

joint conference on Neural Networks, 2008, pp-4-8.

[10]. Nello Cristianini and John Shawe-Taylor, “An Introduction to Support Vector Machines

and other kernel-based learning methods”, Cambridge University Press, 2000. ISBN 0-

521-78019-5,pp 6-19.

[11].Kristin P. Bennett and Colin Campbell, "Support Vector Machines: Hype or

Hallelujah?", SIGKDD Explorations, 2,2, 2000, pp-1–13.

[12]. Muller K.-R, Philips, P. and Ziehe, A., ''JADEtd: Combining Higher-Order statistics and

temporal information for blind source separation (with noise), in Proc. Int. Workshop on

Independent Component Analysis and Blind Separation of Signals (ICA '99), Aussois,

1999.

[13]. Alan Oursland, Judah De Paula, Nasim Mahmood,”Case Studies of Independent

Component Analysis”,For CS383C – Numerical Analysis of Linear Algebra.

[14]. Pierre Comon,” Independent Component Analysis: a new concept?”, Signal Processing,

Elsevier, 36(3): pp-287--314 (The original paper describing the concept of ICA) .

[15]. Schmidt, EM; McIntosh, JS; Durelli, L; Bak, MJ (1978). "Fine control of operantly

conditioned firing patterns of cortical neurons.". Experimental neurology 61 (2):

pp- 349–69.
