Computers & Security, Volume 25, Issue 3, May 2006


Computers & Security 25 (2006) 155

From the Editor-in-Chief

Special systems: Overlooked sources of security risk?

After the events of September 11, 2001, increasing attention is being paid to security in special types of automated systems: systems that provide physical access control, SCADA systems, plant process control systems, and so on. The consequences of

any security breach in these types of systems are, after all,

potentially catastrophic. Consider, for example, the conse-

quences of a compromise of a physical access control system

in a nuclear power plant. If a saboteur were able to gain unautho-

rized physical access, untold damage and loss of life could easily

occur.

Information security professionals and auditors generally

focus on the security risk associated with these types of sys-

tems, as they rightfully should. At the same time, however, I

sometimes wonder if their risk analyses for such systems

are sufficiently complete. These types of systems for the

most part were originally developed for and deployed in non-

networked environments. Risk analyses in past decades thus

did not need to consider the risks accruing from network connectivity; the only plausible attack scenarios involved perpetrators who were able to gain physical access to the systems.

Today, however, the situation has changed considerably in

that nearly all such systems are connected to some type of

network, potentially exposing these systems to all kinds of re-

mote attacks. Worse yet, many of these systems are not con-

nected to some kind of air-gapped network that insulates

internal traffic from the outside world and vice versa. Many

are instead Internet connected, resulting in far greater levels

of security-related risk than were previously ever envisioned.

It has been my experience that security and other pro-

fessionals are generally not oblivious to the perils of special

systems being connected to the outside world. Yet at the

same time, I have learned of incidents in which these systems

have been accessed over the Internet without authorization,

resulting in highly negative outcomes. In one case a remote

perpetrator broke into a system that controlled lighting levels

in a building; the perpetrator had a heyday changing the light-

ing levels back and forth until one of the administrators of this

system finally determined what the problem was and cut off

the perpetrator’s access. Needless to say, neither the owner

nor the administrator of this system had anticipated that

this kind of thing could happen.

There is, however, another facet of the risk associated with

SCADA, process control, physical security and other systems

that is lamentably almost universally overlooked: the relationship of these special systems to security risk associated

with other networked systems and devices. It is almost as if

imagined security breach scenarios end, i.e., the ‘‘game is

over,’’ so to speak, if a perpetrator breaks into one or more

of these systems, yet ‘‘the game will only have begun’’ in

many cases. Perpetrators could easily use systems such as

process control systems that they have compromised to

launch vulnerability scans, perpetrate denial of service at-

tacks, intrude into other networked systems, steal valuable

and sensitive data from databases and files, and so on.

The ‘‘bottom line’’ is that risk analysis performed on spe-

cial systems must take into account not only the risk associ-

ated with the outcomes of these systems themselves

becoming compromised, but also the potential risk of these

systems being used against other networked systems and de-

vices. To do anything less is to perform an incomplete risk

analysis. At the same time, however, no one would rightfully

expect that the few paragraphs in this editorial would per-

suade the majority of security professionals and auditors to

expand their view and focus when they assess risks associ-

ated with special systems. ‘‘War stories,’’ case studies of

real-life incidents in which these systems have been used

without authorization to launch attacks against other systems

and network devices, will in contrast have a much greater

effect. I thus invite and encourage readers to submit papers

that describe these kinds of war stories (without attribution

or references to the organizations in which they have

occurred, of course) to Computers & Security.

E. Eugene Schultz Ph.D., CISSP, CISM

Editor-in-Chief

E-mail address: [email protected]

4 March 2006

0167-4048/$ – see front matter

© 2006 Elsevier Ltd. All rights reserved.

doi:10.1016/j.cose.2006.03.003


Computers & Security 25 (2006) 156–162

Security views

1. Malware update

A Visual Basic worm with multiple names, Blackmal.e,

MyWife.d, and others, is infecting Windows systems. It arrives

in the form of an attachment that can be an executable file or

a Multipurpose Internet Mail Extensions (MIME) file with an

embedded executable file. It replicates via shared folders and

tries to stop a number of different security-related programs.

The CME-24 worm (also called the Blackmal and Nyxem

worm) is destructive; it is programed to destroy files on infected

Windows systems on February 3, 2006. This worm causes sys-

tems that it infects to go to an on-line counter Web site. Easy-

net, a UK-based Internet service provider (ISP), is monitoring

the counter traffic to determine whether any of its customers’

systems are infected and is warning customers accordingly.

A link to proof-of-concept exploit code for Mac OS X has been posted on the Internet. The code masquerades as screenshots of Mac OS X Leopard version 10.5; it sends itself to other

systems via the iChat instant messaging system. Fortunately,

this code does not have a malicious payload.

The Mare-D worm capitalizes on vulnerabilities in XML-RPC

for PHP and Mambo to infect Linux systems. This worm can also

install an IRC-controlled backdoor on systems that it infects.

Mare-D has been deemed to not pose much risk, however.

A newly detected proof-of-concept worm exploits a vulner-

ability in Apple’s Bluetooth implementation. Apple released

a patch for this vulnerability in the middle of last year. The

worm replicates by searching for other Bluetooth-enabled

devices and then sends a copy of itself to devices that it finds.

Two of the above news items were about vulnerabilities in

Mac OS X. For years people have paid little attention to security

in Mac OS X systems, for the most part I suspect because few perpetrators have been interested in creating exploit code for these

systems. The trend appears to be changing – an increasing num-

ber of exploits are being produced and break-ins into Mac OS X

systems are becoming increasingly commonplace. The big ques-

tion is whether the Mac user community, which has so far been

relatively complacent about security issues, will start to become

interested in Mac OS X security and more importantly do what is

needed to harden these systems sufficiently to make them less

vulnerable to attack.

2. Update in the war against cybercrime

Ang Chiong Teck, a student at Nanyang Technological Univer-

sity in Singapore, has received a jail sentence of four months

for selling illegal copies of Microsoft software. Purchasers of

this software had complained that they did not get the codes

necessary for on-line registration and downloading updates.

Additionally, the illegal copies even had counterfeited certifi-

cates of authenticity. When law enforcement investigated,

Teck quickly became a suspect. He had SGD 20,000 of counterfeit software in his possession when he was arrested. His

sentencing was delayed to allow him to take his university

examinations first.

Spanish law enforcement has arrested a man on charges

that he broke into a computer at the US Navy base in

San Diego, California. His house was searched, resulting in

seizure of a computer and other potential evidence. He is

suspected of belonging to a ring that may have broken into

more than 100 computers, causing a financial loss of more

than USD 500,000.

Daniel Lin, one of four people charged for allegedly using

compromised computers to send volumes of spam, is

expected to plead guilty to the charges during his forthcoming

court appearance. Having admitted to using government and

commercial networks to send spam, Lin reached a deal with

federal prosecutors that will result in his receiving a prison

sentence of between 24 and 57 months. Without the deal Lin

would have gotten a much longer sentence. The accused indi-

viduals allegedly used proxies with bogus return paths to

transmit spam, a violation of the CAN-SPAM Act.

A federal grand jury has indicted Joseph Nathaniel Harris,

the former office manager of the San Jose Medical Group in

California, on multiple charges that resulted from stealing

computers and DVDs on which patient records were stored.

He allegedly broke into the medical group’s office after he

quit his job. Harris could receive a prison sentence of up to

10 years and a fine of up to USD 250,000.

The well-known Web site of UK student Alex Tew, Million-

DollarHomepage.com, has been hit with a barrage of denial-

of-service (DoS) attacks launched by cyber extortionists. Tew

set up this site to pay for his education by selling individual

pixels to advertisers. Having earned more than USD 1 million

so far, Tew recently revealed in his blog that people have

demanded ‘‘a substantial amount of money’’ in return for

leaving his site alone. Law enforcement says that Russians

are responsible for the attacks.

Eight Bulgarians have been arrested on charges that they

conducted a phishing scheme. The group allegedly ran nu-

merous bogus Microsoft Web sites used in connection with

phishing email that was sent with falsified addresses to look


as if it had been sent by Microsoft account billing. Recipients

were urged to enter credit card information; perpetrators

allegedly used any of this information that they obtained to

purchase goods and to make wire transfers.

Jeanson Ancheta has pleaded guilty in Los Angeles federal

court to charges related to establishing a botnet consisting of

hundreds of thousands of compromised computers. He alleg-

edly offered to sell the botnet’s capabilities to enable others to

send spam and launch distributed denial-of-service (DDoS) at-

tacks. According to the terms of a plea bargain, Ancheta would

agree to serve four–six years in prison, repay USD 58,000, for-

feit a BMW, and pay USD 19,000 of restitution. The judge in the

case has not, however, approved the plea bargain. Ancheta

will be sentenced on May 1.

A federal grand jury in Seattle has named Christopher

Maxwell in an indictment related to his allegedly illegally cre-

ating and using a botnet. Maxwell and two alleged partners

reportedly made USD 100,000 by using the botnet to install

adware. Maxwell and the other two were additionally indicted

on charges of perpetrating a botnet attack on a hospital in the

Seattle area that rendered doctors’ pagers inoperable and

closed down the hospital’s intensive care unit. If convicted

of the charges that he faces, Maxwell could receive up to

a 10-year prison term and a fine of USD 250,000.

The British High Court has ordered 10 Internet service pro-

viders (ISPs) to reveal the identities of 150 individuals who may

have engaged in illegal file sharing. These ISPs must hand over

names, addresses and other personal information about the al-

leged illicit file sharers to the Federation Against Software Theft.

The British High Court has also directed two UK men to pay a total of GBP 6500 for making almost 9000 songs available via peer-to-peer sharing. Cases against three other individuals are also pending. The British Phonographic Industry (BPI) has initiated these

cases; defendants must also pay the BPI’s court costs, which so

far have amounted to more than GBP 20,000.

The US Attorney’s Office in Chicago has obtained

indictments against 19 individuals who are allegedly part of

an international group involved in piracy. US law enforcement

is pressing for extradition of two alleged members of this

group. Prosecutors say that this group illegally copied and dis-

tributed software, games and movies valued at more than

USD 6.5 million, even though currently it appears that per-

sonal usage, not financial gain, may have been the primary

motivation for the group’s activities. The alleged members

of this group will be charged with conspiracy to infringe on

copyrights. If they are convicted, they may have to serve up

to five years of imprisonment, pay a fine of up to USD

250,000, and pay restitution.

Canada’s biggest record company, Nettwerk Music Group,

has announced that it will pay to defend David Gruebel,

whom the Recording Industry Association of America (RIAA)

has sued for allegedly possessing illegally downloaded music. The chief executive of Nettwerk, Terry McBride,

said: ‘‘The current actions of the RIAA are not in my artists’

best interests. Litigation is a deterrent to creativity … and it

is hurting the business I love.’’ After hiring a Chicago law

firm to defend Gruebel, Nettwerk asserts that it will cover any fines should the case be lost.

The RIAA wants a USD 9000 fine in this case, but will agree to

USD 4500 if the fine is paid before the ruling date.

A new phishing ploy has surfaced in the US. A message that

purports to be from the Internal Revenue Service (IRS) includes

a URL for a Web site that appears to inform users of the status of

their taxpayer refunds. Users are tricked into revealing their

names, Social Security Numbers (SSNs), and credit card data.

Israeli natives Ruth and Michael Haephrati have both been

extradited from the UK to Israel on charges that they created

and distributed a spyware tool designed to pilfer competitors’

information. The couple reportedly obtained GBP 2000 for

each copy of this tool. Twenty other individuals in Israel and

the UK have been arrested for involvement in the Haephratis’ alleged activities.

Christopher William Smith has been ordered to pay Amer-

ica Online (AOL) more than USD 5 million for damages caused

and legal fees incurred as the result of his sending out vol-

umes of spam. AOL originally sued Smith in 2004 under provi-

sions of the CAN-SPAM Act. Smith must also face drug

violation-based charges.

William Genovese Jr. received a sentence of two years of

imprisonment for selling source code for two Windows oper-

ating systems after he pleaded guilty last year to one count

of illegal distribution of trade secrets. He has previously

been convicted of criminal charges 12 times; three of these convictions were for computer crime-related activities. After

he serves his sentence, he must complete three years of supervised release, during which he must run programs on his computer that continuously monitor his Internet activity.

Japanese law enforcement has arrested Atsushi Takewaka

on charges that he and an alleged accomplice, Kiichi

Hirayama, created spyware that they deployed to steal pass-

words used for Internet banking. Hirayama allegedly per-

suaded Takewaka to create the spyware; Hirayama allegedly

sent CD-ROMs containing the spyware to certain corpora-

tions. Some of the recipients installed the spyware on their

computing systems. The two individuals then allegedly used

passwords that they pilfered to steal money from bank

accounts.

The US Attorney’s Office in Los Angeles has

arrested Jeffrey Brett Goodin on the grounds that he allegedly

perpetrated a phishing scheme designed to fool America

Online (AOL) users into disclosing their credit card informa-

tion. Bogus email messages in connection with this scheme

informed AOL users that they needed to update their billing information and that they had to go to a certain Web site to do

so. The site was then used to glean financial information

that unsuspecting users entered. The stolen information

was used to run up fraudulent charges. The charges against

Goodin include wire fraud and illegal use of an access device.

If he is convicted, he could get a prison sentence of up to 30

years.

A Spanish man who used a computer worm three years ago

to perpetrate a denial-of-service attack has received a sen-

tence of two years in jail and a fine of EUR 1.4 million. Santiago

Garrido launched the attack, which interrupted Internet

service for millions of Spaniards. His motivation was

revenge – he had previously been banned from using an IRC

chat room.

Russian computer criminals may be using Trojan horse

programs in an effort to steal money from French bank ac-

counts. More than EUR 1 million has reportedly already been


stolen. The programs reportedly come embedded in email

messages and are also downloaded from malicious Web sites.

Remaining dormant until users access their on-line banking

accounts, the programs supposedly glean passwords and

other financial data and send them to the perpetrators. The

perpetrators are also allegedly working in connection with

people who allow stolen funds to go through their accounts

in return for a fee of up to 10% of the proceeds.

Honeywell International has lodged a civil complaint

against a former employee, Howard Nugent, who it claims dis-

closed information about 19,000 company employees on the

Internet. A US District Court judge in Arizona has ordered

him to not divulge the Honeywell information. Papers filed

in the court case indicate that Honeywell’s computers were

not broken into – Nugent allegedly instead exceeded the level

of access allowed to him.

At a hearing last month, UK district judge Nicholas Evers

decided to deny the US’ request to have UK citizen Gary

McKinnon extradited unless the US promises that McKinnon

will not be considered a terrorist. McKinnon allegedly in-

truded into US Department of Defense (DOD) and NASA com-

puters. The judge said that he worries that in the US terrorist

suspects can be tried under military law.

Brazilian federal police have arrested 41 individuals who

allegedly used a Trojan horse program to pilfer BRL 10 million

from 200 accounts at six banks. The suspects allegedly sent

the program in email messages. Twenty-four additional

suspects are still wanted.

Stephen Sussich of Australia has been fined AUD 2000 and

must pay AUD 3000 in restitution because he installed a rootkit

on a server owned by Webcentral, an Australian company.

Sussich pleaded guilty to two counts of unauthorized data

modification to cause harm. Sussich’s motivation does not

appear to be financial; apparently he did not even access

credit card data.

Luis Ochoa of California has been arrested for allegedly

uploading an Academy Award-nominated movie to the Inter-

net. Law enforcement set up a sting operation after someone

informed the Motion Picture Association of America (MPAA)

that he had revealed in a chat room that he was going to up-

load the movie. The movie had a watermark that indicated it

was a ‘‘screener’’ copy meant for viewing only by individuals

with Academy voting privileges. If convicted of all charges

that he faces, he could be sentenced to one year of imprison-

ment and a fine.

Former CA chief executive officer Sanjay Kumar faces

charges resulting from his allegedly deleting information

from his laptop’s hard drive. The information could conceiv-

ably have been used as evidence in the accounting debacle

that led to Kumar’s exit from CA. The US District Court in East-

ern New York has stated that it intends to submit evidence

that Kumar reformatted his laptop to run the Linux operating

system, thereby destroying the contents of the laptop’s hard

drive. Kumar’s action allegedly occurred after the government

investigation had started and after a memorandum ordered

CA employees to preserve all pertinent data. Kumar was

indicted after a government investigation into dubious

accounting practices at CA.

Scott Levine of Florida, former CEO of Snipermail.com,

a bulk email company, has received a sentence of eight years

of imprisonment for intruding into Acxiom Corporation’s

database of consumer data and then pilfering more than one

billion records. He was convicted of 120 counts of illegal access

to a computer connected to the Internet, two counts of device

fraud, and one count of obstruction of justice. There is no evidence that Levine used the data to perpetrate identity fraud.

Levine must also pay a fine of USD 12,300 as well as restitution;

the exact amount of restitution has not yet been determined,

however.

Police raids in Switzerland and Belgium have closed down

Razorback2, one of the largest index servers within the

eDonkey file sharing network. According to the RIAA, these

servers held an index of approximately 170 million illegally

copied files. The server’s owner has been arrested and the

equipment has been seized.

The length of the ‘‘War against Cybercrime’’ portion of

Security Views seems to continually grow; more news items

concerning investigations, arrests and sentences for com-

puter crime are being covered. Law enforcement and the legal

system in an increasing number of countries appear to be coming up to speed when it comes to dealing with computer

crime. At the same time, however, computer crime perpetra-

tors seem to constantly create new ways of doing their evil

deeds. Additionally, a rapidly increasing number of computer

criminals seem to be surfacing for various reasons, one of the

chief of which is the promise of a great amount of financial

gain generally without all that much risk to them. So while

it is good to see some progress in the effort against computer

crime being made, computer crime is, unfortunately, inevita-

bly going to become more prevalent.

3. More compromises of personal and financial information occur

An Ameriprise Financial employee’s laptop that contained

customer information was stolen out of a car. Ameriprise

Financial has sent letters to 158,000 customers informing

them accordingly. No customer SSNs were stored on the lap-

top, but unfortunately a file that contained the names and

SSNs of 68,000 current and former financial advisers was.

Providence Home Services is notifying 265,000 current and

former patients that their medical information fell into unau-

thorized hands when disks and tapes containing this informa-

tion were pilfered from the car of an employee. Information

about numerous current and former employees was also on

the stolen disks and tapes. There is no evidence that the stolen information has been used for identity fraud. Having

employees take home disks and tapes is a standard business

continuity-related procedure for Providence Home Services.

Providence Home Services has set up a hotline to answer inqui-

ries from those whose information was compromised.

A perpetrator gained unauthorized access to a computer at

the University of Delaware’s School of Urban Affairs and Pub-

lic Policy; SSNs of 159 graduate students were stored on that

system. Additionally, someone pilfered a backup hard drive

from the University’s Department of Entomology and Wildlife

Ecology; the hard drive contained personal data. The univer-

sity has notified all affected individuals.


The University of Notre Dame is looking into a break-in

into a server on which confidential data about financial

donors is stored. The attack was discovered early this year.

The compromised server, which was taken offline, was not

connected to central databases at the university; it is being

forensically analyzed. The school has informed individuals

whose information was compromised.

The University of Northern Iowa has sent letters to 6000

staff members telling them that information about them in

Internal Revenue Service W-2 forms was potentially exposed when a laptop’s security was compromised. University

administrators say that there is no evidence that any of the

information was actually accessed, however. Staff members

were encouraged to closely watch their financial accounts in

case of identity theft attempts.

Canterbury University in New Zealand has terminated all

on-line access to student records after learning that students

were able to see other students’ records while they enrolled

on-line. The University is trying to determine the source of

the problem.

Perpetrators reportedly broke into the Rhode Island

government Web site, www.RI.gov, and stole credit card data

belonging to individuals who had engaged in on-line business

with Rhode Island state agencies. The credit card data were

encrypted. Several individuals boasted of these deeds on

a Russian-language Web site over a month ago. A spokes-

person for the Rhode Island Web site said that security for the

site complies with the Payment Card Industry Data Security Standard. Technical staff members have patched the vulnerability that the attackers exploited.

Credit card and bank routing information of nearly a quarter million Boston Globe and Worcester Telegram &

Gazette subscribers was leaked when internal reports con-

taining this information were recycled for use as routing

receipts for bundles of newspapers. When it found out what

had happened, the Globe sent delivery staff to gather the rout-

ing receipts, but was successful in retrieving only a fraction of

them. The Globe has advised credit card companies and finan-

cial institutions concerning the incident and intends to send

notification letters to its subscribers. This company has also

set up a hotline to enable customers to learn whether or not

their financial information was leaked.

A Boston investment bank has reported that it has been receiving faxes from Brigham and Women’s Hospital containing patient medical information (SSNs, medical test results, and more) about women who had recently given birth at this facility. The bank’s finance manager has been destroying every fax and has notified the hospital several times, but until recently to no avail; the faxes kept coming for six months. The hospital plans to inform the patients whose data were exposed.

The FBI is looking into unauthorized changes in a MySQL

database on which an electronic medical record system at

an orthopedics clinic is based. Orthopedics Northeast (ONE),

which is based in Indiana, stumbled onto the problem when

severe performance degradation occurred three months ago.

Technical staff concluded that the changes were apparently

made by someone who had gained unauthorized access to

the system. The original path of entry was through a virtual

private network (VPN); the attacker reached a proxy server

and then exploited a backdoor in WebChart software made

by Medical Informatics Engineering (MIE). The attacker

appended characters to a database query, causing the server

to crash, and also erased a print server directory.
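Appending characters to a query until the statement breaks, as described in the incident above, is the classic failure mode of SQL assembled from untrusted input. A minimal sketch of the mechanism, using a hypothetical table and input (not the clinic’s actual system), in Python with SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE charts (patient TEXT, note TEXT)")
conn.execute("INSERT INTO charts VALUES ('alice', 'post-op check')")

tampered = "alice'"  # input with a single appended quote character

# Naive string concatenation: the stray quote leaves the SQL
# malformed, and the statement fails outright.
query_broke = False
try:
    conn.execute("SELECT note FROM charts WHERE patient = '" + tampered + "'")
except sqlite3.OperationalError:
    query_broke = True

# A parameterized query treats the same bytes as plain data:
# no crash, simply no matching row.
rows = conn.execute(
    "SELECT note FROM charts WHERE patient = ?", (tampered,)
).fetchall()
```

Parameterized queries, as in the second statement, are the standard defense: the database driver passes the input as data rather than splicing it into the SQL text.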

Security breaches at a Wal-Mart store and at an OfficeMax

store in California prompted the Bank of America (BoA), Wash-

ington Mutual Bank, and a credit union to void about 200,000

debit cards. The BoA sent letters to potentially affected cus-

tomers; the letters explained that their debit cards were can-

celed and urged them to be on the alert for any unauthorized

transactions. Fortunately, so far no indications of customer ac-

count compromises or identity theft have surfaced. Neither

store has commented on the incidents. The FBI and the Secret

Service have launched an investigation.

John Lynch, Governor of New Hampshire, announced that

one of the state’s computers was accessed without authoriza-

tion. The perpetrators may have tried to obtain credit card

account information about New Hampshire residents. At

risk is information concerning computer and in-person trans-

actions at state liquor stores, motor vehicle offices, and other

places. Individuals who have used credit cards to buy some-

thing from one of these places during the last six months

were advised to watch for fraudulent transactions. The inci-

dent was discovered by technical staff, who found a Trojan

horse monitoring program running on the system.

A McAfee spokesperson reported that Deloitte & Touche,

the company’s external auditing firm, lost a CD that contained

the names, SSNs and McAfee stock holdings of a large number

of current and prior McAfee employees. Acknowledging that

an employee left the unlabelled CD in the seat back pocket

on an airplane, Deloitte & Touche informed McAfee about

the lost disk early this year. All potentially affected employees

have been informed accordingly.

So many compromises and potential compromises of

personal and financial data have occurred that I am not sure

exactly where to start in commenting on this last round of

lamentable incidents. I have a very difficult time understand-

ing why some organizations are so deficient in their security

and other practices. Consider, for example, all the organiza-

tions that allow unencrypted personal and financial data to

be stored on employee laptops or CDs that employees carry

around with them. Providence Home Services’ business conti-

nuity practices – having employees take backup disks and

tapes home with them – certainly qualifies as a ‘‘worst prac-

tice’’ anywhere, as does Brigham and Women’s Hospital’s sending faxes containing patient medical information to the wrong

destination and then not ceasing to do so when informed of the

problem. In any case, it is clear that many organizations have

a long way to go when it comes to securing financial and per-

sonal data. As such, in the future we are very likely to see an increasing amount of news about data security breaches.

4. Violence Against Women and DOJ Reauthorization Act bans annoying postings and messages

By signing the Violence Against Women and Department of

Justice Reauthorization Act, President Bush also signed into


law Section 113, ‘‘Preventing Cyberstalking,’’ which among

other things makes it illegal to anonymously post annoying

Web messages or send annoying email messages. The crimi-

nal penalties for not revealing one’s identity when posting

even potentially annoying Web content or email messages

include large fines and up to two years of imprisonment. The

law in effect updates existing telephone harassment laws to

prohibit using the Internet anonymously with the intention

to annoy. The pertinent section was buried in the unrelated

bill that passed in both houses of Congress. Critics contend

that this legislation circumvents the First Amendment’s pro-

tection of citizens’ right to write something that is potentially

annoying as well as their right to do it anonymously.

I like the idea of trying to clamp down on some of the

excesses (such as ‘‘cyberstalking’’) that occur on the Internet,

especially those designed to trigger fear and resentment, but I

seriously question some of the provisions of this act. What is

‘‘annoying’’ to one person may be perfectly acceptable to

another. The subjectivity involved in defining ‘‘annoying’’

promises to render this act difficult to interpret and enforce.

Additionally, the civil rights implications are downright

frightening. If this act stands up to the test cases that will

invariably surface, Americans will lose yet more rights during

a time in which civil liberties are already being seriously eroded

in the US.

5. Proposed US legislation would require deletion of personal information on US Web sites

Proposed federal legislation currently being considered by the

US Congress would require every US Web site to delete all infor-

mation about visitors to the site, including names, street and

email addresses, telephone numbers, and so forth, if the infor-

mation is no longer needed for a bona fide business reason. The

provisions for personal information deletion in the proposed

legislation, the Eliminate Warehousing of Consumer Internet

Data Act of 2006, are intended to fight identity theft because

Web sites that contain personal information are often major

targets for computer criminals. Some speculate that one effect

of this requirement would be to reduce concern about search

engines storing information about users’ search terms, some-

thing for which the US Department of Justice (DOJ) recently

subpoenaed Yahoo, Google, and other search engine providers.

As the bill is currently worded, ‘‘personal information’’ does not

refer to search terms or Internet addresses. If this proposed leg-

islation is passed, violations of this law could be punished by

the FTC as ‘‘deceptive business practices’’ whether a Web site

is run by a business or an individual.

The proposed legislation described in this news item ap-

pears to be another step forward in fighting identity theft. If

there is no genuine business-related reason to keep personal

information on a Web site, it is only logical that this informa-

tion be removed. Controversies and challenges concerning the

interpretation of ‘‘no longer needed’’ will, of course, surface.

Still, requiring Web site operators to purge unneeded personal

information will help ensure that at least some targets of

opportunity for would-be identity thieves will disappear.

6. Financial Services Authority Report highlights need for banks to boost on-line security

In its Financial Risk 2006 Report the UK’s Financial Services

Authority (FSA) found that 50% of Internet users are very con-

cerned about the risk of fraud. The report stated that banks

should strive more to alleviate these concerns by educating

Web users about on-line security. Of the 1500 people asked

about their on-line habits, many reported that they used

good security practices, yet a fourth did not remember when

their security software such as anti-virus software was last

updated. Industry group Apacs found that Internet fraud

losses rose to GBP 14.4 million during the first half of 2005 –

more than triple that of the same period the previous year.

A critical point made in the FSA report was that if customers

were expected to absorb the costs for on-line fraud, 77% would

avoid on-line banking altogether. Recent reports of criminal

gangs stealing millions from the government through tax

credit scams involving the Department for Work and Pensions

and Network Rail have fueled the concerns of on-line cus-

tomers concerning on-line security.

It would be very difficult to disagree with the findings and

recommendations of the recent FSA report. Banks rely on

on-line transactions, yet they often do not go far enough in en-

suring that these transactions are secure. I especially worry

about the threat of keystroke sniffers being installed on users’

computers; few users know what keystroke loggers are, let

alone how to detect them. Losses from fraudulent on-line

transactions are starting to mount, as indicated in this and

several previous news items. As these losses grow, banks

and other financial institutions will be virtually forced to pay

more attention to on-line security.

7. Washington State and Microsoft sue anti-spyware vendor

Microsoft and the State of Washington each filed lawsuits in

the US District Court for the Western District of Washington

against Secure Computer and its principals. The charges in-

clude violation of Washington’s Computer Spyware Act and

three other laws. Secure Computer allegedly used scare tactics

that included putting misleading links on Google’s Web site,

producing unwanted pop up advertising, and spamming.

Secure Computer implied that their software came from or

was endorsed by Microsoft and then went further by using

a Windows feature to pop up warnings on PCs, informing

the users that their system had been compromised and that

they should run a spyware scan. Users were later advised to

buy Secure Computer’s Spyware Cleaner for USD 49.95 to

remove the malware that was supposedly installed on their

computers. The program does not work, however. Washington

state law establishes a fine of up to USD 100,000 per violation.

If Secure Computer has actually done what it is being

accused of having done, the lawsuits brought against this

company are a just punishment. As I have said so many times

before, computer users are for the most part incredibly naïve

concerning security issues; it would not be difficult for an


unscrupulous person or organization to cause uncertainty and

even fear sufficient to motivate users to buy software that

promises to fix whatever the apparent problem is.

8. Morgan Stanley offers to settle with the SEC

US-based investment bank Morgan Stanley has offered to

settle with the Securities and Exchange Commission (SEC)

for USD 15 million to resolve a matter related to Morgan Stan-

ley’s having destroyed potential electronic evidence. The

company did not comply with an order to keep electronic

messages that pertained to a lawsuit that had been filed

against it. Morgan Stanley claims that backup tapes on which

the email messages in question were stored were accidentally

overwritten. The SEC has not decided whether to accept

Morgan Stanley’s settlement offer.

This is a truly fascinating case. Morgan Stanley somehow

‘‘got its wires crossed’’ and deleted evidence that the SEC

ordered it to hand over. I do not blame the SEC for taking its

time in deciding how to deal with this investment bank.

If the SEC accepts Morgan Stanley’s offer, Morgan Stanley will

not only get away relatively cheaply (remember, USD 15 mil-

lion is small change for a company such as Morgan Stanley),

but other companies faced with the dilemma of having to

hand over evidence that they know will be used against them

will also be tempted to ‘‘accidentally erase the evidence.’’ On

the other hand, offering to pay the SEC right up front not

only appears to be a magnanimous move on Morgan Stanley’s

part, but it also promises to close one of the many complicated

cases with which I am sure that the SEC is having to deal.

9. FTC settles with CardSystems Solutions and ChoicePoint

ID verification services vendor CardSystems Solutions has set-

tled charges brought by the Federal Trade Commission (FTC)

that this company failed to secure sensitive customer data.

The charges followed a major security incident that led to

more than 260,000 individual cases of identity fraud. CardSys-

tems Solutions had been obtaining data from the magnetic

strips of credit and debit cards and storing them without

deploying ample security safeguards. The company, bought

by Pay By Touch late last year, has agreed to implement

a wide-ranging security program and undergo independent

security audits every two years for 20 years.

In settling with the FTC, ID verification services vendor

ChoicePoint must pay USD 10 million in civil penalties and

USD 5 million for consumer damages. The USD 10 million is

the FTC’s largest civil fine to date. ChoicePoint was charged

with not sufficiently screening its clients for legitimacy and

for data handling methods that violated the Fair Credit

Reporting Act, the FTC Act, other federal laws, and privacy

rights. The settlement requires ChoicePoint to establish a se-

curity program that includes verifying the legitimacy of clients

for their services, auditing its clients’ use of the information

obtained, and making visits to client sites. ChoicePoint now

must also submit its new security program to independent

security audits every two years until 2026.

The FTC deserves a lot of credit for its efforts here. This

commission ‘‘played hardball’’ with both CardSystems Solu-

tions and ChoicePoint because of their very deficient data pro-

tection practices and got a very good outcome in each case.

The large fine that ChoicePoint had to pay is particularly note-

worthy; the FTC is in effect saying that organizations that do

a poor job in protecting personal and financial data are going

to have to face meaningful punishment. Hopefully, this out-

come will send a powerful message to other organizations

that have poor data security practices.

10. Lawsuits are not curtailing illegal downloads

Surveys of 3000 on-line users in Spain, Germany, and the UK

by the industry group Jupiter, together with studies by the

International Federation of the Phonographic Industries (IFPI),

indicate that despite almost 20,000 people being sued in illegal

song downloading cases in 17 countries, illegal file sharing

activity has remained roughly the same for the past two years.

Approximately 335 legal download stores and on-line music

services have two million songs legally available – double

the amount from the previous year – with 420 million singles

legally downloaded in 2005 and sales exceeding USD 1 billion

in 2005 – up from USD 380 million in 2004. More rapid growth

is predicted this year. According to the surveys, 35% of illegal

file sharers have cut back on their activity, 14% have increased

their activity, and 33% of them buy less music than those who

obtain their music exclusively through legal channels. With

approximately 870 million song files available

through illegal downloading on the net, the music industry

is having a difficult time persuading song-swappers to get

their music legally. The music industry is threatening to sue

Internet service providers (ISPs) if they do not start identifying

and stopping customers who ignore copyright restrictions. In

its Digital Music Report, the IFPI stated that music downloads

for mobile phones had reached USD 400 million annually,

which comprises 40% of the digital music business. Mean-

while, the plusses and minuses of the use of Digital Rights

Management technology, something that limits what con-

sumers can do with their music once they have purchased

it, are still being debated.

The entertainment industry faces a continuing uphill strug-

gle in its war against piracy. Using lawsuits as a mechanism for

reducing illegal downloads may not be working, but it neverthe-

less was a logical course of action to pursue. I suspect that much

of the reason that lawsuits are not working better than they are

is that most of the lawsuits have targeted individuals instead of

organizations. As such, many individuals who illegally down-

load movies and music are probably not even aware of the

many lawsuits that have been filed over these types of activities,

so there is little or no intimidation factor. It is also logical to

assume that the entertainment industry will, in the not too

distant future, shift its strategy by increasingly going after ISPs that

do not prevent users from performing illegal downloads. The

road to success for this possible strategy is also not certain,

however; in the past numerous ISPs have been able to win court


battles against the RIAA and other entertainment industry enti-

ties when they have been directed to hand over names of illegal

file sharers. Again, the entertainment industry does indeed

have a long way to go.

11. Russian stock exchange operations disrupted by virus

A virus halted computing operations at the main Russian

stock exchange. The Russian Trading System (RTS) halted

operations in its three markets for slightly over 1 h after an

unidentified virus infected computing systems there. The

infection produced a massive amount of outgoing traffic that

disrupted normal network operations. The virus reportedly

came in over the Internet and infected a computer connected

to a test trading system. The infected computer then started

generating huge volumes of traffic to the point that it over-

loaded the RTS’s support routers. The result was that normal

traffic (data going into and out of the trading system) was not

being processed.

It is truly scary to think that a virus has actually

stopped trading transactions within a stock market. An in-

cident of this nature should not occur in as critical a setting

as in a stock market. I would be very curious to learn more

details about this incident, including what operating sys-

tem ran on the infected computer and how the virus in

question actually worked. I also wonder what kind (if

any) of anti-virus and incident response measures were

in effect.


computers & security 25 (2006) 163–164

Modeling network security

Danny Bradbury

a r t i c l e i n f o

Article history:

Received 6 March 2006

Revised 9 March 2006

Accepted 9 March 2006

A model idea for network security? For those who know

how, testing a virtual version of your network for security

can be more productive than testing the real thing.

For companies with complex networks, duplicating the

equipment and the intricate infrastructure for testing pur-

poses is impossible. There is simply too much of it, and it

would be too expensive. Testing live systems for vulnerabil-

ities is generally preferred, but how can you be sure that you

have tested for everything? And how can you model the secu-

rity implications of planned changes to the network?

The alternative to physical testing is to do it virtually.

Building mathematical models of networks can help security

evaluators to understand their strengths and weaknesses. A

model can help to identify single points of failure, or reveal

how particular events in some nodes could lead to unexpected

results elsewhere. Using these models as a guide, administra-

tors may be able to develop strategies to make networks more

reliable and more secure, protecting them from attack.

Ideally, a network model should be as accurate as possible,

but David A. Fisher, a senior member of the technical staff at

Carnegie Mellon’s Software Engineering Institute, draws a dis-

tinction between accuracy and precision in network model-

ing. ‘‘They are often confused,’’ he warns. ‘‘Accuracy has to

do with correctness but it can be precise or abstract.’’

A precise model will be as detailed as possible as it tries to

represent your network in software, usually down to the con-

figuration of specific devices, connection speeds, and applica-

tions. An abstract model, on the other hand, concerns itself

more with the general behavior of a network with certain

numbers of vaguely described nodes exhibiting certain behav-

iors. ‘‘If you want to do predictive simulations, you must be

both accurate and precise, but if you’re interested in gaining

insight, or understanding the mechanisms involved, an accu-

rate simulation without the precision is quite acceptable,’’

Fisher explains.

The value of abstract models becomes apparent in the con-

text of activities at CERT, Fisher’s old employer. Concerned

with broader issues of Internet security and broad security re-

sponses, the organization wants to understand the basic

mechanisms involved in an attack rather than getting into de-

tail. Malicious activities such as distributed denial of service

attacks occur across huge numbers of machines. ‘‘These

models don’t depend on details like the topology of the Inter-

net or who is connected to who,’’ Fisher explains. ‘‘They can

be abstracted away. You are concerned about the number of

machines vulnerable to attack, and the number of machines

capable of launching attacks.’’

It is in these types of networks, with large numbers of autono-

mous nodes, that emergent behavior is prevalent, Fisher

says. Outcomes for the whole system derive from local events.

As one node’s influence affects its neighbor, the neighbor’s

behavior in turn will affect other neighbors. This emergent

behavior is similar in different domains. The spread of viruses

in large computer networks, for example, can be similar to the

spread of biological epidemics – even though the details in

each domain (people vs computers) will be totally different.
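
Fisher’s point about abstract models can be made concrete with a toy simulation. The sketch below (Python rather than Easel, with invented values for population size, vulnerability, and infection probability) abstracts topology away entirely: each compromised machine probes a few random machines per step, and only the counts of vulnerable and compromised machines matter.

```python
import random

def simulate_spread(n_nodes=500, n_vulnerable=200, seeds=3,
                    contacts_per_step=4, p_infect=0.1, steps=30, rng=None):
    """Abstract, topology-free spread model: each compromised machine
    probes a few random machines per step; only the counts matter."""
    rng = rng or random.Random(42)  # fixed seed for a repeatable run
    vulnerable = set(rng.sample(range(n_nodes), n_vulnerable))
    compromised = set(rng.sample(sorted(vulnerable), seeds))
    history = [len(compromised)]
    for _ in range(steps):
        newly_hit = set()
        for _machine in compromised:
            for _probe in range(contacts_per_step):
                target = rng.randrange(n_nodes)
                if (target in vulnerable and target not in compromised
                        and rng.random() < p_infect):
                    newly_hit.add(target)
        compromised |= newly_hit
        history.append(len(compromised))
    return history

if __name__ == "__main__":
    print(simulate_spread())  # compromised count per step
```

The epidemic-style curve this produces depends only on the aggregate parameters, which is exactly why such models transfer between domains (people vs computers) despite totally different details.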

Fisher developed a software language called Easel designed

to help model emergent behaviors in Internet security and the

critical national infrastructure. It works on the basis that, al-

though you could not reasonably model every single node

on the Internet for example, you must model enough nodes

to reflect emergent behavior. Although Fisher does most of

his work in the range of 20–1000 nodes, the Macintosh-based

tool can model abstract networks up to 32,000 nodes in size.

E-mail address: [email protected]


Modeling the emergent behavior in abstract networks may

be useful for looking at the critical national infrastructure, but

how will it help a real-world corporate network? Companies

like Opnet focus on precision models of such infrastructures.

‘‘We produce a virtual representation of the network infrastruc-

ture,’’ says product marketing director for enterprise solutions

Devesh Satyavolu. This software model contains everything

from the servers hosting your applications through to the client

machines, routers, switches, and the protocols in use.

‘‘We create a baseline of the production infrastructure by

talking to various sources of monitoring information,’’ Sa-

tyavolu continues. These sources include most industry

systems management players such as Computer Associ-

ates, BMC, and Hewlett-Packard. The product feeds infor-

mation into a configuration management database

contained within its IT Guru product, and its Virtual Net-

work Environment then models its behavior, using both

historical network traffic information where available,

and also enabling administrators to model ‘what if’ scenar-

ios. The company’s Application Characterization Environ-

ment (ACE) also models application transaction behavior,

enabling staff to understand how interactions between cli-

ents and servers affect the network.

Opnet’s is a general network modeling environment,

designed to help staff manage everything from capacity to

performance. It is not security specific, but Satyavolu says

that it can be used for this purpose. ‘‘We have a comprehen-

sive rules engine for almost 400 rules that we ship, and

you could evaluate the accuracy of configurations on your

devices – firewalls, switches, routers etc – as they pertain to

network security,’’ he says, arguing that analyzing the net-

work against the rules can help you to answer security ques-

tions. ‘‘Can someone burrow from point A to point B in my

network, or will access controllers in the middle stop them,

and if so, where?’’
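
Satyavolu’s ‘‘point A to point B’’ question is, at bottom, a reachability query over a graph of permitted connections. The following sketch illustrates that idea only; it is not Opnet’s actual rules engine, and the topology, device names, and access rules are invented.

```python
from collections import deque

# Hypothetical topology: directed edges annotated with the device
# that mediates them and whether its access rules permit traffic.
EDGES = [
    ("internet", "dmz_web", {"device": "fw-edge", "allowed": True}),
    ("dmz_web",  "app_srv", {"device": "fw-core", "allowed": True}),
    ("app_srv",  "db_srv",  {"device": "fw-core", "allowed": False}),
    ("dmz_web",  "db_srv",  {"device": "fw-core", "allowed": False}),
    ("app_srv",  "mgmt",    {"device": "acl-r1",  "allowed": True}),
    ("mgmt",     "db_srv",  {"device": "acl-r1",  "allowed": True}),
]

def find_path(src, dst):
    """BFS over edges whose mediating device permits the traffic.
    Returns a list of hops if dst is reachable, else None."""
    adj = {}
    for a, b, meta in EDGES:
        if meta["allowed"]:
            adj.setdefault(a, []).append(b)
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in adj.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(find_path("internet", "db_srv"))
# → ['internet', 'dmz_web', 'app_srv', 'mgmt', 'db_srv']
```

Here both direct routes to the database are blocked by fw-core, yet the search still finds an indirect path through the management network — exactly the kind of unexpected access path this style of analysis is meant to surface.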

Skybox Security’s modeling software focuses heavily on

using precise network models to analyze vulnerabilities, ex-

plains vice president of worldwide marketing Ed Cooper. Its

Skybox View Suite gathers information about the network

and builds what the company calls an integrated security

model. The software tools in the suite can then be used to sim-

ulate attacks on the model, while also analyzing access paths

through the network to throw up unexpected vulnerabilities.

The model is of little use without adequate simulation, ar-

gues Cooper, because like Opnet it enables you to conduct

‘what if’ analyses on the network. ‘‘What if we deployed our

intrusion prevention systems and moved them from point A

to point B?’’ he asks. ‘‘If we infect a virtual model with

a worm, what propagation attributes will it adopt?’’

Getting the information out of real-world networks to pop-

ulate these precise models can be daunting. For companies

without a properly populated configuration management da-

tabase, Skybox’s software must be given administration rights

to all network devices so that it can download their configura-

tion files during auto-discovery.

How such auto-discovery works in practice remains to be

seen, but an interesting function of the Skybox software is

the ability to include business impact rules enabling network

administrators to associate technology assets with monetary

risk to the business in the event of an attack. This can be

done using risk metrics, or regulatory compliance frameworks

such as ISO 17799 or Sarbanes–Oxley.
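
The business-impact rules described above boil down to attaching an expected monetary loss to each technology asset. A minimal sketch of that calculation, using the classic annualized loss expectancy (probability of compromise per year times impact); all asset names, probabilities, and values below are invented for illustration.

```python
# Hypothetical asset register: estimated annual probability of
# compromise and the monetary impact if the compromise happens.
ASSETS = {
    "customer_db":  {"p_compromise": 0.05, "impact_usd": 2_000_000},
    "web_frontend": {"p_compromise": 0.20, "impact_usd": 150_000},
    "hr_fileshare": {"p_compromise": 0.10, "impact_usd": 400_000},
}

def annualized_loss_expectancy(assets):
    """Classic ALE: probability of the event per year times its cost."""
    return {name: a["p_compromise"] * a["impact_usd"]
            for name, a in assets.items()}

ale = annualized_loss_expectancy(ASSETS)
for name, loss in sorted(ale.items(), key=lambda kv: -kv[1]):
    print(f"{name}: USD {loss:,.0f}")
# customer_db tops the list at USD 100,000 despite its low probability
```

Ranking assets this way is what lets an administrator argue, in monetary terms, which vulnerabilities found by the model are worth fixing first.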

Should companies choose an abstract modeling approach

or use more precise network models to create a more detailed

picture of their network? It isn’t an either/or decision, says

Colin O’Halloran, director of the systems assurance group at

QinetiQ, a commercial spin-off from the UK’s Ministry of

Defence’s Defence Evaluation and Research Agency, and a vis-

iting professor at the University of York. ‘‘Any model will

make simplifications because the more faithful you are to

the network the closer you’ll eventually get to the network

itself,’’ he says.

Models that attempt to represent your system in detail will

naturally make some assumptions and simplifications, O’Hal-

loran argues. The best approach is to use different models at

different stages of analysis. ‘‘It’s not one model; it’s a whole

family of models that you need.’’

QinetiQ begins an analysis by checking models at the high-

est abstract level, characterizing nodes as simply compro-

mised or not. For this, it uses the Failures-Divergence

Requirement (FDR) tool, a model checker for state machines.

Given a description of a system in terms of components

with simple states, it attempts to explore every possible com-

bination of states. Such tools often suffer from a combinatorial

explosion problem, in which the number of combined states

becomes astronomically large. QinetiQ breaks down the net-

work into chunks, analyzing them individually.
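
The style of analysis QinetiQ describes — each node simply compromised or not, with every reachable combination of states explored — can be sketched as a search over sets of compromised nodes. The ‘‘can-compromise’’ rules below are invented for illustration; with n nodes there are up to 2^n possible states, which is the explosion that motivates breaking the network into chunks.

```python
# Hypothetical "can-compromise" edges: if node a is compromised,
# the attacker can go on to compromise node b.
CAN_COMPROMISE = {
    "outside": ["web"],
    "web":     ["app"],
    "app":     ["db", "web"],
    "db":      [],
}

def reachable_states(initial=frozenset({"outside"})):
    """Exhaustively explore every reachable set of compromised nodes.
    Each state is a frozenset; worst case there are 2^n of them."""
    seen = {initial}
    frontier = [initial]
    while frontier:
        state = frontier.pop()
        for node in state:
            for target in CAN_COMPROMISE.get(node, []):
                nxt = state | {target}
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return seen

states = reachable_states()
print(len(states))                     # → 4 reachable attack states
print(any("db" in s for s in states))  # → True: the database is reachable
```

A real model checker such as FDR explores transition systems far richer than this, but the pessimistic conclusion is the same in kind: the search reports that a compromise path to the database exists, without saying how it would be exploited in practice.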

This abstract analysis can help to highlight areas of poten-

tial vulnerability without pinning down what that vulnerabil-

ity is, Hopkins explains. ‘‘How it comes about is almost

irrelevant – it just allows someone from outside the firewall

to do something that they shouldn’t be able to do.’’

This level of analysis is pessimistic in nature. It will tell se-

curity consultants that there’s a possibility of an intrusion

path through the network, but it might be a false positive. At

this point, O’Halloran hands off to Paul Hopkins, group man-

ager for investigations and security health check at QinetiQ,

who stress tests that particular part of the network using

the Skybox Security software to see if there is a real vulnera-

bility there, or whether the problem is simply an artifact of

the abstract model.

But then, QinetiQ is a specialist in the security area, and

network modeling is not an activity for the faint of heart. At

the abstract level, you need the technical capability to build

and simulate these models, which in many cases will involve

programming the necessary properties and rules. At the pre-

cise level, your infrastructure must be mature enough to pro-

vide the information to make the model as complete as

possible (again, with the understanding that it is not possible

to be entirely complete).

Ideally, you will use a combination of the two, but this

will be a pipedream for most network departments strug-

gling to firefight everyday performance and capacity prob-

lems. They will be hammering out security scenarios

using conventional analysis tools on live networks for

some years to come.


computers & security 25 (2006) 165–168

Information Security – The Fourth Wave

Basie von Solms*

University of Johannesburg, Johannesburg, South Africa

a r t i c l e i n f o

Article history:

Received 20 February 2006

Revised 9 March 2006

Accepted 9 March 2006

Keywords:

Corporate Governance

Information Security

Information Security Management

Information Security Governance

Risk management

Sarbanes–Oxley

Social engineering

a b s t r a c t

In a previous article (von Solms, 2000), the development of Information Security up to the

year 2000 was characterized as consisting of three waves:

• the technical wave,

• the management wave, and

• the institutional wave.

This paper continues this development of Information Security by characterizing the

Fourth Wave – that of Information Security Governance.

© 2006 Elsevier Ltd. All rights reserved.

1. Introduction

The First Wave was characterized by Information Security

being a technical issue, best left to the technical experts.

The Second Wave was driven by the realization that Informa-

tion Security has a strong management dimension, and that

aspects like policies and management involvement are very

important. The Third Wave consisted of the need to have

some form of standardization of Information Security in

a company, and aspects like best practices, certification, an

Information Security culture and the measurement and

monitoring of Information Security became important.

Since the paper (von Solms, 2000) introducing this develop-

ment cycle for Information Security appeared in Computers

and Security, the development of the next wave of Informa-

tion Security, the Fourth, became very clear and well defined.

This wave relates to the development and crucial role of

Information Security Governance.

The drivers behind this Fourth Wave are closely related to

developments in fields of Corporate Governance and the re-

lated legal and regulatory areas. Top management and Boards

of Directors felt the heat as they started to become personally

accountable for the health (read Information Security) of their

IT systems on which they base their planning and decisions.

This paper will discuss this Fourth Wave, and the drivers

behind the wave.

In Section 2 we will briefly investigate the development of

Corporate Governance, and highlight the relationship with

Information Security. Section 3 will discuss the relationship

between Corporate Governance and Information Security in

more detail, followed by Section 4 investigating the concept

of Information Security Governance. After that, in Section 5

we look at some of the drivers behind the Fourth Wave,

followed by Section 6 which presents the discussion about

some of the consequences of this wave. We conclude with

a summary in Section 7.

* Tel.: +27 11 489 2843; fax: +27 11 489 2138. E-mail address: [email protected]


2. Corporate Governance and Information Security

Several documents related to Corporate Governance have

appeared during the last five years, and the importance of Cor-

porate Governance in general is now established on an inter-

national level. Important examples of such documents are the

OECD Principles of Corporate Governance (OECD Principles of

Corporate Governance, 2004) and the King 2 Report on Corpo-

rate Governance (King 2 Report on Corporate Governance,

2002).

The following two quotes come from the OECD document

under the section ‘Responsibilities of the Board’:

‘[Responsibilities of the Board include] ensuring the integrity of

the corporation’s accounting and financial reporting systems, in-

cluding the independent audit, and that appropriate systems of

control are in place, in particular, systems for risk management,

financial and operational control, and compliance with the law

and relevant standards.’

‘In order to fulfill their responsibilities, board members should

have access to accurate, relevant and timely information.’

Therefore, although these documents do not necessarily

refer to Information Security per se, they do refer to aspects

like reporting systems, systems of control, compliance with

relevant standards, risk management, accurate, relevant and

timely information, internal controls, etc.

Most companies are totally dependent on their IT

systems to capture, store, process and distribute company

information. As Information Security is and has always

been the discipline to mitigate risks impacting on the

confidentiality, integrity and availability of a company’s

IT resources, Information Security is extremely relevant

to what is required in such Corporate Governance

documents.

Several legal and regulatory developments related to

Corporate Governance have further escalated the role and ac-

countability of senior management as far as their Corporate

Governance responsibilities are concerned, reaching the

agendas of board meetings and other high-level meetings. The leading

example here is the Sarbanes–Oxley Act (Sarbanes–Oxley,

2002).

This Act requires top management (and the Board) to sign

off on the information contained in annual reports.

‘… in this law (Act) there is a provision mandating that

CEOs and CFOs attest to their companies’ having proper

‘internal controls’. It’s hard to sign off on the validity of

data if the systems maintaining it are not secure. It’s the

IT systems that keep the books. If systems are not secure,

then internal controls are not going to be too good.’ (Hurley,

2003)

From the above discussion, it is clear that, al-

though indirectly mentioned, there is a significant rela-

tionship between Corporate Governance and Information

Security.

3. The relationship between Corporate Governance and Information Security

The important, and interesting, aspect of the relationship

between governance and security is the clarity with which

this relationship has been expressed in relevant and recent

documentation.

The following types of statements have started to appear

more regularly, highlighting the integral role of Information

Security in Corporate Governance.

‘Corporate Governance consists of the set of policies and

internal controls by which organizations, irrespective of

size or form, are directed and managed. Information

security governance is a subset of organizations’ overall

(corporate) governance program.’ (Information Security

Governance – a call to action).

‘… boards of directors will increasingly be expected to

make information security an intrinsic part of governance,

preferably integrated with the processes they have in place

to govern IT’. (Information Security Governance: Guidance

for Boards of Directors and Executive Management).

What has also emerged is the pivotal role of Information

Security as a risk management or risk mitigation discipline.

A representative statement in this case is:

‘An information security programme is a risk mitigation

method like other control and governance actions and

should therefore clearly fit into overall enterprise gover-

nance.’ (Information Security Governance: Guidance for

Boards of Directors and Executive Management).

This growing realization has established the fact that Infor-

mation Security Governance has an enterprise wide impact,

and that the risks mitigated by an Information Security Gover-

nance plan are risks which have an enterprise wide business

implication.

Of course, we, as professionals and practitioners in the

field of Information Security, had been making these state-

ments for some time, but we never really succeeded in getting

the impact we wanted. The wider emphasis on good

Corporate Governance has now succeeded in achieving that

which we had been preaching for so long.

Let us now have a closer look at precisely what we can

understand under the concept of Information Security

Governance.

4. Information Security and Information Security Governance

From the previous discussion, and many other references,

there can be no doubt that the developments in the field of

good Corporate Governance over the last three to four years

have escalated the importance of Information Security to

higher levels. It is not only the fact that the spotlight was on

Information Security which resulted in this, but also the


establishment and growth in maturity of the concept of

Information Security Governance.

It became clear that Information Security Governance is

more than just Information Security Management. Informa-

tion Security Governance clearly indicates the significant

role of top management and Boards of Directors in the way

Information Security is handled in a company.

The following definition tries to reflect this wider meaning of

Information Security Governance which flowed from its explicit

inclusion as an integral part of good Corporate Governance:

‘Information Security Governance is an integral part of

Corporate Governance, and consists of

• the management and leadership commitment of the Board and Top management towards good information security;

• the proper organizational structures for enforcing good information security;

• full user awareness and commitment towards good information security; and

• the necessary policies, procedures, processes, technologies and compliance enforcement mechanisms

all working together to ensure that the confidentiality, integ-

rity and availability (CIA) of the company’s electronic assets

(data, information, software, hardware, people etc) are main-

tained at all times’.

Information Security Governance therefore involves every-

one in a company – from the Chairman of the Board right

through to the data entry clerk on the shop floor and the driver

of the vehicle delivering the products to the customers.

Information Security Governance can be seen as the over-

all way in which Information Security as a discipline is han-

dled (used) to mitigate IT risks. One of the essential

characteristics of Information Security Governance is the

fact that it consists of a ‘closed’ loop.

The loop starts with management’s commitment to Infor-

mation Security by treating it as a strategic aspect pivotal

to the existence of the company and being responsible for

managing the IT risks of the company. This treatment includes

the sanctioning of a Corporate Information Security Policy

accepted and signed off by the Board.

This Policy is supported by a suitable organizational struc-

ture for Information Security, specifying ownership and

responsibilities on all levels. The organizational structure

must take the compliance and operational management of

Information Security into account (von Solms, 2005). Such

ownership and responsibilities are strengthened by the necessary

User Awareness programs for all users of IT systems.

The required technology is rolled out and managed, and

compliance monitoring is instituted to measure the level of

compliance to policies, etc., reflecting the level to which IT

risks are managed. The results of such compliance monitoring

efforts are then fed back to Top Management to comprehen-

sively inform them about the status of IT risk management.

This closes the loop.

Information Security Governance is therefore the

implementation of the full, well-known Plan–Do–Control–Measure–Report loop.

Let us now investigate some of the drivers behind this

Fourth Wave in more detail.

5. Drivers behind the Fourth Wave

As discussed above, some of the major drivers behind this

Fourth Wave are definitely the bigger emphasis on good Cor-

porate Governance and the supporting legal and regulatory

developments in this area.

Taking one step back, we can again reason that the major

drivers for this bigger emphasis on good Corporate Governance

and the supporting legal and regulatory developments are the

risks of committing fraud and misusing financial resources by manipulating the company’s electronic data stored on its IT systems.

Therefore, preventing fraud through manipulating elec-

tronic company data seems to be the core of this drive. From

this core came the relevant regulatory and legal develop-

ments, as well as the pressure for good Corporate Governance.

The total integration of IT into the strategic operation of

companies over the last few years, and the pervasiveness of the

use of IT throughout companies and the services they deliver,

opened up many opportunities to commit fraud using the

company’s IT systems, resulting in serious risks.

One of the most serious of these risks is that of social engi-

neering and its relationship to Information Security.

Senior management realized that the human side of using

IT systems, by employees, clients and customers, can cause

serious risks, notwithstanding the amount of money spent

on the technical measures. It became clear to them that the In-

formation Security problem cannot be solved by technical

means alone, and that strategic decisions on a high level had

to be made to ensure that all users are aware of possible risks,

and the impact of social engineering in attacking IT systems.

Attempts to use social engineering to commit fraud seem

to be rising. It is crucial to realize that good Information Security Governance, in the sense discussed above, is essential to addressing this risk.

Again, this has been stated over and over by Information

Security practitioners over many years, but the pressure

caused by good Corporate Governance finally made the penny drop at the level we had been targeting for so long.

An important question is, of course, whether this Fourth

Wave will be sustainable.

6. Some consequences of the Fourth Wave

As discussed above, the major drivers behind this Fourth

Wave are definitely the emphasis on good Corporate Gover-

nance and the supporting legal and regulatory developments

in this area. For this reason it can be accepted that the Fourth

Wave will be sustainable, in the sense that top management

will not lose interest – they cannot afford to, because their

heads are on the block.

This will give more exposure to Information Security in

general, which is what we have hoped for anyway. We will

probably find that audit committees become much more

sensitive towards Information Security, and we will even

see a person or persons on the Board assigned specific


Information Security Governance responsibilities. In many

instances, these steps have started already.

The following quote supports the realization mentioned

above:

‘According to the 2005 Global Information Security Workforce

Study, sponsored by the International Information Systems

Security Certification Consortium, IT security professionals

are gaining increased access to corporate boardrooms. More

than 70% of those surveyed said they felt they had increased

influence on executives in 2005, and even more expect that

influence to keep growing.’ (Security Log, 2006).

It is, however, crucial to realize that Information Security

Governance, as introduced by this Fourth Wave, is NOT a technical issue. Although it contains technical elements, other

(non-technical) issues like awareness and compliance man-

agement – ensuring that the stakeholders conform to all

relevant policies, procedures and standards – are core to

good Information Security Governance.

Because such compliance and risk reporting is core to Information Security Governance, the Fourth Wave will require more formal reporting tools and

mechanisms – ways and means to give Top Management an

easily understandable overview of precisely what the IT risks

are, and how these risks are being managed over time.

7. Summary

Based on the three waves in the development of Information

Security as introduced in other articles (von Solms, 2000),

Information Security development is presently in its Fourth

Wave.

This wave reflects the development of Information Secu-

rity Governance as a result of the emphasis on good Corporate

Governance.

The Fourth Wave of Information Security can therefore be

defined as the process of the explicit inclusion of Information

Security as an integral part of good Corporate Governance,

and the maturing of the concept of Information Security

Governance.

We, as Information Security practitioners, must use this

development to its optimum to ensure the security of IT

systems.

References

OECD Principles of Corporate Governance, http://www.oecd.org/dataoecd/32/18/31557724.pdf; 2004 [accessed 13.01.2006].

King 2 Report on Corporate Governance, http://www.iodsa.co.za/corporate.htm; 2002 [accessed 13.01.2006].

Sarbanes–Oxley Act, http://news.findlaw.com/hdocs/docs/gwbush/sarbanesoxley072302.pdf; 2002.

Hurley E. http://searchsecurity.techtarget.com/originalContent/0,289142,sid14_gci929451,00.html; 2003.

Information Security Governance – a call to action. National Cyber Security Summit Task Force, www.cyberpartnership.org/InfoSecGov_04.pdf; 2003.

Information Security Governance: guidance for Boards of Directors and Executive Management. USA: IT Governance Institute, ISBN 1-893209-28-8, www.itgovernance.org.

Security Log. Computerworld, http://www.computerworld.com/securitytopics/security/story/0,10801,107706,00.html?source=NLT_SEC&nid=107706; 2006 [accessed 18.01.2006].

von Solms B. Information security governance. Computers and Security 2005;24:443–7.

von Solms B. Information Security – The Third Wave? Computers and Security 2000;19:615–20.

Prof SH (Basie) von Solms holds a PhD in Computer Science,

and is the Head of Department of the Academy for Informa-

tion Technology at the University of Johannesburg in Johan-

nesburg, South Africa. He has been lecturing in Computer

Science and IT related fields since 1970. Prof von Solms spe-

cializes in research and consultancy in the area of Information

Security. He has written more than 90 papers on this subject,

most of which were published internationally. Prof. S. H. von

Solms has also supervised more than 15 PhD students and more

than 45 Master students. Prof von Solms is the present Vice-

President of IFIP, the International Federation for Information

Processing, and the immediate past Chairman of Technical

Committee 11 (Information Security), of the IFIP. He is also

a member of the General Assembly of IFIP. He has given nu-

merous papers, related to Information Security, at Interna-

tional conferences and is regularly invited to be a member of

the Program Committees for international conferences. Prof

von Solms has been a consultant to industry on the subject

of Information Security for the last 10 years, and received

the 2005 ICT Leadership Award from the ICT Industry in SA.

He is a Member of the British Computer Society, a Fellow of

the Computer Society of South Africa, and a SAATCA Certifi-

cated Auditor for ISO 17799, the international Code of Practice

for Information Security Management.


Computers & Security (2006) 25

www.elsevier.com/locate/cose

EVENTS

CSI NET SEC ’06
12–14 June 2006, Scottsdale, Arizona, USA
www.csinetsec.com

INFOSECURITY CANADA
14–16 June 2006, Toronto, Canada
www.infosecuritycanada.com

INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS 2006
25–28 June 2006, Philadelphia, PA, USA
www.dsn.org

18TH ANNUAL FIRST CONFERENCE
25–30 June 2006, Baltimore, Maryland, USA
www.first.org/conference/2006

THE 26TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS
4–7 July 2006, Lisboa, Portugal
http://icdcs2006.di.fc.ul.pt/

IEEE CEC 2006 SPECIAL SESSION ON EVOLUTIONARY COMPUTATION IN CRYPTOLOGY AND COMPUTER SECURITY
16–21 July 2006, Vancouver, BC, Canada
http://163.117.149.137/cec2006ss.html

BLACK HAT USA 2006
29 July–3 August 2006, Las Vegas, USA
http://www.blackhat.com/html/bh-link/briefings.html

ISACA INTERNATIONAL CONFERENCE
30 July–2 August 2006, Adelaide, Australia
www.isaca.org

8TH ANNUAL NEBRASKACERT
8–10 August 2006, Omaha, Nebraska, USA
www.certconf.org

19TH IFIP WORLD COMPUTER CONGRESS
20–25 August 2006, Santiago, Chile
http://www.wcc-2006.org/

CSI ANNUAL CONFERENCE AND EXHIBITION
5–8 November 2006, Orlando, Florida, USA
www.gocsi.com

For a more detailed listing of IS security and audit events, please refer to the events diary on www.compseconline.com


Computers & Security (2006) 25, 169–183


Real-time analysis of intrusion detection alerts via correlation

Soojin Lee*, Byungchun Chung, Heeyoul Kim, Yunho Lee, Chanil Park, Hyunsoo Yoon

Division of Computer Science, Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology (KAIST), 373-1 Guseong-Dong, Yuseong-Gu, Daejeon, Republic of Korea

Received 8 December 2004; revised 28 June 2005; accepted 23 September 2005

KEYWORDS: Security; Intrusion detection; Correlation; Alert analysis; Reduction; Attack scenario

Abstract  With the growing deployment of networks and the Internet, the importance of network security has increased. Recently, however, systems that detect intrusions, which are important in security countermeasures, have been unable to provide proper analysis or an effective defense mechanism. Instead, they have overwhelmed human operators with a large volume of intrusion detection alerts. This paper presents a fast and efficient system for analyzing alerts. Our system basically depends on probabilistic correlation. However, we enhance the probabilistic correlation by applying more systematically defined similarity functions and also present a new correlation component that is absent in other correlation models. The system can produce meaningful information by aggregating and correlating the large volume of alerts, and can detect large-scale attacks such as distributed denial of service (DDoS) at an early stage. We measured the processing rate of each elementary component and carried out a scenario-based test in order to analyze the efficiency of our system. Although the system is still imperfect, we were able to reduce the numerous redundant alerts to 5.5% of the original volume, without distorting their meaning, through two-phase reduction. This ability reduces the management overhead drastically and makes analysis and correlation easy. Moreover, we were able to construct attack scenarios for multistep attacks and detect large-scale attacks in real time.
© 2005 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel.: +82 42 869 5552; fax: +82 42 869 5569.
E-mail addresses: [email protected] (S. Lee), [email protected] (B. Chung), [email protected] (H. Kim), [email protected] (Y. Lee), [email protected] (C. Park), [email protected] (H. Yoon).

0167-4048/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cose.2005.09.004


170 S. Lee et al.

Introduction

Cyber attacks are escalating as the mission-critical infrastructures for governments, companies, institutions, and millions of everyday users become increasingly reliant on interdependent computer networks and the Internet. Moreover, current cyber attacks show a tendency to become more precise, distributive, and large-scale (CERT Coordination Center; Bugtraq). However, recent intrusion detection systems (IDSs), which are important in security countermeasures, have been unable to provide proper analysis or an effective security mechanism for defending against such cyber attacks because of several limitations.

First, as network traffic increases, the intrusion detection alerts produced by IDSs are increasing exponentially. In spite of this increase, most IDSs neglect the overhead of human operators, who are overwhelmed by the large volume of alerts. Second, human operators are fully responsible for analyzing a network’s status and the trends of cyber attacks. Third, although cyber attacks can produce multiple correlated alerts (Kendall, 1999; CERT Coordination Center), IDSs are generally unable to detect such attacks as a complex single attack but regard each alert as a separate attack. Therefore, in the early stage, it is difficult to detect large-scale attacks such as a distributed denial of service (DDoS) or a worm.

These limitations are caused by the absence of a mechanism that can preprocess and correlate the massive number of alerts from IDSs. In fact, preprocessing and correlation of alerts are essential for human operators because the information reproduced by this means can reduce the overhead of human operators and help them react appropriately (Bloedorn et al., 2001).

In proposing a fast and efficient system that analyzes intrusion detection alerts via correlation, we focused on providing human operators with a level of flexibility that matches the topology and status of a network. Our system basically depends on the probabilistic correlation proposed in Valdes and Skinner (2001) rather than the fixed rule-based correlation of Perrochon et al. (2000), Cuppens (2001a,b), Cuppens et al. (2002), Lee (1999) and Lee et al. (2000). Compared with other models, our model, which is similar to the probabilistic correlation, has several advantages.

First, we considered the time similarity, though this major measure of correlation is disregarded in other models, and we used a mathematical function that computes the time similarity on the basis of Browne’s result in Browne et al. (2001). To process the time information more systematically, we also applied the result to our system.

Second, for immediate analysis of the status of a managed network and the trends of cyber attacks, we used a situator, which can grasp the trend of attacks being generated in the network by analyzing the relations between the source and the destination, as one of our components. With a situator, we could detect large-scale attacks such as a DDoS or worm in the early stage, and we could respond to such threats as soon as possible.

Third, we implemented our model and tested it for various attack scenarios. Moreover, as a result of our improvement, the system has the capability of real-time processing and is therefore more practical than other models.

The remainder of this paper is organized as follows. In the next section, we describe the architecture of our proposed system and the details of each component. Then, we describe the correlation hierarchy and similarity functions of our system. Further, we compare our system with other correlation systems and illustrate the performance of our system, followed by an overview of previously proposed correlation mechanisms. Finally, in the last section, we summarize the paper and discuss future work.

System architecture

Our system consists of the five components shown in Fig. 1: Filter, Control center, Aggregator, Correlator, and Situator. We attached the filter to the sensor in each managed network and operated the other components from the control center.

Control center

The control center receives filtered alerts as Thread Events from the filter and saves them in

[Figure 1 (Overall system architecture): Filters attached to the Sensors in each managed network feed the Control center, which contains the Aggregator, Correlator, Situator, and databases.]


[Figure 2 (Internal architecture and processing flow of filter): alerts flow from the Sensor through the Alert Receiver Module and Alert Queue to the ThreadEvent Maker Module, which searches previous Thread Events in the ThreadEvent Table and inserts new ones, and on to the ThreadEvent Sender Module, which forwards Thread Events to the Center when the Timer fires.]

a database before forwarding them to the aggregator and the situator for further processing. When the system is started, the control center initializes a runtime environment by connecting to a database, setting the parameters, and so on. The viewers that are used for inspecting the processed information in each component and the data structure are also defined in the control center.

Filter

The filter gathers alerts from the sensor in each managed network and eliminates redundancies among those alerts. The features that the filter uses to eliminate the redundancies are the source and class of the attack. The filter merges the redundant alerts into Thread Events and forwards them to the control center at regular intervals. The filter consists of three modules: an Alert Receiver, a ThreadEvent Maker, and a ThreadEvent Sender. The alert receiver forms one process and the other two modules behave as multiple threads in a single process. Fig. 2 shows the internal architecture and processing flow of the filter.

Alert receiver: the primary sensor of our system is the widespread NIDS Snort. The alert receiver receives alerts from the sensor in the form of an Alertpkt struct type (Alertpkt is a data structure of the Snort log) and sends them to the alert queue. The alert queue saves the alerts in the order of arrival.

ThreadEvent Maker: after receiving the alerts from the queue, the ThreadEvent Maker compares them with previous alerts. If exact matches exist between the alerts, the ThreadEvent Maker merges the alerts into a matching thread event. Otherwise, if there is no match in the source or class of the attack, a new thread event is generated. Fig. 3 shows the flow diagram of the ThreadEvent Maker.
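The merge rule just described (merge on matching source IP and attack class, otherwise open a new thread event) can be sketched in Python. The class and field names below are our own illustration, not the authors' implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ThreadEvent:
    src_ip: str
    attack_class: str
    alerts: list = field(default_factory=list)

def process_alert(table, alert):
    """Merge an alert into the thread event keyed on its source IP and
    attack class, or create a new thread event when no key matches."""
    key = (alert["src_ip"], alert["attack_class"])
    if key in table:
        table[key].alerts.append(alert)   # redundant alert: merge it
    else:
        table[key] = ThreadEvent(alert["src_ip"], alert["attack_class"], [alert])
    return table[key]
```

A dict keyed on (source IP, attack class) stands in for the ThreadEvent Table; the sender module would flush and reset it each timer interval.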

ThreadEvent Sender: the ThreadEvent Sender transfers the thread events to the control center at predefined intervals. That is, whenever the alert aggregation interval defined in the timer expires, the ThreadEvent Sender stops updating the thread event table and transfers the thread events to the control center for further processing. The timer is reconfigurable according to the status of the network. In our experiment, we set the timer to 1 min.

Aggregator

The aggregator compares the similarity of features between the thread events transferred from each filter. If common features exist between two thread events, the aggregator merges them into one meta event named an Aggregation Event. The aggregator can merge thread events that were not merged into a similar thread event in the filter, because the aggregator has a longer merging interval than the filter.

Fig. 4 illustrates a diagram of the processing flow of the aggregator. When new thread events are transferred into the control center, the network module that is communicating with each managed network calls the aggregator. The aggregator then extracts the previous aggregation events generated for a certain period of time from the database and, using the similarity functions defined in section


[Figure 3 (Flow diagram of ThreadEvent Maker): dequeue an AlertEvent from the Alertqueue, waiting if the queue is empty; if the ThreadEvent Table has a meta ThreadEvent with the same source IP and attack class, create a sub ThreadEvent and attach it to that meta ThreadEvent; otherwise create a new meta ThreadEvent and save it to the ThreadEvent Table.]

Similarity functions, compares them with the newly transferred thread events to determine whether they have common features. If they have common features that satisfy the predefined conditions, the aggregator updates the previous aggregation event to include the new thread event. Otherwise, the aggregator generates a new aggregation event. Table 1 shows the weight of each feature that is used to merge the thread events in the aggregator. To aggregate the duplicated alerts, we suppressed the minimum expectation of similarity on the source, destination, and attack class of the attack. We also relaxed the minimum expectation and similarity expectation of the time in order to aggregate thread events that could not be merged in the filter due to the short merging interval.
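A minimal sketch of how such weighted feature matching might work, in the spirit of the probabilistic correlation of Valdes and Skinner: each feature similarity is checked against its minimum expectation (a failed check vetoes the merge), and the surviving similarities are combined as a weighted average using the similarity expectations as weights. The numeric values assigned to the Low/Medium/High levels here are invented for illustration; the paper gives only the qualitative levels:

```python
# Hypothetical numeric mapping for the qualitative levels in Tables 1 and 2.
LEVEL = {"Low": 0.2, "Medium": 0.5, "High": 0.9}

def overall_similarity(feature_sims, expectations, minimums):
    """Weighted-average similarity over features: any feature whose
    similarity falls below its minimum expectation vetoes the merge
    (returns 0.0); otherwise the similarity expectation acts as the
    feature's weight."""
    num = den = 0.0
    for name, sim in feature_sims.items():
        if sim < LEVEL[minimums[name]]:
            return 0.0                      # minimum expectation not met
        w = LEVEL[expectations[name]]       # similarity expectation as weight
        num += w * sim
        den += w
    return num / den if den else 0.0
```

Two events then merge when the returned value exceeds a configured threshold; lowering a feature's minimum expectation makes that feature less able to block a merge.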

Correlator

By analyzing the timing and causal relation between aggregation events, the correlator can catch attack scenarios that are carried out in multiple steps and accumulate a store of knowledge about new attack patterns. We therefore suppressed the minimum expectation of similarity on the source and destination of the attack, as shown in Table 2. We also enforced the similarity expectation on the source and destination of the attack. Moreover, to correlate various attacks with the same destination, the minimum expectation and similarity expectation of the attack class are set to low.

As shown in Fig. 5, which illustrates a diagram of the processing flow of the correlator, the correlator only processes aggregation events. When a new

[Figure 4 (Processing flow diagram of aggregator): Thread Events from the sensors in Networks 1–3 arrive at the Control center's network module, which saves them to the Thread Event database and calls the aggregator (Fusion); the aggregator selects previous meta events, calculates feature and overall similarity, and updates or creates Aggregation Events in the Aggregation Event database.]


Table 1  Weight of each feature in the aggregator

Feature       Source IP  Source port  Destination IP  Destination port  Attack class  Time
Expectation   Medium     Low          Medium          Low               High          Medium
Minimum       High       Low          High            Low               High          Medium

Expectation: similarity expectation. Minimum: minimum expectation of similarity.

aggregation event is transferred from the aggregator, the correlator selects the previously generated Correlation Events within certain periods. If there is a matching event in the lists of selected events, the new aggregation event is merged into that correlation event with the time information. Otherwise, a new correlation event is generated.
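The window-based merge step can be sketched as follows. The window length and the matching predicate are placeholders, since the paper does not fix these values:

```python
def correlate(correlation_events, agg_event, match_fn, window=600.0):
    """Merge a new aggregation event into a recent correlation event
    (an attack scenario) or open a new scenario.  `window` (seconds)
    and `match_fn` are illustrative parameters, not values from the
    paper."""
    now = agg_event["time"]
    # Only correlation events updated within the time window are candidates.
    recent = [c for c in correlation_events if now - c["last_time"] <= window]
    for c in recent:
        if match_fn(c["steps"][-1], agg_event):
            c["steps"].append(agg_event)    # extend the multistep scenario
            c["last_time"] = now
            return c
    new = {"steps": [agg_event], "last_time": now}
    correlation_events.append(new)
    return new
```

In the actual system the predicate would be the weighted similarity check of Table 2 (high minimums on source and destination IP, low on attack class), so that different attack classes against the same host chain into one scenario.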

The correlator can provide us with important information about the similarity of an attack class. For example, the attack scenarios detected in the correlator may consist of related attacks, and these attacks can be considered as similar. Therefore, to construct a more precise matrix, the results of the correlator should be fed back to the similarity matrix.

Situator

The situator grasps the trend of attacks being generated in the network by analyzing the relations between the source and the destination. This capability enables early detection of large-scale attacks that originate from many attackers around the world, such as DDoS and worms, and it reduces the response time.

The situator can detect three types of attack: 1:N, N:1, and M:N. A 1:N attack originates from a single source and targets multiple destinations, as in a network scan or a service scan. In contrast, an N:1 attack originates from multiple sources and targets a single destination.

One example of an N:1 attack is a DDoS, and such attacks tend to increase without warning. Therefore, by analyzing the attack trends in a network, we can detect attacks in the early stage. As with a worm or virus, an M:N attack has the entire network as its destination. While this type of attack generates a small number of events for a specific source and destination, it generates a great number of events in the entire network.

Fig. 6 shows the internal architecture and a simple flow diagram of the situator. The situator first saves each thread event that is transferred from the filters in candidate lists. If the number of thread events saved in a candidate list exceeds a predefined threshold, the situator classifies them into the corresponding situation and generates Situation Events. A human operator can reconfigure the threshold according to the status of the managed network or the trend of the current attacks. Fig. 7 shows a more detailed flow diagram of the situator.
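The candidate-list counting behind the 1:N and N:1 situations can be sketched as follows; the threshold value and the field names are illustrative only:

```python
from collections import defaultdict

def classify_situations(thread_events, threshold=3):
    """Count distinct peers per source and per destination; when a
    candidate list reaches the threshold, emit a situation event.
    The threshold is illustrative and would be operator-tunable."""
    by_src = defaultdict(set)   # 1:N candidates: one source, many targets
    by_dst = defaultdict(set)   # N:1 candidates: many sources, one target
    for ev in thread_events:
        by_src[ev["src_ip"]].add(ev["dst_ip"])
        by_dst[ev["dst_ip"]].add(ev["src_ip"])
    situations = []
    for src, dsts in by_src.items():
        if len(dsts) >= threshold:
            situations.append(("1:N", src))      # e.g. network/service scan
    for dst, srcs in by_dst.items():
        if len(srcs) >= threshold:
            situations.append(("N:1", dst))      # e.g. DDoS
    return situations
```

An M:N (worm-like) situation would be flagged analogously, from the total event count across the whole network rather than from any single source or destination list.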

Hierarchy of correlation

Each component of our system achieves the following hierarchy of correlation and, to get the correlation at different stages of the hierarchy, we use multiple events. For example, we can infer thread events (within-a-sensor correlation) and then merge them into aggregation events or situation events for center-level inspection. The aggregation events are correlated into the correlation events again in order to construct the attack scenarios of multistep attacks. Fig. 8 shows the overall hierarchy of correlation.

Thread event: a thread event is the primary information unit in our system. The filter first eliminates the redundancies among a large number of raw alerts and merges them into a small number of thread events. In this process, the filter does not use the similarity functions defined in section Similarity functions. Instead, the filter simply compares the source and the attack class. The thread events generated in the filter are transferred to the control center for further processing.

Aggregation event: by setting the minimum expectation of similarity on the source, the destination IP address, and the attack class as high, and by relaxing the similarity expectation for the time as

Table 2  Weight of each feature in the correlator

Feature       Source IP  Source port  Destination IP  Destination port  Attack class  Time
Expectation   High       Low          High            Low               Low           Low
Minimum       High       Low          High            Low               Low           Low


shown in Table 1, we can merge more thread events from several sensors into a single aggregation event.

Correlation event: by relaxing the minimum expectation of similarity on the attack class, as shown in Table 2, we were able to reconstruct various steps of a multistep attack. Each step of an attack may itself be an aggregation event. In view of this possibility, we can recognize a multistep attack composed of, for example, a probe followed by an exploit to gain access to a critical host, and then the use of that host to launch an attack on a more critical asset. To correlate the aggregation events, we also enforced the minimum expectations on the source and destination of the attack.

Situation event: a situation event, which is independent of the aggregation event and the correlation event, comprises only thread events. As with the filter, the similarity functions are not used when the situation events are generated.

Similarity functions

Insofar as we consider the similarity of features, the minimum expectation of similarity, and the expectation of similarity, our correlation approach is similar to the probabilistic alert correlation

[Figure 5 (Processing flow diagram of correlator): the correlator reads Aggregation Events from the Aggregation Event database, calculates similarity using the Similarity Matrix, updates or creates Correlated Meta Events in the Correlated Meta Event database, and feeds its results back into the Similarity Matrix.]

Figure 6 Internal architecture and processing flow diagram of situator. (Figure: thread events from the worm, DDoS, and scan detectors are routed to the M:N, N:1, and 1:N situators and their N:1 and 1:N candidate lists; the situator outputs situation events.)

proposed in Valdes and Skinner (2001). However, as mentioned in section Introduction, we introduce a more systematic approach by referring to earlier research. To correlate meta events that are possibly composed of several alerts, we defined similarity functions for a list value. Features used in analyzing the similarity include the IP address, the port, the attack class of the attack, and the time information. In this section, we describe only the basic similarity functions.

IP address similarity

If the sources of two different events (or attacks) belong to the same sub-network, there is a greater probability that the same attacker launched the two events. This probability may increase exponentially as the matching address becomes longer. We can infer, therefore, that the similarity of IP addresses agrees with the log scale. The similarity function for the IP address is defined as follows, and its value can be readjusted to a realistic level through more experiments.

IPsimilarity (string IP1, string IP2) {
  If perfect match, return 1;
  If C class match, return 0.8;
  If B class match, return 0.4;
  If A class match, return 0.2;
  Return 0;
}
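The pseudocode above transliterates directly into Python. This sketch follows the paper's octet-boundary scoring; as the authors note, the returned values may need to be readjusted experimentally.

```python
def ip_similarity(ip1: str, ip2: str) -> float:
    """IP address similarity on a log-like scale: the longer the
    matching octet prefix, the higher the score (per the paper)."""
    a, b = ip1.split("."), ip2.split(".")
    if a == b:
        return 1.0   # perfect match
    if a[:3] == b[:3]:
        return 0.8   # class C (same /24) match
    if a[:2] == b[:2]:
        return 0.4   # class B (same /16) match
    if a[:1] == b[:1]:
        return 0.2   # class A (same /8) match
    return 0.0
```

For example, `ip_similarity("192.168.1.10", "192.168.1.20")` returns 0.8, since only the last octet differs.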

Port similarity

There may be a high probability that the port list of input events will become a subset of the meta event's list, because newly input events are generally lower-level than the earlier meta events. We therefore defined the similarity between the port lists of two events as the mean of the similarity between each input event and the meta event. If the input event has a port list L1 = {x1, x2, ..., xn} and the meta event has a port list L2 = {y1, y2, ..., ym}, then the similarity S between L1 and L2 is defined as follows.

S_i = max_{1≤j≤m} Similarity(x_i, y_j)

S = (1/n) Σ_{i=1}^{n} S_i
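A minimal Python sketch of this list similarity. The per-port function Similarity(x_i, y_j) is not specified in this excerpt, so exact match is used below as a stand-in assumption.

```python
def port_list_similarity(l1, l2, port_sim=None):
    """S = (1/n) * sum_i max_j Similarity(x_i, y_j): each input port
    is matched against its best counterpart in the meta event's port
    list, and the per-port scores are averaged over the input list."""
    if port_sim is None:
        # Assumed per-port similarity: exact match (not from the paper).
        port_sim = lambda x, y: 1.0 if x == y else 0.0
    if not l1 or not l2:
        return 0.0
    return sum(max(port_sim(x, y) for y in l2) for x in l1) / len(l1)
```

With the exact-match assumption, `port_list_similarity([21, 22], [21, 80])` gives 0.5: port 21 matches perfectly, port 22 not at all.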

Attack class similarity

Because heterogeneous sensors may use a different name for the same attack, it is difficult to


Real-time analysis of intrusion detection alerts via correlation 175

Figure 7 Detailed processing flow of situator. (Figure: an incoming thread event first updates a matching M:N situation if one exists; otherwise a matching N:1 situation; otherwise a matching 1:N situation. Over the union of N:1 and 1:N situations whose attack class equals the event's, the situator inserts an M:N situation when Src_Cnt > T_N1 and Dst_Cnt > T_1N; otherwise it increases the N:1 candidates when Src_Cnt > T_N1 and the 1:N candidates when Dst_Cnt > T_1N.)

correlate alerts from heterogeneous sensors. For that reason, we need to define a definite set of attack classes and classify various attacks into the corresponding class according to their characteristics.

This potential problem between heterogeneous sensors does not occur in our currently developed system, since we use only one sensor, Snort. However, to enable our system to correlate heterogeneous sensors in an integrated system, we need to define the attack class. In our current system, we used the 34 types of attacks2 that Snort provides; in the future, we hope to offer clearer definitions of the attack classes.

To define the similarity between two attack classes, we constructed a similarity matrix, S. The size of the S matrix is 34 by 34, and each value in the matrix is between 0 and 1. If the attack class of the input event is i and the attack class of the meta event is j, the similarity between the two attack classes is defined as S[i, j]. To establish the initial values of the similarity matrix of the attack classes, we statistically analyzed a DARPA data set and real intrusion detection alerts.
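In code, the lookup itself is trivial; what matters is the matrix content. The initialization below is purely illustrative (identical classes fully similar, a small default elsewhere), whereas the paper derives its initial values statistically from the DARPA data set and live alerts.

```python
N_CLASSES = 34  # Snort's default classification set

# Illustrative initial values only: the diagonal (same class) is 1.0
# and every other pair gets a small placeholder similarity.
S = [[1.0 if i == j else 0.1 for j in range(N_CLASSES)]
     for i in range(N_CLASSES)]

def attack_class_similarity(i: int, j: int) -> float:
    """Similarity between attack class i (input event) and attack
    class j (meta event) is a direct matrix lookup, S[i][j]."""
    return S[i][j]
```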

Time similarity

The time information is important in alert correlation, and time similarity has great significance when we calculate the overall similarity. For example, if most features of the two events are similar, and if the two events are extremely

2 ‘‘34 types of attacks’’ means Snort default classifications. The ‘classtype’ field of a Snort alert uses 34 classifications defined by the classification config option. The classifications used by the rules provided with Snort are defined in etc/classification.config.

distant in time, we should regard the two events as having no similarity. According to the trend analysis of exploitations conducted by Browne et al. (2001), the number of incidents caused by one exploit can be modeled with the following formula.

C = I + S·√M

C: the cumulative count of reported incidents
I, S: the regression coefficients determined by analysis of the incident report data
M: the time since the start of the exploit cycle

The above formula indicates that most events (or incidents) are generated in the early stage of an exploit cycle, and the number decreases with time. Consequently, the time similarity between a newly input event and an earlier meta event increases as the two events are closer in time, and decreases rapidly as they grow farther apart. If the creation time of the input event is t2 and the

Figure 8 Overall hierarchy of correlation. (Figure: sensor alerts pass through the filter into thread events; the aggregator, correlator, and situator then produce aggregation events, correlation events, and M:N, N:1, and 1:N situation events, respectively.)

creation time of the meta event is t1, then the time similarity S is defined as follows.

S = I − V·√(t2 − t1)

I, V: the coefficients determined by the status of the network
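A hedged Python rendering of the time similarity. The default coefficient values and the clamping to [0, 1] are assumptions, since the paper says only that I and V depend on the network status.

```python
import math

def time_similarity(t1: float, t2: float, i_coef: float = 1.0,
                    v_coef: float = 0.1) -> float:
    """S = I - V * sqrt(t2 - t1), where t1 is the creation time of the
    meta event and t2 that of the input event. Similarity decays with
    the square root of the time gap; the result is clamped to [0, 1]."""
    gap = max(t2 - t1, 0.0)
    return min(max(i_coef - v_coef * math.sqrt(gap), 0.0), 1.0)
```

With the placeholder defaults, two events 25 s apart score 1 − 0.1·5 = 0.5.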

Overall similarity

After calculating the similarity of each feature, we need to calculate the overall similarity in order to decide whether the two events can be correlated. When calculating overall similarity, the expectation of similarity and the minimum expectation of similarity play important roles as a weight and as a necessary condition, respectively. By using the expectation of similarity, we can attach importance to the significant features. The minimum expectation of similarity is used as a threshold value. For instance, certain features can be required to match exactly or approximately for an event to be considered as a candidate for correlation with another. The minimum expectation thus expresses the necessary but not sufficient conditions for correlation.

If any overlapping feature matches at a value less than the minimum similarity for the feature, the overall similarity between the two events is zero. Otherwise, the overall similarity is the weighted average of the similarities of the overlapping features, using the respective expectations of similarity as weights.

As with the probabilistic approach (Valdes and Skinner, 2001), we can define the overall similarity between a new event, X, and an earlier event, Y, as follows.

SIM(X, Y) = Σ_j E_j·SIM(X_j, Y_j) / Σ_j E_j

j: index over features in the event
E_j: expectation of similarity for feature j
X_j, Y_j: values for feature j in events X and Y, respectively
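The two-step rule above (minimum expectations as a gate, expectations as weights) can be sketched as follows. The feature names and the numeric weights standing in for "High"/"Low" are illustrative assumptions, not values from the paper.

```python
def overall_similarity(x, y, expectation, minimum, feature_sims):
    """Weighted average of per-feature similarities over the features
    the two events share; any shared feature scoring below its minimum
    expectation zeroes the overall similarity (necessary condition)."""
    shared = [f for f in x if f in y and f in feature_sims]
    if not shared:
        return 0.0
    num = den = 0.0
    for f in shared:
        s = feature_sims[f](x[f], y[f])
        if s < minimum.get(f, 0.0):
            return 0.0          # fails a necessary condition
        num += expectation[f] * s
        den += expectation[f]
    return num / den

# Illustrative setup: exact-match similarity for the source, a fixed
# class similarity, and numeric stand-ins for High/Low expectations.
sims = {"src": lambda a, b: 1.0 if a == b else 0.0,
        "cls": lambda a, b: 0.5}
E = {"src": 1.0, "cls": 0.3}    # expectations (weights)
M = {"src": 0.9, "cls": 0.0}    # minimum expectations (gates)
```

Two events sharing a source but differing in class score (1.0·1.0 + 0.3·0.5)/1.3 ≈ 0.885, while a mismatched source falls below its 0.9 minimum and forces the overall similarity to 0.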

After the overall similarities are calculated between a new event and earlier candidate events, the largest value becomes a candidate value for correlation. If the candidate value is larger than the predefined threshold, the new event and the corresponding meta event are merged into another meta event. Otherwise, a new meta event is generated from the new event.

Analysis and performance evaluation

In this section, we describe the differences between our system and previous correlation systems. We then present the results of the performance evaluation.

Comparison with previous correlation systems

We based our proposed system on probabilistic alert correlation (Valdes and Skinner, 2001). That approach is similar to ours in so far as the correlation components use feature-specific similarity functions and a probabilistic criterion. Our system, however, has some distinctive characteristics. First, our system uses the filter in each managed network to merge duplicate alerts into thread events; and, before conducting further correlation, the thread events that are transferred to the control center are merged again by the aggregator. That is, although all alerts are considered the subject of correlation in the probabilistic approach, in our system only the alerts (or events) preprocessed in the filter and the aggregator are correlated. This property can drastically reduce the overhead of our system. Second, in contrast to our system, the probabilistic approach does not consider time information as a significant feature. For example, in the probabilistic approach, the thread and scenario aggregation may occur over intervals of days. Third, to construct the attack class similarity matrix, we statistically analyze the DARPA data set and the live alerts collected from our network.

This approach enables us to reflect recent attack trends and to construct a more substantial matrix.

Our system is more flexible than rule-based approaches such as the Stanford CIDF correlator (Perrochon et al., 2000) and the planning process model (Cuppens et al., 2002). For instance, it can find a new multistep attack and obtain a realistic result in the correlation process. Moreover, while most correlation systems (Perrochon et al., 2000; Cuppens et al., 2002; Debar and Wespi, 2001; Valdes and Skinner, 2001; Porras et al., 2002; Ning et al., 2002; Lee et al., 2000) cannot detect large-scale attacks such as DDoS or worms in the early stage, our system can detect such attacks in real time.


Performance evaluation

To assess the processing power and efficiency of our system, we measured the reduction ratio of the filter and the correlator, and the processing time of each component. To evaluate the correlation performance, we also conducted the following scenario-based test using known techniques and exploits.

- Scenario #1: stealth scan to a specific host
- Scenario #2: Buffer Overflow attack to FTP server
- Scenario #3: CGI attack to Web server
- Scenario #4: Buffer Overflow attack to RPC service
- Scenario #5: network scan to multiple hosts [1:N attack]
- Scenario #6: DDoS attack [N:1 attack]
- Scenario #7: attack using worm and virus [N:M attack]

Although useful for evaluating the performance of the IDS, the DARPA data set is unsuitable for evaluating the correlation system. We therefore conducted known attack scenarios to inspect whether our correlation component successfully detects multistep attacks and large-scale attacks in the early stage. In this paper, we only present the results of Scenarios #2, #4, and #6.

Reduction ratio of filter and aggregator
We measured the reduction ratio of the filter and the aggregator by dividing the number of events generated in the filter or the aggregator by the total number of alerts generated in the test period. The results are shown in Table 3.

When we set the timer in the filter to 1 min, the average reduction ratio of the filter is 11.1%. In the case of the ICMP Nachi Worm by Ping CyberKit, the filter on average merged 20 alerts into a single thread event, and the maximum reduction ratio was 5%.
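The reduction ratios reported in Table 3 can be checked directly from the reported counts:

```python
total_alerts = 599_403        # alerts generated during the test period
thread_events = 66_775        # output of the filter
aggregation_events = 33_173   # output of the aggregator

filter_ratio = thread_events / total_alerts        # fraction surviving the filter
aggregator_ratio = aggregation_events / total_alerts

print(f"filter: {filter_ratio:.1%}, aggregator: {aggregator_ratio:.1%}")
# prints "filter: 11.1%, aggregator: 5.5%"
```

The maximum filter ratio of 5% likewise matches the 20-to-1 merge reported for the Nachi worm alerts (1/20 = 5%).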

Processing time of each component
To assess the processing efficiency of each component, we measured the time in which a single thread event (which may include several alerts) was processed completely in each component.

As shown in Table 4, our system has a real-time processing capability, since a single thread event can be processed completely in all components within 0.8 s. Furthermore, in contrast to other systems, in which most processes are conducted manually, the automation in our system can drastically reduce the management overhead of human operators and simplify analysis. Moreover, by providing useful information for the reaction of human operators in real time, our system can reduce the response time.

Performance evaluation of correlator (Scenario #2)
Scenario #2 is a multistep attack that exploits the vulnerabilities of an FTP server and consists of the following steps.

- Step 1: host scan (detect whether FTP service is running)
- Step 2: login attempt to detect the type of FTP server
- Step 3: execution of exploit code to attack the vulnerabilities of the FTP server
- Step 4: system file access after acquiring the privilege of root

In this scenario, an attacker may use NMAP (NMAP) to try to detect whether an FTP service is running on the target host. According to the results of NMAP, the attacker can determine whether the target host runs the FTP service. The attacker then logs into the target host to detect the type of FTP server that is running. Finally, the attacker learns that WU-FTP is running.

The attacker exploits the specific vulnerability of the WU-FTP server as shown in Fig. 9. If the exploitation succeeds, the attacker can access

Table 3  Reduction ratio of filter and aggregator

Total number of alerts generated within the period of test: 599,403

Component   Measure                                                          Value
Filter      # of thread events                                               66,775
            Average reduction ratio (# of thread events/# of alerts)         11.1%
Aggregator  # of aggregation events                                          33,173
            Average reduction ratio (# of aggregation events/# of alerts)    5.5%

Table 4  Processing time of each component

Component                               Processing time (s)
Thread event saving in control center   0.0097
Aggregator                              0.5398
Correlator (includes event update)      0.0887
Situator                                0.1691
Total                                   0.8103


Figure 9 Buffer Overflow attack to FTP server.

the server using the username ‘ftp’ and gain the privilege of the system administrator. The attacker then transfers the significant system file (that is, /etc/passwd) to the attacker’s host.

Fig. 10 shows the thread events that are generated as a consequence of a Buffer Overflow attack to an FTP server. As shown in Fig. 10, current IDSs usually provide the detected alerts as they are,

Figure 10 Detection results of FTP server exploit: thread event.


Figure 11 Results of correlation with FTP server exploit in correlator.

and they can’t represent the relation between those alerts. For that reason, IDSs overwhelm human operators with a large volume of alerts and make it difficult to analyze the attack trends.

Our system, however, can analyze alerts and represent their relations by using our correlation mechanism as shown in Fig. 11. The various events shown in Fig. 11 are represented as a single meta event because they have the same source and target of the attack. More detailed information, such as the low-level events (or attacks) of the correlation event, is also provided.

Performance evaluation of correlator (Scenario #4)
Scenario #4 is a multistep attack that exploits the vulnerabilities of an RPC service and consists of the following steps.

- Step 1: host scan using NMAP (detect whether an RPC service is running)
- Step 2: execution of exploit code to attack the vulnerabilities of the RPC service
- Step 3: system file access after acquiring the privilege of root (root shell)

Fig. 12 shows all the steps. An attacker uses NMAP to detect whether the RPC service is running on the known host. With the result of NMAP, the attacker can determine whether the target host runs the RPC service. The attacker then executes the exploit code against the specific vulnerability of the RPC service. If the exploit succeeds, the attacker can gain the privilege of the system administrator and may access the significant system file (that is, /etc/passwd).

Fig. 13 shows the thread events transferred to the control center. Because these thread events are mingled with other thread events from the multiple sensors, it is difficult for a human operator to analyze the relation between those events.

However, our system can find out the timing and causal relations between the mingled thread events and successfully correlate them into a single meta event. As shown in Fig. 14, a series of attacks is correlated into a correlation event, and the name of the event is made up of the source and destination of the attack.

Fig. 15 shows the attack scenario that our system finds in real time, and the results correspond with our intended attack steps. More detailed information, such as a list of attack signatures, is also provided.

Performance evaluation of situator (Scenario #6)
To evaluate the performance of our situator, we classified the attack scenarios as 1:N, N:1, and N:M. In all cases (Scenarios #4, #5, and #6), we achieved the intended results in real time, but here we describe only the results of Scenario #6.

Scenario #6 is a many-to-one attack such as a DDoS attack. A DDoS attack, which is usually

Figure 12 Buffer Overflow attack to RPC service.


Figure 13 Detection results of RPC exploit: thread event.

carried out in order to interrupt the service provision or normal operation of a specific host, causes a great deal of overhead in a managed network. Most IDSs, however, can’t detect such an attack in the early stage.

We emulated a DDoS attack with the aid of an ICMP Flooder. As shown in Fig. 16, the ICMP Flooder continuously transfers large packets to the target of the attack.

Fig. 17 shows the thread events that were transferred to the control center. The thread events generated by our emulated DDoS attack are the events included in the rectangles. As you can see in the Count field of the table, a DDoS attack usually generates a large volume of alerts within a short period. Whenever such a large volume of alerts is transferred to the control center without preprocessing (that is, merging in the filter), as is the case in most IDSs, human operators may be easily overwhelmed and react inappropriately. Our system, however, can reduce the numerous alerts to a small volume that human operators can easily handle. For example, more than a thousand alerts can be merged into just 10 thread events, as shown in Fig. 17.

Furthermore, in the case of alert flooding (i.e. when a large volume of alerts is generated in the managed network), the attack count of each thread event also shows the average and maximum reduction ratio of our system. The maximum reduction ratio of the filter is 0.78%; in other words, 129 alerts can be merged into a single thread event.

In Fig. 17, we can see that various attacks occurred in the managed network and that 10 of them are a similar type of attack. Our situator can detect such situations in the early stage of the attack and start the correlation process. Fig. 18 shows the final result of the correlation.

Numerous alerts were correlated into only one situation event in real time, since they had the same class and target of attack. In addition, we can easily find out which attackers execute a DDoS attack against the same target host and inspect the detailed attack signatures. When the same type of attack continuously invades the same target host, our situator updates the matching situation event and increases its AlertCount.

Related work

Several alert aggregation and correlation techniques (Perrochon et al., 2000; Cuppens, 2001a,b; Cuppens et al., 2002; Debar and Wespi, 2001; Valdes and Skinner, 2001; Porras et al., 2002; Ning et al., 2002; Morin et al., 2002) have been proposed to facilitate the analysis of intrusions. In their own ways, these approaches try to find the relationships between alerts and to generate significant information.

Figure 14 Result of correlation with RPC service exploit in correlator: correlation event.


Figure 15 Correlation event: more detailed information.

Perrochon et al. (2000) used predefined rules to correlate alerts and to find the attack scenarios. Cuppens (2001a,b) and Cuppens et al. (2002) used the Lambda language to specify attack scenarios and used Prolog predicates to correlate alerts based on the IDMEF data model. In Debar and Wespi (2001), an aggregation and correlation component was built into a Tivoli Enterprise Console. In Valdes and Skinner (2001), a probabilistic method was used to correlate alerts by using the similarity between their features. Porras et al. (2002) proposed a mission-impact-based approach to analyzing the security alerts produced by spatially distributed heterogeneous information security (INFOSEC) devices. They intended to provide analysts with a powerful capability to automatically fuse together and isolate the INFOSEC alerts that represent the greatest threat to the health and security of their networks. Ning et al. (2002) developed three utilities to facilitate the analysis of large sets of correlated alerts. In Morin et al. (2002), a formal data model called M2D2 was proposed in order to make full use of the available information. The effectiveness of the proposed aggregation and correlation algorithms depends heavily on the information provided by the individual IDS.

Conclusion and future work

In this paper, we propose a fast and efficient system for analyzing intrusion detection alerts. By analyzing and correlating a large volume of alerts with respect to feature similarity, our system can produce meaningful information that may be used in timely decisions and proper responses. Several properties distinguish our system from other systems: two-phase reduction of alerts in the filter and aggregator, the time similarity function, the situator, and the feedback mechanism for the attack class similarity matrix in the

Figure 16 Execution of DDoS attack to target host.


Figure 17 Detection results of DDoS attack: thread event.

correlator. The two-phase reduction of numerous alerts drastically reduces the management overhead and simplifies the analysis and correlation. The time similarity function is more systematically defined than in other systems, and it produces more accurate results in real situations. The situator enables us to detect large-scale attacks such as DDoS

or worms in the early stage. This ability is absent in other systems. The feedback mechanism for the attack class similarity matrix that we conceptually described is still under construction.

To better evaluate the performance of our system, we plan to use more attack scenarios. Moreover, in order to make a more flexible system

Figure 18 Detection result of DDoS attack in situator: situation event.


that can correlate heterogeneous sensors, we plan to introduce a host-based IDS and other NIDSs.

Acknowledgement

This work was supported by the Korea Science and Engineering Foundation (KOSEF) through the Advanced Information Technology Research Center (AITrc) and the University IT Research Center Project.

References

Bloedorn E, Christiansen AD, Hill W, Skorupka C, Talbot LM, Tivel J. Data mining for network intrusion detection: how to get started. MITRE Technical Report; August 2001.

Browne H, Arbaugh W, McHugh J, Fithen W. A trend analysis of exploitations. In: Proceedings of the 2001 IEEE symposium on security and privacy; May 2001. p. 214-29.

Bugtraq. Security focus online, <http://online.securityfocus.com/archive/1>.

CERT Coordination Center. CERT/CC advisories. Carnegie Mellon Software Engineering Institute. Online, <http://www.cert.org/advisories/>.

Cuppens F. Cooperative intrusion detection. In: International symposium ‘‘Information Superiority: Tools for Crisis and Conflict-Management’’. Paris, France; September 2001a.

Cuppens F. Managing alerts in a multi intrusion detection environment. In: 17th annual computer security applications conference (ACSAC). New Orleans; December 2001b.

Cuppens F, Autrel F, Miege A, Benferhat S. Correlation in an intrusion detection process. In: Internet security communication workshop (SECI02). Tunis, Tunisia; September 2002.

Debar H, Wespi A. Aggregation and correlation of intrusion-detection alerts. In: Proceedings of 2001 international workshop on recent advances in intrusion detection. Davis, CA; October 2001.

Kendall K. A database of computer attacks for the evaluation of intrusion detection systems. Master’s thesis. Massachusetts Institute of Technology; June 1999.

Lee W. A framework for constructing features and models for intrusion detection systems. PhD thesis. Columbia University; June 1999.

Lee W, Nimbalkar RA, Yee KK, Patil SB, Desai PH, Tran TT, et al. A data mining and CIDF-based approach for detecting novel and distributed intrusions. In: Proceedings of 2000 international workshop on recent advances in intrusion detection (RAID’00). Toulouse, France; October 2000.

Morin B, Me L, Debar H, Ducasse M. M2D2: a formal data model for IDS alert correlation. In: Proceedings of the fifth international symposium on recent advances in intrusion detection (RAID’02). LNCS 2516. Zurich, Switzerland; October 16-18, 2002. p. 115-37.

Ning P, Cui Y, Reeves DS. Analyzing intensive intrusion alerts via correlation. In: Proceedings of the fifth international symposium on recent advances in intrusion detection (RAID’02). LNCS 2516. Zurich, Switzerland; October 2002. p. 74-94.

NMAP network mapping tool, <http://www.insecure.org/nmap/>.

Perrochon L, Jang E, Luckham DC. Enlisting event patterns for cyber battlefield awareness. In: DARPA information survivability conference and exposition (DISCEX’00). Hilton Head, South Carolina; January 2000.

Porras P, Fong M, Valdes A. A mission-impact-based approach to INFOSEC alarm correlation. In: Fifth international workshop on the recent advances in intrusion detection. Zurich, Switzerland; October 2002.

Valdes A, Skinner K. Probabilistic alert correlation. In: Fourth international workshop on the recent advances in intrusion detection. Davis, USA; October 2001.

Soojin Lee received the B.S. degree in Computer Science from Korea Military Academy in 1992. He also received the M.S. in Computer Science from Yonsei University, South Korea, in 1996. He is currently working toward the Ph.D. degree at the Division of Computer Science, Korea Advanced Institute of Science and Technology (KAIST). His research interests include ad-hoc and sensor networks, cryptography, and computer security, especially intrusion detection systems.

Byungchun Chung received the B.E. degree in Information and Computer Engineering from Sungkyunkwan University, South Korea, in 1998. He also received the M.S. degree in Computer Science from Korea Advanced Institute of Science and Technology (KAIST) in 2001. He is currently working toward the Ph.D. degree at the Division of Computer Science, KAIST. His research interests include cryptography and computer security, especially elliptic curve cryptography.

Heeyoul Kim received the B.E. degree in the Division of Computer Science from Korea Advanced Institute of Science and Technology (KAIST), South Korea, in 2000, and the M.S. degree in Computer Science from KAIST in 2002. He is currently working toward the Ph.D. degree at the Division of Computer Science, KAIST. His research interests include cryptography and computer security, especially secure group communication.

Yunho Lee received the B.E. degree in the Division of Computer Science from Korea Advanced Institute of Science and Technology (KAIST), South Korea, in 2000, and the M.S. degree in Computer Science from KAIST in 2002. He is currently working toward the Ph.D. degree at the Division of Computer Science, KAIST. His research interests include cryptography and computer security, especially digital signatures.

Chanil Park received the B.S. degree in Mathematics from Inha University, South Korea, in 1999. He also received the M.S. degree in Mathematics from Korea Advanced Institute of Science and Technology (KAIST) in 2001. He is currently working toward the Ph.D. degree at the Division of Computer Science, KAIST. His research interests include cryptography and computer security, especially authentication.

Hyunsoo Yoon received the B.E. degree in electronics engineering from Seoul National University, South Korea, in 1979, the M.S. degree in Computer Science from Korea Advanced Institute of Science and Technology (KAIST) in 1981, and the Ph.D. degree in computer and information science from the Ohio State University, Columbus, Ohio, in 1988. From 1988 to 1989, he was a member of technical staff at AT&T Bell Labs. Since 1989 he has been a faculty member of the Division of Computer Science at KAIST. His main research interests include wireless sensor networks, 4G networks, and network security.


Computers & Security (2006) 25, 184-189

www.elsevier.com/locate/cose

A novel remote user authentication scheme using bilinear pairings

Manik Lal Das a,b,*, Ashutosh Saxena a, Ved P. Gulati a, Deepak B. Phatak b

a Institute for Development and Research in Banking Technology, Castle Hills, Road Number 1, Masab Tank, Hyderabad-500057, India
b K. R. School of Information Technology, Indian Institute of Technology, Mumbai-400076, India

Received 20 January 2005; revised 16 August 2005; accepted 23 September 2005

KEYWORDS: Authentication; Bilinear pairings; Smart card; Password; Timestamp

Abstract  The paper presents a remote user authentication scheme using the properties of bilinear pairings. In the scheme, the remote system receives the user login request and allows login to the remote system if the login request is valid. The scheme prohibits the scenario of many logged in users with the same login-ID, and provides a flexible password change option to the registered users without any assistance from the remote system.
© 2005 Elsevier Ltd. All rights reserved.

Introduction

Password authentication is an important technique to verify the legitimacy of a user. The technique is regarded as one of the most convenient methods for remote user authentication. Based on the computational complexity, password-based authentication schemes are classified into two broad

* Corresponding author. Institute for Development and Research in Banking Technology, Castle Hills, Road Number 1, Masab Tank, Hyderabad-500057, India. Tel.: +91 40 2353 4981; fax: +91 40 2353 5157.

E-mail addresses: [email protected], [email protected] (M.L. Das), [email protected] (A. Saxena), [email protected] (V.P. Gulati), [email protected] (D.B. Phatak).

0167-4048/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cose.2005.09.002

categories, viz. hash-based (Menezes et al., 1996) authentication and public-key based authentication (IEEE P1363.2 Draft D12, 2003).

In 1981, Lamport introduced the first well-known hash-based password authentication scheme. Lamport's scheme suffers from high hash overhead and password resetting problems. Later, Shimizu et al. (1998) overcame the weaknesses of Lamport (1981) and proposed a modified scheme. Thereafter, many schemes and improvements (Lee et al., 2002; Peyravian and Zunic, 2000; Ku et al., 2003; Ku, 2004) on hash-based remote user authentication have been proposed. These schemes have low computation cost and are computationally viable for implementation in a handheld device like a smart card; however, the schemes primarily suffer from password guessing, stolen-verifier and


A novel remote user authentication scheme 185

denial-of-service attacks (Ku et al., 2003; Hsieh et al., 2003). In contrast, public-key based authentication schemes require high computation cost for implementation, but meet higher security requirements. So far, several research works on public-key based remote user authentication (Chang and Wu, 1993; Chang and Liao, 1994; Hwang and Yeh, 2002; Shen et al., 2003) have been done. Unfortunately, many times, a paper typically breaks a previous scheme and proposes a new one (Ku et al., 2003; Hsieh et al., 2003), which someone breaks later and, in turn, proposes a new one, and so on. Most of such work, though quite important and useful, essentially provides an incremental advance to the same basic theme (Peyravian and Zunic, 2000).

Recently, the bilinear pairings (Boneh and Franklin, 2001), namely the Weil pairing and the Tate pairing of algebraic curves, have found important applications in cryptography (Boneh and Franklin, 2001; Hess, 2003) and have allowed the construction of identity (ID) based cryptographic schemes. In 1984, Shamir introduced the concept of the ID-based cryptosystem; however, practical ID-based schemes (Boneh and Franklin, 2001; Cocks, 2001) were found only in 2001.

In this paper, we present a remote user authentication scheme using the properties of bilinear pairings. In our scheme, the user is issued a smart card, which is personalized with some parameters during the user registration process. The use of a smart card not only makes the scheme secure but also prevents users from distributing their login-IDs, which effectively prevents the scenario of many logged-in users with the same login-ID. The characteristics of our scheme are summarised as follows:

- The user's smart card generates a dynamic login request and sends it to the remote system to log into the system. The login request is computed internally by the smart card without any human intervention and incorporates the user system's timestamp. Thus, an adversary cannot predict the next login request from the current one.

- The users can choose and change their preferred passwords freely without any assistance from the remote system. During the user registration process, the remote system stores a secret component and other parameters in a smart card, and then sends it to the user securely. With the help of the smart card and its secret component, the user can change his password without any assistance from the remote system.

- The remote system does not maintain any password or verifier table for the verification of user login requests. The login request verification requires only the user identity and the remote system's public key corresponding to the remote system's secret key.

- The scheme prevents the scenario of many logged-in users with the same login-ID. Typically, a registered user can share his password or secret component with others; thus all who know the password or secret component with respect to the user's login-ID can log into the remote system. This generally happens in digital libraries, where a subscriber can share his login-ID and password with others, and many users (who know the login-ID and password) can download or view the digital documents. In our scheme, the login request is generated by the smart card using its stored secret component without any human intervention. It is extremely difficult to extract the secret component from the smart card, and thus the user cannot share it with others. Even if the legitimate user's password is shared with others, the other person cannot log into the system without the smart card. Once a valid user logs into the remote system, his smart card remains inside the terminal until the user logs out. If the user pulls the card out of the terminal after logging into the remote system, the login session expires immediately. Thus, the scheme can successfully prevent the scenario of many logged-in users with the same login-ID.

- The scheme can resist replay, forgery and insider attacks.

The rest of the paper is organised as follows. The next section gives some preliminaries on bilinear pairings. The section following that proposes our scheme, which is then analysed in the section Correctness, performance and security. Finally, we conclude the paper in the last section.

Preliminaries

Bilinear pairings

Suppose G1 is an additive cyclic group generated by P, whose order is a prime q, and G2 is a multiplicative cyclic group of the same order. A map e: G1 × G1 → G2 is called a bilinear mapping if it satisfies the following properties:

1. Bilinear: e(aP, bQ) = e(P, Q)^ab for all P, Q ∈ G1 and a, b ∈ Zq*.



2. Non-degenerate: there exist P, Q ∈ G1 such that e(P, Q) ≠ 1.

3. Computable: there is an efficient algorithm to compute e(P, Q) for all P, Q ∈ G1.

We note that G1 is the group of points on an elliptic curve and G2 is a multiplicative subgroup of a finite field. Typically, the mapping e will be derived from either the Weil or the Tate pairing on an elliptic curve over a finite field.
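As a concrete illustration of the three properties above, consider a deliberately insecure toy instantiation (our own construction, not part of the paper): model G1 as the additive group Zq, with the integer a standing for the point a·P, take G2 as the order-q subgroup of Zp* with p = 2q + 1, and define e(aP, bP) = g^(ab) mod p. Every hardness assumption fails in this model because the group arithmetic is transparent; the sketch only checks the algebra:

```python
# Toy "pairing" for illustration only. G1 is the additive group Z_q (an
# element a stands for the point a*P with P = 1); G2 is the order-q
# subgroup of Z_p* with p = 2q + 1; e(aP, bP) = g^(a*b) mod p.
q = 1019            # prime group order
p = 2 * q + 1       # 2039, also prime, so Z_p* has a subgroup of order q
g = pow(3, 2, p)    # a square mod a safe prime, hence of order q

def e(aP: int, bP: int) -> int:
    """Toy bilinear map e: G1 x G1 -> G2."""
    return pow(g, (aP * bP) % q, p)

# 1. Bilinearity: e(a*P, b*Q) == e(P, Q)^(a*b).
P_, Q_ = 5, 7                        # two "points" of G1
a, b = 123, 456
assert e(a * P_ % q, b * Q_ % q) == pow(e(P_, Q_), a * b, p)

# 2. Non-degeneracy: e(P, Q) != 1 for some P, Q.
assert e(1, 1) == g != 1

# 3. Computability: e() is a single modular exponentiation here.
```

A real instantiation would use an elliptic-curve group and a Weil or Tate pairing; this modular-arithmetic model merely satisfies the bilinearity identity.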

Mathematical problems

Definition 1. (Discrete Logarithm Problem (DLP)). Given Q, R ∈ G1, find an integer x ∈ Zq* such that R = xQ.

The MOV and FR reductions: Menezes et al. (1993) and Frey and Ruck (1994) show a reduction from the DLP in G1 to the DLP in G2. The reduction is: given an instance Q, R ∈ G1, where Q is a point of order q, find x ∈ Zq* such that R = xQ. Let T be an element of G1 such that g = e(T, Q) has order q, and let h = e(T, R). By the bilinearity of e, we have h = e(T, R) = e(T, Q)^x = g^x, so x is the discrete logarithm of h to the base g in G2. Thus, the DLP in G1 is no harder than the DLP in G2.

Definition 2. (Computational Diffie-Hellman Problem (CDHP)). Given (P, aP, bP) for a, b ∈ Zq*, compute abP.

The advantage of any probabilistic polynomial-time algorithm A in solving the CDHP in G1 is defined as Adv_A,G1^CDH = Prob[A(P, aP, bP, abP) = 1 : a, b ∈ Zq*]. For every probabilistic algorithm A, Adv_A,G1^CDH is negligible.

Proposed scheme

There are three entities in the proposed scheme, namely the user, the user's smart card and the remote system. The scheme consists mainly of three phases: the setup phase, the registration phase and the authentication phase.

Setup phase

Suppose G1 is an additive cyclic group of prime order q, and G2 is a multiplicative cyclic group of the same order. Suppose P is a generator of G1, e: G1 × G1 → G2 is a bilinear mapping and H: {0, 1}* → G1 is a cryptographic hash function. The remote system (we call it RS in the rest of the paper) selects a secret key s and computes its public key as PubRS = sP. Then, the RS publishes the system parameters 〈G1, G2, e, q, P, PubRS, H〉 and keeps s secret.

Registration phase

This phase is executed through the following steps when a new user wants to register with the RS.

R1. Suppose a new user Ui wants to register with the RS.

R2. Ui submits his identity IDi and password PWi to the RS.

R3. On receiving the registration request, the RS computes RegIDi = s·H(IDi) + H(PWi).
R4. The RS personalizes a smart card with the parameters 〈IDi, RegIDi, H(·)〉 and sends the smart card to Ui over a secure channel.

Authentication phase

This phase is executed every time a user logs into the RS. The phase is further divided into the login and verification phases. In the login phase, the user sends a login request to the RS. The login request comprises a dynamic coupon, called DID, which depends on the user's ID, password and the RS's secret key. The RS allows the user to access the system only after successful verification of the login request.

Login phase
The user Ui inserts the smart card into a terminal and keys in IDi and PWi. If IDi is identical to the one stored in the smart card, the smart card performs the following operations:

L1. Computes DIDi = T·RegIDi, where T is the user system's timestamp.
L2. Computes Vi = T·H(PWi).
L3. Sends the login request 〈IDi, DIDi, Vi, T〉 to the RS over a public channel.

Verification phase
Let the RS receive the login message 〈IDi, DIDi, Vi, T〉 at time T* (≥ T). The RS performs the following operations to verify the login request:

V1. Verifies the validity of the time interval between T* and T. If (T* − T) ≤ ΔT, the RS proceeds to step (V2), where ΔT denotes the expected valid time interval for transmission delay; otherwise, it rejects the login request. We note that at the time of registration, the user and the RS have agreed on the accepted value of the transmission delay ΔT.

V2. Checks whether e(DIDi − Vi, P) = e(H(IDi), PubRS)^T. If it holds, the RS accepts the login request; otherwise, rejects it.
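To see the setup, registration, login and verification phases end to end, here is a deliberately insecure toy sketch of our own (not the paper's implementation): G1 is modelled as the integers mod a small prime q with the integer a standing for the point a·P, G2 as the order-q subgroup of Zp* with p = 2q + 1, and e(aP, bP) = g^(ab) mod p; SHA-256 reduced mod q stands in for the hash-to-point map H, and the key, identity, password and timestamp are invented values. It demonstrates only the algebra of steps (R3), (L1)-(L3) and (V2), not their security:

```python
import hashlib

# Toy groups (every hardness assumption fails here; illustration only).
q, p, g = 1019, 2039, 9
P = 1                              # generator of G1 = Z_q

def e(x, y):
    """Toy bilinear map e: G1 x G1 -> G2."""
    return pow(g, (x * y) % q, p)

def H(data: str) -> int:
    """Stand-in for the hash-to-point map H: {0,1}* -> G1."""
    return int.from_bytes(hashlib.sha256(data.encode()).digest(), "big") % q

# Setup phase: the RS picks a secret key s and publishes Pub_RS = s*P.
s = 271                            # invented secret key
Pub_RS = s * P % q

# Registration phase (R3): Reg_ID = s*H(ID) + H(PW), stored on the card.
ID, PW = "alice", "hunter2"        # invented identity and password
Reg_ID = (s * H(ID) + H(PW)) % q

# Login phase (L1-L3): the card builds the dynamic request from a timestamp.
T = 123456                         # fixed stand-in for the system timestamp
DID = T * Reg_ID % q               # L1
V = T * H(PW) % q                  # L2
request = (ID, DID, V, T)          # L3: sent over a public channel

# Verification phase (V2): e(DID - V, P) == e(H(ID), Pub_RS)^T.
ID_r, DID_r, V_r, T_r = request
lhs = e((DID_r - V_r) % q, P)
rhs = pow(e(H(ID_r), Pub_RS), T_r, p)
assert lhs == rhs                  # a well-formed request verifies

# Tampering with any component of the request breaks the pairing equation.
assert e((DID_r - (V_r + 1)) % q, P) != rhs
```

In a real instantiation G1 would be an elliptic-curve group and e a Weil or Tate pairing computed by a pairing library; the point is that DID − V collapses to T·s·H(ID), which is exactly what (V2) recomputes from public data.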

Password change phase

This phase is invoked whenever a user Ui wants to change his password. By invoking this phase, Ui can easily change his password without taking any assistance from the RS. The phase works as follows:

P1. Ui attaches the smart card to a terminal and keys in IDi and PWi. If IDi is identical to the one stored in the smart card, the card proceeds to step (P2); otherwise, it terminates the operation.

P2. Ui submits a new password PWi*.
P3. The smart card computes RegIDi* = RegIDi − H(PWi) + H(PWi*) = s·H(IDi) + H(PWi*).
P4. The password has now been changed to the new password PWi*, and the smart card replaces the previously stored RegIDi value with the RegIDi* value.
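The card-side update in step (P3) is one subtraction and one addition in G1, and it yields exactly what a fresh registration under the new password would produce. A minimal standalone sketch under assumed toy parameters (G1 modelled as the integers mod a small prime q, SHA-256 mod q standing in for the hash-to-point map; the key, identity and passwords are invented):

```python
import hashlib

q = 1019                       # toy prime group order; G1 modelled as Z_q

def H(data: str) -> int:
    """Stand-in for the hash-to-point map H: {0,1}* -> G1."""
    return int.from_bytes(hashlib.sha256(data.encode()).digest(), "big") % q

s = 271                        # RS secret key; never leaves the RS
ID, old_pw, new_pw = "alice", "hunter2", "tr0ub4dor"

# Written to the card at registration (R3): Reg_ID = s*H(ID) + H(PW).
reg = (s * H(ID) + H(old_pw)) % q

# P3: the card updates Reg_ID locally, with no message to the RS.
reg_new = (reg - H(old_pw) + H(new_pw)) % q

# The result equals a fresh registration under the new password,
# even though neither the card nor the user ever learns s.
assert reg_new == (s * H(ID) + H(new_pw)) % q
```

This is why no assistance from the RS is needed: the s·H(IDi) term is carried over unchanged inside RegIDi.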

Correctness, performance and security

Correctness

The correctness of the verification step (V2) of a login request is shown by the following:

e(DIDi − Vi, P) = e(T·RegIDi − Vi, P)
= e(T·(s·H(IDi) + H(PWi)) − T·H(PWi), P)
= e(T·s·H(IDi), P)
= e(s·H(IDi), P)^T    [as e(aP, Q) = e(P, Q)^a, by the bilinearity of e]
= e(H(IDi), PubRS)^T    [as e(bP, Q) = e(P, bQ) and PubRS = sP]

Performance

In order to compare the performance of our scheme with existing public-key based remote user authentication schemes, we consider the schemes of Chang and Liao (1994) and Shen et al. (2003), which are based on ElGamal's (1985) signature scheme and use smart cards. The smart card personalization cost for the registration process of our scheme is comparable to that of the schemes in Chang and Liao (1994) and Shen et al. (2003). The login phase in Chang and Liao (1994) and Shen et al. (2003) requires four discrete logarithm operations, one scalar multiplication and one hash computation, whereas their verification phase requires two discrete logarithm operations, one scalar multiplication, one hash computation and one inverse operation. Our scheme needs two scalar multiplications of an elliptic curve point and one hash-to-point operation in the login phase, and two bilinear pairing operations, one scalar multiplication of a curve point, one point addition and one hash-to-point operation in the verification phase. As the pairing operation is costly (Barreto et al., 2002), the verification phase of our scheme incurs a higher computation cost than the verification phase in Chang and Liao (1994) and Shen et al. (2003). However, the verification process is performed by the RS on a system with large computational resources, so the computation cost of the verification process is not a constraint. The computation cost at the user's system (e.g., the smart card) is the crucial issue, and the login phase of our scheme is more efficient than the login phase in Chang and Liao (1994) and Shen et al. (2003). Furthermore, our scheme claims the following characteristics:

Claim 1. The scheme prevents the scenario of many logged-in users with the same login-ID.

Typically, a valid user can share his password or secret component with others; thus all who know the password or secret component corresponding to the user's login-ID can log into the RS. For example, in a digital library, a subscriber can share his login-ID and password with others, and the users who know the login-ID and password of the genuine subscriber can then download or view the information. In our proposed scheme, the login request is generated by the smart card using its stored secret component without any human intervention. The secret component cannot be extracted from the smart card and thus cannot be shared with others. Even if the legitimate user's password is shared, the other person cannot log into the RS without the smart card. That is, only a person who has the smart card and knows the valid password can log into the RS. It is noted that the smart card remains inside the terminal until the user logs out. If the user pulls the card out of the terminal after logging into the RS, the login session expires immediately. Thus, the



proposed scheme can successfully prevent the scenario of many logged-in users with the same login-ID.

Claim 2. The scheme provides a user-friendly password change option to the user without any assistance from the RS.

The user can choose and change his preferred password freely without any assistance from the RS. The user is given a smart card at the time of the user registration process, and the smart card is personalized with a secret component and some other parameters. With the help of the secret component, the user can change his password without any assistance from the RS. This avoids burdening the RS and reduces the communication cost of the password change protocol.

Claim 3. The RS does not maintain any password or verifier table for user login request verification.

In our scheme, the RS does not maintain any password or verifier table for the verification of user login requests. The user login request is verified using the user identity and the RS public key corresponding to the RS's secret key.

Security

Here, we show that the proposed scheme can withstand the following attacks.

Replay attack
Suppose an adversary replays an intercepted valid login request and the RS receives the request at time Tnew. The attack cannot work because it fails step (V1) of the verification phase, as the time interval (Tnew − T) exceeds the expected transmission delay ΔT.
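The replay defence in step (V1) is just a freshness-window test on the timestamp. A sketch with an assumed ΔT of 5 seconds and invented timestamps (the function name and values are ours, for illustration):

```python
DELTA_T = 5                          # assumed transmission-delay bound, seconds

def v1_fresh(T: int, T_star: int) -> bool:
    """Step (V1): accept only if the request arrived within DELTA_T."""
    return 0 <= T_star - T <= DELTA_T

assert v1_fresh(1000, 1003)          # timely request proceeds to (V2)
assert not v1_fresh(1000, 1042)      # replayed request arrives too late
```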

Forgery attack
A valid user login message consists of IDi, DIDi, Vi and T, where DIDi = T·RegIDi and Vi = T·H(PWi). RegIDi is stored in the smart card by the RS at the time of Ui's registration, and it is extremely difficult to extract RegIDi from the smart card. An adversary cannot construct a valid RegIDi (= s·H(IDi) + H(PWi)) without knowledge of the RS's secret key s and the user's password. If an adversary intercepts a valid login message 〈IDi, DIDi, Vi, T〉, he cannot resend it later, because the timestamp will be different the next time and the request fails step (V1) of the verification phase. If a valid smart card is stolen, the unauthorized user cannot log into the RS because he does not know the password of the card owner. Furthermore, on intercepting a valid login request 〈IDi, DIDi, Vi, T〉, an adversary can obtain s·T·H(IDi) by calculating DIDi − Vi. Using s·T·H(IDi), the adversary can try the following:

(i) Given 〈T·H(IDi), s·T·H(IDi)〉, compute s. Computing s from 〈T·H(IDi), s·T·H(IDi)〉 is a hard problem, equivalent to solving the discrete logarithm problem (Definition 1 in Section Preliminaries).

(ii) Given 〈s·T·H(IDi), T′·T·H(IDi), T·H(IDi)〉, compute s·T′·T·H(IDi). The adversary can choose a new timestamp T′ and try to generate a valid DIDi′ − Vi′, that is, s·T′·T·H(IDi). However, computing s·T′·T·H(IDi) from 〈s·T·H(IDi), T′·T·H(IDi), T·H(IDi)〉 is a Computational Diffie-Hellman Problem (Definition 2 in Section Preliminaries).

Therefore, the adversary cannot forge a valid login request with the help of s·T·H(IDi).

Insider attack
In many scenarios, a user uses a common password to access several systems for his convenience. If the user login request is password-based and the RS maintains a password or verifier table for login request verification, an insider of the RS could impersonate the user's login by stealing his password and gain access to the other systems. In our scheme, the user login request is based on the user's password as well as the RS's secret key. The RS does not maintain any password or verifier table, so an insider cannot obtain the user's password. Although the user submits his password to the RS during the registration process, he can change his password by invoking the password change phase after registration; thereby the scheme can withstand the insider attack.

Conclusion

We proposed a remote user authentication scheme using the properties of bilinear pairings. The scheme prevents the adversary from mounting forgery attacks by employing a dynamic login request in every login session. The use of a smart card not only makes the scheme secure but also prevents users from distributing their login-IDs, which effectively prohibits the scenario of many logged-in users with the same login-ID. Moreover, the scheme provides a flexible password change option, where



the users can change their passwords any time without any assistance from the remote system.

References

Barreto PSLM, Kim HY, Lynn B, Scott M. Efficient algorithms for pairing-based cryptosystems. In: Advances in cryptology - Crypto'02, LNCS, vol. 2442. Springer-Verlag; 2002. p. 354-68.

Boneh D, Franklin M. Identity-based encryption from the Weil pairing. In: Advances in cryptology - Crypto'01, LNCS, vol. 2139. Springer-Verlag; 2001. p. 213-29.

Chang CC, Wu TC. Remote password authentication with smart cards. IEE Proceedings - E 1993;138(3):165-8.

Chang CC, Liao WY. A remote password authentication scheme based upon ElGamal's signature scheme. Computers & Security 1994;13(2):137-44.

Cocks C. An identity based encryption scheme based on quadratic residues. In: Cryptography and coding, LNCS, vol. 2260. Springer-Verlag; 2001. p. 360-3.

ElGamal T. A public key cryptosystem and signature scheme based on the discrete logarithms. IEEE Transactions on Information Theory 1985;31(4):469-72.

Frey G, Ruck H. A remark concerning m-divisibility and the discrete logarithm in the divisor class group of curves. Mathematics of Computation 1994;62:865-74.

Hess F. Efficient identity based signature schemes based on pairings. In: Selected areas in cryptography '02, LNCS, vol. 2595. Springer-Verlag; 2003. p. 310-24.

Hsieh BT, Sun HM, Hwang T. On the security of some password authentication protocols. Informatica 2003;14(2):195-204.

Hwang JJ, Yeh TC. Improvement on Peyravian-Zunic's password authentication schemes. IEICE Transactions on Communications 2002;E85-B(4):823-5.

IEEE P1363.2 draft D12: standard specifications for password-based public key cryptographic techniques. IEEE P1363 working group; 2003.

Ku WC, Chen CM, Lee HL. Weaknesses of Lee-Li-Hwang's hash-based password authentication scheme. ACM Operating Systems Review 2003;37(4):9-25.

Ku WC. A hash-based strong-password authentication scheme without using smart cards. ACM Operating Systems Review 2004;38(1):29-34.

Lamport L. Password authentication with insecure communication. Communications of the ACM 1981;24(11):770-2.

Lee CC, Li LH, Hwang MS. A remote user authentication scheme using hash functions. ACM Operating Systems Review 2002;36(4):23-9.

Menezes A, Okamoto T, Vanstone S. Reducing elliptic curve logarithms to logarithms in a finite field. IEEE Transactions on Information Theory 1993;39:1639-46.

Menezes A, van Oorschot PC, Vanstone S. Handbook of applied cryptography. CRC Press; 1996.

Peyravian M, Zunic N. Methods for protecting password transmission. Computers & Security 2000;19(5):466-9.

Shamir A. Identity-based cryptosystems and signature schemes. In: Advances in cryptology - Crypto'84, LNCS, vol. 196. Springer-Verlag; 1984. p. 47-53.

Shen JJ, Lin CW, Hwang MS. A modified remote user authentication scheme using smart cards. IEEE Transactions on Consumer Electronics 2003;49(2):414-6.

Shimizu A, Horioka T, Inagaki H. A password authentication method for contents communication on the Internet. IEICE Transactions on Communications 1998;E81-B(8):1666-73.

Manik Lal Das received his M.Tech. degree in 1998. He is working at the Institute for Development and Research in Banking Technology, Hyderabad, as a Research Officer and is pursuing his Ph.D. degree at the K. R. School of Information Technology, Indian Institute of Technology, Bombay, India. He has published over 15 research articles in refereed journals and conferences. He is a member of the Cryptology Research Society of India and the Indian Society for Technical Education. His research interests include Cryptography and Information Security.

Ashutosh Saxena received his M.Sc. (1990), M.Tech. (1992) and Ph.D. in Computer Science (1999) from Devi Ahilya University, Indore. Presently, he is working as an Associate Professor at the Institute for Development and Research in Banking Technology, Hyderabad. He is on the editorial committees of various international journals and conferences, is a Life Member of the Computer Society of India and the Cryptology Research Society of India, and is a Member of the IEEE Computer Society. He has authored and co-authored more than 50 research papers published in national and international journals and conferences. His main research interests are in the areas of Authentication Technologies, Smart Cards, Key Management and Security Issues in Banking.

Ved P. Gulati received his Ph.D. degree from the Indian Institute of Technology, Kanpur, India. Presently, he is a consultant advisor at Tata Consultancy Services, Hyderabad, India. He was Director of the Institute for Development and Research in Banking Technology, Hyderabad, India from 1997 to 2004. He is a member of the IEEE, the Cryptology Research Society of India and the Computer Society of India. His research interests include Payment Systems, Security Technologies, and Financial Networks.

Deepak B. Phatak received his Ph.D. degree from the Indian Institute of Technology, Bombay, India. He is the Subrao M. Nilekani Chair Professor with the K. R. School of Information Technology, Indian Institute of Technology Bombay, India. His research interests include Databases, System Performance Evaluation, Smart Cards and Information Systems.


Computers & Security (2006) 25, 190-200

www.elsevier.com/locate/cose

A novel approach for computer security education using Minix instructional operating system*

Wenliang Du*, Mingdong Shang, Haizhi Xu

Department of Electrical Engineering and Computer Science, 3-114 Center for Science and Technology, Syracuse University, Syracuse, NY 13244, USA

Received 8 December 2004; revised 23 September 2005; accepted 23 September 2005

KEYWORDS: Computer security; Education; Courseware; Laboratory projects; Minix

Abstract To address national needs for computer security education, many universities have incorporated computer and information security courses into their undergraduate and graduate curricula. In these courses, students learn how to design, implement, analyze, test, and operate a system or a network to achieve security. Pedagogical research has shown that effective laboratory exercises are critically important to the success of these types of courses. However, such effective laboratories do not exist in computer security education.

Intrigued by the successful practice in operating system and network course education, we adopted a similar practice, i.e., building our laboratories based on an instructional operating system. We use the Minix operating system as the lab basis, and in each lab we require students to add a different security mechanism to the system. Benefiting from the instructional operating system, we design our lab exercises in such a way that students can focus on one or a few specific security concepts while doing each exercise. A similar approach has proved to be effective in teaching operating system and network courses, but it has not yet been used in teaching computer security courses.
© 2005 Elsevier Ltd. All rights reserved.

* The project is supported by grant DUE-0231122 from the National Science Foundation and by funding from the CASE center.
* Corresponding author. Tel.: +1 315 443 9180; fax: +1 315 443 1122.

E-mail addresses: [email protected] (W. Du), [email protected] (M. Shang), [email protected] (H. Xu).

0167-4048/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cose.2005.09.011

Introduction

The high priority that information security education warrants has been recognized since the early 1990s. In 2001, Eugene Spafford, director of the Center for Education and Research in Information Assurance and Security (CERIAS) at Purdue University, testified before Congress that ''to ensure safe




computing, the security (and other desirable properties) must be designed in from the start. To do that, we need to be sure all of our students understand the many concerns of security, privacy, integrity, and reliability'' (Spafford, 1997).

To address these needs, many universities have incorporated computer and information security courses into their undergraduate and graduate curricula. In many curricula, computer security and network security are two core courses. These courses teach students how to design, implement, analyze, test, and operate a system or a network with the goal of making it secure. Pedagogical research has shown that students' learning is enhanced if they can engage in a significant amount of hands-on exercises. Therefore, effective laboratory exercises (or course projects) are critically important to the success of computer security education.

Traditional courses, such as operating systems, compilers, and networking, have effective laboratory exercises as the result of 20 years' maturation. In contrast, laboratory designs in security courses are still embryonic. A variety of approaches are currently used; the three most frequently used designs are the following: (1) the free-style approach, i.e., instructors allow students to pick any security-related topic they are interested in for their course projects; (2) the dedicated computing environment approach, i.e., students conduct security implementation, analysis and testing (Hill et al., 2001; Mayo and Kearns, 1999) in a contained environment; and (3) the build-it-from-scratch approach, i.e., students build a secure system from scratch (Mitchener and Vahdat, 2001).

Free-style design projects are effective for creative students; however, most students become frustrated with this strategy because of the difficulty of finding an interesting topic. With the dedicated environment approach, projects can be very interesting, but they carry the logistical burdens of the laboratory: obtaining, setting up, and managing the computing environment. In addition, course size is constrained by the size of the dedicated environment. The third design approach requires students to spend a considerable amount of time on activities that are irrelevant to computer security education but are essential for a meaningful and functional system.

The lack of an effective and efficient laboratory for security courses motivated us to consider practices adopted by the traditional mature courses, e.g., operating systems (OS) and compilers. In OS courses, a widely adopted successful practice is to use an instructional OS (e.g., MINIX (Tanenbaum, 1996), NACHOS (Christopher et al., 1993), and XINU (Comer, 1984)) as a framework and ask students to write significant portions of each major piece of a modern OS. Compiler and network courses have adopted a similar approach. Inspired by the success of the instructional OS strategy, we adapt it to our computer security courses. Specifically, we provide students with a system as the framework, and then ask them to implement significant portions of each fundamental security-relevant functionality of a system. Although there are a number of instructional systems for OS courses, to our knowledge, this approach has not yet been applied to computer and information security courses.

Our goal is to develop a courseware system serving as an experimental platform and framework for computer security courses. The courseware is not designed to create new security mechanisms, but to let students practice existing security work. The courseware contains a set of well-defined and documented projects that help students focus on (1) grasping security concepts, principles, and technologies; (2) practicing the design and implementation of security mechanisms and policies; and (3) analyzing and testing a system for its security properties.

We chose Minix as our base system, and have designed a number of laboratory assignments on it. These assignments cover topics ranging from the design and implementation of security mechanisms to the analysis and testing of a system for security purposes. Each assignment can be considered as adding or modifying security mechanisms in Minix. To finish each task, students need only focus on those security mechanisms, with minimal effort on other parts of the system. For example, while learning discretionary access control (DAC), we give students a file system without DAC mechanisms; students only need to design and implement DAC for this existing file system. Students can immediately see how their DAC implementation affects the system. This strategy helps students stay focused on security concepts.

Our course projects consist of two parts. One part focuses on design and implementation. This part of the projects requires students to add new security mechanisms to the underlying Minix system to enhance its security. The security mechanisms students need to implement include access control, capability, sandboxing, and encrypted file systems. In the second part of our projects, we give students a modified Minix system that contains a number of injected vulnerabilities. Students need to use the skills learned from the



lectures to identify, exploit, and fix those vulnerabilities.

Our approach is open-ended, i.e., we can add more laboratory projects to this framework without affecting others. The projects presented in this paper are the result of 3 years' maturation, with more components added each year. We are also planning to design a number of network security projects for Minix based on Minix's existing networking functionality.

The paper is organized as follows: the next section briefly describes our computer security course. Then the design of our courseware is described, followed by a description of each of our laboratory projects. Next, the experiences and lessons we have gained during our 3-year practice are presented. Finally, the last section concludes the paper and describes future work.

The computer security course

Scope of the course

Our department offers two graduate courses in security: one is computer security, and the other is network security. The computer security course focuses on the concepts, principles, and techniques of system security, such as encryption algorithms, authentication, access control, privilege, vulnerabilities, system protection, etc. Currently, our proposed approach targets only the computer security course, but we plan to extend this approach to the network security course in future work.

Pedagogical approach

Lecturing on the theories, principles and techniques of computer security is not enough for students to understand system security. Students must be able to put what they have learned into use. We use the ''learning by doing'' approach. Other studies have shown that this type of active learning approach has a higher chance of having a lasting effect on students than letting students passively listen to lectures without reinforcement (Meyers and Jones, 1993).

More specifically, we use the Minix OS as our base system to develop assignments that give students hands-on experience with the theories taught in class. For example, when teaching the Set-UID concept of Unix, we developed an assignment for students to play with this security mechanism, figure out why it is needed, and understand how it is implemented.
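As one possible warm-up in the spirit of that assignment (our sketch, not the authors' handout), the Set-UID mode bit itself can be toggled and inspected from any Unix user account; honoring the bit at exec time, i.e., running the program with the file owner's effective UID, is the kernel behavior the Minix exercise has students explore:

```python
import os
import stat
import tempfile

# Create a scratch file and toggle the Set-UID bit (S_ISUID = 0o4000).
# chmod only marks the file; acting on the mark at exec time is the
# kernel's job, which is what the Minix assignment examines.
fd, path = tempfile.mkstemp()
os.close(fd)

os.chmod(path, 0o755)                            # plain executable: rwxr-xr-x
assert not os.stat(path).st_mode & stat.S_ISUID

os.chmod(path, 0o4755)                           # Set-UID set: rwsr-xr-x
assert os.stat(path).st_mode & stat.S_ISUID

os.remove(path)
```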

We have developed two types of assignments: small assignments and comprehensive assignments. Each small assignment focuses on one specific concept, such as Set-UID or access control. These assignments are usually small; they do not need much programming and take only 1 or 2 weeks; therefore, we can have several small projects covering a variety of concepts in system security. However, being able to deal with each individual concept is not enough; students also need to learn how to put the concepts together. We have therefore developed comprehensive assignments, which cover a number of concepts in one assignment. They are ideal candidates for final projects.

Course prerequisites

Because this course focuses on system security, we require students to have an appropriate systems background. Students taking the course are expected to have taken the graduate-level operating systems course. They should also be proficient in C programming.

Design of course projects

The goal of our projects is to provide a set of exercises for students to practice their security design, implementation, analysis, testing, and operation skills. Using the Minix instructional operating system, we designed two classes of projects, one focusing on design and implementation of security mechanisms, and the other focusing on security analysis and testing. The overview of our projects is depicted in Fig. 1.

Design and implementation. In our computer security class, we aim at covering a number of important security mechanisms, such as Privilege, Authentication, Access Control, Capability, and Sandboxing. We expect students to gain first-hand experience with most of them during a one-semester period. However, asking students to implement a system with all of these mechanisms from scratch is infeasible. Using an instructional operating system makes this goal feasible for the following reasons: (1) an instructional OS provides students with a structured framework upon which they can build various security mechanisms; (2) an instructional OS is functional even if the students have not implemented the security modules completely, which gives students quick feedback as to how their implementations work and whether the modules are implemented correctly.


A novel approach for computer security education 193

Figure 1 Overview of course projects based on Minix. [Figure: the Minix instructional operating system forms the base; the security design and implementation projects (Privilege (Set-UID), Access Control, Capability, Sandboxing, Encrypted File System) build on it, as do the security exploit, analysis, and testing projects (Preparation, Vulnerabilities pool).]

Some of the security mechanisms, such as privilege and access control, are already implemented in Minix. For some of these mechanisms, our projects are designed in a way that requires students to study and play with the existing implementation, so they can gain first-hand experience. For other existing mechanisms, we ask students to extend them and add more functionality. For example, we ask students to extend Minix's abbreviated access control list mechanism to support full access control lists. Several security mechanisms that we cover in class do not exist in Minix, such as capability and the encrypted file system. For these, we designed course projects that ask students to implement the mechanisms in Minix. To make the tasks doable within 2–3 weeks, the security mechanisms are simplified compared to those implemented in a real operating system.

Security analysis and testing. To master the security analysis and testing skills they have learned in class, students need to practice those skills on real systems. One way to do this is to give them a vulnerable system, such as an older version of Windows 2000 or Linux, and ask them to find security flaws in it. Although these systems contain many vulnerabilities, identifying and exploiting them is not a trivial task even for seasoned system administrators, much less for students who have just learned the basic skills.

We have created a pool of vulnerable components for Minix, with some in the application layer and some in the kernel layer. The vulnerabilities we chose reflect vulnerabilities in the real world. They include buffer-overflow errors, race condition errors, symbolic link errors, input validation errors, authentication errors, domain errors, and design errors (Landwehr et al., 1994).

Instructors can choose the vulnerable components they like and inject them into Minix. The flawed Minix system is then given to students, who need to find those vulnerabilities and exploit them. Before starting these exercises, students are equipped with theoretical knowledge of these vulnerabilities, the methods of detection and exploitation, and the methodologies of penetration testing and standard security testing.

Why choose Minix?

Before we decided to use Minix, we investigated a number of alternatives. We had the following criteria in mind when choosing an operating system as the base of our courseware:

1. Source code availability. Because the system security course involves implementation of system security mechanisms, studying the source code is important for the learning process.

2. Complete but not complex. The OS should provide sufficient infrastructure to students. Students should be able to immediately see how their implementation behaves without having to build the security-irrelevant components to make the whole system work. However, the OS should not be too complex; otherwise students need to spend too much time understanding the underlying system.

3. Modularized. The security modules in the system should be highly modularized, so that they can be modified or replaced independently.

4. No need for superuser privilege. It is preferable for students to carry out lab assignments in a general computing environment using normal user accounts, as opposed to a dedicated computing environment using superuser privileges.

194 W. Du et al.

A complete featured OS like Linux seems a good candidate because of its completeness. However, if we chose such an operating system, the students would take a considerable amount of time to understand the functionality of the OS and thus lose focus on security. To overcome this drawback, many operating system courses use simplified operating systems, such as Xinu, Nachos and Minix, for educational purposes. We adopted a similar practice.

Most computer security course projects require administrator/superuser privilege, which can jeopardize the security of the experiment. With superuser privilege, students have complete control over the experimental domain. A malicious student might use it to gain unwanted access to other people's accounts. Even if all students are well behaved, they might accidentally introduce security holes into the system because of their lack of system administration experience. Some universities do give students superuser privilege for this type of project, but the computers then have to be restricted to an isolated environment. Although this approach has been widely used in practice, it incurs a high cost for lab setup and management. We chose a different approach: to enable students to build and run the operating system without giving them superuser privilege.

We chose the Minix instructional operating system as our base system for three reasons: first, Minix is complete compared to other Unix-style instructional OSs; second, Minix can run on Solaris systems as a non-privileged process; third, Minix is small and easy to understand. Table 1 compares the pros and cons of using different OSs as the base of our courseware.

Introduction to the Minix operating system

Minix is a Unix-like operating system, and its name comes from "mini-Unix". As an instructional operating system, Minix is designed to be small and simple. It has only about 15,000 lines of code, which are publicly available at http://www.cs.vu.nl/~ast/minix.html (Tanenbaum).

A textbook was also written by Tanenbaum to explain how Minix works (Tanenbaum, 1996). Students meeting the prerequisites can understand this operating system within a short period of time. Minix has a highly modular structure, which makes it not only easy to understand, but also easy for students to extend and modify.

Minix was originally developed as a real operating system, running directly on Intel x86 machines. Later, Ashton ported Minix to run on SUN Solaris systems as a non-privileged process (Ashton, 1996).

Course projects

Laboratory setup

We use Minix on Solaris in our course. All of the laboratory exercises are conducted in the SUN Solaris environment using the C language. Except for giving students more disk space (100 MB) to store the files of the Minix system, Minix poses no special requirements on the general Solaris computing environment.

The Minix operating system can also be installed in simulated environments such as VMware, Bochs, and so on. Installing the operating system on VMware is not a difficult process, and no superuser privilege is needed to run Minix on VMware. Therefore, this could be another installation option. Both approaches can be used in our laboratory designs. However, we preferred the Solaris approach, so students do not need to buy a VMware license or use freeware that is not yet stable.

We have designed a variety of course projects on Minix. Depending on the course schedule and the students' familiarity with Unix and their proficiency in C programming, instructors might want to choose a subset of the projects we designed. Currently, we are still developing more assignments, and we will also solicit contributions from

Table 1 A comparison of various operating systems

                           Source code    Complete   Complex   Superuser   Modularized
                           availability                        privilege
Instructional OS  Minix    Yes            Yes        No        No          Yes
                  Nachos   Yes            Partial    No        No          Yes
                  Xinu     Yes            Yes        No        Yes         Yes
Commercial OS     Linux    Yes            Yes        Yes       Yes         Yes
                  BSD      Yes            Yes        Yes       Yes         Yes
                  SunOS    No             Yes        Yes       Yes         Yes
                  Windows  No             Yes        Yes       Yes         Yes


other people. Our goal is to create a pool of lab assignments, such that different instructors can choose a subset that meets the requirements of their syllabi.

Preparation

In this warm-up project, students get familiar with the Minix operating system: installing and compiling the Minix OS, conducting simple administration tasks (e.g., adding/removing users), and learning to use and modify some common utilities. More importantly, we want students to understand the Minix kernel. For our system security course, students need to understand in detail only the system calls, the file system, and the data structures of the i-node and the process table. They do not need to study non-security modules such as process scheduling and memory management. Students meeting the prerequisites should be comfortable with the Minix environment in 2–3 weeks.

The following is a list of sample tasks we used. In practice, instructors can choose different tasks to achieve the same goals:

• Compile and install Minix, then add three user accounts to the system.
• Change the password verification procedure, such that a user is blocked for 15 min after three failed trials.
• Implement system calls to enable users to print out attributes in the i-node and process table. Appropriate security checking should be implemented to ensure that a user cannot steal information from other accounts.

Our experience shows that it is better to guide students through the above tasks in one or two lab sessions, in which a teaching assistant can provide immediate help. These lab sessions are especially necessary when students have significantly different backgrounds.

Set-UID programs

Set-UID is an important security concept in Unix operating systems. It is a good example to show students how privileges are escalated in a system. In this project, students learn the Set-UID concept and its implementation. Students also learn how an attacker can escalate his privileges by exploiting a vulnerable Set-UID program.

Students need to finish the following tasks: (1) figure out why the passwd, chsh, and su commands need to be Set-UID programs, and what would happen if they were not; (2) students are given the binary code of the passwd program, which contains a number of security flaws injected beforehand; they need to identify those flaws and exploit the vulnerable program to gain root privileges; (3) read the Minix source code and figure out how Set-UID is implemented in the system; (4) modify the kernel source code to disable the Set-UID mechanism.

This project is quite straightforward. On average it takes students 1 week to finish.

Access control list

Access control is an important security mechanism implemented in many systems. It can be classified as Discretionary Access Control (DAC) and Mandatory Access Control (MAC). In DAC systems, the owner of an object can decide its security properties (e.g., who can read this file?), while in MAC systems, the security properties are determined and controlled only by a security manager. Access permissions can be represented on a per-object basis (i.e., who can do what operations on an object); this is called an Access Control List (ACL). Permissions can also be represented on a per-subject (principal) basis (i.e., what operations the subject can do on what objects); this is called a Capability. This project focuses on access control lists.

The goal of this project is two-fold: (1) to get first-hand experience with DAC and (2) to be able to implement DAC. Minix already has an implementation of an abbreviated ACL; namely, access control is based on three classes: owner, group, and others. Students need to extend this abbreviated ACL to a full ACL, i.e., a user can assign a specific access right to any single user. On average students need about 2–3 weeks to finish this project. Students need to deal with the following issues:

• How access control works: Before working on their implementations, students need to understand the entire process of access control, and they need to trace the program execution to find out how access control is conducted in Minix. This enhances their understanding of access control.
• ACL representation: Students need to think about how to represent the full ACL, and how to allow ACLs to specify access permissions on a per-user (principal) basis, rather than the current owner–group–other protection method. Students also need to make their representation flexible enough to support adding and removing entries.


• Storing the ACLs: This is another challenging part of the project. Students need to think about where exactly they should store the access control list. The current Minix implementation does not seem to have a place to store a full access control list, so students need to solve this issue. A hint we give them is to use some unused entries in i-nodes, or to store the access control lists in separate files.
• ACL management: In addition to implementing the full ACL in the kernel, students also need to implement the corresponding utilities, such that users can manage the access control lists of their own files.

Capability

Capability is another important concept in computer security. The goal of this project is to help students understand the concept of capability. We defined a set of capabilities in this project, with each capability representing whether a process can invoke a specific system call. Students need to implement these capabilities in Minix. Specifically, their capability mechanism should be able to achieve the following functionalities: (1) permission granting based on capability; (2) capability copying: a process should be able to copy its capabilities to another process; (3) capability reduction/restoration: a process should be able to reduce its current capabilities and later restore them; for example, a process can temporarily remove its own Set-UID capability, but later add it back (of course, a process cannot assign a new capability to itself); (4) capability revocation: root should be able to revoke capabilities from processes.

In this project, students need to take care ofthe following issues:

• Capability list representation: Students need to think about how to represent the set of defined capabilities. They also need to think about how to associate capabilities with each process. The final representation should conveniently support the required functionalities (e.g., copying, removing, etc.).
• Storing the capabilities: This is another challenging part of the project, where students need to think about where the capabilities should be stored.

One option is to add an entry to the process table to store the capabilities. A potential issue is how feasible it is to extend the process table (note that the process table is a kernel data structure used by many other components).

• Capability revocation: Students need to think about how to revoke an object's capability. They must be careful not to introduce vulnerabilities in this part.
• Capability management: Students need to take care of two types of users, normal users and superusers. They need to consider how to manage these two types of users, and what functionalities are associated with each of them.

This project enhanced the students' understanding of the capability concept. At the beginning, most students had trouble mapping the capability concept to the real world. We did not tell the students how the capability should be implemented, but instead asked them to design their own capability mechanisms. This requires them to figure out how the capabilities should be represented in the system, where to store them, how the system can use a capability to conduct access control, etc. Once students have figured out all of these issues, the implementation becomes relatively easy; therefore the amount of coding for this project is not significant, and students are able to accomplish the task within 2 weeks. Had it not been for Minix, students would have needed to spend a lot of time implementing a meaningful system in which the effect of the capability mechanism could be demonstrated.

We encouraged students to design features beyond the basic requirements. Students were highly motivated: some implemented a more generic capability-based access control mechanism than the required one, and some allowed new capabilities to be defined by the superuser.

Sandbox

A sandbox is an environment in which the actions of an untrusted process are restricted according to a security policy (Bishop, 2002). Such restriction protects the system from untrusted applications. In Unix, chroot can be used to achieve a simple sandbox.

The command "chroot newroot cmd" causes cmd to be executed relative to newroot, i.e., the root directory is changed to newroot for cmd and any of its child processes. Any program running within this sandbox can only access files within the subdirectory of newroot.

Some Unix systems allow a normal user to run a chroot sandbox (by simply making chroot a Set-UID program). However, this can introduce a serious problem: malicious users may create a login environment with their own shadow file and passwd file under newroot, which will help them gain


a root shell. Once they get that privilege, they can create a Set-UID shell program that they can keep using after exiting the sandbox. The attack is described in the following:

    test $ mkdir /tmp/etc
    test $ echo root::0:0::/:/bin/sh > /tmp/etc/passwd
    test $ mkdir /tmp/bin
    test $ cp /bin/sh /tmp/bin/sh
    test $ cp /bin/chmod /tmp/bin/chmod
    test $ chroot /tmp /bin/login    (login as root with no password)
    root # chmod 4755 /bin/sh        (change shell to Set-UID)
    root # exit
    test $ cd /tmp/bin
    test $ ./sh
    root #                           (get root shell in real system)

One of the goals of this project is to let students find this vulnerability with some provided clues. Students need to implement the attack procedure and demonstrate how to take advantage of the vulnerability to gain root privileges. This is an efficient way for students to enhance their understanding of security holes at the kernel level.

The simplest way to fix the above vulnerability is to disallow normal users from using chroot. However, normal users then cannot take advantage of the sandbox at all. We instead ask students to extend the current chroot such that the program is safe for normal users to use.

We suggest that students design a security policy for this sandbox. A sandbox security policy defines a set of permissions and restrictions that a program must obey while running. For example, the policy can define whether a program is permitted to read files or connect to the Internet. Any program attempting to violate the security policy will be blocked. Students need to consider a number of issues, including how to define the policy, where to save it, when it should be read in, and how to secure the policy file. Students should be able to finish this project within 2–3 weeks.

Encrypted file system

A non-encrypted file system stores plaintext on disk, so if the disk is stolen, the information on it can be disclosed. An Encrypted File System (EFS) solves this problem by encrypting the file system, such that only users who know the encryption keys can access the files. The primary benefit of an EFS is to defend against unauthorized access. The encryption/decryption operations should be transparent to users. Implementing an EFS requires students to combine techniques such as encryption, key management, authentication, access control, and security in OS kernels and file systems; therefore this is a comprehensive project. We give this project as a final project.

Minix has a complete file system, so students can build the EFS on top of it. As we mentioned before, the Minix file system is reasonably easy to understand; students can start building their own EFS after they understand how the file system works.

This project is a good candidate for the final comprehensive project because it covers a variety of security-related concepts and properties:

• User transparency: The main challenge of this project is how to make the EFS transparent. If transparency were not an issue, students could easily implement a set of encryption/decryption utilities, and users would use those utilities to encrypt and decrypt their files manually. Transparency means that the encryption/decryption should be performed on the fly, while users are reading and writing their files. To achieve transparency, students need to modify the system calls related to reading and writing, inserting the encryption algorithms into the proper positions in those system calls.
• Key management: Another challenge of this project is key management, namely how and where the encryption keys should be stored, and how the keys should be protected, changed, and revoked. We have seen different designs from students. For example, regarding the key storage problem, some students store the key (encrypted) in a file, and some store it in the i-node of the encrypted file. We also found that some students mistakenly save the plaintext key on the disk, which defeats the whole purpose of the EFS.
• Authentication: How do we decide whether a user can access the encrypted file system? This part of the project not only teaches students the purpose of authentication; more importantly, it teaches them a lesson about the tradeoff between usability and security. Some students' projects require users to authenticate themselves each time they access a file in the EFS; some conduct just one authentication when the user mounts the EFS (a good implementation in our opinion); some conduct the authentication during login. During their demos, we point out


the advantages and disadvantages of their designs, so they can evaluate their own designs.
• Using encryption and hashing algorithms: Although students are provided with code for encryption and hashing algorithms, they still need to learn how to use it correctly. Because AES is a block cipher, students need to deal with the issues related to blocks and padding; otherwise, their reading/writing system calls might not function correctly.
• Security analysis: After most of the students had finished their designs, we gave them several incorrect designs that we have encountered in the past, and asked them to determine whether those designs are secure; if not, how to break those EFSs.

Project simplification
For students who do not have sufficient background in operating system kernel programming, we need to customize our projects. We divide the EFS project into three projects:

1. Project 1: Encryption algorithms. This project gets students familiar with the AES algorithm. Students need to implement a user-level program to encrypt and decrypt files.

2. Project 2: Kernel modification. The second project asks students to modify the corresponding system calls, such that some special files are always read and written using encryption. However, to simplify this project, we ask them to always use a fixed key for the encryption. The key can be hard-coded in their programs.

3. Project 3: Key management. This project deals with the key management issue that is intentionally left out of the previous project. Students now need to find a place to store the key; they need to decide whether to use the same key for all files or one key per file; and they need to deal with the authentication issues, etc.

Vulnerability analysis

Vulnerability analysis strengthens system security by identifying and analyzing security flaws in computer systems. This project intends to expose students to this critical approach. We have two goals in this project. The first goal is to let students gain first-hand experience with software vulnerabilities, become familiar with a list of common security flaws, and understand how a seemingly harmless flaw in a program can become a risk to a system. The second goal is to give students opportunities to practice their vulnerability analysis and testing skills. Students learn a number of methodologies in class, such as vulnerability hypothesis, penetration testing methodology, code inspection techniques, and blackbox and whitebox testing (Pfleeger et al., 1989). They need to practice these methodologies in this project.

To achieve our goals, we modify the Minix source code and intentionally introduce a set of vulnerabilities, which we call the injected vulnerabilities. The revised Minix system is then given to students. The students are given some hints, such as a list of possible vulnerabilities, the possible locations of the vulnerable programs, etc. Their task is to find and verify these vulnerabilities.

The injected vulnerabilities cover a wide spectrum, including buffer overflow, race condition, security holes in the access control mechanisms, security holes in Set-UID programs, information leakage, and denial of service. These vulnerabilities reflect system flaws caused by incorrect design, implementation, and configuration. All of these vulnerabilities were collected from real commercial Unix operating systems, such as SunOS, HP-UX and Linux, and were then ported to Minix. We have ported nine vulnerabilities so far, with six at the user level and three at the kernel level. We will port other typical vulnerabilities to Minix in the future.

Students in this project need to accomplish thefollowing tasks:

• Identify vulnerabilities. This is a warm-up practice to help students get familiar with the environment in which the vulnerabilities live.
• Exploit vulnerabilities. This is a challenging and interesting part of the project, in which students write attack programs aimed at these vulnerabilities. A demonstration is needed to show what unauthorized privilege can be obtained.
• Fix vulnerabilities. Students need to design solutions to eliminate or remedy the identified vulnerabilities.

Experiences and lessons

We conducted a teaching experiment in the spring semester of 2002, when we taught the graduate-level computer security course at Syracuse University. At that time, we asked students to add certain specific security mechanisms to Minix. We gave students only one project for the whole semester


because modifying an OS seemed to be a daunting job for most of the students. The students liked the project very much and were highly motivated. At the end of the semester, the students provided a number of useful suggestions. For example, many students noted, "most of our time was spent on figuring out how such an operating system works; if somebody or some documentation could explain that to us, we could have done four or five different projects of this type instead of doing one during the whole semester". This observation shaped the goal of our design: we want students to implement a project within 2–4 weeks using our proposed instructional environment.

When we taught the course again in spring 2003, we provided students with sufficient information on how Minix works, and we added a lecture to introduce Minix. As a result, students became familiar with Minix within the first 3 weeks and were ready for the projects we had designed for them. Previously, the same degree of familiarity had taken students half a semester due to the lack of information.

In our first experiment in 2002, the requirements of each project were not tailored to a scope appropriate for 2–3 weeks. Over the last 3 years of experiments, we simplified those requirements. In the 2004 semester, we successfully assigned four projects in one semester, including the Set-UID project, the capability project, the access control project, and the comprehensive encrypted file system project. However, we were still unable to assign the vulnerability project due to the lack of time. We will further improve our strategy in the coming spring 2005 semester.

During the last 3 years, we have also learned the following lessons:

• Preparation: From our experience, the preparation project is crucial to the success of the subsequent assignments. Some students who overlooked this assignment found themselves in trouble later. In fact, when we used the proposed approach for the first time, we did not give students this assignment because we thought it was unnecessary. As a result, students later spent a great deal of time figuring out how to achieve the tasks covered by this assignment. Most of the students told us that they spent 80% of their time getting familiar with the system; once they knew how Minix works, they could finish the required task in a short time. Therefore, when we used the approach again, we devoted several lectures to the necessary materials, and asked the TA to spend a significant amount of time helping the students finish this assignment. The preparation part is extremely important: if students fail this part, they will spend enormously more time on the subsequent projects. This was very clear when we compared the performance of the students in our 2003 course with that of the students in 2002. We plan to integrate the materials related to Minix into the lectures, so students can be better prepared.
• Background knowledge: We also realized that some students in the class were not familiar with the Unix environment because they had been using Windows most of the time. This brings some challenges because these students do not know how to set up the PATH environment variable, how to search for a file, etc. We plan to develop materials to help students get over this obstacle.
• Cheating: Cheating did occur, especially on the final encrypted file system project. We now have a list of questions that we ask during students' demonstrations. They not only help us evaluate students' projects, but have also been quite effective so far at identifying cheating. Example questions include "where do you save keys and why?" and "can your implementation work on large files, and how did you handle that?". Students who simply copy others' implementations will most likely be unable to answer these questions.

Conclusion and future work

We have described a laboratory design for our graduate-level computer security course. Our approach is inspired by the successful practice in operating system and network course education. In our approach, we use the Minix instructional operating system as the basis of our laboratory: in design-oriented laboratory projects, students add a specific security mechanism to the system; in analysis-oriented laboratory projects, students identify, exploit, and fix vulnerabilities in Minix. Because of the desirable properties of Minix, our laboratory projects can be finished within a reasonable amount of time and in a general computing environment without superuser privileges. We have designed a series of laboratory projects based on Minix, and have experimented with our approach for the last 3 years. The experience obtained is encouraging, and students in our class have shown great interest in the course and the projects.

We will continue experimenting with and perfecting our approach. More importantly, we will work on making this laboratory approach easy for others to adopt. This requires us to provide detailed documentation, instructions, and a pool of different projects covering a wide range of security concepts.

References

Ashton P. Smx - the Solaris port of Minix; 1996.

Bishop M. Computer security: art and science. Addison-Wesley; 2002.

Bochs, <http://bochs.sourceforge.net>; 2002.

Christopher WA, Procter SJ, Anderson TE. The Nachos instructional operating system. In: Proceedings of the Winter 1993 USENIX conference, San Diego, CA, USA, January 25-29, 1993. p. 481-9. Available from: http://http.cs.berkeley.edu/~tea/nachos.

Comer D. Operating system design: the XINU approach. Prentice Hall; 1984.

Hill JMD, Carver CA Jr, Humphries JW, Pooch UW. Using an isolated network laboratory to teach advanced networks and security. In: Proceedings of the 32nd SIGCSE technical symposium on computer science education, Charlotte, NC, USA, February 2001. p. 36-40.

Landwehr CE, Bull AR, McDermott JP, Choi WS. A taxonomy of computer program security flaws. ACM Computing Surveys September 1994;26(3):211-54.

Mayo J, Kearns P. A secure unrestricted advanced systems laboratory. In: Proceedings of the 30th SIGCSE technical symposium on computer science education, New Orleans, USA, March 24-28, 1999. p. 165-9.

Meyers C, Jones TB. Promoting active learning: strategies for the college classroom. San Francisco, CA: Jossey-Bass; 1993.

Mitchener WG, Vahdat A. A chat room assignment for teaching network security. In: Proceedings of the 32nd SIGCSE technical symposium on computer science education, Charlotte, NC, USA, February 2001. p. 31-5.

Pfleeger C, Pfleeger S, Theofanos M. A methodology for penetration testing. Computers and Security 1989;8(7):613-20.

Spafford EH. February 1997 testimony before the United States House of Representatives' subcommittee on technology, computer and network security, 2000. Available from: http://www.house.gov/science/hearing.htm.

Tanenbaum A. Operating systems: design and implementation. 2nd ed. Prentice Hall; 1996.

Tanenbaum A, <http://www.cs.vu.nl/~ast/minix.html>; 1996.

VMWare, <http://www.vmware.com>; 1996.

Wenliang Du received the B.S. degree in Computer Sciencefrom the University of Science and Technology of China, Hefei,China, in 1993, the M.S. degree and the Ph.D. degree from theComputer Science Department at Purdue University, West La-fayette, Indiana, USA, in 1999 and 2001, respectively. Duringhis studies in Purdue, he did research in the Center for Educa-tion and Research in Information Assurance and Security(CERIAS). Dr. Du is currently an assistant professor in theDepartment of Electrical Engineering and Computer Science atSyracuse University, Syracuse, New York, USA. His researchbackground is in computer and network security. In particular,he is interested in wireless sensor network security and privacy-preserving data mining. He is also interested in developinginstructional laboratories for security education using instruc-tional operating systems. His research has been supported bythe National Science Foundation and the Army Research Office.

Mingdong Shang received his B.S. degree in Electrical and Mechanical Engineering from Beijing University of Aeronautics and Astronautics in 1998. He is currently a Ph.D. student in the Department of Electrical Engineering and Computer Science at Syracuse University. His research interests include computer security and network security, and he has been focusing on developing a Minix-based instructional laboratory environment and lab exercises for computer and network security courses.

Haizhi Xu received his B.S. and M.S. degrees, both in computer engineering, from Harbin Institute of Technology, Harbin, China, in 1995 and 1997, respectively. He is a Ph.D. candidate at Syracuse University, Syracuse, NY, USA, majoring in computer engineering. His current research interests are computer system security, intrusion detection and mitigation, and operating systems.


Computers & Security (2006) 25, 201-206

www.elsevier.com/locate/cose

A traceable threshold signature scheme with multiple signing policies

Jun Shao, Zhenfu Cao*

Department of Computer Science and Engineering, Shanghai Jiao Tong University, 1954 Huashan Road, Shanghai 200030, People's Republic of China

Received 15 November 2005; accepted 15 November 2005

KEYWORDS: Threshold cryptography; Signature schemes; Multi-secret; Traceability; Multiple signing policies

Abstract: In recent years, a great deal of work has been done on threshold signature schemes and many excellent schemes have been proposed. In Eurocrypt'94, Li et al. [Threshold-multisignature schemes where suspected forgery implies traceability of adversarial shareholders. In: Advances in Cryptology - Proceedings of EUROCRYPT 94; 1994. p. 413-9] proposed a threshold signature scheme with traceability, which allows us to trace back to find the signer without revealing the secret keys. And in 2001, Lee [Threshold signature scheme with multiple signing policies. IEE Proc Comput Digit Tech 2001;148(2):95-9] proposed a threshold signature scheme with multiple signing policies, which allows multiple secret keys to be shared among a group of users, each secret key having its own specific threshold value. In this paper, based on these schemes, we present a traceable threshold signature scheme with multiple signing policies, which not only inherits their properties, but also fixes their weaknesses.
© 2005 Elsevier Ltd. All rights reserved.

Introduction

In order to keep a secret efficiently and safely, Shamir (1979) and Blakley (1979) independently presented (l, n) threshold secret sharing schemes in 1979. In such a scheme, the dealer splits the secret x into shares (x_1, ..., x_n) among n players, and sends each share to the corresponding player. As

* Corresponding author. Tel.: +86 21 62835602; fax: +86 21 62933504.

E-mail address: [email protected] (Z. Cao).

0167-4048/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.cose.2005.11.006

a result, any l or more players can cooperate to recover the secret x, but any l - 1 or fewer players cannot obtain any useful information about the secret x. A threshold secret sharing scheme has many practical applications, such as opening a bank vault, launching a nuclear missile, or authenticating an electronic funds transfer. On the other hand, there are several situations in which more than one secret is to be shared among players. As an example, consider the following situation, described by Simmons (1991): there is a missile battery, and not all of the missiles have the same launch enable code. The problem is to devise a scheme that


will allow any one, or any selected subset, of the launch enable codes to be activated in this scheme. To date, many efficient schemes for sharing more than one secret have been proposed (Blundo et al., 1994; Lee, 2001).
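The (l, n) threshold idea described above is easy to make concrete. The sketch below is a minimal Shamir-style secret sharing implementation over a prime field; the prime, function names, and parameters are our own illustrative choices, not anything fixed by the cited papers.

```python
import random

# Minimal (l, n) Shamir secret sharing over GF(q): an illustrative
# sketch only, with a toy API and an arbitrary prime modulus.
q = 2**127 - 1                 # a Mersenne prime modulus

def split(secret, l, n):
    """Split `secret` into n shares; any l of them recover it."""
    coeffs = [secret] + [random.randrange(q) for _ in range(l - 1)]
    shares = []
    for x in range(1, n + 1):
        y = sum(c * pow(x, i, q) for i, c in enumerate(coeffs)) % q
        shares.append((x, y))
    return shares

def recover(shares):
    """Lagrange interpolation of f at x = 0 from l points."""
    secret = 0
    for xi, yi in shares:
        num = den = 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % q
                den = den * (xi - xj) % q
        secret = (secret + yi * num * pow(den, -1, q)) % q
    return secret

shares = split(1234567, l=3, n=5)
assert recover(shares[:3]) == 1234567      # any 3 shares suffice
assert recover(shares[1:4]) == 1234567
```

Recovery interpolates the hidden degree-(l-1) polynomial at zero, which is exactly the step the signature schemes below perform in the exponent.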

The digital signature is a major research topic in modern cryptography and computer security; the signer must take full responsibility for his digital signatures. In 1991, Desmedt and Frankel (1991) combined digital signatures and threshold secret sharing schemes to propose the concept of the threshold signature. As in threshold secret sharing schemes, in a threshold signature scheme the responsibility for signing a document is shared by a group of signers. A threshold signature scheme is designed so that the signature can be created only when the number of cooperating players reaches the given threshold value. More precisely, a typical (l, n) threshold signature scheme satisfies three basic properties:

- Any l or more players in the group can cooperate with each other to generate a valid group signature, without revealing any information about their sub-secret keys or the group secret key.
- Any l - 1 or fewer players in the group cannot create a valid group signature.
- Any verifier can verify the group signature knowing only the group public key.

However, Li et al. (1994) pointed out that most of the (l, n) threshold digital signature schemes proposed so far suffer from the so-called conspiracy attack: any l or more players can cooperate to impersonate any other set of players and forge a signature. To prevent this attack, they added a random number to each shadow to form the sub-secret key held by each player. The additional random number gives the (l, n) threshold signature scheme the property of traceability, which means that adversarial signers can be traced if forgery is suspected. Unfortunately, Michels and Horster (1996) showed that in Li et al.'s (1994) scheme the signer cannot be sure who his cosigners are, and this weakness violates the traceability property.

Corresponding to multi-secret sharing schemes, there are threshold signature schemes with multiple group secret keys. In this kind of scheme, different secret keys can be used to sign documents depending on the significance of the documents. Once the number of cooperating users is greater than or equal to the threshold value of a group secret key, they can cooperate to sign the document. In 2001, Lee proposed an efficient threshold signature scheme with multiple signing policies. However, in Lee's scheme there are n group secret keys (S_0, S_1, ..., S_{n-1}), and if the group secret key S_0 is exposed, the whole scheme is broken. Furthermore, the partial signatures cannot be verified.

In this paper, based on Li et al.'s scheme and Lee's scheme, we present a traceable threshold signature scheme with multiple signing policies. The proposed scheme allows the players to apply different group secret keys to sign documents, and each player needs to keep only two sub-secret keys. Furthermore, in the proposed scheme we can trace back to find the signers without revealing the secret keys. In addition, the exposure of any group secret key cannot harm the security of the other unexposed group secret keys.

The rest of this paper is organized as follows. In the next section, we first review Li et al.'s scheme and Lee's scheme. Then we propose our scheme and discuss its security. Finally, conclusions are drawn.

Review of Li et al.'s scheme and Lee's scheme

In this section, we briefly describe Li et al.'s scheme (1994) and Lee's scheme (2001).

Li et al.’s scheme

Li et al. (1994) proposed two (l, n) threshold signature schemes with traceable players: the first one needs a mutually trusted dealer, while the second one does not. In this section we only review their first scheme, which needs a mutually trusted dealer (Michels and Horster, 1996).

The dealer picks two large primes p, q with q | p - 1, a generator g of GF(p) of order q, and a polynomial

f(x) = \sum_{i=0}^{l-1} a_i x^i mod q, with a_i \in Z_q^* for i = 0, 1, ..., l - 1.

Then the dealer determines x = f(0) = a_0 as the group secret key and computes y = g^x mod p as the group public key. The secret share of each player P_i (1 <= i <= n) with identity ID_i is u_i = b_i + f(ID_i) mod q, using a random value b_i \in Z_q^*, and the corresponding public keys are y_i = g^{u_i} mod p and z_i = g^{b_i} mod p.

If a group B with |B| = t of players would like to generate a signature on a message m, each player P_i (i \in B) picks k_i \in Z_q^* and broadcasts r_i = g^{k_i} mod p. Once all r_i are available, each player and the designated combiner (DC) compute

R = \prod_{i \in B} r_i mod p and E = H(m, R) mod q.

Then each player P_i (i \in B) computes

s_i = u_i d_i + k_i E mod q,

where d_i = \prod_{j \in B, j \neq i} (0 - ID_j)/(ID_i - ID_j) mod q. Each player P_i (i \in B) sends the values m and s_i to the DC, who can verify s_i by the following equation:

g^{s_i} = y_i^{d_i} r_i^E mod p.

Then the DC computes the group signature by

S = \sum_{i \in B} s_i mod q.

(m, B, R, S) is the group signature of the signers in B on message m. This signature can be checked by computing

T = \prod_{i \in B} z_i^{d_i} mod p and E = H(m, R) mod q,

and checking whether the equation

g^S = y T R^E mod p

holds.
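To see the algebra working end to end, the following toy run instantiates the scheme just described with artificially small, insecure parameters (p = 23, q = 11, l = 2, n = 3) and SHA-256 standing in for the unspecified hash H; all concrete numbers are illustrative choices of ours, not values from the paper.

```python
import hashlib

# Toy run of Li et al.'s first (l, n) threshold signature with l = 2, n = 3.
p, q, g = 23, 11, 2            # q | p - 1; g has order q in GF(p)

def H(m, R):                   # stand-in for H(m, R) mod q
    return int.from_bytes(hashlib.sha256(f"{m}|{R}".encode()).digest(), "big") % q

a0, a1 = 7, 3                  # f(x) = a0 + a1*x mod q; group secret key x = a0
y = pow(g, a0, p)              # group public key

ids = [1, 2, 3]                # player identities ID_i
b   = [4, 9, 2]                # dealer's random values b_i
u   = [(bi + (a0 + a1 * i) % q) % q for bi, i in zip(b, ids)]  # sub-secret keys u_i
yi  = [pow(g, ui, p) for ui in u]        # public y_i = g^{u_i} mod p
z   = [pow(g, bi, p) for bi in b]        # public z_i = g^{b_i} mod p

B = [0, 1]                     # indices of the t = 2 cooperating players

def d(i):                      # Lagrange coefficient d_i mod q
    out = 1
    for j in B:
        if j != i:
            out = out * (-ids[j]) * pow(ids[i] - ids[j], -1, q) % q
    return out

k = {0: 5, 1: 8}               # per-signature nonces k_i
r = {i: pow(g, k[i], p) for i in B}
R = 1
for i in B:
    R = R * r[i] % p
m = "release funds"
E = H(m, R)

s = {i: (u[i] * d(i) + k[i] * E) % q for i in B}   # partial signatures
for i in B:                    # DC verifies each partial signature
    assert pow(g, s[i], p) == pow(yi[i], d(i), p) * pow(r[i], E, p) % p

S = sum(s.values()) % q        # group signature is (m, B, R, S)
T = 1
for i in B:
    T = T * pow(z[i], d(i), p) % p
assert pow(g, S, p) == y * T * pow(R, E, p) % p    # g^S = y T R^E mod p
```

The final assertion is exactly the verification equation above: summing the s_i interpolates f at zero in the exponent, which contributes the y factor, while the b_i contribute T and the nonces contribute R^E.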

Lee’s scheme

Now we review the scheme by Lee (2001), in which a trusted dealer is assumed too. The dealer picks three numbers N, L and a, where N = pq with p = 2p' + 1 and q = 2q' + 1, p, q, p' and q' are large primes; L is a random number with gcd(L, φ(N)) = 1, where φ(N) = 2p'q'; and a is primitive in both GF(p) and GF(q). Also, the dealer picks a random number d with gcd(d, φ(N)) = 1 and a random polynomial f(x) of degree n - 1 with f(0) = d mod φ(N). N, L and a are published, while p, q, p', q', φ(N) and f(x) are kept secret. Then the dealer computes the n group secret keys (S_i, i = 0, 1, ..., n - 1) and the corresponding n group public keys (Y_i, i = 0, 1, ..., n - 1) as follows:

S_i = a^{d L^i} mod N
Y_i = a^{-d L^{n+i}} mod N

The threshold value of S_i is set to be n - i. The secret share of each player P_i (i = n, ..., 2n - 1) is K_i, where

K_i = a^{s_i} mod N, with s_i = [f(x_i)/2] / [(\prod_{j \in D, j \neq i} (x_i - x_j))/2] mod p'q',

x_i is a public odd integer, and f(x_i) is a secret even integer. Also, the dealer publishes an odd integer x_i with an even f(x_i), and a public shadow K_i = a^{L^i s_i} mod N (i = 1, 2, ..., n - 1). Let A (|A| = n) be the set of all players' public values x_i in the group, C (|C| = n - 1) be the set of all public shadows' public values x_i, and D be the union of A and C.

Let B_l (|B_l| = l) denote the set of the l players' x_i values, B_a (|B_a| = n - l) denote the set of the public values (x_1, ..., x_{n-l}), and B be the union of B_l and B_a. If a group B_l of l players would like to generate a signature on a message m with threshold value l, each player P_i (i \in B_l) picks r_i \in (1, ..., N - 1) and broadcasts u_i = r_i^{L^n} mod N. Once all u_i are available, each player and the DC compute

U = \prod_{i \in B_l} u_i mod N = R^{L^n} mod N
e = H(m, U)

where R = \prod_{i \in B_l} r_i mod N. Then each player P_i (i \in B_l) computes

z_i = r_i K_i^{L^{n-l} d_i} mod N

where d_i = \prod_{j \in D, j \notin B} (x_i - x_j) \prod_{j \in B, j \neq i} (0 - x_j) \cdot e. Each player P_i (i \in B_l) sends the values m and z_i to the DC, who can compute the group signature by

Z = \prod_{i \in B_l} z_i \prod_{i \in B_a} W_i mod N = R a^{L^{n-l} d e} mod N

where W_i = K_i^{L^{n-l-i} d_i} mod N. (m, e, Z) is the group signature of the signers in B_l on message m. This signature can be checked by computing

U = Z^{L^n} Y_{n-l}^e mod N

and checking whether the equation

e = H(m, U)

holds.
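The verification step rests on the identity Z^{L^n} Y_{n-l}^e = R^{L^n} = U mod N. The snippet below checks this numerically using the closed forms Z = R a^{L^{n-l} d e}, Y_{n-l} = a^{-d L^{2n-l}} and U = R^{L^n}; the toy modulus N = 23 * 47 and all other numbers are our own illustrative (and completely insecure) choices.

```python
# Numeric check of Lee's verification identity U = Z^{L^n} * Y_{n-l}^e mod N.
N = 23 * 47                    # p = 2*11 + 1, q = 2*23 + 1, so phi(N) = 2*11*23
a, L, d = 5, 3, 9              # gcd(L, phi(N)) = gcd(d, phi(N)) = 1
n, l, e = 4, 2, 7              # group size, threshold, challenge
R = 100                        # stands in for prod r_i mod N

Z = R * pow(a, pow(L, n - l) * d * e, N) % N   # closed form of the group signature
Y = pow(a, -d * pow(L, n + (n - l)), N)        # public key Y_{n-l}
U = pow(R, pow(L, n), N)                       # U = prod u_i = R^{L^n}

# Z^{L^n} * Y^e = R^{L^n} * a^{d e L^{2n-l}} * a^{-d e L^{2n-l}} = U mod N
assert U == pow(Z, pow(L, n), N) * pow(Y, e, N) % N
```

The exponents of a cancel exactly because Y_{n-l} was constructed with the opposite sign of d, which is why a correct Z leaves only R^{L^n}.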

The proposed scheme

In this section, we present our scheme, which is based on the schemes of Li et al. and Lee. In the proposed scheme, a trusted dealer is also assumed. We divide the proposed scheme into three phases: the initialization phase, the partial signature generation/verification phase, and the group signature generation/verification phase. The proposed scheme can thus be stated as follows.

Initialization phase

Firstly, the dealer selects the following parameters:

(1) two large primes c, c', with c' | c - 1;
(2) a generator g of GF(c) of order c';
(3) a number N = pq (p = 2p' + 1 and q = 2q' + 1), where p, q, p' and q' are large primes, and defines φ(N) = 2p'q';
(4) three random numbers L, L' and d, where gcd(L, φ(N)) = 1, gcd(L', φ(N)) = 1 and gcd(d, φ(N)) = 1;
(5) two numbers a and b, where a and b are both primitive in both GF(p) and GF(q);
(6) two collision-free hash functions H_1 and H_2;
(7) a polynomial f(x) of degree n - 1, where f(0) = d mod φ(N).

Thus, the dealer publishes {c, c', g, N, L, L', a, b, H_1, H_2} as the group public parameters and keeps {p, q, p', q', φ(N), d, f(x)} from being revealed. Let A (|A| = n) be the set of all players' public values x_i in the group, C (|C| = n - 1) be the set of all public shadows' public values x_i, and D be the union of A and C.

Then the dealer computes the n group secret keys (S_i, i = 0, ..., n - 1) and the corresponding n group public keys (Y_i, i = 0, ..., n - 1) as follows:

S_i = a^{d L^i L'^{n-i}} mod N   (1)
Y_i = a^{-d L^{n+i} L'^{n-i}} mod N   (2)

The threshold value of S_i is set to be n - i; e.g., the threshold value of S_0 is n and the threshold value of S_{n-1} is 1. The dealer sends (s_i, b_i) to player P_i, and publishes y_i (y_i = b^{s_i} mod N) and v_i (v_i = g^{b_i} mod c) for i = n, ..., 2n - 1, where

s_i = [f(x_i)/2] / [(\prod_{j \in D, j \neq i} (x_i - x_j))/2] mod p'q',

x_i is a public odd integer, f(x_i) is a secret even integer, and b_i is a random integer.

Also, the dealer publishes an odd integer x_i with an even f(x_i), and a public shadow K_i = a^{L^i s_i} mod N (i = 1, ..., n - 1).

Partial signature generation/verification phase

Assume that a message m is required to be signed by the cooperation of l players (l can be any integer from 1 to n). Let B_l (|B_l| = l) denote the set of the l players' x_i values, B_a (|B_a| = n - l) denote the set of the public values (x_1, ..., x_{n-l}), and B be the union of B_l and B_a.

To generate the signature for message m, each player P_i (i \in B_l) picks r_i \in (1, N - 1) and broadcasts u_i = g^{r_i} mod c. Once all u_i are available, each player computes

U = \prod_{i \in B_l} u_i mod c   (3)
e = H_1(m, U, B_l)   (4)
g_i = a^{r_i} mod N   (5)
g'_i = b^{r_i} mod N   (6)
z_i = a^{s_i e} mod N   (7)
λ_i = H_2(a, b, z_i, y_i^e, g_i, g'_i)   (8)
w_i = r_i + λ_i s_i e   (9)
t_i = b_i + e r_i mod c'   (10)

Also, the DC computes U and e as in Eqs. (3) and (4), respectively. Then each player P_i sends his partial signature (m, λ_i, w_i, z_i, t_i) to the DC, who can check the validity of the partial signature by

λ_i = H_2(a, b, z_i, y_i^e, a^{w_i}/z_i^{λ_i}, b^{w_i}/y_i^{λ_i e})   (11)
g^{t_i} = v_i u_i^e mod c   (12)
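Eq. (11) works because a^{w_i} = a^{r_i + λ_i s_i e} = g_i z_i^{λ_i} and b^{w_i} = g'_i y_i^{λ_i e}, so the DC can recompute the commitments g_i and g'_i from public values. The toy check below uses N = 23 * 47 and arbitrary small stand-in values of our own choosing (lam plays the role of the H_2 output, whose actual value is irrelevant to the identity):

```python
# Toy check of the algebra behind the partial-signature test in Eq. (11).
N = 23 * 47
a, b = 5, 7                    # both coprime to N
s, r, e = 123, 77, 19          # sub-secret key s_i, nonce r_i, challenge e
z   = pow(a, s * e, N)         # Eq. (7)
y   = pow(b, s, N)             # player's public key y_i = b^{s_i} mod N
gi  = pow(a, r, N)             # Eq. (5)
gpi = pow(b, r, N)             # Eq. (6)
lam = 5                        # stand-in for lambda_i = H2(...)
w   = r + lam * s * e          # Eq. (9), computed over the integers

assert pow(a, w, N) == gi * pow(z, lam, N) % N       # recovers g_i in Eq. (11)
assert pow(b, w, N) == gpi * pow(y, lam * e, N) % N  # recovers g'_i in Eq. (11)
```

Because w_i is computed over the integers, no party ever needs the secret order of the group, yet the two exponent identities hold exactly.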

Group signature generation/verification phase

Once these l partial signatures are verified, the DC can compute the group signature by

Z = (\prod_{i \in B_l} z_i^{d_i L^{n-l}} \prod_{i \in B_a} W_i)^{L'^l} mod N = a^{L^{n-l} L'^l d e} mod N   (13)
T = \sum_{i \in B_l} t_i mod c'   (14)

where W_i = K_i^{L^{n-l-i} d_i e} mod N, and d_i = \prod_{j \in D, j \notin B} (x_i - x_j) \prod_{j \in B, j \neq i} (0 - x_j). (m, Z, T, U, B_l) is the group signature of the signers in B_l on message m. This signature can be checked by computing

e = H_1(m, U, B_l)

and checking whether the equations

1 = Z^{L^n} Y_{n-l}^e mod N   (15)
g^T = \prod_{i \in B_l} v_i \cdot U^e mod c   (16)

hold.

Theorem 1. If 1 = Z^{L^n} Y_{n-l}^e mod N and g^T = \prod_{i \in B_l} v_i \cdot U^e mod c hold, then (m, Z, T, U, B_l) is the valid group signature of m with threshold value l.

Proof 1. Since

z_i = a^{s_i e} mod N

for i \in B_l, and

W_i = K_i^{L^{n-l-i} d_i e} mod N = a^{L^i s_i L^{n-l-i} d_i e} mod N = a^{L^{n-l} s_i d_i e} mod N

for i \in B_a, we have

Z = (\prod_{i \in B_l} z_i^{d_i L^{n-l}} \prod_{i \in B_a} W_i)^{L'^l} mod N
  = (\prod_{i \in B_l} a^{s_i e d_i L^{n-l}} \prod_{i \in B_a} a^{s_i e d_i L^{n-l}})^{L'^l} mod N
  = \prod_{i \in B} a^{s_i e d_i L^{n-l} L'^l} mod N.

Also, we have

s_i = [f(x_i)/2] / [(\prod_{j \in D, j \neq i} (x_i - x_j))/2] mod p'q'
d_i = \prod_{j \in D, j \notin B} (x_i - x_j) \prod_{j \in B, j \neq i} (0 - x_j).

By the threshold secret sharing scheme (Shamir, 1979), the unique (n - 1)th-degree polynomial f(x) can be determined with knowledge of n pairs (x_i, f(x_i)); thus

Z = \prod_{i \in B} a^{s_i e d_i L^{n-l} L'^l} mod N = a^{d e L^{n-l} L'^l} mod N.

Consequently,

Z^{L^n} Y_{n-l}^e = (a^{d e L^{n-l} L'^l})^{L^n} (a^{-d L^{2n-l} L'^l})^e mod N
                 = a^{d e L^{2n-l} L'^l} a^{-d e L^{2n-l} L'^l} mod N = 1 mod N.

On the other hand, since

g^{t_i} = g^{b_i + e r_i} mod c = g^{b_i} (g^{r_i})^e mod c = v_i u_i^e mod c,

multiplying g^{t_i} over all i \in B_l gives

\prod_{i \in B_l} g^{t_i} = \prod_{i \in B_l} v_i u_i^e mod c = \prod_{i \in B_l} v_i \prod_{i \in B_l} u_i^e mod c = \prod_{i \in B_l} v_i \cdot U^e mod c.

Since T = \sum_{i \in B_l} t_i mod c', we have

g^T = \prod_{i \in B_l} v_i \cdot U^e mod c.

This completes the proof.

Security discussions

According to Theorem 1, any subset B_l of l players can generate a valid group signature with threshold value l. The group signature can also be verified easily by any verifier.

Although any player can combine the n - 1 public shadows to retrieve the group secret key a^{d L^{n-1} L'} mod N, this group secret key can only be used for generating group signatures with threshold value 1. Moreover, the exposure of any group secret key a^{d L^i L'^{n-i}} mod N cannot harm the security of the other unexposed group secret keys, unless the adversary can find L^{-1} mod φ(N) or L'^{-1} mod φ(N); however, that is as difficult as factoring N.
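The reduction to factoring is the standard one: anyone who knows both L and L^{-1} mod φ(N) knows L * L^{-1} - 1, a nonzero multiple of φ(N), and a multiple of φ(N) suffices to split N (the same argument used for recovering the RSA primes from d and e). The sketch below demonstrates this on the toy modulus N = 23 * 47; the routine and all numbers are our own illustration, not part of the paper.

```python
import math
import random

# Split N given a nonzero multiple M of a universal exponent of N
# (here M = L * L^{-1} - 1, a multiple of phi(N) = 2*p'*q').
N, phi = 23 * 47, 2 * 11 * 23
L = 3                                  # gcd(L, phi) = 1
Linv = pow(L, -1, phi)

def factor_from_phi_multiple(N, M):
    t = M
    while t % 2 == 0:                  # write M = 2^s * t with t odd
        t //= 2
    while True:
        a = random.randrange(2, N - 1)
        g = math.gcd(a, N)
        if g > 1:
            return g                   # lucky hit: a shares a factor with N
        x = pow(a, t, N)
        while x != 1:                  # square until we reach 1
            prev, x = x, pow(x, 2, N)
            if x == 1 and prev != N - 1:
                return math.gcd(prev - 1, N)   # nontrivial sqrt of 1 splits N

p = factor_from_phi_multiple(N, L * Linv - 1)
assert p in (23, 47) and N % p == 0
```

Each random base succeeds with probability at least 1/2, so the loop terminates after a couple of iterations on average.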

The adversary cannot get any useful information about s_i from Eqs. (5)-(9), because they form a well-known non-interactive protocol due to Shoup (2000). Moreover, we directly adopt the method proposed by Michels and Horster (1996); thus, our scheme can withstand the attack presented by them.

Furthermore, we can check the security of our scheme by replying to the questions given by Li et al. (1994) and Lee (2001). We omit the detailed analysis here because it is very similar to that presented earlier (Li et al., 1994; Lee, 2001); the reader may refer to the above-mentioned works for more detailed information.

Conclusions

Based on the schemes of Li et al. and Lee, we have devised a traceable threshold signature scheme with multiple signing policies. In the proposed scheme, the exposure of any group secret key cannot harm the security of the other unexposed group secret keys. Moreover, our scheme has the traceability property.

Acknowledgements

This research is supported by the National Natural Science Foundation of China for Distinguished Young Scholars under Grant No. 60225007, the National Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20020248024, and the Science and Technology Research Project of Shanghai under Grant Nos. 04JC14055 and 04DZ07067.

References

Blakley GR. Safeguarding cryptographic keys. In: Proceedings of the AFIPS National Computer Conference, vol. 48, Arlington, VA, June 1979. p. 313-7.

Blundo C, Santis AD, Crescenzo GD, Gaggia AG, Vaccaro U. Multi-secret sharing schemes. In: Desmedt YG, editor. Advances in Cryptology - Crypto'94 Proceedings. LNCS 839. Berlin: Springer-Verlag; 1994. p. 150-63.

Desmedt Y, Frankel Y. Shared generation of authenticators and signatures. In: Advances in Cryptology - Crypto'91 Proceedings; 1991. p. 457-69.

Lee NY. Threshold signature scheme with multiple signing policies. IEE Proc Comput Digit Tech March 2001;148(2):95-9.

Li C, Hwang T, Lee N. Threshold-multisignature schemes where suspected forgery implies traceability of adversarial shareholders. In: Advances in Cryptology - Proceedings of EUROCRYPT 94; 1994. p. 413-9.

Michels M, Horster P. On the risk of disruption in several multiparty signature schemes. In: Advances in Cryptology - Proceedings of Asiacrypt 96; 1996. p. 334-45.

Shamir A. How to share a secret. Commun ACM 1979;22(11):612-3.

Shoup V. Practical threshold signatures. In: Preneel B, editor. EUROCRYPT 2000. LNCS 1807; 2000. p. 207-20.

Simmons GJ. An introduction to shared secret and/or shared control schemes and their application. In: Contemporary cryptology. IEEE Press; 1991. p. 441-97.

Jun Shao received his B.S. degree in Computer Science from Northwestern Polytechnical University in 2003. Currently, he is a doctoral candidate in the Department of Computer Science and Engineering, Shanghai Jiao Tong University. His research interests lie in cryptography and network security.

Zhenfu Cao is a professor and doctoral supervisor in the Department of Computer Science and Engineering, Shanghai Jiao Tong University. His main research areas are number theory, modern cryptography, and information security. He is the recipient of the Youth Award and Research Fund of the Chinese Academy of Sciences (1986), the first prize Award for Science and Technology in Chinese Universities (2001), and the National Outstanding Youth Fund of China (2002), among others.


Computers & Security (2006) 25, 207-212

www.elsevier.com/locate/cose

Security implications in RFID and authentication processing framework

John Ayoade*

Security Advancement Group, National Institute of Information and Communications Technology, Japan

Received 17 November 2004; accepted 15 November 2005

KEYWORDS: RFID; Access control; Authentication; Security; APF

Abstract: The objective of this paper is to propose an idea called APF (Authentication Processing Framework) as one way to address the growing concern that unauthorized readers may access the tag (transponder), which could result in violations of the information stored in the tag. On the one hand, we discuss the importance of RFID systems; on the other hand, we discuss the security implications that RFID systems have for consumers' privacy and security. In this paper, we weigh these two issues, the importance of RFID systems and their security implications. Having done that, we recommend our idea, called APF (Authentication Processing Framework), as a good method to overcome the above-mentioned problem.
© 2005 Elsevier Ltd. All rights reserved.

Introduction

A typical RFID system consists of a tag, a reader, an antenna and a host system. Most RFID tags are passive, meaning that they are battery-less and obtain the power to operate from the reader; some tags are battery powered, meaning they are active and do not need power from the reader to function. RFID tags are tiny computer chips connected to miniature antennae that can be affixed to physical objects

* 101 Domiru-Tsuda, 3-25-41 Tsudamachi, Kodaira-shi, Tokyo, Japan. Tel./fax: +81 423 43 4403.

E-mail addresses: [email protected], [email protected]

0167-4048/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.cose.2005.11.008

(Berthon, 2000). In the most commonly touted applications of RFID, the microchip contains an Electronic Product Code (EPC) with sufficient capacity to provide unique identifiers for all items produced worldwide. When an RFID reader emits a radio signal, tags in the vicinity respond by transmitting their stored data to the reader.

With passive (battery-less) RFID tags, read range can vary from less than an inch to 20-30 feet, while active (self-powered) tags can have a much longer read range.

Typically, the data are sent to a distributed computing system involved in, perhaps, supply chain management or inventory control (Spychips, 2003).


The RFID system has many beneficial uses, as it can be applied to many areas of our day-to-day activities. It supports many versatile applications, including entrance gate control at transport facilities, custody control, and so on. However, the major barrier that the RFID system currently faces is the possibility of privacy violation, which could result from illegal access.

Since RFID tags respond automatically to any reader, transmitting without the knowledge of the bearer, this property can be used to track a specific user or object over wide areas. While expectations are growing for the use of RFID systems in various fields, opposition to their use without the knowledge of the user is increasing (CASPIAN).

Furthermore, if personal identity were linked with unique RFID tag numbers, individuals could be profiled and tracked without their knowledge or consent. For example, a tag embedded in a shoe could serve as a de facto identifier for the person wearing it. Even if item-level information remains generic, identifying items people wear or carry could associate them with, for example, particular events like political rallies (Spychips, 2003).

Our main goal is to find a solution to the privacy problem of illegal reader access to tags in the RFID system.

RFID has been around for many years now. The first notable application was identifying aircraft as friend or foe. Since then, RFID has been deployed in a number of applications, such as identifying and tracking animals via implanted tags; tracking transport containers; access control systems; keyless entry systems for vehicles; and automatic collection of road tolls (Allan, 2003).

Many other RFID applications may emerge. Consider an airport setting. Both boarding passes and luggage labels could be tagged with RFID devices. Before take-off, an RFID-enabled airplane could verify that all issued boarding passes were on the plane and that all luggage associated with those passes was in the hold. Within an airport, tracking passengers by their boarding passes could improve both security and customer service. Of course, in other environments this would be an undesirable violation of privacy (Weis, 2003).

Regarding consumers’ privacy violation, we canrefer to the above example. Since many airlinesare in the airport with different workers, therecould be malicious workers working for differentairlines with ulterior motives to violate consumers’privacy. There is a tendency that the maliciousworkers would be accessing and monitoring theprivate information of consumers.

Therefore, a preventive method should be put in place to deter the violation of consumers' privacy.

Importance and implications of RFID systems

The problem we are dealing with in this paper is privacy in RFID systems: in an RFID system, any reader can read from and write to any tag within its vicinity, so any item a tag is attached to is susceptible to tracking or monitoring.

This is explained in Ohkubo et al. (2004) as a leakage of information regarding the use of belongings, for example money and expensive products, medicine (which may indicate a particular disease), and books (which mirror personal consciousness or avocation). If such items are tagged, various types of personal information can be acquired without the knowledge of the user (Ohkubo et al., 2004).

In this paper, we propose APF, which stands for Authentication Processing Framework. We discuss later in the paper how the APF circumvents the problem described above.

Importance of RFID systems

RFID systems are now used for a variety of industrial and consumer applications, including access control, asset management, and warehouse automation (UBICOM).

Electronic toll collection and road pricing are a typical use of active and semi-active tags. Automobiles are equipped with an active tag that can be read as the car moves through a toll booth or drives along the road. Each tag has a unique serial number; a database correlates the serial number with an account number that is automatically debited each time the tag is read (E-ZPASS).

Security implications in RFID

Personal data protection
While recognizing the benefits of RFID, business also has to consider fully the implications for personal data protection and security. Citizens have already voiced concerns about the ability of RFID to track them personally, to gather information about their purchasing habits, and to compromise their personal security.


System reliability
System reliability has been identified as a key element for the future deployment of RFID. The most pressing concern is the possibility that data from tags could be compromised or altered by an unauthorized source.

Consumer education
A core part of any future dialogue on RFID will be consumer education. Consumers should be provided with accurate information to enable them to participate fully in discussions regarding RFID technology, usage and management, and to understand any possible benefits from the use of RFID (Nakamura).

Pros and cons of previous work

Several researchers have worked on the privacy problem in RFID systems. We discuss some of their ideas and approaches below.

a. Kill command idea e The standard mode of op-eration proposed by the AutoID Center is in-deed for tags to be killed upon purchase ofthe tagged product. With their proposed tagdesign, a tag can be killed by sending it a spe-cial ‘‘kill’’ command. However, there are manyenvironments, in which simple measures like‘‘kill command’’ are unworkable or undesir-able for privacy enforcement. For example,consumers may wish RFID tags to remain oper-ative while in their possession.

b. Faraday cage approach e An RFID tag may beshielded from scrutiny using what is known asa Faraday cage e a container made of metalmesh or foil which is impenetrable by radio sig-nals (of certain frequencies). There have beenreports that some thieves have been using foil-lined bags in retail shops to prevent shoplift-ing-detection mechanisms (Liu et al., 2004).

c. The active jamming approach – Active jamming is a physical means of shielding tags from view. In this approach, the user employs a radio frequency device that actively sends radio signals so as to block the operation of any nearby RFID readers. However, this approach could be illegal: if the broadcast power is too high, it could disrupt all nearby RFID systems. Beyond that, it could be dangerous and cause problems in restricted areas such as hospitals and trains.

d. The blocker tag approach – A blocker tag replies with simulated signals when queried by a reader, so that the reader cannot trust the received signals. Like active jamming, it may affect other legitimate tags (Ari et al.).

Importance of proper security and access control in RFID systems

In this paper, our objective is to find a solution to the pressing concern of data from tags being compromised or altered by an unauthorized source.

We propose an authentication framework called the APF (Authentication Processing Framework). This framework makes it compulsory for readers to authenticate themselves with the APF database before they can access registered tags.

In order to prevent illegal access to the memory segment of a tag, there should be procedural access control to the tag's memory segment.

From Fig. 1, each tag will register its unique ID and the access key to its memory segment with the APF. This means both the unique access key and the data in the tag will be encrypted, and the access key will be registered with the APF. This is necessary to protect the tag from unscrupulous readers with ulterior intentions. Once a tag registers its unique identity and access key with the APF, it will be difficult for any reader to access the memory segment of the tag without possessing the tag's access key. We discuss how an authenticated reader gains access to the memory segment of the tag in the next paragraph.

Furthermore, every reader will register itself with the APF in order to be authenticated before it requests the key to access the data in the tag.

In a nutshell, every reader will register its unique identification number with the APF, and this will be confirmed by the APF before the encrypted key is released to the reader, enabling it to read the encrypted data in the specific tag.

Figure 1 The registration of tags with the APF. (The figure shows tags registering entries of the form ID 1 – Key = 10, ID 2 – Key = 11, …, ID N – Key = *** with the Authentication Processing Framework.)

From Fig. 2, every reader registers its unique identification number with the APF. Since both readers and tags register their identification numbers with the APF, this serves as mutual authentication and protects tags from malicious readers, which is one of the concerns users have about the full realization of RFID systems. This means that unauthorized access to a tag will be eradicated if the APF framework is implemented and used. In the next paragraph we discuss the registration and access control of readers with the APF.

In the previous paragraphs we discussed the registration of the tag memory segment's unique identity and access key with the APF. We also discussed the registration of readers with the APF prior to accessing tags. When the reader sends a read ''command'' to the tag, the tag replies with its identification number and the encrypted data; this means that only a reader registered with the APF will be able to get the decryption key to access the encrypted data. Once the key is received, the data in the tag become readable (Fig. 3).

Figure 2 The registration of readers with the APF. (The figure shows readers registering entries of the form ID R1 = 110, ID R2 = 111, …, ID RN = *** with the Authentication Processing Framework.)

In this framework, there are two important processes. The first is that mutual authentication is carried out by the APF, because it authenticates both the reader and the tag.

Secondly, privacy is guaranteed because the data stored in the tag are protected from malicious readers: the information the reader gets from the tag is encrypted, and it can only be read after the decryption key is received from the APF.

The flowchart of the APF framework

The flowchart of the APF framework is given in Fig. 4.

The pseudo code of the APF framework

1. Tags register decryption keys with the APF.
2. Readers register their unique identification numbers with the APF.
3. A reader issues a command to access the tag.
4. The tag responds by releasing the encrypted data.
5. The reader requests the decryption key.
6. The reader decrypts the encrypted data.
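The six-step flow above can be made concrete with a short sketch. This is an illustrative toy, not the paper's implementation: the class names, the in-memory dictionaries standing in for the APF database, and the XOR ''cipher'' are all assumptions made for demonstration only.

```python
# Toy sketch of the APF flow (steps 1-6 above). All names are hypothetical.

def xor_crypt(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a repeating key (demo only)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

class APF:
    """Stand-in for the APF database."""
    def __init__(self):
        self.tag_keys = {}    # tag id -> decryption key   (step 1)
        self.readers = set()  # registered reader ids      (step 2)

    def register_tag(self, tag_id, key):
        self.tag_keys[tag_id] = key

    def register_reader(self, reader_id):
        self.readers.add(reader_id)

    def request_key(self, reader_id, tag_id):
        # Step 5: only a registered (authenticated) reader gets the key.
        if reader_id not in self.readers:
            raise PermissionError("reader not registered with the APF")
        return self.tag_keys[tag_id]

class Tag:
    """A tag stores only encrypted data and releases it on query."""
    def __init__(self, tag_id, key, plaintext):
        self.tag_id = tag_id
        self.ciphertext = xor_crypt(plaintext, key)

    def respond(self):
        # Step 4: reply with the tag id and the encrypted data only.
        return self.tag_id, self.ciphertext

apf = APF()
key = b"k3y"
apf.register_tag("tag1", key)              # step 1
apf.register_reader("reader1")             # step 2
tag = Tag("tag1", key, b"patient record")
tag_id, blob = tag.respond()               # steps 3-4
k = apf.request_key("reader1", tag_id)     # step 5
print(xor_crypt(blob, k))                  # step 6 -> b'patient record'
```

A reader that skipped registration receives no key: `apf.request_key("rogue", "tag1")` raises `PermissionError`, which is the access-denial behaviour the framework relies on.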

Merits and demerits of the APF

The APF provides assurance to RFID users that the information stored in the tag is secured, in the sense that only a reader authenticated by the APF can access the tag. The reason is that the information received by the reader from the tag is encrypted, and this information can only be decrypted by getting the decryption key from the APF. A reader that did not register with the APF before getting the information from the tag will be denied the decryption key; this is in order to prevent malicious readers from accessing tags illegally. The only disadvantage of this framework is that it is designed for read-only RFID systems.

Figure 3 The registration/access control of readers to the APF/tag. O means access granted; X means access denied. (The figure shows readers and tags going through the registration process with the Authentication Processing Framework, a challenge/response exchange between readers and tags holding encrypted data, and an access matrix of Reader ID1…IDN against Tag ID1…IDN.)

This procedure cannot handle the case of a reader writing into the tag. The reason is that after the reader gets the decryption key from the APF, the encrypted information is decrypted automatically, so there is no need for the reader to go back to the tag for any further process.

Validity and effectiveness of the APF

The validity and effectiveness of the APF are summarized in Fig. 5.

The importance of the APF

i. It prevents malicious readers from reading the information in the tags.

ii. It honors consumers' wishes for RFID tags to remain operative while in their possession.

iii. It authenticates both tags and readers; that is, it deploys mutual authentication.

Real world application of the APF system

We now consider a particular area in which the APF system can be applied in a real-world setting: patients' confidential and personal information. Take, for example, a patient who has an RFID tag attached to his hospital card at a particular hospital. His doctor diagnoses his ailment and prescribes some drugs, and all this information, together with other personal information, is stored in the tag of the patient's card. In this case, the patient needs some level of confidentiality for the information stored in this tag, and he wants only his doctor, or a specific doctor, to know about it. However, since the current RFID system allows any reader to access any tag, the patient's private information stored in the tag could be jeopardized.

Figure 4 The flowchart of the APF framework. (The flowchart shows the numbered steps 1–6: tags register decryption keys with the APF; readers register with the APF; the reader issues a command to access the tag (''challenge''); the tag returns the encrypted data (''response''); the reader requests the access key from the APF database; access is granted.)

Moreover, if the APF system is used, the patient's doctor's reader will first be authenticated by the APF, which means that not just any reader can access the patient's tag. Malicious readers will therefore be denied access to the stored information, since in the APF system any reader that did not register with the APF prior to accessing the tag will be denied the key that unlocks the tag's information. If the patient wants to change his hospital and doctor, it is simply a matter of informing the APF of his intention: the new doctor's and hospital's reader must register with the APF before it can access the patient's tag, and the former doctor's and hospital's reader will henceforth be denied access to the patient's tag.

Looking at this example, we can see the importance and contribution of the APF system and the level of user confidence that it can guarantee in RFID systems.

Conclusion

In conclusion, information in tags can be protected from being read by unauthorized readers through the authentication procedures of the APF system described above. It is imperative to prevent unauthorized access to the tag in order to prevent violation of the privacy and confidentiality of the information stored in it. Moreover, the framework provides mutual authentication, which makes it a system able to prevent unauthorized or malicious readers from accessing the information stored in RFID tags.

Figure 5 The validity and effectiveness of the APF. (The figure compares the approaches: Kill Command – tags killed upon purchase; unworkable or undesirable. Faraday Cage – tag shielded from scrutiny; thieves have used foil-lined bags to defeat shoplifting detection. Active jamming – physical means of shielding the tag from view; illegal and dangerous, especially in restricted areas such as hospitals and trains. Blocker tag – the tag replies with simulated signals when queried so the reader cannot trust the received signals; like active jamming, illegal and dangerous. APF – readers must authenticate with the APF, so ill-intentioned readers cannot decrypt the information collected from the tag.)

References

Adopting fair information practices to low cost RFID systems, <http://www.guir.berkeley.edu/pubs/ubicomp2002/privacyworkshop/papers/UBICOM2002_RFIDv3.doc>.

Allan Alex. RFID and privacy, <http://www.whitegum.com/journal/rfidspch.htm>; November 2003.

Ari Juels, Rivest RL, Szydlo M. The blocker tag: selective blocking of RFID tags for consumer privacy, <http://www.rsasecurity.com/rsalabs/staff/bios/ajuels/publications/blocker/blocker.pdf>; 2003.

Berthon Alain. Security in RFID, <http://www.nepc.sanc.org.sg/html/techReport/N327.doc>; July, 2000.

C.A.S.P.I.A.N., <http://www.nocards.org>.

E-ZPASS Regional Consortium Service Center, <http://www.ezpass.com>.

Liu Dingzhe, Kobara Kazukuni, Imai Hideki. Pretty-simple privacy enhanced RFID and its application. In: (SCIS 2004) The symposium on cryptography and information security, Sendai, Japan; January 2004.

Nakamura Naoshi. Future of the Internet RFID, <http://www.gbde.org/acrobat/rfid03.pdf>.

Ohkubo Miyako, Suzuki Koutarou, Kinoshita Shingo. Hash-chain based forward-secure privacy protection scheme for low-cost RFID. In: (SCIS 2004) The symposium on cryptography and information security, Sendai, Japan; January 2004.

Position statement on the use of RFID on consumer products,<http://www.spychips.org/jointrfid_position_paper.html>;November 2003.

Weis Stephen A. Security and privacy in radio-frequency identification devices, <http://theory.lcs.mit.edu/wSweis/masters.pdf>; May 2003.

Dr. John Ayoade is an expert researcher in the Security Advancement Group of the National Institute of Information and Communications Technology, Tokyo, Japan.

He obtained his Ph.D. degree in Information Systems under a Japanese government scholarship at the Graduate School of Information Systems, University of Electro-Communications, Tokyo, Japan.

Dr. Ayoade's research focuses on information and communications security and privacy. He has wide experience in university teaching, covering lectures and practicals in the principles and practice of telecommunications and network policies, coupled with sound theoretical and practical knowledge of computer science. He has presented and published papers in many conferences and journals.

Dr. Ayoade is happily married to his loving and caring wife Oluwatomi, and they are blessed with a daughter and a son, Opeyemi and Ayodeji, respectively.


Computers & Security (2006) 25, 213–220

www.elsevier.com/locate/cose

Change trend of averaged Hurst parameter of traffic under DDOS flood attacks

Ming Li*

School of Information Science and Technology, East China Normal University, No. 3663, Zhongshan Bei Road, Shanghai 200026, PR China

Received 22 November 2004; revised 15 November 2005; accepted 15 November 2005

KEYWORDS: Hurst parameter; Traffic; Time series; Distributed denial-of-service flood attacks; Anomaly detection

Abstract Distributed denial-of-service (DDOS) flood attacks remain great threats to the Internet even though various approaches and systems have been proposed. Because the arrival traffic pattern under DDOS flood attacks deviates significantly from the pattern of normal traffic (i.e., attack-free traffic) at the protected site, anomaly detection plays a role in the detection of DDOS flood attacks. Hence, quantitatively studying the statistics of traffic under DDOS flood attacks (abnormal traffic for short) is essential to anomaly detection of DDOS flood attacks.

References regarding qualitative descriptions of abnormal traffic are quite rich, but quantitative descriptions of its statistics are seldom seen. Though statistics of normal traffic are abundant, where the Hurst parameter H of traffic plays a key role, how H of traffic varies under DDOS flood attacks is rarely reported. As a supplement to our early work, this paper shows that the averaged H of abnormal traffic usually tends to be significantly smaller than that of normal traffic at the protected site. This abnormality of abnormal traffic is demonstrated with test data provided by MIT Lincoln Laboratory and explained from the viewpoint of Fourier analysis.
© 2005 Elsevier Ltd. All rights reserved.

Introduction

The Internet is the infrastructure that supports computer communications. It has actually become the ''electricity'' of modern society because its use is so pervasive and many people rely on it so heavily. For instance, employees in modern society would rather give up access to their telephone than give up access to their email. Nevertheless, the Internet is subject to electronic attacks (Coulouris et al., 2001), e.g., distributed denial-of-service (DDOS) flood attacks (Sorensen, 2004). The threats of DDOS attacks to individuals are severe: any denial of service of a bank server implies a loss of money and disgruntled or lost customers. Hence, intrusion detection systems (IDS) and intrusion prevention systems (IPS) are desired (Kemmerer and Vigna, 2002; Householder et al., 2002; Schultz, 2004; Sorensen, 2004; Gong, 2003; Li, 2004; Streilein et al., 2003; Bencsath and Vajda, 2004; Feinstein et al., 2003; Oh and Lee, 2003; Liston, 2004).

* Tel.: +86 21 62233389; fax: +86 21 62232517. E-mail addresses: [email protected], ming_lihk@yahoo.com. URL: http://www.ee.ecnu.edu.cn/teachers/mli/js_lm(Eng).htm.

0167-4048/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.cose.2005.11.007

There are several categories of denial-of-service (DOS) attacks (Gong, 2003). The CERT Coordination Center (CERT/CC) divides DOS attacks into three categories: (1) flood (i.e., bandwidth) attacks, (2) protocol attacks, and (3) logical attacks. This paper considers flood attacks.

A DDOS flood attack sends attack packets upon a site (the victim) with a huge amount of traffic whose sources are distributed over the world, so as to effectively jam its entrance and block access by legitimate users or significantly degrade its performance. It never tries to break into the victim's system, making security defenses at the protected site irrelevant (DDoS; Dittrich-a; Dittrich-b; Dittrich-c; Dittrich-d; Dietrich et al.; Geng et al., 2002).

Usually, IDSs are classified into two categories: misuse detection and anomaly detection. Solutions given by misuse detection are primarily based on a library of known signatures to match against network traffic. Hence, unknown signatures from new variants of an attack mean a 100% miss. Therefore, anomaly detectors play a role in the detection of DDOS flood attacks. As far as anomaly detection is concerned, quantitatively characterizing the abnormal statistics of abnormal traffic is fundamental.

A traffic stream is a packet flow. A packet consists of a number of fields, such as protocol type, source IP, destination IP, ports, flag setting (in the case of TCP or UDP), message type (in the case of ICMP), timestamp, and data length (packet size). Each may serve as a feature of a packet. The literature discussing traffic features is rich (see e.g. Li, 2004; Streilein et al., 2003; Bencsath and Vajda, 2004; Feinstein et al., 2003; Oh and Lee, 2003; Cho and Park, 2003; Cho and Cha, 2004; Lan et al., 2003; Paxson and Floyd, 1995; Li et al., 2003; Beran, 1994; Willinger and Paxson, 1998; Willinger et al., 1995; Csabai, 1994; Tsybakov and Georganas, 1998; MIT; Garber, 2000; Kim et al., 2004; Mahajan et al., 2002; Bettati et al., 1999). For instance, Mahajan et al. (2002) consider flow rate, Kim et al. (2004) use head message, Oh and Lee (2003) alone consider 86 features of traffic (though not from a statistical viewpoint), and so on. To the best of our knowledge, however, taking the Hurst parameter H into account in characterizing the abnormality of traffic series in packet size under DDOS flood attacks is rarely seen, except for Li (2004), where the autocorrelation function (ACF) of traffic series in packet size (traffic for short) with long-range dependence (LRD) is taken as its statistical feature. As a supplement to Li (2004), this paper specifically studies how H of traffic varies under DDOS flood attacks. In this regard, the following two questions are fundamental.

(1) Is H of traffic when a site is under DDOS flood attacks (abnormal traffic for short) significantly different from that of normal traffic (i.e., attack-free traffic)?

(2) What is the change trend of H of traffic when a site suffers from DDOS flood attacks?

We give the answers to the above questions from the points of view of processing traffic data and of theoretical inference and analysis.

In the rest of the paper, section ''Test data sets'' is about the test data. We describe data traffic briefly and use a series of normal traffic from the ACM archive to explain how its H normally varies in section ''Brief of data traffic''. The answer to question (1) is given in section ''Using H to describe abnormality of traffic under DDOS flood attacks''. Then, in section ''Change trend of H of traffic under DDOS flood attacks'', we use a pair of series (one normal traffic, the other abnormal) provided by MIT Lincoln Laboratory to demonstrate that the averaged H of abnormal traffic tends to be significantly smaller than that of normal traffic, and we briefly discuss this abnormality from the viewpoint of Fourier analysis. The answer to question (2) is given in that section. Section ''Conclusions'' concludes the paper.

Test data sets

Three series of test data are utilized in this paper. The first is an attack-free series measured at the Lawrence Berkeley Laboratory from 14:00 to 15:00 on Friday, 29 January 1994. It is named LBL-PKT-4 and has been widely used in research on general (normal) traffic patterns (see e.g. Paxson and Floyd, 1995; Li et al., 2004). We use it to show a case of how H of normal traffic varies. The second is Outside-MIT-week1-1-1999-attack-free (OM-W1-1-1999AF for short) (MIT). It was recorded from 08:00:02, 1 March (Monday) to 06:00:02, 2 March (Tuesday), 1999. The third is Outside-MIT-week2-1-1999-attack-contained (OM-W2-1-1999AC for short) (MIT), which was collected from 08:00:01, 8 March (Monday) to 06:00:49, 9 March (Tuesday), 1999. The two MIT series are used to demonstrate a case of how H of traffic varies under DDOS attacks. Though whether or not the MIT test data are standard is worth further discussion, as stated in McHugh (2000), they are valuable and can still serve as test data for research on the abnormality of abnormal traffic, since available data on traffic under DDOS flood attacks are rare.

Brief of data traffic

Denote by $x(t_i)$ a traffic series, indicating the number of bytes in a packet at time $t_i$, $i = 0, 1, 2, \ldots$. From the viewpoint of a discrete series, we write $x(t_i)$ as $x(i)$, implying the number of bytes in the $i$th packet. Let $r(k)$ be the ACF of $x(i)$. Then,

$$r(k) \sim c\,k^{2H-2} \quad \text{for } c > 0,\; H \in (0.5, 1), \qquad (1)$$

where $\sim$ stands for asymptotic equivalence in the limit $k \to \infty$ and $H$ is the Hurst parameter.

The ACF in Eq. (1) is non-summable for $H \in (0.5, 1)$, implying LRD. Hence, $H$ is a measure of the LRD of traffic.

According to research in traffic engineering, fractional Gaussian noise (FGN) is an approximate model of traffic (Paxson and Floyd, 1995; Li et al., 2003; Beran, 1994; Willinger and Paxson, 1998; Willinger et al., 2002; Li et al., 2004; Paxson, 1997; Li and Chi, 2003; Michiel and Laevens, 1997; Adas, 1997; Leland et al., 1994; Beran et al., 1995; Stallings, 1998; Carmona et al., 1999; Pitts and Schormans, 2000; MaDysan, 2000). The ACF of FGN is given by

$$R(k; H) = 0.5\,\sigma^2 \left[ |k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H} \right], \qquad (2)$$

where

$$\sigma^2 = \frac{\Gamma(2-2H)\cos(\pi H)}{\pi H (2H-1)}$$

(Mandelbrot, 2001; Muniandy and Lim, 2001).

By taking FGN as an approximate model of $x(i)$, we consider another series given by

$$x^{(L)}(i) = \frac{1}{L} \sum_{j=iL}^{(i+1)L-1} x(j).$$

According to the analysis of self-similar processes (see e.g. Beran, 1994; Mandelbrot, 2001; Beran et al., 1995), one has

$$\mathrm{Var}\big(x^{(L)}\big) \approx L^{2H-2}\,\mathrm{Var}(x),$$

where Var denotes the variance operator. Thus, traffic has the property of self-similarity as measured by $H$. Consequently, $H$ characterizes both the LRD and the self-similarity of traffic.
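Eq. (2) (taken here in the normalized case $\sigma^2 = 1$) and the power-law tail of Eq. (1) can be checked numerically. The small sketch below is illustrative only; the asymptotic constant $c = H(2H-1)$ is an assumption, taken from the second-difference approximation used later in Eq. (10).

```python
# Normalized FGN autocorrelation from Eq. (2) and its tail from Eq. (1).

def fgn_acf(k: int, H: float) -> float:
    """R(k; H) with sigma^2 = 1: 0.5(|k+1|^2H - 2|k|^2H + |k-1|^2H)."""
    return 0.5 * (abs(k + 1)**(2*H) - 2*abs(k)**(2*H) + abs(k - 1)**(2*H))

H = 0.75
k = 1000
exact = fgn_acf(k, H)
tail = H * (2*H - 1) * k**(2*H - 2)   # c * k^(2H-2) with c = H(2H-1)
print(exact, tail)                    # the two agree closely for large k
```

Because the tail $k^{2H-2}$ is non-summable for $H \in (0.5, 1)$, this also illustrates why such an $H$ signals long-range dependence.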

In practice, measured traffic is of finite length. Let $x$ be a series of length $P$. Divide $x$ into $N$ non-overlapping sections. Each section is divided into $M$ non-overlapping segments, and each segment into $K$ non-overlapping blocks, each block being of length $L$. Let $x^{(L)}_m(n)$ be the series with aggregation level $L$ in the $m$th segment of the $n$th section ($m = 0, 1, \ldots, M-1$; $n = 0, 1, \ldots, N-1$). Let $H_m(n)$ be the $H$ value of $x^{(L)}_m(n)$, and let $r(k; H_m(n))$ be the measured ACF of $x^{(L)}_m(n)$ in the normalized case. Then,

$$R(k; H_m(n)) = 0.5\left[ |k+1|^{2H_m(n)} - 2|k|^{2H_m(n)} + |k-1|^{2H_m(n)} \right]. \qquad (3)$$

The above expression exhibits the multi-fractal property of traffic as explained from a mathematical viewpoint (Muniandy and Lim, 2001; Muniandy and Lim, 2000).

Let $J(H_m(n)) = \sum_k \left[ R(k; H_m(n)) - r(k) \right]^2$ be the cost function. Then one has

$$H_m(n) = \arg\min J[H_m(n)]. \qquad (4)$$

Averaging $H_m(n)$ over the index $m$ yields

$$H(n) = \frac{1}{M} \sum_{m=0}^{M-1} H_m(n), \qquad (5)$$

representing the $H$ estimate of the series in the $n$th section. In practical terms, a normality assumption for $H(n)$ is quite accurate in most cases for $M > 10$, regardless of the probability distribution function of $H$ (Bendat and Piersol, 1986). Thus,

$$H_x = E[H(n)] \qquad (6)$$

is taken as the mean estimate of $H$ of $x$, where $E$ is the mean operator.
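The estimator of Eqs. (3)–(5) can be sketched as a grid search over $H$. The grid search is a stand-in for whatever numerical minimizer the author used, and the ''measured'' ACF below is simply an exact FGN ACF with $H = 0.8$, so the fit of Eq. (4) should recover that value.

```python
# Sketch of Eqs. (3)-(5): fit R(k; H) to a measured ACF r(k) by
# minimizing the cost J(H) over a grid of candidate H values.

def R(k: int, H: float) -> float:
    return 0.5 * (abs(k + 1)**(2*H) - 2*abs(k)**(2*H) + abs(k - 1)**(2*H))

def estimate_H(r, ks, grid):
    """Eq. (4): H = argmin_H sum_k [R(k; H) - r(k)]^2."""
    def J(H):
        return sum((R(k, H) - r[k])**2 for k in ks)
    return min(grid, key=J)

ks = range(1, 65)
measured = {k: R(k, 0.8) for k in ks}            # stand-in for a real r(k)
grid = [0.5 + 0.001 * i for i in range(1, 500)]  # H in (0.5, 1)
H_hat = estimate_H(measured, ks, grid)
print(round(H_hat, 3))                           # recovers H = 0.8

# Eq. (5): the per-section estimate H(n) is the mean of the M
# per-segment estimates (here a placeholder list of segments).
segment_estimates = [H_hat] * 4
H_n = sum(segment_estimates) / len(segment_estimates)
```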

Let $s_H$ be the standard deviation of $H(n)$. Then,

$$\mathrm{Prob}\left( -z_{\alpha/2} < \frac{H(n) - H_x}{s_H} \le z_{\alpha/2} \right) = 1 - \alpha,$$

where $(1-\alpha)$ is the confidence coefficient. The confidence interval of $H(n)$ with confidence coefficient $(1-\alpha)$ is given by

$$\left[ H_x - s_H z_{\alpha/2},\; H_x + s_H z_{\alpha/2} \right].$$
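The interval above can be computed directly. In the sketch below, the per-section values $H(n)$ are hypothetical, $z = 1.96$ is the usual two-sided 95% normal quantile, and, following the text, $s_H$ is the standard deviation of the $H(n)$ values themselves.

```python
# Sketch of the (1 - alpha) confidence interval [Hx - sH*z, Hx + sH*z].
from statistics import mean, stdev

def confidence_interval(H_values, z=1.96):  # z_(alpha/2) for alpha = 0.05
    Hx = mean(H_values)                     # Eq. (6): mean estimate
    sH = stdev(H_values)                    # standard deviation of H(n)
    return Hx - sH * z, Hx + sH * z

# Hypothetical per-section estimates H(n):
Hn = [0.750, 0.760, 0.758, 0.762, 0.755, 0.765, 0.752, 0.761]
lo, hi = confidence_interval(Hn)
print(round(mean(Hn), 3), (round(lo, 4), round(hi, 4)))
```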

The following demonstration exhibits $H(n)$ of the traffic series LBL-PKT-4.

Demonstration 1: The first 1024 points of the series $x(i)$ of LBL-PKT-4 are shown in Fig. 1(a). Consider the first 524,288 ($=P$) points of $x(i)$. The partition settings are as follows: $L = 32$, $K = 16$, $M = 32$, $N = 32$, and $J = 2048$. Computing $H$ in each section yields $H(n)$ as shown in Fig. 1(b). Its histogram is shown in Fig. 1(c).

According to Eq. (6), we have $H_x = 0.758$. The confidence interval at the 95% confidence level is [0.750, 0.766]. Hence, we have 95% confidence that the $H$ estimate in each section of that series takes $H_x = 0.758$ as its approximation, with fluctuation not greater than $7.431 \times 10^{-3}$.

Figure 1 Demonstrating the statistical invariable H. (a) A real-traffic time series; (b) estimate $H(n)$; (c) histogram of $H(n)$.

Using H to describe abnormality of traffic under DDOS flood attacks

From the previous discussions, we see that H is a parameter that characterizes both the LRD and the self-similarity of traffic. On the other hand, the ACF is a statistical feature of a time series, which is used in queuing analysis of network systems (Livny et al., 1993; Li and Hwang, 1993). Hence the following lemma.

Lemma: Let $x$ and $y$ be normal traffic and abnormal traffic, respectively. Let $r_{xx}$ and $r_{yy}$ be the ACFs of $x$ and $y$, respectively. During the transition process of a DDOS flood attack, $|r_{yy} - r_{xx}|$ is noteworthy (Li, 2004).

Proof: A network system is a queuing system. Arrival traffic $x$ of a queuing system has its statistical pattern $r_{xx}$ (Livny et al., 1993; Li and Hwang, 1993). Suppose the site suffers from DDOS flood attacks, and suppose that $|r_{yy} - r_{xx}|$ were negligible in this case. Then the site would be overwhelmed at its normal state even if there were no DDOS flood packets. This is an obvious contradiction. □

For each value of $H \in (0.5, 1)$, there is exactly one ACF of FGN with LRD, as can be seen from Beran (1994, p. 55). Thus, a consequence of the Lemma is that $|H_y - H_x|$ is considerable, where $H_x$ and $H_y$ are the averaged $H$ values of $x$ and $y$, respectively. Hence, $H$ is a parameter that can be used to describe the abnormality of traffic under DDOS flood attacks. This gives the answer to question (1) in section ''Introduction''.

Change trend of H of traffic under DDOS flood attacks

Demonstrations

This subsection gives two demonstrations of $H(n)$: one for normal traffic and the other for abnormal traffic. The two demonstrations show that the average value of H of abnormal traffic tends to be significantly smaller than that of normal traffic.

Demonstration 2 (attack-free traffic): The first 1024 points of the series $x(i)$ of attack-free traffic OM-W1-1-1999AF are shown in Fig. 2(a). Its $H(n)$ is plotted in Fig. 2(b) and its histogram in Fig. 2(c).

By computation, we obtain

$$H_x = 0.895, \qquad (7)$$

its variance $= 5.693 \times 10^{-4}$, and the confidence interval at the 95% confidence level [0.865, 0.895].

Figure 2 Demonstrating $H(n)$ of attack-free traffic OM-W1-1-1999AF. (a) Time series of OM-W1-1-1999AF; (b) estimate $H(n)$ of OM-W1-1-1999AF; (c) histogram of $H(n)$ of OM-W1-1-1999AF.

Demonstration 3 (abnormal traffic): The first 1024 points of the series $x(i)$ of attack-contained traffic OM-W2-1-1999AC are shown in Fig. 3(a). Its $H(n)$ is plotted in Fig. 3(b) and its histogram in Fig. 3(c).

By computation, we obtain

$$H_y = 0.774, \qquad (8)$$

its variance $= 6.777 \times 10^{-4}$, and the confidence interval at the 95% confidence level [0.723, 0.825].

Comparing the means of $H$ in the above two demonstrations, we see

$$H_y < H_x. \qquad (9)$$

The above inequality exhibits a case of the change trend of $H$ of traffic under DDOS flood attacks. It actually follows a general rule, as can be seen from the following analysis.

Figure 3 Demonstrating $H(n)$ of abnormal traffic OM-W2-1-1999AC. (a) Time series of OM-W2-1-1999AC; (b) estimate $H(n)$ of OM-W2-1-1999AC; (c) histogram of $H(n)$ of OM-W2-1-1999AC.


Analysis of change trend of H of traffic under DDOS flood attacks

In the case of multi-fractional FGN, we let $H$ represent the mean estimate of the Hurst parameter as in Eq. (6), for the sake of simplicity. As

$$0.5\left[ (t+1)^{2H} - 2t^{2H} + (t-1)^{2H} \right]$$

is the finite second-order difference of $0.5\,t^{2H}$ (Beran, 1994; Mandelbrot, 2001; Li and Chi, 2003; Caccia et al., 1997), approximating it by the second-order differential of $0.5\,t^{2H}$ yields

$$0.5\left[ (t+1)^{2H} - 2t^{2H} + (t-1)^{2H} \right] \approx H(2H-1)\,t^{2H-2}. \qquad (10)$$

In the domain of generalized functions (Lighthill, 1958, p. 43), we obtain

$$F\left[ |t|^{-(2-2H)} \right] = 2\cos\!\left( \frac{\pi(2H-1)}{2} \right) (2H-2)!\,|\omega|^{-(2H-1)}, \qquad (11)$$

where $F$ is the Fourier transform operator.

As is known, the frequency bandwidth of $x$ is the width of its power spectrum $S(\omega)$, which is usually interpreted as the maximum effective frequency in engineering (Stalling, 1994). Hence, the following is a consequence of Eq. (11).

Corollary: Let $B_1$ and $B_2$ be the bandwidths of LRD FGN series $x_1$ and $x_2$, respectively. Let the mean estimates of $H$ of $x_1$ and $x_2$ be $H_1$ and $H_2$, respectively. Then $H_2 < H_1$ if $B_2 > B_1$.

As is known, the data rate of abnormal traffic is usually greater than that of attack-free traffic (Garber, 2000). Hence, the bandwidth of abnormal traffic is wider than that of attack-free traffic, according to the relationship between data rate and bandwidth (Stalling, 1994). Then, according to the Corollary, we see that the average H of abnormal traffic is smaller than that of attack-free traffic, giving the answer to question (2) in section ''Introduction''. Eq. (9) is a case of this rule. As the larger the H, the stronger the LRD and the self-similarity (Beran, 1994; Mandelbrot, 2001), we note that the LRD and self-similarity of abnormal traffic become weaker than those of attack-free traffic.

In passing, the Corollary gives the reason why Li (2004) designs the case study by assigning abnormal traffic H values smaller than that of normal traffic.

Conclusions

Revealing how a statistical feature of traffic varies under DDOS flood attacks is crucial to anomaly detection of DDOS flood attacks (Liston, 2004). As the Hurst parameter H (or, equivalently, the autocorrelation function) plays a key role in traffic analysis (Li, 2004; Paxson and Floyd, 1995; Li et al., 2003; Beran, 1994; Willinger and Paxson, 1998; Willinger et al., 1995; Tsybakov and Georganas, 1998; Willinger et al., 2002; Li et al., 2004; Mandelbrot, 2001; Paxson, 1997; Li and Chi, 2003; Michiel and Laevens, 1997; Adas, 1997; Leland et al., 1994; Beran et al., 1995; Stallings, 1998; Carmona et al., 1999; Pitts and Schormans, 2000; MaDysan, 2000; Mandelbrot, 1971), this paper has aimed at revealing how H varies under DDOS flood attacks. We have explained that the average H of abnormal traffic significantly differs from that of normal traffic as a consequence of the Lemma, where H represents the mean estimate in the case of multi-fractional series. We have given a corollary showing that the average H of abnormal traffic is smaller than that of normal traffic. The theoretical results are demonstrated and validated with the test data provided by MIT Lincoln Laboratory.

Acknowledgement

This work was supported in part by the National Natural Science Foundation of China under project grant number 60573125. MIT Lincoln Laboratory is highly appreciated.

References

Adas A. Traffic models in broadband networks. IEEE Communications Magazine 1997;35(7):82–9.

Bencsath B, Vajda I. Protection against DDoS attacks based on traffic level measurements. In: International symposium on collaborative technologies and systems. Waleed W. Smari, William McQuay; 2004. p. 22–8.

Bendat JS, Piersol AG. Random data: analysis and measurement procedures. 2nd ed. John Wiley & Sons; 1986.

Beran J, Sherman R, Taqqu MS, Willinger W. Long-range dependence in variable-bit-rate video traffic. IEEE Transactions on Communications February–April 1995;43(2–4):1566–79.

Beran J. Statistics for long-memory processes. Chapman & Hall; 1994.

Bettati R, Zhao W, Teodor D. Real-time intrusion detection and suppression in ATM networks. In: Proceedings of the first USENIX workshop on intrusion detection and network monitoring; April 1999.

Caccia DC, Percival D, Cannon MJ, Raymond G, Bassingthwaighte JB. Analyzing exact fractal time series: evaluating dispersional analysis and rescaled range methods. Physica A 1997;246(3–4):609–32.


Change trend of averaged Hurst parameter of traffic under DDOS flood attacks 219

Carmona R, Hwang W-L, Torresani B. Practical time-frequency analysis: Gabor and wavelet transforms with an implementation in S. Academic Press; 1999. p. 244–7.

Cho S, Cha S. SAD: web session anomaly detection based on parameter estimation. Computers & Security 2004;23(4):312–9.

Cho S-B, Park H-J. Efficient anomaly detection by modeling privilege flows using hidden Markov model. Computers & Security 2003;22(1):45–55.

Coulouris G, Dollimore J, Kindberg T. Distributed systems: concepts and design. 3rd ed. Addison-Wesley; 2001.

Csabai I. 1/f noise in computer network traffic. Journal of Physics A: Mathematical and General 1994;27(12):L417–21.

Data are available from: <http://www.acm.org/sigcomm/ITA/>.

Distributed denial of service (DDoS) attacks/tools, <http://staff.washington.edu/dittrich/misc/ddos/>.

Dietrich S, Long N, Dittrich D. An analysis of the 'Shaft' distributed denial of service tool, <http://www.adelphi.edu/~spock/shaft_analysis.txt>.

Dittrich D. The DoS project's 'Trinoo' distributed denial of service attack tool, <http://staff.washington.edu/dittrich/misc/trinoo.analysis> (Dittrich-a).

Dittrich D. The 'Tribe Flood Network' distributed denial of service attack tool, <http://staff.washington.edu/dittrich/misc/tfn.analysis.txt> (Dittrich-b).

Dittrich D. The 'Stacheldraht' distributed denial of service attack tool, <http://staff.washington.edu/dittrich/misc/stacheldraht.analysis.txt> (Dittrich-c).

Dittrich D. The 'Mstream' distributed denial of service attack tool, <http://staff.washington.edu/dittrich/misc/mstream.analysis.txt> (Dittrich-d).

Feinstein L, Schnackenberg D, Balupari R, Kindred D. Statistical approaches to DDoS attack detection and response. In: DARPA information survivability conference and exposition, vol. I, April 22–24, 2003, Washington, DC; 2003. p. 303–14.

Garber L. Denial-of-service attacks rip the Internet. Computer April 2000;33(4):12–7.

Geng X, Huang Y, Whinston AB. Defending wireless infrastructure against the challenge of DDoS attacks. Mobile Networks and Applications 2002;7:213–23.

Gong F. Deciphering detection techniques: part III denial of service detection. White Paper. McAfee Network Security Technologies Group; January 2003.

Householder A, Houle K, Dougherty C. Computer attack trends challenge Internet security. Supplement to Computer. IEEE Security & Privacy April 2002;35(4):5–7.

Kemmerer RA, Vigna G. Intrusion detection: a brief history and overview. Supplement to Computer. IEEE Security & Privacy April 2002;35(4):27–30.

Kim SS, Reddy ALN, Vannucci M. Detecting traffic anomalies at the source through aggregate analysis of packet header data. In: Proceedings of Networking 2004. LNCS, vol. 3042, Athens, Greece; May 2004. p. 1047–59.

Kim Y, Lau WC, Chuah MC, Chao HJ. PacketScore: statistics-based overload control against distributed denial-of-service attacks. In: IEEE Infocom 2004, Hong Kong; 2004.

Lan K, Hussain A, Dutta D. Effect of malicious traffic on the network. In: Proceedings of passive and active measurement workshop, April 2003, La Jolla, California; 2003.

Leland WE, Taqqu MS, Willinger W, Wilson DV. On the self-similar nature of ethernet traffic (extended version). IEEE/ACM Transactions on Networking February 1994;2(1):1–15.

Li Ming, Chi C-H. A correlation-based computational method for simulating long-range dependent data. Journal of the Franklin Institute September–November 2003;340(6–7):503–14.

Li S-Q, Hwang C-L. Queue response to input correlation functions: continuous spectral analysis. IEEE/ACM Transactions on Networking December 1993;1(6):678–92.

Li Ming, Zhao W, Jia WJ, Chi C-H, Long DY. Modeling autocorrelation functions of self-similar teletraffic in communication networks based on optimal approximation in Hilbert space. Applied Mathematical Modelling 2003;27(3):155–68.

Li Ming, Chi C-H, Long DY. Fractional Gaussian noise: a tool of characterizing traffic for detection purpose. In: Content computing. LNCS, vol. 3309. Springer; November 2004. p. 94–103.

Li Ming. An approach for reliably identifying signs of DDoS flood attacks based on LRD traffic pattern recognition. Computers & Security 2004;23(7):549–58.

Lighthill MJ. An introduction to Fourier analysis and generalised functions. Cambridge University Press; 1958.

Liston K. Intrusion detection FAQ: can you explain traffic analysis and anomaly detection? <www.sans.org/resources/idfaq/anomaly_detection.php>; 6 July, 2004.

Livny M, Melamed B, Tsiolis AK. The impact of autocorrelation on queuing systems. Management Science 1993;39:322–39.

McDysan D. QoS & traffic management in IP & ATM networks. McGraw-Hill; 2000.

Mahajan R, Bellovin S, Floyd S, Ioannidis J, Paxson V, Shenker S. Controlling high bandwidth aggregates in the network. Computer Communications Review July 2002;32(3):62–73.

Mandelbrot BB. Fast fractional Gaussian noise generator. Water Resources Research 1971;7(3):543–53.

Mandelbrot BB. Gaussian self-affinity and fractals. Springer; 2001.

McHugh J. Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory. ACM Transactions on Information and System Security November 2000;3(4):262–94.

Michiel H, Laevens K. Teletraffic engineering in a broadband era. Proceedings of the IEEE December 1997;85(12):2007–33.

<http://www.ll.mit.edu/IST/ideval>.

Muniandy SV, Lim SC. On some possible generalizations of fractional Brownian motion. Physics Letters A 2000;266:140–5.

Muniandy SV, Lim SC. Modelling of locally self-similar processes using multifractional Brownian motion of Riemann–Liouville type. Physical Review E 2001;63:046104.

Oh SH, Lee WS. An anomaly intrusion detection method by clustering normal user behavior. Computers & Security 2003;22(7):596–612.

Paxson V, Floyd S. Wide-area traffic: the failure of Poisson modeling. IEEE/ACM Transactions on Networking June 1995;3(3):226–44.

Paxson V. Fast, approximate synthesis of fractional Gaussian noise for generating self-similar network traffic. Computer Communications Review October 1997;27(5):5–18.

Pitts JM, Schormans JA. Introduction to IP and ATM design and performance: with applications and analysis software. John Wiley; 2000. p. 287–93.

Schultz E. Intrusion prevention. Computers & Security 2004;23(4):265–6.

Sorensen S. Competitive overview of statistical anomaly detection. White Paper. Juniper Networks Inc., www.juniper.net; 2004.

Stallings W. Data and computer communications. 4th ed. Macmillan; 1994.

Stallings W. High-speed networks: TCP/IP and ATM design principles. Prentice Hall; 1998 [chapter 8].

Streilein WW, Fried DJ, Cunningham RK. Detecting flood-based denial-of-service attacks with SNMP/RMON. In: Workshop on statistical and machine learning techniques in computer intrusion detection, September 24–26, 2003. George Mason University; 2003.


220 M. Li

Tsybakov B, Georganas ND. Self-similar processes in communications networks. IEEE Transactions on Information Theory September 1998;44(5):1713–25.

Willinger W, Paxson V. Where mathematics meets the Internet. Notices of the American Mathematical Society August 1998;45(8):961–70.

Willinger W, Taqqu MS, Leland WE, Wilson DV. Self-similarity in high-speed packet traffic: analysis and modeling of ethernet traffic measurements. Statistical Science 1995;10(1):67–85.

Willinger W, Paxson V, Riedi RH, Taqqu MS. Long-range dependence and data network traffic. In: Doukhan P, Oppenheim G, Taqqu MS, editors. Long-range dependence: theory and applications. Birkhäuser; 2002.

Ming Li completed his undergraduate program in electronic engineering at Tsinghua University. He received the M.S. degree in mechanics from the China Ship Scientific Research Center and the Ph.D. degree in computer science from the City University of Hong Kong. In March 2004, he joined East China Normal University (ECNU) as a professor after several years' experience at the National University of Singapore and the City University of Hong Kong. He is currently Division Head for Communications & Information Systems at ECNU. His current research interests include teletraffic modeling and its applications to anomaly detection and guaranteed quality of service, fractal time series, and testing and measurement techniques. He has published over 50 papers in international journals and international conferences in those areas.


Computers & Security (2006) 25, 221–228

www.elsevier.com/locate/cose

An empirical examination of the reverse engineering process for binary files

Iain Sutherland a,*, George E. Kalb b, Andrew Blyth a, Gaius Mulley a

a School of Computing, University of Glamorgan, Treforest, Wales, UK
b The Johns Hopkins University, Information Security Institute, Baltimore, Maryland, USA

Received 18 November 2004; accepted 4 November 2005

KEYWORDS
Reverse engineering; Software protection; Process metrics; Binary code; Complexity metrics

Abstract Reverse engineering of binary code files has become increasingly easy to perform. The binary reverse engineering and subsequent software exploitation activities represent a significant threat to the intellectual property content of commercially supplied software products. Protection technologies integrated within the software products offer a viable solution towards deterring the software exploitation threat. However, the absence of metrics, measures, and models to characterize the software exploitation process prevents execution of quantitative assessments to define the extent of protection technology suitable for application to a particular software product. This paper examines a framework for collecting reverse engineering measurements, the execution of a reverse engineering experiment, and the analysis of the findings to determine the primary factors that affect the software exploitation process. The results of this research form a foundation for the specification of metrics, gathering of additional measurements, and development of predictive models to characterize the software exploitation process.
© 2005 Elsevier Ltd. All rights reserved.

Introduction

Deployed software products are known to be susceptible to software exploitation through reverse engineering of the binary code (executable) files. Numerous accounts of commercial companies reverse engineering their competitors' products, for purposes of gaining competitive advantage, have been published (Bull et al., 1995; Chen, 1995;

* Corresponding author. E-mail address: [email protected] (I. Sutherland).

0167-4048/$ – see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cose.2005.11.002

Tabernero, 2002). The global movement towards the use of industrial standards, commercially supplied hardware computing environments, and common operating environments achieves the software engineering goals of interoperability, portability, and reusability. This same global movement results in a reduced cost of entry for clandestine software exploiters to successfully reverse engineer a binary code file. A software exploiter with rudimentary skills poses a threat to a recently deployed commercial software product because (1) machine-code instruction set and executable file



formats (Tilley, 2000) are routinely published, (2) hex editors, disassemblers, and software in-circuit emulator tools are readily available via Internet sources, and (3) similar attack scenarios involving reverse engineering of binary code files are readily accessible through numerous hacking websites. There are also legitimate reasons for reverse engineering code, as in the case of legacy systems (Muller et al., 2000; Cifuentes and Fitzgerald, 2000), and so there is a body of published academic material (Weide et al., 1995; Interrante and Basrawala, 1988; Demeyer et al., 1999; Wills and Cross, 1996; Gannod et al., 1988) to which a software exploiter could refer, although the main focus of this effort is at the source code level (Muller et al., 2000).

The commercial software product developer is forced to employ various protection technologies to protect both the intellectual property content and the software development investment represented by the software asset to be released into the marketplace. The commercial software product developer must determine the appropriate protection technologies that are both affordable and supply adequate protection against the reverse engineering threat for a desired period of performance.

The absence of predictive models that characterize the binary reverse engineering software exploitation process precludes an objective and quantitative assessment of the time from first release of the software asset to when software exploitation is expected to successfully extract useful information content. Similar to parametric software development estimation models (e.g., COCOMO), the size and complexity of the binary code file to be reverse engineered are considered prime contributing factors to the time and effort required to execute the reverse engineering activity. Additionally, the skill level of the software exploiter is also considered a primary contributing factor. This paper describes the execution of an experiment to derive empirical data that will validate a set of proposed attributes believed to be the primary factors affecting the binary reverse engineering process.
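To make the parametric-model analogy concrete, a hypothetical effort relation can be sketched. Nothing below is fitted to data from this paper; the function name, coefficients, and exponent are invented purely for illustration of the COCOMO-style form the text has in mind.

```python
def exploit_effort_hours(size_kloc, complexity, skill):
    """Hypothetical COCOMO-style estimate: effort grows
    super-linearly with size and complexity and is discounted
    by exploiter skill (1 = novice, 3 = expert).
    All coefficients are illustrative, not fitted to any data."""
    a, b = 40.0, 1.2  # invented scale factor and exponent
    return a * (size_kloc * complexity) ** b / skill

# a large, complex binary attacked by a novice should cost more
# than a small, simple binary attacked by an expert
print(exploit_effort_hours(10, 2.0, 1) > exploit_effort_hours(1, 1.0, 3))
```

A fitted model of this shape, once the primary factors are validated, is exactly the kind of predictive tool whose absence the paragraph above laments.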

Background

An insider is assumed to have access to developmental information resources pertaining to the commercial software product, including the product source code. An outsider does not have access to this information and must resort to analysis of available software product resources. Such available software product resources may be little more than the binary code file as released from

the original developer. The outsider is forced to execute a binary reverse engineering activity beginning with the binary code file and concluding when some desired end goal has been achieved.

The entry criterion is defined as the time when the outsider first obtains a copy of the binary code file so as to commence the reverse engineering process. The commercial software product vendor must assume that this entry criterion coincides with the first market release of the product.

The exit criterion is determined by the time when the outsider has satisfied a particular end goal of the software exploitation process. Unlike software development activities, where the singular end goal is to deliver a reasonably well-tested software product to an end user given the available funding and schedule resources, binary reverse engineering activities may have multiple software exploitation end goals (Kalb). The first software exploitation end goal is defined as obtaining sufficient information regarding the software product's operational function, performance, capabilities, and limitations. Satisfying this first end goal enables the software exploiter to transfer the information gathered to other software products that are either in development or already deployed. The second software exploitation end goal builds upon the first and is defined as enabling minor modifications to alter/enhance the deployed software product. Satisfying this second end goal enables (1) circumvention of existing performance limiters and protection technologies to enhance the operational performance of the deployed software product, and/or (2) insertion of malicious code artefacts to corrupt the execution of the deployed software product. The third software exploitation end goal builds upon the previous two and is defined as enabling major modifications to enhance the operational performance of the deployed software product. Satisfying this third end goal enables a significant alteration of the deployed software product's functional and operational performance characteristics.

Regardless of the particular software exploitation end goal to be attained, the software exploitation process must be defined to serve as the basis for a series of experiments that will enable the capture of measurement data. This software exploitation process commences when the exploiter acquires the binary code file that is the subject of the reverse engineering activity. For network-centric computing, this acquisition step is rather expediently performed and may require no more effort than locating the particular executable or load file that will be the subject of subsequent reverse engineering activities. For commercial software



products, this acquisition step encompasses the purchase and installation of the product, followed by the selection of a particular executable or load file for subsequent reverse engineering activities. Embedded computer systems may require greater effort during the acquisition step since the binary code assets must be extracted from internal memory devices using various attack scenarios (e.g., in-circuit emulators, bus monitors, or invasive memory read-out attacks).

The next step in the software exploitation process is a static analysis of the binary code file to derive information to support subsequent reverse engineering activities (Kalb). Using a hex editing tool, the software exploiter can identify useful text strings that may encompass library function names, symbol table entries, debug messages, error messages, I/O messages, and/or residual text inserted by the compilation environment (e.g., compiler version number, date and time stamps, etc.). The software exploiter can analyze the file header information used during the loading process to verify the binary file format employed (e.g., COFF, ELF, PE, etc.). Knowledge of the binary file format enables correct navigation through the contents of the binary code file along with identification of the major structural segments contained within the binary code file, such as the instruction segment. Static analysis may include using a disassembler to produce human-readable assembly code for sections of the instruction segment that may be analyzed to determine functional attributes.
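A minimal sketch of two of the static-analysis steps just described, checking the file-format magic and harvesting embedded printable strings, might look as follows. The byte blob is a toy stand-in for a real binary code file; a real analysis would run over an actual executable.

```python
import string

def looks_like_elf(data: bytes) -> bool:
    # ELF files begin with the 4-byte magic 0x7f 'E' 'L' 'F'
    return data[:4] == b"\x7fELF"

def extract_strings(data: bytes, min_len: int = 4):
    """Collect runs of printable ASCII, much as a hex editor or
    the Unix 'strings' tool would reveal them."""
    printable = set(string.printable.encode()) - set(b"\t\n\r\x0b\x0c")
    out, run = [], bytearray()
    for b in data:
        if b in printable:
            run.append(b)
        else:
            if len(run) >= min_len:
                out.append(run.decode())
            run = bytearray()
    if len(run) >= min_len:
        out.append(run.decode())
    return out

# toy stand-in for a binary code file: header, code bytes, residual text
blob = (b"\x7fELF\x02\x01\x01\x00" + b"\x90\x90" +
        b"error: bad input\x00" + b"\x01\x02" + b"GCC 2.96\x00")
print(looks_like_elf(blob))     # True
print(extract_strings(blob))    # the error message and compiler residue
```

The recovered strings (an error message and a compiler version stamp) are exactly the categories of residual text the paragraph above lists as useful to the exploiter.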

The next step in the software exploitation process is a dynamic analysis of the binary code file to evaluate the operational characteristics of the software product (Kalb). Execution of the binary code file, either on the actual target processor or within an emulation environment, enables observation of the execution behaviour of the software product. Test case inputs can be supplied to stimulate functionality within the software product, wherein the execution behaviour may be observed by the software exploiter. The information gathered through static and dynamic analyses of the binary code file is sufficient for the software exploiter to achieve the first software exploitation end goal.
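Dynamic analysis as described, executing the binary and observing its output for chosen test-case inputs, can be sketched with a controlled child process. Since no real test object is available here, the Python interpreter stands in for the target binary.

```python
import subprocess
import sys

def observe(cmd, test_input=""):
    """Run the target, feed it a test-case input, and capture the
    externally observable behaviour (stdout, stderr, exit status)."""
    proc = subprocess.run(cmd, input=test_input, capture_output=True,
                          text=True, timeout=10)
    return proc.stdout, proc.stderr, proc.returncode

# stand-in target: a tiny program that echoes its input reversed
target = [sys.executable, "-c",
          "import sys; print(sys.stdin.read().strip()[::-1])"]
out, err, rc = observe(target, "hello")
print(out.strip(), rc)  # olleh 0
```

Feeding varied inputs and noting how the observable behaviour changes is how the exploiter infers the function and purpose of an otherwise opaque binary.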

Achieving the second or third software exploitation end goal requires modification of the software product. The software exploiter uses the information gathered through static and dynamic analyses of the software product to determine the nature and location of the desired change/enhancement to be applied to the software product. The actual application of the change/enhancement takes the form of a software patch of the existing binary code file to alter the execution of the software product.

The extent of the modifications determines whether it is the second or the third software exploitation end goal that is achieved.

Anticipating the software exploitation of the deployed software product, the commercial software product developer can perform a vulnerability assessment culminating in the selection of appropriate tamper resistance technologies to be integrated into the end product. The vulnerability assessment concludes with an estimate of the time since first deployment of the software product when it is anticipated that software exploiters will have achieved one of the software exploitation end goals. Based upon this estimate of the software exploitation timeline, the commercial software product developer may elect to employ software tamper resistance technology. The application of software tamper resistance technology extends the software exploitation timeline by increasing the difficulty of reverse engineering the binary code file contents.

Experiments have been used in the past to perform both tool assessments and user studies (Cifuentes and Fitzgerald, 2000; Gleason, 1992; Storey et al., 1996). The experiment described in this paper attempts to determine the primary factors that affect the software reverse engineering process. These primary factors, once defined and characterized, could be used to quantitatively estimate the software exploitation timeline, diminishing the subjectivity that currently dominates the estimation process.

Assertions

Prior to executing the reverse engineering experiment, a set of assertions was identified to be validated once experimental results had been obtained. The first assertion was that a statistical model could illustrate the relationship between the education and technical ability of the software exploiter and their ability to successfully reverse engineer a software product. The second assertion was that the complexity of the binary code file is related to the complexity of the human-readable source code. The reverse engineering experiment uses the Halstead and McCabe software complexity metrics to explore this relationship.

Experiment

The reverse engineering experiment requires a set of test subjects to perform a sequence of tasks relating to the reverse engineering of a set of



binary code files. The test subject’s progress andsuccess during each task are monitored usinga variety of techniques to enable a series ofdeductions to be made concerning the effortrequired to reverse engineer a binary code file ofknown size and complexity. To expediently exe-cute the reverse engineering experiment, eachtask was allotted a specific amount of time. Theprogress of each test subject towards achievingthe task objective is then assessed. This approachavoids the potentially open ended approach ofallowing each test subject to perform the task toa completion criterion consuming as much time asrequired to complete the task.

The set of test subjects included 10 student volunteers attending the University of Glamorgan: six undergraduates (three second-year students and three third-year students), three masters students, and one post-masters student, providing diversity in the education/technical skills suitable for the experimental requirements. Prior to the commencement of the experiment the test subjects were informed that the nature of the experiment related to reverse engineering of executable programs containing simple algorithms. The test subjects were provided with a reading list and a copy of the platform used (Redhat 7.2 GNU/Linux) along with documentation.

The reverse engineering experiment is partitioned into three stages: an initial assessment of the test subject's knowledge/skill base, execution of the reverse engineering tasks on a set of test objects, and a post-experiment assessment to obtain feedback on the experiment. A set of six test object programs was developed: (1) Hello World, (2) Date, (3) Bubble Sort, (4) Prime Number, (5) LIBC, and (6) GCD (Table 1). The test object programs were purposely selected to be easily recognizable algorithms, of approximately the same size to afford reasonable reverse engineering progress in a restrictive amount of time, and free of proprietary software elements to avoid legal infringements associated with reverse engineering of binary code files. A subset of the six test object programs was compiled with the debug option enabled (Program Set A) while another subset was compiled with the debug option disabled (Program Set B). This approach gives the test subjects the opportunity to reverse engineer the same test object, thereby enabling assessment of the value that debug information retained in the binary code file adds to the reverse engineering process.

The initial assessment of the test subject’sknowledge/skill base requires each test subject

to complete a questionnaire. The questionnaireinquired as to the number of years of experiencethe test subject possessed regarding UNIX and theC programming language. The majority of testsubjects had at least one year’s experience withUNIX and the C programming language. The ques-tionnaire also included a series of multiple choicequestions. The multiple choice questions focusedon UNIX commands relating to reverse engineeringto provide an assessment of the test subject’s levelof experience/capability.

The execution of the reverse engineering experiment required each test subject to perform a static, dynamic, and modification task on each of the test object programs within a constrained time limit. Test object filenames were selected so as not to reveal the function of the binary. Each test subject was supplied with a tutorial worksheet that provided general guidance during each specific task. For example, the static task tutorial worksheet requested each test subject to determine the size of the binary, determine the creation time of the binary, speculate as to the type of information contained in the file, identify all strings and any constants present in the executable, and generate the assembly language for the program. The dynamic task tutorial worksheet requested each test subject to determine if any input is required by the binary, describe the output produced by the binary, identify any command line arguments required by the binary, and describe the function/purpose of the binary. The modify task tutorial worksheet requested each test subject to perform a specific modification to the test object program, requiring the development and insertion of a software patch to the binary code file. For example, the test subjects were requested to modify the Hello World binary so that upon execution the program would output ''World Hello'', or to modify the Bubble Sort binary so that upon execution the program sorts in descending rather than ascending order. During the time allotted for each task the test subjects were required to perform the work requested and record their findings on the tutorial worksheets provided for that task. Upon expiration of the allotted time the tutorial worksheets were collected and replaced with the next tutorial worksheet in the experiment.
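The modify task on the Hello World binary amounts to locating the message bytes and rewriting them in place. A minimal byte-patching sketch follows; it operates on an in-memory stand-in rather than a real executable, and it restricts itself to same-length patches, which leave all file offsets undisturbed.

```python
def patch_bytes(data: bytes, old: bytes, new: bytes) -> bytes:
    """Apply a same-length in-place patch, as one would with a hex
    editor; equal lengths keep every other file offset intact."""
    if len(old) != len(new):
        raise ValueError("patch must preserve length")
    idx = data.find(old)
    if idx < 0:
        raise ValueError("pattern not found")
    return data[:idx] + new + data[idx + len(old):]

# toy stand-in for the Hello World binary's data segment
binary = b"\x7fELF...code...\x00Hello World\x00...more code..."
patched = patch_bytes(binary, b"Hello World", b"World Hello")
print(b"World Hello" in patched, len(patched) == len(binary))  # True True
```

Conveniently, ''World Hello'' is a permutation of ''Hello World'', so this particular experiment task is achievable with exactly such a length-preserving patch.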

Test subjects were provided with Program Set A during the morning session of the reverse engineering experiment. Experiment developers were present to observe the execution of the experiment and any interactions between test subjects. Test subjects were allowed to interact during the lunchtime break since it was decided



Table 1 Reverse engineering experiment framework

Session            Event                                   Test object  Program function  Task durations (min)              Total (min)

Morning session    Initial assessment;                     1            Hello World       Static 15, Dynamic 10, Modify 10  35
                   Program Set A (debug option enabled)    2            Date              Static 10, Dynamic 10, Modify 10  30
                                                           3            Bubble Sort       Static 15, Dynamic 15, Modify 15  45
                                                           4            Prime Number      Static 15, Dynamic 15, Modify 15  45
Lunch
Afternoon session  Program Set B (debug option disabled)   5            Hello World       Static 10, Dynamic 10, Modify 10  30
                                                           6            Date              Static 10, Dynamic 10, Modify 10  30
                                                           7            GCD               Static 15, Dynamic 15, Modify 15  45
                                                           8            LIBC              Static 15, Dynamic 15, Modify 15  45
                   Exit questionnaire

that some limited collaboration on experimental results would emulate the real-world conditions present during actual software exploitation activities. Test subjects were provided with Program Set B during the afternoon session of the reverse engineering experiment. The experiment developers were again present to observe any interactions between test subjects.

To further observe the test subject activities during the execution of the reverse engineering experiment, the test developers employed an automated screen capture tool (Camtasia) to provide a permanent record of activities. The reverse engineering experiment platform was an Intel-based computer executing Linux Redhat 7.2 within a VMware virtual environment hosted on Windows NT4. This enabled the complete experimental environment to be retained for future analysis, including Bash histories of command line instructions and all temporary and history files arising from Internet accesses. The screen captures, Bash histories, and temporary and history files, coupled with the initial questionnaire and tutorial worksheets, provide a detailed accounting of the test subject activities.

At the completion of Program Set B the test subjects were given an exit questionnaire to enable post-experiment assessment. The exit questionnaire assessed the amount of material supplied on the reading list that was actually used by test subjects during the experiment, along with general comments pertaining to the various stages of the reverse engineering experiment.

Results

The measurements collected during the reverse engineering experiment are analyzed to validate the two assertions defined earlier in this paper (section Assertions).

Education/technical ability

The first assertion to be validated by the experimental results concerned whether a statistical model could illustrate the relationship between the education and technical ability of the software exploiter and their ability to successfully reverse engineer a software product. This assertion



is validated through analysis of the initial questionnaire and tutorial worksheet responses. The education/technical ability (Fig. 1, 'ability') is derived from the initial questionnaire responses for each test subject and is normalized to values between 0 and 3 (Table 2) based on their experience with operating systems, platforms, and the range of commands used during the reverse engineering experiment. The ability to successfully reverse engineer a software product (Fig. 1, 'score') is derived from the tutorial worksheet responses for each test subject and is normalized by applying a consistent grading scheme per question response (Table 2), then averaging over all of the responses (3 tasks × 8 test objects) for that particular test subject. The education/technical ability and the ability to successfully reverse engineer a software product are plotted against the test subject's identification number. Although the two

Figure 1 Normalized 'score' and 'ability' data plotted against test subject number (1–10).

Table 2  Grading scheme used to normalize responses

Grade  Description
0      The test subject has failed to answer the questions, or the
       answer is completely incorrect.
1      The test subject has failed to demonstrate an adequate
       understanding of the problem. There is some factual
       information presented, but there may be significant errors.
       The answer provided by the test subject lacks substantive
       matter.
2      Demonstrates an adequate understanding of the major issues
       and the complexity of the issues involved. The answer
       provided by the test subject is correct, but it may contain
       minor errors.
3      Demonstrates an excellent understanding of the problem and
       the complexity of the issues involved.

graphs do not coincide one-for-one, a correlation coefficient of 0.7236642 was computed, illustrating a statistically significant relationship between the educational/technical ability of the software exploiter and their ability to successfully reverse engineer the binary code file of a software product. This result provides validation evidence for the first experiment assertion.

Complexity/size metric

The second assertion to be validated by the experimental results concerned the relationship between the complexity of the binary code file and the complexity of the human readable source code. This assertion is validated through correlation of the tutorial worksheet responses (regarding the reverse engineering of the eight test objects) versus the application of the Halstead and McCabe metrics to the human readable source code (six software programs that when compiled produced the eight test objects). The tutorial worksheet responses for the static, dynamic, and modification tasks were normalized using the grading scheme (Table 2) then averaged to produce the mean grade per test object (3 tasks × 10 test subjects). The Halstead and McCabe metrics were computed using the source code for each of the test objects. The mean grade per test object is correlated with each of the individual metric items to determine the extent of any dependencies (Tables 3 and 4).
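The correlation values reported in Tables 3 and 4 are plain Pearson correlations between the mean grade per test object and each metric row. A minimal sketch that approximately reproduces the ‘Lines of code’ row of Table 3; the paper does not name the tool it used, so this is only an illustration:

```python
# Pearson correlation between the mean grade per test object and one
# source code metric. The input values are copied from Table 3
# (debug-enabled test objects 1-4); grades are rounded to 3 d.p.,
# so the result matches the printed -0.5802 only approximately.
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

lines_of_code = [6, 10, 9, 21]               # Hello World, Date, Bubble Sort, Prime Number
mean_grade = [1.483, 1.300, 0.786, 0.867]    # mean grade per test object

r = pearson(lines_of_code, mean_grade)
print(round(r, 4))   # -0.5797, close to Table 3's -0.5802
```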

The statistical analysis reveals that there are no significant positive correlations between the source code metrics and the ability of the software exploiter to successfully reverse engineer a software product. The lack of correlation illustrates that source code artefacts that contribute to size and complexity metrics do not impact the reverse engineering process applied to binary code files. For example, the amount of branching (decision points) within a source code file is the basis of the McCabe cyclomatic complexity metric and has significant bearing on unit-level testing of the software module. Comparatively, branching instructions (jump instructions) within a binary code file are easily disassembled and understood by the software exploiter.
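McCabe cyclomatic complexity is the number of decision points plus one. The experiment's test objects were compiled C programs, so the following is only an illustrative sketch that counts decision points in Python source via the standard ast module:

```python
# Cyclomatic complexity = decision points + 1.
# Illustrative sketch for Python source; the node classes counted as
# "decisions" below are a simplification of the full McCabe definition.
import ast

DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp,
                  ast.ExceptHandler, ast.comprehension)

def cyclomatic_complexity(source: str) -> int:
    tree = ast.parse(source)
    decisions = sum(isinstance(node, DECISION_NODES)
                    for node in ast.walk(tree))
    return decisions + 1

straight_line = "x = 1\ny = x + 2\nprint(y)\n"      # no branches
branchy = (
    "def classify(n):\n"
    "    if n < 0:\n"                # decision 1
    "        return 'neg'\n"
    "    while n > 10:\n"            # decision 2
    "        n -= 1\n"
    "    return 'ok' if n else 'zero'\n"   # decision 3 (ternary)
)

print(cyclomatic_complexity(straight_line))  # 1
print(cyclomatic_complexity(branchy))        # 4
```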

Conclusion

The reverse engineering experiment as defined within this paper represents a framework for the experimental collection of measurement data in


An empirical examination of the reverse engineering process for binary files 227

Table 3  Source code metrics (debug enabled)

Source program             Hello World   Date     Bubble Sort   Prime Number   Correlation
Test object                     1          2           3              4
Mean grade per test object    1.483      1.300       0.786          0.867

Metric
Lines of code                   6         10           9             21          -0.5802
Software length (a)             7         27          14             33          -0.3958
Software vocabulary (a)         6         14          11             15          -0.5560
Software volume (a)            18        103          48            130          -0.4006
Software level (a)              0.667      0.167       2.5            0.094      -0.4833
Software difficulty (a)         1.499      5.988       5.988         10.638      -0.7454
Effort (a)                     27        618         120           1435          -0.3972
Intelligence (a)               12         17          19             15          -0.6744
Software time (a)               0.001      0.001       0.001          0.001       0
Language level (a)              8          2.86        7.68           1.83        0.1909
Cyclomatic complexity           1          1           1              3          -0.4802

(a) Halstead metrics.

a consistent and repeatable fashion. The 10 test subjects participating in the actual reverse engineering experiment, although representing a relatively small data set, provide the basis of a preliminary assessment as to the primary factors that affect the software reverse engineering process. The reverse engineering experiment provides quantitative evidence that there is a relationship between the education/technical ability of the software exploiter and their ability to successfully reverse engineer a software product. This evidence provides the foundation for modelling of this relationship using existing predictive models. Development and maturation of a reverse engineering model that characterizes the software

exploitation process will enable commercial software product developers to quantitatively predict the time following product deployment when it is anticipated that a software exploiter would have achieved a given exploitation end goal.

The reverse engineering experiment also provides quantitative evidence that industry accepted source code size and complexity metrics are not suitable for characterizing the size and complexity of binary code files pursuant to estimating the time required to perform software exploitation activities. A literature survey conducted at the commencement of this project did not identify binary size and complexity metrics that could have been used instead of the source code size and

Table 4  Source code metrics (debug disabled)

Source program             Hello World   Date      GCD      LIBC     Correlation
Test object                     5          6         7        8
Mean grade per test object    1.350      1.558     1.700    1.008

Metric
Lines of code                   6         10        49      665       -0.3821
Software length (a)             7         27        40       59       -0.3922
Software vocabulary (a)         6         14        20       21       -0.0904
Software volume (a)            18        103       178      275       -0.4189
Software level (a)              0.667      0.167     0.131    0.134   -0.1045
Software difficulty (a)         1.499      5.988     7.633    7.462    0.0567
Effort (a)                     27        618      2346     5035       -0.5952
Intelligence (a)               12         17        17       19       -0.1935
Software time (a)               0.001      0.001     0.2      0.4     -0.5755
Language level (a)              8          2.86      2.43     2.3     -0.0743
Cyclomatic complexity           1          1         3       11       -0.7844

(a) Halstead metrics.
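The Halstead quantities in Tables 3 and 4 follow from four counts: distinct operators n1, distinct operands n2, and their total occurrences N1, N2. A hedged sketch; the counts used below are an assumption chosen so that the output matches the ‘Hello World’ row (length 7, vocabulary 6, level 0.667, effort 27), since the paper does not publish the underlying counts:

```python
# Halstead metrics from operator/operand counts.
# n1/n2 = distinct operators/operands; N1/N2 = total occurrences.
from math import log2

def halstead(n1, n2, N1, N2):
    length = N1 + N2                       # "software length"
    vocabulary = n1 + n2                   # "software vocabulary"
    volume = length * log2(vocabulary)     # "software volume"
    difficulty = (n1 / 2) * (N2 / n2)      # "software difficulty"
    return {
        "length": length,
        "vocabulary": vocabulary,
        "volume": volume,
        "difficulty": difficulty,
        "level": 1 / difficulty,           # "software level"
        "effort": difficulty * volume,     # "effort"
    }

# Hypothetical counts consistent with the Hello World row of Table 3.
m = halstead(n1=3, n2=3, N1=4, N2=3)
print(m["length"], m["vocabulary"])           # 7 6
print(round(m["volume"], 1), round(m["difficulty"], 2), round(m["effort"], 1))
```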



complexity metrics. Size and complexity metrics that directly characterize the binary code files must be defined. Such size and complexity metrics are required to support the development of a software exploitation predictive model. A follow-on research project has been proposed to define these metrics and then use the existing reverse engineering experiment framework to gather measurements to corroborate the defined metrics.

Acknowledgment

The researchers wish to thank the sponsor of this project, who requested to remain anonymous, for the generous funding of this project and for providing funding for the follow-on research project.


Iain Sutherland is a lecturer in the Information Security Re-search Group at the School of Computing, University of Glamor-gan, UK. His main research interests are Information Securityand Computer Forensics. Dr. Sutherland received his Ph.D.from Cardiff University.

George E. Kalb is an Instructor and Institute Fellow at the JohnsHopkins University Information Security Institute, US. His re-search interests are in the domains of binary reverse engineer-ing and tamper resistance technologies. He has a B.A. in Physicsand Chemistry from University of Maryland and an M.S. in Com-puter Science from Johns Hopkins University.

Andrew Blyth is currently the Head of the Information SecurityResearch Group at the School of Computing, University of Gla-morgan, UK. His research interests include network and operat-ing systems security, and reverse engineering. Dr. Blythreceived his Ph.D. from Newcastle University.

Gaius Mulley is a senior lecturer at the University of Glamorgan.He is the author of GNU Modula-2 and the groff html devicedriver grohtml. His research interests also include performanceof micro-kernels and compiler design. Dr. Mulley received hisPh.D. and B.Sc.(Hons) from the University of Reading.


computers & security 25 (2006) 229–236

A simple, configurable SMTP anti-spam filter: Greylists

Guillermo Gonzalez-Talavan*

Department of Computer Science and Automation, University of Salamanca, Facultad de Ciencias, Plaza de la Merced, s/n, 37008 Salamanca, Spain

article info

Article history:

Received 27 August 2004

Revised 25 October 2005

Accepted 15 February 2006

Keywords:

Spam

Anti-spam filter

Whitelist

Greylist

UNIX

Sendmail

abstract

This paper addresses methods for combating spam, focusing especially on those based on the economic motivations of unsolicited commercial e-mail. Considering the fact that to date no machine has passed the Turing test, well-known blacklist and whitelist solutions can be generalized by greylists. An outline of a simple SMTP anti-spam application following these ideas and running on a UNIX machine is offered. Some problems regarding the application are discussed, together with some of the results obtained after a two-month test period.

© 2006 Elsevier Ltd. All rights reserved.

1. Introduction

Spam is the word commonly used to refer to unsolicited commercial e-mail (UCE) or unsolicited bulk e-mail (UBE). As well as a certain displeasure for users, spam is a waste of money and Internet resources (Grimes, 2004; Spam). Furthermore, owing to its content, its distribution methods, and the way it usually forges its sources, it can be regarded as fraudulent (Hinde, 2003). Currently, more than half of all circulating Internet e-mails are spam. Forecasts point to an even worse situation in the future. While the number of legitimate e-mails in 2007 will be the same as now, it is believed that spam will double (Spam filters, 2004).

There is readily available software to fight spam. Anti-spam software is in constant evolution, and so are the tools used to generate it. Indeed, a fight has arisen on the spam battlefield similar to the one between other computing opponents, such as viruses and antivirus software. Just as a computer virus has a life cycle comparable to the life cycle of its biological counterpart, spam also seems to resemble another type of biological behaviour: i.e., parasitic behaviour. Spammers eat Internet resources for their own benefit and give nothing in return. It is said that if spam keeps expanding at its present pace, it may well bring the Internet to an end, at least as it is known now. If the parasite comparison is correct, however, this will not actually happen, since no parasite wishes its host’s death, which in the long run of course means its own death.

In the following sections several methods for fighting spam will be discussed. A simple application based on some of them will be presented. This application is easy to implement on a UNIX machine and is currently being tested at our Department.

2. Some methods for combating spam

In order to combat spam, two main independent battlefronts have been opened: the legal one and the technological one.

* Tel.: +34 923294500x1302; fax: +34 923294514. E-mail address: [email protected]


There are supporters of the former (Mertz) as well as of the latter (Grimes, 2004; Hinde, 2003). However, a problem as multifaceted as spam probably requires a combined solution.

With regard to the legal front, some difficulties have been encountered. They may be due to the international character of Internet and the lenience of some recently enacted laws (Asaravala). A severe anti-spam legislation could perhaps lead to a lack of competitiveness against less scrupulous neighbouring countries. Opt-in and opt-out models are also being debated, as well as public registries where the addresses of people who do not want e-mail marketing are to be included. Some of these measures may be counter-productive, however, since spammers can use them maliciously for their own benefit.

Regarding the technological aspect, several measures have been devised and put into practice. Among them are the following:

0) Preventive methods: such as trying to prevent spammers from including one’s e-mail address in their lists.

1) Blacklists: these are lists of e-mail or machine addresses from which it is known that spam is sent. They may be personal or public, local or distributed. When a message arrives coming from an address or machine listed on the blacklist, it is rejected.

2) Honeypots: in connection with blacklists, these consist of invented e-mail addresses. Their aim is to attract as much spam as possible in order to alert other users or take further measures. They are based on spam usually being distributed in bulk. Characteristic features (fingerprints) are obtained from received messages. User software connects to the honeypot to find out if the relevant message has already been received there.

3) Whitelists: their operation is the opposite of blacklists. They consist of a list of addresses from which all mail is accepted. Mail coming from other addresses is transferred to a low priority folder (Ookoboiny). A few commercial implementations are available and some of them are evaluated in PC Magazine, 2004.

4) Content filters: these compute a score for each incoming message as a function of some previously user-established criteria. If the score of the message is greater than a given threshold, the message is considered spam.

5) Bayesian filters (Graham): statistics about the content of the message are used for the purpose of being able to classify it as spam or not. Users must train their filters to make them “learn” which messages are spam and which are not. This method is appealing because it is adaptable; that is, it learns from its user’s concept of spam as more and more messages are processed.

6) Neural networks (Vinther): if a human being is easily capable of detecting spam, perhaps artificial intelligence should be tried out. Although no systems are currently available commercially, some efforts have been made.

7) Sender ID: this method is devised to get rid of forged sender information (domain spoofing). It simply asks the presumed sender domain for IP addresses from which that message can be sent. The message is considered spam if the e-mail connection did not come from one of those (Sender ID Framework).
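Method 5 above (Bayesian filtering) can be sketched as combining per-token spam probabilities learned from a user's previously classified mail. The tiny training corpus below is invented purely for illustration; real filters use large corpora and more careful probability combination:

```python
# Minimal Bayesian spam scoring sketch: Laplace-smoothed per-token
# spam probabilities, combined naively assuming token independence.
# The "training" corpora here are made up for illustration.
from collections import Counter

spam_docs = ["cheap pills online", "online casino bonus", "cheap bonus offer"]
ham_docs = ["meeting agenda attached", "lunch tomorrow", "project meeting notes"]

spam_counts = Counter(w for d in spam_docs for w in d.split())
ham_counts = Counter(w for d in ham_docs for w in d.split())

def token_spam_prob(word, k=1.0):
    # Smoothed P(spam | word), assuming equal class priors.
    s = spam_counts[word] + k
    h = ham_counts[word] + k
    return s / (s + h)

def spam_score(message):
    # Naive combination: product of P(spam|w) vs product of P(ham|w).
    ps, ph = 1.0, 1.0
    for w in message.split():
        p = token_spam_prob(w)
        ps *= p
        ph *= 1.0 - p
    return ps / (ps + ph)

print(spam_score("cheap casino bonus") > 0.5)        # True  (spammy tokens)
print(spam_score("project meeting tomorrow") > 0.5)  # False (hammy tokens)
```

Retraining simply means updating the two counters, which is why such filters adapt to each user's notion of spam.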

In order to find out how good a spam detection method is, it must be considered that two kinds of errors can be produced: a message is classified as spam when it is not (a false positive), or the message is classified as correct when it is spam (a false negative). Depending on the type of user, it may be necessary for one of these errors to be minimized or even eliminated.

No single method can be considered as the universal panacea against spam; each has its own problems. In blacklists, a legitimate user belonging to a spammers’ domain may be misjudged. Whitelists are not suitable when the majority of good e-mails come from unknown people who are sending their first e-mail to that address. Users of filters or neural networks lack control over their operations and the message must be received in its entirety before a decision is made.

Another important aspect is the policy to be followed with the spam detected. Anti-spam software usually moves it to a different folder or marks it. This is a disadvantage since spammers, who do not get the message bounced back, can think that at least their messages are reaching their destination and that there is an active address there. If an alternative policy is decided on and the messages are rejected even before retrieving their body, there is no chance of rescuing false positives.

For a complete review of the techniques and tools currently used by spammers, as well as the sources where they get e-mail addresses from, Cournane and Hunt (2004) can be consulted. Some personal statistics about the efficiency of several anti-spam methods appear in Mertz.

3. Spam on spam

Recently, there has been much debate about the economic aspects of fighting spam (McWilliams). Clear evidence in support of how profitable a spam-based commercial campaign can be is seen in spam e-mails that advertise spam services. Sometimes, e-mails offering bulk marketing programs via e-mail are received. Their price ranges between a few dollar cents and one dollar for each thousand e-mails sent, depending on whether the spammers are in charge of the design. If one is interested in merely buying e-mail addresses to send spam e-mails, the price is in the region of a hundred dollars. It thus depends on whether the addresses are classified by country, Internet domain, field of activity, etc. and whether they are verified (the addresses are not dead). The promised response rate also ranges between 1% and a more realistic 1/10,000. Quoting the very spammers’ advertisement: “You sell a product or service for 10 euros. You decide to promote this product or service on the Internet to 10 million people, only 1% decides buy your product (sic), do the math and see how much money you would make. […] You would make one million euros sending 10 million emails. You understand now why you receive so much email every day in your mailbox: Advertising on Internet is extremely lucrative…” Just as certain animals or plants produce hundreds of eggs or seeds, a spammer spreads a huge amount of messages, despite knowing that the vast majority will not bear fruit.


Other spammers tell us in their advertisements about the advantages of bulk e-mail marketing:

- Low cost per message
- Quick implementation
- Immediate results
- Market segmentation possibility
- Personalised messages
- Direct contact with clients
- Ability to interact with the recipient
- Almost no restrictions about message size or design

It is clear that this kind of marketing has these advantages only for rather unknown companies with doubtful reputations. They certainly will not see, or will not mind seeing, their name spoilt by thousands of people with indiscriminately filled mailboxes.

Spam “companies” also offer premium services, such as:

- Legal advice, to dodge defective anti-spam legislation.
- Bullet-proof web sites: as spammers explain it, “as you already know, many web hosting companies have Terms of Service (TOS) or Acceptable Use Policies (AUP) against the delivery of e-mails advertising or promoting your web site. If your web site host receives complaints or discovers that your web site has been advertised in broadcasts, they may disconnect your account and shut down your web site.”
- Sorted or classified lists of addresses.
- Software for sending spam.
- Verified addresses.

It so happens that when a client pays money for a number of e-mail addresses, that person can and will demand working ones. Therefore, spammers need a certain amount of feedback when sending e-mail. This can be accomplished in several ways:

- With personalised opt-out links. When somebody clicks on them, the spammer has proof that the e-mail reached the recipient.
- Recording the lack of bounced mails. The message headers are usually forged to make tracing back more difficult. But, sometimes, exploratory e-mails with correct headers are sent in the hope they will not be bounced back. In this case, the address probed is added to the spammers’ list.
- By using an HTML e-mail trick. A common one consists of placing a personalised URL for an embedded image in the HTML message (e.g. http://mailcheckisp.biz/load_gifs.asp?pic=[email protected]). When the e-mail is shown, the image is retrieved from the spammers’ server and they log a hit. This resource is invaluable for efficient spammers since they know with certainty that their e-mail has broken all barriers and has been opened by the recipient. The Achilles heel is that an annoyed recipient can forge the image link to point to known spammers’ e-mail addresses (those used for contact information on their advertised web pages).
- Lack of errors in sending connections. Spammers frequently use customized software to send spam. That software directly connects to the destination e-mail relay to deliver mail. If the software gets an error for an address, it will probably discard it for future uses. The spammer is aware of the address malfunction regardless of possible forged headers.

The variety of spam services can sometimes lead to nonsense: some spammers offer anti-spam filters or the possibility of removing an address from their lists by spam e-mail. They are aware that if recipients are reading their mail it is because their anti-spam defence did not work! Following this trend, one finds the idea of paying a small amount to spammers for deleting certain e-mail addresses from their lists. I do not believe this is an adequate solution. It encourages new spam companies to be born to engage in this type of business, currently very profitable in itself.

How can one combat spam from the economic viewpoint? By means of cutting off its source of profit, the same as one gets rid of biological parasites by preventing them from assimilating the host’s body substances and finally making them die of hunger. Essentially, the secret lies in transferring part or all bulk e-mail expenses to the sender of spam. Today, these expenses are borne by information transport companies and eventually passed on to their Internet subscribers.

Among several proposed solutions based on economic aspects, one is to place a very small fee (micropayment) on electronic mail (Grimes, 2004). That micropayment may even involve computation resources. This electronic franking is thought to be minimal for a few mails, but unaffordable for sending millions. Such a solution is severely contested by Internet users, already accustomed to free electronic mail. The solution also implies a serious handicap for legal companies who have their clients’ consent to send electronic marketing. Perhaps this idea could be put into practice only through some kind of registered premium e-mail service, guaranteed free of spam.

Other techniques (Miller, 2003) implying economic considerations are indirectly based on the Turing test (Turing, 1950). At present, there is an imbalance between generating and fighting spam software. Spammers can send millions of e-mails with relatively simple software, while users have to deal with complex statistical or heuristic filters to set the spam aside. In the worst cases, valuable user time is wasted in separating the wheat from the chaff. Can these roles be swapped? Some people have put forward ideas that require some sort of intelligence on the sender’s part to keep machines out of the game. These methods are usually combined with whitelists in the following way: if a message with an unknown sender is received, that message is bounced back asking the sender to answer a simple challenge. The challenge can be as simple as visiting a web page, answering an easy question or even finding the solution of a popular riddle. The task is designed to be very easy for a human being, but terribly challenging for the machine. The answer can be added to the subject line of the message; for instance, see Manes. Thus, if this becomes a common approach, spam companies will need to recruit new staff to send their mail, losing competitiveness against other more conventional marketing. Unfortunately, this solution is expected to have limited validity: just up to the time when a machine might be able to pass the Turing test.


4. Spam filter test with greylists

Considering the above ideas, a low-cost and highly customizable anti-spam application has been developed at the Computer Science Department at the University of Salamanca. To accomplish this, only a web server (Apache) and an e-mail management program (Sendmail) were needed. A simple C program and a dozen lines added to the configuration file of Sendmail (sendmail.cf) sufficed for work to begin.

An SMTP anti-spam barrier was chosen. SMTP (Klensin) stands for Simple Mail Transfer Protocol. As its name suggests, SMTP is very straightforward. The simplest of all SMTP working procedures to deliver mail is shown in Fig. 1. Mail transfer begins with the recipient machine greeting and introducing itself. A dialogue with the sender’s machine follows, which is quite easy to understand. After the sender has issued the QUIT command and the other part has acknowledged it, the connection is closed.

It must be remarked that the sender’s machine statements about its name or the sender’s address may be false. There is no guarantee that they are real. Once the mail has been processed, the stated machine name, along with the IP address from which the connection was established, will appear in the Received headers of the e-mail. The rest of the headers are probably kept unchanged (Fig. 2). It is important to mention that both the origin address (MAIL FROM) and the destination address (RCPT TO) may differ from the ones at the headers of the message (From: and To:, respectively). This is why the former are sometimes known as SMTP envelope addresses. The specification states that any notification or error detected once the SMTP connection is closed has to be addressed to the MAIL FROM address.

The SMTP anti-spam filter developed therefore has to decide whether the connection is good with only two pieces of information at hand: who the recipient of the message is (RCPT TO) and from whom it is said to come (MAIL FROM). By working this way, the e-mail is rejected even before its body has been transmitted, resulting in server resource savings in space and time. If the spammer’s software is directly connected to the machine where the application is held, it gets immediate feedback: the address to which it is trying to send the e-mail is not working properly.
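The envelope-only decision step can be sketched as follows. This is an assumption-laden illustration, not the paper's implementation (which is a C program hooked into Sendmail): the function name, the in-memory list representation, and the reply texts are all invented, with the grey reply shaped like the paper's rejection message:

```python
# Decision made at the RCPT TO stage using only envelope data
# (MAIL FROM and RCPT TO); the message body is never received.
# Hypothetical sketch: reply strings and SMTP status codes are
# illustrative, not taken from the paper's filter.

def smtp_filter_reply(mail_from, rcpt_to, blacklist, whitelist, info_url):
    """Return an SMTP reply string for a delivery request (hypothetical API)."""
    if mail_from in blacklist:
        return "550 5.7.1 rejected"          # blacklisted: reject outright
    if mail_from in whitelist:
        return "250 ok"                      # whitelisted: accept
    # Everything else is grey: reject, but point the human sender at the
    # recipient's challenge page so they can learn how to get through.
    return f"550 5.7.1 {rcpt_to} blocked. Info: {info_url}"

reply = smtp_filter_reply("stranger@somewhere.example",
                          "user@local.example",
                          blacklist=set(), whitelist=set(),
                          info_url="http://example.org/challenge.htm")
print(reply)   # grey-list rejection carrying the challenge URL
```

Because the rejection happens before DATA, the sender's software sees a hard error, which is exactly the feedback loop the paper exploits.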

Each user belonging to the Department test program can state both blacklisted sender addresses (immediately rejected) and whitelisted sender addresses (admitted with no further checking) by means of a configuration file. Furthermore, the concept of blacklist and whitelist is extended to that of greylist. Each address received at a MAIL FROM statement must belong to exactly one of the three recipient’s lists: blacklist, whitelist or greylist. The action taken for either of the first two is clear, but if the address belongs to the greylist, the mail is rejected in a standard way with this additional information (e.g.): “[email protected] blocked. Info: http://tejo.fis.usal.es/~gyermo/as.htm.”

The web page in the example belongs to the recipient of the message and that person can change it as s/he likes. The sender whose address appears in the greylist therefore has to visit the web page to pass the spam barrier. Instructions on how to proceed are to be found there. The English content of the present web page is illustrated in Fig. 3. The challenge in this case is merely to add a password (+pera) to the login name of the recipient (gyermo). It is not possible to place the password in the subject line, as is the custom, because the message subject is not included in the SMTP envelope data but in the body of the message. The addresses on the web page are gif images in order to make their automatic retrieval more difficult. It is obvious that the web page can be modified at will to include a more difficult challenge of any sort, provided that the answer to the challenge is a word to be added to the login name of the user.

sender.part.com                        recipient.part.com
                                       [[listening on port 25]]
                                       220 recipient.part.com ESMTP
HELO marte
                                       250 recipient.part.com welcomes you
MAIL FROM: <[email protected]>
                                       250 ok
RCPT TO: <[email protected]>
                                       250 ok
DATA
                                       354 go ahead
Hello there...
.
                                       250 ok 1093222544 qp 18015
QUIT
                                       221 nice to see you

Fig. 1 – Simple SMTP transaction.


From [email protected] Mon Oct 10 03:06:43 MDT 2005
Received: from marte ([xxx.xxx.xxx.xxx])
    by recipient.part.com (8.12.10/8.11.1) with SMTP id yyy
    for <[email protected]>; Mon, 10 Oct 2005 03:05:54 -0600 (MDT)
Date: Mon, 10 Oct 2005 03:05:13 -0600 (MDT)
Message-Id: <[email protected]>
Subject: A present for you
From: [email protected]
To: [email protected]

Hello there...

Fig. 2 – Received message, including headers.

5. User’s lists specification

Users can customize the filter to suit their needs. They must

create a file named ‘‘.blacklist’’ at their home directory. An

example of such a file can be seen in Fig. 4. The file syntax is

very simple. It is a text file with independent entries on differ-

ent lines. Each entry has two parts separated by a colon (‘‘:’’).

The type of entry comes after the colon and can be PASS-

WORD, BLACK, GREY or WHITE. A PASSWORD entry is used

to set the password of the user. BLACK, GREY or WHITE deal

with address lists. When a sender requests the system to de-

liver an e-mail to a local user and the address does not have

a password, the local user’s .blacklist file is scanned sequen-

tially. The first time that the left part of a line matches the

SMTP MAIL FROM address, the right part will show what to do

with the request (i.e. which list it belongs to). If the end of

the file is reached, the address is considered to be GREY. As

can be seen in the figure, regular expressions can be used to

specify addresses. This is not a minor enhancement. Joining

the three lists in a single file and using regular expressions
afford the application great flexibility.

For example, one can decide to allow all incoming mail

from the mars.com domain with the line *.mars.com:WHITE. If

later one finds that spam is arriving from [email protected],

inserting the line [email protected]:BLACK before the

first line can solve it. It is quite reasonable to insert all ad-

dresses of the most popular free webmail services into the

greylist, for spammers like to include false return addresses

from them. One only has to add *@popularwebmail.com:GREY

to the .blacklist file, for example, to accomplish this. If,

then, one’s grandmother opens an account with popularwebmail,
one needs another addition before it: granny@popularwebmail.com:WHITE.
The user maintains absolute control over the

filter. The administrator of the machine is exempt from liabil-

ity in the event of losing an important e-mail due to the filter.
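The sequential, first-match semantics of the .blacklist scan described above can be sketched as follows (a minimal illustration; we treat ‘‘*’’ as a shell-style wildcard, whereas the real file accepts full regular expressions, and the addresses are made up):

```python
import re

def classify(mail_from, entries):
    """Scan (pattern, type) entries in order; the first pattern that
    matches the SMTP MAIL FROM address decides the list.  Reaching
    the end of the file means GREY (the sender must pass the challenge)."""
    for pattern, kind in entries:
        if kind == "PASSWORD":
            continue  # the PASSWORD entry sets the password, not a list
        regex = re.escape(pattern).replace(r"\*", ".*")
        if re.fullmatch(regex, mail_from):
            return kind
    return "GREY"

entries = [
    ("pera", "PASSWORD"),
    ("spammer@venus.mars.com", "BLACK"),  # inserted before the WHITE line
    ("*.mars.com", "WHITE"),
]
print(classify("friend@venus.mars.com", entries))   # WHITE
print(classify("spammer@venus.mars.com", entries))  # BLACK
print(classify("nobody@elsewhere.org", entries))    # GREY
```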

6. Application description

Thanks to the extraordinary configuration possibilities of

Sendmail, the reader can have a working filter of this sort
running on a UNIX machine in a few hours. An updated version of

Sendmail is required. A standard web server is also recom-

mended for the challenges. Only a few lines have to be added

to the configuration file of Sendmail (sendmail.cf is its usual

name). This file is essentially made up of rules. Each rule

has two parts: an input string and an output string. If the input

string matches the left part of the rule, it is rewritten following

the instructions found at the right part of the rule. The rule

Fig. 3 – Current English content of the web page of the challenge.


may be applied repeatedly until the input string no longer matches its left
part; the rewritten string then becomes the output. If
there is no match, the output string equals the input string.

For further information, readers are referred to Sendmail

documentation (The whole scoop in the configuration file).
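The repeated-application rule semantics just described can be mimicked with a toy rewriter (our illustration over plain substrings; real sendmail rules match tokenized patterns such as $*):

```python
def apply_rule(lhs, rhs, s, max_passes=100):
    """Rewrite s with one rule, sendmail-style: while the input matches
    the left part, it is rewritten by the right part; when it no longer
    matches, the current string is the output (toy substring matching)."""
    for _ in range(max_passes):  # bound the loop; bad rules can cycle
        if lhs not in s:
            break
        s = s.replace(lhs, rhs, 1)
    return s

print(apply_rule("aa", "a", "aaaa"))  # repeated application yields 'a'
print(apply_rule("x", "y", "abc"))    # no match: output equals input
```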

Rules are grouped in procedures. Different procedures are

invoked in different parts of the SMTP connection. check_rcpt

is the name of the procedure which is called when an SMTP

RCPT TO address has been received (Sendmail 8.8). In the version

of Sendmail used to develop the test filter, check_rcpt calls an-

other procedure whose name is Local_check_rcpt. These proce-

dures are widely used to avoid mail relaying. However, they can

also be used to implement a spam filter. For example, Fig. 5 il-

lustrates how Local_check_rcpt was used in the test example.

Sendmail must be restarted for the changes to take effect.

A general explanation of Fig. 5 follows. On the first line, the

CheckAS key is defined as an executable program located at /

root/ANTISPAM/antispam. The program takes an argument con-

sisting of the recipient’s login, a colon and what was read from

the SMTP MAIL FROM statement. The program writes on the

standard output BLACK, GREY, WHITE, GOODPASS or BAD-

PASS according to the argument passed. The present version

looks up the user’s .blacklist file as described above. The sec-

ond line builds a set, named ProgramaAS, containing all
local users whom the administrator wants included in the
anti-spam program. In the example of the figure, only

user gyermo is included. Following these two lines, the rele-

vant core of the modifications is shown. When Sendmail

receives the SMTP RCPT TO statement, check_rcpt is invoked,

which, in turn, calls Local_check_rcpt. On the fifth line, the

rule adds what was stated in MAIL FROM plus the word ‘‘local’’

if the mail is for a local user. The sixth line more or less states

that if the receiving user is local and belongs to the anti-spam

program, the CheckAS program will be run with the corre-

sponding argument. The seventh line works the same as the

sixth, but for the password case. The eighth, ninth and tenth

pera:PASSWORD
[email protected]:BLACK
*.mars.com:WHITE
[email protected]:WHITE
*popularwebmail.com:GREY
[email protected]:GREY
*[@.]recipient.part.com:WHITE

Fig. 4 – An example of .blacklist file.

lines intercept blacklist, greylist and bad password cases.

They make Sendmail produce an SMTP error. Attempts were

made to ensure that error codes would follow the RFC 3461

specification (Moore). The last line is reached only in the
remaining cases. It leaves the input string as it was at the
procedure entrance, ready for additional rules if necessary.

It may seem strange to produce an error in the RCPT TO

statement, given that the main cause of that error comes

from the previous MAIL FROM statement. The fact is that both

items of information are needed to diagnose the problem.

This issue is reflected in RFC 2821: ‘‘Despite the apparent

scope of this requirement, there are circumstances in which

the acceptability of the reverse-path may not be determined

until one or more forward-paths (in RCPT commands) can be

examined. In those cases, the server MAY reasonably accept

the reverse-path (with a 250 reply) and then report problems

after the forward-paths are received and examined.’’

For simplicity, log functionality has not been considered in

Fig. 5. These logs allow one to check all incoming message ad-

dresses and how Sendmail classified them so that statistics

can be produced. It should also be noted that the method

used to implement the filter is highly inefficient. One process

has to be spawned for each received e-mail message. It is

therefore not suitable for large organizations and is only

presented here for testing purposes.
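The behaviour expected of the CheckAS helper can be sketched like this (our reconstruction from the description above, with file access replaced by a list of lines and ‘‘*’’ treated as a wildcard; it is not the authors' actual program):

```python
import re

def check_as(argument, blacklist_lines):
    """Return BLACK, GREY, WHITE, GOODPASS or BADPASS for an argument
    of the form 'login:address' or 'login+password:address'."""
    recipient, _, sender = argument.partition(":")
    login, _, password = recipient.partition("+")
    entries = [line.rsplit(":", 1) for line in blacklist_lines if ":" in line]
    if password:  # a plussed recipient carries a candidate password
        for left, kind in entries:
            if kind == "PASSWORD":
                return "GOODPASS" if password == left else "BADPASS"
        return "BADPASS"
    for left, kind in entries:
        if kind == "PASSWORD":
            continue
        if re.fullmatch(re.escape(left).replace(r"\*", ".*"), sender):
            return kind
    return "GREY"  # end of file: default to the greylist challenge

lines = ["pera:PASSWORD", "*.mars.com:WHITE"]
print(check_as("gyermo:someone@venus.mars.com", lines))  # WHITE
print(check_as("gyermo+pera:anyone@other.org", lines))   # GOODPASS
print(check_as("gyermo:anyone@other.org", lines))        # GREY
```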

7. Results

The application was tested for two months on a single account.

Before the test was started, the e-mail account used to receive

an average of 27 spam messages a day: 1647 messages in two

months. Once the filter was active, only 14 spam messages
reached the user’s inbox over the two-month test. These results were expected

and are compatible with other whitelist-based methods.

Regarding false positives, although they are difficult to de-

tect accurately with the application logs, only four cases were

found during these two months. Two of the affected senders visited
the web page and resent their messages with the password. A
third said that he thought the address had problems,
and about the last case nothing is known. These cases

cannot be considered statistically representative since there

are few of them and the account is mainly used for Computer

Science academic purposes. Nevertheless, when the program

started, the address was useless for practical purposes and is

now nearly fully functional again.

[01] KCheckAS program /root/ANTISPAM/antispam
[02] C{ProgramaAS} gyermo
[03]
[04] SLocal_check_rcpt
[05] R$*                               $: $1 $| $&{rcpt_addr} $| $&{rcpt_mailer}
[06] R$* $| $={ProgramaAS} $| local    $1 $| $( CheckAS $2:$&{mail_addr} $)
[07] R$* $| $={ProgramaAS}+$+ $| local $1 $| $( CheckAS $&{rcpt_addr}:$&{mail_addr} $)
[08] R$* $| BLACK      $#error $@ 5.7.1 $: "550 " $&{mail_addr} " blacklisted (spam)"
[09] R$* $| GREY       $#error $@ 5.1.0 $: "550 " $&{mail_addr} " blocked." " Info: http://tejo.fis.usal.es/~gyermo/as.htm"
[10] R$* $| BADPASS    $#error $@ 5.1.1 $: "550 Wrong or expired password"
[11] R$* $| $*         $1

Fig. 5 – Additions to sendmail.cf file.


Surprisingly, less spam is now caught trying to pass
through the filter. At present, an average of only 13 spam
connection attempts a day is detected (compared with the 27
received daily before). This suggests that some spammer software

detects some of the bounced messages and decides not to

send any more, at least for some time. This assumption is
supported by the observation that spam once increased
notably, exactly when the filter was down and messages did not

bounce. On that occasion, the spam level rose to approximately

the values usually detected before the filter was installed.

8. Problems

There is no universal solution to spam. Each user has his/her
own requirements. Thus, for example, a person

working for an e-commerce site will probably not be willing

to lose a single client due to greylist challenges. In such

a case, that person has to be very careful with SMTP filtering,

if using it at all. Perhaps she/he can filter mail coming from
countries with which his/her company has no trade
relations and also try a very mild content filter.

Some people are also concerned about losing an important

e-mail because of spam filtering. In a solution such as the one

shown here, a systematic check of rejected addresses is

a must. However, if applications like this become popular, peo-

ple sending e-mail to an address for the first time will gradually

get used to possibly receiving a confirmation request from the

recipient. It may even come to be considered good e-manners,
a kind of introduction perhaps. All this small individual
inconvenience is, after all, due to some people abusing common

Internet e-mail resources. To draw an analogy, at some time

and place there was no need for closed doors. As time passed,

they had to be closed for security reasons despite the small in-

convenience of knocking before entering.

On the other hand, as stated, no verification of stated MAIL

FROM data is carried out. A spammer can simply lie about it. In

order to be successful, the spammer must find an address on

the recipient’s whitelist. The response to this may consist of
alerting the person whose address has been compromised
and passing the address, at least for some time, to the
blacklist. This reaction is immediate, and it demands an equally
prompt counter-measure from the spammer. The situation
balances out into a one-to-one fight: the spammer
needs human staff to keep track of the lists.

But one may guess that the important issue here is not the

security of the SMTP information. The MAIL FROM field can be

understood as a visiting card or as a sort of soft password

for a user’s inbox. All strangers, or at least all who do not

show a known address, become suspects. In agreement with
Dominus, today’s experience shows that spammers seldom

give valid return addresses, and even more seldom open

returned mail. At best, their software automatically discards

broken addresses.

Information inside the .blacklist file can also be stolen.

This would certainly be a very serious problem demanding
drastic measures. It is also true that, to do so, the spammer
needs to break the security of the host where the lists reside,
and it may not be worth the effort. If this becomes a problem,
.blacklist can be encoded with a hash function similar
to the one used in UNIX /etc/passwd. The drawback is that

regular expression flexibility would be lost.
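Such a hashed .blacklist could look like the following sketch (an illustration of the idea using unsalted SHA-256; as noted, only exact addresses can be stored this way, so wildcard and regular-expression entries are lost, and the addresses are made up):

```python
import hashlib

def hash_address(address):
    """Hash a list address so that a stolen file does not reveal it,
    much as UNIX /etc/passwd stores hashed passwords (unsalted sketch)."""
    return hashlib.sha256(address.lower().encode()).hexdigest()

def lookup(sender, hashed_table):
    """hashed_table maps address hashes to BLACK/GREY/WHITE."""
    return hashed_table.get(hash_address(sender), "GREY")

table = {hash_address("spammer@venus.mars.com"): "BLACK"}  # illustrative
print(lookup("spammer@venus.mars.com", table))  # BLACK
print(lookup("friend@elsewhere.org", table))    # GREY
```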

Spammers can attack the filter using the null address (<>)

in MAIL FROM. RFC 2821 clearly states that all notification or er-

ror messages must set MAIL FROM to the null address to avoid

loops. Accordingly, it is not advisable to blacklist this address.

However, if spammers resort to the null address, a side effect
is a sure method of identifying their spam. Nevertheless, spam
messages would then share the null address with legitimate return
messages, which poses an additional problem. The solution

may be to place certain codes in all sent messages so that the

software knows when a return message was really sent by us

and is not a fake.
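The ‘‘certain codes in all sent messages’’ idea can be sketched with an HMAC over the original recipient (our interpretation of the proposal; the key, addresses and tag length are arbitrary illustrative choices):

```python
import hashlib
import hmac

SECRET = b"site-secret"  # illustrative key kept on the mail host

def bounce_token(recipient):
    """Tag outgoing mail with a short code derived from the recipient,
    so that a later null-sender (<>) bounce can prove it refers to
    a message we really sent."""
    return hmac.new(SECRET, recipient.encode(), hashlib.sha256).hexdigest()[:12]

def genuine_bounce(recipient, token):
    """True only if the token found in a returned message matches."""
    return hmac.compare_digest(bounce_token(recipient), token)

tag = bounce_token("someone@example.org")
print(genuine_bounce("someone@example.org", tag))       # True
print(genuine_bounce("someone@example.org", "forged"))  # False
```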

It is fairly convenient to include local domain addresses in

the whitelist. Spammers can effortlessly take advantage of

this. In such a case, and if one wishes to keep the local domain

in the whitelist, one has to reject mail from local users coming

from outside or, alternatively, install a validated SMTP server.

An open SMTP relay must, of course, not be allowed for the

sake of the hygiene of the Internet e-mail system.

One last point claimed against greylist-like methods is that

they are poorly compatible with automatic response e-mail

systems: online stores, mailing lists, etc. Most users can work

with a temporary address, free from spam, for such purposes.

If this is not possible and whitelisting the relevant domain

does not work either, the user can directly register his/her

password address (gyermo+[email protected] in the example)
on those services. By making the e-mail address hold the

password instead of the message subject line, an additional ad-

vantage arises. We have plenty of fresh new addresses at our

disposal. If one of them falls, we change it and that’s that.

9. Summary and future work

With little effort and with readily available software, a first

barrier against spam proliferation has been accomplished at

the SMTP connection level. The proposed solution is 100%

compatible with current standards and older software. The re-

sults obtained are promising, especially considering that the

resources required were few. Although the solution has been

tested in a UNIX environment with Sendmail, it can be easily

ported to other operating systems or e-mail access methods,

including webmail.

The solution is highly customizable and is totally under the

user’s control. This relieves the system administrator of work

and liability. If solutions like this become popular, users are

expected to overcome adaptation problems.

There are plans to extend the system to more users and

add new features such as several passwords for the same ac-

count or the possibility of silently discarding e-mails (suitable

against indirect mail-bombing, i.e., mail-bombing based on

bounced e-mail).

Acknowledgements

This work has been partially supported by the Spanish
Ministerio de Ciencia y Tecnología (FEDER funds, grant
BFM2002-00033) and by the Junta de Castilla y León (grant SA107/03).


r e f e r e n c e s

Asaravala A, et al. With this law, you can spam. Available from: <http://www.wired.com/news/business/0,1367,62020,00.html>.

Cournane A, Hunt R. An analysis of the tools used for the generation and prevention of spam. Computers & Security 2004;23:154–66.

Dominus MJ. My life with spam: Part 3. Available from: <http://www.perl.com/pub/a/2000/03/spam3.html>.

Graham P. A plan for spam. Available from: <http://www.paulgraham.com/spam.html>.

Grimes GA. Issues with spam. Computer Fraud & Security 2004;5:12–6.

Hinde S. Spam: the evolution of a nuisance. Computers & Security 2003;22:474–8.

Klensin J. Simple Mail Transfer Protocol (RFC 2821). Available from: <http://www.ietf.org/rfc/rfc2821.txt>.

Manes S. Kill spam with your own two hands. Available from: <http://www.forbes.com/forbes/2003/0623/136_print.html>.

McWilliams B. Swollen orders show spam’s allure. Available from: <http://www.wired.com/news/business/0,1367,59907,00.html>.

Mertz D. Spam filtering techniques, six approaches to eliminating unwanted e-mail. Available from: <http://www-106.ibm.com/developerworks/linux/library/l-spamf.html>.

Miller MJ. Forward thinking. How spam solutions lead to more problems. PC Magazine December 2003:7.

Moore K. Simple Mail Transfer Protocol (SMTP) Service Extension for Delivery Status Notifications (DSNs) (RFC 3461). Available from: <http://www.ietf.org/rfc/rfc3461.txt>.

Ookoboiny G. Whitelist-based spam filtering. Available from: <http://impressive.net/people/gerald/2000/12/spam-filtering.html>.

Whitelist. PC Magazine February 2004:82.

Sender ID framework overview. Available from: <http://www.microsoft.com/mscorp/safety/technologies/senderid/overview.mspx>.

Using check_* in sendmail 8.8. Available from: <http://www.sendmail.org/wca/email/check.html>.

Spam: about the problem. Available from: <http://www.cauce.org/about/problem.shtml>.

Spam filters. How technology works. Technology Review April 2004:79.

The whole scoop in the configuration file. Available from: <http://www.sendmail.org/wca/email/doc8.12/op-sh-5.html#sh-5>.

Turing A. Computing machinery and intelligence. Mind 1950;59:236.

Vinther M. Intelligent junk mail detection using neural networks. Available from: <http://www.logicnet.dk/reports/JunkDetection/JunkDetection.htm>.

Guillermo González-Talaván is a graduate in Physics and
Computer Science from the University of Salamanca, in Spain.
He is a Tenured Lecturer at the Computer Science and
Automation Department and is Head of the Computer Architecture
Area at the University of Salamanca. His main areas of interest

and research include Computer Architecture and Security,

and Operating Systems.