
A multi-region empirical study on the internet presence of global extremist organizations

Jialun Qin & Yilu Zhou & Hsinchun Chen

# Springer Science+Business Media, LLC 2010

Abstract Extremist organizations are heavily utilizing Internet technologies to increase their abilities to influence the world. Studying those global extremist organizations’ Internet presence would allow us to better understand extremist organizations’ technical sophistication and their propaganda plans. In this work, we explore an integrated approach for collecting and analyzing extremist Internet presence. We employed automatic Web crawling techniques to build a comprehensive international extremist Web collection. We then used a systematic content analysis tool called the Dark Web Attribute System to analyze and compare these extremist organizations’ Internet usage from three perspectives: technical sophistication, content richness, and Web interactivity. By studying 1.7 million multimedia Web documents from around 224 Web sites of extremist organizations, we found that while all extremist organizations covered in this study demonstrate a high level of technical sophistication in their Web presence, Middle Eastern extremists are among the most sophisticated groups in both technical sophistication and media richness. US groups are the most active in supporting Internet communications. Our analysis results will help domain experts deepen their understanding of the global extremism movements and make better counter-extremism measures on the Internet.

Keywords Extremism . Internet . Web mining

1 Introduction

Global extremist organizations, ranging from U.S. domestic racist and militia groups to Latin American guerrilla groups and Islamic military groups, have created thousands of Web sites that support psychological warfare, fundraising, recruitment, and distribution of propaganda materials. From those Web sites, supporters can download multimedia training materials, buy games, T-shirts, and music CDs, and access forums and chat services such as PalTalk (Bowers 2004; Muriel 2004; Weimann 2004). Such Web sites are technically supported by those who are Internet-savvy to provide sophisticated propaganda images and videos via proxy servers to mask ownership (Armstrong and Forde 2003). As posited by Jenkins (2004), through operating their own Web sites and online forums, extremists have effectively created their sophisticated “terror news network.”

Studying the sophistication of global extremist organizations’ Web presence would allow us to better understand extremist organizations’ technical sophistication, their access to information technology related resources, and their propaganda plans. However, due to the covert nature of the Dark Web and the lack of efficient automatic methodologies to monitor and analyze large amounts of Web content, few previous studies have attempted to study the extremist organizations’ Web sites on a global

J. Qin (*) Operations and Information Systems Department, University of Massachusetts Lowell, Lowell, MA, USA e-mail: [email protected]

Y. Zhou Information Systems and Technology Management, George Washington University, Washington, DC, USA e-mail: [email protected]

H. Chen Department of Management Information Systems, The University of Arizona, Tucson, AZ, USA e-mail: [email protected]

Inf Syst Front DOI 10.1007/s10796-010-9277-6


scale. The scope of existing Dark Web studies was often limited by the low efficiency of manual analysis approaches. Many basic questions about global Dark Web development remain unanswered. For example, do different organizations have different levels of sophistication in terms of their Internet usage? How effectively have they been using Internet technologies to support communications and propaganda activities?

In order to gain a more comprehensive understanding of global Dark Web development, in this work, we explore an integrated approach for collecting and monitoring Dark Web contents. We employed automatic Web crawling techniques to build a comprehensive Dark Web collection which covers Web sites created by more than 200 domestic and international extremist organizations. We then applied a systematic content analysis methodology called the Dark Web Attribute System (DWAS) (Qin et al. 2007) to enable quantitative assessment of the technical sophistication and effectiveness of these extremist organizations’ Internet usage. We have tested the DWAS in a study of the major Middle Eastern extremist organizations’ Internet usage (Qin et al. 2007). The results demonstrated the effectiveness of the DWAS in studying organizations’ Internet usage. Furthermore, the high level of automation in the DWAS leads to high efficiency, making the analysis of very large Web collections possible.

The rest of this paper is organized as follows. In Section 2, we briefly review previous work on extremists’ use of the Web. In Section 3, we present our research questions and the proposed methodologies. In Section 4, we describe the findings obtained from a case study on the technical sophistication, content richness, and Web interactivity features of major extremist organizations from three regions: North America, Latin America, and the Middle East. In the last section, we provide conclusions and discuss the future directions of this research.

2 Literature review

2.1 Extremism on the internet

Previous research showed that extremists mainly utilize the Internet to enhance their information operations surrounding propaganda, communication, and psychological warfare (Thomas 2003; Denning 2004; Weimann 2004). According to Weimann (2004), almost all major extremist organizations in the world have established their presence on the Internet.

Islamic militant organizations, such as Al Qaeda, Hamas, and Hezbollah, have been intensively utilizing the Internet to disseminate their anti-Western, anti-Israel propaganda, provide training materials to their supporters, plan their operations, and raise funds by selling goods through their Web sites (9/11 Commission Report 2004; Weimann 2004). The level of technical sophistication of the Islamic extremist organizations’ Web sites has been increasing according to Katz, who monitors Islamic fundamentalist Internet activities (Internet Haganah 2005; SITE 2004).

Latin American guerrilla groups are also among the Internet-savvy extremist organizations. Mexico's Zapatista guerrillas have been rallying support online since their 1994 uprising. Their Web site (http://www.ezln.org/) has long been a Lycos Web Points (http://point.lycos.com/) top 5% WWW site and serves as a mouthpiece for the organization. Other major Latin American extremist groups such as the Revolutionary Armed Forces of Colombia (FARC) and the “Shining Path” in Peru also host their own Web sites containing scrolls of propaganda materials.

U.S. domestic extremist and hate groups have also exploited Internet technology to enhance their operations and were among the early adopters of computer bulletin boards that eventually evolved into the Internet (Gerstenfeld et al. 2003). Stormfront.org, a neo-Nazi Web site set up in 1995, is considered the first major domestic “hate site” on the World Wide Web because of its depth of content and its presentation style, which represented a new period for online right-wing extremism (Whine 1999). The neo-Nazi groups share a hatred for Jews and other minorities, and a love for Adolf Hitler and Nazi Germany. A social network analysis of extremist Web sites revealed that Stormfront.org served as a central node that occupied a prominent position within the White Supremacist network (Burris et al. 2000).

Extremist groups have sought to replicate or supplement the communication, fundraising, propaganda, recruitment, and training functions on the Web by building Web sites with massive and dynamic online libraries of speeches, training manuals, and multimedia resources that are hyperlinked to other sites that share similar beliefs (Coll and Glasser 2005; Weimann 2004). The Web sites are designed to communicate with diverse global audiences of members, sympathizers, media, enemies, and the public (Weimann 2004). Since extremist organizations are active on the Internet, studying their Web presence may help us develop a better understanding of the extremists themselves.

2.2 Existing dark web studies and research gaps

In recent years, there have been studies on how extremist organizations use the Web to facilitate their activities (Zhou et al. 2005; Chen et al. 2004; ISTS 2004; Thomas 2003; Tsfati and Weimann 2002; SITE 2004; Weimann 2004). For example, since the late 1990s, several organizations, such as SITE Institute, the Anti-Terrorism Coalition, and the Middle East Media Research Institute (MEMRI), started to monitor contents from extremist Web sites for research and intelligence purposes. However, due to the limitations of manual analysis approaches employed in those studies, the scopes of those studies have been limited to some selected groups. Table 1 lists some of the organizations that capture and analyze extremists’ Web sites grouped into three functional categories: archive, research center, and vigilante community.

Except for the Artificial Intelligence (AI) Lab, none of the enumerated organizations seems to use automated methodologies for both collection building and analysis of extremist Web sites. Due to the low efficiency of the manual collection and analysis approaches, the comprehensiveness of their analyses has been limited. In order to gain a deeper understanding of global extremists’ use of the Internet, we believe it is important to analyze the technical sophistication, content richness, and Web interactivity of extremist Web sites on a global scale.

2.3 Dark web collection building

The first step towards studying the extremist Web presence is to capture extremist Web sites and store them in a repository for further analysis. Previous studies have suggested three types of approaches to collecting Web contents in specific domains: manual approach, automatic approach, and semiautomatic approach. In order to build the September 11 and Election 2002 Web Archives (Schneider et al. 2003), the Library of Congress manually collected relevant seed URLs and downloaded their contents. The limitation of such a manual approach is that it is time-consuming and inefficient. To archive Norwegian legal deposit documents on the Web, Albertsen (2003) used an automatic approach in the “Paradigma” project. They employed a focused Web crawler (Kleinberg 1999), an automatic program that discovers and downloads Web sites in particular domains by following Web links. Intelligent focused Web crawlers can automatically restrict the crawling to within a specific domain (Chau and Chen 2003). The automatic approach is more efficient than the manual approach; however, due to the limitations of current focused crawling techniques, automatic approaches often introduce noise (off-topic Web pages) into the collection.

In order to ensure both quality and efficiency in collecting Dark Web contents, we proposed a semi-automatic Dark Web crawling approach which combined the accuracy of human experts and the efficiency of automatic Web crawlers (Zhou et al. 2004). The semi-automatic approach contains four major steps. First, a list of extremist organizations is identified from authoritative sources such as U.S. State Department reports, FBI reports, and the UN Security Council. Then, URLs of Web sites created by these organizations are identified either directly from the same authoritative sources or by searching the Internet using those organizations’ information (group name, leader names, jargon, etc.) as queries. The identified URLs form the initial seed URL set, and this set is then further expanded through out-link and in-link expansion approaches. Last, the identified extremist Web sites are automatically downloaded using an intelligent Web crawler called SpidersRUs (Chau et al. 2008).

Using this approach, we successfully created a comprehensive Dark Web testbed containing more than 100 Web sites created by extremists. We believe that this semiautomatic approach is most suitable for creating the comprehensive Dark Web collection for this study.

Table 1 Organizations that capture and analyze extremists’ Web sites

| Category | Organization | Description | Access |
| --- | --- | --- | --- |
| Archive | 1. Internet Archive (IA) | 1996-. Collect open access HTML pages (every 2 mths.) | Via http://www.archive.org |
| Research centers | 2. Anti-terrorism Coalition (ATC) | 2003-. Jihad Watch. Has 448 extremist Web sites & forums | Via http://www.atcoalition.net |
| Research centers | 3. Artificial Intelligence (AI) Lab, University of Arizona | 2003-. Spidering (every 2 mos.) to collect extremist Web sites. Has 1000s Web sites: U.S. Domestic, Latin America, & Middle Eastern Web sites | Via testbed portal called Dark Web Portal |
| Research centers | 4. MEMRI | 2003-. Jihad & Terrorism Studies Project. | Access reports via http://www.memri.org |
| Research centers | 5. Site Institute | 2003-. Capture Web sites every 24 hrs. Extensive collection of 1000s of files. | Access reports & fee-based intelligence services http://siteinstitute.org |
| Research centers | 6. Weimann (Univ. Haifa, Israel) | 1998-. Capture Web sites daily. Extensive collection of 1000s of files. | Closed collection |
| Vigilante community | 7. Internet Haganah | 2001-. Confronting the Global Jihad Project. Has 100s links to Web sites. | Provides snapshots of terrorist Web sites http://haganah.us |

2.4 Dark Web content analysis

In order to reach an understanding of the various facets of extremists’ Web usage and communications, a systematic analysis of the Web sites’ content is required. Researchers in the extremism domain have used observation and content analysis to analyze Web site data. In Bunt’s (2003) overview of Jihadi movements’ presence on the Web, he described the reaction of the global Muslim community to the content of extremist Web sites. His assessment of the influence such content had on Muslims and Westerners was based on a qualitative analysis of message contents extracted from Taliban and Al Qaeda Web sites. Tsfati and Weimann (2002) conducted a content analysis of the characteristics of extremist groups’ communications. They said that the small size of their collection and the descriptive nature of their research questions made a quantitative analysis infeasible.

In order to enable quantitative study, we proposed a systematic Dark Web content analysis approach called the Dark Web Attribute System (DWAS). The DWAS extracts the appearances of specific attributes from extremist Web sites and assigns each Web site three scores to indicate its levels of technical sophistication, content richness, and Web interactivity. The attributes used in DWAS were identified from the literature in the e-Commerce (Palmer and Griffith 1998), e-Government (Demchak et al. 2001), and e-Education (Chou 2003) domains. Unlike most manual, qualitative content analysis approaches used in previous Dark Web studies, the DWAS employs programs to automatically identify the appearances of attributes and generate quantitative results. We successfully applied the DWAS to study the technical sophistication and effectiveness of Middle Eastern extremist organizations’ Internet usage. We believe that the DWAS is also an effective tool to study extremists’ tactical use of the Internet on a global scale.

3 The multi-region empirical study on extremist organizations’ internet presence

Studying the Dark Web helps us deepen our understanding of the global extremism movements. However, traditional manual Web analysis approaches were not efficient enough to conduct comprehensive Dark Web studies on a global scale. To address this research gap, we propose a large-scale empirical study on the technical sophistication and effectiveness of global extremist organizations’ Internet usage. To ensure comprehensiveness, our study covers Web sites created by major extremist organizations from three geographical regions across the world: US domestic racist and hate groups, Latin American guerrilla and separatist groups, as well as Middle Eastern Islamic extremist groups. We also cross-compared the Web sites of different types of extremist groups to reveal the differences in extremist organizations’ online capabilities and strategies.

The research questions postulated in our study are:

• What design features and attributes are necessary to build a highly relevant and comprehensive global Dark Web collection for analysis purposes?

• For extremist Web sites, what are the levels of technical sophistication, content richness, and interactivity?

• What major differences exist between the characteristics of Web sites created by extremists from different regions with different ideologies?

To study the research questions, we propose to use the Dark Web analysis methodology proposed in Qin et al. (2007) and expand the scope of the study to a global level. Both the semi-automatic collection-building approach and the DWAS have been shown to be effective tools in our previous studies on the Middle Eastern extremist organizations’ Web presence (Qin et al. 2007).

3.1 Dark web collection building

To ensure the quality of our collection, we propose to use a semi-automated approach to collecting Dark Web contents (Zhou et al. 2005). The collection used in this study was built in May 2006 in the following 4 steps:

1. Identify extremist groups: We started the collection building process by identifying the groups that are considered by authoritative sources as extremist groups. The sources include government agency reports (e.g., U.S. State Department reports, FBI reports, etc.), authoritative organization reports (e.g., Counter-Terrorism Committee of the UN Security Council, etc.), and studies published by extremism research centers such as the Anti-Terrorism Coalition (ATC), the Middle East Media Research Institute (MEMRI), etc. From those sources, we identified around 200 U.S. domestic groups and around 400 international groups. Information such as extremist group names, leader names, and extremist jargon was identified from the sources to create an extremism keyword lexicon for use in the next step.

2. Identify extremist group URLs: We manually identified a set of seed extremist group URLs from two sources. First, some extremist URLs were directly identified from the authoritative sources and literature mentioned above. Second, we identified another set of extremist URLs by querying major search engines. The queries were issued in the corresponding extremist groups’ native languages. For example, the queries we used when searching for Middle Eastern groups’ URLs include the native-language forms of extremist leaders’ names (e.g., “Sheikh Mujahid bin Laden”), extremist groups’ names (e.g., “Khalq Iran”), and special words used by extremists (e.g., “Crusader’s War” and “Infidels”). The initial set of seed URLs was then expanded.

3. Expand extremist URL set through link and forum analysis: After identifying the seed URLs, we extracted out-links and in-links of the seed URLs using an automatic link-analysis program. The out-links were extracted from the HTML contents of “favorite link” pages under the seed Web sites. The in-links were extracted from the Google in-link search service through the Google API. We also had language experts browse the contents of extremist-supporting forums and extract the extremist URLs posted by extremism supporters. The expanded extremist URL set was then manually filtered by domain experts to ensure that irrelevant and bogus Web sites did not make their way into our collection. After the filtering, a total of 224 extremist group URLs (92 U.S. domestic group URLs, 53 Latin American group URLs, and 79 Middle Eastern group URLs) were included in our final URL set.

4. Download extremist Web site contents: The multimedia and multilingual contents of the identified extremist Web sites were automatically collected using a Web crawler developed by our group. Our Web crawler was designed to download not only the textual files (e.g., HTML, TXT, PDF, etc.) but also multimedia files (e.g., images, video, audio, etc.) and dynamically generated Web files (e.g., PHP, ASP, JSP, etc.). Moreover, because extremist organizations set up forums within their Web sites whose contents are of special value to research communities, our Web crawler can also automatically log into the forums and download the dynamic forum contents. The automatic Web crawling approach allows us to effectively build Dark Web collections with millions of documents. This greatly increases the comprehensiveness of our Dark Web study.
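The out-link part of step 3 can be illustrated with a minimal sketch that uses only Python's standard library. The names `OutLinkParser` and `extract_out_links`, the sample page, and the example hosts are hypothetical; this is not the authors' link-analysis program, and the in-link lookup through a search-engine API and the manual expert filtering are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class OutLinkParser(HTMLParser):
    """Collects the href targets of <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_out_links(page_url, html):
    """Return the sorted external hosts linked from one seed page."""
    parser = OutLinkParser()
    parser.feed(html)
    seed_host = urlparse(page_url).netloc
    hosts = set()
    for href in parser.hrefs:
        parsed = urlparse(urljoin(page_url, href))  # resolve relative links
        # Keep only Web links that leave the seed site.
        if parsed.scheme in ("http", "https") and parsed.netloc != seed_host:
            hosts.add(parsed.netloc)
    return sorted(hosts)


# Made-up "favorite links" page used only for illustration.
sample_html = """
<html><body>
<a href="/about.html">About</a>
<a href="http://example.org/page">Friend site</a>
<a href="http://example.net/">Another</a>
<a href="mailto:someone@example.com">Mail</a>
</body></html>
"""
candidates = extract_out_links("http://seed.example.com/links.html", sample_html)
```

In a pipeline like the one described, the resulting candidate hosts would still go to domain experts for filtering before entering the collection.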

Following the four steps described above, we built a global Dark Web collection containing around 1.7 million multimedia documents. Table 2 summarizes the detailed file type breakdown of the global Dark Web collection. Textual files make up the largest category in the Dark Web collection. Textual files include static textual files such as HTML files, PDF files, and MS Word documents, as well as dynamic files such as PHP files, ASP files, and JSP files. Interestingly, more than half of the textual files in the Dark Web collection are dynamic files. In particular, for the Middle Eastern groups, dynamic files make up as much as 78% of all textual files. We conducted a preliminary analysis on the contents of these dynamic files and found that most dynamic files were forum postings. This indicates that online forums play an important role in extremists’ Web usage, especially for Middle Eastern groups.

Other than textual files, multimedia files also have a significant presence in the extremist collection, which indicates heavy use of multimedia technologies in extremist Web sites. The last two types of files, archive files and non-standard files, made up less than 5% of the collection. Archive files are compressed file packages such as .zip files and .rar files. They could be password-protected. Non-standard files are files that cannot be recognized by the Windows operating system. These files may be of special interest to extremism researchers and experts because they could be encrypted information created by extremists. Further analysis is needed to study the contents of these two types of files.
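The file categories described above could be assigned by file extension, as in the following minimal sketch. The extension lists and the names `CATEGORIES` and `classify_file` are illustrative assumptions; the paper names only a few example formats and does not state how its crawler labeled files.

```python
import os
from collections import Counter
from urllib.parse import urlparse

# Assumed extension lists; only a few of these formats are named in the text.
CATEGORIES = {
    "textual/static": {".html", ".htm", ".txt", ".pdf", ".doc"},
    "textual/dynamic": {".php", ".asp", ".jsp"},
    "multimedia/image": {".jpg", ".jpeg", ".gif", ".png"},
    "multimedia/audio": {".mp3", ".wav"},
    "multimedia/video": {".avi", ".mpg", ".wmv"},
    "archive": {".zip", ".rar"},
}


def classify_file(url):
    """Map a URL to a file category; unknown extensions are 'non-standard'."""
    # urlparse drops the query string, so "page.php?id=1" still ends in ".php".
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    for category, extensions in CATEGORIES.items():
        if ext in extensions:
            return category
    return "non-standard"


# Made-up URLs used only for illustration.
sample_urls = [
    "http://example.com/index.html",
    "http://example.com/forum/viewtopic.php?t=42",
    "http://example.com/files/pack.RAR",
    "http://example.com/data.xyz",
]
counts = Counter(classify_file(u) for u in sample_urls)
```

Tallying such labels per region would yield a breakdown with the same shape as Table 2.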

Table 2 Dark web collection file type breakdown (number of files)

| File types | U.S. Domestic | Latin American | Middle Eastern |
| --- | --- | --- | --- |
| Textual Files | 312408 (76%) | 230977 (80%) | 804145 (77%) |
| Static Files | 154148 | 89150 | 176061 |
| Dynamic Files | 158260 | 141827 | 628084 |
| Multimedia Files | 96738 (23%) | 55618 (19%) | 225557 (22%) |
| Image Files | 91089 | 54422 | 216520 |
| Audio Files | 3769 | 941 | 1437 |
| Video Files | 1880 | 255 | 7600 |
| Archive Files | 327 (0.1%) | 852 (0.6%) | 1499 (0.5%) |
| Non-Standard Files | 1355 (0.9%) | 650 (0.4%) | 1537 (0.5%) |
| Total | 410828 (100%) | 288097 (100%) | 1032738 (100%) |
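As a quick arithmetic check on Table 2, the sketch below recomputes the share of dynamic files among textual files for each region and the overall collection size. The counts are copied from Table 2; the variable and function names are illustrative only.

```python
# Static, dynamic, and total file counts copied from Table 2.
table2 = {
    "U.S. Domestic": {"static": 154148, "dynamic": 158260, "total": 410828},
    "Latin American": {"static": 89150, "dynamic": 141827, "total": 288097},
    "Middle Eastern": {"static": 176061, "dynamic": 628084, "total": 1032738},
}


def dynamic_share(row):
    """Fraction of textual files (static + dynamic) that are dynamic."""
    return row["dynamic"] / (row["static"] + row["dynamic"])


# Percentage of dynamic files among textual files, per region.
shares = {region: round(100 * dynamic_share(row), 1) for region, row in table2.items()}

# Total number of documents across the three regions.
collection_size = sum(row["total"] for row in table2.values())
```

The shares come out at roughly 51%, 61%, and 78% for the U.S. domestic, Latin American, and Middle Eastern groups, and the three totals sum to 1,731,663 documents, consistent with the "around 1.7 million" figure.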


Comparing documents created by groups from different regions, we found that the number of Web documents created by the Middle Eastern groups is much larger than that of the U.S. domestic groups and the Latin American groups. This indicates that the Middle Eastern extremist organizations have a more prominent presence on the Web. A more detailed analysis of the technical sophistication and effectiveness of the extremists’ Internet presence will be described in the next section.

3.2 Dark web content analysis using the DWAS

We used the DWAS as our content analysis tool to generate quantitative indications of the technical sophistication and effectiveness of global extremists’ use of the Internet. The DWAS contains three sets of attributes: 13 technical sophistication (TS) attributes, five content richness (CR) attributes (an extension of the traditional media richness attributes), and 11 Web interactivity (WI) attributes. Different weights were assigned to each technical sophistication and Web interactivity attribute to indicate their different levels of importance. The weights were assigned based on Web experts’ opinions collected via an email survey conducted in our previous research (Qin et al. 2007). We conducted a reliability test on the experts’ answers, and the resulting reliability score (Cronbach’s Alpha) was 0.89, which was well above the 0.70 mark required for acceptable scale reliability (Nunnelly 1978). A list of these attributes is summarized in Table 3.
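For readers unfamiliar with the reliability score mentioned above, the following minimal sketch computes Cronbach's alpha with the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of respondent totals). The function name `cronbach_alpha` and the ratings are made up for illustration; the paper does not publish the survey data or say how the answers were arranged, so treating rows as respondents and columns as items is an assumption.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(ratings[0])
    # Variance of each item (column) across respondents.
    item_variances = [variance([row[i] for row in ratings]) for i in range(k)]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: five respondents scoring three items.
hypothetical_ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(hypothetical_ratings)
```

Values near 1 indicate that respondents rank the items consistently; the 0.70 mark cited in the text is the usual threshold for an acceptable scale.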

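Table 3 Summary of DWAS Attributes and Weights

| TS attribute group | TS attribute | Weight |
| --- | --- | --- |
| Basic HTML techniques | Use of lists | 1 |
| Basic HTML techniques | Use of tables | 2 |
| Basic HTML techniques | Use of frames | 2 |
| Basic HTML techniques | Use of forms | 1.5 |
| Embedded multimedia | Use of background image | 1 |
| Embedded multimedia | Use of background music | 2 |
| Embedded multimedia | Use of stream audio/video | 3.5 |
| Advanced HTML | Use of DHTML/SHTML | 2.5 |
| Advanced HTML | Use of predefined script functions | 2 |
| Advanced HTML | Use of self-defined script functions | 4.5 |
| Dynamic web programming | Use of CGI | 2.5 |
| Dynamic web programming | Use of PHP | 4.5 |
| Dynamic web programming | Use of JSP/ASP | 5.5 |

| CR attribute | Score |
| --- | --- |
| Hyperlink | # of hyperlinks |
| File/Software Download | # of downloadable documents |
| Image | # of images |
| Audio files | # of audio files |
| Video files | # of video files |

| WI attribute group | WI attribute | Weight |
| --- | --- | --- |
| One-to-one level interactivity | Email feedback | 1.75 |
| One-to-one level interactivity | Email list | 2.25 |
| One-to-one level interactivity | Contact address | 1.25 |
| One-to-one level interactivity | Feedback form | 2.75 |
| One-to-one level interactivity | Guest book | 1.50 |
| Community level interactivity | Private message | 4.25 |
| Community level interactivity | Online forum | 4.25 |
| Community level interactivity | Chat room | 4.75 |
| Transaction level interactivity | Online shop | 4.00 |
| Transaction level interactivity | Online payment | 4.00 |
| Transaction level interactivity | Online application form | 4.00 |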


Fig. 1 a Basic HTML technical sophistication ANOVA results. b Embedded media technical sophistication ANOVA results. c Advanced technical sophistication ANOVA results. d Dynamic web programming technical sophistication ANOVA results. e Overall technical sophistication ANOVA results

a ANOVA Basic TS (Significance Level at 0.05): F = 3.987669, p-Value = 0.019863**, F crit = 3.035794
b ANOVA Media TS (Significance Level at 0.05): F = 6.23225, p-Value = 0.00233**, F crit = 3.036902
c ANOVA Advanced TS (Significance Level at 0.05): F = 16.10481, p-Value < 0.00001**, F crit = 3.036902
d ANOVA Dynamic TS (Significance Level at 0.05): F = 4.438946, p-Value = 0.012885**, F crit = 3.036902

We developed strategies to efficiently and accurately identify the presence of the DWAS attributes in Dark Web sites. The TS and CR attributes are marked by HTML tags in page contents or by file extension names in the page URL strings. For example, a URL string ending with “.jsp” indicates that the page utilizes JSP technology. We developed programs to automatically analyze Dark Web page contents and URL strings to extract the presence of the TS and CR attributes. Since there are no clear indications or rules that a program could follow to identify WI attributes from Dark Web contents with a high degree of accuracy, we developed a coding scheme to allow human coders to identify their presence in Dark Web sites. Technical sophistication, content richness, and Web interactivity scores are calculated for each Web site in the collection based on the presence of the attributes to indicate how advanced and effective the site is in terms of supporting communications and interactions.
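The tag-based and URL-based detection of TS attributes described above might look like the following minimal sketch. The rule tables, the regular expressions, and the name `detect_ts_attributes` are assumptions made for illustration; apart from the “.jsp” example, the paper does not list its detection rules, and only a subset of the 13 TS attributes is covered here.

```python
import re
from urllib.parse import urlparse

# Assumed HTML-tag rules for some basic TS attributes.
TAG_RULES = {
    "Use of lists": re.compile(r"<(ul|ol)\b", re.IGNORECASE),
    "Use of tables": re.compile(r"<table\b", re.IGNORECASE),
    "Use of frames": re.compile(r"<(frameset|iframe|frame)\b", re.IGNORECASE),
    "Use of forms": re.compile(r"<form\b", re.IGNORECASE),
}

# Assumed URL-extension rules for dynamic Web programming attributes.
EXTENSION_RULES = {
    ".php": "Use of PHP",
    ".jsp": "Use of JSP/ASP",
    ".asp": "Use of JSP/ASP",
}


def detect_ts_attributes(url, html):
    """Return the set of TS attribute names detected for one page."""
    found = {name for name, pattern in TAG_RULES.items() if pattern.search(html)}
    path = urlparse(url).path.lower()  # ignore the query string
    for extension, name in EXTENSION_RULES.items():
        if path.endswith(extension):
            found.add(name)
    return found


# Made-up page used only for illustration.
page_html = "<html><TABLE><tr><td><ul><li>x</li></ul></td></tr></TABLE></html>"
detected = detect_ts_attributes("http://example.com/news/index.jsp?id=3", page_html)
```

A site-level result could then be formed by taking the union of the attributes detected on its pages.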

3.3 Experimental results

Following the DWAS approach, the presence of the technical sophistication and media richness attributes was automatically extracted from the collections using programs. The presence of the Web interactivity attributes was extracted from each Web site by language experts based on the coding scheme in DWAS. Because of time limitations, language experts only examined the top two levels of Web pages in each Web site. For each Web site in the global Dark Web collection, three scores (technical sophistication, content richness, and Web interactivity) were calculated based on the presence of the attributes and their corresponding weights in DWAS. Statistical analysis was conducted to cross-compare the advancement/effectiveness scores achieved by the Web sites of extremist organizations from the three different regions.
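One simple way to turn attribute presence and weights into a score is a weighted sum, sketched below for the technical sophistication attributes. The weights are copied from Table 3, but the aggregation rule itself is an assumption: the text says only that scores are based on attribute presence and weights, without giving the formula. The name `weighted_score` and the example site are hypothetical.

```python
# Technical sophistication weights copied from Table 3.
TS_WEIGHTS = {
    "Use of lists": 1.0,
    "Use of tables": 2.0,
    "Use of frames": 2.0,
    "Use of forms": 1.5,
    "Use of background image": 1.0,
    "Use of background music": 2.0,
    "Use of stream audio/video": 3.5,
    "Use of DHTML/SHTML": 2.5,
    "Use of predefined script functions": 2.0,
    "Use of self-defined script functions": 4.5,
    "Use of CGI": 2.5,
    "Use of PHP": 4.5,
    "Use of JSP/ASP": 5.5,
}


def weighted_score(present, weights):
    """Sum the weights of the attributes observed on a site (assumed rule)."""
    return sum(weights[name] for name in present)


# Hypothetical site that uses tables, forms, and PHP.
site_attributes = {"Use of tables", "Use of forms", "Use of PHP"}
score = weighted_score(site_attributes, TS_WEIGHTS)

# Highest score attainable under this rule, when every attribute is present.
max_score = weighted_score(TS_WEIGHTS.keys(), TS_WEIGHTS)
```

Under this rule the hypothetical site scores 8.0 out of a possible 34.5. The same function would apply to the Web interactivity weights.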

3.3.1 Comparison results: technical sophistication

To learn whether there are differences in extremist organizations’ level of sophistication, we used ANOVA analysis to compare the technical sophistication scores achieved by Web sites of extremist groups from different regions. Figures 1a–e show the ANOVA results of different technical sophistication levels.
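The comparison described above is a one-way ANOVA across three groups of scores. The sketch below computes the F statistic from first principles; the function name `one_way_anova_f` and the scores are made up for illustration and are not the study's data. The null hypothesis of equal group means is rejected when F exceeds the critical value for the chosen significance level.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: scores around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for three regions, three sites each.
hypothetical_scores = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]
f_stat, df_between, df_within = one_way_anova_f(hypothetical_scores)
```

For these made-up scores the between-group mean square is 13 and the within-group mean square is 1, giving F = 13 with (2, 6) degrees of freedom.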

Figure 1a shows that, in terms of applying basic HTML techniques, there are significant differences in the levels of technical sophistication between Web sites created by groups from the three regions. More specifically, the confidence interval plot shown in Fig. 1a demonstrates that U.S. domestic group Web sites and Middle Eastern group Web sites achieved a similar level of basic technical sophistication, and both are significantly better than Latin American group Web sites.

In terms of embedded media usage, as shown in Fig. 1b, the ANOVA result shows that there are significant differences (p-Value=0.00233) in the levels of technical sophistication between Web sites created by groups from the three regions. The confidence interval plot further shows that the differences in embedded media usage between Web sites created by groups of the three regions follow the same pattern as the differences in basic HTML techniques: while U.S. groups and Middle Eastern groups are comparable, both of them are significantly better than Latin American groups.

For advanced HTML technique usage, as shown in Fig. 1c, similar patterns were observed again. U.S. domestic group Web sites and Middle Eastern group Web sites performed comparably and they both significantly (p-Value<0.00001) outperformed the Latin American Web sites.

For the use of dynamic Web programming languages such as PHP and JSP, a different pattern was observed. As shown in Fig. 1d, Middle Eastern groups are the most advanced ones in terms of applying dynamic Web programming techniques in their Web sites. They are significantly better (p-Value=0.012885) than both U.S. domestic groups and Latin American groups. While the U.S. domestic groups performed better than the Latin American groups, the difference is not significant.

When taking all four attributes of technical sophistication into consideration, as shown in Fig. 1e, Middle Eastern groups rank highest among the extremist groups studied, although the difference between them and U.S. domestic groups is not significant. Latin American groups lag behind both Middle Eastern groups and U.S. domestic groups. The difference between Latin American groups and groups from the other two regions is significant (p-Value=0.0000107).

Technical sophistication of Web sites run by extremist groups is a good indication of the level of IT expertise the groups have as well as the level of investment these

Fig. 1 (continued) e ANOVA, Overall TS (significance level 0.05): F = 12.06542; p-Value = 0.0000107 **; F crit = 3.036902


groups have put into building their Internet presence. Considering that the United States is the most IT-savvy country in the world, where Internet technologies and services are easily and cheaply available, it is not surprising to see that U.S. domestic extremist groups have achieved a high level of technical sophistication in utilizing the Internet infrastructure. The Middle Eastern groups, on the other hand, are mostly rooted in countries where Internet technologies and infrastructure are much less developed. Nevertheless, they achieved a technical sophistication level that is even

Fig. 2a ANOVA, # of Links (significance level 0.05): F = 3.037265; p-Value = 0.048818 **; F crit = 3.013056

Fig. 2b ANOVA, # of Downloads (significance level 0.05): F = 0.226105; p-Value = 0.797729; F crit = 3.016694

Fig. 2c ANOVA, # of Images (significance level 0.05): F = 4.218931; p-Value = 0.015325 **; F crit = 3.016602

Fig. 2d ANOVA, # of Audio Files (significance level 0.05): F = 2.00641; p-Value = 0.135728; F crit = 3.016694

Fig. 2 a Number of links ANOVA results. b Number of downloads ANOVA results. c Number of images ANOVA results. d Number of audio files ANOVA results. e Number of video files ANOVA results


higher than that of the U.S. domestic groups. This indicates that the Internet has become a very important part of the Middle Eastern extremist organizations’ agenda and that they have made the effort to take advantage of the latest Internet technologies. The Latin American extremist groups seem to have a different attitude toward the Internet. Their Web sites are significantly less sophisticated than those of groups from the other two regions, which indicates that they have invested less in Internet technologies. Furthermore, the Middle Eastern groups are significantly more advanced in terms of utilizing dynamic Web programming techniques in their Web sites than groups from the other two regions. Based on our preliminary studies, in extremist Web sites, dynamic Web programming techniques are usually used to support communication functionalities such as online forums and chat rooms. This high level of usage of dynamic Web programming techniques in the Middle Eastern group Web sites calls for further investigation.

3.3.2 Comparison results: content richness

Content richness is an important criterion for measuring the effectiveness of extremists’ online propaganda plans. The richer the contents on their Web sites, the more information the extremist groups can convey to their supporters, and thus the better they can achieve their mobilization goals. To study the propaganda plans of different extremist groups, we conducted ANOVA to compare the content richness of their Web sites.

Figure 2a shows the comparison results for the average number of hyperlinks per Web site. As we can see from the confidence interval plot, the Middle Eastern groups’ Web sites contain significantly more hyperlinks than the Web sites of the other two categories of groups (p-Value=0.048818). With more hyperlinks in their Web sites, the Middle Eastern groups provide their supporters with more opportunities to locate the Web documents that they really want. Moreover, more hyperlinks between different groups’ Web sites indicate that stronger real-world relationships exist between those Middle Eastern extremist organizations.

Another important content richness attribute is the number of downloadable documents on a Web site. Downloadable documents include textual files (e.g., PDF files, MS Word files, etc.) and archive files (e.g., ZIP files, RAR files, etc.). Previous studies (Bowers 2004; Muriel 2004; Weimann 2004; SITE 2004) showed that providing downloadable documents on Web sites has become a major means for extremists to disseminate their propaganda materials. From Fig. 2b, we can see that extremist groups from all three major regions provide similar amounts of downloadable documents on their Web sites.

Multimedia documents, including images, audio files, and video files, are the most important vehicles to convey information to Web users. They are more attractive and tend to leave a stronger impression on people than pure textual contents. As shown in Fig. 2c–e, the Middle Eastern groups posted significantly more images (p-Value=0.015325) and video files (p-Value=0.001219) on their Web sites than the U.S. domestic and Latin American groups. The U.S. domestic groups posted more audio files than the Middle Eastern and Latin American groups, but the difference is not significant (p-Value=0.135728).

The large amount of multimedia content posted on the Middle Eastern groups’ Web sites is an indication that the Middle Eastern groups have very active propaganda strategies. Moreover, hosting a large volume of multimedia content usually requires Web servers with high stability and bandwidth. The Middle Eastern extremist groups succeeded in building such a stable online infrastructure to support their sophisticated online propaganda campaigns.
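The significance decisions for the content richness attributes follow directly from comparing each reported F statistic with its critical value. The short Python sketch below applies that rule to the F and F crit values reported in the Fig. 2 panels; the dictionary layout and the function name `is_significant` are our own illustrative choices.

```python
from typing import Dict, Tuple

# (F, F crit) pairs as reported in the Fig. 2 panels (significance level 0.05).
FIG2_RESULTS: Dict[str, Tuple[float, float]] = {
    "links": (3.037265, 3.013056),
    "downloads": (0.226105, 3.016694),
    "images": (4.218931, 3.016602),
    "audio files": (2.00641, 3.016694),
    "video files": (6.816345, 3.016893),
}


def is_significant(f_stat: float, f_crit: float) -> bool:
    """The group difference is significant when F exceeds the critical value."""
    return f_stat > f_crit


significant = sorted(
    name for name, (f_stat, f_crit) in FIG2_RESULTS.items()
    if is_significant(f_stat, f_crit)
)
print(significant)  # ['images', 'links', 'video files']
```

This matches the text: links, images, and video files differ significantly across regions, while downloads and audio files do not.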

3.3.3 Comparison results: Web interactivity

Supporting communication between their members and their supporters is one of the major goals of extremists’ Internet exploitation. We conducted ANOVA to compare the effectiveness of extremist groups’ communications through their Web sites.

Fig. 2 (continued) e ANOVA, # of Video Files (significance level 0.05): F = 6.816345; p-Value = 0.001219 **; F crit = 3.016893


As shown in Fig. 3a, in terms of one-to-one level interactivity, both the U.S. domestic groups and Latin American groups performed significantly (p-Value=0.000181) better than the Middle Eastern groups. One possible explanation for the low one-to-one interactivity support from the Middle Eastern groups is that the Middle Eastern extremist groups are more radical and covert than the U.S. domestic and Latin American groups. Many of the

Fig. 3a ANOVA, 1-to-1 Interactivity (significance level 0.05): F = 9.027465; p-Value = 0.000181 **; F crit = 3.044505

Fig. 3b ANOVA, Community Interactivity (significance level 0.05): F = 3.155795; p-Value = 0.044895 **; F crit = 3.044505

Fig. 3c ANOVA, Transaction Interactivity (significance level 0.05): F = 15.69915; p-Value = 0.0000005 (as listed in Table 4) **; F crit = 3.044505

Fig. 3d ANOVA, Overall Interactivity (significance level 0.05): F = 15.7745; p-Value = 0.0000005 (as listed in Table 4) **; F crit = 3.044505

Fig. 3 a One-to-one level interactivity ANOVA results. b Community level interactivity ANOVA results. c Transaction level interactivity ANOVA results. d Overall web interactivity ANOVA results


Middle Eastern groups are currently under military suppression from the West. In many cases, they cannot give out their address and other contact information to the public.

At the community level, as shown in Fig. 3b, the U.S. domestic groups and the Middle Eastern groups both performed better than the Latin American groups. The difference between the U.S. domestic and Latin American groups is significant (p-Value=0.044895).

The U.S. domestic extremist groups are among the earliest adopters of Internet forums. It is not surprising to see that they are still heavily utilizing Internet tools such as forums and chat rooms to support their communication with their supporters. The Middle Eastern groups are newer adopters of such Internet-based communication tools, but they are also very active in terms of hosting and maintaining online forums and bulletin boards. Some of the Middle Eastern extremist group forums have grown very large in scale. For example, www.shawati.com has 31,894 registered forum members and 418,196 posts; www.kuwaitchat.net has 11,531 registered members and 624,694 posts. Not all of the forum members are extremists. Many of them are just supporters or sympathizers. Members of these large forums participate in daily discussions, express their support of the extremist groups, and reinforce each other’s beliefs in the extremist groups’ causes. They can sometimes get messages directly from active members of extremist groups. For example, messages from the Iraqi extremist leader, Abu Mus’ab Zarqawi, can often be found in the online forum www.islamic-f.net. These dynamic forums provide snapshots of extremist groups’ activities, communications, ideologies, relationships, and evolutionary developments.

The transaction level interactivity is the most advanced level of interactivity that Web sites can support. At this level, as shown in Fig. 3c, the U.S. domestic groups significantly outperformed both the Latin American and Middle Eastern groups. Supporting transaction level interactivity requires a high level of technical sophistication, which the Latin American groups did not demonstrate based on our TS comparison results. That is one possible explanation for the Latin American groups’ low performance. On the other hand, transaction level interactivity usually involves the transfer of funds online. In order to perform such tasks, one is often required to provide identity information (bank account number, billing address, contact information, etc.) to online service providers. It is difficult for the Middle Eastern groups to meet these requirements because they have to remain covert, which is a possible explanation for their low performance in supporting transaction level interactivity.

When taking all three levels of Web interactivity into consideration, as shown in Fig. 3d, the U.S. domestic groups performed significantly better (p-Value<0.0001) than both the Latin American and Middle Eastern groups. Table 4 below summarizes all the comparison results we obtained in this study. In the results column, a “=” indicates that groups from the two regions performed comparably in that particular category. A “>” indicates that groups from one region performed better than groups from another region, but the difference is not significant. A “>>” indicates that groups in one region performed significantly better than those in another.

Table 4 Summary of ANOVA comparison results

Attribute                     Result                              p-Value

Technical sophistication comparison results
Basic HTML techniques         U.S. = Mid. East. >> Lt. America    0.019863
Embedded multimedia           U.S. = Mid. East. >> Lt. America    0.00233
Advanced HTML                 U.S. = Mid. East. >> Lt. America    0.0000002
Dynamic web programming       Mid. East. >> U.S. > Lt. America    0.012885
Overall TS                    U.S. = Mid. East. >> Lt. America    0.0000107

Content richness comparison results
# of hyperlinks               Mid. East. > Lt. America > U.S.     0.048818
# of file/software downloads  U.S. = Mid. East. = Lt. America     0.797729
# of images                   Mid. East. >> U.S. > Lt. America    0.015325
# of audio files              U.S. > Mid. East. = Lt. America     0.135728
# of video files              Mid. East. >> U.S. > Lt. America    0.001219

Web interactivity comparison results
One-to-one level              U.S. > Lt. America >> Mid. East.    0.000181
Community level               U.S. > Mid. East. >> Lt. America    0.044895
Transaction level             U.S. >> Mid. East. = Lt. America    0.0000005
Overall WI                    U.S. >> Mid. East. = Lt. America    0.0000005
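The result notation in Table 4 can be read mechanically. As a minimal sketch (not part of the study; the function name `parse_comparison` is hypothetical), the following Python snippet splits a result string into adjacent pairwise relations, where “=” means comparable, “>” means better but not significantly, and “>>” means significantly better.

```python
import re
from typing import List, Tuple


def parse_comparison(result: str) -> List[Tuple[str, str, str]]:
    """Split a Table 4 style result string into (left, operator, right) triples."""
    # Split on the operators while keeping them; ">>" is listed before ">"
    # so that the two-character operator is matched first.
    tokens = [token.strip() for token in re.split(r"(>>|>|=)", result)]
    return [
        (tokens[i], tokens[i + 1], tokens[i + 2])
        for i in range(0, len(tokens) - 2, 2)
    ]


relations = parse_comparison("Mid. East. >> U.S. > Lt. America")
print(relations)
# [('Mid. East.', '>>', 'U.S.'), ('U.S.', '>', 'Lt. America')]
```

Only adjacent pairs are returned; the table itself does not state the relation between the first and last group in a chain, so the sketch does not infer one.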


4 Conclusions and future directions

In this paper, we discussed a large-scale empirical study that explored the application of automatic Web crawling techniques and the Dark Web Attribute System to studying global extremist organizations’ Internet presence. Using a semi-automatic crawling approach, we collected more than 1.7 million multimedia Web documents from around 224 Web sites created by major extremist organizations rooted in North American, Latin American, and Middle Eastern countries. We then used the Dark Web Attribute System to study these extremist organizations’ Internet usage from three perspectives: technical sophistication, content richness, and Web interactivity. In order to gain a deeper understanding of different extremist organizations’ IT capabilities, we also conducted statistical analysis to cross-compare the technical sophistication and effectiveness of Web sites created by extremist groups from different regions.

Our analysis results showed that, among groups from all three regions, the Middle Eastern extremist organizations are the most active exploiters of the Internet. They demonstrated the highest level of technical sophistication and provided the richest multimedia contents in their Web sites. However, due to their covert nature, they did not perform as well as the U.S. domestic extremist organizations in terms of supporting communications using Internet technologies. The U.S. domestic groups take full advantage of Internet technologies such as forums, chat rooms, and e-Commerce transactions to facilitate their communication and interaction with their supporters. The Latin American groups, on the other hand, lagged behind groups from the other two regions in terms of exploiting the Internet. Their Web sites are not as sophisticated as those of their U.S. and Middle Eastern counterparts. They were also not as effective in terms of utilizing the Internet to support their communications.

The contribution of this study is twofold. First, this study further explored the high effectiveness and efficiency of automatic Web mining techniques in Dark Web studies. It expanded the scope of our previous Dark Web content analysis (Qin et al., forthcoming) and pushed the comprehensiveness of Dark Web studies to a new level. Second, the results of our empirical study help domain experts deepen their understanding of the global extremism movements and devise better counter-extremism measures on the Internet.

We have several future research directions to pursue. First, we will further improve the Dark Web Attribute System by incorporating more accurate attributes into the system. We will also collaborate with more Internet technology experts to further fine-tune the weights associated with the DWAS attributes. Second, we will explore more effective Web crawling techniques to further expand the coverage of our Dark Web study. Third, we will collaborate with more extremism domain experts to better interpret our findings. Last but not least, we will explore the application of the proposed automatic Web crawling and content analysis tools in other domains such as business intelligence and e-Government.

References

9/11 Commission Report. (2004). Published by the National Commission on Terrorist Attacks Upon the United States, available at http://govinfo.library.unt.edu/911/report/911Report.pdf

Albertsen, K. (2003). The Paradigma Web Harvesting Environment. In Proc. of 3rd European Conference of Research and Advanced Technology for Digital Libraries (ECDL) Workshop on Web Archives 2003.

Armstrong, H. L., & Forde, P. J. (2003). Internet anonymity practices in computer crime. Information Management & Computer Security, 11(5), 209–215.

Bowers, F. (2004). Terrorists spread their messages online. Christian Science Monitor, July 28, 2004, available at http://www.csmonitor.com/2004/0728/p03s01-usgn.htm.

Bunt, G. R. (2003). Islam in the Digital Age: E-jihad, Online Fatwas and Cyber Islamic Environments. London: Pluto Press.

Burris, V., Smith, E., & Strahm, A. (2000). White supremacist networks on the Internet. Sociological Focus, 33(2), 215–234.

Chau, M., & Chen, H. (2003). Comparison of three vertical search spiders. IEEE Computer, 36, 56–62.

Chau, M., Qin, J., Zhou, Y., Tseng, C., & Chen, H. (2008). SpidersRUs: creating specialized search engines in multiple languages. Decision Support Systems, 45, 621–640.

Chen, H., Qin, J., Reid, E., Chung, W., Zhou, Y., Xi, W., et al. (2004). The Dark Web Portal: Collecting and Analyzing the Presence of Domestic and International Terrorist Groups on the Web. In Proc. of International IEEE Conference on Intelligent Transportation Systems.

Chou, C. (2003). Interactivity and interactive functions in web-based learning systems: a technical framework for designers. British Journal of Educational Technology, 34(3), 265–279.

Coll, S., & Glasser, S. B. (2005). Terrorists Turn to the Web as Base of Operations. Washington Post, Aug 7, 2005.

Demchak, C., Friis, C., & La Porte, T. M. (2001). Webbing governance: National differences in constructing the face of public organizations. In G. D. Garson (Ed.), Handbook of public information systems. NYC: Marcel Dekker.

Denning, D. E. (2004). Information Operations and Terrorism. Journal of Information Warfare (draft), 2004, available at http://www.jinfowar.com.

Gerstenfeld, P. B., Grant, D. R., & Chiang, C. (2003). Hate Online: A Content Analysis of Extremist Internet Sites. Analyses of Social Issues and Public Policy, 3(1), 29–44.

Internet Haganah. (2005). Internet Haganah report, available at http://en.wikipedia.org/wiki/Internet_Haganah.

ISTS (2004). Examining the Cyber Capabilities of Islamic Terrorist Groups. Report, Institute for Security Technology Studies, 2004. http://www.ists.dartmouth.edu/

Jenkins, B. M. (2004). World becomes the hostage of media-savvy terrorists: Commentary. USA Today, August 22, 2004. http://www.rand.org/.


Kleinberg, J. (1999). Authoritative Sources in a Hyperlinked Environment. Journal of the ACM, 46(5), 604–632.

Muriel, D. (2004). Terror Moves to the Virtual World. CNN News, April 8, 2004, available at http://edition.cnn.com/2004/TECH/04/08/internet.terror/.

Nunnelly, J. (1978). Psychometric theory. New York: McGraw Hill.

Palmer, J. W., & Griffith, D. A. (1998). An emerging model of Web site design for marketing. Communications of the ACM, 41(3), 45–51.

Qin, J., Zhou, Y., Reid, E., Lai, G., & Chen, H. (2007). Analyzing terror campaigns on the internet: technical sophistication, content richness, and Web interactivity. International Journal on Human-Computer Studies, 65(1), 71–84.

Schneider, S. M., Foot, K., Kimpton, M., & Jones, G. (2003). Building thematic web collections: Challenges and experiences from the September 11 Web Archive and the Election 2002 Web Archive. In Proc. of the 3rd ECDL Workshop on Web Archives, Trondheim, Norway, August 2003.

SITE (2004). Special report published by SITE Institute, http://siteinstitute.org.

Thomas, T. L. (2003). Al Qaeda and the Internet: The Danger of ‘Cyberplanning’. Parameters, Spring 2003, pp. 112–23, available at http://carlisle-www.army.mil/usawc/Parameters/03spring/thomas.htm.

Tsfati, Y., & Weimann, G. (2002). www.terrorism.com: terror on the internet. Studies in Conflict & Terrorism, 25, 317–332.

Weimann, G. (2004). www.terror.net: How modern terrorism uses the internet. Special Report, U.S. Institute of Peace. Available at http://www.usip.org/pubs/specialreports/sr116.pdf.

Whine, M. (1999). Cyberspace: A new medium for communication, command and control by extremists. Available at http://www.ict.org.il/articles/cyberspace.htm

Zhou, Y., Reid, E., Qin, J., Chen, H., & Lai, G. (2005). U.S. domestic extremist groups on the Web: link and content analysis. IEEE Intelligent Systems Special Issue on Homeland Security, 20(5), 44–51.

Jialun Qin is an Assistant Professor of Management in the Operations and Information Systems Department at University of Massachusetts Lowell. He received his Ph.D. degree in Management Information Systems from the University of Arizona. His research interests include data mining and Web mining, social network analysis, and human computer interaction. He has published more than 30 research articles in major journals and conferences including Decision Support Systems, Journal of the American Society for Information Science and Technology, and IEEE Intelligent Systems. Contact him at [email protected].

Yilu Zhou is an Assistant Professor in the Department of Information Systems and Technology Management at George Washington University. Her current research interests include multilingual knowledge discovery, Web mining, text mining and human computer interaction. She received a Ph.D. in Management of Information System from the University of Arizona, where she was also a research associate of the Artificial Intelligence Lab. She received a B.S. in Computer Science from Shanghai Jiaotong University. Contact her at [email protected].

Hsinchun Chen is McClelland Professor of Management Information Systems at the University of Arizona and Andersen Consulting Professor of the Year (1999). He received the B.S. degree from the National Chiao-Tung University in Taiwan, the MBA degree from SUNY Buffalo, and the Ph.D. degree in Information Systems from the New York University. Dr. Chen is a Fellow of IEEE and AAAS. He received the IEEE Computer Society 2006 Technical Achievement Award. He is author/editor of 13 books, 17 book chapters, and more than 130 SCI journal articles covering intelligence analysis, biomedical informatics, data/text/web mining, digital library, knowledge management, and Web computing. Dr. Chen was ranked #8 in publication productivity in Information Systems (CAIS 2005) and #1 in Digital Library research (IP&M 2005) in two recent bibliometric studies. He serves on ten editorial boards including: ACM Transactions on Information Systems, IEEE Transactions on Systems, Man, and Cybernetics, Journal of the American Society for Information Science and Technology, and Decision Support Systems. Dr. Chen has served as a Scientific Counselor/Advisor of the National Library of Medicine (USA), Academia Sinica (Taiwan), and National Library of China (China). He has been an advisor for major NSF, DOJ, NLM, DOD, DHS, and other international research programs in digital library, digital government, medical informatics, and national security research. Dr. Chen is the founding director of Artificial Intelligence Lab and Hoffman E-Commerce Lab. He is conference co-chair of ACM/IEEE Joint Conference on Digital Libraries (JCDL) 2004 and has served as the conference/program co-chair for the past eight International Conferences of Asian Digital Libraries (ICADL), the premier digital library meeting in Asia that he helped develop. Dr. Chen is also (founding) conference co-chair of the IEEE International Conferences on Intelligence and Security Informatics (ISI) 2003–2007. Dr. Chen has also received numerous awards in information technology and knowledge management education and research including: AT&T Foundation Award, SAP Award, the Andersen Consulting Professor of the Year Award, the University of Arizona Technology Innovation Award, and the National Chiao-Tung University Distinguished Alumnus Award. Further information can be found at http://ai.arizona.edu/hchen/.
