DATA MINING - Federal Efforts Cover a Wide Range of Uses [Implementing Data Mining Systems]
8/9/2019 DATA MINING - Federal Efforts Cover a Wide Range of Uses [Implementing Data Mining Systems]
GAO United States General Accounting Office
Report to the Ranking Minority Member, Subcommittee on Financial Management, the Budget, and International Security, Committee on Governmental Affairs, U.S. Senate
May 2004
DATA MINING
Federal Efforts Cover a Wide Range of Uses
GAO-04-548
Federal agencies are using data mining for a variety of purposes, ranging from improving service or performance to analyzing and detecting terrorist patterns and activities. Our survey of 128 federal departments and agencies on their use of data mining shows that 52 agencies are using or are planning to use data mining. These departments and agencies reported 199 data mining efforts, of which 68 are planned and 131 are operational. The figure here shows the most common uses of data mining efforts as described by agencies. Of these uses, the Department of Defense reported the largest number of efforts aimed at improving service or performance, managing human resources, and analyzing intelligence and detecting terrorist activities. The Department of Education reported the largest number of efforts aimed at detecting fraud, waste, and abuse. The National Aeronautics and Space Administration reported the largest number of efforts aimed at analyzing scientific and research information. For detecting criminal activities or patterns, however, efforts are spread relatively evenly among the agencies that reported having such efforts.
In addition, out of all 199 data mining efforts identified, 122 used personal information. For these efforts, the primary purposes were improving service or performance; detecting fraud, waste, and abuse; analyzing scientific and research information; managing human resources; detecting criminal activities or patterns; and analyzing intelligence and detecting terrorist activities.
Agencies also identified efforts to mine data from the private sector and data from other federal agencies, both of which could include personal information. Of 54 efforts to mine data from the private sector (such as credit reports or credit card transactions), 36 involve personal information. Of 77 efforts to mine data from other federal agencies, 46 involve personal information (including student loan application data, bank account numbers, credit card information, and taxpayer identification numbers).
Top Six Purposes of Data Mining Efforts in Departments and Agencies
Both the government and the private sector are increasingly using data mining, that is, the application of database technology and techniques (such as statistical analysis and modeling) to uncover hidden patterns and subtle relationships in data and to infer rules that allow for the prediction of future results. As has been widely reported, many federal data mining efforts involve the use of personal information that is mined from databases maintained by public as well as private sector organizations.
GAO was asked to survey data mining systems and activities in federal agencies. Specifically, GAO was asked to identify planned and operational federal data mining efforts and describe their characteristics.
www.gao.gov/cgi-bin/getrpt?GAO-04-548
To view the full product, including the scope and methodology, click on the link above. For more information, contact Linda Koontz at (202) 512-6240 or [email protected].
Highlights of GAO-04-548, a report to the Ranking Minority Member, Subcommittee on Financial Management, the Budget, and International Security, Committee on Governmental Affairs, U.S. Senate
May 2004
DATA MINING
Federal Efforts Cover a Wide Range of Uses
Page i GAO-04-548 Data Minin
Contents
Letter
  Results in Brief
  Background
  Agencies Identified Numerous Data Mining Efforts with Various Aims
  Summary
Appendixes
  Appendix I: Objective, Scope, and Methodology
  Appendix II: Surveyed Departments and Agencies
  Appendix III: Departments and Agencies Reporting No Data Mining Efforts
  Appendix IV: Inventories of Efforts
Tables
  Table 1: Top Six Purposes of Data Mining Efforts in Departments and Agencies and Number of Efforts Reported
  Table 2: Department of Agriculture's Inventory of Data Mining Efforts
  Table 3: Department of Commerce's Inventory of Data Mining Efforts
  Table 4: Department of Defense's Inventory of Data Mining Efforts
  Table 5: Department of Education's Inventory of Data Mining Efforts
  Table 6: Department of Energy's Inventory of Data Mining Efforts
  Table 7: Department of Health and Human Services' Inventory of Data Mining Efforts
  Table 8: Department of Homeland Security's Inventory of Data Mining Efforts
  Table 9: Department of the Interior's Inventory of Data Mining Efforts
  Table 10: Department of Justice's Inventory of Data Mining Efforts
  Table 11: Department of Labor's Inventory of Data Mining Efforts
  Table 12: Department of State's Inventory of Data Mining Efforts
  Table 13: Department of Transportation's Inventory of Data Mining Efforts
  Table 14: Department of the Treasury's Inventory of Data Mining Efforts
  Table 15: Department of Veterans Affairs' Inventory of Data Mining Efforts
  Table 16: Environmental Protection Agency's Inventory of Data Mining Efforts
  Table 17: Export-Import Bank of the United States' Inventory of Data Mining Efforts
  Table 18: Federal Deposit Insurance Corporation's Inventory of Data Mining Efforts
  Table 19: Federal Reserve System's Inventory of Data Mining Efforts
  Table 20: National Aeronautics and Space Administration's Inventory of Data Mining Efforts
  Table 21: Nuclear Regulatory Commission's Inventory of Data Mining Efforts
  Table 22: Office of Personnel Management's Inventory of Data Mining Efforts
  Table 23: Pension Benefit Guaranty Corporation's Inventory of Data Mining Efforts
  Table 24: Railroad Retirement Board's Inventory of Data Mining Efforts
  Table 25: Small Business Administration's Inventory of Data Mining Efforts
Figures
  Figure 1: Top Six Purposes of Data Mining Efforts That Involve Personal Information
  Figure 2: Top Six Purposes of Data Mining Efforts That Involve Private Sector Data
  Figure 3: Top Six Purposes of Data Mining Efforts That Involve Data from Other Federal Agencies
Abbreviations
CARDS Counterintelligence Analytical Research Data System
CG Coast Guard
CI-AIMS Counterintelligence Automated Investigative Management System
DHHS Department of Health and Human Services
DOD Department of Defense
DOE Department of Energy
DOT Department of Transportation
EFTPS Electronic Federal Tax Payment System
EOS Earth Observing System
FARS Fatality Analysis Reporting System
FDA Food and Drug Administration
GENESIS Global Environmental and Earth Science Information System
GSFC Goddard Space Flight Center
HR Human Resources
HRSA Health Resources and Services Administration
MATRIX Multistate Anti-terrorism Information Exchange System
NASA National Aeronautics and Space Administration
NVO National Virtual Observatory
OIG Office of Inspector General
OLAP On-line Analytical Processing
RSST Real Estate Stress Test
SAA Spectral Analysis Automation
SAS Safety Automated System
SMARTS Statistical Management Analysis and Reporting Tool System
SWC Space Warfare Center
TIMS Technical Information Management System
TOP Treasury Offset Program
VA Veterans Affairs
VHA Veterans Health Administration
VISN Veterans Integrated Service Network
This is a work of the U.S. government and is not subject to copyright protection in the United States. It may be reproduced and distributed in its entirety without further permission from GAO. However, because this work may contain copyrighted images or other material, permission from the copyright holder may be necessary if you wish to reproduce this material separately.
To address our objective to identify and describe operational and planned data mining systems and activities in federal agencies, we surveyed chief information officers or comparable officials at 128 federal departments and agencies to determine whether the agencies had operational and planned data mining systems or activities.2 We then conducted telephone interviews with the reported system managers to obtain information on the characteristics of the identified data mining efforts. To verify the information we received, we sent follow-up letters to agencies that responded as well as to those that did not respond, asked responsible officials to verify the information, and performed random assessments of the means that these officials used to verify the information.
In addition, we conducted a search of technical literature and periodicals to develop a comprehensive list of federal government data mining efforts and then compared these efforts with data mining efforts reported by federal agencies. If the data mining efforts on our lists were not reported on the survey, we contacted the appropriate chief information officers and, with their concurrence, added the efforts.
We performed our work from May 2003 to April 2004 in accordance with generally accepted government auditing standards. Additional details on our scope and methodology are provided in appendix I.
Results in Brief
Federal agencies are using data mining for a variety of purposes, ranging from improving service or performance to analyzing and detecting terrorist patterns and activities. Our survey of 128 federal departments and agencies on their use of data mining shows that 52 agencies are using or are planning to use data mining. These departments and agencies reported 199 data mining efforts, of which 68 were planned and 131 were operational. The most common uses of data mining efforts were described by agencies as
improving service or performance;
detecting fraud, waste, and abuse;
analyzing scientific and research information;
2That is, we asked about both systems explicitly dedicated to data mining and activities using automated tools to mine databases that are part of other systems. In this report, we use the word "efforts" to refer to both systems and activities, unless otherwise specified.
managing human resources;
detecting criminal activities or patterns; and
analyzing intelligence and detecting terrorist activities.
The Department of Defense reported having the largest number of data mining efforts aimed at improving service or performance and at managing human resources. Defense was also the most frequent user of efforts aimed at analyzing intelligence and detecting terrorist activities, followed by the Departments of Homeland Security, Justice, and Education.
The Department of Education reported the largest number of efforts aimed at detecting fraud, waste, and abuse, while the National Aeronautics and Space Administration directs most of its data mining efforts (21 out of 23) toward analyzing scientific and research information. Data mining efforts for detecting criminal activities or patterns, however, were spread relatively evenly among the reporting agencies.
In addition, out of all 199 data mining efforts identified, 122 used personal information. For these efforts, the primary purposes were detecting fraud, waste, and abuse; detecting criminal activities or patterns; analyzing intelligence and detecting terrorist activities; and increasing tax compliance.
Agencies also identified efforts to mine data from the private sector and data from other federal agencies, both of which could include personal information. Of 54 efforts to mine data from the private sector (such as credit reports or credit card transactions), 36 involve personal information. Of 77 efforts to mine data from other federal agencies, 46 involve personal information (including student loan application data, bank account numbers, credit card information, and taxpayer identification numbers).
Background
Data mining enables corporations and government agencies to analyze massive volumes of data quickly and relatively inexpensively. The use of this type of information retrieval has been driven by the exponential growth in the volumes and availability of information collected by the public and private sectors, as well as by advances in computing and data storage capabilities. In response to these trends, generic data mining tools are increasingly available for, or built into, major commercial database applications. Today, mining can be performed on many types of data, including those in structured, textual, spatial, Web, or multimedia forms. Data mining is becoming a big business; Forrester Research has estimated that the data mining market is passing the billion-dollar mark.
Although the use and sophistication of data mining have increased in both the government and the private sector, data mining remains an ambiguous term. According to some experts, data mining overlaps a wide range of analytical activities, including data profiling, data warehousing, online analytical processing, and enterprise analytical applications.3 Some of the terms used to describe data mining or similar analytical activities include "factual data analysis" and "predictive analytics." We surveyed technical literature and developed a definition of data mining based on the most commonly used terms found in this literature. Based on this search, we define data mining as the application of database technology and techniques (such as statistical analysis and modeling) to uncover hidden patterns and subtle relationships in data and to infer rules that allow for the prediction of future results. We used this definition in our initial survey of chief information officers; these officials found the definition sufficient to identify agency data mining efforts.
Data mining has been used successfully for a number of years in the private and public sectors in a broad range of applications. In the private sector, these applications include customer relationship management, market research, retail and supply chain analysis, medical analysis and diagnostics, financial analysis, and fraud detection. In the government, data mining was initially used to detect financial fraud and abuse. For example, data mining has been an integral part of GAO audits and investigations of federal government purchase and credit card programs.4 Data mining and related technologies are also emerging as key tools in Department of Homeland Security initiatives.
3Lou Agosta, "Data Mining Is Dead, Long Live Predictive Analytics!" (Forrester Research, Oct. 30, 2003), http://www.forrester.com/Research/LegacyIT/0,7208,33030,00.html (downloaded Jan. 26, 2004).
4For more information on the uses of data mining in GAO audits, see U.S. General Accounting Office, Data Mining: Results and Challenges for Government Programs, Audits, and Investigations, GAO-03-591T (Washington, D.C.: Mar. 25, 2003).
Data Mining Poses Privacy Challenge
Since the terrorist attacks of September 11, 2001, data mining has been seen increasingly as a useful tool to help detect terrorist threats by improving the collection and analysis of public and private sector data. A recent report on information sharing and analysis to address the challenges of homeland security noted that agencies at all levels of government are now interested in collecting and mining large amounts of data from commercial sources.5 The report noted that agencies may use such data not only for investigations of known terrorists, but also to perform large-scale data analysis and pattern discovery in order to discern potential terrorist activity by unknown individuals. Such use of data mining by federal agencies has raised public and congressional concerns regarding privacy.
One example of a large-scale development effort launched in the wake of the September 11 attacks is the Multistate Anti-terrorism Information Exchange System, known as MATRIX. MATRIX, currently used in five states,6 provides the capability to store, analyze, and exchange sensitive terrorism-related and other criminal intelligence data among agencies within a state, among states, and between state and federal agencies. Information in MATRIX databases includes criminal history records, driver's license data, vehicle registration records, incarceration records, and digitized photographs. Public awareness of MATRIX and of similar large-scale data mining or data mining-like projects has led to concerns about the government's use of data mining to conduct a mass "dataveillance"7 (a surveillance of large groups of people) to sift through vast amounts of personally identifying data to find individuals who might fit a terrorist profile.
5Creating a Trusted Information Network for Homeland Security (New York City: The Markle Foundation, December 2003), http://www.markletaskforce.org/Report2_Full_Report.pdf (downloaded Mar. 8, 2004).
6Five states are currently participating in the MATRIX pilot project: Connecticut, Florida, Michigan, Ohio, and Pennsylvania.
7Roger Clarke, "Information Technology and Dataveillance," Communications of the ACM, vol. 31, issue 5 (New York City: ACM Press, May 1988), http://www.anu.edu.au/people/Roger.Clarke/DV/CACM88.html (downloaded Mar. 5, 2004). Clarke defines mass dataveillance as the systematic use of personal data systems in the investigation or monitoring of the actions or communications of groups of people.
Agencies Identified Numerous Data Mining Efforts with Various Aims
Of 128 federal departments and agencies surveyed for information on their planned and operational data mining efforts (listed in app. II), 52 agencies reported 199 data mining efforts, and 69 agencies reported that they were not engaged in data mining and were not planning such efforts (listed in app. III). Of the 199 data mining efforts, 68 were planned and 131 were operational. Seven agencies did not respond to our survey.10 Appendix IV lists the 199 data mining efforts reported, along with key characteristics.
Agencies described the most common purposes of data mining efforts as
improving service or performance;
detecting fraud, waste, and abuse;
analyzing scientific and research information;
managing human resources;
detecting criminal activities or patterns; and
analyzing intelligence and detecting terrorist activities.
As shown in table 1, the Department of Defense reported the largest number of efforts aimed at improving service or performance (with 19 out of 65 reported efforts) and at managing human resources (with 14 out of 17 efforts). Defense was also the most frequent user of efforts aimed at analyzing intelligence and detecting terrorist activities, with 5 of 14 efforts, followed by the Departments of Homeland Security and Justice, with 4 and 3 efforts, respectively. The Department of Education has the largest number of efforts aimed at detecting fraud, waste, and abuse (9 out of 24 efforts reported). The National Aeronautics and Space Administration accounts for 21 of the 23 identified efforts for analyzing scientific and research information. Efforts are spread relatively evenly among the agencies that reported using data mining efforts for detecting criminal activities or patterns. Table 1 summarizes the top six uses of data mining efforts among the responding agencies.
10Agencies that did not respond to our survey are (1) the Central Intelligence Agency; (2) the Corporation for National and Community Services; (3) the Department of Army, Department of Defense; (4) the Equal Employment Opportunity Commission; (5) the National Park Service, Department of the Interior; (6) the National Security Agency, Department of Defense; and (7) the Rural Utilities Service, Department of Agriculture.
Table 1: Top Six Purposes of Data Mining Efforts in Departments and Agencies and Number of Efforts Reported
Source: GAO analysis of agency-provided data.
Department or agency | Improving service or performance | Detecting fraud, waste, and abuse | Analyzing scientific and research information | Managing human resources | Detecting criminal activities or patterns | Analyzing intelligence and detecting terrorist activities
Department of Agriculture 8 1
Department of Commerce
Department of Defense 19 1 1 14 1
Department of Education 6 9 3
Department of Energy 3
Department of Health and Human Services 4 1
Department of Homeland Security 5 2 2
Department of the Interior 1
Department of Justice 1 1 3
Department of Labor 3 1
Department of State 2
Department of Transportation 1
Department of the Treasury 4 1 2
Department of Veterans Affairs 5 5 1
Environmental Protection Agency 1
Export-Import Bank of the United States 1
Federal Deposit Insurance Corporation 1
Federal Reserve System 1
National Aeronautics and Space Administration 1 1 21
Nuclear Regulatory Commission 1
Office of Personnel Management 1
Pension Benefit Guaranty Corporation 2
Railroad Retirement Board 1
Small Business Administration 1
Total 65 24 23 17 15 14
Some data mining purposes focus on human activities and therefore are inherently likely to involve personal information; examples of these purposes are detecting fraud, waste, and abuse; detecting criminal activities or patterns; managing human resources; and analyzing intelligence. The following are examples of data mining efforts for each of these purposes:
Detecting fraud, waste, and abuse. The Veterans Benefits Administration's C & P Payment Data Analysis effort mines veterans' compensation and pension data for evidence of fraud.
Detecting criminal activities or patterns. The Department of Education's Title IV Identity Theft Initiative effort focuses on identity theft cases involving education loans.
Managing human resources. The U.S. Air Force's Oracle HR (Human Resources) uses data mining to provide information on promotions, pay grades, clearances, and other information relevant to human resources planning.
Analyzing intelligence and detecting terrorist activities. The Defense Intelligence Agency's Verity K2 Enterprise mines data from the intelligence community and Internet sources to identify foreign terrorists or U.S. citizens connected to foreign terrorism activities.
By contrast, other categories of efforts, such as many of those aimed at analyzing scientific and research information, do not necessarily focus on human activities or involve personal information. The National Aeronautics and Space Administration, for example, mines large, complex earth science data sets to find patterns and relationships to detect hidden events (the system is called Machine Learning and Data Mining for Improved Data Understanding of High Dimensional Earth Sensed Data).
Similarly, many efforts aimed at improving service or performance (the most frequently cited purpose of data mining efforts) do not involve personal information. For example, the Department of the Navy's Supply Management System Multidimensional Cubes system includes a data warehouse containing data on every ship part that has been ordered since the 1980s, with multidimensional information on each part. The Navy uses data mining to calculate failure rates and identify needed improvements; according to the Navy, this system reduces downtime on ships by improving parts replacement.
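The report does not say how the Navy calculates failure rates. The Python sketch below shows one simple way such a calculation could work on order records, assuming a hypothetical record layout in which each order notes the part ordered and whether it replaced a failed unit. `PartOrder`, `failure_rates`, `parts_needing_review`, the sample records, and the 0.5 review threshold are all illustrative assumptions, not features of the Navy system.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PartOrder:
    part_id: str            # hypothetical part identifier
    replaced_failure: bool  # hypothetical flag: order replaced a failed unit


def failure_rates(orders):
    """Return, for each part, the fraction of its orders that replaced a failed unit."""
    totals = Counter(o.part_id for o in orders)
    failures = Counter(o.part_id for o in orders if o.replaced_failure)
    # Counter returns 0 for parts that never failed, so every part gets a rate.
    return {part: failures[part] / totals[part] for part in totals}


def parts_needing_review(rates, threshold=0.5):
    """Flag parts whose failure share exceeds the (hypothetical) threshold, worst first."""
    flagged = [part for part, rate in rates.items() if rate > threshold]
    return sorted(flagged, key=lambda part: rates[part], reverse=True)


orders = [
    PartOrder("pump-seal", True),
    PartOrder("pump-seal", True),
    PartOrder("pump-seal", False),
    PartOrder("valve", False),
    PartOrder("valve", True),
    PartOrder("gasket", False),
]
rates = failure_rates(orders)
print(rates)
print(parts_needing_review(rates))  # ['pump-seal']
```

Here a "failure rate" is simply the share of a part's orders that replaced failures; a fuller analysis might instead normalize by operating hours or installed units, which this sketch does not model.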
However, some efforts aimed at improving service or performance do involve personal information. For example, the Veterans Administration's VISN (Veterans Integrated Service Network) 16 Data Warehouse is mined for a variety of information, including patient visits, laboratory tests, and pharmacy records, to provide management with health care system performance information.
Overall, 122 of the 199 data mining efforts involve personal information. Figure 1 shows the top six purposes of these efforts, as well as their distribution.
Figure 1: Top Six Purposes of Data Mining Efforts That Involve Personal Information
[Bar chart: number of data mining efforts (horizontal axis, 0 to 40) for six purposes: managing human resources; analyzing intelligence and detecting terrorist activities; increasing tax compliance; detecting criminal activities or patterns; improving service or performance; and detecting fraud, waste, and abuse. Source: GAO analysis of agency data.]
Of the 199 data mining efforts, 54 use or plan to use data from the private sector. Of these, 36 involve personal information. The personal information from the private sector included credit reports and credit card transaction records. Figure 2 shows the distribution of the top six purposes of the 54 efforts involving data from the private sector.
Figure 2: Top Six Purposes of Data Mining Efforts That Involve Private Sector Data
[Bar chart: number of data mining efforts (horizontal axis, 0 to 40) for six purposes: improving safety; detecting criminal activities or patterns; analyzing intelligence and detecting terrorist activities; analyzing scientific and research information; detecting fraud, waste, and abuse; and improving service or performance. Source: GAO analysis of agency data.]
Of the 199 data mining efforts, 77 use or plan to use data from other federal agencies. Of these 77 efforts, 46 involve personal information. The personal information from other federal agencies included student loan application data, bank account numbers, credit card information, and taxpayer identification numbers. Figure 3 shows the top six uses for the 77 efforts involving data from other federal agencies and their distribution.
Figure 3: Top Six Purposes of Data Mining Efforts That Involve Data from Other Federal Agencies
[Bar chart: number of data mining efforts (horizontal axis, 0 to 40) for six purposes: detecting fraud, waste, and abuse; analyzing scientific and research information; analyzing intelligence and detecting terrorist activities; detecting criminal activities or patterns; managing human resources; and improving service or performance. Source: GAO analysis of agency data.]
Summary
Driven by advances in computing and data storage capabilities and by growth in the volumes and availability of information collected by the public and private sectors, data mining enables government agencies to analyze massive volumes of data. Our survey shows that data mining is increasingly being used by government for a variety of purposes, ranging from improving service or performance to analyzing and detecting terrorist patterns and activities.
Although this survey provides a broad overview of the emerging uses of data mining in the federal government, more work is needed to shed light on the privacy implications of these efforts. In future work, we plan to examine selected federal data mining efforts and their implications.
As agreed with your office, unless you publicly announce the contents of the report earlier, we plan no further distribution until 30 days from the report date. At that time, we will send copies of this report to the Chairmen and Ranking Minority Members of the House Committee on Government Reform; Subcommittee on Civil Service and Agency Organization, House Committee on Government Reform; Select Committee on Homeland Security, House of Representatives; Senate Committee on Governmental Affairs; and the Subcommittee on Oversight of Government Management, the Federal Workforce and the District of Columbia, Senate Committee on Governmental Affairs. We will also make copies available to others on request. In addition, this report will be available at no charge on the GAO Web site at http://www.gao.gov.
If you have any questions concerning this report, please call me at (202) 512-6240 or Mirko J. Dolak, Assistant Director, at (202) 512-6362. We can also be reached by e-mail at [email protected] and [email protected], respectively. Key contributors to this report were Camille M. Chaires, Barbara S. Collier, Orlando O. Copeland, Nancy E. Glover, Stuart M. Kaufman, Lori D. Martinez, Morgan F. Walts, and Marcia C. Washington.
Sincerely yours,
Linda D. Koontz
Director, Information Management Issues
Appendix I
Objective, Scope, and Methodology
Our objective was to identify and describe planned and operational federal data mining efforts. As a first step in addressing this objective, we developed a definition of data mining. Because this expression has a range of meanings, we surveyed the technical literature to develop a definition based on the most commonly used terms found in this literature. We defined data mining as the application of database technology and techniques (such as statistical analysis and modeling) to uncover hidden patterns and subtle relationships in data and to infer rules that allow for the prediction of future results. In our initial survey of chief information officers, these officials found the definition sufficient to identify agency data mining efforts.
We then surveyed chief information officers or comparable officials at 128 federal departments and agencies (see app. II) and asked them to identify whether their agency had operational and planned data mining efforts. We achieved a 95 percent response rate. Of the 121 agencies that responded, 69 reported that they did not have any data mining efforts (see app. III). We followed up with these 69 agencies and gave them another opportunity to report data mining efforts.
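The 95 percent response rate follows from counts given elsewhere in this report: 52 agencies reported data mining efforts, 69 reported none, and 7 did not respond. The Python sketch below is only an arithmetic check on those published counts; the constant names are illustrative and do not come from GAO's survey instrument.

```python
# Counts as stated in this report (letter and appendix I).
SURVEYED = 128
AGENCIES_WITH_EFFORTS = 52     # reported planned or operational data mining
AGENCIES_WITHOUT_EFFORTS = 69  # reported no data mining efforts
NONRESPONDENTS = 7
PLANNED_EFFORTS, OPERATIONAL_EFFORTS = 68, 131
TOTAL_EFFORTS = 199

responded = AGENCIES_WITH_EFFORTS + AGENCIES_WITHOUT_EFFORTS
response_rate = responded / SURVEYED

# Internal consistency checks on the reported counts.
assert responded + NONRESPONDENTS == SURVEYED
assert PLANNED_EFFORTS + OPERATIONAL_EFFORTS == TOTAL_EFFORTS

print(f"{responded} of {SURVEYED} agencies responded ({response_rate:.1%})")
print(f"Rounded to a whole percent: {round(response_rate * 100)}%")
```

Running it prints "121 of 128 agencies responded (94.5%)" followed by "Rounded to a whole percent: 95%", which matches the rate reported above.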
To obtain information on the characteristics of the identified operational or planned data mining efforts, we conducted structured telephone interviews¹ with the identified system owners or activity managers. The interviews were designed to obtain detailed information about each data mining system, including the purpose and size, the use of personal information, and the use of data from the private sector or other federal organizations. We pretested the structured interview to ensure relevance and clarity.
We aggregated these data by agency and sent them back to the chief information officer, comparable official, or their designee and asked that they review the characteristics for completeness and accuracy. One of the 52 departments and agencies that reported data mining systems, the Department of Homeland Security, has not responded to our request to review the reported data for completeness and accuracy.
¹ In a structured interview, the interviewer asks the same questions of numerous individuals or individuals representing numerous organizations in a precise manner, offering each interviewee the same set of possible responses.
We performed random assessments of the means that these officials used to verify the information. Based on these assessments, we concluded that the agencies' verification methods were reasonable and that, as a result, we could rely on the accuracy of the reported data. We also conducted a search of technical literature and periodicals to develop a list of federal government data mining efforts and then compared the efforts on this list with the data mining efforts reported by federal agencies. If an effort on our list was not reported in the survey, we contacted the chief information officer or comparable official to determine whether that data mining effort should be included in our survey.
Because this was not a sample survey, there are no sampling errors. However, the practical difficulties of conducting any survey may introduce errors, commonly referred to as nonsampling errors. For example, difficulties in how a particular question is interpreted, in the sources of information that are available to respondents, or in how the data are entered into a database or analyzed can introduce unwanted variability into the survey results. We took steps in the development of the structured interview, the data collection, and the data analysis to minimize these nonsampling errors. Among these steps, we pretested the structured interview instrument, contacted nonresponding agencies as well as agencies not identifying data mining efforts, and sent the aggregated data to the agency chief information officer for review.
We conducted our work from May 2003 to April 2004 in accordance with generally accepted government auditing standards.
Appendix II
Surveyed Departments and Agencies
Department of Agriculture
Agricultural Marketing Service
Agricultural Research Service
Animal and Plant Health Inspection Service
Cooperative State Research, Education, and Extension Service
Farm Service Agency
Food and Nutrition Service
Food Safety and Inspection Service
Foreign Agricultural Service
Forest Service
National Agricultural Statistics Service
Natural Resources Conservation Service
Risk Management Agency
Rural Utilities Service
Department of Commerce
Bureau of the Census
Economic Development Administration
International Trade Administration
National Oceanic and Atmospheric Administration
U.S. Patent and Trademark Office
Department of Defense
Missile Defense Agency
Defense Advanced Research Projects Agency
Defense Commissary Agency
Defense Contract Audit Agency
Defense Contract Management Agency
Defense Information Systems Agency
Defense Intelligence Agency
Defense Legal Services Agency
Defense Logistics Agency
Defense Security Cooperation Agency
Defense Security Service
Defense Threat Reduction Agency
Department of the Air Force
Department of the Army
Department of the Navy
National Geospatial-Intelligence Agency
National Security Agency
U.S. Marine Corps
Department of Education
Department of Energy
Bonneville Power Administration
Southeastern Power Administration
Southwestern Power Administration
Western Area Power Administration
Department of Health and Human Services
Administration for Children and Families
Agency for Healthcare Research and Quality
Centers for Disease Control and Prevention
Centers for Medicare and Medicaid Services
Food and Drug Administration
Health Resources and Services Administration
Indian Health Service
National Institutes of Health
Program Support Center
Department of Homeland Security
Border and Transportation Security Directorate
Bureau of Citizenship and Immigration Services
Emergency Preparedness and Response Directorate
Information Analysis and Infrastructure Protection Directorate
Management Directorate
Science and Technology Directorate
U.S. Coast Guard
U.S. Secret Service
Department of Housing and Urban Development
Department of the Interior
Bureau of Indian Affairs
Bureau of Land Management
Bureau of Reclamation
Minerals Management Service
National Park Service
Office of Surface Mining Reclamation and Enforcement
U.S. Fish and Wildlife Service
U.S. Geological Survey
Department of Justice
Bureau of Alcohol, Tobacco, Firearms, and Explosives
Drug Enforcement Administration
Federal Bureau of Investigation
Federal Bureau of Prisons
U.S. Marshals Service
Department of Labor
Department of State
Department of Transportation
Federal Aviation Administration
Federal Highway Administration
Federal Motor Carrier Safety Administration
Federal Railroad Administration
Federal Transit Administration
National Highway Traffic Safety Administration
Department of the Treasury
Bureau of Engraving and Printing
Bureau of the Public Debt
Financial Management Service
Internal Revenue Service
Office of the Comptroller of the Currency
Office of Thrift Supervision
U.S. Mint
Department of Veterans Affairs
Veterans Benefits Administration
Veterans Health Administration
Agency for International Development
Central Intelligence Agency
Corporation for National and Community Service
Environmental Protection Agency
Equal Employment Opportunity Commission
Executive Office of the President
Export-Import Bank of the United States
Federal Deposit Insurance Corporation
Federal Energy Regulatory Commission
Federal Reserve System
Federal Retirement Thrift Investment Board
General Services Administration
Legal Services Corporation
National Aeronautics and Space Administration
National Credit Union Administration
National Labor Relations Board
National Science Foundation
Nuclear Regulatory Commission
Office of Management and Budget
Office of Personnel Management
Peace Corps
Pension Benefit Guaranty Corporation
Railroad Retirement Board
Securities and Exchange Commission
Small Business Administration
Smithsonian Institution
Social Security Administration
U.S. Postal Service
Appendix III
Departments and Agencies Reporting No Data Mining Efforts
The following 69 departments and agencies reported that they have no operational or planned data mining efforts:
Department of Agriculture
Agricultural Marketing Service
Agricultural Research Service
Animal and Plant Health Inspection Service
Cooperative State Research, Education, and Extension Service
Farm Service Agency
Foreign Agricultural Service
Forest Service
National Agricultural Statistics Service
Food Safety and Inspection Service
Department of Commerce
Economic Development Administration
Bureau of the Census
International Trade Administration
Department of Commerce Headquarters
National Oceanic and Atmospheric Administration
Department of Defense
Defense Contract Audit Agency
Missile Defense Agency
Defense Legal Services Agency
Defense Security Service
Defense Threat Reduction Agency
Defense Logistics Agency
Defense Advanced Research Projects Agency
Defense Contract Management Agency
Defense Security Cooperation Agency
Department of Energy
Bonneville Power Administration
Southeastern Power Administration
Southwestern Power Administration
Western Area Power Administration
Department of Health and Human Services
Centers for Medicare and Medicaid Services
Administration for Children and Families
National Institutes of Health
Indian Health Service
Department of Homeland Security
Science and Technology Directorate
Management Directorate
Bureau of Citizenship and Immigration Services
Department of Homeland Security Headquarters
Department of Housing and Urban Development
Department of the Interior
Bureau of Reclamation
Bureau of Land Management
U.S. Geological Survey
Fish and Wildlife Service
Office of Surface Mining Reclamation and Enforcement
Bureau of Indian Affairs
Department of the Interior Headquarters
Department of Justice
Bureau of Alcohol, Tobacco, Firearms, and Explosives
Department of Transportation
Federal Aviation Administration
Federal Transit Administration
Federal Railroad Administration
Federal Motor Carrier Safety Administration
Federal Highway Administration
Department of the Treasury
Comptroller of the Currency
Bureau of the Public Debt
Office of Thrift Supervision
Department of the Treasury Headquarters
Bureau of Engraving and Printing
Agency for International Development
Executive Office of the President
Federal Energy Regulatory Commission
Federal Retirement Thrift Investment Board
General Services Administration
Legal Services Corporation
National Credit Union Administration
National Labor Relations Board
National Science Foundation
Office of Management and Budget
Peace Corps
Securities and Exchange Commission
Smithsonian Institution
Social Security Administration
U.S. Postal Service
Appendix IV
Inventories of Efforts
The following tables present selected information from our survey of 128 major federal departments and agencies on their use of data mining. The tables list the purpose of each data mining effort, whether the system is planned or operational, and whether the system uses personal information, data from the private sector, or data from other federal agencies. The survey shows that 52 departments and agencies are using or are planning to use data mining. These departments and agencies reported 199 data mining efforts, of which 68 were planned and 131 were operational.
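The summary counts above are tallies over the Status and Personal information columns of the inventory tables. The following is a minimal Python sketch of that tallying, assuming each entry is reduced to its status and personal-information flag (the Row type and tally function are hypothetical, and some system names are shortened). It uses the 11 Department of Agriculture entries in Table 2 as sample data; summing the same tallies across all tables should reproduce the governmentwide 68 planned and 131 operational efforts.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple

class Row(NamedTuple):
    """One inventory entry reduced to the fields being tallied (hypothetical type)."""
    system: str
    status: str           # "Planned" or "Operational"
    personal_info: bool   # value of the "Personal information" column

# Department of Agriculture entries from Table 2 (names shortened).
TABLE_2 = [
    Row("Travel Data Mart", "Planned", True),
    Row("Financial Statements Data Warehouse", "Operational", False),
    Row("Financial Data Warehouse", "Operational", True),
    Row("Grantee Monitoring, Southeast Regional Office", "Operational", True),
    Row("Grantee Monitoring, Mountain Plains Regional Office", "Operational", True),
    Row("Grantee Monitoring, Southwest Regional Office", "Operational", True),
    Row("Grantee Monitoring, Midwest Regional Office", "Planned", False),
    Row("Grantee Monitoring, Northeast Regional Office", "Operational", True),
    Row("Integrated Program Accounting System Data Integrity", "Planned", False),
    Row("National Resource Inventory", "Operational", False),
    Row("CAE", "Operational", True),
]

def tally(rows: Iterable[Row]) -> Tuple[Counter, int]:
    """Count efforts by status, and count efforts that use personal information."""
    rows = list(rows)
    by_status = Counter(r.status for r in rows)
    personal = sum(r.personal_info for r in rows)  # True counts as 1
    return by_status, personal

by_status, personal = tally(TABLE_2)
print(dict(by_status), "using personal information:", personal)

# Governmentwide totals stated in the text: planned + operational efforts.
print("Total efforts reported:", 68 + 131)
```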
Table 2: Department of Agriculture's Inventory of Data Mining Efforts
Organization/system name | Description | Purpose | Status | Personal information | Private sector data | Other agency data
Department of Agriculture Headquarters
Travel Data Mart | Will consolidate employee travel information from financial and travel systems. Will allow for a governmentwide e-travel system and provide the department with information on the financial ramifications of its travel. | Improving service or performance | Planned | Yes | No | No
Financial Statements Data Warehouse | Is used in the production of consolidated financial statements. Provides information for products that are used to satisfy external reporting requirements, such as Office of Management and Budget and Department of the Treasury requirements. | Financial management | Operational | No | No | No
Financial Data Warehouse | Is the department's internal financial management reporting system. Data mining is done for ad hoc and on-demand reports. | Financial management | Operational | Yes | No | No
Food and Nutrition Service
Grantee Monitoring Activities, Southeast Regional Office | Assists in monitoring the financial status of grant holders. Grantees are required to provide expenditure reports, and analysis is performed quarterly that matches stated draws to the actual draws from the U.S. Treasury. | Improving service or performance | Operational | Yes | No | No
Grantee Monitoring Activities, Mountain Plains Regional Office | Assists in monitoring the management and distribution of Indian funds for major food benefit programs, such as food stamps, in 10 grantee states. | Improving service or performance | Operational | Yes | No | No
Grantee Monitoring Activities, Southwest Regional Office | Maximizes on-site monitoring efforts by confirming the accuracy of grantee accounting. Reduces on-site time, maximizes time to complete reviews, and has achieved a 50 percent travel savings. | Improving service or performance | Operational | Yes | No | No
Grantee Monitoring Activities, Midwest Regional Office | Will be a reporting system to provide reports and automate the audit process. Plans are to acquire data mining tools to review and compare budgets, reports, and plans. | Improving service or performance | Planned | No | No | Yes
Grantee Monitoring Activities, Northeast Regional Office | Supports on-site reviews of analyses to confirm financial report information. | Improving service or performance | Operational | Yes | Yes | No
Integrated Program Accounting System Data Integrity | Will create ad hoc reporting centers to validate accounting information. | Improving service or performance | Planned | No | No | No
Natural Resources Conservation Service
National Resource Inventory Used for Statistical Analysis of Past Soil Survey Databases | Is a trending database that tracks more than 200 resource issues, such as monitoring erosion. Also processes statistical technology. | Improving service or performance | Operational | No | No | No
Risk Management Agency
CAE | Is part of a congressionally mandated project to assist the Risk Management Agency in controlling fraud, waste, and abuse in the Federal Crop Insurance Corporation program. | Detecting fraud, waste, and abuse | Operational | Yes | Yes | Yes
Source: Department of Agriculture.
Table 3: Department of Commerce's Inventory of Data Mining Efforts
Organization/system name | Description | Purpose | Status | Personal information | Private sector data | Other agency data
U.S. Patent and Trademark Office
Compensation Projection Model in the Enterprise Data Warehouse | Generates and makes available compensation projection data, both salary and benefits, on current employees and on planned hires. It also accounts for planned attritions. | Managing human resources | Operational | Yes | No | Yes
Source: Department of Commerce.

Table 4: Department of Defense's Inventory of Data Mining Efforts
Organization/system name | Description | Purpose | Status | Personal information | Private sector data | Other agency data
Defense Commissary Agency
DeCA Electronic Records Management and Archive System | Will be a corporate information system for managing unstructured data. It will allow for electronic record keeping, document management, and automated receipt processes. | Improving service or performance | Planned | Yes | Yes | Yes
Corporate Decision Support System/Commissary Operations Management System | Mines data to produce analytical data on commissary operations. Provides information such as what items stores are selling and helps determine whether cashiers are being honest. | Improving service or performance | Operational | No | No | No
Defense Information Systems Agency
Enterprise Business Intelligence System | Will replace the current management information environment, which includes operations, reporting, billing, statistics, and other management information activities. | Improving service or performance | Planned | No | No | No
Defense Intelligence Agency
Insight Smart Discovery | Will be a data mining knowledge discovery tool to work against unstructured text. Will categorize nouns (names, locations, events) and present information in images. | Analyzing intelligence and detecting terrorist activities | Planned | Yes | No | Yes
Verity K2 Enterprise | Mines data from the intelligence community and Internet searches to identify foreign terrorists or U.S. citizens connected to foreign terrorism activities. | Analyzing intelligence and detecting terrorist activities | Operational | Yes | Yes | Yes
PATHFINDER | Is a data mining tool developed for analysts that provides the ability to analyze government and private sector databases rapidly. It can compare and search multiple large databases quickly. | Analyzing intelligence and detecting terrorist activities | Operational | Yes | No | Yes
Autonomy | Is a large search engine tool that is used to search hundreds of thousands of word documents. Is used for the organization and knowledge discovery of intelligence. | Analyzing intelligence and detecting terrorist activities | Operational | No | No | Yes
Department of the Air Force
ANG Data Warehouse Guardian | Will be used to measure military readiness. It incorporates information on all disciplines to provide management information needed to assess military readiness. | Measuring military readiness | Planned | Yes | No | No
Integrated Space Warfare Center (SWC) Information System | Will be an internal database containing information on all development/execution activities within the SWC. Will be used by all management and analyst personnel to track and align the center's activities to warfighter needs and to report on execution status, financial status, schedule status, and performance measurements. | Improving service or performance | Planned | Yes | No | No
Safety Automated System (SAS) | Will query databases to find automation mishaps. Governed by Directive 920124 and will allow for the investigation and reporting of identified automation mishaps. | Improving safety | Planned | Yes | No | No
Enterprise Business System | Will support strategic planning, assist in building scientific and technical budgets for the Air Force, and serve as a launch point for all new programs. Research and development case files will be maintained for 75 years; the activity indexes, catalogs, and tracks these files. | Improving service or performance | Planned | No | No | Yes
Genomic and Proteomic Results Analysis | Analyzes National Institutes of Health's genetic data. | Analyzing scientific and research information | Operational | No | No | Yes
IG Corporate Information System | Enhances combat readiness and mission capabilities for Air Combat Command units and commanders. It assists in preparing for and conducting inspections. | Improving service or performance | Operational | Yes | No | No
Computer Network Defense System | Evaluates network activities to create rules for intrusion detection system signature sets. | Improving information security | Operational | No | No | No
FAME | Will serve as a central repository for Air Force manpower information. Will track manpower and unit authorization funding. | Managing human resources | Planned | No | No | Yes
Resource Wizard | Serves as a manpower tracking system. Tracks positions and captures data for specific funding purposes. | Improving service or performance | Operational | No | No | No
Government Purchase Card | Is used in overseeing purchases made by Air Force personnel with government-provided credit cards. | Detecting fraud, waste, and abuse | Operational | Yes | Yes | No
Ambulatory Data System Queries | Tracks the initial diagnosis of patients with the results of further testing and diagnosis. Allows for early notification of diseases and injuries. | Monitoring public health | Operational | Yes | No | No
Modus Operandi Database | Is an investigative tool used to identify and track trends in criminal behavior. It links characteristics of crimes and provides details on crime scenes and other crime factors. | Detecting criminal activities or patterns | Operational | Yes | No | No
Executive Decision Support System | Takes data from all functional metric balances. Processes charts and graphs to identify trends and to make sure goals are accomplished. | Improving service or performance | Operational | No | No | No
Inspire | Is a tool that assists in providing a narrative description of all research and development that is being conducted within the Air Force. Provides cost and milestone information on research and development projects. | Performing strategic planning | Operational | Yes | No | Yes
Discoverer | Is used to manage personnel records, including individual aliases and histories. | Managing human resources | Operational | Yes | No | No
Requirements and Concepts System | Will serve as a repository for new system projects and system requirements. It will be available for consultation for information on all project requests and identified requirements. | Improving service or performance | Planned | No | No | No
Business Objects | Is a commercial off-the-shelf tool that is used to analyze and report on human resources activities. | Managing human resources | Operational | Yes | No | Yes
THRMIS | Uses commercial off-the-shelf software to maintain a data warehouse of integrated inventory and manpower data for the Total Force: active duty (officer and enlisted), Air Force Reserve, Air National Guard, and civilians. Is used to assess and analyze the health of the Air Force. | Managing human resources | Operational | Yes | No | No
SAS | Is a Web-enabled personnel data system that gives authorized users worldwide the ability to tabulate demographic data on recruitment, promotion, and retention. | Managing human resources | Operational | Yes | No | No
Oracle HR | Is a personnel management system that manages information for promotions, pay grades, clearances, and other information relevant to human resources. | Managing human resources | Operational | Yes | No | No
Health Modeling and Informatics Division Data Mart | Provides information and decision support to the Air Force headquarters surgeon general for decision making, policy development, and resource allocation. It also provides performance information and analysis to medical field units in support of performance measurement objectives. | Improving service or performance | Operational | Yes | No | No
FIRST EDV (BRIO) | Will deal with Air Force budgets and other components of its financial environment. Historical analyses and trend analyses will be performed on the budget process. | Improving service or performance | Planned | No | Yes | No
IG World | Is used to store and track data and requirements, such as lodging and augmentee requirements, for the PAC inspector general. | Improving service or performance | Operational | Yes | No | No
Department of Defense Headquarters
Automated Continuing Evaluation System | Will be used to improve personnel security continuing evaluation efforts within the Department of Defense (DOD) by identifying issues of security concern between normal reinvestigations for those who hold DOD security clearances and have signed a consent form that is still in effect. | Managing human resources | Operational | Yes | Yes | Yes
Department of the Navy
Human Resource Trend Analysis | Is used to improve Navy readiness. Data on personnel manning levels are mined to ensure that each Navy unit has the correct number of training personnel aboard. | Managing human resources | Operational | No | No | No
U.S. Naval Academy | Allows for the assessment of academic performance of midshipmen. It includes demographic information, information on grades, participation in sports, leadership positions, etc. It is an extension of the registrar's system and is mined for comparisons and trends. | Managing human resources | Operational | Yes | No | No
Navy Training Master Planning System | Provides overall Navy training information to assist in delivering Navy training in the most efficient manner. Pertinent data from multiple databases are consolidated into a single database that is mined. | Managing human resources | Operational | Yes | Yes | No
DHAMS Multidimensional Cubes | Is a database that contains information on the time and attendance of 3,000 mariners across 120 ships. Allows managers to look at what people were doing at a particular time and to look across the fleet as a whole and compare ship activities. | Improving service or performance | Operational | No | No | No
National Cargo Tracking Plan Cargo Tracking Division | Is used to conduct predictive analysis for counterterrorism, small weapons of mass destruction proliferation, narcotics, alien smuggling, and other high-interest activities involving container shipping activity. | Analyzing intelligence and detecting terrorist activities | Operational | No | Yes | No
Supply Management System Multidimensional Cubes | Reduces downtime on ships by allowing for the analysis of ship parts information. The data warehouse contains data on every part that has been ordered since the 1980s, and has multidimensional information on each part. Failure rates can be calculated and improvements can be identified. | Improving service or performance | Operational | No | No | No
Type Commanders Readiness Management System | Is designed to provide a fully integrated environment for online analytical processing of readiness indicators. Examples of readiness indicators include status of supplies available, equipment in operation, health status, and capabilities of the crew. | Measuring military readiness | Operational | No | No | Yes
FATHOM (APMC Human Resources) | Will be an internal program and project tool used to improve staffing, recruiting, and managing day-to-day operations. | Managing human resources | Planned | Yes | No | No
Navy Training Quota Management System | Is used for planning and forecasting training needs based on skill requirements. | Improving service or performance | Operational | No | No | Yes
National Geospatial-Intelligence Agency
OLAP (On-Line Analytical Processing) | Will provide aggregations of imagery system performance data for management officers and senior source decision makers to characterize system performance and contribution to intelligence issues of national priority. | Improving service or performance | Planned | No | No | No
CITO Data Mining | Will evaluate and identify imagery system performance trends for optimization, monitoring, or reengineering. | Improving service or performance | Planned | No | No | No
Information Relevance Prototype | Will establish an information relevancy prototype to serve as a framework for community evaluation of commercial information relevance approaches, methods, and technology. The term "information relevance" refers to the ability of users to receive or extract, then display and describe, information with measurable satisfaction according to their need. | Improving service or performance | Planned | No | No | No
U.S. Marine Corps
Operational Data Store Enterprise | Is used for workforce planning. | Managing human resources | Operational | Yes | No | No
Global Combat Support Systems, Marine Corps | Will be a physical implementation of the IT enterprise architecture designed to support both improved and enhanced marine air/ground task force combat service support functions and commander and combatant commander joint task force combatant support information requirements. Data mining will allow for interoperability with legacy Marine Corps systems and allow for a shared data environment. | Improving service or performance | Planned | No | Yes | No
Total Force Data Warehouse | Is a system whose primary purpose is workforce planning and workforce policy decision making. It contains current (after 30 days) and historical workforce data. | Managing human resources | Operational | Yes | No | No
Marine Corps Recruiting Information Support System | Is a Web-based information system used for managing assets and tracking enlisted and officer accessions into the Marine Corps. | Managing human resources | Operational | Yes | No | No
Source: Department of Defense.
Table 5: Department of Education's Inventory of Data Mining Efforts
Organization/system name | Description | Purpose | Status | Personal information | Private sector data | Other agency data
Citizenship of PLUS Loan Borrowers, National Student Loan Data Systems | Looks for issues regarding citizenship among its PLUS loan borrowers. Flags records based on selected criteria and requests additional information from schools. | Improving service or performance | Operational | Yes | Yes | Yes
Foreign Schools Initiatives, National Student Loan Data System/Central Processing | Is a proactive investigation effort that looks at whether financial aid was granted to individuals attending foreign institutions during periods of nonenrollment. | Detecting criminal activities or patterns | Operational | Yes | No | Yes
Professional Judgment Practices: Title IV Pell Grants, National Student Loan Data | Is used to determine when professional judgment has been exercised for special situations where families cannot afford college expenses. | Improving service or performance | Operational | Yes | Yes | Yes
Title IV Applicant Death Database Match | Compares Department of Education data with the Social Security Administration's death database to detect fraud or criminal activity. | Detecting fraud, waste, and abuse | Operational | Yes | No | Yes
Title IV Loans with No Applications | Will compare information from the Free Application for Federal Student Aid Program with the Federal Family Education Loan Program to identify fraud. | Detecting fraud, waste, and abuse | Planned | Yes | No | No
OIG Project Strikeback | Compares Department of Education and Federal Bureau of Investigation data for anomalies. Also verifies personal identifiers. | Analyzing intelligence and detecting terrorist activities | Operational | Yes | No | Yes
Accuracy of U.S. Department of Education Personal Data | Audits and verifies personal information that is contained in the Department of Education's personal data system. | Detecting fraud, waste, and abuse | Operational | Yes | No | Yes
Impact of Cohort Default Rate Redefinition, National Student Loan Data System | Audits data to determine the impact of legislation that extended the college loan repayment default period from 180 to 270 days. | Legislative impact | Operational | Yes | No | No
CheckFree Software/Purchase Card Program
Takes monthly billing information from the Bank of America to create reports on purchases, purchase quantity, and frequency of purchases. Data are mined for instances of fraud or abuse.
Detecting fraud, waste, and abuse
Operational Yes Yes No
Improper Pell Grant Payment Activity
Will compare Pell Grants issued with the amounts received and look at the eligibility of grant recipients.
Detecting fraud, waste, and abuse
Planned Yes No No
Title IV Identity Theft Initiative
Helps identify patterns and trends in identity theft cases involving loans for education. Provides an investigative resource for victims of identity theft.
Detecting criminal activities or patterns
Operational Yes No No
Title IV Applicant Use of Multiple Addresses/Central Processing System
Reviews addresses listed on Title IV applications to see if they are valid. For example, jail or employment addresses are not considered valid addresses.
Improving service or performance
Operational Yes No Yes
Lapsed Funds/Improper Draw of Federal Grant Proceeds
Identifies funds that remain in the grants and payment processing system beyond the time period for allocating the funds.
Improving service or performance
Operational No No No
Decision Support System with Online Analytical Processing Query
Will support the department's performance-based initiative. Will allow custom queries of schools from state and local databases for demographics and test scores.
Improving service or performance
Planned No No No
Grant Administration and Payment System
Assists in managing grant activities and aids in detecting instances of fraud or abuse in grant activities.
Detecting fraud, waste, and abuse
Operational Yes Yes Yes
Budget Execution Support
Uses information in the National Student Loan Data System and a sample drawn from it to estimate cohort distributions for financial activities related to the Federal Family Education Loan Program pursuant to the Credit Reform Act.
Financial management
Operational Yes No No
Pell Grant Model Assumptions
Provides estimates of the total cost of the Pell Grant program. It uses data from previous years and makes assumptions for future years.
Financial management
Operational No No No
National Student Loan Data System
Compiles student loan information from the guaranteeing agencies. Is used for eligibility tracking and to calculate default rates.
Detecting fraud, waste, and abuse
Operational Yes No Yes
Loan Model Assumptions
Estimates the cost of loan programs. Also analyzes loan default behavior.
Financial management
Operational Yes No Yes
Office of the Inspector General (OIG) Projects: Tumbleweed/Snowball
Is part of an OIG investigation to determine potential fraud involving financial aid grants, primarily in New Hampshire.
Detecting criminal activities or patterns
Operational Yes No Yes
Central Processing System
Processes applications for student aid. Contains data on more than 13 million applications. Data are mined for demographic trends.
Detecting fraud, waste, and abuse
Operational Yes No No
Direct Loan Services System
Is used to track the life of student direct loans and to monitor loan repayments.
Improving service or performance
Operational Yes Yes Yes
CheckFree Software/Travel Card Program
Uses monthly billing information from Bank of America to create reports on travel expenditures to look for improper use of travel cards.
Detecting fraud, waste, and abuse
Operational Yes Yes No
Source: Department of Education.
Table 6: Department of Energy's Inventory of Data Mining Efforts
Organization/system name; Description; Purpose; Status; Features: Personal information, Private sector data, Other agency data
Counterintelligence Automated Investigative Management System (CI-AIMS)
Is an investigative management system used by Department of Energy (DOE) field sites to track investigative cases on individuals or countries that threaten DOE assets. Information stored in this database is also used to assist federal and state law enforcement agencies in support of national security.
Detecting criminal activities or patterns
Operational Yes No No
Autonomy
Will be used to mine myriad intelligence-related databases within the intelligence community to uncover criminal or terrorist activities relating to DOE assets.
Detecting criminal activities or patterns
Planned Yes No No
Counterintelligence Analytical Research Data System (CARDS)
Is used to log briefings and debriefings given to DOE employees who travel to foreign countries or interact with foreign visitors to DOE facilities. Data are mined to identify potential threats to DOE assets.
Detecting criminal activities or patterns
Operational Yes No Yes
Source: Department of Energy.
Table 7: Department of Health and Human Services' Inventory of Data Mining Efforts
Organization/system name; Description; Purpose; Status; Features: Personal information, Private sector data, Other agency data
Agency for Healthcare Research and Quality
National Patient Safety Network
Will contain reports on adverse medical events that are filed by hospitals. The planned network's purpose is to remove patient personal identifiers and other items that may violate certain rules and create a warehouse that can be used by registered and unregistered users to evaluate and implement patient safety and quality measures. The network will be used to create tools that hospitals can use for making quality improvements.
Improving service or performance
Planned No No No
Centers for Disease Control and Prevention
BioSense
Enhances the nation's capability to rapidly detect bioterrorism events.
Analyzing intelligence and detecting terrorist activities
Operational No Yes Yes
Department of Health and Human Services Headquarters
DHHS Blood Monitoring Program
Monitors the country's blood supply by keeping an inventory of red blood cells and platelets, and monitors blood supply shortages, the nature of the shortages, and the size of the shortages.
Monitoring public health
Operational No Yes No
Food and Drug Administration
Mission Accomplishment and Regulatory Compliance Services System
Is a comprehensive redesign and reengineering of two core mission-critical legacy systems at the Food and Drug Administration (FDA) that support the regulatory functions that primarily take place in FDA's field offices.
Monitoring food or drug safety
Operational No Yes Yes
Turbo Establishment Inspection Report
Provides a standardized database of citations of regulations and statutes, and helps investigators prepare reports. It will collect data on specific observations uncovered during inspections and provide a more uniform format nationwide that will allow electronic searches and statistical analysis to be performed by citation.
Improving safety
Operational No Yes No
Phonetic Orthographic Computer Analysis
Is a search engine that provides results indicating how similar two drug names are on a phonetic and orthographic basis. Its purpose is to help in the safety evaluation of proposed proprietary names to reduce drug name confusion after an application is approved by the FDA.
Improving safety
Operational No Yes No
MPRIS Data Warehouse
Will provide data to support end-user ad hoc query analysis and standard reporting needs. It will provide the foundation for a central reporting repository that can be used to populate business-specific data marts.
Improving service or performance
Planned No No No
Development and Deployment of Advanced Analytical Tools for Drug Safety Risk Assessment
Will develop advanced software tools for quantitative analysis of drug safety data. Medical officers and safety evaluators will use these software tools.
Analyzing scientific and research information
Planned Yes Yes Yes
Add data mining capability to CFSAN Adverse Event Reporting System
Is a comprehensive system for tracking, reviewing, and reporting adverse event incidences involving foods, cosmetics, and dietary supplements. Integrating and centralizing the system and eliminating patchwork systems will make information on these adverse events available to federal, state, and local governments as well as to industry and the public in a more timely and efficient manner.
Monitoring food or drug safety
Planned Yes Yes Yes
Health Resources and Services Administration
HRSA Geospatial Data Warehouse
Is a data warehouse that primarily collects programmatic, demographic, and statistical data.
Improving service or performance
Operational No Yes Yes
Program Support Center
Employee Assistance Program Analysis
Uses information from a database of employee assistance program case information that does not contain client personal identifiers. Data are mined for quality assurance and program management information that is used to enhance the quality and cost effectiveness of services.
Improving service or performance
Operational No No No
Source: Department of Health and Human Services.
Table 8: Department of Homeland Security's Inventory of Data Mining Efforts
Organization/system name; Description; Purpose; Status; Features: Personal information, Private sector data, Other agency data
Border and Transportation Security Directorate
Workforce Profile Data Mart
Contains payroll and personnel data and is mined for workforce trends.
Managing human resources
Operational Yes No Yes
Customs Integrated Personnel Payroll System Data Mart
Is a Customs data mart contained within the Department of Homeland Security's workforce profile data mart. Personnel and payroll data are mined for workforce trends.
Managing human resources
Operational Yes No Yes
Internal Affairs Treasury Enforcement Communications System Audit Data Mart
Assists the Internal Affairs group by mining criminal activity data to ascertain how Customs employees are using the Treasury Enforcement System.
Detecting criminal activities or patterns
Operational Yes No Yes
Operations Management Reports Data Mart
Assists in managing the operation of all ports of entry for incoming carriers, people, and cargo. Helps in making resource (people and equipment) allocation and operational improvement decisions.
Improving service or performance
Operational No No Yes
Automated Export System Data Mart
Mines data on export trade in the U.S. and produces reports on historical shipping and receiving trends.
Improving service or performance
Operational No Yes Yes
Seized Property/Forfeitures, Penalties, and Fines Case Management Data Mart
Mines data to ensure data quality and review work assignments. The system has two components: one that processes legal cases like a law firm, and a second that serves as property and inventory control by tracking property seized.
Improving service or performance
Operational Yes No No
Incident Data Mart
Will look through incident logs for patterns of events. An incident is an event involving a law enforcement or government agency for which a log was created (e.g., traffic ticket, drug arrest, or firearm possession). The system may look at crimes in a particular geographic location, particular types of arrests, or any type of unusual activity.
Analyzing intelligence and detecting terrorist activities
Planned Yes Yes Yes
Case Management Data Mart
Assists in managing law enforcement cases, including Customs cases. Reviews caseloads, status, and relationships among cases.
Analyzing intelligence and detecting terrorist activities
Operational Yes Yes Yes
Emergency Preparedness and Response Directorate
Enterprise Data Warehouse
Will take data from multiple, disparate systems and integrate the data into one reporting environment. The objective of the effort is to allow for the reduction of data within the agency and to provide an enterprise view of information necessary to drive critical business processes and decisions. Data on internal human resources, all aspects of disaster management, infrastructure, equipment location, etc., will be used.
Disaster response and recovery
Planned Yes Yes Yes
Information Analysis and Infrastructure Protection Directorate
Analyst Notebook I2
Correlates events and people to specific information.
Analyzing intelligence and detecting terrorist activities
Operational Yes Yes No
Automatic Message Handling System (Verity)
Automatically takes messages from external agencies and routes them to appropriate recipients.
Analyzing intelligence and detecting terrorist activities
Planned No No Yes
U.S. Coast Guard
Readiness Management System
Assists in ensuring readiness for all Coast Guard missions.
Improving service or performance
Operational Yes No No
CG Info
Provides one-stop shopping for Coast Guard information. It is the central location and common interface for the entire Coast Guard to gain near real-time access to data from multiple, disparate Coast Guard information systems. It provides a single interface for users to view mission-critical support data.
Improving service or performance
Operational Yes No Yes
U.S. Secret Service
Criminal Investigation Division Data Mining
Mines data in suspicious activity reports received from banks to find commonalities in data to assist in strategically allocating resources.
Detecting criminal activities or patterns
Operational Yes No Yes
Source: Department of Homeland Security.
Table 9: Department of the Interior's Inventory of Data Mining Efforts
Source: Department of the Interior.
Organization/system name; Description; Purpose; Status; Features: Personal information, Private sector data, Other agency data
Minerals Management Service
Data Mining of the Technical Information Management System (TIMS) Database
Is a corporate database for oil and gas leases. The database is mined in support of policy development. One area of data mining is identification of leases that will be abandoned in the near future. Data mini