Data Analysis for the New Millennium:
Data Mining, Genetic Algorithms, and Visualization
by
Bruce L. Golden
RH Smith School of Business
University of Maryland
IMC Knowledge Management Seminar
April 15, 1999
Focus
• Connection between data and knowledge
• Examples of data analysis from late 1980s
• Contrast with data analysis in late 1990s
• Introduce techniques
  • MDS and Sammon maps
  • neural networks and SOMs
  • decision trees
  • genetic algorithms
• Illustrate the power of visualization
• Data analysis as a strategic asset
• Conclusion
Setting the Stage
• “Data is information devoid of context, information is data in context and knowledge is information with causal links. The more structure is added to a pool of information, the more we can talk about knowledge.” (California Management Review, Winter 1999)
• How can we systematically discover knowledge from data?
• data → information → knowledge
Modeling Salinity Dynamics in the Chesapeake Bay
• The Goal: Construct multiple regression models that accurately describe the dynamics of salinity in the Maryland portion of the Bay
• The work was done in late 1989/early 1990
• Motivation:
  • Salinity exerts a major influence over the survival and distribution of many fish species in the Chesapeake Bay
  • Maryland was the nation’s leader in oyster production several decades ago
  • MDNR’s oyster production rebuilding program relied on predicting salinity levels for various areas in the Bay
Model Building
• Transformation of key variables
• Extensive screening for independent variables
• Used stepwise regression in SPSS/PC
• Data collected at 34 stations by USEPA from 1984 to 1989
• 36,258 water samples collected at different depths (9 variables per sample)
• Constructed 10 models in all (bottom data / total data)
  • Upper, Middle, Lower, Entire Bay, Lower Tributaries
• Four key independent variables
  • Day, Depth, Latitude, Longitude
Model Results
• Six independent variables in each model
• Model assumptions are not violated
• R² values range from 0.56 to 0.81
• Entire Bay Model
  • R² = 0.649
  • depth increases, salinity increases
  • Salinity = 199.839 - 1.151 Day1 + 1.161 Day2 + 0.283 Depth
               - 4.863 Latitude - 1.543 Longitude - 13.402 Longitude1
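As a rough sketch, the Entire Bay model can be evaluated as a plain linear function. The coefficients are copied from the slide. Day1, Day2, and Longitude1 are transformed variables whose definitions the slide does not give, so the function below assumes the caller supplies already-transformed values, and the sample inputs are hypothetical.

```python
# Coefficients of the Entire Bay model, copied from the slide.
COEFFICIENTS = {
    "Day1": -1.151,
    "Day2": 1.161,
    "Depth": 0.283,
    "Latitude": -4.863,
    "Longitude": -1.543,
    "Longitude1": -13.402,
}
INTERCEPT = 199.839


def predict_salinity(predictors):
    """Evaluate the linear model for a dict of already-transformed predictors."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * predictors[name] for name in COEFFICIENTS
    )


# Hypothetical predictor values, chosen only to exercise the function.
sample = {"Day1": 0.0, "Day2": 0.0, "Depth": 10.0,
          "Latitude": 38.5, "Longitude": 0.0, "Longitude1": 0.0}
deeper = dict(sample, Depth=11.0)

# One extra unit of depth raises predicted salinity by the Depth coefficient.
depth_effect = predict_salinity(deeper) - predict_salinity(sample)
```

The positive Depth coefficient is what the slide summarizes as "depth increases, salinity increases".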
Salinity Modeling Summary
• The regression models were validated using new (1990) data involving 7,000 observations
• These regression models can be used to predict salinity for a location on the Bay at a specified depth and date
• In 1991, we applied neural networks to the same problem
• To our surprise, the neural network models predicted salinity levels more accurately than the regression models in 90% of the cases
The Problem with Linear Regression
• “But we all know the world is nonlinear.” (Harold Hotelling, 1948)
[Figure: Linear Regression Shortcomings: Nonlinear Data (Cabena et al., 1998). The plot contrasts the predicted regression line with the true regression line.]
Neural Network Configuration
[Figure: network diagram]
• Output: Salinity Level
• Middle layer: Hidden Nodes
• Inputs: Station Number, Depth, Latitude, Longitude, Date, Longitude * Depth
Neural Networks
• Neural networks are computer programs designed to recognize patterns and learn “like” the human brain
• They are versatile and have been used to perform prediction and classification
• The key is to iteratively determine the “best” weights for the links connecting the nodes
• Drawback: It is difficult to explain/interpret the results (same is true for regression)
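To make "iteratively determine the best weights" concrete, here is a minimal sketch of a one-input network with a single hidden layer, trained by stochastic gradient descent. It is not the salinity network from the talk: the toy target y = x², the tanh activation, the number of hidden nodes, and the learning rate are all assumptions chosen for illustration.

```python
import math
import random


def train_network(samples, hidden=4, epochs=2000, rate=0.05, seed=0):
    """Fit y ~ sum_j v[j] * tanh(w[j] * x + b[j]) + c by stochastic gradient descent."""
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in range(hidden)]   # input -> hidden weights
    b = [rng.uniform(-1, 1) for _ in range(hidden)]   # hidden biases
    v = [rng.uniform(-1, 1) for _ in range(hidden)]   # hidden -> output weights
    c = 0.0                                           # output bias

    def predict(x):
        return sum(v[j] * math.tanh(w[j] * x + b[j]) for j in range(hidden)) + c

    def mean_squared_error():
        return sum((predict(x) - y) ** 2 for x, y in samples) / len(samples)

    initial_error = mean_squared_error()
    for _ in range(epochs):
        for x, y in samples:
            h = [math.tanh(w[j] * x + b[j]) for j in range(hidden)]
            error = sum(v[j] * h[j] for j in range(hidden)) + c - y
            for j in range(hidden):
                # Error signal propagated back through hidden node j.
                grad_hidden = error * v[j] * (1 - h[j] ** 2)
                v[j] -= rate * error * h[j]
                w[j] -= rate * grad_hidden * x
                b[j] -= rate * grad_hidden
            c -= rate * error
    return predict, initial_error, mean_squared_error()


# Toy nonlinear data (an assumption): y = x^2 on [-1, 1].
samples = [(i / 10, (i / 10) ** 2) for i in range(-10, 11)]
predict, error_before, error_after = train_network(samples)
```

Each pass nudges every link weight in the direction that reduces the squared prediction error, which is the iterative search the slide refers to.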
Visualization
• Psychologists claim that more than 80% of the information we absorb is received visually (Cabena et al., 1997)
• Data is often highly multidimensional
• Mapping from three or more dimensions to two dimensions is not easy
Flattening the Earth
• “Would you tell me, please, which way I ought to go from here?” asked Alice.
  “That depends a good deal on where you want to get to,” said the Cat.
  (Lewis Carroll)
Map Projections
• We use map projections to represent a spherical Earth on a flat surface
• Two map projections of the world can look quite different
• All map projections distort reality in some ways -- shape, area, distance, angles, etc.
• Equivalent projections preserve area
• Conformal projections preserve angles
• No projection can be both conformal and equivalent
• Bottom line: map projections are extremely useful, but offer compromise solutions
Visual Clustering (Segmentation) Methods
• Multi-Dimensional Scaling (MDS)
• Sammon Mapping
• Self-Organizing Maps
• Euclidean distance (more or less) is used as a similarity measure
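The slide does not spell out the formulas, so the following is the standard textbook form rather than anything taken from the talk. For observations $x_i, x_j$ with $p$ variables, the Euclidean distance in the original space is $d^{*}_{ij}$, and a Sammon map chooses two-dimensional points whose pairwise distances $d_{ij}$ minimize the stress $E$:

```latex
d^{*}_{ij} = \sqrt{\sum_{k=1}^{p} \left(x_{ik} - x_{jk}\right)^{2}},
\qquad
E = \frac{1}{\sum_{i<j} d^{*}_{ij}} \sum_{i<j} \frac{\left(d^{*}_{ij} - d_{ij}\right)^{2}}{d^{*}_{ij}}
```

Dividing each squared error by $d^{*}_{ij}$ weights small original distances more heavily, so local neighborhoods are preserved in preference to large-scale structure.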
Self-Organizing Maps (SOMs)
• Developed by Teuvo Kohonen in early 1980s
• Observations are mapped onto a two-dimensional hexagonal grid
• Related to MDS and Sammon maps, but ensures better spacing
• Colors are used to indicate clusters
• Software: SOM_PAK (Public domain, WWW), Viscovery (Eudaptics, Austria)
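A minimal sketch of how observations get mapped onto a two-dimensional grid is shown below. It is not SOM_PAK or Viscovery: for brevity it uses a rectangular grid instead of the hexagonal one on the slide, and the grid size, learning-rate schedule, neighbourhood schedule, and synthetic data are all assumptions.

```python
import math
import random


def best_matching_unit(grid, x):
    """Return the grid cell whose weight vector is closest to x (Euclidean)."""
    return min(grid, key=lambda cell: math.dist(grid[cell], x))


def train_som(data, rows=4, cols=4, epochs=30, seed=0):
    """Train a small rectangular SOM and return its grid of weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    samples = list(data)  # local copy so shuffling leaves the caller's list alone
    for epoch in range(epochs):
        progress = epoch / epochs
        rate = 0.5 * (1 - progress) + 0.01    # learning rate shrinks over time
        radius = 2.0 * (1 - progress) + 0.5   # neighbourhood radius shrinks too
        rng.shuffle(samples)
        for x in samples:
            winner = best_matching_unit(grid, x)
            for (r, c), weights in grid.items():
                grid_dist2 = (r - winner[0]) ** 2 + (c - winner[1]) ** 2
                influence = math.exp(-grid_dist2 / (2 * radius ** 2))
                # Pull this unit toward the observation, more strongly near the winner.
                for k in range(dim):
                    weights[k] += rate * influence * (x[k] - weights[k])
    return grid


# Two synthetic, well-separated groups of three-variable observations.
rng = random.Random(1)
low = [[rng.uniform(0.0, 0.2) for _ in range(3)] for _ in range(20)]
high = [[rng.uniform(0.8, 1.0) for _ in range(3)] for _ in range(20)]
som = train_som(low + high)
cell_low = best_matching_unit(som, [0.1, 0.1, 0.1])
cell_high = best_matching_unit(som, [0.9, 0.9, 0.9])
```

After training, dissimilar observations land in different cells of the grid, and coloring the cells by similarity is what produces the cluster maps shown in the talk.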
Country Risk Data
• Goal: Look at risks involved in investing in stock markets around the world
• Source: Wall Street Journal of June 26, 1997
• 52 countries, 20 variables
• The article clusters countries into five groups of approximately equal size
  • those most similar to the U.S.
  • other developed countries
  • mature and emerging markets
  • newly emerging markets
  • frontier markets
Country Risk Data (continued)
• Nine clusters is a better representation of the data than five clusters
• Component maps and cluster summary statistics help explain why
• Numerous other applications in finance and economics
Picking Mutual Funds with SOMs
• Source: Morningstar 1997 data on 500 mutual funds
• Among the most successful funds, historically
• Approximately 15 variables
• Categories: World Stocks, International Bonds, Large & Midsize Stocks, Small Cap Stocks, Emerging Markets, All Funds
• Diversification ⇒ invest in funds that are in different clusters
Current SOM Projects
• Direct Mail Response
  • observations -- hundreds of thousands of customers
  • variables -- customer history with firm, age, zip code
  • goal -- identify clusters of customers for direct mail promotion
• Profit Opportunities in Telecommunications Worldwide
  • observations -- approximately 200 countries
  • variables -- socio-economic measures, teledensity measures
  • goal -- identify clusters of countries in which demand for wireless services may be high
Flattening the Earth and SOMs: Connections
• There is an art and science to each
• Each is based on sophisticated mathematics
• When you move from many dimensions to two dimensions, you lose important details
• On the other hand, visualization generates insights and impact
Data Mining and Knowledge Management
• Two types of organizational knowledge
  • Explicit Knowledge
    databases
    reports
    manuals
  • Tacit Knowledge
    in employees’ heads
    learned from experience
    not yet codified
• Data mining attempts to convert some of this tacit knowledge into explicit knowledge
Decision Trees
• Given a table of data
  • potential customers are the rows
  • independent variables and dependent variable are the columns
• Decision trees are used for classification, prediction, or estimation of the dependent variable
• Accuracy is typically less than 100%
• One popular approach -- information theory
  • maximize information gain at each split
  • limit the number of splits
  • software: C4.5
• Another popular approach -- statistics
  • software: CART, CHAID
A Decision Tree for Widget Buyers
All customers: 11 yes, 9 no
  Res ≠ NY: 1 yes, 6 no → No
  Res = NY: 10 yes, 3 no
    Age > 35: 2 yes, 3 no → No
    Age <= 35: 8 yes → Yes

Rule 1. If residence not NY, then not a widget buyer
Rule 2. If residence NY and age > 35, then not a widget buyer
Rule 3. If residence NY and age <= 35, then a widget buyer

$$\text{Accuracy} = \frac{\#\ \text{correctly classified}}{\text{total}\ \#} = \frac{6 + 3 + 8}{20} = 0.85$$
Adapted from (Dhar & Stein, 1997)
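The three rules and the accuracy figure can be checked with a short sketch. The leaf counts are read from the tree (1 yes / 6 no outside NY; 2 yes / 3 no for NY and age > 35; 8 yes for NY and age <= 35). The slide gives no individual customer records, so the residence "NJ" and the ages 30 and 40 below are placeholders. The information-gain calculation for the first split is added as an illustration of the information-theoretic approach; it is not taken from the slide.

```python
import math


def classify(residence, age):
    """Apply the three rules from the slide; True means 'widget buyer'."""
    if residence != "NY":
        return False  # Rule 1
    if age > 35:
        return False  # Rule 2
    return True       # Rule 3


def entropy(yes, no):
    """Shannon entropy (in bits) of a yes/no class distribution."""
    total = yes + no
    return -sum((n / total) * math.log2(n / total) for n in (yes, no) if n)


# One tuple per leaf: (residence, age, buyers, non-buyers).
# "NJ", 30, and 40 are placeholder values standing in for
# "not NY", "age <= 35", and "age > 35".
leaves = [("NJ", 30, 1, 6), ("NY", 40, 2, 3), ("NY", 30, 8, 0)]

# A leaf predicted "buyer" gets its buyers right; otherwise its non-buyers.
correct = sum(yes if classify(res, age) else no for res, age, yes, no in leaves)
total = sum(yes + no for _, _, yes, no in leaves)
accuracy = correct / total

# Information gain of the first split (residence) at the root, 11 yes / 9 no.
gain = entropy(11, 9) - (7 / 20) * entropy(1, 6) - (13 / 20) * entropy(10, 3)
```

The rules classify 17 of the 20 customers correctly, matching the 0.85 accuracy on the slide.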
Evolutionary Algorithms / Genetic Algorithms
• Developed by John Holland in the late 1960s / early 1970s
• Speed up evolution a millionfold or so on the computer
• Simple, elegant, powerful idea
A Simple Genetic Algorithm
1. Start with a randomly generated population of n chromosomes (candidate solutions to a problem)
2. Calculate the fitness f(x) of each chromosome x in the population
3. Repeat the following steps until n offspring have been created
   a. Randomly select a pair of parent chromosomes from the current population
   b. Crossover (mate) the pair at a randomly chosen point to form two offspring
   c. Randomly mutate the two offspring and add the resulting chromosomes to the population
   d. Calculate the fitness of the resulting chromosomes
A Simple Genetic Algorithm (continued)
4. Let the n fittest chromosomes survive to the next generation
5. Go to Step 3 (repeat for 50 generations)
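The five steps can be sketched in a few lines of Python. The slides leave the problem unspecified, so the bit-string encoding, the toy fitness function (count of 1 bits), the per-bit mutation rate, and the parameter defaults are assumptions made only to get something runnable.

```python
import random


def fitness(chromosome):
    """Toy fitness (an assumption): the number of 1 bits in the chromosome."""
    return sum(chromosome)


def run_ga(n=20, length=30, generations=50, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Step 1: random initial population of n chromosomes.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(n)]
    initial_best = max(map(fitness, population))                 # Step 2
    for _ in range(generations):                                 # Step 5
        offspring = []
        while len(offspring) < n:                                # Step 3
            mother, father = rng.sample(population, 2)           # 3a
            point = rng.randint(1, length - 1)                   # 3b
            children = (mother[:point] + father[point:],
                        father[:point] + mother[point:])
            for child in children:                               # 3c
                offspring.append(
                    [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child])
        # Steps 3d and 4: rank parents and offspring together, keep the n fittest.
        population = sorted(population + offspring, key=fitness, reverse=True)[:n]
    return initial_best, population


initial_best, final_population = run_ga()
final_best = fitness(final_population[0])
```

Because the n fittest of parents and offspring combined survive, the best fitness in the population can never decrease from one generation to the next.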
Financial Investment Example
• Five sectors
  1. Financial services
  2. Health care
  3. Utilities
  4. Technology
  5. Consumer
• Two parents (each entry is the percent invested in one sector, e.g., the percent invested in sector 3)
  21 24 15 10 30
  10 16 18 31 25
Financial Investment Example (continued)
• Crossover
  21 24 15 10 30 and 10 16 18 31 25 ⇒ 21 16 15 31 25 and 10 24 18 10 30
• After normalization and rounding, we obtain two offspring
  19 15 14 29 23 and 11 26 19 11 33
• Mutation
  19 15 14 29 23 ⇒ 19 15 23 29 14
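The normalization step can be illustrated with a small sketch. Reading the raw offspring on the slide as 21 16 15 31 25 (sum 108) and 10 24 18 10 30 (sum 92), each must be rescaled so the percentages again total 100. The slide does not say which rounding rule was used; the largest-remainder rule below is an assumption, chosen because it reproduces the normalized offspring shown.

```python
def normalize_to_percent(weights):
    """Rescale non-negative integer weights to whole percentages summing to 100."""
    total = sum(weights)
    shares = [w * 100 // total for w in weights]      # rounded-down percentages
    remainders = [w * 100 % total for w in weights]   # what rounding down discarded
    shortfall = 100 - sum(shares)
    # Give the leftover points to the entries with the largest remainders.
    by_remainder = sorted(range(len(weights)),
                          key=lambda i: remainders[i], reverse=True)
    for i in by_remainder[:shortfall]:
        shares[i] += 1
    return shares


offspring_1 = normalize_to_percent([21, 16, 15, 31, 25])
offspring_2 = normalize_to_percent([10, 24, 18, 10, 30])
```

Plain rounding to the nearest integer would give the second offspring a total of 101, which is why some tie-breaking rule of this kind is needed.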
Crossover Illustration for Decision Trees
[Figure: two parent decision trees, Parent 1 and Parent 2. Split nodes shown: Railroad (<= 2090 / > 2090), Area (<= 1138910 / > 1138910), Birth_rt (<= 14 / > 14), Natl_prod (<= 1045 / > 1045), Natl_prod (<= 886.3 / > 886.3), Inf_mor_rt (<= 57 / > 57). Leaves are labeled Low, Medium, or High.]
Crossover Illustration (continued)
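[Figure: the two offspring trees, Child 1 and Child 2, built from the same split nodes as the parents: Railroad (<= 2090 / > 2090), Area (<= 1138910 / > 1138910), Birth_rt (<= 14 / > 14), Natl_prod (<= 1045 / > 1045), Natl_prod (<= 886.3 / > 886.3), Inf_mor_rt (<= 57 / > 57). Leaves are labeled Low, Medium, or High.]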
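Mutation Illustration
[Figure: a decision tree with split nodes Railroad (<= 2090 / > 2090), Natl_prod (<= 886.3 / > 886.3), and Inf_mor_rt (<= 57 / > 57), shown together with a subtree that splits on Highways (<= 33 / > 33) and Elect_prod (<= 208350 / > 208350). Leaves are labeled Low, Medium, or High.]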
GA Applications
• Financial Data Analysis
  • State Street Global Advisors
  • Advanced Investment Technologies
  • Barclays Global Investors
  • PanAgora Asset Management
  • Fidelity Funds
• Operations and Supply Chain Management
  • General Motors
  • Volvo
  • Cemex
• Engineering Design
  • General Electric
  • Boeing
Data Analysis Then and Now
Late 1980s                  Late 1990s
Linear methods              Nonlinear methods
Large data sets             Massive data sets
Few dimensions              Highly multi-dimensional
Ask specific questions      What can we infer?
Search for information      Search for knowledge
Data Analysis as a Strategic Asset
• Competitive advantage
• Sustainable over a period of time
Examples
• BT Labs
• Enterprise Rent-A-Car
• Dupont
Concluding Remarks
• Corporate data is more plentiful than ever before
• Companies are becoming more serious about mining that data
• Powerful software tools are widely available
• Many companies already view data analysis/data mining as a strategic asset
• Data analysis/data mining is a key area within knowledge management
• Exciting opportunities exist for collaborative research
Recommended Books
• Berry & Linoff, Data Mining Techniques for Marketing, Sales, and Customer Support, Wiley (1997)
• Deboeck & Kohonen, Visual Explorations in Finance with SOMs, Springer (1998)
• Dhar & Stein, Seven Methods for Transforming Corporate Data into Business Intelligence, Prentice Hall (1997)
• Mitchell, An Introduction to Genetic Algorithms, MIT Press (1996)
• Monmonier, How to Lie with Maps (second edition), University of Chicago Press (1996)