    Accting., Mgmt. & Info. Tech. 8 (1998) 191-210

    Managing complexity in large data bases using self-organizing maps

    Barbro Back a,*, Kaisa Sere b,1, Hannu Vanharanta c,2

    a Turku School of Economics and Business Administration, Turku, Finland
    b Åbo Akademi University, Turku, Finland
    c Lappeenranta University of Technology, Lappeenranta, Finland

    Received 1 March 1997; received in revised form 1 July 1998; accepted 13 July 1998

    Abstract

    The amount of financial information in today's sophisticated large data bases is substantial and makes comparisons between company performance, especially over time, difficult or at least very time consuming. The aim of this paper is to investigate whether neural networks in the form of self-organizing maps can be used to manage the complexity in large data bases. We structure and analyze accounting numbers in a large data base over several time periods. By using self-organizing maps, we overcome the problems associated with finding the appropriate underlying distribution and the functional form of the underlying data in the structuring task that is often encountered, for example, when using cluster analysis. The method chosen also offers a way of visualizing the results. The data base in this study consists of annual reports of more than 120 worldwide pulp and paper companies with data from a five-year time period. © 1998 Elsevier Science Ltd. All rights reserved.

    Keywords: Complexity; Adaptive systems; Self-organizing maps; Financial benchmarking; Financial performance; Strategic management

    1. Introduction

    Today's data bases hold a substantial amount of information about companies. The trick is to find patterns in the data that reveal important information about companies for different stakeholders, i.e., stockholders, creditors, auditors, financial analysts, and management. Finding patterns in financial performance can, for example, be helpful in identifying internal problems, in firm evaluation by investors, and for benchmarking purposes.

    * Corresponding author. E-mail: [email protected] [email protected] [email protected]

    0959-8022/98/$19.00 © 1998 Elsevier Science Ltd. All rights reserved.

    PII: S0959-8022(98)00009-5

    In this paper, we focus on analysis of financial performance for benchmarking purposes. Benchmarking is an important company-internal process, in which the functions and performance of one company are compared with those of other companies. Financial competitive benchmarking uses financial information, most often in the form of ratios, to perform these comparisons. Financial competitive benchmarking is utilized, among other things, as a communication tool in strategic management, for example, in situations where company management must gain approval, from internal and external interest groups alike, for new functional objectives for the company.

    Vanharanta (1995) has built a hyperknowledge-based system for financial benchmarking. The system contains a data base with financial data on more than 160 pulp and paper companies worldwide. This data base is used as a basis for the present study, too. The amount of financial information in this system is, however, so large and its structure so complex that comparisons between companies are difficult, or at least very time consuming.

    Multivariate statistical methods, especially cluster analysis, have been used as a tool for analyzing company performance, although mostly in research contexts (Ketchen & Shook, 1996). However, many problems have been reported concerning these methods. The two most important problems are the assumption of normality in the underlying distributions and difficulties in finding an appropriate functional form for the distributions (Trigueiros, 1995; Fernandes-Castro & Smith, 1994). Moreover, results of analyses are difficult to visualize when there are several explanatory variables (Vermeulen et al., 1994).

    Many researchers have addressed these problems: Trigueiros (1995) reports on several studies that have shown the existence of positive or negative skewness in the ratios and on different remedies to overcome these difficulties. He also explains the existence of symmetrical and negatively skewed ratios and offers guidelines for achieving higher precision when using ratios in a statistical context.

    Fernandes-Castro & Smith (1994) used a non-parametric model of corporate performance to overcome the need for specification of statistical distribution or functional form. Vermeulen et al. (1994) presented a way to visualize the results with interfirm comparison when the explanatory variable was explained by more than one firm characteristic. Successful use of visual information depends substantially on its acceptance by the user. Meyer (1997) states that visualized information makes the transfer of information easier and thus a bottleneck in human information processing is avoided.

    Ketchen & Shook (1996) evaluate the past use of cluster analysis in strategic management research. One concern has been the extensive reliance on researcher judgment that is inherent in cluster analysis. As another concern they list that the applications lack an underlying theoretical rationale and that clustering dimensions seem to be selected haphazardly. There has also been concern with the standardization of variables and problems with multicollinearity among variables.

    Self-organizing maps, which are a form of artificial neural networks, are a promising new paradigm in information processing. One of the main features of neural networks is their ability to learn from examples and adapt their behavior to new situations. The theory of self-organizing maps facilitates a reduction and cluster analysis of high dimensional feature spaces into two-dimensional arrays of representative weight vectors (Kohonen, 1997). The method does not need any specification of an underlying distribution or of the functional form of the financial indicators. Furthermore, one can visualize the results in a comprehensive way.

    Neural networks have previously been suggested by Trigueiros (1995) for use with computerized accounting reports data bases, and by Chen et al. (1995) to define cluster structures in large data bases. Martin-del-Brio & Serrano-Cinca (1995) used self-organizing maps for analyzing the financial state of Spanish companies.

    In a previous study (Back et al., 1998) we investigated the potential of self-organizing maps to structure 76 companies' financial data in our data base and presented an approximate position of one company's financial performance compared to that of other companies. The study was explorative and limited to Nordic and North-American companies. However, the results were very promising and that study served as a basis for this paper.

    We use the self-organizing maps to structure the financial information on more than 120 companies, now also including Central-European companies, in our data base into clusters based on the underlying weight vectors. Each cluster is then named according to the financial characteristics of the cluster. We analyse the financial performance of the companies for the year 1985 and take a closer look at these clusters over the years 1985-89. Even though we focus only on specific companies, any individual company or group of companies can be the focus of interest.

    The rest of the paper is organized as follows: Section 2 describes the methodology we have used, the network structure, the data base, the list of companies in the study and the criteria for and the choice of financial ratios. Section 3 presents the construction of the self-organizing maps and Section 4 presents an analysis of the maps. The conclusions of our study are presented in Section 5.

    2. Methodology

    2.1. Benchmarking

    Competitive benchmarking is a company-internal process in which the activities of a given company are measured against the best practices of other, best-in-class companies (Geber, 1990). In the process of competitive benchmarking, internal functions are analyzed and measured using financial (i.e. quantitative) and/or non-financial (i.e. qualitative) yardsticks. Functions measured from one company are compared with similar functions measured from leading competitors, or they are compared with the best practices in other industries. The differences between compared functions are measured. The overall management goal of competitive benchmarking within a given company is to close the measured gap by changing the company's characteristics in ways that will improve company performance.

    The generic benchmarking process consists of a planning phase, an analysis phase and an integration and action phase. The specific activity of financial competitive benchmarking is an integral part of the generic benchmarking process. In financial benchmarking, the aim is to compare the company with its competitors using available financial information, financial yardsticks. At the beginning of a benchmarking process, in its planning phase, financial benchmarking plays an important role in the identification and selection of the right competitors and/or good performers, those that will act as the benchmarks in the non-financial benchmarking to be done later in the generic process. Financial benchmarking is also important in the analysis phase when performance gaps are being measured and future performance levels projected. In the integration and action phase, financial benchmarking is useful for monitoring and tracking progress and for re-calibrating the benchmarks. Financial benchmarking achieves its greatest potential, however, as a communications tool at times when company management must gain approval, from internal and external interest groups alike, for new functional objectives for the company, i.e. in strategic management.

    The financial information needed for financial benchmarking work is, however, invariably available only from large commercial data bases or from specialized reports and publications, from where it must be gleaned with difficulty. Such information is thus far removed from its active users. If the needed financial information is to be brought closer to the active users, it must first be pre-processed, i.e. refined and classified. The overall objective of the present study is to pre-process, with the help of neural networks, the data and information needed for financial benchmarking purposes. Thus pre-processed, the information can be used in computerized benchmarking systems and executive support systems, making the task of competitive financial benchmarking easier and more effective.

    2.2. Neural networks

    A neural network is a computing device that is able to learn from examples. It consists of a set of simple processing units, neurons, that are connected to each other to form a network topology. A neural network compares input data with output data, and tries to approximate some complicated, unknown functionality between the two. When developing a neural network, the first step is to find a suitable topology for the network and thereafter train it so that it gradually learns the desired input/output functionality. There are two ways to train a network, supervised and unsupervised. In supervised learning the network is presented with examples of known input-output data pairs, after which it starts to mimic the presented input-output behavior. The network is then tested to see whether it is able to produce correct output when only input is presented to it. In unsupervised learning, the output data is not available and usually not even known beforehand. Instead, the network tries to find similarities between input data samples. Similar samples form clusters that constitute the output of the network. The user is responsible for giving an interpretation to each cluster.


    Since companies in the data base do not have predefined labels describing their financial status, a network intended for structuring their data can have no pre-desired outputs, i.e., the clustering is not known a priori. For this reason, we utilize an unsupervised learning method. A Kohonen network (Kohonen, 1997), being the most common network model based on unsupervised learning, is used in this study.

    A Kohonen network usually consists of two layers of neurons: an input layer and an output layer. The input layer neurons present an input pattern to each of the output neurons. The neurons in the output layer are usually arranged in a grid, and are influenced by their neighbors in this grid. The goal is to automatically cluster the input patterns in such a way that similar patterns are represented by the same output neuron, or by one of its neighbors. Every output neuron has an associated weight vector. The neighborhood structure of the output layer will cause neighboring neurons in it (the output layer) to have similar weight vectors. These vectors should represent some subclass of the input patterns, thus forming a map of the input space, a self-organizing map (SOM).

    The network topology can be described by the number of output neurons present in the network and by the way in which the output neurons are interconnected, i.e. by describing which neurons in the output array are mutual neighbors. Usually, neurons on the output layer are arranged in either a rectangular or a hexagonal grid, see Fig. 1. In a rectangular grid each neuron is connected to four neighbors, except for the ones at the edge of the grid. In all the networks we use, the output neurons are arranged in a hexagonal lattice structure. This means that every neuron is connected to exactly six neighbors, except for the ones at the edge of the grid. This choice was made following the guidelines of Kohonen (1997).

    Fig. 1. Network topologies.

    As mentioned above, a Kohonen network is trained using unsupervised learning. During the training process the network has no knowledge of the desired outputs. The training process is characterized by a competition between the output neurons. The input patterns are presented to the network one by one, in random order. The output neurons compete for each and every pattern. The output neuron with a weight vector that is closest to the input vector is called the winner. For expressing the distance (i.e. similarity) between two vectors, we use the Euclidean distance between the two vectors. The weight vector of the winner is adjusted in the direction of the input vector, and so are the weight vectors of the surrounding neurons in the output array. The size of adjustment in the weight vectors of the neighboring neurons is dependent on the distance of that neuron from the winner in the output array.
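
    The competitive training step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program package used in the study: the function names (hex_grid, euclidean, find_winner, train_step), the unit-spaced lattice coordinates and, in particular, the Gaussian shape of the neighborhood function are choices made here for the example; the paper does not say which neighborhood function was used.

```python
import math

def hex_grid(cols, rows):
    """Positions of the output neurons on a hexagonal lattice with unit spacing.

    Odd rows are shifted by half a step, so every interior neuron has six
    neighbors at distance 1 (hypothetical layout chosen for this sketch).
    """
    return [(c + 0.5 * (r % 2), r * math.sqrt(3) / 2)
            for r in range(rows) for c in range(cols)]

def euclidean(u, v):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def find_winner(weights, x):
    """Index of the output neuron whose weight vector is closest to input x."""
    return min(range(len(weights)), key=lambda i: euclidean(weights[i], x))

def train_step(weights, grid, x, learning_rate, width):
    """Present one input pattern and adjust the weight vectors in place.

    The winner moves towards x; other neurons move less the further they
    are from the winner on the map. The Gaussian kernel is an assumption.
    """
    win = find_winner(weights, x)
    for i, w in enumerate(weights):
        d = euclidean(grid[i], grid[win])            # distance on the map
        h = math.exp(-(d * d) / (2.0 * width * width))
        for k in range(len(w)):
            w[k] += learning_rate * h * (x[k] - w[k])
    return win
```

    In a full training run this step would be repeated for input patterns drawn in random order, with learning_rate and width taken from the two learning parameters discussed next.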

    We use two learning parameters: the learning rate and the neighborhood width parameter. The learning rate influences the size of the weight vector adjustments after each training step, whereas the neighborhood width parameter determines to what extent the surrounding neurons, the neighbors, are affected by the winner. An additional parameter is the training length, which measures the processing time, i.e. the number of iterations through the training data.

    Our criterion for the quality of a map was the average quantization error, which is the average of the Euclidean distances between each input vector and its best-matching reference vector in the SOM.
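
    The average quantization error can be written down directly from its definition. The sketch below is an illustration only; the function name average_quantization_error and the list-of-lists data layout are assumptions made for the example.

```python
import math

def average_quantization_error(weights, data):
    """Mean Euclidean distance from each input vector to its closest weight vector."""
    def euclidean(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    total = 0.0
    for x in data:
        # Distance to the best-matching reference vector for this input.
        total += min(euclidean(w, x) for w in weights)
    return total / len(data)
```

    A lower value means that the reference vectors lie closer, on average, to the observations they represent.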

    The clusters are formed by identifying neurons on the output layer that are close to each other using the weight vectors as a starting point. A tool called the U-matrix (Kohonen, 1997) can be used to visualize the distances between neighboring neurons. Hence, a set of neurons forms a cluster if they are sufficiently close to each other. Moreover, it is the analyst who determines the clusters based on these distances; they are not determined a priori.
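
    The idea behind the U-matrix can be illustrated with a simplified variant that stores one value per neuron: the average distance between its weight vector and those of its immediate neighbors on the map. This is a sketch under assumptions, not the U-matrix layout used in the paper (which also has separate nodes for the distances between neurons); the function name u_heights and the reach parameter are hypothetical, and the map is assumed to have at least two neurons.

```python
import math

def u_heights(weights, grid, reach=1.0 + 1e-9):
    """Average weight-space distance from each neuron to its map neighbors.

    grid holds the map position of each neuron; two neurons count as
    neighbors when their map distance is at most `reach`.
    """
    def euclidean(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    heights = []
    for i, wi in enumerate(weights):
        neighbors = [j for j in range(len(weights))
                     if j != i and euclidean(grid[i], grid[j]) <= reach]
        heights.append(sum(euclidean(wi, weights[j]) for j in neighbors)
                       / len(neighbors))
    return heights
```

    Large values mark borders between clusters, small values mark the interior of a cluster; where to draw the line remains the analyst's decision.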

    The methodology used when applying the self-organizing map is as follows:

    1. Choose the data material. It is often advisable to preprocess the input data so that the learning task of the network becomes easier (Kohonen, 1997).

    2. Choose the network topology, learning rate, and the neighborhood width.

    3. Construct the network. The construction process takes place by showing the input data to the network iteratively, using the same input vector many times; the number of iterations is the so-called training length. The process ends when the average quantization error is small enough.

    4. Choose the best map for further analysis. Identify the clusters using the U-matrix and interpret the clusters (give labels to them) using the weight vectors. From the weight vectors we can read, for each neuron and each input variable, the value of the variable associated with that neuron.

    2.3. Data base and selection of companies

    The Green Gold Financial Reports data base (Salonen & Vanharanta, 1990a, b, 1991) is used as the experimental financial knowledge base for the neural network tests. It consists of income statements, balance sheets and cash flow statements, which have been corrected for different accounting principles, of 160 companies in the international pulp and paper industry. The data base also contains specific financial ratios, calculated using information from the corrected reports, as well as general company information concerning products and production volumes. There are 47 different key ratios for each company. The companies are all based in one of three regions: North America, Northern Europe or Central Europe. The financial data covers a period of five years from 1985 to 1989. The companies are listed in Table 1 (with some companies omitted that did not have enough data available).

    For our experiment we used some 120 companies from the data base. We have also included the country averages of Finland, Norway and Sweden as three additional companies.

    Table 1
    The companies

    Country    No.  Company
    Sweden       3  AVERAGE
                 4  AB Statens Skogsindustrier
                 5  Graningeverkens AB
                 6  Korsnas AB
                 7  Mo och Domsjo AB
                 8  Munkedals AB
                 9  Munksjo AB
                10  Norrlands Skogsagarens Cellulosa AB
                11  Norrsundets Bruks AB
                12  Obbola Linerboard AB
                13  Rottneros Bruk AB
                14  Svenska Cellulosa AB (SCA)
                15  Stora Kopparbergs Bergslags AB
                16  Sodra Skogsagarna AB
    Finland      1  AVERAGE
                17  A. Ahlstrom Oy
                18  Enso-Gutzeit Oy
                20  Kemi Oy
                21  Kymmene Oy
                22  Oy Kyro Ab
                23  Metsa-Serla Oy
                25  Rauma-Repola Oy
                26  Sunila Oy
                27  Oy Tampella Ab
                28  Oy Veitsiluoto Ab
                29  Yhtyneet Paperitehtaat Oy
    Norway       2  AVERAGE
                31  Norske Skogindustrier A.S.
                34  A/S Union
    USA         37  Boise Cascade Corporation
                39  Champion International Corporation
                40  Chesapeake Corporation
                41  Consolidated Papers Inc.
                42  Dennison Manufacturing Company
                43  The Dexter Corporation
                44  Federal Paper Board Company
                45  Gaylord Container Corporation
                46  Georgia-Pacific Corporation
                47  P.H. Glatfelter Company
                48  Great Northern Nekoosa Corporation
                49  International Paper Company
                50  James River Corporation
                51  Kimberly-Clark Corporation
                52  Longview Fibre Company
                53  Louisiana-Pacific Corporation
                54  Mead Corporation
                55  Mosinee Paper Corporation
                56  Pentair Inc.
                57  Potlatch Corporation
                58  The Procter and Gamble Company
                59  Scott Paper Company
                60  Sonoco Products Company
                61  Stone Container Corporation
                62  Tambrands Inc.
                63  Temple-Inland Inc.
                65  Union Camp Corporation
                66  Wausau Paper Mills Company
                67  Westvaco Corporation
                68  Weyerhaeuser Company
                69  WTD Industries Inc.
    Canada      71  Abitibi-Price Inc.
                72  Canfor Corporation
                73  Cascades Inc.
                75  Canadian Pacific Forest Products Ltd.
                76  Crestbrook Forest Industries Ltd.
                77  Doman Industries Ltd.
                78  Domtar Inc.
                79  Donohue Inc.
                80  Fletcher Challenge Canada Ltd.
                81  International Forest Products Ltd.
                82  MacMillan Bloedel Ltd.
                83  Noranda Inc.
                84  Noranda Forest Inc.
                85  Perkins Papers Ltd.
                86  Repap Enterprises Corporation Inc.
                87  Rolland Inc.
                88  Scott Paper Ltd.
                89  Tembec Inc.
    Austria     91  Laakirchen
                92  Nettingsdorfer
                93  Zellstoff
               113  Lenzig
               114  Leykam
               115  Steyrermuhl
    Germany     94  Europa Carton
                95  Feldmuhle
                96  Haindl Papier
                97  Hannover
                98  Waldhof-A.
                99  Schwabishe
               100  Zanders
               123  MD Papier
    Italy      101  Saffa
               124  Cartiera
    Holland    102  Berghuizer
               103  Buchmann
               104  Crown van Gelder
               105  Gelderse
               106  N.V. Papierfabriek
               107  Koninklijke
               108  Parenco
    Portugal   109  Portucel-Empresa
               125  Group Caima
               126  Celbi
               127  Soporcel
    UK         110  Associated
               111  BPB Industries
               112  David S. Smith
               135  Bowater Industries
               136  James Cropper
    France     116  Arjomari
               117  Aussedat
               118  Beghin-Say
               119  Kaysersberg
               120  La Cellulose Du.
               121  La Rochette
               122  Sibylle
    Spain      128  Empresa Nacion.
               129  La Papelera
    Swiss      130  Attisholz
               131  Biber Holding
               132  Industrieholding
               133  Holstoff
               134  Papierfab. Perlen

    2.4. Choice of ratios

    The population consists of 47 financial ratios, organized in the benchmarking system into six groups under the headings:

    1. Profitability
    2. Indebtedness
    3. Capital Structure
    4. Liquidity
    5. Working capital
    6. Cash flow ratios

    In selecting the variables for the study we used the cognitive approach (Ketchen & Shook, 1996), i.e., we let experts choose the most important variables for the performance analysis task. We defend that policy by agreeing with Ketchen & Shook (1996), who cite Meyer et al. (1993) and state that in strategy research the variables should be chosen in a way that fosters rich description of a sample's characteristics. We ended up with nine variables. As experts for the task we used ten financial analysts from a large Finnish bank (Vanharanta et al., 1995). The financial analysts had all been in the field for at least 5 years.

    In principle we could have chosen all 47 financial ratios as input variables. We would not have had problems with multicollinearity, but we would probably have run into problems because of the limited number of observations. Too many input variables in combination with too few observations could have affected the neural network's ability to learn.

    The following nine ratios were selected.

    - Operating profit (% of sales) (1)
    - Profit after financial items (% of sales) (1)
    - Return on total assets (ROTA) (1)
    - Return on equity (ROE) (1)
    - Total liabilities (% of sales) (2)
    - Solidity (3)
    - Current ratio (4)
    - Funds from operations (% of sales) (6)
    - Investments (% of sales) (6)

    The numbers in parentheses indicate the appropriate ratio group number shown earlier.

    We note that there are four profitability measures, one indebtedness measure, one capital structure measure, one liquidity measure, no working capital measures and two cash flow measures. It seems reasonable that the emphasis is on profitability in a benchmarking situation.

    3. Constructing the maps

    In this section we give a description of the construction process followed in developing the self-organizing maps. The actual construction work was performed using The Self-Organizing Map Program Package version 3.1 prepared by the SOM Programming Team of the Helsinki University of Technology (Kohonen, 1997).

    We started by standardizing the ratios in the data base using histogram equalization (Klimasauskas, 1991) in order to ease the SOM's learning process and to improve its performance. Histogram equalization is a way of mapping rare figures to a small part of the target range and spreading out frequent figures so that it becomes easier for the neural network to discriminate among frequent figures.
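
    One common way to obtain this effect is to replace each value by its empirical cumulative frequency, so that the transformed values are spread evenly over the target range. The sketch below shows that rank-based variant for a single ratio. It is an assumption that this matches the procedure of Klimasauskas (1991), which the paper does not spell out; the function name equalize is hypothetical.

```python
from bisect import bisect_right

def equalize(values):
    """Map each value to the fraction of observations less than or equal to it.

    The result lies in (0, 1]. Densely populated value ranges are stretched
    out, while isolated extreme values are squeezed towards the ends.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]

# Example: three similar ratios and one outlier.
print(equalize([10, 1000, 11, 12]))  # → [0.25, 1.0, 0.5, 0.75]
```

    In the example, the three nearby values end up a quarter of the range apart, while the outlier no longer dominates the scale.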


    All the maps were constructed in two phases. The purpose of the first phase was to order the randomly initialized weight vectors of the maps to approximately correct values. During the second phase the maps were fine-tuned, i.e. the final ordering of the reference vectors took place.

    The construction of maps is very fast due to the small amount of data available (see next section). Such maps enable comparisons between the financial situations of companies to be made. This approach does include the presumption that the input space for each year contains an adequately comprehensive description of the whole possible input space, i.e. all the realistically possible combinations of financial ratios.

    We constructed maps separately for each of the years 1985, 1986, 1987, 1988, and 1989. The network topology chosen was hexagonal with 15*10 neurons in each map. This is the same network structure as in our previous study (Back et al., 1998). The parameters of the best maps with respect to the average quantization error are given in Table 2.
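
    The two-phase construction can be pictured as two parameter schedules run one after the other. The sketch below uses the 1985 values reported in Table 2 as starting values. Everything else is an assumption made for illustration: the paper does not state how the learning rate and neighborhood width change during a phase, so a linear decay (rate towards zero, width towards one) is assumed here, and the names linear_schedule and PHASES_1985 are hypothetical.

```python
def linear_schedule(step, total_steps, start_rate, start_width):
    """Learning rate and neighborhood width at a given training step.

    Assumed behavior: both values shrink linearly over the phase, the
    rate towards 0 and the width towards 1.
    """
    remaining = 1.0 - step / total_steps
    rate = start_rate * remaining
    width = 1.0 + (start_width - 1.0) * remaining
    return rate, width

# Starting values for the year 1985, taken from Table 2.
PHASES_1985 = [
    {"phase": 1, "length": 1000, "rate": 0.05, "width": 10},   # rough ordering
    {"phase": 2, "length": 95000, "rate": 0.02, "width": 3},   # fine-tuning
]

if __name__ == "__main__":
    for p in PHASES_1985:
        for step in (0, p["length"] // 2):
            rate, width = linear_schedule(step, p["length"], p["rate"], p["width"])
            print(p["phase"], step, round(rate, 4), round(width, 2))
```

    The first phase combines a short run with a wide neighborhood, which orders the map globally; the second, much longer phase uses a narrow neighborhood so that only local adjustments remain.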

    4. Analyzing the maps

    In the construction process hundreds of maps were constructed. The best ones, in respect of average quantization error (shown in Table 2), were more carefully inspected, i.e. the locations of the companies and the values of weights (corresponding to financial ratios) were visualized.

    We proceeded as follows. We identified the clusters from the so-called U-matrix for the year 1985 and determined whether these clusters still existed in the years 1986-89. Alternatively we could have analysed the clustering per year, but in this paper we wanted to study the behaviour of the companies within the fixed clusters.

    Table 2
    Network parameters

    Year  Phase  Training length  Learning rate  Neighborhood width  Quantization error
    1985  1                 1000           0.05                  10
          2               95,000           0.02                   3            0.247267
    1986  1                 1000           0.08                  10
          2              115,000           0.02                   3            0.261194
    1987  1                 1000           0.07                  10
          2               95,000           0.03                   3            0.274494
    1988  1                 1000           0.06                  11
          2              120,000           0.02                   3            0.257365
    1989  1                 1000           0.06                  12
          2              100,000           0.03                   3            0.253538

    4.1. Identifying the clusters

    Based on the U-matrix in Fig. 2 we identified eight (8) clusters, groups A to H, as follows. In the matrix, distances between neurons are given via gray-scaling. The nodes in the matrix marked with a dot represent the neurons. The nodes without dots represent the distances: the darker the node between two neurons, the longer the distance. The clusters are groups of neurons surrounded by dark bordering nodes.

    The interpretation of the eight clusters is given by analysing the weight vectors, the so-called weight maps (Appendix B), where the weight for each neuron is visualized by gray-level imaging, with light shades representing high values and dark shades representing low values. We can look at each input variable separately. The value of the variable operating profit, for example, is high for the neurons on the right hand side of the map and low for the neurons on the left hand side. Hence, a company that is mapped onto the neurons on the right hand side of a self-organizing map has a higher operating profit than the companies on the left hand side. On the other hand the variable total liabilities has a low value in the neurons in the lower part of the map and a high value in the upper part of it. Therefore, a company with high liabilities is mapped on the lower half of the self-organizing map and so on.
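
    Reading a weight map amounts to taking one component of every weight vector and laying the values out in map order; locating a company amounts to finding its best-matching neuron. The sketch below illustrates both on a toy 2 x 2 map. The function names (component_plane, map_position), the row-by-row storage order and all numbers are hypothetical and chosen only for illustration.

```python
import math

def component_plane(weights, cols, rows, k):
    """Values of input variable k for every neuron, arranged as rows x cols."""
    return [[weights[r * cols + c][k] for c in range(cols)] for r in range(rows)]

def map_position(weights, cols, x):
    """(row, column) of the neuron whose weight vector is closest to x."""
    def euclidean(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    best = min(range(len(weights)), key=lambda i: euclidean(weights[i], x))
    return divmod(best, cols)

# Toy map with two variables per neuron (made-up values, stored row by row).
toy_weights = [[0.1, 0.9], [0.8, 0.9],
               [0.1, 0.2], [0.8, 0.2]]
plane = component_plane(toy_weights, cols=2, rows=2, k=0)
position = map_position(toy_weights, 2, [0.9, 0.85])
```

    Here the plane for the first variable is high in the right-hand column, so an observation with a high value of that variable lands on the right-hand side of the toy map.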

    We give here the U-matrix as well as the weight maps for the year 1985 only. A similar analysis is done for all the other years, too. Therefore a group that is physically placed in a certain spot on the self-organizing map for the year 1985 might appear in a totally different place for the other years. However, as is shown below, the eight groups do exist throughout the years even though during the final years of our research period certain new clusters start to emerge. Moreover, the companies mapped into the eight clusters might vary from year to year.

    4.2. Financial performance within the groups

    Fig. 2. U-matrix for the year 1985.

    Our interpretation of the defined groups (see Appendix A) based on weight maps for the year 1985 is as follows:

    - Group A is separated into subgroups A1 and A2. A1 can be considered as an average group. The group is doing rather well regardless of which ratio is used as an indicator. The A1 group consists solely of US companies except for one Central European company. The A2 group is somewhat below average but very close to the A1 group in every respect.
    - Group B is separated into subgroups B1 and B2. Characteristic of group B are high total liabilities and investments combined with small profitability and low solvency. The difference between the subgroups is that B1 has slightly higher values in liabilities and investments than does B2. Moreover, B1 has high operating profit. The subgroup B1 consists mainly of Finnish and Canadian companies, and B2 also includes US companies.
    - Group C is best defined as slightly better than average. It consists mainly of North-American and Central European companies.
    - Group D represents the best companies in terms of high profitability, solidity and cash flow. On the other hand it is a group of low investment companies. It consists of mainly North-American companies and one Swedish company.
    - Group E represents companies with high investments and relatively low solidity but, surprisingly, at the same time the highest liquidity. Profitability is above average. It consists of two North-American, two Swedish and one Finnish company.
    - Group F has a slightly lower profitability and liquidity than group E, but on the other hand better solidity. If it were not for the extremely high current ratio of group E, these two groups would probably have been defined as one. It consists of Finnish and US companies.
    - Group G is almost as good as group D. It would probably have been justified to define groups D and G as subgroups as well, like B1 and B2. It consists of two Swedish, two North-American and two Central European companies.
    - Group H is undoubtedly the most solid group. It is a group of low investments, cash flow and profitability, but high solidity and liquidity. It consists of mainly North-American companies.

    Because the groups were identified with data from the year 1985, the companies in these groups are not always identical for the other years, though the groups clearly exist. Furthermore, we do not have data from the same companies for every year, so some companies are missing from some maps and appear in others. Related to this, we notice that in the years 1988-89 a new group of companies, Group X, starts to emerge, showing another side of the dynamics of the system.

    4.3. Financial performance over time

    In the following we focus only on the financial performance of the Finnish companies over time. As stated previously, most of the Finnish companies are located in group B for the year 1985. Only two companies, namely 25 (Rauma-Repola) and 29 (Yhtyneet), are outside this group. The same pattern continues during the years 1986-89. Most of the Finnish companies are investing heavily with huge liabilities. They have low solidity, weak liquidity and poor profitability based on the ratios chosen for this study.

    In 1986 company 25 joins group B and stays within this group through the remaining years of this study. Company 29 stays outside group B until the last year of this study, joining group B in 1989.
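    The movement of individual companies between groups can be tracked with a very small amount of code. The following is a minimal sketch, not the tooling used in the study: it encodes only the group B membership of companies 25 and 29 as described in the text, and the helper name `year_joined_for_good` is a hypothetical one introduced for illustration.

```python
YEARS = [1985, 1986, 1987, 1988, 1989]

# Group B membership by year, transcribed from the description in the text.
# Company numbers follow the paper: 25 = Rauma-Repola, 29 = Yhtyneet.
IN_GROUP_B = {
    25: {1985: False, 1986: True, 1987: True, 1988: True, 1989: True},
    29: {1985: False, 1986: False, 1987: False, 1988: False, 1989: True},
}


def year_joined_for_good(membership, years):
    """Return the first year from which the company stays in the group
    in every remaining year, or None if it is outside in the final year."""
    joined = None
    for year in years:
        if membership[year]:
            if joined is None:
                joined = year
        else:
            joined = None  # the company left the group again, so reset
    return joined


for company, membership in IN_GROUP_B.items():
    print(company, year_joined_for_good(membership, YEARS))
```

    With group labels read off the yearly maps for every company, the same helper could be applied to the whole data base to list which companies change groups and when.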

    5. Conclusions

    The objective of this study was to investigate the potential of self-organizing maps to support the management of complexity in a large data base by structuring the vast amount of financial data available on companies. Our workbench consisted of a hyperknowledge-based system for financial benchmarking. The data base contained financial data on more than 120 pulp and paper companies worldwide. Using nine different ratios as variables (four measuring profitability, one indebtedness, one capital structure, one liquidity, and two cash flow), we constructed different maps for each of the years 1985, 1986, 1987, 1988, and 1989.
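    The procedure summarized above, one map per year trained on nine ratio variables, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the 5 x 7 map size, the linear learning-rate decay, the Gaussian neighbourhood, the per-year standardization, and the random placeholder data (120 rows standing in for the companies) are all choices made here for the example.

```python
import numpy as np

YEARS = [1985, 1986, 1987, 1988, 1989]
N_RATIOS = 9  # nine financial ratios per company, as in the text


def best_matching_unit(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    d = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(d), d.shape)


def train_som(data, rows=5, cols=7, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a rectangular SOM with the classic online update rule.

    data has shape (n_companies, n_ratios); the result has shape
    (rows, cols, n_ratios). All hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(rows, cols, data.shape[1]))
    # Grid coordinates of every node, used by the neighbourhood function.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)               # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5   # shrinking neighbourhood
            bmu = np.array(best_matching_unit(weights, x))
            dist2 = np.sum((grid - bmu) ** 2, axis=-1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))
            # Pull every node towards x, weighted by its closeness to the BMU.
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights


rng = np.random.default_rng(42)
maps = {}
for year in YEARS:
    # Placeholder data: random numbers standing in for the real ratios.
    ratios = rng.normal(size=(120, N_RATIOS))
    standardized = (ratios - ratios.mean(axis=0)) / ratios.std(axis=0)
    maps[year] = train_som(standardized)
```

    Each entry of `maps` holds one trained weight array. Slicing it along the last axis, for example `maps[1985][:, :, 0]`, gives the kind of per-ratio weight map that is used in the paper to interpret the groups.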

    We anticipate that neural networks can be used in the future for benchmarking purposes to help executives find company characteristics that will lead to sustainable excellence of a company, in other words to help answer the question: Which are the characteristics that lead a company towards long-lasting good performance? Some company characteristics seem to produce and maintain good overall company performance, sustainable profitability, increasing productivity and continuous growth.

    In this investigation we showed how to analyse Finnish pulp and paper companies over time on a worldwide scale. Our study shows that self-organizing maps can be useful for structuring large financial data bases in a meaningful way. We think that this approach, although historically based, is a valuable starting point for benchmarking purposes. From the maps presented, managers can pick out different companies for further and deeper analysis.

    Acknowledgements

    We would like to thank Mikko Irjala for carrying out the practical work of training the networks and Marko Gronroos for helping to get the paper into its final format. We also want to thank the anonymous referees for their constructive comments. The work reported here was carried out within the AnNet-project. The authors wish to thank the Foundation for Economic Education and the Academy of Finland for providing financial support for this project.

    Appendix A

    MAPS FOR THE YEARS 1985-1989

    Appendix B

    WEIGHT MAPS FOR YEAR 1985
