KO KNOWLEDGE ORGANIZATION - Ergon-Verlag · 2021. 4. 6. · Mauri Kaipainen and Antti Hautamäki....


Knowl. Org. 38(2011)No.6

KO KNOWLEDGE ORGANIZATION. Official Bi-Monthly Journal of the International Society for Knowledge Organization. ISSN 0943-7444

International Journal devoted to Concept Theory, Classification, Indexing and Knowledge Representation

Contents

Articles

James Blake. Some Issues in the Classification of Zoology ... 463
Ibrahim Bounhas, Bilel Elayeb, Fabrice Evrard and Yahya Slimani. Organizing Contextual Knowledge for Arabic Text Disambiguation and Terminology Extraction ... 473
Martin Frické. Faceted Classification: Orthogonal Facets and Graphs of Foci? ... 491
Mauri Kaipainen and Antti Hautamäki. Epistemic Pluralism and Multi-Perspective Knowledge Organization: Explorative Conceptualization of Topical Content Domains ... 503
Heejin Park. A Conceptual Framework to Study Folksonomic Interaction ... 515
Deborah Lee. Classifying Musical Performance: The Application of Classification Theories to Concert Programmes ... 530
Editor’s note ... 540


KNOWLEDGE ORGANIZATION

This journal is the organ of the INTERNATIONAL SOCIETY FOR KNOWLEDGE ORGANIZATION (General Secretariat: Vivien PETRAS, Humboldt-Universität zu Berlin, Institut für Bibliotheks- und Informationswissenschaft, Unter den Linden 6, 10099 Berlin, Germany).

Editors

Dr. Richard P. SMIRAGLIA (Editor-in-Chief), School of Information Studies, University of Wisconsin, Milwaukee, Bolton Hall 5th Floor, 3210 N. Maryland Ave., Milwaukee, WI 53211 USA.

Dr. Joseph T. TENNIS (Book Review Editor), The Information School of the University of Washington, Box 352840, Mary Gates Hall Ste 370, Seattle WA 98195-2840 USA.

Dr. Ia MCILWAINE (Literature Editor), Research Fellow, School of Library, Archive & Information Studies, University College London, Gower Street, London WC1E 6BT U.K.

Dr. Nancy WILLIAMSON (Classification Research News Editor), Faculty of Information Studies, University of Toronto, 140 St. George Street, Toronto, Ontario M5S 3G6 Canada.

Hanne ALBRECHTSEN (ISKO News Editor), Institute of Knowledge Sharing, Bureauet, Slotsgade 2, 2nd floor, DK-2200 Copenhagen N, Denmark.

David J. BLOOM (Editorial Assistant), School of Information Studies, University of Wisconsin, Milwaukee, Bolton Hall 5th Floor, 3210 N. Maryland Ave., Milwaukee, WI 53211 USA.

Melodie Joy FOX (Editorial Assistant), School of Information Studies, University of Wisconsin, Milwaukee, Bolton Hall 5th Floor, 3210 N. Maryland Ave., Milwaukee, WI 53211 USA.

Ann M. GRAF (Editorial Assistant), School of Information Studies, University of Wisconsin, Milwaukee, Bolton Hall 5th Floor, 3210 N. Maryland Ave., Milwaukee, WI 53211 USA.

Consulting Editors

Dr. Clare BEGHTOL, Faculty of Information Studies, University of Toronto, 140 St. George Street, Toronto, Ontario M5S 3G6, Canada.

Dr. Gerhard BUDIN, Dept. of Philosophy of Science, University of Vienna, Sensengasse 8, A-1090 Wien, Austria.

Prof. Jesús GASCÓN GARCÍA, Facultat de Biblioteconomia i Documentació, Universitat de Barcelona, C. Melcior de Palau, 140, 08014 Barcelona, Spain.

Claudio GNOLI, University of Pavia, Mathematics Department Library, via Ferrata 1, I-27100 Pavia, Italy.

Dr. Rebecca GREEN, Assistant Editor, Dewey Decimal Classification, Dewey Editorial Office, Library of Congress, Decimal Classification Division, 101 Independence Ave., S.E., Washington, DC 20540-4330, USA.

Dr. José Augusto Chaves GUIMARÃES, Departamento de Ciência da Informação, Universidade Estadual Paulista–UNESP, Av. Hygino Muzzi Filho 737, 17525-900 Marília SP Brazil.

Dr. Birger HJØRLAND, Royal School of Library and Information Science, Copenhagen, Denmark.

Dr. Barbara H. KWASNIK, School of Information Studies, Syracuse University, Syracuse, NY 13244 USA, (315) 443-4547 voice, (315) 443-4506 fax.

Dr. Marianne LYKKE, e-Learning Lab, Center for User-driven Innovation, Learning and Design, Department of Communication, Aalborg University, Kroghstraede 1, room 2.023, 9220 Aalborg OE, Denmark.

Dr. Jens-Erik MAI, Faculty of Information Studies, University of Toronto, 140 St. George Street, Toronto, Ontario M5S 3G6, Canada.

Ms. Joan S. MITCHELL, Editor in Chief, Dewey Decimal Classification, OCLC Online Computer Library Center, Inc., 6565 Frantz Road, Dublin, OH 43017-3395 USA.

Dr. Widad MUSTAFA el HADI, URF IDIST, Université Charles de Gaulle Lille 3, BP 149, 59653 Villeneuve D’Ascq, France.

H. Peter OHLY, GESIS – Leibniz Institute for the Social Sciences, Lennestr. 30, 53113 Bonn, Germany.

Dr. Hope A. OLSON, School of Information Studies, 522 Bolton Hall, University of Wisconsin-Milwaukee, Milwaukee, WI 53201 USA.

Dr. M. P. SATIJA, Guru Nanak Dev University, School of Library and Information Science, Amritsar-143 005, India.

Dr. Otto SECHSER, In der Ey 37, CH-8047 Zürich, Switzerland

Dr. Winfried SCHMITZ-ESSER, Salvatorgasse 23, 6060 Hall, Tirol, Austria.

Dr. Dagobert SOERGEL, Department of Library and Information Studies, Graduate School of Education, University at Buffalo, 534 Baldy Hall, Buffalo, NY 14260-1020.

Dr. Eduard R. SUKIASYAN, Vozdvizhenka 3, RU-101000, Moscow, Russia.

Dr. Martin van der WALT, Department of Information Science, University of Stellenbosch, Private Bag X1, Stellenbosch 7602, South Africa.

Prof. Dr. Harald ZIMMERMANN, Softex, Schmollerstrasse 31, D-66111 Saarbrücken, Germany

Founded under the title International Classification in 1974 by Dr. Ingetraut Dahlberg, the founding president of ISKO. Dr. Dahlberg served as the journal's editor from 1974 to 1997, and as its publisher (Indeks Verlag of Frankfurt) from 1981 to 1997.


Knowl. Org. 38(2011)No.6 J. Blake. Some Issues in the Classification of Zoology

463

Some Issues in the Classification of Zoology†

James Blake

Central Library, Imperial College London, London, SW7 2AZ

James Blake qualified as a librarian in 2009. He works in the acquisitions and metadata team of the library of Imperial College London. His interests include the relationship between scholarly and bibliographic classifications in all subjects, and all aspects of zoological classification.

Blake, James. Some Issues in the Classification of Zoology. Knowledge Organization, 38(6), 463-472. 39 references. ABSTRACT: This paper identifies and discusses features of the classification of mammals that are relevant to the bibliographic classification of the subject. The tendency of zoological classifications to change, the differing sizes of groups of species, the use zoologists make of groupings other than taxa, and the links in zoology between classification and nomenclature are identified as key themes the bibliographic classificationist needs to be aware of. The impact of cladistics, a novel classificatory method and philosophy adopted by zoologists in the last few decades, is identified as the defining feature of the current, rather turbulent, state of zoological classification. However, because zoologists still employ some non-cladistic classifications, because cladistic classifications are in some ways unsuited to optimal information storage and retrieval, and because some of their consequences for zoological classification are as yet unknown, bibliographic classifications cannot be modelled entirely on them.

Received 3 January 2011; revised and accepted 29 April 2011

† This paper is based on a thesis of the same title, completed as part of an MA in Library and Information Studies at University College London in 2009, and available at http://62.32.98.6/elibsql2uk_Z10300UK_Documents/Catalogued_PDFs/Some_issues_in_the_classification_of_zoology.PDF. Thanks are due to Vanda Broughton, who supervised the MA thesis, and to Diane Tough of the Natural History Museum, London, and Ann Sylph of the Zoological Society of London, who both provided valuable insights into the classification of zoological literature.

1.0 Introduction

The classification of animals is central to the discipline of zoology (Heywood 1975, 57; de Queiroz and Gauthier 1992, 472), and zoologists see it as serving two functions. It records scientific knowledge (to be precise, our understanding of the genealogical relationships between species), and it is a method of storing and retrieving information about the different species and groups of species (Simpson 1945, 4, 13; Mayr 1982, 148-9; Groves 2001a, 30). The bibliographic classificationist is likely to be pleased that zoologists place so much importance on classification, and in particular that they view it as a tool for information retrieval. However, a comparison of zoological classifications and the corresponding bibliographic classifications shows that, while the latter are clearly based on the former, they differ from them in significant ways, which are not easily summarised.

There are several reasons for this, two of which deserve mention here. Firstly, the classification of animals is a complex and often problematic activity; as this paper shows, several key features of the classifications used by zoologists need to be understood before they can be used as a basis for bibliographic classifications. Secondly, while bibliographic classificationists are likely to be interested in both scientific accuracy and efficacy in information retrieval, it is reasonable to assume that, compared with zoologists, they will give more weight to the latter. How much more weight they should give is not a straightforward question. In this regard it is instructive to note that Hjørland and Nicolaisen (2004, 56-7) argue that bibliographic classificationists should in most circumstances base their schemes on scholarly classifications, while New and Trotter (1996, 5) assert that the importance of literary warrant is “hard to overestimate.” Reconciling these two injunctions is likely to be central to the work of the bibliographic classificationist.

In this paper, one group of animals, the mammals, is used as a case study. Comments are offered on aspects of the Dewey Decimal Classification’s treatment of mammalogy, but no attempt is made to evaluate the scheme comprehensively or to compare it with other schemes.

2.0 Change in zoological classification

Firstly, it is important to note what has been described as the “inherent fluidity” of the classification of organisms such as mammals (Wilson and Reeder 2005b, xix). Comparison of different classifications, such as those summarised by Rose and Archibald (2005, 3), shows that change is constant and of several kinds. The differences between the influential classifications by Simpson (1945) and Wilson and Reeder (2005a) illustrate this. Simpson’s 18 orders of mammal have become 29 in Wilson and Reeder, and there are numerous changes in the sequence of orders, too complex to summarise. Other changes reflect new conclusions about relationships within orders. For example, Simpson divides the order Carnivora into terrestrial and marine forms: cats, dogs, bears, etc. (Fissipedia) on the one hand, and seals and sealions (Pinnipedia) on the other. In Wozencraft’s (2005) contribution to Wilson and Reeder (2005a), the primary division is between cats and their relatives (Feliformia) and dogs and their relatives (Caniformia), with the seals and sealions becoming a subdivision of the Caniformia.

An examination of change in zoological taxonomy shows that it has at least two major causes: new theories about the relationships between species, and new ideas about the information a classification should convey. In recent decades, major changes have been caused both by molecular studies, which have led to new theories about the relationships between species, and by cladistics, which represents a new conception of how a classification should reflect those relationships.

Molecular studies mostly focus on DNA and have proved a powerful tool for studying the relationships between taxa (Rose and Archibald 2005, 2; Lecointre and Le Guyader 2006, 5). The word “revolution,” sometimes used in connection with these studies (Groves 2001a, 10), is often also applied to cladistics (see for example Groves 2001a, 8). Cladistics originated in the 1950s and has more recently won near-universal acceptance among zoologists engaged in classificatory work (Groves 2001a, 8; Mishler 2009, 63). It is both a philosophy and a suite of methods; it is the philosophy that is relevant to the present discussion.

In zoological classification the taxon, “a group of organisms that is recognised as a formal unit” (Lecointre and Le Guyader 2006, 23), has long been a key concept. In cladistic philosophy, a higher taxon (any taxon above species level) must be a clade: a group composed of an ancestral species, all of its descendants, and no other organisms (Groves 2001a, 9). Ancestry is seen as the only criterion for classification.

The distinctiveness of the cladistic approach can be appreciated by comparing it with another classificatory school, evolutionary taxonomy, one that has now been largely discarded (Groves 2001a, 7). An issue in the classification of humans and our closest living relatives illustrates the difference in approach. Traditionally, humans were placed in one family, the Hominidae, and apes in another, the Pongidae (Simpson 1945, 67-8). Molecular studies, however, indicate that chimpanzees are more closely related to humans than they are to gorillas (Lecointre and Le Guyader 2006, 494).

Figure 1. Evolutionary relationships between gorillas, chimpanzees and humans

With a cladistic approach, the ape-human distinction cannot be maintained, because a chimpanzee-gorilla grouping that excludes humans is not a clade. Evolutionary taxonomists, by contrast, would not necessarily object to the ape-human distinction, even while accepting the molecular data. They would view the traditional ape family as acceptable because it consists of an ancestral species and some of its descendants. Furthermore, they might see value in placing humans in a separate family to indicate how different we are from our relatives in, for instance, intelligence. Cladists regard this approach as unsatisfactory because this “evolutionary distance” cannot be measured (Groves 2001a, 7). Cladistics thus brings both simplicity and rigour to the process of classification, contrasting with the more complex and subjective judgements necessary in earlier schools of zoological classification.
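Because the clade criterion depends only on ancestry, it can be tested mechanically. The following is an illustrative sketch, not drawn from the sources cited: it assumes a rooted tree of living taxa written as nested Python tuples, and the helper names `leaves`, `is_clade` and `smallest_clade_containing` are hypothetical. A group counts as a clade if it is exactly the set of leaves below some node; the tree encodes only the relationship stated above, that chimpanzees are closer to humans than to gorillas.

```python
# Illustrative sketch: a rooted tree of living taxa as nested tuples.
# Leaves are taxon names; each inner tuple stands for a common ancestor.
# Assumed topology (from the text): chimpanzees are closer to humans
# than to gorillas.
TREE = (("human", "chimpanzee"), "gorilla")


def leaves(node):
    """Return the set of leaf names at or below `node`."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def is_clade(tree, group):
    """True if `group` is exactly the leaf set of some node in `tree`:
    an ancestor, all of its descendants, and nothing else."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_clade(child, group) for child in tree)


def smallest_clade_containing(tree, group):
    """Leaf set of the smallest node whose descendants include all of `group`."""
    group = set(group)
    if not isinstance(tree, str):
        for child in tree:
            if group <= leaves(child):
                return smallest_clade_containing(child, group)
    return leaves(tree)


apes = {"chimpanzee", "gorilla"}
print(is_clade(TREE, apes))                          # False
print(smallest_clade_containing(TREE, apes) - apes)  # {'human'}
print(is_clade(TREE, {"human", "chimpanzee"}))       # True
```

The last two calls restate the cladist's objection: the smallest clade containing chimpanzees and gorillas also contains humans, so a chimpanzee-gorilla family is an ancestor plus only some of its descendants, which is acceptable to an evolutionary taxonomist but not to a cladist.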

The combined effects of molecular studies and cladistics have in some respects been relatively modest for the classification of mammals. Mammals as a whole are still regarded as forming a valid taxon, as are many important groupings, such as rodents, bats, primates and carnivorans. In another sense, cladistics has brought profound change, because a rigorously cladistic approach produces hierarchies of taxa very different in shape from those of traditional taxonomy. The diagram below shows a traditional classification of the family Hominidae (as defined by Groves 2005, 181-2). The Linnaean system of ranks provides the classification's basic structure. As in this diagram, Linnaean classifications often make use of certain obligatory ranks only, in this case family and genus. Intermediate ranks such as subfamily are omitted even though their use would convey information about relationships between the taxa.

Figure 2. The Hominidae divided into genera

A rigorously cladistic approach produces a classification that looks rather different, as shown below:

Figure 3. The Hominidae divided into clades

The differences between these two classifications stem from the information each aims to convey, rather than from conflicting views about the relationships between the species concerned. A distinctive feature of the second approach is that many more levels in the hierarchy are shown (in other words, there are many more higher taxa) and that each higher taxon contains only two daughter taxa. It should be noted that the two hierarchies shown above represent extremes.

Many Linnaean classifications use more ranks than the obligatory ones (for example Simpson 1945). Equally, even the most rigorously cladistic classifications are generally unable to present complete hierarchies of clades, principally because zoologists know too little about the relationships between the taxa concerned.
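The difference in shape can be made concrete by counting levels and higher taxa in each kind of hierarchy. In this sketch the four genera (Pongo, Gorilla, Pan and Homo) and their branching order are assumptions standing in for Figures 2 and 3, which are not reproduced here; the hierarchies are nested tuples and the helper names `levels` and `higher_taxa` are hypothetical.

```python
# Illustrative sketch; the genera and branching order are assumed,
# not copied from the paper's figures.
# Linnaean: one family (Hominidae) that lists its genera directly.
LINNAEAN = ("Pongo", "Gorilla", "Pan", "Homo")
# Cladistic: every higher taxon splits into exactly two daughter taxa.
CLADISTIC = ("Pongo", ("Gorilla", ("Pan", "Homo")))


def levels(node):
    """Number of hierarchical levels above the genera."""
    if isinstance(node, str):
        return 0
    return 1 + max(levels(child) for child in node)


def higher_taxa(node):
    """Number of inner nodes, i.e. groupings above the genus level."""
    if isinstance(node, str):
        return 0
    return 1 + sum(higher_taxa(child) for child in node)


for label, tree in (("Linnaean", LINNAEAN), ("cladistic", CLADISTIC)):
    print(f"{label}: levels={levels(tree)}, higher taxa={higher_taxa(tree)}")
```

With only four genera the cladistic version already needs three nested groupings where the Linnaean one needs a single family. A fully resolved two-way hierarchy over n genera always has n - 1 higher taxa, so the gap widens as more taxa are added.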

Turning to the use of the two kinds of hierarchy in the zoological taxonomic literature, a distinction can be drawn between works whose main aim is to provide information about the relationships between higher taxa (for instance McKenna and Bell 1997; Lecointre and Le Guyader 2006) and those that principally provide lists of species (such as Wilson and Reeder 2005a). The latter are less likely to follow a strictly cladistic approach, being interested in the higher taxa more as a way of structuring a list of species than as a mapping of evolutionary relationships; information retrieval is prioritised over the expression of scientific knowledge. A Linnaean classification has benefits from an information retrieval point of view: as well as being familiar, a Linnaean hierarchy has fewer levels, which leads to a simpler arrangement of the material. A striking example of this approach is the website Encyclopedia of Life, which aims to offer a web page for every living species of organism and makes use of only the seven obligatory Linnaean ranks, from species to kingdom.

3.0 Disparities in the size of higher taxa

Another feature of the classification of organisms that the bibliographic classificationist needs to be aware of is the tendency of higher taxa to vary greatly in the number of species they contain. As Linnaean and cladistic hierarchies differ in structure, they need to be considered separately when quantifying this. The Linnaean classification of mammals can be examined using Wilson and Reeder's (2005b, xxvi-xxx) summary of the number of species and genera in different orders. In their classification, 42 percent of species are members of the rodent order, while another 21 percent are bats; 11 of the 29 orders have 10 or fewer species.

Analysis of the number of mammal species in various clades shows that cladistics makes the disparities between species numbers in different higher taxa even greater. Here the clades described by Lecointre and Le Guyader (2006, 389) are considered in conjunction with species numbers from Wilson and Reeder (2005b, xxvi-xxx), a work which is more authoritative at the species level but does not attempt a rigorously cladistic classification.


Looking at mammals cladistically, the primary division is between monotremes (5 species) and therians (5,411 species). The therians then divide into 331 marsupials and 5,080 placentals. Among the placentals, the primary division is between 31 xenarthrans (American anteaters and relatives) and 5,049 others.

Figure 5. Species numbers in some major mammalian clades

Examining the mammals as a whole, we do find sister groups where the difference in size is less extreme. For example, the marsupials divide into 93 opossums and 238 others. Deeper down the hierarchy, however, there are still many sister groups of wildly unequal size.
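The species counts quoted above can be checked and compared with a little arithmetic. This sketch only restates those figures; the data layout and the function name `sister_ratios` are illustrative assumptions. It confirms that each pair of sister groups sums to its parent wherever the parent's total is also given, and reports how many times larger the bigger sister is.

```python
# Sister-group species counts as quoted in the text
# ("therians" = marsupials plus placentals).
SPLITS = {
    "mammals": (("monotremes", 5), ("therians", 5411)),
    "therians": (("marsupials", 331), ("placentals", 5080)),
    "placentals": (("xenarthrans", 31), ("other placentals", 5049)),
    "marsupials": (("opossums", 93), ("other marsupials", 238)),
}


def sister_ratios(splits):
    """Return {parent: larger/smaller} after checking that daughters sum
    to the parent's own count wherever that count appears elsewhere."""
    counts = {name: n for pair in splits.values() for name, n in pair}
    result = {}
    for parent, ((_, a), (_, b)) in splits.items():
        if parent in counts and counts[parent] != a + b:
            raise ValueError(f"{parent}: daughters sum to {a + b}, not {counts[parent]}")
        result[parent] = max(a, b) / min(a, b)
    return result


for parent, ratio in sister_ratios(SPLITS).items():
    print(f"{parent}: larger sister is {ratio:.1f} times the smaller")
```

The marsupial split (roughly 2.6 to 1) is the relatively even case mentioned above; the top-level mammalian split is more than a thousand to one.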

Bibliographic classificationists have discussed the usefulness of notational expressivity from a variety of standpoints (Vickery 1956; McIlwaine 1996; Broughton 1999), while Broughton (1999) has also identified the sensible use of notational space as one of the features of a well-constructed classification. The divergent sizes of higher taxa mean that a bibliographic classification whose notation attempts to encapsulate the hierarchy of those taxa will be wasteful of notational space. In a classification based on the Linnaean model, taxa with few species, such as monotremes, will be allotted far more space than they are likely to need. The problem will be more acute for a bibliographic classification that attempts to follow a strictly cladistic approach by, for instance, allotting monotremes the same notational space as all the other mammals put together. It seems doubtful whether even a specialist scheme employing a large notational base could model a schedule on cladistic hierarchies to any meaningful extent, though techniques such as Ranganathan’s telescopic notation would help, at least to some extent (Bhattacharyya and Ranganathan 1974, 138-9).
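To gauge the scale of the waste, consider a hypothetical notation that mirrors the cladistic hierarchy strictly, giving each daughter clade an equal share of its parent's notational space at every split. The sketch below illustrates that assumption only and does not describe any existing scheme; it reuses the species counts quoted earlier in this section, and the function name `allocate` is hypothetical.

```python
# Hypothetical notation: at each cladistic split, daughters share the
# parent's notational space equally. Species counts are those in the text.
CLADES = ("mammals", [
    ("monotremes", 5),
    ("therians", [
        ("marsupials", 331),
        ("placentals", [
            ("xenarthrans", 31),
            ("other placentals", 5049),
        ]),
    ]),
])


def allocate(node, space=1.0):
    """Yield (name, notation_share, species) for each terminal group."""
    name, body = node
    if isinstance(body, int):  # terminal group with a species count
        yield name, space, body
        return
    for child in body:
        yield from allocate(child, space / len(body))


rows = list(allocate(CLADES))
total_species = sum(n for _, _, n in rows)
for name, space, n in rows:
    print(f"{name:<17} notation {space:6.1%}   species {n / total_species:6.2%}")
```

Under this assumption the five monotreme species receive half of all mammal notation, while the 5,049 placentals other than xenarthrans, over 93 percent of species, receive one eighth.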

As already noted, zoologists see biological classification both as an expression of theories about the relationships between taxa and as an information storage and retrieval system. Mayr (1982, 240-1) argues that the second of these functions imposes limits both on the number of taxa a higher taxon can sensibly contain and on the number of levels appropriate in a hierarchy. Thus cladistics, with its deep hierarchies, can be seen as a move towards greater scientific accuracy at the expense of efficient information retrieval. This inefficiency with regard to information retrieval helps explain why many monographs and other publications continue to organise their material using Linnaean ranks rather than hierarchies of clades.

4.0 Quasi-taxonomic groupings in zoology

Although the concept of the taxon has always been important, zoologists also group animals in a great number of other ways, even if they do not necessarily think of this activity as classification. Many of these groupings, such as the faunas of particular countries, have little to do with evolutionary relationships; the ways in which bibliographic classifications may make provision for all of these are beyond the scope of this paper. Other groupings may be termed quasi-taxonomic because, while they are not taxa, they bear some relationship to them.

An example is monotremes-and-marsupials. It has long been agreed that the deepest division within living mammals lies between the monotremes on the one hand and the marsupial and placental mammals on the other (Simpson 1945, 39; Lecointre and Le Guyader 2006, 389). There have, however, always been many monographs and other publications that take as their subject monotremes-and-marsupials, even though this combination of groups does not constitute a taxon. A search of WorldCat found 39 monographs about monotremes-and-marsupials, but only 20 solely about the monotremes. (This total excludes works on particular kinds of monotreme.) The titles of two monographs illustrate the principal reasons these taxa are so often linked: Monotremes and Marsupials: the Other Mammals (Dawson 1983) and A Handbook of New Guinea Marsupials and Monotremes (Menzies 1991). Monotremes and marsupials are united by their otherness: they are different to the placental species that account for the great majority of mammals. They also together form the distinctive part of the Australasian mammal fauna (Wilson and Reeder 2005a).

Figure 4. Percentage of mammal species in different orders

It is noteworthy that works on monotremes-and-marsupials continue to be written in the cladistic era. The most zealous cladists, such as Lecointre and Le Guyader (2006, 6-7), criticise the use of such groupings, pointing out an inconsistency in the way contemporary zoologists subscribe to cladistic theory but continue to study, and write about, non-cladistic groups. Yet it seems likely that many quasi-taxonomic groupings will continue to prove useful to zoologists. Monotremes-and-marsupials, for example, provide an obvious focus for an Australasia-based mammalogist. Some of these quasi-taxonomic groups were once regarded as taxa; although zoologists no longer believe them to be such, they continue to be studied and written about. Hoofed mammals, which form the subject of works such as Exotic Animal Field Guide: Nonnative Hoofed Mammals in the United States (Mungall 2007), are an example.

While cladistics has focused the attention of taxonomists on defining taxa rigorously, it may also be having the effect of creating a greater division between the groupings zoologists create as part of their taxonomic work and the groupings they study and write about for other purposes. Cladistics now has very wide acceptance among taxonomists. The strenuous efforts made in the late twentieth century by zoologists such as Mayr (1982, 209-50; 1995) to argue the case for other schools of taxonomy would seem to have failed. Yet zoologists’ acceptance of cladistics must be seen in the context of their practical work with non-cladistic groupings. In one sense, the cladists' victory has been incomplete. This is even more apparent beyond mammalogy: major groups of animals which are no longer regarded as valid taxa, such as fishes and reptiles, continue to be studied and written about (see for example Nelson 2006; Vitt and Caldwell 2009).

Bibliographic classifications need to make provision for these quasi-taxonomic groups. In the case of mammals, relatively few quasi-taxonomic groups seem to have a significant literature, meaning that it should be feasible to offer specific classmarks, or specific instructions, for each of these in any schedule. While few in number, these groups can account for a significant number of publications, and so bibliographic classificationists are likely to find it worthwhile to spend time working out how to make provision for them.

5.0 Change and ambiguity in zoological nomenclature

There is an intimate relationship between zoological classification and zoological nomenclature, and the bibliographic classificationist needs to be aware of the complications that arise from this. The current system of zoological nomenclature (summarised by Mayr 1982, 171-5) derives from the work of Linnaeus in the eighteenth century. Species are given a two-part scientific name, with the first element in the name indicating the genus the species is part of. Linnaeus grouped genera into orders, orders into classes, and classes into kingdoms. Other ranks, such as phylum, have been added since. It is now obligatory to assign species to a family, a rank between genus and order (McKenna and Bell 1997, 20), while other, intermediate ranks are used at taxonomists' discretion.

While it is common knowledge that the vernacular names of animals are often uninformative or misleading about a species’ affinities, it is perhaps less widely appreciated that, because of the link between nomenclature and classification, as well as other factors, scientific names are often also ambiguous and liable to become outdated. This is despite the existence of well-established rules for naming taxa (summarised by Groves 2001a, 21-2), which aim to limit the potential for confusion.

New theories about the relationships between taxa often mean that existing names take on new meanings, or that new names need to be coined for the same animals. For example, Simpson (1945, 101) places four species of river dolphin in the family Platanistidae. Mead and Brownell (2005, 738) consider three of these different enough to be classed in a separate family, leaving just Platanista, from the Indus and Ganges, in the Platanistidae. When a zoologist uses the term Platanistidae, it may therefore be unclear which animals are being referred to. Moreover, just as one scientific name can refer to different taxa, so multiple names can refer to the same animal or group of animals: Simpson’s Platanistinae (which is a subdivision of his Platanistidae) and Mead and Brownell’s Platanistidae both refer to the river dolphins of the Indus and Ganges.
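For an index or authority file, one consequence is that a scientific name alone may not identify a group, whereas a name paired with the work whose usage it follows can. The sketch below encodes only the Platanistidae example just given; the data structure and the names `USAGES`, `meanings`, `names_for` and `is_ambiguous` are hypothetical and are not drawn from DDC or any other scheme.

```python
from collections import defaultdict

# (scientific name, usage followed, group of animals meant), from the text.
USAGES = [
    ("Platanistidae", "Simpson 1945", "four species of river dolphin"),
    ("Platanistinae", "Simpson 1945", "river dolphins of the Indus and Ganges"),
    ("Platanistidae", "Mead and Brownell 2005", "river dolphins of the Indus and Ganges"),
]

meanings = defaultdict(set)   # name -> groups it has been used for
names_for = defaultdict(set)  # group -> (name, usage) pairs that denote it
for name, usage, group in USAGES:
    meanings[name].add(group)
    names_for[group].add((name, usage))


def is_ambiguous(name):
    """A name is ambiguous if different usages apply it to different groups."""
    return len(meanings.get(name, set())) > 1


print(is_ambiguous("Platanistidae"))  # True
print(sorted(names_for["river dolphins of the Indus and Ganges"]))
```

The two lookups capture both problems described above: one name with two meanings, and one group with two names.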

These ambiguities mean that extensive guidance may be necessary if cataloguers and other non-zoologists using bibliographic classifications are to classify works correctly. While scientific names are often less ambiguous than their vernacular equivalents, the reverse can be true; in English, “river dolphin” is an example. Therefore a scheme that uses both vernacular and scientific names will often be preferable. It is noteworthy that in successive editions, the Dewey Decimal Classification (DDC) has gradually provided both increasingly comprehensive lists of vernacular names to complement the scientific ones and more guidance about potential sources of confusion.

6.0 Nomenclature: current debates

There is currently much debate among zoologists about whether the Linnaean system of nomenclature should be retained, modified, or replaced. This is fuelled both by a long-standing awareness of the arbitrary nature of important elements of the system and by newer uncertainties over whether it can be satisfactorily combined with cladistic classification. There is agreement that the ranks assigned to taxa are arbitrary and artificial, even if this is not necessarily true of the taxa themselves. For example, Rose and Archibald (2005, 2) note that the meaning of the term “order” has gradually shifted over the centuries since Linnaeus, and that it now denotes much narrower groupings than it originally did. As the ranks are artificial, the Linnaean system's privileging of obligatory ranks such as order and family is artificial too.

Although the concept "species" is problematic (de Queiroz 2007), recent debates about nomenclature have focused more on higher taxa. Many suggestions have been made. For example, Groves (2001a, 17-20) discusses the possibility that ranks might be used to identify taxa which emerged at a particular time, with the rank of genus, for instance, being reserved for taxa which first appear in the fossil record four to six million years ago.

Other taxonomists have suggested that each rank should represent a particular level in the cladistic hierarchy (Lecointre and Le Guyader 2006, 23). This represents an attempt to do rigorously something which taxonomy has long aimed at in rather a vague manner. As with all but the most modest proposals for change, there would be upheaval. For example, Lecointre and Le Guyader (2006, 23) demonstrate that while birds and mammals are traditionally both assigned the rank of class, birds are now thought to occupy a deeper position in the hierarchy of vertebrates. If mammals are to remain a class, birds will have to become, perhaps, an order. An additional problem lies in the fact that many more ranks would need to be used. This is because, as discussed above, Linnaean and cladistic hierarchies have very different shapes. McKenna and Bell (1997) attempt a partial alignment of rank with position in the cladistic hierarchy and, as a result, have to use an extensive range of obscure and sometimes newly coined ranks, such as magnorder, grandorder, and parvorder.
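The point about shape can be made concrete with a toy computation. The sketch below uses hypothetical clade labels A to F rather than real taxa, represents a cladogram as nested Python tuples, and counts how many levels below the root would be needed if every nested clade received its own rank. A deep, ladder-like tree of the kind cladistic analyses often produce needs far more levels than a shallow grouping resembling a Linnaean arrangement of the same six terminals. The function names are illustrative assumptions, not part of any taxonomic software.

```python
from typing import Union

# A clade is either a terminal taxon (a string) or a tuple of sister clades.
Clade = Union[str, tuple]

def leaf_depths(clade: Clade, depth: int = 0) -> dict[str, int]:
    """Return the nesting depth of every terminal taxon in a cladogram."""
    if isinstance(clade, str):
        return {clade: depth}
    depths: dict[str, int] = {}
    for child in clade:
        depths.update(leaf_depths(child, depth + 1))
    return depths

def ranks_below_root(clade: Clade) -> int:
    """Levels needed below the root if every nested clade got its own rank."""
    return max(leaf_depths(clade).values())

# Hypothetical cladograms with the same six terminals, A to F.
ladder = ("A", ("B", ("C", ("D", ("E", "F")))))   # deep, asymmetric
balanced = (("A", "B", "C"), ("D", "E", "F"))     # shallow, symmetric

print(ranks_below_root(ladder))    # 5
print(ranks_below_root(balanced))  # 2
```

Counting depth in this way is only a rough proxy: real proposals must also decide whether a rank is tied to absolute depth from a fixed root or to relative position within a group.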

Mishler (2009, 64) suggests that the use of ranks is incompatible with a genuinely cladistic approach to classification. Similar thinking is apparent in the proposal for the PhyloCode (Cantino and de Queiroz 2010), which is presented by its authors as an alternative to the Linnaean system. The PhyloCode makes the assignment of ranks to clades optional. This proposal does have some advantages. For example, in Linnaean nomenclature rank names are often inflected: in animals (though not plants), family names end in -idae and subfamily names in -inae. These names therefore have to be amended if changes in our conception of the relationships between taxa mean that they move up or down the hierarchy. If it is decided that the river dolphins of the Indus and Ganges are best ranked as a family rather than a subfamily, their name has to change from Platanistinae to Platanistidae. No such change is necessary with the PhyloCode, which thus has the potential to bring additional stability to zoological nomenclature by breaking some of the links between taxonomy and nomenclature. As a result, however, names convey less information in the PhyloCode: an uninflected and unranked clade name tells us nothing about how the taxon concerned is related to other taxa (Vitt and Caldwell 2009, 24). Vitt and Caldwell also point out that any long-term benefits the PhyloCode might bring would need to be balanced against the huge initial upheaval as the switch was made.
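The trade-off between stability and information can be illustrated with a minimal sketch. It models only the two animal suffixes mentioned above (-idae for families, -inae for subfamilies), takes the name stem as given rather than deriving it from a type genus, and uses hypothetical function names; it is not an implementation of the zoological code. Re-ranking forces a change of name, while an inflected name lets a reader recover the rank and an unranked name does not.

```python
from typing import Optional

# Only the two family-group suffixes discussed in the text are modelled.
SUFFIX_BY_RANK = {"family": "idae", "subfamily": "inae"}

def inflected_name(stem: str, rank: str) -> str:
    """Attach the rank suffix to a name stem (stem derivation is not modelled)."""
    return stem + SUFFIX_BY_RANK[rank]

def rerank(name: str, old_rank: str, new_rank: str) -> str:
    """Return the name a taxon must take when it moves to a different rank."""
    suffix = SUFFIX_BY_RANK[old_rank]
    if not name.endswith(suffix):
        raise ValueError(f"{name!r} does not end in the {old_rank} suffix -{suffix}")
    return inflected_name(name[: -len(suffix)], new_rank)

def rank_from_name(name: str) -> Optional[str]:
    """Recover the rank encoded in an inflected name, or None if there is none."""
    for rank, suffix in SUFFIX_BY_RANK.items():
        if name.endswith(suffix):
            return rank
    return None

print(rerank("Platanistinae", "subfamily", "family"))  # Platanistidae
print(rank_from_name("Platanistinae"))                 # subfamily
print(rank_from_name("Cetacea"))                       # None
```

Under rank-free naming, `rerank` would simply have nothing to do, and `rank_from_name` would have nothing to read: that is the stability gained and the information lost. Cetacea, an order name, carries no standard rank suffix, so even in Linnaean usage its name alone does not reveal its rank.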

It does not seem that any consensus is yet emerging about the future of nomenclature in the cladistic era (in addition to the works cited in the three paragraphs above, see for example Schuh 2003; Kuntner and Agnarsson 2006; Mishler 2009). Debates among these taxonomists often centre on questions of how to balance stability with currency and how to combine effective information storage and retrieval with the expression of our understanding of the evolutionary relationships between taxa. For example, Groves (2001a, 6-7, 17) offers thoughts on when scientific accuracy should take precedence over stability and when the reverse is of benefit.

In practice, much recent zoological literature makes pragmatic compromises. McKenna and Bell (1997, 20) include some groupings that are not valid clades in their classification, as this reduces the number of ranks they need to employ. Groves (2001a, 18) believes it acceptable to use some ranks to enhance information retrieval by dividing large taxa into manageable units, even if those units are not valid taxa in themselves.

With nomenclature such a live topic among zoologists, it would be unwise for bibliographic classification schemes to rely solely, or perhaps even primarily, on current scientific names or ranks to define the contents of classes. For example, if inflected rank names are retained, but clades are re-ranked according to age, then a great number of taxa will have names with different inflections. If taxonomists decide that rank-free nomenclature is the appropriate and desirable complement to cladistic classification, there will be even greater consequences for the bibliographic classificationist. This is because, at present, the obligatory ranks provide an obvious way to organise a schedule for zoological literature, for example playing a key role in DDC. Furthermore, without ranks, hierarchies of taxa will tend to be of the cladistic rather than Linnaean kind; as discussed above, these hierarchies present problems for the bibliographic classificationist.

7.0 The current state of the classification of mammals

While historically the classification of mammals has been in a constant state of change, the rate of change has not been uniform. For example, the historical review by Rose and Archibald (2005, 3) shows that Simpson (1945) ushered in a period of relative stability, his classification forming the basis for major works as late as Nowak and Paradiso (1983). Soon after this, the effects of molecular studies and cladistics became more apparent, meaning that the classifications of McKenna and Bell (1997) and Wilson and Reeder (2005a) are different both from each other and from all earlier works. More recently, with cladistics well established and a great number of molecular studies completed, many authorities have argued that a relatively solid consensus about the broad-scale classification of mammals is emerging (Lecointre and Le Guyader 2006, 390; Springer et al. 2008).

At least three factors mean that, at best, only a limited stability in the way mammals are classified is likely to emerge. Firstly, cladistic classifications may be inherently less stable than others (Groves 2001b, 291). According to Groves, this is because cladistics is committed to reflecting our understanding of the evolutionary relationships between different organisms as accurately as possible; thus cladistic classifications change whenever that understanding changes, and compromises in order to preserve stability are less acceptable. We can see this as a shift in emphasis in zoological taxonomy, towards a more accurate expression of scientific hypotheses at the expense of some convenience in information storage and retrieval; the same theme has already been noted with respect to the deep hierarchies found in cladistic classifications. Secondly, at present, many zoologists still make use of non-cladistic or semi-cladistic classifications, for instance when organising the contents of monographs. It is not obvious whether this practice will remain commonplace or whether a trend towards a more rigorously cladistic approach will emerge. Finally, debates about nomenclature seem far from resolution.

Beghtol (2003, 71) writes that "information retrieval classifications are revised only when new ideas have already been generally accepted." Whether or not this is always true, it would certainly seem to be a prescription for good practice, even though other factors will also affect the timing of revisions. For example, New (1996, 387) emphasises the importance, in a general scheme such as DDC, of prioritising the subjects which are currently most poorly served, and of restricting the overall pace of change to that which the scheme's users are likely to find manageable. In practice, the bibliographic classificationist is left dealing with the familiar issues of balancing currency with stability, and pragmatism with intellectual rigour (Gnoli 2006, 148; Miksa 1998, 73-6; New 1996, 386-7).

Beghtol's prescription is not necessarily easy to put into practice in a discipline in which change is continuous. Bibliographic classificationists seeking to update their zoology schedules will need to choose their moment judiciously. As the classification of mammals may be on the cusp of a period of relative stability, now may not be the ideal time to make changes. Another few years may well reveal whether the novel hypotheses about the relationships between the major mammalian clades, developed in recent decades, do represent a genuine consensus. Even so, it is unclear when other important issues, such as the question of the most suitable system of zoological nomenclature, will be resolved.

At present, many, perhaps most, current bibliographic classifications for mammals reflect quite outdated science. The latest edition of DDC, for example, arranges mammals in essentially the same way as the second edition of 1885. Revisions since DDC2 have mainly focused on adding detail and giving more guidance to users about where to place certain taxa. New (1996) and New and Trotter (1996), in their accounts of the changes introduced to the zoology schedule in DDC21, emphasise pragmatic concerns such as avoiding the re-use of numbers, rather than keeping up with developments in zoology. Indeed, some of the changes made in DDC21, such as moving the monotremes to a position between the marsupials and placentals (Mitchell 1996, 1181), represent a move away from scientific accuracy in the interests of practical concerns such as the efficient use of notational space. Such "outdated" classifications may still do their job well. The library of the Zoological Society of London uses its own scheme, devised in the 1960s and largely based on the Bliss Bibliographic Classification, to classify the monographs it holds. The librarian reports that, in most cases, her patrons are able to retrieve items and browse the collection effectively (Sylph 2009). The forthcoming revision of UDC's zoology schedule (Civallero 2010, in press) will hopefully shed further light on how a scheme may manage change in this subject area.

8.0 Conclusion

Understanding contemporary zoological classification means understanding cladistics. There are several good reasons why bibliographic classifications should not, at least at present, be entirely remodelled on the cladistic hierarchies of taxa that zoologists now construct. Firstly, zoologists still make use of "unofficial," non-cladistic classifications in many situations, for instance in some of the literature they produce. Bibliographic classificationists may here face a conflict between reflecting scientific knowledge and reflecting literary warrant. This conflict can perhaps be at least partially resolved by seeing both as part of a broader task of paying attention to what may be called zoological practice: the totality of what zoologists do. This will include making provision in bibliographic classifications for all the non-taxonomic and quasi-taxonomic groupings of animals that zoologists employ; while these groupings have always been a feature of zoological practice, they seem to be proving particularly important in the cladistic era.

Secondly, cladistic classifications are often not ideal for information retrieval. The best bibliographic classification schemes will be based not only on knowledge of zoological practice, but also on an understanding of what affects the usability of such schemes. Zoologists are themselves interested in effective information retrieval, and so useful lessons may be learned from their own classificatory practices. Their continued use of Linnaean as well as cladistic hierarchies suggests that the former are superior for some purposes. They are more stable, generally contain more manageable numbers of hierarchical levels, and exhibit disparities in the size of taxa which, while still sometimes problematic, are more modest than those found in cladistic classifications.

Thirdly, cladistics is new enough and different enough that the exact extent of its impact on zoology, let alone on bibliographic classification, is as yet unclear. Will the current system of zoological nomenclature endure? Will the current practice of continuing to use Linnaean classifications for certain purposes remain widespread? Will zoologists find ways of responding to the greater instability of cladistic classifications? The answers to these questions are as yet unknown, meaning that major changes to any bibliographic classification for zoology, if aimed at bringing that classification into line with cladistic thinking, would at this point be premature. Evaluating change in zoological classification, and responding appropriately to it, is thus a major task for the bibliographic classificationist. In particular, assessing whether zoological classification is in a period of lesser or greater stability is useful.

The link between classification and nomenclature in zoology means that this is an area to which the bibliographic classificationist needs to pay particular attention. The ambiguous and changeable nature of zoological nomenclature means that users of a bibliographic scheme will benefit from extensive guidance about where to place works on particular taxa, as well as from the use of both scientific and vernacular names. The possibility of radical change in zoological nomenclature in the near future means that a scheme should not be overly dependent on current scientific nomenclature.

The features of zoological classification discussed here cannot be directly translated into a prescription for the bibliographic classification of the subject. Bibliographic classification is perhaps best seen as an art as well as a science, involving the balancing of competing priorities (such as attention to literary warrant and attention to scholarly classifications), the exercising of judgement about likely future trends, and an understanding of both how zoologists work and the factors that make for efficient information retrieval. Careful consideration of the distinctive features of zoological classification provides a necessary, and yet not in itself sufficient, foundation for the work of the bibliographic classificationist concerned with this area.


References

Beghtol, Clare. 2003. Classification for information retrieval and classification for knowledge discovery: relationships between 'professional' and 'naïve' classifications. Knowledge organization 30: 64-73.

Bhattacharyya, Ganesh and Ranganathan, S. R. 1974. From knowledge classification to library classification. In Wojciechowski, Jerzy A., ed., Conceptual basis of the classification of knowledge: Proceedings of the Ottawa Conference on the Conceptual Basis of the Classification of Knowledge October 1st to 5th, 1971. New York: K. G. Saur, 1978, pp. 119-43.

Broughton, Vanda. 1999. Notational expressivity: the case for and against the representation of internal subject structure in notational coding. Knowledge organization 26: 140-8.

Cantino, Philip D. and de Queiroz, Kevin. 2010. International code for phylogenetic nomenclature: version 4c [website]. <www.ohio.edu/phylocode/preface.html>. Accessed 29 December 2010.

Civallero, E. 2010 (in press). Introduction to the revision of class 59 [title as yet unknown]. Extensions and corrections to the UDC 32.

Dawson, Terence J. 1983. Monotremes and marsupials: the other mammals. London: Edward Arnold.

Dewey, Melvil et al. 1996. Dewey decimal classification and relative index. 21st ed., Joan S. Mitchell ed. Albany, NY: Forest Press.

Gnoli, Claudio. 2006. Phylogenetic classification. Knowledge organization 33: 138-52.

Groves, Colin. 2001a. Primate taxonomy. Washington DC: Smithsonian Institution Press.

Groves, Colin P. 2001b. Towards a taxonomy of the Hominidae. In Tobias, Philip V. et al. eds., Humanity from African naissance to coming millennia: colloquia in human biology and palaeoanthropology. Firenze: Firenze University Press, pp. 291-7.

Groves, Colin P. 2005. Order Primates. In Wilson, Don E. and Reeder, DeeAnn M. eds., Mammal species of the world: a taxonomic and geographic reference. 3rd ed. Baltimore: Johns Hopkins University Press, pp. 111-184.

Heywood, V. H. 1975. Contemporary philosophies in biological classification. In Horsnell, Verina ed., Informatics 2: proceedings of a conference held by the Aslib Coordinate Indexing Group on 25-27 March 1974 at New College Oxford. London: Aslib, pp. 57-60.

Hjørland, Birger and Nicolaisen, Jeppe. 2004. Scientific and scholarly classifications are not 'naïve': a comment to Beghtol (2003). Knowledge organization 31: 55-61.

Kuntner, Matjaz and Agnarsson, Ingi. 2006. Are the Linnean [sic] and phylogenetic nomenclatural systems combinable? Recommendations for biological nomenclature. Systematic biology 55: 774-84.

Lecointre, Guillaume and Le Guyader, Hervé. 2006. The tree of life: a phylogenetic classification. London: Belknap Press.

Mayr, Ernst. 1982. The growth of biological thought: diversity, evolution, and inheritance. London: Belknap Press.

McIlwaine, Ia C. 1996. New wine in old bottles: prob-lems of maintaining classification schemes. In Green, Rebecca, ed., Knowledge organization and change: Proceedings of the Fourth International ISKO Conference 15-18 July 1996 Washington, DC, USA. Frankfurt-Main: Indeks Verlag, pp. 122-8.

McKenna, Malcolm C. and Bell, Susan K. 1997. Classification of mammals above the species level. New York: Columbia University Press.

Mead, James G. and Brownell, Robert L. 2005. Order Cetacea. In Wilson, Don E. and Reeder, DeeAnn M. eds., Mammal species of the world: a taxonomic and geographic reference. 3rd ed. Baltimore: Johns Hopkins University Press, pp. 723-43.

Menzies, James. 1991. A handbook of New Guinea marsupials and monotremes. Madang, Papua New Guinea: Kristen Press.

Miksa, Francis. 1998. The DDC, the universe of knowledge, and the post-modern library. Albany: Forest Press.

Mishler, Brent D. 2009. Three centuries of paradigm change in biological classification: is the end in sight? Taxon 58: 61-7.

Mungall, Elizabeth Cary. 2007. Exotic animal field guide: nonnative hoofed mammals in the United States. College Station: Texas A&M University Press.

Nelson, Joseph S. 2006. Fishes of the world. 4th ed. Hoboken: J. Wiley.

New, G. R. 1996. Revision and stability in Dewey 21: the life sciences catch up. In Green, Rebecca, ed. Knowledge organization and change: Proceedings of the Fourth International ISKO Conference 15-18 July 1996 Washington, DC, USA. Frankfurt-Main: Indeks Verlag, pp. 386-95.

New, G. and Trotter, R. 1996. Revising the life sciences for Dewey 21. Catalogue and index 121: 1-6.

de Queiroz, K. 2007. Species concepts and species delimitation. Systematic biology 56: 879-86.


de Queiroz, K. and Gauthier, J. 1992. Phylogenetic taxonomy. Annual review of ecology and systematics 23: 449-80.

Rose, Kenneth D. and Archibald, J. David. 2005. Womb with a view: the rise of the placentals. In Rose, Kenneth D. and Archibald, J. David, The rise of placental mammals: origins and relationships of the major extant clades. Baltimore: Johns Hopkins University Press, pp. 1-8.

Schuh, Randall T. 2003. The Linnaean system and its 250-year persistence. Botanical review 69: 59-78.

Simpson, George Gaylord. 1945. The principles of classification and a classification of mammals. New York: American Museum of Natural History.

Springer, Mark S. et al. 2008. Morphology and placental mammal phylogeny. Systematic biology 57: 499-503.

Sylph, Ann (Librarian, Zoological Society of London). 2009. Conversation with author. 1 June 2009.

Vickery, B. C. 1956. Notational symbols in classification, part II: notation as an ordering device. Journal of documentation 12: 73-87.

Vitt, Laurie J. and Caldwell, Janalee P. 2009. Herpetology: an introductory biology of amphibians and reptiles. 3rd ed. London: Academic Press.

Walker, Ernest P., Nowak, Ronald M., and Paradiso, John L. 1983. Walker’s mammals of the world. 4th ed. Baltimore, Md.: Johns Hopkins University Press.

Wilson, Don E. and Reeder, DeeAnn M. eds. 2005a. Mammal species of the world: a taxonomic and geographic reference. 3rd ed. Baltimore: Johns Hopkins University Press.

Wilson, Don E. and Reeder, DeeAnn M. 2005b. Introduction. In Wilson, Don E. and Reeder, DeeAnn M. eds., Mammal species of the world: a taxonomic and geographic reference. 3rd ed. Baltimore: Johns Hopkins University Press, pp. xxiii-xxxiv.

Wozencraft, W. Christopher. 2005. Order Carnivora. In Wilson, Don E. and Reeder, DeeAnn M. eds., Mammal species of the world: a taxonomic and geographic reference. 3rd ed. Baltimore: Johns Hopkins University Press, pp. 532-628.

Knowl. Org. 38(2011)No.6 I. Bounhas, B. Elayeb, F. Evrard, Y. Slimani. Organizing Contextual Knowledge for Arabic Text Disambiguation...

Organizing Contextual Knowledge for Arabic Text Disambiguation and Terminology Extraction †

Ibrahim Bounhas*, Bilel Elayeb**, Fabrice Evrard***, Yahya Slimani****

* Department of Computer Science, Faculty of Sciences of Tunis, University of Tunis, 1060 Tunis, Tunisia

** RIADI-GDL Research Laboratory, The National School of Computer Sciences (ENSI), 2010 Manouba, Tunisia, and Informatics Research Institute of Toulouse (IRIT), 02 Rue Camichel, 31071 Toulouse, France

*** Informatics Research Institute of Toulouse (IRIT), 02 Rue Camichel, 31071 Toulouse, France

**** Department of Computer Science, Faculty of Sciences of Tunis, University of Tunis, 1060 Tunis, Tunisia

Ibrahim Bounhas obtained a license degree in computer science applied to management in 2004 from the High School of Management of Tunis (ISG) and a master's degree in computer science in 2006 from the National School of Computer Sciences (ENSI). He is a PhD student at the Department of Computer Science of the Faculty of Sciences of Tunis (2007-2008). He presented a master's thesis entitled "Un analyseur de contenu des documents scientifiques du Web" (A content analyzer for scientific Web documents). His current research interests are ontology engineering, document analysis, and Arabic text processing.

Bilel Elayeb is an assistant professor at the National School of Computer Science of La Manouba in Tunisia. He obtained his PhD in computer science from the National Polytechnic Institute of Toulouse and the National School of Computer Science, Tunisia in 2009. He obtained a master's degree in computer science from the ENSI in 2004. His research focuses on information retrieval, computational linguistics, Arabic NLP, multiagent systems, possibility theory, and hierarchical small-world networks. He has been a member of the RIADI research laboratory since 2002 and of the Informatics Research Institute of Toulouse (IRIT) since 2005.

Fabrice Evrard has been an Assistant Professor at ENSEEIHT, Toulouse, France since 1983. His research focuses on multiagent systems, dictionary modeling and analysis, information retrieval, computational linguistics, NLP, and hierarchical small-world networks. He directed a master's program in artificial intelligence at the National Polytechnic Institute of Toulouse (INPT). He led the Groupe Raisonnement, Action et Actes de Langage (GRAAL) team, which is part of the LILaC research group at the Informatics Research Institute of Toulouse (IRIT), France. He has supervised many master's and PhD theses in artificial intelligence, information retrieval, and NLP.

Yahya Slimani studied at the Computer Science Institute of Algiers (Algeria) from 1968 to 1973. He received the B.Sc.(Eng.), Dr Eng and Ph.D degrees from the Computer Science Institute of Algiers (Algeria), the University of Lille, and the University of Oran (Algeria) in 1973, 1986, and 1993, respectively. He is currently a professor at the Department of Computer Science of the Faculty of Sciences of Tunis. His research activities concern data mining, text mining, ontology engineering, parallelism, distributed systems, and grid computing. Professor Slimani has published more than 90 papers from 1986 to 2008. He joined the editorial board of the Information International Journal in 2000.

Bounhas, Ibrahim, Elayeb, Bilel, Evrard, Fabrice, and Slimani, Yahya. Organizing Contextual Knowledge for Arabic Text Disambiguation and Terminology Extraction. Knowledge Organization, 38(6), 473-490. 38 references.

ABSTRACT: Ontologies have an important role in knowledge organization and information retrieval. Domain ontologies are composed of concepts represented by domain relevant terms. Existing approaches to ontology construction make use of statistical and linguistic information to extract domain relevant terms. The quality and the quantity of this information influence the accuracy of terminology extraction approaches and other steps in knowledge extraction and information retrieval. This paper proposes an approach for handling domain relevant terms from Arabic non-diacriticised semi-structured corpora. As input, the structure of documents is exploited to organize knowledge in a contextual graph, which is then used to extract relevant terms. This network contains simple and compound nouns handled by a morphosyntactic shallow parser. The noun phrases are evaluated in terms of termhood and unithood by means of possibilistic measures. We apply a qualitative approach, which weighs terms according to their positions in the structure of the document. As output, the extracted knowledge is organized as a network modeling dependencies between terms, which can be exploited to infer semantic relations. We test our approach on three domain-specific corpora. The goal of this evaluation is to check whether our model for organizing and exploiting contextual knowledge improves the accuracy of extraction of simple and compound nouns. We also investigate the role of compound nouns in improving information retrieval results.

Received 31 December 2010; revised 16 March 2011; accepted 23 March 2011

† Sincere thanks to Dr. Ryan Roth, Dr. Nizar Habash, Dr. Owen Rambow and all researchers from Columbia University, USA, who participated in developing MADA and helped us to work with this tool. We would also like to thank the anonymous reviewers for their helpful comments and suggestions.

1.0 Introduction

The huge amount of knowledge present in documents needs to be organized to help the user exploit its richness. On the one hand, documents should be indexed to help search engines retrieve their content. On the other hand, there is a growing need for automatic text analysis, annotation techniques, and knowledge organizing systems (KOS) of several types (Bourigault and Lame 2002; Broughton et al. 2005). Each of these resources is structured as a set of units (terms or concepts) organized through various types of relations. Consequently, term extraction is an important step in Information Retrieval (IR) (Boulaknadel 2006), question answering (Ferret et al. 2002), knowledge extraction, and many Natural Language Processing (NLP) tasks. Candidate term extraction requires defining statistical measures to weight and filter terms, and also handling Multi-Word Terms (MWTs). According to Martínez-Santiago et al. (2002, 1), detecting these entities "can be successfully used in many different tasks." More precisely, the knowledge organization literature shows that noun phrases (NPs) are the best entities to represent the document's subject (Malaisé et al. 2003; Boulaknadel 2006). In this field, Souza and Raghavan (2006, 559) defend "the hypothesis that NPs carry the greater part of the semantics of a document." In addition, many ontology construction tools exploit networks of syntactic dependencies. In Bourigault and Lame (2002), a network of simple and compound noun phrases generated by a syntactic analyzer is enriched by distributional links to build a "documentary ontology" exploited as a thematic index to access documents.
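One widely used statistic for judging whether a two-word candidate behaves as a unit is Dunning's log-likelihood ratio (LLR, also written G²), computed from a 2 x 2 contingency table of adjacent word pairs. The sketch below is the textbook form of that statistic, not the possibilistic measure the authors propose. It works on an already tokenized list and ignores Arabic-specific issues such as clitic segmentation and missing diacritics; the function names and the toy English tokens are hypothetical.

```python
import math
from collections import Counter

def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # the term 0 * log(0) is taken as 0
                expected = rows[i] * cols[j] / n
                g2 += o * math.log(o / expected)
    return 2.0 * g2

def bigram_llr(tokens: list[str]) -> dict[tuple[str, str], float]:
    """Score every adjacent word pair in a token sequence by LLR."""
    pairs = Counter(zip(tokens, tokens[1:]))
    first = Counter(tokens[:-1])   # how often each word starts a pair
    second = Counter(tokens[1:])   # how often each word ends a pair
    total = len(tokens) - 1        # number of adjacent pairs
    scores = {}
    for (w1, w2), k11 in pairs.items():
        k12 = first[w1] - k11      # w1 followed by some other word
        k21 = second[w2] - k11     # w2 preceded by some other word
        k22 = total - k11 - k12 - k21
        scores[(w1, w2)] = llr(k11, k12, k21, k22)
    return scores

tokens = "the term extraction task uses term extraction rules and the task list".split()
scores = bigram_llr(tokens)
best = max(scores, key=scores.get)
print(best)  # ('term', 'extraction')
```

A known weakness of LLR on tiny samples is that pairs of words seen only once (here "rules and") can still score fairly high, which is one reason term extractors usually combine unithood with frequency thresholds or termhood scores.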

Semi-structured documents (e.g., books, scientific papers, and encyclopedias) contain additional information which may be exploited to understand, to index, and to infer knowledge from corpora. This paper proposes to exploit such knowledge in terminology extraction. In fact, we transform the structure of documents, which represents a logical division of knowledge, into an empiricist contextual graph. Many researchers have investigated, and continue to work on, extracting candidate terms from textual and semi-structured corpora. However, only a few works have considered Arabic documents. This task requires sophisticated corpus analysis tools, which are available for many languages (e.g., French and English). Despite the great work done in the field of Arabic NLP, existing ontology environments cannot be directly used to process Arabic documents. One of the main causes is the lack of sufficient linguistic resources for the Arabic language. Also, approaches for Arabic text disambiguation have to be improved, since this language is highly ambiguous. Related work shows both the usefulness and the difficulty of building these resources (Attia et al. 2008). This difficulty has led existing works to adopt a manual approach (e.g., Elkateb et al. 2006; Zaidi and Laskri 2005; Attia et al. 2008) or a semi-automatic approach (Rodríguez et al. 2008). A great deal of work has been done in the field of Arabic text parsing (Attia et al. 2008) and morphological disambiguation (Habash et al. 2009). These approaches perform only the first step required for term extraction. Consequently, they are unable to give a clear evaluation of candidate terms. Other works of interest to document indexing and term extraction lack sophisticated NLP tools (Larkey et al. 2002; Boulaknadel et al. 2008). This literature points to the need for an approach which exploits sufficient and well-organized linguistic and contextual knowledge to handle terms.

Probabilistic measures allow one to evaluate two fundamental properties of terms separately. For example, TF-IDF (Salton and McGill 1986) is used to evaluate termhood, whereas scores like LLR (Dunning 1994) are employed to compute the unithood of compound terms. In this paper, we define a possibilistic measure for relevance which combines the termhood and unithood dimensions of terms.
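The termhood side can be sketched with one common TF-IDF variant: raw term frequency multiplied by the natural logarithm of the number of documents divided by the term's document frequency. Many other weightings exist, and this is not the authors' possibilistic measure; the toy documents below are hypothetical stand-ins for lemmatized noun phrases.

```python
import math
from collections import Counter

def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term in each document by tf * ln(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return weights

# Hypothetical documents, each already reduced to a list of candidate terms.
docs = [
    ["story", "camel", "desert", "camel"],
    ["story", "camel", "well"],
    ["story", "market", "price"],
]
w = tf_idf(docs)
```

In the first document, "story" occurs in every document and so receives weight 0, "desert" (one document) receives ln 3, and "camel" (twice here, but present in two documents) receives 2 ln 1.5. Domain-level termhood would typically aggregate such scores over the documents of one domain.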

When we consider non-diacriticised Arabic texts, the extraction process generates many types of ambiguity. Morphosyntactic disambiguation and domain relevance evaluation were previously considered as two separate steps. Our possibilistic measure is used both for disambiguation and for domain relevance evaluation, which we consider interrelated tasks. Our approach exploits the structure of documents, which constitutes rich contextual information. The document is seen as a tree whose nodes are linked by structural relations. The relevance of a term which appears in a given node is related not only to its distribution in corpora, but also to the position of the node in the structure of the document and to its structural relations. Because the context is composed of complex relationships, we model this problem as an IR task where the query encodes contextual constraints. These queries allow one to disambiguate syntactic trees and to retrieve the most domain-relevant terms.

We test our hypotheses in the particular context of extracting several domain terminologies from books of Arabic stories organized by theme. Because of the lack of gold standards, the extracted terminologies are checked by human experts who build a reference list for each domain. This method is influenced by the subjectivity of the experts. For this reason, we suggest a second evaluation method, which consists of using the extracted knowledge in a possibilistic IR system. We report encouraging precision and recall results in comparison with state-of-the-art measures.

This paper is structured as follows. Section 2 presents a literature review in the field of terminology extraction, focusing on the characteristics of the Arabic language. In Section 3, we present our approach for identifying domain-relevant terms, based on a critical study of existing approaches. We test this approach on Arabic corpora and present the results obtained in Section 4. Section 5 concludes this paper by discussing these results and providing some directions for future research.

2.0 Related work

Although the notion of “term” is still not clearly defined, a general definition is that a term is “a surface representation of a specific domain concept” (Jacquemin 1997, 9). Recent research has proposed termhood and unithood as properties for recognizing terms. According to Pazienza et al. (2005, 1), termhood “expresses how much (the degree) a linguistic unit is related to domain-specific concepts.” Mai (2008, 20) defines a domain as follows:

An evolving and open concept that will develop as the concept is used and applied in research and practice. [T]he concept is [here] used to refer to a group of people who share common goals. A domain could, for instance, be an area of expertise, a body of literature, or a group of people working together in an organization.

According to Hannan et al. (2007), a domain is a culturally bounded segment of the social world containing producers/products, audiences, and a language that tells to whom these distinctions apply and what they mean. From these definitions, we can conclude that a domain is an area of knowledge composed of a set of related items (products). It corresponds to a common interest shared by a social community (producers and audiences having a common set of perceptions, interests, beliefs, activities, values, etc.). This community also shares a set of concepts and a terminology defined by the consensus of its members. According to Spradley (1979), a domain is defined by a cover term (which specifies the category of the cultural knowledge), a set of included terms, semantic relationships between included terms and between the cover term and the included terms, and the means to define boundaries (criteria to decide whether an item belongs to the domain).

Unithood “expresses strength or stability of syntagmatic collocations” (Pazienza et al. 2005, 5). It concerns terms which are composed of more than one word. Multi-word expressions (MWEs) can be defined as “idiosyncratic interpretations that cross word boundaries” (Attia 2008, 71). To be considered an MWE, a sequence of words should fulfill syntactic and semantic conditions. Attia (2008, 72) defines many properties of MWEs, such as lexogrammatical fixedness (i.e., the expression is rigid or frozen) and single-word paraphrasability (i.e., the expression can be replaced by a single word). However, the main property that distinguishes these expressions is non-compositionality, which means that the meaning of the expression cannot be derived from the meanings of its components. In other words, “a multiword is a succession of words whose sense taken as a whole differs from the sum of the senses of its single words” (Martínez-Santiago et al. 2002). For example, “book cover” is a compositional expression. By contrast, “kick the bucket” is a non-compositional expression, because its meaning (i.e., “die”) is not related to any of its constituents.

Although it is difficult to decide (or to compute a binary value of) the compositionality of a given term, only non-compositional expressions are considered eligible MWEs. However, Attia (2008, 74) argues that it is possible to accept conventionalized or institutionalized expressions; these expressions “have come to such a frequent use that they block the use of other synonyms and near synonyms.” We think that such expressions are useful in the context of IR tasks because they constitute good candidates for document indexing and querying. We also extract other types of expressions useful for ontology construction. Consider the following two expressions: "اللبن الحار" (Al~albanu AlHaAr: the hot milk) and "الماء الحار" (AlmaA’u AlHaAr: the hot water), extracted from a corpus about drinks. The two heads "لبن" (laban: milk) and "ماء" (maA’: water) represent specific domain concepts. However, the two expressions are compositional. Moreover, they are neither conventionalized nor institutionalized. Nevertheless, it is useful to extract these expressions because we can infer a link between the two heads, which share the same expansion ("حار": HAr, hot).
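The link inference mentioned above can be sketched by grouping (head, expansion) pairs by their expansion and linking every two heads that share one. Heads and expansions are transliterated as in the text; the representation and the function name are illustrative assumptions, and a real system would also need the relation type and the weights of the extracted NPs to decide whether such a link is meaningful.

```python
from collections import defaultdict
from itertools import combinations


def heads_sharing_expansion(pairs):
    """pairs: iterable of (head, expansion) taken from extracted compound nouns.

    Returns a set of (head_a, head_b, shared_expansion) links.
    """
    heads_by_expansion = defaultdict(set)
    for head, expansion in pairs:
        heads_by_expansion[expansion].add(head)
    links = set()
    for expansion, heads in heads_by_expansion.items():
        # Sorting makes each unordered pair of heads appear once, in a stable order.
        for head_a, head_b in combinations(sorted(heads), 2):
            links.add((head_a, head_b, expansion))
    return links


# "the hot milk" and "the hot water": both heads share the expansion HAr (hot).
print(heads_sharing_expansion([("laban", "HAr"), ("maA'", "HAr")]))
```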

Finally, MWEs may be categorized as idioms (e.g., down the drain), phrasal verbs (e.g., rely on), verbs with particles (e.g., give up), compound nouns (e.g., book cover), and collocations (e.g., do a favor) (Attia 2008). As previously explained, our work is limited to compound nouns. However, we do not adopt Attia’s (2008, 80) definition, which states that “a compound noun can be formed by a noun optionally followed by one or more nouns optionally followed by one or more adjectives.” In fact, Arabic compound nouns are noun phrases with a complex structure, which should be defined more precisely according to Arabic grammar (cf. section 2.1.2).

To summarize, we extract two types of units. On the one hand, we extract simple nouns (consisting of only one word). We call a simple noun that is eligible in terms of termhood a “simple term.” On the other hand, we handle compound nouns, which are noun phrases composed of more than one word and eligible in terms of both unithood and termhood. This category contains non-compositional expressions and compositional ones that may be useful for indexing and querying. In the following, we call such units multiword terms (MWTs). In the remainder of this paper, simple terms and MWTs are together called “Domain Relevant Terms” (DRTs). The set of DRTs constitutes the “Domain Terminology” (DT). We also extract the noun phrases that head a DRT. These expressions will help infer links between DRTs.

In this context, we study the characteristics of the Arabic language which influence DRT extraction (cf. section 2.1) and the existing approaches which have dealt, more or less, with this problem. These approaches are often classified into two main categories (Pazienza et al. 2005). On the one hand, linguistic approaches exploit morphologic, syntactic, or semantic information implemented in language-specific rules or programs (cf. section 2.2). On the other hand, statistical approaches make use of association measures exploiting frequency (cf. section 2.3). Finally, hybrid approaches try to combine linguistic and statistical techniques to recognize terms (cf. section 2.4).

2.1. Characteristics of the Arabic language

Arabic texts are ambiguous at several levels of analysis. This section focuses on problems related to terminology extraction at the morphologic and syntactic levels. Nevertheless, ambiguities at these levels influence the semantic level and consequently the whole process of ontology building.


2.1.1. The morphologic level

The Arabic language is agglutinative, derivational, and inflectional. For example, the term "وضوء" (wDw’) may be analyzed as "وضوء" (wuDuw’: ablution), "وضوء" (waDuw’: water for ablution), or "ضوء" (Dw’: light). In this example, the letter "و" is interpreted either as a conjunction or as the first letter of the lemma. Even in the second case, we obtain two possible lemmas, diacriticised differently. In fact, the main source of ambiguity is the lack of diacritics in most existing Arabic texts. Morphological ambiguities make it difficult to extract simple terms, because each word may correspond to many possible lemmas.
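The conjunction ambiguity described above can be illustrated with a toy enumerator: an undiacritized token is looked up either whole or, when it starts with "و" (w), with that letter split off as the conjunction "and". The lexicon holds only the readings of the example, and the function name is hypothetical; real analyzers such as MADA (Habash et al. 2009) cover many more clitics and morphological features.

```python
# Toy lexicon (an assumption for illustration): undiacritized stem -> diacritized readings.
LEXICON = {
    "وضوء": ["wuDuw' (ablution)", "waDuw' (water for ablution)"],
    "ضوء": ["Dw' (light)"],
}
CONJUNCTION = "و"  # proclitic wa- ("and")


def analyses(token, lexicon=LEXICON):
    """Return (proclitic or None, reading) pairs for every analysis found in the lexicon."""
    # Reading 1: the whole token is the stem.
    results = [(None, reading) for reading in lexicon.get(token, [])]
    # Reading 2: a leading "و" is the conjunction and the rest is the stem.
    if token.startswith(CONJUNCTION):
        stem = token[len(CONJUNCTION):]
        results += [(CONJUNCTION, reading) for reading in lexicon.get(stem, [])]
    return results


for proclitic, reading in analyses("وضوء"):
    print(proclitic, reading)  # three competing analyses of one written form
```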

To reduce morphologic ambiguity, existing approaches which deal with the Arabic language are context-based. Suppose that an entity has several possible morphologic solutions. The first step is to associate each interpretation with one or more contexts by training on a labeled corpus. In a second step, one can try to disambiguate the entities of a test collection by comparing their contexts to those learned in the first step. This approach was implemented, for example, for POS (Part Of Speech) tagging (Diab et al. 2004) and for full morphologic analysis (Habash et al. 2009).

2.1.2. The syntactic level

There are many sources of syntactic ambiguity in the Arabic language. We can identify two types of ambiguity which influence terminology extraction. On the one hand, Arabic has a relatively free word order. For example, the noun phrase "الأكل في البيت" (Alakolu fy Albeyti: eating in the house) may be written "في البيت الأكل" (fy Albeyti Alakolu: in the house, eating). On the other hand, Arabic nouns can take the role of a verb, a preposition, an adverb, or an adjective. For example, the noun "البحث" (AlbaHth) in the sentence "أثمر البحث عن نتائج مثمرة" (Athmara AlbaHothu En nataAija muthmira: The research brought promising results) has a nominal function. However, it is considered a verbal noun in the following sentence: "حاول البحث عن حل آخر" (HAwala AlbaHtha En Hal Akhar: He tried searching for another solution).

Syntactic ambiguities influence MWT extraction, as it is hard to identify the valid noun phrases in a sentence having many parse trees. Since MWTs play a major role in this process and we are interested in compound nouns, we start by recalling the categories of Arabic noun phrases. A noun phrase (NP) is a phrase containing a head, which is a noun or a pronoun, and, optionally, an expansion which constitutes a set of modifiers. NPs obey the syntactic rules of the language. Hence, an NP may be a single word (a simple noun) or a composite expression. The head and the expansion are related by a syntactic relation. As detailed in Bounhas and Slimani (2009b), Arabic grammar distinguishes five types of NPs: nominal constructs (NC) (المركب الإضافي), adjectival phrases (AP) (المركب النعتي), prepositional phrases (PP) (المركب الحرفي), conjunctive phrases (CP) (المركب العطفي), and complex noun phrases (CNP) (i.e., expressions linked by two or more prepositions and/or conjunctions).

2.2. Linguistic approaches

We can distinguish three main steps in a pure linguistic approach:

Parse the corpus: linguistic tools are used to tokenize the corpus. At least the POS tags of the words are identified.

Extract candidate terms using grammar rules implemented as patterns or parsers. In this step, initial candidate terms are mostly identified with noun phrases (Pazienza et al. 2005).

Apply filters to refine the terminology: for example, by eliminating stop words and words or collocations of very common usage in the language (e.g., this thing).
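The three steps can be sketched as follows, assuming step 1 (tokenization and POS tagging) has already been performed by an external tool. Step 2 uses a single illustrative pattern, one or more nouns followed by optional adjectives, matched over a string of one-letter tag codes; step 3 drops candidates made only of stop words. The tag set, the pattern, and the stop list are assumptions for illustration only, not the grammar of any cited system (Arabic NP grammar is considerably richer, cf. section 2.1.2).

```python
import re

STOP_WORDS = {"this", "thing"}      # hypothetical stop list
TAG_CODES = {"N": "N", "ADJ": "A"}  # every other tag is coded as "x"


def candidate_terms(tagged, stop_words=STOP_WORDS):
    """tagged: list of (token, POS) pairs produced by step 1.

    Returns the candidate term strings kept after steps 2 and 3.
    """
    codes = "".join(TAG_CODES.get(tag, "x") for _, tag in tagged)
    candidates = []
    # Step 2: one code character per token, so match offsets are token offsets.
    for match in re.finditer(r"N+A*", codes):
        tokens = [token for token, _ in tagged[match.start():match.end()]]
        # Step 3: discard candidates consisting only of stop words.
        if not all(token in stop_words for token in tokens):
            candidates.append(" ".join(tokens))
    return candidates


sentence = [("Al~albanu", "N"), ("AlHaAr", "ADJ"), ("wa", "CONJ"),
            ("AlmaA'u", "N"), ("AlHaAr", "ADJ")]
print(candidate_terms(sentence))  # ['Al~albanu AlHaAr', "AlmaA'u AlHaAr"]
```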

As an example of a linguistic approach applied to the Arabic language, Attia (2008) presented a pure linguistic analyzer for handling MWTs. The input is a manually constructed lexicon of MWTs. His system then tries to identify other variations using a morphologic analyzer, a white space normalizer, and a tokenizer. Precise rules take into account morphologic features such as gender and definiteness to extract MWTs. The MWT structures are described as trees that can be parsed to identify the role of each constituent. The goal of Attia (2008) is to perform syntactic parsing and deal with linguistic ambiguities independently of the intended application or domain.

2.3. Statistical approaches

These approaches make use of statistical measures to evaluate termhood and unithood (cf. Pazienza et al. [2005] for description and formulae). Measures that weigh termhood are mainly based on frequency. One may assume that the more frequent a term is in a document or in a corpus, the better it represents its subject. Even when combined with linguistic filters, this approach generates non-relevant candidate terms. To solve this problem, one may use TF-IDF (Salton and McGill 1986). An example of an approach employing this measure for the Arabic language is presented by Al-Qabbany et al. (2009).

MWTs may be weighted in terms of termhood using the same measures. However, we need other statistical measures to evaluate unithood. The state-of-the-art measures compute the degree of dependency between the components of the MWT (Martínez-Santiago et al. 2002; Pazienza et al. 2005). Some of these measures have been applied to the Arabic language (Boulaknadel et al. 2008; Pinto et al. 2007).

2.4. Hybrid approaches

Pure linguistic approaches are unable to give a clear definition of termhood. Statistical approaches “are unable to deal with low-frequency of MWTs” (Boulaknadel et al. 2008, 1). To avoid the weaknesses of the two approaches, a commonly recognized solution is to combine statistical calculus and linguistic knowledge. In these approaches, linguistic analysis is performed first to select all linguistically admissible candidates, and statistical filters are then applied. The accuracy of statistical measures increases because they are applied to linguistically justified candidates. Hybrid approaches may be improved by exploiting contextual information. The idea consists of using statistical measures to compute the correlation between a term and its context (Missikoff et al. 2003).

As far as the Arabic language is concerned, Boulaknadel et al. (2008) presented a hybrid approach to extract MWTs from Arabic documents. They defined patterns using POS tags to select candidate terms. After that, candidate terms were ranked using statistical measures. This approach has four limitations. First, it did not include a morphologic analyzer: the integrated POS tagger (Diab et al. 2004) is unable to separate affixes, conjunctions, and some prepositions from nouns and adjectives. Second, POS tagging does not consider many features while defining MWT patterns. For example, it is not possible to impose constraints regarding the gender and/or the number of the MWT constituents. Third, this approach does not recognize the internal structure of MWTs. As previously explained, the Arabic language defines different roles for MWT constituents. Fourth, experiments were performed on only one domain, which means that the authors considered only the unithood of terms.

3.0 A hybrid approach for Arabic terminology extraction

Existing approaches to Arabic NLP and terminology extraction have dealt with many steps of this process. Some researchers adopted a purely linguistic approach for parsing and disambiguating Arabic texts (Attia 2008). Others developed statistical context-based approaches for morphologic and POS disambiguation (Diab et al. 2004; Habash et al. 2009). These works considered only the first step of the terminology extraction process, namely developing NLP tools. Consequently, they do not evaluate termhood or unithood. On the other hand, some approaches which tried to weigh terms lack sophisticated NLP tools to extract important morphologic features and recognize the internal structure of MWTs (Boulaknadel et al. 2008). The weakness of the linguistic parsing step produces an ambiguous list of terms. For example, in Al-Qabbany et al. (2009), we find in the same cluster the words "سعودي" (a Saudi) and "السعودي" (the Saudi). Besides, there is a need to consider both termhood and unithood. These two dimensions should be taken into account early, in the disambiguation step. In fact, choosing a morphologic or a syntactic solution means evaluating all the possible solutions.

Based on this discussion, we design a hybrid approach for Arabic terminology extraction which stands out in the following aspects. Firstly, we perform full morphosyntactic parsing of corpora. At the morphologic level, we integrate MADA, a linguistic tool designed to perform morphologic analysis, disambiguation, and POS tagging in a single pass (Habash et al. 2009). At the syntactic level, we reuse a tool developed by Bounhas and Slimani (2009b). It is a shallow parser which identifies the type of each NP (i.e., adjectival, prepositional, and so on), its structure, and the roles of its constituents (e.g., "المضاف": annexed noun, and "المضاف اليه": noun to which we annex).

Secondly, we use several domain-specific corpora in order to evaluate termhood as well as unithood. Thirdly, we use statistical measures to weigh the two dimensions. These measures are used both for disambiguation and for DRT recognition. Consequently, we do not make a distinction between the two steps. Fourthly, the concept of relevance is related not to the distribution of terms in corpora, as in TF-IDF, but to complex contextual information. In our case, ambiguity resolution and domain relevance computing are seen as IR tasks where we choose the best solution(s) according to many contextual constraints (the query).


To perform this task, we have to organize the knowledge present in documents by means of i) indexing models and ii) possibilistic networks which encode contextual relations (cf. section 3.1). The process of terminology extraction consists of a learning step that captures the initial knowledge required for relevance evaluation (cf. section 3.2) and an inference step where noun phrases are weighted (cf. section 3.3).

3.1. Knowledge modeling

Our model is inspired by possibility theory, which represents knowledge by possibilistic networks. In such networks, we define two types of edges, which correspond respectively to structural contextual relations and syntactic contextual relations (cf. sections 3.1.2 and 3.1.3). The edges are weighted by the frequencies of terms in the corpus. As we explain in section 3.1.1, the frequencies may be computed according to a quantitative or a qualitative approach.

3.1.1. Quantitative versus qualitative indexing

A document analyzer (Bounhas and Slimani 2009a) is used to extract the structure of documents (i.e., the hierarchy of titles and section headings). It generates as output a list of fragments with their corresponding levels in the hierarchy. If a document contains M levels, the head node(s) (e.g., the main title) is (are) assigned level M. Leaf nodes (paragraphs) are assigned level one.
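One way to realize this level numbering is sketched below. The text fixes only the extremes (head nodes at level M, paragraphs at level one); the sketch assumes that a node's level is one more than the highest level among its children, which gives exactly M to the main title of a document with M levels. The Node class is a hypothetical stand-in for the analyzer's output, not its actual format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one fragment produced by the document analyzer."""
    title: str
    children: list[Node] = field(default_factory=list)
    level: int = 0


def assign_levels(node: Node) -> int:
    """Assign levels bottom-up: leaves get 1, a parent gets 1 + its highest child level."""
    if node.children:
        node.level = 1 + max(assign_levels(child) for child in node.children)
    else:
        node.level = 1
    return node.level


doc = Node("AlzwAj", [Node("lbAs AlErs", [Node("paragraph")])])
print(assign_levels(doc))  # 3: the main title of a three-level document
```

Other conventions (for example, numbering by depth from the root) coincide with this one only when all paragraphs sit at the same depth.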

Within the quantitative approach, the number of occurrences of the term ti in the document dj is given by:

$$\mathrm{Occ}(t_i, d_j) = \sum_{k} \mathrm{occ}(t_i, nd_k) \tag{1}$$

The value occ(ti, ndk) is the count of the term ti in the node ndk of dj. Within the qualitative approach, the number of occurrences is computed as follows:

$$\mathrm{Occ}(t_i, d_j) = \sum_{k} \mathrm{occ}(t_i, nd_k) \cdot \mathrm{level}(nd_k) \tag{2}$$

where level(ndk) is the level of ndk in the structure of the document. With this formula, we assign greater importance to terms appearing in the head nodes than to those contained in paragraphs.

In both cases, we compute the frequency of ti in dj as follows:

$$\mathrm{Freq}_{ij} = \frac{\mathrm{Occ}(t_i, d_j)}{\sum_{i'} \mathrm{Occ}(t_{i'}, d_j)} \tag{3}$$
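Formulas (1) to (3) translate directly into code. In the sketch below, a document is simplified to a list of (level, tokens) pairs, one per node, with the levels assumed to come from the document analyzer; this representation and the function names are illustrative assumptions.

```python
from collections import Counter


def occurrences(doc, qualitative=False):
    """Occ(t, d) per formula (1), or per formula (2) when qualitative=True.

    doc: list of (level, tokens) pairs, one per node of the document.
    """
    occ = Counter()
    for level, tokens in doc:
        weight = level if qualitative else 1  # qualitative: weight counts by node level
        for term, count in Counter(tokens).items():
            occ[term] += count * weight
    return occ


def frequencies(doc, qualitative=False):
    """Freq_ij = Occ(t_i, d_j) / sum over i' of Occ(t_i', d_j)   (formula 3)."""
    occ = occurrences(doc, qualitative)
    total = sum(occ.values())
    return {term: n / total for term, n in occ.items()}


# Toy document: a level-2 title and a level-1 paragraph (transliterated terms).
doc = [(2, ["lbAs", "Ers"]), (1, ["lbAs", "rjl"])]
print(occurrences(doc)["lbAs"], occurrences(doc, qualitative=True)["lbAs"])  # 2 3
print(frequencies(doc, qualitative=True))
```

Under the qualitative weighting, the title occurrence of "lbAs" counts twice, so the term's share grows relative to terms that only appear in paragraphs.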

3.1.2. The structural contextual relations

The structure of a document constitutes important contextual information. We assume that the title of a composed node defines a structural context for its sub-nodes. Terms which occur in the title of a node are related to the terms of its children as follows:

$$\forall\, nd_i, nd_j \in d \text{ such that } \mathrm{path}(nd_i, nd_j) \wedge \mathrm{level}(nd_i) > \mathrm{level}(nd_j),\quad \forall\, t_i \in nd_i,\ t_j \in nd_j,\ t_i \neq t_j:\quad R(t_j, [\mathrm{Sup}, t_i]) = \frac{\mathrm{Freq}(t_j, nd_j)}{\mathrm{level}(nd_i) - \mathrm{level}(nd_j)} \tag{4}$$

This formula considers a pair of nodes (ndi, ndj) which belong to a document d (ndi ∈ d, ndj ∈ d). The node ndi must be an ancestor of ndj in the structure of the document. This means that a path exists between ndi and ndj (path(ndi, ndj)) and that ndi is in a higher position than ndj (level(ndi) > level(ndj)). In this case, we link any two different terms ti and tj (ti ≠ tj), which occur respectively in the nodes ndi and ndj (ti ∈ ndi and tj ∈ ndj). The edge is labeled “Sup,” which stands for “Superior.” This means that the term ti is the superior of tj; in other words, the sense of ti generalizes the sense of tj.

The relation has a weight equal to the frequency of the term tj in the child node, Freq(tj, ndj), divided by the difference of level between the two nodes. This means that terms which belong to the direct children of a node receive a greater weight than terms that occur in their descendants. Two given terms may appear in many relative positions with different paths. In this case, we compute the average value of the “Sup” relations over all these occurrences. This kind of relation will be useful to compute the termhood of terms (cf. section 3.3.1). Indeed, we will choose the morphosyntactic solutions which are most closely correlated with their superiors.

3.1.3. The syntactic context

Given an MWT, we assume that each of its components constitutes a context for the other. Terms are linked based on the structure of MWTs. We distinguish two families of syntactic relations. On the one hand, conjunctive NPs and some NPs containing composite syntactic relations link entities in a symmetric manner. In this case, the MWT (T) is composed of two terms (t1 and t2) linked by a symmetric relation (sy), and the contextual relations are computed by formula (5).


Formula (5) defines a contextual relation (R) which links the first term (t1) to a context composed of the symmetric relation (sy) and the second term t2 ([sy, t2]). In the same manner, we link t2 to [sy, t1]. The weight of the two relations is equal to the frequency of the MWT in the corpus (Freq(T)).

On the other hand, non-symmetric NPs are composed of a syntactic relation (ns), a head (h), and an expansion (e):

$$\forall\, T = (h, ns, e):\quad R(h, [ns\_expansion, e]) = R(e, [ns\_head, h]) = \mathrm{Freq}(T) \tag{6}$$

In this case, we consider that the expansion (e) appears in a context composed of a non-symmetric relation in head (ns_head) and the head (h). In the same manner, the head (h) appears in a context composed of a non-symmetric relation in expansion (ns_expansion) and the expansion (e). The two relations have a weight equal to the frequency of the MWT (Freq(T)).
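A minimal sketch of formulas (5) and (6): every MWT contributes two weighted edges, one for each constituent, both carrying the MWT's frequency. The encoding of an MWT as a (first term, relation, second term, symmetric?) tuple, with the head first for non-symmetric NPs, and the dictionary of edges are assumptions made for illustration; the suffixes _head and _expansion mirror ns_head and ns_expansion.

```python
def syntactic_relations(mwts):
    """mwts: dict mapping (t1, relation, t2, is_symmetric) -> Freq(T).

    Returns a dict mapping (term, (relation label, other term)) -> weight.
    Distinct MWTs are assumed to produce distinct edges (later keys overwrite earlier ones).
    """
    edges = {}
    for (t1, rel, t2, symmetric), freq in mwts.items():
        if symmetric:
            # Formula (5): R(t1, [sy, t2]) = R(t2, [sy, t1]) = Freq(T)
            edges[(t1, (rel, t2))] = freq
            edges[(t2, (rel, t1))] = freq
        else:
            # Formula (6), with t1 = head h and t2 = expansion e
            edges[(t1, (rel + "_expansion", t2))] = freq  # R(h, [ns_expansion, e])
            edges[(t2, (rel + "_head", t1))] = freq       # R(e, [ns_head, h])
    return edges


# Nominal construct "lbAs AlErs" (clothes of wedding) with an illustrative frequency.
print(syntactic_relations({("lbAs", "NC", "Ers", False): 0.05}))
```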

These types of relations are useful for syntactic disambiguation, as we explain in section 3.3.2. Indeed, a composite NP is chosen if each of its components is correlated with the other, based on the frequencies defined in formulae (5) and (6).

3.2. Knowledge learning

Initially, contextual relations are computed from the unambiguous elements of all sentences in the corpus. In addition, the titles and subtitles of the documents are manually disambiguated. Their terms represent only a small fraction of the corpus, but they are the most important entities reflecting the sense of the documents.

Each contextual relation is composed of a term (ti) and a context (cj). The context consists of a relation (sy, ns_head, ns_expansion, or Sup) and a term. The contextual relations are seen as a possibilistic network which links terms to their contexts. As in Bayesian networks, the graph structure encodes sets of dependence relations (Benferhat et al. 2002).
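The “Sup” edges of formula (4), with the averaging over repeated co-occurrences described in section 3.1.2, can be gathered into a term-to-context map that plays the role of the structural part of such a network. This is a sketch under stated assumptions: each node is given as (level, parent id, {term: Freq(term, node)}), every ancestor on the path to the root is visited, and all names are hypothetical.

```python
from collections import defaultdict


def sup_relations(nodes):
    """Collect averaged "Sup" edges following formula (4).

    nodes: dict node_id -> (level, parent_id or None, {term: Freq(term, node)})
    Returns: dict (t_j, ("Sup", t_i)) -> weight, where t_i occurs in an ancestor of t_j's node.
    """
    samples = defaultdict(list)
    for child_level, parent_id, child_terms in nodes.values():
        ancestor_id = parent_id
        while ancestor_id is not None:  # walk every ancestor on the path to the root
            anc_level, anc_parent, anc_terms = nodes[ancestor_id]
            for t_i in anc_terms:
                for t_j, freq in child_terms.items():
                    if t_i != t_j:
                        samples[(t_j, ("Sup", t_i))].append(freq / (anc_level - child_level))
            ancestor_id = anc_parent
    # Several occurrences of the same pair are averaged, as described in section 3.1.2.
    return {edge: sum(ws) / len(ws) for edge, ws in samples.items()}


nodes = {
    "D": (3, None, {"zwAj": 1.0}),   # main title
    "N1": (2, "D", {"lbAs": 0.1}),   # section title
    "P1": (1, "N1", {"rjl": 0.05}),  # paragraphs (illustrative frequencies)
    "P2": (1, "N1", {"rjl": 0.15}),
}
print(sup_relations(nodes))
```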

Let us take the example of the document entitled "الزواج" (AlzwAj: marriage), already disambiguated (cf. figure 1). Let us also suppose that the whole document contains 100 terms. The node N1, entitled "لباس العرس" (lbAs AlErs: clothes of wedding), contains 20 terms. The term "لباس" (lbAs: clothes) occurs twice in N1, while the terms "الرجل" (Alrjl: the man) and "لباس الرجل" (lbAs Alrjl: clothes of the man) appear only once in the document.

Figure 1. Example of disambiguated Arabic document and its translation.

We compute the frequencies of terms within the quantitative and qualitative approaches as in Table 1.

| Frequency | Quantitative approach | Qualitative approach |
|---|---|---|
| Freq("لباس", N1) | (1+1)/20 = 0.1 | (2*1+1)/20 = 0.15 |
| Freq("عرس", N1) | (1+1)/20 = 0.1 | (2*1+1)/20 = 0.15 |
| Freq("لباس العرس", N1) | 1/20 = 0.05 | (1*2)/20 = 0.1 |
| Freq("رجل", N1) | 1/20 = 0.05 | 1/20 = 0.05 |
| Freq("لباس الرجل", N1) | 1/20 = 0.05 | 1/20 = 0.05 |
| Freq("لباس", D) | (1+1)/100 = 0.02 | (2*1+1)/100 = 0.03 |
| Freq("عرس", D) | 1/100 = 0.01 | (1*2)/100 = 0.02 |
| Freq("لباس العرس", D) | 1/100 = 0.01 | (1*2)/100 = 0.02 |
| Freq("رجل", D) | 1/100 = 0.01 | 1/100 = 0.01 |
| Freq("لباس الرجل", D) | 1/100 = 0.01 | 1/100 = 0.01 |

Table 1. Frequencies of terms for the document of figure 1.

We note that the superiority relation (“Sup”) between "زواج" (zwAj: marriage) and "لباس" (lbAs: clothes) occurs twice. This is why we computed the average of the weights of the two occurrences. In the last four lines of the table, “NC” stands for “nominal construct.” The initial contextual relations and possibility distributions are used to process the remaining sentences of the corpus. They are updated incrementally as these sentences are disambiguated.

Figures 2 and 3 represent the quantitative and qualitative networks learned from this document.

The graph represents contextual knowledge by means of weighted edges: for each edge, the source represents a context for the destination. In these figures, the dashed lines correspond to superiority relations. The edges of this type may be seen as a tree where the most generic term is at the root (in this case, "زواج" (zwAj: marriage)). The continuous and dotted lines represent “NC_head” and “NC_expansion,” respectively. The weights of the edges represent possibility distributions, which are equal to the frequencies computed in table 1. For example, we note in figure 2 that π([Sup, "عرس"] | "لباس") = 0.1, which means that the term "لباس" (lbAs: clothes) appears in a context composed of the “Sup” relation and the term "عرس" (Eurs: wedding) with a weight equal to 0.1.

The symmetric syntactic relations of section 3.1.3 are given by formula (5):

$$\forall\, T = (t_1, sy, t_2):\quad R(t_1, [sy, t_2]) = R(t_2, [sy, t_1]) = \mathrm{Freq}(T) \tag{5}$$

3.3 Knowledge inference

The contextual knowledge encoded in possibilistic networks is exploited to disambiguate the remaining noun phrases and to evaluate their termhood and unithood in order to compute their domain relevance. Before presenting our formulae with examples, we recall the possibilistic matching model used to compute the relevance of morphosyntactic solutions.

Figure 2. The quantitative network of contextual relations extracted from the document of figure 1.

Figure 3. The qualitative network of contextual relations extracted from the document of figure 1.


This model was initially proposed in the field of information retrieval. We suppose that there is a query Q composed of a set of items which represent constraints. We take the general case where these items are weighted:

$$Q = [(t_1, w_1), (t_2, w_2), \ldots, (t_m, w_m)]$$

where wi is the weight of the term ti. The degree of possibilistic relevance (DPR) of a document (Dj) given the query (Q) is computed from two measures, possibility (Π) and necessity (N):

$$\mathrm{DPR}(D_j) = \Pi(D_j \mid Q) + N(D_j \mid Q) \tag{7}$$

According to Elayeb et al. (2009), Π(Dj | Q) is proportional to:

$$\Pi'(D_j \mid Q) = \mathrm{Freq}_{1j} \cdot w_1 \cdots \mathrm{Freq}_{mj} \cdot w_m \tag{8}$$

The necessity of Dj for the query Q, denoted N(Dj | Q), is computed as follows:

$$N(D_j \mid Q) = 1 - \left[\left(1 - \frac{\phi_{1j}}{w_1}\right) \cdots \left(1 - \frac{\phi_{mj}}{w_m}\right)\right] \tag{9}$$

where:

$$\phi_{ij} = \log_{10}\!\left(\frac{|D|}{nD_i}\right) \cdot \mathrm{Freq}_{ij} \tag{10}$$

In this formula, |D| is the number of documents and nDi is the number of documents containing the term ti (i.e., Freqij > 0).

In our case, each term of the query is a contextual constraint represented by a relation and a term (e.g., [Sup, "عرس"]). The documents are the morphosyntactic solutions to be weighted (e.g., "لباس" (lbAs: clothes)). The frequencies are the weights of the edges linking terms in the possibilistic network.

3.3.1 Termhood evaluation

This measure weighs a candidate term according to the structural context. Given a lemma of a simple noun or a composite NP which appears in a given node (n), a query (Q) is composed of all the terms which appear in the path linking n to the root. These query terms are weighted according to the difference of level between the corresponding nodes (cf. section 3.3.4 for an example of query). The termhood of a term T is given by:

$$\mathrm{Termhood}(T) = \mathrm{DPR}(T \mid Q) \tag{11}$$
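Formulas (7) to (10) can be sketched as follows. Each function takes the per-constraint values of one candidate solution; the sketch computes the unnormalized Π′ of formula (8) as a plain product, since the text only states that Π is proportional to it, and leaves any normalization to the caller. The function names are our own. With φ = 0.015 for every constraint and the weights (1, 1, 1, 0.5) of the example in section 3.3.4, necessity() returns approximately 0.073, the necessity value reported there.

```python
import math


def phi(freq, n_docs, n_docs_with_term):
    """phi_ij = log10(|D| / nD_i) * Freq_ij   (formula 10)."""
    return math.log10(n_docs / n_docs_with_term) * freq


def possibility(freqs, weights):
    """Unnormalized Pi'(D_j | Q) = prod_i Freq_ij * w_i   (formula 8)."""
    result = 1.0
    for freq, weight in zip(freqs, weights):
        result *= freq * weight
    return result


def necessity(phis, weights):
    """N(D_j | Q) = 1 - prod_i (1 - phi_ij / w_i)   (formula 9)."""
    product = 1.0
    for p, weight in zip(phis, weights):
        product *= 1 - p / weight
    return 1 - product


def dpr(pi, n):
    """DPR(D_j) = Pi(D_j | Q) + N(D_j | Q)   (formula 7); Termhood(T) = DPR(T | Q) (formula 11)."""
    return pi + n


print(round(necessity([0.015] * 4, [1, 1, 1, 0.5]), 3))  # 0.073
```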

3.3.2 Unithood evaluation

This measure is used to evaluate NPs by computing the degree of dependency between their constituents. Given a candidate NP $T$ composed of two terms $t_1$ and $t_2$ and a syntactic relation $s$, we compute its unithood as follows:

$$ \mathrm{Unithood}(T) = \begin{cases} \mathrm{DPR}(t_1|[s, t_2]) \cdot \mathrm{DPR}(t_2|[s, t_1]) & \text{if } s \text{ is symmetric} \\ \mathrm{DPR}(t_1|[s_{expansion}, t_2]) \cdot \mathrm{DPR}(t_2|[s_{head}, t_1]) & \text{otherwise} \end{cases} \tag{12} $$

This measure considers that the two constituents are linked if each of them is relevant for the other. That is why we compute the product of the two relative DPRs.

3.3.3 The possibilistic domain relevance

The possibilistic domain relevance (PDR) of a simple noun is equal to its possibilistic termhood:

$$ \mathrm{PDR}(t) = \mathrm{Termhood}(t) $$

The PDR of a composite NP is equal to the product of the two dimensions:

$$ \mathrm{PDR}(t) = \mathrm{Termhood}(t) \cdot \mathrm{Unithood}(t) $$

Terms which have a non-null DPR are considered as DRTs.

3.3.4 Example of disambiguation

Let us consider the document in figure 4. It is the document in figure 1 to which we added the word "المزخرف" (Almzxrf). We consider that the expression "لباس الرجل المزخرف" (lbAs Alrjl Almzxrf) is ambiguous. To simplify the calculus, we assume that this word has only one possible lemma (i.e., "مزخرف" (muzaxraf: decorated)). In this case, we do not know whether this adjective is linked to the word "الرجل" (Alrjl) or to the expression "لباس الرجل" (lbAs Alrjl).

Figure 4. Example of ambiguous document and its translation.

Morphological disambiguation: we disambiguate the word "الرجل" (Alrjl), which has two possible lemmas (i.e., "رجل" (rajul: man) and "رجل" (rijl: foot)). We use the structural information through the following query:


$$ Q = ([Sup, \text{لباس}], 1)\,([Sup, \text{عرس}], 1)\,([Sup, \text{لباس العرس}], 1)\,([Sup, \text{زواج}], 0.5) \tag{13} $$

The weight of the term "زواج" (zwAj: marriage) in this query is 0.5 because the difference of level between the two nodes is 2. We compute the DPR of each solution using the weights of the edges of the possibilistic network. For the lemma "رجل" (rajul: man), by applying formula (8), we obtain:

$$ \Pi(\text{رجل}|Q) = \pi([Sup,\text{لباس}]\,|\,\text{رجل}) \cdot 1 \cdot \pi([Sup,\text{عرس}]\,|\,\text{رجل}) \cdot 1 \cdot \pi([Sup,\text{لباس العرس}]\,|\,\text{رجل}) \cdot 1 \cdot \pi([Sup,\text{زواج}]\,|\,\text{رجل}) \cdot 0.5 = 0.175 $$

According to (9), we have:

$$ N(\text{رجل}|Q) = 1 - \left[\left(1 - \frac{\varphi_{1j}}{1}\right)\left(1 - \frac{\varphi_{2j}}{1}\right)\left(1 - \frac{\varphi_{3j}}{1}\right)\left(1 - \frac{\varphi_{4j}}{0.5}\right)\right] = 1 - \left[\left(1 - \frac{0.015}{1}\right)\left(1 - \frac{0.015}{1}\right)\left(1 - \frac{0.015}{1}\right)\left(1 - \frac{0.015}{0.5}\right)\right] = 0.073 $$

According to (7) and (11), we obtain:

$$ \mathrm{Termhood}(\text{رجل}) = \mathrm{DPR}(\text{رجل}) = 0.073 + 0.175 = 0.248 $$

In the same manner, for the lemma "رجل" (rijl: foot), we have $\Pi(\text{رجل}|Q) = 0$, $N(\text{رجل}|Q) = 0$ and $\mathrm{Termhood}(\text{رجل}) = \mathrm{DPR}(\text{رجل}) = 0$. In this case, the possibilistic calculus allowed us to select the correct lemma for the word "الرجل".
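The necessity and termhood values above can be checked numerically. The Python sketch below takes $\varphi = 0.015$ for each of the four constraints, as in the substitution above, and the reported $\Pi = 0.175$, since the individual edge weights $\pi$ are not listed. The weighting $w = 1/d$ for a level difference $d$ is a hypothetical rule chosen only because it reproduces the weights of query (13); the text states only that weights depend on the level difference.

```python
import math


def query_weight(level_difference: int) -> float:
    # Hypothetical rule: weight 1 at level difference 1, 0.5 at level difference 2.
    return 1.0 / level_difference


def necessity(phis, weights):
    # Formula (9): N = 1 - prod(1 - phi_i / w_i)
    return 1.0 - math.prod(1.0 - p / w for p, w in zip(phis, weights))


# Query (13): three constraints with weight 1 (level difference taken as 1)
# and the constraint on "zwAj" (marriage), two levels up, with weight 0.5.
weights = [query_weight(d) for d in (1, 1, 1, 2)]
phis = [0.015] * 4
pi_rajul = 0.175  # possibility reported in the text for the lemma rajul

n_rajul = necessity(phis, weights)
termhood_rajul = pi_rajul + n_rajul  # formulas (7) and (11)
print(f"N = {n_rajul:.3f}, Termhood = {termhood_rajul:.3f}")  # N = 0.073, Termhood = 0.248
```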

Syntactic disambiguation: for the expression "لباس الرجل المزخرف" (lbAs Alrjl Almzxrf), we have to decide whether we should link the word "الرجل" (Alrajul: the man) to the word "لباس" (lbAs: clothes) (i.e., we obtain a nominal construct) or to the word "المزخرف" (Almuzaxraf: decorated) (i.e., we obtain an adjectival phrase). These two relations are non-symmetric.
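Formula (12) and the PDR definitions of section 3.3.3 reduce to a few products once the directional DPRs are known. The following Python sketch assumes those DPRs are precomputed (for instance with formulas (7) to (10)) and stored in a lookup table; the function names and the transliterated keys (lbAs for "لباس", rjl for "رجل") are illustrative choices, not part of the original system. The two values of 0.01 and the termhood of 0.248 are those reported for "لباس الرجل" in the calculation that follows.

```python
from typing import Dict, Optional, Tuple

Constraint = Tuple[str, str]  # (relation, term), e.g. ("NC_head", "lbAs")


def unithood(dpr, t1: str, t2: str, s: str, symmetric: bool) -> float:
    """Formula (12) for a candidate NP made of t1, t2 and the relation s."""
    if symmetric:
        return dpr(t1, (s, t2)) * dpr(t2, (s, t1))
    return dpr(t1, (s + "_expansion", t2)) * dpr(t2, (s + "_head", t1))


def pdr(termhood: float, unithood_score: Optional[float] = None) -> float:
    """Section 3.3.3: termhood alone for a simple noun, termhood * unithood for an NP."""
    return termhood if unithood_score is None else termhood * unithood_score


# Directional DPRs for the nominal construct "lbAs Alrjl" (t1 = lbAs, the head;
# t2 = rjl, the expansion). Any pair not listed is treated as DPR = 0.
table: Dict[Tuple[str, Constraint], float] = {
    ("lbAs", ("NC_expansion", "rjl")): 0.01,
    ("rjl", ("NC_head", "lbAs")): 0.01,
}


def lookup(term: str, constraint: Constraint) -> float:
    return table.get((term, constraint), 0.0)


u = unithood(lookup, "lbAs", "rjl", "NC", symmetric=False)
print(round(u, 6), round(pdr(0.248, u), 7))
```

Any directional DPR missing from the table counts as 0, so a candidate for which no such DPR is recorded gets a unithood, and therefore a PDR, of 0.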

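As far as termhood is concerned, we obtain the same results as in morphological disambiguation. That is:

$$ \mathrm{Termhood}(\text{لباس الرجل}) = 0.248 \qquad \mathrm{Termhood}(\text{الرجل المزخرف}) = 0 $$

According to (12), we have:

$$ \mathrm{Unithood}(\text{لباس الرجل}) = \mathrm{DPR}(\text{رجل}\,|\,[NC_{head}, \text{لباس}]) \cdot \mathrm{DPR}(\text{لباس}\,|\,[NC_{expansion}, \text{رجل}]) $$

$$ \mathrm{DPR}(\text{رجل}\,|\,[NC_{head}, \text{لباس}]) = \Pi(\text{رجل}\,|\,[NC_{head}, \text{لباس}]) + N(\text{رجل}\,|\,[NC_{head}, \text{لباس}]) = 0 + 0.01 = 0.01 $$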

$$ \mathrm{DPR}(\text{لباس}\,|\,[NC_{expansion}, \text{رجل}]) = \Pi(\text{لباس}\,|\,[NC_{expansion}, \text{رجل}]) + N(\text{لباس}\,|\,[NC_{expansion}, \text{رجل}]) = 0.01 + 0 = 0.01 $$

$$ \mathrm{Unithood}(\text{لباس الرجل}) = 0.01 \cdot 0.01 = 0.0001 $$

In the same manner, we have Unithood("الرجل المزخرف") = 0. Finally, we have PDR("لباس الرجل") = 0.248 × 0.0001 = 0.0000248 and PDR("الرجل المزخرف") = 0. As a result, we select the correct solution.

4.0 Experimental results

The general context of our work is a project which aims to organize documents of Arabic stories as socio-semantic maps. In this work, we are interested in the semantic axis. Our experiments in this paper constitute the first step toward the semantic representation of Arabic stories. Section 4.1 gives further information about this corpus. In section 4.2, we present our evaluation methodology, which consists of two methods of validation. We apply these methods to our corpora in sections 4.3 and 4.4, respectively.

4.1. The corpus

The corpora used in the experiments are built from six encyclopedic books of Arabic stories grouped by theme. Story collectors grouped stories which correspond to the same domain of interest in the same chapter to facilitate their study and interpretation. Because of this structure, these books have been the subject of many works in computer and information sciences. They were studied in terms of reliability (Ghazizadeh et al. 2008; Bounhas et al. 2010). Being organized by theme, they constitute a good corpus for testing classification and clustering approaches (e.g., Al-Kabi and Al-sinjilawi 2007). They were also exploited as a corpus for testing IR systems (e.g., Harrag et al. 2009).

We can classify the manner of knowledge organization in books of Arabic stories as “rationalist,” since the collectors relied on a logical thematic division (Mai 2008). However, there are some differences among the classifications of the different books. Even so, we can distinguish a set of bounded domains of interest. We compile a consensual classification from the titles of the chapters of the different books, which constitute cover terms. Nevertheless, we preserve the internal classification of the chapters of the different books.


Consequently, the stories belonging to the same domain of interest may be classified into sub-domains according to many points of view corresponding to the different collectors.

The whole corpus contains more than 2.5 million words and more than 95,000 fragments (titles and paragraphs). We started by analyzing the structure of these books to extract the different themes and sub-themes using our document analyzer (Bounhas and Slimani 2009a). This paper presents experiments on three corpora corresponding to the domains of interest “marriage” ("الزواج": AlzawAj), “drinks” ("الأشربة": Alachriba), and “purification” ("الطهارة": AlTahAra). Table 2 presents statistics about each domain.

The size of our corpus is comparable to that of other research works in the field. For example, MADA was tested on a corpus of approximately 51K words. Diab et al. (2004) tested their POS tagger with 400 sentences. Manual evaluation of the output of a morphological analyzer or a POS tagger is hard and time-consuming. Approaches which do not perform full parsing may be evaluated on larger corpora. For example, Boulaknadel et al. (2008) evaluated their MWT extractor on a corpus containing 475,148 words. Unfortunately, there are no tokenized specialized corpora for the Arabic language. Consequently, we were obliged to build our own corpus.

4.2. The methodology of evaluation

The evaluation of knowledge extraction and IR systems is based on performance metrics. Precision, recall, and F-measure are commonly used to evaluate system performance (Rosemblat and Graham 2006). Evaluation assumes that there exists an ideal set which the system is supposed to retrieve. The three metrics are defined as follows. Precision is the percentage of elements retrieved by the system which are also in the ideal set. Recall is the percentage of elements in the ideal set that were retrieved by the system. The F-measure is given by:

$$ \text{F-measure} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \tag{14} $$
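As a small illustration of these definitions, the following Python sketch computes precision, recall and formula (14) for a set of extracted elements against an ideal set. The element names are placeholders, and the metrics are returned as fractions rather than percentages.

```python
from typing import Set, Tuple


def evaluate(retrieved: Set[str], ideal: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure (formula 14) against an ideal set."""
    hits = len(retrieved & ideal)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(ideal) if ideal else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: 4 extracted terms, 3 of which are in a 6-term reference list.
extracted = {"t1", "t2", "t3", "t4"}
reference = {"t1", "t2", "t3", "t5", "t6", "t7"}
print(evaluate(extracted, reference))  # (0.75, 0.5, 0.6)
```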

Because it is hard to define the ideal set, evaluation remains challenging, which limits the development of KOS. Evaluating these environments is necessary to validate the theoretical assumptions and the resources thus built. Unfortunately, no gold standards have been developed to assess and compare different approaches in the field. Such standards may be provided directly or obtained through validation by a human expert (Pazienza et al. 2005). In some cases, one can find domain knowledge organized as reference lists which may be used to evaluate system performance automatically (Martínez-Santiago et al. 2002). A reference list may also be built by a human expert who examines the corpus and extracts valid elements. When reference lists are unavailable, one can opt for the validation method, where an expert validates the extracted ontologies element by element (e.g., Missikoff et al. 2003; Al-Qabbany et al. 2009). This approach is time-consuming. Also, human intervention is influenced by subjectivity and personal interpretation of terms. Finally, a terminological resource may be evaluated in the context of IR tasks. In this case, the goal is to check whether the resource improves the performance of IR systems in terms of document retrieval.

To our knowledge, no gold standards have been developed to validate Arabic terminologies in the three considered domains. That is why we were obliged to build reference lists manually. An expert analyzes the corpora, starting with the titles of levels 1 and 2. Because many steps in this process are manual, the quality of evaluation is influenced by subjectivity.

Statistic | Drinks | Marriage | Purification | Total
Number of titles of level 1 | 1 | 1 | 10 | 12
Number of titles of level 2 | 200 | 444 | 745 | 1389
Number of paragraphs | 1897 | 3038 | 6130 | 11065
Number of words in level 1 (% of the corpus words) | 1 (0.003%) | 1 (0.002%) | 131 (0.122%) | 133 (0.069%)
Number of words in level 2 (% of the corpus words) | 1165 (3.605%) | 2669 (4.965%) | 3618 (3.379%) | 7452 (3.859%)
Number of words in paragraphs (% of the corpus words) | 31154 (96.392%) | 51082 (95.033%) | 103309 (96.498%) | 185545 (96.073%)
Total number of words | 32320 | 53752 | 107058 | 193130

Table 2. Statistics about fragments and terms in the three corpora.


Nevertheless, we argue that the extracted lists may be used as reference models for comparing different approaches to term extraction. Even so, we do not consider these lists an optimal means to assess our system. To avoid this impasse and improve our assessment, we evaluate the extracted terminologies in an information retrieval system. In this step, the domain terminology is considered as a query which is supposed to retrieve the domain-relevant documents. The terminologies are assessed iteratively. In each iteration, the N top DRTs are used to query the whole corpus. We evaluate the results in terms of precision, recall, and F-measure.

Both methods of evaluation are employed to compare three approaches. In the first one, we adopt the morphological solution chosen by MADA. Then we use TF-IDF to evaluate termhood. Finally, we employ LLR to choose the syntactic solutions and evaluate unithood. This score achieved the best results in other studies (Bounhas and Slimani 2009b). The second and the third approaches use, respectively, the quantitative and qualitative possibilistic settings for morphosyntactic disambiguation, termhood, and unithood evaluation. In the following sections, we present the results of evaluation with the two methods, designated respectively “expert validation” and “system validation.”

4.3. Expert validation

In this method of evaluation, we compare the list of terms returned by our system to the reference list proposed by the expert. Figures 5, 6, and 7 present curves of precision versus recall for the three domains, respectively. In the three domains, the possibilistic approach improved term extraction compared to the probabilistic one (MADA + TF-IDF + LLR). This implies that domain relevance is related not only to the distribution of terms in corpora, but also to complex contextual relationships linking terms. Moreover, the qualitative approach achieved better results than the quantitative one. This means that terms are ranked better when their frequencies are computed according to their positions in the structure of the document.

Figure 5. The curves of precision vs. recall for the domain of drinks.

Figure 6. The curves of precision vs. recall for the domain of marriage.

Figure 7. The curves of precision vs. recall for the domain of purification.
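The baseline's two statistical scores are only named in the text. For a concrete picture, the following Python sketch shows one common TF-IDF variant and Dunning's (1994) log-likelihood ratio, computed as G² over a 2x2 contingency table of word-pair counts. How the baseline actually counted pairs and normalised TF-IDF is not stated, so both functions are illustrative assumptions rather than the configuration used in the experiments.

```python
import math


def tf_idf(freq: int, n_docs: int, doc_freq: int) -> float:
    """One common TF-IDF variant: raw frequency times log10(|D| / df)."""
    if doc_freq == 0:
        return 0.0
    return freq * math.log10(n_docs / doc_freq)


def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's log-likelihood ratio (G^2) for a 2x2 table of word pairs.

    k11: pairs (w1, w2); k12: w1 followed by a word other than w2;
    k21: a word other than w1 followed by w2; k22: all remaining pairs.
    """
    table = ((k11, k12), (k21, k22))
    total = k11 + k12 + k21 + k22
    row_sums = (k11 + k12, k21 + k22)
    col_sums = (k11 + k21, k12 + k22)
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed == 0:
                continue  # the term 0 * log(0) is taken as 0
            expected = row_sums[i] * col_sums[j] / total
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


print(tf_idf(3, 100, 10))             # 3.0
print(round(llr(10, 10, 10, 10), 3))  # 0.0 (counts consistent with independence)
print(round(llr(10, 0, 0, 10), 3))    # 27.726 (strong association)
```

A higher LLR means the two words co-occur more often than independence would predict, which is why it can serve as a unithood score for candidate NPs.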

We can study the impact of the structure more precisely by analyzing the distribution of domain relevant terms across the different levels of the hierarchy. Table 3 presents the percentages of relevant terms which occur only in headings, only in paragraphs, and in both, for the three domains.

Domain | Only in headings | Only in paragraphs | In both
Drinks | 19.83% | 54.51% | 25.65%
Marriage | 16.13% | 57.45% | 26.42%
Purification | 12.73% | 52.08% | 35.19%

Table 3. Distribution of relevant terms in the three domains.

These statistics show the importance of headings in representing the meaning of documents. Indeed, headings represent only 3.927% of the words, yet 15.52% of the terms relevant to the three domains occur only in these fragments. This explains the improvement achieved by the qualitative approach.
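The 3.927% figure can be recomputed from the word counts in Table 2 by adding the words of level 1 and level 2 titles across the three corpora. The 15.52% figure cannot be recomputed the same way, because the per-domain numbers of relevant terms are not given; the sketch below covers only the first figure.

```python
# Word counts from Table 2: (level 1 titles, level 2 titles, paragraphs) per domain.
word_counts = {
    "drinks": (1, 1165, 31154),
    "marriage": (1, 2669, 51082),
    "purification": (131, 3618, 103309),
}

heading_words = sum(l1 + l2 for l1, l2, _ in word_counts.values())
total_words = sum(sum(counts) for counts in word_counts.values())
share = 100 * heading_words / total_words
print(heading_words, total_words, f"{share:.3f}%")  # 7585 193130 3.927%
```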

We also remark that our model for organizing contextual knowledge extracts better MWTs. Indeed, structural knowledge constitutes semantic features which help in the morphosyntactic disambiguation and interpretation of terms. To study this fact more precisely, we assessed the accuracy of MWT extraction in the three domains. Our results show that using the possibilistic approach instead of MADA + TF-IDF + LLR improves the F-measure of MWT extraction by 26.67% on average over the three domains, reaching an average value of 63.10%.

4.4. System validation

This method is applied twice for each domain. On the one hand, we employ all types of terms in the queries. On the other hand, we use only MWTs. Figures 8 and 9 show curves of F-measure versus the number of terms in the query (N) for the domain of purification within these experiments. We obtained similar curves for the two other domains. The curves show the improvement gained by adopting the possibilistic approach compared to the one based on TF-IDF and LLR. We also see the contribution of the qualitative approach compared to the quantitative one. We compute the average improvement of F-measure over the three domains as follows. By moving from “MADA + TF-IDF + LLR” to the quantitative possibilistic approach, we obtain improvements of 8.98% and 6.87% when using all terms and only MWTs, respectively. The qualitative approach yields improvements of 7.26% and 4.62% over the quantitative one in the all-terms and MWT-only experiments, respectively. This improvement shows, on the one hand, that our approaches extract better MWTs. On the other hand, it confirms results obtained for other languages which show that MWTs are important entities that may be used to index and query documents (Martínez-Santiago et al. 2002).

Figure 8. The curves of F-measure for the domain of purification (all terms).

Figure 9. The curves of F-measure for the domain of purification (MWTs).
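The iterative system validation described in section 4.2 (querying the corpus with the N top-ranked DRTs and measuring F-measure for each N) can be sketched as below. The text does not specify the retrieval model, so this Python sketch assumes a simple Boolean OR match: a fragment is retrieved if it contains at least one of the N query terms. The ranked terms, fragments and relevance judgments are hypothetical placeholders.

```python
from typing import Dict, List, Set, Tuple


def f_measure_curve(
    ranked_terms: List[str],
    fragments: Dict[str, Set[str]],
    relevant: Set[str],
) -> List[Tuple[int, float]]:
    """F-measure of a Boolean OR query built from the N top terms, for N = 1..len."""
    curve = []
    for n in range(1, len(ranked_terms) + 1):
        query = set(ranked_terms[:n])
        retrieved = {fid for fid, terms in fragments.items() if terms & query}
        hits = len(retrieved & relevant)
        precision = hits / len(retrieved) if retrieved else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        f = 2 * precision * recall / (precision + recall) if hits else 0.0
        curve.append((n, f))
    return curve


# Toy data: terms ranked by PDR, fragments represented as sets of extracted terms.
ranked = ["marriage", "wedding", "clothes"]
fragments = {
    "d1": {"marriage", "dowry"},
    "d2": {"wedding"},
    "d3": {"clothes", "water"},
    "d4": {"ablution"},
}
relevant = {"d1", "d2"}
for n, f in f_measure_curve(ranked, fragments, relevant):
    print(n, round(f, 3))  # 1 0.667, then 2 1.0, then 3 0.8
```

The curve rises while added terms retrieve relevant fragments and falls once they start retrieving fragments from other domains, which is the shape this kind of evaluation is meant to expose.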

By way of comparison, this method of evaluation was used by Larkey et al. (2002) to assess different stemming approaches on the TREC-2001 Arabic corpus. The maximum F-measure of the best stemmer (light8) is about 0.43. Harrag et al. (2009), who applied their IR system to the same corpus (i.e., Arabic stories), reached an average F-measure of 0.47. In our case, the F-measure reached 0.88, 0.83, and 0.73, respectively, for the three domains. It is hard to compare these works because they have different goals and use different corpora and/or queries. Besides, they treated documents as a single textual corpus, while we decomposed our corpus into several specific, semi-structured domain corpora. The much higher F-measure values obtained by our system are thus explained by the fact that the terms used in the queries are already attested (according to a given measure) as DRTs.

5.0 Conclusion and future work

The experimental results show the contribution of our approaches based on complex contextual relationships compared to state-of-the-art measures like TF-IDF and LLR used by Boulaknadel et al. (2008). This result demonstrates empirically that our model of organizing contextual knowledge based on the structure of documents has a great impact on the terminology extraction process. Consequently, the accuracy of our approach is related to the quality of the corpus. Indeed, the current Web contains more and more semi-structured documents, while existing systems mainly focus on text collections. To generalize our results, we should apply our approach in the general context of the Web. This will allow a better understanding of the relation between the structure and the accuracy of terminology extraction, and will also allow us to test our hypothesis on larger corpora. We should also recognize that the structure of Web documents is not necessarily hierarchical. One possible solution to be investigated is to consider types of relations other than superiority. This means that we would give a more detailed description of the structure. Weighting special parts of texts (like titles) more than other parts was a first approach to giving them different importance. Automatic annotation techniques are useful to give a more detailed structure to semi-structured documents and may be used by the writer or designer of a document as much as by its reader. More generally, the structure tends to highlight parts of a document. Adjoining a structure analyzer (such as the “micro-logical” analyzer developed by Bounhas and Slimani (2009a)) to our system should allow the recognition of the importance of particular parts of a document, thanks to the interpretation of rhetorical markers as well as of spatial organization, sizes, or styles applied to chunks of text.

Besides focusing on organizing and exploiting contextual knowledge, we were obliged to consider NLP-related tasks. The importance of NLP tools in knowledge organization tools was studied in many research works in the field (e.g., Ibekwe-Sanjuan and Sanjuan 2002; Jiang and Tan 2010). Consequently, we investigated problems specific to the Arabic language with a view to ontology construction. It is an attempt to introduce this language into ontology engineering environments.

Finally, our tools allow us to reorganize domain knowledge in an empiricist approach (Mai 2008). The generated network encodes dependency relations between terms which may be exploited to infer semantic relations and thus build a domain ontology. In this step, distributional analysis seems to be a promising solution (Bourigault and Lame 2002; Cohen and Widdows 2009).

References

Al-Kabi, Mohammed Naji and Al-sinjilawi, Saja I. 2007. A comparative study of the efficiency of different measures to classify Arabic texts. Journal of pure & applied sciences 4n2: 13-26.

Al-Qabbany, Abdulaziz, Al-Salman, AbdulMalik, and Almuhareb, Abdulrahman. 2009. An automatic construction of Arabic similarity thesaurus. In Karim Bouzoubaa and Abdelfettah Hamdani ed., Proceedings of the 3rd IEEE International Conference on Arabic Language Processing (CITALA2009) 4-5 May 2009 Rabat, Morocco. Morocco: IEEE Morocco Section, pp. 31-36.

Attia, Mohammed. 2008. Handling Arabic morphological and syntactic ambiguity within the LFG framework with a view to machine translation. Ph.D. thesis, University of Manchester, Faculty of Humanities, UK.

Attia, Mohamed, Rashwan, Mohsen, Ragheb, Ahmed, Al-Badrashiny, Mohamed, Al-Basoumy, Husein, and Abdou, Sherif. 2008. A compact Arabic lexical semantics language resource based on the theory of semantic fields. In Bengt Nordström and Aarne Ranta ed., Advances in Natural Language Processing: Proceedings of the 6th international conference on Advances in Natural Language Processing 25-27 August 2008, Gothenburg, Sweden. Berlin, Heidelberg: Springer-Verlag, pp. 65-76.

Benferhat, Salem, Dubois, Didier, Garcia, Laurent, and Prade, Henri. 2002. On the transformation between possibilistic logic bases and possibilistic causal networks. International journal of approximate reasoning 29: 135-73.

Boulaknadel, Siham. 2006. Utilisation des syntagmes nominaux dans un système de recherche d’information en langue arabe. In Proceedings of Conférence Francophone en Recherche d’Information et Applications (CORIA) 15-17 March 2006 Lyon, France, pp. 341-46. Available http://bach2.imag.fr/ARIA/publisparconf.php#3

Boulaknadel, Siham, Daille, Beatrice, and Aboutajdine, Driss. 2008. A multi-word term extraction program for Arabic language. In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis and Daniel Tapias ed., Proceedings of the 6th international Conference on Language Resources and Evaluation (LREC) 28-30 May 2008 Marrakech, Morocco. Paris: ELRA, pp. 1485-88.

Bounhas, Ibrahim and Slimani, Yahya. 2009a. A social approach for semi-structured document modeling and analysis. In Kecheng Liu ed., Proceedings of the International Conference on Knowledge Management and Information Sharing (KMIS) 6-8 October 2009 Funchal, Madeira, Portugal. INSTICC Press, pp. 95-102.

Bounhas, Ibrahim and Slimani, Yahya. 2009b. A hybrid approach for Arabic multi-word term extraction. In Proceedings of the IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE) 24-27 September 2009 Dalian, China. Piscataway, N.J.: IEEE Computer Society, pp. 429-36. DOI: 10.1109/NLPKE.2009.5313852. Available http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=5306518

Bounhas, Ibrahim, Elayeb, Bilel, Evrard, Fabrice, and Slimani, Yahya. 2010. Toward a computer study of the reliability of Arabic stories. Journal of the American Society for Information Science and Technology 61: 1686-1705.

Bourigault, Didier and Lame, Guiraude. 2002. Analyse distributionnelle et structuration de terminologie. Application à la construction d’une ontologie documentaire du Droit. Traitement automatique des langues 43: 129-50.

Broughton, Vanda, Hansson, Joacim, Hjørland, Birger, and Lopez-Huertas, Maria J. 2005. Knowledge organization. Chapter 7 in Leif Kajberg and Leif Lorring ed., European curriculum reflections on library and information science education. Copenhagen: Royal School of Library and Information Science, pp. 133-48.

Cohen, Trevor and Widdows, Dominic. 2009. Empirical distributional semantics: Methods and biomedical applications. Journal of biomedical informatics 42: 390-405.

Diab, Mona, Hacioglu, Kadri, and Jurafsky, Daniel. 2004. Automatic tagging of Arabic text: From raw text to base phrase chunks. In Julia Hirschberg ed., Proceedings of the 5th Meeting of the North American Chapter of the Association for Computational Linguistics/Human Language Technologies Conference (HLT-NAACL04) 2-7 May 2004 Boston, Massachusetts, USA. East Stroudsburg, PA: Association for Computational Linguistics, pp. 149-52.

Dunning, Ted. 1994. Accurate methods for the statistics of surprise and coincidence. Computational linguistics 19: 61-74.

Elayeb, Bilel, Evrard, Fabrice, Zaghdoud, Montaceur, and Ben Ahmed, Mohamed. 2009. Towards an intelligent possibilistic web information retrieval using multiagent system. The international journal of interactive technology and smart education (ITSE), Special issue: New learning support systems 6: 40-59.

Elkateb, Sabri, Black, William J., Vossen, Piek, Rodríguez, Horacio, Pease, Adam, Alkhalifa, Musa, and Fellbaum, Christiane. 2006. Building a WordNet for Arabic. In Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk and Daniel Tapias ed., Proceedings of the 5th Conference on Language Resources and Evaluation (LREC2006) 24-26 May 2006 Genoa, Italy. Paris: ELRA, pp. 29-34.

Ferret, Olivier, Grau, Brigitte, Hurault-Plantet, Martine, Illouz, Gabriel, Jacquemin, Christian, Monceaux, Laura, Robba, Isabelle, and Vilnat, Anne. 2002. How NLP can improve question answering. Knowledge organization 29: 135-55.

Ghazizadeh, Mehdi, Zahedi, M. Hadi, Kahani, Mohsen, and Bidgoli Minaei B. 2008. Fuzzy expert system in determining Hadith validity. Advances in computer and information sciences and engineering 354-59.

Habash, Nizar, Rambow, Owen, and Roth, Ryan. 2009. MADA + TOKAN: A toolkit for Arabic tokenization, diacritization, morphological disambiguation, POS tagging, stemming and lemmatization. In Khalid Choukri and Bente Maegaard ed., Proceedings of the 2nd International Conference on Arabic Language Resources and Tools (MEDAR) 22-23 April 2009 Cairo, Egypt. Cairo: MEDAR Consortium, pp. 102-9.

Hannan, Michael T., Pólos, László, and Carroll, Glenn R. 2007. Logics of organization theory: audiences, codes, and ecologies. Princeton: Princeton University Press.

Harrag, Fouzi, Hamdi-Cherif, Aboubekeu, Al-Salman, Abdul Malik S., and El-Qawasmeh, Eyas. 2009. Experiments in improvement of Arabic information retrieval. In Karim Bouzoubaa and Abdelfettah Hamdani ed., Proceedings of the 3rd IEEE International Conference on Arabic Language Processing (CITALA2009) 4-5 May 2009 Rabat, Morocco. Rabat: Mohammadia School of Engineers, pp. 71-81.

Ibekwe-Sanjuan, Fidelia and Sanjuan, Eric. 2002. From term variants to research topics. Knowledge organization 29: 181-97.

Jacquemin, Christian. 1997. Variation terminologique: Reconnaissance et acquisition automatiques de termes et de leurs variantes en corpus. H.Dr. Thesis in fundamental computer science. University of Nantes, France.

Jiang, Xing and Tan, Ah-Hwee. 2010. CRCTOL: A semantic-based domain ontology learning system. Journal of the American Society for Information Science and Technology 61: 150-68.

Larkey, Leah S., Ballesteros, Lisa, and Connell, Margaret E. 2002. Improving stemming for Arabic information retrieval: Light stemming and co-occurrence analysis. In Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval 11-15 August 2002, Tampere, Finland. New York, NY, USA: ACM, pp. 275-82.

Mai, Jens-Erik. 2008. Design and construction of controlled vocabularies: Analysis of actors, domain, and constraints. Knowledge organization 35: 16-29.

Malaisé, Véronique, Zweigenbaum, Pierre, and Bachimont, Bruno. 2003. Vers une combinaison de méthodologies pour la structuration de termes en corpus : Premier pas vers des ontologies dédiées à l’indexation de documents audiovisuels. In Widad Mustafa El Hadi ed., Actes du 4e Congrès ISKO France 3-4 July 2003 Grenoble, France. Paris: L’Harmattan, pp. 179-89.

Martínez-Santiago, Fernando, Díaz-Galiano, Manuel Carlos, Martín-Valdivia, Maite Teresa, Rivas-Santos, Víctor Manuel, and Ureña-López, Luis Alfonso. 2002. Using neural networks for multiword recognition in IR. In López-Huertas, M. J. ed., Challenges in knowledge representation and organization for the 21st century: Integration of knowledge across boundaries: Proceedings of the Seventh International ISKO Conference 10-13 July 2002 Granada, Spain. Advances in knowledge organization 8. Würzburg: Ergon, pp. 559-64.

Missikoff, Michele, Velardi, Paolo, and Fabriani, Paolo. 2003. Text mining techniques to automatically enrich a domain ontology. Applied intelligence 18: 323-40.

Pazienza, Maria Teresa, Pennacchiotti, Marco, and Zanzotto, Fabio Massimo. 2005. Terminology extraction: An analysis of linguistic and statistical approaches. In Spiros Sirmakessis ed., Knowledge mining series: Studies in fuzziness and soft computing. Berlin, Heidelberg: Springer, pp. 255-79.

Pinto, David, Rosso, Paolo, Benajiba, Yassine, Ahachad, Anas, and Jiménez-Salazar, Héctor. 2007. Word sense induction in the Arabic language: A self-term expansion based approach. In Adeeb Riad Ghonaimy ed., Proceedings of the 7th Conference on Language Engineering, The Egyptian Society of Language Engineering 5-6 December 2007 Cairo, Egypt. Cairo: Egyptian Society of Language Engineering, pp. 235-45.

Rodríguez, Horacio, Farwell, David, Farreres, Javi, Bertran, Manuel, Alkhalifa, Musa, and Martí, M. Antonia. 2008. Arabic WordNet: Semi-automatic extensions using Bayesian inference. In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis and Daniel Tapias ed., Proceedings of the 6th international Conference on Language Resources and Evaluation (LREC) 28-30 May 2008 Marrakech, Morocco. Paris: ELRA, pp. 1702-06.

Rosemblat, Graciela and Graham, Laurel. 2006. Cross-language search in a monolingual health information system: Flexible designs and lexical processes. In Gerhard Budin, Christian Swertz and Konstantin Mitgutsch ed., Knowledge organization for a global learning society: Proceedings of the Ninth International ISKO Conference 4-7 July 2006 Vienna, Austria. Advances in knowledge organization 10. Würzburg: Ergon-Verlag, pp. 173-82.

Salton, Gerard and McGill, Michael J. 1986. Introduction to modern information retrieval. New York, NY, USA: McGraw-Hill, Inc.

Souza, Renato Rocha and Raghavan, K.S. 2006. A methodology for noun phrase-based automatic indexing. Knowledge organization 33: 45-56.

Spradley, James P. 1979. The ethnographic interview. New York: Holt, Rinehart and Winston.

Zaidi, Soraya and Laskri, Mohamed Tayeb. 2005. A cross-language information retrieval based on an Arabic ontology in the legal domain. In Richard Chbeir, Albert Dipanda and Kokou Yétongnon ed., Proceedings of the 1st International Conference on Signal-Image Technology and Internet-Based System (SITIS) November 27 - December 1 2005 Yaounde, Cameroon. Dicolor Press, pp. 86-91.


Knowl. Org. 38(2011)No.6 M. Frické. Faceted Classification: Orthogonal Facets and Graphs of Foci?

491

Faceted Classification: Orthogonal Facets and Graphs of Foci?

Martin Frické

SIRLS, The University of Arizona, 1515 E First St., Tucson, USA, AZ85719

Martin Frické is an Associate Professor in SIRLS, the School of Information Resources and Library Science, at the University of Arizona, United States. His current research interest involves the intersection between the organization of knowledge, logic, and processing by computer.

Frické, Martin. Faceted Classification: Orthogonal Facets and Graphs of Foci? Knowledge Organization, 38(6), 491-502. 50 references.

ABSTRACT: Faceted classification is based on the core ideas that there are kinds or categories of concepts, and that compound, or non-elemental, concepts, which are ubiquitous in classification and subject annotation, are to be identified as being constructions of concepts of the different kinds. The categories of concepts are facets, and the individual concepts, which are instances of those facets, are foci. Usually, there are constraints on how the foci can be combined into the compound concepts. What is standard is that any combination of foci is permitted from kind-to-kind across facets, but that the foci within a facet are restricted in their use by virtue of being dependent on each other, either by being exclusive of each other or by bearing some kind of hierarchical relationship to each other. Thus faceted classification is typically considered to be a synthetic classification consisting of orthogonal facets which themselves are composed individually either of exclusive foci or of a hierarchy of foci. This paper addresses in particular this second exclusive-or-hierarchical foci condition. It evaluates the arguments for the condition and finds them not conclusive. It suggests that wider synthetic constructions should be allowed on foci within a facet.

Received 10 November 2010; Revised 1 April 2011; Accepted 26 May 2011

1.0 Two preliminary distinctions

In the realm of knowledge organization, faceting concerns the construction of compound, or complex, or non-elemental, concepts for the purposes of classification. But concepts can play either of two different roles in the organization of knowledge. A classification, i.e., a directed graph of concepts, can either be a classification of things, kinds, processes, and the like, or it can be a classification of subjects, a thematic Baconian ‘Tree of Knowledge’ (Bacon 1605, 1620). The former is what now would be called an ontology (Smith 2004), and it might be used, for example, for knowledge representation or database design. The latter, in its more general sense, is a directed acyclic graph (DAG) of topics or subjects, and it would be used for annotating or tagging information objects, as, for example, is the practice within librarianship with the Library of Congress Subject Headings (LCSH) (Broughton 2010b) or the Medical Subject Headings (MeSH) (Lowe 1994; MeSH 2010).

Imagine the owner of an antique shop who also writes books on antiques. The classification scheme that she uses for her antiques might be a synthetic construction from entirely separate individual classification schemes. She may invoke the concept Chair from a classification scheme for Furniture, the period 19th Century from a scheme for Periods, and French from a third scheme for Places, to make the synthesized classification concept ‘19th Century French Chair.’ The resulting fleshed-out collection of concepts may well be adequate to represent the knowledge, information, or data about the inventory of her shop. With some simple synthetic classifications, a natural data structure to represent the classified data is often a table or a relational database. In this case, the different kinds, Furniture, Periods, and Places, ideally would need to be orthogonal or independent (and the resulting database would be in Third Normal Form [Date 1977]). There still will be a need to capture and represent any hierarchies, the semantics, in the synthetic classification; for example, the Places kind might have Europe as a Place that encompasses France, Germany, etc.
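To make the table-plus-hierarchy idea concrete, here is a minimal Python sketch. The vocabularies, the inventory rows, and the names `PLACE_HIERARCHY`, `place_matches`, and `find` are hypothetical illustrations, not part of any published scheme: each row holds one value per orthogonal kind, and a small broader-to-narrower map supplies the Places semantics that the flat table cannot express.

```python
from itertools import product

# Hypothetical vocabularies for the three orthogonal kinds.
FURNITURE = ["Chair", "Table"]
PERIODS = ["18th Century", "19th Century"]
PLACES = ["France", "Germany"]

# Semantics a flat table cannot hold: broader place -> narrower places.
PLACE_HIERARCHY = {"Europe": {"France", "Germany"}}

# Orthogonality: every combination of one value per kind is a valid concept.
ALL_CONCEPTS = list(product(PERIODS, PLACES, FURNITURE))

# Hypothetical inventory: one row per item, one column per kind.
INVENTORY = [
    {"id": 1, "period": "19th Century", "place": "France", "furniture": "Chair"},
    {"id": 2, "period": "18th Century", "place": "Germany", "furniture": "Table"},
    {"id": 3, "period": "19th Century", "place": "Germany", "furniture": "Chair"},
]


def place_matches(query, place):
    """True if place is the queried place or sits directly beneath it."""
    return place == query or place in PLACE_HIERARCHY.get(query, set())


def find(period=None, place=None, furniture=None):
    """Ids of rows matching every supplied facet value (None means 'any')."""
    return [
        row["id"]
        for row in INVENTORY
        if (period is None or row["period"] == period)
        and (place is None or place_matches(place, row["place"]))
        and (furniture is None or row["furniture"] == furniture)
    ]


print(len(ALL_CONCEPTS))                        # 8
print(find("19th Century", "France", "Chair"))  # [1]
print(find(place="Europe", furniture="Chair"))  # [1, 3]
```

The table alone answers the first query; only the added hierarchy lets a query for Europe find both the French and the German chair.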

When the antique dealer writes books on antiques she might use exactly the same collection of classification concepts for antiques to indicate the topics of the books. So, for example, she might write a book on 19th Century French Chairs. The concept now is being used as an annotation. And annotations like these can be used to organize information objects, to organize knowledge. The annotation 19th Century Chairs identifies a broader or more general topic than 19th Century French Chairs, and thus relations like these can establish a DAG of topics.
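One way to see why such relations yield a DAG rather than a tree is to encode each compound topic as a set of (facet, focus) pairs and to call one topic broader than another when its pairs are a proper subset of the other's. This encoding and the function names are assumptions made for illustration; the sketch keeps only immediate broader-to-narrower edges.

```python
def topic(**foci):
    """A compound topic encoded as a frozenset of (facet, focus) pairs."""
    return frozenset(foci.items())


def broader(a, b):
    """a is broader than b if b specifies everything a does, and more."""
    return a < b  # proper subset


TOPICS = {
    "Chairs": topic(furniture="Chair"),
    "19th Century Chairs": topic(period="19th Century", furniture="Chair"),
    "French Chairs": topic(place="French", furniture="Chair"),
    "19th Century French Chairs": topic(
        period="19th Century", place="French", furniture="Chair"
    ),
}


def dag_edges(named):
    """Immediate broader -> narrower edges; transitive shortcuts are omitted."""
    edges = set()
    for n1, t1 in named.items():
        for n2, t2 in named.items():
            if broader(t1, t2) and not any(
                broader(t1, t3) and broader(t3, t2) for t3 in named.values()
            ):
                edges.add((n1, n2))
    return edges


for parent, child in sorted(dag_edges(TOPICS)):
    print(f"{parent} -> {child}")
```

Here 19th Century French Chairs ends up with two broader topics, 19th Century Chairs and French Chairs, which a strict tree could not represent.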

So, concepts can play double duty: they can organize things, and they can be the basis for topic annotation. One potential point of confusion is that classical librarians often make both uses of concepts, and they often use the latter, topic annotation, to accomplish the former, the organization of things. Brick-and-mortar libraries certainly have some commonalities with antique shops inasmuch as there are things, i.e., books, that need to be classified, listed in an inventory, and given a physical location. And, of course, librarians have done this. Most existing information object (IO) classification schemes use what an IO is about, its subject or topic, to classify what the item is. In the Dewey Decimal Classification, for example, a book on physics is classified differently from a book on chemistry, and the basis of this difference is that the books are about different subjects.

The second preliminary distinction arises from the ‘Triangle of Meaning.’

The ‘Triangle of Meaning’ is a phrase originating with Ogden and Richards’ The Meaning of Meaning (1972), first published in the 1920s. Shiyali Ranganathan (1937, 327), the pre-eminent modern theorist of librarianship, also used the distinction in 1937; he called the Symbol Vertex the ‘verbal plane’ and the Concept Vertex the ‘idea plane.’ In fact, the distinction goes back at least to Aristotle (De Interpretatione), and it received particular attention from the early 20th Century German philosopher Gottlob Frege (Tichy 1988; see also Almeida, Souza, and Fonseca 2011; Dahlberg 2009; Fugmann 2004).

The Triangle of Meaning makes distinctions, first between a concept and an expression or symbol or sign that names or identifies the concept, and then between the concept and the things it applies to or refers to. So, there might be the word ‘horse,’ the concept of horse, and those particular delightful creatures which fall under that concept, for example, Secretariat, Seabiscuit, Little Sorrel, Trigger, Silver, Black Beauty, etc. In this context, the word ‘concept’ gets used in much the same way as in ordinary speech and life, and that amounts roughly to ‘general notion’ or ‘general idea’ or even ‘meaning.’ Many describe concepts as being mental or mental constructions; however, it is better to regard them as abstractions or abstract objects. Among other virtues, the Triangle of Meaning gives a transparent account of synonyms and homographs (these are, respectively, many-to-one and one-to-many relations between symbols and concepts).
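Treating naming as a relation between symbols and concepts makes that account mechanical. The minimal Python sketch below assumes a list of (symbol, concept) pairs, with concepts written as capitalized identifiers purely to keep them visually apart from the strings that name them; the pairs and the function names are illustrative, not drawn from any real vocabulary.

```python
from collections import defaultdict

# Hypothetical naming relation: (symbol, concept) pairs.
NAMING = [
    ("horse", "HORSE"),
    ("steed", "HORSE"),
    ("bank", "RIVER_BANK"),
    ("bank", "FINANCIAL_BANK"),
]


def synonyms(pairs):
    """Concepts named by more than one symbol (many symbols to one concept)."""
    names = defaultdict(set)
    for symbol, concept in pairs:
        names[concept].add(symbol)
    return {concept: syms for concept, syms in names.items() if len(syms) > 1}


def homographs(pairs):
    """Symbols naming more than one concept (one symbol to many concepts)."""
    meanings = defaultdict(set)
    for symbol, concept in pairs:
        meanings[symbol].add(concept)
    return {symbol: cons for symbol, cons in meanings.items() if len(cons) > 1}


print(synonyms(NAMING))
print(homographs(NAMING))
```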

The Triangle of Meaning has significance here and now because, in the areas to be addressed, there typically is much back and forth between the Symbol and Concept vertices. For example, topic annotation is often discussed in terms of strings; so-called ‘tagging’ is free-vocabulary string annotation; LCSH and MeSH use ‘Headings,’ which are strings; all of this favors the Symbol Vertex over the Concept Vertex. Another example is thesauri. As Jean Aitchison, Alan Gilchrist, and David Bawden (2000, 1) write:

[a thesaurus is a] vocabulary of a controlled indexing language, formally organized so that a priori relationships between concepts are made explicit.

That is to say, a thesaurus gives a Symbol Vertex representation of Concept Vertex DAGs of topics (together with vocabulary control).

In sum, there is the need for a flexible awareness of classification and annotation, and of strings and concepts.

2.0 Introduction

Faceted classification is typically considered to be a synthetic classification consisting of orthogonal facets which themselves are composed individually either of exclusive foci or of a hierarchy of foci (Broughton 2004, 2006; Buchanan 1979; Gnoli 2008; La Barre 2006, 2010; Ranganathan 1959, 1967; Vickery 1960, 1966, 2008; Wilson 2006). There is a syntax and a semantics to the synthesis. Take a toy example. A syntax might consist of the vocabulary

14th Century [kind=Period], 19th Century [kind=Period], Renaissance [kind=Period], French [kind=Place], German [kind=Place],

and the grammar might deem a well-formed classification label or term to consist either of any single vocabulary item of any kind, or of a single vocabulary item of the kind Period followed by a single vocabulary item of the kind Place. And that would permit the synthesis of well-formed labels like ‘German’ or ‘14th Century French,’ but not labels like ‘French 14th Century’ (which would be ill-formed under this grammar, which requires that the Period comes before the Place). There would be a need also for a semantics: that is, a specification of how the labeled classes or types relate to each other (as subtypes, supertypes, instances, and the like). So, for example, the 14th Century type could be deemed to be a subtype of the Renaissance type.
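The toy grammar and its one semantic link are small enough to check by program. In the minimal Python sketch below, a label is a list of vocabulary items, and the dictionaries `VOCABULARY` and `SUPERTYPE` simply transcribe the example; the function names are illustrative assumptions, not a standard API.

```python
# The toy vocabulary: each item is tagged with its kind (facet).
VOCABULARY = {
    "14th Century": "Period",
    "19th Century": "Period",
    "Renaissance": "Period",
    "French": "Place",
    "German": "Place",
}

# The toy semantics: immediate subtype -> supertype links.
SUPERTYPE = {"14th Century": "Renaissance"}


def is_well_formed(label):
    """A single item of any kind, or a Period item followed by a Place item."""
    if not all(item in VOCABULARY for item in label):
        return False
    kinds = [VOCABULARY[item] for item in label]
    return len(kinds) == 1 or kinds == ["Period", "Place"]


def is_subtype(a, b):
    """True if a is b, or b is reached from a by following supertype links."""
    while a is not None:
        if a == b:
            return True
        a = SUPERTYPE.get(a)
    return False


print(is_well_formed(["German"]))                # True
print(is_well_formed(["14th Century", "French"]))  # True
print(is_well_formed(["French", "14th Century"]))  # False
print(is_subtype("14th Century", "Renaissance"))   # True
```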

The ‘kinds’ here are the facets, so the example has a Period facet and a Place facet. And a ‘focus’ is ‘any subject or name or number for it’ (to use Ranganathan’s [1967, 88] terminology). Then, to move on to the general case, to grammars beyond that of the toy example, the facets are usually required to be orthogonal or independent. This means that, when constructing a synthesized value, the choice of a focus from one facet has no repercussions whatsoever for combination with a focus from another facet. So, for example, the choice of 19th Century from the Period Facet neither compels nor excludes a particular choice from the Place Facet; it can be combined with either French or German. Within a facet, though, the foci are not typically assumed to be orthogonal or independent. In fact, they are assumed to be dependent. Choice of one focus precludes or affects choice of others. If, for example, French is chosen from the Place Facet, that choice prevents the additional choice of German; French cannot be combined with German. The foci for a facet are often talked of as being an array, or collection of arrays, of foci, from which one value, or one value from each, needs to be chosen. For example, Vanda Broughton conceives of the foci in a facet as being a collection of separate and individually exclusive arrays, often an enumerated Aristotelian hierarchy, with a choice of no more than one focus from each array (Broughton 2006). And Anthony Foskett and Travis Wilson think something very similar (A. C. Foskett 1996; Wilson 2006).
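Under the standard view just described, checking a proposed compound reduces to two rules: foci from different facets combine freely, and at most one focus is taken from any one array. The Python sketch below assumes a hypothetical `FACETS` table in which each facet is a list of arrays of mutually exclusive foci; it illustrates the condition the paper goes on to question and is not an implementation of any particular scheme.

```python
from itertools import product

# Hypothetical scheme: facet -> list of arrays, each array a set of exclusive foci.
FACETS = {
    "Period": [{"14th Century", "19th Century"}],
    "Place": [{"French", "German"}],
}


def is_admissible(foci):
    """Known foci only, and at most one focus from any single array."""
    chosen = set(foci)
    known = set().union(*(a for arrays in FACETS.values() for a in arrays))
    if not chosen <= known:
        return False
    return all(
        len(array & chosen) <= 1
        for arrays in FACETS.values()
        for array in arrays
    )


def all_compounds():
    """Orthogonality: each focus of one facet pairs with each focus of the others."""
    per_facet = [sorted(set().union(*arrays)) for arrays in FACETS.values()]
    return list(product(*per_facet))


print(is_admissible(["19th Century", "French"]))  # True
print(is_admissible(["French", "German"]))        # False
print(len(all_compounds()))                       # 4
```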

This paper addresses, in particular, this second exclusive-or-hierarchical foci condition. It evaluates the arguments for the condition and finds them not conclusive. It suggests that wider synthetic constructions should be allowed on foci within a facet.

Faceted classification is widespread nowadays, in the small, so to speak. In the large, there are probably only two examples of traditional classification schemes for Information Objects which are faceted at their core: Ranganathan’s Colon Classification (Ranganathan 1960) and the Bliss Bibliographical Classification of Mills, Broughton, and the Classification Research Group (Mills and Broughton 1977). Both these schemes recognize that there are kinds of concepts. Categorizing concepts is also the approach of many others (Austin 1984; Cheti and Paradisi 2008; A. C. Foskett 1996; Lambe 2007; Morville and Rosenfeld 2006; Slavic 2008; Vickery 1960, 1966; Willetts 1975).

3.0 Some background theory and nomenclature

In an Aristotelian-Linnaean hierarchy, say one that divides human into female and male, the items being classified are classified by the ‘leaves’ (Vickery 1975). So everything, every human that is, ends up being female or male (let us not worry about hermaphrodites, etc.). Such leaves have the JEPD property (jointly exhaustive, pairwise disjoint), i.e., the classification is exhaustive and exclusive. Nothing is classified, or classified directly, by the human node (because it is not a leaf). Certainly, the human node has instances, perhaps Sally, but Sally is classified or ‘cataloged’ by being female and inherits instantiation of being human thanks to the structure of the hierarchy. The human class and the female class are not exclusive because females are humans, and Sally, for one, is in both classes. So, in a hierarchy, the classes as a whole will not usually be mutually exclusive, but, typically, the leaves, which are the classes that do the cataloging work, will, or should, be mutually exclusive. Distinct from the relations between the classes are the names or labels or notation that the classes have, in this case {‘human,’ ‘female,’ ‘male’}. These names are all different one from another, but the fact that the names are different, ‘unique,’ does not mean that the classes that those names signify are exclusive (because, for instance, females are human). In a true Aristotelian-Linnaean hierarchy, the actual classification is done by the leaves, in this case {female, male}, and likely the names or terms will be different and the classes that those names signify, the leaves, will have the JEPD property and be mutually exclusive.
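The JEPD test can be stated as two checks on the classes that do the cataloging work: their union covers every item, and no item falls in two of them. The Python sketch below applies those checks to the human/female/male example; the hierarchy encoding, the second individual ‘Tom’, and the function names are illustrative assumptions.

```python
# Hypothetical encoding of the example: human is divided into female and male.
CHILDREN = {"human": ["female", "male"]}
# Cataloging happens at the leaves; 'Tom' is an added illustrative individual.
CATALOG = {"female": {"Sally"}, "male": {"Tom"}}
POPULATION = {"Sally", "Tom"}


def leaves(node):
    """All leaf nodes at or below node."""
    kids = CHILDREN.get(node, [])
    if not kids:
        return [node]
    return [leaf for kid in kids for leaf in leaves(kid)]


def extension(node):
    """Members of a class: everything cataloged at its leaves."""
    return set().union(*(CATALOG.get(leaf, set()) for leaf in leaves(node)))


def is_jepd(classes, universe):
    """Jointly exhaustive (union is the universe) and pairwise disjoint."""
    members = [m for cls in classes for m in extension(cls)]
    exhaustive = set(members) == set(universe)
    disjoint = len(members) == len(set(members))
    return exhaustive and disjoint


print(is_jepd(leaves("human"), POPULATION))              # True
print(is_jepd(["human", "female", "male"], POPULATION))  # False: Sally, for one, is in two classes
```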

But at least some librarian classification schemes are different from true Aristotelian-Linnaean classification hierarchies, principally in that they use some interior nodes for classification (in addition to the leaves). Here is a fragment of the Dewey Decimal Classification (DDC), around 820.

Some books are classified by the leaves (for example, Shakespeare’s Romeo and Juliet is going to be Literature-English-Drama, with classmark notation 822, and it will gain some further decimal digits in a full classification), and others are classified by the internal nodes (for example, John Keats’s The Works of John Keats [complete Poetry and selected Prose] will be Literature-English, 820).

That there are works, e.g., Keats, that are instances of internal nodes yet not instances of any of that node’s children means that the sibling children are not exhaustive as to the contents of their parent. In this case, the leaves are not exhaustive. And the leaves together with the interior nodes are not exclusive one from another, because instances of the children are instances of their parents. The leaves of DDC do not have the JEPD property, and neither do the leaves together with the interior nodes. But notice that the names used, or the notation numbers, are different (Literature-English 820, Literature-English-Drama 822). So, talking roughly, the names or terms are ‘exclusive,’ but the underlying classes are not.
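The same two checks expose the Dewey situation. In the Python sketch below, the classmarks 821, 823, and 824 (Poetry, Fiction, Essays) are filled in from the usual DDC layout around 820 as an illustrative assumption, `DIRECT` records where each work is classed, and a node’s extension is everything classed at it or beneath it. The function names are hypothetical.

```python
# Simplified fragment around DDC 820; 821, 823 and 824 are assumed for illustration.
CHILDREN = {"820": ["821", "822", "823", "824"]}
# Where each work is classed directly; Keats sits on an interior node.
DIRECT = {"Romeo and Juliet": "822", "The Works of John Keats": "820"}


def descendants(node):
    """The node itself plus everything beneath it."""
    found = {node}
    for kid in CHILDREN.get(node, []):
        found |= descendants(kid)
    return found


def extension(node):
    """Works classed at node or anywhere beneath it."""
    below = descendants(node)
    return {work for work, mark in DIRECT.items() if mark in below}


def children_exhaustive(node):
    """Do the children's extensions together cover the parent's extension?"""
    covered = set().union(*(extension(kid) for kid in CHILDREN[node]))
    return covered == extension(node)


def pairwise_disjoint(nodes):
    """Does no work fall under two of the given nodes?"""
    seen = set()
    for node in nodes:
        ext = extension(node)
        if seen & ext:
            return False
        seen |= ext
    return True


print(children_exhaustive("820"))                    # False: Keats is under no child
print(pairwise_disjoint(["820"] + CHILDREN["820"]))  # False: Romeo and Juliet is under 820 and 822
print(pairwise_disjoint(CHILDREN["820"]))            # True
```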

In the setting of classical librarianship, there is the need to produce a systematic, or linear, order from the hierarchy for shelving, bibliographic lists, and the like. This amounts to converting a hierarchical (classification) tree to a list. There are different algorithmic tree traversals that can do this, but typically the children of a node are considered ordered, and a shelving order can be generated by recursively visiting each node and then each of its children in turn (a preorder, depth-first traversal).
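A minimal Python sketch of that recursive traversal follows. The tree is a hypothetical, ordered rendering of part of the Dewey fragment, using the array {Poetry, Drama, Fiction, Essays} named later in the text; the traversal is an ordinary preorder walk, offered as one way to generate a shelf order rather than as Dewey’s own procedure.

```python
# Hypothetical ordered tree; each children list is in filing order.
TREE = {
    "Literature": ["Literature-English"],
    "Literature-English": ["Poetry", "Drama", "Fiction", "Essays"],
}


def systematize(node, depth=0):
    """Preorder traversal: emit a node, then each child's subtree in order."""
    order = [(depth, node)]
    for child in TREE.get(node, []):
        order.extend(systematize(child, depth + 1))
    return order


for depth, node in systematize("Literature"):
    print("  " * depth + node)
```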

So, for example, part of the Dewey fragment above could be ‘systematized’ to

(omitting American and German literature for simplification and clarity), and here are the corresponding Dewey classification numbers:

There is the important notion of an array. Unfortunately, there is an ambiguity in its use that we need to be clear about. Ordinarily, in this setting, an array just amounts to an ordered list of the children of a node. So, in the first example, the node ‘human’ is parent to the array {female, male}. In the second example, the node ‘Literature-English’ is parent to the array {Poetry, Drama, Fiction, Essays}. In this vein, Broughton (2004, 294), for example, offers the definition

array: a group of sub-classes all derived by applying the same principle of division to the containing class;

And Vickery (1975, 14) writes

Any one level of subdivision gives rise to a group of terms that constitute an array (for example, within the class of Metals the array Bismuth, Lithium, Mercury, Potassium, Sodium).

There are no class hierarchies in arrays like these simply because all the values in the array are siblings. If the first level of subdivision is itself further subdivided using a second principle of division, further arrays are generated; in fact, the new ones are what Ranganathan would have called arrays of order 2 (and this process can be continued indefinitely with more principles of division, arrays of order 3, etc.) (Ranganathan 1967). So, for example,

has three arrays {b,c}, {d,e}, and {f,g}. Assuming suitable principles of subdivision have been used, each of these arrays individually is exclusive as to the foci it contains. The arrays individually each have the JEPD property, and the two arrays (of order 2) together, which form the leaves, also collectively have the JEPD property.
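Grouping sibling lists by the division that produced them recovers the orders mechanically. Because the diagram itself is not reproduced here, the Python sketch below assumes a root `a` divided into b and c, with b divided into d and e and c into f and g, which is consistent with the three arrays just named; the function name is illustrative.

```python
# Hypothetical tree consistent with the arrays in the text; the root 'a' is assumed.
TREE = {"a": ["b", "c"], "b": ["d", "e"], "c": ["f", "g"]}


def arrays_by_order(root):
    """Map n -> the arrays (sibling groups) produced by the n-th division."""
    result = {}
    frontier = [root]
    order = 1
    while True:
        arrays = [TREE[node] for node in frontier if node in TREE]
        if not arrays:
            return result
        result[order] = arrays
        frontier = [child for array in arrays for child in array]
        order += 1


print(arrays_by_order("a"))  # {1: [['b', 'c']], 2: [['d', 'e'], ['f', 'g']]}
```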

But it is also common enough to include parent nodes and even grandchildren and great-grandchildren in arrays. For example, Douglas Foskett (2003, 1064), certainly an expert, describes this ‘array’ from the DDC:

370 Education

370.1 Theory

370.7 Study

370.71 Meetings

370.72 Conferences

370.73 Teachers’ Colleges

370.732 Courses and programs

Essentially, this is a slice of a systematization of Dewey, and it includes parents, children, grandchildren, etc.

In sum, classification is typically by leaves only, those leaves typically have the JEPD property, and if those leaves are described as being an ‘array’ or several arrays, those arrays also have the JEPD property so that values of the arrays are exclusive and exhaustive; if, in contrast, classification also includes interior nodes, the classifying nodes do not have the JEPD property, and if those classifying nodes are described as being an ‘array,’ that array does not have the JEPD property, and its values are not exclusive and exhaustive.

4.0 Ersatz faceting and real faceting

As mentioned, faceted classification can either be a classification of ‘things,’ an ontology, or a subject classification, a thematic ‘Tree of Knowledge.’

Chapter 21 of Book 8 of Pliny’s Natural History has the title (Pliny, 78):

21. Of Lynxes, Sphinges, Crocutes, Marmosets, of Indian Oxen, of Leucrocutes, of Eale, of the Ethiopian Bulls, of the Mantichora, the Unicorn, of the Catoblepa, and the Basilisk.

Of these ‘land animals that go on foot,’ some are Sphinges, and some are Crocutes, and no creature is both a Sphinge and a Crocute. That classification is ontological; it is part of a biological natural kind ontology. However, the subject matter of that chapter is both Sphinges and Crocutes (and some other land animals). Chapter 21 is polytopical. Thematic, topical, or subject classification accommodates this.

Broughton introduces faceted classification by means of an example involving physical socks (Broughton 2004, 2006). (She does this for pedagogical reasons.) In her example, the socks are items, things, each individually with five different attributes drawn from the ‘facets’ Color, Pattern, Material, Function, and Length; so there are black-striped-wool-work-ankle socks, white-striped-silk-work-knee socks, etc. Each of the individual facets is (or could be) an Aristotelian exclusive and exhaustive hierarchical classification scheme. And the entire scheme synthesizes the five facets. A similar kind of faceting is also often seen where different principles of subdivision are applied separately to the same underlying class. So, for example, people could be divided up By Age, By Gender, By Occupation, By Religion, By Place of Birth, etc.; and any of the facets, say By Occupation, could itself be a hierarchical scheme of foci or values. This kind of arrangement is commonplace on the Web. Such department stores as Amazon, Target, or Walmart often display their wares essentially as tables or grids or databases of orthogonal categories and provide navigation by means of hierarchical facets.
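Because the five sock facets are orthogonal, the space of possible sock kinds is simply their Cartesian product, and faceted navigation amounts to filtering on any subset of facet values. In the Python sketch below, the values ‘plain’ and ‘sport’ are assumptions added to round out the facets (the others appear in the text), and `navigate` and `label` are hypothetical helper names.

```python
from itertools import product

# Facet values: those named in the text plus 'plain' and 'sport' (assumed).
FACETS = {
    "Color": ["black", "white", "grey"],
    "Pattern": ["striped", "plain"],
    "Material": ["wool", "silk"],
    "Function": ["work", "sport"],
    "Length": ["ankle", "knee"],
}

# Orthogonal facets: every combination of one value per facet is a sock kind.
SOCK_KINDS = [dict(zip(FACETS, values)) for values in product(*FACETS.values())]


def navigate(items, **selected):
    """Faceted navigation: keep items matching every selected facet value."""
    return [item for item in items if all(item[f] == v for f, v in selected.items())]


def label(item):
    """Hyphenated label in facet order, as in the text's sock names."""
    return "-".join(item.values())


print(len(SOCK_KINDS))                                           # 48
print(len(navigate(SOCK_KINDS, Color="black", Length="ankle")))  # 8
print(label(SOCK_KINDS[0]))                                      # black-striped-wool-work-ankle
```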

An example that we are all familiar with is the email client. An emailer might contain perhaps 27 messages displayed in a table, and it usually will have the capability of sorting the rows (the emails) By Date, By Subject, and By Sender. The dates, subjects, and senders are column values. When the emails are presented ordered By Date, there are 27 emails; when they are presented By Subject, there are still 27 emails. There are just different principles of subdivision being applied to the same underlying domain. These principles can be applied one after another; for example, the emails could be sorted by date, and then the result of that could itself be sorted by subject. In effect, the order of a sequence of sorts is a ‘citation order,’ and each individual column value sorting relation (for example, alphabetical, numerical) is a ‘filing order’ for the individual column in question. By adjusting the citation order and the filing orders, the individual emails can be grouped and scattered as desired. The natural data structure here for holding and representing the data is a table. And, in turn, tables are the core of relational databases. The facets are orthogonal: they are mutually independent. Any date can be combined with any subject and any sender. This will likely mean that the associated database table will automatically be in normal form, in particular in 3rd Normal Form (which is a good thing for a database table to be) (Date 1977).

Here is an example of a less than expert attempt at faceting that is not orthogonal. Imagine a shop that sold pedal-powered personal transportation devices. It might catalog its devices By Kind (with the values {unicycle, bicycle, tricycle}) and By Number of Wheels (with the values {1,2,3}). These ‘facets’ are not orthogonal because, for example, a unicycle has to have 1 wheel (there cannot be a unicycle with 2 wheels). And, similarly, a relational database table for this would have a transitive column dependency between the Kind of device and the Number of Wheels that it had; the table would not be in 3rd Normal Form.
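Both points can be sketched in a few lines of Python. Python’s built-in sort is stable, so sorting by date and then by subject leaves subject as the primary key and date as the secondary key: under this reading, the last sort applied is the first element of the citation order. The `determines` function looks for the kind of dependency that spoils the cycle-shop ‘facets’. The messages, rows, and function names are hypothetical, and a dependency found in sample data only suggests, rather than proves, that two columns are not orthogonal.

```python
from datetime import date

# Hypothetical messages: each row has three orthogonal columns.
EMAILS = [
    {"date": date(2011, 3, 2), "subject": "Budget", "sender": "Lee"},
    {"date": date(2011, 3, 1), "subject": "Minutes", "sender": "Ng"},
    {"date": date(2011, 3, 1), "subject": "Budget", "sender": "Ng"},
]

# Successive stable sorts: first by date, then by subject.
by_date = sorted(EMAILS, key=lambda e: e["date"])
by_date_then_subject = sorted(by_date, key=lambda e: e["subject"])

# The same result as one sort on the composite key (subject, date).
composite = sorted(EMAILS, key=lambda e: (e["subject"], e["date"]))

# Hypothetical cycle-shop rows with the two non-orthogonal 'facets'.
CYCLES = [
    {"kind": "unicycle", "wheels": 1},
    {"kind": "bicycle", "wheels": 2},
    {"kind": "bicycle", "wheels": 2},
    {"kind": "tricycle", "wheels": 3},
]


def determines(rows, a, b):
    """True if each value in column a occurs with only one value in column b."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[a], row[b]) != row[b]:
            return False
    return True


print(by_date_then_subject == composite)        # True
print(determines(CYCLES, "kind", "wheels"))     # True: kind fixes the wheel count
print(determines(EMAILS, "subject", "sender"))  # False
```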

Although a table is, in some sense, the natural data structure here, in current practice, the faceted data is often alternatively presented in a list or index, often a hierarchical indented list or index. One easy way to do this is to change the column order so that the order reflects the desired hierarchical importance or indentation from left to right (changing column order does not affect the data content of a table), and then to successively sort on the columns from right to left. For example,

and this can be given the slightly more elegant display

The tree here is being displayed horizontally with the leaves to the right, instead of the usual inverted vertical display; this type of display is also often used on computers to depict their directory or folder structure on screen. Interestingly enough, many thesauri and indexes collapse the levels in the hierarchy; they reduce the indentation by essentially writing the third column under the left as an alternative (Aitchison, Gilchrist, and Bawden 2000; NISO 2005; Zeng 2005). So, for example, the three rows

might be depicted

The annotations in italics are ‘Node labels’ and they are indicating the principles of subdivision (Aitchison, Gilchrist, and Bawden 2000; NISO 2005; Zeng 2005). This may or may not be a good thing. It is confusing logically because there are duplicate names, there are four names and only three kinds of rows. And it loses the full hierarchical structure. On the other hand, a faceted scheme might have 10 facets, and 10 levels of indentation would be unusable.
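The column-reorder-and-sort procedure described earlier can be sketched in Python. Sorting tuples lexicographically gives the same order as successive stable sorts from the rightmost column to the leftmost; the display then prints each value only where it differs from the row above, indented by column position. The rows are hypothetical stand-ins for the table not reproduced here, and `indented` is an illustrative name.

```python
# Hypothetical rows, columns already in the desired order of importance.
ROWS = [
    ("Furniture", "Table", "French"),
    ("Furniture", "Chair", "German"),
    ("Furniture", "Chair", "French"),
]


def indented(rows, indent="    "):
    """Lines of an indented list: sorted rows, each value shown once per group."""
    lines = []
    previous = ()
    for row in sorted(rows):  # same order as stable sorts from right to left
        shared = 0  # number of leading values repeated from the previous row
        while (shared < len(row) and shared < len(previous)
               and row[shared] == previous[shared]):
            shared += 1
        for depth in range(shared, len(row)):
            lines.append(indent * depth + row[depth])
        previous = row
    return lines


print("\n".join(indented(ROWS)))
```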

Orthogonal attributes, or principles of subdivision, can provide faceting, ‘ersatz’ faceting, but it is not the style of faceting, or domain for faceting, envisaged by Julius Kaiser, Paul Otlet, Henri LaFontaine, Shiyali Ranganathan, and the Classification Research Group (CRG) (Classification Research Group 1955; Kaiser 1911; La Barre 2010; Ranganathan 1937, 1951, 1960; UDC 2010).

Grey socks are socks, which is to say that the underlying type or universe type of the Color facet is socks. Striped socks are socks, which is to say that the underlying type or universe type of the Pattern facet is (also) socks. So, too, for the other facets. This means that the non-elemental classes like black-striped-wool-work-ankle socks are what Ranganathan would have called superimposed classes (Ranganathan 1960). Grey socks are socks (socks which are also grey), ankle socks are socks (socks which are also ankle length), and grey ankle socks are socks (socks which are also grey, and which are also ankle length).

Contrast this with Kaiser’s ‘Concretes’ and ‘Processes’ (Kaiser 1911). Examples of concretes are aluminum, iron, and steel; and examples of processes are smelting, welding, and rusting. And, of course, concretes can play a part in processes (or processes can involve concretes), as in the ‘rusting of iron.’ But Concretes and Processes do not have the same underlying universe type; the ‘rusting of iron’ is not a superimposed type.
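The contrast can be put in set terms. In the Python sketch below, a superimposed class is the intersection of classes over one universe (individual socks), so grey ankle socks are automatically a subset of both grey socks and ankle socks; a Kaiser-style compound pairs values from two different universes and is represented instead as a structured record. The socks and the `ProcessOf` type are hypothetical illustrations.

```python
from typing import NamedTuple

# Hypothetical individual socks and their attributes (one universe: socks).
SOCKS = {
    "s1": {"color": "grey", "length": "ankle"},
    "s2": {"color": "grey", "length": "knee"},
    "s3": {"color": "black", "length": "ankle"},
}

grey = {s for s, attrs in SOCKS.items() if attrs["color"] == "grey"}
ankle = {s for s, attrs in SOCKS.items() if attrs["length"] == "ankle"}

# Superimposed class: an intersection within the same universe.
grey_ankle = grey & ankle
print(grey_ankle)                                  # {'s1'}
print(grey_ankle <= grey and grey_ankle <= ankle)  # True


class ProcessOf(NamedTuple):
    """A Kaiser-style compound: a process applied to a concrete."""
    process: str
    concrete: str


# 'rusting of iron' pairs two different kinds; it is not an intersection.
rusting_of_iron = ProcessOf(process="rusting", concrete="iron")
print(rusting_of_iron)
```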

The story is similar with modern facet analysis (Aitchison, Gilchrist, and Bawden 2000; Buchanan 1979; Gopinath 1992; La Barre 2010; Ranganathan 1960; Spiteri 1998). With real faceting, the actual facets are of different kinds (as opposed to being different attributes, or differently principled divisions, of the same underlying kind).

5.0 The Wilson argument

Travis Wilson (2006) subscribes to the orthogonal-facets-exclusive-foci view for a particular area and kind of analysis (and, to keep the record straight, he is certainly aware that Ranganathan had a different analysis of what probably is a different area). Wilson offers an argument. It rests first on his conception of facet analysis (which addresses mainly ersatz faceting rather than real faceting). Wilson suggests we start with a ‘tag soup,’ say

Pecan Pie, Chocolate Ice cream, Chocolate Cookie, Cherry Pie, Cherry Ice cream, Pecan Cookie, Chocolate Pie

We extract from these the ‘atoms’ or ‘elements,’ and that gives, perhaps

Pecan, Pie, Chocolate, Ice cream, Cookie, Cherry

These are still a ‘soup.’ There is not considered to be any order or structure here yet. But suppose we wish to extract a structure, in particular a facet structure. One way we can do it is by asking, “Which atoms can be combined with which other atoms?” The ones that can be combined are independent, orthogonal, and belong in different facets. The ones that cannot be combined are dependent and belong as foci in the same facet. So, for example, if we said Pecan can be combined with Pie, Ice cream, and Cookie, but not with Chocolate and Cherry, that would place Pecan in a different facet from {Pie, Ice cream, Cookie} and in the same facet as {Chocolate, Cherry}; if Pie can be combined with Pecan, Chocolate, and Cherry, but not with Ice cream and Cookie, that would place Pie in a different facet from {Pecan, Chocolate, Cherry} and in the same facet as {Ice cream, Cookie}; and if we followed through with this, two facets would be generated:

Substrates of {Pie, Ice cream, Cookie}
Flavors of {Pecan, Chocolate, Cherry}.

There is no compulsion that drives us to that particular division and combination. We could alternatively have allowed Chocolate to combine with Pecan and Cherry, etc. And that might have led to three facets

Substrates of {Pie, Ice cream, Cookie}
Flavors of {Pecan, Cherry}
Toppings of {Chocolate}

It is up to us what we do, how we do the partition; however, the distinction between facets and foci is to be made on the basis of what is independent and what is dependent. And so foci, within a facet, have to be exclusive because they are defined to be exactly that.
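Read as a procedure, the rule says that atoms which cannot be combined share a facet, so the facets are the connected groups of the ‘cannot combine’ relation. The Python sketch below implements that reading with a small union-find; it is an illustrative reconstruction under that assumption, not Wilson’s published algorithm. Feeding it the two combinability judgments from the text reproduces the two-facet and three-facet outcomes.

```python
from itertools import combinations

SUBSTRATES = ["Pie", "Ice cream", "Cookie"]
FLAVORS = ["Pecan", "Chocolate", "Cherry"]
ATOMS = SUBSTRATES + FLAVORS


def facets_from(atoms, combinable):
    """Group atoms so that any two that cannot be combined share a facet.

    combinable is a set of frozenset pairs. Facets are the connected
    components of the 'cannot combine' relation, found with union-find.
    """
    parent = {a: a for a in atoms}

    def root(a):
        while parent[a] != a:
            a = parent[a]
        return a

    for a, b in combinations(atoms, 2):
        if frozenset((a, b)) not in combinable:
            parent[root(a)] = root(b)  # merge the two groups

    groups = {}
    for a in atoms:
        groups.setdefault(root(a), set()).add(a)
    return {frozenset(g) for g in groups.values()}


# First judgment: every substrate combines with every flavor, nothing else does.
FIRST = {frozenset((s, f)) for s in SUBSTRATES for f in FLAVORS}
# Second judgment: Chocolate may also combine with Pecan and with Cherry.
SECOND = FIRST | {frozenset(("Chocolate", "Pecan")), frozenset(("Chocolate", "Cherry"))}

for judgment in (FIRST, SECOND):
    print(sorted(sorted(facet) for facet in facets_from(ATOMS, judgment)))
```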

Wilson’s argument is certainly an argument, and he uses the view it embodies to generate faceted classifications by algorithm. And if the candidate labels were bare, meaningless labels, it would be a reasonable argument. If the atomic tag soup were

DF2, 27, Km+, *, Wef

and we wanted to establish a faceted scheme of these, what Wilson suggests is presumably exactly right. But in the realistic cases we encounter, the tags in the soup do have meanings, and they do have kinds, independently of what can and cannot be combined. For example, in the soup

18th Century, French, German, 19th Century,

two of the labels, or what they signify, are time periods, and the other two are regions or places. And we can use the kinds to do the facet analysis (which is exactly what Ranganathan and the CRG did). So Wilson’s argument is not definitive. We can combine or synthesize foci, if we think that desirable. (Wilson’s argument is also discussed in Vickery [2008].)

6.0 The (Anthony) Foskett argument

Real faceted classification concerns subjects (or topics or concepts or types or tags), not things (or attributes of things). It originates in puzzles concerning the nature of subject nodes in discipline-based Bacon-style Trees of Knowledge (Bacon 1620). (All the traditional library classifications have this style.) Kaiser, Otlet, LaFontaine, Ranganathan, and others noticed that many, indeed almost all, subject concepts were compound or composite concepts constructed from elemental components. And the elemental components themselves could be of different kinds or categories. And this invites the use of synthetic classification from different categories of elemental, or atomic, component concepts: faceted classification. So the target here is to use faceted classification to produce labels or concepts or annotations for subject classification.

Anthony Foskett (1996, 148) writes:

The foci within a particular facet should be mutually exclusive; that is, we cannot envisage a composite subject which consists of two foci from the same facet. We cannot have the 17th Century 1800s, or German English, or copper aluminum, but we can have composite subjects consisting of combinations of foci from different facets: English novels, 17th Century German literature, analysis of copper, heat treatment of aluminium.

This just seems mistaken, a confusion between things and topics. It confuses antiques with books about antiques. An entity, such as a metal spoon made entirely of a single metal, cannot be made both entirely of copper and entirely of aluminium. But a subject (a subject matter, a topic, a concept, a type) presumably can encompass both copper and aluminium. Isn’t ‘heat treatment of aluminium and copper’ a subject? Isn’t ‘17th and 18th Century German literature’ a topic? And isn’t ‘Sphinges and Crocutes’ a topic? (There may just be some lack of clarity of expression in the text being quoted. Foskett would be well aware that the synthetic operations of the Universal Decimal Classification [UDC], especially the ‘+’ operator, in effect permit the forming of composites from foci within the same facet [Broughton 2010a; UDC 2010]. There is also a connection here to what might be called polytopic reduction. Suppose there is a book with the (honest, accurate, and comprehensive) title ‘Heat treatment of aluminium and copper,’ and the question is asked, “How many subjects does this book have and what are they?” One answer, favoring polytopicality, is: “Two, and they are {Heat treatment of aluminium, Heat treatment of copper}.” Another answer, avoiding polytopicality, is: “One, and it is {Heat treatment of aluminium and copper}.” No judgment is passed here on which is more desirable, but the Foskett intuition can be largely retained if polytopicality is the choice.)

7.0 Trying to take Broughton’s account a small step further

What Broughton writes seems to be exactly right. However, it may be possible to improve the views it expresses in various ways.

An alternative way of describing a collection of foci, favored by some authors, is to say that there is an array of foci. It is also quite possible to use ersatz faceting as a subfacet of a faceting scheme, in which case there would be arrays of foci for that facet. For example, there could be the ‘manufacture of socks,’ which could be a combination of a Process and an Entity. The socks themselves could then be ersatz faceted as above, which would or could generate ‘manufacture of white socks,’ ‘manufacture of grey socks,’ etc. That would subdivide the socks by color, i.e., there would be a By Color array. It could also give ‘manufacture of ankle socks,’ ‘manufacture of knee socks,’ i.e., there could be a By Length array, and so on. The principles of division could also be used sequentially, so there can be arrays (of foci) of order 1, arrays of order 2, etc., as described earlier.
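The sock example can be sketched in a few lines. This is a minimal illustration under stated assumptions: each principle of division (By Color, By Length) is a hypothetical list of foci, and applying the principles in sequence yields the classes at order 1, order 2, and so on. The names `PRINCIPLES`, `divide`, and `with_process` are illustrative only.

```python
# Hypothetical principles of division for the entity "socks";
# each principle produces one array of foci.
PRINCIPLES = {
    "By Color": ["white", "grey"],
    "By Length": ["ankle", "knee"],
}


def label(base, foci):
    """Render a class as its foci followed by the base entity."""
    return " ".join(foci + (base,))


def divide(base, principle_order):
    """Apply principles of division in sequence.

    Level k of the result lists the classes produced after the first
    k principles have been applied (arrays of order k).
    """
    levels = [[()]]
    for name in principle_order:
        levels.append([cls + (focus,)
                       for cls in levels[-1]
                       for focus in PRINCIPLES[name]])
    return [[label(base, cls) for cls in level] for level in levels]


def with_process(process, entity):
    """Combine a Process focus with an Entity focus."""
    return f"{process} of {entity}"


for order, classes in enumerate(divide("socks", ["By Color", "By Length"])):
    print(order, [with_process("manufacture", c) for c in classes])
```

Changing the order of the principles changes which array appears at order 1. That ordering is a design choice of the scheme, not something the foci themselves dictate.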

Broughton writes (2004, 267 and 2004, 54; emphasis in the original): “an important thing to notice about the members of an array [i.e. the foci] is that they are all mutually exclusive classes.” And (2004, 270 and 2006, 54): “because all the terms within a facet come into the same category … the relationship between them will be those of a hierarchy.”

At first glance this does not seem quite right. If the terms form a hierarchy, for example {human, female, male}, they need not be mutually exclusive classes. Female and human are not mutually exclusive classes; one is a subtype of the other (a female is a human). But if attention is paid to exactly what is said, it is the arrays that individually have members which are exclusive classes, and that is correct. Notice that the condition exhaustive does not appear (either in conjunction with exclusive or on its own). This typically indicates the use of interior nodes for classification, as opposed to a pure Aristotelian hierarchy with leaves that have the JEPD (jointly exhaustive and pairwise disjoint) property.

Moving on to a different point, Broughton favors enumerative or enumerated foci (probably fixed-in-stone hierarchical schemes for the foci), coupled with synthesis between facets. She suggests that the foci for a facet might have a hierarchical arrangement. This hints at an enumeration as opposed to a synthesis. Why? The construction of the classes is top-down. The start is the root. Then a principle of division is applied to produce some children, a second principle of division is applied to produce grandchildren, and so on. So the leaves are not the atoms from which the whole tree is constructed or synthesized bottom up. Rather, they are the residual fragments left after a series of cleavages has been made to the root. It might be thought that this is somehow inessential and that the tree could be synthesized bottom up. But what makes this difficult or awkward in this case is that the arrays are not exhaustive (and this comes from using the interior nodes for classification). If the children were exhaustive of the parents, then the parents could be considered just to be the collection of their children, and attempts could be made to build bottom up.
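The difference between an exclusive array and a JEPD array can be made concrete. The sketch below rests on an assumption made only for illustration: each class is modeled extensionally as a Python set of hypothetical instances. An array is JEPD when its members are pairwise disjoint and jointly cover the parent, and only then can the parent be rebuilt bottom up as the union of its children.

```python
from itertools import combinations


def pairwise_disjoint(array):
    """True if no two classes in the array share an instance."""
    return all(a.isdisjoint(b) for a, b in combinations(array.values(), 2))


def exhaustive(parent, array):
    """True if the classes in the array jointly cover the parent."""
    return set().union(*array.values()) == parent


def rebuild_bottom_up(parent, array):
    """Reconstruct a parent as the union of its children (JEPD arrays only)."""
    if not (pairwise_disjoint(array) and exhaustive(parent, array)):
        raise ValueError("array is not JEPD; the parent cannot be rebuilt")
    return set().union(*array.values())


# Hypothetical instances; "kim" is classified only at the interior node "human".
human = {"ann", "bob", "kim"}
by_sex = {"female": {"ann"}, "male": {"bob"}}

print(pairwise_disjoint(by_sex))                               # True: exclusive
print(exhaustive(human, by_sex))                               # False: not exhaustive
print(pairwise_disjoint({"human": human, "female": {"ann"}}))  # False: a hierarchy is not exclusive
```

Because `kim` sits at the interior node, `rebuild_bottom_up(human, by_sex)` raises an error. This is the awkwardness the text describes for bottom-up synthesis.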

Broughton (2004, 270) writes: “Where a faceted classification differs most significantly from an enumerative classification is in its potential to combine terms from different facets.” Notice “combine terms from different facets,” but no mention of “combine terms from within a facet.” But why not permit synthesis for everything? Why not permit combining terms from within the same facet? Here is an example:

18th Century History
18th Century Geography
19th Century History

are composite subjects synthesized from different facets. But, presumably, we would want also to have the ability to form subjects like

18th and 19th Century History
18th Century History and Geography,

and this requires synthesis within a facet (as well as synthesis across facets). Neither Broughton nor Wilson would permit this because, for example, they hold that the choice of the focus 18th Century specifically excludes the choice of 19th Century.
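To see what is at stake, here is a minimal sketch built on a hypothetical representation in which a subject maps each facet to a set of foci. The single-focus restriction attributed to Broughton and Wilson amounts to allowing at most one focus per facet. Synthesis within a facet allows several. The helper `polytopic_split` illustrates the polytopic reduction discussed in the Foskett section: it splits a subject with several foci in one facet into single-focus subjects. All function names are illustrative.

```python
from itertools import product


def subject(**facets):
    """Build a subject as a mapping from facet name to a frozenset of foci."""
    return {facet: frozenset(foci) for facet, foci in facets.items()}


def combine(*subjects):
    """Synthesize across and within facets by uniting foci facet by facet."""
    result = {}
    for s in subjects:
        for facet, foci in s.items():
            result[facet] = result.get(facet, frozenset()) | foci
    return result


def is_single_focus(s):
    """True if no facet holds more than one focus (the within-facet ban)."""
    return all(len(foci) <= 1 for foci in s.values())


def polytopic_split(s):
    """Polytopic reduction: one single-focus subject per combination of foci."""
    facets = sorted(s)
    return [{facet: frozenset({focus}) for facet, focus in zip(facets, choice)}
            for choice in product(*(sorted(s[f]) for f in facets))]


h18 = subject(Period={"18th Century"}, Discipline={"History"})
h19 = subject(Period={"19th Century"}, Discipline={"History"})
g18 = subject(Period={"18th Century"}, Discipline={"Geography"})

h18_19 = combine(h18, h19)   # 18th and 19th Century History
hg18 = combine(h18, g18)     # 18th Century History and Geography
print(is_single_focus(h18), is_single_focus(h18_19))   # True False
print(polytopic_split(h18_19) == [h18, h19])           # True
```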

8.0 Muddy waters

Part of what is driving the intuitions here is that concepts as topics for annotation have somewhat different properties from concepts for classification in ontologies. Unfortunately, classification of information objects is not always, and perhaps not even usually, by topic alone.

Earlier it was suggested that the DDC, for example, classifies by subject or topic. That is not entirely true. In common with the Library of Congress Classification and almost all traditional library classifications, it also takes some input from the ‘form’ of an information object. Form here might include whether the object is a bibliography or an encyclopedia. There is a good reason for doing this. The whole system aims to serve the user, and experience has taught that users are often interested in form.

Form has also crept into subject headings (i.e., lists of topics). LCSH recognizes about 600 forms of literature, and any of these values is permitted as a component of synthesis to create further subject headings. So, if ‘Physics’ is a subject or subject heading, so too is ‘Physics—Encyclopedias.’ Obviously, there could be a book on physics encyclopedias. But a book with the subject heading tag ‘Physics—Encyclopedias’ is not one of those; rather, it is an encyclopedia on physics. This is unfortunate. Subject headings should be, well, subject headings. Information about forms should be separate and separately provided. Other systems are more careful here. MeSH will append ‘as topic’ when required (or use other syntactic devices). So ‘Clinical Trial’ marks a piece of literature which is a clinical trial, and ‘Clinical Trial as topic’ marks literature about clinical trials. However, there is still a mixing of topics and forms at the display level. (MeSH is a faceted, or partially faceted, system. It is faceted or ‘deconstructed’ at the record level, but when it displays to the user it sometimes combines these facets. This is a good approach. The only qualification, and the mildest of mild ones, is that what is displayed to the user is not really a subject or a subject heading. Rather, it is just a heading, or locator [which combines subject and form]. There are also initiatives to do facet analysis on LCSH [Chan and O’Neill 2010].)
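One way to keep topics and forms in separate fields at the record level, and to combine them only at display time, is sketched below. The record layout, the `--` separator, and the titles are illustrative assumptions. They are not the actual LCSH or MeSH data formats.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical bibliographic record with topic and form kept apart."""
    title: str
    topics: list[str] = field(default_factory=list)  # what the object is about
    forms: list[str] = field(default_factory=list)   # what kind of object it is


def display_headings(record):
    """Combine topic and form only at display time ("--" as separator)."""
    if not record.forms:
        return list(record.topics)
    return [f"{t}--{f}" for t in record.topics for f in record.forms]


def about(records, topic):
    """Titles of records whose subject includes the topic."""
    return [r.title for r in records if topic in r.topics]


def of_form(records, form):
    """Titles of records that are of the given form."""
    return [r.title for r in records if form in r.forms]


records = [
    Record("An Encyclopedia of Physics", topics=["Physics"], forms=["Encyclopedias"]),
    Record("A Study of Physics Encyclopedias", topics=["Encyclopedias", "Physics"]),
]
print(display_headings(records[0]))       # ['Physics--Encyclopedias']
print(about(records, "Encyclopedias"))    # only the book about encyclopedias
print(of_form(records, "Encyclopedias"))  # only the encyclopedia itself
```

Because form lives in its own field, a search for books about encyclopedias no longer returns encyclopedias. The combined heading still exists, but only as a display locator.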

Were such general information object systems to be approached with a view to performing faceted analysis on them, part of the project, such as pulling out forms, would effectively be ersatz faceting. And the remainder would be real facet analysis of topics (which is the central concern of this paper).


9.0 Conclusion

All concepts can be conceived of as having categories or kinds or types. In particular, the elemental or atomic concepts have kinds. The non-elemental or composite or compound concepts are then conceived of as being constructed or synthesized from the atomic concepts. (These compound concepts also have kinds, in the style of type theory in computer science or categorial grammar in linguistics [van Benthem 1990]. For the most part, this is not of central importance in the context of this paper.)

Elemental or atomic concepts have categories or kinds. Here are some of the kinds they might be. They might be Concretes, Processes, Periods, Places, Things, Kinds, Parts (organs, constituents), Properties, Materials, Operations, Patients (objects of action, raw materials), Products (substances), By-products, Agents, Forms, Genres, and, possibly, other kinds, depending on the discipline or subject matter in question.

Then non-elemental concepts, or labels for such, are constructed or synthesized from values or foci of the kinds in use—there will need to be a syntax or grammar for this.
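A minimal sketch of both steps follows. The first step is facet analysis by kind, applied to the earlier ‘18th Century, French, German, 19th Century’ soup. The second is a toy syntax for composite labels. The kind assignments and the ordering of kinds in `CITATION_ORDER` are assumptions made for illustration. They are not a published scheme.

```python
from collections import defaultdict

# Hypothetical kind assignments for some elemental concepts.
KIND_OF = {
    "18th Century": "Period",
    "19th Century": "Period",
    "French": "Place",
    "German": "Place",
    "Literature": "Discipline",
}

# Assumed order in which kinds are cited in a composite label (a toy syntax).
CITATION_ORDER = ["Discipline", "Place", "Period"]


def facets_from_soup(tags):
    """Facet analysis by kind: group the tags of a tag soup by their kind."""
    groups = defaultdict(list)
    for tag in tags:
        groups[KIND_OF[tag]].append(tag)
    return dict(groups)


def synthesize(*foci):
    """Build a composite label, ordering foci by the citation order of their kinds."""
    unknown = [f for f in foci if f not in KIND_OF]
    if unknown:
        raise ValueError(f"foci without a kind: {unknown}")
    return ", ".join(sorted(foci, key=lambda f: CITATION_ORDER.index(KIND_OF[f])))


print(facets_from_soup(["18th Century", "French", "German", "19th Century"]))
print(synthesize("18th Century", "German", "Literature"))  # Literature, German, 18th Century
```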

There can (and should) also be synthesis within a facet. Here is an example scheme. Suppose we decide that the granularity for the kind Period should just come down to centuries, and so any century is permitted as an atom. So in

21st Century Schizoid Man
221st Century Schizoid Man
2021st Century Schizoid Man

the 21st Century, 221st Century, and 2021st Century are all good as (elemental) foci. In fact, there are infinitely many elemental foci of the Period kind. Then there can be synthesis to form such periods as ‘17th and 18th Century.’ Synthesis does not have to be restricted to simple (Boolean) additions: the period ‘Before Present (BP)’ amounts to ‘All centuries before the present.’ There can be hierarchically higher-level coverings, such as ‘Renaissance’ for the period from the 14th to the 17th century, and ‘Paleolithic’ to cover the period from a couple of hundred thousand centuries ago up to 100 centuries BP. These coverings do not have to be exclusive of each other; the Lower Paleolithic and Middle Paleolithic Periods are generally taken to overlap each other.
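Here is a minimal sketch of the Period facet. It assumes, for illustration only, that a Period is a finite set of century numbers (17 stands for the 17th Century). Under that assumption, synthesis within the facet is set union, and the result is again a Period. Subtype is set inclusion, and coverings may overlap. Open-ended periods such as ‘Before Present’ would need a different representation, such as intervals, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    """A Period modeled, by assumption, as a finite set of century numbers."""
    centuries: frozenset

    @staticmethod
    def century(n):
        """Any single century is an elemental focus (17 = 17th Century)."""
        return Period(frozenset({n}))

    @staticmethod
    def span(first, last):
        """A covering period from the first to the last century, inclusive."""
        return Period(frozenset(range(first, last + 1)))

    def __or__(self, other):
        """Synthesis within the facet: the result is again a Period."""
        return Period(self.centuries | other.centuries)

    def is_subtype_of(self, other):
        return self.centuries <= other.centuries

    def overlaps(self, other):
        return bool(self.centuries & other.centuries)


c17_18 = Period.century(17) | Period.century(18)   # '17th and 18th Century'
renaissance = Period.span(14, 17)                  # 14th to 17th century
print(Period.century(15).is_subtype_of(renaissance))                 # True
print(renaissance.overlaps(c17_18), renaissance.is_subtype_of(c17_18))  # True False
```

The Renaissance and ‘17th and 18th Century’ share the 17th Century, yet neither contains the other. This is exactly the kind of overlap a strict hierarchy cannot express.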

This is a very important point. If the system allows for overlapping classes, the result is not going to be a hierarchy. (A hierarchy, or tree, is, graph-theoretically, a connected acyclic graph. If classes overlap, they share at least one child. If they also share an ancestor [the root], this means there is a cycle, so the structure is not a hierarchy.)
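The graph-theoretic point can be checked mechanically. This is a minimal sketch that assumes subtype links are given as (child, parent) pairs. The underlying undirected graph is a tree exactly when it is connected and has one fewer edge than it has nodes. The ‘transition’ node below is a hypothetical shared child of two overlapping periods.

```python
def is_tree(links):
    """links: iterable of (child, parent) pairs; tests the undirected graph."""
    links = set(links)
    nodes = {n for edge in links for n in edge}
    if not nodes:
        return True
    if len(links) != len(nodes) - 1:
        return False
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Depth-first search: a tree must reach every node from any start node.
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(nodes)


tree = [("Lower Paleolithic", "Paleolithic"),
        ("Middle Paleolithic", "Paleolithic"),
        ("Paleolithic", "Period")]
overlap = tree + [("transition", "Lower Paleolithic"),
                  ("transition", "Middle Paleolithic")]
print(is_tree(tree), is_tree(overlap))   # True False
```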

So the overall structure is not really a hierarchy. Rather, there are infinitely many elemental and synthesized Period foci, and many of these bear subtype-supertype relations to each other. The semantics can be represented with directed graphs, which use, in essence, arrows between the nodes. This convention was used earlier in this paper to illustrate trees and hierarchies, and we are all very familiar with it from links or hyperlinks on the World Wide Web. There can simply be links from foci to foci. These links can also be given a different semantics, or several different semantics simultaneously and, possibly, ambiguously. For example, the relations ‘is a subtype of,’ ‘is an instance of,’ and ‘is a part of’ are used foundationally to establish classification schemes and their associated hierarchies or graphs. It is often very important to distinguish these, and to be correct about what they are. For example, classification can support inference: if fish is a subtype of vertebrate, then that supports the inference from Livingstone is a fish to Livingstone is a vertebrate. For this use, it matters whether ‘X is a subtype of Y,’ ‘X is an instance of Y,’ or ‘X is a part of Y.’ But in the setting of Information Resources, the target is to assist search and to help the Patron find the relevant Information Objects. In that context, it often does not matter what the connection is between X and Y, provided that being guided from X to Y helps in finding Y (or Information Objects labeled with, or given the metadata, Y). Which actual links there are, the semantics, can be established or described in a variety of ways. One such way is symbolic logic and logical inference. With ersatz faceting (and all faceting within a single facet is ersatz), a logical inference engine can produce the links. If one type is socks, and another brown socks, logic has the ability to say that the second is a subtype of the first. The same holds for definitions or other statements. If the Renaissance is defined to span the 14th, 15th, 16th, and 17th Centuries, logic can establish what the links are.
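Here is a minimal sketch of how such links could be derived rather than hand-entered. It assumes, for illustration only, that each type is defined by a set of required properties. One type is then a subtype of another when its definition includes all of the other's properties. Membership of an instance propagates up these links, which gives the Livingstone inference. The definitions are hypothetical.

```python
# Hypothetical intensional definitions: each type is the set of properties it requires.
DEFINITIONS = {
    "socks": {"sock"},
    "brown socks": {"sock", "brown"},
    "vertebrate": {"vertebrate"},
    "fish": {"vertebrate", "fish"},
}


def is_subtype(x, y):
    """X is a subtype of Y if X's definition includes all of Y's properties."""
    return DEFINITIONS[x] >= DEFINITIONS[y]


def subtype_links():
    """Derive every subtype link between distinct defined types."""
    return {(x, y) for x in DEFINITIONS for y in DEFINITIONS
            if x != y and is_subtype(x, y)}


def inferred_types(asserted):
    """All types an instance belongs to, given its asserted types."""
    return {t for t in DEFINITIONS if any(is_subtype(a, t) for a in asserted)}


livingstone = {"fish"}                     # 'Livingstone is a fish'
print(sorted(subtype_links()))             # [('brown socks', 'socks'), ('fish', 'vertebrate')]
print(sorted(inferred_types(livingstone)))  # ['fish', 'vertebrate']
```

This covers only the subtype relation. Instance-of and part-of links would need their own rules, which is the distinction the text warns about.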

The key to synthetic construction operations on the kind or facet Period is that the result must be a Period. (In other academic literature there is a logic of periods, which could be invoked to provide assistance here.) This requirement can be generalized: the results of acceptable synthetic constructions within a facet must themselves be within the facet.


Here is another example: washing is a Process, drying is also a Process and the synthetic constructions ‘washing then drying’ and ‘drying then washing’ are both also Processes. (There is also a logic of processes that could provide guidance. The theory of Petri Nets is a somewhat advanced example of such a theory.)
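The closure requirement for Processes can be sketched with a single constructor. This assumes, purely for illustration, that a Process is an ordered tuple of elemental steps. Sequential composition then returns another Process, and order matters. Nothing here models Petri Nets. It only illustrates that the constructor stays within the facet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    """A Process modeled, by assumption, as an ordered tuple of elemental steps."""
    steps: tuple

    @staticmethod
    def elemental(name):
        return Process((name,))

    def then(self, other):
        """Sequential composition: the result is again a Process."""
        return Process(self.steps + other.steps)

    def __str__(self):
        return " then ".join(self.steps)


washing, drying = Process.elemental("washing"), Process.elemental("drying")
print(washing.then(drying))                          # washing then drying
print(washing.then(drying) == drying.then(washing))  # False: order matters
```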

A full generalization would also admit constructors whose components could themselves be faceted (as Ranganathan described at length). For example, there could be a comparison constructor which could be used to produce the class ‘Comparisons of the smelting of iron with the smelting of steel.’ Indefinitely many classes can be synthesized; which ones actually are synthesized depends on literary warrant and the needs at hand. (We all have the capability of saying indefinitely many sentences, but, at least in principle, what we actually do say depends on our interests and needs.)
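Here is a minimal sketch of a constructor whose components are themselves faceted classes. The two-facet `Faceted` class (an Operation applied to a Material, two of the kinds listed above) and the `Comparison` constructor are illustrative assumptions, not Ranganathan's notation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Faceted:
    """A faceted class built from a (hypothetical) Operation and Material facet."""
    operation: str
    material: str

    def __str__(self):
        return f"the {self.operation} of {self.material}"


@dataclass(frozen=True)
class Comparison:
    """A constructor whose components are themselves faceted classes."""
    first: Faceted
    second: Faceted

    def __str__(self):
        return f"Comparisons of {self.first} with {self.second}"


print(Comparison(Faceted("smelting", "iron"), Faceted("smelting", "steel")))
```

Nothing limits how many such classes could be built. Which ones a scheme actually lists would be decided by literary warrant, outside the code.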

Faceted classification can be fully synthetic classification from different categories of elemental, or atomic, component concepts.

References

Aitchison, Jean, Gilchrist, Alan, and Bawden, David. 2000. Thesaurus construction and use: a practical manual. 4th ed. Chicago: Fitzroy Dearborn.

Almeida, Mauricio, Souza, Renato, and Fonseca, Fred. 2011. Semantics in the Semantic Web: a critical evaluation. Knowledge Organization 38: 187-203.

Austin, Derek. 1984. PRECIS: a manual of concept analysis and subject indexing. 2nd ed. London: The British Library.

Bacon, Francis. 1605. The Advancement of Learning. Available at http://www.gutenberg.org/ebooks/5500

Bacon, Francis. 1620. The Great Instauration. Available at http://www.constitution.org/bacon/instauration.htm

Broughton, Vanda. 2004. Essential classification. New York: Neal-Schuman.

Broughton, Vanda. 2006. The need for a faceted classification as the basis of all methods of information retrieval. Aslib Proceedings: New Information Perspectives 58: 49-72.

Broughton, Vanda. 2010a. Concepts and terms in the faceted classification: the case of UDC. Knowledge Organization 37: 270-279.

Broughton, Vanda. 2010b. Essential Library of Congress Subject Headings. London: Facet Publishing.

Buchanan, Brian. 1979. Theory of library classification. London: Clive Bingley.

Chan, Lois Mai, and O’Neill, Edward T. 2010. FAST: Faceted Application of Subject Terminology: principles and application. Santa Barbara, Calif.: Libraries Unlimited.

Cheti, Alberto, and Paradisi, Federica. 2008. Facet analysis in the development of a general controlled vocabulary. Axiomathes 18: 223-241.

Classification Research Group. 1955. The need for a faceted classification as the basis of all methods for information retrieval. Library Association Record 57: 262-68.

Dahlberg, Ingetraut. 2009. Brief communication: concepts and terms – ISKO’s major challenge. Knowledge Organization 36: 169-77.

Date, C.J. 1977. An Introduction to Database Systems (2 ed.). Reading, MA: Addison-Wesley.

Foskett, Anthony C. 1996. Subject approach to information. 5th ed. London: Facet Publishing.

Foskett, Douglas J. 2003. Facet analysis. In Drake, Miriam A. ed., Encyclopedia of library and information science. 2nd ed. New York: Marcel Dekker, pp. 1063-67.

Fugmann, Robert. 2004. Learning the lessons of the past. In Rayward, W. Boyd, and Bowden, Mary Ellen eds., The history and heritage of scientific and technical information systems: Proceedings of the 2002 Conference, Chemical Heritage Foundation. Medford, NJ: Information Today, pp. 168-81.

Gnoli, Claudio. 2008. Facets: a fruitful notion in many domains. Axiomathes 18: 127–30.

Gopinath, M. A. 1992. Ranganathan's theory of facet analysis and knowledge representation. DESIDOC Bulletin of Information Technology 12n5: 16-20.

Kaiser, Julius Otto. 1911. Systematic Indexing. London: Pitman.

La Barre, Kathryn. 2006. The use of faceted analytico-synthetic theory as revealed in the practice of website construction and design. Indiana University, Bloomington.

La Barre, Kathryn. 2010. Facet Analysis. In Blaise Cronin (Ed.), Annual review of information science and technology 44. Medford, NJ: Information Today, Inc., pp. 43-86.

Lambe, Patrick. 2007. Organising knowledge: taxonomies, knowledge and organisational effectiveness. Oxford, England: Chandos Publishing.

Lowe, Henry J. 1994. Understanding and Using the Medical Subject Headings (MeSH) Vocabulary to Perform Literature Searches. Journal of the American Medical Association 271: 1103-08.

MeSH. 2010. Medical Subject Headings - Home Page. Available at http://www.nlm.nih.gov/mesh/


Mills, Jack, and Broughton, Vanda. 1977. Bliss bibliographic classification. 2nd ed. Introduction and auxiliary schedules. London: Butterworths.

Morville, Peter, and Rosenfeld, Louis. 2006. Information architecture for the World Wide Web. Sebastopol, CA: O'Reilly.

NISO. 2005. NISO Standards: Z39.19. Available at http://www.niso.org/kst/reports/standards?step=2&gid=&project_key=7cc9b583cb5a62e8c15d3099e0bb46bbae9cf38a

Ogden, Charles Kay, and Richards, Ivor Armstrong. 1972. The meaning of meaning: a study of the influence of language upon thought and of the science of symbolism. New York: Harcourt, Brace.

Pliny, the Elder. 78. The Natural History. Available at http://www.perseus.tufts.edu/hopper/text?doc=Perseus:text:1999.02.0137

Ranganathan, Shiyali R. 1937. Prolegomena to library classification (3rd ed. 1967; 1st ed. 1937 ed.). Madras: The Madras Library Association.

Ranganathan, Shiyali R. 1951. Philosophy of library classification. Copenhagen: Munksgaard.

Ranganathan, Shiyali R. 1959. Elements of library classification 2 ed. London: Association of Assistant Librarians.

Ranganathan, Shiyali R. 1960. Colon classification (6 ed.). London: Asia Pub. House.

Ranganathan, Shiyali R. 1967. Prolegomena to library classification. 3rd ed. Available at http://dlist.sir.arizona.edu

Slavic, Aida. 2008. Faceted Classification: Management and Use. Axiomathes 18: 257-71.

Smith, Barry. 2004. Beyond Concepts: Ontology as Reality Representation. Paper presented at the Proceedings of FOIS 2004. International Conference on Formal Ontology and Information Systems, Turin.

Spiteri, L. 1998. A simplified model for facet analysis: Ranganathan 101. Canadian journal of information and library science 23: 1-30.

Tichy, Pavel. 1988. The foundations of Frege's logic. Berlin: de Gruyter.

UDC. 2010. UDC Consortium Home Page Available at http://www.udcc.org/

van Benthem, Johan. 1990. Categorial Grammar and Type Theory. Journal of philosophical logic 19: 115-68.

Vickery, Brian C. 1960. Faceted classification: a guide to construction and use of special schemes. London: Aslib.

Vickery, Brian C. 1966. Faceted classification schemes. In S. Artandi (Ed.), Rutgers series on systems for the intellectual organization of information 5. New Brunswick, NJ: Graduate School of Library Science at Rutgers University.

Vickery, Brian C. 1975. Classification and indexing in science (3 ed.). London: Butterworths.

Vickery, Brian C. 2008. Faceted classification for the Web. Axiomathes 18: 145-60.

Willetts, Margaret. 1975. An investigation of the nature of the relation between terms in thesauri. Journal of documentation 31: 158-84.

Wilson, Travis. 2006. The strict faceted classification model, Available at http://facetmap.com/pub/

Zeng, Marcia Lei. 2005. Construction of Controlled Vocabularies, A Primer (based on Z39.19). Available at http://www.slis.kent.edu/~mzeng/Z3919/index.htm


Knowl. Org. 38(2011)No.6 M. Kaipainen and A. Hautamäki. Epistemic Pluralism and Multi-Perspective Knowledge Organization

503

Epistemic Pluralism and Multi-Perspective Knowledge Organization:

Explorative Conceptualization of Topical Content Domains†

Mauri Kaipainen* and Antti Hautamäki**

*School of Communication, Media and Information Technology, Södertörn University, Alfred Nobels allé 7, S-141 89 Huddinge, Sweden

**Agora Center, University of Jyväskylä, Mattilanniemi 2, P.O. Box 35 (Agora), FI-40014

Mauri Kaipainen, PhD, professor of media technology at Södertörn University (Sweden), has a background in education, musicology, and cognitive science. His research agenda focuses on processes of mediation in which knowledge emerges from local or individual activities. His current theory construction aims to clarify the idea of ontospaces: dynamically evolving multi-faceted ontologies that constitute a model of knowledge organization and concept emergence. In addition to interactive narrative, the model has a range of applications in media art, community and collaborative media applications, learning environments with collaborative knowledge building, and bottom-up e-democracy.

Antti Hautamäki is a research professor, director of the Agora Center at the University of Jyväskylä, and an adjunct professor of theoretical philosophy at the University of Helsinki. He holds a PhD in philosophy. Hautamäki has published and edited over 30 books and published over 100 articles about philosophy, cognitive science, innovation, and information society. His books include Points of View and Their Logical Analysis (1987) and Sustainable Innovation: A New Age of Innovation and Finland’s Innovation Policy (2010). His current research focus is on innovation processes and service innovation.

Kaipainen, Mauri and Hautamäki, Antti. Epistemic Pluralism and Multi-Perspective Knowledge Organization: Explorative Conceptualization of Topical Content Domains. Knowledge Organization, 38(6), 503-514. 49 references.

ABSTRACT: Based on strong philosophical traditions, cognitive science results, and recent discourses within the discipline of knowledge organization, the authors argue for a perspectivist approach to concepts in information systems. In their approach, ontology is dissociated from concept, and conceptualization is instead left up to the epistemic activity of the information system user. A new spatial ontology model is explicated that supports multiple perspective-relative conceptual projections of the same domain. With an example domain and a demo application, they provide a preliminary proof of concept of how different perspectives yield alternative classifications, categorizations and hierarchies, all the way to different ways of narrating the domain. The results suggest the potential of multi-perspective knowledge organization systems that support not only search and retrieval of information but even the articulation and conceptual disposition of information.

Received 16 August 2010; Revised 29 July 2011; Accepted 29 July 2011

† We thank Vahur Rebas and Jaagup Kippar for the visualization tools that have been crucial in elaborating the concept of multi-perspective exploration, and the Neuroaesthetics Residency program in Berkeley, California, for providing an intellectually inspiring home base for the final phase of the project.


1.0 Introduction

Every corpus of information can obviously be classified, categorized, and conceptualized from multiple alternative perspectives. Despite this, most information systems still customarily assume a single conceptual structure, typically a hierarchical taxonomy that constrains the metadata of the domain in question. This applies to library classification systems, database architectures, and the indexing and search engines of the Internet alike. From the philosophical point of view, such a structure represents the ontological assumptions underlying an information system, in the sense of setting the constraints on what can be retrieved and materialized from it. The term ‘ontology’ has therefore been widely adopted in information technology.

It is obvious that an ontology can serve as a means to promote a particular scientific, ideological, pedagogical, or aesthetic paradigm with its particular set of values and prioritizations. Ontologies are never neutral; they reflect special interests or power positions, regardless of whether the power use is deliberate or merely due to the lack of alternatives. However, public information systems, at least, should avoid biases inherent in predetermined conceptualizations and fixed ways of organizing information. Citing Hjørland and Pedersen (2005, 586), “a specific interest (say that of Scandinavian public libraries) should lead to the design of systems, which are optimal given the interest or purpose and which do not just lead to the acceptance of implicit values inherent in systems that are designed, for example, for commercial purposes.” Therefore we suggest that there is demand for information systems that do not depend on a single ontology. In order to find alternative ways to handle concepts in information systems, we will reconsider the roles of epistemology and ontology. The aim is that a concept is not fixed to predetermined ontological assumptions but instead becomes relative to the perspectives taken.

After reviewing concepts and concept theories in 1.1, we relate them in 1.2 to the philosophical discourse of perspectivism. In 1.3 we establish a particular sense of talking about ontologies that is instrumental to a perspectivist theory of concepts. In 1.4 we lay the ground for a dynamical approach to conceptualization by means of spatially modeled similarity.

1.1 Concept theories

Concept theories aim to define and describe concepts, the core elements of cognition that structure the understanding of information. They are therefore of interest not only to philosophy, linguistics, informatics, and cognitive sciences. They are also absolutely crucial to any discussion relating to knowledge organization, that is, activities such as document description, indexing, and classification performed in libraries, databases, archives, etc. (Hjørland 2008). Due to this multifacetedness, there is no consensual account of concepts. Instead, there are a number of parallel and often competing discussion threads, which may sometimes, but not always, cross disciplinary borderlines.

In information science, theories of concepts have only recently been considered systematically. Hjørland (2009) covers a range of concept theories, from Plato to what he describes as post-Kuhnian. By the latter he means the trend following Thomas Kuhn’s (1962) suggestion that concepts, like scientific paradigms, evolve culturally and historically and should be interpreted in such contexts. Further, he relates concept theories to epistemologies, which he divides into four groups: empiricism, rationalism, historicism, and pragmatism. Empiricism bases knowledge on observations (and on inductions from a pool of observations). Rationalism relies on logics, principles, rules, and idealized models. In historicism, knowledge builds on social contexts, on historical developments, and on the explication of researchers’ pre-understanding. Finally, in pragmatism, knowledge is based on the analysis of goals, purposes, values, and consequences.

Hjørland’s resulting definition (2009, 1523) describes concepts as “dynamically constructed and collectively negotiated meanings that classify the world according to interests and theories.” Further, he stipulates that “concepts and their development cannot be understood in isolation from the interests and theories that motivated their construction, and, in general, we should expect competing conceptions and concepts to be at play in all domains at all times.”

To plot our approach onto the map of Hjørland’s four epistemological theories: we fully accept that concepts reflect accumulated observations, rather than originating from some logical or rational inference. We thus adopt the empiricist interpretation, and with it justify the rejection of the rationalist approach in this context. However, from the point of view of our model, both historicism and pragmatism can be regarded as instances of a perspectivist epistemology. Our model is all about perspectives that determine how observations are classified, regardless of whether these perspectives are historical or pragmatic by nature.


To understand what is meant by the concept of ‘concept’ itself, it is useful to clarify some related historical notions. In medieval logic, semantics was based on a threefold system. It consisted of a) an entity, such as a horse; b) a general noun, such as "horse"; and c) an idea in the mind, in this case the idea of horses (see Lyons 1977). The idea in the mind most closely corresponds to what we mean by concept. Charles Peirce rearticulated these distinctions as object, sign, and interpretant, respectively. Gottlob Frege, in turn, made an important distinction between the sense (Sinn) of an expression, corresponding to concept in this discussion, and its reference (Bedeutung). For example, the phrases "the Morning Star" and "the Evening Star" express different concepts but have the same reference (the same planet).

Modern mainstream philosophy follows the related two-part distinction between extension and intension. The "extension" of a term refers to the class of things to which it applies, while its "intension" refers to the set of essential properties that determine the term’s applicability (see Lyons 1977, 158-159). This distinction, in turn, allows extensional and intensional logics to be told apart. What could be described as extensionalist blindness apparently characterizes today’s mainstream information systems: they are incapable of recognizing intensions and therefore cannot deal with conditions that modify meaning, such as beliefs or points of view. At the same time, they ignore a massive body of psychological evidence for the context- and attention-dependency of perception and cognition (e.g., Gärdenfors 2000, 112-114; Schwartz 2007; Smith and Vela 2000).

In the field of knowledge organization, Hjørland is not alone in suggesting that categorizations are dynamic. Andersen (2002), too, suggests that classification may be explained systematically from a family-resemblance point of view and, furthermore, argues that this approach allows taxonomies to be dynamic entities that may undergo change. We, however, take the intensionalization of information systems further than a mere explanatory model. We propose information systems that assume dynamically evolving conceptualization at the level of user interaction with the ontological level of the system, that is, perspective-relative epistemic exploration. Before elaborating this further, however, it is worth taking a closer look at the implied perspectivism.

1.2 Perspectivism

The recognition of the perspectival nature of cognition can be called perspectivism (see Giere 2006). This approach has long historical roots, going back at least to Friedrich Nietzsche, who stressed that one always knows, perceives, or thinks from a particular perspective. There can be more than one correct account of how things are in any given domain (Baghramian 2004, Chapter 10). The issue is not to establish which perspective is correct or true, but how to explore and mutually relate multiple perspectives. There is no need to assume that different perspectives converge to any final form. In this framework, perspectives are contexts of surrounding and constantly changing perceptions, impressions, influences, and ideas, conceived of through one’s language and social upbringing (see also Magnus and Higgins 1996). Correspondingly, in the philosophy of science, interpretations of observations are said to be theory-laden; that is, they depend on the theory adopted (e.g., Hanson 1958; Kuhn 1962; Feyerabend 1981).

As another view of the multiplicity of perspectives, Quine (1980, 65) speaks of "the totality of our so-called knowledge or beliefs," which is "a man-made fabric which impinges on experience only along the edges." According to him, different theories, or, as we may interpret them, conceptualizations, are underdetermined by experience and can be empirically equivalent. That is, the same facts can support different, even inconsistent, conceptualizations, each of which only partially matches experienced reality. A logical treatment of perspectivism has been elaborated by Antti Hautamäki (1986), based on the concept of determinables presented originally by Johnson (1964) in 1921. According to Johnson, determinables are adjectives, although grammatically they are substantival (e.g., colour). Determinates, or determinate values, such as different colours, in turn produce logical divisions of the space of determinables. In this setting, the determinates of a determinable must be mutually exclusive and jointly exhaustive. These terms are among the foundations of the approach to ontology elaborated in the following.

1.3 Reconsidering ontology in information systems

In the present context, the concept of ontology is discussed in the practical sense of information systems, not in any metaphysical sense. Here, ontology relates to the topics within an information system. In standard practice, such ontologies, specifications of a conceptualization (Gruber 1993), represent the analytical view of an expert, which may yield a consensus within some particular community of practice. This kind of ontology fixes a particular conceptualization and thereby constitutes an obstacle to a perspectivist epistemology. In contrast, we consider another approach to ontology, based on collaborative annotation as applied in recent social media such as Facebook, Twitter, or YouTube. In this practice, participants mark and classify the content, popularly referred to as ‘tagging.’ The resulting annotations, or ‘folksonomies’ (Mathes 2004; see also Quintarelli 2005), are non-hierarchical, or ‘flat,’ and do not directly translate to expert-controlled classification systems (e.g., Gruber 2005). However, the potential of collaborative tagging even for systems of knowledge organization has been recognized (e.g., Macgregor and McCullogh 2006). As a preliminary step, it has been suggested that collaborative annotations can be translated into formal ontologies (e.g., Zhang and Wu 2006; Laniado et al. 2007; Eda et al. 2008). In our view, however, such an approach misses the point that collaborative annotations, as such, are manifestations of the multiplicity of conceptual points of view on the domain. It is equally justified to question the convention of formal ontologies on the grounds that they appear incapable of accounting for common conceptualizations. We propose a reconsideration of ontology that decouples ‘concept’ from ontology and leaves it to the participant’s epistemic activity to conceptualize the domain. To distinguish our usage from the canonical convention, we refer to ontologies as specifications for conceptualization rather than of conceptualization (cf. Gruber 1993). Thus, an ontology is not to be understood as a specification that in itself provides an unambiguous conceptualization, but rather as the coordinate system instrumental for the dynamic conceptualization of topical domains.

We will elaborate the idea of conceptualization as epistemic activity that can take place as interaction with an information system, a kind of interaction that goes deeper than search and retrieval and amounts to active knowledge construction. The idea that the mind actively constructs the conceived world can be traced back to Kant. In psychology, this view has been prominent since Piaget and has been further elaborated, for example, in Kelly’s (1955) Personal Construct Theory.

1.4 Similarity as the principle of dynamical conceptualization

The assumption of similarity relations as the foundation of conceptualization is a tradition leading through the history of philosophical epistemology via Hume, Hegel, and Popper to Carnap, as well as through psychology since William James. It is supported by vast evidence (e.g., Krumhansl 1978; Goldstone 1994; Tversky 1991). The key point is that similarity is not absolute but depends on the perspective of the observer. This has been demonstrated empirically: similarity judgments are highly attention- and context-dependent (for an overview, see Gärdenfors 2000, 113).

We assume similarity relations to underlie perspective-relative classification and categorization, hierarchic mereological organization (part-of and belongs-to relations), and narration. In section 2, we elaborate a spatial model of information organization that models similarity in terms of proximity and thereby allows a spatial model of perspectivism. Here we assume, in the Nietzschean spirit, that concepts are continuously constructed through epistemic exploration of alternative similarity-based groupings, instead of being fixed by a static ontology. In section 3, we sketch how such a model can be applied to elaborate multi-perspective information systems and demonstrate, with an imaginary example domain, how multiple alternative conceptualizations can be drawn from a single content in response to interactively explored perspectives; section 4 closes with a discussion.

2.0 Ontospace

As a directly intensional and thereby instrumental point of departure for a perspectivist model of conceptualization, we adopt Gärdenfors’ (2000) theory of conceptual spaces, augmented with Hautamäki’s (1986) theory of viewpoints. For Gärdenfors, a concept is “an idea that characterizes a set, or category of objects” (60). In his model, such a set occupies a convex region of a conceptual space, which is determined by quality dimensions that describe the entities of a topic domain. Thus, if the description of an entity falls inside the region of a concept, one can say that the concept applies to that entity. Gärdenfors’ approach quite naturally allows a dynamic extension that does not deal directly with the fixed conceptual space. Instead, our approach deals with perspective-relative projections of object distributions in that space, which will be called representational spaces. Distances in a representational space, in turn, can be interpreted as similarity relations, which contribute to dynamic perspective-relative conceptualization.

2.1. Ontodimensions and ontocoordinates

In order to elaborate ontology with regard to intensional logic and perspectivism, we apply the spatial metaphor of Gärdenfors and define an ontology as a state space, henceforth termed ontological space (ontospace): a coordinate system that specifies the dimensions with respect to which items of the topical domain vary, without implying any universal ontological assumptions.

Let I be a set of determinables, also referred to as attributes, properties, or qualities, for example, I = {redness, roundness, weight, length, …}. Associated with each determinable $i \in I$ there is a set of determinate values $D_i$. An ontospace for a topic domain is then the n-dimensional space $A = D_1 \times D_2 \times \dots \times D_n = \prod_i D_i$. Elements of A are n-tuples of the form $a = [a_1, a_2, \dots, a_n]$, where $a_i \in D_i$. Each entity x of the topic domain can be represented as a state $s(x) = a_x$ in ontospace A, where $a_x = [a_1, a_2, \dots, a_n]$; the elements $a_i$ can also be conceived of as the ontocoordinates of x. It is natural to suppose that for every determinable i there is a distance measure $m_i$ expressing the degree of mutual similarity between elements of the set $D_i$ of determinate values. Here $m_i$ is a function from $D_i \times D_i$ to the set of non-negative real numbers $\mathbb{R}^{+}$, where $m_i(a_i, a_i')$ is the distance between the values $a_i$ and $a_i'$ in $D_i$. Consequently, a larger distance means less similarity.
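The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ implementation: the determinable names, the graded coordinates, the default of 0.0 for a determinable an entity is not annotated with, and the choice of absolute difference for every $m_i$ are all assumptions.

```python
from typing import Dict, List, Sequence

# Hypothetical determinables I; the names echo the example in the text.
DETERMINABLES: List[str] = ["redness", "roundness", "weight", "length"]


def state(entity: Dict[str, float],
          determinables: Sequence[str] = DETERMINABLES) -> List[float]:
    """Return s(x) = a_x = [a_1, ..., a_n], the ontocoordinates of an entity.

    Assumption: a determinable the entity is not annotated with gets 0.0.
    """
    return [float(entity.get(name, 0.0)) for name in determinables]


def m_i(a_i: float, b_i: float) -> float:
    """Per-determinable distance m_i(a_i, a_i'): here simply |a_i - a_i'|."""
    return abs(a_i - b_i)


def per_dimension_distances(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Distances m_i between two states, one non-negative value per determinable."""
    if len(a) != len(b):
        raise ValueError("states must have the same number of ontocoordinates")
    return [m_i(x, y) for x, y in zip(a, b)]


apple = state({"redness": 1.0, "roundness": 0.75, "weight": 0.25})
cherry = state({"redness": 1.0, "roundness": 1.0, "weight": 0.0})
print(per_dimension_distances(apple, cherry))  # [0.0, 0.25, 0.25, 0.0]
```

Keeping the distances per determinable, rather than collapsing them at once into a single number, is what later lets a perspective weight each $m_i$ separately.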

The determinables can be qualitative or quantitative; both are expressed by means of quantitative ontocoordinate values in order to refer to the positions of entities in the ontospace. In fact, all qualitative variables can be transformed into quantitative variables. To occupy the whole ontospace, not just its origin and outer edges, we allow graded values. This can be done with a state function whose values are real numbers in the interval [0, 1] for all determinables. These real numbers can be interpreted as degrees of membership in fuzzy sets (Zadeh 1965).
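One way to turn qualitative and quantitative determinables into graded ontocoordinates is sketched below. It is a hypothetical illustration resting on two assumptions the text does not make: a qualitative value is spread into one [0, 1] coordinate per category (one-hot encoding), and a quantitative attribute is mapped to a fuzzy membership degree by a simple linear ramp.

```python
from typing import List, Sequence


def one_hot(value: str, categories: Sequence[str]) -> List[float]:
    """Encode a qualitative value as crisp memberships (1.0 or 0.0) per category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def ramp_membership(x: float, low: float, high: float) -> float:
    """Fuzzy membership degree in [0, 1]: 0 at or below `low`, 1 at or above
    `high`, linear in between (one of many possible membership functions)."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (x - low) / (high - low)))


# A hypothetical entity: a colour category plus a graded degree of 'heaviness'.
coords = one_hot("red", ["red", "green", "blue"]) + [ramp_membership(150.0, 100.0, 200.0)]
print(coords)  # [1.0, 0.0, 0.0, 0.5]
```

Any other membership function with values in [0, 1] would serve equally well; the ramp is chosen only because its output is easy to check by hand.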

In terms of visualization, an ontospace is a multidimensional matrix to which numerous dimension-reducing algorithms can be applied, such as multidimensional scaling (MDS; e.g., Kruskal et al. 1978), Kohonen’s self-organizing map (SOM; e.g., 1982), principal component analysis (PCA), or Eigentaste (Goldberg et al. 2001). The ontospatial approach we build on assumes that observers take positions on the prioritization of the dimensions, that is, on which dimensions of the ontospace to take into account; this is not to be confused with direct preferences on the ontocoordinates of the elements themselves.
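Since the demo application relies on MDS, a minimal classical (Torgerson) MDS is sketched here in plain Python. The text does not say which MDS variant or library the authors used; this version, which finds eigenvectors by power iteration with deflation, is an assumption chosen only so that the example needs no third-party packages. It presumes the input distances are (close to) Euclidean.

```python
import math
import random
from typing import List, Sequence


def classical_mds(dist: Sequence[Sequence[float]], k: int = 2,
                  iters: int = 1000, seed: int = 0) -> List[List[float]]:
    """Embed n items in k dimensions so that Euclidean distances approximate `dist`."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    # Double centring gives the Gram matrix B = -1/2 * J D^2 J of centred coordinates.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + grand) for j in range(n)]
         for i in range(n)]

    def matvec(m: List[List[float]], v: List[float]) -> List[float]:
        return [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        # Power iteration for the dominant eigenvector of the (deflated) B.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(vi * wi for vi, wi in zip(v, matvec(b, v)))  # Rayleigh quotient
        if lam <= 1e-12:
            break  # no positive variance left to embed
        for i in range(n):
            coords[i][c] = v[i] * math.sqrt(lam)
        # Deflation: remove the found component before looking for the next one.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Usage: the corners of a unit square are recovered up to rotation and reflection.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
embedding = classical_mds(dist, k=2)
```

Only pairwise distances are preserved, not absolute positions, which fits the model: what matters in the representational space is proximity qua similarity.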

In summary, we have proposed an ontology model that inherently accommodates varying perspectives, each of which constitutes an extraspatially observed, perspective-relative similarity among the entities of the topic domain. In the following, this is developed further to explain the epistemic activity of multi-perspective explorative concept construction.

2.2. Perspectives to ontospace and explorative conceptualization

We have adopted the view that a perspective is always present; accordingly, there is no such thing as a concept without a perspective. A tacit perspective may take the form of the choice and prioritization of determinables, the position of each in a hierarchy, the applied metric, or the means of measurement or scaling. We infer that no ontology, in either the philosophical or the technical sense, should be interpreted in an absolute way, but rather as a construct that already includes some implicit interpretation and choice. Our contribution is to suggest making perspectives explicit and interactively explorable.

We assume that perception and cognition, ultimately the brain, cannot effectively deal with a world of unlimited dimensionality, since evolution has mainly adapted them to the constraints of the directly perceivable two- and three-dimensional aspects of the environment. The prerequisite of cognitive-perceptual sense-making is to reduce the high (or endless) dimensionality of the environment, defined by a large number of attributes, properties, and relations, to something lower-dimensional. Our model addresses this reduction in two ways. First, following Gärdenfors’s conceptual space theory, conceptualization can be seen as such a reduction. Second, the model allows a perspective-relative information organization that can be explored dynamically, reminiscent of the way in which movement in space allows one to make sense of a complex physical environment.

The key is that projecting data elements in ontospace A to the lower dimensionality of a space B allows objects to be related to each other in terms of similarity relations and their derivatives: classifications, mereologies, and hierarchies. It is difficult, however, to determine the optimal reduced dimensionality, beyond the heuristic that it should not exceed cognitive manageability, which may be placed at n = 7 following Miller’s (1956) magical number seven, or at some measure of the capacity of comprehension or working memory (e.g., Just and Carpenter 1992), or of visual perceivability (e.g., Dastani 2002). In terms of visualization, spaces of more than three dimensions cannot easily be made intelligible, perhaps due to evolutionary adaptation to the physical environment. As a practical solution without categorical commitment to any particular dimensionality, it is simplest to take the dimensionality of B to be two for interactive visualizations, and one when the spatial dispositions on B are to be conceptualized as narratives.


For Hautamäki (1986), points of view are defined as selections of determinables. Intuitively, a point of view is a set of relevant or noticed determinables. Following Kaipainen et al. (2008), we generalize this to a perspective, an array $P = [p_1, p_2, \dots, p_n]$ of weights, expressed as real numbers in the interval [0, 1] and associated with all determinables. The weights $p_i$ express the interest or attention of a speaker towards each ontological dimension i. Perspective P controls a transformation $R_P$ from the high-dimensional ontospace A to a lower-dimensional representational space B. We suppose that a distance measure M in B is defined in some way (among several options). The core of our method is that the function $R_P$ respects the distance measures $m_i$ of the determinables in the following way:

a) If $p_i = 1$, the distance $m_i$ contributes fully to the distance measure M.
b) If $p_i = 0$, the distance $m_i$ is ignored by M.
c) Intermediate values $0 < p_i < 1$ correspond to partial contributions to the distance measure.
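The three conditions can be met, for instance, by a weighted Euclidean combination of the per-determinable distances, $\sqrt{\sum_i p_i \, m_i(a_i, b_i)^2}$, computed on the ontospace before it is projected to B. This particular formula is our assumption for illustration (the text leaves M open), with each $m_i$ taken as absolute difference.

```python
import math
from typing import Sequence


def perspective_distance(a: Sequence[float], b: Sequence[float],
                         p: Sequence[float]) -> float:
    """Weighted Euclidean distance between two states under perspective P.

    p_i = 1 lets m_i contribute fully, p_i = 0 ignores it, and
    0 < p_i < 1 gives a partial contribution (conditions a to c).
    """
    if not len(a) == len(b) == len(p):
        raise ValueError("states and perspective must have equal length")
    if any(not 0.0 <= w <= 1.0 for w in p):
        raise ValueError("perspective weights must lie in [0, 1]")
    return math.sqrt(sum(w * abs(x - y) ** 2 for x, y, w in zip(a, b, p)))


a, b = [0.0, 0.0], [3.0, 4.0]
print(perspective_distance(a, b, [1.0, 1.0]))   # 5.0: both determinables count
print(perspective_distance(a, b, [1.0, 0.0]))   # 3.0: the second one is ignored
print(perspective_distance(a, b, [0.25, 0.0]))  # 1.5: partial contribution
```

Feeding the pairwise distances from such a function into a dimension-reducing algorithm such as MDS is one plausible way to realize $R_P$.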

By means of the function $R_P$, objects of the domain can be categorized in a manner that reflects the adopted perspective. The result of the transformation $R_P$ is a perspective-relative spatial organization of the entities on B, constituted by the distance measure $M_P$ for the objects within B. This can be based, for example, on Euclidean distance.
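Once $R_P$ has placed the entities in B (for instance, with MDS applied to perspective-weighted distances), closeness in B can be turned into a similarity relation. A minimal sketch, assuming two-dimensional coordinates, Euclidean distance as $M_P$, and a hypothetical closeness threshold `eps`:

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def similar(ra: Sequence[float], rb: Sequence[float], eps: float) -> bool:
    """a ~ b iff the representations R_P(a) and R_P(b) lie within eps in B."""
    return math.dist(ra, rb) < eps


def similar_pairs(coords: Dict[str, Sequence[float]],
                  eps: float) -> List[Tuple[str, str]]:
    """All unordered pairs of entities that are similar from this perspective."""
    return [(x, y) for x, y in combinations(sorted(coords), 2)
            if similar(coords[x], coords[y], eps)]


# Hypothetical positions in B, e.g. as produced by a 2-D MDS embedding.
positions = {"A": (0.0, 0.0), "B": (0.3, 0.4), "C": (3.0, 4.0)}
print(similar_pairs(positions, eps=1.0))  # [('A', 'B')]
```

Note that a threshold relation of this kind is not transitive, so it yields overlapping neighbourhoods rather than a partition; turning it into categories requires an extra step such as prototypes or a tessellation.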

We generally assume that the dimensionality-reducing mapping $R_P$ facilitates the cognitive manageability of A, on the grounds discussed earlier. In this context, we interpret ontological space A in terms of a representational space B in which a similarity relation is defined. The transformation $R_P$ from A to B then induces a similarity relation on A: a and b in A are similar if their representations $R_P(a)$ and $R_P(b)$ are close to each other in B. This observed similarity amounts to a categorization rule in Gärdenfors’s terms, modelled as perspective P in our model, resulting in a perspective-relative partitioning of the space. Representational space B can then be thought of as a visually apparent two-dimensional map of similarity relations that forms the basis of gradually more elaborate conceptual structures, perhaps in the following sequence of inferences:

1) Prototypes, categories, and tessellations have been topics of a vast literature. For the present treatment, it suffices to assume that concepts are centered around a prototypical representative entity, as proposed by Eleanor Rosch’s prototype theory (e.g., 1973, 1975, 1983). Correspondingly, in Gärdenfors’s conceptual space model, concepts correspond to convex regions of the space (Gärdenfors 2000, 60, 71). Supposing the betweenness relation is defined, a region C in a conceptual space is “convex if, for all points x and y in C, all points between them are also in C” (ibid., 69). A similarity relation can be defined by unbroken betweenness (Gärdenfors and Williams 2001). Further, the Region Connection Calculus of Cohn et al. (1997), together with the classical Voronoi (1907) tessellation, can be applied to determine C by means of similarity thresholds that form its category boundaries (Gärdenfors and Williams 2001). To interpret these spatial means of conceptualization in the present framework, let us assume prototypes $q_k$ for certain concepts, selected by some cognitive or perceptual processes. Each prototype is an element of ontospace A: $q_k = [a_{k1}, a_{k2}, \dots, a_{kn}]$. Using the available distance measure $M_P$, we can form new concepts $C(q_k, t_k)$ based on prototypes $q_k$ and a threshold $t_k$, which is a positive real number. Object x belongs to the extension of the concept $C(q_k, t_k)$ if and only if $M_P(s(x), q_k) < t_k$. That is, an entity belongs to concept $C(q_k, t_k)$ if its distance from the prototype $q_k$ is smaller than the given threshold $t_k$. A Voronoi tessellation of a conceptual space C is given by the triple (G, d, C), where G is a set of generator points $\{g_1, g_2, \dots, g_m\}$ and d is a distance measure on C. The tessellation region $c(g_i)$ is $\{x \mid d(g_i, x) \le d(g_j, x) \text{ for } j = 1, 2, \dots, m\}$; $c(g_i)$ is the category generated by $g_i$.
2) We propose that perspective-relative spatial dispositions in B define not only distinctions between comparable (same-level) categories, but even a multi-level hierarchy of mereological is-a and is-part relations. Let K and L be subsets (concepts) of a conceptual space A. If K is a subset of L, then K is a subconcept of L. This kind of mereology may have any number of levels.
3) As a preliminary notion, we also suggest that a narrative disposition can be inferred from mereological hierarchies, interpreted as dispositions of the entities. This will be sketched in the examples below.
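The prototype-and-threshold concepts $C(q_k, t_k)$, the Voronoi regions, and the subconcept relation can be sketched as follows. This is an illustrative sketch, not the authors’ code: $M_P$ is assumed to be a weighted Euclidean distance, the same $M_P$ is assumed to serve as the tessellation distance d, and the generator names and coordinates are hypothetical.

```python
import math
from typing import Dict, Sequence, Set

Vector = Sequence[float]


def m_p(a: Vector, b: Vector, p: Vector) -> float:
    """Assumed perspective-relative distance M_P (weighted Euclidean)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, p)))


def in_concept(s_x: Vector, q_k: Vector, t_k: float, p: Vector) -> bool:
    """x is in the extension of C(q_k, t_k) iff M_P(s(x), q_k) < t_k."""
    return m_p(s_x, q_k, p) < t_k


def voronoi_regions(x: Vector, generators: Dict[str, Vector], p: Vector) -> Set[str]:
    """Generators g_i with d(g_i, x) <= d(g_j, x) for every j.

    Following the non-strict definition, a point on a boundary belongs to
    every tied region, so more than one name can be returned.
    """
    d = {name: m_p(g, x, p) for name, g in generators.items()}
    nearest = min(d.values())
    return {name for name, dist in d.items() if dist <= nearest}


def is_subconcept(k: Set[str], l: Set[str]) -> bool:
    """K is a subconcept of L if K, as a set of entities, is a subset of L."""
    return k <= l


protos = {"g1": [0.0, 0.0], "g2": [1.0, 0.0]}  # hypothetical generator points
print(sorted(voronoi_regions([0.2, 0.0], protos, [1.0, 1.0])))  # ['g1']
# Ignoring the first axis (p_1 = 0) makes both generators equally near.
print(sorted(voronoi_regions([0.9, 0.0], protos, [0.0, 1.0])))  # ['g1', 'g2']
```

The second call shows the point of the model in miniature: the same entity falls into one category or straddles two, depending only on the perspective weights.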


3.0 Multi-perspective knowledge organization

We envision multi-perspective knowledge organization: interactive means of conceptualization that allow observers to explore the topical domain from multiple perspectives in order to construct their individual conceptualization of a domain consisting of information elements. Each perspective opens an ontospatial disposition in which information elements are linked to each other by similarity relations, allowing concepts to be inferred, as exemplified below with an example domain.

3.1 Example: Economy stimulation policies

For a preliminary proof of concept, assume an on-demand compilation of articles on economic recovery stimulation policies. Since designing and implementing multi-perspective information systems with actual content is far beyond the resources available, our example is based on an imaginary content database on policies adopted by governments to fight economic recession, a topic on which there is surely a range of perspectives.

Ontospatially described metadata for the content were generated manually to simulate stereotypical stances of ‘Keynesians,’ ‘Neoliberalists,’ ‘Interventionalists,’ and ‘State-Capitalists,’ four examples of each, referring to terms commonly used in the literature of economics (e.g., Jessop 2002). Each article is annotated in terms of ontodimensions, each referring to the degree of acceptance of a particular policy proposition. The metadata could potentially originate from some collaborative annotation practice (‘tagging’), or they could be drawn from descriptive statistics collected by some automated text analysis, for example by deriving so-called subject access points from the documents themselves (see Hjørland 2003). Further, as our example shows, nothing excludes the possibility that they are authored by an expert. Consider also an interface for interactive multi-perspective exploration of the content ontospace. In this example, a drag-and-drop interface allows ontodimensions to be taken into account by dragging them from the list of Ontodimensions into the Perspective column (middle), or vice versa, and to be mutually prioritized by dragging them up or down within the Perspective column. In the demo application³, the weight $p_i = 1/q$ is assigned to dimension i, where q denotes its position in the priority order; this is one of many possible functions for implementing a priority ranking among the dimensions. In response to each choice of perspective, the application generates an MDS representation of the topic domain in which proximity corresponds to similarity from that perspective, and displays it in the right column (Figure 1).

Figure 1. Ontospace of the domain of stimulation policies as it appears from the perspective of the single ontodimension corresponding to the proposition “Cheap state loan to banks.” In the proposed interface, a perspective is taken by dragging chosen Ontodimensions from the left column to the Perspective column (second from the left) in order to have the spatial distribution of the content entities visualized correspondingly. The gray cone opens in the direction of increasing support for the proposition.
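The demo’s ranking rule, $p_i = 1/q$ for the dimension at priority position q, can be written as a small function; dimensions left out of the Perspective column get weight 0. The function name and the list-based representation of the column are hypothetical.

```python
from typing import List, Sequence


def priority_weights(perspective_column: Sequence[str],
                     ontodimensions: Sequence[str]) -> List[float]:
    """Weight 1/q for the dimension at position q (1-based); 0.0 if not chosen."""
    position = {dim: q for q, dim in enumerate(perspective_column, start=1)}
    return [1.0 / position[d] if d in position else 0.0 for d in ontodimensions]


dims = ["Cheap state loan to banks", "Cheap loan to enterprises",
        "Increase state ownership"]
print(priority_weights(dims[:1], dims))  # [1.0, 0.0, 0.0]  (one dimension chosen)
print(priority_weights(dims[:2], dims))  # [1.0, 0.5, 0.0]  (two dimensions chosen)
```

Because $1/q$ never reaches 0, every dimension in the column keeps some influence; only removing it from the column switches it off entirely.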


In order to demonstrate the emergence and consistency of concepts based on proximity-qua-similarity across a sequence of perspectives, imagine the following exploratory sequence. Note that labels are not taken into account in computing the visualizations; they are provided only for the purpose of interpreting the results.

3.1.1 Perspective 1

As the initial perspective, drag the proposition “Cheap state loan to banks” to the Perspective column, corresponding to the weight $p_{\text{Cheap state loan to banks}} = 1.0$. This results in a one-dimensional sorting of the content, dividing the policies into two groups, with the proponents on the upper right.
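With a single-dimension perspective, the map reduces to sorting the items along one axis. One simple way to make the visible two-group split, and the different spreads within the groups, explicit is to cut the sorted list at its largest gap and compare the groups’ standard deviations. This gap rule and the scores below are hypothetical illustrations and do not come from the demo.

```python
from statistics import pstdev
from typing import Dict, List, Tuple


def split_at_largest_gap(scores: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Sort items by their single ontocoordinate and cut where the gap is largest."""
    ordered = sorted(scores, key=scores.__getitem__)
    if len(ordered) < 2:
        return ordered, []
    gaps = [scores[b] - scores[a] for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]


# Hypothetical degrees of support for one proposition, not the paper's data.
support = {"a": 0.0, "b": 0.1, "c": 0.6, "d": 0.9, "e": 1.0}
opponents, proponents = split_at_largest_gap(support)
print(opponents, proponents)  # ['a', 'b'] ['c', 'd', 'e']
spread = {name: pstdev([support[x] for x in group])
          for name, group in (("opponents", opponents), ("proponents", proponents))}
```

A ranked result list would show the same order but hide both the size of the gap and the within-group spread, which is the extra information the spatial view offers.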

In this case, the visualization suggests a polarization of the policies with respect to the chosen dimension, between those standing for the proposal and those against it. It is to be appreciated that, even in everyday life, the initial understanding of a novel domain is often equally simple. The additional benefit of the spatial representation over a regular list of search results is that it allows the relative distance between the clusters to be estimated and reveals variation between individual items. For example, the degree of agreement among the ‘proponents’ appears to vary significantly more than that among the ‘opponents.’ This may hint at the need to explore further whether this setting is a flat oversimplification, or whether it also holds from some other perspective.

3.1.2 Perspective 2

To explore this further, drag the dimension “Cheap loan to enterprises” to the second priority position in the Perspective column (Figure 2).

In terms of the spatial metaphor, the additional dimension reveals significant ontospatial depth in the conceptualization, exposing a three-component concept projected onto a two-dimensional map. Even so, it remains epistemically uncertain whether this conceptualization has continuity or higher-dimensional volume.

3.1.3 Perspective 3

Taking into account the additional dimension “Increase state ownership” as the third strongest proposition, with $p_{\text{Increase state ownership}} = 1/3 \approx 0.33$, still confirms the three-class conceptualization, but splits the "Keynesian-Interventionalist" class into two subclasses, thus revealing another dimension of the conceptual volume.

Figure 2. A three-part concept achieved by taking ontodimension “Cheap loan to enterprises” into account in addition to “Cheap loan to banks.” The reader may imagine spatially ‘peeking around’ axis (1) to reveal the “Neolib” (oval-marked) cluster that was initially hidden behind the “StateC” cluster in Figure 1. The first position in the Perspective column corresponds to $p_{\text{Cheap loan to banks}} = 1/1 = 1$, the second to $p_{\text{Cheap loan to enterprises}} = 1/2$, and so on.

3.2 Conclusions

Our model suggests that conceptualization is like exploring abstract entities in space, comparable to observing artifacts in physical space. Knowledge is constructed by integrating successive exploratory movements in memory and considering the similarity relations that constitute conceptual dispositions. At any given moment of the presumably continuous exploratory process, the observer may infer the following.

3.2.1 Prototypes, categories and classes

Prototypes, categories, and classes may be founded on a Gestalt-type perceptual interpretation of the visualization, but may also be supported by algorithmic division, e.g., by means of prototype selection and subsequent Voronoi tessellation. These, in turn, may provide the basis for identifying and naming categorizations, assuming that names are not given a priori. Our point is that, given spatially constituted similarities, a range of means can be harnessed to serve explorative multi-perspective knowledge organization, which not only allows conceptualization-supporting visualization but may even be applied to perspective-relative content montage.

3.2.2 Mereological hierarchies

Mereological hierarchies, as depicted in Figure 3, support hierarchical navigation within multidimensional content, where each title links to the underlying content element, for example as below.

Economical recovery stimulation policies:
1. State Capitalist
2. Neoliberalist
3. Keynesian-Interventionalist
   3.1. Keynesian
   3.2. Interventionalist

Figure 3. The introduction of a third ontodimension, "Increase state ownership," splits the leftmost cluster into two separate yet relatively similar clusters ("Keyne" and "Interv"), to be interpreted as a subdivision of a category. The hand-drawn ovals characterize the implied mereological hierarchy of three levels.

3.2.3 Narratives

A mereological hierarchy can be interpreted as a disposition for sequential narrative. Assuming that the ontodimensions reflect meaningful aspects of the content, the perspective-relative hierarchy forms a meaningful context for narrating the concept, for example as below:

There are three kinds of economical recovery stimulation policies: State capitalist, Neoliberalist and Keynesian-Interventionalist. While state capitalists [content from database], neoliberalists [content from database]. Apart from these there are Keynesian-Interventionalist policies, according to which [fill in content from database]. Among the latter, Keynesian policies differ from Interventionalists in that [fill in content from database].
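Both the navigation outline of section 3.2.2 and a narrative skeleton like the one above can be derived mechanically from a nested hierarchy. A minimal sketch, with a hypothetical `Concept` tree type and a placeholder string standing in for database content; the sentence template is ours, not the demo’s.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Concept:
    name: str
    children: List["Concept"] = field(default_factory=list)


def outline(node: Concept, prefix: str = "") -> List[str]:
    """Numbered navigation entries such as '3.1. Keynesian'."""
    lines: List[str] = []
    for i, child in enumerate(node.children, start=1):
        number = f"{prefix}{i}."
        lines.append(f"{number} {child.name}")
        lines.extend(outline(child, number))
    return lines


def narrate(node: Concept) -> List[str]:
    """Depth-first narrative: introduce each level, then a content slot per member."""
    if not node.children:
        return []
    names = [c.name for c in node.children]
    listing = names[0] if len(names) == 1 else ", ".join(names[:-1]) + " and " + names[-1]
    lines = [f"There are {len(names)} kinds of {node.name}: {listing}."]
    for child in node.children:
        lines.append(f"{child.name}: [content from database]")
        lines.extend(narrate(child))
    return lines


policies = Concept("Economical recovery stimulation policies", [
    Concept("State Capitalist"),
    Concept("Neoliberalist"),
    Concept("Keynesian-Interventionalist",
            [Concept("Keynesian"), Concept("Interventionalist")]),
])
print("\n".join(outline(policies)))
print("\n".join(narrate(policies)))
```

Because both functions only walk the tree, a different perspective that yields a different hierarchy automatically yields a different outline and narrative order.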

Consequently, knowledge is more than a static view of the relations between elements. It is a dynamic understanding of the domain that integrates conceptualizations from a range of perspectives, one that accommodates the fact that another observer may end up with a totally different, yet equally justified, conceptualization. Readers are invited to perform their own explorations with an online demonstration accessible on the Internet (http://mt.sh.se/ose).

4.0 Discussion

This article has drawn together the implications of philosophical discussions of epistemic and ontological pluralism for information systems. On this ground, a reconsideration of ontology as a coordinate system serving dynamic conceptualization has been proposed. Further, we have elaborated a spatial model of ontology (ontospace), a system of multiple descriptive dimensions within which the entities of an information domain are described in terms of their respective coordinates. Here the activity of conceptualization can be seen as continuous exploration of perspectival views of the ontospace, based on psychological and cognitive-science models of similarity as spatial density. Each perspective on the ontospace constitutes a unique set of criteria for similarity among the elements. As we have shown, each perspective on the ontospace allows a particular cognitive-perceptual construction of 1) concepts as regions of the ontospace; 2) their interpretation as prototypes, categories, and tessellations; 3) hierarchies of mutually embedded clusters; and even their translation into 4) sequential narrations.

As a preliminary proof of concept, we have drafted an interactive tool: a perspective-relative means of knowledge organization that supports the participant’s mental conceptualization, linkings and narratives in grasping the relations among the artefacts displayed. We apply this tool to an imaginary domain in order to demonstrate how epistemic involvement contributes to conceptualization. Each perspective explored amounts, as we claim, to a particular way of classifying, categorizing, hierarchizing and narrating the topical content.

While this study takes the shortcut of assuming the ontospace as given, in practice the issue of annotation remains open to multiple solutions. The method of annotation is crucially related to the labour and cost of multi-perspective knowledge organization, and thereby to its viability in general. One path already suggested is that ontologies result from collaborative annotation or ‘tagging,’ assuming that the practice is embedded in a social or collaborative context. In some cases, ‘tagging’ can take place implicitly by means of behavioral tracking of measurements. Another possibility is that the annotations are collected by means of automated text analysis, although this would also mean inheriting a range of semantics-related issues.
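The following sketch shows one hypothetical realization of the collaborative-tagging path. It turns (annotator, entity, tag) triples into ontospace coordinates. It assumes that every distinct tag is a descriptive dimension, and that an entity's coordinate on that dimension is the share of the entity's annotators who applied the tag. Both choices, and the function name `ontospace_from_tags`, are illustrative assumptions rather than a method proposed in the article.

```python
from collections import defaultdict


def ontospace_from_tags(annotations):
    """Map (annotator, entity, tag) triples to entity -> {tag: coordinate}.

    Each distinct tag is treated as one descriptive dimension; an entity's
    coordinate on it is the fraction of that entity's annotators who used the tag.
    """
    unique = set(annotations)                       # repeated assignments count once
    tags = sorted({tag for _, _, tag in unique})
    annotators = defaultdict(set)                   # entity -> set of annotators
    counts = defaultdict(lambda: defaultdict(int))  # entity -> tag -> count
    for who, entity, tag in unique:
        annotators[entity].add(who)
        counts[entity][tag] += 1
    return {
        entity: {t: counts[entity][t] / len(people) for t in tags}
        for entity, people in annotators.items()
    }


space = ontospace_from_tags([
    ("u1", "doc1", "economy"), ("u2", "doc1", "economy"),
    ("u2", "doc1", "policy"), ("u1", "doc2", "policy"),
    ("u1", "doc1", "economy"),  # duplicate, ignored
])
print(space["doc1"], space["doc2"])
```

Normalizing by the number of annotators keeps coordinates in [0, 1], so heavily and lightly annotated entities remain comparable. That normalization is itself a design assumption.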

Our version of perspectivism is compatible with pragmatism and historicism, but goes further by providing a formal model for expressing different points of view. We believe that our approach not only provides an instrumental formalization for information retrieval, based on an empirical attitude towards the classification of documents, but also goes beyond retrieval to what can be considered the conceptual articulation of information. In this sense, it can be said to amount to a medium and to involve mediation. Applications of multi-perspective knowledge organization in interactive media can be envisioned. These include websites presenting, say, compilations of the daily news flow, whose hierarchical disposition (or ‘navigation’) is regenerated for each individual perspective chosen. They also include interactive films that adapt their narrative in real time, and on-demand books whose disposition is determined by the perspective the buyer chooses before the book is printed.

References

Andersen, Hanne. 2002. The development of scientific taxonomies. In Magnani, Lorenzo, and Nersessian, Nancy J., eds., Model-based reasoning: science, technology, values. New York: Kluwer Academic, pp. 95-112.

Baghramian, Maria. 2004. Relativism. London: Routledge.


Cohn, Anthony G. et al. 1997. Qualitative spatial representation and reasoning with the region connection calculus. Geoinformatica 1: 275-316.

Dastani, Mehdi. 2002. The role of visual perception in data visualization. Journal of visual languages and computing 13: 601-22.

Feyerabend, Paul K. 1981. Problems of empiricism: philosophical papers, volume 2. Cambridge, UK: Cambridge University Press.

Giere, Ronald N. 2006. Scientific perspectivism. Chicago: University of Chicago Press.

Goldberg, Ken et al. 2001. Eigentaste: A constant time collaborative filtering algorithm. Information retrieval 4: 133-51.

Goldstone, Robert L. 1994. Similarity, interactive activation, and mapping. Journal of experimental psychology: learning, memory, and cognition 20: 3-28.

Gruber, Thomas. 1993. A translation approach to portable ontology specifications. Knowledge acquisition 5: 199-220.

Gruber, Thomas. 2007. Ontology of folksonomy: A mash-up of apples and oranges. International journal on Semantic Web and information systems 3n1: 1-11.

Gärdenfors, Peter. 2000. Conceptual spaces: the geometry of thought. Cambridge, Mass.: MIT Press.

Gärdenfors, Peter. 2004. Conceptual spaces as a framework for knowledge representations. Mind and matter 2n2: 9-27.

Gärdenfors, Peter, and Williams, Mary-Anne. 2001. Reasoning about categories in conceptual spaces. In Nebel, Bernhard, ed., Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, IJCAI 2001, Seattle, Washington, USA, August 4-10, 2001. San Francisco: Morgan Kaufmann, pp. 385-92.

Hanson, Norwood Russell. 1958. Patterns of discovery: an inquiry into the conceptual foundations of science. Cambridge, UK: Cambridge University Press.

Hautamäki, Antti. 1986. Points of view and their logical analysis. Acta Philosophica Fennica 41.

Hjørland, Birger. 2003. Fundamentals of knowledge organization. Knowledge organization 30: 87–111.

Hjørland, Birger. 2008. What is knowledge organization (KO)? Knowledge organization 35: 86–101.

Hjørland, Birger. 2009. Concept theory. Journal of the American Society for Information Science and Technology 60: 1519–36.

Hjørland, Birger, and Pedersen, Karsten N. 2005. A substantive theory of classification for information retrieval. Journal of documentation 61: 582-97.

Jessop, Bob. 2002. The future of the capitalist state. Cambridge, UK: Polity.

Johnson, William E. 1964. Logic. New York: Dover.

Just, Marcel A., and Carpenter, Patricia A. 1992. A capacity theory of comprehension: individual differences in working memory. Psychological review 99: 122-49.

Kaipainen, Mauri et al. 2008. Soft ontologies, spatial representations and multi-perspective explorability. Expert systems 25: 474-83.

Kelly, George. 1955. The psychology of personal constructs. New York: Norton.

Kohonen, Teuvo. 1982. Self-organized formation of topologically correct feature maps. Biological cybernetics 43: 59-69.

Krumhansl, Carol L. 1978. Concerning the applicability of geometric models to similarity data: the interrelationship between similarity and spatial density. Psychological review 85: 445-63.

Kruskal, Joseph B., and Wish, Myron. 1978. Multidimensional scaling. Thousand Oaks, Calif.: Sage Publications.

Kuhn, Thomas S. 1962. The structure of scientific revolutions. Chicago: University of Chicago Press.

Laniado, David, Eynard, Davide, and Colombetti, Marco. 2007. Using WordNet to turn a folksonomy into a hierarchy of concepts. In Semeraro, Giovanni et al., eds., Proceedings of the 4th Italian Semantic Web Workshop, Dipartimento di Informatica - Università degli Studi di Bari - Italy, 18-20 December, 2007. CEUR Workshop Proceedings, Vol. 314. Bari, Italy: Dip. di Informatica, Università di Bari, pp. 192-201.

Lyons, John. 1977. Semantics 1. Cambridge: Cambridge University Press.

Macgregor, George, and McCulloch, Emma. 2006. Collaborative tagging as a knowledge organisation and resource discovery tool. Library review 55: 291-300.

Magnus, Bernd, and Higgins, Kathleen M., eds. 1996. The Cambridge companion to Nietzsche. Cambridge, UK: Cambridge University Press.

Mathes, Adam. 2004. “Folksonomies: cooperative classification and communication through shared metadata.” http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html.

McLuhan, Marshall, and Fiore, Quentin. 1967. The medium is the message. New York: Bantam Books.

Miller, George A. 1956. The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychological review 63: 81-97.


Murray, Janet. 1997. Hamlet on the holodeck: the future of narrative in cyberspace. Cambridge, Mass.: MIT Press.

Niglas, Katrin, Kaipainen, Mauri, and Kippar, Jaagup. 2008. Multi-perspective exploration as a tool for mixed methods research. In Bergman, Manfred Max, ed., Advances in mixed methods research: theories and applications. Los Angeles: Sage Publications, pp. 172-92.

Quine, Willard V. O. 1980. Two dogmas of empiricism. In Morick, Harold et al., eds., Challenges to empiricism. London: Methuen, pp. 46-70.

Quintarelli, Emanuele. 2005. “Folksonomies: power to the people.” Paper presented at the ISKO Italy-UniMIB meeting in Milan, Italy, June 24, 2005. Available: http://www.iskoi.org/doc/folksonomies.htm.

Rosch, Eleanor H. 1973. Natural categories. Cognitive psychology 4: 328-50.

Rosch, Eleanor. 1975. Cognitive representations of semantic categories. Journal of experimental psychology: General 104: 192-233.

Rosch, Eleanor. 1983. Prototype classification and logical classification: the two systems. In Scholnick, Ellin K., ed., New trends in conceptual representation: challenges to Piaget’s theory? Hillsdale, N.J.: Lawrence Erlbaum Associates, pp. 73-86.

Schwarz, Norbert. 2007. Attitude construction: evaluation in context. Social cognition 25: 638-56.

Smith, Steven M., and Vela, Edward. 2000. Environmental context-dependent memory: a review and meta-analysis. Psychonomic bulletin & review 8: 203-20.

Takeharu, Eda et al. 2009. The effectiveness of latent semantic analysis for building up a bottom-up taxonomy from folksonomy tags. World Wide Web 12: 421-40.

Tversky, Barbara. 1991. Spatial mental models. In Bower, Gordon H., ed., The psychology of learning and motivation, Vol. 27. Burlington, Mass.: Elsevier, pp. 109-46.

Voronoi, Georges. 1907. Nouvelles applications des paramètres continus à la théorie des formes quadratiques. Journal für die reine und angewandte Mathematik 133: 97-178.

Zadeh, Lotfi A. 1965. Fuzzy sets. Information and control 8: 338-53.

Zhang, Lei, Wu, Xian, and Yu, Yong. 2006. Emergent semantics from folksonomies: a quantitative study. Lecture notes in computer science 4090: 168-86.


Knowl. Org. 38(2011)No.6 H. Park. A Conceptual Framework to Study Folksonomic Interaction

515

A Conceptual Framework to Study Folksonomic Interaction

Heejin Park

School of Library and Information Services, Sung Kyun Kwan University, Myongryun-Dong 3-53, Jongro-Gu, Seoul, Republic of Korea


Heejin Park has a PhD from the University of British Columbia’s School of Library, Archival and Information Studies and a Master’s from Kent State University. Her research interests lie in various fields concerning representation, organization of information, and human behavior, especially in the Web environment and digital libraries. Her current research explores users’ tagging behaviors in interaction with folksonomies.

Park, Heejin. A Conceptual Framework to Study Folksonomic Interaction. Knowledge Organization, 38(6), 515-529. 51 references.

ABSTRACT: This paper proposes a conceptual framework that recasts a folksonomy as a Web classification. It uses this framework to explore the ways in which people work with a folksonomy in assessing, sharing, and navigating Web resources. The author uses information scent and foraging theory as a context for discussing how a folksonomy is constructed through interactions among users, a folksonomic system, and a given domain, consisting of a group of users who share the same interests or goals. The discussion centers on two dimensions of folksonomies: (1) folksonomy as a Web classification, which puts like information together in a Web context; and (2) folksonomy as information scent, which helps users find related resources and users, and obtain desired information. This paper aims to integrate these two dimensions in a conceptual framework that addresses the structure of a folksonomy as shaped by users’ interactions. The proposed framework consists of three components of users’ interactions with a folksonomy: (a) tagging – cognitive categorization of Web-accessible resources by an individual user; (b) navigation – exploration and discovery of Web-accessible resources in the folksonomic system; and (c) knowledge sharing – representation and communication of knowledge within a domain. This understanding will help motivate possible future directions of research on folksonomies. This initial framework will frame a number of research questions and help lay the groundwork for future empirical research focusing on qualitative analysis of folksonomies and users’ tagging behaviors.

Received 20 September 2008; Revised 18 May 2011; Accepted 1 August 2011

1.0 Introduction

In recent years, the folksonomy has developed as a new concept of user-created classification and communication through shared metadata in the Web environment (Guy and Tonkin 2006). The term folksonomy was coined by Vander Wal to denote a “practice of collaborative categorization using freely chosen keywords by a group of people cooperating spontaneously” (Quintarelli 2005, 5). Folksonomies feature prominently on a number of well-known Web-based information systems such as Amazon.com. Typically, such sites allow users to publicly tag and share their resources, so that they can not only classify information for themselves but also browse the information classified by others (Golder and Huberman 2006).

A folksonomy encourages users to organize information in their own way and involves them actively in the organizational system (Mathes 2004). In this sense, a folksonomy has the potential to serve as a Web classification that allows users to interact within a system and to participate in the development of a classification system on the Web. The interest in folksonomies arises from this relation between folksonomies and Web classifications: how a folksonomy differs from other types of Web classification, and how users contribute to the development of a Web classification system. Despite increasing interest in folksonomies, in practice as well as in research, little has been done to build a solid conceptual framework for understanding how people classify Web resources using a folksonomy. This paper attempts to articulate a conceptual framework that will help us better understand the structure of a folksonomy as shaped by users’ interactions.

To this end, this paper defines two dimensions of folksonomy: a folksonomy as a Web classification, and a folksonomy as information scent. The first dimension concerns how people use a folksonomy as a Web classification. The second concerns how a folksonomy is structured through users’ interactions, drawing on information foraging and scent theory. These two dimensions provide a better understanding of tags in two roles. As categories, tags put like things together. As information scent, tags lead users to the information they seek and to interaction with others in the context of a folksonomy. This paper aims to integrate these two dimensions in a conceptual framework that addresses the structure of a folksonomy as shaped by users’ interactions. The framework comprises three kinds of user interaction with a folksonomy, from the users’ point of view: tagging, navigation, and knowledge sharing. This unified framework may provide insight into the ways in which a folksonomy can reflect an interaction among users, a domain, and a classification structure. The proposed framework can be used to guide empirical research on users’ interactions with a folksonomy.

Before addressing the dimensions of folksonomy, I must first define the terms used in the discussion. In this paper, the term “Web classification” denotes putting like Web-accessible resources together, such as photos, blogs, Web sites, or articles accessible on the Web. The concept of Web classification is generally used to describe Web-based information systems that incorporate categories. As examples of Web classification, this paper refers to two kinds found in practice. The first is (1) implementations of existing classification systems such as the Dewey Decimal Classification (DDC) and the Library of Congress Classification (LCC) (e.g., DDC in NetFirst, CyberDewey, the Renardus Project). The second is (2) custom-built classifications (e.g., Wikipedia Contents Category, Open Directory Project). To these, this paper adds the folksonomy as a new kind of Web classification, created by users and having an emergent categorical structure. Section 3 describes the nature and structure of a folksonomy as a Web classification. To demonstrate how a folksonomy works as a Web classification, I will outline the shift in classification theory away from long-standing classification approaches.

In order to better frame users’ interactions with a folksonomy, this paper also adopts an information foraging approach. Information foraging theory describes the adaptive information-seeking behaviors of users within the human-information interaction environment. When searching, people use a foraging mechanism that evolved to help our animal ancestors find food (Chi et al. 2001; Jacoby 2005). Information essentially has a scent, and users rely heavily on information scent to optimize their search outcomes. This is just as animals rely on local smell cues to judge where to go next in pursuing prey (Pirolli and Card 1995). This paper uses “information scent” to refer to a user’s perception of the value and cost of accessing a piece of information, based on the perceptual cues available to him or her. Section 4 outlines information foraging and scent theory to discuss how users’ interactions with a folksonomy can be identified.
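The following hypothetical Python sketch makes the notion of information scent concrete. It scores each outgoing link by perceived value per unit of perceived cost. It assumes that value is the share of the user's goal terms echoed by the link's proximal cues (for instance, its tags). It also assumes that cost is a single number, such as the estimated seconds needed to access the link. This is a deliberately simplified stand-in, not Pirolli and Card's model. The names `scent` and `next_link` are invented for illustration.

```python
def scent(goal_terms, cues, cost):
    """Perceived value per unit cost: share of goal terms echoed by the cues, over cost.

    goal_terms must be non-empty and cost must be positive.
    """
    goal = set(goal_terms)
    value = len(goal & set(cues)) / len(goal)
    return value / cost


def next_link(goal_terms, links):
    """Follow the link with the strongest scent; links maps label -> (cues, cost)."""
    return max(links, key=lambda label: scent(goal_terms, *links[label]))


links = {
    "tag:python": (["python", "programming"], 1.0),
    "tag:recipes": (["cooking", "food"], 1.0),
    "tag:python-tutorial": (["python", "tutorial"], 1.5),
}
print(next_link(["python", "tutorial"], links))  # tag:python-tutorial
```

If the cost of `tag:python-tutorial` rose to 4.0, its scent (0.25) would fall below that of `tag:python` (0.5), and the forager would switch links. This reflects the trade-off between value and cost described above.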

With regard to the structure of a folksonomy, a “tag” refers to a keyword that people assign to Web resources in order to share, discover, and recover them. The primary goal of this paper is to gain a better understanding of tags in two roles: as categories that group like things together, and as information scent, a primary navigation tool for finding relevant resources and people. For this purpose, the two terms “tag” and “category” will be used interchangeably throughout the paper, especially concerning a folksonomy as a Web classification. The next section provides details on the approach to conceptualizing a folksonomy, tags, and the act of tagging.

2.0 Folksonomy, tagging, and tags

Because the folksonomy is implemented through tags, the term ‘tagging system’ is often used interchangeably with it. Users describe and organize pieces of information, such as Web documents or photos, with terms from their own vocabulary known as “tags” (Mathes 2004; Weinberger 2006). However, a folksonomy is clearly distinguished from tagging. Tagging is the process by which users assign one or more keywords (tags) to Web resources in order to share, discover, and recover them. A folksonomy, by contrast, is the grassroots classification that emerges from tagging.
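The distinction between tagging and the folksonomy that emerges from it can be sketched in a few lines of Python. The sketch assumes that each act of tagging is recorded as a (user, resource, tag) triple. The function names `tag` and `folksonomy` are hypothetical. Aggregating the triples by tag yields the emergent categories, each grouping the resources filed under it.

```python
from collections import defaultdict


def tag(assignments, user, resource, *tags):
    """Tagging: one user attaches freely chosen keywords to one resource."""
    for t in tags:
        assignments.add((user, resource, t))


def folksonomy(assignments):
    """The emergent classification: each tag becomes a category of resources."""
    categories = defaultdict(set)
    for _user, resource, t in assignments:
        categories[t].add(resource)
    return dict(categories)


acts = set()
tag(acts, "ann", "photo1", "sunset", "beach")
tag(acts, "bob", "photo2", "beach")
tag(acts, "bob", "blog1", "travel")
print(sorted(folksonomy(acts)["beach"]))  # ['photo1', 'photo2']
```

No user defines the `beach` category. It arises only because two users independently chose the same keyword.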

Two generally recognized aspects of tagging make folksonomic mechanisms highly popular and useful in recent Web-based information systems. First, tagging is not restricted to authors or domain experts; anyone can produce tags. As a result, a folksonomy allows for the diverse viewpoints of users, who might tag a resource differently from its author or from each other (Weinberger 2006). Second, tagging is social in that users are encouraged to publicly tag and share their tags and resources. Social tagging allows groups to form around shared interests and points of view. As soon as users assign a tag to a Web resource, they can see the cluster of related users and tags associated with the same resource. This instant feedback leads users to find and network with other users who are interested in the same topic.

Tagging collectively produces a larger classification system, a folksonomy. A folksonomy consists of a flat space defined by the set of tags with which a group of users has tagged information resources. A folksonomy can be displayed through a tag cloud, a collection of the popular tags that the folksonomic system provides (see Figure 1) (Golder and Huberman 2006). This is an informal classification system. Categories emerge in an ad hoc fashion from aggregated tags, with contributions over time by any user who has access to information in the system (Shirky 2005; Weinberger 2006). Users classify Web resources using their chosen tags. In other words, tags act as categories grouping things together on the basis of whatever similarity users find useful at a given moment. A folksonomy is therefore distinct from formal classifications, in which categories are defined only by properties that all members share.
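A tag cloud of the kind shown in Figure 1 can be approximated by counting tags and scaling font size with frequency. The sketch below assumes linear scaling between a hypothetical minimum and maximum point size, and keeps only the `top_n` most frequent tags. Real systems differ (some use logarithmic scaling), so the function name and parameters are illustrative only.

```python
from collections import Counter


def tag_cloud(tags, top_n=30, min_pt=10, max_pt=24):
    """Return (tag, point size) pairs for the top_n most frequent tags.

    Sizes are interpolated linearly between min_pt and max_pt by tag count.
    """
    popular = Counter(tags).most_common(top_n)
    if not popular:
        return []
    lo, hi = popular[-1][1], popular[0][1]
    span = (hi - lo) or 1  # all counts equal: avoid division by zero
    return [(t, min_pt + (c - lo) * (max_pt - min_pt) // span) for t, c in popular]


print(tag_cloud(["web"] * 5 + ["design"] * 3 + ["blog"]))
# [('web', 24), ('design', 17), ('blog', 10)]
```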

A number of studies have evaluated these characteristics of a folksonomy. Previous studies have contributed to a general understanding of the structure of a folksonomy and of tag usage. These studies mostly focused on the potential of a folksonomy and user-created tags for indexing and searching mechanisms. Various techniques have been used to bridge the gap between existing controlled vocabularies (e.g., library classification and subject headings) and user-created tags, in order to improve search effectiveness (e.g., Kipp 2007; Lin et al. 2006; Spiteri 2006; Trant 2006; Voss 2006; Xu et al. 2006). Such studies often consider a folksonomy and tags only within a framework of directed searching or information retrieval. As Vander Wal (2007) points out, “tagging seemed to be working for finding things more from exploration and serendipity than through searching and intent.” Some work has been done within a framework of browsing (Lerman and Jones 2007; Millen et al. 2007; Yun and Boqin 2008). Little is known, however, about the ways in which people interact with a folksonomy. This paper suggests a different angle on conceptualizing the structure of a folksonomy and the act of tagging: an information foraging approach.

Figure 2 illustrates our approach to conceptualizing a folksonomy and the act of tagging. The left side of Figure 2 indicates the scope of this paper and our interest in tags. In this paper, the concern is solely with the use of tags and a folksonomy in browsing or information foraging, rather than in directed searching or looking for a known target. For this purpose, tags are understood as categories and as information scent with respect to information foraging. This view contrasts with previous approaches that see tags as indexing and searching terms for improving search effectiveness within the framework of searching (the right side of Figure 2).

This paper explores how people use tags in organizing Web resources, specifically how they group similar resources together through tagging. The approach to tags as categories proposes an alternative view of folksonomy, one that extends beyond traditional known-item retrieval to more exploratory, foraging capabilities. That is, rather than searching for resources by keywords (tags), users can gather and forage information through the tags assigned to collocate related information.

Figure 1. Tag Cloud

This paper views tagging as related to browsing and information foraging behavior, in contrast to directed searching, because “people are constantly gathering, monitoring, and screening information around them as they go through daily life” (Rice et al. 2001, 8). In general, browsing is distinguished from directed searching by the characteristics of the users’ goals, tasks, or information needs. Browsing refers to the task of looking to see what is available, and searching refers to the task of looking for a known target (Furnas and Jul 1997). In the Web environment, browsing and searching are often referred to as “navigation” and “querying,” respectively. Olston and Chi (2001, 1) identify these two in terms of behaviors for locating information in the Web environment:

Browsing is the process of viewing pages one at a time and navigating between them sequentially using hyperlinks. Searching is the process of entering a search query (usually a list of keywords) into a search engine, which produces a ranked list of links to pages that match the query.
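The contrast drawn in this quotation can be illustrated with a toy site. In this hypothetical Python sketch, `search` ranks pages by the number of query keywords they contain. By contrast, `browse` views one page at a time and follows hyperlinks sequentially. The naive keyword count and the rule of following the first unseen link are simplifying assumptions, not descriptions of any real search engine or user.

```python
def search(pages, query):
    """Searching: rank pages by how many query keywords they contain."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(text.lower().split())), url)
              for url, (text, _links) in pages.items()]
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))  # best first, ties by URL
    return [url for score, url in ranked if score > 0]


def browse(pages, start, steps):
    """Browsing: view one page at a time, following the first unseen hyperlink."""
    path = [start]
    for _ in range(steps):
        unseen = [link for link in pages[path[-1]][1] if link not in path]
        if not unseen:
            break
        path.append(unseen[0])
    return path


pages = {  # hypothetical site: url -> (page text, outgoing hyperlinks)
    "home": ("welcome to the library", ["catalog", "news"]),
    "catalog": ("search the library catalog", ["home", "dewey"]),
    "dewey": ("dewey decimal classification", ["catalog"]),
    "news": ("library news", ["home"]),
}
print(search(pages, "library catalog"))  # ['catalog', 'home', 'news']
print(browse(pages, "home", 3))          # ['home', 'catalog', 'dewey']
```

Searching returns a ranked list in one step. Browsing, by contrast, reaches `dewey`, a page that shares no terms with the query, only by moving sequentially through the link structure.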

Similarly, Marchionini (1995) distinguishes between browsing and searching as general strategies. He defines browsing as heuristic, dependent on recognizing relevant information. By contrast, searching is analytic and depends on careful planning, recall of query terms, iterative query reformulation, and examination of results. He remarks that browsing depends more on interaction between the information seeker and the system, because people continually guide themselves by environmental cues while browsing. In this sense, browsing might not be an efficient strategy for locating specific information when judged by the criteria of precision and recall. For an information browser or forager, recognizing relevant information through classification and information scent matters more than finding specific information quickly and precisely. The primary concern here, therefore, is with the functions of tags as categories and as information scent in supporting browsing and information foraging. This paper focuses on the conceptual issues that shape our understanding of tags in two roles: as categories, which put like things together, and as information scent, which helps users find the relevant information they seek.

Figure 2. Approach to study tags

3.0 Folksonomy as Web classification

3.1 Classificatory approach to folksonomy

In library and information science (LIS), classification is widely referred to as the “putting together of like things” (Hjørland 2003, 103). In a narrow sense, this meaning of classification can be separated into two concepts: the process, namely classifying; and the product, a classification system (Kwasnik 1999). As a process, classification refers to the method of organizing information that brings like information together on the basis of what the items have in common. As a system, on the other hand, classification refers to a representational tool used to organize a collection of information resources. These two aspects of classification are separable but closely intertwined because “a full appreciation of the implication of classification systems for organizing information resources requires a basic understanding of the classification process itself” (Jacob 2004, 5). The definition of classification as putting like things together is a broad one. This paper adopts this broad definition because the semantic ambiguities present in folksonomies are not described accurately by the restricted bibliographic concept of classification as a set of mutually exclusive classes. A folksonomy requires a broad approach to the definition of classification, based on the sharing of some similarity rather than of essential properties.

A thorough understanding of classification is based on the study of categorization (Iyer 1995). According to Iyer (1995), the classification process is often used interchangeably with “categorization” in the literature, because categorization is a fundamental human thought process and the most natural way we know to organize information. Our conceptual structures are formed through categorization, and we experience and understand ideas and objects by grouping them in useful ways (Lakoff 1987). This process is reflected in the design of classification systems such that “the individual, idiosyncratic categories that each person forms are abstracted into more formal and general categories that can be logically perceived and used by anyone” (Iyer 1995, 88).

Theories of categorization have resulted in two distinct paradigms: the classical and the probabilistic (Iyer 1995; Jacob 1991, 2004; Lakoff 1987). The central assumption underlying classical theories is that categories are defined only by a set of properties that all members share. Classical theory thus rests on three assumptions:

– The definition of a category is the union of the essential features that identify membership in that category;

– The defining features for a category are both individually necessary and jointly sufficient to define the category;

– Categories are nested, so that the subordinate category possesses all the features of the superordinate category (Iyer 1995; Jacob 2004; Lakoff 1987).
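The three assumptions of classical theory translate directly into set operations. In the following sketch, which uses hypothetical feature names, membership holds exactly when an item has every defining feature (these features are individually necessary and jointly sufficient). Nesting holds when a subordinate category's defining features include all of the superordinate's. Membership is all-or-nothing: an unusual bird is as much a member as a typical one.

```python
BIRD = frozenset({"animal", "feathers", "lays_eggs"})
ANIMAL = frozenset({"animal"})


def is_member(item_features, defining_features):
    """Classical membership: the item must have every defining feature."""
    return defining_features <= item_features


def is_nested(sub_defining, super_defining):
    """Nesting: the subordinate category has all features of the superordinate."""
    return super_defining <= sub_defining


robin = {"animal", "feathers", "lays_eggs", "flies", "sings"}
penguin = {"animal", "feathers", "lays_eggs", "swims"}
bat = {"animal", "fur", "flies"}
print(is_member(robin, BIRD), is_member(penguin, BIRD), is_member(bat, BIRD))
# True True False
print(is_nested(BIRD, ANIMAL))  # True
```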

Classical theory has dominated our view of classification, informing and directing the systematic assignment of entities to classes according to an established set of principles. It leads to a formal classification system: a hierarchical structure of fixed classes that reflect logical genus-species relationships. In such a system, classes should be mutually exclusive and jointly exhaustive.

On the other hand, probabilistic approaches argue that the classical view of categorization cannot account for the findings of empirical studies. Fieldwork in cognitive psychology and anthropology shows that category members need not share a common property (Lakoff 1987); members can be similar to one another in different ways. According to Barsalou (1983), goal-derived categories are defined solely in terms of how their members fulfill some desired goal or plan. For example, categories like “things to sell at a garage sale” and “things to take on a camping trip” are spontaneously generated categories that group things in a goal-directed way (Murphy 2002). These goal-directed categories are ad hoc, and very little of their structure is explained by necessary and sufficient conditions defining category membership. The central argument of probabilistic approaches is that human categorization is based on the nature of human bodies and our experience (Lakoff 1987).


Probabilistic approaches emphasize that categorization is not merely a conceptual structure identifying the world, but a cognitive process closely associated with individual perception. In particular, prototype effects (the conceptualization of a category by holding certain examples as ideals) are superficial (Lakoff 1987). Prototypes are influenced by culture and the environment, so people who hold different prototypes tend to think of categories differently and reach different conclusions. What constitutes a prototype for a category is also a matter of perspective, and thus may change as an individual’s perception changes over time (Iyer 1995).
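Prototype-based membership is graded rather than all-or-nothing. The sketch below assumes that typicality is the Jaccard overlap between an item's features and the prototype an observer holds. Jaccard is an illustrative choice, not one made in the text, and other similarity measures exist. The sketch shows that two observers holding different prototypes of "bird" judge the same item differently. All feature names and the function `typicality` are hypothetical.

```python
def similarity(a, b):
    """Jaccard overlap of two feature sets: 0 = nothing shared, 1 = identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def typicality(item, prototype):
    """Graded membership: how close an item is to the prototype an observer holds."""
    return similarity(set(item), set(prototype))


penguin = {"feathers", "swims", "lays_eggs"}
temperate_bird = {"feathers", "flies", "sings", "small"}  # one observer's prototype
polar_bird = {"feathers", "swims", "lays_eggs", "cold"}   # another observer's
print(typicality(penguin, temperate_bird))  # 0.16666666666666666
print(typicality(penguin, polar_bird))      # 0.75
```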

Looking at classification through this dichotomy of the classical and the probabilistic offers us a way to understand how people form categories and structure a folksonomy. In particular, probabilistic approaches provide new insights into folksonomies, in which idiosyncratic and communal categories coexist. Folksonomies allow users to classify Web resources with tags of their own choosing, in whatever way reflects how they perceive those resources. Here, tags act as categories within the structure of a folksonomy by bringing like Web resources together. Classifying is dynamic and creative; idiosyncratic categories therefore reflect the way an individual classifies things at the moment, expressing his or her immediate information needs (Iyer 1995). In this sense, tags that remind users of their projects and tasks (e.g., ‘LIS2013’, ‘to read’) and tags that may be meaningful only to the user, including affective reactions (e.g., ‘interesting’, ‘useful’), are understood as idiosyncratic categories.

On the other hand, communal categories emerge in a collective pattern that seems to form from a nascent consensus (Golder and Huberman 2006). Communal categories are generated in a social context where users interact with each other. When users share their categories and contents, they tend to use suggested popular categories or to imitate others’ category formation (Campbell 2006; Golder and Huberman 2006; Shirky 2005). In most folksonomic systems, the user interface provides immediate feedback from the community of users in various forms of aggregated tag use across all users. For example, the popular tags for a given URL can influence tag selection by providing hints about how others have tagged the resource. As the empirical work of Golder and Huberman (2006) reveals, communal categories, by virtue of their social nature, have important implications for the stability of the folksonomic structure. The stable, consensual choices that emerge can be used on a large scale to describe how users see their relationship to the resource (Golder and Huberman 2006).
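The idea of a stabilizing consensus can be made concrete with a small sketch. It assumes, hypothetically, that a URL’s bookmarks are available as an ordered list of tag sets, one per user, and tracks each tag’s share of all tag assignments after every new bookmark; when the largest change between consecutive snapshots stays small, the tag proportions can be said to have stabilized. The data and function names are invented for illustration, and this is not presented as Golder and Huberman’s exact procedure.

```python
from collections import Counter

def tag_proportions_over_time(bookmarks):
    """Return, after each bookmark, the share of every tag seen so far.

    bookmarks: list of tag sets, one per user bookmark of a single URL,
    in posting order (a hypothetical input format).
    """
    counts = Counter()
    history = []
    for tags in bookmarks:
        counts.update(tags)  # one increment per tag in this bookmark
        total = sum(counts.values())
        history.append({tag: n / total for tag, n in counts.items()})
    return history

def max_shift(prev, curr):
    """Largest change in any tag's share between two snapshots."""
    tags = set(prev) | set(curr)
    return max(abs(curr.get(t, 0.0) - prev.get(t, 0.0)) for t in tags)

# Invented bookmarks of one article, in the order they were posted.
bookmarks = [
    {"folksonomy", "tagging"},
    {"folksonomy"},
    {"folksonomy", "toread"},
    {"folksonomy", "tagging"},
]
history = tag_proportions_over_time(bookmarks)
for step in range(1, len(history)):
    # Bookmark number and how much the proportions moved when it arrived.
    print(step + 1, round(max_shift(history[step - 1], history[step]), 3))
```

With real data, one would look for the point after which this shift stays below some small tolerance; the tolerance itself is a modelling choice, not something fixed by the literature cited here.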

Unlike a formal classification system, a folksonomy generally does not provide hierarchical relations between categories. Instead, a folksonomy is a flat space in which related categories are automatically generated from the similarity among tags, given by their co-occurrence and by the collaborative recording of tags assigned to the same resource (Munk and Mork 2007). Categories are not rigidly bounded but frequently overlap; membership in one category does not prohibit membership in any other, because categories are not constrained by a requirement of mutual exclusivity. This conception is also well manifested in probabilistic approaches, which divide the world into groups of entities whose members are “in some way similar to each other” (18). A folksonomy therefore relies largely on an individual’s ability to form categories. Following Jacob’s (2004, 519) conception of categorization, categories are apparently unstable, reflecting “a function of immediate context, personal goals, or past experience” of the individual. This plasticity may prevent a folksonomy from being a persistent information structure.

3.2 Conceptualization of folksonomy as a reflective and interactive Web classification

The new concept of folksonomy has begun to extend the notion of classification beyond traditional classification in order to explain the semantic ambiguities presented in folksonomies. This section draws upon the potential of folksonomy to serve as a Web classification that allows users to participate in the development of a classification system and to interact within the system.

A folksonomy has advantages as a reflective Web classification system. First, a folksonomy can directly reflect users’ vocabulary in the classification system (Mathes 2004). The strength of folksonomy is the ability of any given user to organize the world as he or she sees it (Guy and Tonkin 2006). Unlike traditional classification systems, which are developed by highly trained information professionals using schemes that may be biased (Olson 1998), a folksonomy lets all users participate and contribute to category formation with their own tags. Therefore, a folksonomy can reflect users’ conceptual models more accurately (Macgregor and McCulloch 2006). A folksonomy allows a variety of category definitions, and a corresponding variability of category memberships, as a reflection of immediate context. Because idiosyncratic views can coexist and thrive in the form of idiosyncratic categories, a folksonomy can reveal the variety of users’ needs and views without a singular or authoritative cultural, social, or political bias.

More importantly, a folksonomy provides a useful social network as users share tags and resources. There exists a network among users, Web resources, and tags in the folksonomy (Cattuto 2006; Van Damme et al. 2007). A user creates an association between tags and a resource by assigning tags to that resource; each tag serves as a link to additional resources tagged the same way by others. As a result, users are indirectly linked with others by sharing the same tags and/or resources. Through this complex network of shared tags, resources, and users, a folksonomy makes it easier for users to discover others who have similar interests and to learn of their resources. This implies the potential of a folksonomy to represent a community that shares interests. Many folksonomic systems, including BibSonomy, Flickr, and Connotea, provide the ability to join one or more groups, where users can engage in networking more actively. Through these social network applications, users manage their references collaboratively and agree to use tags that are appropriate for a given subject. In so doing, users can adapt to the norms of a domain or contribute to developing a shared semantics. Furthermore, this social aspect of a folksonomic system fosters the building of communal categories, which reflect knowledge of a domain and stimulate knowledge sharing.
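The indirect linking of users through shared tags and resources can be sketched by treating every tag assignment as a (user, resource, tag) triple. The triples and identifiers below are invented for illustration; the sketch simply returns every user who shares at least one tag or one resource with a given user, which is one plausible reading of “indirectly linked” rather than a definition taken from the cited works.

```python
from collections import defaultdict

# Each tag assignment links a user, a Web resource, and a tag.
# These triples are invented for illustration.
assignments = [
    ("user1", "shirky2005", "folksonomy"),
    ("user1", "shirky2005", "Shirky"),
    ("user2", "shirky2005", "socialtagging"),
    ("user3", "mathes2004", "folksonomy"),
    ("user4", "lakoff1987", "categorization"),
]

def linked_users(triples, user):
    """Return users who share at least one tag or one resource with `user`."""
    by_tag = defaultdict(set)       # tag -> users who applied it
    by_resource = defaultdict(set)  # resource -> users who tagged it
    for u, r, t in triples:
        by_tag[t].add(u)
        by_resource[r].add(u)
    my_tags = {t for u, _, t in triples if u == user}
    my_resources = {r for u, r, _ in triples if u == user}
    neighbours = set()
    for t in my_tags:
        neighbours |= by_tag[t]
    for r in my_resources:
        neighbours |= by_resource[r]
    neighbours.discard(user)  # a user is not linked to themselves
    return neighbours

print(sorted(linked_users(assignments, "user1")))
```

Here user 1 reaches user 2 through the shared article and user 3 through the shared tag ‘folksonomy’, while user 4 remains unconnected.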

Additionally, through electronic methods such as the co-occurrence of categories and hyperlinks, a folksonomy supports the findability of related resources when one browses resources classified by others. Unlike the semantic relationships that traditional classification systems employ, this works by letting users browse related tags and users through the folksonomic system. Users may also find other tags in the system that closely correspond to the currently suggested tags. In particular, a tag cloud, which displays popular tags, serves as an effective navigational tool by providing a global contextual view of the tags assigned to resources in the system (Kipp 2007).
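One simple way a system could surface related tags from co-occurrence is to count how often two tags are assigned together in the same post and rank a tag’s neighbours by that count. The sketch below assumes each post is the set of tags one user gave one resource; the data and function names are invented, and real systems may normalize the counts (for example with cosine similarity) instead of using raw totals.

```python
from collections import Counter
from itertools import combinations

# Each post is the set of tags one user gave one resource (illustrative data).
posts = [
    {"folksonomy", "tagging", "web2.0"},
    {"folksonomy", "tagging"},
    {"folksonomy", "classification"},
    {"classification", "ddc"},
]

def cooccurrence(posts):
    """Count how often each unordered pair of tags appears in the same post."""
    pairs = Counter()
    for tags in posts:
        # Sorting gives each pair one canonical (a, b) order.
        for a, b in combinations(sorted(tags), 2):
            pairs[(a, b)] += 1
    return pairs

def related_tags(tag, pairs, k=3):
    """Rank other tags by their co-occurrence count with `tag`."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a == tag:
            scores[b] += n
        elif b == tag:
            scores[a] += n
    return [t for t, _ in scores.most_common(k)]

print(related_tags("folksonomy", cooccurrence(posts)))
```

‘tagging’ ranks first because it appears with ‘folksonomy’ in two posts; ‘ddc’ never co-occurs with it and is not suggested.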

These potential benefits of folksonomies present an overall approach to constructing or evaluating a Web classification that accounts for the interaction among users, a system, and a given domain. A folksonomy allows users to participate and contribute their own personal tags to generate a folksonomy; thus, a folksonomy can more accurately reflect users’ conceptual models of the information around them. In addition, a folksonomy fosters the formation of a domain consisting of a group of users with the same interests through shared tags and resources. This leads to a shared classification structure that reflects the goals, purposes, and values of a particular domain. Lastly, a folksonomy supports users’ browsing and serendipitous discovery of related information through the interlinked system of tags. Having discussed the potential of folksonomy as a reflective and interactive Web classification, we note that further empirical work is necessary to investigate how a folksonomy is structured through users’ interaction, in order to fully exploit and support this conception. This leads us to our review of folksonomy as information scent, which addresses how to identify users’ interaction with a folksonomy.

4.0 Folksonomy as information scent: Information scent theory to understand folksonomic interaction

In the early 1990s, Pirolli and Card proposed information foraging theory as an approach to understanding human information-gathering and sense-making strategies. They report various studies of human interaction with information retrieval and Web systems based on information foraging theory (e.g., Pirolli 1997, 2002; Pirolli and Card 1995, 1999; Pirolli et al. 2005). Using empirical studies, they show that users in a rich information environment constantly weigh the potential information gained against the costs of performing the tasks necessary to find it. Users construct effective foraging patterns by continuously adapting their decision-making and direction to the ever-changing environment.

In particular, in the complex context of the Web, information foraging and scent are understood as significant factors in information-seeking behavior. A few researchers have attempted to provide a comprehensive explanation of Web information-seeking behavior, integrating related models such as Ellis’s six categories of information-seeking behaviors (Chi et al. 2000; Kalbach 2000; Choo et al. 2000). These studies demonstrate the rationale for creating information-finding mechanisms that take information foraging and scent as a priori conditions of Web information-seeking behavior.

With respect to information foraging, information scent is used to explain and predict users’ Web information-seeking behaviors. Users assess the utility of an information source in relation to alternative sources (Pirolli and Card 1999; Spink and Cole 2006). In the Web navigation context, users follow the strongest scent toward their desired information. If they somehow lose the scent (often by following a link that doesn’t lead where they think it will), they have to loop back to pick up the scent all over again (Koman 1998). Information scent thus plays an important role in guiding users to the information they seek, as well as in giving users an overall sense of the contents of collections.
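The idea that users follow the strongest scent can be illustrated with a deliberately crude stand-in. Pirolli and colleagues compute scent with far richer cognitive models; the sketch below only scores each candidate link by the share of a user’s goal terms that appear among the tags displayed next to it, and picks the highest-scoring link. The link labels, tags, and function names are hypothetical.

```python
def scent(goal_terms, cue_tags):
    """Toy scent score: share of goal terms matched by a link's tags.

    This overlap measure is an illustrative assumption, not the
    information foraging theory model itself.
    """
    goal = {t.lower() for t in goal_terms}
    cues = {t.lower() for t in cue_tags}
    return len(goal & cues) / len(goal) if goal else 0.0

def follow_strongest(goal_terms, links):
    """Pick the link whose cue tags give the highest toy scent score.

    links: mapping of link label -> list of tags shown with that link
    (a hypothetical representation of proximal cues on a page).
    """
    return max(links, key=lambda label: scent(goal_terms, links[label]))

links = {
    "Shirky 2005": ["folksonomy", "ontology", "tagging"],
    "Lakoff 1987": ["categorization", "prototypes"],
    "Koman 1998": ["usability", "navigation"],
}
print(follow_strongest(["tagging", "folksonomy"], links))
```

Losing the scent would correspond, in this toy picture, to every remaining link scoring near zero, which is the point at which a user would backtrack.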

These previous studies provide strong support for the use of information scent to characterize information foraging behavior. The general findings tell us that models developed in this theory of information scent can (Pirolli 2002, 2):

– predict where people will navigate or what information resources they will select based on their information need;

– infer what information need they have, given observations of their navigation or information selections; and

– infer the category structure that people will induce from interaction with an information system.

Information foraging and scent theory gives us a good understanding of Web information-seeking behavior in general. In this context, folksonomic interactions can be understood by recognizing that people constantly weigh information scent to optimize their interaction with a folksonomy. The empirical results of information foraging theory also demonstrate that information scent can be measured systematically, and that such measurement can generate good predictions of Web interaction. However, measuring scent in this way is unlikely on its own to explain users’ interactions, because behavioral measures such as click-throughs and log analysis merely tell us what works and what does not. Behavioral measurement should be combined with insight into users’ perceptions of information scent, that is, the manner in which they assess environmental cues in judging information sources and navigating through information spaces.

Only a few studies have explored information scent from a perceptual approach (Sundar et al. 2007). It is necessary to study users’ perception and awareness of information scent in order to better understand users’ interaction with the Web. In particular, folksonomies provide a relatively new information structure, and little is known about folksonomic interactions. It remains an open question how users interact with a folksonomy in accessing, sharing, and navigating Web resources, and how information scent facilitates information foraging behavior in a folksonomy. To address these questions, this paper suggests a qualitative approach to folksonomic interaction that attends to users’ awareness and perception.

5.0 Conceptual framework to study folksonomic interaction

In order to build a conceptual framework that reflects interactions among users, a given domain, and a classification system, the preceding sections have examined the dual concepts of folksonomy as Web classification and as information scent. Integrating the two, this paper suggests a conceptual framework for an empirical study exploring the structure of a folksonomy as shaped by users’ interaction with it. Figure 3 illustrates the interaction among users, a folksonomic system, and a given domain consisting of a group of users who share common interests or goals. It points to three components of folksonomic interaction from an end user’s view: (A) tagging: cognitive categorization and representation of a Web resource by an individual user; (B) navigation: exploration and discovery of Web resources in the folksonomic system; and (C) knowledge sharing: representation and communication of knowledge within a domain.

When an individual user accesses a Web resource and classifies it by assigning tags or categories, the first interaction, (A) tagging (cognitive categorization), occurs. For example, user 1 assigns the two tags ‘Shirky’ and ‘folksonomy’ to Shirky’s 2005 article “Ontology is overrated: Categories, links, and tags.” With these tags, he places the article in two idiosyncratic categories: ‘Shirky’, together with other articles he perceives as similar, such as articles by the same author; and ‘folksonomy’, together with articles on the same topic of folksonomies. These idiosyncratic categories are meaningful only to user 1’s interaction with Shirky’s article. However, once they become public and are shared with others, they become an interactive part of the folksonomy.

User 1’s idiosyncratic categories ‘Shirky’ and ‘folksonomy’ are aggregated into a folksonomy that includes other users’ idiosyncratic categories for Shirky’s article, such as ‘socialtagging’, ‘toread’, and ‘folksonomy’. The folksonomic system provides users with various forms of the aggregated tags of all users for that article. Through a tag cloud, the system shows that ‘folksonomy’ is the most popular tag associated with this article. The system also gives users instant feedback by showing the ‘Related Users and Tags’ and ‘Suggested Tag’ associated with Shirky’s article. These are all related to a communal category that may influence users to add and/or modify their idiosyncratic categories. For example, through the tag cloud or the suggested tag, user 2 may discover that ‘folksonomy’ represents the topic of this article better than ‘socialtagging’ when communicating with other users. Consequently, he may add ‘folksonomy’ or change his category ‘socialtagging’ to ‘folksonomy’. In this context, ‘folksonomy’ becomes a communal category, generated in a context where users interact with each other. Through shared communal categories, a folksonomy supports users’ (B) navigation: exploration and resource discovery.
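The aggregated view in this example can be sketched as a count of how many users applied each tag to Shirky’s article, mapped onto tag-cloud font sizes. Only the tag names come from the scenario above; which user chose which tags beyond user 1, the pixel range, and the linear scaling are illustrative assumptions (logarithmic scaling is another common choice), and the function names are hypothetical.

```python
from collections import Counter

# Tags posted for Shirky's 2005 article. The tag names follow the scenario
# in the text; the assignment of tags to users 2-4 is invented.
posts = {
    "user1": {"Shirky", "folksonomy"},
    "user2": {"socialtagging"},
    "user3": {"folksonomy", "toread"},
    "user4": {"folksonomy"},
}

def tag_counts(posts):
    """Number of users who assigned each tag to the resource."""
    counts = Counter()
    for tags in posts.values():
        counts.update(tags)
    return counts

def cloud_sizes(counts, min_px=12, max_px=32):
    """Map each tag's count linearly onto a font size in pixels."""
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo or 1  # avoid dividing by zero when all counts are equal
    return {tag: round(min_px + (n - lo) / span * (max_px - min_px))
            for tag, n in counts.items()}

def suggested_tags(counts, k=1):
    """Most popular tags first; ties broken alphabetically for stability."""
    return sorted(counts, key=lambda t: (-counts[t], t))[:k]

counts = tag_counts(posts)
print(suggested_tags(counts), cloud_sizes(counts)["folksonomy"])
```

In this toy data ‘folksonomy’ is both the largest tag in the cloud and the suggested tag, which is the cue that might lead user 2 to switch from ‘socialtagging’.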

In addition, while observing others’ categories and sharing resources, a user group or domain with the same interests, goals, or tasks may be established. For example, users 1, 2, and 3 can build a specific domain interested in sharing and communicating their knowledge on the topic of folksonomy through the folksonomic system. Here an instance of (C) knowledge sharing occurs: representation and communication in which the folksonomy works as a representational tool for a given domain. The folksonomy grows around a given domain of users who want to share their knowledge, creating a widely agreed-upon classification.

Figure 3. Folksonomic interaction

Taking information foraging and scent theory as the theoretical framework, Figure 4 depicts how a folksonomy and its tags can function as information scent. Through shared tags, folksonomies can provide users with a distinct information scent that leads to groups of Web resources related to the information they are searching for, by grouping related resources and users together. Based on information foraging and scent theory, users’ awareness of the role of tags as information scent should be explored further through empirical studies.

6.0 Discussion: Need for a conceptual framework which can be formed by qualitative research on folksonomic interaction

Recently, there has been a considerable increase in the number of studies that explore the use of folksonomies, especially focusing on the formulation and distribution of tags (e.g., Fokker, Pouwelse, and Buntine 2006; Golder and Huberman 2006; Guy and Tonkin 2006; Kipp 2007; Kipp and Campbell 2006; Lin et al. 2006; Marlow et al. 2006; Tonkin 2006; Voss 2006). Most of these studies focus on how people formulate tags and folksonomies and, based on tag analysis, identify the patterns of tags used in a folksonomy.
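The kind of tag analysis these studies perform can be sketched with the per-URL measures listed for Kipp and Campbell (2006) in Table 1: the total number of tags, the number of unique tags, and the number of users applying each tag to a URL. The (user, URL, tag) triples and the function name below are invented for illustration and do not reproduce any study’s data or code.

```python
from collections import defaultdict

# (user, url, tag) assignments; illustrative only.
assignments = [
    ("u1", "http://example.org/a", "folksonomy"),
    ("u1", "http://example.org/a", "tagging"),
    ("u2", "http://example.org/a", "folksonomy"),
    ("u3", "http://example.org/a", "toread"),
    ("u2", "http://example.org/b", "classification"),
]

def per_url_stats(assignments):
    """Compute simple descriptive tag statistics for each URL."""
    tags_by_url = defaultdict(list)       # url -> every tag assignment
    users_by_url_tag = defaultdict(set)   # (url, tag) -> distinct users
    for user, url, tag in assignments:
        tags_by_url[url].append(tag)
        users_by_url_tag[(url, tag)].add(user)
    stats = {}
    for url, tags in tags_by_url.items():
        unique = set(tags)
        stats[url] = {
            "total_tags": len(tags),
            "unique_tags": len(unique),
            "users_per_tag": {t: len(users_by_url_tag[(url, t)]) for t in unique},
        }
    return stats

for url, s in per_url_stats(assignments).items():
    print(url, s["total_tags"], s["unique_tags"])
```

Measures like these describe the distribution of tags well, but, as the following paragraph argues, they say nothing about why users chose those tags.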

This yields a one-sided understanding of folksonomic interaction, drawn primarily from quantitative aspects of a folksonomy such as the distribution and patterns of tags. A quantitative approach alone does not reveal how users actually assign, use, and share tags in structuring a folksonomy.

As Mathes (2004, 17) points out, “examining user behavior through ethnographic observation or interview to understand user motivation and cognitive processes in tagging items” is necessary to fully understand folksonomic interaction.

A qualitative approach allows us to clarify the other side of folksonomic interaction, namely users’ perceptions and motivations. In particular, interviews enable the researcher to identify which factors directly influence the formation of a folksonomy, and how the motivation of group communication influences users’ interaction with a folksonomy (Mathes 2004). Trevino (2006) provides an interesting analysis of users’ perceptions of the information they organized and of the implications of Delicious’s structure. She conducted face-to-face interviews with 16 participants, asking about their browsing activities, their history of using Delicious, their interactions with others on or about Delicious, and their general opinions or questions about Delicious and other users, and performed a content analysis of their comments. Her study identified tensions between using Delicious as a personal information archive and using it for public discovery, as well as between personal privacy and the social norms of openness among users. The results are an important first step in the qualitative analysis of users’ understanding and usage of a folksonomic system. However, the study focuses more on how people generally understand and use Delicious than on their actual tagging activities or use of tags. It is important to examine the cognitive and behavioral aspects of folksonomy use. What is the tagging behavior of people who use folksonomies? Why do people choose the tags they use; what motivations lead them to modify these tags; how do others’ tags and tagging behaviors affect their tag decisions? Until we understand more about users’ tagging behaviors, that is, how they assign, use, and share tags and resources, it is difficult to fully understand their folksonomic interactions.

Figure 4. Folksonomy and tags as information scent

In addition, current folksonomy studies focus on discussing issues involved in users’ tagging activity and on suggesting an agenda for further research. Despite increasing attention in academic research, little empirical research has been done to build a conceptual model for understanding users’ interactions with a folksonomy. In discussing folksonomy research, Macgregor and McCulloch (2006) point out the lack of a theoretical framework. They note that “[the] lack of conceptual progress has consequently manifested itself in a lack of testable conceptual models and empirical studies” (Macgregor & McCulloch 2006, 299). This paper therefore proposes investigating the structure of folksonomy from a qualitative approach, grounded in a solid conceptual model, to better understand users’ interaction with a folksonomy.

Table 1. Summary of folksonomy studies

Study | Description of study | Folksonomic system | Data analysis | Data collection period
Fokker, Pouwelse & Buntine (2006) | Comparative study of Flickr & Wikipedia (the nature of tags) | Flickr; Wikipedia | Tag analysis (ambiguity, synonyms) | Dec. 2005
Golder & Huberman (2006) | Tag usage patterns | Delicious | Tag frequency; regularities in user activity; types of tags; bursts of popularity in bookmarking | June 23-27, 2005
Guy & Tonkin (2006) | Tag usage patterns (power law); tag literacy | Delicious; Flickr | Tag popularity; tag distribution | N.R.
Kipp (2007) | Types of tags (non-subject tags) | CiteULike; Connotea; Delicious | Tag analysis (types of tags: time-, task-, and emotion-related tags) | Oct. 20-31, 2006
Kipp & Campbell (2006) | Tag usage patterns | Delicious | Frequency of tags; # of unique tags; # of users with a specific tag for each URL; total # of tags & total # of unique tags for each URL | Jan. 30-31, 2006
Lin et al. (2006) | Nature of tagging | Connotea; Flickr; Delicious | Connotea: similarity between tags & MeSH; Flickr: category assignment for user tags; Delicious: convergence of tags | N.R.
Marlow et al. (2006) | Tag usage patterns | Flickr | Usage correlation; distribution of tags; overlap of tag distributions for random users & contacts | N.R.
Sinclair & Cardew-Hall (2008) | Users’ perceptions & usage patterns of interface design (tag cloud) | Folksonomy-like system designed for experimental study | Experiments & survey (task-based evaluations w/ 89 participants) | Aug. 17, 2008-May 25, 2006
Trevino (2006) | Users’ perceptions of the information on Delicious & the implications of the site’s structure | Delicious | User interviews (w/ 16 participants); content analysis | Feb.-March 2006
Voss (2006) | Nature of tagging (comparison of structural properties among tags, thesauri, & DDC) | Delicious; Wikipedia | Descriptors per record; records per descriptor; descriptor levels | N.R.

7.0 Conclusion and future directions

The twin dimensions of folksonomy as both Web classification and information scent provide useful insight into the ways in which folksonomy serves as a Web classification that reflects an interaction among users, a domain, and a classification structure. This paper addresses tags and the act of tagging as they relate to the structure of folksonomy. It argues that tagging is related to information-gathering and browsing behavior, because people are constantly gathering, monitoring, and screening information when using a folksonomy. Tags usually serve as categories, grouping like resources together. They collocate resources within a user’s personal collection, as well as across the entire folksonomic system, by showing all resources tagged with the same term by any member of the system. Shared tags also function as information scent, guiding users to the information they seek and helping them predict which resources are worth pursuing.

This approach to tags as categories and information scent can contribute to folksonomy studies, which have mostly focused on the function of tags in information retrieval. There has been concern about the quality or consistency of user-created tags, and about the extent to which they will affect the effectiveness and efficiency of search and retrieval. Such approaches are mostly limited to investigating communal categories for knowledge sharing. The proposed model of folksonomic interaction therefore suggests investigating both idiosyncratic and communal categories in order to achieve a holistic view of folksonomy. This is especially important with respect to folksonomy structure, since it addresses how idiosyncratic and communal categories interact with each other.
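As a starting point for studying both kinds of categories, a researcher might operationalize the distinction for a single resource by how widely each tag is shared among its taggers. The sketch below treats a tag as communal when at least a given share of the resource’s taggers used it, and as idiosyncratic otherwise; the threshold, data, and function name are hypothetical. Frequency is only a proxy: it cannot tell whether a tag such as ‘toread’ is meaningful only to its user, which is exactly the kind of question the interviews and observation proposed below would need to answer.

```python
from collections import Counter

def split_categories(posts, threshold=0.5):
    """Partition the tags of one resource into communal and idiosyncratic.

    posts: mapping user -> set of tags that user gave the resource.
    threshold: hypothetical cut-off; a tag used by at least this share of
    the resource's taggers is treated as communal.
    """
    n_users = len(posts)
    counts = Counter(tag for tags in posts.values() for tag in tags)
    communal = {t for t, n in counts.items() if n / n_users >= threshold}
    idiosyncratic = set(counts) - communal
    return communal, idiosyncratic

# Invented taggers of a single article, reusing tags mentioned in the text.
posts = {
    "user1": {"folksonomy", "Shirky"},
    "user2": {"folksonomy", "toread"},
    "user3": {"folksonomy", "LIS2013"},
    "user4": {"socialtagging"},
}
communal, idiosyncratic = split_categories(posts)
print(sorted(communal), sorted(idiosyncratic))
```

Re-running such a split over successive time windows would be one way to watch idiosyncratic tags being abandoned or taken up as communal ones.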

The conceptual model proposed in this paper will require an empirical investigation of users’ interactions with a folksonomy. An understanding of how people use and understand a folksonomy in practice has the potential to provide a realistic view of folksonomy as a Web classification, helping to lay the foundation for future research. Future work will involve testing the current conceptual model in various application settings and developing methods to examine users’ engaged experience and reflection. One area of future research is examining the tagging behaviors of users engaged in folksonomic interactions. This work will address how people tag in the practice of organizing Web resources and how they interact with a folksonomy through tagging, identifying their motivations and cognitive processes in tagging Web resources.

Various research questions arise from this proposed framework. For example: What activities are involved when people assign tags to Web resources? What are the observable patterns in the tagging process? Why do people choose the tags they use? Do they consider others’ tags and tagging behaviors? If so, how do these influence their own tags and tagging behaviors? Qualitative research is appropriate for investigating the little-known phenomena involved in the tagging behaviors of users engaged in folksonomic interaction. Data collection and analysis incorporating interviews and observation will allow us to identify the perceptions and interactions that determine users’ tagging behaviors and folksonomic interaction, as well as their understanding of the folksonomy they work with in organizing Web resources.

References

Barsalou, Lawrence W. 1983. Ad hoc categories. Memory & cognition 11: 211-27.

Campbell, D. Grant. 2006. A phenomenological framework for the relationship between the semantic web and user-centered tagging systems. In Furner, Jonathan and Tennis, Joseph T., eds., Proceedings 17th Workshop of the American Society for Information Science and Technology Special Interest Group in Classification Research 17, Austin, Texas. Available http://dlist.sir.arizona.edu/1838/.

Cattuto, Ciro. 2006. Semantic dynamics in online social communities. The European physical journal C - particles and fields 46, Supplement 2: 33-37. DOI: 10.1140/epjcd/s2006-03-004-4. Available http://www.ecagents.org/dllink.php?id=150&type=Document.


Chi, Ed H. et al. 2000. The scent of a site: a system for analyzing and predicting information scent, usage, and usability of a Web site. In Turner, Thea et al., eds., Proceedings of the ACM CHI 2000 Human Factors in Computing Systems Conference April 1-6, 2000, The Hague, Netherlands. New York: ACM, pp. 161-68.

Chi, Ed H. et al. 2001. Using information scent to model user information needs and actions on the Web. In Jacko, Julie A., ed., Proceedings of the ACM CHI 2001 Anyone Anywhere: Human Factors in Computing Systems Seattle, WA, USA — March 31 - April 05, 2001. New York: ACM, pp. 490-97.

Choo, Chun W. et al. 2000. Information seeking on the web: an integrated model of browsing and searching. First Monday 5n2. Available http://firstmonday.org/issues/issue5_2/choo/index.html.

Fokker, Jenneke et al. 2006. Tag-based navigation for peer-to-peer wikipedia. Collaborative Web Tagging Workshop at WWW2006, 22 May 2006, Edinburgh, UK. Available www.semanticmetadata.net/hosted/taggingws-www2006-files/9.pdf.

Furnas, George W. and Jul, Susanne. 1997. Navigation in electronic worlds. CHI extended abstracts 1997: 230. Available http://www.sigchi.org/chi97/proceedings/workshop/sj1.htm.

Golder, Scott A. and Huberman, Bernardo A. 2006. Usage patterns of collaborative tagging systems. Journal of information science 32: 198-208.

Guy, Marieke and Tonkin, Emma. 2006. Folksonomies: tidying up tags? D-Lib magazine 12n1. Available http://www.dlib.org/dlib/january06/guy/01guy.html.

Hjørland, Birger. 2003. Fundamentals of knowledge organization. Knowledge organization 30: 87-114.

Iyer, Hemalata. 1995. Classificatory structures: concepts, relations and representation. Frankfurt am Main: Indeks Verlag.

Jacob, Elin K. 1991. Classification and categorization: drawing the line. In Kwasnik, Barbara H., and Fidel, Raya, eds., Advances in classification research 2: Proceedings of the 2nd ASIS SIG/CR Classification Workshop: Held at the 54th ASIS Annual Meeting, Washington, D.C., October 27-31, 1991. Washington, DC: ASIS Monograph Series, Learned Information, pp. 67-83.

Jacob, Elin K. 2004. Classification and categorization: a difference that makes a difference. Library trends 52n3: 515-40.

Jacoby, JoAnn. 2005. Optimal foraging. In Fisher, Karen E., Erdelez, Sanda, and McKechnie, Lynne E. F., eds., Theories of information behavior. Medford, NJ: Information Today, pp. 259-64.

Kalbach, James. 2000. Designing for information foragers: a behavioral model for information seeking on the World Wide Web. Internetworking 3n3. Available http://www.internettg.org/dec00/article_information_foragers.html.

Kipp, Margaret E.I. 2007. @toread and cool: tagging for time, task and emotion. In Information Architecture Summit 2007, Las Vegas, Nevada, 22-26 March 2007. Available http://eprints.rclis.org/archive/00011414/.

Kipp, Margaret E.I., and Campbell, D. Grant. 2006. Patterns and inconsistencies in collaborative tagging systems: an examination of tagging practices. Paper presented at the Annual Meeting of the American Society for Information Science and Technology, Austin, TX, November 3-8, 2006. Available dlist.sir.arizona.edu/1704/.

Koman, Richard. 1998. The scent of information: helping users find their way by making your site “smelly”. Webreview 1998 Issues, May 15. Available http://www.ddj.com/architect/184413077.

Kwasnik, Barbara H. 1999. The role of classification in knowledge representation and discovery. Library trends 48n1: 22-47.

Lakoff, George. 1987. Women, fire and dangerous things: what categories reveal about the mind. Chicago: University Of Chicago Press.

Lerman, Kristina and Jones, Laurie A. 2007. Social browsing on Flickr. In International Conference on Weblogs and Social Media. Available http://arxiv.org/abs/cs/0612047.

Lin, Xia et al. 2006. Exploring characteristics of social classification. In Furner, Jonathan, and Tennis, Joseph T., eds., Advances in classification research, Vol. 17: Proceedings of the 17th ASIS&T SIG/CR Classification Research Workshop (Austin, TX, November 4, 2006). Available: http://hdl.handle.net/10150/106128.

Macgregor, George and McCulloch, Emma. 2006. Collaborative tagging as a knowledge organization and resource discovery tool. Library review 55: 291-300.

Marchionini, Gary. 1995. Information seeking in electronic environments. Cambridge Series on Human-Computer Interaction 9. New York: Cambridge University Press.

Marlow, Cameron et al. 2006. HT06, tagging paper, taxonomy, Flickr, academic article, to read. Proceedings of the 17th Conference on Hypertext & hypermedia. Available http://portal.acm.org/citation.cfm?id=1149949.


Mathes, Adam. 2004. Folksonomies: cooperative classification and communication through shared metadata. Available http://www.adammathes.com/academic/computer-mediated_communication/folksonomies.html.

Millen, David et al. 2007. Social bookmarking and exploratory search. In Bannon, Liam J., ed., Proceedings of the 2007 Tenth European Conference on Computer-Supported Cooperative Work 24-28 September 2007, Limerick, Ireland. London: Springer-Verlag, pp. 21-40. Available: http://www.ecscw.org/2007/02%20paper%20108%20Millen%20et%20al.pdf.

Munk, Timme Bisgaard and Mørk, Kristian. 2007. Folksonomy, the power law & the significance of the least effort. Knowledge organization 34: 16-33.

Murphy, Gregory L. 2002. The big book of concepts. Cambridge, MA: MIT Press.

Olson, Hope A. 1998. Mapping beyond Dewey’s boundaries: constructing classificatory space for marginalized knowledge domain. Library trends 47n2: 233-54.

Olston, Chris and Chi, Ed Huai-hsin. 2001. ScentTrails: integrating browsing and searching on the Web. Available http://www-users.cs.umn.edu/~echi/papers/scenttrails/scenttrails-tochi.pdf.

Pirolli, Peter. 1997. Computational models of information scent-following in a very large browsable text collection. In Pemberton, Steven, ed., Human factors in computing systems: CHI '97 Conference proceedings: Looking to the future 22-27 March, 1997, Atlanta, Georgia. New York: ACM Press, pp. 3-10.

Pirolli, Peter. 2002. Theory of information scent. In Jacko, J. and Stephanidis, C. eds., 10th International Conference on Human Computer Interaction 22-27 June 2003 Crete Greece. Available http://www2.parc.com/istl/groups/uir/publications/items/UIR-2002-16-Pirolli-InfoScent.doc.

Pirolli, Peter and Card, Stuart K. 1995. Information foraging in information access environments. Proceedings of the Conference on Human Factors in Computing, CHI’95 Denver, CO: ACM Press. Available http://www.acm.org/turing/sigs/sigchi/chi95/Electronic/documents/papers/ppp_bdy.htm.

Pirolli, Peter and Card, Stuart K. 1999. Information foraging. Psychological Review 106: 643-75.

Pirolli, Peter et al. 2005. Information scent and Web navigation: theory, models, and automated usability evaluation. In Salvendy, Gavriel, ed., HCI international 2005: 11th international conference on human-computer interaction, July 22-27, 2005, Caesars Palace, Las Vegas, Nevada, USA. [Mahwah N.J.]: Lawrence Erlbaum Associates. Available: http://www-users.cs.umn.edu/~echi/papers/2005-HCII/HCII_2005_Web_Info_Scent-v2.pdf.

Quintarelli, Emanuele. 2005. "Folksonomies: power to the people." Paper presented at the ISKO Italy-UniMIB meeting in Milan, Italy June 24 2005. Available: http://www.iskoi.org/doc/folksonomies.htm.

Rice, Ronald E. et al. 2001. Accessing and browsing information and communication. Cambridge, MA: The MIT Press.

Shirky, Clay. 2005. Ontology is overrated: categories, links and tags. Available www.shirky.com/writings/ontology_overrated.html.

Spink, Amanda and Cole, Charles. 2006. Human information behavior: integrating diverse approaches and information use. Journal of the American Society for Information Science and Technology 57: 25-35.

Spiteri, Louise. 2006. The use of folksonomies in public library catalogues. The serials librarian 51n2: 75-89.

Sundar, Shyam et al. 2007. News cues: information scent and cognitive heuristics. Journal of the American Society for Information Science and Technology 58: 366-78.

Tonkin, Emma. 2006. Searching the long tail: hidden structure in social tagging. In Furner, Jonathan, and Tennis, Joseph T., eds., Advances in classification research, Vol. 17: Proceedings of the 17th ASIS&T SIG/CR Classification Research Workshop (Austin, TX, November 4, 2006). Available: http://dlist.sir.arizona.edu/1791/.

Trant, Jennifer. 2006. Exploring the potential for social tagging and folksonomy in art museums: proof of concept. New Review of Hypermedia and Multimedia 12: 83-105.

Trevino, Ericka M. 2006. Social bookmarks: personal organization and collective discovery on the Web. (Master’s thesis, University of Illinois at Chicago, 2006). Available http://blog.erickamenchen.net/ Trevino-SocialBookmarking2006.pdf.

Van Damme, Céline, Hepp, Martin, and Siorpaes, Katharina. 2007. FolksOntology: an integrated approach for turning folksonomies into ontologies. In Proceedings of the ESWC 2007 Workshop Bridging the Gap between Semantic Web and Web 2.0, Innsbruck, Austria, pp. 71-84.

Vander Wal, Thomas. 2007. Folksonomy coinage and definition (February 2, 2007). Vanderwal.net. Available http://vanderwal.net/folksonomy.html.


Voss, Jakob. 2006. Collaborative thesaurus tagging in the Wikipedia way. Available http://arxiv.org/ftp/cs/papers/0604/0604036.pdf.

Weinberger, David. 2006. Taxonomies and tags: from trees to piles of leaves. Available http://www.hyperorg.com/blogger/misc/taxonomies_and_tags.html.

Xu, Zhichen et al. 2006. “Towards the semantic web: collaborative tag suggestions.” Paper presented at the Collaborative Web Tagging Workshop at WWW2006, 22 May 2006, Edinburgh, UK. Available: http://www.semanticmetadata.net/hosted/taggingws-www2006-files/.

Yun, Zhang, and Boqin, Feng. 2008. Effective browsing of personal tag space in social tagging systems. In Proceedings of the IEEE International Conference on Information Reuse and Integration, IRI 2008, 13-15 July 2008, Las Vegas, Nevada, USA. Piscataway, NJ: IEEE, pp. 147-152. Available: http://dx.doi.org/10.1109/IRI.2008.4583021.


Knowl. Org. 38(2011)No.6 D. Lee. Classifying Musical Performance: The Application of Classification Theories to Concert Programmes

530

Classifying Musical Performance: The Application of Classification Theories to Concert Programmes†

Deborah Lee

Book Library, Courtauld Institute of Art, Somerset House, Strand, London WC2R 0RN England

Deborah Lee commenced a PhD at the Department of Information Science, City University London, in October 2010, researching knowledge organisation and music. She is also the senior cataloguer at the Courtauld Institute of Art. Previously, she held cataloguing positions at several academic libraries and was a research assistant for the AHRC Concert Programmes Project. Deborah has given a number of papers about classifying concert programmes, and her book chapter on the on-the-job training of 21st-century cataloguers was published earlier this year. She was the 2008 winner of the IAML (UK & Irl) E.T. Bryant prize for her master’s dissertation on classifying concert programmes, which is the basis for this article.

Lee, Deborah. Classifying Musical Performance: The Application of Classification Theories to Concert Programmes. Knowledge Organization, 38(6), 530-540. 23 references. Abstract: This paper demonstrates how knowledge organisation theories can be used to understand the arrangement of concert programmes. Key classification theories from the management of libraries, archives and ephemera collections are used as a framework in this study: characteristics of division (faceted classification theory), provenance (archival arrangement) and arrangement by format (ephemera arrangement). Each theory is used to analyse the arrangement of specific concert programme collections held at the Centre for Performance History, Royal College of Music, London. Two classification models are created from the analysis. Model 1 reveals how concert programme arrangement could be viewed as a theoretical bridge between bibliographic, archival and ephemera arrangement theories. This model proposes a unified classification based on bibliographic characteristics of division; the characteristics-of-division structure is populated with characteristics taken from bibliographic classification, archival arrangement and ephemera organisation. Model 2 proposes an alternative way of considering the unified classification model: a triumvirate of event, programme and individual copy. Complex relationships between elements of the triumvirate are explored, together with an analysis of how various characteristics fit into the model.

Received 31 January 2011; Revised 10 August 2011; Accepted 15 August 2011

† With thanks to Dr Rupert Ridgewell and Dr Katherine Cooper for their advice on various drafts of this paper, as well as Dr Julian Gilbey for his much-appreciated technical assistance. The author is also very grateful to Derek Lee for his proofreading support.

1.0 Introduction

Traditionally, concert programmes have not enjoyed the same attention from collection managers as other musical documents. This neglect prompted Ridgewell (2003, v) to describe them as “the Cinderellas of music item retrieval.” Though this neglect is slowly being rectified, especially at collection level (for example, phase 1 of the recent Concert Programmes Project), few individual programmes have been catalogued or documented, which means that access to concert programmes relies on manual methods of retrieval.

There are a number of features of concert programmes that make them particularly interesting from a classification perspective. Programmes do not reside in a single type of information management institution: they can be found in libraries, archives and ephemera collections. Like other forms of performance ephemera, concert programmes are both representations of an event and physical items in their own right. This dual identity makes their arrangement especially worthy of exploration.

This paper aims to demonstrate how classification theories can be used to understand the arrangement of concert programmes. First, important principles from bibliographic, archival and ephemera arrangement theories will be reviewed. This will be followed by a brief description and definition of concert programmes and an outline of the specific challenges of arranging programmes as compared with other documents. Next, the potential of applying these principles to the arrangement of concert programmes will be explored, using examples from a specific institution’s collections. Two classification models will follow. The first is a unified model that brings together the arrangement approaches of the various information management theories. The second considers the concert programme as a series of three layers, and examines how this approach aids our understanding of the classification process. By exploring the arrangement of concert programmes, this paper seeks to improve access to these valuable documents.

2.0 General classification theories

The purpose of arranging material is addressed by a number of authors in the bibliographic and archival worlds, and ‘retrieval’ is at the heart of many of their responses. Perreault (1978, 53, emphasis original) argues that the relationship between ordering items and retrieval is so co-dependent that it is almost subconscious:

It is so deep a part of the purpose of our [the librarian’s] profession that no argument seems needed to prove that the benefit that is aimed at in imposing order on files and collections is retrieval, whether of information or of documents.

For bibliographic commentators, one aspect of retrieval is intimately linked to classification: browsing. For instance, Rowley and Farrow (2000, 194) argue that classification is particularly useful for browsing, and add that browsing is concerned with the expectation of finding similar subjects nearby on the shelf.

Access to concert programmes is particularly difficult, given that most collections are not catalogued or indexed at item level (Concert Programmes Project 2004). Browsing the shelves of an uncatalogued collection of concert programmes may be the only way of determining whether a particular item is present in the collection. The sheer size of some collections is also a factor: for instance, the Centre for Performance History (CPH) at the Royal College of Music (RCM) in London estimates its holdings at 600,000 items (Ridgewell 2003, 95). In these cases, access to concert programmes is almost entirely dependent on their effective arrangement.

Furthermore, the hybrid nature of concert programmes makes their arrangement problematic. A concert programme is usually a printed item, and therefore a type of bibliographic object; a concert programme is a document of an event in an organisation’s or a person’s life, and thus a type of archival object; a concert programme is also a transient item produced for a one-off event, and thus an item of ephemera. Or, turning this the other way around, a concert programme belongs equally in a library, an archive and a collection of ephemera. In the United Kingdom, concert programmes can be found in each of these three types of institution, and are subject to the varying management systems of libraries, archives and ephemera centres. Therefore, the arrangement systems of each of these types of institution need to be considered in order to analyse the arrangement of concert programmes effectively.

3.0 Specific classification theories

Three classification theories from three types of information management centre have been selected for this paper: characteristics of division, provenance and arrangement by format. This selection is based on arrangements of concert programme collections noted during the case study research which informs this paper (Lee 2008). For the purposes of this paper, the term ‘classification’ will be used in the broadest sense, meaning the systematic arrangement of materials. Furthermore, as information management systems use different vocabulary to express arrangement principles, the terms ‘arrangement,’ ‘organisation’ and ‘classification’ will be used interchangeably. However, all three terms are restricted here to the physical realm, a usage that does not represent the full breadth these terms cover.

This study makes use of one of the main principles underpinning faceted classification and its associated concepts: the characteristic of division. For the purposes of this paper, a ‘characteristic of division’ is defined as the aspect by which a subject is divided into subsidiary subjects. The term ‘characteristic of division’ appears to be used interchangeably with ‘principle of division’ in bibliographic classification literature; in addition, Ranganathan’s term ‘division characteristic,’ as defined in his glossary of faceted classification (1958, 122) and used much earlier in the first edition of his Prolegomena to library classification (1937, 10), has the same meaning. For ease of reference, the term ‘characteristics of division’ will be used throughout the paper.

The characteristic of division system of arrangement has a number of consequences, which will prove important to discussions about the arrangement of concert programmes. When a characteristic of division is applied to a subject, each resulting subsidiary subject benefits from collocation: any item with a given subsidiary subject will sit on the shelf near other items with the same subsidiary subject. However, another inevitable consequence of characteristic of division classification is viewed less positively. For every selected characteristic of division, there will be at least one other which either is not selected or, if multiple characteristics of division are employed, is not applied first. The subjects defined by these characteristics are known as distributed relatives, and items sharing them will be scattered throughout the classification system. Distributed relatives are not to be dismissed lightly; any scheme scatters more subjects than it collocates (Buchanan 1979, 37-38). Therefore, as Foskett (1996, 61) summarises, using characteristics of division brings some concepts together while splitting others.
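One way to make collocation and scattering concrete is to model a shelf order as a sort on an ordered list of characteristics of division. The Python sketch below works under that simplifying assumption; the items, the characteristic names and the helper functions `arrange` and `runs` are hypothetical, and `runs` simply counts how many separate stretches of shelf hold items sharing a value (1 means fully collocated).

```python
from itertools import groupby

# Hypothetical items, each described by two characteristics.
items = [
    {"subject": "violin", "period": "baroque"},
    {"subject": "oboe", "period": "romantic"},
    {"subject": "violin", "period": "romantic"},
    {"subject": "oboe", "period": "baroque"},
]


def arrange(items, characteristics):
    """Shelf order: divide by the first characteristic, then by the next, and so on."""
    return sorted(items, key=lambda item: tuple(item[c] for c in characteristics))


def runs(shelf, characteristic, value):
    """Count separate stretches of shelf holding items with this value (1 = collocated)."""
    flags = [item[characteristic] == value for item in shelf]
    return sum(1 for flag, _ in groupby(flags) if flag)


shelf = arrange(items, ["subject", "period"])
print(runs(shelf, "subject", "violin"))  # 1: subjects are collocated
print(runs(shelf, "period", "baroque"))  # 2: periods are scattered (distributed relatives)
```

Passing `["period", "subject"]` to `arrange` reverses the outcome: periods become collocated and subjects scattered, which is the trade-off Foskett describes.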

The underlying principle behind archival arrangement is that the context of the documents must not be lost through their arrangement. Each item has value as part of a collection (Williams 2006, 74): “It [the document] has a collective significance, and significance is lost if documents are treated as single items.” Modern archival classification is largely based on two theories that espouse this principle: provenance and original order. In the case study institutions, provenance was seen to be an important arrangement theory for collections of concert programmes, so this classification method will be the focus of the discussion of archival arrangement. Arrangement by provenance means that materials with the same origins are kept together. Thibodeau (1998, 68) suggests that the rationale behind provenance is that an item’s status comes from the creator of the archives, meaning the organisation or person who originally collected the items. If the item is not kept accordingly, this link will be lost. Though evidence of original order was seen in some concert programme collections studied for the initial case studies (Lee 2008), it did not lead to further analysis within the concert programme framework, so it will not be discussed further in this paper. Similarly, later archival arrangement principles such as function, and earlier archival arrangement theories such as the geographic-chronologic scheme, are acknowledged to be important but have been purposefully set aside in the ensuing discussion, as the selected case study institutions revealed little about their potential application to concert programme collections.

Ephemera arrangement discourse can be divided into two prevalent viewpoints. The first is based on archival principles, and is concerned with provenance and provenance-based issues (see, for example, Hadley 2001). The second viewpoint is aligned with librarianship. For example, Pollard (1977) bases his discussion of ephemera arrangement on subject classification. However, Pollard (1977) also gives a few interesting alternatives, one of which is arrangement by format.

The idea of arranging ephemera by format (for example, keeping all posters together, all programmes together, all playing cards together, and so forth) has resonance for a number of reasons. First, ‘format’ does not feature in most mainstream discussions of other arrangement theories, suggesting that arrangement by format could be a quintessentially ephemera-based idea. Second, the concept of format is intrinsically important to ephemera studies in general: for instance, Rickard’s (2000) ephemera encyclopaedia is largely a series of entries about individual ephemera formats. Third, the arrangement by format of major ephemera collections, such as the John Johnson Collection of printed ephemera in Oxford, suggests that format is significant in the arrangement of ephemera.

4.0 The arrangement of concert programmes

Consideration of what constitutes a concert programme is necessary before their arrangement can be analysed further. At the broadest level, concert programmes are a type of object produced to accompany a musical performance. The usual purpose of a concert programme is to set out details of the musical performance, such as the music performed and the performers, as well as to provide information on the date and location of the event. Ridgewell (2003, 3) states that programmes are primary source material, produced for a specific event, and are not usually created retrospectively. The appearance and format of concert programmes are also extremely varied, ranging from a single photocopied sheet to a multi-coloured and gilded souvenir programme.

Various types of information are scattered across different parts of the concert programme. Some information places the event in time and space and is of importance to researchers; examples include geographic place, concert venue, concert date and time. Musical programme information, such as the works performed or the genre of concert, is usually present. Information about performers, such as names, biographies and headshots of soloists and conductors, is given on most programmes; in larger programmes, lists of choir or orchestra members are often found. Sometimes programmes contain important textual information about the music being performed, for instance programme notes with or without musical examples. Programmes are often a rich source of sociological and visual data, as they may feature general advertisements and portraits of performers. Finally, programmes frequently contain information relating to other concerts, for instance lists of concerts in the same series or unrelated concerts at the same venue.

The significant principles from bibliographic, archival and ephemera arrangement discussed above can all be taken from their original contexts and applied to concert programmes. The original research which formed the basis of this article (Lee 2008) drew on three case study institutions: the Wigmore Hall Archive, the Royal Academy of Music Library and the CPH at the RCM. These institutions represent an archive, a library and a research centre (containing elements of an archive and an ephemera collection) respectively. However, comparing the type of institution with its collection arrangement revealed that the relationship between the two was not straightforward. Furthermore, a closer examination of the collection management context suggested that simple collection management categorisation was not possible. Therefore, this factor has been set aside, and all examples used in this paper come from the CPH, which holds the largest collection in the case study. Specific collections at the CPH are used to demonstrate how the three selected classification theories (characteristics of division, provenance and arrangement by format) can be applied to concert programmes. It is not suggested that CPH staff have consciously arranged their collections in the manner described below; rather, these examples suggest a theoretical framework for the physical classification of these collections.

The Menges collection can be used to demonstrate how characteristics of division could be used to understand the arrangement of a collection. It contains programmes and ephemera from the twentieth-century British violinist Isolde Menges. She was active both as a solo violinist and as a chamber musician in her own ensemble, and both aspects are reflected in the collection. The collection also contains programmes of concerts featuring Menges’s students, in which she did not play.

In a simplified model, the concert programmes in the Menges collection are arranged by concert venue, followed by date. The concert venue is itself an amalgamation of two components, geographic location and building, but for simplicity ‘concert venue’ will be used as the combined term for both. Programmes from solo concerts and those from chamber concerts are kept in two separate, similarly arranged sequences; programmes of concerts in which Menges did not play are also kept separately from those in which she did. Using the characteristics of division method, ideas such as ‘performers,’ ‘concert venue’ and ‘date’ could be perceived as characteristics of division (see Figure 1).

Collocation and scattering are both visible when the collection is analysed in this way. For instance, because Menges’s role is the highest characteristic, all the programmes from Menges’s chamber groups have been collocated. This is helpful to musicologists researching the performance profile of her chamber groups. However, as ‘date’ is one of the last characteristics, concerts from the same year have been almost comprehensively scattered. This is disadvantageous to anyone seeking a chronological narrative of Menges’s life and performances.
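The simplified Menges model can be sketched in the same way. In this Python example the records, the venues (‘Venue A,’ ‘Venue B’) and the dates are invented placeholders rather than CPH data, and the role labels are a hypothetical stand-in for the performer characteristic; the sort key applies role first, then venue, then date.

```python
from datetime import date
from itertools import groupby

# Hypothetical programme records: (Menges's role, venue, date). Placeholders only.
programmes = [
    ("solo", "Venue B", date(1930, 5, 1)),
    ("chamber", "Venue A", date(1931, 3, 2)),
    ("solo", "Venue A", date(1931, 6, 9)),
    ("students", "Venue A", date(1930, 11, 4)),
    ("chamber", "Venue B", date(1930, 2, 7)),
]

# Separate sequences: solo, then chamber, then concerts where Menges did not play.
ROLE_ORDER = {"solo": 0, "chamber": 1, "students": 2}


def shelf_key(programme):
    """Characteristics of division in the order applied: role, venue, date."""
    role, venue, when = programme
    return (ROLE_ORDER[role], venue, when)


def runs(flags):
    """Count separate stretches of shelf where the condition holds."""
    return sum(1 for flag, _ in groupby(flags) if flag)


shelf = sorted(programmes, key=shelf_key)
chamber_runs = runs(p[0] == "chamber" for p in shelf)
runs_1930 = runs(p[2].year == 1930 for p in shelf)
print(chamber_runs)  # 1: chamber programmes sit together
print(runs_1930)  # 2: programmes from 1930 are split up
```

Moving `when` to the front of the key tuple would reverse the trade-off, bringing each year together at the cost of scattering the chamber programmes.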

An example of arrangement by provenance is provided by the CPH’s collection of programmes relating to the performing career of the British oboist Leon Goossens. The programmes are kept together by virtue of being part of a single donation to the CPH by the estate of Leon Goossens: all the programmes in this collection have the same provenance. This means that identical programmes from concerts given by him can be found both in the Leon Goossens collection and in other parts of the CPH’s holdings, and these identical programmes are not interfiled. For arrangement purposes, the context of each individual programme in the Leon Goossens collection is more important than the information the programme contains.

The CPH has examples of collections which are arranged by format. Collections of concert programmes frequently include other performance ephemera, for instance press cuttings, concert diaries, posters and tickets, or they may contain archival items such as diaries or letters. Applying the principle of arrangement by format to concert programme collections therefore means separating the programmes from all other ephemera or archival items. For instance, the Thomas Harper collection at the RCM includes a number of volumes of concert programmes, held at the CPH, and a manuscript volume listing various concerts in which the nineteenth-century trumpeter Thomas Harper performed, held in the RCM library. Some queries arising from the concert programme volumes can be answered by the manuscript volume; but because of their differing formats, the programmes and the manuscript volume are in different departments of the RCM, located at different sites, and documented in separate catalogues.

5.0 Model 1: Universal characteristics of division

The arrangement of libraries and archives seems at first glance to be based on very different principles. Archival classification values the context of each item, seeing the separation of a document from its context as a travesty of the intellectual arrangement of the items; bibliographic classification is largely based on subject and assesses the intellectual contents of each item on an individual basis. Bibliographic and archival theorists largely consider their classifications and arrangements to be entirely disparate. Not only are there painstaking efforts by archival theorists to separate themselves from bibliographic ideas of arrangement (see, for example, Hurley (1993, 212) repeating three times in a row that a particular type of archival theory is most definitely not bibliographic), but there is also very little classification literature which considers both bibliographic and archival classification, with Schellenberg (1965) a notable exception.

However, a closer analysis of how provenance functions within concert programme collections reveals an interesting paradigm and a potential bridge. If a group of programmes is sorted into collections by the archival principle of provenance, another way of describing this is that the programmes have been arranged by dividing the group into different provenances. Therefore, provenance could be viewed as an honorary characteristic of division, and the related effects of collocation and scattering can also be observed. For example, in the Leon Goossens collection mentioned above, different programmes from the same source are collocated, while identical programmes from different sources within the CPH are scattered. In practice, provenance usually operates at the level which decides whether programmes are in one collection or another: for example, where special collections have been separated from one another and from ‘non-special’ collections. Transferring this to the world of characteristics of division, provenance would therefore be one of the first characteristics applied.
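Treating provenance as an honorary characteristic of division can be illustrated by placing it first in a sort key. In this hypothetical Python sketch, the collection labels (‘Goossens donation,’ ‘Other holdings’) and the concert identifiers are placeholders; copies of the same concert’s programme end up in separate stretches of shelf when provenance is applied first, and side by side when the concert is applied first.

```python
from itertools import groupby

# Hypothetical copies: labels and concert identifiers are placeholders.
copies = [
    {"provenance": "Other holdings", "concert": "concert 2"},
    {"provenance": "Goossens donation", "concert": "concert 1"},
    {"provenance": "Other holdings", "concert": "concert 1"},
    {"provenance": "Goossens donation", "concert": "concert 2"},
]


def arrange(records, characteristics):
    """Sort by the named characteristics, the first one dividing the whole group."""
    return sorted(records, key=lambda r: tuple(r[c] for c in characteristics))


def runs(shelf, characteristic, value):
    """Count separate stretches of shelf holding records with this value."""
    return sum(1 for flag, _ in groupby(r[characteristic] == value for r in shelf) if flag)


by_provenance = arrange(copies, ["provenance", "concert"])
by_concert = arrange(copies, ["concert", "provenance"])

print(runs(by_provenance, "provenance", "Goossens donation"))  # 1: donation kept together
print(runs(by_provenance, "concert", "concert 1"))  # 2: identical programmes not interfiled
print(runs(by_concert, "concert", "concert 1"))  # 1: interfiled when concert comes first
```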

Figure 1. Menges Collection characteristics of division method


Although the arrangement of ephemera does not deliberately separate itself from archival or bibliographic classification, it usually links to one or the other rather than providing a useful bridge between the two. However, a case similar to the one made for provenance above can be made for the ephemera management idea of arrangement by format. If a collection contains different formats of material from the same concert, the concerts themselves become distributed relatives, as in the case of the Thomas Harper collection (see section 4). In these cases, format acts as the highest characteristic of division. Examples from the CPH show that format can also function as one of the last characteristics of division applied, after geographic location, type of venue and name of concert venue. For instance, programmes and concert diaries from the concert venue St. John’s, Smith Square in London have been separated. This practical solution has been chosen both for its neatness for storage purposes and because it makes gaps in the collection easy to see. It also shows how format can be one of the final characteristics of division.
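The two positions that format can occupy can be compared in a short Python sketch. The records are hypothetical, ‘Venue X’ and ‘Venue Y’ are placeholders, and type of venue is omitted for brevity; only the position of `format` in the list of characteristics changes between the two arrangements.

```python
from itertools import groupby

# Hypothetical holdings for one location; venue names are placeholders.
items = [
    {"location": "London", "venue": "Venue X", "format": "programme"},
    {"location": "London", "venue": "Venue Y", "format": "concert diary"},
    {"location": "London", "venue": "Venue X", "format": "concert diary"},
    {"location": "London", "venue": "Venue Y", "format": "programme"},
]


def arrange(records, characteristics):
    """Sort by the named characteristics in the order they are applied."""
    return sorted(records, key=lambda r: tuple(r[c] for c in characteristics))


def runs(shelf, characteristic, value):
    """Count separate stretches of shelf holding records with this value."""
    return sum(1 for flag, _ in groupby(r[characteristic] == value for r in shelf) if flag)


# Format as the highest characteristic (compare the Thomas Harper material).
format_first = arrange(items, ["format", "location", "venue"])
# Format as the last characteristic (compare the St. John's, Smith Square holdings).
format_last = arrange(items, ["location", "venue", "format"])

print(runs(format_first, "venue", "Venue X"))  # 2: a venue's material is scattered
print(runs(format_last, "venue", "Venue X"))  # 1: a venue's material is collocated
print(runs(format_last, "format", "programme"))  # 2: programmes are split by venue
```

Placed last, format still separates programmes from concert diaries, but only within each venue, so the two formats for a venue remain adjacent on the shelf.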

To summarise, this model amalgamates concepts from archival, bibliographic and ephemera classification theories. ‘Characteristics of division,’ ‘provenance,’ and ‘arrangement by format’ are brought together as one unified system of characteristics of division. The bibliographic technique of ‘characteristics of division’, with the inevitable processes of collocation and scattering, can be applied universally, even when the characteristics are taken from outside the bibliographic sphere.

6.0 Model 2: The event/programme/individual copy triumvirate

The second model utilises the unified system of the first. The techniques of characteristics of division remain, but the characteristics themselves are considered in a completely different way. A close consideration of the characteristics of division identified previously in this paper reveals an interesting pattern. While some characteristics, such as date or concert venue, relate to the event itself, others, such as whether the item is an individual concert programme or a concert diary, relate directly to the object. Therefore, the arrangement of concert programmes could be viewed from an alternative frame of reference: concert programmes are the union of an event and a physical object. Taking this further, the physical object could be considered as two separate components. A programme is one of many identical programmes from the same concert; however, any given programme is also an individual item with its own unique custodial history, an exemplar of the whole print-run of a particular programme. Each exemplar may also include annotations which provide extra information about the event (for instance, encores or last-minute changes of musical programme) or provide insight concerning the original owner of the programme (for example, their opinions about the performers or pieces performed). Not only can this framework aid our understanding of concert programme arrangement, but it can also provide an insight into how the arrangement of programmes in a collection affects how that collection is perceived.

Programmes can therefore be considered to consist of the following three aspects:

Event

A concert; something which exists in both the temporal and spatial planes, but not in the physical plane [Note that the event as represented in the programme is the planned event, correct as of when the programme went to press; there may be differences between the planned event and the event which actually takes place]

Programme

An item which contains information about the (planned) event; something which exists in the physical plane and has physical attributes

Individual copy

A particular exemplar of a programme; exists in the physical plane and has physical attributes; may appear physically identical to all other copies of a programme, but each copy has its own custodial history and current storage conditions (such as binding); may contain annotations (from the original programme owner) which could provide extra information about the actual event as opposed to the planned event

We can now consider where potential characteristics of division of a concert programme will fit into the triumvirate. The characteristics discussed earlier in this paper are now supplemented by other potential characteristics. Together these aim to provide a more detailed picture of one concert programme.


Event

Date of concert; time of concert; geographic location; concert venue (containing elements of geographic location and venue type); concert genre; repertoire; solo performer(s); performing groups; individual concert-promoters or concert-giving societies

Programme

Format; size of programme; programme notes (including the presence of analytical notes and the programme notes author); visual features; advertising; box office information (including seating plans or ticket prices)

Individual copy

Provenance; custodial history; storage (including current binding); annotations and signatures; copy number
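One way to make the triumvirate concrete is as three linked record types. The Python dataclasses below are a minimal sketch, assuming that a programme may list several events and that an individual copy points to exactly one programme; the field names and sample values are hypothetical, and only a subset of the characteristics listed above is modelled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """A (planned) concert: temporal and spatial, not physical."""
    date: str
    venue: str
    repertoire: tuple[str, ...] = ()


@dataclass
class Programme:
    """A printed item describing one or more planned events."""
    format: str
    events: list[Event]
    programme_notes_author: str | None = None


@dataclass
class IndividualCopy:
    """One exemplar of a programme, with its own custodial history."""
    programme: Programme
    provenance: str
    copy_number: int
    annotations: list[str] = field(default_factory=list)


# Hypothetical example: two copies of one programme for one concert.
concert = Event(date="2008-05-01", venue="Venue X")
programme = Programme(format="single sheet", events=[concert])
copy_1 = IndividualCopy(programme, provenance="Owner A", copy_number=1,
                        annotations=["encore added"])
copy_2 = IndividualCopy(programme, provenance="Owner B", copy_number=2)
```

The two copies share the event and programme layers but differ in provenance and annotations, which is the information the individual copy layer is meant to hold.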

Considering real-life collections through the prism of the event/programme/individual copy paradigm reveals a startling trend: the characteristics are used unevenly. For example, an examination of the main concert programme collection at the CPH reveals that only event characteristics, such as place, concert venue, date and time, are used. Indeed, in non-special collections and within special collections alike, event characteristics are generally by far the most prevalent. This has serious implications for users of the collections. As espoused by Batley (2005), classification is concerned with classifying knowledge; if concert programme arrangement is largely based on event characteristics, then the classification of concert programmes becomes, in effect, a classification of concert life.
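The uneven use of characteristics can be checked mechanically by mapping each characteristic to its layer and counting. This is a minimal sketch: the mapping abridges the lists above, while the `orientation` helper and the example arrangements are hypothetical. The CPH main sequence is represented only by the four event characteristics named in the text, and the "special" arrangement is invented for comparison.

```python
from collections import Counter

# Layer of each characteristic, following the triumvirate lists (abridged).
LAYER = {
    "date": "event",
    "time": "event",
    "geographic location": "event",
    "concert venue": "event",
    "repertoire": "event",
    "format": "programme",
    "programme notes": "programme",
    "provenance": "individual copy",
    "annotations": "individual copy",
}
LAYERS = ("event", "programme", "individual copy")


def orientation(characteristics):
    """Count how many characteristics in an arrangement come from each layer."""
    counts = Counter(LAYER[c] for c in characteristics)
    return {layer: counts[layer] for layer in LAYERS}


cph_main = ["geographic location", "concert venue", "date", "time"]
special = ["provenance", "date"]  # hypothetical special-collection arrangement
print(orientation(cph_main))  # {'event': 4, 'programme': 0, 'individual copy': 0}
print(orientation(special))   # {'event': 1, 'programme': 0, 'individual copy': 1}
```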

However, though event characteristics may be the most significant quantitatively, programme and individual copy characteristics are still important qualitatively. For instance, designating a group of programmes as a special collection is based on provenance, and in the triumvirate, provenance falls into the individual copy layer. As described in the first model, provenance is often one of the first characteristics applied and is therefore highly significant to the arrangement of concert programmes.

Exploring relationships between levels in the triumvirate is another way to demonstrate its value. A number of relationships are theoretically possible between events, programmes and individual copies. For example, there could be one event and two different programmes for that event, such as a festival programme and a concert programme; conversely, five concerts given by an orchestra on tour could all be covered by a single programme. In practice, the relationship between programme and individual copy is largely stable: there are numerous individual copies of each programme. Because of this, the relationship of interest is that between event and programme; four basic types of event/programme relationship have been identified.

The first type of relationship is where there is a single event and a single type of programme produced for that event, a one-to-one relationship. As well as being the simplest relationship in theory, this is also the most common in practice (see Figure 2).

Figure 2. Relationships between one event, one programme and multiple individual copies

The second type of relationship occurs where there is one event, but two types of programme are produced. This is a one-to-many relationship, and there are numerous types of situation where this may occur: for example, a concert may have a free programme giving just basic information as well as a souvenir programme available for purchase (see Figure 3).

Both of these relationships assume that only one event is represented by the programmes. In practice, however, this is not always the case, as a concert programme may cover more than one event (see Figure 4).

For instance, a concert programme may represent an orchestral concert which is repeated on two dates in the same venue, or the same concert given in two nearby towns. The relationship between these events and the programme is many-to-one. Things get even more complex when events such as concert series and festivals are considered. These will often result in multiple types of programme, such as festival programmes and programmes for individual concerts. Because items such as festival programmes also represent more than one event, the relationship between events and programmes becomes many-to-many (see Figure 5).
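The four relationship types can be read off a list of event/programme links by grouping links that share an event or a programme, then counting the distinct events and programmes in each group. The sketch below assumes each link is an `(event, programme)` pair with invented identifiers; the `components` and `relationship` helpers are hypothetical illustrations of the cardinalities, not a procedure used at the CPH.

```python
def components(links):
    """Split (event, programme) links into groups connected by shared items."""
    groups = []
    for event, programme in links:
        touching = [g for g in groups
                    if any(event == e or programme == p for e, p in g)]
        merged = [(event, programme)]
        for g in touching:
            merged.extend(g)
            groups.remove(g)
        groups.append(merged)
    return groups


def relationship(group):
    """Name the event-to-programme cardinality of one connected group."""
    events = {e for e, _ in group}
    programmes = {p for _, p in group}
    left = "one" if len(events) == 1 else "many"
    right = "one" if len(programmes) == 1 else "many"
    return f"{left}-to-{right}"


links = [
    ("concert 1", "programme 1"),               # single event, single programme
    ("concert 2", "free programme"),            # one event, two programmes
    ("concert 2", "souvenir programme"),
    ("concert 3, day 1", "repeat programme"),   # repeat performances, one programme
    ("concert 3, day 2", "repeat programme"),
    ("festival event 1", "festival programme"),
    ("festival event 2", "festival programme"),
    ("festival event 1", "event 1 programme"),  # festival: many-to-many
]
print([relationship(g) for g in components(links)])
# ['one-to-one', 'one-to-many', 'many-to-one', 'many-to-many']
```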

Figure 3. Relationships between one event, multiple programmes and multiple individual copies

Figure 4. Relationships between multiple events, one programme and multiple individual copies

At the CPH, the first of these four relationships is the most prevalent. However, it is the other three relationships which proved more problematic when arranging programmes: for instance, the CPH collections contained many items from festivals, and these items posed many challenges for collection managers. Festivals produce a number of different types of item. Each festival will have one or more festival programmes, which contain details about many different events; there are also individual concert programmes, which contain details of only one event. An analysis of the arrangement of programmes from festivals using the triumvirate sheds light on the problems. For example, if the collection were arranged by format of programme, then festival programmes and individual concert programmes would be kept in separate sequences, as they are different types of programme. This is not ideal, as the same event would be represented in both the festival programme and the individual concert programme, which would be scattered on the shelves. Alternatively, if this same collection of programmes were arranged by event, each individual concert programme would be collocated with a copy of the festival programme. This is a better theoretical solution, as all the information in the collection about a specific concert would be in one place. However, in practice this arrangement would seldom work: it is unlikely that there would be enough copies of the festival programme to collocate one with each individual concert programme. Though analysis using the event/programme/individual copy triumvirate and the resulting relationships does not provide any easy solution to arranging programmes from festivals, it does provide a better theoretical understanding of the problems collection managers will encounter.
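The shortfall that defeats arrangement by event can be shown with simple counting. In this hypothetical sketch, a festival has five concerts with their own programmes, but the collection holds only two copies of the festival programme; `arrange_by_format` keeps two separate sequences, while `arrange_by_event` collocates a festival copy with each concert until the copies run out. The function names and numbers are illustrative assumptions.

```python
def arrange_by_format(concerts, festival_copies):
    """Two sequences: festival programmes kept apart from concert programmes."""
    return {
        "festival programmes": [f"festival programme copy {n}"
                                for n in range(1, festival_copies + 1)],
        "concert programmes": [f"programme for {c}" for c in concerts],
    }


def arrange_by_event(concerts, festival_copies):
    """One place per concert, each ideally holding a festival-programme copy.

    Returns the arrangement and the number of concerts left without a copy.
    """
    arrangement = {}
    for i, concert in enumerate(concerts):
        shelf = [f"programme for {concert}"]
        if i < festival_copies:
            shelf.append(f"festival programme copy {i + 1}")
        arrangement[concert] = shelf
    shortfall = max(0, len(concerts) - festival_copies)
    return arrangement, shortfall


concerts = [f"concert {n}" for n in range(1, 6)]
by_format = arrange_by_format(concerts, festival_copies=2)
by_event, shortfall = arrange_by_event(concerts, festival_copies=2)
print(by_event["concert 1"])  # ['programme for concert 1', 'festival programme copy 1']
print(by_event["concert 5"])  # ['programme for concert 5']
print(shortfall)              # 3
```

Under these assumptions, the event arrangement leaves three concerts without festival information alongside them, while the format arrangement separates every concert's details into two sequences; neither removes the problem.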

Figure 5. Relationships between multiple events, multiple programmes and multiple individual copies

(This figure shows a hypothetical situation where each programme represents multiple events; for events such as festivals or concert series, however, the most likely real-life situation is for ‘programme a’ to cover multiple events and ‘programme b’ to refer to only one event; see the discussion of festivals above.)

Model 2, though applying specifically to the physical arrangement of (usually) uncatalogued items, must be viewed within the context of the recent influx of projects devoted to the creation of metadata for performances and performance materials in the online environment. A number of projects have focused on describing events rather than performance ephemera, so in model 2 terms they are situated at the “event” level. For example, the recent project to index all past and present Royal Opera House performances (Royal Opera House 2011) in a public-access performance database involved creating a metadata model and organisation system for musical events. This model subdivides an event into work, production and performance, and considers the complex relationships between these constituent parts (Field 2006). Some projects bring together different types of data concerning the same musical event. For instance, Fingerhut (2008) describes the Institute for Research and Coordination Acoustic/Music (IRCAM) workflow, where different types of information about a musical event, such as recordings and programme notes, are incorporated into the same metadata model. In model 2 terms, this combines a ‘surrogate’ of the event (through an audio file) with a digital copy of (part of) the programme. Work completed in the wider performing arts community demonstrates other approaches to providing and organising performance data: for instance, the Global Performing Arts Database (GloPAD 2006) and the Australian Performing Arts collection (PROMPT), whose metadata is contained within the National Library of Australia Catalogue (2011).

7.0 Conclusion

This study considered how classification theories can be used to help our understanding of the arrangement of concert programme collections. In the Menges collection, characteristics of division were used to analyse the arrangement, and this was depicted as a hierarchical diagram of characteristics. The Leon Goossens collection demonstrated how arrangement by provenance prioritises context over contents, and showed that the same programme from different owners would be separated using this system. The Thomas Harper collection showed that even within a performance ephemera collection there can be various types of ephemera, and hence the validity of arrangement by format; however, this example also showed how arranging performance ephemera collections by format can lead to problems in retrieval, as the intellectual contents of a collection can become scattered.

Model 1 drew together each of these arrangement theories into a single system of characteristics of division, where the characteristics are taken from libraries, archives and ephemera collections. Geographic location, concert venue, date, concert series, programme note author, provenance and format all become equal as potential characteristics of division. This unified model has interesting implications. On a conceptual level, the fiercely independent realms of bibliographic and archival arrangement theory have, with a little help from ephemera, been brought together in some small way. This method of absorbing archival and ephemera classification theories into the bibliographic classification universe suggests an interesting new approach for knowledge organisation research. From a collection management perspective, the chameleonic qualities of a concert programme have traditionally made the arrangement of these programmes problematic; however, combining theories from all three types of collection might help programmes to be housed more successfully in any one of them. Another implication of this model is its possible extension to other types of performance ephemera. For example, the validity of model 1 could be tested by seeing whether it is effective for all types of performance ephemera, not just concert programmes; alternatively, the model could be used as an analytical tool for investigating classification issues in general performance ephemera collections.

Model 2 suggested taking this unified theory apart again, albeit in a different way. The event/programme/individual copy triumvirate was proposed, which usefully showed how some of the problems of arranging concert programmes could be better understood. For example, organising music festival ephemera, which encompasses the complexities of multiple events and multiple programmes simultaneously, can be analysed on a theoretical level using this model, and potential solutions can be evaluated. Drawing together model 2 and various performance databases or performance ephemera databases introduces exciting possibilities; these investigations could usefully move the discussion beyond the specificity of concert programmes, towards the general organisation of performance ephemera. Both models 1 and 2 can be used to demonstrate the influence that the arrangement of a collection exerts over how researchers view and use a collection; for instance, the characteristics selected by collection managers will determine whether the programmes are organised in an event-, programme- or individual copy-orientated arrangement, and it is this ‘version’ of the collection that will be presented to users. Specific examples of this phenomenon are given in Lee (2007), which analyses how three different hypothetical arrangements of the Thomas Harper collection (housed in the CPH) would create three different perceptions of the material. In short, there are many different ways that models 1 and 2 can be used to analyse concert programme classification, and applications of both models can be extended to general performance ephemera. To conclude, these neglected, ‘Cinderella’ concert programmes may still be far from living happily ever after, but hopefully this brief foray into their arrangement has helped them on their way to the ball.

References

Batley, Susan. 2005. Classification in theory and practice. Oxford: Chandos.

Buchanan, Brian. 1979. Theory of library classification. London: Clive Bingley.

Concert Programmes Project. 2007. About. Available at: <http://www.concertprogrammes.org.uk/html/about> (Accessed 31 August 2010).

Field, A. 2006. The Royal Opera House performance database. IAML-IMIC-IMS Conference, Göteborg University, 18-23 June.

Fingerhut, M. 2008. Online preservation and access to the record[ing]s of past [mostly] musical events. Available at: <http://mediatheque.ircam.fr/articles/textes/Fingerhut08b/index.pps> (Accessed 21 July 2011). [From a presentation given at the IAML conference, Naples, July 2008]

Foskett, A.C. 1996. The subject approach to information. 5th ed. London: Library Association.

Global Performing Arts Consortium. 2006. GloPAD. Available at: <http://www.glopad.org/pi/en/> (Accessed 21 July 2011).

Hadley, Nancy. 2001. Access and description of visual ephemera. Collection Management 25n4: 39-50.

Hurley, Chris. 1993. What if anything is a function? Archives and Manuscripts 21: 208-20.

Lee, Deborah. 2007. Organizing concert life: concert programme arrangement and the historiography of musical performance. Music in 19th-century Britain [Biennial conference]. Birmingham, England, July 5-8.

Lee, Deborah. 2008. Classifying musical performance: the application of classification theories to concert programmes. (Master’s dissertation, London Metropolitan University)

National Library of Australia. 2011. Catalogue. Available at: <http://catalogue.nla.gov.au/Search/Home> (Accessed 21 July 2011).

Perreault, Jean M. 1978. The idea of order in bibliography. Bangalore: Sarada Ranganathan Endowment for Library Science.

Pollard, Nik. 1977. Printed ephemera. In Pacey, Philip, ed., Art library manual: a guide to resources and practice. London: Bowker, pp. 316-36.

Ranganathan, S.R. 1937. Prolegomena to library classification. Madras: Madras Library Association.

Ranganathan, S.R. 1958. Library classification glossary. Annals of library science 5n3: 65-112.

Rickards, Maurice and Twyman, Michael. 2000. The encyclopedia of ephemera: a guide to the fragmentary documents of everyday life for the collector, curator and historian. London: British Library.

Ridgewell, Rupert. 2003. Concert programmes in the UK and Ireland: a preliminary report. London: IAML (UK & Irl) and the Music Libraries Trust.

Rowley, Jennifer and Farrow, John. 2000. Organizing knowledge: an introduction to managing access to information. 3rd ed. Aldershot: Gower.

Royal Opera House. 2011. Performance database: what’s online. Available at: <http://www.rohcollections.org.uk/performances.aspx> (Accessed 21 July 2011).

Schellenberg, T.R. 1965. The management of archives. New York: Columbia University Press.

Thibodeau, Sharon Gibbs. 1988. Archival arrangement and description. In Bradsher, James Gregory, ed., Managing archives and archival institutions. London: Mansell, pp. 67-77.

Williams, Caroline. 2006. Managing archives: foundations, principles and practice. Oxford: Chandos.

Editor’s note:

In printed issues of KO 38 no. 5 (2011) the names of Carla López Piñeiro and Elea Giménez Toledo were improperly inverted in the citation provided with the abstract. We regret the error. The correct form of the citation is:

López Piñeiro, Carla and Giménez Toledo, Elea. Knowledge Classification: A Problem for Scientific Assessment in Spain? Knowledge Organization, 38(5), 367-380. 36 references.