
Subject access to information in Web 2.0 environments. Sonja Špiranec, PhD, Assistant Professor, Dept. of Information Sciences, Zagreb, Croatia. IP LibCMASS Sofia 2011, Contract № 2011-ERA-IP-7. Sofia, 04.-17. September, 2011


Page 1:

Subject access to information in Web 2.0 environments

Sonja Špiranec, PhD
Assistant Professor

Dept. of Information Sciences, Zagreb, Croatia

IP LibCMASS Sofia 2011
Contract № 2011-ERA-IP-7

Sofia, 04.-17. September, 2011

Page 2:

Content

• context: Web 2.0 and libraries

• KOS (knowledge organization systems)

• folksonomies
– strengths, weaknesses
– research, studies
– techniques for improving tags and folksonomies

• seminar / discussion / group work

Page 3:

Context

• Web 2.0 influences and transforms information landscapes

• it addresses issues of generating and using information, and of organizing and accessing it

• its influence on the LIS sector is therefore natural

Page 4:

Web 2.0 in library context

• enthusiasm, excitement

• do we really have a win-win situation?

Page 5:

Added value for users?

• personalisation of services, interactivity

• users will appreciate the possibility to “privatize” their space on the library web site

• growing expectation from library users that they will be able to interact with the catalogue, not just passively receive information delivered by the cataloguers

• strengthens the connection between the library and its users

• psychologically, the connection is stronger if users participate and create something themselves

Page 6:

The downside: quality of information, quality of intellectual access to information

• what if the content is inappropriate or of low quality?

• consistency in the organization of information

• providing access to materials

• ...

Page 7:

Short group discussion

• Discuss within your group the meaning of the term Library 2.0.

• What does it denote for you?

• Strengths? Weaknesses?

Page 8:

A short reminder:

KOS in libraries

Page 9:

KOS in libraries

• To organize is to give orderly structure to; to frame and put into working order

• libraries have been in the business of organizing information, namely documents, since their beginnings

• to this end, languages for document representation and organization were used

Page 10:

• information professionals have developed indexing languages (sets of terms used to represent topics or features of documents, together with the rules for combining or using those terms)

• since the 1960s, much research has been done to test whether controlled or uncontrolled indexing languages provide better retrieval results

• E. Svenonius suggested that free text and controlled vocabulary terms each contribute to precision and recall
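The two measures named in the last bullet can be made concrete with a small sketch. Precision is the share of retrieved documents that are relevant; recall is the share of relevant documents that were retrieved. The document ids and the two result sets below are invented purely for illustration and do not come from any study cited on these slides.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: documents 1-4 are the ones relevant to the query.
relevant = {1, 2, 3, 4}
controlled_hits = {1, 2}        # assumed result of a controlled-vocabulary search
free_text_hits = {2, 3, 7, 8}   # assumed result of a free-text search

print(precision_recall(controlled_hits, relevant))
print(precision_recall(free_text_hits, relevant))
print(precision_recall(controlled_hits | free_text_hits, relevant))
```

In this invented case the combined result set reaches a higher recall than either search alone, at the cost of some precision compared with the controlled search. It illustrates the definitions only and says nothing about how real systems behave.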

Page 11:

Controlled vs. natural languages

• in a controlled language, control is exercised over which terms are used and over the relationships between the terms

• terms are standardised, and similar or related resources are collocated for ease of discovery by the user (Lancaster, 1972).

Page 12:

Features of controlled languages

• they control the use of synonyms (and near-synonyms) by establishing a single form of the term. This ensures that indexers apply the same term to describe the same or similar concepts (e.g. “car”, “automobile”, “motorcar” or “motor vehicle”).

• they discriminate between homonyms, allowing the indexer to resolve clashes of meaning that arise when several terms share the same form but have distinct meanings (e.g. jaguar)

• they control lexical anomalies by minimising superfluous vocabulary or grammatical variations that could create further noise in the user's result set: spelling variants, singular and plural forms, verb tenses

• they unite similar terms, or systematically refer the indexer to closely related alternatives, in order to ensure that similar or related resources are collocated. This is normally achieved by displaying the “genus/species” relationship between terms within some form of semantic hierarchical structure.
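The four features listed on this slide can be sketched as three small lookup tables. Everything here is a hypothetical mini-vocabulary made up for illustration (the headings, the qualifiers in parentheses and the function names are assumptions, not taken from any real thesaurus or subject heading list).

```python
# Hypothetical mini-vocabulary; terms and structure are illustrative only.
PREFERRED = {            # synonym control: entry term -> single preferred heading
    "car": "automobiles",
    "cars": "automobiles",
    "automobile": "automobiles",
    "motorcar": "automobiles",
    "motor vehicle": "automobiles",
}
HOMONYMS = {             # homonym control: one form, several qualified headings
    "jaguar": ["jaguar (animal)", "jaguar (automobile)"],
}
BROADER = {              # genus/species hierarchy: heading -> broader heading
    "automobiles": "vehicles",
    "jaguar (automobile)": "automobiles",
    "jaguar (animal)": "mammals",
}
HEADINGS = set(BROADER) | set(BROADER.values())


def lookup(entry):
    """Map an entry term to the heading(s) an indexer may assign."""
    key = " ".join(entry.lower().split())   # lexical control: case and spacing
    if key in HOMONYMS:
        return HOMONYMS[key]                # the indexer must choose a meaning
    if key in PREFERRED:
        return [PREFERRED[key]]
    return [key] if key in HEADINGS else []  # empty list: not in the vocabulary


def broader_chain(heading):
    """Walk up the hierarchy from a heading to its top term."""
    chain = [heading]
    while chain[-1] in BROADER:
        chain.append(BROADER[chain[-1]])
    return chain


print(lookup("Motor  Vehicle"))
print(lookup("jaguar"))
print(broader_chain("jaguar (automobile)"))
```

The sketch also hints at the cost discussed on the next slide: every entry in these tables has to be created and maintained by someone.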

Page 13:

Negative features of controlled languages

• they require investments of time, money, training, expertise and professional intervention

• current schemes are incapable of reflecting the transient nature of knowledge and therefore the demands of the modern information user.

Page 14:

Folksonomies

Page 15:

Why folksonomies?

• in the context of Web 2.0 developments

• the growth of user-generated content increases demand for suitable methods and facilities of storage and retrieval

• companies (and individuals) have developed collaborative information services: social bookmarking, photo sharing, video sharing
– these enable users to store and publish information, but also to index and organize it
– via tags; the totality of tags forms a folksonomy

Page 16:

• the magic?
– users do it by themselves
– no guidance, no structure, no rules, no fields...

• folksonomies turned our professional views and standpoints upside down

Page 17:

Folksonomies vs. KOS

• folksonomies approach knowledge organization from a different perspective than traditional KOS

• instead of choosing criteria, subject departments and classes and then filling them with resources, folksonomies take the resources themselves as the point of departure

• folksonomies employ a resource-centric approach (instead of a criteria-centric one)

Page 18:

Weller, K. & Peters, I. (2007). Reconsidering relationships for knowledge representation. In: Proceedings of I-Know '07, Graz, September 5-7 (pp. 501-504).

two new players in indexing and knowledge representation

Page 19:

A folksonomy represents simultaneously some of the best and worst in the organization of information.

Mathes, 2005.

Page 20:

Proposed alternative terms:

• democratic indexing
• social classification system
• collaborative classification system
• ethnoclassification
• grassroots taxonomy
• user-generated metadata
• folk wisdom
• folksabulary
• mob indexing
• tagsonomy
• metadata for the masses
• lightweight knowledge representation

Page 21:

Folksonomies: basic features

“...a conflation of the words ‘folk’ and ‘taxonomy’ used to refer to an informal, organic assemblage of related terminology” (Vander Wal)

• organic structures that mirror the understanding users have of resources

• nothing is predetermined

• develop and advance with usage

• progress based on collective intelligence (the group is smarter than the individual)

• the collective creation of tags ought to be semantically richer than a controlled vocabulary (several opinions and perspectives)

• statistical consensus

Page 22:

Terminology

• the basic unit of a folksonomy: the tag

• tags are user-generated keywords; they have been suggested as a lightweight way of enhancing descriptions of on-line information resources

• social tagging refers to the practice of publicly labelling or categorizing resources in a shared, on-line environment

• the resulting assemblage of tags forms a “folksonomy”
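One common way to picture these terms is to treat each tagging act as a (user, resource, tag) triple and the folksonomy as the collection of all such triples. The sketch below follows that assumption; the user names, resource ids and tags are invented, and the function names are illustrative rather than part of any tagging platform.

```python
from collections import Counter, namedtuple

# One tagging act: a user attaches a tag to a resource.
Tagging = namedtuple("Tagging", ["user", "resource", "tag"])

# Hypothetical sample data.
taggings = [
    Tagging("ana", "book-1", "folksonomy"),
    Tagging("ana", "book-1", "to read"),
    Tagging("ben", "book-1", "folksonomy"),
    Tagging("ben", "book-2", "web2.0"),
    Tagging("eva", "book-2", "folksonomy"),
]


def folksonomy(taggings):
    """The totality of tags in the system, with usage counts."""
    return Counter(t.tag for t in taggings)


def tags_for(resource, taggings):
    """Tags assigned to one resource, counted across all users."""
    return Counter(t.tag for t in taggings if t.resource == resource)


def resources_for(tag, taggings):
    """Resources carrying a given tag (the basis of tag-based retrieval)."""
    return {t.resource for t in taggings if t.tag == tag}


print(folksonomy(taggings))
print(tags_for("book-1", taggings))
print(resources_for("folksonomy", taggings))
```

Nothing in this structure constrains which strings may be used as tags, which is exactly the absence of vocabulary control the slides describe.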

Page 23:

Exercise

• Basic features of folksonomies
– problems
– strengths

• Examples:
– Connotea tag cloud
– Amazon tag cloud
– LibraryThing tag cloud

Page 24:

Folksonomies: limitations

• ambiguity of tags (emerges as users apply the same tag in different ways)

• the lack of synonym control can lead to different tags being used for the same concept, precluding collocation

• spaces, multiple words

• different word forms, plural and singular, inconsistent and ambiguous assignment of tags

• the user proclivity towards exhaustive tags (e.g. “marketing”, “technology”), popular tags and personal tags (e.g. “me”, “to read”) further compromises precision and contributes to high levels of recall and noise

• can folksonomies collapse due to a rising number of users, tags or resources?

Page 25:

Folksonomies: strengths

• browsing (finding things unexpectedly)

• up to date (can more easily accommodate new terms and concepts than heavily controlled vocabularies)

• reflect the vocabulary of users

• cheaper

• feasible for large data collections

Page 26:

Folksonomies: strengths

• cataloguers or indexers will attempt to keep similar or related concepts together
– Shirky argues that it is impossible to “collapse” such terms without losing the essence of what each term conceptually denotes. He therefore states that it is impossible to disentangle terms such as “queer”, “gay” or “homosexual”

• in traditional controlled vocabulary-based indexing, all terms assigned to a document carry more or less equal weight

• in social tagging, certain tags will be much more popular than others

Page 27:

Considering new approaches to knowledge organization in library contexts

Page 28:

LC Working Group

• Report of The Library of Congress Working Group on the Future of Bibliographic Control

• the tightly controlled consistency designed into library standards thus far is unlikely to be realized or sustained in the future, even within the local environment.

• Integrate User-Contributed Data into Library Catalogs

• develop methods to guide user tagging through techniques that suggest entry vocabulary (e.g., term completion, tag clouds).
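Term completion, one of the techniques named in the last bullet, can be sketched as a prefix lookup over tags that already exist in the catalogue. This is only one possible reading of the recommendation: the function name, the ranking rule (most used first) and the tag counts are assumptions made for the example.

```python
from collections import Counter


def suggest_tags(prefix, tag_counts, limit=5):
    """Suggest existing tags that start with what the user has typed so far,
    most frequently used first (ties broken alphabetically)."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    matches = [(tag, n) for tag, n in tag_counts.items() if tag.startswith(prefix)]
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return [tag for tag, _ in matches[:limit]]


# Hypothetical counts of tags already present in a catalogue.
tag_counts = Counter({
    "library": 40, "libraries": 12, "librarything": 7,
    "linguistics": 5, "web2.0": 30,
})

print(suggest_tags("lib", tag_counts))
```

Offering existing tags while the user types nudges taggers toward shared forms without forbidding new tags, which is the sense of "guiding" rather than controlling.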

Page 29:

user tags

indexing terms

Page 30:

Question:

• can tagging solve the problem of organizing knowledge?
– strengths
– more important today: tags reflect the spirit of modern times (new problems, new issues, subjects, innovations, research fields)

• decentralization

• communities of practice

• multiple perspectives (previously one viewpoint was sufficient because collections had a local character)

• user-friendly

Page 31:

• defining the term “tag” on LibraryThing

“Tags are a simple way to categorize books according to how you think of them, not how some official librarian does.”

Page 32:

Research, studies

• studies concerning tag distributions, tag categories, users' tagging behavior, comparison between tags and subject headings...

Page 33:

Tag distributions

• studies have determined that in folksonomies the distribution of tags at the resource level resembles a power-law curve: a few tags are very popular, but the majority are used infrequently

• the frequently used tags are extremely general and make only the vaguest allusions to the content
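The skew described above can be inspected with two simple descriptive figures: the share of all tag assignments taken by the few most popular tags, and the share of distinct tags used only once. The sketch below is not a statistical test for a power law, and the counts are invented to show a skewed shape.

```python
from collections import Counter


def distribution_summary(tag_counts, top_n=3):
    """Summarise how unevenly tag usage is spread across tags."""
    ranked = sorted(tag_counts.values(), reverse=True)
    if not ranked:
        return {"top_share": 0.0, "used_once": 0.0}
    total = sum(ranked)
    return {
        # share of all tag assignments accounted for by the top_n tags
        "top_share": sum(ranked[:top_n]) / total,
        # share of distinct tags that were used exactly once
        "used_once": sum(1 for n in ranked if n == 1) / len(ranked),
    }


# Hypothetical tag counts for one resource collection.
tag_counts = Counter({
    "web": 120, "design": 60, "tools": 40, "blog": 30, "css": 24,
    "howto": 20, "fonts": 2, "typography-inspiration": 1,
    "myproject": 1, "toread2011": 1,
})

print(distribution_summary(tag_counts))
```

With these invented numbers, three of the ten tags account for roughly three quarters of all assignments, while three tags occur only once.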

Page 34:

Page 35:

Users' tagging behavior

• users pick out particular aspects that are important or interesting to them and express them via tags

• a tag is therefore not a representation of the whole entity

Page 36:

Tag categories

• linguistic level: occurrence of regularities regarding certain genres or forms of tags

• Golder and Huberman (2006):
– “topics”
– “type”: format, e.g. blog, article
– adjectives: reflect the author's opinion, e.g. funny, boring
– “self reference”: relation between the tagger and the resource, e.g. my stuff
– “task organizing”: to read, to do
– refining tags: tags that describe another tag in detail

• categories depend on the social bookmarking system; e.g. on Flickr: geographic tags, time/events

Page 37:

Tag redundancy:

• 19% of tags repeat words from the title (added value?)

• the majority of compound tags are used only once
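A rough way to measure the first kind of redundancy is to count how many of a resource's tags already occur as words in its title. The sketch below is an assumption about how such a figure could be computed, not the method behind the 19% result. It only matches single-word tags, and the tags are invented (the title is taken from the literature list).

```python
import re


def title_overlap(title, tags):
    """Fraction of a resource's tags that already occur as words in its title."""
    title_words = set(re.findall(r"\w+", title.lower()))
    tags = [t.lower() for t in tags]
    if not tags:
        return 0.0
    repeated = [t for t in tags if t in title_words]
    return len(repeated) / len(tags)


title = "Folksonomies: indexing and retrieval in Web 2.0"
tags = ["folksonomies", "tagging", "retrieval", "to read"]   # hypothetical tags

print(title_overlap(title, tags))
```

Tags that merely repeat title words add little for retrieval in a catalogue that already indexes titles, which is the point of the "added value?" question on the slide.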

Page 38:

Basic-level tags

• Basic-level theory states that terms can be cognitively structured in a hierarchical system with different levels of specificity

• The basic level often contains the one term which is the most demonstrative, but not the most specific
– furniture – chair (basic level) – kitchen chair

• Basic-level terms occur much more frequently in natural language

• They are heavily used in folksonomies

Page 39:

Folksonomy data set: how to improve it?

• no good guys vs. bad guys

• critiques are mainly based on comparisons of folksonomies with traditional knowledge organization systems (thesauri, classification systems etc.) and professional indexing techniques

• boundaries between structured KOS and folksonomies are not at all solid but rather blurred
– folksonomies can adopt some of the principal guidelines available for traditional KOS and may gradually be enriched with some elements of vocabulary control and semantics
– folksonomies provide a useful basis for the stepwise creation of semantically richer KOS and for the refinement of existing classifications, thesauri or ontologies

Page 40:

• the gradual refinement of folksonomy tags and the stepwise application of additional structure to folksonomies are a promising approach

• some platforms already provide features to manipulate, revise and edit folksonomy tags

Page 41:

Page 42:

Flickr's clusters

Page 43:

Theoretical approaches for structural enhancement of folksonomies

• “emergent semantics”, “semantic upgrades” or “semantic enrichments”...

• Tag gardening
– processes of manipulating and re-engineering folksonomy tags in order to make them more productive and effective
– works on top of existing folksonomies (doesn't inhibit the user)
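A very small gardening step might group spelling and form variants of a tag under one normalised key while leaving the tags users actually typed untouched. The sketch below makes that assumption; the folding rules are deliberately naive (they will mishandle words such as "series"), and the sample tags are invented.

```python
from collections import defaultdict


def normalise(tag):
    """Very rough variant folding: case, separators, and a naive plural rule."""
    key = tag.lower().replace("_", " ").replace("-", " ")
    key = " ".join(key.split())
    if len(key) > 4 and key.endswith("ies"):
        key = key[:-3] + "y"          # "folksonomies" -> "folksonomy"
    elif len(key) > 3 and key.endswith("s") and not key.endswith("ss"):
        key = key[:-1]                # "tags" -> "tag"
    return key


def garden(tags):
    """Group raw tags under a normalised key without altering the raw tags."""
    groups = defaultdict(set)
    for tag in tags:
        groups[normalise(tag)].add(tag)
    return dict(groups)


# Hypothetical raw tags as typed by users.
raw = ["Folksonomy", "folksonomies", "web-design", "web_design", "Web Design", "tags"]

for key, variants in garden(raw).items():
    print(key, sorted(variants))
```

Because the grouping is computed on top of the stored tags, users keep tagging freely and the mapping can be revised or discarded later.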

Page 44:

Conclusion

• No clear picture of folksonomies has emerged yet
– a strong tool with major flaws

• Great enthusiasm of users to participate in indexing

• Strengthen the positive effects and minimize the negative ones

• An additional method for knowledge organization that complements traditional controlled vocabularies

Page 45:

Literature

• Golder, S.A.; Huberman, B.A. Usage Patterns of Collaborative Tagging Systems. // Journal of Information Science 32, 2(2006), 198–208.

• Peters, I. Folksonomies: indexing and retrieval in Web 2.0. Berlin : De Gruyter/Saur, c2009.

• Rolla, P.J. User Tags versus Subject Headings: Can User-Supplied Data Improve Subject Access to Library Collections? // Library Resources & Technical Services 53, 3(2009), 174–184.

• Spiteri, L.F. Structure and Form of Folksonomy Tags: The Road to the Public Library Catalogue. // Webology 4, 2(2007).

• Svenonius, E. The intellectual foundation of information organization. Cambridge, Mass.; London : The MIT Press, 2000.

• Yi, K.; Chan, L.M. Linking folksonomy to Library of Congress subject headings: an exploratory study. // Journal of Documentation 65, 6(2009), 872–900.