Module 6a: Intro to Controlled Vocabularies, Taxonomies and Classification

IMT530: Organization of Information Resources

Winter 2007

Michael Crandall

Module 6a Outline

• Where we are

• Controlled vocabularies

• Types of controlled vocabularies

• Tagging

• Overview of building vocabularies

Recap

• We looked at the indexing process to see how controlled vocabularies can be used to enhance access to information
  – Different methods of indexing provide different results
  – Need to decide on your approach based on an analysis of your business objectives, the user needs, and the domain
  – A combination of automatic and human indexing is often the best solution

Overview of Subject Representation

• Subject analysis
  – a technique used to determine the “subject(s)” and disciplinary context exemplified by an object

• Subject indexing
  – a technique through which subject terms (words, taxonomic categories, or notation) are added to an object representation to describe the subject content of the object

• Controlled vocabularies
  – standards containing controlled subject terms (words, taxonomic categories, or notation) used in the indexing process

Controlled Vocabulary: Definition

• A controlled vocabulary is a list of terms (words or phrases) or codes (notation) used for indexing

• Almost always, controlled vocabularies show relationships among terms

Purpose of Controlled Vocabularies

• Specific Purposes
  – To provide access to content by subject, through providing hierarchical and associative relationships and synonym control for the terms used in the domain
  – Increase precision in retrieval and display by controlling homographs (words that are spelled the same but have different meanings)

• General Purposes
  – Assist users by conveying meaning, orientation, and structure in a subject area
  – Assist users by providing rich relationships among concepts and terms
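
The specific purposes above (synonym control, homograph control, hierarchical and associative relationships) can be sketched as a small data structure. This is a minimal illustration, not a standard format: the `Term` class, the `lookup` function, and every sample entry are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term in a toy controlled vocabulary (hypothetical model)."""
    label: str                                  # preferred form, with a qualifier if needed
    use_for: set = field(default_factory=set)   # synonyms and variants (synonym control)
    broader: set = field(default_factory=set)   # hierarchical relationship
    related: set = field(default_factory=set)   # associative relationship


# Hypothetical entries; parenthetical qualifiers separate the homograph "mercury".
VOCAB = [
    Term("Mercury (planet)", broader={"Planets"}),
    Term("Mercury (element)", use_for={"quicksilver"}, broader={"Metals"}),
    Term("Automobiles", use_for={"cars", "motor cars"}, related={"Roads"}),
]


def lookup(entry: str) -> list:
    """Return the preferred labels that match what an indexer or searcher typed."""
    key = entry.strip().lower()
    hits = []
    for term in VOCAB:
        base = term.label.split(" (")[0].lower()   # label without its qualifier
        variants = {v.lower() for v in term.use_for}
        if key == term.label.lower() or key == base or key in variants:
            hits.append(term.label)
    return hits


print(lookup("cars"))     # a synonym resolves to one preferred term
print(lookup("mercury"))  # a homograph returns two candidates to choose from
```

A synonym collapses to a single preferred term, while a homograph returns every qualified form so that the indexer or searcher has to pick the intended meaning.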

Buckland

• Proposes five different vocabularies in any system:
  – Authors
  – Indexers
  – Syndetic structure
  – Searchers
  – Formulated queries

• Formal tradition vs. document tradition

Types of Controlled Vocabularies

• Subject Heading List

• Taxonomy

• Thesauri

• Classification Scheme

• More terminology on Leonard Will’s site
  – http://www.willpowerinfo.co.uk/glossary.htm

Zeng, M.L. (2005). Construction of controlled vocabularies: A primer.

Subject Heading Lists

• General list of terms (words and phrases), not limited by discipline or subject area

• Terms are called subject headings

• The distinction between thesauri & subject heading lists is largely historical (subject heading lists are older); there are very few subject heading lists because they are so expensive to maintain

• Terms are mainly subject attributes, but there are many exemplified attributes used in subdivisions

• Example: Library of Congress Subject Headings (LCSH), used in library catalogs
  – Sample terms: “France – Colonies – History – 18th century”; “Time and space – Juvenile fiction”; “Frogs” (notice the use of subdivisions, marked here by dashes; thesauri seldom use subdivisions)
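
To make the subdivision structure concrete, here is a minimal sketch that splits the sample headings above into a main heading and its subdivisions. The function name `split_heading` is hypothetical, and the separator is assumed to be the spaced dash used on this slide, so it is left as a parameter for other sources.

```python
def split_heading(heading: str, sep: str = " – ") -> dict:
    """Split a subdivided subject heading into its main heading and subdivisions."""
    parts = [part.strip() for part in heading.split(sep)]
    return {"main": parts[0], "subdivisions": parts[1:]}


# Sample headings taken from the slide above.
SAMPLES = [
    "France – Colonies – History – 18th century",
    "Time and space – Juvenile fiction",
    "Frogs",
]

for heading in SAMPLES:
    print(split_heading(heading))
```

A heading without subdivisions, such as “Frogs”, simply yields an empty subdivision list.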

Taxonomies

• List of terms (words and phrases) that may be general or subject/discipline/domain specific

• Terms are called taxons or (simply) terms

• Terms represent subjects, disciplines/domains, and exemplified attributes

• Used in digital environments only

• Examples: Microsoft Corporation intranet taxonomies; Yahoo taxonomy used in the Yahoo directory
  – Sample terms from the Yahoo taxonomy (in Yahoo, you’ll find these at the top of the screen as you browse through the directory): “Education”; “Science > Agriculture > Research > Government Agencies”; “Health > Nursing”; “Health > Education”
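
The Yahoo sample terms are paths through a hierarchy, which can be illustrated by folding them into a nested structure. This is only a sketch under the assumption that “ > ” separates levels; `build_tree` and `render` are hypothetical helper names.

```python
def build_tree(paths, sep=" > "):
    """Fold breadcrumb-style paths into a nested dict of {label: children}."""
    tree = {}
    for path in paths:
        node = tree
        for label in path.split(sep):
            node = node.setdefault(label.strip(), {})
    return tree


def render(tree, depth=0):
    """Return the tree as indented lines, sorted alphabetically at each level."""
    lines = []
    for label in sorted(tree):
        lines.append("  " * depth + label)
        lines.extend(render(tree[label], depth + 1))
    return lines


# Sample terms taken from the slide above.
PATHS = [
    "Education",
    "Science > Agriculture > Research > Government Agencies",
    "Health > Nursing",
    "Health > Education",
]

tree = build_tree(PATHS)
print("\n".join(render(tree)))
```

Note that “Education” appears both at the top level and under “Health”; the full path is what tells the two apart.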

Thesauri

• Thesauri (pl.) / Thesaurus (s.)
  – List of terms (words and phrases) that are usually limited to a specific subject or disciplinary area
  – Terms listed in a thesaurus are often called descriptors
  – Thesauri were mostly defined and developed after the advent of the computer and were created for use in a computerized environment (or with computers in mind)
  – Terms are usually subject (about) attributes, but some thesauri also contain exemplified (example of) attributes
    - http://www.e-government.govt.nz/nzgls/thesauri
  – Example: ERIC Thesaurus (education)
    • Sample terms from the ERIC Thesaurus: “School community relationship”; “College entrance exams”; “Age grade placement”

“Classification” Schemes

• Chart of subject categories contextualized by a hierarchical structure

• Terms are lists of codes (notation)

• Terms are called classes and class numbers

• Classification schemes make use of disciplinary, subject, and (sometimes) exemplified attributes

• Used often to arrange physical documents; sometimes used in online environments

“Classification” Example

• Examples: Dewey Decimal Classification (DDC); Universal Decimal Classification (UDC); Colon Classification

• Sample entries (DDC):
  – 510 (meaning: “Mathematics” (a discipline and a subject))
  – 512.57 (meaning: “Mathematics / Linear, multilinear, multidimensional algebras / Factor algebras”)
  – 362.582 (meaning: “Social problems and services / Problems of and services to the poor / Financial assistance”)
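
The DDC samples above suggest how a decimal notation can carry hierarchy: each added digit narrows the class. The sketch below derives the chain of broader class numbers from the notation alone. It assumes the notation is strictly hierarchical, which real DDC does not guarantee everywhere, and it makes no attempt to supply captions, since those come from the schedules. The function name `notational_ancestors` is hypothetical.

```python
def notational_ancestors(class_number: str) -> list:
    """List the class numbers implied by a DDC-style notation, broadest first."""
    whole, _, fraction = class_number.partition(".")
    if len(whole) != 3 or not (whole + fraction).isdigit():
        raise ValueError(f"not a DDC-style number: {class_number!r}")
    digits = whole + fraction
    chain = []
    for n in range(1, len(digits) + 1):
        prefix = digits[:n]
        if n <= 3:
            notation = prefix.ljust(3, "0")           # pad to three digits: 5 -> 500
        else:
            notation = prefix[:3] + "." + prefix[3:]  # restore the decimal point
        if notation not in chain:                     # 510 would otherwise repeat
            chain.append(notation)
    return chain


for number in ("510", "512.57", "362.582"):
    print(number, "->", notational_ancestors(number))
```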

Four Types of Classification

• Kwasnik describes four classification systems
  – Hierarchies
  – Trees
  – Paradigms
  – Facets

• Paradigms are useful primarily for analysis of subject gaps and relationships in a constrained space

• Trees are a poor form of hierarchy with limited relationships

• We’ll look at the other two in some detail over the next two weeks

Hierarchies

• Good for representation of knowledge in mature domains where the nature of the entities and relationships is well known

• You’ll see examples of these in the thesauri that we will look at in today’s exercise

• Require a model that describes what entities are included, with rules of association and distinction

• Tend to be monolithic and cumbersome for large domains
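
As a rough illustration of what a hierarchy's rules buy you, the sketch below stores one broader term per entry and walks upward to answer "is X a kind of Y?". The terms and the `BROADER` table are hypothetical examples, not drawn from any published vocabulary, and a single parent per term is a simplifying assumption.

```python
# Hypothetical child -> parent (broader term) table.
BROADER = {
    "Frogs": "Amphibians",
    "Amphibians": "Vertebrates",
    "Vertebrates": "Animals",
}


def ancestors(term: str) -> list:
    """Return every broader term above `term`, nearest first."""
    chain = []
    seen = {term}
    while term in BROADER:
        term = BROADER[term]
        if term in seen:  # a well-formed hierarchy must not loop
            raise ValueError(f"cycle detected at {term!r}")
        seen.add(term)
        chain.append(term)
    return chain


def is_a(term: str, candidate: str) -> bool:
    """True if `candidate` is a broader term of `term` at any level."""
    return candidate in ancestors(term)


print(ancestors("Frogs"))
print(is_a("Frogs", "Animals"))
```

Because membership is inherited up the chain, a search on a broad term can be expanded to items indexed under any of its narrower terms.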

Facets

• Actually a different approach rather than a different structure
  – May use hierarchies or trees as part of the structure
  – Originated in the work of S.R. Ranganathan
    • Proposed that any object could be viewed in five ways: personality, matter, energy, space and time (PMEST)
  – Being used more and more in modern information systems because of flexibility in meeting multiple needs
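
The flexibility of facets can be shown with a small filter: each record is described independently along the PMEST facets, and any combination of facet values narrows the set. The records, their facet values, and the helper names `filter_by_facets` and `facet_counts` are all hypothetical, chosen only to illustrate the approach.

```python
from collections import Counter

FACETS = ("personality", "matter", "energy", "space", "time")

# Hypothetical records; a record may leave a facet unassigned.
RECORDS = [
    {"title": "Record 1", "personality": "Rice", "energy": "Harvesting", "space": "India", "time": "1950s"},
    {"title": "Record 2", "personality": "Rice", "energy": "Milling", "space": "Japan", "time": "1950s"},
    {"title": "Record 3", "personality": "Wheat", "energy": "Harvesting", "space": "India", "time": "1990s"},
]


def filter_by_facets(records, **wanted):
    """Keep the records that match every requested facet value."""
    unknown = set(wanted) - set(FACETS)
    if unknown:
        raise ValueError(f"unknown facets: {sorted(unknown)}")
    return [r for r in records if all(r.get(f) == v for f, v in wanted.items())]


def facet_counts(records, facet):
    """Count how many records carry each value of one facet."""
    return Counter(r[facet] for r in records if facet in r)


print([r["title"] for r in filter_by_facets(RECORDS, personality="Rice", time="1950s")])
print(facet_counts(RECORDS, "space"))
```

The same records answer questions by place, by time, or by activity without anyone having to decide in advance which of those sits at the top of a single hierarchy.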

Collaborative Tagging

• Points out issues of “basic level” and “collective sensemaking”

• Tug of war between personal storage
  – Identifying qualities
  – Self reference
  – Task organizing

• and public nature of access
  – What or who it is about
  – What it is
  – Who owns it
  – Categories

• Stability emerges from imitation and shared experience

Trees vs. Tags

• Weinberger’s article postulates three types of vocabularies
  – Trees (hierarchies)
  – Facets
  – Tags

• Golder/Huberman and Weinberger both point out that each approach can be useful in particular situations
  – Choosing your approach is part of the process of subject and domain analysis

Steps in Constructing CVs

• Define your domain

• Gather concepts
  – From user interviews, search logs, content analysis, preexisting vocabularies

• Select your approach

• Extract terminology

• Control your terms

• Organize your terms

• Maintain, maintain, maintain
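
Three of the steps above (gather concepts, extract terminology, control your terms) can be sketched with a search log as the source. Everything here is an assumption made for illustration: the log entries, the `VARIANTS` mapping (which stands in for decisions a vocabulary editor would make), and the function names.

```python
from collections import Counter

# Hypothetical search-log queries gathered from users.
SEARCH_LOG = ["Cars", "car", "automobile ", "frogs", "Frog", "cars", "toads"]

# Hypothetical editorial decisions: variant form -> preferred term.
VARIANTS = {
    "car": "Automobiles",
    "cars": "Automobiles",
    "automobile": "Automobiles",
    "frog": "Frogs",
    "frogs": "Frogs",
}


def normalize(query: str) -> str:
    """Lower-case a query and collapse stray whitespace."""
    return " ".join(query.lower().split())


def candidate_terms(log):
    """Extract terminology: count each normalized query."""
    return Counter(normalize(q) for q in log)


def control(candidates, variants):
    """Control terms: roll variants up to preferred terms, keep the rest for review."""
    controlled, unreviewed = Counter(), Counter()
    for term, count in candidates.items():
        if term in variants:
            controlled[variants[term]] += count
        else:
            unreviewed[term] += count
    return controlled, unreviewed


controlled, unreviewed = control(candidate_terms(SEARCH_LOG), VARIANTS)
print(controlled)
print(unreviewed)
```

The leftover `unreviewed` counter is where the "maintain, maintain, maintain" step starts: queries with no preferred term yet are candidates for the next revision.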

Questions?

• If not, take a break!!!

Exercise 6a

• Purpose is to explore some existing controlled vocabularies to investigate their differences and similarities, how useful they might be for subject access, and to become familiar with the structure of controlled vocabularies in general

• Spend the next 45 minutes on Exercise 6a

• Ask questions and talk!!!

• Be sure to hand in completed work at the end of class for credit!!!

Thursday

• We’ll start to look at ways to build controlled vocabularies and the rules associated with them

• Remember to read assignments BEFORE class