Post on 27-Mar-2015
Taxonomy & Metadata / Information Architecture Consulting
Amy J. Warner, Ph.D.
Metadata & Taxonomies for a More Flexible Information Architecture
Information Architecture Summit
March 16, 2002
Amy J. Warner, Ph.D.
warneramyj@yahoo.com
Outline
• What I’ll cover:
  – Metadata and IA.
  – Metadata schema.
  – Vocabulary development.
• Underlying themes:
  – Standards.
  – Reality.
  – Some IR (information retrieval) issues.
What is Metadata?
Metadata is structured data which describes the characteristics of a resource. It shares many similar characteristics to the cataloguing that takes place in libraries, museums and archives.
Chris Taylor, University of Queensland
Types & Functions of Metadata
• Administrative: metadata used in managing and administering resources.
  – Acquisition information
  – Rights and reproduction tracking
  – Documentation of legal access requirements
  – Location information
  – Version control
• Descriptive: metadata used to describe or identify information resources.
  – Cataloging records
  – Specialized indexes
  – Hyperlinked relationships between resources
  – Annotations by users
• Preservation: metadata related to the preservation of information resources.
  – Documentation of actions taken to preserve physical and digital versions of resources (e.g., data refreshing and migration)
• Technical: metadata related to how a system functions or metadata behaves.
  – Digitization information (e.g., formats, compression ratios, scaling routines)
  – Authentication and security data (e.g., encryptions, passwords)
• Use: metadata related to the level and type of use of information resources.
  – Use and user tracking
  – Content re-use and multi-versioning information
Introduction to Metadata, Getty Information Institute
Confusing Terminology
• Controlled vocabularies
– Subject Headings: traditionally employed in libraries to tag (index) the topics of books and other library materials
– Thesauri: traditionally employed in abstracting & indexing services to tag (index) the topics of journal articles and other scholarly material in a given subject area (e.g. medicine, engineering)
– Taxonomies: traditionally, the classification of organisms into mutually exclusive categories based on ranks such as phylum and species
Levels of Control
[Diagram: vocabularies and relationships arranged from simple to complex]
(Vocabularies) Synonym Rings, Authority Files, Classification Schemes, Thesauri
(Relationships) Equivalence, Hierarchical, Associative
Taxonomies
Metadata & IA
• Content: identify patterns in content.
• Users: determine how target audience(s) search for and use information.
• Business Context: determine how stakeholders want to organize & present their information.
IA ‘Generations’
• ‘Brochureware’
• Pages served from database
• Metadata-driven website
CMS
Metadata in Metadata-Driven Websites
Metadata Records
Author      Title   DocType       Audience    URL
J. Jones    xxxx    White Paper   Employees   http://...
Content
http://….
Two Parts to Generating a Metadata Schema
• Decisions about indexable parameters (attributes, aspects) of documents; this corresponds to fields in the database records.
• Decisions about the elements (terms, descriptors, subject headings, tags) that these fields contain.
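The two decisions can be sketched in Python. This is a minimal illustration, not a prescribed design: the field names follow the record shown on the earlier slide, while `ALLOWED_VALUES`, its term lists, and `uncontrolled_values` are assumptions made up for the example.

```python
from dataclasses import dataclass

# Part 1: the indexable parameters become fields of the record.
@dataclass
class MetadataRecord:
    author: str
    title: str
    doc_type: str
    audience: str
    url: str

# Part 2: the elements (terms) each controlled field may contain.
# These term lists are illustrative assumptions.
ALLOWED_VALUES = {
    "doc_type": {"White Paper", "Policy", "Newsletter"},
    "audience": {"Employees", "Managers", "Customers"},
}

def uncontrolled_values(record: MetadataRecord) -> dict[str, str]:
    """Return the fields whose value is not in that field's term list."""
    return {
        name: getattr(record, name)
        for name, allowed in ALLOWED_VALUES.items()
        if getattr(record, name) not in allowed
    }

record = MetadataRecord("J. Jones", "xxxx", "White Paper", "Employees", "http://...")
print(uncontrolled_values(record))  # {}
```

Keeping the field list and the term lists separate mirrors the split on the slide: the first changes rarely, the second grows as the vocabulary develops.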
Two Possibilities
• Content already exists
  – Identify content that exists: content inventory.
• Most or all content does not exist
  – Use ‘wish lists’ to identify desired content.
• To do a content inventory, go to those who will develop, own, and maintain the content.
Content Analysis
• Look for patterns, similarities:
  – logical: themes, sensitivity, specialization.
  – physical: formats, dynamic vs. static (dated vs. rarely updated).
• Look for relationships: note connections between content (parent-child, sibling, dependencies).
• Begin to create groupings.
Generating a Metadata Table
• The beginning of a metadata-driven website.
• Determine the major indexable parameters or attributes for each major document type in your sample.
• Determine what major types of rules or general guidelines your indexing system will follow for each attribute.
• Create an X-by-Y table.
• Put the indexable attributes on one axis and the rules on the other.
• Fill in the decisions you make about each rule application in the individual cells of the table.
Metadata Table
Attribute   Required   Repeatable   Auto/Manual   Whole doc/Concepts   CV
Author      Yes        Yes          Manual        Whole Doc.           No
Title       Yes        No           Manual        Whole Doc.           No
DocType     No         Yes          Manual        Whole Doc.           DocTypes List
Subject     Yes        Yes          Semi-Auto     Concepts             Subjects Vocabulary
Audience    No         No           Manual        Whole Doc.           Audience List
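One way to put such a table to work is to encode its decisions as data and check records against them. The sketch below is an assumption about how that might look, not part of the talk: the Required, Repeatable, and CV values come from the table, while the contents of `VOCABULARIES` and the `validate` function are hypothetical.

```python
# Each attribute maps to the decisions recorded in the metadata table.
# "cv" names the controlled vocabulary for the attribute, or None.
METADATA_TABLE = {
    "Author":   {"required": True,  "repeatable": True,  "cv": None},
    "Title":    {"required": True,  "repeatable": False, "cv": None},
    "DocType":  {"required": False, "repeatable": True,  "cv": "DocTypes List"},
    "Subject":  {"required": True,  "repeatable": True,  "cv": "Subjects Vocabulary"},
    "Audience": {"required": False, "repeatable": False, "cv": "Audience List"},
}

# Hypothetical vocabulary contents, for illustration only.
VOCABULARIES = {
    "DocTypes List": {"White Paper", "Report"},
    "Subjects Vocabulary": {"Metadata", "Taxonomies"},
    "Audience List": {"Employees", "Customers"},
}

def validate(record: dict[str, list[str]]) -> list[str]:
    """Check a record (attribute -> list of values) against the table."""
    problems = []
    for attribute, rules in METADATA_TABLE.items():
        values = record.get(attribute, [])
        if rules["required"] and not values:
            problems.append(f"{attribute}: required but missing")
        if not rules["repeatable"] and len(values) > 1:
            problems.append(f"{attribute}: not repeatable")
        if rules["cv"] is not None:
            allowed = VOCABULARIES[rules["cv"]]
            for value in values:
                if value not in allowed:
                    problems.append(f"{attribute}: '{value}' not in {rules['cv']}")
    return problems
```

A record with two titles, no subject, and an audience outside the list would come back with three problems, one per broken rule.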
User and Stakeholder Involvement
• When organizing content, start with the content, generate the metadata, and then evaluate with users and stakeholders.
• When organizing entities (e.g., products, projects) where content is not the major focus, start with stakeholders and users to determine metadata.
Identify Terms
• Published Reference Materials
  – Thesauri, classification schemes, encyclopedias, dictionaries, glossaries, indexes.
• Content
  – Representative sample of web site / intranet.
• Users
  – Search log analysis, surveys, interviews.
• Experts
  – Authors, subject experts.
Organize Terms
Synonym Rings
• Define preferred terms.
• Link synonyms and variants.
Taxonomies / Hierarchies
• Group preferred terms by subject.
• Identify broader and narrower terms.
Thesauri
• Identify related terms.
Variant Terms
Variant terms provide the user with entry points into the vocabulary.
• Synonyms (same meaning):
  – cats USE felines
  – helicopters USE whirlybirds
• Lexical Variants (different word forms):
  – paediatrics USE pediatrics
  – BK USE Burger King
• Quasi-Synonyms (treated as equivalent):
  – generic posting: beagle USE dog
  – antonyms/continuum: wetness USE dryness
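An entry vocabulary of this kind can be sketched as a simple lookup from variant to preferred term. The USE pairs below are the ones on the slide; the function names and the case-folding rule are assumptions for illustration.

```python
# Variant -> preferred term, using the USE pairs from the slide.
USE = {
    "cats": "felines",
    "helicopters": "whirlybirds",
    "paediatrics": "pediatrics",
    "bk": "Burger King",
    "beagle": "dog",
    "wetness": "dryness",
}

def preferred_term(entry: str) -> str:
    """Map a user's entry term to the preferred term; unknown terms pass through."""
    cleaned = entry.strip()
    return USE.get(cleaned.lower(), cleaned)

def used_for(preferred: str) -> list[str]:
    """Reciprocal UF listing: the variants that lead to a preferred term."""
    return sorted(variant for variant, target in USE.items() if target == preferred)

print(preferred_term("Paediatrics"))  # pediatrics
print(used_for("dog"))                # ['beagle']
```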
Term Specificity
Assuming a good entry vocabulary, increased term specificity allows for improved precision without hurting recall (but costs grow fast).
Vocabulary A
  United States
Vocabulary B
  United States
    California
      San Diego
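A short sketch shows why the more specific vocabulary need not cost recall, assuming the search system can expand a query term to its narrower terms. The three documents in `INDEX` are hypothetical, and expansion is an assumption about the retrieval system rather than something the slide specifies.

```python
# Vocabulary B from the slide, as narrower-term (NT) links.
NARROWER = {
    "United States": ["California"],
    "California": ["San Diego"],
    "San Diego": [],
}

# Hypothetical documents, each indexed with its most specific term.
INDEX = {
    "doc1": "United States",
    "doc2": "California",
    "doc3": "San Diego",
}

def expand(term: str) -> set[str]:
    """Return the term plus all of its narrower terms, recursively."""
    terms = {term}
    for child in NARROWER.get(term, []):
        terms |= expand(child)
    return terms

def search(term: str) -> list[str]:
    """Retrieve documents indexed with the term or any narrower term."""
    wanted = expand(term)
    return sorted(doc for doc, tag in INDEX.items() if tag in wanted)

print(search("San Diego"))      # ['doc3']
print(search("United States"))  # ['doc1', 'doc2', 'doc3']
```

The specific query returns only the San Diego document, while the broad query still reaches everything beneath it.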
Compound Terms
Article Title: “Software for Information Architects”
[Diagram: scale running from High Precision (one term) to High Recall (three terms)]
One Term:    Information Architecture Software
Two Terms:   Information Architecture; Software
Three Terms: Architecture; Information; Software
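The trade-off can be illustrated with a toy index. This is a sketch under stated assumptions: the second document title and both index layouts are invented for the example, and matching is a plain "all query terms present" test.

```python
# Hypothetical index entries for two documents under each approach.
ONE_TERM = {
    "Software for Information Architects": {"Information Architecture Software"},
    "Information on Software Architecture": {"Software Architecture Information"},
}
THREE_TERMS = {
    "Software for Information Architects": {"Information", "Architecture", "Software"},
    "Information on Software Architecture": {"Information", "Architecture", "Software"},
}

def search(index: dict[str, set[str]], query: set[str]) -> list[str]:
    """Return titles whose index terms include every query term."""
    return sorted(title for title, terms in index.items() if query <= terms)

# The compound term separates the two documents (precision).
print(search(ONE_TERM, {"Information Architecture Software"}))
# The single terms match both documents, and also match a one-word query (recall).
print(search(THREE_TERMS, {"Information", "Architecture", "Software"}))
print(search(THREE_TERMS, {"Architecture"}))
```

With three separate terms the two documents become indistinguishable, which is the precision cost the scale on the slide describes.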
Facets
Facets of a Topic: Things (entities), Concepts, Processes, People, Organizations, Occupations, etc.
Facets of Documents: Topic, Audience, Intellectual Level, Form, Type, Language, Date, etc.
Aspects of Documents to Index
Controlled Vocabular(ies)
Facet Analysis
• Facets come from content inventory, intuition, and users.
• Break domain into logical categories or chunks based on how documents need to be managed (both for system and for search).
Polyhierarchy
• Strict Hierarchies
  – Each term appears in only one place in the hierarchy.
  – Essential for placement of physical objects.
• Polyhierarchies
  – Terms cross-listed in multiple categories.
  – Accepts complex nature of reality.
Polyhierarchy
• Compound terms needed to manage 6 million documents in Medline.
• High level of pre-coordination forces polyhierarchy.
• Terms may have more than one BT.
Medical Subject Headings (MeSH)
Diseases
  Virus Diseases
    Viral Pneumonia
  Respiratory Tract Diseases
    Viral Pneumonia
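A polyhierarchy is easy to represent if each term stores a list of broader terms rather than a single parent. The sketch below uses the BT links as read from the slide's MeSH diagram (a simplification of the real MeSH tree); the function names are illustrative.

```python
# Broader-term (BT) links as read from the MeSH example on the slide.
BROADER = {
    "Viral Pneumonia": ["Virus Diseases", "Respiratory Tract Diseases"],
    "Virus Diseases": ["Diseases"],
    "Respiratory Tract Diseases": ["Diseases"],
    "Diseases": [],
}

def ancestors(term: str) -> set[str]:
    """Collect every broader term reachable from a term."""
    found: set[str] = set()
    for parent in BROADER.get(term, []):
        found.add(parent)
        found |= ancestors(parent)
    return found

def paths_to_root(term: str) -> list[list[str]]:
    """List each browse path from a top term down to the given term."""
    parents = BROADER.get(term, [])
    if not parents:
        return [[term]]
    return [path + [term] for parent in parents for path in paths_to_root(parent)]

for path in paths_to_root("Viral Pneumonia"):
    print(" > ".join(path))
```

The term is stored once but appears on two browse paths, which is what distinguishes a polyhierarchy from duplicating the term in a strict hierarchy.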
Facets, Coordination, Specificity
Entities: Apples, Pears, Peaches
Processes: Canning, Freezing, Drying
Forms: Canned, Frozen, Fresh
Partial List of Potential Combinations:
Apples, Pears, Peaches, Canning, Freezing, Drying, Canned, Frozen, Fresh,
Canning of Apples, Canning of Pears, Canning of Peaches,
Freezing of Apples, Freezing of Pears, Freezing of Peaches,
Drying of Apples, Drying of Pears, Drying of Peaches,
Canned Apples, Canned Pears, Canned Peaches,
Frozen Apples, Frozen Pears, Frozen Peaches,
Fresh Apples, Fresh Pears, Fresh Peaches,
Freezing of Canned Apples, Canning of Dried Pears, Drying of Fresh Peaches
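The growth in compound terms can be counted directly. The sketch below takes the three facets from the slide and builds the pre-coordinated combinations with `itertools.product`; the three combination patterns are an assumption about which compounds a vocabulary might admit.

```python
from itertools import product

ENTITIES = ["Apples", "Pears", "Peaches"]
PROCESSES = ["Canning", "Freezing", "Drying"]
FORMS = ["Canned", "Frozen", "Fresh"]

# Post-coordination: index with single facet terms, combine at search time.
facet_terms = ENTITIES + PROCESSES + FORMS

# Pre-coordination: every combination becomes its own compound term.
process_of_entity = [f"{p} of {e}" for p, e in product(PROCESSES, ENTITIES)]
form_entity = [f"{f} {e}" for f, e in product(FORMS, ENTITIES)]
process_of_form_entity = [
    f"{p} of {f} {e}" for p, f, e in product(PROCESSES, FORMS, ENTITIES)
]
compound_terms = process_of_entity + form_entity + process_of_form_entity

print(len(facet_terms), len(compound_terms))  # 9 45
```

Nine facet terms cover the same ground as forty-five compounds, and each added facet value widens the gap.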
Semantic Relationships
• Equivalence:
  – Use/Used For (USE/UF)
  – Leads from variants to preferred
    e.g., prams: USE baby carriages
A = B
Semantic Relationships
• Hierarchical:
  – Broader Term/Narrower Term (BT/NT)
• Types
  – Generic (class/species, inheritance)
Vertebrata NT Amphibia
– Whole-Part (associative unless exclusive)
Ear NT Vestibular Apparatus
– Instance (proper name)
Seas NT Mediterranean Sea
AB
Semantic Relationships
• Associative:
  – Related Term (RT, See Also)
  – Non-hierarchical and non-equivalent
  – Relation should be “strongly implied”
e.g., hammers RT nails
A B
Associative Relationships
• Field of Study and Object of Study:
  – Forestry RT Forests
• Process and its Agent:
  – Temperature Control RT Thermostat
• Concepts and their Properties:
  – Poisons RT Toxicity
• Action and Product of Action:
  – Weaving RT Cloth
• Concepts Linked by Causal Dependence:
  – Bereavement RT Death
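The three relationship types can be held in one small structure that keeps each link's reciprocal up to date. This is a minimal sketch: the `Thesaurus` class and its method names are hypothetical, and the example terms are taken from the slides.

```python
from collections import defaultdict

class Thesaurus:
    """Stores USE/UF, BT/NT and RT links and keeps reciprocals in step."""

    def __init__(self) -> None:
        self.use: dict[str, str] = {}      # variant -> preferred term
        self.used_for = defaultdict(set)   # preferred term -> variants
        self.broader = defaultdict(set)    # term -> broader terms
        self.narrower = defaultdict(set)   # term -> narrower terms
        self.related = defaultdict(set)    # term -> related terms

    def add_use(self, variant: str, preferred: str) -> None:
        self.use[variant] = preferred
        self.used_for[preferred].add(variant)

    def add_nt(self, broader: str, narrower: str) -> None:
        self.narrower[broader].add(narrower)
        self.broader[narrower].add(broader)

    def add_rt(self, a: str, b: str) -> None:
        # RT is symmetric, so record it in both directions.
        self.related[a].add(b)
        self.related[b].add(a)

t = Thesaurus()
t.add_use("prams", "baby carriages")
t.add_nt("Vertebrata", "Amphibia")
t.add_rt("hammers", "nails")
t.add_rt("Forestry", "Forests")
print(t.broader["Amphibia"], t.related["nails"])
```

Writing both directions in one call avoids the common maintenance error of a BT without its NT, or a one-way RT.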
Leveraging the Thesaurus
• User Interface:
  – Generate browsable indexes (site-wide, sub-site, specialized authority lists).
  – Enable Field-Specific Searching (filters, zones, sorting).
  – Support personalization (map profile to vocabulary).
• Behind the Scenes:
  – Enable efficient content management.
  – Support decentralized tagging.
Uses of Metadata-Driven Website
• Routing
• Search
• Navigation
Routing
Document Stream → Metadata Filter → Document Subset
• Document Stream: from individual contributors or syndication service.
• Metadata Filter: profile or filter.
Generalizations about Routing
• Can be ‘push’ or ‘pull’.
• Can be driven by various metadata elements (e.g., audience, topic, etc.).
• May have both internal and external metadata schemes to consider; mapping may be an important issue.
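Routing by metadata amounts to filtering a document stream against a profile. The following is a minimal sketch: the documents, the metadata element names, and the `route` function are all hypothetical, and a real system would also need the scheme mapping mentioned above.

```python
from typing import Iterable, Iterator

# Hypothetical document stream; each item carries a metadata record.
STREAM = [
    {"title": "Benefits update", "audience": "Employees", "topic": "HR"},
    {"title": "Pricing sheet", "audience": "Customers", "topic": "Sales"},
    {"title": "Travel policy", "audience": "Employees", "topic": "Finance"},
]

def route(stream: Iterable[dict], profile: dict[str, set[str]]) -> Iterator[dict]:
    """Yield documents whose metadata matches every element in the profile."""
    for doc in stream:
        if all(doc.get(element) in accepted for element, accepted in profile.items()):
            yield doc

profile = {"audience": {"Employees"}, "topic": {"HR", "Finance"}}
subset = [doc["title"] for doc in route(STREAM, profile)]
print(subset)  # ['Benefits update', 'Travel policy']
```

The same function serves ‘push’ (run it as documents arrive) and ‘pull’ (run it when the user asks), since only the trigger differs.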
Searching
User Query → Databases → Document Subset
• Databases: Metadata Records (http://….)
Epicurious.com
Epicurious, First Facet
Browse > Picnics
Epicurious.com Facets
• Main Ingredients: Beans, Beef, Berries, Cheese, Chocolate, Citrus, Dairy, Eggs, Fish, Fruits, Garlic, Ginger, Grains, Greens, Herbs, Lamb, Mushrooms, Mustard, Nuts, Olives, Onions, Pasta, Peppers, Pork, Potatoes, Poultry, Rice, Shellfish, Tomatoes, Vegetables
• Cuisine: African, American, Asian, Caribbean, Eastern European, French, Greek, Indian, Italian, Jewish, Mediterranean, Mexican, Middle Eastern, Scandinavian, Spanish
• Preparation Method: Advance, Bake, Broil, Fry, Grill, Marinade, Microwave, No Cook, Poach, Quick, Roast, Sauté, Slow Cook, Steam, Stir Fry
• Season/Occasion: Christmas, Easter, Fall, Fourth of July, Hanukkah, New Years, Picnics, Spring, Summer, Superbowl, Thanksgiving, Valentine's Day, Winter
• Course/Dish: Appetizers, Bread, Breakfast, Brunch, Condiments, Cookies, Desserts, Hors D'oeuvres, Main Dish, Salads, Sandwiches, Sauces, Side Dish, Snacks, Soup, Vegetables
Epicurious, Second Facet
Browse > Picnics > Poultry
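Browsing by successive facets can be sketched as repeated filtering, one facet selection per step of the breadcrumb trail. The recipes and the `browse` function below are hypothetical; only the facet names and values come from the Epicurious lists, and nothing here reflects how the site itself is built.

```python
# Hypothetical recipes tagged with values from the Epicurious facets.
RECIPES = [
    {"name": "Fried chicken", "Season/Occasion": {"Picnics", "Summer"},
     "Main Ingredients": {"Poultry"}},
    {"name": "Potato salad", "Season/Occasion": {"Picnics"},
     "Main Ingredients": {"Potatoes", "Eggs"}},
    {"name": "Roast turkey", "Season/Occasion": {"Thanksgiving"},
     "Main Ingredients": {"Poultry"}},
]

def browse(recipes: list[dict], selections: list[tuple[str, str]]) -> list[str]:
    """Apply each (facet, value) selection in turn, like a breadcrumb trail."""
    for facet, value in selections:
        recipes = [r for r in recipes if value in r.get(facet, set())]
    return [r["name"] for r in recipes]

# Browse > Picnics
print(browse(RECIPES, [("Season/Occasion", "Picnics")]))
# Browse > Picnics > Poultry
print(browse(RECIPES, [("Season/Occasion", "Picnics"), ("Main Ingredients", "Poultry")]))
```

Because each selection is independent, the user could equally start from Poultry and narrow to Picnics and reach the same result.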
Integration of Search and Browse
Amazon.com Advanced Search
Generalizations about Search & Navigation
• The relationship between the metadata and search engine capabilities is crucial.
• Controlled vocabulary and keyword searching are often both enabled.
• Navigation and search are often both provided as complements to each other.
Contact:
Amy J. Warner, Ph.D.
warneramyj@yahoo.com
Questions??