Taxonomy and Knowledge Organization
Taxonomy in Context
Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
2
Agenda
Introduction: Time for Taxonomies
Taxonomy Types: Strengths and Weaknesses
– Formal and Browse
Taxonomy in the Organization: Intellectual Infrastructure
– Content, People, Activities
Taxonomy Tips and Techniques
– Development Stages
– Issues and Ideas
Future Directions
– Building on the Intellectual Infrastructure
3
KAPS Group
Knowledge Architecture Professional Services (KAPS)
Consulting, strategy recommendations
Knowledge architecture audits
Partners – Convera and others
– First Convera Certified Taxonomy Developers
Taxonomies: Enterprise, Marketing, Insurance, etc.
– Taxonomy customization
Intellectual infrastructure for organizations
– Knowledge organization, technology, people and processes
– Search, content management, portals, collaboration, knowledge management, e-learning, etc.
4
Time for Taxonomies
Taxonomy Time: Technology is not delivering
– Professionals spend more time looking for information than using it
– 50% of them spend > 2 hours a day looking
Search not enough – text strings vs. concepts
– Relevance isn’t very relevant
Data mining misses 80% of significant content
– Text mining needs more structure (taxonomies)
Surveys
– 76% say taxonomies are important
– 90% plan on a taxonomy strategy in 24 months
5
Time for Taxonomies: Word of Caution
Taxonomy is not the answer
– Is this a taxonomy?
• Inventories, catalogs, classifications, categorization schemas, thesauri, controlled vocabularies
– Taxonomy not enough – need other structures
• Metadata, facets
– Taxonomies have to be used to be useful
How to fail:
– Taxonomy as a project
– Taxonomy as a search engine project afterthought
6
Two Types of Taxonomies: Browse and Formal
Browse Taxonomy – Yahoo
7
Browse Taxonomies: Strengths and Weaknesses
Strengths: Browse is better than search
– Context and discovery
– Browse by task, type, etc.
Weaknesses:
– Mix of organization
• Catalogs, alphabetical listings, inventories
• Subject matter, functional, publisher, document type
– Vocabulary and nomenclature issues
– Problems with maintenance, new material
– Poor granularity and little relationship between parts
• Web site unit of organization
– No foundation for standards
8
Formal Taxonomies: Strengths and Weaknesses
Strengths:
– Fixed Resource – little or no maintenance
– Communication Platform – share ideas, standards
– Infrastructure Resource
• Controlled vocabulary and keywords
• More depth, finer granularity
Weaknesses:
– Difficult to develop and customize
– Don’t reflect users’ perspectives
• Users have to adapt to language
9
Dynamic Classification: Best of Both Worlds
Search and browse better than either alone
– Categorized search – context
– Browse as an advanced search
Dynamic search and browse is best
– Can’t predict all the ways people think
• Advanced cognitive differences
• Panda, Monkey, Banana
– Can’t predict all the questions and activities
• Intersections of what users are looking for and what documents are often about
• China and Biotech
• Economics and Regulatory
Facet Taxonomies
– Actors, events, functions, geography
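The facet intersection idea (for example, China and Biotech) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the documents, the facet names `geography` and `subject`, and the helpers `browse` and `facet_counts` are all hypothetical and are not taken from any product named in this deck.

```python
from collections import Counter

# Hypothetical documents tagged along two facets (all values are illustrative).
DOCS = [
    {"id": 1, "geography": "China", "subject": "Biotech"},
    {"id": 2, "geography": "China", "subject": "Economics"},
    {"id": 3, "geography": "India", "subject": "Biotech"},
    {"id": 4, "geography": "China", "subject": "Biotech"},
]


def browse(docs, **selected):
    """Return the documents that match every selected facet value."""
    return [d for d in docs if all(d.get(f) == v for f, v in selected.items())]


def facet_counts(docs, facet):
    """Count the values of one facet, to offer as the next browse step."""
    return Counter(d[facet] for d in docs if facet in d)


# Narrow by one facet, then show what the other facet still offers.
china = browse(DOCS, geography="China")
print(facet_counts(china, "subject"))

# The intersection of two facets is what the user was actually after.
print([d["id"] for d in browse(DOCS, geography="China", subject="Biotech")])
```

Because each facet is filtered independently, no one has to predict in advance which combinations users will ask for.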
10
Taxonomy in Context: Intellectual Infrastructure
3 infrastructures: technology, organizational, intellectual
– Technology – systems and applications, servers and desktops, programmers and help desks, etc.
– Organizational – business units and project groups, policies and procedures, administrators and facilitators
– Intellectual – information and knowledge, vocabularies and applications, authors and editors and librarians
Taxonomy at the nexus of the three infrastructures
Taxonomy enables communication among people, content, and technology
11
Taxonomy in the Organization: Project Approach or Infrastructure Approach
Situation: Problem with access to information
– Project Approach
• Publish everything on the intranet
• Buy a search engine
• Do some keyword and usability tests
• Buy a portal (or two)
• Buy content management software
• Try knowledge organization – taxonomy?
– Infrastructure Approach
• “The path up and down is one and the same.” (Heraclitus)
12
Taxonomy in the Organization: Why an Infrastructure Approach?
Immanuel Kant
– “Concepts without percepts are empty.”
– “Percepts without concepts are blind.”
Knowledge Management (KM) / Information Projects
– KM without applications is empty
• Strategy only, management fad
• Elegant taxonomies – unused
Applications without knowledge architecture (KA) are blind
– IT based KM
– Fragmented applications
13
Taxonomy in the Organization: Structuring Content
All kinds of content
– Structured and unstructured, Internet and desktop
Metadata standards – Dublin Core+
– Keywords – poor performance
– Need controlled vocabulary, taxonomies, semantic network
Document Type
– Form, policy, how-to, etc.
– Dynamic classification with subject matter taxonomies
Audience
– Role, function, expertise, information behaviors
– Consistent across subject matter and people
Best bets metadata
14
Taxonomy in the Organization: Structuring People
Individual People
– Tacit knowledge, information behaviors
– Advanced personalization – category priority
• Sales – forms ---- New Account Form
• Accountant ---- New Accounts ---- Forms
Communities
– Variety of types – map of formal and informal
– Variety of subject matter – vaccines, research, scuba
– Variety of communication channels and information behaviors
– Community-specific vocabularies, need for inter-community communication (Cortical organization model)
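One way to picture "category priority" is as a per-role ordering of the same facets. The sketch below assumes a hypothetical `ROLE_PRIORITY` table and a single sample document. The facet names `doc_type` and `topic` are illustrative choices, not part of the original slide.

```python
# Hypothetical facet order per role (names and values are illustrative).
ROLE_PRIORITY = {
    "sales": ["doc_type", "topic"],       # Sales browse by document type first
    "accountant": ["topic", "doc_type"],  # Accountants browse by topic first
}

DOC = {"title": "New Account Form", "doc_type": "Forms", "topic": "New Accounts"}


def browse_path(doc, role):
    """Build the browse path to a document, ordered by the role's facet priority."""
    return [doc[facet] for facet in ROLE_PRIORITY[role]] + [doc["title"]]


print(" > ".join(browse_path(DOC, "sales")))
print(" > ".join(browse_path(DOC, "accountant")))
```

The document and its metadata stay the same. Only the order in which categories are presented changes with the audience.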
15
Taxonomy in the Organization: Structuring Processes and Technology
Technology: infrastructure and applications
– Enterprise platforms: from creation to retrieval to application
– Taxonomy as the computer network
• Applications – integrated meaning, not just data
Creation – content management, innovation, communities of practice (CoPs)
– When, who, how, and how much structure to add
– Workflow with meaning, distributed subject matter experts (SMEs) and centralized teams
Retrieval – standalone and embedded in applications and business processes
– Portals, collaboration, text mining, business intelligence, CRM
16
Taxonomy in the Organization: The Integrating Infrastructure
Starting point: knowledge architecture audit, K-Map
– Social network analysis, information behaviors
People – knowledge architecture team
– Infrastructure activities – taxonomies, analytics, best bets
– Facilitation – knowledge transfer, partner with SMEs
“Taxonomies” of content, people, and activities
– Dynamic Dimension – complexity not chaos
– Analytics based on concepts, information behaviors
Taxonomy is the answer
– In an Infrastructure Context
17
Taxonomy Development: Tips and Techniques
Stage One – How to Begin
Step One: Strategic Questions – why, what value from the taxonomy, how are you going to use it
– Variety of taxonomies – important to know the differences, when to use what
Step Two: Get a good taxonomist! (or learn)
– Library Science + Cognitive Science + Cognitive Anthropology
Step Three: Software Shopping
– Automatic Software – Fun Diversion for a rainy day
• Uneven hierarchy, strange node names, weird clusters
– Taxonomy Management, Entity Extraction, Visualization
Step Four: Get a good taxonomy!
– Glossary, Index, Pull from multiple sources
– Get a good document collection
18
Taxonomy Development: Tips and Techniques
Stage Two: Development and/or Customization
Combination of top down and bottom up (and Essences)
– Top: Design an ontology, facet selection
– Bottom: Vocabulary extraction – documents, search logs, interview authors and users
– Develop essential examples (Prototypes)
• Most Intuitive Level – genus (oak, maple, rabbit)
• Quintessential Chair – all the essential characteristics, no more
– Work toward the prototype and out and up and down
– Repeat until dizzy or done
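The bottom-up step (vocabulary extraction from search logs) can be approximated with simple term counting. This is a minimal sketch, assuming a hypothetical list of logged queries and a small illustrative stopword list. A real project would also use documents and interviews, as noted above.

```python
import re
from collections import Counter

# Hypothetical search-log queries (illustrative only).
QUERIES = [
    "new account form",
    "account form pdf",
    "travel policy",
    "new account setup",
]

# Illustrative stopword list; a real one would be tuned to the collection.
STOPWORDS = {"the", "a", "of", "for", "pdf"}


def candidate_terms(queries, min_count=2):
    """Count single words and two-word phrases; keep those seen at least min_count times.

    Phrases are built after stopwords are dropped, so they may join words
    that were not strictly adjacent in the original query.
    """
    counts = Counter()
    for query in queries:
        words = [w for w in re.findall(r"[a-z]+", query.lower()) if w not in STOPWORDS]
        counts.update(words)
        counts.update(" ".join(pair) for pair in zip(words, words[1:]))
    return {term: n for term, n in counts.items() if n >= min_count}


print(candidate_terms(QUERIES))
```

The result is a list of candidate terms for a taxonomist to review, not a finished vocabulary.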
19
Taxonomy Development: Tips and Techniques
Stage Three: Evaluate and Refine
Formal Evaluation
– Quality of corpus – size, homogeneity, representative
– Breadth of coverage – main ideas, outlier ideas (see next)
– Structure – balance of depth and width
– Kill the verbs
– Evaluate speciation steps – understandable and systematic
• Person – Unwelcome person – Unpleasant person – Selfish person
– Avoid binary levels, duplication of contrasts
– Primary and secondary education, public and private
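Some of the structural checks above (depth, binary levels, duplicated contrasts) lend themselves to a quick automated pass. The sketch below assumes the taxonomy is held as nested dictionaries. That representation and the function names `walk`, `depth`, and `structure_report` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical taxonomy as nested dicts: node label -> children (illustrative only).
TAXONOMY = {
    "Education": {
        "Primary": {"Public": {}, "Private": {}},
        "Secondary": {"Public": {}, "Private": {}},
    }
}


def walk(tree, path=()):
    """Yield (path, children) for every node in the tree, parents first."""
    for label, children in tree.items():
        here = path + (label,)
        yield here, children
        yield from walk(children, here)


def depth(tree):
    """Number of levels in the tree (an empty dict has depth 0)."""
    return 1 + max(depth(children) for children in tree.values()) if tree else 0


def structure_report(tree):
    """Flag nodes with exactly two children and child sets repeated under several parents."""
    binary, groups = [], defaultdict(list)
    for path, children in walk(tree):
        name = " > ".join(path)
        if len(children) == 2:
            binary.append(name)
        if children:
            groups[frozenset(children)].append(name)
    repeated = {tuple(sorted(k)): v for k, v in groups.items() if len(v) > 1}
    return {"depth": depth(tree), "binary_nodes": binary, "repeated_contrasts": repeated}


print(structure_report(TAXONOMY))
```

A report like this only points at places to look. Whether a binary split or a repeated contrast is actually a problem remains an editorial judgment.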
20
Taxonomy Development: Tips and Techniques
Stage Three: Evaluate and Refine
Practical Evaluation
– Test in real life application
– Select representative users and documents
– Test node labels with Subject Matter Experts
• Balance of making sense and jargon
– Test with representative key concepts
– Test for un-representative strange little concepts that only mean something to a few people but the people and ideas are key and are normally impossible to find
21
Taxonomy Development: Tips and Techniques
Issues and Ideas
Complex Topics – intersection of subject domains and facets
– What documents are often about is the intersection
– Example – China and Biotech
Standards and Customization
– Balance of corporate communication and departmental specifics
– At what level are differences represented?
– Customize pre-defined taxonomy – additional structure, add synonyms and acronyms and vocabulary
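Adding synonyms and acronyms to a pre-defined taxonomy often comes down to mapping variant labels onto one preferred term. The following is a minimal sketch of that idea. The `PREFERRED` table and its entries are hypothetical examples, not content from any actual taxonomy.

```python
# Hypothetical preferred terms with their synonyms and acronyms (illustrative only).
PREFERRED = {
    "Customer Relationship Management": ["CRM", "customer management"],
    "Subject Matter Expert": ["SME", "domain expert"],
}


def build_lookup(preferred):
    """Index every preferred term and variant, lowercased, to its preferred term."""
    lookup = {}
    for term, variants in preferred.items():
        for label in [term, *variants]:
            lookup[label.lower()] = term
    return lookup


LOOKUP = build_lookup(PREFERRED)


def normalize(tag):
    """Map a tag to its preferred term; unknown tags pass through unchanged."""
    return LOOKUP.get(tag.strip().lower(), tag)


print(normalize("crm"))
print(normalize("taxonomy"))
```

Departments can keep their own labels in the variant lists while the shared preferred term carries corporate-level communication.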
22
Taxonomy Development: Tips and Techniques
Issues and Ideas
Enterprise Taxonomy
– No single subject matter taxonomy
– Need an ontology of facets or domains
Enterprise Facet Model:
– Actors, Events, Functions, Locations, Objects, Information Resources
– Combine and map to subject domains
23
Future Directions: Knowledge Organization
New analytic methods
– Cognitive anthropology, history of ideas, ESNA
New metadata schemas
– SCORM, RDF and semantic Web
– Learning and knowledge objects
New people models
– Bloom’s Taxonomy, Gardner’s 7 Intelligences
Advanced personalization
– Community-based, cognitive-based
– Adaptive, dynamic presentation variations
24
Future Directions: Technology
Taxonomies within applications
– Richer world knowledge and better learning
Entity extraction and fact extraction
Natural language processing (NLP) search – answers, not document lists
Integrated KM platform
– Creation, structure, retrieval, application, measurement
– Integrated KM/KA team
– Contextualizing content: related content, best bets, expertise, communities
25
Future Directions: Well-Articulated Organization
Learning takes place throughout the system
– Smart applications – adapt to users’ and community’s activities
– Just-in-time training and performance support
Combination of analytics and knowledge organization
– Concept-level, not document-level
– Taxonomy is the brain, analytics are the eyes
Self-knowledge – highest form of knowledge
– “Unexamined life is not worth living.” (Plato)
– Unexamined, inarticulate enterprise is not worth having
26
The Contextual Desktop: Document, List of Documents, Applications Screen
Before you view:
– Agent keeps you up to date
– Your connections to content and communities, your preferences
– Your history and the history of other members of your communities
When you add/change content
– Suggests categorization value, metadata values
– Routes to appropriate content and communities
– Prompt on unusual connections
• Pre-existing content
• Related content
• Regulatory issues
• Ask the question – route to experts?
When you look for information
– Taxonomy-based dynamic browse
– Entities
• People, companies, wells
– Related content
• Regulatory, patents, BI-CI
• Geological data
• News stories
– Dictionaries, USGS data, databases
– Experts
• Ask questions, chat
When you use information
– Communities
• Search, chat, email
– Performance aids, classes
– Stories
27
Sources
Books
– Women, Fire, and Dangerous Things
• What Categories Reveal about the Mind
• George Lakoff
– The Geography of Thought
• Richard E. Nisbett
Software
– Convera Retrievalware
– Inxight Smart Discovery – entity and fact extraction
Courses
– Convera Taxonomy Certification
Questions?
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com