Dynamic Classification Workshop Claude Vogel Roadmap & Quality Metrics.


Transcript of Dynamic Classification Workshop Claude Vogel Roadmap & Quality Metrics.


Dynamic Classification Workshop

Claude Vogel

Roadmap & Quality Metrics


Outline = Roadmap

• Definitions
• Step by step
    • Phase 1
        • Taxonomy design [QA]
        • Implementation & Tests
            • Lexicon extraction [QA]
            • Meta data generation [QA]
    • Phase 2
        • Classification design [QA]
        • Implementation & Tests
            • Portal generation [QA]
• Conclusion


Your Problem

• Hit lists are inefficient

• Information is unstructured

• Information structure is irrelevant


Define “Find”

• I’m looking for an “APARTMENT in CARLSBAD”

• I end up with a STUDIO in OCEANSIDE

• “Find” is a result, not a starting point

• Find is not a search + retrieval system

• Find is a dynamic process

[Diagram: the search starts at Apartment in Carlsbad and ends at Studio in Oceanside]


Relate available information to OUR decision-making processes


Dynamic Classification

Rationale: Associate a semantic signature with structured and unstructured sources, then use this semantic representation to slice n’ dice the sources.

• Example 1: Endeca
    • Meta-data index
    • Parametric classification

• Example 2: Convera
    • Taxonomic index
    • Topical classification
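As a minimal sketch of the rationale, assume each source carries a semantic signature modeled as a dictionary from facet names to category labels. Slicing keeps the sources that match every requested facet value; dicing counts the values of one facet inside that slice. The `Doc` class, `slice_docs`, `facet_counts`, and the facet names are hypothetical illustrations, not the Endeca or Convera APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """A source with a hypothetical semantic signature: facet -> set of category labels."""
    doc_id: str
    signature: dict[str, set[str]] = field(default_factory=dict)


def slice_docs(docs: list[Doc], **constraints: str) -> list[Doc]:
    """Keep documents whose signature contains every requested facet value."""
    return [
        d for d in docs
        if all(value in d.signature.get(facet, set()) for facet, value in constraints.items())
    ]


def facet_counts(docs: list[Doc], facet: str) -> dict[str, int]:
    """Count how many documents carry each value of one facet (the 'dice' view)."""
    counts: dict[str, int] = {}
    for d in docs:
        for value in d.signature.get(facet, set()):
            counts[value] = counts.get(value, 0) + 1
    return counts


# Hypothetical sources echoing the apartment/studio example.
docs = [
    Doc("d1", {"geography": {"Carlsbad"}, "housing": {"Apartment"}}),
    Doc("d2", {"geography": {"Oceanside"}, "housing": {"Studio"}}),
    Doc("d3", {"geography": {"Oceanside"}, "housing": {"Apartment"}}),
]
apartments = slice_docs(docs, housing="Apartment")
print([d.doc_id for d in apartments], facet_counts(apartments, "geography"))
```

The point of the design is that the signature, not the source's own structure, drives navigation: any facet can be used to slice, and any other facet to dice the result.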


Reduce Complexity

Domestic Sales and Marketing?

Jobs and Marketing?


Categorize…

Bonus

Domestic Sales

Marketing

Jobs


…And Classify!

Bonus

Domestic Sales

Marketing

Jobs

Domestic Sales and Marketing?


…And Classify Again!

Jobs and Marketing?

Bonus

Domestic Sales

Marketing

Jobs


Leverage K-Assets

TAGS


Categories = Essential Knowledge

Africa

Somalia

“A reasonably stable definition of the basic components of the world”

Genus to species

Munitions

Bombs


Classification = Accidental Knowledge

Africa

Missiles

Missiles

Africa

“A relevant answer to a practical problem”

Whatever


A Twofold Process

1. Taxonomy-driven categorization
    • Steady
    • Accurate
    • Scalable

2. Classification-driven user interface
    • Flexible
    • Relevant
    • Focused


Glossary

• Paradigmatic models
    • Ontology, Taxonomy

• Practical models
    • Inventory, Catalog, Classification


The Semiotic Triangle

• Concept: Mammals > Carnivora > Canidae > Canids > Boxer

• Reference: “… It stands about 56 to 61 cm (about 22 to 24 in) high and weighs about 30 kg (about 66 lb)” (Source: Microsoft Encarta)

• Word: “Boxer”


Lexicon, Taxonomy, Catalog

[Diagram: Catalog, Taxonomy, Lexicon]


Ontology

• An ontology is a foundation of categories representing a view of the world. An ontology reflects the commonly used and trusted breakdown of categories. For example, the breakdown of news items into categories of ‘World’, ‘Sports’, ‘Politics’, etc. is ontological.


Taxonomy

• A taxonomy is a hierarchical system describing genera and species. Species derive from a common genus and are hierarchically represented according to their essential characteristics and differences. For example, animals are categorized with the "Taxonomy of Life" which separates mammals from birds and spiders from insects, based on proper features and relative differences. This genus to species nomenclature is highlighted by terminology which moves from generic terms to binomial terms through lexical derivation and compounding.

• A taxonomy doesn’t deal with things, but with the essence of things: a taxonomy is based on an ontology.


Inventory, Catalog

• Inventory
    • List of things which stand for themselves, as they are, where they are.

• Catalog
    • Consolidated inventory, introducing for that purpose some kind of elementary classification.

• In both cases, the things listed have a unique and non-ambiguous name: e.g. URL, serial number, etc.


Classification

1. Arrangement of things according to some of their properties

2. Arrangement of types of things according to some of their properties

Multiple classification systems might combine multiple ontologies in multiple ways.

Things might have multiple locations in any given classification.


Thesaurus Nomenclature


Glossary

• ANSI/NISO Z39.19-1993

• A thesaurus is a controlled vocabulary arranged in a known order and structured so that equivalence, homographic, hierarchical, and associative relationships among terms are displayed clearly and identified by standardized relationship indicators that are employed reciprocally.

• The primary purposes of a thesaurus are (a) to facilitate retrieval of documents and (b) to achieve consistency in the indexing of written or otherwise recorded documents and other items, mainly for postcoordinate information storage and retrieval systems.
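The key requirement in the Z39.19 definition is that relationship indicators are "employed reciprocally". A minimal sketch of that bookkeeping, assuming the common indicator pairs USE/UF (equivalence), BT/NT (hierarchical) and RT (associative): every time one direction is recorded, the reciprocal is recorded too. The `Thesaurus` class and the sample terms (including the variant "Gunsights") are hypothetical.

```python
from collections import defaultdict

# Reciprocal indicator pairs: USE/UF (equivalence), BT/NT (hierarchical), RT/RT (associative).
RECIPROCAL = {"USE": "UF", "UF": "USE", "BT": "NT", "NT": "BT", "RT": "RT"}


class Thesaurus:
    """Hypothetical in-memory thesaurus that keeps every relationship reciprocal."""

    def __init__(self) -> None:
        # term -> indicator -> set of related terms
        self.relations = defaultdict(lambda: defaultdict(set))

    def relate(self, term: str, indicator: str, other: str) -> None:
        """Record a relationship and its reciprocal so both directions stay consistent."""
        if indicator not in RECIPROCAL:
            raise ValueError(f"unknown indicator: {indicator}")
        self.relations[term][indicator].add(other)
        self.relations[other][RECIPROCAL[indicator]].add(term)

    def related(self, term: str, indicator: str) -> set[str]:
        """Return a copy of the terms linked to `term` by `indicator`."""
        return set(self.relations.get(term, {}).get(indicator, set()))

    def preferred(self, term: str) -> str:
        """Map a non-preferred term to its preferred term via USE, if one exists."""
        targets = self.related(term, "USE")
        return next(iter(targets)) if targets else term


th = Thesaurus()
th.relate("Gun Sights", "BT", "Sights")      # hierarchical: Sights gets NT Gun Sights
th.relate("Gunsights", "USE", "Gun Sights")  # equivalence: Gun Sights gets UF Gunsights
th.relate("Gun Sights", "RT", "Rifles")      # associative: Rifles gets RT Gun Sights
print(th.related("Sights", "NT"), th.preferred("Gunsights"))
```

Storing both directions at write time means lookups never have to search the whole vocabulary for inverse links.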


Outline = Roadmap

• Definitions
• Step by step
    • Phase 1
        • Taxonomy design [QA]
        • Implementation & Tests
            • Lexicon extraction [QA]
            • Meta data generation [QA]
    • Phase 2
        • Classification design [QA]
        • Implementation & Tests
            • Portal generation [QA]
• Conclusion


Vertical Cartridges: Plug and Play

• Terrorism
• Weapons
• Geography


Example 1: Geography

• Africa: Algeria, Angola
• Asia: Afghanistan, Armenia
• Europe: Albania, Andorra
• Middle East: Bahrain, Iran
• North and Central America: Antigua and Barbuda, Bahamas
• Pacific: Australia, Fiji
• South America: Argentina, Bolivia
• U.S.: Alabama, Alaska


Example 2: Defense

Defense Communications

Satellite Communications

Tactical Communications

Defense Systems

Air Defense

Antiaircraft Defense Systems

Gun Air Defense Systems

Antimissile Defense Systems

Forward Area Air Defense Systems

Terminal Defense

Aircraft Defense Systems

Antisubmarine Defense Systems

Antiswimmer Defense Systems

Countermeasures

Acoustic Countermeasures


Taxonomy Design Canon

• Unique Beginner: Ordnance
• Life Form: Fire Control Systems
• Generic: Sights
• Specific: Gun Sights
• Varietal: Radar Gun Sights
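The Taxonomy Design Canon slide lists five ranks (Unique Beginner, Life Form, Generic, Specific, Varietal) alongside the path Ordnance > Fire Control Systems > Sights > Gun Sights > Radar Gun Sights, and the taxonomy definition says names move from generic to binomial terms through derivation and compounding. A minimal sketch, assuming rank is read off node depth and that "compounding" can be approximated by checking whether a child label ends with its parent's head word. Both `rank_of` and `is_compound_of` are hypothetical helpers, and the head-word test is only a heuristic.

```python
# Folk-taxonomy ranks as listed on the canon slide, from the root downward.
CANON = ["Unique Beginner", "Life Form", "Generic", "Specific", "Varietal"]


def rank_of(depth: int) -> str:
    """Map a node depth (0 = root) onto a canon rank; anything deeper is flagged."""
    return CANON[depth] if depth < len(CANON) else "Too deep"


def is_compound_of(child: str, parent: str) -> bool:
    """Heuristic: the child is a compound of its parent if it has more than one word
    and ends with the parent's head (last) word, e.g. 'Gun Sights' from 'Sights'."""
    head = parent.split()[-1].lower()
    words = child.lower().split()
    return len(words) > 1 and words[-1] == head


path = ["Ordnance", "Fire Control Systems", "Sights", "Gun Sights", "Radar Gun Sights"]
for depth, label in enumerate(path):
    derived = depth > 0 and is_compound_of(label, path[depth - 1])
    print(f"{rank_of(depth):<16} {label:<22} compound of parent: {derived}")
```

On this path the check turns true only at the Specific and Varietal levels, which is where the slide's genus-to-species naming through compounding would predict it.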


Example: Breads


Ontology Proliferation


Mass Nouns

• Linnaeus: Higher taxa are artefacts: “An order is a subdivision of classes needed to avoid placing together more genera than the mind can follow.” (Philosophia Botanica)

• Some life-form categories are created to group objects together. Terms associated with these categories are often mass nouns (as opposed to count nouns), such as “furniture”: “a kind of things of different kinds made by people to etc.”


Synonyms

Person

Unwelcome person

Unpleasant person

Selfish person

Opportunist

Backscratcher

(WordNet)


Cycles

• Life-form (mass noun)

• Genus (having derivative forms)

• Species (derived from genus)


Ontology Vacuum

Acceptance

Product Acceptance

Accountability

Social Responsibility

Social Investing

Accountants

Public Accountants

CPAs

Attorney CPAs

Accounting Firms

Big Five Accounting Firms

Big Six Accounting Firms


Unbalanced derivation

Acceptance

Product Acceptance

Accidents

Accident Prevention

Aircraft Accidents and Safety

Air Traffic Control

Hijacking

Boating Accidents and Safety

Construction Accidents and Safety

Electrocutions

Falls

Firearm Accidents and Safety

Household Accidents and Safety

Nuclear Accidents and Safety

Occupational Accidents

Industrial Accidents

Occupational Safety

Indoor Air Quality

Railroad Accidents and Safety

Ship Accidents and Safety

Lighthouses

Swimming Accidents and Safety

Drownings

Traffic Accidents and Safety

Hit and Run Accidents


Duplicated Paths = Classification schema

Tax
• Individuals: Assets, Liability
• Corporations: Assets, Liability

[Second diagram: the same Tax tree with the Individuals / Corporations and Assets / Liability levels rearranged]


Split Paradigms in Multiple Taxonomies

Tax items
• Liabilities: Loans, Debts
• Assets

Tax payers
• Organizations: Assoc., Corporations
• Individuals


Taxonomy 101

1. Identify the main paradigms

2. Look for thesauri

3. Focus on taxonomy first

4. Split partonomies

5. Clean up ontology

6. Check levels, overlaps, etc.

7. Review all synsets
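Step 6 ("Check levels, overlaps, etc.") and the "Duplicated Paths = Classification schema" slide can be partly automated. A minimal sketch, assuming a taxonomy stored as a mapping from parent label to child labels: it reports labels filed under several parents, identical child sets repeated under different parents (the duplicated-path symptom that suggests splitting into separate taxonomies), and cycles. The function names and the `Taxonomy` alias are hypothetical.

```python
from collections import defaultdict

Taxonomy = dict[str, list[str]]  # parent label -> child labels


def overlaps(tax: Taxonomy) -> dict[str, set[str]]:
    """Labels filed under more than one parent."""
    parents: dict[str, set[str]] = defaultdict(set)
    for parent, children in tax.items():
        for child in children:
            parents[child].add(parent)
    return {child: ps for child, ps in parents.items() if len(ps) > 1}


def duplicated_child_sets(tax: Taxonomy) -> dict[frozenset[str], set[str]]:
    """Identical child sets repeated under different parents: a hint that the
    branch is really a classification schema and should be split."""
    by_children: dict[frozenset[str], set[str]] = defaultdict(set)
    for parent, children in tax.items():
        if children:
            by_children[frozenset(children)].add(parent)
    return {kids: ps for kids, ps in by_children.items() if len(ps) > 1}


def has_cycle(tax: Taxonomy) -> bool:
    """Depth-first search for a label that is its own ancestor."""
    visiting: set[str] = set()
    done: set[str] = set()

    def visit(node: str) -> bool:
        if node in visiting:
            return True
        if node in done:
            return False
        visiting.add(node)
        found = any(visit(child) for child in tax.get(node, []))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in tax)


# The Tax example from the duplicated-paths slide.
tax = {
    "Tax": ["Individuals", "Corporations"],
    "Individuals": ["Assets", "Liability"],
    "Corporations": ["Assets", "Liability"],
}
print(overlaps(tax), duplicated_child_sets(tax), has_cycle(tax))
```

Here both checks flag the Assets/Liability pair, which is the cue to split "Tax items" and "Tax payers" into separate taxonomies as the Split Paradigms slide does.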


Outline

• Introduction
• Step by step
    • Phase 1
        • Taxonomy design [QA]
        • Implementation & Tests
            • Lexicon extraction [QA]
            • Meta data generation [QA]
    • Phase 2
        • Classification design [QA]
        • Implementation & Tests
            • Portal generation [QA]
• Conclusion


Sources

• Dispersion (Multiplicity, Size, Homogeneity)

• Refresh

• Access

Features                 | Internet, News, E-Mail | Reports, Patents | E-Trade, Logs
Informative content      | -                      | +                | +
Number of topics covered | +                      | +                | -
Structured information   | -                      | +                | +
Size of records          | -                      | +                | -
Number of records        | +                      | -                | +


Taxonomy Activation

Geography
• Africa
    • Algeria
    • Angola
    • Kenya
        • Nairobi
    • Tanzania
        • Dar es Salaam
• Asia
    • Afghanistan
    • Armenia

[Diagram: occurrences of “Nairobi” and “Dar es Salaam” in the text activate the matching Geography nodes]
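Taxonomy activation can be sketched as: find label occurrences in a text, then propagate each hit from the matched node up through its ancestors, so that "Nairobi" also counts for Kenya and Africa. This is a minimal illustration assuming a child-to-parent table and naive case-insensitive substring matching; the `PARENT` table and `activate` function are hypothetical, and a real lexicon would add synonyms and word-boundary handling.

```python
# Hypothetical Geography cartridge fragment: each label points to its parent (None = top level).
PARENT = {
    "Africa": None,
    "Asia": None,
    "Algeria": "Africa",
    "Angola": "Africa",
    "Kenya": "Africa",
    "Tanzania": "Africa",
    "Afghanistan": "Asia",
    "Armenia": "Asia",
    "Nairobi": "Kenya",
    "Dar es Salaam": "Tanzania",
}


def activate(text: str) -> dict[str, int]:
    """Count case-insensitive label hits in the text and propagate each hit
    to the matched node and all of its ancestors."""
    lowered = text.lower()
    scores: dict[str, int] = {}
    for label in PARENT:
        hits = lowered.count(label.lower())  # naive substring matching
        node = label
        while hits and node is not None:
            scores[node] = scores.get(node, 0) + hits
            node = PARENT[node]
    return scores


print(activate("Talks opened in Nairobi, moved to Dar es Salaam, and closed in Nairobi."))
```

Africa ends up with the highest activation even though the word never appears, which is the effect the slide illustrates.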


Smart Latching

Rifles

Gun sight

Weapons


Disambiguation

Rifles

Weapons

Control the ambiguity generated by the keyword-based latching mode used in taxonomy expansion.

Sight

Fire Control Systems

Gun sight
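One way to control latching ambiguity is to let an ambiguous keyword such as "sight" latch onto a taxonomy node only when the surrounding text supplies enough context cues for that node, and otherwise not latch at all. The sketch below assumes a small hand-built sense inventory; `SENSES`, `latch`, the cue words, and the non-weapon "Vision" sense are all hypothetical, and the technique shown is simple cue counting rather than the method used by any particular product.

```python
import re
from typing import Optional

# Hypothetical sense inventory for the ambiguous keyword "sight". Each candidate
# node comes with context cues; "Vision" is an invented non-weapon reading.
SENSES = {
    "sight": {
        "Ordnance/Fire Control Systems/Sights": {"gun", "rifle", "rifles", "weapon", "weapons"},
        "Vision": {"eye", "eyes", "blind", "vision"},
    }
}


def latch(keyword: str, text: str, min_support: int = 1) -> Optional[str]:
    """Latch the keyword to the candidate node with the most supporting context cues,
    or return None when the keyword is absent or no candidate reaches min_support."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    if keyword not in words:
        return None
    best, best_support = None, 0
    for node, cues in SENSES.get(keyword, {}).items():
        support = len(cues & words)
        if support > best_support:
            best, best_support = node, support
    return best if best_support >= min_support else None


print(latch("sight", "The rifle sight was misaligned."))
print(latch("sight", "What a sight at sunset."))
```

Refusing to latch when evidence is missing trades some recall for precision, which is usually the right call when a wrong latch would pull a document into an unrelated branch.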


Ranking Formula

Example: 2 occurrences of “chemical laser”, 1 occurrence of “gun sight”

[Chart: categories Defense, Lasers, Ordnance, Fire Control Systems, Sights, and Amplifiers plotted against Specificity, Concentration, and Distance]
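The slide names three ranking factors (specificity, concentration, distance) without giving an equation in this transcript, so the following is a hypothetical scoring under stated assumptions, not the author's formula. Each keyword hit is latched at a node; every ancestor receives the hit count discounted by `decay ** distance`; concentration is the share of all hits falling under the node; ties on score go to the more specific (deeper) node. The paths assume "chemical laser" latches at Defense/Lasers and "gun sight" at Ordnance/Fire Control Systems/Sights.

```python
from collections import defaultdict


def rank(hits: dict[str, int], decay: float = 0.5) -> list[tuple[str, float, int, float]]:
    """hits: taxonomy path ('A/B/C') -> keyword occurrences latched at that node.
    Returns (node, score, specificity, concentration), best first; specificity is
    the node depth (0 = top level) and breaks score ties."""
    total = sum(hits.values())
    if total == 0:
        return []
    score: dict[str, float] = defaultdict(float)
    under: dict[str, int] = defaultdict(int)
    for path, occurrences in hits.items():
        parts = path.split("/")
        for depth in range(len(parts)):
            node = "/".join(parts[: depth + 1])
            distance = len(parts) - 1 - depth  # levels between the hit and this ancestor
            score[node] += occurrences * decay ** distance
            under[node] += occurrences
    rows = [(node, score[node], node.count("/"), under[node] / total) for node in score]
    return sorted(rows, key=lambda r: (-r[1], -r[2]))


hits = {"Defense/Lasers": 2, "Ordnance/Fire Control Systems/Sights": 1}
for node, s, spec, conc in rank(hits):
    print(f"{node:<40} score={s:.2f} specificity={spec} concentration={conc:.2f}")
```

With `decay=0.5`, Lasers scores 2.00, Sights and Defense both score 1.00 (Sights ranks first as the more specific node), Fire Control Systems 0.50 and Ordnance 0.25. The decay value is an arbitrary illustrative choice.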


XML Output
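The XML layout used on the slide is not reproduced in this transcript. As a minimal sketch, assuming a hypothetical schema with one `<document>` element holding one `<category>` element per assignment (taxonomy path and score as attributes), Python's standard `xml.etree.ElementTree` can serialize a document's categorization like this:

```python
import xml.etree.ElementTree as ET


def to_xml(doc_id: str, categories: list[tuple[str, float]]) -> str:
    """Serialize one document's category assignments using a hypothetical schema:
    <document id="..."><category path="A/B" score="1.00">B</category>...</document>"""
    root = ET.Element("document", id=doc_id)
    for path, score in categories:
        cat = ET.SubElement(root, "category", path=path, score=f"{score:.2f}")
        cat.text = path.split("/")[-1]  # leaf label as element text
    return ET.tostring(root, encoding="unicode")


print(to_xml("d1", [("Defense/Lasers", 2.0), ("Ordnance/Fire Control Systems/Sights", 1.0)]))
```

Keeping the full path as an attribute lets downstream tables and charts regroup documents at any level without re-running categorization.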


Tables


Charts


Outline

• Introduction
• Step by step
    • Phase 1
        • Taxonomy design [QA]
        • Implementation & Tests
            • Lexicon extraction [QA]
            • Meta data generation [QA]
    • Phase 2
        • Classification design [QA]
        • Implementation & Tests
            • Portal generation [QA]
• Conclusion


Classification = Matrix


Permutable Trees
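If a classification is a matrix of documents against facets, a tree is just one ordering of those facets, and permuting the order yields another tree over the same repository (Africa > Missiles versus Missiles > Africa, as on the Accidental Knowledge slide). A minimal sketch, assuming single-valued facets per document; `build_tree` and the sample documents are hypothetical.

```python
def build_tree(docs: dict[str, dict[str, str]], order: list[str]) -> dict:
    """Nest document ids by facet value, one tree level per facet in `order`.
    Assumes every document has exactly one value for each facet named in `order`."""
    if not order:
        raise ValueError("order must name at least one facet")
    tree: dict = {}
    for doc_id, facets in docs.items():
        node = tree
        for facet in order[:-1]:
            node = node.setdefault(facets[facet], {})
        node.setdefault(facets[order[-1]], []).append(doc_id)
    return tree


docs = {
    "d1": {"geography": "Africa", "topic": "Missiles"},
    "d2": {"geography": "Africa", "topic": "Bombs"},
    "d3": {"geography": "Asia", "topic": "Missiles"},
}
print(build_tree(docs, ["geography", "topic"]))
# {'Africa': {'Missiles': ['d1'], 'Bombs': ['d2']}, 'Asia': {'Missiles': ['d3']}}
print(build_tree(docs, ["topic", "geography"]))
# {'Missiles': {'Africa': ['d1'], 'Asia': ['d3']}, 'Bombs': {'Africa': ['d2']}}
```

Because the trees are generated from the same facet assignments, every permutation stays consistent with the others, which is what allows multiple views on one repository.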


Typical Structures

• Geography / Topic
    • Terrorism in the Philippines
    • Criminal Law in Texas
    • Domestic Sales
    • Security in Building C

• Horizontal / Vertical
    • Petroleum Business
    • AML Regulations

• Vertical / Vertical
    • Chemical Compounds for Alzheimer’s


XML Representation


Population Mechanism


Population Control


Spread



Mutual Information

All “Bomb truck” documents are “Kenya” documents

Some “Kenya” documents are “Bomb truck” documents
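The two sentences describe an asymmetric overlap: P(Kenya | Bomb truck) = 1 while P(Bomb truck | Kenya) < 1. A minimal sketch, assuming each document is represented by its set of categories, computes both conditional probabilities and the pointwise mutual information (base 2) from co-occurrence counts. PMI is one standard association measure chosen here for illustration; the deck does not state which MI estimate it uses, and the eight sample documents are hypothetical.

```python
import math


def association(docs: dict[str, set[str]], a: str, b: str) -> dict[str, float]:
    """Conditional probabilities in both directions plus pointwise mutual information
    (log base 2) for categories a and b, estimated from document co-occurrence."""
    n = len(docs)
    n_a = sum(1 for cats in docs.values() if a in cats)
    n_b = sum(1 for cats in docs.values() if b in cats)
    n_ab = sum(1 for cats in docs.values() if a in cats and b in cats)
    if n_a == 0 or n_b == 0:
        raise ValueError("both categories must be populated")
    pmi = math.log2(n_ab * n / (n_a * n_b)) if n_ab else float("-inf")
    return {"P(b|a)": n_ab / n_a, "P(a|b)": n_ab / n_b, "PMI": pmi}


docs = {
    "d1": {"Kenya", "Bomb truck"},
    "d2": {"Kenya", "Bomb truck"},
    "d3": {"Kenya"},
    "d4": {"Kenya"},
    "d5": {"Tanzania"},
    "d6": {"Tanzania"},
    "d7": {"Algeria"},
    "d8": {"Angola"},
}
print(association(docs, "Bomb truck", "Kenya"))
```

Here every "Bomb truck" document is a "Kenya" document (P = 1.0), only half of the "Kenya" documents are "Bomb truck" documents (P = 0.5), and the PMI is log2(2 × 8 / (2 × 4)) = 1 bit.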


Example


High MI And Low Spread


Low MI And High Spread


Typical Patterns: Over/Under Populated


Typical Patterns: Bottleneck


Typical Patterns: Interrupted Bell Curve


Typical Patterns: Multiple Cycles


10 Tests To Qualify Your Classification

1. Average size

2. Top folders size

3. Depth

4. Balance

5. Cycles

6. Interrupted cycles

7. “Strings”

8. Buried documents

9. The needle test

10. The false discovery test
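The slide only names the ten tests, so the sketch below computes five of them under assumed definitions: average size as mean documents per folder, top folder size as all documents under each top-level folder, depth as the longest path, balance as the ratio of the largest to the smallest top folder, and buried documents as those filed deeper than a chosen `max_depth`. The `qualify` function, these definitions, and the sample populations are hypothetical.

```python
from statistics import mean


def qualify(folders: dict[str, int], max_depth: int = 3) -> dict[str, object]:
    """folders: 'A/B/C' path -> number of documents filed directly in that folder."""
    depth = {path: path.count("/") + 1 for path in folders}
    top: dict[str, int] = {}
    for path, n in folders.items():
        root = path.split("/")[0]
        top[root] = top.get(root, 0) + n
    smallest = min(top.values())
    return {
        "average_size": mean(folders.values()),
        "top_folder_sizes": top,
        "depth": max(depth.values()),
        # Largest over smallest top folder; infinite if some top folder is empty.
        "balance": max(top.values()) / smallest if smallest > 0 else float("inf"),
        "buried_documents": sum(n for p, n in folders.items() if depth[p] > max_depth),
    }


folders = {
    "Ordnance": 2,
    "Ordnance/Fire Control Systems": 6,
    "Ordnance/Fire Control Systems/Sights": 4,
    "Ordnance/Fire Control Systems/Sights/Gun Sights": 8,
    "Defense": 10,
    "Defense/Lasers": 30,
}
print(qualify(folders))
```

On this sample the average folder holds 10 documents, Defense is twice the size of Ordnance, and the 8 Gun Sights documents count as buried at depth 4.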


Conclusion


How Is It Useful?

Quickly point to the relevant information

Put extremely large amounts of information in perspective

Maintain multiple views of a consistent repository


Roadmap For Success

1. Build a FOUNDATION

2. QUALIFY results

3. Attain MATURITY


Project Metrics

• Typical Planning
    • 2-3 weeks

• Typical Team
    • 1 KE + Experts + Users Panel

• Typical Cost
    • Categorization Software
    • Internal Support

• Typical ROI
    • $2M/year for 5,000 users


Dynamic Classification Workshop

Claude Vogel

[email protected]

http://www.convera.com

Roadmap & Quality Metrics