What is IOA?

© AIIM | All Rights Reserved

What is Information Organization and Access?


www.aiim.org/training


How to find information?

Option 1:

Search: 65,400,000 hits when searching for “good red wine”

- Little or no metadata / taxonomy


How to find information?

Option 2:

Browsing: ability to find the wine you need via 9 different categories

- Requires metadata / taxonomy


What is IOA?

• IOA, or Information Organization and Access, consists of a content preparation process and a content search and access process

• During the preparation process, content is captured, prepared, enriched, and indexed

• During the access process, someone searches for and accesses content


Parts of IOA

Information Organization

• Content Architecture

– Structure and composition of a repository, information collection, or individual document

• Content Intelligence

– Enriching content with additional information

Information Access

• Search and Retrieval

– Querying information sets and obtaining documents

• Findability

– Enhancing access to the right information


How to organize information to improve access?


What is Content Intelligence?

• Adding “meaning” to information by structuring, classifying, and/or labeling the content so it is more findable by both people and technology

• In short, enriching the content:

– Metadata: “data about the data”, usually a discrete component

– Classification of content

– Taxonomy: the law for categorizing information


Metadata Fundamentals

• Metadata consists of statements we make about resources to help us find, identify, use, manage, evaluate, and preserve them, and perhaps dispose of them

Metadata building blocks

• The basic unit of metadata is a statement

• A statement consists of a property (a.k.a. element) and a value

– e.g., The shirt has a color (property), which is blue (value)

• Metadata statements describe resources that can be used by content technologies

– e.g., Display all information that is about blue products
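To make the property/value idea concrete, here is a minimal Python sketch. It assumes a hypothetical product catalog; `Statement` and `resources_with` are illustrative names, not part of any content technology's API. Each statement ties one resource to one property and one value, and a query for “blue products” is simply a filter on those statements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One metadata statement: a resource has a property with a value (hypothetical model)."""
    resource: str
    prop: str   # the property (a.k.a. element), e.g. "color"
    value: str  # the value, e.g. "blue"


# Hypothetical catalog of statements about products
statements = [
    Statement("shirt-001", "color", "blue"),
    Statement("shirt-001", "type", "shirt"),
    Statement("mug-042", "color", "blue"),
    Statement("hat-007", "color", "red"),
]


def resources_with(statements, prop, value):
    """Return the resources described by a given property/value pair, sorted by ID."""
    return sorted({s.resource for s in statements if s.prop == prop and s.value == value})


print(resources_with(statements, "color", "blue"))  # ['mug-042', 'shirt-001']
```

Because every statement has the same shape, the same filter works for any property, not only color.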


Categorizing Metadata

Asset metadata – Who:

Creator, Publisher, Contributor, Type, Format, Identifier

Subject metadata – What, Where & Why:

Subject, Title, Description, Coverage

Relational metadata – Links between and to:

Source, Relation

Use metadata – When & How:

Date, Language, Rights

Source: Taxonomy Strategies, LLC


Where Value Emerges

The four metadata categories above (asset, subject, relational, and use metadata) deliver value in two areas:

• More efficient content processing

• Better navigation & discovery

Source: Taxonomy Strategies, LLC


Taxonomy and Content Management

• Taxonomies often act as a “great unifier” in the area of content technologies and enable them to work together

• Many content management systems depend on solid metadata and taxonomy in order to add significant value

• Taxonomy is a key enabler for ECM

– Essential for organizing any large content corpus

– Required for meaningful records management

– Critical to effective findability

– Ideal way to represent logical hierarchy

• How you choose to design the taxonomy in the repository, and how the system you choose can use a taxonomy, greatly influence the business value you can realize


Understanding taxonomies

• A taxonomy is a classification scheme

– Such as the way an individual classifies the contents of their e-mail inbox, a personal CD collection, or an iPod

• A taxonomy is a knowledge map

– Reflects how its owner conceives a given body of content (a knowledge domain), for purposes of browsing, navigating, discovering, and sharing that information

• A taxonomy is semantic

– Indicates the relationships between concepts, such as the relationship between a car and a steering wheel: the steering wheel is “part of” a car

Source: Organising Knowledge (Patrick Lambe, 2007)
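A rough Python sketch of two of these ideas, using hypothetical terms: a classification scheme stored as “term → broader term”, which yields a browse path, and typed semantic relationships such as “part of”. The names `broader`, `path`, `relations` and `parts_of` are illustrative only.

```python
# Hypothetical taxonomy: each term maps to its broader (parent) term; None marks the root
broader = {
    "Vehicles": None,
    "Cars": "Vehicles",
    "Trucks": "Vehicles",
    "Sedans": "Cars",
}


def path(term):
    """Return the browse path from the root down to a term."""
    steps = []
    while term is not None:
        steps.append(term)
        term = broader[term]
    return list(reversed(steps))


# Semantic relationships are typed (subject, relation, object) statements
relations = [("steering wheel", "part of", "car"), ("engine", "part of", "car")]


def parts_of(whole):
    """List everything recorded as being 'part of' the given whole."""
    return [s for s, rel, o in relations if rel == "part of" and o == whole]


print(" > ".join(path("Sedans")))  # Vehicles > Cars > Sedans
print(parts_of("car"))             # ['steering wheel', 'engine']
```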


Representations of taxonomies

• Lists
• Trees
• Hierarchies
• Polyhierarchies
• Matrices
• Facets
• System Maps


Source: Organising Knowledge (Patrick Lambe, 2007)

[Figure: sample taxonomy representations: list, matrices, facets, and system maps]
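The difference between a tree and a polyhierarchy can be shown in a few lines of Python. This is a sketch with hypothetical wine categories: in a polyhierarchy a term may have more than one broader term, so there can be several browse paths to the same item, whereas a strict tree always yields exactly one.

```python
# Hypothetical polyhierarchy: each term lists its broader terms (empty list = top level)
parents = {
    "Beverages": [],
    "Alcoholic": ["Beverages"],
    "Gifts": [],
    "Wine": ["Alcoholic", "Gifts"],  # wine sits under two branches
    "Merlot": ["Wine"],
}


def all_paths(term):
    """Enumerate every top-to-term path; a strict tree would give exactly one."""
    if not parents[term]:
        return [[term]]
    return [p + [term] for parent in parents[term] for p in all_paths(parent)]


for p in all_paths("Merlot"):
    print(" > ".join(p))
# Beverages > Alcoholic > Wine > Merlot
# Gifts > Wine > Merlot
```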


What is a Vocabulary?

• Vocabularies represent potential metadata values

• Vocabularies can be controlled or uncontrolled

– Controlled vocabularies: metadata must come from a set list (e.g., “Province”)

– Uncontrolled vocabularies: metadata can be applied free-form (e.g., “Town”)

• “Taxonomies” are a particular type of controlled vocabulary

– But not all controlled vocabularies are taxonomies

– We’ll discuss taxonomies in the next module
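A minimal Python sketch of the Province/Town distinction, assuming a hypothetical schema in which `province` is controlled (values must come from a set list, abbreviated here) and `town` is free-form. `CONTROLLED` and `validate` are illustrative names.

```python
# Hypothetical schema: "province" is controlled (abbreviated list); "town" is uncontrolled
CONTROLLED = {
    "province": {"Alberta", "British Columbia", "Manitoba", "Ontario", "Quebec"},
}


def validate(record):
    """Return an error for each value that falls outside its controlled vocabulary."""
    errors = []
    for prop, value in record.items():
        allowed = CONTROLLED.get(prop)
        # Uncontrolled properties (no entry in CONTROLLED) accept any value
        if allowed is not None and value not in allowed:
            errors.append(f"{prop}: '{value}' is not an allowed value")
    return errors


print(validate({"province": "Ontario", "town": "Smallville"}))  # []
print(validate({"province": "Ontaro", "town": "Smallville"}))
# ["province: 'Ontaro' is not an allowed value"]
```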


Why Use Controlled Vocabularies?

• It’s important to control vocabulary so your searchers don’t have to

• Standards need to be set to minimize confusion among taggers/indexers

• Enforces terminological consistency

• Reduces spelling mistakes

• Enables interoperability
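To illustrate how a controlled list reduces spelling mistakes, the sketch below accepts only exact preferred terms and otherwise offers close matches using Python's standard `difflib.get_close_matches`. The term list is hypothetical, and fuzzy matching is just one way a tagging tool might steer indexers back to the controlled term.

```python
import difflib

# Hypothetical controlled list of preferred terms
PREFERRED = ["Merlot", "Cabernet Sauvignon", "Pinot Noir", "Chardonnay", "Riesling"]


def suggest(term):
    """Accept an exact controlled term; otherwise return no match plus close suggestions."""
    if term in PREFERRED:
        return term, []
    return None, difflib.get_close_matches(term, PREFERRED, n=3, cutoff=0.6)


print(suggest("Merlot"))     # ('Merlot', [])
print(suggest("Chardonay"))  # (None, ['Chardonnay'])
```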


What is a Thesaurus?

• A thesaurus is a networked collection of controlled vocabulary terms, using associative rather than strictly hierarchical relationships

– Often used to control synonyms across vocabularies or taxonomies

– But more generally can identify the relationships among terms, e.g., Equal to, Related to, Opposite of

• Some examples from a hypothetical domain

– Lettuce = Greens = Frisée (a.k.a. a “synonym ring”)

– Coriander is related to Cilantro

• Thesauri can be enormously useful in an enterprise setting

– When different units have different taxonomies and their systems need to cross-walk

– When the enterprise cannot agree on a common vocabulary
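One common use of a thesaurus is query expansion. This Python sketch uses the hypothetical examples above: a search for any member of a synonym ring is expanded to the whole ring, and associative (“related to”) links are added only on request. `SYNONYM_RINGS`, `RELATED` and `expand` are illustrative names.

```python
# Hypothetical thesaurus: synonym rings (equivalence) and associative "related to" links
SYNONYM_RINGS = [{"lettuce", "greens", "frisée"}]
RELATED = {"coriander": {"cilantro"}, "cilantro": {"coriander"}}


def expand(term, include_related=False):
    """Expand a query term with its synonyms and, optionally, its related terms."""
    term = term.lower()
    expanded = {term}
    for ring in SYNONYM_RINGS:
        if term in ring:
            expanded |= ring
    if include_related:
        expanded |= RELATED.get(term, set())
    return expanded


print(sorted(expand("Greens")))                           # ['frisée', 'greens', 'lettuce']
print(sorted(expand("coriander", include_related=True)))  # ['cilantro', 'coriander']
```

Keeping equivalence and association separate lets a search tool treat synonyms as the same concept while offering related terms only as optional suggestions.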


What is an Ontology?

• The formal definition of ontology is “the specification of one's conceptualization of a knowledge domain”

• Semantic technologies are typically centered around ontologies

• Ontologies:

– Resemble faceted taxonomies and often subsume thesauri, but employ richer semantic relationships among terms and attributes

– Apply rules specifying terms and relationships

– Do more than just control vocabulary

– Are a knowledge representation

• Thus, an ontology for salad would contain the structure for how salad relates to everything, from ingredients to growers to the rodents that might eat it, and how a salad in Japan differs from one in Italy
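The point that ontologies “apply rules” can be sketched in Python as facts stored as (subject, predicate, object) triples plus rules that derive new facts. The salad facts and both rules below are hypothetical, and this naive forward-chaining loop stands in for the reasoners real semantic technologies use.

```python
# Hypothetical salad ontology as (subject, predicate, object) triples
facts = {
    ("Frisée", "isA", "Lettuce"),
    ("Lettuce", "isA", "Vegetable"),
    ("Frisée", "ingredientOf", "Salad"),
    ("Green Acres Farm", "grows", "Frisée"),
}


def infer(facts):
    """Apply two illustrative rules repeatedly until no new facts appear."""
    facts = set(facts)
    while True:
        new = set()
        for (a, p1, b) in facts:
            for (c, p2, d) in facts:
                if b != c:
                    continue
                # Rule 1: isA is transitive
                if p1 == "isA" and p2 == "isA":
                    new.add((a, "isA", d))
                # Rule 2: whoever grows an ingredient of a dish supplies that dish
                if p1 == "grows" and p2 == "ingredientOf":
                    new.add((a, "supplies", d))
        if new <= facts:
            return facts
        facts |= new


derived = infer(facts) - facts
print(sorted(derived))
# [('Frisée', 'isA', 'Vegetable'), ('Green Acres Farm', 'supplies', 'Salad')]
```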


Ontology Example

Capturing all the uses of ice cream…

A complete ontology would account for more relationships and properties.

Source: Roz Chast, The New Yorker


What’s a Topic Map?

• A topic map is a visual representation of a knowledge domain

• Topic maps are an ISO standard for the representation and interchange of knowledge, with an emphasis on the findability of information. The standard is formally known as ISO/IEC 13250:2003

• The topics that populate the map are an ontology: topic maps are thus ontology-driven, like the ice cream example
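The topic map model is built from topics, associations between topics (each topic playing a named role), and occurrences that link a topic to information resources. This Python sketch loosely follows that model with a hypothetical ice cream fragment. It is a plain data structure, not the ISO/IEC 13250 interchange syntax, and the URLs are placeholders.

```python
# Hypothetical topic map fragment: topics, associations, and occurrences
topics = {"ice-cream": "Ice cream", "dessert": "Dessert", "sundae": "Sundae"}

# Associations relate topics; each participating topic plays a named role
associations = [
    {"type": "kind-of", "roles": {"specific": "sundae", "general": "dessert"}},
    {"type": "ingredient-of", "roles": {"ingredient": "ice-cream", "dish": "sundae"}},
]

# Occurrences link a topic to resources about it (placeholder URLs)
occurrences = {
    "sundae": ["https://example.org/recipes/sundae"],
    "ice-cream": ["https://example.org/articles/ice-cream-history"],
}


def neighbours(topic_id):
    """Return (association type, topic name) for every topic associated with topic_id."""
    found = []
    for assoc in associations:
        players = assoc["roles"].values()
        if topic_id in players:
            found += [(assoc["type"], topics[t]) for t in players if t != topic_id]
    return found


print(neighbours("sundae"))  # [('kind-of', 'Dessert'), ('ingredient-of', 'Ice cream')]
print(occurrences["sundae"])
```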


What is a Folksonomy?

• Folksonomy: the anti-controlled vocabulary. Collaborative vocabularies for tagging content, rarely with any sort of control

• Relevance between metadata and content may be determined by users in a democratic fashion

– Four users define an object as being “green”; one user defines it as being “aqua”; relevance can be defined as “more green than aqua”

• Over time, clusters emerge and communities typically self-organize around them

– “Wisdom of the crowd”

• Typically arise in Web-based communities where individuals share content, then create and use tags (e.g., blogs)

• Applied to enterprise use cases when there is a critical mass of taggers to make it worthwhile

– Can be a useful “bottom-up” approach to developing taxonomies
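The “more green than aqua” example comes down to counting tags. Here is a minimal Python sketch that assumes the five hypothetical taggers from the example and weights each tag by the share of taggers who chose it. Real tagging systems may weight tags differently.

```python
from collections import Counter

# Hypothetical tags applied to one object: four users say "green", one says "aqua"
tags = ["green", "green", "aqua", "green", "green"]


def tag_weights(tags):
    """Share of taggers choosing each tag, most popular first: a simple 'democratic' relevance."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.most_common()}


print(tag_weights(tags))  # {'green': 0.8, 'aqua': 0.2}
```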


Folksonomy Example

Source: flickr.com


The importance of Findability


What is Findability?

Findability is the quality of being locatable or navigable

• At the core of IOA is the findability of information. Information should be easy to discover or locate

• Information access is about helping users find documents that satisfy their information needs

• Remember, someone may be looking for something they’ve never seen or touched before

• Advanced information organization techniques can support findability

– Thesauri, Ontologies, Topic Maps and Semantic Networks

– Faceted search and navigation
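Faceted search and navigation can be sketched in a few lines of Python, assuming a hypothetical wine catalog where every item carries a value for several facets. Selecting facet values narrows the result set. Counting the remaining values of another facet produces the numbers a faceted interface typically shows beside each navigation link.

```python
# Hypothetical wine catalog; each item carries values for several facets
wines = [
    {"name": "Wine A", "color": "red", "country": "France", "grape": "Merlot"},
    {"name": "Wine B", "color": "red", "country": "Chile", "grape": "Merlot"},
    {"name": "Wine C", "color": "white", "country": "France", "grape": "Chardonnay"},
]


def facet_search(items, **selected):
    """Keep only the items that match every selected facet value."""
    return [i for i in items if all(i.get(f) == v for f, v in selected.items())]


def facet_counts(items, facet):
    """Count the remaining items per value of one facet."""
    counts = {}
    for item in items:
        counts[item[facet]] = counts.get(item[facet], 0) + 1
    return counts


reds = facet_search(wines, color="red")
print(facet_counts(reds, "country"))  # {'France': 1, 'Chile': 1}
print([w["name"] for w in facet_search(wines, color="red", country="France")])  # ['Wine A']
```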


Access via Browse

• Browsing is usually the first option for users seeking information or documents

– Desktop and enterprise file systems

– Content management system repositories

– Intranets and Websites

• If users can’t find via browse, then they resort to search

• Some users will go straight to search

– This is partly generational


Effective Browsing

• Browsing effectiveness is highly dependent on:

– navigational structure

– folder labeling

– the location of the content

– In short: depends on how organized the content is…

• Content technologies typically use “virtual folders” to represent different classifications

– These allow for multiple paths to the same content

– In contrast, a physical file system forces each document into a single “place”

– Ideally content should be cross-referenced, but not duplicated
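“Cross-referenced, but not duplicated” can be pictured as folders that hold document IDs rather than copies. This Python sketch assumes a hypothetical repository; `documents`, `virtual_folders` and `paths_to` are illustrative names, not a particular product's model.

```python
# Hypothetical repository: each document is stored exactly once, keyed by ID
documents = {"doc-17": "2023 supplier contract.pdf"}

# Virtual folders hold references (IDs), not copies, so one document can appear in several
virtual_folders = {
    "/Legal/Contracts": ["doc-17"],
    "/Suppliers/Acme": ["doc-17"],
    "/2023/Q2": ["doc-17"],
}


def paths_to(doc_id):
    """Every virtual folder path through which a document can be browsed."""
    return [path for path, ids in virtual_folders.items() if doc_id in ids]


print(paths_to("doc-17"))  # ['/Legal/Contracts', '/Suppliers/Acme', '/2023/Q2']
print(len(documents))      # 1: cross-referenced three times, stored once
```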


Access via Search

• Search is an application or tool for finding information via search term

• Search is omnipresent, and essential

– But there is much ignorance about how search engines work

– Most end-users shouldn’t need to know; they just assume “magic”

• Advanced display techniques can blur the line between search and browse

• Search is not a magic bullet, nor a cure for a lack of information organization

– Better-organized information will yield more effective search results
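At its core, finding information “via search term” usually relies on an inverted index, which maps each term to the documents that contain it. This is a deliberately small Python sketch with hypothetical documents and simple Boolean AND matching. Production engines add stemming, stop-word handling, relevance ranking and security trimming on top of this idea.

```python
import re
from collections import defaultdict

# Hypothetical mini-corpus
docs = {
    1: "Good red wine from France",
    2: "A guide to white wine",
    3: "Red sauce recipes",
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


# Inverted index: each term points to the set of documents containing it
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in tokenize(text):
        index[term].add(doc_id)


def search(query):
    """Return IDs of documents containing every query term (Boolean AND)."""
    sets = [index.get(t, set()) for t in tokenize(query)]
    return sorted(set.intersection(*sets)) if sets else []


print(search("red wine"))  # [1]
print(search("wine"))      # [1, 2]
```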


How Enterprise Subsystems Work Together

Source: CMS Watch


What Is An Effective Search Result?

• When a user finds what they are seeking

– Or not…

– Seekers may find more than one answer

• Two ways to measure results effectiveness:

Precision: percentage of all returns in a results set that are relevant to the query

Recall: percentage of all relevant documents that were actually returned in the results set

• Precision and recall are frequently traded off in actual search implementations

– “Tuning” for one can reduce the other…
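The two measures reduce to simple ratios. In this Python sketch the query numbers are hypothetical: 10 results are returned, 4 of them are relevant, and 8 relevant documents exist in total.

```python
def precision_recall(returned, relevant):
    """Precision = relevant returned / all returned; recall = relevant returned / all relevant."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: documents 1-10 returned; 8 relevant documents exist, 4 of them returned
returned = range(1, 11)
relevant = [1, 2, 3, 4, 20, 21, 22, 23]
print(precision_recall(returned, relevant))  # (0.4, 0.5)
```

The trade-off shows up directly. Returning more documents can only keep recall the same or raise it, but each added irrelevant document pulls precision down.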


The Myriad of Search Choices

• Vendors recognise the importance of search

– Beware of how they push enterprise search as the answer to an organization’s need for a single, unified window into everything the organization knows at any point in time

• The ultimate knowledge management machine simply does not exist: the typical enterprise search system does not contain “all” the organization's content

• Limitations on available information include:

– Security considerations

– Inability to integrate specialized content

– Difficulty reconciling structured and unstructured content

– Cost, time, and difficulty required to incorporate diverse content repositories


Current Trends in Search

As the search sector changes, distinctions among different “flavors” of search technology, features, and functions become more difficult to make.

Source: CMS Watch


Federated Search

The modern enterprise is not a monolith

• Multiple information repositories

• Multiple search engines

• Need to search across information domains from a single query interface

– Federated search approaches are designed to accomplish this

– Sometimes called “meta search”

• Two approaches to federated search

– Use the same search technology across information sets, but create separate indexes and merge results

– Use multiple search technologies, passing the query over heterogeneous indexes and synthesizing multiple result sets (more common)
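Synthesizing result sets from different engines is harder than it looks, because each engine scores on its own scale. This Python sketch takes one simple approach, chosen here for illustration rather than taken from the slides. It min-max normalizes each engine's scores to 0..1, de-duplicates by document ID while keeping the best score, and re-ranks. The engines and scores are hypothetical.

```python
# Hypothetical result sets from two engines whose raw scores use different scales
engine_a = [("doc-1", 12.0), ("doc-2", 8.0), ("doc-3", 4.0)]
engine_b = [("doc-3", 90), ("doc-4", 60), ("doc-5", 30)]


def normalize(results):
    """Min-max scale one engine's scores to 0..1 so engines can be compared."""
    if not results:
        return []
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all scores are equal
    return [(doc, (s - lo) / span) for doc, s in results]


def merge(*result_sets):
    """Normalize each set, de-duplicate by document ID (keeping the best score), re-rank."""
    best = {}
    for results in result_sets:
        for doc, score in normalize(results):
            best[doc] = max(score, best.get(doc, 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


for doc, score in merge(engine_a, engine_b):
    print(doc, round(score, 2))
```

Min-max scaling is crude. A top result from a weak engine ends up looking as good as one from a strong engine, which is one reason differing index and query approaches can skew federated results.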


Federated Search Example

Has seen success on the public web

• No security issues around public info

• Limited set of file types

• Better metadata can improve results merging

Example: “Merlot”

• Meta search engine for education resources on the public web (www.merlot.org)


Challenges of Federated Search

Federated search within the enterprise tends to be much harder

• Multiple indexes mean multiple security systems to resolve

• Different index and query approaches across search systems may skew results

• Often prohibitive performance problems

– Results must be de-duped, transferred, merged, and ranked


The Case for Text Mining

• Enterprises looking for better findability face two vexing challenges:

1. How to derive metadata from large quantities of information?

2. How to turn “search” into more powerful navigation and discovery?

• Text mining offers one answer

– Text mining is partly a more attractive marketing term for auto-classification, a term that aligns with the concept of “data mining”

– But text mining takes auto-classification one step further through the discovery of more sophisticated patterns in text

– However, there are many different approaches to text mining

– Text mining is sometimes called “text analytics” or “content intelligence”


How Text Mining Works

• Prior to indexing content, information is “discovered” or derived from a corpus of content

– The goal of text mining is to glean information from data, find patterns, and “separate signal from noise”

– It does this by attempting to extract “entities” and “relationships” from text

– Relevant information is usually derived through the divining of patterns and trends

• Text is then parsed (sometimes adding and removing certain pieces of text for the purposes of an index)

• Typical text mining tasks can include auto-classification, clustering, concept/entity extraction, auto-categorization (production of taxonomies), document summarization, and entity relation modeling (i.e., learning relations between entities, as in an ontology)

– Different text mining tools tend to excel at one or just a handful of these approaches
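The simplest form of auto-classification is rule-based. A category is assigned when enough of its keywords appear in a document. This Python sketch uses hypothetical categories, keywords and a threshold. Commercial text mining tools generally use statistical or linguistic models rather than fixed keyword lists.

```python
import re

# Hypothetical category rules: keywords that signal each category
RULES = {
    "Finance": {"invoice", "payment", "budget"},
    "HR": {"hiring", "salary", "benefits"},
    "Legal": {"contract", "liability", "clause"},
}


def classify(text, min_hits=2):
    """Assign every category with at least min_hits of its keywords present in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(cat for cat, keywords in RULES.items() if len(words & keywords) >= min_hits)


print(classify("Payment terms and liability clause in the supplier contract"))  # ['Legal']
print(classify("Invoice and payment for the contract review"))                  # ['Finance']
```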


A Clustering Example

Source: cwi.nl & Inxight


A Clustering Example

Source: London Natural History Museum & Inxight
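Clustering groups documents by similarity without predefined categories. This Python sketch is a simple single-pass clustering that uses Jaccard overlap of word sets. It is not the method behind the Inxight examples above. The documents, stop-word list and 0.25 threshold are hypothetical choices.

```python
import re

STOPWORDS = {"a", "the", "and", "of", "for", "in"}


def terms(text):
    """Lower-cased word set of a document, minus a few stop words."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def jaccard(a, b):
    """Overlap of two word sets: |intersection| / |union|."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster(docs, threshold=0.25):
    """Single pass: each document joins the first cluster whose seed is similar enough."""
    clusters = []  # each entry: (seed terms, list of member document indices)
    for i, text in enumerate(docs):
        t = terms(text)
        for seed, members in clusters:
            if jaccard(t, seed) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((t, [i]))  # no similar cluster: start a new one
    return [members for _, members in clusters]


docs = [
    "red wine from Bordeaux",
    "Bordeaux red wine prices",
    "museum fossil collection",
    "fossil collection of the museum",
]
print(cluster(docs))  # [[0, 1], [2, 3]]
```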


IOA Strategy • IOA as a Practice • IOA as a Project • IOA Master

For more information: AIIM IOA Certificate Program



Parts of IOA

• Find, Inventory, Analyze Content
• Metadata
• Taxonomy
• Ontologies and Topic Maps
• Content Modelling
• Introduction to Access
• Search Techniques
• Topics in Findability
• User Experience of IOA

For more information: AIIM IOA Practitioner

