What is IOA?
Uploaded by: atle-skjekkeland
Category: Technology
1© AIIM | All Rights Reserved
What is Information Organization and Access?
www.aiim.org/training
How to find information?
Option 1: Search
- 65,400,000 hits when searching for “good red wine”
- Little or no metadata / taxonomy
How to find information?
Option 2: Browsing
- Ability to find the wine you need via 9 different categories
- Requires metadata / taxonomy
What is IOA?
• IOA, or Information Organization and Access, consists of a content preparation process and a content search and access process
• During the preparation process, content is captured, prepared, enriched, and indexed
• During the access process, someone searches for and accesses content
Parts of IOA
Information Organization
• Content Architecture
– Structure and composition of a repository, information collection, or individual document
• Content Intelligence
– Enriching content with additional information
Information Access
• Search and Retrieval
– Querying information sets and obtaining documents
• Findability
– Enhancing access to the right information
How to organize information to improve access?
What is Content Intelligence?
• Adding “meaning” to information by structuring, classifying, and/or labeling the content so it is more findable by both people and technology
• In short, enriching the content
– Metadata
• “Data about the data”
• Usually a discrete component
– Classification of content
– Taxonomy
• Law for categorizing information
Metadata Fundamentals
• Metadata consists of statements we make about resources to help us find, identify, use, manage, evaluate, and preserve them, and perhaps dispose of them
Metadata building blocks
• The basic unit of metadata is a statement
• A statement consists of a property (a.k.a. element) and a value
– e.g., The shirt has a color (property), which is blue (value)
• Metadata statements describe resources that can be used by content technologies
– e.g., Display all information that is about blue products
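The property/value idea can be sketched in a few lines of Python. This is only an illustration: the `Statement` class, the `catalog` contents, and the `find_resources` helper are all hypothetical names invented here, not part of any content technology mentioned in the deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One metadata statement: a property (element) and its value."""
    prop: str
    value: str


# Hypothetical catalog: resource name -> metadata statements about it.
catalog = {
    "oxford-shirt": [Statement("type", "shirt"), Statement("color", "blue")],
    "rain-jacket": [Statement("type", "jacket"), Statement("color", "red")],
    "denim-jeans": [Statement("type", "jeans"), Statement("color", "blue")],
}


def find_resources(catalog, prop, value):
    """Return the names of resources that carry the statement (prop, value)."""
    wanted = Statement(prop, value)
    return sorted(name for name, stmts in catalog.items() if wanted in stmts)


# "Display all information that is about blue products"
blue_products = find_resources(catalog, "color", "blue")
print(blue_products)
```

The query only works because every resource uses the same property name ("color") and the same spelling of the value, which is the consistency problem that controlled vocabularies address.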
Categorizing Metadata
Asset metadata – Who: Creator, Publisher, Contributor, Type, Format, Identifier
Subject metadata – What, Where & Why: Subject, Title, Description, Coverage
Relational metadata – Links between and to: Source, Relation
Use metadata – When & How: Date, Language, Rights
Source: Taxonomy Strategies, LLC
Where Value Emerges
Asset metadata – Who: Creator, Publisher, Contributor, Type, Format, Identifier
Subject metadata – What, Where & Why: Subject, Title, Description, Coverage
Relational metadata – Links between and to: Source, Relation
Use metadata – When & How: Date, Language, Rights
More efficient content processing
Better navigation & discovery
Source: Taxonomy Strategies, LLC
Taxonomy and Content Management
• Taxonomies often act as a “great unifier” in the area of content technologies and enable them to work together
• Many content management systems depend on solid metadata and taxonomy in order to add significant value
• Taxonomy is a key enabler for ECM
– Essential for organizing any large content corpus
– Required for meaningful records management
– Critical to effective findability
– Ideal way to represent logical hierarchy
• How you choose to design the taxonomy in the repository, and how the system you choose can use a taxonomy, greatly influence the business value you can realize
Understanding taxonomies
• A taxonomy is a classification scheme
– Such as the way that an individual classifies the content of their e-mail inbox, a personal CD collection, or the contents of an iPod
• A taxonomy is a knowledge map
– Reflects how its owner conceives a given body of content (a knowledge domain), for purposes of browsing, navigating, discovering, and sharing that information
• A taxonomy is semantic
– Indicating the relationships between concepts, such as the relationship between a car and a steering wheel, in that the steering wheel is a “part of” a car
Source: Organising Knowledge (Patrick Lambe, 2007)
Representations of taxonomies
• Lists
• Trees
• Hierarchies
• Polyhierarchies
• Matrices
• Facets
• System Maps
[Figure: examples of a list, a matrix, facets, and a system map]
Source: Organising Knowledge (Patrick Lambe, 2007)
What is a Vocabulary?
• Vocabularies represent potential metadata values
• Vocabularies can be controlled or uncontrolled
– Controlled vocabularies: metadata must come from a set list (e.g. “Province”)
– Uncontrolled vocabularies: metadata can be applied free-form (e.g. “Town”)
• “Taxonomies” are a particular type of controlled vocabulary
– But not all controlled vocabularies are taxonomies
– We’ll discuss taxonomies in the next module
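The difference between a controlled and an uncontrolled field can be shown with a small validation sketch. The `PROVINCES` list and the `tag_record` function are hypothetical and chosen only to mirror the “Province” and “Town” examples above.

```python
# Hypothetical controlled vocabulary for the "Province" field.
PROVINCES = {"Ontario", "Quebec", "Alberta"}


def tag_record(province, town):
    """Build a metadata record; Province is controlled, Town is free-form."""
    if province not in PROVINCES:
        # Controlled field: the value must come from the set list.
        raise ValueError(f"{province!r} is not in the controlled Province list")
    # Uncontrolled field: any text is accepted as typed.
    return {"Province": province, "Town": town.strip()}


record = tag_record("Ontario", "Port Hope")
print(record)
```

A misspelling such as "Ontaro" is rejected at tagging time, whereas a misspelled town name would be stored silently.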
Why Use Controlled Vocabularies?
• It’s important to control vocabulary so your searchers don’t have to
• Standards need to be set to minimize confusion among taggers/indexers
• Enforces terminological consistency
• Reduces spelling mistakes
• Enables interoperability
What is a Thesaurus?
• A thesaurus is a networked collection of controlled vocabulary terms, using associative rather than strict hierarchical relationships
– Often used to control synonyms across vocabularies or taxonomies
– But more generally can identify the relationships among terms
• E.g. Equal to, Related to, Opposite of
• Some examples from a hypothetical domain
– Lettuce = Greens = Frisée (a.k.a. a “synonym ring”)
– Coriander is related to Cilantro
• Thesauri can be enormously useful in an enterprise setting
– When different units have different taxonomies and systems need to cross-walk between them
– When the enterprise cannot agree on a common vocabulary
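One common use of a synonym ring is query expansion. The sketch below assumes a hypothetical `SYNONYM_RINGS` structure and `expand_query` helper; it models only the “Equal to” relationship and ignores “Related to” and “Opposite of”.

```python
# Hypothetical synonym rings; each ring lists terms treated as equivalent.
SYNONYM_RINGS = [
    {"lettuce", "greens", "frisée"},
]


def expand_query(term):
    """Return the term plus every synonym that shares a ring with it."""
    key = term.lower()
    expanded = {key}
    for ring in SYNONYM_RINGS:
        if key in ring:
            expanded |= ring
    return sorted(expanded)


print(expand_query("Greens"))
print(expand_query("tomato"))
```

A search for any one term in the ring would then match content tagged with any of the others, which is how a thesaurus can cross-walk between units that use different vocabularies.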
What is an Ontology?
• The formal definition of ontology is “the specification of one's conceptualization of a knowledge domain”
• Semantic technologies are typically centered around ontologies
• Ontologies:
– Resemble faceted taxonomies and often subsume thesauri, but employ richer semantic relationships among terms and attributes
– Apply rules specifying terms and relationships
– Do more than just control vocabulary
– Are a knowledge representation
• Thus, an ontology for salad would contain the structure for how it relates to everything, from ingredients to growers to the rodents that might eat it, and how a salad is different in Japan vs. Italy
Ontology Example
Capturing all the uses of ice cream…
A complete ontology would account for more relationships and properties.
Source: Roz Chast, The New Yorker
What’s a Topic Map?
• A topic map is a visual representation of a knowledge domain
• Topic maps are an ISO standard for the representation and interchange of knowledge, with an emphasis on the findability of information. The standard is formally known as ISO/IEC 13250:2003
• The topics that populate the map are an ontology – topic maps are thus ontology-driven – like the ice cream example
What is a Folksonomy?
• Folksonomy: the anti-controlled vocabulary. Collaborative vocabularies for tagging content, rarely with any sort of control
• Relevance between metadata and content may be determined by users in a democratic fashion
– Four users define an object as being “green”
– One user defines an object as being “aqua”
– Relevance can be defined as “more green than aqua”
• Over time, clusters emerge and communities typically self-organize around them
– “Wisdom of the crowd”
• Typically arise in Web-based communities where individuals share content, then create and use tags (e.g., blogs)
• Applied to enterprise use cases when there is a critical mass of taggers to make it worthwhile
– Can be a useful “bottom-up” approach to developing taxonomies
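The “more green than aqua” idea amounts to counting votes per tag. The following is a minimal sketch, assuming tags arrive as a flat list of user-entered strings; the `tag_relevance` name is hypothetical.

```python
from collections import Counter


def tag_relevance(tags):
    """Rank user-applied tags by how many users chose each one."""
    # Lower-casing is a light normalization assumed here so that
    # "Green" and "green" count as the same tag.
    counts = Counter(tag.strip().lower() for tag in tags)
    return counts.most_common()


# Four users say "green", one says "aqua".
votes = ["green", "Green", "green", "aqua", "green"]
ranking = tag_relevance(votes)
print(ranking)
```

The most frequent tags that emerge from such counts are natural candidates for terms in a taxonomy developed bottom-up.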
Folksonomy Example
Source: flickr.com
The importance of Findability
What is Findability?
Findability is the quality of being locatable or navigable
• At the core of IOA is the findability of information. Information should be easy to discover or locate
• Information access is about helping users find documents that satisfy their information needs
• Remember, someone may be looking for something they’ve never seen or touched before
• Advanced information organization techniques can support findability
– Thesauri, Ontologies, Topic Maps and Semantic Networks
– Faceted search and navigation
Access via Browse
• Browsing is usually the first option for users seeking information or documents
– Desktop and enterprise file systems
– Content management system repositories
– Intranets and Websites
• If users can’t find via browse, then they resort to search
• Some users will go straight to search
– This is partly generational
Effective Browsing
• Browsing effectiveness is highly dependent on
– navigational structure
– folder labeling
– the location of the content
– In short: depends on how organized the content is…
• Content technologies typically use “virtual folders” to represent different classifications
– These allow for multiple paths to the same content
– In contrast, a physical file system forces documents into a single “place”
– Ideally content should be cross-referenced, but not duplicated
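A rough way to picture virtual folders is as sets of references to documents that are each stored once. The repository contents, folder names, and the `classify` and `browse` helpers below are hypothetical, intended only as a sketch of the cross-reference idea.

```python
from collections import defaultdict

# Hypothetical repository: each document is stored once, under one ID.
documents = {"doc-17": "2009 supplier contract"}

# Virtual folders hold document IDs, not copies of the documents.
virtual_folders = defaultdict(set)


def classify(doc_id, *folders):
    """File one stored document under any number of virtual folders."""
    for folder in folders:
        virtual_folders[folder].add(doc_id)


def browse(folder):
    """List the documents reachable through a virtual folder."""
    return [documents[d] for d in sorted(virtual_folders.get(folder, ()))]


classify("doc-17", "Legal/Contracts", "Suppliers/Acme")
print(browse("Legal/Contracts"))
print(browse("Suppliers/Acme"))
```

Both browse paths lead to the same single stored document, so the content is cross-referenced without being duplicated.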
Access via Search
• Search is an application or tool for finding information via search terms
• Search is omnipresent, and essential
– But: there is much ignorance about how search engines work
– Most end-users shouldn’t need to know; they just assume “magic”
• Advanced display techniques can blur the line between search and browse
• Search is not a magic bullet or panacea for a lack of information organization
– Better-organized information will yield more effective search results
How Enterprise Subsystems Work Together
Source: CMS Watch
What Is An Effective Search Result?
• When a user finds what they are seeking
– Or not…
– Seekers may find more than one answer
• Two ways to measure results effectiveness:
– Precision: percentage of all returns in a results set that are relevant to the query
– Recall: percentage of relevant documents that were actually returned in the results set
• Precision and recall are frequently traded off in actual search implementations
– “Tuning” for one can reduce the other…
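The two measures can be worked through on a toy result set. The document IDs below are made up, and the `precision_recall` function is a sketch that returns fractions rather than percentages.

```python
def precision_recall(returned, relevant):
    """Compute precision and recall for one query.

    returned: documents the search engine gave back
    relevant: documents that actually answer the query
    """
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant  # relevant documents that were returned
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents returned, 5 documents truly relevant.
p, r = precision_recall(
    returned=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d2", "d5", "d6", "d7"],
)
print(p, r)
```

Here 2 of the 4 returned documents are relevant (precision 0.5), and 2 of the 5 relevant documents were returned (recall 0.4). Returning more documents would tend to raise recall while lowering precision, which is the trade-off described above.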
The Myriad of Search Choices
• Vendors recognise the importance of search
– Beware of how they push enterprise search as the answer to an organization’s need for a single, unified window into everything the organization knows at any point in time
• The ultimate knowledge management machine simply does not exist: the typical enterprise search system does not contain “all” the organization's content
• Limitations on available information include:
– Security considerations
– Inability to integrate specialized content
– Difficulty reconciling structured and unstructured content
– Cost, time, and difficulty required to incorporate diverse content repositories
Current Trends in Search
As the search sector changes, distinctions among different “flavors” of search technology, features, and functions become more difficult to make.
Source: CMS Watch
Federated Search
The modern enterprise is not a monolith
• Multiple information repositories
• Multiple search engines
• Need to search across information domains from a single query interface
– Federated search approaches are designed to accomplish this
– Sometimes called “meta search”
• Two approaches to federated search
– Use the same search technology across information sets, but create separate indexes and merge results
– Use multiple search technologies, passing query over heterogeneous indexes, and synthesizing multiple result sets (more common)
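The second approach can be sketched as passing one query to several engines and merging what comes back. Everything here is hypothetical: the two engine functions stand in for real search systems, and the sketch assumes their scores are already normalized to a comparable 0 to 1 scale, which real heterogeneous engines generally do not guarantee.

```python
def federated_search(query, engines):
    """Send one query to every engine, then de-dupe, merge, and rank."""
    best = {}
    for engine in engines:
        for doc_id, score in engine(query):
            # De-dupe: keep the highest score seen for each document.
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    # Rank by score (highest first), breaking ties by document ID.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical engines returning (document ID, score) pairs.
def intranet_engine(query):
    return [("policy.doc", 0.9), ("faq.html", 0.4)]


def records_engine(query):
    return [("policy.doc", 0.7), ("contract.pdf", 0.8)]


results = federated_search("travel policy", [intranet_engine, records_engine])
print(results)
```

The sketch leaves out security trimming and the cost of transferring results between systems.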
Federated Search Example
Has seen success on the public web
• No security issues around public info
• Limited set of file types
• Better metadata can improve results merging
Example: “Merlot”
• Meta search engine for education resources on the public web (www.merlot.org)
Challenges of Federated Search
Federated search within the enterprise tends to be much harder
• Multiple indexes mean multiple security systems to resolve
• Different index and query approaches across search systems may skew results
• Often prohibitive performance problems
– Results must be de-duped, transferred, merged, and ranked
The Case for Text Mining
• Enterprises looking for better findability face two vexing challenges:
1. How to yield metadata from large quantities of information?
2. How to turn “search” into more powerful navigation and discovery?
• Text Mining offers one answer
– Text mining is partly a more attractive marketing term for auto-classification – a term that aligns with the concept of “data mining”
– But text mining takes auto-classification one step further through the discovery of more sophisticated patterns in text
– However, there are many different approaches to text mining
– Text mining is sometimes called “text analytics” or “content intelligence”
How Text Mining Works
• Prior to indexing content, information is “discovered” or derived from a corpus of content
– The goal of text mining is to glean information from data, find patterns, and “separate signal from noise”
– It does this by attempting to extract “entities” and “relationships” from text
– Relevant information is usually derived through the divining of patterns and trends
• Text is then parsed (sometimes adding and removing certain pieces of text for the purposes of an index)
• Typical text mining tasks can include auto-classification, clustering, concept/entity extraction, auto-categorization (production of taxonomies), document summarization, and entity relation modeling (i.e., learning relations between entities, as in an ontology)
– Different text mining tools tend to excel at one or just a handful of these approaches
A Clustering Example
Source: cwi.nl & Inxight
A Clustering Example
Source: London Natural History Museum & Inxight
IOA Strategy
IOA as a Practice
IOA as a Project
IOA Master
For more information: AIIM IOA Certificate Program
www.aiim.org/training
Find, Inventory, Analyze Content
Metadata Taxonomy
Ontologies and Topic Maps
Content Modelling
Introduction to Access
Search Techniques
Topics in Findability
User Experience of IOA
Parts of IOA
For more information: AIIM IOA Practitioner
www.aiim.org/training