Building a Foundation for Info Apps

Tom Reamy, Chief Knowledge Architect, KAPS Group (Knowledge Architecture Professional Services); Program Chair, Text Analytics World. http://www.kapsgroup.com


Page 1: Building a Foundation  for Info Apps

Building a Foundation for Info Apps

Tom Reamy
Chief Knowledge Architect

KAPS Group

Program Chair – Text Analytics World

Knowledge Architecture Professional Services

http://www.kapsgroup.com

Page 2: Building a Foundation  for Info Apps

Agenda

Introduction

A Semantic Platform – What and Why

Text Analytics – What and Why
  – Getting Started with Text Analytics

Building on the Platform:
  – Search
  – Range of Apps

Conclusion

Page 3: Building a Foundation  for Info Apps

Introduction: KAPS Group

Knowledge Architecture Professional Services – Network of Consultants

Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies

Services:
  – Strategy – IM & KM – Text Analytics, Social Media, Integration
  – Taxonomy/Text Analytics development, consulting, customization
  – Text Analytics Quick Start – Audit, Evaluation, Pilot
  – Social Media: Text-based applications – design & development

Partners – Smart Logic, Expert Systems, SAS, SAP, IBM, FAST, Concept Searching, Attensity, Clarabridge, Lexalytics

Clients:
  – Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, etc.

Presentations, Articles, White Papers – www.kapsgroup.com

Page 4: Building a Foundation  for Info Apps

Building a Foundation for Info Apps
What is a Semantic Platform?

Semantic Layer = Taxonomies, Metadata, Vocabularies + Text Analytics – adding cognitive science

Technology Layer
  – Search, Content Management, SharePoint, Intranets

Publishing process, multiple users & info needs
  – Hybrid human/automatic structure (tagging)

Infrastructure – Not an Application
  – Business / Library / KM / EA and IT

Building on the Foundation
  – Info Apps (Search-based Applications)

Foundation of foundation – Text Analytics

Page 5: Building a Foundation  for Info Apps

Building a Foundation for Info Apps
Why a Semantic Platform

Search Failed – lack of semantics
  – Results of Findwise survey – deep dissatisfaction
  – Ten years of development = ?

Content Management under-performing – lack of semantics

Taxonomy and Metadata – a solution, but it failed
  – Taxonomy – formal model of a domain
  – Library science good for some things – indexing, etc.

Semantics is about language, meaning, information
  – And structure = taxonomy, plus
  – Need cognitive science – how people think – Text Analytics

Solution = Strategic Vision + Quick Start

Page 6: Building a Foundation  for Info Apps

Building a Foundation for Info Apps
Text Analytics Features

Noun Phrase Extraction / Fact Extraction
  – Catalogs with variants, rule based dynamic
  – Relationships of entities – Ontologies of people-organizations, etc.

Sentiment Analysis – Products and Phrases
  – Statistics, Dictionaries, & rules – Positive and Negative

Summarization – replace snippets

Auto-categorization – built on a taxonomy
  – Training sets, Terms, Semantic Networks
  – Rules: AND, OR, NOT, DIST, PARAGRAPH, SENTENCE
  – Foundation – subjects, disambiguation, add intelligence to all

Ontologies – fact extraction + reasoning about relationships

Text Mining – NLP, machine learning, predictive analytics
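
Illustration of the categorization rule operators listed above (AND, OR, NOT, DIST, PARAGRAPH, SENTENCE). This is a minimal Python sketch, not any vendor's rule syntax: the function names mirror the operator names on the slide, while `TERM`, the tokenizer, and the sample rule and document are assumptions made for the example.

```python
import re

def tokens(text):
    """Lowercase word tokens; a deliberately simple tokenizer (assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def TERM(word):
    """Hypothetical helper: true when the word occurs anywhere in the text."""
    return lambda text: word.lower() in tokens(text)

def AND(*rules):
    return lambda text: all(rule(text) for rule in rules)

def OR(*rules):
    return lambda text: any(rule(text) for rule in rules)

def NOT(rule):
    return lambda text: not rule(text)

def DIST(n, a, b):
    """True when words a and b occur within n tokens of each other."""
    def rule(text):
        toks = tokens(text)
        pos_a = [i for i, t in enumerate(toks) if t == a.lower()]
        pos_b = [i for i, t in enumerate(toks) if t == b.lower()]
        return any(abs(i - j) <= n for i in pos_a for j in pos_b)
    return rule

def SENTENCE(rule):
    """True when the rule holds inside at least one sentence."""
    return lambda text: any(rule(s) for s in re.split(r"(?<=[.!?])\s+", text))

def PARAGRAPH(rule):
    """True when the rule holds inside at least one blank-line-separated paragraph."""
    return lambda text: any(rule(p) for p in re.split(r"\n\s*\n", text))

# Illustrative rule for a made-up taxonomy node (not from the slides).
money_laundering = AND(
    SENTENCE(AND(TERM("wire"), TERM("transfer"))),
    OR(TERM("offshore"), DIST(3, "shell", "company")),
    NOT(TERM("newsletter")),
)

doc = "The wire transfer was routed offshore. A shell holding company received it."
print(money_laundering(doc))
```

Rules of this kind are what make the "balance of general and specific" tunable: loosening a DIST window or dropping a NOT clause widens the category, tightening them narrows it.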

Page 7: Building a Foundation  for Info Apps

Building a Foundation for Info Apps
Adding Structure to Unstructured Content

How do you bridge the gap – taxonomy to documents?

Tagging documents with taxonomy nodes is tough
  – And expensive – central or distributed

Library staff – experts in categorization, not subject matter
  – Too limited, narrow bottleneck
  – Often don’t understand business processes and business uses

Authors – Experts in the subject matter, terrible at categorization
  – Intra and Inter inconsistency, “intertwingleness”
  – Choosing tags from taxonomy – complex task
  – Folksonomy – almost as complex, wildly inconsistent
  – Resistance – not their job, cognitively difficult = non-compliance

Text Analytics is the answer(s)!

Page 8: Building a Foundation  for Info Apps

Building a Foundation for Info Apps
Adding Structure to Unstructured Content

Text Analytics and Taxonomy Together – Platform
  – Text Analytics provides the power to apply the taxonomy
  – And metadata of all kinds
  – Consistent in every dimension, powerful and economic

Hybrid Model
  – Publish Document -> Text Analytics analysis -> suggestions for categorization, entities, metadata -> present to author
  – Cognitive task is simple -> react to a suggestion instead of select from head or a complex taxonomy
  – Feedback – if author overrides -> suggestion for new category
  – Facets – Requires a lot of Metadata – Entity Extraction feeds facets

Hybrid – Automatic is really a spectrum – depends on context
  – Automatic – adding structure at search results
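
The hybrid model above (publish, analyze, suggest, author reacts, feedback on override) can be sketched roughly as follows. Everything concrete here is an assumption for illustration: the two-category taxonomy, the term-overlap scoring, and the names `suggest_categories` and `author_review` are hypothetical and stand in for whatever a real text analytics engine would provide.

```python
import re
from dataclasses import dataclass, field

# Hypothetical taxonomy: category -> trigger terms (illustrative only).
TAXONOMY = {
    "Clinical Trials": {"trial", "placebo", "enrollment"},
    "Regulatory": {"fda", "approval", "submission"},
}

@dataclass
class Review:
    accepted: list = field(default_factory=list)
    # Author overrides are kept as candidates for new taxonomy categories.
    new_category_candidates: list = field(default_factory=list)

def suggest_categories(text):
    """Rank categories by how many of their trigger terms appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: len(terms & words) for cat, terms in TAXONOMY.items()}
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [cat for cat, hits in ranked if hits > 0]

def author_review(suggestions, author_choice=None):
    """The author reacts to suggestions instead of tagging from scratch.

    author_choice=None means the author accepts the suggestions as given.
    A choice outside the suggestions is an override and is fed back.
    """
    review = Review()
    if author_choice is None:
        review.accepted = list(suggestions)
    elif author_choice in suggestions:
        review.accepted = [author_choice]
    else:
        review.accepted = [author_choice]
        review.new_category_candidates.append(author_choice)
    return review

suggestions = suggest_categories("FDA approval of the trial submission")
print(suggestions)
print(author_review(suggestions, "Pharmacovigilance"))
```

The point of the design is that the author's job shrinks to accept or override; the override path is what keeps the taxonomy from going stale.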

Page 9: Building a Foundation  for Info Apps

Quick Start for Text Analytics
Step 1: Start with Self Knowledge

Ideas – Content and Content Structure
  – Map of Content – Tribal language silos
  – Structure – articulate and integrate
  – Taxonomic resources

People – Producers & Consumers
  – Communities, Users, Central Team

Activities – Business processes and procedures
  – Semantics, information needs and behaviors
  – Information Governance Policy

Technology – CMS, Search, portals, text analytics
  – Applications – BI, CI, Semantic Web, Text Mining

Page 10: Building a Foundation  for Info Apps

Quick Start for Text Analytics
Step 2: Software Evaluation: Different Type of Evaluation

Traditional Software Evaluation – Start
  – Filter One – Ask Experts – reputation, research – Gartner, etc.
    • Market strength of vendor, platforms, etc.
    • Feature scorecard – minimum, must have, filter to top 6
  – Filter Two – Technology Filter – match to your overall scope and capabilities – Filter not a focus
  – Filter Three – In-Depth Demo – 3-6 vendors

Reduce to 1-3 vendors

Vendors have different strengths in multiple environments
  – Millions of short, badly typed documents, Build application
  – Library 200 page PDF, enterprise & public search

Essential Step – POC or Pilot – search or first Info App
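
One way to picture the Filter One feature scorecard (must-haves first, then rank and keep the top 6) is the sketch below. The feature names, weights, and vendor labels are invented for the example; the slides do not specify a scoring scheme, so treat this as one plausible reading only.

```python
# Hypothetical must-have features and nice-to-have weights (assumptions).
MUST_HAVE = {"auto_categorization", "entity_extraction"}
WEIGHTS = {"sentiment": 2, "summarization": 1, "ontology_support": 3}

def shortlist(vendors, top_n=6):
    """Drop vendors missing any must-have, then rank the rest by weighted score."""
    qualified = {name: feats for name, feats in vendors.items() if MUST_HAVE <= feats}

    def score(feats):
        return sum(weight for feature, weight in WEIGHTS.items() if feature in feats)

    # Sort by descending score, then by name so ties are deterministic.
    ranked = sorted(qualified, key=lambda name: (-score(qualified[name]), name))
    return ranked[:top_n]

# Placeholder vendors, not real products.
vendors = {
    "Vendor A": {"auto_categorization", "entity_extraction", "sentiment"},
    "Vendor B": {"auto_categorization", "sentiment", "summarization", "ontology_support"},
    "Vendor C": {"auto_categorization", "entity_extraction", "ontology_support", "summarization"},
}
print(shortlist(vendors))
```

A scorecard like this only narrows the field; as the slide notes, the POC or pilot is the step that shows how a tool behaves on your own content.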

Page 11: Building a Foundation  for Info Apps

Quick Start for Text Analytics
Step 3: Proof of Concept / Quick Start

POC – understand how text analytics can work in your environment

Learn the software – internal resources trained by doing

Learn the language – syntax (Advanced Boolean)

Learn categorization and extraction

Good categorization rules
  – Balance of general and specific
  – Balance of recall and precision

Develop or refine taxonomies for categorization

POC – can be the Quick Start or the First Application
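
The "balance of recall and precision" can be made concrete during a POC by scoring each rule against a hand-tagged test set. The sketch below uses the standard definitions (precision = correct tags / all tags the rule applied; recall = correct tags / all documents that should have been tagged); the document IDs and the two example rules are made up.

```python
def precision_recall(predicted, relevant):
    """Return (precision, recall) for one category on a hand-tagged test set."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Documents a human reviewer tagged with the category (illustrative IDs).
relevant = {"d1", "d2", "d3", "d4"}

# A general rule catches everything relevant but also a lot of noise.
broad_rule_hits = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}
# A specific rule is always right but misses half the relevant documents.
narrow_rule_hits = {"d1", "d2"}

print(precision_recall(broad_rule_hits, relevant))   # (0.5, 1.0)
print(precision_recall(narrow_rule_hits, relevant))  # (1.0, 0.5)
```

Neither extreme is "good"; which side to favor depends on the application, for example recall for legal review, precision for search facets.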

Page 12: Building a Foundation  for Info Apps

Development, Implementation
Quick Start – First Application: Search and TA

Simple Subject Taxonomy structure
  – Easy to develop and maintain

Combined with categorization capabilities
  – Added power and intelligence

Combined with people tagging, refining tags

Combined with Faceted Metadata
  – Dynamic selection of simple categories
  – Allow multiple user perspectives
    • Can’t predict all the ways people think
    • Monkey, Banana, Panda

Combined with ontologies and semantic data
  – Multiple applications – Text mining to Search
  – Combine search and browse
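
A small sketch of what "dynamic selection of simple categories" looks like in a faceted interface: each facet shows counts for the current result set, and selecting a value narrows the set from whichever perspective the user prefers. The documents, facet names, and helper functions below are hypothetical.

```python
from collections import Counter

# Illustrative documents; in practice the facet values would come from
# categorization and entity extraction rather than manual entry.
DOCS = [
    {"title": "Q3 pipeline review", "facets": {"department": "Research", "doc_type": "Report", "year": 2012}},
    {"title": "Trial enrollment memo", "facets": {"department": "Clinical", "doc_type": "Memo", "year": 2012}},
    {"title": "Assay validation report", "facets": {"department": "Research", "doc_type": "Report", "year": 2011}},
    {"title": "Site audit memo", "facets": {"department": "Clinical", "doc_type": "Memo", "year": 2011}},
]

def facet_counts(docs, facet):
    """Count how many documents carry each value of the given facet."""
    return Counter(d["facets"][facet] for d in docs if facet in d["facets"])

def refine(docs, **selected):
    """Keep only documents matching every selected facet value."""
    return [d for d in docs
            if all(d["facets"].get(name) == value for name, value in selected.items())]

print(facet_counts(DOCS, "department"))
narrowed = refine(DOCS, department="Research", year=2012)
print([d["title"] for d in narrowed])
```

Because facets are independent, one user can start from department and another from year and still reach the same document, which is the "multiple user perspectives" point.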

Page 13: Building a Foundation  for Info Apps

Building a Foundation for Info Apps
What are Info Apps?

Search-based Applications Plus

E-Discovery, Behavior Prediction, document duplication, BI & CI, etc.

Legal Review
  – Significant trend – computer-assisted review (manual = too many)
  – TA – categorize and filter to smaller, more relevant set
  – Payoff is big – One firm with 1.6 M docs – saved $2M

Expertise Location
  – Data (HR, project) plus text – authored documents – subject & level

Financial Services
  – Combine unstructured text (why) and structured data (what)
  – Anti-Money Laundering

Page 14: Building a Foundation  for Info Apps

Building a Foundation for Info Apps
Pronoun Analysis: Fraud Detection – Enron Emails

Patterns of “Function” words reveal wide range of insights

Function words = pronouns, articles, prepositions, conjunctions, etc.
  – Used at a high rate, short and hard to detect, very social, processed in the brain differently than content words

Areas: sex, age, power-status, personality – individuals and groups

Lying / Fraud detection: Documents with lies have
  – Fewer and shorter words, fewer conjunctions, more positive emotion words
  – More use of “if, any, those, he, she, they, you”, less “I”
  – More social and causal words, more discrepancy words

Current research – 76% accuracy in some contexts

Text Analytics can improve accuracy and utilize new sources
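
As a rough illustration of the function-word signals listed above, the sketch below computes a few per-document rates (word count, average word length, first-person use, and use of the words “if, any, those, he, she, they, you”). It only extracts features; it is not a lie detector, and the first-person word list, the tokenizer, and the sample sentence are assumptions rather than anything taken from the research the slide cites.

```python
import re
from collections import Counter

# Word lists: the second set comes from the slide; the first is an assumption.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
OTHER_REFERENCE = {"if", "any", "those", "he", "she", "they", "you"}

def function_word_profile(text):
    """Compute simple function-word rates for one document."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # avoid division by zero on empty text
    counts = Counter(words)
    return {
        "word_count": len(words),
        "avg_word_length": sum(len(w) for w in words) / total,
        "first_person_rate": sum(counts[w] for w in FIRST_PERSON) / total,
        "other_reference_rate": sum(counts[w] for w in OTHER_REFERENCE) / total,
    }

profile = function_word_profile(
    "I sent my report. They said you could review it if needed."
)
print(profile)
```

In practice such rates would be compared across an author's messages or against a baseline corpus, and combined with other features before drawing any conclusion.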

Page 15: Building a Foundation  for Info Apps

Conclusions

Info Apps are based on search, and search needs help

Text analytics with taxonomy & metadata = semantic platform
  – Formal and informal language and cognition

Semantic Infrastructure
  – Knowledge Audit -> Content, People, Technology, Processes
    • Strategic Vision
  – Integration of text analytics, search, content management
  – Hybrid Model of tagging – best of human & machine
  – Build integrated Info Apps

Platform vs. Apps = Yes

Think Big (Semantics), Build Small, Build Integrated

Page 16: Building a Foundation  for Info Apps

Questions?

Tom Reamy

KAPS Group

Knowledge Architecture Professional Services

http://www.kapsgroup.com