Semantic Infrastructure Workshop Applications
Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Agenda
Search and Semantic Infrastructure
– Elements / Rich Dynamic Results
– Different Environments
– Design Issues
Platform for Information Applications
– Multiple Applications
– Case Study – Categorization & Sentiment
– Case Study – Taxonomy Development
– Case Study – Expertise & Sentiment & Beyond
Conclusions
A Semantic Infrastructure Approach to Search: Elements
Multiple Knowledge Structures
– Facet – orthogonal dimension of metadata
– Taxonomy – Subject matter / aboutness
– Ontology – Relationships / Facts
• Subject – Verb – Object
Software – Search, ECM, auto-categorization, entity extraction, Text Analytics and Text Mining
People – tagging, evaluating tags, fine tune rules and taxonomy
People – Users, social tagging, suggestions
Rich Search Results – context and conversation
A Semantic Infrastructure Approach to Search: Rich Results
Elements
– Faceted Navigation
– Categorization – metadata and/or dynamic
– Tag Clouds – clustering
– User Tags, personalization
– Related topics – discovery
Supports all manner of search behaviors and needs
– Find known items – zero in with facets
– Discovery – Tag clouds, user tags, related topics
– Deep dive – categorization
A Semantic Infrastructure Approach to Search: Three Environments
E-Commerce
– Catalogs, small uniform collections of entities
– Conflict of information and Selling
– Uniform behavior – buy this
Enterprise
– More content, more types of content
– Enterprise Tools – Search, ECM
– Publishing Process – tagging, metadata standards
Internet
– Wildly different amount and type of content, no taggers
– General Purpose – Flickr, Yahoo
– Vertical Portal – selected content, no taggers
A Semantic Infrastructure Approach to Search: Enterprise Environment – Taxonomy, 7 facets
Taxonomy of Subjects / Disciplines:
– Science > Marine Science > Marine microbiology > Marine toxins
Facets:
– Organization > Division > Group
– Clients > Federal > EPA
– Instruments > Environmental Testing > Ocean Analysis > Vehicle
– Facilities > Division > Location > Building X
– Methods > Social > Population Study
– Materials > Compounds > Chemicals
– Content Type – Knowledge Asset > Proposals
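A minimal sketch of how hierarchical facets like these could drive filtering, assuming each document carries one path per facet. The Document class, the sample documents, and the prefix-matching rule are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    # Each facet maps to a path from broad to narrow, e.g. ("Federal", "EPA").
    facets: dict = field(default_factory=dict)


def matches(doc, selection):
    """True if, for every selected facet, the document's path starts with the selected path."""
    for facet, wanted in selection.items():
        path = doc.facets.get(facet, ())
        if path[:len(wanted)] != wanted:
            return False
    return True


def filter_by_facets(docs, selection):
    """Keep only the documents that satisfy every selected facet value."""
    return [d for d in docs if matches(d, selection)]


# Hypothetical documents tagged with paths from the taxonomy and facets above.
docs = [
    Document("Ocean toxin proposal", {
        "Subject": ("Science", "Marine Science", "Marine microbiology", "Marine toxins"),
        "Clients": ("Federal", "EPA"),
        "Content Type": ("Knowledge Asset", "Proposals"),
    }),
    Document("Population study report", {
        "Clients": ("Federal", "EPA"),
        "Methods": ("Social", "Population Study"),
    }),
    Document("Chemical inventory", {
        "Materials": ("Compounds", "Chemicals"),
    }),
]

hits = filter_by_facets(docs, {"Clients": ("Federal",), "Subject": ("Science", "Marine Science")})
print([d.title for d in hits])
```

Selecting a broad value such as ("Federal",) keeps every document filed anywhere beneath it, which is one simple way to let users browse a topic and then narrow by facet.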
A Semantic Infrastructure Approach to Search: Internet Design
Subject Matter taxonomy – Business Topics
– Finance > Currency > Exchange Rates
Facets
– Location > Western World > United States
– People – Alphabetical and/or Topical – Organization
– Organization > Corporation > Car Manufacturing > Ford
– Date – Absolute or range (1-1-01 to 1-1-08, last 30 days)
– Publisher – Alphabetical and/or Topical – Organization
– Content Type – list – newspapers, financial reports, etc.
Rich Search Results: Design Issues – General
What is the right combination of elements?
– Faceted navigation, metadata, browse, search, categorized search results, file plan
What is the right balance of elements?
– Dominant dimension or equal facets
– Browse topics and filter by facet
When to combine search, topics, and facets?
– Search first and then filter by topics / facet
– Browse/facet front end with a search box
Rich Search Results: Design Issues – General
Homogeneity of Audience and Content
Model of the Domain – broad
– How many facets do you need?
– More facets and let users decide
– Allow for customization – can’t define a single set
User Analysis – tasks, labeling, communities
• Issue – labels that people use to describe their business and labels that they use to find information
Match the structure to domain and task
– Users can understand different structures
Rich Search Results: Automatic Facets – Special Issues
Scale requires more automated solutions
– More sophisticated rules
Rules to find and populate existing metadata
– Variety of types of existing metadata – Publisher, title, date
– Multiple implementation standards – Last Name, First / First Name, Last
Issue of disambiguation:
– Same person, different name – Henry Ford, Mr. Ford, Henry X. Ford
– Same word, different entity – Ford and Ford
Number of entities and thresholds per results set / document
– Usability, audience needs
Relevance Ranking – number of entities, rank of facets
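The "same person, different name" issue can be illustrated with a small normalization heuristic. This is a sketch under stated assumptions: the honorific list and the matching rule (same surname, compatible first name) are hypothetical, and it does not address the "same word, different entity" case (Ford the person versus Ford the company), which needs context from the surrounding text.

```python
# Hypothetical honorific list; a production rule set would be much larger.
HONORIFICS = {"mr", "mrs", "ms", "dr"}


def normalize_name(raw):
    """Return (first, last) in lowercase; first is None when only a surname is given."""
    raw = raw.strip()
    if "," in raw:
        # "Last Name, First" convention: flip it to "First Last".
        last, first = [part.strip() for part in raw.split(",", 1)]
        raw = f"{first} {last}"
    tokens = [t.strip(".").lower() for t in raw.split()]
    tokens = [t for t in tokens if t and t not in HONORIFICS]
    if not tokens:
        return (None, None)
    first = tokens[0] if len(tokens) > 1 else None
    return (first, tokens[-1])


def same_person(a, b):
    """Heuristic: surnames match and first names do not conflict."""
    first_a, last_a = normalize_name(a)
    first_b, last_b = normalize_name(b)
    if last_a is None or last_a != last_b:
        return False
    return first_a is None or first_b is None or first_a == first_b


for variant in ["Henry Ford", "Mr. Ford", "Henry X. Ford", "Ford, Henry"]:
    print(variant, "->", normalize_name(variant))
```

Treating a bare "Mr. Ford" as compatible with any Ford is a deliberate simplification; in practice a threshold or document-level context would decide whether to merge.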
Semantic Infrastructure for Search Based Apps: Multiple Applications
Platform for Information Applications
– Content Aggregation
– Duplicate Documents – save millions!
– Text Mining – BI, CI – sentiment analysis
– Combine with Data Mining – disease symptoms, new
• Predictive Analytics
– Social – Hybrid folksonomy / taxonomy / auto-metadata
– Social – expertise, categorize tweets and blogs, reputation
– Ontology – travel assistant – SIRI
Use your Imagination!
Semantic Infrastructure for Search Apps: Multiple Applications
SIRI – Travel Assistant
Semantic Infrastructure for Search Apps: Case Study – Categorization & Sentiment
Call Motivation
– Categorization – Motivation Taxonomy
– Purpose of previous calls to understand current call
– Issues of scale, small size of documents, jargon, spelling
Customer Sentiment
– Telecom Forums
– Feature level – not just products
– Issue of context – sarcasm, jargon
Knowledge Base
– Categorization, Product extraction, expertise-sentiment analysis
– Social Media as source for solutions
Case Study – Categorization & Sentiment
Sentiment Analysis: Development Process
Combination of Statistical and categorization rules
Start with Training sets – examples of positive, negative, neutral documents
Develop a Statistical Model
Generate domain positive and negative words and phrases
Develop a taxonomy of Products & Features
Develop rules for positive and negative statements
Test and Refine
Test and Refine again
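The rule-based half of this process can be sketched as follows: domain word lists, a small feature taxonomy, and a negation rule, scored per sentence and attributed to the features mentioned. All word lists and feature names here are hypothetical placeholders, and the statistical model trained on example documents is not shown.

```python
import re
from collections import defaultdict

# Hypothetical domain vocabulary; real lists would be generated from training sets.
POSITIVE = {"great", "fast", "love", "reliable"}
NEGATIVE = {"slow", "dropped", "terrible", "expensive"}
NEGATORS = {"not", "never", "no"}
# Hypothetical feature taxonomy: trigger word -> feature label.
FEATURES = {"battery": "Battery", "coverage": "Network coverage", "bill": "Billing"}


def sentence_score(tokens):
    """Sum word polarities, flipping a word's polarity if a negator appears in the two tokens before it."""
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        if polarity and any(t in NEGATORS for t in tokens[max(0, i - 2):i]):
            polarity = -polarity
        score += polarity
    return score


def feature_sentiment(text):
    """Attribute each sentence's score to every feature mentioned in that sentence."""
    totals = defaultdict(int)
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = re.findall(r"[a-z']+", sentence)
        score = sentence_score(tokens)
        for tok in tokens:
            if tok in FEATURES:
                totals[FEATURES[tok]] += score
    return dict(totals)


print(feature_sentiment("The battery is great. Coverage is not reliable and my bill is expensive!"))
```

Sentence-level attribution is crude: a sentence that praises one feature and criticizes another gives both the same score, which is one reason the process calls for repeated testing and refinement.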
Semantic Infrastructure for Search Apps: Case Study – Taxonomy Development
Problem – 200,000 new uncategorized documents
Old taxonomy – need one that reflects change in corpus
Text mining, entity extraction, categorization
Content – 250,000 large documents, search logs, etc.
Bottom Up – terms in documents – frequency, date
Clustering – suggested categories
Clustering – chunking for editors
Entity Extraction – people, organizations, programming languages
Time savings – only feasible way to scan documents
Quality – important terms, co-occurring terms
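The bottom-up step (frequent terms and co-occurring terms as candidate taxonomy nodes) can be sketched with simple counting. The stopword list, tokenizer, and sample documents are illustrative assumptions; the tooling actually used in the case study is not described in the slides.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny hypothetical stopword list, just enough for the example.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "on", "with"}


def terms(doc):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", doc.lower()) if t not in STOPWORDS]


def candidate_terms(docs, top_n=5):
    """Document frequency: in how many documents each term appears."""
    df = Counter()
    for doc in docs:
        df.update(set(terms(doc)))
    return df.most_common(top_n)


def cooccurring_pairs(docs, top_n=5):
    """Count pairs of distinct terms that appear in the same document."""
    pairs = Counter()
    for doc in docs:
        pairs.update(combinations(sorted(set(terms(doc))), 2))
    return pairs.most_common(top_n)


docs = [
    "Java garbage collection tuning",
    "Tuning garbage collection in Java servers",
    "Python packaging for servers",
]
print(candidate_terms(docs))
print(cooccurring_pairs(docs))
```

Pairs that recur across documents, such as "garbage" with "collection", are the kind of signal an editor might review as a suggested category or phrase.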
Case Study – Taxonomy Development
Semantic Infrastructure Applications: Expertise Analysis
Sentiment Analysis to Expertise Analysis (KnowHow)
– Know How, skills, “tacit” knowledge
No single correct categorization
– Women, Fire, and Dangerous Things
– Types of Animals
• Those that belong to the Emperor
• Embalmed Ones
• Suckling Pigs
• Fabulous Ones
• Those that are included in this classification
• Those that tremble as if they were mad
• Other
Semantic Infrastructure Applications: Expertise Analysis – Basic Level Categories
Mid-level in a taxonomy / hierarchy
Short and easy words
Maximum distinctness and expressiveness
First level named and understood by children
Level at which most of our knowledge is organized
Levels: Superordinate – Basic – Subordinate
– Mammal – Dog – Golden Retriever
– Furniture – Chair – Kitchen chair
Semantic Infrastructure Applications: Expertise Analysis
Experts prefer lower, subordinate levels
– In their domain, they (almost) never use superordinate
Novices prefer higher, superordinate levels
General populace prefers basic level
Not just individuals but whole societies / communities differ in their preferred levels
Issue – artificial languages – e.g. science disciplines
Issue – difference between child and adult learning – adults start with high level
Semantic Infrastructure Applications: Expertise Analysis
Which level is basic depends on context(s)
– Document/author expert in news health care, not research
Hybrid – simple high level taxonomy (superordinate), short words – basic, longer words – expert
Plus: develop expertise rules – similar to categorization rules
– Use basic level for subject
– Superordinate for general, subordinate for expert
Also contextual rules
– “Tests” is general, high level
– “Predictive value of tests” is lower, more expert
– If terms appear in same sentence – expert
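The expertise rules above can be sketched as term lists plus one contextual rule that fires when a group of terms shares a sentence. The lists borrow a few words from the healthcare term slide; the scoring scheme, the "mixed" label, and the function names are hypothetical.

```python
import re

# Small illustrative term lists; real lists would be far longer and domain specific.
EXPERT_TERMS = {"pharmacokinetic", "etiology", "toxicity", "metabolite"}
GENERAL_TERMS = {"cancer", "hospital", "smoking", "tests"}
# Contextual rule: these terms signal expertise only when they share a sentence.
EXPERT_COMBINATIONS = [("predictive", "value", "tests")]


def sentence_level(sentence):
    """Positive score leans expert, negative leans general."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    expert = len(tokens & EXPERT_TERMS)
    general = len(tokens & GENERAL_TERMS)
    for combo in EXPERT_COMBINATIONS:
        if tokens.issuperset(combo):
            expert += 1
            # Terms inside a matched combination no longer count as general.
            general -= len(GENERAL_TERMS.intersection(combo))
    return expert - general


def classify(text):
    """Label a text by summing its sentence scores."""
    score = sum(sentence_level(s) for s in re.split(r"[.!?]+", text))
    if score > 0:
        return "expert"
    if score < 0:
        return "general"
    return "mixed"


print(classify("The hospital runs tests for cancer."))
print(classify("We report the predictive value of tests and the etiology of toxicity."))
```

On its own "tests" counts toward the general score, but inside "predictive value of tests" it contributes to the expert score instead, mirroring the contextual rule on the slide.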
Education Terms
Expert                                      General
Research (context dependent)                Kid
Statistical                                 Pay
Program performance                         Classroom
Protocol                                    Fail
Adolescent Attitudes                        Attendance
Key academic outcomes                       School year
Job training program                        Closing
American Educational Research Association   Counselor
Graduate management education               Discipline
Healthcare Terms
Expert                  General
Mouse                   Cancer
Dose                    Scientific
Toxicity                Physical
Diagnostic              Consumer
Mammography             Cigarette
Sampling                Smoking
Inhibitor               Weight gain
Edema                   Correct
Neoplasms               Empirical
Isotretinoin            Drinking
Ethylene                Testing
Significantly           Lesson
Population-based        Knowledge
Pharmacokinetic         Medicine
Metabolite              Sociology
Polymorphism            Theory
Subsyndromic            Experience
Radionuclide            Services
Etiology                Hospital
Oxidase                 Social
Captopril               Domestic
Pharmacological agents
Dermatotoxicity
Mammary cancer model
Biosynthesis
Expertise Analysis: Expertise – Application Areas
Taxonomy / Ontology development / design – audience focus
– Card sorting – non-experts use superficial similarities
Business & Customer intelligence – add expertise to sentiment
– Deeper research into communities, customers
Text Mining – Expertise characterization of writer, corpus
eCommerce – Organization/Presentation of information – expert, novice
Expertise location – Generate automatic expertise characterization based on documents
Experiments – Pronoun Analysis – personality types
– Essay Evaluation Software – Apply to expertise characterization
• Model levels of chunking, procedure words over content
Beyond Sentiment: Behavior Prediction
Case Study – Telecom Customer Service
Problem – distinguish customers likely to cancel from mere threats
Analyze customer support notes
General issues – creative spelling, second hand reports
Develop categorization rules
– First – distinguish cancellation calls – not simple
– Second – distinguish cancel what – one line or all
– Third – distinguish real threats
Beyond Sentiment: Behavior Prediction – Case Study
Basic Rule
– (START_20, (AND,
– (DIST_7, "[cancel]", "[cancel-what-cust]"),
– (NOT, (DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]")))))
Examples:
– customer called to say he will cancell his account if the does not stop receiving a call from the ad agency.
– cci and is upset that he has the asl charge and wants it off or her is going to cancel his act
– ask about the contract expiration date as she wanted to cxl teh acct
Combine sophisticated rules with sentiment statistical training and Predictive Analytics
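One possible reading of the basic rule, written out in Python. The operator semantics are assumptions, since the slides do not define them: START_20 is taken to mean a cancel term must occur within the first 20 tokens, and DIST_n to mean two concepts occur within n tokens of each other. The concept word lists behind the bracketed names are also hypothetical.

```python
import re

# Hypothetical concept lists; the real lists behind "[cancel]" etc. are not given in the slides.
CONCEPTS = {
    "cancel": {"cancel", "cancell", "cxl"},
    "cancel-what-cust": {"account", "acct", "act", "service"},
    "one-line": {"line"},
    "restore": {"restore"},
    "if": {"if"},
}


def positions(tokens, concept):
    """Token indexes at which any term of the concept occurs."""
    return [i for i, t in enumerate(tokens) if t in CONCEPTS[concept]]


def dist(tokens, a, b, n):
    """True if some term of concept a lies within n tokens of some term of concept b."""
    return any(abs(i - j) <= n for i in positions(tokens, a) for j in positions(tokens, b))


def likely_cancel(note):
    tokens = re.findall(r"[a-z]+", note.lower())
    # START_20 (assumed meaning): a cancel term must occur within the first 20 tokens.
    if not any(i < 20 for i in positions(tokens, "cancel")):
        return False
    # DIST_7: a cancel term near what the customer wants to cancel.
    near_target = dist(tokens, "cancel", "cancel-what-cust", 7)
    # NOT DIST_10: no single-line, restore, or conditional wording near the cancel term.
    hedged = any(dist(tokens, "cancel", c, 10) for c in ("one-line", "restore", "if"))
    return near_target and not hedged


notes = [
    "customer called to say he will cancell his account if the does not stop receiving a call from the ad agency.",
    "cci and is upset that he has the asl charge and wants it off or her is going to cancel his act",
    "ask about the contract expiration date as she wanted to cxl teh acct",
]
for note in notes:
    print(likely_cancel(note), "|", note)
```

Under these assumed word lists the sketch rejects the first note because of the nearby "if" and accepts the other two. That is the behavior of this sketch only, not a result reported in the case study.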
Beyond Sentiment – Wisdom of Crowds: Crowd Sourcing Technical Support
Example – Android User Forum
Develop a taxonomy of products, features, problem areas
Develop Categorization Rules:
– “I use the SDK method and it isn't to bad a all. I'll get some pics up later, I am still trying to get the time to update from fresh 1.0 to 1.1.”
– Find product & feature – forum structure
– Find problem areas in response, nearby text for solution
Automatic – simply expose lists of “solutions”
– Search Based application
Human mediated – experts scan and clean up solutions
Semantic Infrastructure: A Platform for KM Applications
Expertise Location – Individuals and Communities
Knowledge Sharing – Communities of Practice
– Find right person better
– Knowledge representation to support better sharing
– Enhance sharing as well as substitute for person
Knowledge Base / Portal
– Greatly improved – find what you are looking for
– New kinds of presentations – rich search to dynamic graphs
Process – deliver rich knowledge representation in work flow – SIRI+
Text Analytics: Future Directions
Start with the 80% of significant content that is not data
– Enterprise search, content management, Search based applications
Text Analytics and Text Mining
– Text Analytics turns text into data – Build better TM Apps
– Better extraction and add Subject / Concepts
– Sentiment and Beyond – Behavior, Expertise
Text Mining and Text Analytics
– TM enriching TA – Taxonomy development
– New Content Structures, ensemble models
Text Analytics and Predictive Analytics
– More content, New content – social, interactive – CSR
– New sources of content/data = new & better apps
Semantic Infrastructure Approach: Conclusions
Semantic Infrastructure solution (people, policy, technology, semantics) and feedback is the best approach
Foundation – Hybrid ECM model with text analytics, Search
Integrated information, knowledge, and semantics
Semantic Infrastructure as a platform for multiple applications
– Build on infrastructure for economy and quality
Text Analytics (entity extraction, auto-categorization, sentiment analysis) is essential
Future – new kinds of applications:
– Text Mining and Data mining, research tools, sentiment
– Beyond Sentiment – expertise applications
– NeuroAnalytics – cognitive science meets search and more
• Watson is just the start
Questions?
Tom Reamy
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Resources
Books
– Women, Fire, and Dangerous Things
• George Lakoff
– Knowledge, Concepts, and Categories
• Koen Lamberts and David Shanks
– Formal Approaches in Categorization
• Ed. Emmanuel Pothos and Andy Wills
– The Mind
• Ed. John Brockman
• Good introduction to a variety of cognitive science theories, issues, and new ideas
– Any cognitive science book written after 2009
Resources
Conferences – Web Sites
– Text Analytics World
– http://www.textanalyticsworld.com
– Text Analytics Summit
– http://www.textanalyticsnews.com
– Semtech
– http://www.semanticweb.com
Resources
Blogs
– SAS – http://blogs.sas.com/text-mining/
Web Sites
– Taxonomy Community of Practice: http://finance.groups.yahoo.com/group/TaxoCoP/
– LinkedIn – Text Analytics Summit Group
– http://www.LinkedIn.com
– Whitepaper – CM and Text Analytics – http://www.textanalyticsnews.com/usa/contentmanagementmeetstextanalytics.pdf
– Whitepaper – Enterprise Content Categorization strategy and development – http://www.kapsgroup.com
Resources
Articles
– Malt, B. C. 1995. Category coherence in cross-cultural perspective. Cognitive Psychology 29, 85-148
– Rifkin, A. 1985. Evidence for a basic level in event taxonomies. Memory & Cognition 13, 538-56
– Shaver, P., J. Schwartz, D. Kirson, C. O’Connor 1987. Emotion knowledge: further exploration of a prototype approach. Journal of Personality and Social Psychology 52, 1061-1086
– Tanaka, J. W. & M. E. Taylor 1991. Object categories and expertise: is the basic level in the eye of the beholder? Cognitive Psychology 23, 457-82