Semantic Technology 2009: Hybrid Approaches to Taxonomy and Folksonomy

Transcript
1. Hybrid Approaches to Taxonomy & Folksonomy

  • Semantic Technology 2009, San Jose, CA, June 17, 2009
  • Richard Beatch, Paul Wlodarczyk
  • Earley & Associates, www.earley.com

2. Agenda

  • The taxonomy/folksonomy debate
  • Tagging pitfalls
  • Social tagging & the enterprise
  • Hybrid approaches to taxonomy/folksonomy
    • Co-existence
    • Tag-influenced taxonomy
    • Taxonomy-influenced tags
    • Tag hierarchies/ontologies
  • Conclusion

Copyright 2009 Earley & Associates Inc. All Rights Reserved

3. About Earley & Associates

  • Founded in 1994, Earley & Associates is an information management (IM) consulting company specializing in
    • Taxonomy development and management
    • Content management strategy
    • Search integration
    • Usability & Information Architecture
  • Some of our recent clients include:
    • American Greetings, Hasbro, Ford Foundation, Astra Zeneca, Motorola, The Hartford Insurance Group, Urban Land Institute
  • Give us your business card
    • For a free pass to one of our Community of Practice conference calls

4. About us

  • Richard Beatch
    • Senior Consultant at Earley & Associates, Inc.
    • Ph.D. in Ontology
    • Specialized in Taxonomy, Search, Metadata, and content architecture.
    • Extensive industry experience leading the implementation and design of taxonomies and search solutions for a range of companies including Apple, McAfee, Allstate, Dell, and AT&T.
    • Blog: http://sethearley.wordpress.com/

5. About us

  • Paul Wlodarczyk
    • Director, Solutions Consulting at Earley & Associates, Inc.
    • MBA with BA in Psychology / Cognitive Science
    • Specialized in unstructured content technologies, with over 20 years' experience in XML / structured authoring, content reuse, ECM, KM, localization, semantic analysis and content enrichment
    • Blogs at http://sethearley.wordpress.com/ and http://thecontentguy.net

6. The tired debate

  • Taxonomy vs. Folksonomy
    • Control vs. Democracy
    • Top-down vs. Bottom-up
    • Arduous process vs. Just do it
    • Accurate vs. Good enough
    • Restrictive vs. Flexible
    • Static vs. Evolving
    • Expensive to maintain vs. Low cost, crowdsourced

7. The relevance problem

  • Search results should be relevant to what a searcher wants, but technology can only determine if it is relevant to a search term*
  • Taxonomies and folksonomies = 2 approaches to the problem of relevance with common goal of describing content, each with particular gaps

*Billy Cripe: Folksonomy, Keywords & Tags: Social & Democratic User Interaction in Enterprise Content Management http://www.oracle.com/technology/products/content-management/pdf/OracleSocialTaggingWhitePaper.pdf

8. Taxonomy

  • Added by a small number of individuals: authors/originators or authorized persons (e.g. librarian)
  • Describes meaning or purpose of content based on a set view point for a specific audience using a controlled vocabulary
  • Relationships between terms defined
    • Hierarchical (e.g. Computer hardware > Keyboard)
    • Associative (e.g. Computer hardware <-> Software)
    • Equivalent (e.g. Laptop = Notebook Computer)
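
The three relationship types above can be sketched as a small data structure. This is a minimal illustration, not a description of any particular taxonomy tool: the `Term` and `Taxonomy` classes and their method names are assumptions made up for this example, and the sample terms are the ones from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One controlled-vocabulary term and its relationships (hypothetical model)."""
    label: str
    broader: set = field(default_factory=set)    # hierarchical parents
    narrower: set = field(default_factory=set)   # hierarchical children
    related: set = field(default_factory=set)    # associative links
    synonyms: set = field(default_factory=set)   # equivalent, non-preferred labels


class Taxonomy:
    def __init__(self):
        self.terms = {}      # preferred label -> Term
        self.preferred = {}  # lower-cased label or synonym -> preferred label

    def add(self, label):
        if label not in self.terms:
            self.terms[label] = Term(label)
            self.preferred[label.lower()] = label
        return self.terms[label]

    def add_hierarchy(self, parent, child):
        self.add(parent).narrower.add(child)
        self.add(child).broader.add(parent)

    def add_association(self, a, b):
        self.add(a).related.add(b)
        self.add(b).related.add(a)

    def add_equivalent(self, label, synonym):
        self.add(label).synonyms.add(synonym)
        self.preferred[synonym.lower()] = label

    def resolve(self, text):
        """Map any known label or synonym to its preferred term, else None."""
        return self.preferred.get(text.strip().lower())


tax = Taxonomy()
tax.add_hierarchy("Computer hardware", "Keyboard")
tax.add_association("Computer hardware", "Software")
tax.add_equivalent("Laptop", "Notebook Computer")
```

With this model, `tax.resolve("notebook computer")` returns the preferred term `"Laptop"`, which is the behavior a free-text tag lacks.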

9. Tags

  • Added by authors and consumers (individual motivation)
  • Can connote any type of meaning or purpose
  • No compression around a single viewpoint, no control of vocabulary
  • Self-correcting through volume

10. Why tagging is so interesting

  • Adding individual value to the act of classification: user control over findability
  • Reducing the cognitive burden (i.e. it's easy)
  • Reduced technological investment (i.e. it's cheap)
  • Can leverage emergent structure (folksonomy)

11. The downside

  • Neither tags nor taggers are perfect
  • No language control
    • Misspellings: Library vs. libary, Plam pilot
    • Compound words: TimBernersLee
    • Case & number: Folksonomy, Folksonomies
    • Personal tags: To read, My dog, @work
    • Single-use tags: Billybobsdog
  • Study: 40% of Flickr tags and 28% of del.icio.us tags were flawed in these ways
    • Guy & Tonkin, 2006.
    • http://www.dlib.org/dlib/january06/guy/01guy.html

12. The downside

  • Varying levels of granularity
  • Same tag, different meanings
  • Lack of relationships between tags: which is broader? Narrower?
  • Lack of consistency/approach to change: even a single user can change language and hamper their own retrieval

E.g. varying granularity: Bird vs. Robin vs. Turdus migratorius. Known as tag noise.

13. The downside

  • Most tag search does not account for stemming, plurals, etc.

E.g. Search on Delicious:

  • Folksonomy: 16049
  • Folksonomies: 4404
  • Both: 2642

14. The tagging hype cycle

http://www.pui.ch/phred/archives/2007/05/tag-history-and-gartners-hype-cycles.html

15. The web vs. the enterprise

  • Shirky: there is no shelf
    • Traditional organization schemes are built to deal with physical collections and constraints.
    • They don't work well on the web
      • large corpus
      • no clear edges
      • no formal categories
      • no authority
  • The enterprise is much more defined
    • smaller corpuses
    • formal entities
    • coordinated users, clear tasks
    • need for reliable retrieval

  • Web (e.g. Flickr, Delicious): social tagging works well in this context
  • Enterprise: social tagging is more of a challenge, needs clear arena

16. Role of folksonomy in the enterprise?

  • Tagging external links
    • Seeing what colleagues are interested in
    • Sharing links with a specific team
    • Subscribing to link feeds
    • Monitoring news/blog coverage of the company
    • Consumer/competitor research
    • Tracking industry trends
  • Tagging internal links
    • Finding/facilitating access to most popular pages on the intranet
    • Seeing what intranet pages mean to staff

17. Role of folksonomy in the enterprise?

  • Social aspects
    • Identifying subject matter experts
    • Connecting people who share interests
    • Encouraging collaboration & resource sharing
  • Improve your taxonomy, information retrieval
    • User tagging to refine the corporate taxonomy
      • New concepts
      • New terminology
    • Seeing what employees find interesting
    • Distributing tagging tasks

18. The downside

  • Potential issues of security, inappropriateness
    • Can implement some level of vetting
  • Privacy concerns
    • Can be anonymous tagging, although this removes some social value
    • Can create role or team-based collections
  • Need higher ratio of active participants due to population size

19. The content continuum

  • Key concept: Not all content is created equally
  • Example content: Message text, External News Reports, Discussion postings, Links, Engineering document repositories, Success Stories, Policies, Approved Methods, Best Practices
  • Content: Unfiltered to Reviewed/Vetted/Approved
  • Tagging/Organizing Processes: Lower Cost to Higher Cost
  • Lower Value to Higher Value

20. What if we blended the two?

  • Folksonomy: Low cost, Flexible, User terminology, Social sharing
  • Taxonomy: Findability, Structured relationships, Oversight, Consistency

21. Hybrid approaches

  • Co-existence
  • Tag-influenced taxonomy
  • Taxonomy-influenced tagging
  • Tag hierarchies/ontologies

22. Co-existence

  • Taxonomy and folksonomy are used side by side
  • Strengths of each approach preserved, philosophy of each kept pure

Web example: Flickr & Library of Congress: http://www.flickr.com/photos/library_of_congress/

23. Co-existence

Ann Arbor District Library

24. Raytheon corporate example

  • Used in Raytheon employee portal - website lists (Suggested sites feature box)
  • How does it work:
    • inserted Suggested Sites in a "feature" box to the right of the regularly ranked results
    • website suggestions (URLs) submitted along with recommended tags/keywords which are subsequently verified and approved by librarians

http://www.slideshare.net/CJMConnors/i-kms-singapore-presentation

25. Variation: Tag mediation

  • Vetting & editing tags
  • Pros:
    • Weeds out potentially inappropriate tags
    • Eliminates misspellings, plural issues, etc.
    • Some can be done automatically (spell-checker, e.g.)
    • Enhances findability
  • Cons:
    • Higher effort/cost
    • Perceived lack of trust
    • Who knows better?
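
The automatic part of tag mediation can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the `approved` list, and the 0.8 similarity cutoff are all hypothetical, and the standard-library `difflib` fuzzy matcher stands in for a real spell-checker. Anything that is not an exact match is flagged for a human reviewer rather than silently rewritten.

```python
import difflib
import re


def split_compound(tag):
    """Split CamelCase compounds such as 'TimBernersLee' into words."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", tag)


def normalize(tag):
    """Lower-case, split compounds, and collapse whitespace."""
    return " ".join(split_compound(tag).lower().split())


def mediate(tag, approved, cutoff=0.8):
    """Return (suggested_tag, needs_review) for one submitted tag.

    approved is a list of already vetted, normalized tags (an assumption
    of this sketch, not something the slides specify).
    """
    clean = normalize(tag)
    if clean in approved:
        return clean, False
    close = difflib.get_close_matches(clean, approved, n=1, cutoff=cutoff)
    if close:
        return close[0], True   # likely misspelling or variant: reviewer confirms
    return clean, True          # unknown tag: reviewer decides


approved = ["library", "folksonomy", "palm pilot"]  # hypothetical vetted tags
```

The design choice here mirrors the pros and cons above: exact matches pass through at no cost, while suggestions still go to a person, which keeps effort low without removing the vetting step.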

26. Tag-influenced taxonomy

  • Taxonomy & tagging co-exist, tags serve as pool of candidate terms to enrich taxonomy, keep it current
    • Find new terminology (synonyms, popular language)
    • Find new concepts
  • Performed as separate processes (taxonomy tagging = formal, tagging = informal) or combined in a single interface

27. Tag-influenced taxonomy

  • Requires formal vetting process
  • Can be supported by automation (e.g. candidate tags pulled & filtered with script to remove taxonomy terms, stop words)
  • Evaluate candidates based on
    • Frequency (literary warrant)
    • Salience within context
  • Look at tags used in conjunction with taxonomy
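
A script of the kind mentioned above might look like the following sketch. The input shape (one list of tags per tagged item), the `min_count` threshold, and the function name are assumptions for illustration. It removes taxonomy terms and stop words, counts how often each remaining tag is used, and records which taxonomy terms it appears alongside.

```python
from collections import Counter, defaultdict


def candidate_terms(tagged_items, taxonomy_terms, stop_words, min_count=2):
    """Return [(tag, frequency, co-occurring taxonomy terms)] for review.

    tagged_items: iterable of tag lists, one list per tagged item.
    """
    known = {t.lower() for t in taxonomy_terms}
    stops = {s.lower() for s in stop_words}
    counts = Counter()
    seen_with = defaultdict(Counter)

    for tags in tagged_items:
        tags = {t.strip().lower() for t in tags}
        anchors = tags & known                 # taxonomy terms on this item
        for tag in tags - known - stops:       # everything else is a candidate
            counts[tag] += 1
            for anchor in anchors:
                seen_with[tag][anchor] += 1

    return [
        (tag, n, sorted(seen_with[tag]))
        for tag, n in counts.most_common()
        if n >= min_count
    ]


# Hypothetical sample data
items = [
    ["laptop", "netbook", "the"],
    ["Laptop", "netbook"],
    ["keyboard", "ergonomic"],
    ["netbook", "toread"],
]
candidates = candidate_terms(items, ["Laptop", "Keyboard"], ["the", "toread"])
```

Frequency is only the first filter. The co-occurring taxonomy terms give the reviewer a hint about where a candidate might belong, but the formal vetting step still decides.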

28. Taxonomy-influenced tagging

  • Presenting choices/suggestions to user from controlled set of terms/tags
    • Sometimes users prefer easy choice
      • Drop-down menus
      • Check boxes
      • Type ahead
      • Tree view
    • Include option to enter own tag? Good source of new terms
    • Enforces consistency
    • Offers structure
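
Type-ahead against a controlled set, with the option to enter a new tag, can be sketched as below. The vocabulary format (preferred term mapped to a list of synonyms) and both function names are assumptions for this example, not the design of any product mentioned in these slides.

```python
def suggest(prefix, vocabulary, limit=5):
    """Return preferred terms whose label or any synonym starts with prefix."""
    p = prefix.strip().lower()
    if not p:
        return []
    hits = []
    for term, synonyms in vocabulary.items():
        labels = [term, *synonyms]
        if any(label.lower().startswith(p) for label in labels):
            hits.append(term)
    return sorted(hits)[:limit]


def choose_tag(text, vocabulary):
    """Return (tag, is_new): map known labels to the preferred term,
    otherwise accept the user's own tag and mark it as a new term."""
    wanted = text.strip().lower()
    for term, synonyms in vocabulary.items():
        if wanted in {term.lower(), *(s.lower() for s in synonyms)}:
            return term, False
    return text.strip(), True


# Hypothetical controlled vocabulary
vocab = {"Laptop": ["Notebook computer"], "Keyboard": [], "Software": []}
```

Typing "note" suggests `Laptop` through its synonym, which enforces consistency, while `choose_tag("netbook", vocab)` returns the tag flagged as new so it can feed the candidate-term pool.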

29. WWW example: ZigTag

  • Defined Tagging
  • Definitions from Wikipedia & WordNet
  • Tagging with type-ahead against database of 3M unique concepts & 8M synonyms

30. Zigtag

  • Type ahead & synonyms encourage consistency
  • Users can enter new tags
  • Synonyms based on Wikipedia, so can be dirty data
  • No hierarchy, only equivalent relationships so far

31. Zigtag search

  • Still get problems with uncontrolled tags & recall
  • Interesting relationships from Wikipedia
  • Browse-able tag cloud

32. Example: myedna (Education.au)

  • Fully taxonomy-directed tagging

http://www.educationau.edu.au/jahia/webdav/site/myjahiasite/shared/papers/tagging_hayman.ppt

33. TextWise Semantic Cloud

  • Document (URL or text) is submitted to web service for semantic analysis
  • Category tags from subset of the ODP taxonomy
  • Concept tags are derived from document, persisted, related to ODP categories

34. Buzzillions.com

  • Review site: tags are controlled not against a taxonomy, but against other tags, which reduces redundancy
  • Only popular tags exposed as faceted navigation

35. SharePoint?

  • Plug-ins make taxonomy easy
  • Present the taxonomy like tags
  • E.g. KWizCom: plug-in manages taxonomy and tags in an easy interface; can opt out of letting users create own tags

36. Taxonomy-influenced tagging

  • Pros:
    • More consistency
    • Better support for findability
    • Relationships, definitions leveraged, adding meaning to the tags
    • Realistic for the enterprise
  • Cons:
    • Not really folksonomy anymore
    • Can be forcing terminology on user
    • Need to develop reference list of concepts manually through taxonomy or need large corpus to derive automatically

37. Tag hierarchies

  • Tag hierarchies come in two flavors:
    • User-powered
    • Automatic derivation

38. User-powered tag hierarchies

  • User-powered
    • Social approach
    • Bogus hierarchies possible
    • Small population will contribute
  • RawSugar tried it
    • (no longer around)
    • Taggers could specify hierarchy in own account, tags clustered based on common groups

39. User-powered tag hierarchies

  • E.g. LibraryThing

LibraryThing allows any user to combine (or uncombine) 2 tags that are semantically equivalent. www.librarything.com

40. User-powered tag hierarchies: Intelligent tags

  • Move toward more semantic tagging with machine-readable tags, e.g. Flickr machine tags in triple format: [namespace]:[key]=[value]
    • geo:neighborhood=SoHo, geo:lat=58.41618, etc.
    • flickr:user=mortimer
    • taxonomy:common=grevyszebra
    • lastfm:event=34640
      • makes your photo appear on a lastfm event page
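
Because machine tags follow a fixed [namespace]:[key]=[value] shape, an application can parse them mechanically. The sketch below shows one way to do that. The regular expression is an assumption about what counts as a valid namespace and key (a letter followed by letters, digits, or underscores), not Flickr's documented rule, and the function names are made up for this example.

```python
import re
from typing import NamedTuple, Optional


class MachineTag(NamedTuple):
    namespace: str
    key: str
    value: str


# Assumed syntax: namespace and key start with a letter; value is free text.
_PATTERN = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$")


def parse_machine_tag(tag: str) -> Optional[MachineTag]:
    """Return a MachineTag for 'namespace:key=value', or None for a plain tag."""
    match = _PATTERN.match(tag.strip())
    return MachineTag(*match.groups()) if match else None


def group_by_namespace(tags):
    """Group parsed machine tags by namespace; plain tags go under None."""
    grouped = {}
    for raw in tags:
        parsed = parse_machine_tag(raw)
        bucket = parsed.namespace if parsed else None
        grouped.setdefault(bucket, []).append(parsed if parsed else raw)
    return grouped


photo_tags = ["geo:neighborhood=SoHo", "geo:lat=58.41618",
              "lastfm:event=34640", "sunset"]
grouped = group_by_namespace(photo_tags)
```

Once parsed, the namespace tells an application which behavior to trigger, for example placing a photo on a map from the `geo` tags or linking it to an event from the `lastfm` tag.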

41. User-powered tag hierarchies: Intelligent tags

  • MOAT (Meaning Of A Tag): part of the linked data movement, mapping tags to the semantic web
    • http://moat-project.org/
  • Adding to the triplet
    • User, resource, tag, meaning
    • Meaning = URI to a resource containing meaning (e.g. DBPedia)
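
The idea of extending the tagging triple with a meaning can be illustrated with a plain data structure. This sketch does not use the MOAT vocabulary or any MOAT software; the `Tagging` type, the sample users and resources, and the choice of DBpedia-style URIs are all illustrative assumptions.

```python
from collections import defaultdict
from typing import NamedTuple


class Tagging(NamedTuple):
    user: str
    resource: str
    tag: str
    meaning: str  # URI of a resource that defines what the tag means


def resources_by_meaning(taggings):
    """Index resources by meaning URI rather than by the tag string."""
    index = defaultdict(set)
    for t in taggings:
        index[t.meaning].add(t.resource)
    return dict(index)


# Hypothetical data: the same tag string with two different meanings
taggings = [
    Tagging("alice", "doc1", "apple", "http://dbpedia.org/resource/Apple_Inc."),
    Tagging("bob", "doc2", "apple", "http://dbpedia.org/resource/Apple"),
    Tagging("carol", "doc3", "mac maker", "http://dbpedia.org/resource/Apple_Inc."),
]
index = resources_by_meaning(taggings)
```

Indexing by meaning separates the two senses of "apple" and also brings together differently worded tags that point to the same concept.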

42. Automatically derived tag hierarchies

  • Tag hierarchies, facets, ontologies, or folksontology
  • Done through statistical/clustering algorithms
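
As a rough illustration of a statistical approach, the sketch below uses a simple co-occurrence subsumption heuristic: tag A is treated as broader than tag B if A appears on most items that carry B, but not the other way around. This is a swapped-in textbook-style heuristic chosen for brevity, not the algorithm used by the projects referenced in these slides, and the `threshold` and `min_count` values are arbitrary assumptions.

```python
from collections import Counter
from itertools import combinations


def derive_hierarchy(tagged_items, threshold=0.8, min_count=2):
    """Return sorted (broader, narrower) pairs inferred from co-occurrence."""
    tag_count = Counter()
    pair_count = Counter()
    for tags in tagged_items:
        unique = sorted(set(tags))
        tag_count.update(unique)
        pair_count.update(combinations(unique, 2))

    edges = []
    for (a, b), both in pair_count.items():
        if tag_count[a] < min_count or tag_count[b] < min_count:
            continue  # too little evidence for either tag
        p_a_given_b = both / tag_count[b]  # share of b-items that also carry a
        p_b_given_a = both / tag_count[a]  # share of a-items that also carry b
        if p_a_given_b >= threshold and p_b_given_a < threshold:
            edges.append((a, b))           # a is broader than b
        elif p_b_given_a >= threshold and p_a_given_b < threshold:
            edges.append((b, a))           # b is broader than a
    return sorted(edges)


# Hypothetical sample data
items = [
    ["bird", "robin"],
    ["bird", "robin"],
    ["bird", "sparrow"],
    ["bird", "sparrow"],
    ["bird"],
    ["robin", "bird", "garden"],
]
edges = derive_hierarchy(items)
```

On this sample the heuristic places `bird` above both `robin` and `sparrow`. With few items or ambiguous tags the inferred relationships are unreliable, so a sketch like this needs volume before it is useful.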

http://www.pui.ch/phred/automated_tag_clustering/

43. Delicious & citeulike hierarchy

http://heymann.stanford.edu/taghierarchy.html

44. Clustering at flickr

45. Auto clustering/facets

  • Still not very mature
  • Time-sensitive
  • Community-sensitive
  • Ambiguous tags
  • Improve with volume (self-correcting)

http://www.pui.ch/phred/automated_tag_clustering/

46. Tag hierarchy pros and cons

  • Pros:
    • Relationships, definitions leveraged, adding meaning to the tags
    • Provides a basis for application behavior in the absence of taxonomy (e.g. Flickr maps, clusters)
    • Self-correcting with volume
  • Cons:
    • Automatically derived relationships (clusters) can be bogus or time-sensitive
    • Folksonomic relationships can be esoteric (just like tags)
    • Small population of contributors

47. Conclusion

  • Not all content is created equal: tags and taxonomies have their sweet spots
  • Hybrid approaches are emerging
    • taxonomy-influenced tagging leading the pack in popularity on the web
    • co-existence in the enterprise
  • Look for more developments on the semantic web/linked data front for making tags more intelligent

48. Questions?

  • Richard Beatch [email_address]
  • Paul Wlodarczyk [email_address]
  • Web: www.earley.com
  • Blog: sethearley.wordpress.com
  • Twitter: earleytaxonomy

Give us your business card for a free pass to one of our Community of Practice conference calls (a $50 value).

49. Appendix: Corporate social tagging tools

50. Corporate social tagging software

http://www.connectbeam.com/

51. Corporate social tagging software

http://www.cogenz.com/

52. Corporate social tagging software

http://www-306.ibm.com/software/lotus/products/connections/dogear.html

53. Corporate social tagging software

  • BEA AquaLogic Pathways
      • http://www.bea.com/framework.jsp?CNT=index.jsp&FP=/content/products/aqualogic/pathways/

54. Corporate social tagging software

http://www.newsgator.com/business/socialsites/default.aspx