SemTechBiz Semantic Search Tutorial

Semantic Search Tutorial
• Barbara Starr (@BarbaraStarr) – Basics of what semantic search is and what tools and techniques are used
• Bill Slawski (@bill_slawski) – Strategy for SEO: case-based examples and analysis
SemTechBiz 2014 #SemTechBiz


Slides for the Semantic Search Tutorial at #SemTechBiz 2014 by Barbara Starr and Bill Slawski

Transcript of SemTechBiz Semantic Search Tutorial

1. Barbara Starr (@BarbaraStarr): Basics of what semantic search is and what tools and techniques are used. Bill Slawski (@bill_slawski): Strategy for SEO, case-based examples and analysis.

2. Pursued a doctorate in Artificial Intelligence in South Africa in the 80s. Recruited to build intelligent/predictive trading systems on Wall Street. Migrated to government-based contracts, several of which turned into real-world products like Siri (PAL from DARPA) and Watson (Acquaint; IBM Watson Labs was a team member). From the vantage point of a semantic technologist, I keenly watched the evolution of the Semantic Web. Shocked into the real world when working as a consultant @ Overstock: RDFa on 900,000 item pages two days before Google adopted it; UPC and identifier miner. Today: consultant for companies such as GS1 US, columnist, strategist.

3. Primitive UI: hunt and peck.

4. Primarily stochastic in nature.

5. Based on the concept of citations, and very easily gamed. Probabilistic or statistical (not symbolic). Keyword-based search engine (not concept-based or ontology-based). "Link juice"? Other odd vernacular that became standard jargon in the SEO community.

6. Siri. Amazing fact: it takes the same amount of computing to answer one Google Search query as all the computing done in flight and on the ground for the entire Apollo program! Moore's law is the observation that, over the history of computing hardware, the number of transistors in a dense integrated circuit doubles approximately every two years. Source: Wikipedia

7. "A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities." Tim Berners-Lee, James Hendler, Ora Lassila. http://www.cs.umd.edu/~golbeck/LBSC690/SemanticWeb.html

8. What they want; when they want it (now); accurate (reliable and informative); available. Search engines must satisfy consumer needs, else:

9. Def.: Semantic Search is any retrieval method where user intent and resources are represented in a semantic model (a set of concepts or topics that generalize over tokens/phrases, plus additional structure such as a hierarchy among concepts, relationships among concepts, etc.), and semantic representations of the query and the user intent are exploited in some part of the retrieval process. Peter Mika, Sr. Research Scientist, Yahoo Labs, June 19, 2014

10. The inevitable passage of Semantic Web adoption (or some version thereof), culminating in schema.org. http://semanticweb.com/semtech-2011-coverage-the-rdfaseo-wave-how-to-catch-it-and-why_b20458

11. Things, not strings (May 16, 2012): understanding things helps Google understand what things are in the world and what users are searching for. June 2012: Twitter announces Twitter Cards. Pinterest Rich Pins.

12. SearchMonkey (2008); Rich Snippets (2009). Directly extracting on-page metadata to create enhanced displays. Searching directly on consumed metadata. Providing direct answers to queries by searching on consumed, verified and validated information. Aggregating answers or deducing them (like a timeline of events). Exposing more relevant answers in the long tail of search. Assisting in interpreting a user query. Detecting relevancy signals, i.e. what content to show to what audience. Using it in conjunction with machine learning techniques, e.g. to train other components. Tiles. Long tail: peanut butter and jelly in stripes?

13. Search is changing: semantic, predictive, personalised, conversational. Search over documents; search over data. The rise of answer engines (direct answers proliferating). Data quality is imperative. Becoming less like a search engine and more like a personal assistant.

14. Siri, Google Now, Cortana, AiAgents (create your own). Runs cross-platform.

15. Answer box; organic search results; search over data; Knowledge Panel; search over documents.

16. Synonymous with the migration to answer engines and search over data.

17. Crawling & indexing; query interpretation; indexing and ranking; results presentation; indexed information.

18. A means of preprocessing documents to speed up search (serving results in real time).

19. Microsoft has given a fairly concise definition of the entity recognition and disambiguation process: "The objective of an Entity Recognition and Disambiguation system is to recognize mentions of entities in a given text, disambiguate them, and map them to the entities in a given entity collection or knowledge base." In Google's case, that means recognizing entities on web pages or web documents and mapping them back to specific entities in their Knowledge Graph.

20. Implicit entity graph derived/inferred from the text on a web page. Explicit entities obtained from structured markup on a web page. May need to map to external ontologies like schema.org or some other ontology. Technology: NLP or IR / Technology: Semantic Web.

21. Make it search-engine/machine friendly and tell them (explicitly) what things are on your web page. Make it (your information on your website) available to Google (and the major search and social engines); ensure you make it easy for computers to read and discover your stuff. Use schema.org (and/or the preferred vocabulary/ontology of the search or social engine you are optimizing for, e.g. for Facebook use RDFa and Open Graph). Google, Yahoo, Bing, Yandex => schema.org. Pick a markup format (syntax) and stick with it: Microdata, Microformats, RDFa, RDFa Lite, JSON-LD.

22. Recall some of Google's mission/objective statements or goals: "Organizing the world's information to make it universally accessible and useful." "To help with that we have built the Knowledge Graph." "Give an identity to every thing in the world." The Knowledge Graph contains information and entities and their relationships, and helps in resolving ambiguities when processing queries. You can explicitly disambiguate your content by providing a Freebase MID (machine identifier) in your markup.

23. Ref: Google I/O 2013

24. Google+ in enhanced displays and the Knowledge Graph: Authorship, local businesses, Knowledge Carousel.

25. With schema.org (and JSON-LD in this case). Note the sameAs statement: the MID makes it easier to match or reconcile the thing. https://www.youtube.com/watch?v=W9pRpSW_KqA&src_vid=0oOwrBEeQss&feature=iv&annotation_id=annotation_1139520055 Ref: Google I/O 2014

26. The Knowledge Graph powers: rich snippets in events, event listings in Google Maps, notifications in Google Now. https://www.youtube.com/watch?v=XXw8g-FbemI Ref: Google I/O 2014

27. https://www.youtube.com/watch?v=XXw8g-FbemI Ref: Google I/O 2014

28. http://youtu.be/pkrxhefQIBs

29. Rich snippets make your data more visible in search engine results pages. Which would you rather click on? No rich snippets vs. with rich snippets. Lower bounce rate.

30. More visibility in verticals, recipes and images via markup, in search engine results pages and search verticals. Your product is not visible if no color attribute is populated.

31. You want peanut butter and jelly in stripes? Allows unique and interesting content to surface.

32. Google+. Key point (corollary): if you don't exist as an entity, you do not exist in the Knowledge Graph or in search over data. The cost of that: anonymity and irrelevance!

33. Twitter Cards & deep linking; Pinterest Pins; Facebook Open Graph. Drive brand awareness. Diversify revenue sources (reduce dependence on Google). Increase lift and conversions. http://www.socialmediaexaminer.com/rich-pins-on-pinterest/

34. Google's Structured Markup Helper: generates JSON-LD or microdata, for e-mail and web page markup. Data Highlighter: https://support.google.com/webmasters/answer/99170?hl=en&ref_topic=1088472 "Google can present your data more attractively (and in new ways) in search results and in other products such as the Google Knowledge Graph." List provided on schema.rdfs.org, including a WordPress plugin and HTML code: http://schema.rdfs.org/tools.html

35. Make sure to enable Microdata.

36. Microdata Reveal, JSON-LD Sniffer, Semantic Inspector, META SEO inspector, Green Turtle RDFa. List maintained by Aaron Bradley: http://www.seoskeptic.com/structured-data-markup-validation-testing-tools/ Written explanation of the walkthrough: http://searchengineland.com/see-entities-web-page-tools-help-194710 GRUFF

37. AlchemyAPI (with Freebase mappings of entities since July 2013), OpenCalais, Semantic Verses, AYLIEN (launched in Feb 2014; provides mappings to Freebase and schema.org), Smartlogic, Lexalytics, Text-Processing, Stanford's NER, TextRazor.

38. The following information MUST MATCH!

39. Ensure you supply rich, high-quality data, mapped to search filters for maximum visibility. Not visible if no color attribute is populated. Fill in the gaps.

40. Ensure you supply rich, consistent data in any format you submit, and ensure it is validated, verified and fresh. Send consistent signals. Provide global identifiers whenever possible.

41. Rich product information with GTIN.

42. Implicit (content and Bill); also tools I have.

43. "Query logs record the actual usage of search systems and their analysis has proven critical to improving search engine functionality. Yet, despite the deluge of information, query log analysis often suffers from the sparsity of the query space. We propose a new model for query log data called the entity-aware click graph. In this representation, we decompose queries into entities and modifiers, and measure their association with clicked pages. We demonstrate the benefits of this approach on the crucial task of understanding which websites fulfill similar user needs, showing that using this representation we can achieve a higher precision than other query log-based approaches." Measuring website similarity using an entity-aware click graph (2012 publication): Peter Mika, Hugo Zaragoza, Pablo N. Mendes, Roi Blanco. http://dl.acm.org/citation.cfm?id=2398500

44. You need to understand the question in order to answer it. Entity mention queries: common structure to entity mention queries: query = + . Queries that return facts as an answer. What form does the question take? (Question forms): Where was X born? When was X born? Who invented X? Where was X invented? What is the X of Y? Flights from ?x to ?y. Visit old problems/solutions with scale (parameterized queries, form-based queries, query templates, template-based queries). Takeaway: create content that will provide great answers to these kinds of questions (for entities relevant to your audience).

45. Social graphs, interest graphs, mobile social graphs, attraction graphs, engagement graphs, attention graphs, intent graph, user query graph...

46. Takeaway: write engaging content around your audience's interests (find ways, e.g. Big Data, to determine their interests).

47. Anatomy of a Google Search Results Page (Revisited): search over data; search over documents.

48. Image credits:
Slide 3: https://www.flickr.com/photos/67262490@N04/6151466225/
Slide 5: https://www.flickr.com/photos/outsourcetechndu/8241430872/
Slide 9: https://www.flickr.com/photos/drs2biz/197524395/
Slide 3: https://www.flickr.com/photos/106426559@N03/10448641806/
Slide 3: https://www.flickr.com/photos/amynkassam/2866419139/
Slide 5: https://www.flickr.com/photos/legocy/8291983493/in/photolist
Slide 4: https://www.flickr.com/photos/mekz/2389113709/in/photolist