1. Knowledge graph panel (to share) - ISWC...
Transcript of 1. Knowledge graph panel (to share) - ISWC...
Enterprise-scaleknowledge graphs
● Yuqing Gao, Microsoft● Anant Narayanan, Facebook● Alan Patterson, eBay● Jamie Taylor, Google● Anshu Jain, IBM
Panel: Enterprise-scale knowledge graphs
Knowledge Systems
Yuqing GaoAI and ResearchMicrosoft
Knowledge systems: World graph
Web pages, Web documents, Images, …
Search queries, views, click throughs, …
World graph• People• Places• Organizations• Things• Actions…
Academic graph• People• Publications• Fields of Study• Venues…
Knowledge systems: Academic graph
Authors, institutions, articles, conferences …
Knowledge acquisition, search, recommendation …
Work graph• People• Groups• Messages• Activities
…
Knowledge systems: Work graph
Emails, Messages, Documents, Meetings, …
Messages read/sent, Document author/shared, …
Infusing knowledge: From search to conversation
Search
Q&A
Enrichment Recommendations
Conversation
Relatedness
Bing – knowledge in answers
Knowledge in Conversation
I want to travel to NY 2 days before Thanksgiving, staying for a week
Okay, booking a flight to JFK from November 20 to November 27. Where will you be flying from?
Got it, I’ve found some flights for you …
From San Francisco, and also non-stop in first class
How about leaving in the afternoon
NY
JFKairport
Thanksgiving
SF
SFOairport
Nov 22
travel
NY
Thanksgiving
week
San Francisco
non-stop first class
leaving afternoon
FlightSearch
Delta2648
Filter Set
…
…
How do we bring knowledge systems to life?
Knowledge creation
Raw DataStructured + Unstructured
High-qualitySemanticallyOrganized
Challenges of scaled KGsBuilding a small KG is easy - building a vast system like Satori is a huge challenge
Coverage
Freshness
Have we got the information we need?
CorrectnessIs our information accurate?
Is information up to date?
Will Smith: Single entity, 108K facts assembled from 41 web sites.There are 200 Will Smiths on Wikipedia alone.
Three forces in constant conflict:
Increased freshness and coverage
Harder to ensure correctness
Increased correctness Harder to ensure freshness and coverage
Correctness is always hard – what is true and correct? Particularly critical in today’s world
Active research and product efforts in knowledge
• Unsupervised knowledge extraction from unstructured data in open domain
• Knowledge graph semantic embedding
• Autonomous knowledge inference & verification
• Real-time knowledge graph with archiving
• Large scale entity linking and disambiguation
• Ultra-scale knowledge representations
• Knowledge system for multi-lingual
• Knowledge Precision vs Comprehensiveness
• …
Thank you!
Facebook is known for the world’s largest social graph spanning billions of entities and trillions of assertions.
We’ve built technology over the last decade to enable rich connections between users and we’re eager to apply it to build a deeper understanding of not just people, but also the things that people care about.
Why a Knowledge Graph for FB?
The most socially relevant entities, concepts and words.
People, CelebritiesPlaces, Points of InterestMovies, TV, MusicSports
What do we put in it?
Deliberately focused and small to start with, US only.
~500M assertions~50M entities~100s of types
How large is it?
Attribute: adventurous, casual, sustainable, trendyDish: coffee and tea, bread, drink, parfait, cake, belgian waffle, gingerbread, liege waffle, turkey sandwich, fresh-squeezed lemonade, bacon waffleFeatures: Credit cards, Takeout, Wifi, -Parking, -ADAMeals: Breakfast, LunchSuggestions: liege waffle, lemonadeTelephone: (555) 987-1234Hours: { … }Website: http://www.heidiswafflehouse.com
Knowledge of the worldPeople, Places, Things
Heidis Waffle House
Use it to provide utility...Recommend smart
repliesEasy sharingEntity Detection
Memory
… and delight the user!
Memorabilia & Collectables
ParisFranceEuropeWartime
Michael JordanBasketballChicago Bulls
Pipeline
Graph - Products, properties, value types, fitment, relationships, standards, variations, people, places, brands, companies, events, dates.
Statistical Data Mining - Discovery (over billions of listings)
Buyer & Seller Data - Queries, clicks & conversions, seller descriptions
Knowledge Matching and Semantic Verification
Graph stats - Expect around 100M products, >1Bn triples
Problems
Biggest Problem: Identity Are these listings the same product
Identity depends on who’s asking
Buyer and Seller have different requirements
Proprietary + Confidential
Knowledge Graph
Jamie TaylorGoogle - Knowledge Graph Schema Team
ISWC 2018
Knowledge Graph
1 Billion Things
70 Billion Facts
What is it used for
Challenges
Challenges
What is personally challengingLong evolutionBigger than my brainSo many stakeholders
Challenges
What is personally challengingLong evolutionBigger than my brainSo many stakeholders
Wonderful Problems!
Challenges
What is personally challengingLong evolutionBigger than my brainSo many stakeholders
Survival Mechanism:A Principled Approach
Managing Identity
Managing Identity
Santos FC Before Pelé
Sailing
Cuba
Managing Identity
Santos FC Before Pelé
Sailing
Cuba
Maintaining Semantic Stability
E-Sports & Sports
Movies & Television
Broadcast, Podcast & Streaming
Managing Identity
Santos FC Before Pelé
Sailing
Cuba
Maintaining Semantic Stability
Knowledge Graph @ Watson - Context
Watson Jeopardy: Fact based QA on Wikipedia and other sourceshttps://en.wikipedia.org/wiki/Watson_(computer)
Strategic IP Insight Platform - SIIP: Interactive exploration & discovery on patents & journalshttps://www-935.ibm.com/services/us/gbs/bao/siip/, http://time.com/3208716/ibm-watson-cancer/
Watson Discovery Advisor: OnPrem QA and interactive exploration based discovery https://youtu.be/pHfmh1Bi6aA
Watson Discovery Service: Cloud SAAS model for discovery on unstructured text (focused on developer)https://www.ibm.com/watson/developercloud/discovery.html
Knowledge Integration Toolkit (KnIT): Understanding all the worlds medical abstractshttps://www-935.ibm.com/services/us/gbs/bao/siip/, http://time.com/3208716/ibm-watson-cancer/
Several Watson Services: Watson NLU, Retrieve and Rank, Document Conversion etc. https://www.ibm.com/watson/developercloud/services-catalog.html
Discovery is about the non-obviousDiscovery creates new knowledge (hitherto unknown / unrecorded in domain docs, data sources). New knowledge is surprising and anomalous. It could be formally abstracted in several forms:
Search and Exploration look for knowledge already available in the knowledge sources available to the system. They are necessary for Discovery but not sufficient
Confirmation of facts known to an SME of a domain is not discovery (listing all possible side effects of a drug which may have been mined from structured or unstructured data)
New link between entities: A new side effect of a drug, a new potential emerging company as an acquisition target or sales lead, a non-obvious but relevant case / law applicable in a particular trial / proceeding, a person of interest in a terrorist attack. (Link Prediction, Relationship Discovery, Relationship Ranking)
A potential new important entity in the domain: e.g. a new material for display technologies, a new investor for a particular investment area etc. (Entity Discovery, Entity Recommendation, Entity Ranking)
Changing significance of an existing entity: it’s changing relationships/attributes/metrics. e.g. increasing stake by an investor in an organization, increasing interaction between a person of interest and some criminal in an intelligence gathering scenario, decreasing number complaints for a particular product or service in a retail / consumer scenario. (Trend analysis, Distribution Analysis, Anomaly Detection)
In a nutshell• We haven’t focused on ONE Global knowledge graph• We have a framework for anyone to build their domain specific KGs• Offered as a part of Watson Discovery Advisor and now Watson
Discovery Service, also consumed internally by services / solutions• Clients In various sectors:
– Banking and Finance– IT Services, Customer Service– Cyber Security– Scientific Discovery: Life Science, Oil & Gas, Chemicals & Petroleum– Defense– Space Exploration– Media and Entertainment
valuable things we learnt..• polymorphic stores is the answer, stop fighting for one
store
• build surprise into your model, into your api
• evidence is primitive to the system, which every node and edge maps to
• push entity resolution to runtime through context
• every new piece of knowledge should know its cascade effect
Detailed description can be found at talks.anshu.co
Open problems that haunt us
• modeling and analyzing changing knowledge
• merge relationships discovered from unstructured data with known relationships
• federation of global, domain specific and customer specific knowledge
• incremental update of global knowledge on horizontally scaled stores
• scanned PDF extraction ..garbage in garbage out
Thank you.