DESIGNING A SEMANTIC SEARCH PLATFORM
DEEPAK SAINI
April 2015
© COPYRIGHT 2015 SAPIENT CORPORATION | CONFIDENTIAL
Please find this information
Source: http://www.slideshare.net/andrewhenson/data-modelling-and-knowledge-engineering-for-the-internet-of-things
Agenda
• Limitations of Keyword Search
• Semantic Search
• Examples of Semantic Search Queries
• Moving parts in a Semantic Search Platform
• Putting together a search platform
Digital Content Growth
IDC predicts that by 2020 the size of digital content will have reached 40,000 exabytes (EB), or 40 zettabytes (ZB).
The world's information is doubling every two years.
Source: IDC's Digital Universe Study, Dec 2012
Limitations of Existing Search Platforms
• Key, relevant documents remain hard to find.
• Gives me 'what I said' instead of 'what I want'.
Did you mean 'financial bank' or 'river shore'?
Increasing Heterogeneity and Complexity
• Multiple formats: structured, unstructured, semi-structured
• Providing effective and efficient search over such heterogeneous collections within a single search engine remains a big challenge.
Investment Banks: Timely, Accurate data
As an asset manager/research analyst, I need to:
• Get timely industry information
• Find actionable knowledge
• Maximize the value of my research
• Reduce costs
Example: “Get me equity research on HSBC authored by Greg Allen”
Source: http://www.sapient.com/content/dam/sapient/sapientglobalmarkets/pdf/thought-leadership/SGM_Crossings_Spring2014.pdf
Investment Banks: Semantic Search as a Solution?
• Search by 'business context' rather than by 'keyword'.
• Represents a significant productivity increase for investment professionals, who spend a great deal of time looking for research material.
• Allows asset managers to maximize the ROI on their research spending by thematically mining the contents, rather than just the titles, of millions of research documents.
Moving from Syntax/Structure to Semantics
Source: http://knoesis.wright.edu/library/presentations/sheth-gdita.ppt
Examples of Ontology
• An ontology defines the properties and attributes of content within your domain.
• The ontology, or semantic model, also captures the relationships between different pieces of information.
• It makes it possible to judge the meaning of content, rather than just its literal values, and how that data applies to the purpose of your business or organization.
Case Study (Problem Statement): RBS
How to help the BIS team improve performance without increasing cost
Option | Pros/Cons
Automate routine requests | Faster response; benefit, however, is limited; problem still remains
Outsource BIS resources | Potential cost savings; problem not solved, but moved to a different place; QoS risks
Give bankers direct access to information sources | Risk of uncontrolled costs; same as existing approach
Source: www.smartlogic.com , Case study: RBS (formerly ABN AMRO) Saves Time and Effort with Semaphore
Case Study (Solution): RBS
A taxonomy-based approach to automate routine information requests.
Source: www.smartlogic.com , Case study: RBS (formerly ABN AMRO) Saves Time and Effort with Semaphore
Example Queries – Semantic Search: Context-Specific
Add More Meaning/Context: Combine documents, data and triples
Data stored in Triples
Expressed as Subject : Predicate : Object
Example: “Pranab Mukherjee” : livesIn : “New Delhi”
“New Delhi” : isIn : “India”
Rules tell us something about the triples
If (A livesIn X) AND (X isIn Y) then (A livesIn Y)
Inference: “Pranab Mukherjee” : livesIn : “India”
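The rule on this slide can be sketched in a few lines of Python. This is only an illustration of forward-chaining over an in-memory set of triples, not how any particular triple store evaluates rules; the function name `infer_lives_in` and the `Triple` alias are hypothetical.

```python
from typing import Set, Tuple

# A triple is (subject, predicate, object), as on the slide.
Triple = Tuple[str, str, str]


def infer_lives_in(triples: Set[Triple]) -> Set[Triple]:
    """Apply (A livesIn X) AND (X isIn Y) => (A livesIn Y) until nothing new appears."""
    facts = set(triples)
    while True:
        derived = {
            (a, "livesIn", y)
            for (a, p1, x) in facts if p1 == "livesIn"
            for (x2, p2, y) in facts if p2 == "isIn" and x2 == x
        }
        if derived <= facts:  # fixed point: the rule produces no new triples
            return facts
        facts |= derived


facts = {
    ("Pranab Mukherjee", "livesIn", "New Delhi"),
    ("New Delhi", "isIn", "India"),
}
inferred = infer_lives_in(facts) - facts
print(inferred)  # {('Pranab Mukherjee', 'livesIn', 'India')}
```

Repeating the rule until a fixed point means longer chains (city isIn state, state isIn country) would also be followed, provided each hop is stated as an isIn triple.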
Components – Content Extraction
[Diagram labels: Unstructured Content; Metadata; Extracted Textual Content]
• Extract meaning from unstructured data
• Transform into structured data for analysis
Ontology Management
A tool that supports lists, controlled vocabularies, taxonomies, thesauri or ontologies:
• Hierarchical relationships
• Associative relationships
• Synonyms
http://wiki.opensemanticframework.org/index.php/Ontology_Tools
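As a rough illustration of the three relationship types listed above, the sketch below models a tiny vocabulary in memory. The class names (`Concept`, `Vocabulary`), the `expand` helper, and the sample terms are all hypothetical and are not taken from any specific ontology tool.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Concept:
    label: str
    synonyms: Set[str] = field(default_factory=set)  # alternative labels
    broader: Set[str] = field(default_factory=set)   # hierarchical parents
    related: Set[str] = field(default_factory=set)   # associative links


class Vocabulary:
    def __init__(self) -> None:
        self.concepts: Dict[str, Concept] = {}

    def add(self, concept: Concept) -> None:
        self.concepts[concept.label] = concept

    def narrower(self, label: str) -> Set[str]:
        """Direct children: concepts that name `label` as a broader term."""
        return {c.label for c in self.concepts.values() if label in c.broader}

    def expand(self, label: str) -> Set[str]:
        """Simple query expansion: the term, its synonyms, and direct narrower terms."""
        concept = self.concepts[label]
        return {label} | concept.synonyms | self.narrower(label)


# Illustrative sample data only.
vocab = Vocabulary()
vocab.add(Concept("Research", synonyms={"Analyst Report"}))
vocab.add(Concept("Equity Research", broader={"Research"}, related={"Company"}))
vocab.add(Concept("Company"))
print(sorted(vocab.expand("Research")))
```

Storing only the broader link and deriving narrower terms on demand keeps the hierarchy consistent; a real tool would likely index both directions.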
Content Classification
• Analyze the document
• Add metadata 'tags', sourced from the ontology, that describe the document
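A minimal sketch of the tagging step, assuming the ontology is available as a mapping from a preferred label to the surface forms that should trigger it. The `ONTOLOGY` fragment, the `classify` function, and the sample sentence are hypothetical; production classifiers typically add text mining and relevance scoring on top of plain matching.

```python
import re
from typing import Dict, List, Set

# Hypothetical ontology fragment: preferred label -> surface forms to look for.
ONTOLOGY: Dict[str, List[str]] = {
    "HSBC": ["HSBC", "HSBC Holdings"],
    "Equity Research": ["equity research", "equity analysis"],
}


def classify(text: str, ontology: Dict[str, List[str]]) -> Set[str]:
    """Return the ontology labels whose surface forms occur in the text."""
    tags: Set[str] = set()
    for label, forms in ontology.items():
        for form in forms:
            # Whole-word, case-insensitive match of the surface form.
            pattern = r"\b" + re.escape(form) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                tags.add(label)
                break
    return tags


doc = "Greg Allen's equity research note on HSBC was published today."
tags = classify(doc, ONTOLOGY)
print(sorted(tags))  # ['Equity Research', 'HSBC']
```

Because the tags are preferred labels rather than the matched strings, documents that say "equity analysis" and "equity research" end up under the same tag.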
Logical Architecture
[Diagram: logical architecture. Labels recovered from the slide:]
Information Architect; Research Consumers/Providers
DATA SOURCES: Content Providers; Companies Data; TRBC, GICS; Research Corpus
Content Import; Content Standardization; Indexing; Classification
Ontology Management; Query Processing; Search Results Processing
Ontology Search; Graph Search; Search; Management
Content; Indexes; Triples; User Profile; Configurations
Supported Search Types
[Diagram: supported search types. Labels recovered from the slide:]
Search Types: Free Text Search; Graph Search; Semantic Search
Processing: Query Parser; Graph Query Translator; Semantic Query Translator
Storage:
• ML XML Store (Indication + Data): Standard Data, Research Documents, SEC Filings, Semantic Search Tags, Metadata
• ML Triple Store (Fact / Truth): Ontology Data, Edgar Data, Open Data, Document Metadata
• SES (Ontology Store)
Facets & Filters
Content Delivery: Free Text Search; Graph Search; Semantic Search
Lessons Learnt
• Shake off the search engine's perspective and adopt the user's point of view.
• Search results are never good enough; there is always scope for improvement.
• Continuous refinement of the ontology, text mining, classification, relevance scores, indexing, and so on.
• Challenges in converting unstructured content to structured content.
• Limitations of the tools used.
References
http://knoesis.wright.edu/library/presentations/sheth-gdita.ppt
http://knoesis.wright.edu/library/download/perry_geos2007.pdf
http://www.vldb.org/pvldb/1/1454198.pdf
http://www.slideshare.net/heimohanninen/business-ontology-13-an-overview
http://hlwiki.slais.ubc.ca/index.php/Semantic_search
http://www.corporate-semantic-web.de/applications-and-use-cases/articles/museumsportal-berlin.html
http://www.marklogic.com
http://www.cambridgesemantics.com/
http://www.sapient.com/content/dam/sapient/sapientglobalmarkets/pdf/thought-leadership/SGM_Crossings_Spring2014.pdf