DESIGNING A SEMANTIC SEARCH PLATFORM

DEEPAK SAINI

April 2015

© COPYRIGHT 2015 SAPIENT CORPORATION | CONFIDENTIAL

Please find this information.

Source: http://www.slideshare.net/andrewhenson/data-modelling-and-knowledge-engineering-for-the-internet-of-things

Agenda

• Limitations of Keyword Search

• Semantic Search

• Examples of Semantic Search Queries

• Moving parts in a Semantic Search Platform

• Putting together a search platform

Digital Content Growth

By 2020, IDC predicts the size of digital content will have reached 40,000 EB, or 40 zettabytes (ZB).

The world's information is doubling every two years.

Source: IDC's Digital Universe Study, Dec 2012

Limitations of Existing Search Platforms

• Key, relevant documents remain hard to find.

• Gives me "what I said" instead of "what I want".

Did you mean ‘financial bank’ or ‘river shore’?
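To make the "bank" ambiguity concrete, here is a minimal Python sketch contrasting literal keyword matching with lookup by concept tag. It assumes each document has already been tagged with ontology concepts; the documents and the concept IDs `FinancialInstitution` and `RiverShore` are hypothetical.

```python
# Hypothetical mini-corpus: free text plus concept tags that a classifier
# would have assigned from the ontology (all IDs are illustrative).
DOCS = {
    "d1": {"text": "The bank raised its lending rates this quarter.",
           "concepts": {"FinancialInstitution"}},
    "d2": {"text": "We walked along the river bank at dawn.",
           "concepts": {"RiverShore"}},
    "d3": {"text": "HSBC reported higher net interest income.",
           "concepts": {"FinancialInstitution"}},
}


def keyword_search(term: str) -> list[str]:
    """IDs of documents whose text literally contains the term."""
    term = term.lower()
    return sorted(doc_id for doc_id, doc in DOCS.items()
                  if term in doc["text"].lower())


def concept_search(concept: str) -> list[str]:
    """IDs of documents tagged with the given ontology concept."""
    return sorted(doc_id for doc_id, doc in DOCS.items()
                  if concept in doc["concepts"])


if __name__ == "__main__":
    print(keyword_search("bank"))                  # ['d1', 'd2']
    print(concept_search("FinancialInstitution"))  # ['d1', 'd3']
```

Keyword search returns both senses of "bank" and misses the HSBC document, which never uses the word; the concept lookup returns only the two finance documents.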

Increasing Heterogeneity and Complexity

• Multiple formats: structured, unstructured, semi-structured.

• Providing effective and efficient search over such heterogeneous collections within a single search engine remains a big challenge.

Investment Banks: Timely, Accurate Data

As an asset manager/research analyst, I need to:

• Get timely industry information

• Find actionable knowledge

• Maximize the value of my research

• Reduce costs

Example: "Get me equity research on HSBC authored by Greg Allen"

Source: http://www.sapient.com/content/dam/sapient/sapientglobalmarkets/pdf/thought-leadership/SGM_Crossings_Spring2014.pdf
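A semantic platform would treat the example request as a set of structured constraints rather than a bag of words. The Python sketch below illustrates the idea under simple assumptions: the vocabularies, the concept ID `EquityResearch`, the metadata records, and the "authored by" pattern are all hypothetical, and a real query parser would draw on the ontology and on proper entity recognition.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical controlled vocabularies; a real platform would load these
# from the ontology store instead of hard-coding them.
DOC_TYPES = {"equity research": "EquityResearch", "credit research": "CreditResearch"}
COMPANIES = {"hsbc": "HSBC Holdings"}

# Hypothetical metadata records, as content classification might produce them.
RECORDS = [
    {"id": "r1", "type": "EquityResearch", "company": "HSBC Holdings", "author": "Greg Allen"},
    {"id": "r2", "type": "EquityResearch", "company": "HSBC Holdings", "author": "Jane Doe"},
    {"id": "r3", "type": "CreditResearch", "company": "HSBC Holdings", "author": "Greg Allen"},
]


@dataclass
class StructuredQuery:
    doc_type: Optional[str] = None
    company: Optional[str] = None
    author: Optional[str] = None


def parse_query(text: str) -> StructuredQuery:
    """Map phrases in a natural-language request onto metadata constraints."""
    query = StructuredQuery()
    lowered = text.lower()
    for phrase, concept in DOC_TYPES.items():
        if phrase in lowered:
            query.doc_type = concept
    for name, entity in COMPANIES.items():
        if re.search(rf"\b{re.escape(name)}\b", lowered):
            query.company = entity
    # Naive pattern: capitalized words that follow "authored by".
    match = re.search(r"authored by ([A-Z][a-z]+(?: [A-Z][a-z]+)*)", text)
    if match:
        query.author = match.group(1)
    return query


def run(query: StructuredQuery) -> list[str]:
    """IDs of records that satisfy every constraint that was set."""
    return [
        r["id"] for r in RECORDS
        if (query.doc_type is None or r["type"] == query.doc_type)
        and (query.company is None or r["company"] == query.company)
        and (query.author is None or r["author"] == query.author)
    ]


if __name__ == "__main__":
    q = parse_query("Get me equity research on HSBC authored by Greg Allen")
    print(q)
    print(run(q))  # ['r1']
```

A plain keyword engine would match all three records (each mentions HSBC and research); the structured query keeps only the equity research written by the named author.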

Investment Banks: Semantic Search as a Solution?

• "Business context" as opposed to "keyword" search.

• Represents a significant productivity increase for investment professionals, who spend a great deal of time looking for research material.

• Allows asset managers to maximize the ROI of their research spending by thematically mining the contents, rather than just the titles, of millions of research documents.

Moving from Syntax/Structure to Semantics

Source: http://knoesis.wright.edu/library/presentations/sheth-gdita.ppt

Examples of Ontology

• Defines the properties and attributes of content within your domain.

• The ontology, or semantic model, also captures the relationships between different pieces of information.

• Makes it possible to judge the meaning of content, rather than just its literal values, and how that data applies to the purpose of your business or organization.
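As an illustration of an ontology defining content types, their properties and the relationships between them, the following Python sketch encodes a tiny hypothetical model (all class, property and instance names are invented for the example) and checks statements against each property's declared domain and range. Treating domain and range as validation rules is closer to SHACL-style constraint checking than to RDFS, where they are used to infer types.

```python
# Hypothetical mini-ontology: a class hierarchy, properties with a declared
# domain (subject class) and range (object class), and some typed instances.
SUBCLASS_OF = {
    "EquityResearch": "ResearchDocument",
    "CreditResearch": "ResearchDocument",
    "Bank": "Company",
}
PROPERTIES = {
    "authoredBy": ("ResearchDocument", "Analyst"),
    "about": ("ResearchDocument", "Company"),
}
INSTANCES = {
    "report42": "EquityResearch",
    "HSBC": "Bank",
    "GregAllen": "Analyst",
}


def is_a(cls, ancestor):
    """True if cls is ancestor or a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def conforms(subject, prop, obj):
    """Check a subject-property-object statement against domain and range."""
    if prop not in PROPERTIES:
        return False
    domain, range_ = PROPERTIES[prop]
    return is_a(INSTANCES.get(subject), domain) and is_a(INSTANCES.get(obj), range_)


if __name__ == "__main__":
    print(conforms("report42", "about", "HSBC"))       # True: a Bank is a Company
    print(conforms("report42", "about", "GregAllen"))  # False: an Analyst is not a Company
```

The subclass walk is what lets the model "understand" that a statement about HSBC is a statement about a company, even though HSBC is only declared as a Bank.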

Case Study (Problem Statement): RBS

How to help the BIS team improve performance without increasing cost

Options (with pros/cons):

• Automate routine requests: faster response; however, the benefit is limited and the problem still remains.

• Outsource BIS resources: potential cost savings; the problem is not solved, only moved to a different place; QoS risks.

• Give bankers direct access to information sources: risk of uncontrolled costs; same as the existing approach.

Source: www.smartlogic.com, Case study: RBS (formerly ABN AMRO) Saves Time and Effort with Semaphore

Case Study (Solution): RBS

A taxonomy-based approach to automate routine information requests.

Source: www.smartlogic.com, Case study: RBS (formerly ABN AMRO) Saves Time and Effort with Semaphore

Approaches to Semantic Search

Example Queries – Free Text Search

Example Queries – Free Text Search on Metadata

Example Queries – Semantic Search: Context-Specific

Add More Meaning/Context: Combine Documents, Data and Triples

Data stored in Triples

Expressed as Subject : Predicate : Object

Example: “Pranab Mukherjee” : livesIn : “New Delhi”

“New Delhi” : isIn : “India”

Rules let us derive new facts from existing triples

If (A livesIn X) AND (X isIn Y) then (A livesIn Y)

Inference: “Pranab Mukherjee” : livesIn : “India”
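The rule above can be applied mechanically by forward chaining: run it over the current set of triples and repeat until no new triple appears. This Python sketch hard-codes the single livesIn/isIn rule from the slide and adds a wildcard pattern lookup as a stand-in for graph search; it is an illustration only, not how any particular triple store implements inference.

```python
from typing import Optional

Triple = tuple[str, str, str]


def infer(triples: set[Triple]) -> set[Triple]:
    """Apply (A livesIn X) AND (X isIn Y) => (A livesIn Y) until a fixpoint."""
    facts = set(triples)
    while True:
        derived = {
            (a, "livesIn", y)
            for (a, p1, x) in facts if p1 == "livesIn"
            for (x2, p2, y) in facts if p2 == "isIn" and x2 == x
        }
        new = derived - facts
        if not new:
            return facts
        facts |= new


def match(facts: set[Triple], s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Graph-style lookup: None acts as a wildcard for that position."""
    return sorted(
        t for t in facts
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )


if __name__ == "__main__":
    facts = {
        ("Pranab Mukherjee", "livesIn", "New Delhi"),
        ("New Delhi", "isIn", "India"),
    }
    print(match(infer(facts), p="livesIn"))
    # [('Pranab Mukherjee', 'livesIn', 'India'), ('Pranab Mukherjee', 'livesIn', 'New Delhi')]
```

Iterating to a fixpoint matters once the place hierarchy is deeper: adding ("India", "isIn", "Asia") lets a second pass derive that Pranab Mukherjee livesIn Asia.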

Example Queries – Graph Search

Components – Content Extraction

Diagram labels: Unstructured Content; Metadata; Extracted Textual Content

• Extract meaning from unstructured data

• Transform it into structured data for analysis
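A minimal Python sketch of turning unstructured text into structured fields, assuming a hypothetical company gazetteer and ISO-formatted dates. Real extraction pipelines would use named-entity recognition and document parsers; this only shows the shape of the output, a record that can be indexed and queried.

```python
import re

# Hypothetical gazetteer: surface form (lower case) -> canonical entity ID.
COMPANY_GAZETTEER = {
    "hsbc": "HSBC Holdings",
    "royal bank of scotland": "RBS",
    "rbs": "RBS",
}
# Assumes ISO 8601 dates (YYYY-MM-DD); real documents need a richer parser.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract(text: str) -> dict:
    """Turn free text into a small structured record for indexing."""
    lowered = text.lower()
    companies = sorted({
        canonical for surface, canonical in COMPANY_GAZETTEER.items()
        if re.search(rf"\b{re.escape(surface)}\b", lowered)
    })
    dates = sorted(set(DATE_RE.findall(text)))
    return {"companies": companies, "dates": dates, "word_count": len(text.split())}


if __name__ == "__main__":
    print(extract("On 2014-03-31 HSBC and the Royal Bank of Scotland published results."))
```

Mapping several surface forms to one canonical ID is what allows later stages to link the extracted record to ontology concepts and triples.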

Ontology Management

A tool that supports lists, controlled vocabularies, taxonomies, thesauri or ontologies:

• Hierarchical relationships

• Associative relationships

• Synonyms
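The three relationship types listed above are enough to drive query expansion. This Python sketch models them loosely after SKOS (`broader` for hierarchy, `related` for association, synonyms as alternative labels); the thesaurus content and the `expand` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    label: str
    synonyms: set[str] = field(default_factory=set)  # equivalent terms
    broader: Optional[str] = None                     # hierarchical: parent concept ID
    related: set[str] = field(default_factory=set)    # associative: concept IDs


# Hypothetical thesaurus fragment; IDs and terms are illustrative only.
THESAURUS = {
    "Company": Concept("Company", {"firm", "corporation"}),
    "Bank": Concept("Bank", {"lender", "financial institution"},
                    broader="Company", related={"Lending"}),
    "InvestmentBank": Concept("Investment bank", {"IB"}, broader="Bank"),
    "Lending": Concept("Lending", {"credit"}),
}


def narrower(concept_id: str) -> set[str]:
    """Concepts whose broader link points at concept_id."""
    return {cid for cid, c in THESAURUS.items() if c.broader == concept_id}


def expand(concept_id: str, include_related: bool = False) -> set[str]:
    """Search terms for a concept: its label and synonyms plus those of every
    narrower concept (recursively); optionally also the starting concept's
    related concepts and their narrower concepts."""
    terms: set[str] = set()
    seen: set[str] = set()
    stack = [concept_id]
    if include_related:
        stack.extend(THESAURUS[concept_id].related)
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        concept = THESAURUS[cid]
        terms.add(concept.label.lower())
        terms.update(s.lower() for s in concept.synonyms)
        stack.extend(narrower(cid))
    return terms


if __name__ == "__main__":
    print(sorted(expand("Bank")))
    print(sorted(expand("Bank", include_related=True)))
```

`expand("Bank")` yields the label and synonyms of Bank and of its narrower concept Investment bank; passing `include_related=True` also brings in the terms for Lending. Keeping associative expansion optional reflects that related terms broaden recall at some cost in precision.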

http://wiki.opensemanticframework.org/index.php/Ontology_Tools

Content Classification

• Analyze each document

• Add metadata ‘tags’, sourced from the ontology, that describe the document
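Tagging can be sketched as dictionary-based matching against the ontology's terms, with each tag's broader concepts added so that documents are findable at every level of the hierarchy. The ontology fragment and the `min_hits` threshold in this Python sketch are hypothetical; production classifiers often combine rules like these with statistical scoring.

```python
import re
from collections import Counter

# Hypothetical ontology fragment: concept -> (surface terms, broader concept).
ONTOLOGY = {
    "FinancialSector": ({"financial sector"}, None),
    "Bank": ({"bank", "lender"}, "FinancialSector"),
    "InvestmentBank": ({"investment bank", "broker-dealer"}, "Bank"),
    "AssetClass": ({"asset class"}, None),
    "Equity": ({"equity", "equities", "shares"}, "AssetClass"),
}


def classify(text: str, min_hits: int = 1) -> list[str]:
    """Return ontology concepts to attach to a document as metadata tags."""
    lowered = text.lower()
    hits = Counter()
    for concept, (terms, _broader) in ONTOLOGY.items():
        for term in terms:
            hits[concept] += len(re.findall(rf"\b{re.escape(term)}\b", lowered))
    tags = {concept for concept, count in hits.items() if count >= min_hits}
    # Propagate to broader concepts so the document is findable at every level.
    for concept in list(tags):
        parent = ONTOLOGY[concept][1]
        while parent is not None:
            tags.add(parent)
            parent = ONTOLOGY[parent][1]
    return sorted(tags)


if __name__ == "__main__":
    print(classify("The investment bank upgraded the shares."))
    # ['AssetClass', 'Bank', 'Equity', 'FinancialSector', 'InvestmentBank']
```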

Data Store & Search Engine

Logical Architecture

Diagram labels:

• Users: Information Architect, Research Consumers/Providers

• Data sources: Content Providers, Companies Data, TRBC/GICS, Research Corpus

• Components: Content Import, Content Standardization, Indexing, Classification, Ontology Management, Query Processing, Search Results Processing, Search, Graph Search

• Stores: Content, Indexes, Triples, User Profile, Configurations

Supported Search Types

Diagram labels, by layer:

• Search types: Free Text Search, Graph Search, Semantic Search

• Processing: Query Parser, Graph Query Translator, Semantic Query Translator

• Storage:
  • ML XML Store (Indication + Data): Standard Data, Research Documents, SEC Filings, Semantic Search Tags, Metadata
  • ML Triple Store (Fact / Truth): Ontology Data, EDGAR Data, Open Data, Document Metadata
  • SES (Ontology Store)

• Facets & Filters

• Content Delivery: Free Text Search, Graph Search, Semantic Search
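The Facets & Filters layer in the diagram can be sketched in a few lines of Python: count the values of selected metadata fields across a result set, then narrow the results to the values a user picks. The field names and result records here are hypothetical.

```python
from collections import Counter

# Hypothetical search results carrying metadata fields used as facets.
RESULTS = [
    {"id": "a", "type": "EquityResearch", "tags": ["HSBC", "Banks"]},
    {"id": "b", "type": "SECFiling", "tags": ["HSBC"]},
    {"id": "c", "type": "EquityResearch", "tags": ["Banks"]},
]


def _values(doc: dict, field: str) -> list:
    """Normalize single- and multi-valued fields to a list."""
    value = doc.get(field)
    if value is None:
        return []
    return list(value) if isinstance(value, (list, set, tuple)) else [value]


def facet_counts(results: list[dict], fields: list[str]) -> dict[str, Counter]:
    """Count how many results carry each value of each facet field."""
    counts = {field: Counter() for field in fields}
    for doc in results:
        for field in fields:
            counts[field].update(_values(doc, field))
    return counts


def apply_filters(results: list[dict], filters: dict[str, str]) -> list[dict]:
    """Keep results that have every selected facet value."""
    return [doc for doc in results
            if all(wanted in _values(doc, field) for field, wanted in filters.items())]


if __name__ == "__main__":
    print(facet_counts(RESULTS, ["type", "tags"]))
    selected = apply_filters(RESULTS, {"type": "EquityResearch", "tags": "HSBC"})
    print([d["id"] for d in selected])  # ['a']
```

Because the facet values are the same ontology-sourced tags added during classification, the filters a user sees stay consistent with the vocabulary the semantic query translator uses.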

Lessons Learnt

• Shaking off the search engine's perspective and adopting the user's point of view.

• Search results are never good enough; there is always scope for improvement.

• Continuous refinement of the ontology, text mining, classification, relevance scores, indexing, and so on.

• Challenges in converting unstructured content into structured content.

• Limitations of the tools used.

References

http://knoesis.wright.edu/library/presentations/sheth-gdita.ppt

http://knoesis.wright.edu/library/download/perry_geos2007.pdf

http://www.vldb.org/pvldb/1/1454198.pdf

http://www.slideshare.net/heimohanninen/business-ontology-13-an-overview

http://hlwiki.slais.ubc.ca/index.php/Semantic_search

http://www.corporate-semantic-web.de/applications-and-use-cases/articles/museumsportal-berlin.html

http://www.marklogic.com

http://www.cambridgesemantics.com/

http://www.sapient.com/content/dam/sapient/sapientglobalmarkets/pdf/thought-leadership/SGM_Crossings_Spring2014.pdf

Questions?

Deepak Saini

THANK YOU