The Enterprise Search Market in a Nutshell
1
The Enterprise Search Market in a Nutshell
Iain Fletcher
October 19, 2015
ICIC 2015, Nice
2
Agenda
• About Search Technologies (30 seconds)
• The enterprise search market
• Likely future architectures for supporting
important search applications
3
Search Technologies: Background
Offices: Washington (HQ), San Diego, San Francisco, Cincinnati, San Jose CR, London UK, Frankfurt DE, Prague CZ
• Founded 2005
• 180 employees
• 600+ customers
• Independent consulting company
• Focus on enterprise search
• Working with all leading platforms
4
600+ Customers
5
The Enterprise Search Market
6
High-level Search Engine Classifications
1. Part of a portfolio, many are recently acquired technologies
– E.g. SharePoint/FAST, HP Autonomy, IBM/Vivisimo, Dassault/Exalead,
Oracle/Endeca
2. Stand-alone specialists, often deployed to address specific apps or
challenges
– E.g. GSA, Coveo, Attivio, Sinequa, Recommind
3. Open source, with or without support or proprietary add-ons
– Raw: Lucene, Solr, Elasticsearch
– With support/add-ons: LucidWorks, Cloudera Search, Elastic ELK
4. Cloud-based services, typically based on open source technology
– E.g. Amazon CloudSearch (Solr), Microsoft Azure Search (Elasticsearch)
7
Market Observations
The dominant market share is currently with
SharePoint, open source, and the GSA
• SharePoint 2013 search is credible, and bundled
– Search teams are under pressure to use it, or to provide a
compelling reason to do otherwise
• Solr and Elasticsearch are robust and reliable
– Thanks to very widespread deployment
• The Google brand sells – and a lot of GSAs have been
shipped during the past few years
8
Functional Observations
• Core indexing / searching is generally fast and reliable
– Search is a maturing / converging technology
• Key differences remain in peripheral functionality, such as
content processing prior to indexing, and query processing
– Coveo, Attivio, Sinequa etc. have well-developed indexing
pipelines, UI tools, and a range of data connectors
– SharePoint and GSA are delivered with limited content
processing functionality and limited connectivity
– Solr, Elasticsearch, Amazon CloudSearch and Azure Search don’t
provide a formal indexing pipeline, UI, or connectors
9
Further Observations
• The search engines with less focus on peripheral issues
such as content processing and connectivity have dominant
market share
• Connectivity is often challenging, especially when
combined with continual data growth, and document-level
security requirements
• The movement of data sets to the cloud adds further
complexity for enterprise search systems
– Hybrid indexing environments will be with us for some years
– Some content sets in the cloud, some behind the firewall
10
Great Search requires Attention to Detail
E.g. in content processing prior to indexing:
• Normalization
– Names, dates, synonyms….
• Entity identification and resolution
• Categorization
• Document vector extraction
• Document splitting and concatenation
• Link & popularity analysis
• Dupe & near-dupe detection
Index (diagram labels): security, category, metadata
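Two of the content processing steps named on this slide, normalization and near-duplicate detection, can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's implementation: the function names, the 3-word shingle size, and the 0.8 similarity threshold are all assumptions chosen for the example, and word-shingle Jaccard similarity is only one of several common ways to detect near-duplicates.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip accents, and replace punctuation with spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(no_punct.split())


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles of the normalized text (k is an assumed default)."""
    words = normalize(text).split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a: str, doc_b: str, threshold: float = 0.8) -> bool:
    """Flag two documents as near-duplicates above an assumed similarity threshold."""
    return jaccard(shingles(doc_a), shingles(doc_b)) >= threshold


if __name__ == "__main__":
    print(normalize("Café, Nice!"))
    print(is_near_duplicate("Enterprise search in a nutshell.",
                            "enterprise search, in a nutshell"))
```

Running normalization before shingling means that documents differing only in case, accents, or punctuation compare as identical, which is one reason normalization usually sits early in an indexing pipeline.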
11
Future Directions for Search
So what will search architectures look like in the future?
Important influences:
• The business need for organizational and analytical agility
• The convergence of search and (“big data”) analytics
• Continual growth in data volumes, and evolution in
repository / storage fashions
12
Converging Architectures
Let’s take a brief look at:
1. The “Big Data Architecture”, as evangelized by IBM,
Cloudera, etc.
2. Recent Search Architectures
Background Info
13
The Big Data Architecture
Designed for Structured Data
14
The Traditional Search Architecture
Content sources: Employee Directory, CMS, File Share, etc.
Integrated search engine: Connectors, Index Pipeline, Search Index, UI
Designed for Unstructured Content
15
The Traditional Search Architecture
• As data volumes grow, re-indexing
becomes challenging
• The rate at which content can be
acquired from repositories is usually the
bottleneck
Designed for Unstructured Content
16
The Traditional Search Architecture
• A few documents-per-second?
• There are only 2.6 million seconds in a
month
RE-INDEX
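The arithmetic behind this slide can be worked through with a short Python sketch. The 30-day month matches the slide's figure of roughly 2.6 million seconds; the sample rates and the 10-million-document corpus are hypothetical values chosen only to illustrate the scale of the problem.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # 2,592,000, i.e. roughly 2.6 million


def docs_per_month(docs_per_second: float) -> int:
    """Documents that can be acquired in a 30-day month at a steady rate."""
    return int(docs_per_second * SECONDS_PER_MONTH)


def reindex_days(total_docs: int, docs_per_second: float) -> float:
    """Days needed to re-acquire and re-index total_docs at a steady rate."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return total_docs / docs_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    corpus_size = 10_000_000  # hypothetical corpus size
    for rate in (1, 3, 10):   # hypothetical acquisition rates
        print(f"{rate} docs/s: {docs_per_month(rate):,} docs per month, "
              f"{reindex_days(corpus_size, rate):.1f} days for {corpus_size:,} docs")
```

At a few documents per second, a full re-index of a corpus in the tens of millions takes weeks to months, which is why the acquisition rate from repositories matters more than raw indexing speed.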
17
A Better Search Architecture
• Re-indexing rates greatly improved
• “Touch-time” with repositories can be managed autonomously
Content sources: Employee Directory, CMS, etc.
Between sources and search engine: Connectors, Secure Cache, Content Processing
Search engine: Index Pipeline, Search Index
Diagram labels: RE-INDEX, Iterative Development
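The idea on this slide, re-indexing from a cache rather than from the repositories, can be sketched as follows. This is a simplified in-memory illustration under stated assumptions: `StagingCache`, `crawl`, `reindex`, and the sample documents are hypothetical names and data invented for the example, and a real secure cache would also persist content and carry document-level security information.

```python
from typing import Callable, Dict, Iterable, List, Tuple


class StagingCache:
    """In-memory stand-in for the cache that sits between connectors and processing."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def put(self, doc_id: str, raw: str) -> None:
        self._docs[doc_id] = raw

    def items(self) -> List[Tuple[str, str]]:
        return list(self._docs.items())


def crawl(fetch: Callable[[], Iterable[Tuple[str, str]]], cache: StagingCache) -> int:
    """Pull documents from a repository once and store the raw content in the cache."""
    count = 0
    for doc_id, raw in fetch():
        cache.put(doc_id, raw)
        count += 1
    return count


def reindex(cache: StagingCache, process: Callable[[str], str]) -> Dict[str, str]:
    """Rebuild an index from cached content only; the repository is not contacted."""
    return {doc_id: process(raw) for doc_id, raw in cache.items()}


if __name__ == "__main__":
    repository_calls = {"count": 0}

    def fetch_from_cms():
        # Hypothetical repository; each call represents "touch-time" with the source.
        repository_calls["count"] += 1
        return [("doc-1", "  Quarterly  Report "), ("doc-2", "Travel Policy")]

    cache = StagingCache()
    crawl(fetch_from_cms, cache)
    first_index = reindex(cache, str.strip)
    second_index = reindex(cache, lambda raw: " ".join(raw.lower().split()))
    print(first_index, second_index, repository_calls)
```

Because `reindex` reads only from the cache, the content processing step can be changed and re-run as often as needed while the repository is crawled once, which is the sense in which iterative development is decoupled from repository touch-time.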
18
The Future Architecture?
Content sources: Employee Directory, CMS, etc.
Hadoop: Connectors, Secure Cache, Content Processing
Search engine: Index Pipeline, Search Index
Diagram labels: RE-INDEX, Iterative Development
• This environment will encourage ever more sophisticated text analytics
• We expect to see much innovation in text analytics during the next few years
• The deliverable is a better, and richer search index
19
An Established Architecture
• Google.com has worked something like this since 2004
20
An Integrated Search/Analytics Architecture
Content sources (via connectors): CMS, file system, etc.
Data sources (via ETL): data warehouse, logfiles, etc.
Hadoop: Secure Cache, Content Processing, Rapid Indexing, Iterative Development
Applications: search apps, analysis apps
• Encourages agile exploitation of data and content resources
21
Summary 1
• Search and Big Data applications are tending towards the same architecture
• Autonomous connectivity and content processing simplify and de-risk – if you can get them right
• The foundation of great search is still a clean, rich and detailed index
• The “search index” itself is a mature technology, almost a commodity
• Much of the innovation during the next few years will be in text analytics, and other methods of preparing content prior to indexing
22
And finally….
The compulsory analyst quote….
“Enterprise Search Can Bring Big Data Within Reach”
• Multiple, purpose-built indexes that are derived from enriched content are necessary.
http://blogs.gartner.com/darin-stewart/2014/04/01/enterprise-search-can-bring-big-data-within-reach/
* Darin Stewart, Enterprise Search Can Bring Big Data Within Reach, April 2014 Blog
23
The Enterprise Search Market in a Nutshell
Iain Fletcher
October 20, 2015
Questions?
24
Spare Slides
25
Reference Architecture
Content sources
Connectors
Indexes
Semantics
Text Mining
Quality Metrics
Content Processing Pipelines
Big Data Framework
Indexes
Query parsing
Search Engine
Web Browser
Staging Repository
26
Where is the Focus?
• The Business View
• The Implementation View
Components shown in each view: Application; Content Capture & Preparation; Data Store / Index