Search Analytics at Enterprise Search Summit Fall 2011

Post on 27-Jan-2015

description

This presentation describes what Search Analytics is, what value it brings, how it can be used, and what additional functionality and value can be built with search data.

Transcript of Search Analytics at Enterprise Search Summit Fall 2011

Search Analytics

What? Why? How?

Otis Gospodnetić – Sematext International
@otisg ◦ @sematext ◦ sematext.com

sematext.com/search-analytics

Copyright 2011 Sematext Int'l. All rights reserved.

About Otis Gospodnetić

• ASF Member: Lucene, Solr, Nutch, Mahout

• Author: Lucene in Action 1 & 2

• Entrepreneur: Sematext, Simpy

Sematext Metrics

• 100% organic: no GMO, no VC
• 4 years old
• < 10 people
• 7 countries
• 3 timezones
• 2 continents
• > 100 customers

About Sematext

Products & Services

Consulting, Development, Tech Support:

• Search (Lucene, Solr, ElasticSearch...)
• Big Data (Hadoop, HBase, Voldemort...)
• Web Crawling (Nutch, Droids)
• Machine Learning (Mahout)

Agenda

• What is Search Analytics and why it matters
• Example reports and their value
• Optional: Search Analytics in the Cloud

Communication

• twitter.com/sematext
• twitter.com/otisg
• hash tags: #stsa or #stanalytics
• http://sematext.com/search-analytics/index.html
• Raise your hand!
• otis@sematext.com

Why

search users

search providers

search experience

Why Oh Why

search providers

search experience

This search sucks! It takes 17 tries to find anything here!

F!?@#$%^&?!?

search users

Cool, the latest search tweaks made our site really sticky!

Awesome!

Fill in the Missing Piece

Search Analytics

Performance Monitoring

Quality Assurance

Tuning UI

Blind Leading the Blind

Analytics as Compass

Search logs are your Map

Search Analytics is your Compass

The Bottom Line Why

• Measure and monitor everything.
• Supports (re)design, navigation choices
• Helps with content acquisition & enhancement
• Improve search experience
• Mula

The Moment of Truth

Question for the audience #1

What do you use for Search Analytics?

a) Home grown stuff
b) Google Analytics
c) Omniture
d) Webtrends
e) Other
f) Nothing

Search Analytics Basics

• Collect: queries & clicks & interactions & ...
• Analyze: actions / xactions / conversions
• Output: reports – over time
• Output++: feedback loop

• The means, not the goal
• Ongoing, not one-off

remember this

Search vs. Web Analytics

• User intent and information needs vs. inferring
• Hand in hand
• Ideally you can relate data from both or even unify it

Report Types

Failures vs. non-failures

Actionable vs. non-actionable

Trends vs. summaries

Failures vs. Non-Failures

• Zero hits
• Low CTR
• Low MRR
• High bounce rate
• Low conversion rate
• Deep paging
• Deep clicking
• High latency

• Query rate
• Query volume
• Top seen & clicked docs
• Top queries
• Terms per query
• Search sessions
• Search users
• Distinct queries
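
A minimal Python sketch of how three of the failure metrics listed above (zero hits, CTR, MRR) could be computed from logged searches. The `SearchEvent` record and its field names are assumptions made for illustration, not Sematext's schema. CTR is taken here as the share of searches followed by at least one click, and MRR uses the best-ranked clicked result.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchEvent:
    """One executed query plus what the user did with it (hypothetical schema)."""
    query: str
    num_hits: int
    # 1-based ranks of the results the user clicked.
    clicked_ranks: List[int] = field(default_factory=list)


def _ratio(count, total):
    # Avoid dividing by zero when no searches were logged.
    return count / total if total else 0.0


def zero_hit_rate(events):
    """Fraction of searches that returned no results."""
    return _ratio(sum(1 for e in events if e.num_hits == 0), len(events))


def click_through_rate(events):
    """Fraction of searches followed by at least one click."""
    return _ratio(sum(1 for e in events if e.clicked_ranks), len(events))


def mean_reciprocal_rank(events):
    """Mean of 1/rank of the best-ranked clicked result; no click counts as 0."""
    total = sum(1.0 / min(e.clicked_ranks) for e in events if e.clicked_ranks)
    return _ratio(total, len(events))


events = [
    SearchEvent("solr tutorial", 120, [1]),
    SearchEvent("hbase schema", 40, [4, 2]),
    SearchEvent("flumee", 0),
    SearchEvent("lucene", 300),
]
print(zero_hit_rate(events), click_through_rate(events), mean_reciprocal_rank(events))
```

Tracking these values per day or per release, rather than once, is what turns them into the trend reports mentioned earlier.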

Value of Failure Fixes

• Zero hits
• Low CTR
• Low MRR
• High bounce rate
• Low conversion rate
• Deep paging
• Deep clicking
• High latency

Re-search

Findability

Relevance Tuning

Performance Tuning

Measure, then Fix

If you can't measure it, you can't fix it!

Relevance A/B Testing

Tracking Zero Hits

Watching Latency

Search Analytics & Measuring

If you can't measure it, you can't fix it!

You can't measure it if you don't have Analytics

Actionable vs. Non-Actionable

• Zero hits
• Low CTR
• Low MRR
• High bounce rate
• Low conversion rate
• Deep paging
• Deep clicking
• High latency

• Query rate
• Query volume
• Top seen & clicked docs
• Top queries
• Terms per query
• Search sessions
• Search users
• Distinct queries

More Fixin'

• Query rate
• Query volume
• Search sessions
• Search users
• Top seen & clicked docs
• Top queries
• Terms per query
• Distinct queries

Navigation & Design

Results Shuffling Diversification

Recommendations

AutoComplete

Search box size

Output++: Data is Power

• AutoComplete - $MM improvement
• Better DYM Spellchecker
• Related Searches
• Recommendations
• Relevance Feedback
• ...
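
As one illustration of feeding search data back into the product, here is a small Python sketch of query-log-driven AutoComplete. The function names, the `min_count` threshold, and the prefix-matching approach are assumptions for this example; the talk does not describe how any particular AutoComplete was built.

```python
from collections import Counter


def build_suggester(logged_queries, min_count=2):
    """Count normalized queries and keep those seen at least min_count times."""
    counts = Counter(q.strip().lower() for q in logged_queries if q.strip())
    return {q: c for q, c in counts.items() if c >= min_count}


def suggest(popular, prefix, limit=5):
    """Return the most frequent logged queries that start with prefix."""
    prefix = prefix.strip().lower()
    matches = [(q, c) for q, c in popular.items() if q.startswith(prefix)]
    # Most frequent first; ties broken alphabetically for stable output.
    matches.sort(key=lambda qc: (-qc[1], qc[0]))
    return [q for q, _ in matches[:limit]]


logs = ["solr", "Solr", "solr faceting", "solr faceting",
        "solr faceting", "solaris", "hbase", "hbase"]
popular = build_suggester(logs)
print(suggest(popular, "sol"))
```

The `min_count` filter is a simple way to keep one-off typos out of the suggestions; a production system would likely also weight by clicks or conversions.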

Closing the Loop

search users

search providers

search experience

Resources

http://rosenfeldmedia.com/books/searchanalytics/

Search Analytics for Your Site, Louis Rosenfeld

Search Analytics What? Why? How?

Search Analytics with Flume and HBase

Search Analytics Business Value & NoSQL Backend

http://blog.sematext.com/tag/analytics/

Key Take-aways

Without Analytics you are blind

If you can't measure it, you can't fix it

Use Search Analytics to understand, measure and improve search

Using Search Analytics means having a competitive advantage

Time permitting:

Behind the scenes of Sematext Search Analytics

Behind the Scenes

sematext.com blog.sematext.com @sematext @otisg otis@sematext.com

Want SA? Grab me or go to: sematext.com/search-analytics

Hash tags: #stsa or #stanalytics

Contact

What We've Built

• Search Analytics SaaS
• Numerous reports (e.g. query volume, rate, latency, term frequencies / comparisons, hit buckets, search origins, etc.)
• Trending over time
• Comparisons of time periods
• Top N reports
• Filter, slice and dice
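
To make "Top N reports" and "comparisons of time periods" concrete, here is a rough Python sketch over in-memory events. The `(day, query)` tuple layout and both function names are assumptions for illustration; they do not reflect how the Sematext service computes its reports.

```python
from collections import Counter
from datetime import date


def top_queries(events, start, end, n=3):
    """Top-N queries among (day, query) events with start <= day <= end."""
    counts = Counter(q for day, q in events if start <= day <= end)
    return counts.most_common(n)


def compare_periods(events, period_a, period_b):
    """Per-query change in volume from period_a to period_b.

    Each period is a (start, end) pair of inclusive dates.
    """
    a = Counter(q for day, q in events if period_a[0] <= day <= period_a[1])
    b = Counter(q for day, q in events if period_b[0] <= day <= period_b[1])
    return {q: b[q] - a[q] for q in set(a) | set(b)}


events = [
    (date(2011, 10, 1), "solr"),
    (date(2011, 10, 1), "solr"),
    (date(2011, 10, 2), "hbase"),
    (date(2011, 10, 8), "solr"),
    (date(2011, 10, 9), "hbase"),
    (date(2011, 10, 9), "hbase"),
]
week1 = (date(2011, 10, 1), date(2011, 10, 7))
week2 = (date(2011, 10, 8), date(2011, 10, 14))
print(top_queries(events, *week1, n=1))
print(compare_periods(events, week1, week2))
```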

Sematext Search Analytics

Big Dreams

• SaaS
• Multitenant
• Large Scale – Massive Data
• Cloud

Storage Choices

• RDBMS: MySQL, PostgreSQL
• HDFS
• Hive
• HBase
• Cassandra

SaaS vs. In-House

Question for the audience #2

SaaS vs in-house Search Analytics?

a) SaaS
b) in-house

Sematext Search Analytics

Data Flow

See Search Analytics with Flume and HBase

http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/

Data Collection

See Search Analytics with Flume and HBase

http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/

Core Tech

• JavaScript Beacons
• Metric Capture Web App aka Receiver
• Flume Agents, Collectors, Sinks
• HBase
• MapReduce Aggregations
• Search Analytics Reporting Web App

What is Flume

• Distributed data/log collection service
• Scalable, configurable, extensible
• Centrally manageable, open source
• Agents get data from app, Collectors save it
• Abstractions: Source → Decorator(s) → Sink
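
The Source → Decorator(s) → Sink abstraction can be pictured with a short Python sketch. This is a conceptual model built from plain generators, not Flume's actual API or configuration syntax. Every name in it (`log_source`, `drop_empty_decorator`, `timestamp_decorator`, `ListSink`) is hypothetical.

```python
def log_source(lines):
    """Source: yields raw events, here from an in-memory list of log lines."""
    for line in lines:
        yield {"body": line}


def drop_empty_decorator(events):
    """Decorator: filters out events with an empty body."""
    for event in events:
        if event["body"].strip():
            yield event


def timestamp_decorator(events, clock):
    """Decorator: passes events through, attaching a receive timestamp."""
    for event in events:
        yield {**event, "received_at": clock()}


class ListSink:
    """Sink: terminal stage; stores events in memory for this sketch."""

    def __init__(self):
        self.stored = []

    def drain(self, events):
        for event in events:
            self.stored.append(event)


sink = ListSink()
pipeline = timestamp_decorator(
    drop_empty_decorator(log_source(["q=solr", "", "q=hbase"])),
    clock=lambda: 0,  # fixed clock keeps the example deterministic
)
sink.drain(pipeline)
print(sink.stored)
```

Each stage only consumes and produces events, so decorators can be stacked or reordered without touching the source or the sink.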

What is HBase

• Scalable, reliable, distributed, column-oriented DB
• On top of HDFS
• MapReducable

Data Flow, Detailed

Why Flume

• Reliable delivery, e.g. queue msgs locally if destination unreachable
• Easy, centralized management via Web UI or console
• Good community, good progress, now @ASF
• But: more complex, more moving parts
• On Flume: slideshare.net/cloudera/inside-flume
• Alternatives: Kafka, Scribe...
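
The "queue msgs locally if destination unreachable" idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not how Flume implements it. `BufferingSender` and the `send` callable are hypothetical, and the in-memory queue is a simplification (a real agent would need durable storage to survive restarts).

```python
from collections import deque


class BufferingSender:
    """Queue messages locally and retry once the destination is reachable."""

    def __init__(self, send):
        # send: hypothetical callable that raises ConnectionError on failure.
        self._send = send
        self._pending = deque()

    def submit(self, msg):
        """Enqueue a message, then try to deliver everything queued so far."""
        self._pending.append(msg)
        return self.flush()

    def flush(self):
        """Deliver queued messages in order; stop at the first failure."""
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return False  # message stays queued for a later retry
            self._pending.popleft()
        return True

    @property
    def pending(self):
        return len(self._pending)


delivered = []
state = {"up": False}


def send(msg):
    if not state["up"]:
        raise ConnectionError("destination unreachable")
    delivered.append(msg)


sender = BufferingSender(send)
sender.submit("a")
sender.submit("b")   # both stay queued while the destination is down
state["up"] = True
sender.submit("c")   # backlog drains in order, then "c" is sent
print(delivered, sender.pending)
```

A message is removed from the queue only after `send` returns, so a failure mid-send can lead to a duplicate on retry, never to a silent loss.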

Why HBase

• Scalable raw & aggregate data storage
• MapReduce data input
• Fast scans for time ranges, fast key lookups
• Easy storage and compute power expansion
• Good looking roadmap, community, progress
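
HBase keeps rows sorted by row key, which is what makes time-range scans and key lookups cheap when the key is laid out well. The Python sketch below imitates that with a sorted in-memory list. The `customer id + timestamp` key layout and the `SortedStore` class are assumptions for illustration, not the schema used by Sematext and not the HBase client API.

```python
import bisect


def row_key(customer_id, epoch_seconds):
    """Hypothetical key: fixed-width customer id, then fixed-width timestamp.

    Zero padding makes lexicographic order match numeric order.
    """
    return f"{customer_id:08d}:{epoch_seconds:010d}"


class SortedStore:
    """In-memory stand-in for a table whose rows are kept sorted by key."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        """Point lookup by exact key."""
        return self._values.get(key)

    def scan(self, start_key, stop_key):
        """Rows with start_key <= key < stop_key, in key order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


store = SortedStore()
store.put(row_key(7, 100), "q1")
store.put(row_key(8, 150), "other customer")
store.put(row_key(7, 300), "q3")
store.put(row_key(7, 200), "q2")

# All of customer 7's rows with 100 <= timestamp < 300.
print(store.scan(row_key(7, 100), row_key(7, 300)))
```

Putting the customer id first keeps each tenant's data contiguous, so a time-range report touches only one slice of the key space.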

Open Sourcing

2 open-source projects:

• github.com/sematext/HBaseWD
• github.com/sematext/HBaseHUT

See sematext.com/open-source/index.html

Patches for Flume and HBase: blog.sematext.com/tag/flume/

Challenges

• Data size. Solutions:
  • Compression (4-5x smaller with lzo)
  • Data pruning (variable levels)
• Query string distribution: very long-tail
• Lots of data to process, update, aggregate
• Young tools: Flume, HBase
• Poor IO on EC2
• Hadoop distributions
