Search, Signals & Sense: An Analytics Fueled Vision

Search, Signals & Sense: An Analytics Fueled Vision Seth Grimes @sethgrimes


Keynote presented by Seth Grimes at the Open Source Search Conference, October 2, 2012

Transcript of Search, Signals & Sense: An Analytics Fueled Vision

Page 1: Search, Signals & Sense: An Analytics Fueled Vision

Search, Signals & Sense: An Analytics Fueled Vision

Seth Grimes @sethgrimes

Page 2: Search, Signals & Sense: An Analytics Fueled Vision

A Sense Making Story

New York Times, September 30, 2012

Page 3: Search, Signals & Sense: An Analytics Fueled Vision

New York Times, September 8, 1957

Valium: Starting a Chain of Connections

Page 4: Search, Signals & Sense: An Analytics Fueled Vision

H.P. Luhn

By H.P. Luhn, in IBM Journal, April 1958

http://altaplana.com/ibm-luhn58-LiteratureAbstracts.pdf

Page 5: Search, Signals & Sense: An Analytics Fueled Vision
Page 6: Search, Signals & Sense: An Analytics Fueled Vision

Modelling Text

“Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance, first for individual words and then for sentences. Sentences scoring highest in significance are extracted and printed out to become the auto-abstract.”

-- H.P. Luhn, The Automatic Creation of Literature Abstracts, IBM Journal, 1958.
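The idea in the quotation can be sketched in a few lines of Python. This is a simplified illustration, not Luhn's exact procedure: it scores each sentence by the average document-wide frequency of its words, whereas the 1958 paper scores clusters of significant words within a sentence. The stop-word list, the sample text, and the function names are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def words(text):
    """Lowercased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def auto_abstract(text, n_sentences=1):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(words(text))  # word significance = frequency in the document

    def score(sentence):
        tokens = words(sentence)
        # Average word frequency, so long sentences are not favored automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore document order
    return [sentences[i] for i in keep]

# Made-up sample text, not taken from either article discussed in the talk.
text = (
    "Nerve cells send signals. "
    "Chemical messengers carry signals between nerve cells. "
    "The weather was pleasant."
)
print(auto_abstract(text))
```

The off-topic third sentence scores lowest because none of its words recur elsewhere in the text.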

Luhn’s analysis of Messengers of the Nervous System, a Scientific American article

http://wordle.net, applied to the NY Times article

Page 7: Search, Signals & Sense: An Analytics Fueled Vision

New York Times, September 8, 1957

Luhn’s Example

Page 8: Search, Signals & Sense: An Analytics Fueled Vision

Close Reading

Page 9: Search, Signals & Sense: An Analytics Fueled Vision
Page 10: Search, Signals & Sense: An Analytics Fueled Vision

Can Software Make the Connection?

Mark Lombardi, George W. Bush, Harken Energy and Jackson Stephens, c. 1979-90, Detail

Page 11: Search, Signals & Sense: An Analytics Fueled Vision

There and Back Again: Modelling Text, 2

The text content of a document can be considered an unordered “bag of words.”

Particular documents are points in a high-dimensional vector space.

Salton, Wong & Yang, “A Vector Space Model for Automatic Indexing,” November 1975.
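A minimal Python sketch of the bag-of-words view, assuming whitespace tokenization and raw term counts as coordinates. The two sample sentences and the helper names are made up for the illustration.

```python
from collections import Counter

def bag_of_words(text):
    """Count terms, discarding word order."""
    return Counter(text.lower().split())

def to_vector(bag, vocabulary):
    """Coordinates of a document along each vocabulary term's axis."""
    return [bag[term] for term in vocabulary]

# Two made-up documents with the same words in a different order.
docs = ["dog bites man", "man bites dog"]
bags = [bag_of_words(d) for d in docs]
vocabulary = sorted(set().union(*bags))  # one dimension per distinct term
vectors = [to_vector(b, vocabulary) for b in bags]

print(vocabulary)
print(vectors)
```

Both sentences land on the same point in the vector space, which shows what "unordered" costs: the model cannot tell who bit whom.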

Page 12: Search, Signals & Sense: An Analytics Fueled Vision

Modelling Text, 3

We might construct a document-term matrix...
• D1 = “I like databases”
• D2 = “I hate hate databases”

      I   like   hate   databases
D1    1   1      0      1
D2    1   0      2      1

http://en.wikipedia.org/wiki/Term-document_matrix

and use a weighting such as TF-IDF (term frequency–inverse document frequency)…

in computing the cosine of the angle between weighted doc-vectors to determine similarity.
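The calculation for D1 and D2 can be sketched in plain Python. TF-IDF has several variants; the smoothed form used here, log((1 + N) / (1 + df)) + 1, is an assumption chosen so that terms appearing in every document keep a non-zero weight. The function names are illustrative.

```python
import math

terms = ["i", "like", "hate", "databases"]
docs = {
    "D1": "I like databases",
    "D2": "I hate hate databases",
}

def term_counts(text):
    """Row of the document-term matrix: raw count of each term."""
    tokens = text.lower().split()
    return [tokens.count(t) for t in terms]

matrix = {name: term_counts(text) for name, text in docs.items()}

def idf(term_index):
    """Smoothed inverse document frequency (one common variant, assumed here)."""
    n_docs = len(matrix)
    df = sum(1 for row in matrix.values() if row[term_index] > 0)
    return math.log((1 + n_docs) / (1 + df)) + 1

def tfidf(row):
    """Weight each raw count by the term's IDF."""
    return [tf * idf(i) for i, tf in enumerate(row)]

def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

raw_similarity = cosine(matrix["D1"], matrix["D2"])
weighted_similarity = cosine(tfidf(matrix["D1"]), tfidf(matrix["D2"]))

print(matrix)
print(round(raw_similarity, 3), round(weighted_similarity, 3))
```

The weighted similarity comes out lower than the raw one, because TF-IDF boosts "like" and "hate", the terms that distinguish the two documents, relative to the terms they share.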

Page 13: Search, Signals & Sense: An Analytics Fueled Vision

Modelling Text, 4

In the form of query-document similarity, this is Information Retrieval 101.
• See, for instance, Salton & Buckley, “Term-Weighting Approaches in Automatic Text Retrieval,” 1988.
• A useful basic tech paper: Russ Albright, SAS, “Taming Text with the SVD,” 2004.

Given the complexity of human language, statistical models may fall short.

“Reading from text in general is a hard problem, because it involves all of common sense knowledge.”

-- Expert systems pioneer Edward A. Feigenbaum

Page 14: Search, Signals & Sense: An Analytics Fueled Vision

From Text to Data: Features

Analytical methods make text tractable.

Latent semantic indexing utilizing singular value decomposition for term reduction / feature selection.

Classification technologies / methods:
• Naive Bayes.
• Support Vector Machine.
• K-nearest neighbor.
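As an illustration of the first classifier on the list, here is a minimal multinomial Naive Bayes sketch in plain Python, assuming whitespace tokenization and add-one (Laplace) smoothing. The class name, the labels, and the four training sentences are made up for the example; a production system would use an established library and far more data.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one smoothing."""

    def __init__(self):
        self.class_docs = Counter()               # documents seen per label
        self.term_counts = defaultdict(Counter)   # term counts per label
        self.vocabulary = set()

    def train(self, text, label):
        tokens = text.lower().split()
        self.class_docs[label] += 1
        self.term_counts[label].update(tokens)
        self.vocabulary.update(tokens)

    def classify(self, text):
        # Terms never seen in training carry no evidence, so they are skipped.
        tokens = [t for t in text.lower().split() if t in self.vocabulary]
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training data.
classifier = NaiveBayes()
classifier.train("i like databases", "positive")
classifier.train("i love fast search", "positive")
classifier.train("i hate slow databases", "negative")
classifier.train("i hate broken search", "negative")

print(classifier.classify("hate slow search"))
print(classifier.classify("like fast databases"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together.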

Page 15: Search, Signals & Sense: An Analytics Fueled Vision

Thus the Orb he roam'd
With narrow search; and with inspection deep
Consider'd every Creature, which of all
Most opportune might serve his Wiles.

-- John Milton, Paradise Lost

“Reading from Text is a Hard Problem”

Eugène Delacroix, St. Michael Defeats the Devil

Page 16: Search, Signals & Sense: An Analytics Fueled Vision


Data, Search, Analysis, and Discovery

Data Space

For features Analysis

Intent, Goals

Page 17: Search, Signals & Sense: An Analytics Fueled Vision

The User Interface

“Search is the UI for data today.”
-- Grant Ingersoll, Chief Scientist, LucidWorks

Quoted by Gil Press in Forbes, “LucidWorks: Bringing Search to Big Data”
http://www.forbes.com/sites/gilpress/2012/09/24/lucidworks-bringing-search-to-big-data/

What’s beyond?

Page 18: Search, Signals & Sense: An Analytics Fueled Vision

Search and Sensemaking

“It is convenient to divide the entire information access process into two main components: information retrieval through searching and browsing, and analysis and synthesis of results. This broader process is often referred to in the literature as sensemaking. Sensemaking refers to an iterative process of formulating a conceptual representation from a large volume of information. Search plays only one part in this process.”

-- Marti Hearst, 2009
http://searchuserinterfaces.com/

Page 19: Search, Signals & Sense: An Analytics Fueled Vision

Senseless Search

New but old: Dumb and siloed

Page 20: Search, Signals & Sense: An Analytics Fueled Vision

Better?

Searcher Supplied Sense

Page 21: Search, Signals & Sense: An Analytics Fueled Vision

Siloed signals.

More better?

Page 22: Search, Signals & Sense: An Analytics Fueled Vision

Semantic Search Engines

Meh.

Page 23: Search, Signals & Sense: An Analytics Fueled Vision

Clustered Clarity

Carrot2 (open source)

Page 24: Search, Signals & Sense: An Analytics Fueled Vision

Semanticized (Web) Search

Google Knowledge Graph

Page 25: Search, Signals & Sense: An Analytics Fueled Vision

Search Fronted Analysis & Discovery

Fusions, Signals

Page 26: Search, Signals & Sense: An Analytics Fueled Vision

             Old Search                       Sensemaking
Search on:   keywords                         + identity, history & context
Sources:     content/type silos               Unified
Indexed:     terms                            + metadata (properties)
Returned:    hit lists                        Categories / clusters / answers first
Relevance:   PageRank                         (Inferred) intent
Prevalence:  plenty of new platforms with     Plenty of established search with
             old(ish) search                  new(ish) capabilities, also wanna-bes.

Toward Semantic Search Sensemaking

Page 27: Search, Signals & Sense: An Analytics Fueled Vision

Platforms and ecosystems.

APIs and services.

Text and content analytics -- Discerns and extracts features including relationships from source materials.

Features = entities, key-value pairs, concepts, topics, events, sentiment, etc.

Provide (for) BI on content-sourced data.

Data integration, record linkage, data fusion.

The Back End

Page 28: Search, Signals & Sense: An Analytics Fueled Vision

Text/content analytics generates semantics to bridge search, BI, and applications, enabling next-generation information systems.

Search, BI, Applications

Search based applications (search + text + apps)

Information access (search + text + BI)

Integrated analytics (text + BI)

Text analytics (inner circle)

Semantic search (search + text)

NextGen CRM, EFM, MR, marketing, …

Text+ Technology Mashups

Page 29: Search, Signals & Sense: An Analytics Fueled Vision

Analytical Assets (Open Source)

>>> import nltk
>>> sentence = """At eight o'clock on Thursday morning
... Arthur didn't feel very good."""
>>> tokens = nltk.word_tokenize(sentence)
>>> tokens
['At', 'eight', "o'clock", 'on', 'Thursday', 'morning',
'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']
>>> tagged = nltk.pos_tag(tokens)
>>> tagged[0:6]
[('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'), ('on', 'IN'),
('Thursday', 'NNP'), ('morning', 'NN')]

http://nltk.org/

tm: Text Mining Package
A framework for text mining applications within R.

Page 30: Search, Signals & Sense: An Analytics Fueled Vision

A Big Data Analytics Architecture

http://www.geeklawblog.com/2011/12/lexis-advance-platform-launch-two.html

http://hpccsystems.com/ (GNU Affero GPL)

Page 31: Search, Signals & Sense: An Analytics Fueled Vision

Commercial (Non-OS) Solutions Plug In

Page 32: Search, Signals & Sense: An Analytics Fueled Vision

Drivers and Trends

Social media!… and personal-social-enterprise integration.

Via-API cloud services.

Big Data (even if you don’t like the term).
Volume and velocity mean new analytical approaches.
Variety: new types and a new fusion imperative.

Sentiment: Mood, opinions, emotions, intent.

Question answering.

Page 33: Search, Signals & Sense: An Analytics Fueled Vision

Text Tech Initiatives

Now and near future.
• Broader & deeper international language support.
• Sentiment analysis, beyond polarity.
  Emotions, intent signals, etc.
• Identity resolution & profile extraction.
  Online-social-enterprise data integration.
• Semantic data integration, Complex Data.
• Speech analytics.
• Discourse analysis.
  Because isolated messages are not conversations.
• Rich-media content analytics.
• Augmented reality; new human-computer interfaces.

Page 34: Search, Signals & Sense: An Analytics Fueled Vision

http://timoelliott.com/blog/2010/10/sap-businessobjects-augmented-explorer-now-available-resources-to-test-it.html

Personal. Mobile. Intelligent?

Page 35: Search, Signals & Sense: An Analytics Fueled Vision

A Focus on Information & Applications

Now and near future.
• Signal detection.
  Sentiment, emotion, identity, intent.
• Semanticized applications.
  Linkable, mashable, enrichable.
• Rich information.
  Context sensitive, situational.

Σ = Sensemaking.

Page 36: Search, Signals & Sense: An Analytics Fueled Vision
Page 37: Search, Signals & Sense: An Analytics Fueled Vision

Onward… to Q&A

Page 38: Search, Signals & Sense: An Analytics Fueled Vision

Search, Signals & Sense: An Analytics Fueled Vision

Seth Grimes @sethgrimes