Global Analytics: Text, Speech, Sentiment, and Sense
Seth Grimes, Alta Plana Corporation
@sethgrimes
December 4, 2014
Global Analytics: Text, Speech, Sentiment, and Sense
LT-Accelerate – 4 December, 2014
Thus the Orb he roam'd
With narrow search; and with inspection deep
Consider'd every Creature, which of all
Most opportune might serve his Wiles.
-- John Milton, Paradise Lost
“Reading from Text is a Hard Problem”
Eugène Delacroix, St. Michael Defeats the Devil
Data Space, Indexing
Search
Analysis
Intent, Goals
Context
Analytics is the systematic application of algorithmic methods that derive and deliver information, typically expressed quantitatively, whether in the form of indicators, tables, visualizations, or models.
• Systematic means formal & repeatable.
• Algorithmic contrasts with heuristic.
Analytics creates and/or applies models.
http://www.tropicalisland.de/NYC_New_York_Brooklyn_Bridge_from_World_Trade_Center_b.jpg
$x(t) = t$
$y(t) = \tfrac{1}{2} a \left(e^{t/a} + e^{-t/a}\right) = a \cosh(t/a)$
http://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg
Models make the unstructured computable.
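As a small illustration of a model making a shape computable, the catenary on this slide can be evaluated directly. The sketch below is only an assumption about how one might do that: it checks numerically that the exponential form and the cosh form agree, and samples a few points of the curve. The function names and the `half_span` and `n_points` parameters are hypothetical, chosen for illustration; only `a` and `t` come from the slide.

```python
import math

def catenary_exponential(t, a):
    """y(t) written with exponentials: (a / 2) * (e^(t/a) + e^(-t/a))."""
    return 0.5 * a * (math.exp(t / a) + math.exp(-t / a))

def catenary_cosh(t, a):
    """The same curve written as a * cosh(t / a)."""
    return a * math.cosh(t / a)

def sample_curve(a, half_span, n_points=5):
    """Return (x, y) pairs with x(t) = t, for t evenly spaced in [-half_span, half_span].

    half_span and n_points are illustrative parameters, not part of the slide.
    """
    step = 2 * half_span / (n_points - 1)
    points = []
    for i in range(n_points):
        t = -half_span + i * step
        points.append((t, catenary_cosh(t, a)))
    return points

if __name__ == "__main__":
    for x, y in sample_curve(a=2.0, half_span=4.0):
        print(f"x = {x:5.1f}   y = {y:7.3f}")
```

The lowest point of the sampled curve sits at t = 0, where y equals a.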
Sixty+ years of analysis & modelling progress:
Text
Numbers
Patterns & Insights
Connections
Interactions
Document input and processing
Knowledge handling is key
Desk Set (1957): Computer engineer Richard Sumner (Spencer Tracy), television network librarian Bunny Watson (Katharine Hepburn), and the "electronic brain" EMERAC.
Hans Peter Luhn
“A Business Intelligence System”
IBM Journal, October 1958
“Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance, first for individual words and then for sentences. Sentences scoring highest in significance are extracted and printed out to become the auto-abstract.”
H.P. Luhn, The Automatic Creation of Literature Abstracts, IBM Journal, 1958.
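The procedure Luhn describes (score words by frequency, score sentences by the words they contain, extract the top sentences) can be sketched in a few lines. This is a simplified illustration, not Luhn's exact method: it scores a sentence as the sum of its significant-word frequencies divided by sentence length, whereas Luhn's paper scores clusters of significant words. The stop-word list, the `min_count` threshold, and all function names are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "be", "by", "for", "in", "is", "it",
    "of", "on", "or", "that", "the", "this", "to", "was", "with",
}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Lower-case the sentence and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())

def word_significance(sentences, min_count=2):
    """Count non-stop words and keep those seen at least min_count times."""
    counts = Counter(
        w for s in sentences for w in tokenize(s) if w not in STOP_WORDS
    )
    return {w: c for w, c in counts.items() if c >= min_count}

def sentence_score(sentence, significant):
    """Sum of significant-word frequencies, normalised by sentence length."""
    words = tokenize(sentence)
    if not words:
        return 0.0
    return sum(significant.get(w, 0) for w in words) / len(words)

def auto_abstract(text, n_sentences=2, min_count=2):
    """Return the n_sentences highest-scoring sentences, in document order."""
    sentences = split_sentences(text)
    significant = word_significance(sentences, min_count)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sentence_score(sentences[i], significant),
        reverse=True,
    )
    keep = sorted(ranked[:n_sentences])  # restore original order
    return [sentences[i] for i in keep]

if __name__ == "__main__":
    sample = (
        "Nerve cells send signals. Signals travel along nerve fibers. "
        "The weather was pleasant yesterday. "
        "Chemical messengers carry signals between nerve cells."
    )
    for line in auto_abstract(sample, n_sentences=2):
        print(line)
```

Nothing here looks at grammar or meaning; the ranking rests purely on word counts.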
Luhn’s analysis of Messengers of the Nervous System, a Scientific American article
http://wordle.net, applied to a Luhn-cited NY Times article
“This rather unsophisticated argument on ‘significance’ avoids such linguistic implications as grammar and syntax... No attention is paid to the logical and semantic relationships the author has established.”
-- H.P. Luhn
~ 2004-5
Patterns, Insights & Connections
~ 2009-12
… also commonly explored via dashboards.
What information?
Do you currently need (or expect to need) to extract or analyze...
Events: Current 33%, Expect 21%
Semantic annotations: Current 31%, Expect 24%
Other entities – phone numbers, part/product …: Current 34%, Expect 23%
Metadata such as document author,…: Current 47%, Expect 23%
Concepts, that is, abstract groups of entities: Current 51%, Expect 28%
Named entities – people, companies, …: Current 56%, Expect 25%
Relationships and/or facts: Current 47%, Expect 33%
Sentiment, opinions, attitudes, emotions,…: Current 54%, Expect 28%
Topics and themes: Current 66%, Expect 22%
Text Analytics 2014, http://altaplana.com/TA2014
Emotion and outcomes
“The share rise in users who selected Arabic…coincided with much of the civil unrest… in Middle Eastern countries.”
http://bits.blogs.nytimes.com/2014/03/09/the-languages-of-twitter-users/
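Non-English language support?
Arabic: Current 10%, Within 2 years 17%
Bahasa Indonesia or Malay: Current 1%, Within 2 years 3%
Chinese: Current 16%, Within 2 years 28%
Dutch: Current 9%, Within 2 years 7%
French: Current 36%, Within 2 years 17%
German: Current 34%, Within 2 years 24%
Greek: Current 2%, Within 2 years 2%
Hindi, Urdu, Bengali, Punjabi, or other…: Current 2%, Within 2 years 10%
Italian: Current 18%, Within 2 years 11%
Japanese: Current 7%, Within 2 years 15%
Korean: Current 4%, Within 2 years 8%
Polish: Current 3%, Within 2 years 4%
Portuguese: Current 13%, Within 2 years 17%
Russian: Current 8%, Within 2 years 21%
Scandinavian or Baltic: Current 7%, Within 2 years 3%
Spanish: Current 38%, Within 2 years 20%
Turkish or Turkic: Current 3%, Within 2 years 4%
Other African: Current 2%, Within 2 years 0%
Other Arabic script (including Urdu,…: Current 3%, Within 2 years 1%
Other East Asian: Current 2%, Within 2 years 1%
Other European or Slavic/Cyrillic: Current 5%, Within 2 years 2%
Other: Current 9%, Within 2 years 0%
Text Analytics 2014, http://altaplana.com/TA2014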
Audio including speech
Images
Video
IOT
http://www.geekosystem.com/facebook-face-recognition/
http://www.sciencedirect.com/science/article/pii/S0167639312000118
http://flylib.com/books/en/2.495.1.54/1/
Beyond text
http://searchuserinterfaces.com/
“It is convenient to divide the entire information access process into two main components: information retrieval through searching and browsing, and analysis and synthesis of results. This broader process is often referred to in the literature as sensemaking.
Sensemaking refers to an iterative process of formulating a conceptual representation from a large volume of information.”
– Marti Hearst, 2009
Sensemaking
Challenges
Context
Interaction
Narrative and discourse
Correlation, integration, and synthesis
Sentiment++: Mood, opinions, emotions, intent
Question answering
Dialog, storytelling
Cross-lingual / “omni-channel” implementation
…
Prescription, autonomy
…
Singularity?
Opportunity enablers
The API economy
I.e., on-demand, via-API Web services
Cloud deployment and service delivery
…enabling rapid deployment
Data aggregation and enrichment
Examples: Gnip, DataSift, Spinn3r, and Moreover
Growth hacking
Knowledge graphs
Machine learning
Supervised, unsupervised, active, deep
Open source
Platforms and frameworks
Examples: UIMA, GATE… Salesforce, QlikView… Python, R
Where to?