Text Analytics Past, Present & Future: An Industry View

Text Analytics Past, Present & Future: An Industry View Seth Grimes Alta Plana Corporation @sethgrimes June 5, 2014


Keynote presentation to JADT.org, June 5, 2014 


Text Analytics Past, Present & Future: An Industry View

Seth Grimes, Alta Plana Corporation

@sethgrimes

June 5, 2014

Text Analytics: An Industry View

JADT – June 5, 2014

2


Analytics is the systematic application of algorithmic methods that derive and deliver information, typically expressed quantitatively, whether in the form of indicators, tables, visualizations, or models.

• Systematic means formal & repeatable.

• Algorithmic contrasts with heuristic.


Text analytics past:

Pioneers…

Document input and processing

Knowledge handling is key

Desk Set (1957): Computer engineer Richard Sumner (Spencer Tracy), television network librarian Bunny Watson (Katharine Hepburn), and the "electronic brain" EMERAC.

Hans Peter Luhn, “A Business Intelligence System,” IBM Journal, October 1958


“Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance, first for individual words and then for sentences. Sentences scoring highest in significance are extracted and printed out to become the auto-abstract.”

H.P. Luhn, The Automatic Creation of Literature Abstracts, IBM Journal, 1958.
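The procedure Luhn describes (word frequencies, then sentence significance, then extraction of the top-scoring sentences) can be sketched in a few lines of Python. This is a simplified illustration, not Luhn's exact method: the stop list, the `min_count` threshold, and the scoring formula (squared count of significant words divided by sentence length) are assumptions chosen for brevity, and the function names are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stop list (an assumption, not Luhn's own list).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
             "that", "for", "on", "by", "are", "was", "with", "as"}


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase word tokens with stop words removed.
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def auto_abstract(text, n_sentences=2, min_count=2):
    sentences = split_sentences(text)

    # Step 1: word significance from frequency across the document.
    counts = Counter(w for s in sentences for w in tokenize(s))
    significant = {w for w, c in counts.items() if c >= min_count}

    # Step 2: sentence significance from the significant words it contains.
    def score(sentence):
        words = tokenize(sentence)
        if not words:
            return 0.0
        hits = sum(1 for w in words if w in significant)
        return hits * hits / len(words)

    # Step 3: extract the highest-scoring sentences, in document order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]
```

Calling `auto_abstract(document_text, n_sentences=3)` returns the three highest-scoring sentences as a list. Luhn's paper scores clusters of significant words within a sentence; scoring the whole sentence, as here, keeps the sketch short.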


Pipelines and patterns

IBM’s MedTAKMI, 1997-

http://www.research.ibm.com/trl/projects/textmining/index_e.htm


Exhaustive extraction

An (old) Attensity example – NLP to identify roles and relationships, for a law-enforcement application.


Language engineering

GATE: General Architecture for Text Engineering.

http://gate.ac.uk/


Text analytics present:

Business, technology, applications, and solutions…


“Organizations embracing text analytics all report having an epiphany moment when they suddenly knew more than before.” -- Philip Russom, the Data Warehousing Institute, 2007

http://tdwi.org/articles/2007/05/09-what-works/bi-search-and-text-analytics.aspx


Linguistics, statistics, and semantics

Text analytics (typically) involves linguistic modelling, statistical characterization, learned patterns, and semantic understanding of text-derived features –

• Named entities: people, companies, places, etc.
• Pattern-based features: e-mail addresses, phone numbers, etc.
• Concepts: abstractions of entities.
• Facts and relationships.
• Events.
• Concrete and abstract attributes (e.g., “expensive” & “comfortable”) including measure-value pairs.
• Subjectivity in the forms of opinions, sentiments, and emotions: attitudinal data.

– applied to business ends.
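Pattern-based features such as e-mail addresses and phone numbers are commonly pulled out with regular expressions. The sketch below is illustrative only: the two patterns are deliberately simple assumptions that cover common formats, not a complete treatment of either, and `extract_pattern_features` is a hypothetical helper name.

```python
import re

# Simplified patterns for illustration; real-world e-mail and phone
# formats are more varied than these expressions cover.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def extract_pattern_features(text):
    """Return the e-mail addresses and phone-like strings found in text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": [m.strip() for m in PHONE_RE.findall(text)],
    }
```

Named entities, concepts, and sentiment generally need linguistic or statistical models; patterns like these handle only features with a regular surface form.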


Sources

It’s a truism that 80% of enterprise-relevant information originates in “unstructured” form:

• E-mail and messages.
• Web pages, online news & blogs, forum postings, and other social media.
• Contact-center notes and transcripts.
• Surveys, feedback forms, warranty claims.
• Scientific literature, books, legal documents....

Non-text “unstructured” content?

• Images
• Audio including speech
• Video

Value derives from patterns.


Value

What do we do with text, whether online, on-social, or in the enterprise?

1. Post/Publish, Manage, and Archive.
2. Index and Search.
3. Categorize and Classify according to metadata & contents.
4. Extract information and Analyze.


Semantics, analytics, and IR

Text analytics generates semantics to bridge search, BI, and applications, enabling next-generation information systems.

Search, BI/Big Data, Applications

Search-based applications (search + text + apps)

Information access (search + analytics)

Synthesis (text + BI)/(big data)

Text analytics (inner circle)

Semantic search (search + text)

NextGen CRM, EFM, MR, marketing, apps…


Content, composites, connections 1


Content, composites, connections 2


Applications

Text analytics has applications in:

• Intelligence & law enforcement.
• Life sciences & clinical medicine.
• Media & publishing including social-media analysis and contextual advertising.
• Competitive intelligence.
• Voice of the Customer: CRM, product management & marketing.
• Public administration & policy.
• Legal, tax & regulatory (LTR) including compliance.
• Recruiting.


Opinion, sentiment & emotion


Sentiment analysis

A specialization, of relevance to:

• Brand/reputation management.
• Customer experience management (CEM).
• Competitive intelligence.
• Survey analysis (EFM = Enterprise Feedback Management).
• Market research.
• Product design/quality.
• Trend spotting.


Data exploration via dashboards and workbenches.


Text analytics present:

The market…


http://altaplana.com/TA2014


What are your primary applications where text comes into play?

• Military/national security/intelligence: 5%
• Law enforcement: 6%
• Intellectual property/patent analysis: 8%
• Financial services/capital markets: 9%
• Product/service design, quality assurance, or warranty claims: 10%
• Other: 11%
• Insurance, risk management, or fraud: 13%
• E-discovery: 14%
• Life sciences or clinical medicine: 15%
• Online commerce including shopping, price intelligence, reviews: 16%
• Content management or publishing: 25%
• Customer/CRM: 27%
• Search, information access, or Question Answering: 29%
• Competitive intelligence: 33%
• Brand/product/reputation management: 38%
• Research (not listed): 38%
• Voice of the Customer / Customer Experience Management: 39%


Voice of the Customer

Text analytics is applied to improve customer service and boost satisfaction and loyalty.

Analyze customer interactions and opinions –

• E-mail, contact-center notes, survey responses.
• Forum & blog posting and other social media.

– to –

• Address customer product & service issues.
• Improve quality.
• Manage brand & reputation.

Assessment of qualitative information from text helps users –

• Gain feedback on interactions.
• Assess customer value.
• Understand root causes.
• Mine data for measures such as churn likelihood.


The commercial scene


Online commerce

Text analytics is applied for marketing, search optimization, competitive intelligence.

Analyze social media and enterprise feedback to understand the Voice of the Market:

• Opportunities
• Threats
• Trends

Categorize product and service offerings for on-site search and faceted navigation and to enrich content delivery.

Annotate pages to enhance Web-search findability, ranking.

Scrape competitor sites for offers and pricing.

Analyze social and news media for competitive information.


E-Discovery and compliance

Text analytics is applied for compliance, fraud and risk, and e-discovery.

Regulatory mandates and corporate practices dictate –

• Monitoring corporate communications
• Managing electronically stored information for production in event of litigation

Sources include e-mail (!!), news, social media

Risk avoidance and fraud detection are key to effective decision making

• Text analytics mines critical data from unstructured sources
• Integrated text-transactional analytics provides rich insights


[Chart: What textual information are you analyzing or do you plan to analyze? Series: 2014, 2011, 2009. Categories shown: Web-site feedback; social media not listed above; chat; employee surveys; contact-center notes or transcripts; e-mail and correspondence; online reviews; scientific or technical literature; Facebook postings; on-line forums; customer/market surveys; comments on blogs and articles; news articles; blogs (long form+micro).]


[Chart: What textual information are you analyzing or do you plan to analyze? Categories shown: insurance claims or underwriting notes; video or animated images; photographs or other graphical images; field/intelligence reports; patent/IP filings; text messages/instant messages/SMS; Web-site feedback; chat; contact-center notes or transcripts; online reviews; Facebook postings; customer/market surveys; news articles; Twitter, Sina Weibo, or other microblogs.]


Do you currently need (or expect to need) to extract or analyze...

• Events: current 33%, expect 21%
• Semantic annotations: current 31%, expect 24%
• Other entities – phone numbers, part/product numbers, e-mail & street addresses, etc.: current 34%, expect 23%
• Metadata such as document author, publication date, title, headers, etc.: current 47%, expect 23%
• Concepts, that is, abstract groups of entities: current 51%, expect 28%
• Named entities – people, companies, geographic locations, brands, ticker symbols, etc.: current 56%, expect 25%
• Relationships and/or facts: current 47%, expect 33%
• Sentiment, opinions, attitudes, emotions, perceptions, intent: current 54%, expect 28%
• Topics and themes: current 66%, expect 22%


“The share rise in users who selected Arabic…coincided with much of the civil unrest… in Middle Eastern countries.”

http://bits.blogs.nytimes.com/2014/03/09/the-languages-of-twitter-users/


[Chart: Non-English language support? Series: Current, Within 2 years. Languages shown: Arabic; Chinese; French; Greek; Italian; Korean; Portuguese; Scandinavian or Baltic; Turkish or Turkic; Other Arabic script (including Urdu, Pashto, Farsi, Dari); Other European or Slavic/Cyrillic.]


Software & platform options

Text-analytics options may be grouped in general classes.

• Installed text-analysis application, whether desktop or server or deployed in-database.
• Data mining workbench.
• Hosted.
• Programming tool.
• As-a-service, via an application programming interface (API).
• Code library or component of a business/vertical application, for instance for CRM, e-discovery, search.

Text analytics is frequently embedded in search or other end-user applications.

The slides that follow next will present leading options in each category except Hosted…


[Chart: What is important in a solution? Series: 2014 (n=139), 2011 (n=136), 2009 (n=78). Criteria shown: media monitoring/analysis interface; hosted or Web service (on-demand "API") option; supports data fusion / unified analytics; sector adaptation (e.g., hospitality, insurance, retail, health care, communications, financial services); BI (business intelligence) integration; ability to create custom workflows or to create or change topics/categories yourself; big data capabilities, e.g., via Hadoop/MapReduce; predictive-analytics integration; open source; support for multiple languages; sentiment scoring; "real time" capabilities; low cost; deep sentiment/emotion/opinion/intent extraction; document classification; broad information extraction capability; ability to use specialized dictionaries, taxonomies, ontologies, or extraction rules; ability to generate categories or taxonomies.]


User decision criteria

Primary considerations include –

Adaptation or specialization: To a business or cultural domain, language, information type (e.g., text, speech, images) & source (e.g., Twitter, e-mail, online news).

By-user customization possibilities: For instance, via custom taxonomies, rules, lexicons.

Sentiment resolution: Aggregate, message, or feature level. (What features? Topics, coreferenced entities?)

What sentiment? Valence & what else? Emotion? Intent?

Outputs: E.g., annotated text, models, indicators, dashboards, exploratory data interfaces.

Usage mode: As-a-service (API), installed, or hosted/cloud.

Capacity: Volume, performance, throughput, latency.

Cost.


A few French companies


Academic spin-offs

People Pattern


Text analytics future:

Synthesis and sensemaking.

New York Times, September 8, 1957


Emotion in text


Emotion and outcomes


Beyond Text

• Audio including speech.
• Images.
• Video.

http://www.geekosystem.com/facebook-face-recognition/

http://www.sciencedirect.com/science/article/pii/S0167639312000118

http://flylib.com/books/en/2.495.1.54/1/


The world of big data

• Machine data (e.g., logs, sensor outputs, clickstreams).
• Actions, interactions, and transactions: geolocation and time.
• Profiles: individual, demographic & behavioral.
• Text, audio, images, and video.

Facts and feelings.


(Accessible) data everywhere


http://www.geeklawblog.com/2011/12/lexis-advance-platform-launch-two.html

A big data analytics architecture (example)


http://searchuserinterfaces.com/

“It is convenient to divide the entire information access process into two main components: information retrieval through searching and browsing, and analysis and synthesis of results. This broader process is often referred to in the literature as sensemaking. Sensemaking refers to an iterative process of formulating a conceptual representation from a large volume of information.”

– Marti Hearst, 2009

Sensemaking


http://www.businessweek.com/magazine/content/04_19/b3882029_mz072.htm

En route

Text Analytics Past, Present & Future: An Industry View

Seth Grimes, Alta Plana Corporation

@sethgrimes

June 5, 2014