Visualizing Data in Elasticsearch DevFest DC 2016

Posted on 27-Jan-2017



1

Search, Time Series, and Graph Analysis in the Cloud

Dave Erickson

dave@elastic.co

Visualizing Data in Elasticsearch

2

Dave Erickson – Developer

• Biotech

• Electronic Archives & Libraries

• Geospatial

• Healthcare

• Air Traffic Control

• Financial Services

3

4

Elastic Stack: Real Time Search & Analytics at Scale

• Kibana: User Interface

• Elasticsearch: Store, Index, & Analyze

• Ingest: Logstash, Beats

• X-Pack: Security, Alerting, Monitoring, Reporting, Graph

• Elastic Cloud

5

6

Visualization is Important

https://www.reddit.com/r/dataisugly/

7

Visualization in the Cloud

• Qualities We Want:
  ‒ Parallel
  ‒ Highly Available
  ‒ Platform Independent
  ‒ Multi-tenancy
  ‒ Extensible

• Use Cases:
  ‒ Search, Discovery, & Analytics
  ‒ Metrics & Time Series Data
  ‒ Structured & Unstructured
  ‒ Security Analytics

8

Wait …

Why would you use a search engine for analytics?

9

Search indexes have been around for a long time

10

Scaled, distributed search indexes have been around for a long time

11

Electronic search engines have been around for a long time

1928 – patent application by Emanuel Goldberg for a “Statistical Machine”

http://www.google.com/patents/US1838389

Basically an optical version of grep that predates almost everything

12

Timeline, in no way complete

• 7th Century B.C.E. ? – library catalogs
• 1928 – Goldberg “Statistical Machine”
  – Optical search on microfilm
• 1945 – Vannevar Bush “microfilm rapid selector”; “Memex”
• 1960s – SMART Information Retrieval System (Cornell U.)
• 1974 – grep first appears in Unix v4
• 1990s – WWW search engines
• 1999 – Doug Cutting Lucene search indexer

13

Inverted Indexes

• Pay the cost at indexing time (insertion time)

• Reap the benefits at retrieval time

Document (1): “the quick brown fox”
Document (2): “brown fox in the forest”
Document (3): “brown bear”

Term     Postings List   Statistics (count)
quick    1               1
brown    1, 2, 3         3
fox      1, 2            2
forest   2               1
bear     3               1
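The table above can be reproduced with a few lines of code. This is only a minimal sketch of the idea, not how Lucene or Elasticsearch actually store an index; the stop-word list (`the`, `in`) is an assumption inferred from the terms missing from the slide's table, and `build_inverted_index` is a hypothetical helper name.

```python
from collections import defaultdict

# Assumption: "the" and "in" are dropped as stop words, because the
# slide's table does not list them.
STOP_WORDS = {"the", "in"}

DOCS = {
    1: "the quick brown fox",
    2: "brown fox in the forest",
    3: "brown bear",
}


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # The cost is paid here, at indexing time.
        for term in text.lower().split():
            if term not in STOP_WORDS:
                postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


index = build_inverted_index(DOCS)
for term, ids in index.items():
    # term, postings list, statistics (count)
    print(term, ids, len(ids))
```

Each lookup afterwards is a single dictionary access, which is the "reap the benefits at retrieval time" half of the trade-off.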

14

Pretty Good At Retrieval

Find documents mentioning “foxes”?

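The slide does not say how the query word “foxes” is matched to the indexed term “fox”. One common approach is to normalize words the same way at index time and at query time. The sketch below assumes a deliberately naive suffix-stripping rule (`naive_stem`, a hypothetical helper); real analyzers use proper stemmers.

```python
# Inverted index from the slide: term -> postings list.
INDEX = {
    "quick": [1],
    "brown": [1, 2, 3],
    "fox": [1, 2],
    "forest": [2],
    "bear": [3],
}


def naive_stem(word):
    """Strip a trailing "es" or "s"; a toy stand-in for a real stemmer."""
    word = word.lower()
    for suffix in ("es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def lookup(query_word):
    """Return the postings list for the normalized query word."""
    return INDEX.get(naive_stem(query_word), [])


print(lookup("foxes"))
```

With this rule “foxes” reduces to “fox”, so the lookup returns documents 1 and 2.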

15

Excellent at Search

Find documents mentioning “quick” AND “fox”?

Intersection of the postings lists:

quick    1
fox      1, 2

Only Document (1) appears in both lists.
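An AND query over an inverted index amounts to intersecting postings lists. A minimal sketch, using Python sets in place of the sorted-list merging a real engine would do (`search_and` is a hypothetical helper name):

```python
# Inverted index from the slide: term -> postings list.
INDEX = {
    "quick": [1],
    "brown": [1, 2, 3],
    "fox": [1, 2],
    "forest": [2],
    "bear": [3],
}


def search_and(*terms):
    """Return the ids of documents that contain every given term."""
    postings = [set(INDEX.get(term, [])) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


print(search_and("quick", "fox"))
```

For “quick” AND “fox” the result is document 1 only.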

16


Excellent at Real Time Analytics

What was the most commonly mentioned term?

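Because the index already keeps a count per term, this question needs no scan over the documents. A minimal sketch, where the count is the number of documents containing the term, as in the slide's statistics column (`most_common_term` is a hypothetical helper name):

```python
# Inverted index from the slide: term -> postings list.
INDEX = {
    "quick": [1],
    "brown": [1, 2, 3],
    "fox": [1, 2],
    "forest": [2],
    "bear": [3],
}


def most_common_term(index):
    """Return (term, count) for the term found in the most documents."""
    term = max(index, key=lambda t: len(index[t]))
    return term, len(index[term])


print(most_common_term(INDEX))
```

Here the answer is “brown”, which appears in all three documents.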

17


Histogram about the mention of foxes over time:

The inverted index says which documents mention “fox” (1, 2) but not when they were written, so it cannot answer this on its own.

18

Columnar Indexes

Document (1)
  text: “the quick brown fox”
  date: Monday

Document (2)
  text: “brown fox in the forest”
  date: Tuesday

Document (3)
  text: “brown bear”
  date: Monday

Doc id   Date
1        Monday
2        Tuesday
3        Monday

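Combining the two structures gives the histogram: the inverted index finds the documents that mention a term, and the column maps each document id to its date. A minimal sketch under those assumptions (`histogram` is a hypothetical helper name; in Elasticsearch this kind of per-document column is called doc values):

```python
from collections import Counter

# Inverted index: term -> postings list.
POSTINGS = {
    "quick": [1],
    "brown": [1, 2, 3],
    "fox": [1, 2],
    "forest": [2],
    "bear": [3],
}

# Columnar index: doc id -> date.
DATE_COLUMN = {1: "Monday", 2: "Tuesday", 3: "Monday"}


def histogram(term):
    """Count the documents mentioning `term`, bucketed by date."""
    return Counter(DATE_COLUMN[doc_id] for doc_id in POSTINGS.get(term, []))


fox_by_day = histogram("fox")
print(dict(fox_by_day))
```

For “fox” this yields one mention on Monday and one on Tuesday.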

19

Now do it in parallel

• Distributed

• Non-blocking

• Read / Write

• Commodity hardware

• Fault-tolerance

• High Availability

20

Use Cases

21

Quick Live Demo

22

23

24

25

26

Thank You!

dave@elastic.co