Visualizing Data in Elasticsearch DevFest DC 2016
Transcript of Visualizing Data in Elasticsearch DevFest DC 2016
1
Search, Time Series, and Graph Analysis in the Cloud
Dave Erickson
Visualizing Data in Elasticsearch
2
Dave Erickson – Developer
• Biotech
• Electronic Archives & Libraries
• Geospatial
• Healthcare
• Air Traffic Control
• Financial Services
3
4
Elastic Stack: Real Time Search & Analytics at Scale
• Kibana: User Interface
• Elasticsearch: Store, Index, & Analyze
• Logstash, Beats: Ingest
• X-Pack: Security, Alerting, Monitoring, Reporting, Graph
• Elastic Cloud
5
6
Visualization is Important
https://www.reddit.com/r/dataisugly/
7
Visualization in the Cloud
• Qualities We Want:
  ‒ Parallel
  ‒ Highly Available
  ‒ Platform Independent
  ‒ Multi-tenancy
  ‒ Extensible
• Use Cases:
  ‒ Search, Discovery, & Analytics
  ‒ Metrics & Time Series Data
  ‒ Structured & Unstructured
  ‒ Security Analytics
8
Wait …
Why would you use a search engine for analytics?
9
Search indexes have been around for a long time
10
Scaled, distributed search indexes have been around for a long time
11
Electronic search engines have been around for a long time
1928 – patent application by Emanuel Goldberg for a “Statistical Machine”
http://www.google.com/patents/US1838389
Basically an optical version of grep that predates almost everything
12
Timeline, in no way complete
• 7th Century B.C.E. ? – library catalogs
• 1928 – Goldberg “Statistical Machine”
  – Optical search on microfilm
• 1945 – Vannevar Bush “microfilm rapid selector”; “Memex”
• 1960s – SMART Information Retrieval System (Cornell U.)
• 1974 – grep first appears in Unix v4
• 1990s – WWW search engines
• 1999 – Doug Cutting Lucene search indexer
13
Inverted Indexes
• Pay the cost at indexing time (insertion time)
• Reap the benefits at retrieval time
Document (1): “the quick brown fox”
Document (2): “brown fox in the forest”
Document (3): “brown bear”
Term      Postings List   Statistics (count)
quick     1               1
brown     1, 2, 3         3
fox       1, 2            2
forest    2               1
bear      3               1
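The table above can be reproduced with a few lines of Python. This is only a minimal sketch of the idea, not how Lucene or Elasticsearch implement it: the `DOCS` dictionary, the `STOPWORDS` set, and the `build_inverted_index` helper are illustrative names, and dropping "the" and "in" is an assumption made so the output matches the slide's table.

```python
from collections import defaultdict

# The three example documents from the slide, keyed by document id.
DOCS = {
    1: "the quick brown fox",
    2: "brown fox in the forest",
    3: "brown bear",
}

# Assumption: skip a few very common words, since the slide's table omits them.
STOPWORDS = {"the", "in"}


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            if term not in STOPWORDS:
                postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


index = build_inverted_index(DOCS)
for term, ids in index.items():
    # The "statistics" column is simply the length of the postings list.
    print(f"{term:<8}{ids}  count={len(ids)}")
```

All of the tokenizing and set-building happens once, at indexing time; later lookups only read the finished dictionary.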
14
Pretty Good At Retrieval
Find documents mentioning “foxes” ?
Document (1): “the quick brown fox”
Document (2): “brown fox in the forest”
Document (3): “brown bear”
Term      Postings List   Statistics (count)
quick     1               1
brown     1, 2, 3         3
fox       1, 2            2
forest    2               1
bear      3               1
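One plausible reading of "pretty good" is that the query "foxes" does not appear literally in the term dictionary, so it has to be normalized to "fox" before the lookup. The sketch below illustrates that step under a loud assumption: the `normalize` function is a toy plural stripper invented for this example, not the analysis chain a real search engine uses.

```python
# Term dictionary copied from the slide: term -> postings list.
INDEX = {
    "quick": [1],
    "brown": [1, 2, 3],
    "fox": [1, 2],
    "forest": [2],
    "bear": [3],
}


def normalize(term):
    """Toy normalizer (an assumption): lowercase, then strip a plural
    suffix if doing so yields a term that exists in the dictionary."""
    term = term.lower()
    for suffix in ("es", "s"):
        candidate = term[: -len(suffix)]
        if term.endswith(suffix) and candidate in INDEX:
            return candidate
    return term


def lookup(term):
    """Return the postings list for a query term, or an empty list."""
    return INDEX.get(normalize(term), [])


print(lookup("foxes"))  # documents 1 and 2
```

Once the query term is normalized, retrieval is a single dictionary lookup rather than a scan over every document.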
15
Excellent at Search
Find documents mentioning “quick” AND “fox” ?
Document (1): “the quick brown fox”
Document (2): “brown fox in the forest”
Document (3): “brown bear”
Term      Postings List   Statistics (count)
quick     1               1
brown     1, 2, 3         3
fox       1, 2            2
forest    2               1
bear      3               1
intersection
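An AND query over this table comes down to intersecting two postings lists. A minimal sketch, assuming the lists are kept sorted by document id; the `intersect` and `search_and` names are illustrative, and the two-pointer merge is a standard textbook technique rather than a claim about any particular engine's internals.

```python
# Term dictionary copied from the slide: term -> sorted postings list.
INDEX = {
    "quick": [1],
    "brown": [1, 2, 3],
    "fox": [1, 2],
    "forest": [2],
    "bear": [3],
}


def intersect(a, b):
    """Intersect two sorted postings lists with a two-pointer merge."""
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search_and(term_a, term_b):
    """Documents that contain both terms."""
    return intersect(INDEX.get(term_a, []), INDEX.get(term_b, []))


print(search_and("quick", "fox"))  # only document 1 contains both
```

The merge touches each postings entry at most once, so the cost depends on the length of the two lists, not on the size of the whole collection.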
16
Excellent at Real Time Analytics
What was the most commonly mentioned term?
Document (1): “the quick brown fox”
Document (2): “brown fox in the forest”
Document (3): “brown bear”
Term      Postings List   Statistics (count)
quick     1               1
brown     1, 2, 3         3
fox       1, 2            2
forest    2               1
bear      3               1
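Because the count column is already stored alongside each term, this question needs no pass over the documents. A small sketch, with `INDEX` and `most_common_term` as illustrative names; note that the count here is the number of documents mentioning the term, as in the slide's table.

```python
# Term dictionary copied from the slide: term -> postings list.
INDEX = {
    "quick": [1],
    "brown": [1, 2, 3],
    "fox": [1, 2],
    "forest": [2],
    "bear": [3],
}


def most_common_term(index):
    """Return the term whose postings list is longest."""
    return max(index, key=lambda term: len(index[term]))


top = most_common_term(INDEX)
print(top, len(INDEX[top]))
```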
17
Histogram about the mention of foxes over time:
Document (1): “the quick brown fox”
Document (2): “brown fox in the forest”
Document (3): “brown bear”
Term      Postings List   Statistics (count)
quick     1               1
brown     1, 2, 3         3
fox       1, 2            2
forest    2               1
bear      3               1
?
18
Columnar Indexes
Document (1)
  text: “the quick brown fox”
  date: Monday
Document (2)
  text: “brown fox in the forest”
  date: Tuesday
Document (3)
  text: “brown bear”
  date: Monday
Doc id    Date
1         Monday
2         Tuesday
3         Monday
Term      Postings List   Statistics (count)
quick     1               1
brown     1, 2, 3         3
fox       1, 2            2
forest    2               1
bear      3               1
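With a column that maps each document id to its date, the histogram of fox mentions over time becomes a two-step lookup: fetch the postings list for the term, then bucket those ids by their date value. The sketch below uses illustrative names (`POSTINGS`, `DATE_COLUMN`, `date_histogram`); Elasticsearch calls its per-field columnar structure doc values, but the code here is a simplification and not its implementation.

```python
from collections import Counter

# Inverted index from the slide: term -> postings list.
POSTINGS = {
    "quick": [1],
    "brown": [1, 2, 3],
    "fox": [1, 2],
    "forest": [2],
    "bear": [3],
}

# Columnar index from the slide: doc id -> value of the date field.
DATE_COLUMN = {1: "Monday", 2: "Tuesday", 3: "Monday"}


def date_histogram(term):
    """Count the documents mentioning `term`, bucketed by date."""
    return Counter(DATE_COLUMN[doc_id] for doc_id in POSTINGS.get(term, []))


print(dict(date_histogram("fox")))
```

The inverted index answers "which documents?" and the column answers "what value does each of them have?", which is why the two structures are kept side by side.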
19
Now do it in parallel
• Distributed
• Non-blocking
• Read / Write
• Commodity hardware
• Fault-tolerance
• High Availability
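The distributed part can be pictured as scatter and gather: each shard computes a partial answer over its own documents, and a coordinator merges the partial results. The following is a rough single-process sketch of that pattern using threads; the two-shard split, `shard_term_counts`, and `scatter_gather` are assumptions for illustration, and a real cluster spreads shards across machines and adds replication for fault tolerance.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Assumption: the slide's three documents split across two shards.
SHARDS = [
    {1: "the quick brown fox", 2: "brown fox in the forest"},
    {3: "brown bear"},
]


def shard_term_counts(shard):
    """Per-shard work: count how many local documents mention each term."""
    counts = Counter()
    for text in shard.values():
        # set() so a term is counted once per document.
        counts.update(set(text.lower().split()))
    return counts


def scatter_gather(shards):
    """Run the per-shard work concurrently, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(shard_term_counts, shards))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


totals = scatter_gather(SHARDS)
print(totals.most_common(1))
```

Counts merge cleanly because addition is associative, so no shard needs to know about the documents held by any other shard.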
20
Use Cases
21
Quick Live Demo
22
23
24
25
26
Thank You