Be Lazy & Scale

Post on 11-Jan-2017



Be Lazy & Scale
Full-Text Tagging Billions Of Messages

reverse mapping checking getaddrinfo for xxxxx [xxx.xxx.xxx.xxx] failed - POSSIBLE BREAK-IN ATTEMPT!

pam_unix(sshd:session): session opened for user xxxxxx by (uid=0)

Bad protocol version identification 'root' from xxx.xx.xxx.xx port xxxxx


Percolator

Traditionally you design documents based on your data, store them into an index, and then define queries via the search API in order to retrieve these documents. The percolator works in the opposite direction. First you store queries into an index and then, via the percolate API, you define documents in order to retrieve these queries.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-percolate.html
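The reverse direction described above can be sketched in a few lines. This is a toy illustration, not the Elasticsearch API: the class `TinyPercolator`, its methods, and the query ids are hypothetical names, and each query is simplified to a set of terms that must all be present.

```python
class TinyPercolator:
    """Toy reverse search: queries are stored first, documents arrive later."""

    def __init__(self):
        # query id -> set of terms that must all appear in a document
        self.queries = {}

    def register(self, query_id, required_terms):
        """Store a query (the 'index a query' step)."""
        self.queries[query_id] = {t.lower() for t in required_terms}

    def percolate(self, document):
        """Return the ids of all stored queries that match the document."""
        tokens = set(document.lower().split())
        return sorted(qid for qid, terms in self.queries.items() if terms <= tokens)


percolator = TinyPercolator()
percolator.register("ssh-session", ["session", "opened"])
percolator.register("bad-protocol", ["bad", "protocol"])

message = "pam_unix(sshd:session): session opened for user xxxxxx by (uid=0)"
matches = percolator.percolate(message)
print(matches)
```

The point of the sketch is only the order of operations: queries are registered once, and every incoming message is then checked against all of them.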

reverse mapping checking getaddrinfo for xxxxx [xxx.xxx.xxx.xxx] failed - POSSIBLE BREAK-IN ATTEMPT!


"possible break-in attempt!"

"bad protocol version identification"

"session opened"

/0-10/173

$$$

$$$

Bad protocol version identification ...

"bad protocol" --> Phrase Query

version --> Term Query

ident* --> Prefix Query

Boolean Query: AND, OR, NOT
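The four query types on this slide can be modelled as small matcher objects over an analyzed message. This is a sketch under assumptions: the class names and the `analyze` function (lowercase, split on non-alphanumerics) are illustrative and are not Lucene or Elasticsearch classes.

```python
import re
from dataclasses import dataclass


def analyze(text):
    """Assumed, simplified analyzer: lowercase and split on non-alphanumerics."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


@dataclass
class Term:
    term: str

    def matches(self, tokens):
        return self.term in tokens


@dataclass
class Prefix:
    prefix: str

    def matches(self, tokens):
        return any(t.startswith(self.prefix) for t in tokens)


@dataclass
class Phrase:
    terms: list

    def matches(self, tokens):
        # The phrase terms must appear consecutively and in order.
        n = len(self.terms)
        return any(tokens[i:i + n] == self.terms for i in range(len(tokens) - n + 1))


@dataclass
class And:
    clauses: tuple

    def matches(self, tokens):
        return all(c.matches(tokens) for c in self.clauses)


@dataclass
class Or:
    clauses: tuple

    def matches(self, tokens):
        return any(c.matches(tokens) for c in self.clauses)


@dataclass
class Not:
    clause: object

    def matches(self, tokens):
        return not self.clause.matches(tokens)


tokens = analyze("Bad protocol version identification 'root' from xxx.xx.xxx.xx port xxxxx")
query = And((Phrase(["bad", "protocol"]), Term("version"), Prefix("ident"), Not(Term("session"))))
print(query.matches(tokens))
```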

105s

160 Tags (real life), 500000 Runs (based on real messages), ~ 33% Matches

1 Big OR: 109s (+3.8%)

Using single char message 'a': 96s (-8.5%)

105s

160 Tags (real life), ~ 2955 Terminal Clauses, 500000 Runs (based on real messages), ~ 33% Matches

Trivial 1 Term clause / tag: 28.6s (-72.8%)

Keep only 1 clause / tag: 62.7s (-41%)

Register Queries --> Perco. Queries Index

Perco. Req. (Bad protocol ...) --> In-Memory Index (Bad protocol ...) --> Execute Each Query --> Perco. Resp.
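The flow on this slide (put the incoming message into a one-document in-memory index, then execute each registered query against it) might look roughly like the sketch below. All function names are hypothetical, and queries are restricted to phrases to keep the example short.

```python
from collections import defaultdict


def build_memory_index(tokens):
    """Map each term of a single message to the positions where it occurs."""
    index = defaultdict(list)
    for position, term in enumerate(tokens):
        index[term].append(position)
    return index


def phrase_matches(index, phrase_terms):
    """True if the (non-empty) phrase occurs at consecutive positions."""
    for start in index.get(phrase_terms[0], []):
        if all(start + offset in index.get(term, [])
               for offset, term in enumerate(phrase_terms)):
            return True
    return False


def percolate(registered_phrases, tokens):
    """Index the message once, then execute every registered query."""
    index = build_memory_index(tokens)
    return [tag for tag, phrase in registered_phrases.items()
            if phrase_matches(index, phrase)]


registered = {
    "possible break-in attempt!": ["possible", "break", "in", "attempt"],
    "bad protocol version identification": ["bad", "protocol", "version", "identification"],
    "session opened": ["session", "opened"],
}
tokens = ["bad", "protocol", "version", "identification", "root"]
result = percolate(registered, tokens)
print(result)
```

The cost grows with the number of registered queries, because every one of them is executed for every message.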

Query Term Index

possible --> 0
break --> 1
in --> 2
attempt --> 3
version --> 4

Query Clauses --> Rewritten Clauses

"POSSIBLE BREAK-IN ATTEMPT!" --> [0, 1, 2, 3]

connect* --> connect*

version --> 4
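A minimal sketch of how such a query term index could be built and used to rewrite clauses. The `(kind, value)` tuple representation and both function names are assumptions made for illustration; the slides only show the resulting mapping.

```python
def build_query_term_index(queries):
    """Assign a dense integer id to every distinct term used by the queries."""
    term_ids = {}
    for kind, value in queries:
        if kind == "phrase":
            terms = value
        elif kind == "term":
            terms = [value]
        else:
            terms = []  # prefix clauses contribute no exact terms
        for term in terms:
            term_ids.setdefault(term, len(term_ids))
    return term_ids


def rewrite_clause(kind, value, term_ids):
    """Replace terms by their ids; prefix clauses are kept as text."""
    if kind == "phrase":
        return [term_ids[t] for t in value]
    if kind == "term":
        return term_ids[value]
    return value


queries = [
    ("phrase", ["possible", "break", "in", "attempt"]),
    ("prefix", "connect"),
    ("term", "version"),
]
term_ids = build_query_term_index(queries)
rewritten = [rewrite_clause(kind, value, term_ids) for kind, value in queries]
print(term_ids)
print(rewritten)
```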

Raw Message
reverse mapping checking getaddrinfo for xxxxx [xxx.xxx.xxx.xxx] failed - POSSIBLE BREAK-IN ATTEMPT!

Analyzed Message
[reverse, mapping, checking, getaddrinfo, for, xxxxx, xxx, xxx, xxx, xxx, failed, possible, break, in, attempt]

Message Rewritten in Query Space
[-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 1, 2, 3]

Query Term Presence Bitset
[true, true, true, true, false]
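The rewriting step can be reproduced with the values from the slide. The function names are hypothetical; terms unknown to the query term index become -1, and the bitset records which query terms occur in the message.

```python
TERM_IDS = {"possible": 0, "break": 1, "in": 2, "attempt": 3, "version": 4}


def rewrite_message(tokens, term_ids):
    """Map each token to its query term id, or -1 if no query uses it."""
    return [term_ids.get(token, -1) for token in tokens]


def presence_bitset(rewritten, size):
    """Mark which query term ids occur at least once in the message."""
    present = [False] * size
    for term_id in rewritten:
        if term_id >= 0:
            present[term_id] = True
    return present


tokens = ["reverse", "mapping", "checking", "getaddrinfo", "for", "xxxxx",
          "xxx", "xxx", "xxx", "xxx", "failed", "possible", "break", "in", "attempt"]
rewritten = rewrite_message(tokens, TERM_IDS)
present = presence_bitset(rewritten, len(TERM_IDS))
print(rewritten)
print(present)
```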


"POSSIBLE BREAK-IN ATTEMPT!" --> [0, 1, 2, 3]

Quick Check / Early Termination

Actual Check~ contains
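One plausible reading of this slide: the presence bitset gives a cheap quick check (terminate early if any phrase term is missing), and only then is the actual "contains" check run on the rewritten message. The sketch below follows that reading; the function names are hypothetical.

```python
def contains_sublist(sequence, wanted):
    """Actual check: does `wanted` occur as a contiguous run in `sequence`?"""
    n = len(wanted)
    return any(sequence[i:i + n] == wanted for i in range(len(sequence) - n + 1))


def phrase_matches(phrase_ids, rewritten, present):
    # Quick check / early termination: every phrase term must be present.
    if not all(present[term_id] for term_id in phrase_ids):
        return False
    # Actual check: the ids must appear consecutively and in order.
    return contains_sublist(rewritten, phrase_ids)


rewritten = [-1] * 11 + [0, 1, 2, 3]
present = [True, True, True, True, False]

print(phrase_matches([0, 1, 2, 3], rewritten, present))  # "POSSIBLE BREAK-IN ATTEMPT!"
print(phrase_matches([4, 0], rewritten, present))        # rejected by the quick check
```

Comparing small integers instead of strings keeps the actual check cheap, and most non-matching phrases never reach it.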


connect* --> connect*

Brute Force / startsWith (FAST!)
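A prefix clause is not rewritten to ids, so it can be checked by brute force against the analyzed tokens. In Python the counterpart of `startsWith` is `str.startswith`; the function name `prefix_matches` is hypothetical.

```python
def prefix_matches(prefix, tokens):
    """Brute force: test every token of the message against the prefix."""
    return any(token.startswith(prefix) for token in tokens)


tokens = ["reverse", "mapping", "checking", "getaddrinfo", "for", "xxxxx",
          "xxx", "xxx", "xxx", "xxx", "failed", "possible", "break", "in", "attempt"]

print(prefix_matches("connect", tokens))  # no token starts with "connect"
print(prefix_matches("getaddr", tokens))  # "getaddrinfo" does
```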


version --> 4

Simple Lookup

AND/OR/NOT
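With the presence bitset in hand, a term clause becomes a simple lookup, and AND/OR/NOT can be evaluated on top of those lookups. The nested-tuple clause format below is an assumption chosen for brevity, not the representation used in the talk.

```python
def evaluate(clause, present):
    """Evaluate a clause: an int is a term id, a tuple is (operator, *operands)."""
    if isinstance(clause, int):
        return present[clause]  # simple lookup in the presence bitset
    operator, *operands = clause
    if operator == "and":
        return all(evaluate(op, present) for op in operands)
    if operator == "or":
        return any(evaluate(op, present) for op in operands)
    if operator == "not":
        return not evaluate(operands[0], present)
    raise ValueError(f"unknown operator: {operator}")


POSSIBLE, BREAK, IN, ATTEMPT, VERSION = range(5)
present = [True, True, True, True, False]

print(evaluate(VERSION, present))
print(evaluate(("and", POSSIBLE, ("not", VERSION)), present))
print(evaluate(("or", VERSION, ("and", VERSION, POSSIBLE)), present))
```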

160 Tags, 500000 Runs, ~ 33% Matches

105s --> 7.3s (x14.4 Faster)

320 Tags, 500000 Runs, ~ 33% Matches

195s --> 8.8s (x22.2 Faster)
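The speedup factors can be checked with a line of arithmetic each, assuming 105s and 7.3s belong to the 160-tag run and 195s and 8.8s to the 320-tag run (the pairing is inferred from the ratios, which reproduce the quoted factors).

```python
def speedup(before_seconds, after_seconds):
    """How many times faster the second timing is than the first."""
    return before_seconds / after_seconds


speedup_160_tags = speedup(105, 7.3)
speedup_320_tags = speedup(195, 8.8)

print(f"160 tags: x{speedup_160_tags:.1f}")
print(f"320 tags: x{speedup_320_tags:.1f}")

# Doubling the tags: growth factor of each timing series.
print(f"before: x{195 / 105:.2f}, after: x{8.8 / 7.3:.2f}")
```

Under that pairing, doubling the number of tags nearly doubles the original time, while the optimized time grows by roughly a fifth.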