Introducing log analysis to your organization

Post on 21-Jan-2018


Introducing Log Analysis To Your Organization

Rafał Kuć

Sematext and Me

logs

metrics

cloud&

Next 60 minutes…

Log shipping
- buffers
- protocols
- parsing

Central buffering
- Kafka
- Redis

Storage & Analysis
- Elasticsearch
- Kibana
- Grafana

Why & How?
- Should I try?
- Open source
- Commercial

Why You Should Care

Environments are getting bigger

Containers are everywhere

Infrastructure work gets automated

Logs & metrics at the same place

Faster diagnostics == less money spent

Created by Kjpargeter - Freepik.com

Going For Commercial Solution

cloud

Icon made by Smashicons from www.flaticon.com

Going Open-Source - Today's Focus

Log shipping architecture

File    Shipper
File    Shipper
File    Shipper

Centralized Buffer

ES ES ES
ES ES ES
ES ES ES

data

Focus: Shipper


What about the shipper?

logs

Centralized Buffer

Which shipper to use?

Which protocol should be used?

What about the buffering?

Log to JSON or parse, and how?

Buffers

performance - batches & threads

availability - when central buffer is gone

Buffer types

Disk || memory || combined hybrid approach
On source || centralized

On source: App + Buffer, App + Buffer -> ES
- file or local log shipper
- easy scaling - fewer moving parts
- often with the use of lightweight shipper

Centralized: App, App -> Kafka / Redis / Logstash / etc… -> ES
- one place for all changes
- extra features made easy (like TTL)

Buffers Summary

Simple: App + Buffer, App + Buffer, ES

Reliable: App, App, ES
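The buffering idea above (ship in batches, keep data when the destination is gone) can be sketched in a few lines of Python. This is only an illustration: the BufferingShipper class, its send_batch callback and the default sizes are hypothetical names and values, not part of any real shipper.

```python
from collections import deque
from typing import Callable, Deque, List


class BufferingShipper:
    """Hypothetical in-memory buffer that ships log lines in batches."""

    def __init__(self, send_batch: Callable[[List[str]], bool],
                 batch_size: int = 3, max_buffered: int = 1000) -> None:
        # send_batch returns True when the destination accepted the batch.
        self.send_batch = send_batch
        self.batch_size = batch_size
        # A bounded deque: when it is full, the oldest lines are dropped.
        self.buffer: Deque[str] = deque(maxlen=max_buffered)

    def add(self, line: str) -> None:
        """Buffer one line and try to ship once a full batch is available."""
        self.buffer.append(line)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Ship batches until the buffer is empty or the destination fails."""
        while self.buffer:
            size = min(self.batch_size, len(self.buffer))
            batch = [self.buffer[i] for i in range(size)]
            if not self.send_batch(batch):
                return  # destination is gone: keep the lines for a later retry
            for _ in batch:
                self.buffer.popleft()
```

A disk-assisted or hybrid buffer would follow the same shape, with the deque replaced by a spool file; that part is left out here.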

Protocols

UDP - fast, cool for the application, not reliable
TCP - reliable (almost), application gets ACK when written to buffer

Application level ACKs may be needed

HTTP - Logstash, rsyslog, Fluentd
RELP - Logstash, rsyslog
Beats - Logstash, Filebeat
Kafka - Logstash, rsyslog, Filebeat, Fluentd

Choosing the shipper

application -> socket -> rsyslog (memory & disk assisted queues) -> http -> Elasticsearch

Final Architecture

application -> socket -> rsyslog (memory & disk assisted queues) -> http -> Elasticsearch

application -> file -> rsyslog / Logagent / Filebeat -> consumer

Parsing Done Here
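As a rough illustration of the "log to JSON or parse" question, the sketch below parses a plain text line into a JSON document. The line layout (timestamp, upper-case level, message) and the names LINE_PATTERN and parse_line are assumptions made for this example; real shippers such as rsyslog or Logagent have their own parsing rules.

```python
import json
import re
from typing import Optional

# Assumed line layout for illustration: "<timestamp> <LEVEL> <message>"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)


def parse_line(line: str) -> Optional[str]:
    """Return the log line as a JSON document, or None when it does not match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    return json.dumps(match.groupdict(), sort_keys=True)
```

If the application can log JSON directly, this step disappears from the shipper, which is the trade-off the slide hints at.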

Focus: Centralized Buffer


Why Apache Kafka?

Fast & easy to use

Easy to scale

Fault tolerant and highly available

Supports streaming

Works in publish/subscribe mode

Kafka architecture

ZooKeeper ZooKeeper ZooKeeper

Kafka Kafka Kafka Kafka

Kafka & topics

security_logs access_logs

app1_logs app2_logs

Kafka stores data in topics

written on disk

Kafka & topics & partitions & replicas

logs partition 1, logs partition 2, logs partition 3, logs partition 4

logs replica partition 1, logs replica partition 2, logs replica partition 3, logs replica partition 4

Scaling Kafka

logs partition 1

logs partition 1 ... logs partition 4

logs partition 1 ... logs partition 16

Things to remember when using Kafka

Scales by adding more partitions, not threads

The more IOPS the better

Keep the # of consumers equal to # of partitions

Replicas used for HA and FT only

Offsets stored per consumer - multiple destinations easily possible
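Why the number of consumers should match the number of partitions can be seen in a small simulation. This is a simplified sketch, not Kafka client code: partition_for uses CRC32 only for illustration (Kafka's default partitioner hashes keys with murmur2), and assign_partitions is a hypothetical round-robin assignment within one consumer group.

```python
import zlib
from typing import Dict, List


def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition from a key hash (CRC32 here, for illustration only)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions: int,
                      consumers: List[str]) -> Dict[str, List[int]]:
    """Round-robin the partitions over the consumers of one consumer group."""
    assignment: Dict[str, List[int]] = {name: [] for name in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment
```

With 4 partitions and 6 consumers, two consumers end up with an empty list, so they sit idle; with 2 consumers each one has to read two partitions.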

Focus: Elasticsearch


Elasticsearch cluster architecture

client client client

data data data data data data

master master master

ingest ingest ingest

Dedicated masters please

discovery.zen.minimum_master_nodes -> N/2 + 1, where N is the number of master eligible nodes
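The N/2 + 1 rule for discovery.zen.minimum_master_nodes uses integer division. A minimal sketch of the arithmetic (the function name is made up for this example):

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Quorum size for a given number of master eligible nodes (N // 2 + 1)."""
    if master_eligible_nodes < 1:
        raise ValueError("at least one master eligible node is required")
    return master_eligible_nodes // 2 + 1
```

For the three dedicated masters shown on the slide this gives 2, so the cluster keeps a master as long as two of the three master nodes can see each other.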

Elasticsearch - Indices

Index - logical place for data

Index - can be compared to a database in a DB

Index - built out of one or more shards

Shard - can be spread among multiple nodes

Scaling Elasticsearch

Logs Shard 1

Logs Shard 1, Users Shard 1, Invoices Shard 1

Logs Shard 1, Logs Shard 2, Logs Shard 3, Logs Shard 4

Logs Shard 1, Logs Replica 4
Logs Shard 2, Logs Replica 3
Logs Shard 4, Logs Replica 1
Logs Shard 3, Logs Replica 2

One big index is a no-go

Not scalable enough for time based data

Indexing slows down with time

Expensive merges

Delete by query needed for data retention

Daily indices are a good start

2017.11.16 2017.11.17 2017.11.20 2017.11.21...

indexing

most searches

Indexing is faster for smaller indices

Deletes are cheap

Search can be performed on indices that are needed

Static indices are cache friendly

We delete whole indices
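With daily indices, both "search only the indices that are needed" and "delete whole indices" reduce to working with index names. The sketch below shows that under some assumptions: the logs_ prefix follows the index names used later in the deck, and the function names and the retention parameter are hypothetical.

```python
from datetime import date, datetime, timedelta
from typing import List


def daily_index_name(day: date, prefix: str = "logs_") -> str:
    """Name of the daily index for a given day, e.g. logs_2017.11.22."""
    return f"{prefix}{day:%Y.%m.%d}"


def indices_to_search(end: date, days: int, prefix: str = "logs_") -> List[str]:
    """Index names covering the last `days` days, ending at `end` (inclusive)."""
    return [daily_index_name(end - timedelta(days=offset), prefix)
            for offset in range(days - 1, -1, -1)]


def indices_to_delete(existing: List[str], today: date, retention_days: int,
                      prefix: str = "logs_") -> List[str]:
    """Whole indices older than the retention window; no delete by query needed."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in existing:
        day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        if day < cutoff:
            expired.append(name)
    return expired
```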

Daily indices are sub-optimal

black friday, saturday, sunday

load is not even

Size based indices are optimal

size limit for indices

logs_01 logs_02 logs_N...

indexing

around 5 - 10GB per shard on AWS
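The size based approach boils down to one decision: keep writing to the current index until its shards reach the size limit, then switch to the next numbered index. The sketch below illustrates that decision only; the function names are hypothetical and the 10 GB default is taken from the upper end of the range on the slide. Elasticsearch also has a rollover API that can make this switch for you.

```python
def next_index_name(current: str) -> str:
    """logs_01 -> logs_02, keeping the zero padding of the counter."""
    prefix, _, number = current.rpartition("_")
    return f"{prefix}_{int(number) + 1:0{len(number)}d}"


def index_for_write(current: str, current_size_gb: float, shards: int,
                    max_shard_gb: float = 10.0) -> str:
    """Return the index that should receive new documents."""
    if current_size_gb / shards >= max_shard_gb:
        return next_index_name(current)
    return current
```

Because the switch depends on size and not on the calendar, a spiky day simply produces more indices instead of one oversized one.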

Slice using size

Predictable searching and indexing performance

Better indices balancing

Fewer shards

Easier handling of spiky loads

Lower costs because of better hardware utilization

Proper Elasticsearch configuration

Keep index.refresh_interval at maximum possible value
1 sec -> 100%, 5 sec -> 125%, 30 sec -> 175%

You can loosen up merges
- possible because of heavy aggregation use
- segments_per_tier -> higher
- max_merge_at_once -> higher
- max_merged_segment -> lower

All prefixed with index.merge.policy

-> higher indexing throughput
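Put together, the refresh and merge settings above could form an index settings body like the one built below. The setting names come from the slide; the concrete values (30s, 20, 20, 2gb) and the function name are assumptions picked for illustration and should be tuned and checked against the Elasticsearch version in use.

```python
import json
from typing import Dict, Union


def indexing_heavy_settings(refresh_interval: str = "30s",
                            segments_per_tier: int = 20,
                            max_merge_at_once: int = 20,
                            max_merged_segment: str = "2gb"
                            ) -> Dict[str, Union[str, int]]:
    """Index settings that trade search freshness for indexing throughput."""
    return {
        "index.refresh_interval": refresh_interval,
        "index.merge.policy.segments_per_tier": segments_per_tier,
        "index.merge.policy.max_merge_at_once": max_merge_at_once,
        "index.merge.policy.max_merged_segment": max_merged_segment,
    }


def settings_body() -> str:
    """JSON body that could be sent when creating an index."""
    return json.dumps({"settings": indexing_heavy_settings()}, indent=2)
```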

Proper Elasticsearch configuration

Index only needed fields

Use doc values

Do not index _source

Do not store _all

Optimization time

We can optimize data nodes for time based data


Hot - cold architecture

ES hot: -Dnode.attr.tag=hot
ES cold: -Dnode.attr.tag=cold
ES cold: -Dnode.attr.tag=cold

Create logs_2017.11.22 on the hot node:

curl -XPUT localhost:9200/logs_2017.11.22 -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.routing.allocation.exclude.tag": "cold",
    "index.routing.allocation.include.tag": "hot"
  }
}'

indexing -> logs_2017.11.22 (ES hot)

Next day: indexing -> logs_2017.11.23 (ES hot)

move index after day ends

curl -XPUT localhost:9200/logs_2017.11.22/_settings -H 'Content-Type: application/json' -d '{
  "index.routing.allocation.exclude.tag": "hot",
  "index.routing.allocation.include.tag": "cold"
}'

logs_2017.11.22 -> ES cold

Next day: indexing -> logs_2017.11.24 (ES hot)

move index after day ends

logs_2017.11.23 -> ES cold
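The daily "move index after day ends" step is easy to automate. The sketch below only computes which indices should move and which settings body each one gets; it does not talk to Elasticsearch. The allocation settings are the ones from the curl command in the deck, while the function names and the logs_ prefix handling are assumptions for this example.

```python
from datetime import date, datetime
from typing import Dict, List

# Same allocation settings as in the curl command that moves an index to cold.
COLD_SETTINGS = {
    "index.routing.allocation.exclude.tag": "hot",
    "index.routing.allocation.include.tag": "cold",
}


def indices_to_move_cold(existing: List[str], today: date,
                         prefix: str = "logs_") -> List[str]:
    """Daily indices whose day has ended and which no longer receive writes."""
    finished = []
    for name in sorted(existing):
        day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        if day < today:
            finished.append(name)
    return finished


def settings_requests(existing: List[str], today: date) -> Dict[str, dict]:
    """Map each _settings path to the body that relocates the index to cold."""
    return {f"/{name}/_settings": dict(COLD_SETTINGS)
            for name in indices_to_move_cold(existing, today)}
```

Each entry of the returned mapping corresponds to one PUT request like the curl call shown for logs_2017.11.22.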

Hot - cold architecture

Hot ES Tier
- Good CPU
- Lots of I/O

Cold ES Tier
- Memory bound
- Decent I/O

Hot - cold architecture summary

Optimize costs - different hardware for different tier

Performance - use case optimized hardware

Isolation - long running searches don't affect indexing

Elasticsearch client node needs

No data = no IOPS

Large query throughput = high CPU usage

Lots of results = high memory usage

Lots of concurrent queries = higher resource utilization

Elasticsearch ingest node needs

No data = no IOPS

Large index throughput = high CPU & memory usage

Complicated rules = high CPU usage

Larger documents = more resource utilization

Elasticsearch master node needs

No data = no IOPS

Large number of indices = high CPU & memory usage

Complicated mappings = high memory usage

Daily indices = spikes in resource utilization

What about OS?

Say NO to swap

Set the right disk scheduler
- CFQ for spinning disks
- deadline for SSD

Use proper mount options for ext4
- noatime
- nodiratime
- data=writeback, nobarrier

For bare metal check CPU governor

Disable transparent huge pages

/proc/sys/vm/nr_hugepages=0

Analysis - Kibana

Analysis - Grafana

Where To Go From Here?

We are engineers!

We develop DevOps tools!

We are DevOps people!

We do fun stuff ;)
http://sematext.com/jobs

Thank you for listening! Get in touch!

Rafał
rafal.kuc@sematext.com
@kucrafal

http://sematext.com
@sematext
http://sematext.com/jobs