Best Practices for Monitoring Distributed In-Memory Computing
Denis Mekhanikov
July 31, 2019
2019 © GridGain Systems
What communication with GridGain support often looks like
Customer: The cluster is hanging.
GG: Please send logs.
Customer: We don’t have logs.
GG: Did you take thread dumps?
Customer: Nope.
GG: The problem is probably in GC. What is the memory consumption level?
Customer: ...
Why should we monitor?
• Check if everything is fine
• Prevent upcoming issues
• Discover and react to issues that have already happened
• Find the reason for an issue and prevent it from happening again
Dashboarding
Logging
What to monitor?
• Every node in isolation
• Connection between nodes
• System as a whole
Every node is...
• Hardware (hypervisor)
• Operating System
• Virtual machine
• Application
Hardware / Hypervisor / OS
• CPU
• Memory
• Disk
• System logs
• Cloud provider’s logs
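A minimal sketch of sampling CPU and disk figures from inside the JVM, using only standard JDK classes. The class name `HostSnapshot` and its fields are illustrative and not part of GridGain. Physical memory is left out because the standard `java.lang.management` API does not expose it; in practice an OS-level agent usually collects these numbers.

```java
import java.io.File;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical helper: one sample of host-level figures visible to the JVM. */
class HostSnapshot {
    final int cpus;
    final double loadAverage;   // negative when the platform does not report it
    final long diskFreeBytes;
    final long diskTotalBytes;

    HostSnapshot(int cpus, double loadAverage, long diskFreeBytes, long diskTotalBytes) {
        this.cpus = cpus;
        this.loadAverage = loadAverage;
        this.diskFreeBytes = diskFreeBytes;
        this.diskTotalBytes = diskTotalBytes;
    }

    /** Takes a sample; path is any directory on the disk of interest. */
    static HostSnapshot take(File path) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        return new HostSnapshot(
            os.getAvailableProcessors(),
            os.getSystemLoadAverage(),
            path.getUsableSpace(),
            path.getTotalSpace());
    }

    /** Fraction of the disk in use, in [0, 1]; 0 when the total size is unknown. */
    double diskUsedFraction() {
        if (diskTotalBytes <= 0)
            return 0.0;
        return 1.0 - (double) diskFreeBytes / diskTotalBytes;
    }

    public static void main(String[] args) {
        HostSnapshot s = take(new File("."));
        System.out.printf("cpus=%d load=%.2f diskUsed=%.1f%%%n",
            s.cpus, s.loadAverage, 100 * s.diskUsedFraction());
    }
}
```

Sampling this on a schedule and shipping the values to a metrics store gives a baseline to compare against when a node starts to misbehave.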
Network
• Ping monitoring
• Network hardware monitoring
• TCP dumps
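Ping monitoring is usually done with ICMP tools outside the JVM. As a rough stand-in, the sketch below times a TCP connect to a node's port, which also confirms that the port itself is reachable. `TcpProbe` is a hypothetical helper; the host and port to probe depend on your deployment, and the values in `main` are placeholders.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: measures how long a TCP connect to host:port takes. */
class TcpProbe {
    /** Returns the connect time in milliseconds, or -1 if the connect failed or timed out. */
    static long connectMillis(String host, int port, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            // Refused, unreachable, unresolved host, or timed out.
            return -1;
        }
    }

    public static void main(String[] args) {
        // Placeholder target: replace with the address and port of a cluster node.
        long ms = connectMillis("127.0.0.1", 47500, 500);
        System.out.println(ms < 0 ? "unreachable" : "connected in " + ms + " ms");
    }
}
```

Running the probe between every pair of nodes, rather than only from a central monitor, helps catch one-way connectivity problems.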
JVM
• GC logs
• JMX
• Java Flight Recorder
• Thread Dumps
• Heap Dumps
  ● java -XX:+HeapDumpOnOutOfMemoryError ...
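Thread dumps are normally taken with external tools such as `jstack`, but they can also be captured from inside the process. The following is a sketch using the standard `ThreadMXBean` and `GarbageCollectorMXBean`; the class name `JvmDiagnostics` is illustrative, and the output format is a simplification rather than the exact `jstack` layout.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: in-process thread dump and cumulative GC time. */
class JvmDiagnostics {
    /** Builds a plain-text dump of all live threads with their stack traces. */
    static String threadDump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip lock details, which keeps the call cheap and always supported.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace())
                sb.append("    at ").append(frame).append('\n');
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Total milliseconds spent in GC since JVM start, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0)
                total += t;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("GC time so far: " + totalGcMillis() + " ms");
        System.out.print(threadDump());
    }
}
```

Writing such a dump to the log when a watchdog detects a stall means the evidence exists before anyone asks for it.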
Application
• Logs
• JMX
• Throughput / Latency
• Test queries
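One way to combine the last two items is to run a small test query on a schedule and track its latency percentiles. The sketch below is an assumption about how that could look: `LatencyProbe` is a hypothetical helper, the probe is any `Runnable`, and the percentile uses the simple nearest-rank definition.

```java
import java.util.Arrays;

/** Hypothetical helper: times repeated runs of a probe and reports percentiles. */
class LatencyProbe {
    /** Runs the probe the given number of times; returns each duration in nanoseconds. */
    static long[] measure(Runnable probe, int runs) {
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            probe.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    /** Nearest-rank percentile for p in (0, 100]; samples must be non-empty and is not modified. */
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Placeholder probe: in a real setup this would execute a test query.
        long[] samples = measure(() -> {
            Math.sqrt(42.0);
        }, 1000);
        System.out.println("p50=" + percentile(samples, 50) + " ns, p99=" + percentile(samples, 99) + " ns");
    }
}
```

Percentiles are preferred over the mean here because a few slow requests, which are often the first sign of trouble, barely move an average.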
Tools
• Metrics
• Logs
• JVM: MAT, Java Flight Recorder
• Network
• Benchmarking
GridGain
[Diagram: layers of a node: Hardware, OS, JVM, GridGain]
GridGain: Cache Metrics
CacheMetricsMXBean
• CacheGets
• AverageGetTime
• AverageTxCommitTime
• ...
CacheGroupMetricsMXBean
• LocalNodeMovingPartitionsCount
• ClusterMovingPartitionsCount
• ClusterOwningPartitionsCount
• ...
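Attributes like these can be read through the standard JMX API. The sketch below reads one attribute from every MBean in the current JVM whose name matches a pattern; `JmxReader` is a hypothetical helper, and the `ObjectName` pattern for the cache beans is not given in the slides, so it has to be looked up in your deployment (for example with JConsole). The `main` method uses a built-in JVM bean so that it runs without a cluster.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical helper: reads one attribute from all local MBeans matching a name pattern. */
class JmxReader {
    /** Returns a map from canonical MBean name to the attribute value. */
    static Map<String, Object> read(String namePattern, String attribute) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Map<String, Object> values = new LinkedHashMap<>();
        try {
            for (ObjectName name : server.queryNames(new ObjectName(namePattern), null))
                values.put(name.getCanonicalName(), server.getAttribute(name, attribute));
        } catch (JMException e) {
            // Malformed pattern, missing attribute, or a failure inside the MBean.
            throw new IllegalStateException("Cannot read attribute " + attribute, e);
        }
        return values;
    }

    public static void main(String[] args) {
        // Built-in JVM bean used as a stand-in for a cache metrics bean.
        System.out.println(read("java.lang:type=Threading", "ThreadCount"));
    }
}
```

This version only sees beans registered in the same JVM; reading from a remote node would go through a `JMXConnector` instead, which is outside this sketch.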
GridGain: Cache Metrics
How to enable cache metrics
CacheConfiguration<K, V> cacheCfg = new CacheConfiguration<>("cache");
// Enable metrics.
cacheCfg.setStatisticsEnabled(true);
ignite.createCache(cacheCfg);
GridGain: Discovery and Communication
TcpDiscoverySpiMBean
• MessageWorkerQueueSize
• AvgMessageProcessingTime
• Coordinator
• NodesFailed
• ...
TcpCommunicationSpiMBean
• OutboundMessagesQueueSize
• SentMessagesCount
• ReceivedMessagesCount
• ...
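Count-style attributes such as SentMessagesCount are more useful on a dashboard as rates. The sketch below converts successive samples of a counter into events per second. It assumes the counter is cumulative and only decreases when metrics are reset; `CounterRate` is a hypothetical helper, not a GridGain class.

```java
/** Hypothetical helper: turns samples of a cumulative counter into a per-second rate. */
class CounterRate {
    private long lastValue;
    private long lastNanos;
    private boolean primed;

    /**
     * Feeds a new sample. Returns events per second since the previous sample,
     * or NaN for the first sample, a non-advancing clock, or a counter reset.
     */
    double update(long value, long nowNanos) {
        double rate = Double.NaN;
        if (primed && nowNanos > lastNanos && value >= lastValue)
            rate = (value - lastValue) * 1e9 / (nowNanos - lastNanos);
        lastValue = value;
        lastNanos = nowNanos;
        primed = true;
        return rate;
    }

    public static void main(String[] args) {
        CounterRate rate = new CounterRate();
        rate.update(100, 0L);                                  // first sample primes the state
        System.out.println(rate.update(600, 2_000_000_000L));  // 500 events over 2 seconds
    }
}
```

Returning NaN on a reset avoids plotting a large negative spike when a node restarts or its metrics are cleared.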
GridGain: Data Storage
[Diagram: RAM, Disk, WAL]
GridGain: Data Storage Metrics
Data volume
DataStorageMetricsMXBean
• WalTotalSize
• TotalAllocatedSize
• OffheapUsedSize
• ...
DataRegionMetricsMXBean
• TotalAllocatedPages
• AllocationRate
• PagesFillFactor
• ...
GridGain: Data Storage Metrics
Checkpoints
DataStorageMetricsMXBean
• DirtyPages
• CheckpointTotalTime
• LastCheckpointDuration
• UsedCheckpointBufferSize
• LastCheckpointPagesWriteDuration
• LastCheckpointMarkDuration
• LastCheckpointTotalPagesNumber
• ...
[Diagram: checkpoint marker, RAM, Disk, WAL]
GridGain: Data Storage Metrics
Page replacement
DataRegionMetricsMXBean
• PagesReplaceRate
• PagesReplaceAge
• PagesReplaced
[Diagram: reads and writes between RAM and Disk]
GridGain: Data Storage Metrics
How to enable data storage metrics
DataStorageConfiguration storageCfg = new DataStorageConfiguration();
DataRegionConfiguration regionCfg = new DataRegionConfiguration();
regionCfg.setName("myDataRegion");
// Enable metrics.
storageCfg.setMetricsEnabled(true); // Metrics for data storage.
regionCfg.setMetricsEnabled(true); // Metrics for a particular data region.
storageCfg.setDataRegionConfigurations(regionCfg);
GridGain: IO metrics
Coming in 2.8
IoStatisticsMetricsMXBean
• CacheGroupLogicalReads
• CacheGroupPhysicalReads
• IndexLogicalReads
• IndexPhysicalReads
• ...
GridGain: WebConsole
GridGain Monitoring
Demo
Checklist for monitoring
• CPU / Memory / Disk / Network
• GC logs
• Application logs
+ Problematic places specific to your setup
Q&A
https://github.com/dmekhanikov/ignite-elk/
https://console.gridgain.com/