Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Modern Streaming...
![Page 1: Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Modern Streaming Architecture on Hadoop](https://reader034.fdocuments.us/reader034/viewer/2022050613/5a65159a7f8b9ad5198b4b35/html5/thumbnails/1.jpg)
Apache Metron: A Case Study of a Modern Streaming Architecture on Hadoop
Casey Stella (@casey_stella)
2017
Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
![Page 2: Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Modern Streaming Architecture on Hadoop](https://reader034.fdocuments.us/reader034/viewer/2022050613/5a65159a7f8b9ad5198b4b35/html5/thumbnails/2.jpg)
Introduction
Hi, I’m Casey Stella!
![Page 3: Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Modern Streaming Architecture on Hadoop](https://reader034.fdocuments.us/reader034/viewer/2022050613/5a65159a7f8b9ad5198b4b35/html5/thumbnails/3.jpg)
Apache Metron: A Cybersecurity Analytics Platform
• Metron provides a scalable, advanced security analytics framework to offer a centralized tool for security monitoring and analysis.
• Metron was initiated at Cisco in 2014 as OpenSOC.
• Metron was submitted to the Apache Incubator in December 2015.
• Metron graduated to a top-level project in April 2017.
![Page 7: Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Modern Streaming Architecture on Hadoop](https://reader034.fdocuments.us/reader034/viewer/2022050613/5a65159a7f8b9ad5198b4b35/html5/thumbnails/7.jpg)
Characteristics of Metron
• Metron is built atop the Apache Hadoop ecosystem to handle capturing, ingesting, enriching and storing streaming data at scale
  ◦ Kafka provides a unified data bus
  ◦ Storm provides a distributed streaming framework
  ◦ HBase provides a low latency key/value lookup store for enrichments and profiles
  ◦ ZooKeeper provides a distributed configuration store
• Ingested network telemetry can be enriched pluggably
  ◦ New enrichments can be added live on running topologies without restart
  ◦ New enrichment capabilities can be added via user defined functions
  ◦ Enrichments can be composed through a domain specific language called Stellar
• Data stored in HBase can be the source of enrichments
![Page 17: Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Modern Streaming Architecture on Hadoop](https://reader034.fdocuments.us/reader034/viewer/2022050613/5a65159a7f8b9ad5198b4b35/html5/thumbnails/17.jpg)
Characteristics of Metron
• Enriched telemetry can be indexed into a security data lake
  ◦ Indexes supported are pluggable and include HDFS, Solr and Elasticsearch
• Advanced analytics can be done on streaming data
  ◦ Probabilistic data structures (e.g. sketches) can summarize streaming data across time and enable approximate distribution, set existence and distinct count queries
  ◦ Models can be deployed using YARN, autodiscovered via ZooKeeper and interrogated via Stellar functions
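To make the "set existence" query concrete, here is a minimal Bloom filter sketch in Python. It is an illustration of the general technique only, not Metron's implementation; the class name, the sizes and the sample domains are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple (a real filter would pack bits).
        self.bits = bytearray(num_bits)

    def _positions(self, item: str):
        # Derive num_hashes positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely never added"; True means "probably added".
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical stream of destination domains seen in telemetry.
seen_domains = BloomFilter()
for domain in ["example.com", "apache.org", "example.com"]:
    seen_domains.add(domain)

print(seen_domains.might_contain("apache.org"))
```

The memory used stays fixed no matter how many items pass through the stream, which is what makes this kind of structure attractive for long-running streaming state. The price is a false positive rate that grows as the filter fills up.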
![Page 20: Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Modern Streaming Architecture on Hadoop](https://reader034.fdocuments.us/reader034/viewer/2022050613/5a65159a7f8b9ad5198b4b35/html5/thumbnails/20.jpg)
Stellar
Metron needed the ability to allow users to pluggably and consistently enrich and transform streaming data. Out of this need, we created Stellar:
• Interact with the various enabling Hadoop components in a unified manner
• Compose a rich set of built-in functions with user defined functions
• Provide simple primitives around the functions: boolean operations, conditionals, numerical computation.
Think of Stellar as Excel functions that we can run on streaming data.
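The "Excel functions on streaming data" idea can be sketched in plain Python. This is not Stellar syntax and not Metron code. The function names (TO_UPPER, IS_INTERNAL), the field names and the registry are hypothetical, chosen only to show built-in and user defined functions being composed into per-message formulas.

```python
import ipaddress
from typing import Any, Callable, Dict

Message = Dict[str, Any]

# "Built-in" functions, keyed by name (hypothetical names for this sketch).
FUNCTIONS: Dict[str, Callable[..., Any]] = {
    "TO_UPPER": lambda s: s.upper(),
}


def register(name: str):
    """Decorator that adds a user defined function to the registry."""
    def wrap(fn):
        FUNCTIONS[name] = fn
        return fn
    return wrap


@register("IS_INTERNAL")
def is_internal(ip: str) -> bool:
    # Assumption for the example: 10.0.0.0/8 is the internal network.
    return ipaddress.ip_address(ip) in ipaddress.ip_network("10.0.0.0/8")


# Each derived field is a small formula over the message, like a spreadsheet cell.
DERIVED_FIELDS: Dict[str, Callable[[Message], Any]] = {
    "protocol_upper": lambda m: FUNCTIONS["TO_UPPER"](m["protocol"]),
    "direction": lambda m: (
        "internal" if FUNCTIONS["IS_INTERNAL"](m["ip_src_addr"]) else "external"
    ),
}


def apply_derived_fields(message: Message) -> Message:
    """Evaluate every formula against one message and attach the results."""
    enriched = dict(message)
    for field, expression in DERIVED_FIELDS.items():
        enriched[field] = expression(enriched)
    return enriched


stream = [
    {"ip_src_addr": "10.0.0.5", "protocol": "tcp"},
    {"ip_src_addr": "8.8.8.8", "protocol": "udp"},
]
results = [apply_derived_fields(m) for m in stream]
```

Keeping the formulas as data rather than compiled code is what lets them be changed while the stream keeps flowing.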
![Page 25: Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Modern Streaming Architecture on Hadoop](https://reader034.fdocuments.us/reader034/viewer/2022050613/5a65159a7f8b9ad5198b4b35/html5/thumbnails/25.jpg)
Data Ingest: Parsers
• Telemetry data comes in a variety of formats and velocities.
• Each telemetry source is ingested into Kafka
• A Storm parser topology is used to convert the raw telemetry format to a normalized JSON Map
  ◦ Common network-related raw telemetry formats
  ◦ Generic formats such as CSV and JSON
  ◦ Specifying the parser via Grok
  ◦ Creating your own parser in a JVM-based language
• Transformations and normalizations can be done post-parse via Stellar statements
• The normalized data across all telemetries is written to an enrichment Kafka topic
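As a rough illustration of the parse-then-normalize flow, the sketch below turns one raw CSV record into a normalized map and then applies a post-parse transformation. It is plain Python standing in for a Storm parser and Stellar statements. The column layout, the sample record and the output field names are assumptions for the example, not taken from the talk.

```python
import csv
import json
from datetime import datetime, timezone

# Assumed column layout of a hypothetical CSV telemetry source.
COLUMNS = ["time", "src", "dst", "proto"]


def parse_csv_line(raw: str) -> dict:
    """Convert one raw CSV record into a normalized, JSON-serializable map."""
    values = next(csv.reader([raw]))
    record = dict(zip(COLUMNS, values))
    ts = datetime.strptime(record["time"], "%Y-%m-%dT%H:%M:%SZ")
    ts = ts.replace(tzinfo=timezone.utc)
    return {
        "original_string": raw,                    # keep the raw record
        "timestamp": int(ts.timestamp() * 1000),   # epoch milliseconds
        "ip_src_addr": record["src"],
        "ip_dst_addr": record["dst"],
        "protocol": record["proto"],
    }


def post_parse_transform(message: dict) -> dict:
    """Post-parse normalization step (stand-in for a Stellar statement)."""
    out = dict(message)
    out["protocol"] = out["protocol"].strip().upper()
    return out


raw_line = "2017-04-01T12:00:00Z,10.0.0.5,93.184.216.34,tcp"
normalized = post_parse_transform(parse_csv_line(raw_line))

# In the pipeline this JSON would be written to the enrichment Kafka topic.
print(json.dumps(normalized, sort_keys=True))
```

Because every source ends up in the same map shape, the downstream enrichment step does not need to know which telemetry format a message started in.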
![Page 34: Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Modern Streaming Architecture on Hadoop](https://reader034.fdocuments.us/reader034/viewer/2022050613/5a65159a7f8b9ad5198b4b35/html5/thumbnails/34.jpg)
Data Ingest: Parsers
![Page 35: Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Modern Streaming Architecture on Hadoop](https://reader034.fdocuments.us/reader034/viewer/2022050613/5a65159a7f8b9ad5198b4b35/html5/thumbnails/35.jpg)
Enrichment
• The enrichment topology takes the various normalized telemetry sources and allows users to enrich the messages with broader context
  ◦ Enriching with reference data ingested into HBase
  ◦ Enriching via arbitrary Stellar expressions
  ◦ Enriching with geolocation data
• Enrichment is split into two phases
  ◦ Preparatory Enrichment
  ◦ Threat Intelligence Enrichment with risk triage
• If messages are marked with an is_alert field, they can have a triage score computed via Stellar which defines their priority as threats
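One way to picture the triage step is as a list of rules, each pairing a predicate with a score, with the matching scores combined by an aggregator. The sketch below is plain Python under that assumption. The rule names, the fields other than is_alert, the scores and the choice of max as the default aggregator are all hypothetical, and in Metron the predicates would be Stellar expressions rather than Python lambdas.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, object]
Rule = Tuple[str, Callable[[Message], bool], float]  # (name, predicate, score)

# Hypothetical triage rules for the example.
RULES: List[Rule] = [
    ("threat intel hit", lambda m: m.get("threat_intel_hit") is True, 10.0),
    ("destination outside US", lambda m: m.get("dst_country") != "US", 5.0),
]


def triage(message: Message, aggregate: Callable = max) -> Message:
    """Attach a triage score to alerts; other messages pass through unchanged."""
    if not message.get("is_alert"):
        return message
    scores = [score for _, predicate, score in RULES if predicate(message)]
    scored = dict(message)
    scored["triage_score"] = aggregate(scores) if scores else 0.0
    return scored


alert = {"is_alert": True, "threat_intel_hit": True, "dst_country": "RU"}
benign = {"is_alert": False, "dst_country": "RU"}

print(triage(alert)["triage_score"])
print("triage_score" in triage(benign))
```

Swapping the aggregator (for example passing sum instead of max) changes how several matching rules combine, which is a policy choice rather than a property of the rules themselves.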
![Page 42: Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Modern Streaming Architecture on Hadoop](https://reader034.fdocuments.us/reader034/viewer/2022050613/5a65159a7f8b9ad5198b4b35/html5/thumbnails/42.jpg)
Enrichment
• The enrichment topology takes the various normalized telemetry sources and allowsusers to enrich the messages with broader context◦ Enriching with reference data ingested into HBase◦ Enriching via arbitrary Stellar expressions◦ Enriching with Geolocation data
• Enrichment is split into two phases◦ Preparatory Enrichment◦ Threat Intelligence Enrichment with risk triage
• If messages are marked with an is_alert field, they can have a triage scorecomputed via Stellar which defines their priority as threats
Casey Stella@casey_stella (Hortonworks) Apache Metron 2017
![Page 43: Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Modern Streaming Architecture on Hadoop](https://reader034.fdocuments.us/reader034/viewer/2022050613/5a65159a7f8b9ad5198b4b35/html5/thumbnails/43.jpg)
Enrichment: Stellar
Stellar is the primary method for enrichment in Metron.
• User-defined enrichment functions can be enabled by adding a jar implementing the function to HDFS.
• Stellar enrichments can be executed asynchronously across Storm workers and their results joined together
Stellar functions are provided for, among others,
• Interrogating ML models
• Reading reference data from HBase
• Reading historical profiles
Enrichment
Indexing
Profiler: Motivation
• Enrichments and parsers operate within the context of a single message.
• This is insufficient for a number of scenarios
  ◦ Correlating between different sources
  ◦ Making judgments based on past events
  ◦ Both at the same time
• Operating across multiple sources has scalability implications
• Waiting on the data you want from the other stream isn’t plausible
• Sometimes you want to change your mind about how much history you want to refer to
• In cyber security, advanced actors wait for months, so you need data months back!
Profiler: Solution
• Compromise: Operate on windows in time rather than individual records
• Windows should be able to be specified very flexibly to avoid seasonal aberrations.
• The Profiler is a Storm topology that takes the enriched data
  ◦ Captures aggregations of data in a window to HBase
  ◦ Uses Stellar to define how to aggregate data
  ◦ Uses Stellar to define a filter on which messages to consider
• These aggregations can be read anywhere Stellar is used
• This enables things like
  ◦ Temporal outlier detection
  ◦ Historical context from other sources when building threat triage rules
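The windowed aggregation described above can be sketched in a few lines of Python. This is a minimal single-process illustration, not the Profiler itself: the function name build_profile, the timestamp field, and the use of an exact set in place of an approximate structure are assumptions, and the result is kept in a dictionary where the Profiler would write to HBase.

```python
from typing import Callable, Dict, Iterable, Tuple

Message = Dict[str, object]


def build_profile(
    messages: Iterable[Message],
    window_ms: int,
    entity: Callable[[Message], str],
    only_if: Callable[[Message], bool],
    init: Callable[[], object],
    update: Callable[[object, Message], object],
) -> Dict[Tuple[str, int], object]:
    """Aggregate messages into one state per (entity, window start) bucket."""
    state: Dict[Tuple[str, int], object] = {}
    for msg in messages:
        if not only_if(msg):  # the filter on which messages to consider
            continue
        # Truncate the timestamp to the start of its window.
        window_start = (int(msg["timestamp"]) // window_ms) * window_ms
        key = (entity(msg), window_start)
        if key not in state:
            state[key] = init()
        state[key] = update(state[key], msg)
    return state


# Hypothetical messages; field names mirror the demo later in the deck.
MESSAGES = [
    {"timestamp": 0, "source.type": "auth", "user": "alice", "ip_dst_addr": "h1"},
    {"timestamp": 1000, "source.type": "auth", "user": "alice", "ip_dst_addr": "h2"},
    {"timestamp": 2000, "source.type": "auth", "user": "alice", "ip_dst_addr": "h1"},
    {"timestamp": 500, "source.type": "bro", "user": "alice", "ip_dst_addr": "h9"},
    {"timestamp": 900_000, "source.type": "auth", "user": "alice", "ip_dst_addr": "h3"},
]

# Distinct destination hosts per user in 15-minute windows.
profile = build_profile(
    MESSAGES,
    window_ms=15 * 60 * 1000,
    entity=lambda m: str(m["user"]),
    only_if=lambda m: m["source.type"] == "auth",
    init=set,
    update=lambda hosts, m: hosts | {m["ip_dst_addr"]},
)
```

Keeping one small state object per entity and window is what lets a reader later ask for any span of history by fetching and combining the relevant buckets.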
Profiler: Aggregations
• Example aggregations:
  ◦ Distributional statistics: median, mean, percentile
  ◦ Set operations: Contains, Cardinality
  ◦ Simple counts
• Aggregations challenges
  ◦ May be big objects if naively done (e.g. set operations)
  ◦ May not be able to be merged across time (e.g. distributions)
  ◦ Profile reading should be decoupled from writing
• Approach: Use approximate data structures
  ◦ Set operations: Bloom Filters, HyperLogLog approximations
  ◦ Distributional Statistics: t-digests
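To make the appeal of approximate structures concrete, here is a minimal Bloom filter sketch in Python. It is an illustration written for this transcript, not Metron's implementation; the bit-array size, the number of hashes, and the SHA-256 based hashing scheme are arbitrary assumptions. It shows the two properties the slide asks for: the object has a fixed size regardless of how many items are added, and filters from different windows merge with a bitwise OR.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Fixed-size approximate set: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def merge(self, other: "BloomFilter") -> "BloomFilter":
        """Combine two windows' filters; parameters must match."""
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share parameters to be merged")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


# Two hypothetical time windows of destination hosts.
window_a = BloomFilter()
window_a.add("host1")
window_a.add("host2")
window_b = BloomFilter()
window_b.add("host3")

combined = window_a.merge(window_b)
```

HyperLogLog and t-digests share the same mergeability property for cardinality and percentile estimates respectively, which is why per-window results can be combined at read time.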
Demo
• Los Alamos National Lab released an open data set representing 58 consecutive days of de-identified event data collected from five sources within Los Alamos National Laboratory’s corporate, internal computer network.
• Among other telemetry sources, authentication logs and a set of well-defined red teaming events that present bad behavior within the 58 days are provided.
• We will look at the authentication logs around a breach and show how we can use Metron to pick out the offending user’s activity leading up to the event
  ◦ Look for users whose number of distinct hosts they attempt to authenticate to is more than 5 standard deviations from the median across all users.
{
  "profile": "attempts_by_user",
  "foreach": "user",
  "onlyif": "source.type == 'auth'",
  "init": {
    "total": "HLLP_INIT(5,6)"
  },
  "update": {
    "total": "HLLP_ADD(total, ip_dst_addr)"
  },
  "result": {
    "profile": "total",
    "triage": {
      "total_count": "HLLP_CARDINALITY(total)"
    }
  }
}
{
  "profile": "auth_distribution",
  "foreach": "'global'",
  "onlyif": "source.type == 'profiler' && profile == 'attempts_by_user'",
  "init": {
    "s": "STATS_INIT()"
  },
  "update": {
    "s": "STATS_ADD(s, total_count)"
  },
  "result": "s"
}
window := PROFILE_WINDOW('...')
profile := PROFILE_GET('attempts_by_user', user, window)
distinct_auth_attempts := HLLP_CARDINALITY(GET_LAST(profile))
distribution_profile := PROFILE_GET('auth_distribution', 'global', window)
stats := STATS_MERGE(distribution_profile)
distinct_auth_attempts_median := STATS_PERCENTILE(stats, 0.5)
distinct_auth_attempts_stddev := STATS_SD(stats)
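The values computed above feed the demo's outlier test: flag a user whose distinct-host count lies more than 5 standard deviations from the median across all users. The Python sketch below illustrates that comparison under stated assumptions: it uses exact statistics from the standard library instead of merged approximate summaries, a population standard deviation, a one-sided test (only unusually high counts are flagged), and made-up per-user counts.

```python
import statistics
from typing import Sequence


def is_outlier(
    value: float, population: Sequence[float], num_stddevs: float = 5.0
) -> bool:
    """True if value exceeds the population median by more than num_stddevs SDs."""
    median = statistics.median(population)
    stddev = statistics.pstdev(population)
    return value - median > num_stddevs * stddev


# Hypothetical distinct-host counts for ten ordinary users in one window.
counts_across_users = [2, 3, 2, 4, 3, 2, 3, 3, 2, 4]

suspicious = is_outlier(40, counts_across_users)  # far above the median
ordinary = is_outlier(4, counts_across_users)     # within the normal spread
```

Comparing against the median rather than the mean makes the baseline less sensitive to the very outliers the rule is trying to find.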
Questions
Thanks for your attention! Don’t forget to come to the cybersecurity Birds of a Feather session Thursday.
• Find me at http://caseystella.com
• Twitter handle: @casey_stella
• Email address: [email protected]