Apache Metron: Community Driven Cyber Security

Apache Metron: Community Driven Cyber Security Ned Shawa & Laurence Da Luz Hadoop Summit Melbourne - 2016

Transcript of Apache Metron: Community Driven Cyber Security

Page 1: Apache Metron: Community Driven Cyber Security

Apache Metron: Community Driven Cyber Security

Ned Shawa & Laurence Da Luz

Hadoop Summit Melbourne - 2016

Page 2: Apache Metron: Community Driven Cyber Security

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Apache Metron Introduction

User Personas & Key Functional Themes

Capabilities and Architecture

Building a Use Case in Metron

Questions

Page 3: Apache Metron: Community Driven Cyber Security

Apache Metron Introduction

Page 4: Apache Metron: Community Driven Cyber Security

Apache Metron Vision

“Apache Metron is a Security Data Analytics Platform (SDAP). As a next generation security analytics framework, it is designed to consume and monitor network traffic and machine data within an enterprise environment. Apache Metron is extensible and is designed to work at a massive scale. It is not a SIEM but rather the next evolution of a SIEM.”

Page 5: Apache Metron: Community Driven Cyber Security

Cyber Security – Today’s Enterprise Threat

Organizations have recently become targets of complex cyber security breaches that could have been easily prevented

Cyber attacks continuously become more advanced and go undetected by traditional IT security policies and procedures

Cyber security attacks have increased in visibility and have targeted organizations with millions of customers, costing millions in privacy damages

Recent cyber security attacks have been led by highly skilled technical organizations, yet the attacks could have been prevented by known solutions

62% – Increase in Cyber Security Breaches since 2013

8 months – Average time an advanced security breach goes unnoticed

3 Trillion – Total cost of Cyber Security breaches

1 in 3 – Security professionals are not familiar with cyber security threats

2014 ISACA

Page 6: Apache Metron: Community Driven Cyber Security

Apache Metron – Community Driven Cyber Security

Security Data Lake

Enriched 360 Correlated Searchable Discoverable

Threat Intelligence

3rd Party Feeds Static Rules ML Models IOC Sharing

Pluggable Framework

Parsers Enrichers Threat Intel UI Widgets

Security Application

PCAP Replay Evidence Store Hunting Platform

Apache Metron

Hortonworks and the Apache Metron Community are focused on delivering the next generation cyber security platform to enable organizations to enrich and analyze all data within their enterprise

Page 7: Apache Metron: Community Driven Cyber Security

Apache Metron – How We Got Here

Page 8: Apache Metron: Community Driven Cyber Security

Apache Metron – Who’s Involved

Page 9: Apache Metron: Community Driven Cyber Security

Apache Metron – Capabilities Overview

Real-Time Security Stream Processing Pipeline

Page 10: Apache Metron: Community Driven Cyber Security

User Personas & Functional Themes

Page 11: Apache Metron: Community Driven Cyber Security

Metron User Personas

Page 12: Apache Metron: Community Driven Cyber Security

Metron’s Key Functional Themes

Platform: Work done to harden the platform for performance, scale, extensibility and maintainability. This also includes capabilities around provisioning, managing and monitoring the application.

Data Collection: Set of data sources that Metron provides capabilities to stream, ingest and parse into the platform.

Data Processing: A set of Storm topologies to perform various actions in real-time including: normalization of telemetry data, enrichment, cross reference with threat intel feeds, alerting, indexing, and persisting into historical stores.

Data/Integration Services

Portals/UI: Set of portals, dashboards and user interfaces for the different personas.

Page 13: Apache Metron: Community Driven Cyber Security

Apache Metron Logical Architecture

Source Systems: Network Traffic, SSH, System Log, HTTP(S), File System, email
Data Collection: Flume, PCAP, NiFi (NiFi processors)
Message Queue (Kafka): PCAP Topic, Email Topic, SSH Topic, SysLog Topic, HTTP Topic, DPI Topic
Stream Process and Enrichment (Storm & Spark): PCAP Topology, Email Topology, SSH Topology, SysLog Topology, HTTP Topology, DPI Topology
Data Access: Hive, Solr, HBase (Raw Data (Historical), Data Index, PCAP Data)

Ability to ingest and process over 1.2 million events per second

Page 14: Apache Metron: Community Driven Cyber Security

Capabilities and Architecture

Apache Metron 0.2

Page 15: Apache Metron: Community Driven Cyber Security

Metron 0.2 Streaming and Enrichment

Page 16: Apache Metron: Community Driven Cyber Security

Metron 0.2 Data Ingestion Architecture

Page 17: Apache Metron: Community Driven Cyber Security

Key Points:
• Each new telemetry data source will have its own parser topology
• Two types of parsers available in TP2: Grok and Java

Metron 0.2 Parsing / Normalization Topology

Page 18: Apache Metron: Community Driven Cyber Security

Metron 0.2 Parser Types

Metron parser:
– Input: Read native format data from Kafka topic
– Output: Normalized Metron JSON Object

Grok Parser
– Suitable for structured or semi-structured logs
– Regex-like syntax with pre-defined mappings (less effort to implement)
– Good for lower volumes of data

Java Parser
– Requires custom code (more effort to implement)
– Good for higher volumes of data

Page 19: Apache Metron: Community Driven Cyber Security

Metron 0.2 Enrichment Topology

Page 20: Apache Metron: Community Driven Cyber Security

Metron 0.2 Enrichment Topology

Enrich: Add additional information to raw source during streaming
– In-built Geo enrichment (IP to coordinates + City/State/Country)

Streaming: Allows ML models to score in real-time instead of batch

Threat Intel: Flag alerts against intel feed & determine triage

Page 21: Apache Metron: Community Driven Cyber Security

Stellar Framework

What is it?
– Powerful framework that provides a custom DSL that is used across different Metron components for querying, transformation and configuring rules.

Why do we need it?
– For a variety of components we need to determine whether a condition is true and, if so, perform some action.
– For those purposes, this framework provides the DSL to create those conditions and execute a set of actions.

How is Stellar used within Metron today?
1. Filtering, transformations and validations in parser topologies
2. Threat Triage – allocating scores to certain rules based on conditions
3. PCAP CLI – query for pcap data

Page 22: Apache Metron: Community Driven Cyber Security

What does Stellar consist of?

Query Language
• Referencing fields in the enriched JSON
• Simple boolean operations: and, not, or
• Simple comparison operations: <, >, <=, >=
• Determining whether a field exists (via exists)
• The ability to use parentheses to make order of operations explicit
• A fixed set of functions which take strings and return boolean, including:
– IN_SUBNET, IS_EMPTY, STARTS_WITH, ENDS_WITH, REGEXP_MATCH, IS_IP, IS_DOMAIN, IS_EMAIL, IS_URL, IS_DATE, IS_INTEGER

E.g.: IN_SUBNET( ip, '192.168.0.0/24') or ip in [ '10.0.0.1', '10.0.0.2' ] or exists(is_local)

Transformation Language
• A fixed set of transformation functions including:
– TO_LOWER, TO_UPPER, TO_INTEGER, TO_DOUBLE, TRIM, JOIN, SPLIT, GET_FIRST, GET_LAST, GET, MAP_GET, DOMAIN_TO_TLD, DOMAIN_REMOVE_TLD, URL_TO_HOST, URL_TO_PROTOCOL, URL_TO_PORT, URL_TO_PATH, TO_EPOCH_TIMESTAMP
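To make the semantics of the example query expression on this slide concrete, here is a minimal Python sketch of an equivalent check. It is not Stellar and not Metron code: the helper names `in_subnet` and `matches` are hypothetical, and `exists` is approximated as "the key is present in the message", which is an assumption about its behaviour.

```python
import ipaddress


def in_subnet(ip, cidr):
    """Rough stand-in for IN_SUBNET: True if `ip` falls inside the CIDR block."""
    try:
        return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)
    except ValueError:
        # Not a parseable IP address or network.
        return False


def matches(message):
    """Python reading of:
    IN_SUBNET(ip, '192.168.0.0/24') or ip in ['10.0.0.1', '10.0.0.2'] or exists(is_local)
    """
    ip = message.get("ip")
    return (
        (ip is not None and in_subnet(ip, "192.168.0.0/24"))
        or ip in ("10.0.0.1", "10.0.0.2")
        or "is_local" in message  # assumption: exists == key present
    )
```

Each `or` branch mirrors one clause of the slide's expression, so a message passes if any single clause holds.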

Page 23: Apache Metron: Community Driven Cyber Security

Metron 0.2 Metron JSON Object

Numerous sensors log in different formats. The parser should normalize at least the following subset of fields to the following Metron JSON naming conventions:

Page 24: Apache Metron: Community Driven Cyber Security

Metron 0.2 Metron UI with Kibana 4

Page 25: Apache Metron: Community Driven Cyber Security

Building a Use Case in Metron

Squid Logs (Metron Reference App)

Page 26: Apache Metron: Community Driven Cyber Security

Metron Reference Application Squid Sensor

What is the Reference App?
– A use case that showcases the following:
1. How to add telemetry events from a new data source (Squid), which covers parsing, filtering, transformation and validation
2. How to see the new events in the Metron UI
3. How to enrich the telemetry events
4. How to do threat intel cross-reference checks against events
5. How to raise alerts
6. How to persist (index, long-term storage) the events

Why do we need it?
– Similar to the famous Java Pet Store app, it provides an app that is constantly updated to showcase new features.

What are the updates to the Metron Reference App with TP2?
– Using Stellar framework to filter, transform and validate events
– How to work with the new Metron UI to display new events
– Using Stellar framework to do threat triage
– Streaming enrichments

How do you consume it?
https://cwiki.apache.org/confluence/display/METRON/Metron+Reference+Application

Page 27: Apache Metron: Community Driven Cyber Security

Use Case Setup

• Scenario
  • Customer Foo has installed Metron TP2 and they are using the out-of-the-box data sources (PCAP, YAF/Netflow, Snort and Bro). They love Metron!
  • But now they want to add a new data source to the platform: Squid proxy logs.

• Customer Foo’s requirements are the following:
1. Need to ingest the proxy events from Squid logs in real-time
2. The proxy logs have to be parsed into a standardized JSON structure that Metron can understand
3. In real-time, the Squid proxy event needs to be enriched with domain/whois information (domain, cert, country, company)
4. In real-time, the domain of the proxy event must be checked against threat intel feeds
5. If there is a threat intel hit, an alert needs to be raised
6. The system should provide the ability to configure rules via a custom DSL to prioritize/score different types of alerts
7. The end user must be able to see the new telemetry events and the alerts from the new data source

Page 28: Apache Metron: Community Driven Cyber Security

Metron 0.2 Squid Use Case

Page 29: Apache Metron: Community Driven Cyber Security

Metron 0.2 Squid Use Case

Configuration Driven

Step 1a Create Topic
Step 1b NiFi TailFile
Step 2 Define Parser
Step 3 Enrichment Config
Step 4 Configure Alerts
Step 5 Create Dashboard

Page 30: Apache Metron: Community Driven Cyber Security

Squid & its Telemetry Event

• What is Squid?
  • Squid is a caching proxy for the Web supporting HTTP, HTTPS, FTP, and more. It reduces bandwidth and improves response times by caching and reusing frequently-requested web pages

• What does a Squid Access Log look like?
  • When you make an outbound HTTP connection to https://www.cnn.com, the following entry is added to a file called access.log:

1461576382.642 161 98.220.218.158 TCP_MISS/200 103701 GET http://www.cnn.com/ - DIRECT/199.27.79.73 text/html

Highlighted fields:
• Unix Epoch Time
• IP of host where connection was made
• Domain name of the outbound connection

Page 31: Apache Metron: Community Driven Cyber Security

Squid & its Telemetry Event

What Metron does to the Squid telemetry in real-time:
• Convert from Unix Epoch to Timestamp
• Asset enrichment to enrich IP (hostname, type of device)
• WHOIS enrichment to look up domain name information
• Threat Intel to cross-reference IP with intel feed to see if there is a hit
• Index the event into Elastic and persist in HDFS (Security Data Vault)
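As a rough illustration of the first step on this slide, the sketch below pulls the three highlighted fields out of the sample access.log entry and converts the Unix epoch value to a UTC timestamp. This is plain Python, not Metron code; the function name `describe` and the output key names are hypothetical, and the field positions assume Squid's native log layout as shown in the sample line.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse

SAMPLE = ("1461576382.642 161 98.220.218.158 TCP_MISS/200 103701 "
          "GET http://www.cnn.com/ - DIRECT/199.27.79.73 text/html")


def describe(line):
    """Extract epoch time, client IP and destination domain from one log line."""
    fields = line.split()
    epoch_seconds = float(fields[0])  # first column: Unix epoch with milliseconds
    return {
        "timestamp_utc": datetime.fromtimestamp(
            epoch_seconds, tz=timezone.utc).isoformat(),
        "client_ip": fields[2],                # third column: requesting host
        "domain": urlparse(fields[6]).hostname,  # seventh column: requested URL
    }


if __name__ == "__main__":
    print(describe(SAMPLE))
```

The asset, WHOIS and threat intel steps would then key off `client_ip` and `domain` respectively.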

Page 32: Apache Metron: Community Driven Cyber Security

1461576382.642 161 98.220.218.158 TCP_MISS/200 103701 GET http://www.cnn.com/ - DIRECT/199.27.79.73 text/html

Step 1 Telemetry Ingest

Step 1a Create Topic in Kafka

/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $ZOOKEEPER_HOST:2181 --create --topic squid --partitions 1 --replication-factor 1

Step 1b NiFi TailFile

Ingest Squid logs into squid Kafka topic via NiFi

Page 33: Apache Metron: Community Driven Cyber Security

Step 2 Configuring the Squid Parser

Defining the Grok Filter for the Squid data

• Grok vs Java: no custom code
• Suitable for structured or semi-structured logs
• Pre-defined mappings (e.g. TIMESTAMP_ISO8601, NUMBER, WORD, HOSTNAME, IP, USERNAME)
• Regex-based

Sample Squid log entry:

1461576382.642 161 98.220.218.158 TCP_MISS/200 103701 GET http://www.cnn.com/ - DIRECT/199.27.79.73 text/html

Squid Grok Filter:

SQUID_DELIMITED %{NUMBER:timestamp}.*%{INT:elapsed} %{IP:ip_src_address} %{WORD:action}/%{NUMBER:code} %{NUMBER:bytes} %{WORD:method} %{NOTSPACE:url}.*%{IP:ip_dst_addr}

Pre-defined Grok mappings for IP address and url are used
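For readers unfamiliar with Grok, the filter can be read as a regular expression with named captures. The sketch below is a hand-written Python approximation, not the Grok engine and not what Metron executes: the `IPV4` sub-pattern is a simplified stand-in for Grok's `IP` mapping, and `parse_squid` is a hypothetical helper name. The capture names are taken from the SQUID_DELIMITED filter.

```python
import re

# Simplified stand-in for Grok's IP mapping (IPv4 only, no range checking).
IPV4 = r"(?:\d{1,3}\.){3}\d{1,3}"

SQUID = re.compile(
    r"(?P<timestamp>\d+(?:\.\d+)?)\s+(?P<elapsed>\d+) "
    rf"(?P<ip_src_address>{IPV4}) "
    r"(?P<action>\w+)/(?P<code>\d+) (?P<bytes>\d+) "
    r"(?P<method>\w+) (?P<url>\S+).*?"
    rf"(?P<ip_dst_addr>{IPV4})"
)

LINE = ("1461576382.642 161 98.220.218.158 TCP_MISS/200 103701 "
        "GET http://www.cnn.com/ - DIRECT/199.27.79.73 text/html")


def parse_squid(line):
    """Return the named captures as a dict, or None if the line does not match."""
    match = SQUID.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_squid(LINE))
```

All captured values come back as strings; a real parser would still need to convert types and normalize field names.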

Page 34: Apache Metron: Community Driven Cyber Security

Step 2 Configuring the Squid Parser

Squid Parser and Transform Configuration

• Configuration stored in ZooKeeper
• Configure parser and field transformations

Parser Configurations: Kafka Topic, Filter Location

Field Transformations (Stellar Transformation Language): create 2 additional fields by applying URL_TO_HOST and DOMAIN_REMOVE_SUBDOMAINS

Stellar Transformation Language
DOMAIN_TO_TLD(domain)
DOMAIN_REMOVE_TLD(domain)
URL_TO_HOST(url)
URL_TO_PROTOCOL(url)
…
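The two derived fields can be pictured with the following Python sketch. It only imitates the intent of the two transformation functions named on this slide and is not their actual implementation: the subdomain stripping here naively keeps the last two labels, which is an assumption that breaks for suffixes such as `.co.uk`. The output field name `full_hostname` is hypothetical; `domain_without_subdomains` is the name used in the enrichment configuration later in the deck.

```python
from urllib.parse import urlparse


def url_to_host(url):
    """Imitates the intent of URL_TO_HOST: the host part of a URL."""
    return urlparse(url).hostname


def domain_remove_subdomains(host):
    """Naive imitation of DOMAIN_REMOVE_SUBDOMAINS: keep the last two labels."""
    return ".".join(host.split(".")[-2:])


def add_derived_fields(event):
    """Return a copy of the parsed event with the two extra fields added."""
    enriched = dict(event)
    host = url_to_host(event["url"])
    enriched["full_hostname"] = host  # hypothetical field name
    enriched["domain_without_subdomains"] = domain_remove_subdomains(host)
    return enriched
```

For the sample event, `add_derived_fields({"url": "http://www.cnn.com/"})` adds the host `www.cnn.com` and the domain `cnn.com`.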

Page 35: Apache Metron: Community Driven Cyber Security

Data Ingestion Checkpoint / Tracing an event

Raw Source Data Metron JSON Object

• Numerous sensor logs in different formats
• The parser normalizes a subset of fields
• Data is parsed into the Metron JSON Object

1462366408966.966 161 127.0.0.1 TCP_MISS/200 105413 GET http://www.cnn.com/ - DIRECT/199.27.79.73 text/html

Metron Storm Parsing

Page 36: Apache Metron: Community Driven Cyber Security

Step 3 Configure Real-time Enrichment

Enriching events with WHOIS information

• Enrichment reference data stored in HBase
• Configuration stored in ZooKeeper
• WHOIS data bulk loaded using Metron framework (Bulk Load or Streaming)
• Sample WHOIS data used:

google.com, "Google Inc.", "US", "Dns Admin",874306800000
work.net, "", "US", "PERFECT PRIVACY, LLC",788706000000
capitalone.com, "Capital One Services, Inc.", "US", "Domain Manager",795081600000
cisco.com, "Cisco Technology Inc.", "US", "Info Sec",547988400000
cnn.com, "Turner Broadcasting System, Inc.", "US", "Domain Name Manager",748695600000

Page 37: Apache Metron: Community Driven Cyber Security

Step 3 Configure Real-time Enrichment

Metron Enrichment Bulk Loader Utility

Extractor Configuration (map columns to enrichment data source; identify column to match on):

{
  "config" : {
    "columns" : {
      "domain" : 0,
      "owner" : 1,
      "home_country" : 2,
      "registrar" : 3,
      "domain_created_timestamp" : 4
    },
    "indicator_column" : "domain",
    "type" : "whois",
    "separator" : ","
  },
  "extractor" : "CSV"
}

Enrichment Configuration (configure field to enrichment type mapping):

{
  "zkQuorum" : "$ZOOKEEPER_HOST:2181",
  "sensorToFieldList" : {
    "squid" : {
      "type" : "ENRICHMENT",
      "fieldToEnrichmentTypes" : {
        "domain_without_subdomains" : [ "whois" ]
      }
    }
  }
}
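One way to read the extractor configuration together with the sample WHOIS rows is sketched below in Python. It is an illustration only: an in-memory dict stands in for the HBase reference table, the helper names `load_reference` and `enrich` are hypothetical, and the `enrichments.<type>.<field>.<column>` key layout for the added fields is an assumption made for this example.

```python
import csv
import io

# Mirrors the "config" section of the extractor configuration on this slide.
EXTRACTOR = {
    "columns": {"domain": 0, "owner": 1, "home_country": 2,
                "registrar": 3, "domain_created_timestamp": 4},
    "indicator_column": "domain",
    "separator": ",",
}

WHOIS_CSV = '''google.com, "Google Inc.", "US", "Dns Admin",874306800000
cnn.com, "Turner Broadcasting System, Inc.", "US", "Domain Name Manager",748695600000
'''


def load_reference(text, extractor):
    """Build {indicator value: {column name: value}} from CSV text."""
    columns = extractor["columns"]
    key_index = columns[extractor["indicator_column"]]
    reader = csv.reader(io.StringIO(text),
                        delimiter=extractor["separator"],
                        skipinitialspace=True)  # tolerate ', "quoted"' fields
    table = {}
    for row in reader:
        if row:
            table[row[key_index]] = {
                name: row[i] for name, i in columns.items() if i != key_index
            }
    return table


def enrich(event, field, table, enrichment_type="whois"):
    """Copy the event and attach reference data when the field value matches."""
    enriched = dict(event)
    for name, value in table.get(event.get(field), {}).items():
        enriched[f"enrichments.{enrichment_type}.{field}.{name}"] = value
    return enriched
```

Calling `enrich(event, "domain_without_subdomains", load_reference(WHOIS_CSV, EXTRACTOR))` corresponds to the `fieldToEnrichmentTypes` mapping: the named field is looked up against the `whois` reference data.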

Page 38: Apache Metron: Community Driven Cyber Security

Data Enrichment Checkpoint / Tracing an event

Metron JSON Object → Enriched Metron object

• Enrichment data is added to the Metron JSON Object

Data Enrichment: Each event is enriched with WHOIS data based on domain mapping (Owner, Country, Registrar)

Page 39: Apache Metron: Community Driven Cyber Security

Step 4 Configure Threat Intel and Alerting

Alerts via Threat Intel Feeds
• Threat Intel feeds are either bulk loaded or streamed
• Similar to enrichment framework
• Mapping to flag any matches between the Threat Feed and streaming data
• is_alert flag=true is generated on matches

Alert severity via Defined Rules
• Metron ‘Threat Triage’
• Define rules based on incoming data
• Use any field within the rules (newly enriched fields)
• Label alert severity levels based on rule conditions

Page 40: Apache Metron: Community Driven Cyber Security

Step 4a Configure Threat Intel and Alerting

Threat Intel Feed Mapping
• Domain is mapped against this Threat Intel Feed
• Alerts generated when a match is hit
• Zeus malware tracker list used
• Feed bulk loaded: domain,source
• Sample threat intel data (malware_intel_feed.csv):

039b1ee.netsolhost.com,abuse.ch
03bbec4.netsolhost.com,abuse.ch
0if1nl6.org,abuse.ch
0x.x.gg,abuse.ch
1st.technology,abuse.ch
76tguy6hh6tgftrt7tg.su,abuse.ch
agiftcard724.com,abuse.ch
…

Metron Threat Intel Bulk Loader Utility

Extractor configuration (identify column mappings for the threat intel feed; specify column to match on):

{
  "config" : {
    "columns" : {
      "domain" : 0,
      "source" : 1
    },
    "indicator_column" : "domain",
    "type" : "zeusList",
    "separator" : ","
  },
  "extractor" : "CSV"
}

Enrichment configuration (configure field to threat intel mapping):

{
  "zkQuorum" : "$ZOOKEEPER_HOST:2181",
  "sensorToFieldList" : {
    "squid" : {
      "type" : "THREAT_INTEL",
      "fieldToEnrichmentTypes" : {
        "domain_without_subdomains" : [ "zeusList" ]
      }
    }
  }
}
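The matching behaviour described on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Metron's implementation: the feed is held in a dict rather than bulk loaded into a store, `load_indicators` and `check_threat_intel` are hypothetical helper names, and the `threatintels.<type>.<field>` key written on a hit is an assumed naming. Only the `is_alert` flag set to true comes from the slides.

```python
ZEUS_FEED = """039b1ee.netsolhost.com,abuse.ch
0x.x.gg,abuse.ch
1st.technology,abuse.ch
"""


def load_indicators(text):
    """Parse 'domain,source' rows into {domain: source}."""
    indicators = {}
    for line in text.splitlines():
        if line.strip():
            domain, source = line.split(",", 1)
            indicators[domain.strip()] = source.strip()
    return indicators


def check_threat_intel(event, indicators,
                       field="domain_without_subdomains",
                       intel_type="zeusList"):
    """Return a copy of the event, flagged when `field` matches the feed."""
    flagged = dict(event)
    if event.get(field) in indicators:
        flagged["is_alert"] = "true"
        # Assumed key layout recording which feed produced the hit.
        flagged[f"threatintels.{intel_type}.{field}"] = "alert"
    return flagged
```

Events whose domain is absent from the feed pass through unchanged, so only matches carry the `is_alert` flag downstream.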

Page 41: Apache Metron: Community Driven Cyber Security

Step 4b Alert Triage (Scoring Alerts)

Requirement for scoring a specific type of threat intel alert:
– Rule 1: If the threat intel hit came from the threat intel feed called zeusList, then we want to consider that an alert with a score of 5
– Rule 2: If the url is neither a .com nor a .net, then we want to consider that an alert with a score of 10

Rule 1 If threat intel exists in the Zeus list. Score = 5

Rule 2 If url is not a .com OR .net. Score = 10

Aggregator defined for when multiple conditions are met
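The two rules and the aggregator can be pictured with the Python sketch below. It is not Stellar and not Metron's triage configuration: the event field `threat_intel_sources` is a hypothetical way of recording which feeds matched, and using `max` as the default aggregator is an assumption, since the slides do not say which aggregator is configured.

```python
from urllib.parse import urlparse


def rule_zeus_hit(event):
    """Rule 1: the event matched the zeusList threat intel feed."""
    return "zeusList" in event.get("threat_intel_sources", ())


def rule_not_com_or_net(event):
    """Rule 2: the url's host is neither a .com nor a .net."""
    host = urlparse(event.get("url", "")).hostname or ""
    return bool(host) and not host.endswith((".com", ".net"))


RULES = [
    (rule_zeus_hit, 5),
    (rule_not_com_or_net, 10),
]


def triage_score(event, aggregator=max):
    """Combine the scores of all matching rules; 0 when no rule matches."""
    scores = [score for condition, score in RULES if condition(event)]
    return aggregator(scores) if scores else 0
```

Passing `aggregator=sum` instead would add the scores of all matching rules, which shows why the choice of aggregator matters when more than one condition is met.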

Page 42: Apache Metron: Community Driven Cyber Security

Step 5 Enhance the Metron UI

Visualize Enriched Data and Alerts
• Kibana-driven dashboards
• List and visualize Squid data
• (Example) Trend of Metron-generated alerts for data categorized by the alert risk level
• Drill down into Squid data events
• List of Metron-generated alerts ordered by risk level, with record-level drill-down

Page 43: Apache Metron: Community Driven Cyber Security

Metron Default Dashboard Kibana 4

• Displaying network data collected from the Metron probes

• In-built geo enrichment for default sensors feeds the map view

Page 44: Apache Metron: Community Driven Cyber Security

Key Takeaways…

• Easy Extensibility - The ability to add a new data source easily, without writing any code

• Repeatable Pattern - The reference application represents a repeatable pattern that you can apply to most data sources

• Configuration Driven - End-to-end framework to build real-time enrichment and alerting data pipelines

Page 45: Apache Metron: Community Driven Cyber Security
