How Machine Learning is Changing the Game in Breach Prevention › wp-content › uploads › 2017...

How Machine Learning is Changing the Game in Breach Prevention

Max AdamsRegional Sales Manager, WestMike MyersSystems Engineer

Copyright © 2017Copyright © 2017 1

Spun out of Northrop Grumman

Industry-LeadingMachine Learningand AI Research

Arming Threat Analysts and IncidentResponse Teams

Our Heritage

Copyright © 2017Copyright © 2017

The network security paradox

Companies can never know about every possible threat to their enterprise infrastructures so they must change their thinking about how to defend

their networks

You Defend Against What You Know

Perimeter Defenses, Sandboxes, Intelligence

Feeds, and Breach Detection Systems will

only defend against WHAT IS KNOWN

You Need to Know

Security Policies & Practices are based

on a NEED TO KNOW requirement; You must have prior

knowledge about each threat to defend

against it

You Get Exploited by The UnknownAdversaries now

routinely automate and change their malware by default to always look different every

time they attack you. YOU CAN NOT KNOW

You Measure What You KnowMetrics, Reporting,

Workflows, Incentives, and Spending are all

focused on monitoring & measuring against WHAT YOU KNOW


Advanced threats continue to elude even the best defenses

72% OFENTERPRISESsayitisimpossibletogetaheadofthreatswithtraditionalsystems


By The Numbers: Not enough people or money to keep up Humans are physically unable to analyze new threats at scale

Sources: Verizon DBR 2015; Symantec ISTR 20; Trend Micro Targeted Attacks 2015

28PERCENT

OF ALL MALWARE IS VIRTUAL MACHINE AWARE

17

% OF ANDROID APPS THAT ARE ACTUALLY MALWARE

317 MillionNEW VARIANTS OF MALWARE IN 2014

5 out of 6COMPANIES

TARGETED BY SPEAR PHISHING

4,000 %INCREASE IN CRYPTO RANSOMEWARE FROM

2013 TO 2014

70-90%OF ALL MALWARE IS

UNIQUE TO AN ORGANIZATION

60%

PORTION OF ALL SPEARPHISH ATTACKS THAT ARE EITHER .DOC OR .EXE FILE TYPES

MALWARE EVENTS

PER WEEK

BANKS350

INSURANCE575

RETAIL801

UTILITIES772

868,000Amount of NEW Malware Created

EVERY DAY

% OF TECHNOLOGY WEBSITES INFECTED WITH MALWARE

21%

YEARLYINCREASE

INDATA

BREACHES

23%

Copyright © 2017

Machine Learning is the growing trend in network defense

Thousands of academic papers published as early as 1995 on various ML techniques and processes related to malware… ramp up in popularity beginning in the late 2000s

3 presentations at BlackHat 20131 presentation at DEFCON 201318 presentations at RSA 2014/15

Commercial companies with ML-based solutions currently in the market

Commercial companies with published ML-based malware detection research but no ML-based malware detection product


Security Event Alert

Look Across Entirety of Network

Event TriageAutomated

Event Analysis & AnnotationLeverage Analytics to

ID Relationships and Links in Data

Final CaseEscalation

High FidelityDetection

using MachineLearning

DataCollection

Centralized

CaseCreation

Develop a ThreatCase File at

Digital Speed

minutesnot days or months

Enterprises need response at digital speed

Copyright © 2017

Breach Timeline

Breach

HuntSuccess!

Reconnaissance,LateralMovement,andStaging

MalwareDelivery

Exfil Exfil

T=0T-1sec T+1sec

InitialC2SecondaryPayloadsHeartbeat

ExitNetwork

T+1hour T+1day T+1month T>1monthIoCArrives

IoCArrives

AnomalyDetection

AnomalyDetection

Lucky Unlucky

Copyright © 2017

Finding What the Perimeter MissedRetrospective Analysis

Breach

HuntSuccess!


MalwareDelivery

Exfil Exfil

T=0T-1sec T+1sec


ExitNetwork


IoCArrives

AnomalyDetection

AnomalyDetection

Lucky Unlucky

• IndicatorsofCompromises(IoCs)areusedtosearchthelogrepository

• IoCs mayarriveinfeedsdaystomonthsafterthreatactorsareactivelyusingthem

Copyright © 2017

Finding What the Perimeter MissedAnalytics/Anomaly Detection

Breach

HuntSuccess!


MalwareDelivery

Exfil Exfil

T=0T-1sec T+1sec


ExitNetwork


IoCArrives

AnomalyDetection

AnomalyDetection

Lucky Unlucky

• Establishabaselineofnormaluserandnetworkbehaviorovertime• Oncebaselined,monitorforstatisticallysignificantchangesinasset,userornetworkbehavior• Typicallyrequiremultiplesuspiciousoccurrencesbeforealertingananalyst• Requiresophisticatedanalyststounderstandhowtointerpretalertsorvisuallyidentifyanomalies


Supervised vs. Unsupervised LearningSupervised(labeltheitems):

Unsupervised(sorttheitems):

Onlyonerightanswer• Modelhasbeentrainedtomakecorrect

predictions• Wehavetestsetwithknownlabels• Canquantifyclassifierperformance

Manyreasonableanswers• Moredifficulttomeasureclassifier

performance• Oftenrequirestrainingmodelonsiteto

learn“normal”

Copyright © 2017

MachineLearningUseCases

CyberSecurity

Machine Learning Use Cases

Copyright © 2017 13

SupervisedMachineLearningGoalistopredicttherightanswerbasedonexposuretotrainingdata.USECASE:ZeroDay/PolymorphicThreatDetection

UnsupervisedMachineLearningGoalistofindunusualpatterns/behaviorswithinunlabeleddatasetsbyclusteringlikedataandidentifyingoutliers.USECASE:Unusualcommunicationbetweenhosts

Signature/Rules/PatternEngineGoalistomatchanexplicitstringwithinagivendataset.USECASE:Knownvirus,IOCdetection

Techno

logy

Soph

istica

tion

HINDSIGHT INSIGHT FORESIGHT

Different analytics, different use cases

Copyright © 2017Copyright © 2017

Unsupervised(nolabels)

Incremental(Learn

continuously)

Batch(Learnonlyonceorindiscretesteps)

Supervised(withlabels)

Userbehavioranalytics

InsiderThreat

Detection

NetworkAnomalyDetection Network

TrafficProfiling

SpamFiltering

MalwareFamily

Identification

C2Detection

MalwareDetection

MachineLearninginCyberSecurity

Theproblemyouaretryingtosolvedictatesthedata,featuresandmachinelearningalgorithmsused.

Machine Learning in Cyber Security

Copyright © 2017

Static approach uses just the binary content and context (metadata) as featuresDoes not execute or open the software or file Does not require disassembly but may use itPro: Fast, especially if no disassembly is usedCon: Difficult to associate features to function

ML-Based Malware Detection Features: StaticFiles Are A Bunch of Bytes With Structured Sections

An n-gram, n=4

n-grams can encode assembly instructions, literal values (numbers, strings, etc.), or structure tags

PE Files

Copyright © 2017

Dynamic approaches use software or system activity as featuresRequires a virtual execution environment (sandbox) or host-based agent to conduct feature extractionSome activities are immediate others can take hours to appear, depend on user inputs, or vary with the runtime environmentPro: Features are functions…easier to understand the classifierCon: Slow…requires sandbox spin up and fixed observation window, malware authors can elude detection by using the same analysis methods used by the good guys

ML-Based Malware Detection Features: DynamicSystem Call Traces Registry Edits File System and Network Activity

Copyright © 2017

Network flow and host log data as featuresDoes not work out-of-the-box (models are not trained prior to deployment). Requires training time on-site.Alerts are typically comprised of multiple anomalous eventsPro: Not file based, useful for both malware and insider threat detectionCon: Anomalous does not always mean malicious and there are a lot of statistically anomalous benign activities

Network and User Behavior Anomaly Detection Features

Netflow Host-based Logs Derived values as functions of time


Pipelining ML Solutions

• Use of ML in security is not mutually exclusive• Provide defense in depth by staging solution with different ML technology• Example:

• Use static analysis as a first stage filter• Followed by dynamic execution engine• Followed by anomaly detection – do logs indicate anomalous behavior some time post

installation?


Supervised Learning Workflow


Developing Supervised Learning for Cybersecurity

• Feature selection is the ”secret sauce” in supervised machine learning• Determines how well the algorithm can separate samples into classes• Data collection (training data) also an important element (enough data + right data)

• Algorithms tend to be standard• Particularly in cybersecurity, not a lot of development of specialized learning algorithms• Algorithms are well supported by ML libraries• Algorithms may be hybridized and tested against sets of training data – trial and error

• Most time is spent on feature engineering and data collection• Other challenges are around operationalizing ML –

• How do you get real-time data? Where do you place your sensor?• Do you learn continuously (online) or use a batch learning model?


Trading Off Different Types of Errors in ML• Effectiveness of a ML model

depends on your goals• Low false positive?

• Make your analysts happy • Low false negative?

• You’re trying to detect malware!

• Most models will take a balanced approach between false + and false – rate

• Some systems make this a tunable parameter

• With enough samples this is possible to measure statistically


Learning Process


Detecting the Threat Vector using Supervised Machine Learning

Trained before hand so model is ready to work right out of the

box.

Trained Tailored

Quantitatively measure model performance

Customize your defense to your environment reducing false positives, improving

detection, and giving you a unique model unavailable

to the adversary.

Highly Accurate


Takeaways – how machine learning is changing the game …

Improved Resource Efficiency

Tailored to Customer

Environment

Earlier Detection Reduces

Risk

Maximize Existing Security

Investments

Detection of Unknown Threats

Improved detection of zero-day, targeted, and polymorphous threats

that evade detection by traditional solutions

Helps skilled threat analysts prioritize

their workloads and focus on actionable

threats

Machine learning can custom tune

each implementation to the customers’

specific environment, which improves

accuracy, and makes each instance

unique

Proactive vs. reactive detection means that

the risk can be contained and

mitigated before it’s had the chance to embed and cause significant damage

Some machine learning based solutions can be integrated with existing layers such as threat

intel feeds, SIEM, and post analysis tools to

improve overall effectiveness and

maximize ROI

THANK YOU

Max [email protected] [email protected]

www.bluvector.io

How Machine Learning is Changing the Game in Breach Prevention › wp-content › uploads › 2017...

Documents

Transcript of How Machine Learning is Changing the Game in Breach Prevention › wp-content › uploads › 2017...