Post on 11-Apr-2017
#RSAC
Security Analytics: The Security Promise of AI, ML and Data Science
Sam Curry, Chief Product Officer, Cybereason
Yonatan Striem-Amit, CTO and Co-founder, Cybereason
“The computer industry is the only industry that is more fashion-driven than women's fashion.
Maybe I'm an idiot, but I have no idea what anyone is talking about.
What is it? It's complete gibberish. It's insane. When is this idiocy going to stop?”
- Larry Ellison
Quick Poll
Abstract
Security Analytics: The Security Promise of AI, ML and Data Science
Artificial Intelligence, Machine Learning and Data Science are among the many disciplines abused by marketers and snake-oil salesmen alike. This session will focus on advancing the state of the art of security analytics using what is real, and on examining the promise, the hype and the real state of AI, ML and Data Science in solving fundamental security problems.
If you stare at the data…
“My God…it’s full of stars” (2001: A Space Odyssey)
Security Challenge: Be fast (machines) and smart (humans)
The job is hard already…
The mission (practical CIA-N)
- The right people access the right data at the right time
- But…stop the wrong people all the time
Exponentially more…
- Users
- Clients
- Data
- Apps
- Infrastructure
Can AI/Data Science/ML help…?
Identities
- Authenticate without passwords
- Authorize without policies or rules
Stop bad guys
- Find malware and Malops™
- Find the needle in the haystack
What are AI, ML and Data Science?
- AI, ML and Data Science are not really interchangeable
- Easy to trump up (trickle down)
- Security lags advances; most is still hype
- Tremendous potential for these technologies
Machine Learning Techniques
- Types
- Common algorithms
- Deep learning
Question…are there really any “new” algorithms, or are we just finally able to get results because of raw horsepower?
- P(c|x) is the posterior probability of class (target) given predictor (attribute).
- P(c) is the prior probability of class.
- P(x|c) is the likelihood, which is the probability of predictor given class.
- P(x) is the prior probability of predictor.
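The four terms defined here fit together in Bayes' theorem. As a reminder, in its standard textbook form (not necessarily the slide's exact rendering):

```latex
P(c \mid x) = \frac{P(x \mid c)\, P(c)}{P(x)}
```

A naive Bayes classifier applies this rule per class and picks the class with the largest posterior, under the assumption that the predictors are conditionally independent given the class.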
Machine Learning Technologies
“Big Data”
What happens when the amount of data we have grows beyond critical mass
We can now store, process, and maintain petabyte-scale data structures
Beyond any single computer
Volume, Variety, Velocity, Veracity
Data Science
- The discipline of being able to mine data successfully
- Golden age of data science, made possible by big data
- Finally can find meaningful phenomena vs. statistical artifacts
Artificial Intelligence
- AI is a goal of machine learning. AI is not a tool for any application.
- Often abused term
- Lots of research, mostly philosophical
- The pursuit of AI has produced amazing results in machine learning: self-driving cars, AlphaGo, Jarvis.
Let’s Talk About Anomalies
- Don’t lose sight of the forest
- Security is not an “anomaly hunt”
- Real-world data sets aren’t simple
  - Not extrapolations of simple sets
- That which is statistically significant has no necessary bearing on what you care about
- In any complex environment, anomalies are the norm
- Acting on the system affects the system
Visualization of daily Wikipedia edits created by IBM
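One way to see why "anomalies are the norm" in a complex environment is simple arithmetic. The sketch below assumes, hypothetically, that an environment tracks many independent metrics and that each is flagged by a 3-sigma rule, which fires on about 0.27% of perfectly normal observations. Neither the rule nor the metric counts come from the talk; they are illustrative assumptions.

```python
def prob_at_least_one_alert(per_metric_rate: float, n_metrics: int) -> float:
    """Chance that at least one of n independent metrics looks anomalous."""
    return 1.0 - (1.0 - per_metric_rate) ** n_metrics


# Assumption: a 3-sigma rule flags roughly 0.27% of normal observations.
THREE_SIGMA_RATE = 0.0027

if __name__ == "__main__":
    # Hypothetical environment sizes, from one metric to a thousand.
    for n in (1, 10, 100, 1000):
        p = prob_at_least_one_alert(THREE_SIGMA_RATE, n)
        print(f"{n:>5} metrics -> P(at least one anomaly) = {p:.1%}")
```

With a thousand such metrics, the chance that something looks anomalous at any given moment is about 93%, even when nothing is wrong.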
The Allure of Anomaly Detection
- Long been considered the holy grail for ML in security
- Consistently yields poor results in “real life”
- Anomalies are common
- The “False Positive” paradox

$P(R \mid A) = 99\%$, $P(R \mid \neg A) = 0.5\%$, $P(A) = 0.001\%$

$P(A \mid R) = \dfrac{P(R \mid A)\,P(A)}{P(R \mid A)\,P(A) + P(R \mid \neg A)\,P(\neg A)} \approx 0.2\%$
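The arithmetic behind the 0.2% figure can be checked in a few lines. This is a minimal sketch that reads A as "an attack is present" and R as "the detector raises an alert". That reading is an interpretation, since the slide does not spell the symbols out, and the function name is illustrative.

```python
def posterior(p_r_given_a: float, p_r_given_not_a: float, p_a: float) -> float:
    """Bayes' rule: P(A|R) from the detection rate, false-alarm rate and prior."""
    p_not_a = 1.0 - p_a
    # Total probability of an alert, whether or not an attack is present.
    evidence = p_r_given_a * p_a + p_r_given_not_a * p_not_a
    return p_r_given_a * p_a / evidence


# Values from the slide: 99% detection, 0.5% false alarms, 0.001% base rate.
p = posterior(p_r_given_a=0.99, p_r_given_not_a=0.005, p_a=0.00001)

if __name__ == "__main__":
    print(f"P(A|R) = {p:.2%}")  # prints "P(A|R) = 0.20%"
```

Even with a detector that is right 99% of the time, roughly one alert in 500 corresponds to a real attack, because attacks are so rare relative to normal activity.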
Let’s Debunk Some Myths Together
- Machine Learning applications to security are new
- I can make a “final solution for security” or buy a product that can “set it and forget it”
- “There will be a pattern because I need one to be there”
- “If I find the secret-awesome-uber-application-of-ML-to-find-everything it will always be there and I can say ‘security is done.’”
Applications in Security
| Use Case | Potential (where it could be today) | Maturity (where it is today) |
|---|---|---|
| Authentication, Authorization, Bio/Behaviometrics, etc. | High | Early |
| Perimeter Protection | Medium | Early |
| Threat prediction and prevention | High | Medium |
| Deception, misdirection and obfuscation | High | Early |
| Event and incident detection and triage | High | Medium |
| Risk Management | Medium | Early |
| Vulnerability analysis, management and prevention | Low | Early |
| Forensics | Low | Medium |
| Insider threat | High | Early |
DIY: A Practical Stack for Innovators
Sources: Logs, NetFlow, Network/Perimeter, Endpoint Data, IoT
Frameworks: Hadoop / Spark / Elasticsearch / Kafka
Toolkits: scikit-learn, TensorFlow, Caffe
Models: Machine/User behaviors, correlated events
Solutions: Innovation = Business problems + SME + Data Scientists + Engineers + Big Data architects
Approaches That Show Promise
- Structured analysis
- Deep learning
- Assist humans – don’t replace them
- Remember the adversary
Snake Oil – RSA Conference ML Bingo!
- Magic Box labeling – always go two layers deeper than claims
- Secret methodology isn’t good in security, so why would it be ok in security + machine learning?
- Statistical Analysis being confused with Machine Learning or AI or Data Science
- Proprietary algorithms or approaches
- “Math is greater than Malware”
- “Our results show that our [Magic Box] is the best!”
- Too complicated to explain
- “We solve that with Big Data!”
Takeaways
- There is intelligence in security – it’s more often carbon-based than silicon
- The job of silicon intelligence is to make the carbon more effective
- ML/Big data really is the future of security
- Remember the adversary
- As with everything, something that’s too good to be true isn’t true
Q&A
MAKE DATA GREAT AGAIN!
Thanks! You’ve been watching:
Security Analytics: The Security Promise of AI, ML and Data Science
SEM-M04
Sam Curry, Chief Product Officer, Cybereason
Yonatan Striem-Amit, Chief Technology Officer and Co-founder, Cybereason