From Threat Intelligence to Defense Cleverness: A Data Science Approach (#tidatasci)
-
Upload
alex-pinto -
Category
Data & Analytics
-
view
1.118 -
download
2
Transcript of From Threat Intelligence to Defense Cleverness: A Data Science Approach (#tidatasci)
From Threat Intelligence to Defense Cleverness: A Data Science Approach
(#<datasci) Alex Pinto
Chief Data Scien2st – Niddel / MLSec Project @alexcpsec
@MLSecProject
Alex Pinto • That guy that started MLSec Project • Chief Data Scien2st at Niddel • Machine Learning Researcher focused on Security Data • Network security and incident response aficionado • Tortured by Log Management / SIEMs as a child • A BIG FAN of AMack Maps #pewpew • Does not know who hacked Sony • Has nothing to do with aMribu2on in “Opera2on Capybara”
() { :; }; whoami
• Cyber War… Threat Intel – What is it good for? • Combine and TIQ-‐test • Using TIQ-‐test • Novelty Test • Overlap Test • Popula2on Test • Aging Test • Uniqueness Test
• Use case: Feed Comparison
Agenda
What is TI good for anyway?
• 3) How about actual defense? • Use it as blacklists? As research data? • Thing is RAW DATA is hard to work with
(Semi-‐)Required Reading
• #2qtest slides: hMp://bit.ly/2qtest • RPubs Page: hMp://bit.ly/2qtest-‐rpubs
Combine and TIQ-‐Test • Combine (hMps://github.com/mlsecproject/combine) • Gathers TI data (ip/host) from Internet and local files • Normalizes the data and enriches it (AS / Geo / pDNS) • Can export to CSV, “2q-‐test format” and CRITs • Coming soon: CybOX / STIX (ty @kylemaxwell)
• TIQ-‐Test (hMps://github.com/mlsecproject/2q-‐test) • Runs sta2s2cal summaries and tests on TI feeds • Generates charts based on the tests and summaries • WriMen in R (because you should learn a stat language)
Using TIQ-‐TEST • Available tests and sta2s2cs: • NOVELTY – How ogen do they update themselves? • OVERLAP – How do they compare to what you got? • POPULATION – How does this popula2on distribu2on compare to another one ? • AGING – How long does an indicator sit on a feed? • UNIQUENESS – How many indicators are found in only one feed?
Using TIQ-‐TEST – Data Prep • Extract the “raw” informa2on from indicator feeds • Both IP addresses and hostnames were extracted
Using TIQ-‐TEST – Data Prep • Convert the hostname data to IP addresses: • Ac2ve IP addresses for the respec2ve date (“A” query) • Passive DNS from Farsight Security (DNSDB)
• For each IP record (including the ones from hostnames): • Add asnumber and asname (from MaxMind ASN DB) • Add country (from MaxMind GeoLite DB) • Add rhost (again from DNSDB) – most popular “PTR”
Popula<on Test • Let us use the ASN and GeoIP databases that we used to enrich our data as a reference of the “true” popula2on.
• But, but, human beings are unpredictable! We will never be able to forecast this!
Can we get a beSer look? • Sta2s2cal inference-‐based comparison models (hypothesis tes2ng) • Exact binomial tests (when we have the “true” pop) • Chi-‐squared propor2on tests (similar to independence tests)
Uniqueness Test
• “Domain-‐based indicators are unique to one list between 96.16% and 97.37%”
• “IP-‐based indicators are unique to one list between 82.46% and 95.24% of the 2me”
• Some of you are probably like: • “You Data Scien2sts and your
algorithms, how quaint.” • “Why aren’t you doing some useful
research like na2on-‐state aMribu2on?”
OPTION 1: Cool Story, Bro!
• How about using TIQ-‐TEST to evaluate a private intel feed? • Trying stuff before you buy is usually a good idea. Just sayin’ • Let’s compare a new feed, “private1”, against our combined
outbound indicators
Use Case: Comparing Private Feeds
Aging Test
• I guess most DGAs rotate every 24 hours, right? • Rota2on means the private data is s2ll “fresh”, from research or
DGA genera2on procedures
Mostly DGA Related Churn
MLSec Project • Both projects are available as GPLv3 by MLSec Project
• Doing ML research on Security? Let us know! • Puvng together a trust group to share experiences and develop
open-‐source tools to help with data gathering and analysis • Liked TIQ-‐TEST? We can help benchmark your private feeds
using these and other techniques • Visit hSps://www.mlsecproject.org , message @MLSecProject
or just e-‐mail me.
• Come talk to me about Niddel! • Private Beta of Magnet, the Machine Learning-‐powered Threat Intelligence Plaworm.
• Our models extrapolate the knowledge of exis2ng threat intelligence feeds as experienced analysis would. • Models make use of the same data analyst would have. • Automa2cally triages and hunts on pivots of enriched informa2on
Don’t want to do all this work?
Take Aways
• Analyze your data. Extract value from it! • Try before you buy! Different test results mean different things to different orgs.
• Try the sample data, replicate the experiments: • hMps://github.com/mlsecproject/2q-‐test-‐Winter2015 • hMp://rpubs.com/alexcpsec/2q-‐test-‐Winter2015
Greets and Thanks!
• @kylemaxwell, @paul4pc for helping with COMBINE
• @bfist for his work on hMp://sony.aMributed.to
• @hrbrmstr for IPEW and chart revisions for TIQ-‐TEST
• All the MLSec Project community peps!