Workshop: Big Data Visualization for Security
Raffael Marty, CEO
Big Data Visualization for Security
UE14 - Romania September 2014
Security. Analytics. Insight.
I am Raffy - I do Viz!
IBM Research
Introduction
Data Sources
DAVIX
Log Data Processing
Agenda
• Big Data Ecosystem
• Security Big Data Tools
• Managing Security Data
• Visualizing Big Data
http://www.bigdatalandscape.com/
Big Data - The Three V’s
• Volume
• Velocity
• Variety
The Big Data Ecosystem
Hadoop Ecosystem
• Mahout: machine learning
• Hive: data warehouse
  • HiveQL query language
• Pig: programming language (Pig Latin)
• HBase: big data store
  • random, real-time read/write access
  • auto sharding
• MapReduce
• Impala: interactive SQL queries
• HDFS: distributed file system
  • data redundancy, fault tolerance
  • append only
  • namenode / datanode architecture
• ZooKeeper: centralized “brain”
• Sentry
• Storm
Berkeley Data Analysis Stack (BDAS)
https://amplab.cs.berkeley.edu/software/
SparkSQL
Elastic Search
http://elasticsearch.org
• Schema free & document oriented
• Simple HTTP interface
• Indexes JSON documents
• Queries, aggregations, highlighting, etc.
• Distributed - super easy to add nodes
• Real-time indexing
• Based on Lucene
• Replication
• Partitioning / sharding
  • how an index is assigned to nodes
• Snapshots
Up and running in 10 minutes!!
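The "simple HTTP interface" and "indexes JSON documents" points can be illustrated by preparing a request body for the bulk endpoint. This is a minimal sketch, assuming a hypothetical index name `logs` and type name `syslog` (types existed in the 2014-era releases and were removed in later ones). It only builds the newline-delimited body and does not contact a cluster.

```python
import json


def bulk_index_body(docs, index="logs", doc_type="syslog"):
    """Build a newline-delimited body for the Elasticsearch _bulk endpoint.

    Every document is preceded by an action line. The index and type
    names are hypothetical defaults, not values from the workshop.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = [{"text": "Accepted publickey for ram", "host": "pixl-ram"}]
body = bulk_index_body(docs)
print(body)
```

The resulting body would be sent with an HTTP POST to the `_bulk` endpoint of the cluster, for example on port 9200 of a local node.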
Elastic Search - Admin Interface
Big Data Security Tools
ELK Stack
• Elastic Search
• LogStash
• Kibana
LogStash http://logstash.net/
input -> filter -> output
http://www.elasticsearch.org/overview/logstash
logstash http://logstash.net/
• input: files, syslog, email, tcp socket, Flume, AMQP, STOMP, Beanstalk, redis, twitter, HTTP
• filter: timestamp parsing, anonymize, drop events, parse fields (grok), multiline, joins
• output: ElasticSearch, Graylog2/GELF, MongoDB, Nagios, TCP, syslog, WebSockets, AMQP, STOMP, beanstalk, redis
• messaging (input and output): AMQP, STOMP, beanstalk, redis
• formats: avro, msgpack, thrift, xml, protobuf, csv
Storing and Indexing Logs
Raw log:
Aug 2 13:29:58 pixl-ram sshd[1631]: Accepted publickey for ram from 192.168.30.1 port 49864 ssh2
Non parsed:
{"text": "Aug 2 13:29:58 pixl-ram sshd[1631]: Accepted publickey for ram from 192.168.30.1 port 49864 ssh2"}
Parsed (through grok in LogStash):
{"text": "Aug 2 13:29:58 pixl-ram sshd[1631]: Accepted publickey for ram from 192.168.30.1 port 49864 ssh2", "time": "Aug 2 13:29:58", "host": "pixl-ram", "process": "sshd", "pid": 1631}
-> structured search: time > "Aug 1 2014"
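The step from the raw log line to the parsed document can be sketched in plain Python. This is not grok itself: the regular expression below is a simplified stand-in written for this one sshd line, and only the field names (text, time, host, process, pid) come from the slide.

```python
import re

# Simplified stand-in for a grok syslog pattern (an assumption, not the
# pattern LogStash ships with). Field names follow the slide.
SYSLOG_RE = re.compile(
    r"^(?P<time>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<process>[\w\-/]+)\[(?P<pid>\d+)\]: "
    r".*$"
)


def parse_syslog(line):
    """Return the non-parsed document, plus extracted fields when the line matches."""
    doc = {"text": line}
    match = SYSLOG_RE.match(line)
    if match is not None:
        doc.update(match.groupdict())
        doc["pid"] = int(doc["pid"])  # numeric fields allow range queries
    return doc


raw = ("Aug 2 13:29:58 pixl-ram sshd[1631]: Accepted publickey for ram "
       "from 192.168.30.1 port 49864 ssh2")
doc = parse_syslog(raw)
print(doc)
```

Lines that do not match fall back to the non-parsed form, so nothing is lost when a pattern is missing.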
Grok
• Instead of re-writing regexes
• Ships with about 100 patterns
• Patterns you don't have to write yourself
• It is easy to add new patterns
HOSTNAME \b(?:[0-9A-Za-z].......
IP (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]…
IPORHOST (?:%{HOSTNAME}|%{IP})
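How named patterns such as IPORHOST compose from HOSTNAME and IP can be sketched with a few lines of Python. The pattern definitions below are simplified assumptions, not the ones shipped with LogStash, and `expand` is a hypothetical helper that mimics the `%{NAME}` and `%{NAME:field}` reference syntax.

```python
import re

# Toy pattern database. These definitions are simplified assumptions and
# are NOT the patterns shipped with LogStash.
PATTERNS = {
    "HOSTNAME": r"\b[0-9A-Za-z][0-9A-Za-z-]*(?:\.[0-9A-Za-z][0-9A-Za-z-]*)*\b",
    "IP": r"(?<![0-9])(?:\d{1,3}\.){3}\d{1,3}(?![0-9])",
    "IPORHOST": r"(?:%{HOSTNAME}|%{IP})",
}

REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(pattern):
    """Recursively replace %{NAME} and %{NAME:field} with regular expressions."""
    def replace(match):
        name, field = match.group(1), match.group(2)
        inner = expand(PATTERNS[name])
        if field:
            return "(?P<%s>%s)" % (field, inner)
        return "(?:%s)" % inner
    return REFERENCE.sub(replace, pattern)


regex = re.compile(expand(r"from %{IPORHOST:src} port (?P<port>\d+)"))
m = regex.search("Accepted publickey for ram from 192.168.30.1 port 49864 ssh2")
print(m.group("src"), m.group("port"))
```

The point of the design is reuse: a new pattern is one more dictionary entry built from existing names rather than a new hand-written regex.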
ElasticSearch on Grokked Data
• Automatic schema inference
• Assigns analyzers (prefix indexing, etc.)
• Field properties:
  • “store” [field and document level]
  • “index”:
    • “analyzed”: tokenized, analyzed
    • “not_analyzed”: indexed as is
    • “no”: no indexing
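The field properties above can be made concrete with an explicit mapping for the parsed sshd event. This is a sketch in the 2014-era mapping style the slide describes ("string" fields with an "index" setting); the type name `syslog` and the `raw_copy` field are hypothetical, and later Elasticsearch releases replaced these settings.

```python
import json

# Hypothetical explicit mapping for the parsed sshd event.
mapping = {
    "mappings": {
        "syslog": {  # hypothetical type name
            "properties": {
                # Tokenized and analyzed: supports full-text search.
                "text": {"type": "string", "index": "analyzed"},
                # Indexed as is: exact matches on whole values.
                "host": {"type": "string", "index": "not_analyzed"},
                "process": {"type": "string", "index": "not_analyzed"},
                "pid": {"type": "integer"},
                # Stored but not searchable (hypothetical extra field).
                "raw_copy": {"type": "string", "index": "no", "store": True},
            }
        }
    }
}

print(json.dumps(mapping, indent=2))
```

Keeping `host` as "not_analyzed" means a value like pixl-ram is matched whole instead of being split into tokens at the hyphen.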
Grok Patterns
Pattern database located in:
/opt/logstash/patterns
Debug Grok rules:
http://grokdebug.herokuapp.com/
LogStash UI - Kibana
Running ElasticSearch
• Block POST / PUT / DELETE to ES instance
• Older versions:
  script.disable_dynamic: true
  action.destructive_requires_name: true
• Use aliases to allow only certain users access to certain indexes
• Use iptables to block ports (9200, 9300, …)
• Performance tuning:
  • https://www.loggly.com/blog/nine-tips-configuring-elasticsearch-for-high-performance/
Running LogStash
For debugging:
logstash -e 'input { … } … output { … }'
Other command line parameters:
-w <number of cores>
--debug

input {
  stdin { type => "stdin-type" }
  file {
    type => "syslog-ng"
    path => [ "/var/log/*.log", "/var/log/messages" ]
  }
}
output {
  stdout { }
  elasticsearch {
    embedded => false
    host => "192.168.0.23"
    cluster => "logstash-cluster"
    node_name => "logstash"
    protocol => "node"
  }
}
Act as an ES node, not as an unknown client
Running Kibana
• Authentication not built in
• Use nginx as a proxy
For example:
https://github.com/elasticsearch/kibana/blob/master/sample/nginx.conf
Moloch
https://github.com/aol/moloch
Open source, large scale IPv4 packet capturing, indexing and database system powered by Elasticsearch.
Web interface for PCAP browsing, searching, reporting, and exporting PCAPs
Moloch - Components
• Capture
  • Sniffs the network interface
  • Parses the traffic and creates the Session Profile Information (aka SPI-Data)
  • Writes the packets to disk
• Database
  • Elasticsearch is used for storing and searching through the SPI-Data
• Viewer
  • A web interface that allows for GUI and API access from remote hosts
Moloch - Capture - SPI-Data Types
• Moloch parses various protocols to create SPI-Data:
  • IP
  • HTTP
  • DNS
    • IP Address
    • Hostname
  • IRC
    • Channel Names
  • SSH
    • Client Name
    • Public Key
  • SSL/TLS
    • Certificate elements of various types (common names, serial, etc.)
• This is not an all-inclusive list
Moloch - A Couple of Additions
• Web APIs
  • Access meta information
  • Grab PCAPs
• Indexing PCAP files:
  ${moloch_dir}/bin/moloch-capture -c [config_file] -r [pcap_file]
PacketPig
https://github.com/bigsnarfdude/packetpig
• Analyze PCAP files using Apache Pig
• A number of scripts are available
  • e.g., running SNORT on the PCAPs
pig -x local \
    -f pig/examples/binning.pig \
    -param pcap=data/web.pcap
copyright (c) 2014 pixlcloud | turning data into actionable insights
Security Onion
http://securityonion.blogspot.com/
• Bro IDS, your choice of Snort or Suricata, Sguil analyst console, ELSA, Squert, Snorby and capME web interfaces
• All set up to work with each other out of the box
Storing Security Data
Data Type and Use
• What data do you have?
  • PCAP
  • Flows
  • Context (e.g., threat feeds)
  • “Text” logs
• What’s your use-case?
  • Search
  • Analytics
  • Forensics on PCAP
Index -> Elastic Search
Columnar, SQL enabled
Moloch? Or extract meta data and store PCAP in HDFS/HBase
PCAP in HDFS or HBase
Row or columnar, fixed schema?
Unstructured in ElasticSearch, enrich on ingestion?
ES or relational
OpenSOC
Raffael . Marty @ pixlcloud . com
Visualization
Visualization To …
• Present / Communicate
• Discover / Explore
Show Context
42
Show Context
42 is just a number and means nothing without context
Use Numbers To Highlight Most Important Parts of Data
Numbers
Summaries
Visualization Creates Context
Visualization Puts Numbers (Data) in Context!
Principles of Analytic Design
• Show comparisons, contrasts, differences
• Show causality, mechanism, explanation, systematic structure.
• Show multivariate data; that is, show more than 1 or 2 variables.
by Edward Tufte
Add Context
Additional information about objects, such as:
• machine
  • roles
  • criticality
  • location
  • owner
  • …
• user
  • roles
  • office location
  • …
[Figure labels: source, destination, machine and user context, machine role]
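Adding machine context to a flow record before visualizing it can be sketched as a simple lookup. The inventory below, its addresses, and the field names (`src`, `dst`, `role`, `criticality`, `owner`) are all hypothetical; in practice the context would come from an asset database or directory.

```python
# Hypothetical asset inventory keyed by IP address.
MACHINES = {
    "10.0.0.2": {"role": "web server", "criticality": "high", "owner": "ops"},
    "10.0.0.7": {"role": "workstation", "criticality": "low", "owner": "ram"},
}
UNKNOWN = {"role": "unknown", "criticality": "unknown", "owner": "unknown"}


def add_context(flow):
    """Return a copy of the flow with context for its source and destination."""
    enriched = dict(flow)
    for side in ("src", "dst"):
        info = MACHINES.get(flow[side], UNKNOWN)
        for key, value in info.items():
            enriched["%s_%s" % (side, key)] = value
    return enriched


flow = {"src": "10.0.0.7", "dst": "10.0.0.2", "dst_port": 80}
enriched = add_context(flow)
print(enriched)
```

The added fields (for example `dst_role`) can then drive node color or size in the graph instead of leaving every address looking the same.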
Traffic Flow Analysis With Context
Visualize Me Lots (>1TB) of Data
SecViz is Hard!
Data Visualization Workflow
Overview -> Zoom / Filter -> Details on Demand
Principle by Ben Shneiderman
Backend Support
This visualization process requires:
• Low latency, scalable backend (columnar, distributed data store)
• Efficient client-server communications and caching
• Assistance of data mining to
  • Reduce overall data to look at
  • Highlight relationships, patterns, and outliers
  • Assist analyst in focussing on ‘important’ areas
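"Reduce overall data to look at" and "highlight outliers" can be illustrated with a small aggregation step that runs before anything is drawn. This is only a sketch: the flow field names are hypothetical, and the outlier rule (count above a multiple of the mean) is an arbitrary assumption chosen for brevity, not a recommended detection method.

```python
from collections import Counter


def aggregate_flows(flows):
    """Collapse individual flows into (src, dst, port) edges with counts."""
    return Counter((f["src"], f["dst"], f["dst_port"]) for f in flows)


def outliers(counts, factor=2.0):
    """Return edges whose count exceeds `factor` times the mean edge count.

    The threshold rule is an arbitrary assumption for illustration.
    """
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return sorted(edge for edge, n in counts.items() if n > factor * mean)


# Fourteen flows: ten on one edge, one each on four others.
flows = [{"src": "10.0.0.7", "dst": "10.0.0.2", "dst_port": 80}] * 10
flows += [{"src": "10.0.0.%d" % i, "dst": "10.0.0.2", "dst_port": 22}
          for i in range(3, 7)]

counts = aggregate_flows(flows)
print(len(flows), "flows reduced to", len(counts), "edges")
print(outliers(counts))
```

The visualization then draws one weighted edge per aggregate instead of one mark per log line, which is what keeps the overview step responsive.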
Visualization Tools
Mondrian
http://www.theusrus.de/Mondrian/
• Graphs:
  • Histogram
  • Box plots
  • Scatterplot
  • Mosaicplots
  • Parallel Coordinates
  • ...
• Linking, brushing, …
• Reads CSV files
Treemap 4.1
www.cs.umd.edu/hcil/treemap
TM3 input files:
Source    Port     Destination  Action
STRING    INTEGER  STRING       STRING
10.0.0.2  80       23.2.1.2     failed
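Producing such a file from parsed records can be sketched as follows, assuming the layout shown on the slide: a line of field names, a line of field types, then one line per record, with tabs as separators. `write_tm3` is a hypothetical helper, and the Treemap documentation should be checked for any additional columns the tool expects.

```python
import csv
import io


def write_tm3(rows, names, types):
    """Render records as TM3 text: names, then types, then one line per record."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(names)
    writer.writerow(types)
    writer.writerows(rows)
    return out.getvalue()


tm3 = write_tm3(
    rows=[("10.0.0.2", 80, "23.2.1.2", "failed")],
    names=["Source", "Port", "Destination", "Action"],
    types=["STRING", "INTEGER", "STRING", "STRING"],
)
print(tm3)
```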
Gephi http://gephi.org
• Gephi UI
  • interactive link graphs
  • multiple layout algorithms
  • reads: CSV, DOT, GDF, etc.
  • graph metrics
• Gephi Toolkit
  • APIs
• Gephi Plugins
• Gephi ‘Platform’
  • adding JavaFX components
Visually Finding Insight in Gephi
1. Loading Data
2. Run Layout Algorithm (Force Atlas 2)
3. Use Degree as color and size of nodes
6. Use Preview and export Graph
AfterGlow - Creating DOT/GDF Files From CSV
CSV File -> Parser -> Graph Language File -> Grapher
CSV file:
aaelenes,Printing Resume
abbe,Information Encryption
aanna,Patent Access
aatharuy,Ping
Graph language file:
digraph structs {
  graph [label="AfterGlow 1.5.8", fontsize=8];
  node [shape=ellipse, style=filled, fontsize=10, width=1, height=1, fixedsize=true];
  edge [len=1.6];
  "aaelenes" -> "Printing Resume" ;
  "abbe" -> "Information Encryption" ;
  "aanna" -> "Patent Access" ;
  "aatharuv" -> "Ping" ;
}
cat file | ./afterglow -c simple.properties -t | neato -Tgif -o test.gif
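The transformation from a two-column CSV file to a DOT graph can be sketched in a few lines of Python. This is a stand-in that illustrates the idea, not AfterGlow itself: it does none of AfterGlow's property-file driven coloring, clustering, or filtering, and `csv_to_dot` is a hypothetical helper.

```python
import csv
import io


def csv_to_dot(csv_text, name="structs"):
    """Turn two-column CSV (source,target) into a DOT digraph."""
    def quote(value):
        # Escape backslashes and double quotes inside DOT string literals.
        return '"%s"' % value.replace("\\", "\\\\").replace('"', '\\"')

    lines = ["digraph %s {" % name]
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        lines.append("  %s -> %s;" % (quote(row[0].strip()), quote(row[1].strip())))
    lines.append("}")
    return "\n".join(lines)


dot = csv_to_dot("aaelenes,Printing Resume\nabbe,Information Encryption\n")
print(dot)
```

The output can be piped to a Graphviz layout program such as neato in the same way as the AfterGlow command above.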
Hands On
Processing Pipeline
1. Get data into ElasticSearch
   Parse data first, then store in ES
2. Get data out of ES (query)
   Get into data format for visualization tool (e.g., CSV)
   Process the data (aggregation, enhancement, etc.)
3. Visualize in the visualization tool
   Potentially translate CSV into other format (e.g., DOT, GDF)
LogStash Setup - Exercise
1. Check out /home/davix/ue14
   logstash-syslog.conf [read, understand!]
2. Run logstash and index data:
   sudo /opt/logstash/bin/logstash -f logstash-syslog.conf
   head -10 firewall | nc localhost 5000    # send data
3. Check what’s in LogStash:
   sudo /etc/init.d/logstash-web start
   open http://localhost:9292    # kibana
4. Use script to extract data
   read_es.py [check out the script]
   update the script to output a (src_ip, dst_ip, dst_port) tuple
5. Convert the CSV output to a GDF file to then load into Gephi
   OR create a TM3 file for the treemap tool
curl 'http://localhost:9200/_all/_search?q=ACCEPTED'
curl 'http://localhost:9200/twitter/_search?q=user:kimchy'
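Steps 4 and 5 can be sketched without a running cluster by working on a search response that has already been decoded from JSON. This is not the workshop's read_es.py, whose contents are not shown here. The field names `src_ip`, `dst_ip`, and `dst_port` are assumptions that depend on the grok pattern in use, and both functions are hypothetical helpers.

```python
def hits_to_tuples(response):
    """Pull (src_ip, dst_ip, dst_port) out of an Elasticsearch search response.

    Field names are assumptions; use whatever the grok pattern produced.
    Hits that lack any of the three fields are skipped.
    """
    tuples = []
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        if all(key in source for key in ("src_ip", "dst_ip", "dst_port")):
            tuples.append(
                (source["src_ip"], source["dst_ip"], int(source["dst_port"]))
            )
    return tuples


def tuples_to_gdf(tuples):
    """Render the tuples as GDF text: a node section, then an edge section."""
    nodes = sorted({ip for src, dst, _ in tuples for ip in (src, dst)})
    lines = ["nodedef>name VARCHAR"]
    lines += nodes
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR,dst_port VARCHAR")
    lines += ["%s,%s,%s" % t for t in tuples]
    return "\n".join(lines) + "\n"


# Shape of a decoded search response, with made-up sample documents.
response = {"hits": {"total": 2, "hits": [
    {"_source": {"src_ip": "10.0.0.7", "dst_ip": "10.0.0.2", "dst_port": "80"}},
    {"_source": {"text": "line without parsed fields"}},
]}}

tuples = hits_to_tuples(response)
gdf = tuples_to_gdf(tuples)
print(gdf)
```

Writing the tuples out with Python's csv module instead gives the CSV form that AfterGlow or a TM3 conversion would start from.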
BlackHat Europe - Workshop
VISUAL ANALYTICS DELIVERING ACTIONABLE SECURITY INTELLIGENCE
October 14, 15 - Amsterdam
copyright (c) 2013 pixlcloud | turning data into actionable insights
Security Visualization Community
Share, discuss, challenge, and learn about security visualization.
• http://secviz.org
• List: secviz.org/mailinglist
• Twitter: @secviz