
Monitoring repositories for FUN and PROFIT

@snyff

About me

● Security consultant (C.T.O.) working for Securus Global in Melbourne

● PentesterLab (.com):
○ cool/awesome (web) *free* training/exercises
○ real-life scenarios

Disclaimer

● No code is going to be released today

● No repositories were harmed during the preparation of this talk

● I worked on Web and Open Source projects

● I worked on commits without using the entire project's source code

Why work on commits?

● Corporate development:
○ Cannot review all projects anymore
○ Nice to have a “what to check today”
○ Sort commits by criticality
○ Detect backdoors

● Agile development:
○ The code changes every day
○ Can’t rely on a one-time code review anymore
○ Current approach: daily scan

Why work on commits?

● You have vulnerabilities:
○ Detect patches affecting your bugs
○ Detect changes to sensitive functions

Why work on commits?

● You want vulnerabilities ($$):
○ Detect new features with dangerous functions
○ Detect changes to sensitive functions

Why work on commits?

● You want bugs (lulz):
○ Get bugs a few hours before the patch is available
○ Get a list of bad-practice examples
○ Detect silent patching

What's a repository?

● Developers

● Files

● Commits

● And all of these are constantly moving...

Developers

● Main developer(s):
○ Add features
○ Fix bugs

● Cosmetic committer(s):
○ Change comments (fix typos)
○ Change the design of the website
○ Change indentation
○ Add documentation

● External people:
○ Do a bit of everything

Files

● README/LICENSE files

● Templates, HTML, CSS

● Images

● Code:
○ Libraries
○ Installation code
○ "normal" code

Commits

● Developer's name

● Code changes:
○ Changes: diff
○ Files changed
○ Number of deletions/additions

● Date/Time of the commit

● Message
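All of this is cheap to pull out of a local clone. A minimal sketch (not from the talk; plain git shelled out from Ruby, with a format string of my own choosing):

# Sketch: extract per-commit metadata — author, dates, message, and diff stats.
sha = 'HEAD'   # any commit id
meta = `git show -s --pretty=format:'%an;%ae;%at;%ct;%s' #{sha}`
author_name, author_email, author_date, commit_date, message = meta.split(';', 5)

# --numstat prints "insertions TAB deletions TAB file" for each file changed.
stats = `git show --pretty=format: --numstat #{sha}`.lines.map(&:strip).reject(&:empty?)
files     = stats.map { |l| l.split("\t").last }
additions = stats.sum { |l| l.split("\t")[0].to_i }
deletions = stats.sum { |l| l.split("\t")[1].to_i }

puts "#{author_name} (#{author_email}): #{files.size} files, +#{additions}/-#{deletions}"
puts "message: #{message}"
puts "author date #{Time.at(author_date.to_i)} vs commit date #{Time.at(commit_date.to_i)}"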

Examples of projects monitored

Stats (on the last 5000 commits)

● Commits per week:
○ anywhere between 20 and 180 (phpmyadmin) per week
○ 40 commits per week seems to be the average for "normal/interesting" projects

● Authors:
○ between 1 and 140

● Average commit: 200 lines (insertions+deletions)
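The per-week and per-author numbers above can be reproduced with a few lines of Ruby and git; a rough sketch (the format string and week bucketing are my own choices, not SANZARU's):

# Sketch: commit frequency and author count over the last 5000 commits.
log = `git log -n 5000 --pretty=format:'%H;%ae;%ct'`.lines

commits_per_week = Hash.new(0)
authors          = Hash.new(0)

log.each do |line|
  _sha, email, epoch = line.strip.split(';')
  week = Time.at(epoch.to_i).strftime('%Y-%W')   # year + week number bucket
  commits_per_week[week] += 1
  authors[email]         += 1
end

puts "authors: #{authors.size}"
puts "average commits/week: #{commits_per_week.values.sum / commits_per_week.size.to_f}"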

Goals...

Goals: counterexample

Goals: example

Goals: example

Goals: example

Filtering...

Filtering files

● General approach:
○ images
○ css
○ README

● Framework based:
○ tests (interesting to keep for some projects)
○ database migration/creation scripts

● Project-based files:
○ deployment
○ installation files
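As a rough illustration of this filtering step (a sketch only; the extension and path lists are examples, not SANZARU's real configuration):

# Sketch: drop files that rarely matter for a security review.
IGNORED_EXTENSIONS = %w[.png .jpg .gif .css .md .tpl .html].freeze
IGNORED_PATTERNS   = [/README/i, /LICENSE/i, /\btests?\//, /db\/migrate\//].freeze

def interesting_file?(path)
  return false if IGNORED_EXTENSIONS.include?(File.extname(path).downcase)
  return false if IGNORED_PATTERNS.any? { |re| path =~ re }
  true
end

# Files touched by a given commit:
files = `git show --pretty=format: --name-only HEAD`.lines.map(&:strip).reject(&:empty?)
puts files.select { |f| interesting_file?(f) }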

Filtering developers

● For a given project, find the "cosmetic developers"

● Don't get me wrong, they are not useless; they just do things I don't care about

Results

● Around 5-10% of commits have nothing to do with code...

● You can divide the size of most other commits by 2-3 if you ignore the noise (files/comments/...):
○ new code with test cases
○ modifications in comments
○ ...

Classification

Data mining

● Take your samples (commits):
○ Extract a vector from each sample
○ Classify each sample

● From a training set, learn to classify the data

● Apply what you learned:
○ to the same training set after splitting it (cross-validation)
○ to new samples

Data mining

● training set:
[1,2,3,0,10,220]  -> bugfix
[2,4,3,0,1,0]     -> boring
[2,5,3,3,1,1]     -> boring
[20,1,0,100,0,10] -> new bug

● testing:
[23,0,1,90,0,15]  -> ???
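A toy version of the idea, with a hand-written 1-nearest-neighbour classifier (my own illustration; the actual classification in SANZARU is done with Weka, see later):

# Toy 1-nearest-neighbour classification of commit vectors.
TRAINING = {
  [1, 2, 3, 0, 10, 220]  => :bugfix,
  [2, 4, 3, 0, 1, 0]     => :boring,
  [2, 5, 3, 3, 1, 1]     => :boring,
  [20, 1, 0, 100, 0, 10] => :new_bug
}

def distance(a, b)
  Math.sqrt(a.zip(b).map { |x, y| (x - y)**2 }.sum)
end

def classify(vector)
  TRAINING.min_by { |sample, _label| distance(sample, vector) }.last
end

p classify([23, 0, 1, 90, 0, 15])   # => :new_bug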

Extracting a vector

● You can't really say a commit is close to another commit

● You need to generate a vector from each commit to compare them

● Once you have done that, everything else is just magic^W Maths

Extracting a vector: getting data

● Number of lines changed:
○ insertions vs deletions

● Number of words changed (--word-diff):
○ insertions vs deletions

● Authors:
○ rating of authors based on the project's history:
■ "fixing" score
■ "vulnerability creator" score
○ new developers
○ known security researchers

Extracting a vector: getting data

● Number of "dangerous" functions:○ insertion○ deletion

● Number of "filtering" functions:○ insertion○ deletion

● commit date vs author date

● Keywords in the message and in the code

Extracting a vector: getting data

● Files modified:
○ already implicated in a bug fix
○ already implicated in a vulnerability
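Putting these features together, a sketch of how a commit could be turned into a vector (a simplified reconstruction; SANZARU's real feature set and ordering are not public, and the signature lists are examples):

# Sketch: turn one commit into a numeric feature vector.
DANGEROUS = /\b(system|exec|eval|popen|create_function|echo|send)\b/
FILTERING = /\b(htmlentities|intval|basename|escape|assert)\b/

def commit_vector(sha)
  diff    = `git show --pretty=format: --unified=0 #{sha}`
  added   = diff.lines.select { |l| l.start_with?('+') && !l.start_with?('+++') }
  removed = diff.lines.select { |l| l.start_with?('-') && !l.start_with?('---') }

  [
    added.size,                              # lines inserted
    removed.size,                            # lines deleted
    added.count   { |l| l =~ DANGEROUS },    # dangerous functions added
    removed.count { |l| l =~ DANGEROUS },    # dangerous functions removed
    added.count   { |l| l =~ FILTERING },    # filtering functions added
    removed.count { |l| l =~ FILTERING }     # filtering functions removed
  ]
end

p commit_vector('HEAD')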

Filtering vs Dangerous

● Good list of "dangerous" signatures from graudit:
○ https://github.com/wireghoul/graudit/

● Weighting is *really* important (sketched below):
○ echo -> potential XSS -> 1 point
○ system -> potential command execution -> 10 points

● Some functions are in both lists:
○ crypto functions, for example
○ crypto can be dangerous but can filter as well
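A sketch of that weighting (the scores and signatures are illustrative, not the exact graudit-derived list):

# Sketch: weighted scoring of lines added by a commit.
WEIGHTS = {
  /\becho\b/         => 1,    # potential XSS
  /\bpreg_replace\b/ => 5,
  /\bsystem\b/       => 10,   # potential command execution
  /\bexec\b/         => 10
}

def danger_score(added_lines)
  added_lines.sum do |line|
    WEIGHTS.sum { |pattern, weight| line =~ pattern ? weight : 0 }
  end
end

p danger_score(['echo $_GET["name"];', 'system($cmd);'])   # => 11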

Filtering vs Dangerous

[Slide: word cloud of example functions — system, exec, echo, create_function, preg_replace, eval, send, open3, popen, htmlentities, intval, basename, File.basename, escape, assert, attr_accessible, attr_protected]

Keywords

[Slide: word cloud of commit-message keywords — SQL injection, Command exec, Cross Site Scripting, XSS, CSRF, CVE, Code execution, Directory traversal, vulnerability, Security, Risky, Dangerous, disclosure, Typo, CSS rules, CSS selector, Version number, Changelog, description, Documentation, punctuation]

Classification

● Fixed bugs:
○ learn from dangerous keywords

● New bugs:
○ git blame
○ read the source code and classify manually

● Potentially interesting new feature:
○ read the source code
○ can be a new bug
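For the git blame step, a minimal example of what gets run before the manual read-through (file and line number are hypothetical):

# Sketch: find who last touched a suspicious line before classifying it by hand.
file = 'lib/example.rb'   # hypothetical file flagged by the vector step
line = 42                 # hypothetical line number
puts `git blame -L #{line},#{line} -- #{file}`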

Results

● Vector computation:
○ between 15 and 120 minutes for 5000 commits

● Classification:
○ less than a minute

● Scoring:
○ 90% success rate on bug fixes (without using the message as part of the vector)
○ 50/50 between FP and FN on bug fixes
○ 200 commits down to 5-10 bugs per day

My tool: SANZARU

● Japanese names for tools make you a Ninja ;)

● Ruby based (what else...)

● Data Mining done with Weka (thx Silvio)

SANZARU: virtuous circle

● Made in a way that the more you learn about a project, the more effective it gets :)

● Score authors through learning

● Score files through learning

● Add functions used by the project

SANZARU: "learning mode"

● Takes the last 5k commits and gives you the list of impacted files and authors with a weight

● Still working on finding the initial bug's author, but it doesn't really give you more information
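A sketch of what that learning pass could look like (my own reconstruction; spotting past fixes through message keywords is an assumption, not necessarily the talk's exact logic):

# Sketch: weight authors and files by how often they show up in past security fixes.
FIX_KEYWORDS = /\b(XSS|CSRF|SQL injection|security|vulnerab|CVE)/i

author_weights = Hash.new(0)
file_weights   = Hash.new(0)

`git log -n 5000 --pretty=format:'%H;%ae;%s'`.lines.each do |line|
  sha, email, subject = line.strip.split(';', 3)
  next unless subject =~ FIX_KEYWORDS
  author_weights[email] += 1
  `git show --pretty=format: --name-only #{sha}`.lines.map(&:strip).reject(&:empty?).each do |f|
    file_weights[f] += 1
  end
end

p author_weights.sort_by { |_, w| -w }.first(5)
p file_weights.sort_by   { |_, w| -w }.first(5)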

SANZARU: configuration file

configure({
  :path      => "/home/snyff/code/rails",
  :type      => :git,
  :remote    => "origin/master",
  :origin    => "https://github.com/rails/rails",
  :languages => [ :ruby ]
})

filter({
  :extensions => [ :html, :css, :jpg, :png, :md, :tpl ],
  :files      => ["LICENSE", "*test*"]
})

alert({ :keywords => [keywords_default]... })

SANZARU: configuration file

classify(
  :authors => {
    :default => 0,
    "rafaelmfranca@gmail.com"  => 19,
    "guilleiguaran@gmail.com"  => 15,
    "fxn@hashref.com"          => 8,
    "lrodriguezsanc@gmail.com" => 11,
    "vijaydev.cse@gmail.com"   => 25,
    ....
  },
  :files => {
    :default => 0,
    "activemodel/lib/active_model/mass_assignment_security.rb" => 20,
    "railties/lib/rails/application.rb"                        => 17,
    "actionpack/lib/action_view/helpers/form_helper.rb"        => 17,
    "activerecord/lib/active_record/core.rb"                   => 17,
    ...
  }
)

SANZARU: "classification mode"

● Using Ruby to create all the vectors

● Using Weka to classify the data

● Then manual review of the results:
○ New features, to find security bugs
○ FPs, for possible silent patching
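As an illustration of that Ruby-plus-Weka split (a sketch under my own assumptions: the ARFF layout and the choice of the J48 classifier are examples, and the path to weka.jar is hypothetical):

# Sketch: dump commit vectors to an ARFF file and hand it to Weka's CLI.
require 'tempfile'

def to_arff(rows)   # rows: [[insertions, deletions, dangerous_added, label], ...]
  arff = Tempfile.new(['commits', '.arff'])
  arff.puts '@relation commits'
  arff.puts '@attribute insertions numeric'
  arff.puts '@attribute deletions numeric'
  arff.puts '@attribute dangerous_added numeric'
  arff.puts '@attribute class {bugfix,boring,new_bug}'
  arff.puts '@data'
  rows.each { |r| arff.puts r.join(',') }
  arff.close
  arff.path
end

train = to_arff([[1, 2, 10, 'bugfix'], [2, 4, 0, 'boring'], [20, 1, 5, 'new_bug']])
# Weka's standard command line; -t runs cross-validation on the training file.
puts `java -cp weka.jar weka.classifiers.trees.J48 -t #{train}`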

SANZARU: "daily mode"

● Cron job (every day):
○ update all repositories (hasn't been blacklisted by GitHub... yet); ruby-git is *shit*
○ find alerts in new commits
○ classify new commits
○ give me a nice report with what to read
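A sketch of that daily driver (hypothetical repository paths; the real SANZARU pipeline is not public). Scheduling it is then just a plain crontab entry that runs the script every morning.

# Sketch: naive daily runner — pull each tracked clone, then score the new commits.
REPOS = ['/home/snyff/code/rails', '/home/snyff/code/wordpress']   # hypothetical paths

REPOS.each do |repo|
  Dir.chdir(repo) do
    before = `git rev-parse HEAD`.strip
    system('git pull --quiet origin master')
    after = `git rev-parse HEAD`.strip
    next if before == after
    new_commits = `git rev-list #{before}..#{after}`.lines.map(&:strip)
    puts "#{repo}: #{new_commits.size} new commits to alert on and classify"
    # ...feed each commit to the filtering, alerting and classification steps above...
  end
end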

SANZARU: example of output

Example found this week (not exploitable... yet):

esc_js escapes ' and "... this doesn't

Example found this week:

Example found this morning:

General observations

● Most fixes are:
○ small code insertions (less than 10 lines)
○ basic line substitutions
○ easy to detect

● Most new bugs are:
○ details...
○ really hard to detect statistically
○ general approach: read all potentially interesting commits
○ working on important projects makes the creation of bugs far less likely
○ it's not going to rain 0dayz...

Possible improvements

● Integrating syntactic analysis:
○ regular expressions are just not enough
○ false alerts are time consuming...

● Retrieve information from external sources:
○ bug reports
○ CVEs

● Support for more languages/platforms:
○ Objective-C libraries and applications?
○ Linux kernel?
○ ...

Conclusion

● Easy to detect:
○ (Silent) security fixes
○ New features with "interesting" functions

● Not so easy to detect:
○ New security bugs

● Still worth the time:
○ if you want bugs
○ if you are doing code review, to have examples to learn from or share: vulnerability patterns
○ most frustrating thing you can do?

Questions?

@snyff

● Have a great Ruxcon
● Play the CTF and the Lock Picking
● Remember to check out:
○ PentesterLab.com
○ @PentesterLab

● Thx to everyone who helped me put this talk together