OSSPolice - Identifying Open-Source License Violation and 1-day Security Risk at Large Scale

Ruian Duan, Ashish Bijlani, Meng Xu, Taesoo Kim, Wenke Lee

ACM CCS 2017


Background

• Open Source Software (OSS) is gaining popularity, e.g. GitHub reported 20M users and 57M repos
• Mobile app market grows fast, with over 2M apps on Play Store
• Developers reuse OSS as is for lots of benefits
• Legal risks and security risks arise

Risks in OSS use

• OSS licenses have constraints (e.g. GNU GPL requires derivative works to be open source)
• 1-day vulnerabilities in stale OSS versions are exploited by hackers

For now, GNU GPL is an enforceable contract, says US federal judge!

Artifex Slaps Palm with PDF Reader Copyright Suit

Equifax blames open-source software for its record-breaking security breach

Community Health Systems Breach Possible due to Heartbleed Vulnerability

Goal

• Design a tool, OSSPolice, to analyze Android apps for open-source license violation and 1-day security risk by detecting reuse of OSS and their versions at large scale
• Requirements
  • Accurate detection for hundreds of thousands of OSS
  • Accurate version pinpointing
  • Efficient resource usage
  • Fast search to support vetting a large number of Android apps

Overview and challenges

• Feature selection
  • Source vs binary: automatically building source code is hard, due to dependencies, various build configs, etc.
• Compare app against OSS
  • Fused app binaries: multiple OSS can be linked or compiled into a single file
  • Partial builds and internal code clones: not all OSS features are built into libraries, and OSS reuses other OSS
• Identify OSS versions
  • Cross-match of unique version features: fused app binaries and internal code clones can confuse the provenance of unique features

Source vs binary

• C/C++ OSS are built into stripped native shared libraries (.so files)
• Java OSS are built into obfuscated Dalvik executables (.dex files)

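[Figure: source code, shared library, stripped shared library. Source files Foo.c (void foo() { w = "hello" ... }) and Bar.c (static bar() { w = "world" }) are compiled into a shared library with sections .text, .dynsym, .rodata, .symtab, and .debug_info; the stripped shared library keeps .text, .dynsym, and .rodata.]

[Figure: source code, Dalvik bytecode, obfuscated Dalvik bytecode. The source (package edu.gatech; class Foo { bar() { println("hello world") }; }) becomes .class edu/gatech/Foo with .method bar, const-string v1, "Hello World", and invoke-virtual {v0, v1}, println; after obfuscation it becomes .class a with .method a and the same const-string and invoke-virtual instructions.]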

Feature selection

• C/C++ OSS vs .so files
  • String literal
    • Clang-based lexer for OSS and .rodata for libraries
  • Exported function
    • Clang-based parser for OSS and .dynsym for libraries
• Java OSS vs .dex files
  • String constant
  • Normalized class
    • Captures interaction with framework
  • Function centroid
    • Captures intra-procedural control flow
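
The string-literal feature can be illustrated with a minimal sketch. It pulls printable strings out of a byte blob that stands in for a library's .rodata section and intersects them with strings collected from source. The helper name `extract_strings`, the 4-character minimum, and all sample data are assumptions made for illustration; the slides only state that a Clang-based lexer and .rodata are used.

```python
import re

def extract_strings(blob, min_len=4):
    """Return printable ASCII runs of at least min_len bytes (similar to `strings`)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return {m.group().decode("ascii") for m in pattern.finditer(blob)}

# Hypothetical stand-in for the .rodata bytes of a stripped shared library.
rodata = b"\x00hello\x00\x01\x02world\x00ab\x00"
binary_strings = extract_strings(rodata)

# Hypothetical string literals collected from OSS source files.
source_strings = {"hello", "world", "unused literal"}

# Features seen on both sides are the candidates for matching.
shared = binary_strings & source_strings
print(sorted(shared))
```

String literals survive stripping because they live in .rodata, which is why they remain usable as features when symbol names are gone.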

Fused app binaries

• An app uses multiple OSS
  • |BIN ∩ OSS| / |BIN|
  • |OSS ∩ BIN| / |OSS|
• Iterating over N OSS has O(N) time complexity
• Flag all OSS being used at the same time
  • Index OSS and their versions!

[Figure: the example app edu.gatech.example shown with MuPDF, OpenCV, OpenSSL, OkHttp, MoPub, and Log4j.]

Flat indexing and matching

• Indexing: maps features to OSS
• Matching: look up the feature -> OSS mapping to identify OSS reuse
• Flat indexing blows up the table to 90G after indexing 7K OSS
  • Indexing multiple versions of OSS further adds to the problem
  • Given N OSS with F features and V versions, O(NFV) space complexity
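
A flat index of this kind can be sketched as an inverted map from feature to OSS names. This is a minimal illustration under assumed names (`build_flat_index`, `match_app`) and an assumed 0.6 ratio threshold, not the tool's actual code.

```python
from collections import defaultdict

def build_flat_index(oss_features):
    """Map each feature to the set of OSS names that contain it."""
    index = defaultdict(set)
    for oss, features in oss_features.items():
        for feature in features:
            index[feature].add(oss)
    return index

def match_app(index, oss_features, app_features, threshold=0.6):
    """Return {oss: ratio} for OSS whose matched-feature ratio meets the threshold."""
    hits = defaultdict(int)
    for feature in set(app_features):
        for oss in index.get(feature, ()):
            hits[oss] += 1
    ratios = {oss: n / len(oss_features[oss]) for oss, n in hits.items()}
    return {oss: r for oss, r in ratios.items() if r >= threshold}

# Hypothetical feature sets, named after the labels in the slide figure.
oss_features = {
    "MuPDF": {"feature1", "feature2"},
    "OpenCV": {"feature2", "feature3"},
}
index = build_flat_index(oss_features)
result = match_app(index, oss_features, {"feature1", "feature2", "app_only"})
print(result)
```

One lookup per app feature flags every candidate OSS at once, so matching no longer iterates over all N OSS. The cost is the table itself: it stores one entry per (feature, OSS) pair, and repeating that for every version gives the O(NFV) growth noted above.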

[Figure: flat index in which feature1, feature2, and feature3 map to MuPDF and OpenCV, matched against edu.gatech.example.]

Partial builds and internal code clones

[Figure: source trees at the repo, dir, and file levels. Repos shown: LibJPEG, LibPNG, MuPDF, OpenCV. Directories shown: source, thirdparty, 3rdparty, modules/core, pdf, fitz, test, src. Files shown: pdf-lex.c, test-dev.cpp, opengl.cpp, test-io.cpp, jpeglib.h, png.c, pngtest.c.]

Internal code clones confuse third-party code with core code and require a high match ratio to filter

Partial builds (e.g. examples, tests) cause the match ratio to be low

Hierarchical indexing and matching

• Hierarchical indexing
  • Records source hierarchy to track internal clones
  • Uses the Simhash algorithm to generate IDs for non-leaf nodes for deduplication
  • Records unique features across versions via separate lists
• Hierarchical matching
  • NormScore (TF-IDF based) to promote unique parts when computing the matching ratio of a node
  • Allows partial builds by skipping nodes with low ratio
  • Drops internal code clones by skipping nodes likely to be third-party
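
The Simhash step can be sketched as below: a directory node's ID is derived from the features beneath it, so identical copies of a directory (for example a bundled third-party library) collapse to the same ID. The choice of MD5 as the per-feature hash, the 64-bit width, and the sample feature names are assumptions; the slides only name the Simhash algorithm.

```python
import hashlib

def simhash(features, bits=64):
    """Compute a Simhash fingerprint over an iterable of string features."""
    counts = [0] * bits
    for feature in features:
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Each feature votes +1 or -1 on every bit position.
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if counts[i] > 0)

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

# Hypothetical features found under a cloned directory in two different repos.
clone_in_repo_a = ["png_read", "png_write", "png_error"]
clone_in_repo_b = ["png_error", "png_read", "png_write"]

id_a = simhash(clone_in_repo_a)
id_b = simhash(clone_in_repo_b)
print(id_a == id_b, hamming(id_a, id_b))
```

Because the fingerprint depends only on the multiset of features, two directories with the same content get the same node ID regardless of ordering, and near-identical directories tend to differ in few bits.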

[Figure: hierarchical index in which feature1 to feature3 map to file1 to file3, then to dir 1 through dir 5, then to MuPDF, OpenCV, and LibPNG, matched against edu.gatech.example.]
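
The idea of promoting unique parts when scoring a node can be sketched with a plain inverse-document-frequency weight. The slides do not give the NormScore formula, so the weighting `log(1 + n / df)`, the function names, and the sample nodes here are stand-in assumptions rather than the tool's definition.

```python
import math

def idf_weights(node_features):
    """Weight each feature by how few nodes contain it."""
    n = len(node_features)
    df = {}
    for features in node_features.values():
        for feature in features:
            df[feature] = df.get(feature, 0) + 1
    return {f: math.log(1 + n / count) for f, count in df.items()}

def weighted_ratio(node, app_features, weights):
    """Share of a node's total feature weight that is present in the app."""
    total = sum(weights[f] for f in node)
    if total == 0:
        return 0.0
    matched = sum(weights[f] for f in node if f in app_features)
    return matched / total

# Hypothetical nodes: png_read is cloned into several places.
nodes = {
    "mupdf/source": {"pdf_read", "fz_open", "png_read"},
    "mupdf/thirdparty": {"png_read", "png_write"},
    "libpng": {"png_read", "png_write"},
}
weights = idf_weights(nodes)
app = {"png_read", "png_write"}  # the app only contains LibPNG code

core_ratio = weighted_ratio(nodes["mupdf/source"], app, weights)
libpng_ratio = weighted_ratio(nodes["libpng"], app, weights)
print(round(core_ratio, 3), round(libpng_ratio, 3))
```

In this example the unweighted ratio for mupdf/source would be 1/3, while the weighted ratio is lower because the only matched feature is one shared with other nodes. That is the effect the slide describes: widely cloned features count for less than features unique to a node.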

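Cross-match of unique version features

[Figure: the version strings 1.5.0, 1.6.0, and 1.2.46 and the features foo_string and int bar_func() are indexed under MuPDF (V1.5, V1.6) and LibPNG (V1.2.46, V1.6.0); the app edu.gatech.example contains MuPDF V1.6 and LibPNG V1.2.46.]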

Collocation-based filtering

• Leverage collocation information in the indexing table and binaries
• Use NormScore to assign different weights to features
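
A minimal sketch of collocation-based filtering: a version-unique feature counts toward an OSS version only if another feature from the same source file also appears in the same binary. The data layout, the function name `filter_versions`, and the sample features (borrowed from the labels in the slide figure) are assumptions for illustration.

```python
def filter_versions(version_index, file_features, binary_features):
    """Keep (oss, version) only when a co-located feature is also in the binary."""
    accepted = set()
    for feature in binary_features:
        for oss, version, path in version_index.get(feature, ()):
            neighbours = file_features[(oss, version, path)] - {feature}
            if neighbours & binary_features:
                accepted.add((oss, version))
    return accepted

# The string "1.6.0" is a version feature of two different OSS.
version_index = {
    "1.6.0": [("MuPDF", "1.6", "pdf.c"), ("LibPNG", "1.6.0", "png.c")],
}
# Features that sit in the same source file as the version feature.
file_features = {
    ("MuPDF", "1.6", "pdf.c"): {"1.6.0", "int pdf_read()"},
    ("LibPNG", "1.6.0", "png.c"): {"1.6.0", "int png_read()"},
}
# Hypothetical features extracted from one native library of the app.
binary_features = {"1.6.0", "int pdf_read()"}

naive = {(o, v) for f in binary_features for o, v, _ in version_index.get(f, ())}
filtered = filter_versions(version_index, file_features, binary_features)
print(sorted(naive), sorted(filtered))
```

Without the collocation check both MuPDF 1.6 and LibPNG 1.6.0 are reported; with it, only MuPDF 1.6 remains because int png_read() is absent from the binary.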

[Figure: the version string 1.6.0 is co-located with int pdf_read() in pdf.c (MuPDF V1.6) and with int png_read() in png.c (LibPNG V1.6.0); the app edu.gatech.example contains MuPDF V1.6 and LibPNG V1.2.46.]

Implementation

• Data collection
  • Scrapy for crawling OSS repos
  • PlayDrone for crawling Android apps
• Feature extraction
  • Clang-based lexer and parser for C/C++ source
  • Pyelftools for native binaries
  • Soot-based parser for Java bytecode and Dex bytecode
• OSS detection
  • Redis key-value cluster for storing and querying indexing results
  • Celery job scheduler for distributing work to multiple servers

Evaluation

• F-Droid apps
  • 4,469 apps, 579 with native libraries
  • 295 C/C++ OSS uses, 7,055 Java OSS uses
• BAT: internal code clones
• LibScout: partial builds (code removal)

[Figure: C/C++ OSS evaluation results. Precision (%), recall (%), and version precision (%) for OSSPolice vs BAT.]

[Figure: Java OSS evaluation results. Precision (%), recall (%), and version precision (%) for OSSPolice vs LibScout.]

Annotations on the charts: 55 matches, 478 matches, 295 matches.

Measurement Dataset

• C/C++ OSS from GitHub
  • 3,119 popular repos and 60,450 OSS versions
  • 29% of repos are GPL/AGPL
  • 11% of repos are vulnerable, with 5,611 severe CVEs (CVSS ≥ 4.0)
• Java OSS from Maven and JCenter
  • 4,777 popular artifacts, 77,308 artifact versions
  • 2.3% of artifacts are GPL/AGPL
  • 1.7% of artifacts are vulnerable, with 452 severe CVE IDs
• Android apps from Google Play
  • 1.6M apps, 515,812 with native libraries

Performance and Scalability

• Indexing
  • 60,450 C/C++ repos and 77,308 Java repos
  • Time cost is 1000s vs. 40s on average
  • Memory grows sublinearly to 30GB and 9GB
• Matching
  • Sampled 10,000 Google Play apps
  • 80% of .dex and .so files finish within 100s and 200s

[Figure: memory usage (GB) vs. number of indexed repos (thousands), with one curve for C/C++ memory usage and one for Java memory usage.]

Popular libraries

• Long-tailed distribution of OSS uses

[Figure: top 10 detected Java OSS excluding Android and Google OSS. Categories: Utils, Network, Social, Image, Codec.]

[Figure: top 10 detected C/C++ OSS. Categories: Codec, Game, Font, Network, Audio, Viewer.]

Legal Risks

• More than 40K potential GPL violators
  • More violators using C/C++ than Java, and encoding libraries dominate

[Figure: top 5 offended Java OSS.]

[Figure: top 5 offended C/C++ OSS: MuPDF, FFmpeg, PJSIP, VLC and X264, BZRTP.]

Category labels shown on the charts: Codec, Utils, Compiler, Codec, Communication.

Legal Risks

• Why violate GPL/AGPL?
  • MuPDF and iTextPDF are used due to lack of free alternatives
• OSS developers' responses
  • MuPDF got new customers :)
  • FFmpeg and VideoLAN have interest, but FFmpeg cannot enforce :)
  • PJSIP not interested due to NDA, iText did not reply :(
• Awareness of OSS licensing terms
  • None of the app developers provided source code yet :(

Security Risks

• More than 100K apps using vulnerable OSS versions
  • More apps using vulnerable C/C++ OSS than Java

[Figure: top 6 C/C++ and 4 Java vulnerable OSS.]

1,244 LibPNG and 4,919 OpenSSL uses are not detected by the App Security Improvement Program (ASIP)
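
Once OSS names and versions have been detected, flagging vulnerable uses reduces to a range check against advisory data. The sketch below assumes purely numeric dotted versions and uses a hypothetical library name, app names, and advisory ID; none of these come from the slides.

```python
def parse_version(version):
    """Turn a dotted numeric version such as '1.2.3' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def vulnerable_uses(detected, advisories):
    """Return (app, oss, version, advisory_id) for versions inside a vulnerable range."""
    flagged = []
    for app, oss, version in detected:
        for advisory_id, introduced, fixed in advisories.get(oss, []):
            # Vulnerable when introduced <= version < fixed.
            if parse_version(introduced) <= parse_version(version) < parse_version(fixed):
                flagged.append((app, oss, version, advisory_id))
    return flagged

# Hypothetical detection results: (app, OSS, pinpointed version).
detected = [
    ("edu.gatech.example", "libfoo", "1.2.3"),
    ("edu.gatech.other", "libfoo", "1.4.0"),
]
# Hypothetical advisory: (id, first vulnerable version, first fixed version).
advisories = {"libfoo": [("ADVISORY-HYPOTHETICAL-1", "1.0.0", "1.3.0")]}

flagged = vulnerable_uses(detected, advisories)
print(flagged)
```

Real version schemes are messier (letter suffixes, differing component counts), so a production check would need a more careful version parser than this tuple comparison.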

Security Risks

• Which versions of OSS do new app developers choose?
  • Both vulnerable and patched OSS are being used
• When do developers update OSS versions?
  • ASIP mitigates vulnerable OSS usage, but it still remains a problem

[Figure: timeline of OSS usage for the top 10K apps, 300K app versions. Panels: MoPub, OpenSSL, OkHttp, FFmpeg. X-axis: date, from 2013-05-12 to 2016-08-24. Series: # Vuln. Usage and # Patched Usage, with ASIP Notification and ASIP Deadline markers.]

Discussion

• Checking license compliance requires manual efforts
• Obfuscation and optimization
  • String encryption in .dex files
  • Function hiding in .so files
• Version pinpointing
  • Not all versions can be uniquely identified
• More programming languages (e.g. JS, Python) and platforms (e.g. iOS)

Conclusion

• OSSPolice: an accurate and scalable tool to identify license violations and 1-day security risks
  • Hierarchical indexing and matching scheme
  • Collocation-based unique feature filtering
• A large-scale measurement
  • 1.6M free Google Play Store apps
  • 40K cases of potential GPL/AGPL violations and 100K apps using vulnerable OSS
• Interesting insights
  • App developers violate GPL/AGPL due to lack of free alternatives
  • App developers use vulnerable OSS versions despite efforts from Google