OSSPolice - Identifying Open-Source License Violation and 1-day Security Risk at Large Scale
Ruian Duan, Ashish Bijlani, Meng Xu, Taesoo Kim, Wenke Lee
ACM CCS 2017
Background
• Open Source Software (OSS) is gaining popularity, e.g. GitHub reported 20M users and 57M repos
• Mobile app market grows fast, with over 2M apps on Play Store
• Developers reuse OSS as is for lots of benefits
• Legal risks and security risks arise
Risks in OSS use
• OSS licenses have constraints (e.g. GNU GPL requires derivative works to be open source)
• 1-day vulnerabilities in stale OSS versions are exploited by hackers
For now, GNU GPL is an enforceable contract, says US federal judge!
Artifex Slaps Palm with PDF Reader Copyright Suit
Equifax blames open-source software for its record-breaking security breach
Community Health Systems Breach Possible due to Heartbleed Vulnerability
Goal
• Design a tool, OSSPolice, to analyze Android apps for open-source license violation and 1-day security risk by detecting reuse of OSS and their versions at large scale
• Requirements
  • Accurate detection for hundreds of thousands of OSS
  • Accurate version pinpointing
  • Efficient resource usage
  • Fast search to support vetting a large number of Android apps
Overview and challenges
• Feature selection
  • Source vs binary: automatically building source code is hard, due to dependencies, various build configs, etc.
• Compare app against OSS
  • Fused app binaries: multiple OSS can be linked or compiled into a single file
  • Partial builds and internal code clones: not all OSS features are built into libraries, and OSS reuses other OSS
• Identify OSS versions
  • Cross-match of unique version features: fused app binaries and internal code clones can confuse the provenance of unique features
Source vs binary
• C/C++ OSS are built into stripped native shared libraries (so files)
• Java OSS are built into obfuscated Dalvik executables (dex files)
[Figure: C/C++ pipeline. Source code (Foo.c: void foo() { w = "hello" … }; Bar.c: static bar() { w = "world" }) → shared library (.text, .dynsym, .rodata, .symtab, .debug_info) → stripped shared library (.text, .dynsym, .rodata).]
[Figure: Java pipeline. Source code (package edu.gatech; class Foo { bar() { println("hello world") }; }) → Dalvik bytecode (.class edu/gatech/Foo, .method bar, const-string v1, "HelloWorld", invoke-virtual {v0, v1}, println) → obfuscated Dalvik bytecode (.class a, .method a, const-string v1, "HelloWorld", invoke-virtual {v0, v1}, println).]
Feature selection
• C/C++ OSS vs so files
  • String literal
    • Clang-based lexer for OSS and .rodata for libraries
  • Exported function
    • Clang-based parser for OSS and .dynsym for libraries
• Java OSS vs dex files
  • String constant
  • Normalized class
    • Captures interaction with framework
  • Function centroid
    • Captures intra-procedural control flow
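
The string-literal feature for native libraries can be sketched as follows. This is only an illustration, not the tool's extractor: it assumes the `.rodata` section has already been read into a bytes object, and it collects NUL-terminated runs of printable ASCII. The function name `extract_string_literals`, the minimum length, and the sample bytes are all hypothetical.

```python
import re

def extract_string_literals(rodata, min_len=4):
    """Collect NUL-terminated printable ASCII runs from a raw section dump."""
    # Printable ASCII is 0x20..0x7e; the lookahead requires a terminating NUL.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}(?=\x00)" % min_len)
    return {m.group().decode("ascii") for m in pattern.finditer(rodata)}

# Hypothetical bytes standing in for a .rodata section.
sample = b"\x00hello world\x00\x01\x02png_read failed\x00ab\x00"
features = extract_string_literals(sample)
print(sorted(features))  # ['hello world', 'png_read failed']
```

The same strings would be collected from the OSS source side by a lexer, so that both sides produce comparable feature sets.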
Fused app binaries
• An app uses multiple OSS
  • $\frac{OSS \cap APP}{OSS}$
  • $\frac{APP \cap OSS}{APP}$
• Iterating over $N$ OSS has $O(N)$ time complexity
  • Flag all OSS being used at the same time
  • Index OSS and their versions!
[Figure: example app edu.gatech.example containing MuPDF, OpenCV, OpenSSL, OkHttp, MoPub, and Log4j.]
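
The difference between the two ratios can be seen in a small sketch. It assumes features are plain sets and that the ratios are taken over set sizes; the feature names and both function names are hypothetical. Normalizing by the OSS stays high when the app fuses several libraries, while normalizing by the app shrinks as more libraries are added.

```python
def oss_ratio(oss_features, app_features):
    """Share of the OSS's features found in the app: |OSS ∩ APP| / |OSS|."""
    return len(oss_features & app_features) / len(oss_features)

def app_ratio(oss_features, app_features):
    """Share of the app's features explained by the OSS: |APP ∩ OSS| / |APP|."""
    return len(app_features & oss_features) / len(app_features)

# Hypothetical feature sets for two libraries fused into one app.
mupdf = {"pdf_read", "fz_open", "pdf_lex"}
opencv = {"cv_imread", "cv_resize"}
app = mupdf | opencv | {"app_main"}  # 6 features in total

print(oss_ratio(mupdf, app))  # 1.0
print(app_ratio(mupdf, app))  # 0.5
```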
Flat indexing and matching
• Indexing: maps features to OSS
• Matching: look up feature -> OSS mapping to identify OSS reuse
• Flat indexing blows up the table to 90G after indexing 7K OSS
• Indexing multiple versions of OSS further adds to the problem
• Given $N$ OSS with $F$ features and $V$ versions, $O(NFV)$ space complexity
[Figure: flat index mapping feature1, feature2, and feature3 to MuPDF and OpenCV, matched against edu.gatech.example.]
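
A flat index of this kind can be sketched with a dictionary from feature to OSS names. This is a minimal illustration under that assumption, not the tool's storage layout; `build_flat_index`, `match`, and the toy corpus are hypothetical.

```python
from collections import defaultdict

def build_flat_index(oss_features):
    """Map every feature to the set of OSS names that contain it."""
    index = defaultdict(set)
    for oss, features in oss_features.items():
        for feature in features:
            index[feature].add(oss)
    return index

def match(index, app_features):
    """Count, per OSS, how many app features hit the index."""
    hits = defaultdict(int)
    for feature in app_features:
        for oss in index.get(feature, ()):
            hits[oss] += 1
    return dict(hits)

# Toy corpus: two OSS sharing one feature.
corpus = {
    "MuPDF": {"feature1", "feature2"},
    "OpenCV": {"feature2", "feature3"},
}
index = build_flat_index(corpus)
result = match(index, {"feature1", "feature2"})
print(result)
```

Matching costs one lookup per app feature instead of one comparison per OSS. The table, however, holds an entry for every (feature, OSS) pair, and indexing each version separately multiplies that again.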
Partial builds and internal code clones
[Figure: source trees at repo, dir, and file levels. Repos shown: LibJPEG, LibPNG, MuPDF, OpenCV. Directories shown: source, thirdparty, 3rdparty, modules/core, pdf, fitz, test, src. Files shown: test-dev.cpp, pdf-lex.c, opengl.cpp, test-io.cpp, jpeglib.h, pngtest.c, png.c.]
Internal code clones confuse third-party code with core code and require a high match ratio to filter
Partial builds (e.g. examples, tests) cause the match ratio to be low
Hierarchical indexing and matching
• Hierarchical Indexing
  • Records source hierarchy to track internal clones
  • Uses Simhash algorithm to generate ids for non-leaf nodes for deduplication
  • Records unique features across versions via separate lists
• Hierarchical Matching
  • NormScore (TF-IDF based) to promote unique parts when computing matching ratio of a node
  • Allow partial builds by skipping nodes with low ratio
  • Drop internal code clones by skipping nodes likely to be third-party
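
The Simhash step can be illustrated with a generic implementation. The slides say only that Simhash generates ids for non-leaf nodes; how a node's children are fed into it is an assumption here, as are the 64-bit width, the SHA-256 base hash, and the file names.

```python
import hashlib

def simhash(features, bits=64):
    """Fold a collection of string features into one locality-sensitive id."""
    weights = [0] * bits
    for feature in features:
        digest = hashlib.sha256(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Each feature votes +1 or -1 on every bit position.
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if weights[i] > 0)

def hamming(a, b):
    """Number of differing bits between two ids."""
    return bin(a ^ b).count("1")

# Hypothetical directory contents: the same children in a different order.
libpng_dir = simhash(["png.c", "pngtest.c", "pngread.c"])
cloned_dir = simhash(["pngread.c", "png.c", "pngtest.c"])
print(libpng_dir == cloned_dir)  # True
```

Identical directories receive identical ids regardless of listing order, so a cloned third-party directory can be stored once. Near-identical directories tend to differ in only a few bits, which `hamming` can measure.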
[Figure: hierarchical index linking feature1, feature2, and feature3 to file1 through file3, dir 1 through dir 5, and the repos MuPDF, OpenCV, and LibPNG, matched against edu.gatech.example.]
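
The idea of promoting unique parts can be sketched with a plain inverse-document-frequency weight. This is not the NormScore formula, which the slides do not spell out. It assumes a feature's weight is log(N / df) over N indexed repos, and the repo contents below are hypothetical.

```python
import math

def idf_weights(repo_features):
    """Weight each feature by log(N / df): rarer features weigh more."""
    n = len(repo_features)
    df = {}
    for features in repo_features.values():
        for f in features:
            df[f] = df.get(f, 0) + 1
    return {f: math.log(n / count) for f, count in df.items()}

def weighted_ratio(node_features, app_features, weights):
    """Weighted share of a node's features that also occur in the app."""
    total = sum(weights[f] for f in node_features)
    if total == 0:
        return 0.0  # node consists only of features shared by every repo
    matched = sum(weights[f] for f in node_features if f in app_features)
    return matched / total

# Hypothetical index: "version" appears everywhere, "png_read" is cloned into MuPDF.
repos = {
    "MuPDF": {"pdf_read", "png_read", "version"},
    "LibPNG": {"png_read", "version"},
    "OpenCV": {"cv_imread", "version"},
}
weights = idf_weights(repos)
app = {"pdf_read", "version"}
print(round(weighted_ratio(repos["MuPDF"], app, weights), 2))  # 0.73
print(weighted_ratio(repos["LibPNG"], app, weights))           # 0.0
```

With an unweighted ratio LibPNG would score 1/2 on the strength of a feature that every repo shares. The weighting removes that match and credits MuPDF for its unique feature.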
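Cross-match of unique version features
[Figure: version features (the strings 1.5.0, 1.6.0, and 1.2.46, foo_string, and int bar_func()) are indexed under MuPDF V1.5 and V1.6 and LibPNG V1.2.46 and V1.6.0; the example app edu.gatech.example contains MuPDF V1.6 and LibPNG V1.2.46.]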
Collocation-based filtering
• Leverage collocation information in the indexing table and binaries
• Use NormScore to assign different weights to features
[Figure: in the index, the string 1.6.0 is collocated with int pdf_read() in pdf.c (MuPDF V1.6) and with int png_read() in png.c (LibPNG V1.6.0); the example app edu.gatech.example contains MuPDF V1.6 and LibPNG V1.2.46.]
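
One way to picture collocation-based filtering is the sketch below. It is a simplification, not the tool's algorithm: it assumes the index stores, for each version feature, the other features of the source file it came from, and it treats "collocated in the binary" as "present in the same app binary". The binary names and `filter_by_collocation` are hypothetical.

```python
def filter_by_collocation(version_feature, index, app_binaries):
    """Keep (oss, version) candidates whose indexed neighbours appear in the
    same app binary as the version feature."""
    kept = []
    for oss, version, file_features in index.get(version_feature, []):
        neighbours = file_features - {version_feature}
        for binary_features in app_binaries.values():
            if version_feature in binary_features and neighbours & binary_features:
                kept.append((oss, version))
                break
    return kept

# The string "1.6.0" is a version feature of two different OSS.
index = {
    "1.6.0": [
        ("MuPDF", "1.6", {"1.6.0", "int pdf_read()"}),
        ("LibPNG", "1.6.0", {"1.6.0", "int png_read()"}),
    ],
}
# Hypothetical app with MuPDF V1.6 and LibPNG V1.2.46 in separate binaries.
app_binaries = {
    "libmupdf.so": {"1.6.0", "int pdf_read()"},
    "libpng.so": {"1.2.46", "int png_read()"},
}
candidates = filter_by_collocation("1.6.0", index, app_binaries)
print(candidates)  # [('MuPDF', '1.6')]
```

Without the filter, the string 1.6.0 alone would also suggest LibPNG V1.6.0, which the app does not contain.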
Implementation
• Data Collection
  • Scrapy for crawling of OSS repos
  • PlayDrone for crawling Android apps
• Feature Extraction
  • Clang-based lexer and parser for C/C++ source
  • Pyelftools for native binaries
  • Soot-based parser for Java bytecode and Dex bytecode
• OSS Detection
  • Redis key-value cluster for storing and querying indexing results
  • Celery job scheduler for distributing work to multiple servers
Evaluation
• FDroid Apps
  • 4,469 apps, 579 with native libraries
  • 295 C/C++ OSS uses, 7,055 Java OSS uses
• BAT: internal code clones
• LibScout: partial builds (code removal)
[Figure: C/C++ OSS Evaluation Results. Precision (%), Recall (%), and Version Precision (%) for OSSPolice vs BAT.]
[Figure: Java OSS Evaluation Results. Precision (%), Recall (%), and Version Precision (%) for OSSPolice vs LibScout.]
[Chart annotations: 55 matches, 478 matches, 295 matches.]
Measurement Dataset
• C/C++ OSS from GitHub
  • 3,119 popular repos and 60,450 OSS versions
  • 29% of repos are GPL/AGPL
  • 11% of repos are vulnerable, with 5,611 severe CVEs (CVSS ≥ 4.0)
• Java OSS from Maven and JCenter
  • 4,777 popular artifacts, 77,308 artifact versions
  • 2.3% of artifacts are GPL/AGPL
  • 1.7% of artifacts are vulnerable, with 452 severe CVE ids
• Android Apps from Google Play
  • 1.6M apps, 515,812 with native libraries
Performance and Scalability
• Indexing
  • 60,450 C/C++ repos and 77,308 Java repos
  • Time cost is 1000s vs. 40s on average
  • Memory grows sublinearly to 30GB and 9GB
• Matching
  • Sampled 10,000 Google Play apps
  • 80% of dex and so files finish within 100s and 200s
[Figure: memory usage (GB) vs. number of indexed repos (thousands), with curves for C/C++ Memory Usage and Java Memory Usage.]
Popular libraries
• Long-tailed distribution of OSS uses
[Figure: Top 10 detected Java OSS excluding Android and Google OSS. Categories: Utils, Network, Social, Image, Codec.]
[Figure: Top 10 detected C/C++ OSS. Categories: Codec, Game, Font, Network, Audio, Viewer.]
Legal Risks
• More than 40K potential GPL violators
• More violators using C/C++ than Java, and encoding libraries dominate
[Figure: bar charts of the Top 5 offended Java OSS and the Top 5 offended C/C++ OSS. Labels shown: MuPDF, FFmpeg, PJSIP, VLC and X264, BZRTP. Categories shown: Codec, Utils, Compiler, Communication.]
Legal Risks
• Why violate GPL/AGPL?
  • MuPDF and iTextPDF are used due to lack of free alternatives
• OSS developers' responses
  • MuPDF got new customers ☺
  • FFmpeg and VideoLAN have interest, but FFmpeg cannot enforce ☺
  • PJSIP not interested due to NDA, iText did not reply ☹
• Awareness of OSS licensing terms
  • None of the app developers has provided source code yet ☹
Security Risks
• More than 100K apps using vulnerable OSS versions
• More apps using vulnerable C/C++ OSS than Java
[Figure: Top 6 C/C++ and 4 Java vulnerable OSS.]
1,244 LibPNG and 4,919 OpenSSL uses are not detected by App Security Improvement Program (ASIP)
Security Risks
• Which versions of OSS do new app developers choose?
  • Both vulnerable and patched OSS are being used
• When do developers update OSS versions?
  • ASIP mitigates vulnerable OSS usage, but it still remains a problem
[Figure: Timeline of OSS usage for the top 10K apps, 300K app versions. Panels: MoPub, OpenSSL, OkHttp, FFmpeg. X axis: Date (2013-05-12 to 2016-08-24). Series: # Vuln. Usage, # Patched Usage. Markers: ASIP Deadline, ASIP Notification.]
Discussion
• Checking license compliance requires manual effort
• Obfuscation and optimization
  • String encryption in dex files
  • Function hiding in so files
• Version pinpointing
  • Not all versions can be uniquely identified
• More programming languages (e.g. JS, Python) and platforms (e.g. iOS)
Conclusion
• OSSPolice: an accurate and scalable tool to identify license violations and 1-day security risks
  • Hierarchical indexing and matching scheme
  • Collocation-based unique feature filtering
• A large-scale measurement
  • 1.6M free Google Play Store apps
  • 40K cases of potential GPL/AGPL violations and 100K apps using vulnerable OSS
• Interesting insights
  • App developers violate GPL/AGPL due to lack of free alternatives
  • App developers use vulnerable OSS versions despite efforts from Google