REPOSITORY MINING AND PROGRAM ANALYSIS & TESTING: …ffffffff-896a-a3c8-ffff... · 2016-06-23 ·...


Partially supported by: NSF, DHS, and US Air Force

Alessandro (Alex) Orso
School of Computer Science – College of Computing
Georgia Institute of Technology
http://www.cc.gatech.edu/~orso/

REPOSITORY MINING AND PROGRAM ANALYSIS & TESTING: BETTER TOGETHER?

MSR PAPERS AND PROGRAM ANALYSIS

[Bar chart: number of MSR papers that leverage static and/or dynamic analyses (y-axis, 0 to 4) per year (x-axis, 2004 to 2010)]

Note: this is only for MSR!

PROGRAM ANALYSIS AND SOFTWARE ARCHIVES

• Mini-history of software archives
  • < 1996 – Mostly small examples, limited evaluation
  • 1996 – Siemens suite (<500 LOC)
  • 2005 – Software-artifact Infrastructure Repository
  • 2006 – Eclipse Bug Data
  • 2007 – iBUGS

• In 2010, much (most?) research still uses the Siemens suite


ISSUE #1

Communication

[Venn diagram: ISSTA PCs (76) and MSR (72), with 4 in the overlap]


ISSUE #2

Mismatch in assumptions (or schisms)

• (Most) program analyses

• Complete programs

• Single language

• Restricted set of features

• Soundness

• False positives problematic

• Mining techniques

• Incomplete programs

• Multiple languages

• Complete languages

• Noisy data

• False positives acceptable


ISSUE #3

Infrastructure

• Program analysis tools

• Unavailable

• Unusable

• Limited

• Mining infrastructure

• No standard format

• Complicated setup

• Unusable


ISSUE #4

Narrow focus of some MSA research


LOOKING FOR GOLD...

LOOKING FOR KEYS...

[Image label: Software archives]

MAYBE IF WE TURN ON THE LIGHT

MINING MORE THAN ARCHIVES

Software Archives
Program runs
Program traces
...
Static/dynamic metrics

GAMMA PROJECT

[Diagram labels: In house (Developers) · In the field · Field Data]

Maintenance tasks:
• Impact analysis
• Regression testing
• Debugging
• Behavior classification
• ...

"Gamma System: Continuous Evolution of Software after Deployment."

Orso et al., ISSTA 2002.


IMPACT ANALYSIS

• Assess effects of changes on a software system

• Predictive: help decide which changes to perform and how to implement changes

• Our approach

• Program-sensitive impact analysis

• User-sensitive impact analysis

IMPACT ANALYSIS USING FIELD DATA

[Figure: Program P, with methods m1 to m6, is deployed to users (User A, User B, ...). The execution data is a coverage matrix with one row per field execution (A1, A2, B1, B2, C1) and one column per method (m1 to m6); an X marks each method covered by that execution.]

"Leveraging Field Data for Impact Analysis and Regression Testing." Orso et al., ESEC-FSE 2003.

PROGRAM-SENSITIVE IMPACT ANALYSIS

Input:
1. Field execution data (coverage matrix: executions A1, A2, B1, B2, C1 over methods m1 to m6)
2. Change: C = {m2, m5}

Step 1
• Identify user executions through methods in C
• Identify methods covered by such executions

covered methods = {m1, m2, m3, m5, m6}

Step 2
• Dynamic forward slice from C

dynamic fwd slice = {m2, m5, m6}

Output:
Impact set = {m2, m5, m6}
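The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cells of the coverage matrix did not survive in this transcript, so `FIELD_DATA` below is hypothetical data chosen to match the slide's row sizes and results; the forward slice is taken from the slide rather than computed; and combining the two steps by set intersection is an assumption that happens to agree with the numbers shown.

```python
from typing import Dict, Set

# Hypothetical field data: execution id -> methods covered by that execution.
# The exact cells are an assumption; only the totals match the slide.
FIELD_DATA: Dict[str, Set[str]] = {
    "A1": {"m1", "m2"},
    "A2": {"m4", "m6"},
    "B1": {"m1", "m2", "m3", "m5"},
    "B2": {"m1", "m3", "m4"},
    "C1": {"m5", "m6"},
}


def covered_methods(field_data: Dict[str, Set[str]], change: Set[str]) -> Set[str]:
    """Step 1: union of the methods covered by executions that go through the change."""
    covered: Set[str] = set()
    for methods in field_data.values():
        if methods & change:  # this execution traverses at least one changed method
            covered |= methods
    return covered


def impact_set(field_data: Dict[str, Set[str]], change: Set[str],
               forward_slice: Set[str]) -> Set[str]:
    """Step 2 (assumed combination): covered methods that are also in the slice."""
    return covered_methods(field_data, change) & forward_slice


if __name__ == "__main__":
    change = {"m2", "m5"}
    fwd_slice = {"m2", "m5", "m6"}  # value shown on the slide, not computed here
    print(sorted(covered_methods(FIELD_DATA, change)))
    print(sorted(impact_set(FIELD_DATA, change, fwd_slice)))
```

With this data, the sketch yields the slide's covered-method set and impact set for C = {m2, m5}.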

USER-SENSITIVE IMPACT ANALYSIS

Input:
1. Field execution data (coverage matrix over methods m1 to m6)
2. Change: C = {m5, m6}

Collective impact
• Percentage of executions through at least one changed method

Affected users
• Percentage of users that executed at least one changed method at least once

Output:
1. Collective impact = 3/5 = 60%
2. Affected users = 3/3 = 100%
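Both user-sensitive measures are simple ratios over the field data, as the sketch below illustrates. The execution records are hypothetical (the original matrix cells are not recoverable from this transcript); they were chosen only so that the result for C = {m5, m6} matches the 3/5 and 3/3 shown on the slide.

```python
from typing import List, Set, Tuple

Execution = Tuple[str, Set[str]]  # (user id, methods covered by one execution)

# Hypothetical field data: five executions from three users.
FIELD_DATA: List[Execution] = [
    ("A", {"m1", "m2"}),
    ("A", {"m4", "m6"}),
    ("B", {"m1", "m2", "m3", "m5"}),
    ("B", {"m1", "m3", "m4"}),
    ("C", {"m5", "m6"}),
]


def collective_impact(data: List[Execution], change: Set[str]) -> float:
    """Fraction of executions that go through at least one changed method."""
    if not data:
        return 0.0
    hits = sum(1 for _, methods in data if methods & change)
    return hits / len(data)


def affected_users(data: List[Execution], change: Set[str]) -> float:
    """Fraction of users with at least one execution through a changed method."""
    users = {user for user, _ in data}
    if not users:
        return 0.0
    affected = {user for user, methods in data if methods & change}
    return len(affected) / len(users)


if __name__ == "__main__":
    change = {"m5", "m6"}
    print(f"collective impact: {collective_impact(FIELD_DATA, change):.0%}")
    print(f"affected users:    {affected_users(FIELD_DATA, change):.0%}")
```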


EMPIRICAL STUDY

• Subject:

• JABA: Java Architecture for Bytecode Analysis (60 KLOC, 500 classes, 3K Methods)

• Data

• Field data: 1,100 executions (14 users, 12 weeks)

• In-house data: 195 test cases, 63% method coverage

• Changes: 20 real changes extracted from JABA’s CVS repository

• Research question: Does field data yield different results than in-house data?

• Experimental setup

• Computed impact sets for the 20 changes using field data and using in-house data

• Compared impact sets for the two datasets
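The comparison step can be pictured as plain set arithmetic over the two impact sets computed for the same change. The sketch below is illustrative only: the function name and the sample sets are hypothetical and are not taken from the study.

```python
from typing import Dict, Set


def compare_impact_sets(field: Set[str], in_house: Set[str]) -> Dict[str, Set[str]]:
    """Split two impact sets into their common part and the two differences."""
    return {
        "both": field & in_house,
        "field_minus_in_house": field - in_house,  # seen only with field data
        "in_house_minus_field": in_house - field,  # seen only with in-house tests
    }


if __name__ == "__main__":
    # Hypothetical impact sets for a single change.
    field = {"m2", "m5", "m6"}
    in_house = {"m2", "m3", "m5"}
    for name, methods in compare_impact_sets(field, in_house).items():
        print(name, sorted(methods))
```

Large differences in either direction would suggest that in-house tests and real usage exercise the program differently, which is what the research question asks.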

RESULTS

[Bar chart: number of methods (y-axis, 0 to 900) for each change C1 to C20 (x-axis); series: Field, InHouse, Field - InHouse, InHouse - Field]

[Inset labels as extracted: InHouse, Field; values: 100 96636]


"Gammatella: Visualizing Program-Execution Data for Deployed Software." Jones et al., Information Visualization, 2004.

DEMO


DEBUGGING FIELD FAILURES

FIELD FAILURES

Field failures: Anomalous behaviors (or crashes) of deployed software that occur on user machines

• Difficult to debug
• Relevant to users

CURRENT PRACTICE

Ask the user

"I opened my web browser. Specifically, I clicked on the dock icon. It bounced twice before crashing. Please help."

CURRENT PRACTICE

Gather static information

• Difficult to reproduce the problem
• Only locations directly correlated with the failure

OUR SOLUTION

Record failing executions in the field
+
Replay failing executions in house

Debug field failures effectively
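As a rough illustration of the record/replay idea (not the mechanism of the tool presented in this talk, which the slides do not detail), the sketch below logs every value a program reads from its environment while it runs "in the field" and then feeds the same log back "in house". All class and function names are hypothetical.

```python
from typing import Callable, Iterable, List


class Recorder:
    """Wraps an input source and logs every value the program reads."""

    def __init__(self, source: Callable[[], int]) -> None:
        self._source = source
        self.log: List[int] = []

    def read(self) -> int:
        value = self._source()
        self.log.append(value)
        return value


class Replayer:
    """Serves previously recorded values in the order they were read."""

    def __init__(self, log: Iterable[int]) -> None:
        self._values = iter(log)

    def read(self) -> int:
        return next(self._values)


def program(read: Callable[[], int]) -> int:
    """Hypothetical program under study: sums three inputs, fails on a negative one."""
    total = 0
    for _ in range(3):
        value = read()
        if value < 0:
            raise ValueError("negative input")
        total += value
    return total


def fails(read: Callable[[], int]) -> bool:
    """Runs the program and reports whether it failed."""
    try:
        program(read)
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    field_inputs = iter([4, -1, 7])                   # what the user's machine supplied
    recorder = Recorder(lambda: next(field_inputs))
    print("failed in the field:", fails(recorder.read))
    print("recorded log:", recorder.log)
    print("failed in house:", fails(Replayer(recorder.log).read))
```

The design point is that the program only talks to its environment through `read`, so swapping the recorder for the replayer reproduces the same execution without access to the user's machine.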

USAGE SCENARIO

[Diagram labels: In house (Develop, Replay / Debug) · In the field (Record) · Captured failure]

CHALLENGES

Captured failures:
• Large in size → Minimize
• Contain sensitive information → Anonymize
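To make the "Minimize" step concrete, here is a generic greedy reduction over a recorded event log: it repeatedly drops chunks of events and keeps a smaller log whenever the failure still reproduces. This is a simplified stand-in in the spirit of delta debugging, not the minimization algorithm of the tool discussed here; `still_fails` is a hypothetical oracle that replays a candidate log and reports whether the original failure occurs.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def minimize(events: Sequence[T], still_fails: Callable[[List[T]], bool]) -> List[T]:
    """Greedily remove chunks of events while the failure still reproduces."""
    current = list(events)
    chunk = max(len(current) // 2, 1)
    while True:
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + chunk:]
            if still_fails(candidate):
                current = candidate  # the chunk was irrelevant: keep it removed
            else:
                i += chunk           # the chunk is needed: move past it
        if chunk == 1:
            break
        chunk //= 2                  # retry with finer-grained chunks
    return current


if __name__ == "__main__":
    # Hypothetical recording of 20 events; the failure needs events 3 and 17.
    recorded = list(range(20))
    reduced = minimize(recorded, lambda log: 3 in log and 17 in log)
    print(reduced)
```

The result is not guaranteed to be the globally smallest failing log, and each call to the oracle costs one replay, so practical tools tend to be more careful about how many candidates they try.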

USAGE SCENARIO

[Diagram labels: In house (Develop, Replay / Debug) · In the field (Record, Minimize, Anonymize) · Captured failure]

EVALUATION (PRACTICALITY)

Research question 1:
• Does the technique impose an acceptable overhead?

Subjects:
• Several CPU-intensive applications (e.g., bzip, gcc)

Results:
• Negligible overheads (i.e., less than 10%)
• Data size is acceptable (application dependent)

"A Technique for Enabling and Supporting Debugging of Field Failures." Clause and Orso, ICSE 2007.

EVALUATION (FEASIBILITY)

Research question 2:
• Can the technique produce minimized executions that can be used to debug the original failure?

Subject: Pine email and news client
• Two real field failures
• 20 failing executions, 10 per failure

Results:
• Executions reduced to less than 10% of their original size
• All failures reproducible

"A Technique for Enabling and Supporting Debugging of Field Failures." Clause and Orso, ICSE 2007.

EVALUATION (EFFECTIVENESS)

Research question 3:
• How much information about the original inputs is revealed?

Subjects: NanoXML, htmlparser, Printtokens, Columba
• 20 faults overall
• Inputs from 100 bytes to 5MB in size
• All inputs considered sensitive

Results:
• Anonymized inputs revealed between 2% and 60% of the information in the original inputs

"Camouflage: Automated Anonymization of Field Data." Clause and Orso, GT Tech Report, March 2010.

RQ3: EFFECTIVENESS – NANOXML

[Original input]

<!DOCTYPE Foo [
   <!ELEMENT Foo (ns:Bar)>
   <!ATTLIST Foo
       xmlns CDATA #FIXED 'http://nanoxml.n3.net/bar'
       a     CDATA #REQUIRED>

   <!ELEMENT ns:Bar (Blah)>
   <!ATTLIST ns:Bar
       xmlns:ns CDATA #FIXED 'http://nanoxml.n3.net/bar'>

   <!ELEMENT Blah EMPTY>
   <!ATTLIST Blah
       x    CDATA #REQUIRED
       ns:x CDATA #REQUIRED>
]>
<!-- comment -->
<Foo a='very' b='secret' c='stuff'>vaz
   <ns:Bar>
       <Blah x="1" ns:x="2"/>
   </ns:Bar>
</Foo>

[Anonymized input]

<!DOCTYPE [
   <! >
   <!ATTLIST
         #FIXED ' '
        >

   <!E >
   <!ATTLIST
        #FIXED ' '>

   <!E >
   <!ATTLIST
        #
        : # >
]>
<!-- -->
< =' ' =' ' =' '>
   < : >
       < =" " : =" "/>
   </ :

RQ3: EFFECTIVENESS – COLUMBA

[Original input]

Wayne,Bartley,Bartley,Wayne,[email protected],,
Ronald,Kahle,Kahle,Ron,[email protected],,
Wilma,Lavelle,Lavelle,Wilma,,[email protected],
Jesse,Hammonds,Hammonds,Jesse,,[email protected],
Amy,Uhl,Uhl,Amy,uhla@corp1,com,[email protected],
Hazel,Miracle,Miracle,Hazel,[email protected],,
Roxanne,Nealy,Nealy,Roxie,,[email protected],
Heather,Kane,Kane,Heather,[email protected],,
Rosa,Stovall,Stovall,Rosa,,[email protected],
Peter,Hyden,Hyden,Pete,,[email protected],
Jeffrey,Wesson,Wesson,Jeff,[email protected],,
Virginia,Mendoza,Mendoza,Ginny,[email protected],,
Richard,Robledo,Robledo,Ralph,[email protected],,
Edward,Blanding,Blanding,Ed,,[email protected],
Sean,Pulliam,Pulliam,Sean,[email protected],,
Steven,Kocher,Kocher,Steve,[email protected],,
Tony,Whitlock,Whitlock,Tony,,[email protected],
Frank,Earl,Earl,Frankie,,,
Shelly,Riojas,Riojas,Shelly,[email protected],,

[Anonymized input]

, , , ,, , , , , ,,
Wilma,Lavelle,Lavelle,Wilma,,[email protected],
Jesse,Hammonds,Hammonds,Jesse,,[email protected],
Amy,Uhl,Uhl,Amy,uhla@corp1,com,[email protected],
Hazel,Miracle,Miracle,Hazel,[email protected],,
Roxanne,Nealy,Nealy,Roxie,,[email protected],
Heather,Kane,Kane,Heather,[email protected],,
Rosa,Stovall,Stovall,Rosa,,[email protected],
Peter,Hyden,Hyden,Pete,,[email protected],
Jeffrey,Wesson,Wesson,Jeff,[email protected],,
Virginia,Mendoza,Mendoza,Ginny,[email protected],,
Richard,Robledo,Robledo,Ralph,[email protected],,
Edward,Blanding,Blanding,Ed,,[email protected],
Sean,Pulliam,Pulliam,Sean,[email protected],,
Steven,Kocher,Kocher,Steve,[email protected],,
Tony,Whitlock,Whitlock,Tony,,[email protected],
Frank,Earl,Earl,Frankie,,,
Shelly,Riojas,Riojas,Shelly,[email protected],,

RQ3: EFFECTIVENESS – HTMLPARSER

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>james clause @ gatech | home</title>

<style type="text/css" media="screen" title="">
<!--/*--><![CDATA[<!--*/

body { margin: 0px;
...

/*]]>*/-->
</style>
</head>
<body> ...</body>

The portions of the inputs that remain after anonymization tend to be structural in nature and therefore are safe to send to developers.


CONCLUDING REMARKS

ADDRESSING THE ISSUES

• Issue #1: Communication
  • Reaching out
  • More common events
  • Challenge

• Issue #2: Mismatch in assumptions
  • Many similarities and potential synergies
  • Opportunity for defining new (or specialized) analyses
  • Opportunity for performing more thorough evaluations

• Issue #3: Infrastructure
  • Related to communication
  • Reciprocal help

• Issue #4: Narrow focus of some MSA research
  • Go beyond the analysis of “easy” information in the repositories
  • Consider all aspects of software, both static and dynamic
  • Consider both in-vitro and in-vivo data

IN CONCLUSION, BETTER TOGETHER?

Techniques for analyzing/mining a program in all of its aspects, static and dynamic, and throughout its lifetime


ACKNOWLEDGEMENTS

• Taweesup Apiwattanapong

• James Clause

• Mary Jean Harrold

• James Jones

• Donglin Liang

• Dick Lipton