Analyzing Software Code and Execution – Plagiarism and Bug Detection Shoaib Jameel.

Date posted: 20-Dec-2015


Preliminaries

• Plagiarism: "use or close imitation of the language and thoughts of another author and the representation of them as one's own original work."

• Plagiarism.wmv

• Funny quote: “When one copies from one resource, it’s plagiarism; when one copies from multiple resources, it’s research.”


So what happens when you plagiarize?

• In countries like the US.


In India


Let’s get to the main theme now!

• GPLAG: Detection of Software Plagiarism by Program Dependence Graph Analysis


Motivation

• It is time-consuming and labour-intensive to develop large software systems with many lines of code.

• So the easiest shortcut is to plagiarize, especially from open-source software.


Review

• Review of Plagiarism Detection

• 1. String Based

• 2. AST-based

• 3. Token-based


String Based

B. S. Baker. On finding duplication and near duplication in large software systems. In Proc. of 2nd Working Conf. on Reverse Engineering, 1995.


AST-Based

I. D. Baxter, A. Yahin, L. Moura, M. Sant’Anna, and L. Bier. Clone detection using abstract syntax trees. In Proc. of Int. Conf. on Software Maintenance, 1998.


Background


PDG – Program Dependence Graph

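• A PDG is a graph representation of the source code of a procedure. Basic statements such as variable declarations, assignments, and procedure calls are represented by program vertices in a PDG. Data and control dependencies between statements are represented by edges between those vertices.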
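As a rough illustration of the vertex-and-edge idea, the sketch below extracts only the data-dependence edges of straight-line Python assignments with the standard `ast` module. The function name `data_dependence_edges` and the two sample snippets are hypothetical. A real PDG also carries control-dependence edges and is normally produced by a dedicated program-analysis tool, so treat this as a simplified stand-in.

```python
import ast

def data_dependence_edges(source):
    """Return (labels, edges) for straight-line code.

    labels[i] is the statement type of the i-th top-level statement;
    an edge (i, j) means statement j reads a variable last written by i.
    Control dependencies are NOT modelled (simplifying assumption).
    """
    statements = ast.parse(source).body
    last_writer = {}  # variable name -> index of the latest defining statement
    labels, edges = [], set()
    for j, stmt in enumerate(statements):
        labels.append(type(stmt).__name__)
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        # Resolve reads before recording this statement's own writes.
        for n in names:
            if isinstance(n.ctx, ast.Load) and n.id in last_writer:
                edges.add((last_writer[n.id], j))
        for n in names:
            if isinstance(n.ctx, ast.Store):
                last_writer[n.id] = j
    return labels, edges

# Hypothetical snippets: the second only renames the identifiers of the first.
original = "a = 1\nb = 2\nc = a + b\nd = c * 2\n"
renamed = "x = 1\ny = 2\ntotal = x + y\nresult = total * 2\n"

print(data_dependence_edges(original))
print(data_dependence_edges(renamed) == data_dependence_edges(original))
```

Because the graph records which statement feeds which, and not what the variables are called, the renamed snippet yields the same labels and edges as the original.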


Original and Plagiarized Code


PDG-Based Plagiarism Detection

• Given an original program P and a plagiarism suspect P`, plagiarism detection searches for duplicate structures between P and P` in order to prove or disprove the existence of plagiarism. Because each program is represented as a set of PDGs, the search for duplicates is performed on PDGs.


Plagiarism as Subgraph Isomorphism

• The disguises are:

• 1. Format alteration and identifier renaming

• 2. Statement reordering

• 3. Control replacement

• 4. Code insertion


• The mature rate γ is set based on one’s belief in what proportion of a PDG will stay untouched in plagiarism. It is 0.9 in experiments because overhauling (without errors) 10% of a PDG of reasonable size is almost equivalent to rewriting the code.
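To make the role of γ concrete, here is a brute-force sketch of a relaxed subgraph-isomorphism check on tiny graphs. It assumes a reading of γ-isomorphism in which a PDG g matches a suspect PDG g` when some vertex-induced subgraph S of g, with at least γ·|g| vertices, can be embedded into g`. Counting size in vertices, ignoring vertex types, and the function names are all simplifying assumptions of this sketch. The exhaustive search is exponential and only meant for toy inputs.

```python
from itertools import combinations, permutations
from math import ceil

def is_subgraph_isomorphic(s_nodes, s_edges, g_nodes, g_edges):
    """True if S can be injectively mapped into G so that every
    directed edge of S lands on a directed edge of G."""
    s_nodes = list(s_nodes)
    for image in permutations(g_nodes, len(s_nodes)):
        mapping = dict(zip(s_nodes, image))
        if all((mapping[u], mapping[v]) in g_edges for u, v in s_edges):
            return True
    return False

def is_gamma_isomorphic(g_nodes, g_edges, h_nodes, h_edges, gamma=0.9):
    """True if some induced subgraph S of g with |S| >= gamma * |g|
    is subgraph isomorphic to h (sizes counted in vertices)."""
    g_nodes = list(g_nodes)
    min_size = ceil(gamma * len(g_nodes))
    for size in range(len(g_nodes), min_size - 1, -1):
        for kept in combinations(g_nodes, size):
            kept_set = set(kept)
            induced = {(u, v) for u, v in g_edges
                       if u in kept_set and v in kept_set}
            if is_subgraph_isomorphic(kept, induced, h_nodes, h_edges):
                return True
    return False

# Hypothetical original PDG (5 vertices) and suspect PDG (6 vertices):
# the suspect keeps four original statements and inserts two unrelated ones.
g_edges = {(0, 2), (1, 2), (2, 3), (3, 4)}
h_edges = {(0, 2), (1, 2), (2, 3), (4, 5)}

print(is_gamma_isomorphic(range(5), g_edges, range(6), h_edges, gamma=0.8))
print(is_gamma_isomorphic(range(5), g_edges, range(6), h_edges, gamma=1.0))
```

With γ = 0.8 the four surviving statements are enough for a match, while γ = 1.0 demands the whole original graph and rejects the pair. This is the trade-off the mature rate controls.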


Pruning Plagiarism Search Space

• In order to find plagiarized PDG pairs, n × m pair-wise (relaxed) subgraph isomorphism tests are needed in principle.

• Two kinds of filters:

• 1. Lossless filter

• 2. Lossy filter


Lossless Filter

• 1. PDGs smaller than an interesting size K are excluded from both G and G`.

• 2. Based on the definition of γ-isomorphism, a PDG pair (g, g`), g ∈ G and g` ∈ G`, can be excluded if |g`| < γ|g|.
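The two lossless rules can be sketched as a size-only pre-filter that runs before any isomorphism test. The sketch assumes the second rule excludes a pair when the suspect PDG has fewer than γ·|g| vertices, which follows from requiring at least that much of g to be embedded in g`. The function name, the default `k=10`, and the sample sizes are hypothetical.

```python
def lossless_filter(original_sizes, suspect_sizes, k=10, gamma=0.9):
    """Return index pairs (i, j) that survive both lossless rules.

    original_sizes[i] and suspect_sizes[j] are PDG sizes in vertices.
    Rule 1: drop PDGs smaller than k from both sets.
    Rule 2: drop a pair when the suspect PDG is smaller than gamma * |g|
            (assumed form of the exclusion condition).
    """
    kept_original = [(i, n) for i, n in enumerate(original_sizes) if n >= k]
    kept_suspect = [(j, m) for j, m in enumerate(suspect_sizes) if m >= k]
    return [(i, j)
            for i, n in kept_original
            for j, m in kept_suspect
            if m >= gamma * n]

# Hypothetical PDG sizes for an original program and a suspect program.
original_sizes = [4, 20, 50]
suspect_sizes = [8, 19, 30, 60]

pairs = lossless_filter(original_sizes, suspect_sizes)
print(pairs)
print(len(pairs), "of", len(original_sizes) * len(suspect_sizes), "pairs remain")
```

No truly matching pair is discarded by these rules, which is why the filter is called lossless: a pair it removes could not have satisfied the γ-isomorphism requirement in the first place.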


Lossy Filter

• A vertex histogram is constructed as a summarized representation of each PDG.

• Similarity is measured in terms of vertex histograms between g and g`.
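A vertex histogram can be sketched as a fixed-order count of vertex kinds. The five kinds listed in `VERTEX_KINDS` and the sample PDG are hypothetical; the actual set of kinds depends on the tool that builds the PDGs.

```python
from collections import Counter

# Hypothetical vertex kinds; a real PDG builder defines its own set.
VERTEX_KINDS = ("declaration", "assignment", "call", "control", "return")

def vertex_histogram(vertex_kinds, kinds=VERTEX_KINDS):
    """Count how many vertices of each kind a PDG contains,
    returned in the fixed order given by `kinds`."""
    counts = Counter(vertex_kinds)
    return [counts[kind] for kind in kinds]

# Kinds of the vertices of one hypothetical PDG.
sample_pdg = ["declaration", "declaration", "assignment", "assignment",
              "assignment", "call", "control", "return"]

print(vertex_histogram(sample_pdg))
```

The histogram throws away all edge information, so two PDGs with similar histograms may still be unrelated, and a heavily padded copy may look dissimilar. That loss of information is what makes a filter built on it lossy.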


The Main Idea

• Estimate a k-dimensional multinomial distribution Pg from the vertex histogram h(g), then test whether h(g`) is likely to be an observation from Pg.
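One way to turn this idea into a test is a likelihood-ratio goodness-of-fit statistic (a G-test), sketched below. The choice of the G-test, the add-one smoothing, the function names, and the sample histograms are assumptions of this sketch and are not confirmed by the slides. The critical value 9.488 is the 0.95 quantile of the chi-square distribution with 4 degrees of freedom, matching k = 5 histogram bins.

```python
from math import log

def estimate_multinomial(hist, smoothing=1.0):
    """Estimate multinomial parameters from a vertex histogram.
    Add-one smoothing (an assumption) avoids zero probabilities."""
    total = sum(hist) + smoothing * len(hist)
    return [(count + smoothing) / total for count in hist]

def g_statistic(observed, probs):
    """G = 2 * sum(o_i * ln(o_i / e_i)) with e_i = m * p_i;
    bins with o_i == 0 contribute nothing."""
    m = sum(observed)
    return 2.0 * sum(o * log(o / (m * p))
                     for o, p in zip(observed, probs) if o > 0)

def passes_lossy_filter(hist_g, hist_g_prime, critical_value):
    """Keep the pair when h(g`) is plausible under the distribution
    estimated from h(g)."""
    probs = estimate_multinomial(hist_g)
    return g_statistic(hist_g_prime, probs) <= critical_value

CRITICAL_5_BINS = 9.488  # chi-square 0.95 quantile, 4 degrees of freedom

hist_g = [10, 30, 8, 12, 4]          # hypothetical h(g)
similar = [11, 33, 9, 12, 5]         # hypothetical h(g`) of a likely copy
dissimilar = [40, 5, 2, 3, 20]       # hypothetical h(g`) of unrelated code

print(passes_lossy_filter(hist_g, similar, CRITICAL_5_BINS))
print(passes_lossy_filter(hist_g, dissimilar, CRITICAL_5_BINS))
```

Pairs that fail the test are dropped before the expensive isomorphism check. Since a real plagiarized pair can occasionally fail too, the significance level trades speed against missed detections.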


GPLAG Algorithm


Experiment Evaluation


Efficiency of GPLAG


Core part Plagiarism


Conclusion – Some questions still remain

• How is this implementation better than debuggers?

• How is this approach better than reverse engineering?

• Human intervention!