Understanding Source Code with Deep Learning
Transcript of Understanding Source Code with Deep Learning
![Page 1: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/1.jpg)
Microsoft Research Cambridge
miltos1
https://miltos.allamanis.com
![Page 2: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/2.jpg)
![Page 3: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/3.jpg)
https://visualstudio.microsoft.com/services/intellicode/
http://www.eclipse.org/recommenders/
![Page 4: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/4.jpg)
http://jsnice.org/
Deep Learning Type Inference
V. Hellendoorn, C. Bird, E.T. Barr, M. Allamanis. 2018
Predicting Program Properties from Code
V. Raychev, M. Vechev, A. Krause. 2015
![Page 5: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/5.jpg)
Defined Types
string
string
![Page 6: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/6.jpg)
![Page 7: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/7.jpg)
Research in ML+Code
![Page 8: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/8.jpg)
![Page 9: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/9.jpg)
Target Task
![Page 10: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/10.jpg)
![Page 11: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/11.jpg)
int int int
int
return
for (int i =0; < ; ++)
if ( [ ]>0)
+= [ ];
int int int
int
return
for (int i = 0; i < lim; i++)
if (arr[i] > 0)
sum += arr[i];
Programs as Graphs: Key Idea
![Page 12: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/12.jpg)
Assert.NotNull(clazz);
Assert . (NotNull …
ExpressionStatement
InvocationExpression
MemberAccessExpression ArgumentList
Next Token
AST Child
Programs as Graphs: Syntax
![Page 13: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/13.jpg)
(x, y) = Foo();
while (x > 0)
x = x + y;
Last Write
Last Use
Computed From
Programs as Graphs: Data Flow
![Page 14: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/14.jpg)
Programs as Graphs
int int int
int
return
for (int i =0; < ; ++)
if ( [ ]>0)
+= [ ];
~900 nodes/graph ~8k edges/graph
![Page 15: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/15.jpg)
Graph Representation for Variable Misuse B
A
E G
D
C
F
![Page 16: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/16.jpg)
Graph Representation for Variable Misuse B
A
E G
D
C
F
![Page 17: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/17.jpg)
Vector Space Representations
![Page 18: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/18.jpg)
Graph Neural Networks
BA
EG
D
C
F
Li et al (2015). Gated Graph Sequence Neural Networks.
BA
EG
D
C
F
Gilmer et al (2017). Neural Message Passing for Quantum Chemistry.
![Page 19: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/19.jpg)
Graph Neural Networks: Message Passing
E
D
F
BA
EG
D
C
F
![Page 20: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/20.jpg)
Graph Neural Networks: Message Passing
E
D
F
Li et al (2015). Gated graph sequence neural networks.
BA
EG
D
C
F
![Page 21: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/21.jpg)
Graph Neural Networks: Unrolling
![Page 22: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/22.jpg)
Graph Neural Networks: Unrolling
Li et al (2015). Gated graph sequence neural networks.
![Page 23: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/23.jpg)
Graph Neural Networks: Unrolling
Li et al (2015). Gated Graph Sequence Neural Networks.Gilmer et al (2017). Neural Message Passing for Quantum Chemistry.
• node selection• node classification• graph classification
https://github.com/Microsoft/gated-graph-neural-network-samples
![Page 24: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/24.jpg)
Quantitative Results – Variable Misuse
Seen Projects: 24 F/OSS C# projects (2060 kLOC): Used for train and test
3.8 type-correct alternative variables per slot (median 3, σ= 2.6)
Accuracy (%) BiGRU BiGRU+Dataflow GGNN
Seen Projects 50.0 73.7 85.5
![Page 25: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/25.jpg)
Quantitative Results – Variable Misuse
Accuracy (%) BiGRU BiGRU+Dataflow GGNN
Seen Projects 50.0 73.7 85.5
Unseen Projects 28.9 60.2 78.2
Seen Projects: 24 F/OSS C# projects (2060 kLOC): Used for train and testUnseen Projects: 3 F/OSS C# projects (228 kLOC): Used only for test3.8 type-correct alternative variables per slot (median 3, σ= 2.6)
![Page 26: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/26.jpg)
bool string string out string
var
while null
if return true
null
return false
What the model sees…
![Page 27: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/27.jpg)
UI/UX
ML Capabilities
Metrics
Low resources
![Page 28: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/28.jpg)
Learning Signals
target
prediction
𝑓𝜃(𝑥)input
data 𝑥
model of problem
• Given dataset 𝑥1, 𝑦0 , … , 𝑥𝑁 , 𝑦𝑁• Minimize Loss ℒ 𝜃 =
1
𝑁σ𝑖 𝐿 𝑓𝜃 𝑥𝑖 , 𝑦𝑖
![Page 29: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/29.jpg)
![Page 30: Understanding Source Code with Deep Learning](https://reader031.fdocuments.us/reader031/viewer/2022012507/6183685b4775c8335c37c31f/html5/thumbnails/30.jpg)