Program Synthesis using Deduction-Guided Reinforcement Learning
Yanju Chen, Chenglong Wang, Osbert Bastani, Isil Dillig and Yu Feng
Program Synthesis

Find a program P that satisfies all the specifications φ:

  specification φ → synthesizer → program P

Example domains: table transformation, string transformation, data visualization, HPC
Program Synthesis

Deductive Reasoning ←DISCONNECTION→ Statistical Reasoning

• deductive tools: NEO (Feng et al. 2018), BLAZE (Wang et al. 2018), MORPHEUS (Feng et al. 2017)
• statistical tools: DEEPCODER (Balog et al. 2017), AutoPandas (Bavishi et al. 2019)

The feedback from deduction cannot be seamlessly used by the statistical model.
There is a fundamental disconnection between program synthesis using purely statistical and purely deductive methods, so...

Can we bridge the statistical and deductive approaches in program synthesis in a seamless way?
CONCORD utilizes fine-grained feedback from deductive reasoning to update statistical reasoning on the fly.
Our Approach: CONCORD (An Overview)

• inputs: specification φ, DSL, initial policy π₀
• loop: Take Action → Deduce → Update Policy
• output: program P
A Running Example (Koala Habitat Selectivity Study*)

• (goal) aggregate two koala factor tracking lists:
• (input1) factor 1, GPS last-seen locations (reversed order): [152,140,102,58,35]
• (input2) factor 2, tree conditions: [78,55,89,40,33]
• (output) the model computes cumulative factors: [43,75,120,122,143]
koalaFactor :: list -> list -> list

[figure: bar chart of the two input factor lists]
*Synthetic data inspired by Davies, N., Gramotnev, G., Seabrook, L. et al. Movement patterns of an arboreal marsupial at the edge of its range: a case study of the koala. Mov Ecol 1, 8 (2013).
Sample Domain-Specific Language (A DSL for List Processing)

• koalaFactor :: list -> list -> list
• koalaFactor([152,140,102,58,35], [78,55,89,40,33]) = [43,75,120,122,143]
• solution program:
  v1 <- reverse input1
  v2 <- sumUpTo input2
  v3 <- sub v2, v1
• which is: sub(sumUpTo(input2), reverse(input1))

sample DSL:
S -> N | L
N -> 0 | ... | 10 | input1 | input2
L -> input1 | input2 | take(L,N) | drop(L,N) | sort(L) | reverse(L) | add(L,L) | sub(L,L) | sumUpTo(L)
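The semantics suggested by the grammar can be sketched as a small interpreter (a hypothetical Python rendering; operator behavior, e.g. `sumUpTo` computing prefix sums, is inferred from the running example rather than fixed by the slides):

```python
# Minimal interpreter for the sample list-processing DSL (illustrative sketch).
def take(l, n):    return l[:n]
def drop(l, n):    return l[n:]
def sort(l):       return sorted(l)
def reverse(l):    return l[::-1]
def add(l1, l2):   return [a + b for a, b in zip(l1, l2)]  # element-wise sum
def sub(l1, l2):   return [a - b for a, b in zip(l1, l2)]  # element-wise difference

def sumUpTo(l):
    # Prefix sums: sumUpTo([78, 55, ...]) = [78, 133, ...]
    out, acc = [], 0
    for x in l:
        acc += x
        out.append(acc)
    return out

# The solution program from the slide: sub(sumUpTo(input2), reverse(input1))
def koalaFactor(input1, input2):
    return sub(sumUpTo(input2), reverse(input1))
```

On the running example, `koalaFactor([152,140,102,58,35], [78,55,89,40,33])` evaluates to `[43,75,120,122,143]`, matching the specification.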
solution AST: sub at the root, with subtrees sumUpTo(input2) and reverse(input1)
Deductive Approach (Solving koalaFactor in a Variant of the CEGIS* Loop, Step 1)

Decide ⇄ Deduce over a fixed candidate list, derived from the DSL and a knowledge base (KB):

#1: add(sort(input1),take(input2,N))
#2: add(sort(input1),drop(input2,N))
#3: add(sort(input1),sumUpTo(input2))
...
#20: sub(sumUpTo(input2),reverse(input1))
#21: add(sumUpTo(input1),take(input2,N))
...

input1 = [152,140,102, 58, 35]
input2 = [ 78, 55, 89, 40, 33]
output = [ 43, 75,120,122,143]

*CEGIS: Counter-Example-Guided Inductive Synthesis
Deductive Approach (Solving koalaFactor in a Variant of the CEGIS Loop, Steps 2-3)

(Same fixed candidate list and input/output example as Step 1; the Decide/Deduce loop advances through the list, pruning infeasible candidates.)
Deductive Approach (Solving koalaFactor in a Variant of the CEGIS Loop, Step 4)

Pros:
• rich and accurate deduction feedback
• efficient pruning of the search space

Cons:
• incorporating deductive feedback into an existing statistical model is difficult
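The Decide/Deduce interaction above can be sketched as a loop over the fixed candidate list (an illustrative simplification, not NEO's actual engine; `is_feasible` stands in for the deduction engine's pruning check, and candidates are represented as callables):

```python
def deductive_search(candidates, is_feasible, examples):
    """Decide/Deduce sketch: walk a fixed, pre-ranked candidate list,
    letting deduction prune programs that provably violate the spec
    before they are ever executed."""
    for prog in candidates:
        if not is_feasible(prog, examples):   # Deduce: prune without running
            continue
        if all(prog(*inp) == out for inp, out in examples):  # Decide/check
            return prog
    return None
```

A toy usage: with examples `[(([1,2],[3,4]), [4,6])]` and two candidates, element-wise subtraction and element-wise addition, the loop rejects the first and returns the second.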
Statistical Approach (Solving koalaFactor using Data-Driven Machine Learning, Step 1)

Infer ⇄ Check over a candidate list produced by a statistical model (policy π₀) from the specification φ and the DSL:

#1: add(sort(input1),take(input2,N))
#2: add(sort(input1),drop(input2,N))
#3: add(sort(input1),sumUpTo(input2))
...
#20: sub(sumUpTo(input2),reverse(input1))
#21: add(sumUpTo(input1),take(input2,N))
...
Statistical Approach (Solving koalaFactor using Data-Driven Machine Learning, Steps 2-3)

(Same candidate list; the checker's pass/fail result can serve as a reward signal if in a reinforcement learning setting.)
Statistical Approach (Solving koalaFactor using Data-Driven Machine Learning, Step 4)

Pros:
• data-driven candidate list
• can update the policy seamlessly

Cons:
• feedback from the checker is relatively less informative than that from deduction
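In the pure statistical loop, the only learning signal is the checker's scalar pass/fail reward. A toy tabular sketch of such an update (a hypothetical illustration; the deck's actual policy is a neural network, and `reinforce_update` is an assumed name):

```python
import math

def softmax(logits):
    m = max(logits)
    e = [math.exp(x - m) for x in logits]
    s = sum(e)
    return [x / s for x in e]

def reinforce_update(logits, chosen, reward, lr=0.5):
    """One REINFORCE step on a tabular softmax policy over candidate
    programs: the gradient of log p(chosen) is onehot(chosen) - p,
    scaled by the checker's reward (1 = passes all examples, 0 = fails)."""
    p = softmax(logits)
    for j in range(len(logits)):
        logits[j] += lr * reward * ((1.0 if j == chosen else 0.0) - p[j])
    return logits
```

After repeatedly rewarding one candidate, its probability under the policy rises while the others fall, which is all the checker's binary feedback can accomplish per attempt.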
Our Approach: CONCORD (An Overview)

• inputs: specification φ, DSL, initial policy π₀
• loop: Take Action → Deduce → Update Policy
• output: program P
CONCORD: Formalization (Program Synthesis as a Markov Decision Process)

• state: a partial program, i.e. an AST with a working node and a mix of instantiated and uninstantiated nodes; the initial state is the start symbol S, and a final state is a complete AST such as sub(sumUpTo(input2), reverse(input1))
• action: instantiate the working node with a DSL symbol, sampled according to the policy's probabilities (e.g. sub 0.5, sumUpTo 0.3, input2 0.4, reverse 0.2, input1 0.3, ...)
• reward: +1 upon reaching a final state that satisfies the specification
• objective: maximize rewards
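The state/action mechanics can be sketched as rewriting the leftmost nonterminal of a partial program (a simplified illustration with programs as strings; the deck works on ASTs, and the production set below is a truncated subset of the sample DSL):

```python
import re

# Program synthesis as an MDP (sketch): a state is a partial program
# containing nonterminals L/N; an action rewrites the leftmost
# nonterminal with a DSL production; reaching a complete program that
# satisfies the spec yields reward +1.
PRODUCTIONS = {
    "L": ["input1", "input2", "reverse(L)", "sumUpTo(L)", "sub(L,L)"],
    "N": ["0", "1", "2"],
}

def step(state, action):
    """Apply production `action` to the leftmost nonterminal of `state`;
    return the next state and whether it is final (no nonterminals left)."""
    m = re.search(r"[LN]", state)
    assert m is not None and action in PRODUCTIONS[m.group()]
    next_state = state[:m.start()] + action + state[m.end():]
    done = re.search(r"[LN]", next_state) is None
    return next_state, done
```

Starting from state `"L"`, the action sequence `sub(L,L)`, `sumUpTo(L)`, `input2`, `reverse(L)`, `input1` reaches the final state `sub(sumUpTo(input2),reverse(input1))`.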
CONCORD: Running Example (Solving koalaFactor using Deduction-Guided RL, Step 1)

Take Action → Deduce → Update Policy, with the policy π (given the specification φ and the DSL) proposing a candidate list of partial programs:

#1: add(sort(input1),take(L,N))
#2: add(sort(input1),drop(L,N))
#3: add(sort(input1),sumUpTo(L))
...
#20: sub(sumUpTo(input2),reverse(L))
#21: add(sumUpTo(input1),take(L,N))
...
CONCORD: Running Example (Solving koalaFactor using Deduction-Guided RL, Step 2)

Deduce rejects the current partial program, and the Sampler generalizes the rejection into sampled infeasible programs:

add(sort(L),take(L,N))
add(sort(L),drop(L,N))
add(sumUpTo(L),take(L,N))
...
CONCORD: Running Example (Solving koalaFactor using Deduction-Guided RL, Step 3)

Update Policy uses the sampled infeasible programs to reorder the candidate list on the fly.
CONCORD: Running Example (Solving koalaFactor using Deduction-Guided RL, Step 3, cont.)

updated candidate list:
#1: sub(sumUpTo(input2),reverse(L))
#2: add(sort(input1),sumUpTo(L))
...
#24: add(sumUpTo(input1),take(L,N))
#25: add(sort(input1),take(L,N))
#26: add(sort(input1),drop(L,N))
...
CONCORD: Running Example (Solving koalaFactor using Deduction-Guided RL, Step 4)

updated candidate list (fully instantiated):
#1: sub(sumUpTo(input2),reverse(input1))
#2: add(sort(input1),sumUpTo(input2))
...
#24: add(sumUpTo(input1),take(input2,N))
#25: add(sort(input1),take(input2,N))
#26: add(sort(input1),drop(input2,N))
...

What's the difference?
• feedback from deduction flows seamlessly to the policy update
• it not only prunes the search space, but also promotes good candidates
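The promote-and-demote effect can be sketched with the same tabular policy idea (a toy illustration; the real update is the importance-weighted off-policy gradient formalized later in the deck, and `apply_deduction_feedback` is an assumed name):

```python
import math

def softmax(logits):
    m = max(logits)
    e = [math.exp(x - m) for x in logits]
    s = sum(e)
    return [x / s for x in e]

def apply_deduction_feedback(logits, infeasible_ids, lr=0.5):
    """Demote every sampled infeasible program at once (negative reward),
    which implicitly promotes the remaining candidates: deduction yields
    a whole batch of learning signal, not a single pass/fail bit."""
    for i in infeasible_ids:
        p = softmax(logits)
        for j in range(len(logits)):
            logits[j] += lr * (-1.0) * ((1.0 if j == i else 0.0) - p[j])
    return logits
```

Demoting candidates 1 and 2 out of four leaves candidates 0 and 3 with higher probability, mirroring how the candidate list reorders between Steps 2 and 3.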
CONCORD: Synthesis Algorithm

Deduce: the Deduction Engine checks the current partial program, e.g. add(sort(input1),take(L,N)). If it is feasible, the engine returns the empty set ∅; otherwise the Infeasible Sampler produces sampled infeasible programs:

add(sort(L),take(L,N))
add(sort(L),drop(L,N))
add(sumUpTo(L),take(L,N))
...
CONCORD: Synthesis Algorithm, cont.

The Deduction Engine generalizes an infeasible partial program into logical constraints, e.g.:

?(?(L),take(L,N))
?(?(L),drop(L,N))
...

from which the Infeasible Sampler enumerates further infeasible programs.
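The Sampler's generalization step can be sketched as enumerating programs that match the infeasibility pattern (a hypothetical string-template rendering; the real engine derives and solves logical constraints rather than filling string holes):

```python
import itertools

def sample_infeasible(roots, lefts, rights):
    """Enumerate programs matching an infeasibility pattern such as
    ?(?(L),take(L,N)): every combination of the blamed operators is
    also infeasible and can be fed back as negative training signal."""
    return [f"{r}({l}(L),{x}(L,N))"
            for r, l, x in itertools.product(roots, lefts, rights)]
```

For instance, instantiating the pattern with root `add`, left operators `sort`/`sumUpTo`, and right operators `take`/`drop` reproduces the sampled infeasible programs shown on the earlier slides.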
CONCORD: Synthesis Algorithm (Deduction-Guided Reinforcement Learning)

Take Action → Deduce → Update Policy: the policy update performs off-policy sampling, weighting future rewards by importance weights that account for the mismatch between the program distribution of the policy and the program distribution of the Sampler.
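Since the sampled programs come from the Sampler's distribution rather than the policy's, the gradient must be corrected by importance weights. A hedged sketch of the weighted objective, with assumed notation (rollout ρ, policy π_θ, sampler distribution q, return R):

```latex
% Off-policy policy gradient with importance weighting (sketch):
% rollouts \rho are drawn from the Sampler's distribution q, not from \pi_\theta,
% so each gradient term is reweighted by w(\rho).
\nabla_\theta J(\theta) \;\approx\;
  \mathbb{E}_{\rho \sim q}\!\left[\, w(\rho)\, R(\rho)\,
  \nabla_\theta \log p_{\pi_\theta}(\rho) \right],
\qquad
w(\rho) \;=\; \frac{p_{\pi_\theta}(\rho)}{q(\rho)}
```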
Evaluations (Research Questions and Experiment Settings)

• Research Questions:
  • How does Concord compare against existing synthesis tools?
  • How effective is the off-policy RL algorithm compared to standard policy gradient?
• Experiment Settings:
  • Deduction engine: NEO's (Feng et al. 2018) deduction engine
  • Policy: Gated Recurrent Unit (GRU)
  • Benchmarks: the DEEPCODER benchmarks used in NEO; 100 challenging list-processing problems
  • Comparison between: NEO (Feng et al. 2018) and DEEPCODER (Balog et al. 2017)

[figure: the architecture of the policy network used]
Evaluations (Experiment Results and Analysis)

tool      | solved | time
CONCORD   | 82%    | 36s
NEO       | 71%    | 99s
DEEPCODER | 32%    | 205s

tool                  | solved | speedup over NEO
CONCORD               | 82%    | 8.71x
CONCORD (Standard PG) | 65%    | 2.88x
Take-Away Message
• Concord tightly couples statistical and deductive reasoning based on reinforcement learning.
• The off-policy reinforcement learning technique is effective.

Thank you! Questions?