Projective Simulation · Ion Implementation

Transcript of: Quantum-enhanced deliberation of learning agents using trapped ions

Quaint-EPSRC-FQXi Workshop on Quantum Cybernetics and Control
University of Nottingham, January 20th, 2015

Nicolai Friis
Institute for Theoretical Physics, University of Innsbruck
Work in collaboration with Vedran Dunjko and Hans J. Briegel

Nicolai Friis (ITP Innsbruck) · Quantum-enhanced deliberation of learning agents
Outline

Projective Simulation
• Projective simulation (PS) agents
• Standard PS vs. reflecting PS (RPS)
• The RPS deliberation process

Quantum-enhanced deliberation using trapped ions
• Flexible implementations — coherent controlization
• Trapped ion setups
• Coherent controlization using trapped ions
Projective Simulation and its variants · Quantum-enhanced deliberation

Projective Simulation (PS) Agents [1]

Agent may be...
• embodied
• autonomous
• active/passive

[Figure: the PS agent receives percepts from the environment through sensors and acts back on it through actuators; between input and output sits the episodic & compositional memory.]

Decision-making: input (percept) → deliberation → output (action)

Episodic & compositional memory: experience stored as clips

PS agents: deliberation process based on a random walk in the clip space

[Figure: example clip network with four clips C1–C4, connected by transition probabilities p_ij.]

Represented by a Markov chain with transition matrix P = [p_ij]

Different types of deliberation processes, e.g., standard PS & reflecting PS

[1] Briegel, De las Cuevas, Sci. Rep. 2, 400 (2012) [arXiv:1104.3787].
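The deliberation loop just described — a random walk over the clip network until an action clip is hit — can be sketched classically. The 4-clip network below, with clip 3 as the only action clip, uses hypothetical toy transition probabilities, not values from the talk.

```python
import random

# Toy clip network (hypothetical values): clips 0-2 are percept/internal
# clips, clip 3 is an action clip. P[i][j] = probability of hopping from
# clip i to clip j; each row sums to 1.
P = [
    [0.1, 0.5, 0.2, 0.2],
    [0.3, 0.1, 0.3, 0.3],
    [0.2, 0.4, 0.1, 0.3],
]
ACTION_CLIPS = {3}

def deliberate(start_clip):
    """Random walk through the clip space until an action clip is reached."""
    clip = start_clip
    while clip not in ACTION_CLIPS:
        clip = random.choices(range(4), weights=P[clip])[0]
    return clip  # the action coupled out by the agent

action = deliberate(start_clip=0)  # the percept excites, say, clip 0
```

Learning, in the PS scheme, amounts to updating the entries of P in response to rewards; the walk itself stays this simple.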
Variants of PS: Standard & Reflecting PS (RPS)

Standard PS
One clip network ⇔ Markov chain
Percept associated to clip i registered: random walk starting from clip i until an action is obtained → output

Quantization
clips: state vectors |0⟩, |1⟩, |2⟩, ...
associate a unitary U_i to each clip, s.t.
U_i |0⟩ = Σ_j √p_ji |j⟩
measure in the clip basis, result |k⟩:
if k is an action → output
if not, apply U_k to |0⟩, repeat

Reflecting PS [2]
Desired output according to a distribution π specific to each percept
Separate clip network for every percept
Focus on one percept with transition matrix P, s.t. Pπ = π

Quantization
two registers I/II: {|i⟩}_I/II
use the U_i to construct operators U_P, V_P:
U_P |i⟩_I |0⟩_II = |i⟩_I U_i|0⟩_II
V_P |0⟩_I |i⟩_II = U_i|0⟩_I |i⟩_II
Encode π: |π′⟩ = Σ_i √π_i |i⟩_I U_i|0⟩_II
Szegedy-walk operator [3] W(P): W(P)|π′⟩ = |π′⟩

[2] Paparo, Dunjko, Makmal, Martín-Delgado, Briegel, Phys. Rev. X 4, 031002 (2014) [arXiv:1401.4997].
[3] M. Szegedy, Proc. 45th IEEE Symp. Found. Comp. Sc., 32–41 (2004).
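The RPS quantization can be checked numerically. The sketch below makes two assumptions beyond the slide: the chain is a hypothetical 3-state reversible (Metropolis) chain, and, following Szegedy's construction, W(P) is taken as the product of the reflections about the images of the two |0⟩ subspaces under U_P and V_P. Each U_i is any unitary whose first column equals (√p_1i, √p_2i, ...); the final assertion verifies the slide's relation W(P)|π′⟩ = |π′⟩.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
pi = np.array([0.5, 0.3, 0.2])               # stationary distribution

# Hypothetical reversible Metropolis chain with P @ pi == pi;
# P[j, i] = p_ji = probability of hopping i -> j (columns sum to 1).
P = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            P[j, i] = min(1.0, pi[j] / pi[i]) / (n - 1)
    P[i, i] = 1.0 - P[:, i].sum()
assert np.allclose(P @ pi, pi)

def clip_unitary(i):
    """Any unitary whose first column is U_i|0> = sum_j sqrt(p_ji)|j>."""
    first = np.sqrt(P[:, i])                 # unit vector (column sums to 1)
    M = np.column_stack([first, rng.normal(size=(n, n - 1))])
    Q, R = np.linalg.qr(M)
    Q[:, 0] *= np.sign(R[0, 0])              # fix QR sign so Q[:, 0] == first
    return Q

Us = [clip_unitary(i) for i in range(n)]
ket = np.eye(n)

# U_P |i>_I |0>_II = |i>_I U_i|0>_II   and   V_P |0>_I |i>_II = U_i|0>_I |i>_II
U_P = sum(np.kron(np.outer(ket[i], ket[i]), Us[i]) for i in range(n))
V_P = sum(np.kron(Us[i], np.outer(ket[i], ket[i])) for i in range(n))

Z0 = 2 * np.outer(ket[0], ket[0]) - np.eye(n)    # reflection about |0>
ref_A = U_P @ np.kron(np.eye(n), Z0) @ U_P.T     # reflect about span{|i> U_i|0>}
ref_B = V_P @ np.kron(Z0, np.eye(n)) @ V_P.T     # reflect about span{U_i|0> |i>}
W = ref_B @ ref_A                                # Szegedy-walk operator W(P)

# |pi'> = sum_i sqrt(pi_i) |i>_I U_i|0>_II is a fixed point of W(P)
pi_prime = sum(np.sqrt(pi[i]) * np.kron(ket[i], Us[i][:, 0]) for i in range(n))
assert np.allclose(W @ pi_prime, pi_prime)
```

The fixed-point property relies on reversibility: detailed balance makes the coefficients √(π_i p_ji) symmetric in i, j, so |π′⟩ lies in both reflection subspaces.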
The RPS Decision-Making Process

RPS agent: output actions according to the tail of the stationary distribution π
Achieved by a Grover-like procedure! [2]

Recall Grover's algorithm [4]
Search a set S for marked items from a subset M ⊂ S
(0) Prepare |ψ⟩: uniform superposition of basis states representing all items
(1) Reflection over the marked items |φ⟩
(2) Reflection over |ψ⟩
Repeat steps (1) and (2)

[Figure: the two reflections rotate the state from |ψ⟩ toward |φ⟩ in their common plane.]

Note: every marked item is found with equal probability

[2] Paparo, Dunjko, Makmal, Martín-Delgado, Briegel, Phys. Rev. X 4, 031002 (2014) [arXiv:1401.4997].
[4] Grover, Proc. 28th ACM Symp. Theory of Computing (STOC), 212–219 (1996) [arXiv:quant-ph/9605043].
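Steps (0)–(2) can be sketched numerically. The toy example below (N = 4 items with one marked item; the choice of marked index is hypothetical) builds the oracle as the reflection over the marked item and the diffusion operator as the reflection over |ψ⟩; for N = 4 a single iteration already rotates the state exactly onto the marked item.

```python
import numpy as np

# Toy Grover search: N = 4 items, marked set M = {2} (hypothetical choice).
N, marked = 4, {2}

psi = np.full(N, 1 / np.sqrt(N))                 # (0) uniform superposition
oracle = np.eye(N)
for m in marked:
    oracle[m, m] = -1                            # (1) reflection over marked items
diffusion = 2 * np.outer(psi, psi) - np.eye(N)   # (2) reflection over |psi>

state = diffusion @ (oracle @ psi)               # one Grover iteration
probs = state ** 2                               # measurement probabilities
# probs -> [0, 0, 1, 0]: the marked item is found with certainty
```

With several marked items the two reflections act symmetrically on the marked subspace, so the marked amplitudes stay equal — the slide's point that every marked item is found with equal probability.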
Nicolai Friis (ITP Innsbruck) Quantum-enhanced deliberation of learning agents
Projective SimulationIon Implementation
Projective Simulation and its variantsQuantum-enhanced deliberation
The RPS Decision-Making Process
RPS agent: output actions according to tail of stationary distribution π
Achieved by Grover-like procedure!2
Recall Grover’s algorithm4
Search a set S for marked itemsfrom a subset M⊂ S
(0) Prepare |ψ 〉: uniformsuperposition of basis statesrepresenting all items
(1) Reflection over marked items |φ 〉
(2) Reflection over |ψ 〉
Repeat steps (1) and (2)
|ϕ ⟩
|ψ ⟩
Note: every item found equally likely2 Paparo, Dunjko, Makmal, Martın-Delgado, Briegel, Phys. Rev. X 4, 031002 (2014) [arXiv:1401.4997].4 Grover, Proc. 28th ACM Symp. Theory Comput. (STOC), 212–219 (1996) [arXiv:quant-ph/9605043].
Nicolai Friis (ITP Innsbruck) Quantum-enhanced deliberation of learning agents
Projective SimulationIon Implementation
Projective Simulation and its variantsQuantum-enhanced deliberation
The RPS Decision-Making Process
RPS agent: output actions according to tail of stationary distribution π
Achieved by Grover-like procedure!2
Recall Grover’s algorithm4
Search a set S for marked itemsfrom a subset M⊂ S
(0) Prepare |ψ 〉: uniformsuperposition of basis statesrepresenting all items
(1) Reflection over marked items |φ 〉
(2) Reflection over |ψ 〉
Repeat steps (1) and (2)
|ϕ ⟩
|ψ ⟩
Note: every item found equally likely2 Paparo, Dunjko, Makmal, Martın-Delgado, Briegel, Phys. Rev. X 4, 031002 (2014) [arXiv:1401.4997].4 Grover, Proc. 28th ACM Symp. Theory Comput. (STOC), 212–219 (1996) [arXiv:quant-ph/9605043].
Nicolai Friis (ITP Innsbruck) Quantum-enhanced deliberation of learning agents
Projective SimulationIon Implementation
Projective Simulation and its variantsQuantum-enhanced deliberation
The RPS Decision-Making Process
RPS agent: output actions according to tail of stationary distribution π
Achieved by Grover-like procedure!2
Recall Grover’s algorithm4
Search a set S for marked itemsfrom a subset M⊂ S
(0) Prepare |ψ 〉: uniformsuperposition of basis statesrepresenting all items
(1) Reflection over marked items |φ 〉
(2) Reflection over |ψ 〉
Repeat steps (1) and (2)
|ϕ ⟩
|ψ ⟩
Note: every item found equally likely2 Paparo, Dunjko, Makmal, Martın-Delgado, Briegel, Phys. Rev. X 4, 031002 (2014) [arXiv:1401.4997].4 Grover, Proc. 28th ACM Symp. Theory Comput. (STOC), 212–219 (1996) [arXiv:quant-ph/9605043].
Nicolai Friis (ITP Innsbruck) Quantum-enhanced deliberation of learning agents
Projective SimulationIon Implementation
Projective Simulation and its variantsQuantum-enhanced deliberation
The RPS Decision-Making Process
RPS agent: output actions according to tail of stationary distribution π
Achieved by Grover-like procedure!2
Recall Grover’s algorithm4
Search a set S for marked itemsfrom a subset M⊂ S
(0) Prepare |ψ 〉: uniformsuperposition of basis statesrepresenting all items
(1) Reflection over marked items |φ 〉
(2) Reflection over |ψ 〉
Repeat steps (1) and (2)
|ϕ ⟩
|ψ ⟩
Note: every item found equally likely2 Paparo, Dunjko, Makmal, Martın-Delgado, Briegel, Phys. Rev. X 4, 031002 (2014) [arXiv:1401.4997].4 Grover, Proc. 28th ACM Symp. Theory Comput. (STOC), 212–219 (1996) [arXiv:quant-ph/9605043].
Nicolai Friis (ITP Innsbruck) Quantum-enhanced deliberation of learning agents
RPS: Grover-like procedure
(0) Prepare |π′⟩: representation of the stationary distribution
(1) Reflection over actions
(2) Approximate [5] reflection over |π′⟩
Repeat steps (1) and (2)

Approximate reflection: Ui → UP, VP → W(P) → PD(W) → ARO
[Circuit: the approximate reflection operator built from UP, VP via the walk operator W(P) and phase detection PD(W) using controlled powers of W on auxiliary qubits]

Sample, and repeat if no action is found.

Quantum RPS decides quadratically faster than classical RPS — speed-up! [2]
[2] Paparo, Dunjko, Makmal, Martín-Delgado, Briegel, Phys. Rev. X 4, 031002 (2014) [arXiv:1401.4997].
[5] Magniez, Nayak, Roland, Santha, SIAM J. Comput. 40, 142 (2011) [arXiv:quant-ph/0608026].
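The classical baseline being quadratically outperformed can be sketched as follows (a toy illustration, not from the talk): mix a Markov chain to its stationary distribution π, then sample until an action clip is hit. The transition matrix, clip count, and action set are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy clip network: column-stochastic transition matrix P over 6 clips,
# of which the last two are "action" clips (the tail of pi we sample from).
P = rng.random((6, 6))
P /= P.sum(axis=0, keepdims=True)
actions = [4, 5]

# Stationary distribution pi satisfies P pi = pi (power iteration = classical mixing).
pi = np.full(6, 1 / 6)
for _ in range(1000):
    pi = P @ pi
pi /= pi.sum()

# Classical RPS deliberation: sample from pi, repeat if no action is found.
def deliberate():
    while True:
        clip = rng.choice(6, p=pi)
        if clip in actions:
            return int(clip)

# With eps = total stationary weight on actions, classical deliberation needs
# ~1/eps samples on average; the quantum RPS needs ~1/sqrt(eps) steps.
eps = pi[actions].sum()
draws = [deliberate() for _ in range(2000)]
assert set(draws) <= set(actions)
```

The `1/eps` vs `1/sqrt(eps)` comparison is exactly the quadratic speed-up claimed above, inherited from the Grover-like amplitude amplification.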
Coherent Controlization using Trapped Ions

Flexible Implementation — Coherent Controlization

Key element: a flexible (updateable) implementation without prior knowledge of π
First step: construction of the unitaries Ui, whose first column has real, positive entries √p_ji

For two clips:
U(θ) = exp(−i θY/2) = ( cos(θ/2)  −sin(θ/2) )
                      ( sin(θ/2)   cos(θ/2) )

Extend by a nested scheme of adding control [7] (see also [8, 9, 10]), e.g., for (up to) 8 clips:
[Circuit: binary tree of nested controls — U(θ1) on the first qubit, controlled U(θ2) and U(θ3) on the second, controlled U(θ4)–U(θ7) on the third]

Remaining algorithm: some fixed operations + adding control — non-trivial, but possible [10]

[7] Dunjko, Friis, Briegel, New J. Phys. (accepted, 2015) [arXiv:1407.2830].
[8] Araújo, Feix, Costa, Brukner, New J. Phys. 16, 093026 (2014) [arXiv:1309.7976].
[9] Thompson, Gu, Modi, Vedral, arXiv:1310.2927 [quant-ph] (2013).
[10] Friis, Dunjko, Dür, Briegel, Phys. Rev. A 89, 030303(R) (2014) [arXiv:1401.8128].
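The effect of the nested tree of rotations U(θ) can be checked classically: each level splits the remaining probability mass between two halves of the register, so the composed first column carries amplitudes √p_j. A minimal NumPy sketch (probabilities and function names are illustrative, and only the resulting amplitudes are modeled, not the ion-trap gates):

```python
import numpy as np

def ry(theta):
    """U(theta) = exp(-i*theta*Y/2): real rotation with entries cos, +/- sin of theta/2."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def amplitudes_from_probs(p):
    """Amplitudes sqrt(p_j) via the nested (binary-tree) controlled-rotation scheme."""
    p = np.asarray(p, dtype=float)
    if len(p) == 1:
        return np.array([1.0])
    half = len(p) // 2
    p0, p1 = p[:half].sum(), p[half:].sum()
    # ry(theta) applied to [1, 0] gives [sqrt(p0), sqrt(p1)]:
    theta = 2 * np.arctan2(np.sqrt(p1), np.sqrt(p0))
    top = ry(theta) @ np.array([1.0, 0.0])
    left = amplitudes_from_probs(p[:half] / p0) if p0 > 0 else np.zeros(half)
    right = amplitudes_from_probs(p[half:] / p1) if p1 > 0 else np.zeros(len(p) - half)
    return np.concatenate([top[0] * left, top[1] * right])

p = np.array([0.1, 0.2, 0.3, 0.05, 0.05, 0.1, 0.15, 0.05])  # 8 "clips"
amps = amplitudes_from_probs(p)
assert np.allclose(amps**2, p)
```

For 8 clips this uses exactly the seven angles θ1–θ7 of the tree above: one at the root and two per subsequent level, each conditioned on the branch taken so far.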
Ion Trap Setups

[Figure, panels (a)–(c): (a) from Blatt & Wineland, Entangled states of trapped atomic ions, Nature 453, 1008–1015 (2008); (b) Paul trap currently used by the Blatt group (Innsbruck), image courtesy of B. Lanyon; (c) Paul trap used in the Wunderlich group (Siegen).]

String of, e.g., 40Ca+ ions in a quadrupole (Paul) trap
• Electric fields applied by blades (rapidly oscillating) & tips (static)
  ⇒ nearly harmonic 3D trapping potential
• Ions individually addressable via position/frequency (depending on setup)
• Coulomb coupling ⇒ quantum bus (e.g., via vibrational modes)
Projective Simulation | Ion Implementation
Coherent Controlization using Trapped Ions

Scheme to Add Control to Unknown Unitaries [7]

Initial state: (α|g⟩₁ + β|e⟩₁)|ψ⟩₂, vibrational mode in |0⟩

(i) Cirac-Zoller method [11]: laser pulse on ion 1: |g⟩₁|0⟩ → |e⟩₁|1⟩
    (α|g⟩₁ + β|e⟩₁)|ψ⟩₂|0⟩ → |e⟩₁|ψ⟩₂(α|1⟩ + β|0⟩)

(ii) Hiding pulses: laser pulses on ion 2: |g⟩₂|1⟩ → |g′⟩₂|0⟩ (similarly |e⟩₂|1⟩ → |e′⟩₂|0⟩)
    |e⟩₁|ψ⟩₂(α|1⟩ + β|0⟩) → |e⟩₁(α|ψ′⟩₂ + β|ψ⟩₂)|0⟩

(iii) Unknown unitary U: laser pulse on ion 2: |ψ⟩₂ → U|ψ⟩₂ (hidden states unaffected)
    |e⟩₁(α|ψ′⟩₂ + β|ψ⟩₂)|0⟩ → |e⟩₁(α|ψ′⟩₂ + β U|ψ⟩₂)|0⟩

(iv) Undo hiding pulses: laser pulses on ion 2: |g′⟩₂|0⟩ → |g⟩₂|1⟩
    |e⟩₁(α|ψ′⟩₂ + β U|ψ⟩₂)|0⟩ → |e⟩₁(α|ψ⟩₂|1⟩ + β U|ψ⟩₂|0⟩)

(v) Switch back control: Cirac-Zoller pulse [11] on ion 1: |e⟩₁|1⟩ → |g⟩₁|0⟩
    |e⟩₁(α|ψ⟩₂|1⟩ + β U|ψ⟩₂|0⟩) → (α|g⟩₁|ψ⟩₂ + β|e⟩₁ U|ψ⟩₂)|0⟩

[Level diagrams: ion 1 states |g⟩₁, |e⟩₁ and ion 2 states |g⟩₂, |e⟩₂, |g′⟩₂, |e′⟩₂, each paired with vibrational modes |0⟩ and |1⟩; the pulses H1, H2, Sg, Se, and U connect these levels.]

[10] Friis, Dunjko, Dür, Briegel, Phys. Rev. A 89, 030303(R) (2014) [arXiv:1401.8128].
[11] Cirac, Zoller, Phys. Rev. Lett. 74, 4091 (1995).
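The five steps can be verified with a small state-vector simulation. The sketch below is an idealized model (pulses treated as exact basis-state permutations, laser-pulse phases ignored) on ion 1 ⊗ ion 2 ⊗ phonon, with ion 2 given four levels (g, e, g′, e′); all names are illustrative. Composing steps (i)–(v) should yield a controlled-U with ion 1 as control (active on |e⟩₁).

```python
import numpy as np

# Hilbert space: ion1 (g,e) x ion2 (g,e,g',e') x phonon (0,1); total dim 16.
D1, D2, DP = 2, 4, 2

def idx(i1, i2, n):
    return (i1 * D2 + i2) * DP + n

def permutation(pairs):
    """Unitary swapping the listed basis-state pairs, identity elsewhere."""
    P = np.eye(D1 * D2 * DP)
    for a, b in pairs:
        P[[a, b]] = P[[b, a]]
    return P

# Steps (i)/(v): Cirac-Zoller pulse on ion 1: |g>_1|0> <-> |e>_1|1>.
cz1 = permutation([(idx(0, i2, 0), idx(1, i2, 1)) for i2 in range(D2)])

# Steps (ii)/(iv): hiding pulses on ion 2: |g>_2|1> <-> |g'>_2|0>, |e>_2|1> <-> |e'>_2|0>.
hide = permutation([(idx(i1, 0, 1), idx(i1, 2, 0)) for i1 in range(D1)] +
                   [(idx(i1, 1, 1), idx(i1, 3, 0)) for i1 in range(D1)])

def apply_unknown(U):
    """Step (iii): U acts only on ion 2's {g,e} subspace; hidden levels untouched."""
    U2 = np.eye(D2, dtype=complex)
    U2[:2, :2] = U
    return np.kron(np.eye(D1), np.kron(U2, np.eye(DP)))

def controlled_u(U):
    """Compose steps (i)-(v), rightmost factor applied first."""
    return cz1 @ hide @ apply_unknown(U) @ hide @ cz1

# Check against a random "unknown" single-qubit unitary.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)))

alpha, beta = 0.6, 0.8
psi = np.array([1, 1j]) / np.sqrt(2)                       # ion-2 state in {g,e}
ion2 = np.zeros(D2, dtype=complex); ion2[:2] = psi
phonon0 = np.array([1.0, 0.0])
state = np.kron(np.array([alpha, beta]), np.kron(ion2, phonon0))

out = controlled_u(U) @ state

# Expected: alpha |g>_1 |psi>_2 |0> + beta |e>_1 U|psi>_2 |0>.
ion2_u = np.zeros(D2, dtype=complex); ion2_u[:2] = U @ psi
expected = (alpha * np.kron(np.array([1, 0]), np.kron(ion2, phonon0))
            + beta * np.kron(np.array([0, 1]), np.kron(ion2_u, phonon0)))
assert np.allclose(out, expected)
```

The key design point, as in the scheme above, is that U is applied blindly in step (iii): only the hiding pulses decide which amplitude it acts on, so no knowledge of U is needed to add the control.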
Summary & Conclusion

• (R)PS: decision-making processes for autonomous learning agents
• Speed-up: quantum RPS decides quadratically faster than classical RPS [2]
• Flexible (learning! growth!) implementation using coherent controlization [7]
• All elements of the algorithm implementable in ion traps [7]
• Feasible implementation: detailed simulations including errors for a simple scenario [7]
[Plot: a deviation measure D plotted against σ, for σ between π/8 and π/2, with values between 0.0 and 0.4]

[2] Paparo, Dunjko, Makmal, Martín-Delgado, Briegel, Phys. Rev. X 4, 031002 (2014) [arXiv:1401.4997].
[7] Dunjko, Friis, Briegel, New J. Phys. (accepted, 2015) [arXiv:1407.2830].
Thank you for your attention.