
Manipulating Soft Tissues by Deep Reinforcement Learning for Autonomous Robotic Surgery

Ngoc Duy Nguyen, Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Geelong, Victoria, Australia ([email protected])
Thanh Nguyen, Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Geelong, Victoria, Australia ([email protected])
Saeid Nahavandi, Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Geelong, Victoria, Australia ([email protected])
Asim Bhatti, Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Geelong, Victoria, Australia ([email protected])
Glenn Guest, Faculty of Health, Deakin University, Geelong, Victoria, Australia ([email protected])

Abstract—In robotic surgery, pattern cutting through a deformable material is a challenging research field. The cutting procedure requires a robot to concurrently manipulate a scissor and a gripper to cut through a predefined contour trajectory on the deformable sheet. The gripper ensures the cutting accuracy by pinning a point on the sheet and continuously tensioning the pinch point in different directions while the scissor is in action. The goal is to find a pinch point and a corresponding tensioning policy that minimize damage to the material and increase the cutting accuracy, measured by the symmetric difference between the predefined contour and the cut contour. Previous work considers one fixed pinch point during the course of cutting, which is inaccurate and unsafe when the contour trajectory is complex. In this paper, we examine the soft tissue cutting task using multiple pinch points, which imitates human operations while cutting. This approach, however, does not require the use of a multi-gripper robot. We use a deep reinforcement learning algorithm to find an optimal tensioning policy for a pinch point. Simulation results show that the multi-point approach outperforms the state-of-the-art method in the soft pattern cutting task with respect to both accuracy and reliability.

Index Terms—pattern cutting, soft tissue, deep learning, reinforcement learning, tensioning, surgical robotics

I. INTRODUCTION

In robotic surgery, manipulation of a deformable sheet, especially cutting through a predefined contour trajectory, is a critical task that has attracted significant research interest [1]–[5]. The pattern cutting task is one of the Fundamental Skills of Robotic Surgery (FSRS) because it minimizes surgeon errors, operation time, trauma, and expenses [6]–[8]. Furthermore, the deformable material is usually soft and elastic, which makes it intractable to perform a cutting procedure accurately [9]. Therefore, it is necessary to use a gripper [10], [11], which holds a point (pinch point) on the sheet and tensions it along an allowable set of directions with a reasonable force while a surgical scissor is used to cut through a contour trajectory, as shown in Fig. 1.

In other words, the pattern cutting task involves two essential steps: 1) selecting a pinch point and 2) finding a tensioning policy from that pinch point. Previous work considers a single pinch point over the course of cutting, which is only efficient when the contour shape is simple. Conversely, it is more appropriate to divide a complicated contour into different segments. In this case, the use of one pinch point is unsafe and significantly reduces cutting accuracy [12].

In this paper, we examine a multi-point approach and compare its accuracy with that of the single-point counterpart. Because the robot has a single gripper, only one pinch point is used for tensioning and the others are pinned permanently in the setup phase. Finally, we use a deep reinforcement learning algorithm, namely Trust Region Policy Optimization (TRPO) [13], as in [12], to seek the optimal tensioning policy from a pinch point. The tensioning policy determines the tensioning direction based on the current state of the sheet, contour information, and cutting position.

In this study, we use the simulator described in [12] to evaluate the accuracy and reliability of the proposed approach and compare its performance with that of the state-of-the-art method [12]. For conciseness, we call the method described in [12] the Single-point Deep Reinforcement Learning Tensioning (SDRLT) method. Furthermore, the simulator has practical relevance because learned tensioning policies can be reevaluated on a well-known physical surgical system, the da Vinci Research Kit (dVRK) [14]. Simulation results show that the multi-point approach outperforms the SDRLT method and achieves an average of 55% better accuracy than the non-tensioned baseline over a set of 14 multi-segment contours.

Finally, the paper has the following contributions:

1) This work provides the first study of using multiple pinch points in the pattern cutting task. The study shows the benefits of using multiple pinch points, particularly for complicated contours where a scissor is instructed to cut multiple segments to complete the whole contour trajectory.



Fig. 1: A surgical pattern cutting task. The scissor cuts along the contour trajectory on the deformable material while the gripper tensions a pinch point along a tensioning direction (axes x, y, z).


2) The proposed scheme outperforms the state-of-the-art method in the pattern cutting task and provides a premise for a significant number of research extensions, such as the use of multiple grippers, multiple scissors, and multi-layer pattern cutting in a 3D environment.

3) The multi-point approach imitates human demonstrations while cutting. Therefore, the proposed scheme is useful for both practical application and theoretical analysis. Finally, the proposed scheme achieves high accuracy and reliability in the pattern cutting task.

The rest of the paper is organized as follows. The next section reviews recent advances in surgical automation and the benefits of using reinforcement learning in surgical tasks. Section III presents a preliminary background of the pattern cutting task and introduces our proposed scheme. Section IV discusses the experimental results of the proposed scheme, and Section V concludes the paper.

II. RELATED WORK

The use of robot-assisted surgery allows doctors to automate a complicated task with minimal errors and effort. A number of automation levels have been discussed extensively in the literature [15]–[21]. Specifically, early approaches required the existence of experts to create a model trajectory that is used to teach a robot to automatically complete a designated task. For example, Schulman et al. [22] use a trajectory transfer algorithm to transform human demonstrations into model trajectories. These trajectories are updated to adapt to new environment geometry and hence assist a robot in learning the task of suturing. Recently, Osa et al. [23] proposed a framework for online trajectory planning. The framework uses a statistical method to model the distribution of demonstrated trajectories.

Fig. 2: A deformable sheet is represented by a mesh of point masses spaced by $d_x$ horizontally and $d_y$ vertically. The left figure illustrates a pinch point without tensioning; the right figure illustrates a pinch point tensioned along the vertical direction.

As a result, it is possible to transfer the trajectories to a dynamic environment based on the conditional distribution.

Reinforcement learning (RL) has become a promising approach to modeling an autonomous agent [24]–[28]. RL has the ability to mimic human learning behavior to maximize the long-term reward. As a result, RL enables a robot to learn on its own and partially eliminates the need for experts. Examples of such agents are the box-pushing robot [29], pole-balancing humanoid [30], helicopter control [31], soccer-playing agent [32], and table tennis playing agent [33]. Furthermore, RL has been utilized in a significant body of research on surgical tasks [34], [35]. For example, Chen et al. [36] propose a data-driven workflow that combines Programming by Demonstration [37] with RL. The workflow encodes the inverse kinematics using trajectories from human demonstrations. Finally, RL is used to minimize the noise and adapt the learned inverse kinematics to the online environment.

A recent breakthrough [38] combines neural networks with RL (deep RL), which enables traditional RL methods to work properly in high-dimensional environments. Notably, Thananjeyan et al. [12] use a deep RL algorithm (TRPO) to learn the tensioning policy from a pinch point in the pattern cutting task. The tensioning problem is described as a Markov Decision Process [39] where each action moves the gripper 1 mm along one of four directions in the 2D space. A state of the environment is a state of the deformable sheet, which is represented by a rectangular mesh of point masses. The goal is to minimize the symmetric difference between the ideal contour and the achieved contour cut. This approach, however, uses a single pinch point to complete the pattern cutting task. In this paper, we examine the use of multiple pinch points over the course of cutting. Finally, we compare the cutting accuracy of our proposed scheme with two baseline methods described in [12]: 1) the non-tensioned scheme and 2) SDRLT. Section III and Section IV present the proposed scheme in more detail.
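As a concrete illustration of this formulation, the sketch below assumes the ideal and achieved contours are rasterized to boolean occupancy masks and scores a finished cut by their symmetric difference; the `ACTIONS` table and the mask representation are illustrative assumptions on our part, not the actual data structures of the simulator in [12].

```python
import numpy as np

# Four planar tensioning actions; each moves the gripper 1 mm (illustrative).
ACTIONS = {0: (1.0, 0.0), 1: (-1.0, 0.0), 2: (0.0, 1.0), 3: (0.0, -1.0)}

def symmetric_difference(ideal_mask, cut_mask):
    """Score of a finished cut: cells inside exactly one of the two contours."""
    return int(np.logical_xor(ideal_mask, cut_mask).sum())

# Toy example: two 5x5 occupancy grids that disagree in three cells.
ideal = np.zeros((5, 5), dtype=bool)
ideal[1:4, 1:4] = True
cut = ideal.copy()
cut[1, 1] = False    # a cell the scissor missed
cut[3, 3] = False    # another missed cell
cut[0, 2] = True     # a cell cut outside the ideal contour
print(symmetric_difference(ideal, cut))  # prints 3; lower is better
```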

III. PROPOSED SCHEME

A. Preliminary

As mentioned earlier, the tensioning problem is described as a Markov Decision Process where a state of the environment is represented by a mesh of $N$ point masses. Initially, these points are aligned evenly in the horizontal direction by a distance $d_x$ and in the vertical direction by a distance $d_y$ (in the 2D space), as illustrated in Fig. 2. The set of $N$ points is indexed by $\Sigma = \{1, 2, \ldots, N\}$. We define $L_\Sigma = \{p_i \mid p_i \in \mathbb{R}^3, i \in \Sigma\}$, where $p_i$ denotes the position of point $i$ in 3D space. Initially, we assume that $p_i|_z = 0, \forall i \in \Sigma$. Therefore, a state of the environment at time $t$ is represented by $L_\Sigma^t$.

To simulate the deformable material properly, each point $i$ is connected to its neighbors $\Delta_i^N$ by a spring force. This force maintains the distance between neighboring points. A point is moved to a cut set $\Delta_C$ if it no longer has any constraints with its neighbors, i.e., the point is cut by a scissor. When we apply an external force $F_t$ to tension a pinch point, the positions of its neighbors can be calculated by using Hooke's law [12]:

$$p_i^{t+1} = \alpha p_i^t + \delta \left( p_i^t - p_i^{t-1} \right) - \sum_{j \in \Delta_i^N - \Delta_C} \tau \left( p_j^t - p_i^t \right) + \left( F_t + g(t) \right),$$

where $\alpha$ and $\delta$ represent time-constant parameters, $\tau$ represents a spring constant, and $g(t)$ denotes gravity. The gravity vector $\vec{g}(t)$ lies along the z-axis. For simplicity, we assume $\vec{F}(t)|_z = 0$. Let $A$ be a set of actions; we can then define a tensioning policy $\pi_T$ as a mapping from $L_\Sigma^t$ to a probability distribution over $A$, i.e., $\pi_T^t : L_\Sigma^t \to P_A^t(\cdot)$.
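A minimal sketch of one simulation step under this update rule is shown below. It applies the equation directly, with the external force passed per point (zero everywhere except the tensioned pinch point); the function name and array layout are our own, and the simulator of [12] implements the same rule in its own way.

```python
import numpy as np

def advance_mesh(p, p_prev, neighbors, cut, F, g,
                 alpha=0.99, delta=0.008, tau=1.0):
    """One update of the point-mass mesh under the rule above.

    p, p_prev : (N, 3) point positions at times t and t-1.
    neighbors : list of neighbor index lists (the sets Delta_i^N).
    cut       : (N,) boolean mask of points in the cut set Delta_C.
    F         : (N, 3) external tensioning force; zero except at the pinch point.
    g         : (3,) gravity vector along the z-axis.
    Defaults match the Section IV settings (alpha=0.99, delta=0.008, tau=1).
    """
    p_next = np.empty_like(p)
    for i in range(len(p)):
        # Spring terms are summed only over neighbors that have not been cut.
        spring = tau * sum((p[j] - p[i] for j in neighbors[i] if not cut[j]),
                           start=np.zeros(3))
        p_next[i] = alpha * p[i] + delta * (p[i] - p_prev[i]) - spring + F[i] + g
    return p_next
```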

B. Multi-Point Deep RL Tensioning Method

Fig. 3: Limitations of the use of one pinch point, shown in three stages: before cutting, after segment 1 is cut, and after segments 1 and 2 are cut.

A robot is normally limited by spatial and mechanical constraints. Therefore, it is intractable to use a scissor to cut a complicated contour without interruptions. One solution is to divide the contour into multiple segments and find the cutting order among these segments to minimize the damage to the material [12]. However, the use of one pinch point makes it impossible to avoid damage to the material, especially near the joint areas between segments. In Fig. 3, for example, the contour is divided into two segments: segment 1 is illustrated by the orange dots and segment 2 by the dark blue dots. The pinch point is represented by a solid green dot. The gray areas denote the joint areas between the segments. After segment 1 is completely cut, a tensioning force applied to the pinch point inadvertently causes the joint areas to distort, i.e., we start cutting segment 2 from an improper position. This drastically reduces the cutting accuracy.

Fig. 4: The use of multiple pinch points for tensioning. Left: segment 1 is being cut and pinch point 1 is used for tensioning. Right: segment 1 is cut, segment 2 is being cut, and pinch point 2 is used for tensioning.

To overcome these obstacles, we add a fixed pinch point in each joint area to avoid distortion. This approach is feasible because it is done in the setup phase, which can be arranged before the cutting process.

To further increase the efficiency of using pinch points, we adopt a divide-and-conquer approach. In other words, if the contour is divided into N segments, we find a set of N different pinch points. Because we have one gripper, only one pinch point is used for tensioning at a time. We also assume that the gripper does not affect the deformable sheet while moving to a different pinch point. This assumption is reasonable in surgical tasks where the deformable material is not too soft. In Fig. 4, for example, we cut segment i by using pinch point i. After segment i is cut, the gripper moves to pinch point j to start cutting segment j. This approach indicates that we need to find the best pinch point in each segment, a process we call the local search. Previous work [12] finds a single best pinch point for all the segments, which is an intractable task.

A local search process for segment $i$ involves two steps: 1) finding a set of candidate pinch points for segment $i$ and 2) selecting the best pinch point among the candidates. Fig. 5 describes the process of finding a set of candidate pinch points for a specific segment $S$. Initially, we define a distance threshold $d > 0$, and then we find a set of candidate points around the segment based on $d$. Specifically, we select a point $p$ as a candidate if it satisfies the following condition:

$$\min_{i \in S} \| p - c_i \| < d,$$

where $\|p - c_i\|$ denotes the distance between the point $p$ and a point $c_i$ in the segment $S$. After this step, we have a set of candidates $A$. We take $M$ candidates randomly from $A$ and put them in an initially empty set $B$. After that, we remove all candidates in $B$ that are direct neighbors, forming a set $C$, as shown in Fig. 5. The next step is to use the TRPO algorithm to create a tensioning policy for each candidate in $C$.
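The candidate-finding step can be sketched as follows, assuming 2D mesh positions and approximating the direct-neighbor test by a single mesh spacing (an assumption on our part; the simulator's adjacency test may differ):

```python
import numpy as np

def find_candidates(points, segment, d, M, spacing=1.0, seed=0):
    """Candidate pinch points for one segment (local search, step 1).

    points  : (N, 2) positions of the mesh points.
    segment : (K, 2) positions c_i along the segment trajectory S.
    Returns at most M mutually non-adjacent candidate indices (the set C).
    """
    rng = np.random.default_rng(seed)
    # Set A: points p with min_{i in S} ||p - c_i|| < d.
    dists = np.linalg.norm(points[:, None, :] - segment[None, :, :], axis=2)
    A = np.flatnonzero(dists.min(axis=1) < d)
    # Set B: M candidates drawn at random from A.
    B = rng.choice(A, size=min(M, len(A)), replace=False)
    # Set C: drop any candidate that is a direct neighbor of one already kept.
    C = []
    for idx in B:
        if all(np.linalg.norm(points[idx] - points[k]) > spacing for k in C):
            C.append(int(idx))
    return C
```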

Fig. 6 summarizes the workflow of our Multi-point Deep RL Tensioning (MDRLT) method as follows; a schematic code sketch is given after the list:


• Problem definition: A complex contour is defined on the deformable material.

• Setup phase: This phase involves dividing the contour into multiple segments, finding the cutting order between these segments, and finally adding fixed pinch points in joint areas.

• Local search: As described earlier, the goal of this phase is to find a set of candidate tensioning pinch points in each segment.

• Evaluation phase: This phase combines candidate pinch points across segments and evaluates the accuracy of cutting the whole contour.

• Final phase: The best combination of candidate pinch points is selected, together with the fixed pinch points in joint areas. This phase terminates our algorithm.
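A schematic sketch of this workflow is given below, with the simulator-specific steps abstracted behind hypothetical callables; this is an outline of the phases above, not the API of the simulator in [12].

```python
from itertools import product

def mdrlt(segments, cutting_order, candidates, train_policy, evaluate):
    """Outline of the MDRLT phases.

    segments      : hashable segment identifiers, already divided and ordered.
    candidates    : dict mapping each segment to its candidate pinch points.
    train_policy  : callable (segment, point) -> TRPO tensioning policy.
    evaluate      : callable (cutting_order, {segment: (point, policy)}) ->
                    symmetric difference of the full cut (lower is better).
    """
    # Local search: train one tensioning policy per candidate pinch point.
    policies = {(seg, pt): train_policy(seg, pt)
                for seg in segments for pt in candidates[seg]}
    # Evaluation phase: score every combination of per-segment candidates.
    best_choice, best_err = None, float("inf")
    for combo in product(*(candidates[seg] for seg in segments)):
        choice = {seg: (pt, policies[(seg, pt)])
                  for seg, pt in zip(segments, combo)}
        err = evaluate(cutting_order, choice)
        if err < best_err:
            best_choice, best_err = choice, err
    # Final phase: the best combination, used alongside the fixed joint pins.
    return best_choice, best_err
```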

IV. PERFORMANCE EVALUATION

A. Simulation settings

In this section, we use the simulator described in [12] with the following parameter settings: g(t) = −2500, α = 0.99, δ = 0.008, and τ = 1. The threshold d equals 100. The maximum number of candidate pinch points in each segment is M = 20 if the number of segments is 2, and M = 10 if the number of segments is greater than 2. The algorithms that divide the contour into different segments are based on the mechanical constraints of the dVRK and are developed in the simulator. To find the best cutting order among different segments, we use exhaustive search to find the order that provides the highest accuracy. Each tensioning policy is trained with TRPO for 20 iterations, with a batch size of 500, a step size of 0.01, and a discount factor of 1. We use the implementation of the TRPO algorithm as in [40]. The cutting accuracy is also defined in the simulator as the symmetric difference between the ideal contour and the actual contour cut. The cutting reliability is measured by calculating the standard deviation while evaluating cutting accuracy. Finally, the simulator is significantly modified to support local search.
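For reference, the stated settings can be gathered into plain configuration dictionaries; the key names below are our own shorthand, not the constructor arguments of the rllab-based TRPO implementation in [40]:

```python
# Simulator settings from this section.
SIM_CONFIG = {
    "g": -2500,      # gravity term g(t)
    "alpha": 0.99,   # time-constant parameter
    "delta": 0.008,  # time-constant parameter
    "tau": 1.0,      # spring constant
    "d": 100,        # candidate distance threshold
}

# TRPO settings: 20 iterations, batch size 500, step size 0.01, discount 1.
TRPO_CONFIG = {
    "n_iterations": 20,
    "batch_size": 500,
    "step_size": 0.01,  # trust-region (KL) step size
    "discount": 1.0,    # undiscounted: only the final cut accuracy matters
}

def max_candidates(num_segments: int) -> int:
    """Maximum number M of candidate pinch points per segment."""
    return 20 if num_segments == 2 else 10
```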

Fig. 5: Finding candidate pinch points of a segment. Left: candidate pinch points are selected within distance d of the segment trajectory. Right: candidate pinch points that are direct neighbors are removed.

Fig. 6: A workflow of the MDRLT method. Starting from a defined complex contour, the setup phase divides the contour into multiple segments, finds the cutting order, and adds fixed pinch points in joint areas. The local search then finds candidate pinch points in each segment and trains a tensioning policy for each candidate. The evaluation phase selects a candidate pinch point in each segment and measures its cutting accuracy; the best combination is selected and the algorithm ends.

B. Accuracy performance

To compare the cutting accuracy of the different algorithms, we select 14 complicated multi-segment contours described in [12], as shown in Fig. 7. The black dots represent the fixed pinch points that are used in the setup phase. The red dots represent the best tensioning pinch points found by the local search. We compare the cutting accuracy and reliability of six algorithms (the first four are from [12]):


Fig. 7: A testbed of 14 different open and closed contours (Figures A through N).

Fig. 8: The mean relative percentage improvement over NTB of five algorithms (SFP, STP, SDRLT, MDRLT-1, and MDRLT-2).

• Non-Tensioned Baseline (NTB): We only use the scissor to cut the contour, without using the gripper.

• Single Fixed Pinch point without tensioning (SFP): A single fixed pinch point is used without tensioning.

• Single Tensioning Pinch point (STP): A single tensioning pinch point is used.

• SDRLT: A single tensioning pinch point is used, and TRPO is used to find its tensioning policy.

• MDRLT-1: The proposed algorithm without the fixed pinch points in joint areas.

• MDRLT-2: The proposed algorithm using both fixed pinch points and tensioning pinch points.

Fig. 9: The reliability comparison of five algorithms (SFP, STP, SDRLT, MDRLT-1, and MDRLT-2), measured as absolute error (%).


We evaluate the proposed algorithms (MDRLT-1 and MDRLT-2) in 10 simulated trials for each figure in the testbed. Table I presents the raw values of the symmetric difference between the ideal contour and the actual contour cut in this evaluation. Fig. 8 shows the mean relative percentage improvement in symmetric difference over the NTB method for the five other algorithms. We see that the use of fixed pinch points during the setup phase determines the cutting accuracy: MDRLT-1 is not better than SDRLT, but MDRLT-2 significantly outperforms SDRLT, which is the state-of-the-art method in surgical pattern cutting.
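For clarity, the relative improvement plotted in Fig. 8 is a percentage reduction in symmetric difference with respect to NTB; the numbers in the sketch below are hypothetical:

```python
def relative_improvement(ntb_error, method_error):
    """Relative percentage improvement in symmetric difference over NTB."""
    return 100.0 * (ntb_error - method_error) / ntb_error

# Hypothetical example: an NTB error of 100 reduced to 45 is a 55% improvement,
# matching the average gain reported for the multi-point approach.
print(relative_improvement(100.0, 45.0))  # 55.0
```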

Finally, Fig. 9 shows the absolute error while evaluating


TABLE I: Raw values of the accuracy test using MDRLT-1 and MDRLT-2. Each figure is cut 10 times to measure the symmetric difference between the ideal contour and the actual contour cut.

Contour | Algorithm | Eval 1 | Eval 2 | Eval 3 | Eval 4 | Eval 5 | Eval 6 | Eval 7 | Eval 8 | Eval 9 | Eval 10 | Mean
Figure A | MDRLT-1 | 29 | 31 | 46 | 34 | 50 | 31 | 40 | 38 | 40 | 39 | 37.8
Figure A | MDRLT-2 | 24 | 37 | 35 | 32 | 33 | 45 | 37 | 30 | 37 | 27 | 33.7
Figure B | MDRLT-1 | 33 | 67 | 55 | 53 | 50 | 60 | 42 | 36 | 33 | 54 | 48.3
Figure B | MDRLT-2 | 27 | 39 | 42 | 79 | 27 | 26 | 23 | 25 | 38 | 28 | 35.4
Figure C | MDRLT-1 | 36 | 44 | 44 | 42 | 47 | 34 | 50 | 43 | 45 | 53 | 43.8
Figure C | MDRLT-2 | 39 | 30 | 26 | 34 | 26 | 35 | 33 | 32 | 39 | 33 | 32.7
Figure D | MDRLT-1 | 144 | 137 | 130 | 139 | 136 | 136 | 137 | 129 | 129 | 133 | 135
Figure D | MDRLT-2 | 40 | 34 | 42 | 36 | 45 | 32 | 36 | 44 | 39 | 25 | 37.3
Figure E | MDRLT-1 | 65 | 65 | 59 | 55 | 64 | 52 | 51 | 74 | 62 | 56 | 60.3
Figure E | MDRLT-2 | 38 | 44 | 34 | 51 | 34 | 39 | 35 | 36 | 36 | 36 | 38.3
Figure F | MDRLT-1 | 58 | 70 | 49 | 44 | 42 | 53 | 53 | 33 | 37 | 35 | 47.4
Figure F | MDRLT-2 | 36 | 38 | 34 | 28 | 35 | 28 | 25 | 28 | 40 | 27 | 31.9
Figure G | MDRLT-1 | 38 | 41 | 36 | 44 | 45 | 33 | 48 | 48 | 38 | 42 | 41.3
Figure G | MDRLT-2 | 19 | 25 | 22 | 23 | 22 | 22 | 26 | 25 | 29 | 24 | 23.7
Figure H | MDRLT-1 | 43 | 18 | 16 | 46 | 39 | 28 | 25 | 23 | 28 | 18 | 28.4
Figure H | MDRLT-2 | 17 | 17 | 20 | 20 | 13 | 24 | 17 | 16 | 17 | 21 | 18.2
Figure I | MDRLT-1 | 78 | 62 | 73 | 62 | 51 | 55 | 51 | 74 | 76 | 77 | 65.9
Figure I | MDRLT-2 | 37 | 40 | 42 | 44 | 38 | 43 | 39 | 52 | 39 | 46 | 42
Figure J | MDRLT-1 | 19 | 18 | 13 | 12 | 20 | 19 | 21 | 18 | 19 | 19 | 17.8
Figure J | MDRLT-2 | 25 | 26 | 21 | 22 | 21 | 21 | 21 | 21 | 21 | 21 | 22
Figure K | MDRLT-1 | 82 | 104 | 79 | 87 | 90 | 89 | 76 | 78 | 74 | 59 | 81.8
Figure K | MDRLT-2 | 48 | 47 | 39 | 45 | 56 | 57 | 49 | 81 | 42 | 43 | 50.7
Figure L | MDRLT-1 | 35 | 31 | 37 | 32 | 41 | 32 | 35 | 32 | 33 | 33 | 34.1
Figure L | MDRLT-2 | 14 | 15 | 18 | 12 | 12 | 15 | 15 | 14 | 15 | 20 | 15
Figure M | MDRLT-1 | 21 | 24 | 20 | 21 | 21 | 25 | 23 | 18 | 17 | 23 | 21.3
Figure M | MDRLT-2 | 14 | 15 | 14 | 13 | 19 | 15 | 14 | 17 | 17 | 14 | 15.2
Figure N | MDRLT-1 | 38 | 34 | 102 | 25 | 34 | 26 | 28 | 32 | 18 | 56 | 39.3
Figure N | MDRLT-2 | 23 | 19 | 20 | 27 | 18 | 26 | 26 | 30 | 23 | 23 | 23.5

the cutting accuracy in 10 trials. This metric represents the reliability of the proposed methods. Among the three algorithms using deep RL, MDRLT-2 provides the highest reliability, as it has the lowest absolute error.
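As an illustration using the Figure A rows of Table I, accuracy and reliability reduce to a mean and a standard deviation over the 10 evaluations:

```python
import numpy as np

trials = {
    "MDRLT-1": [29, 31, 46, 34, 50, 31, 40, 38, 40, 39],
    "MDRLT-2": [24, 37, 35, 32, 33, 45, 37, 30, 37, 27],
}
for name, vals in trials.items():
    # Mean symmetric difference (accuracy) and its spread (reliability).
    print(f"{name}: mean={np.mean(vals):.1f}, std={np.std(vals):.2f}")
# Means 37.8 and 33.7 match the Table I entries for Figure A.
```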

V. CONCLUSION

This paper introduces a multi-point approach based on deep reinforcement learning for the surgical soft tissue cutting task that is meaningful from both a practical and a theoretical perspective. On the theoretical side, the paper benchmarks the accuracy of using multiple pinch points during the course of cutting, which, to the best of our knowledge, is the first such study. The study also concludes that the use of fixed pinch points in joint areas is the key to significantly outperforming the state-of-the-art cutting method with respect to accuracy and reliability.

The proposed approach provides a normative workflow for ensuring safety in the surgical pattern cutting task. Moreover,


it can be applied to a diversity of future research directions, such as the use of multiple grippers or multiple scissors in surgical tasks, multi-layer pattern cutting in 3D space, or 3D multi-segment contours.

REFERENCES

[1] A. Murali, S. Sen, B. Kehoe, A. Garg, S. McFarland, S. Patil, W. D. Boyd, S. Lim, P. Abbeel, and K. Goldberg, "Learning by observation for surgical subtasks: multilateral cutting of 3D viscoelastic and 2D orthotropic tissue phantoms," in International Conference on Robotics and Automation (ICRA), pp. 1202–1209, 2015.

[2] H.-W. Nienhuys and A. F. Van der Stappen, "A surgery simulation supporting cuts and finite element deformation," in International Conference on Medical Image Computing & Computer Assisted Intervention, 2001.

[3] K. A. Nichols and A. M. Okamura, "Methods to segment hard inclusions in soft tissue during autonomous robotic palpation," IEEE Transactions on Robotics, vol. 31, no. 2, pp. 344–354, 2015.

[4] N. Haouchine, S. Cotin, I. Peterlik, J. Dequidt, M. S. Lopez, E. Kerrien, and M. O. Berger, "Impact of soft tissue heterogeneity on augmented reality for liver surgery," IEEE Transactions on Visualization & Computer Graphics, vol. 1, 2015.

[5] T. T. Nguyen, N. D. Nguyen, F. Bello, and S. Nahavandi, "A new tensioning method using deep reinforcement learning for surgical pattern cutting," arXiv preprint arXiv:1901.03327, 2019.

[6] R. A. Fisher, P. Dasgupta, A. Mottrie, A. Volpe, M. S. Khan, B. Challacombe, and K. Ahmed, "An overview of robot assisted surgery curricula and the status of their validation," International Journal of Surgery, vol. 13, pp. 115–123, 2015.

[7] G. Dulan, R. V. Rege, D. C. Hogg, K. M. Gilberg-Fisher, N. A. Arain, S. T. Tesfay, and D. J. Scott, "Developing a comprehensive, proficiency-based training program for robotic surgery," Surgery, vol. 152, no. 3, pp. 477–488, 2012.

[8] A. P. Stegemann, K. Ahmed, J. R. Syed, S. Rehman, K. Ghani, R. Autorino, et al., "Fundamental skills of robotic surgery: a multi-institutional randomized controlled trial for validation of a simulation-based curriculum," Urology, vol. 81, no. 4, pp. 767–774, 2013.

[9] A. Shademan, R. S. Decker, J. D. Opfermann, S. Leonard, A. Krieger, and P. C. Kim, "Supervised autonomous robotic soft tissue surgery," Science Translational Medicine, vol. 8, no. 337, 2016.

[10] K. Tai, A. R. El-Sayed, M. Shahriari, M. Biglarbegian, and S. Mahmud, "State of the art robotic grippers and applications," Robotics, vol. 5, no. 2, 2016.

[11] G. Rateni, M. Cianchetti, G. Ciuti, A. Menciassi, and C. Laschi, "Design and development of a soft robotic gripper for manipulation in minimally invasive surgery: a proof of concept," Meccanica, vol. 50, no. 11, pp. 2855–2863, 2015.

[12] B. Thananjeyan, A. Garg, S. Krishnan, C. Chen, L. Miller, and K. Goldberg, "Multilateral surgical pattern cutting in 2D orthotropic gauze with deep reinforcement learning policies for tensioning," in International Conference on Robotics and Automation (ICRA), pp. 2371–2378, 2017.

[13] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, "Trust region policy optimization," in International Conference on Machine Learning, pp. 1889–1897, 2015.

[14] P. Kazanzides, Z. Chen, A. Deguet, G. S. Fischer, R. H. Taylor, and S. P. DiMaio, "An open-source research kit for the da Vinci Surgical System," in International Conference on Robotics and Automation (ICRA), pp. 6434–6439, 2014.

[15] N. R. Crawford, S. Cicchini, and N. Johnson, "Surgical robotic automation with tracking markers," U.S. Patent Application No. 15/609,334, 2017.

[16] G. P. Moustris, S. C. Hiridis, K. M. Deliparaschos, and K. M. Konstantinidis, "Evolution of autonomous and semiautonomous robotic surgical systems: a review of the literature," The International Journal of Medical Robotics and Computer Assisted Surgery, vol. 7, no. 4, pp. 375–392, 2011.

[17] J. Van Den Berg, S. Miller, D. Duckworth, H. Hu, A. Wan, X. Y. Fu, et al., "Superhuman performance of surgical tasks by robots using iterative learning from human-guided demonstrations," in International Conference on Robotics and Automation (ICRA), pp. 2074–2081, 2010.

[18] J. M. Prendergast and M. E. Rentschler, "Towards autonomous motion control in minimally invasive robotic surgery," Expert Review of Medical Devices, vol. 13, no. 8, pp. 741–748, 2016.

[19] D. T. Nguyen, C. Song, Z. Qian, S. V. Krishnamurthy, E. J. Colbert, and P. McDaniel, "IoTSan: fortifying the safety of IoT systems," arXiv preprint arXiv:1810.09551, 2018.

[20] C. Staub, T. Osa, A. Knoll, and R. Bauernschmitt, "Automation of tissue piercing using circular needles and vision guidance for computer aided laparoscopic surgery," in International Conference on Robotics and Automation (ICRA), pp. 4585–4590, 2010.

[21] S. Sen, A. Garg, D. V. Gealy, S. McKinley, Y. Jen, and K. Goldberg, "Automating multi-throw multilateral surgical suturing with a mechanical needle guide and sequential convex optimization," in International Conference on Robotics and Automation (ICRA), pp. 4178–4185, 2016.

[22] J. Schulman, A. Gupta, S. Venkatesan, M. Tayson-Frederick, and P. Abbeel, "A case study of trajectory transfer through non-rigid registration for a simplified suturing scenario," in International Conference on Intelligent Robots and Systems (IROS), pp. 4111–4117, 2013.

[23] T. Osa, N. Sugita, and M. Mitsuishi, "Online trajectory planning and force control for automation of surgical tasks," IEEE Transactions on Automation Science and Engineering, vol. 15, no. 2, pp. 675–691, 2018.

[24] T. T. Nguyen, N. D. Nguyen, and S. Nahavandi, "Deep reinforcement learning for multi-agent systems: a review of challenges, solutions and applications," arXiv preprint arXiv:1812.11794, 2019.

[25] T. T. Nguyen, "A multi-objective deep reinforcement learning framework," arXiv preprint arXiv:1803.02965, 2018.

[26] N. D. Nguyen, T. Nguyen, and S. Nahavandi, "System design perspective for human-level agents using deep reinforcement learning: a survey," IEEE Access, vol. 5, pp. 27091–27102, 2017.

[27] N. D. Nguyen, S. Nahavandi, and T. Nguyen, "A human mixed strategy approach to deep reinforcement learning," arXiv preprint arXiv:1804.01874, 2018.

[28] T. Nguyen, N. D. Nguyen, and S. Nahavandi, "Multi-agent deep reinforcement learning with human strategies," arXiv preprint arXiv:1806.04562, 2018.

[29] S. Mahadevan and J. Connell, "Automatic programming of behavior-based robots using reinforcement learning," Artificial Intelligence, vol. 55, no. 2–3, pp. 311–365, 1992.

[30] S. Schaal, "Learning from demonstration," in Advances in Neural Information Processing Systems, pp. 1040–1046, 1997.

[31] A. Y. Ng, A. Coates, M. Diel, V. Ganapathi, J. Schulte, B. Tse, E. Berger, and E. Liang, "Autonomous inverted helicopter flight via reinforcement learning," Experimental Robotics IX, pp. 363–372, 2006.

[32] M. Riedmiller, T. Gabel, R. Hafner, and S. Lange, "Reinforcement learning for robot soccer," Journal of Autonomous Robots, vol. 27, no. 1, pp. 55–73, 2009.

[33] K. Mulling, J. Kober, O. Kroemer, and J. Peters, "Learning to select and generalize striking movements in robot table tennis," International Journal of Robotics Research, vol. 32, no. 3, pp. 263–279, 2013.

[34] Z. Du, W. Wang, Z. Yan, W. Dong, and W. Wang, "Variable admittance control based on fuzzy reinforcement learning for minimally invasive surgery manipulator," Sensors, vol. 17, no. 4, 2017.

[35] B. Yu, A. T. Tibebu, D. Stoyanov, S. Giannarou, J. H. Metzen, and E. Vander Poorten, "Surgical robotics beyond enhanced dexterity instrumentation: a survey of machine learning techniques and their role in intelligent and autonomous surgical actions," International Journal of Computer Assisted Radiology and Surgery, vol. 11, no. 4, pp. 553–568, 2016.

[36] J. Chen, H. Y. Lau, W. Xu, and H. Ren, "Towards transferring skills to flexible surgical robots with programming by demonstration and reinforcement learning," in International Conference on Advanced Computational Intelligence (ICACI), pp. 378–384, 2016.

[37] A. Billard, S. Calinon, R. Dillmann, and S. Schaal, "Robot programming by demonstration," Springer Handbook of Robotics, pp. 1371–1394, 2008.

[38] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.

[39] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: The MIT Press, 2012.

[40] Y. Duan, X. Chen, R. Houthooft, J. Schulman, and P. Abbeel, "Benchmarking deep reinforcement learning for continuous control," in International Conference on Machine Learning, pp. 1329–1338, 2016.