A Reinforcement Learning Approach for Hybrid Flexible Flowline Scheduling Problems

A Reinforcement Learning Approach to Solving Hybrid Flexible Flowline Scheduling Problems

Bert Van Vreckem, Dmitriy Borodin, Wim De Bruyn, Ann Nowe

Authors

• Bert Van Vreckem, HoGent Business and [email protected]

• Dmitriy Borodin, [email protected]

• Wim De Bruyn, HoGent Business and [email protected]

• Ann Nowe, Artificial Intelligence Lab, Vrije Universiteit [email protected]

HFFSP MISTA2013: 29 August 2013 3/28

Contents

1 Hybrid Flexible Flowline Scheduling Problems

2 A Machine Learning Approach

3 Learning Permutations with Precedence Constraints

4 Experiments & results

5 Conclusion

Hybrid Flexible Flowline Scheduling Problems

Powerful model for complex real-life production scheduling problems. In α/β/γ notation (Urlings, 2010):

$$HFFL_m, \left((RM^{(i)})_{i=1}^{(m)}\right) / M_j, rm, prec, S_{iljk}, A_{iljk}, lag / C_{\max}$$

Flowline Scheduling problems: jobs processed in consecutive stages.

[Figure: a flowline with four consecutive stages, Stage 1 to Stage 4.]

Hybrid case: unrelated parallel machines

[Figure: parallel machines per stage. Stage 1: M11, M12, M13. Stage 2: M21, M22. Stage 3: M31, M32, M33, M34. Stage 4: M41, M42.]

Flexible case: stages may be skipped

[Figure: machines M11, M12, M13, M21, M22, M41, M42; the stage 3 machines are absent.]

Other constraints: Machine eligibility

[Figure: the machines shown are M11, M13, M21, M22, M31, M33, M42.]

Other constraints: Time lag between stages

[Figure: Stage 1, Stage 2, Stage 3, Stage 4.]

Other constraints: Sequence dependent setup times

[Figure: Gantt charts on a time axis from 1 to 12, with rows for machines M1 and M2, for the job orders J1, J2 and J2, J1.]

Other constraints: Precedence relations between jobs

[Figure: Gantt charts on a time axis from 1 to 12, with rows for machines M1 and M2, for the job orders J1, J2 and J2, J1.]

Precedence relations between jobs make the problem much harder, to the point that the MILP/CPLEX approach no longer works for larger instances (Urlings, 2010).

A Machine Learning Approach

Scheduling Hybrid Flexible Flowline Scheduling Problems

Two stages:

• Job permutations → Learning Automata

• Machine assignment → Earliest Preparation Next Stage (EPNS) (Urlings, 2010)

Reinforcement learning

At every discrete time step t:

• Agent perceives environment state s(t)

• Agent chooses action a(t) ∈ A = {a1, ..., an} according to some policy

• Environment places agent in new state s(t + 1) and gives reinforcement r(t)

• Goal: learn policy that maximizes long term cumulative reward $\sum_t r(t)$

[Figure: agent-environment loop. The agent sends action a to the environment, which returns state s and reinforcement r.]

Learning Automata (LA)

Reinforcement Learning agents that choose action according to probability distribution $p(t) = (p_1(t), \dots, p_n(t))$, with $p_i = \mathrm{Prob}[a(t) = a_i]$ and s.t. $\sum_{i=1}^{n} p_i = 1$

$$p_i(0) = \frac{1}{n} \qquad (1)$$

$$p_i(t+1) = p_i(t) + \alpha_{rew}\, r(t)\,(1 - p_i(t)) - \alpha_{pen}\,(1 - r(t))\, p_i(t) \qquad (2)$$

if $a_i$ is the action taken at instant $t$

$$p_j(t+1) = p_j(t) - \alpha_{rew}\, r(t)\, p_j(t) + \alpha_{pen}\,(1 - r(t)) \left( \frac{1}{n-1} - p_j(t) \right) \qquad (3)$$

if $a_j \neq a_i$
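The update rules (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `la_update` and the 0-based action index are assumptions made for the example. It starts from the uniform distribution of rule (1) with n = 4 and applies one update for the third action, once with r(t) = 1 and once with r(t) = 0, using the learning rates 0.1 and 0.5 reported in the experiments.

```python
def la_update(p, chosen, r, alpha_rew, alpha_pen):
    """Return the probability vector after one learning automaton update.

    p       -- current probabilities, one per action (sums to 1)
    chosen  -- 0-based index of the action taken (hypothetical convention)
    r       -- reinforcement in [0, 1]
    """
    n = len(p)
    new_p = []
    for j, pj in enumerate(p):
        if j == chosen:
            # Rule (2): reward pulls p_i towards 1, penalty shrinks it.
            new_p.append(pj + alpha_rew * r * (1 - pj)
                         - alpha_pen * (1 - r) * pj)
        else:
            # Rule (3): reward shrinks p_j, penalty pulls it towards 1/(n-1).
            new_p.append(pj - alpha_rew * r * pj
                         + alpha_pen * (1 - r) * (1 / (n - 1) - pj))
    return new_p


p0 = [0.25] * 4                      # rule (1): uniform start, n = 4
rewarded = la_update(p0, chosen=2, r=1.0, alpha_rew=0.1, alpha_pen=0.5)
penalised = la_update(p0, chosen=2, r=0.0, alpha_rew=0.1, alpha_pen=0.5)
print(rewarded)
print(penalised)
```

In both cases the probabilities still sum to 1: whatever the chosen action gains or loses is spread over the other n − 1 actions.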

Learning Automaton update

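[Figure: three bar charts of the action probabilities p_i for actions i = 1, ..., 4, on a vertical axis from 0 to 1. E.g. action 3 was chosen; one panel shows the starting distribution, one the result for r(t) = 1, and one the result for r(t) = 0.]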

Probabilistic Basic Simple Strategy (PBSS) (Wauters, 2012)

• A LA is assigned to every position of a permutation

• LAs play a dispersion game to choose unique action, resulting in a permutation

• Quality of solution is evaluated

• Update probabilities according to LA update rule Linear Reward-Inaction ($\alpha_{pen} = 0$):

  • Better result than best one so far: r(t) = 1

  • If not, r(t) = 0

• Repeat until convergence
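The loop described above can be sketched as follows. This is a rough illustration under several assumptions, not the method of (Wauters, 2012): the dispersion game is replaced by a simple stand-in in which positions pick in turn from the jobs that are still unused, "until convergence" is replaced by a fixed iteration budget, and the names `sample_permutation`, `reward_inaction`, `pbss` and `toy_cost` are hypothetical. The toy cost function only serves to make the example runnable; it is not a scheduling objective.

```python
import random


def sample_permutation(probs, rng):
    """Stand-in for the dispersion game (assumption): each position samples
    from its own automaton, restricted to jobs that are still unused."""
    unused = list(range(len(probs)))
    perm = []
    for pos in range(len(probs)):
        weights = [probs[pos][job] for job in unused]
        if sum(weights) <= 0:            # guard against numerical underflow
            weights = [1.0] * len(unused)
        job = rng.choices(unused, weights=weights, k=1)[0]
        perm.append(job)
        unused.remove(job)
    return perm


def reward_inaction(p, chosen, r, alpha_rew):
    """Linear Reward-Inaction: the LA update with alpha_pen = 0."""
    return [pj + alpha_rew * r * (1 - pj) if j == chosen
            else pj - alpha_rew * r * pj
            for j, pj in enumerate(p)]


def pbss(cost, n, iterations=2000, alpha_rew=0.1, seed=0):
    rng = random.Random(seed)
    probs = [[1.0 / n] * n for _ in range(n)]   # one automaton per position
    best_perm, best_cost = None, float("inf")
    for _ in range(iterations):
        perm = sample_permutation(probs, rng)
        c = cost(perm)
        r = 1.0 if c < best_cost else 0.0       # reward only on improvement
        if r:
            best_perm, best_cost = perm, c
        for pos, job in enumerate(perm):
            probs[pos] = reward_inaction(probs[pos], job, r, alpha_rew)
    return best_perm, best_cost


def toy_cost(perm):
    """Hypothetical objective: distance of each job from its own index."""
    return sum(abs(job - pos) for pos, job in enumerate(perm))


best_perm, best_cost = pbss(toy_cost, n=5)
print(best_perm, best_cost)
```

With r(t) = 0 the Reward-Inaction update leaves every probability unchanged, so the automata only move when a new best solution is found.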

Probabilistic Basic Simple Strategy (PBSS)

• PBSS: great results in several optimization problems that involve learning permutations

• but doesn’t work well when precedence constraints are involved

• PBSS only learns from positive experience (i.e. improving on previous solutions)

• Doesn’t learn to avoid invalid permutations

Extending PBSS for precedence constraints

Updating probabilities:

• If the job permutation is invalid, perform an update with r(t) = 0 and $\alpha_{pen} > 0$ for all agents that are involved in the violation of precedence constraints.

• If the job permutation is valid, perform a $L_{R-I}$ update in all agents, depending on the resulting makespan $ms$ and the best makespan until now $ms_{best}$:

  • improved: r(t) = 1;

  • equally good: r(t) = 1/2;

  • worse: $r(t) = \frac{ms_{best}}{2\,ms}$;

  • no valid schedule found: r(t) = 0.
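The reward for a valid permutation can be written as a small function. This is a sketch that covers only the valid-permutation branch; the function name `extended_reward`, the use of `None` for "no valid schedule found", and treating the very first schedule as an improvement are assumptions made for the example.

```python
def extended_reward(ms, ms_best):
    """Reinforcement r(t) for a valid job permutation.

    ms      -- makespan of the new schedule, or None if no valid
               schedule was found (hypothetical convention)
    ms_best -- best makespan so far, or None before the first schedule
    """
    if ms is None:
        return 0.0                      # no valid schedule found
    if ms_best is None or ms < ms_best:
        return 1.0                      # improved (first schedule counts)
    if ms == ms_best:
        return 0.5                      # equally good
    return ms_best / (2 * ms)           # worse: always below 1/2


print(extended_reward(90, 100))
print(extended_reward(100, 100))
print(extended_reward(120, 100))
print(extended_reward(None, 100))
```

Because a worse schedule has ms > ms_best, its reward stays strictly below 1/2 and approaches 1/2 as the makespan gets closer to the best one, so the four cases are ordered consistently.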

Experiments

• HFFSP benchmark problems from (Ruiz et al., 2008), available at http://soa.iti.es/problem-instances

• Problem sets with 5, 7, 9, 11, 13, 15 jobs, 96 instances in each set

• + other constraints that make problems harder (precedence relations!)

• $\alpha_{rew} = 0.1$; $\alpha_{pen} = 0.5$ (no tuning)

• Run until convergence, or for at most 300 seconds

Results

Instance set    5        7        9        11       13       15       overall
mean RD (%)     0.0697   2.0131   1.1568   1.6565   3.7294   7.9189   2.7484
best RD (%)     -35.70   -24.71   -26.92   -21.10   -43.34   -10.46   -43.34
# improved      11       12       18       12       9        6        68
# equal         62       40       19       18       8        7        154
# worse         23       44       59       66       79       82       354
