A POMDP framework to nd optimal policy in sustainable ...

18

Transcript of A POMDP framework to nd optimal policy in sustainable ...

Page 1: A POMDP framework to nd optimal policy in sustainable ...

Scientia Iranica E (2020) 27(3), 1544{1561

Sharif University of TechnologyScientia Iranica

Transactions E: Industrial Engineeringhttp://scientiairanica.sharif.edu

A POMDP framework to �nd optimal policy insustainable maintenance

R. Ghandali, M.H. Abooie�, and M.S. Fallah Nezhad

Department of Industrial Engineering, Yazd University, Yazd, Iran.

Received 31 October 2017; received in revised form 1 September 2018; accepted 27 October 2018

KEYWORDSCondition basedmaintenance;Sustainability;Partially observableMarkov decisionprocess;Stochasticallydeteriorating systems;Incremental pruning;Accelerated vectorpruning;Perseus.

Abstract. The increasing importance of maintenance and a cleaner environmentbesides the relations between them has encouraged the current authors to investigatea mathematical Markovian model for the condition-based maintenance problem whileconsidering environmental e�ects. In this paper, the problem of proposing a maintenanceoptimal policy for a partially observable, stochastically deteriorating system is studiedin order to maximize the average pro�t of the system with sustainability aspects. Themodeling of this Condition-Based Sustainable Maintenance (CBSM) problem is done bymathematical methods such as Partially Observable Markov Decision Process (POMDP)and Bayesian theory. A new exact method, called accelerated vector pruning method, andother popular estimating and exact methods are applied and compared for solving thepresented CBSM model, and several managerial conclusions are obtained.

© 2020 Sharif University of Technology. All rights reserved.

1. Introduction

In this section, �rst, the signi�cance of maintenanceand, then, its relation to environmental considerationsand sustainability concept will be described.

Nowadays, the increasing importance of mainte-nance to all industries is undeniable [1,2], and it isobvious that the magnitude of this importance variesfor various systems. The priorities of all parts arenot equal in one given system, and selecting a propermaintenance technique for each component has becomean important topic since the latter half of the 1980s.For this reason, maintenance strategies have beendeveloped. A maintenance strategy in a system forms

*. Corresponding author. Tel.: +98 3531232414;Fax: +98 3538211792E-mail addresses: [email protected] (R. Ghandali);[email protected] (M.H. Abooie)

doi: 10.24200/sci.2018.5490.1306

a framework to install speci�c maintenance techniquesfor each part [3]. Reliability Centered Maintenance(RCM), Total Productive Maintenance (TPM), andRisk-based Maintenance (RBM) are the most impor-tant examples of maintenance strategies.

Some recent studies on RCM, TPM, and RBMstrategies, as well as their development and implemen-tation, can be found in [4{6].

The speci�c based maintenance techniques enjoyunique principles for solving maintenance problems [7].Time-based Maintenance (TBM) and Condition-basedMaintenance (CBM) are the most important mainte-nance techniques.

When the reliability concept emerged in 1980,Preventive Maintenance (PM) and TBM techniquewere introduced, respectively. For TBM, after aspeci�c period of operations (or time), the failure rateof products increases based on a function called thebathtub curve.

However, the deterioration rate of a system de-pends on not only the operation time duration but

Page 2: A POMDP framework to nd optimal policy in sustainable ...

R. Ghandali et al./Scientia Iranica, Transactions E: Industrial Engineering 27 (2020) 1544{1561 1545

also various other factors and operational conditions.Therefore, the real condition of the system cannotbe identi�ed by the TBM strategy. Moreover, thisstrategy sometimes imposes unnecessary treatments,which often disrupt normal operations [8]. A perfectsurvey of such models is given in [7].

Since the TBM limitations were known in 1975,the CBM technique was developed to maximize PMe�ciency. CBM as a predictive method is the mostmodern and popular maintenance technique discussedin [7,9{11]. Authors [7] expressed that CBM applica-tion was more bene�cial and realistic than traditionalTBM approach. Therefore, CBM is usually appliedto high-priority components based on their healthcondition.

Data acquisition is introduced as the �rst mainstep in the case of CBM [12], which requires monitoringthe system's health condition. CBM estimates thehealth condition of equipment by real-time data re-ceived from condition-monitoring tools to make main-tenance decisions and reduce unnecessary maintenanceand related costs [13].

Since the application of this technique includ-ing the task of providing monitoring tools and dataprocessing tools is not cost e�ective, it can be con-cluded that accurate and integrated planning should beconsidered necessary in CBM planning. Consideringmore than one objective and integrating them intoan optimization mathematical model is more desirableand realistic from a decision-maker's point of view;therefore, a multi-objective optimization model can beuseful to formulate a CBM program.

Ecological and industrial objectives, which havebeen considered as antagonists many times, are nowtaken into account simultaneously in the sustainabilityconcept to support social, economic, and environmentalpillars [14]. Balancing environmental and societalobjects with �nancial success is no longer a matter ofchoice, but a must [15].

Sustainable development in general is de�ned as\development that meets the needs of the presentwithout compromising the ability of future genera-tions to meet their own needs" [16]. With planningsystems characterized by low negative impact on theenvironment, people can participate in sustainabledevelopment e�ectively [17].

Recently, many researchers in various �elds haveturned an attentive eye to the concept of sustainability.For instance, authors on supply chain design [18] andon electric power systems [19,20] attempted to includesustainability considerations into their studies.

In the maintenance concept as the main partof industry, environmental considerations can a�ect aproblem, too. Ignoring the deterioration process of apart can lead to the failure of other related perfectparts and cause di�erent damages to humans and the

environment (such as waste and CO2 emission due tothe use of old, outdated or ill-maintained equipment)besides the economic losses (such as heavy emergencyreplacement costs related to unexpected breakdowns).Moreover, over-maintenance by spending greater costs(more maintenance-related costs such as employmentcost of maintenance workers and more part replacementcosts), generating more waste (more useless replacedparts), and increasing error probability can be a riskfactor in the production system and the surroundingenvironment. Therefore, maintenance decisions shouldbe optimized to achieve economic, social, and environ-mental goals in the best feasible way.

In manufacturing businesses, sustainable mainte-nance can be de�ned as the maintenance that satis�esthe objectives of a system by considering sustainabilitygoals and constraints such as those that are related toCO2 emission, defective and non-recyclable products,useless replaced parts, scrapped items, toxic materialemission (such as cleaning material utilized for repairwork), and other factors that a�ect the environment.

Sustainability considerations are often takeninto account in manufacturing processes; however,maintenance-oriented studies with sustainability con-siderations are still in the early stages [21].

In the rest of this section, papers in the literaturerelated to sustainable maintenance will be divided intotwo main categories: conceptual and mathematicalstudies. Then, mathematical studies in the mainte-nance �eld will be divided into two groups: TBM-related and CBM-related studies.

In the conceptual studies category, the followingapproaches were expressed in [22] that help companiesimprove their sustainability performance: minimizingwaste production, using the best practice in mainte-nance, and producing process, improving the usageof metalworking uids, lubricating oils and hydraulicsoils, and so on. In [23], maintenance performanceand maintenance quality and its impact on sustain-ability performance were studied. They concludedthat maintenance processes had signi�cant impacts onthe businesses in terms of economic, environmental,and societal implications of the commercial activities.In [24], de�ning key performance measures for sustain-able maintenance at operational and strategic levels(15 measures at the corporate level, 20 measures atthe tactical level, and 43 measures at the functionallevel) was proposed. In [25], a sustainability-basedframework for the maintenance strategy selection prob-lem was studied. A fuzzy method was used to selectthe most appropriate maintenance strategy. However,the optimization model for one speci�ed maintenancetechnique was not proposed.

Moreover, in a recent study [26], a comprehensivede�nition of sustainable maintenance was proposed inwhich several simple and complicated aspects such as

Page 3: A POMDP framework to nd optimal policy in sustainable ...

1546 R. Ghandali et al./Scientia Iranica, Transactions E: Industrial Engineering 27 (2020) 1544{1561

economics, society, and environment besides manufac-turing, overhauling, assembly, and ecology should beconsidered when maintenance actions were performed.

Conceptual studies mentioned above generally donot provide any mathematical programming or opti-mization modeling; however, basic concepts have beenexpressed in detail. In the following, the mathematicalstudies will be outlined in two groups: TBM-relatedand CBM-related studies.

Studies in the TBM-related group include sus-tainable maintenance modeling considering degrada-tion process as a function of the system's lifetime,production rates, or so on. Some of these studies donein recent years will be introduced in the following.

In [27], a production-maintenance mathematicalmodel for a deteriorating environmental-friendly man-ufacturing system was proposed that invests in R&Dto make this productive technology more sustainableand avoid government-related taxation. Productionrate, maintenance rate, and pollution R&D investmentrate as decision variables were determined in the afore-mentioned model to minimize the total cost includinginventory costs, production costs, emission tax, andpollution R&D investment costs.

Another optimization model for a manufacturingsystem under the environmental constraint was pro-posed in [28], in which production, TBM, and emissionrates were determined as decision variables to minimizetotal cost. It was claimed that, among the relatedprevious studies, the e�ect of system deteriorationon the emission was addressed only in their paper;therefore, maintenance activities were considered indetail to reduce the e�ects of degradation.

As a related work, in [29], an ecological produc-tion and TBM policy for a deteriorating manufacturingsystem under carbon tax was proposed. The systemsubcontracts satisfy random demand and decrease thecarbon tax to another deteriorating system. Theobjective is to determine the optimal number of PMtasks and production rates in order to minimize totalmaintenance, production, and carbon emission costs.They did not consider the environment (reducing theamount of carbon emission) and merely focused onminimizing the penalty costs related to their ownsystem. In fact, the whole system can release the samecarbon amount because the subcontractor generatescarbon emission, too.

In [30], spare parts concept was added to previousenvironment-oriented production-maintenance models.In fact, in their study, environmental e�ects of applyingnew or used spare parts during corrective or PM wereconsidered through minimizing carbon emissions andglobal cost of spare parts' lifetime.

Moreover, in this �eld, in [31], a mathematicalframework was used to propose an optimal productionand TBM strategy that determines both maintenance

and production elements (number of maintenance ac-tivities and production rate) in order to minimize to-tal production, maintenance, inventory, and pollution(CO2 related) costs for a deteriorating manufacturingsystem.

Finally, in another recent work, in [32], a sustain-able TBM optimization model was proposed in whichsocial costs (social impacts of injury caused by failures)and environmental and conventional maintenance costswere minimized. Moreover, in addition to the wasteand emission generation, spare parts concept wasconsidered in terms of sustainability.

The papers mentioned above all belong to the �rstcategory, i.e., TBM-related studies. Whilst respectingthe usefulness of such studies, CBSM can be morevaluable if it is applicable to the system because ofthe CBM advantages discussed before and those to bepresented in the following.

Considering the fact that one of the maintenanceoperations to provide a sustainable performance ofequipment is the optimization of technical conditionswith respect to plant performance target [33], thecondition optimization with CBM has a key role inimplementing sustainable maintenance.

In addition, CBM is the most appropriate methodto pursue when it is technically possible because itallows monitoring the degradation level and the result-ing environmental damage in order to take appropriatepreventive actions and limit the risk of penalties [34].

Despite the recommendations of previous review-ers for performing CBM studies considering sustain-ability factors, only few such studies have been pub-lished in recent years, some of which will be presentedin the group of CBM-related studies.

A CBM model for cost minimization was pre-sented in [35], where the objective function was aimedat reducing the replacement, maintenance, and en-vironmental penalty costs; in addition, the decisionvariable was only the inspection time. In the mentionedmodel, when inspection reveals a given degradationlevel (from which the environment begins to deteri-orate), the system must be replaced; otherwise, itmust be allowed to continue. The authors appliedthe Nelder-Mead method as an optimization methodto solve the problem.

Authors in [34] proposed a CMB model consid-ering two threshold levels related to the amount ofenvironmental damage: a critical one (which yields asigni�cant penalty when it exceeds the threshold level)and a lower one. Herein, the threshold levels andthe inspection sequence were considered to be decisionvariables in their work, and the total expected cost pertime unit over an in�nite time horizon was the objectivefunction.

A deteriorating manufacturing system with 0-1 states that generated harmful emissions, which

Page 4: A POMDP framework to nd optimal policy in sustainable ...

R. Ghandali et al./Scientia Iranica, Transactions E: Industrial Engineering 27 (2020) 1544{1561 1547

subcontracted to a new plant with clean productiontechnology, was studied in [36]. The authors presenteda mathematical model for minimizing the total cost ofthe system including inventory, backlog, production,emission, and subcontracting costs to determine theproduction and subcontracting rates and emission level(no maintenance-related decision variables) while in-cluding maintenance considerations in constraints.

Further, authors in [37] added the energy con-sumption factor to the previous CBM optimizationmodels by considering sustainability. Their workincluded two aspects of sustainability: CO2 emissionand energy consumption. Their proposed objectivefunction involved the minimization of total cost, anddecision variables included optimal CO2 emission andenergy consumption thresholds besides the periodicinspection intervals.

The papers mentioned above belong to the sec-ond category (CBM-related studies), where decisionvariables are mostly related to periodic inspectionrather than the maintenance optimal policy. Moreover,the aforementioned papers have not considered theassumption that a system as partially observable canusually make the model more di�cult and closer to realworld. Further to that, other factors of sustainabilitysuch as defective and non-recyclable products, uselessreplaced parts, scrapped items, and generally toxicmaterial and waste emission are not considered.

The present paper aims to propose an optimalmaintenance policy with environmental considerationsthat optimizes both maintenance actions and non-periodic inspections in the context of partially ob-servable systems. On the importance of partiallyobservable systems, authors in a recent survey studiedCBM optimization models for stochastically deterio-rating systems and considered the popularity of CBMto be strongly dependent on the development of thestochastic deterioration models, which are properlyused for partially observable systems [38]. In [39], atwo-part study was done to highlight the advantages ofapplying Partially Observable Markov Decision Process(POMDP), as a stochastic control technique, for opti-mizing inspection and maintenance policies by applyingstochastic models. The authors proposed an optimalpolicy for the cost-minimization CBM model in civilengineering structures (structural maintenance) withdiscrete-state deterioration based on a POMDP frame-work. They used a point-based value iteration solver(Perseus) and, also, two simpler approximate solversbased on Modi�ed Dynamic Programming (MDPs) tosolve the problem. They illustrated that the Perseusmethod is more e�cient for this type of applicationsand larger state-space problems.

The present study relies on the previously re-viewed works with respect to the CBM modelingapproach and signi�cantly extends it to sustainability-

oriented manufacturing systems (not a structure or in-frastructure) by a more rigorous, systematic formationof maintenance actions in the POMDP context. Thisstudy focuses on CBSM to present an optimizationmodel that takes sustainability considerations andCBM goals simultaneously into account. The problemis modeled separately for four maintenance actionsincluding keep, regular maintenance (minor repair),overhaul (major repair), and replace, with terms re-lated to sustainability formulations. Di�erent levels ofrepair can be de�ned between minor and major repairs,as required. CO2 and waste emissions as some of themost important factors of sustainability are includedin the formulation of each action, depending on thespeci�c situations. Other assumptions and conditionsof the model will be explained in detail subsequently.

Stochastic deterioration and partial observationare two important assumptions in modeling the system;therefore, deteriorating and observation processes areboth formulated by probability matrixes that representthe stochastic nature of the system.

The model is solved in both �nite and in�nitehorizons. Perseus algorithm has been applied to solvethe model approximately; besides, the problem isexactly solved with the traditional Incremental Pruning(IP) method and Accelerated Vector Pruning (AVP)method, as the current exact and fast algorithm.The objective of this work is to propose an optimalmaintenance policy (including replacement, overhaul,regular repair, keep, and non-periodic inspections)as a sequence of decisions, executable by the agentbased on the observations of monitors in order toachieve the system's goal (pro�t maximization besidesthe sustainability considerations) in the best possibleway.

In the rest of this study, Section 2 presents thePOMDP model of the CBSM problem completely,accompanied by environmental considerations in theformulation. Section 3 explains the estimated andexact value iteration algorithms, Perseus and AVP.Section 4 reports detailed computational results andconcluding remarks.

2. CBSM statement and POMDP formulation

The CBSM problem under consideration is �rst de-scribed in Subsection 2.1; then, it is formulated in thePOMDP framework in Subsection 2.2.

2.1. Problem descriptionConsider a CBM problem that determines the opti-mal maintenance/inspection policy (including the bestsequence of actions and action times, monitoring,and non-periodic inspections types and intervals) byconsidering sustainability aspects (such as Green HouseGas (GHG) and waste emission) in order to maximize

Page 5: A POMDP framework to nd optimal policy in sustainable ...

1548 R. Ghandali et al./Scientia Iranica, Transactions E: Industrial Engineering 27 (2020) 1544{1561

the pro�t of a partially observable, stochastically dete-riorating manufacturing system with uncertain actione�ects.

Some of the assumptions of the problem arepresented as follows:

� System state is partially observable, which is moni-tored by several monitoring tools;

� \Keep" action in each period shows that the sys-tem should continue its manufacturing without anyinterference. After \keep" action, the state of thesystem usually cannot be improved;

� In the \regular repair" action, some routine repairworks (such as resetting, lubricating, cleaning, etc.)are done for the system without stopping it. Afterthe \regular repair", the state of the system can beimproved;

� \Overhaul" action stops the manufacturing of thesystem and inspects and completely repairs it. Inthis action, the true state of the system can beobserved (probability vector is ej);

� After \replacement" action, the system state is thestate of a new system (probability vector e1);

� Deterioration of the system follows a stochasticprocess;

� Companies have many opportunities for produc-ing environment-friendly productions that lead togaining higher pro�ts from environment-sensitivecustomers [40]. For this reason, customers' demandsare considered environment sensitive and dependenton the system state in this work.

2.2. Model formulationThe CBSM problem is formulated in a POMDP frame-work in the following. In Subsection 2.2.1, inputs of themodel will be introduced and illustrated in detail. InSubsection 2.2.2, �rst, a total schema of the model,as well as its parameters and decision variable, will bepresented speci�cally and, then, each term of the modelwill be described completely.

2.2.1. Inputs� S: A �nite set of possible states of the system S =f1; :::; i; :::; j; :::; ng, in which i denotes the systemdeterioration degree (1 shows \as new" system stateand other numbers are sorted in ascending order todenote more deteriorated system states).

� X: Real state of the system whose value comes fromset S.

� � = f�1; :::; �i; :::; �ng: The probability vector ofthe primitive state. The system states are randomvariables because of the stochastic behavior of thesystem that leads the model to be closer to the prac-tical situations; moreover, since the system state

is non-observable, the exact state is not available.Note that the summation of all probabilities is equalto one.

nXi=1

�i = 1 and 0 � �i � 1 for any i:

� V (�): Optimal expected total pro�t function overin�nite horizon when the current state is �.

� Ci: Unit cost of the manufacturing process whenthe system works in state i (including raw mate-rial costs, energy consuming cost, personnel hires,rent of production place, defective products, etc.).Therefore, greater deterioration (larger i index)leads to increasing manufacturing cost. In otherwords, Ci is an increasing function of i.

� Di: Customers' demands for product manufacturedwhen the system works in state i. In this paper, it issupposed that customers' demands are environmentsensitive and dependent on the health conditionof the manufacturing system that supplies them(see [40]).

� Ri: Cost of replacement (the action of replacing thesystem with a new one) that is a non-decreasingfunction of i. Greater deterioration (larger i in-dex) leads to decreasing the monetary value ofthe dismantled system and, therefore, increasingthe payment to get the new system. The newsystem price is assumed to be constant (R), andthe monetary value of dismantled equipment (qi) isa decreasing function of i, and Ri = R� qi.

� �i: GHG emissions factor in the manufacturingsystem in tons of CO2 per unit demand of productwhen the system works in state i.

� !i: Waste emissions factor in the manufacturingsystem in tons of waste per unit demand whensystem works in state i.

� 'i: Defective percentage of products when thesystem works in state i.

� Gi: Recall cost of one defective product when thesystem works in state i.

� Bi: The ith markup level considered by the manu-facturing system when it works in state i.

� FGHG(ECU=T CO2): GHG emission penalty fac-tor.

� GHG: Acceptable level of GHG emission.� Fwaste(ECU=T waste): Waste emission penalty

factor.� WST : Acceptable level of waste emission.� WMRi: Waste of materials (cleaning materials

utilized for \regular repair" work such as lubricatingand hydraulics oils) during \regular repair" actionwhen the system's state is i.

Page 6: A POMDP framework to nd optimal policy in sustainable ...

R. Ghandali et al./Scientia Iranica, Transactions E: Industrial Engineering 27 (2020) 1544{1561 1549

� WOi: Waste of useless parts generated during\overhaul" action when the system's state is i.

� WMOi: Waste of materials (special materials uti-lized for \major repair" work) during \overhaul"action when the system's state is i.

� WRi: Waste of useless replaced parts generatedduring \replacement" action when the system'sstate is i.

� R0i: Cost of regular maintenance (including standardactions such as lubricating, cleaning, resetting, etc.),which is a non-decreasing function of i.

� R00i : Cost of major repair (overhaul) including thecost of replacing defective parts of the system andthe cost of other necessary actions for the restorationof the system to the new one. This parameter is alsoa non-decreasing function of i.

� �: Discount factor between 0 and 1.� pij : One-step transition probability (during one

period) from state i to state j in a normal conditionand under \keep" action (continuing productionwithout repair or replacement). Transition proba-bility matrix is shown as follows:

1 2 : : : j : : : n

P=

12...i...n

2666666664p11 p12 : : : p1j : : : p1np21 p22 : : : p2j : : : p2n...

......

......

...pi1 pi2 : : : pij : : : pin...

......

......

...pn1 pn2 : : : pnj : : : pnn

3777777775: (1)

� p0ij : One-step transition probability from state ito state j after regular repair. The mentionedtransition probability matrix is as follows:

1 2 : : : j : : : n

P 0=

12...i...n

2666666664p011 p012 : : : p01j : : : p01np021 p022 : : : p02j : : : p02n

......

......

......

p0i1 p0i2 : : : p0ij : : : p0in...

......

......

...p0n1 p0n2 : : : p0nj : : : p0nn

3777777775 : (2)

� p00ij : One-step transition probability from state i tostate j after major repair, which is shown below:

1 2 : : : j : : : n

P 00=

12...i...n

2666666664p0011 p0012 : : : p001j : : : p001np0021 p0022 : : : p002j : : : p002n

......

......

......

p00i1 p00i2 : : : p00ij : : : p00in...

......

......

...p00n1 p00n2 : : : p00nj : : : p00nn

3777777775 :(3)

� O = (o1; :::; oK ; :::; oL): Is a vector that showsthe output values of monitoring in each time pe-riod. In general, for the problem under study,the system state in each time period is monitoredpartially by L monitors. In fact, the monitors canprovide relevant data to identify the real state ofthe system. Monitors' output is recorded as M =(M (1); :::;M (K); :::;M (L)), where M (K) representsthe output of the Kth monitor and comes fromf1; :::; oK ; :::;mKg.

� �: Conditional probability matrix that elaborateson the relationships between the real state of thesystem and the output of monitors. This conditionalprobability matrix is de�ned by Eq. (4) as shown inBox I, where:

j(o1; : : : ; oL) = P (Ojj) = Pr(M (1)

= o1; : : : ;M (K) = oK ; : : : ;M (L)

= oLjX = j): (5)

X = j denotes the real state of the system.� T (�; O): Updated probability vector, after observ-

ing monitors `output information (O) in a normalcondition under \keep" action. In other words,T (�; O) reveals the secondary state (posterior prob-abilities) on the condition that � (prior probabili-ties) and O (monitors' output) are known.

Eqs. (6)-(9) (based on Bayes' formula) areapplied to calculate each element of this vector asfollows:T (�; O)=(T1(�; O); T2(�; O); : : : ; Tn(�; O)); (6)

where:

� =

26666664 1(1; : : : ; 1) : : : 1(o1; : : : ; oL) : : : 1(m1; : : : ;mL)

.... . . . . . . . .

... j(1; : : : ; 1) : : : j(o1; : : : ; oL) : : : j(m1; : : : ;mL)

.... . . . . . . . .

... n(1; : : : ; 1) : : : n(o1; : : : ; oL) : : : n(m1; : : : ;mL)

37777775 : (4)

Box I

Page 7: A POMDP framework to nd optimal policy in sustainable ...

1550 R. Ghandali et al./Scientia Iranica, Transactions E: Industrial Engineering 27 (2020) 1544{1561

Tj(�; O) = Pr(X = jjM = O;�)

=

nPi=1

�ipij jOnPj=1

nPi=1

�ipij jO; (7)

and:

P (Oj�) = Pr(M = Oj�) =nXj=1

nXi=1

�ipij jO; (8)

Tj(�; O) represents the probability that the sys-tem's real state is j when the primary state vector(prior probabilities) is � and monitors report thestate vector O. In fact, monitors' output is usedto evaluate the values of Tj to make a more preciseprediction of the partially observable state of thesystem.

� T 0(�; O): Updated probability vector by observingmonitors `output information (O) after regular re-pair (posterior probabilities):

T 0(�; O)=(T 01(�; O); T 02(�; O); : : : ; T 0n(�; O)); (9)

T 0j(�; O) =

nPi=1

�ip0ij jOnPj=1

nPi=1

�ip0ij jO; (10)

T 0j(�; O) represents the probability that the sys-tem's real state is j after the regular repair whenthe primary state vector is � and monitors reportthe state vector, O. In fact, monitors' output isused to obtain T 0j for a correct prediction of thepartially observable state of the system. Here,it is necessary to consider that in case of majorrepairs and replacement, the real state is directlyobservable; thus, there is no need to use the updatedprobability vectors such as T (�; O) and T 0(�; O).

2.2.2. ModelingIn the following, a POMDP framework as a 6-tuple(S;A; P;�; Rw) is utilized to model the CBSM prob-lem.

POMDP parameters include S = f1; : : : ; i; : : : ; j;: : : ; ng as a set of states, st 2 S as the systemstate at time t, and � = f�1; : : : ; �i; : : : ; �ng as theprobability of primitive states. A = fa1; a2; : : : ; atgshows the maintenance actions set at time t includingfKeep, Regular, Overhaul, Replaceg. is the main-tenance policy that maps S onto A. P shows thetransition probabilities matrix, and P 0 and P 00 arethe transition probabilities matrixes for \Regular" and\Overhaul" actions. O is monitors' output including(o1; : : : ; oK ; : : : ; oL), and � shows conditional proba-bilities that display the relationship between O and

the real system state with members j(o1; ::; oL). �shows the discount factor between 0 and 1. Finally,Rw(st; at) is the terminal reward that depends onthe maintenance action and system state at time tincluding KV (�), RgV (�), OV (�), and RpV (�) for\Keep", \Regular", \Overhaul", and \Replace", wherethe initial probability vector is �.

CBSM parameters include Ci, Di, �i, 'i, Gi,Bi, FGHG, GHG, Fwaste, WST , !i, WMRi, WOi,WMOi, and WRi. Most POMDP and CBSM pa-rameters mentioned in this section are introduced inSubsection 2.2.1 in detail.

Objective function and decision variable are in-troduced as follows: V (�) in Eq. (11) represents thePOMDP objective function that gives the expectedtotal discounted reward over an in�nite time horizonwith initial state � and maintenance policy .

V (�) =1Xi=0

�tE[Rw (st; at)j�;]: (11)

Decision variable is the optimal maintenance policydenoted by �(�):

�(�) = arg maxa2AV

(�); (12)

where �(�) optimally suggests the type and timeof maintenance actions and inspections that give themaximum expected reward for the initial state �.

The optimal value of CBSM objective functionin the POMDP framework is denoted by V �(�) inEq. (13):

V �(�)=maxfKV (�); RgV (�); OV (�); RpV (�)g;(13)

where KV (�), RgV (�), OV (�), and RpV (�) corre-spond to the expected rewards of \Keep", \Regular re-pair", \Overhaul", and \Replacement" actions, whichare expressed separately in the following.

First, \Keep" action and its related terms aredescribed in Eq. (14):

KV (�) =Xi

�iDi(1 +Bi)Ci

�(X

i

�iDiCi +Xi

�i'iDiGi

)��FGHG

�Xi

�i�i �GHG�

+Fwaste�X

i

�i!iDi �WST��

+�m1Xo1

: : :mLXoL

0@Xj

Xi

�ipij jO

1AV (T (�; O)):(14)

Page 8: A POMDP framework to nd optimal policy in sustainable ...

R. Ghandali et al./Scientia Iranica, Transactions E: Industrial Engineering 27 (2020) 1544{1561 1551

Eq. (14) is related to \keep" action and shows theexpected total discounted pro�t of the system when themaintenance action is in the \keep" mode in the currentperiod. Manufacturing continues with this mainte-nance action; therefore, the �rst term in Eq. (14),Pi�iDi(1 +Bi)Ci, shows the total income earned from

supplying customers' demands. For a more detailedstudy on such formulations of earned pro�t, see [41].Terms two and three (

Pi�iDiCi and

Pi�i'iDiGi)

show production and recall costs. Terms four and�ve (FGHG(

Pi�i�i � GHG) and Fwaste(

Pi�i!iDi �

WST )) show GHG and waste emission penaltiesfor \keep", respectively. Note that the penaltyterm is di�erent in each maintenance action. Termsix (�

m1Po1

: : :mLPoL

(Pj

Pi�ipij jO)V (T (�; O))) shows the

summation of the expected total discounted pro�t forperiods after the current period and that of \keep" thatremains.

In the following, \Regular repair" and its relatedterms are described in Eq. (15):

RgV (�) =Xi

�iDi(1 +Bi)Ci

��X

i

�iDiCi+Xi

�i'iDiGi+Xi

�iR0i�

��FGHG

�Xi

�i�i �GHG�

+Fwaste�X

i

�i!iDi +WMRi��WST

�+�

m1Xo1=1

: : :mLXoL=1

�Xj

Xi

�ip0ij jO�

V (T 0(�; O)): (15)

Eq. (15) shows the expected total discounted pro�tof the system when, in the current period, main-tenance action is \regular repair". Manufactur-ing continues under this maintenance action; there-fore, the �rst term in Eq. (15),

Pi�iDi(1 +Bi)Ci,

shows the total income earned from supplying cus-tomers' demands. Terms two, three, and fourin Eq. (15) (

Pi�iDiCi,

Pi�i'iDiGi, and

Pi�iR0i)

are related to production, recall, and regular repaircosts, respectively. Terms �ve and six in Eq. (15)(FGHG(

Pi�i�i � GHG) and Fwaste(

Pi�i!iDi +

WMRi)�WST ) show GHG and waste emission penal-ties for \Regular repair", respectively. Term seven in

Eq. (15) (�m1Po1=1

: : :mLPoL=1

(Pj

Pi�ip0ij jO)V (T 0(�; O)))

shows the summation of the expected total discountedpro�t of the remaining periods after the current periodand that of \Regular repair". In the following, \Over-haul" action and its related terms will be described inEq. (16):

OV (�) = �Xi

�iR00i

�Fwaste�X

i

�i(WOi+MWOi)�WST�

+�Xj

Xi

�ip00ijV (ej): (16)

Eq. (16) shows the expected total discounted pro�t ofthe system when the maintenance action is \Overhaul"in the current period. Manufacturing stops under thismaintenance action; therefore, terms one and two inEq. (16) (

Pi�iR00i and FwastefP

i�i(WOi +WMOi)�

WSTg) that are related to the current period arenegative. The former shows the mean overhaul costand the latter shows waste emission penalty of \Over-haul". Note that, in \Overhaul", there are not anymanufacturing operations; thus, recall cost and GHGpenalty related terms have been omitted. Term threein Eq. (16) (�

Pj

Pi�ip00ijV (ej)) shows the expected

total discounted pro�t for periods that remain afterthis current period with \Overhaul". In the following,\Replacement" action and its related terms will bedescribed in Eq. (17):

RpV (�) = �Xi

�iRi � Fwaste X

i

�iWRi�WST

!+�V (e1): (17)

Eq. (17) shows the expected total discounted pro�t ofthe system when the maintenance action is \Replace"in the current period. Manufacturing stops in thismaintenance action; therefore, Terms one and twoin Eq. (17) (

Pi�iRi and Fwaste(

Pi�iWRi �WST )),

which are related to the current period, are negative.The prior shows mean replacement cost, and the lattershows the waste emission penalty for \Replace". Notethat, in this maintenance action, there are not anymanufacturing operations; therefore, recall cost andGHG penalty-related terms have been omitted. Termthree in Eq. (17) (�V (e1)) shows the expected totaldiscounted pro�t for periods that remain after thecurrent period with \Replace".

In the next section, the CBSM model studied inthis section will be solved e�ciently by 4 methods:

Page 9: A POMDP framework to nd optimal policy in sustainable ...

1552 R. Ghandali et al./Scientia Iranica, Transactions E: Industrial Engineering 27 (2020) 1544{1561

Standard Dynamic Programming (SDP), Modi�ed Dy-namic Programming (MDP), IP, and AVP. Solutionapproaches are illustrated, and related algorithms andtheir owcharts are presented.

3. Solution approach

In this section, solving the CBSM model, presented inthe previous section, will be studied.

In general, POMDP solving methods can be di-vided into two main categories: optimal solving meth-ods and near-optimal and approximate methods. Someof these methods are applied to solve the presentedCBSM model in this paper: SDP method (a basictraditional optimal method and MDP method (a recentversion of it), IP method (a more recent and popularoptimal method) and AVP (a very recent version ofit), and Perseus (one of the most popular approximatemethods).

Each of the mentioned solving methods will bedescribed further and compared with each other in thefollowing.

For the �rst time, the concept of maintenancein the POMDP framework was studied in [42]. Theconcept is known to be a proper method to apply topartially observable systems under uncertainty [43] andhas been considered by several studies; however, theoptimal solving mode of it was abandoned long ago(until 2010) because of the complexity and di�culty ofdata acquisition.

SDP as a traditional optimal solving method,which is applied to solve the CBSM problem in thispaper, can present an optimal solution to small-sizedproblems, but it is unable to solve medium and largePOMDPs. On the other hand, MDP as a revisedversion of SDP can solve small and medium problems,but it cannot solve large problems.

Therefore, near-optimal and approximate meth-ods were proposed for solving POMDPs, some of themost important of which exist in the papers such as [44]and [45].

Among the popular approximate methods,Perseus method, described as an approximate point-based solver in [45], will be used and evaluated in thenext section for solving the mentioned CBSM model, inwhich there are some features that make it even morecomplicated than simple POMDP to solve. UnlikeSDP and MDP, Perseus can solve large-scale CBSMproblems in a desirable time span, but the solution isapproximate and not exact.

To present an optimal and exact solution to thePOMDP problems in a desirable amount of time, somerecent papers were proposed in [46{51].

The application of IP variations as an exactmethod has become increasingly popular in recentworks because of their e�ciency in solving POMDPs.

Pruning methods use a piecewise-linear and convex rep-resentation of the value function that can be speci�edby a unique set of vectors and often aim to remove non-necessary vectors. In [51], as the most recent work, anew variant of IP method called AVP is introducedfor the exact solving of large-scale POMDPs so as toimprove the performance of existing pruning methodsand make the proposed method the fastest of its kindfor POMDPs. The AVP method will be applied foroptimally solving the CBSM model, too.

In the remainder of this section, SDP and MDPmethods will be explained in Subsection 3.1, Perseuswith its owchart will be described in Subsection 3.2,and IP method and AVP will be illustrated with theirCBSM-related owcharts in Subsection 3.3. Finally,Perseus and AVP methods as the most e�ective esti-mating and exact methods will be compared with eachother.

3.1. Standard and modi�ed dynamicprogramming

CBSM problem has been solved using SDP as atraditional, exact POMDP solving method in the �nitehorizon (see [52]). A backward pro�t maximizationrecursive algorithm, which can solve problems of DPkind, is employed and implemented using visual basicand C++ solvers. Computational results show thatit is di�cult to apply this method to solve medium-and large-scale POMDP problems. Limitations of theclassical DP are well known because of its long time andhigh memory consumption during the computation.

In the present paper, some techniques in C++programming are applied to increase the system'smemory temporarily and reduce DP's computationallimitations. These methods include memory manage-ment techniques, heap memory management methods,and a private heap for dynamic memory allocation andhandling memory blocks [53]. This method is namedMDP, the application of which necessitates allocatinga greater volume of memory to processing and ensuresa shorter computational duration. Computationalresults of solving CBSM model in the �nite horizonby standard and modi�ed DP methods and comparingthem with other methods will be presented in Table 1in Section 4.

In the next section, Perseus method that canovercome the limitations of DP and MDP methods insolving large-scale problems will be described.

3.2. Perseus methodSince exact methods are usually not suitable for solvinglarge-scale and complicated POMDPs, approximatemethods like Perseus have gained the upper hand andbecome popular [54].

Perseus method is applied to solve the proposedCBSM model e�ciently; it is a randomized Point-

Page 10: A POMDP framework to nd optimal policy in sustainable ...

R. Ghandali et al./Scientia Iranica, Transactions E: Industrial Engineering 27 (2020) 1544{1561 1553

Table 1. Computational time of solving Condition-Based Sustainable Maintenance (CBSM) model by Standard andModi�ed DPs (3 states and 2 monitors in the �nite horizon).

N # of stages orhorizon length

Initialprobability vector

Computation time (s)

Standard DP Modi�ed DPState

generationDecision-making

Stategeneration

Decision-making

1 5 (1,0,0) 59 78 0.5 1.52 7 (1,0,0) 610 720 1.5 23 10 (1,0,0) � � 10 154 20 (1,0,0) � � 203 2245 25 (1,0,0) � � � �

6 5 (0.8,0.15,0.05) 63 85 1 1.57 7 (0.8,0.15,0.05) 602 732 2 38 10 (0.8,0.15,0.05) � � 11 199 20 (0.8,0.15,0.05) � � 240 27110 25 (0.8,0.15,0.05) � � � �

11 5 (0,1,0) 62 84 1 1.512 7 (0,1,0) 667 722 3 313 10 (0,1,0) � � 10 1514 20 (0,1,0) � � 258 27715 25 (0,1,0) � � � �

16 6 (0.2,0.3,0.5) 619 705 0.9 1.317 7 (0.2,0.3,0.5) � � 2 318 10 (0.2,0.3,0.5) � � 14 1619 20 (0.2,0.3,0.5) � � 246 26120 25 (0.2,0.3,0.5) � � � �

21 6 (0,0,1) 590 682 1.2 1.822 7 (0,0,1) � � 2 323 10 (0,0,1) � � 11 1424 20 (0,0,1) � � 235 26525 25 (0,0,1) � � � �

Based Value Iteration (PBVI) algorithm that �rstcollects a quite large set of belief points from thebelief space, which remains �xed until the end of thealgorithm. By simply choosing random actions and ob-servations to create trajectories via the belief space, Bwill be built. Next, the algorithm computes the initialvalue function, which is considered as an approximatedlower bound, commonly including a single �-vectorwith all components calculated by 1

1� mins;a

R(s; a). In

the next step, backup stages will begin and continueuntil the end of algorithm, where a belief point b willbe chosen randomly in each stage from B to calculate

a related value and �-vector. To brie y explain thebackup stages, a owchart of the algorithm is shown inFigure 1.

In the last step of the owchart, the convergencecondition is satis�ed if the absolute value di�erence intwo successive iterations is below a de�ned tolerancethreshold (0.001), or the running time exceeds a de�nedlimit (2000 seconds),as described in Section 3.

Moreover, the computational results of solving theCBSM model by Perseus method and the comparisonof the former results and the results of other methodsare presented in Table 2 in Section 4.

Page 11: A POMDP framework to nd optimal policy in sustainable ...

1554 R. Ghandali et al./Scientia Iranica, Transactions E: Industrial Engineering 27 (2020) 1544{1561

Figure 1. Flowchart of the Perseus algorithm.

In the next section, a new AVP method that cansolve large-scale problems exactly will be described.

3.3. Accelerated Vector Pruning (AVP) andIncremental Pruning (IP) methods

As mentioned before, the application of AVP for someoptimal POMDP solvers is a recently proposed proce-dure that speeds up the existing pruning algorithmsand leads to the exact solving of large-scale POMDPproblems in a desirable amount of time. Generally,pruning methods work based on the fact that thevalue function in POMDPs is piecewise linear andconvex [55], which can be speci�ed by using a uniqueset of vectors with a minimum size. Pruning algorithmsoften aim to remove unnecessary or dominated vectorsfrom the value function; otherwise, the number ofpossible vectors will increase quickly and can lead toslowing down the process of �nding an optimal policyin the steps of the exact value iteration algorithm.

This property is used for representing the valuefunction by a �nite set of jSj-dimensional vectors,� = f�0; �1; : : : ; �kg, in which the belief point b hasa value of V (b) = max�n2�b:�n, where \:" denotes

Table 2. Computational results of solvingCondition-Based Sustainable Maintenance (CBSM) modelby Perseus, Incremental Pruning (IP), and AcceleratedVector Pruning (AVP) methods in the in�nite horizon,Initial probability vector (0.8, 0.15, 0.05).

N jOj jAjComputation time (s)

Perseus

Error (%) Time IP AVP

1 2 4 0 0.2 31 27

2 3 4 0 19 135 48

3 4 4 5.3E-8 32 228 170

4 6 4 9.6E-7 61 642 345

5 2 4 5.8E-8 8 56 49

6 3 4 9.6E-7 21 202 119

7 3 6 9.5E-6 76 693.5 468

8 3 8 9.0E-6 100 1123 900

9 4 4 0 54 499 290

10 5 3 9.7E-7 75 783 446

11 5 4 9.7E-7 81 809 2226

12 6 4 9.8E-6 105 1196 517

13 3 6 9.9E-6 110 1261 581

14 3 8 9.0E-6 139 1994 1005

15 5 3 9.6E-5 97 1019 963

16 5 4 9.2E-6 108 1249 541

17 6 4 9.8E-5 113 1724 956

18 2 4 9.6E-6 17 102 58

19 3 6 9.8E-6 120 1882 948

20 4 4 9.5E-6 63 667 363

21 4 4 9.6E-5 98 1061 735

22 2 4 9.2E-4 34 255 89

23 4 4 9.8E-4 169 1525 949

24 21 5 9.7E-4 1048 � �25 17 5 9.7E-4 816 � �

\dot product"(for more details about this, refer to [56]or [57]).

The CBSM model proposed in the present papercan also be rewritten in the vector form. Based onthe steps proposed in [58], �rst, the value function isde�ned as a combination of simpler value functions:

V (�) = maxa2AV

a(�); (18)

Page 12: A POMDP framework to nd optimal policy in sustainable ...

R. Ghandali et al./Scientia Iranica, Transactions E: Industrial Engineering 27 (2020) 1544{1561 1555

V a(�) =m1XO1=1

: : :mLXOL=1

fV a;O(�)g; (19)

V Keep(�) =m1XO1=1

: : :mLXOL=1

fV Keep;O(�)g; (20)

V Regular(�) =m1XO1=1

: : :mLXOL=1

fV Regular;O(�)g; (21)

V Overhaul(�) =m1XO1=1

: : :mLXOL=1

fV Overhaul;O(�)g; (22)

V Replace(�) =m1XO1=1

: : :mLXOL=1

fV Replace;O(�)g: (23)

Eqs. (24){(27) are shown in Box II. All of thesefunctions are piecewise linear and convex that can bepurged from dominated vectors.

Purged sets with minimum sizes will be shown byPurge() operator as �(�), �a(�), and �a;O(�) forV (�), V a(�), and V a;O(�), respectively. Therefore,

Eqs. (28) to (30) can be written as follows:

�(�) = Purge([a2A�a(�)); (28)

�a(�) = Purge(�o2O�a;O(�)); (29)

�a;O(�) = Purge(f�a;O;ij�i 2 �g); (30)

�a;O;n(s) =R(i; a)jOj + �

Xj2S

P (oja; j)P (jja; i)�n(j):(31)

In Eq. (29), � is the vector-related operator and isde�ned for two sets Y and z as Y � Z = fy + zjy 2Y; z 2 Zg.

Dominated vectors in some mathematical meth-ods are detected by solving a number of Linear Pro-gramming (LP)problems, which is usually a time-consuming operation. In the present paper, two di�er-ent LP methods are considered: a classic LP algorithm(related to IP method) and the AVP LP algorithm.Figure 2 shows a owchart that summarizes the stepsof the pruning algorithms proposed in [59]. Sectionsrelated to LP problem solving are in bold. Note thatthe <lex in the owchart is lexicographic ordering, andits application to this algorithm is explicit in [60].

As mentioned before, AVP is a new variant of theIP method that uses Benders decomposition approach

V Keep;O(�) =1jOj

8><>:Pi�iDi(1 +Bi)Ci � fP

i�iDiCi +

Pi�i'iDiGig��

FGHG�P

i�i�i �GHG

�+ Fwaste

�Pi�i!iDi �WST

�� 9>=>;+�

0@Xj

Xi

�ipij jO

1AV (T (�; O)); (24)

V Regular;O(�) =1jOj

8>><>>:Pi�iDi(1 +Bi)Ci �

�Pi�iDiCi +

Pi�i'iDiGi +

Pi�iR0i

��

fFGHG�P

i�i�i �GHG

�+ Fwaste

�Pi�i!iDi �WMRi

��WST )g

9>>=>>;+�

0@Xj

Xi

�ip0ij jO

1AV (T 0(�; O)); (25)

V Overhaul;O(�)=1jOj

8<:�Xi

�iR00i�Fwaste(X

i

�i(WOi+WMOi)�WST

)+�

0@Xj

Xi

�ip00ijV (ej)

1A9=; ;(26)

V Replace;O(�) =1jOj

(�X

i

�iRi � Fwaste X

i

�iWRi �WST

!+ �V (e1)

): (27)

Box II

Page 13: A POMDP framework to nd optimal policy in sustainable ...

1556 R. Ghandali et al./Scientia Iranica, Transactions E: Industrial Engineering 27 (2020) 1544{1561

Figure 2. Flowchart of the pruning algorithm.

to accelerate the pruning procedure by decomposinglinear programs to be solved among the pruning phasesin the IP method. Therefore, the steps of AVP methodare similar to those shown in Figure 2 (related to theIP method) and di�er from those in the LP section,which must be replaced by the owchart presented inFigure 3.

Note that all inputs, outputs, and the overallresults are the same in both cases of the original LP andthe decomposed LP procedures; however, the programis often faster in the second decomposed procedurebecause there are a small number of original LP'sconstraints in the decomposed LP. For more detailsabout AVP method such as how to apply Benders de-composition to it and prove its correctness, please referto [51]. In addition to the Perseus method illustratedin the previous section, for the CBSM model, AVPand IP methods will be applied to solve the CBSMmodel in value iteration algorithm, and computational

results will be compared with each other. IP has beenconsidered as the most e�cient method among thepruning methods before AVP in several papers.

In the next section, computational results ofsolving CBSM problem with SDP, MDP, Perseus, IP,and AVP methods will be presented.

4. Computational results and concludingremarks

For evaluating the performance of algorithms proposedin the previous section, numerical studies are providedin this section. Algorithms such as standard DP,modi�ed DP, Perseus, IP, and AVP were coded in visualbasic, C++, and Java languages with a PC featuringa 2.4 GHz Pentium IV processor, 4GB RAM, andWindows Vista.

As mentioned earlier, exact methods, SDP andMDP, can solve small POMDPs. Therefore, the

Page 14: A POMDP framework to nd optimal policy in sustainable ...

R. Ghandali et al./Scientia Iranica, Transactions E: Industrial Engineering 27 (2020) 1544{1561 1557

Figure 3. Flowchart of the decomposed Linear Programming (LP) part in the Accelerated Vector Pruning (AVP)algorithm.

e�ectiveness of these algorithms in solving such CBSMproblems was examined and compared by performinga set of computational experiments including smallproblems with maximum 20 stages (horizon length),3 states (healthy, medium, and broken states of thesystem), and 2 condition monitoring tools (for example,temperature and vibration sensors) in various primarystates including (1,0,0) for healthy primary states,(0,1,0) for medium primary states, (0,0,1) for brokenprimary states, and (0.2, 0.3, 0.5) and (0.8, 0.15, 0.05)for fairly broken and fairly healthy primary states, asshown in Table 1. Memory limitation is observed inmore than 20 stages for 3-state and 2-sensor problems,as shown by � in Table 1.

In�nite horizon CBSM problems can be solved byPerseus, IP, and AVP methods, and the computationalresults of comparing their solution time with each otherare presented in Table 2 for the probability vector ofprimitive states (0.8, 0.15, 0.05).These probabilitiesshow that the underlying system is in a healthy statewith an 80% probability level at the start of themaintenance process. This means that, under sucha condition, maintenance planning will be done fora system that is most likely healthy at the start ofthe process to represent an intuitive position. FromTable 2, it can be seen that the proposed AVP algo-rithm is very successful in achieving exact solutions in areasonable amount of time (all less than 2300 seconds),which indicates the e�ciency of the AVP method insolving such CBSM problems. Moreover, Perseus is

very successful in proposing high-quality solutions withfewer errors, all less than 0.01%.

A summary of sensitivity analysis and managerialremarks is reported in Table 3 and Figures 4 to 8.

Table 1 compares the computational time taken tosolve �nite-horizon small and medium CBSM problemsby SDP and MDP. The initial probability vector isconsidered for some of the following intuitive states:the completely broken, the completely healthy, themedium state, the likely healthy (80%), and the likelybroken (50%). SDP can only solve the instances until7 stages. Symbol \�" is used in this table to show\memory limitation". As shown in Table 1, MDPis often faster than SDP and needs a lower amountof memory; however, large problems with more than20 stages, 3 states, and 2 monitors cannot be solved

Figure 4. Sensitivity analysis of total pro�t forsustainability factors.

Page 15: A POMDP framework to nd optimal policy in sustainable ...

1558 R. Ghandali et al./Scientia Iranica, Transactions E: Industrial Engineering 27 (2020) 1544{1561

Table 3. Sensitivity analysis of sustainability factors FGHG and Fwaste; GHG = WST = 25.

N FGHG Fwaste Objective function Limit violation Penaltyof GHG of WST of GHG of WST

1 0 0 13249839.2 { { 0 02 1� 10�20 1� 10�20 13249839.2 3.28E+26 1.4E+27 327840 1397183.93 1� 10�8 1� 10�8 13249839.2 3.28E+13 1.4E+14 327840 1397183.94 1� 10�4 1� 10�4 13249838 1397183836.5 327839962.3 327840 1397183.85 1� 10�1 1� 10�1 13249754 3278323.9 139717536.5 327832.4 1397175.46 0.25 0.25 13249625 1311283.6 5588650.1 327820.9 1397162.57 0.5 0.5 13249411 655603.5 2794282.3 327801.8 1397141.18 1 1 13248984 327763.5 1397098.4 327763.5 1397098.49 2 2 13248128 163843.5 698506.4 327687 1397012.810 5 5 13245561 65491.47 279351.2 327457.4 1396756.111 10 10 13241283 32707.47 139632.8 327074.7 1396328.312 20 20 13232726 16315.47 69773.63 326309.3 1395472.613 50 50 13207443 6480.265 27858.89 324013.3 1392944.314 500 500 12872170.9 598.2358 2718.8 299117.9 1359417.115 5000 5000 9657669 23.7732 207.5934 118866 1037966.916 10000 10000 6107507 7.860074 68.29507 78600.74 682950.7

Figure 5. Sensitivity analysis of GHG violation forsustainability factors.

Figure 6. Sensitivity analysis of WST violation forsustainability factors.

by the method yet. In the following, more powerfulcomputational results will be presented by applyingPerseus and AVP methods in the in�nite time horizon.

Table 2 shows computational results such as com-putation time and error percentage for solving small,medium, and large problems in the in�nite time horizonby Perseus, IP, and AVP methods. Symbol � is usedto show \memory limitation". It can be observed from

Figure 7. Sensitivity analysis of GHG penalty forsustainability factors.

Figure 8. Sensitivity analysis of waste penalty forsustainability factors.

Table 2 that AVP improves the performance of IP inall CBSM cases and needs a lower volume of memory.It should be noted that other larger problems, suchas those shown in the last two rows, could not besolved by AVP and IP; however, Perseus can solve themapproximately. It is notable that in the case of pruning

Page 16: A POMDP framework to nd optimal policy in sustainable ...

R. Ghandali et al./Scientia Iranica, Transactions E: Industrial Engineering 27 (2020) 1544{1561 1559

methods such as IP and AVP, problems that have initialprobability vectors with fewer zeros are often more timeconsuming and, therefore, the intuitive initial state(0.8,0.15,0.05) that shows mostly a healthy state for thesystem is selected to begin the maintenance procedure,as shown in this table.

Table 3 shows the objective function, GHG, andwaste violation and also GHG and waste penalty vari-ations versus sustainability factors FGHG and Fwastechanges. The analytical results of this table arepresented in �ve �gures in the following.

Figures 4 to 8 show the impact of changingsustainability factors ( FGHG and Fwaste) on somerelated values (such as objective function, GHG, andwaste violation and also GHG and waste penalty).Decreasing sustainability factors (FGHG and Fwaste)means easier and more exible governmental regula-tions. If these penalty factors go below a certainlevel, manufacturing systems violate GHG and wastelimitations (GHG and WST ) heavily in the exponen-tial form (see Figures 5 and 6) because the assignedpenalties can be controllable simply (see Figures 7 and8) and there will not be any signi�cant decrease in themanufacturing systems' total pro�t. This could be aserious threat to the environment.

On the other hand, increasing sustainability fac-tors (FGHG and Fwaste) means greater stricture andtougher governmental regulations, and increasing theenvironment-related penalties for GHG and waste canonly useful up to a certain limit, before which man-ufacturing systems attempt to produce less pollutionto control their pro�t level; however, after the afore-mentioned set limit, the manufacturing systems cannotwithstand, and there remains only one way to dealwith these excessive penalties: limiting production (seeFigure 8).

5. Conclusions

This study focused on environmental considerationsin the maintenance context of partially observable,stochastically deteriorating systems to optimize bothmaintenance and inspection actions. The Par-tially Observable Markov Decision Process (POMDP)framework with a more rigorous, systematic forma-tion of maintenance actions was applied to modelsustainability-oriented manufacturing system's mainte-nance problem. The problem was modeled separatelyfor four maintenance actions including keep, regularmaintenance (minor repair), overhaul (major repair),and replace with terms related to sustainability formu-lations. CO2 and waste emissions as some of the mostimportant factors of sustainability were consideredin the formulation of each action, depending on thespeci�c situations of it.

The model was solved for both �nite and in�nite

horizons. Perseus algorithm was applied for solving themodel approximately. Apart from that, the problemwas solved by an exact method, Incremental Pruning(IP), and also with a current variant of it, AcceleratedVector Pruning (AVP) method. By comparing thecomputational results and analyzing them, some inter-esting theoretical and managerial points were obtained.The optimization of Green House Gas (GHG) andwaste factors can be proposed for the future research.

References

1. Chu, C., Proth, J.M., and Wol�, P. \Predictivemaintenance: The one-unit replacement model", In-ternational Journal of Production Economics, 54(3),pp. 285{295 (1998).

2. Wireman, T., Benchmarking Best Practices in Main-tenance Management, Industrial Press Inc (2004).

3. Waeyenbergh, G. and Pintelon, L. \A frameworkfor maintenance concept development", InternationalJournal of Production Economics, 77(3), pp. 299{313(2002).

4. Rahmati, S.H.A., Ahmadi, A., and Karimi, B. \De-veloping simulation based optimization mechanism fora novel stochastic reliability centered maintenanceproblem", Scientia Iranica, Transactions E, IndustrialEngineering, 25(5), pp. 2788{2806 (2018).

5. Agustiady, T.K. and Cudney, E.A. \Total productivemaintenance", Total Quality Management & BusinessExcellence, pp. 1{8 (2018) .

6. Nielsen, J.S. and S�rensen, J.D. \Computationalframework for risk-based planning of inspections,maintenance and condition monitoring using discreteBayesian networks", Structure and Infrastructure En-gineering, 14(8), pp. 1082{1094 (2018).

7. Ahmad, R. and Kamaruddin, S. \An overview of time-based and condition-based maintenance in industrialapplication", Computers & Industrial Engineering,63(1), pp. 135{149 (2012).

8. Takata, S., Kirnura, F., Van Houten, F.J.A.M., etal. \Maintenance: changing role in life cycle man-agement", CIRP Annals Manufacturing Technology,53(2), pp. 643{655 (2004).

9. Grall, A., B�erenguer, C., and Dieulle, L. \A condition-based maintenance policy for stochastically deteriorat-ing systems", Reliability Engineering & System Safety,76(2), pp. 167{180 (2002a).

10. Han, Y. and Song, Y.H. \Condition monitoring tech-niques for electrical equipment-a literature survey",Power Delivery, IEEE Transactions on, 18(1), pp. 4{13 (2003).

11. Moya, M.C.C. \The control of the setting up of apredictive maintenance programme using a system ofindicators", Omega, 32(1), pp. 57{75 (2004).

12. Jardine, A.K., Lin, D., and Banjevic, D. \A review onmachinery diagnostics and prognostics implementing

Page 17: A POMDP framework to nd optimal policy in sustainable ...

1560 R. Ghandali et al./Scientia Iranica, Transactions E: Industrial Engineering 27 (2020) 1544{1561

condition-based maintenance", Mechanical Systemsand Signal Processing, 20(7), pp. 1483{1510 (2006).

13. Gupta, A. and Lawsirirat, C. \Strategically optimummaintenance of monitoring-enabled multi-componentsystems using continuous-time jump deteriorationmodels", Journal of Quality in Maintenance Engineer-ing, 12(3), pp. 306{329 (2006).

14. Iung, B. and Levrat, E. \Advanced maintenance ser-vices for promoting sustainability", Procedia CIRP,22, pp. 15{22 (2014).

15. Jasiulewicz-Kaczmarek, M. \Sustainability: orienta-tion in maintenance management: case study", Eco-Production and Logistics, pp. 135{154 (2013).

16. WCED, S.W.S. \World commission on environmentand development", Our Common Future (1987).

17. Michaelis, L. \The role of business in sustainableconsumption", Journal of Cleaner Production, 11(8),pp. 915{921 (2003).

18. Kalantary, M., Saen, R.F., and Eshlaghy, A.T. \Sus-tainability assessment of supply chains by inversenetwork dynamic data envelopment analysis", Scien-tia Iranica, Transactions E, Industrial Engineering,25(6), pp. 3723{3743 (2018).

19. Fotuhi-Friuzabad, M., Safdarian, A., Moeini-Aghtaie,M., et al. \Upcoming challenges of future electricpower systems: sustainability and resiliency", ScientiaIranica, 23(4), pp. 1565{1577 (2016).

20. Shahidehpour, M. and Fotuhi-Friuzabad, M. \Gridmodernization for enhancing the resilience, reliability,economics, sustainability, and security of electricitygrid in an uncertain environment", Scientia Iran-ica, Transaction D, Computer Science & Engineering,Electrical, 23(6), p. 2862 (2016).

21. Ararsa, B.B. \Green maintenance: A literature surveyon the role of maintenance for sustainable manufac-turing", MS Thesis, M�alardalen University, School ofInnovation, Design and Engineering (2012).

22. Kopac, J. \Achievements of sustainable manufacturingby machining", Journal of Achievements in Materialsand Manufacturing Engineering, 34(2), pp. 180{187(2009).

23. Ben-Daya, M., Ait-Kadi, D., Du�uaa, S.O., et al.Handbook of Maintenance Management and Engineer-ing, 7, Springer, London (2009).

24. Sari, E., Shaharoun, A.M., Ma'aram, A., et al. \Sus-tainable maintenance performance measures: A pilotsurvey in Malaysian automotive companies", ProcediaCIRP, 26, pp. 443{448 (2015).

25. Nezami, F.G. and Yildirim, M.B. \A sustainabilityapproach for selecting maintenance strategy", Inter-national Journal of Sustainable Engineering, 6(4), pp.332{343 (2013).

26. Mishra, R.P. and Mungi, P. \A system framework fora sustainable approach to maintenance", SustainableOperations in India, pp. 79{91 (2018).

27. Li, S. \Optimal control of production-maintenancesystem with deteriorating items, emission tax andpollution R&D investment", International Journal ofProduction Research, 52(6), pp. 1787{1807 (2014).

28. Ben-Salem, A., Gharbi, A., and Hajji, A. \Environ-mental issue in an alternative production-maintenancecontrol for unreliable manufacturing system subjectto degradation", The International Journal of Ad-vanced Manufacturing Technology, 77(1{4), pp. 383{398 (2015).

29. Hajej, Z., Rezg, N., and Gharbi, A. \Ecological opti-mization for forecasting production and maintenanceproblem based on carbon taxthem", The InternationalJournal of Advanced Manufacturing Technology, 88(5{8), pp. 1595{1606 (2017a).

30. Ba, K., Dellagi, S., Rezg, N., et al. \Joint optimizationof preventive maintenance and spare parts inventoryfor an optimal production plan with considerationof CO2 emission", Reliability Engineering & SystemSafety, 149, pp. 172{186 (2016).

31. Hajej, Z., Rezg, N., and Gharbi, \A. Joint optimizationof production and maintenance planning with an en-vironmental impact study", The International Journalof Advanced Manufacturing Technology, 93(1{4), pp.1269{1282 (2017b).

32. Franciosi, C., Lambiase, A., and Miranda, S. \Sustain-able maintenance: A periodic preventive maintenancemodel with sustainable spare parts management",IFAC-PapersOnLine, 50(1), pp. 13692{13697 (2017).

33. Al-Turki, U.M., Ayar, T., Yilbas, B.S., and Sahin,A.Z., Integrated Maintenance Planning in Manufactur-ing Systems, Cham: Springer (2014).

34. Tlili, L., Radhoui, M., and Chelbi, A. \Condition-based maintenance strategy for production systemsgenerating environmental damage", MathematicalProblems in Engineering, 2015, Article ID 494162(2015). https://doi.org/10.1155/2015/494162

35. Chouikhi, H., Khatab, A., and Rezg, N. \A condition-based maintenance policy for a production system un-der excessive environmental degradation", Journal ofIntelligent Manufacturing, 25(4), pp. 727{737 (2014).

36. Ben-Salem, A., Gharbi, A., and Hajji, A. \Productionand uncertain green subcontracting control for anunreliable manufacturing system facing emissions",The International Journal of Advanced ManufacturingTechnology, 83(9{12), pp. 1787{1799 (2016).

37. Jiang, A., Dong, N., Tam, K.L., et al. \De-velopment and optimization of a condition-basedmaintenance policy with sustainability requirementsfor production system", Mathematical Problems inEngineering, 2018, Article ID 4187575 (2018).https://doi.org/10.1155/2018/4187575

38. Alaswad, S. and Xiang, Y. \A review on condition-based maintenance optimization models for stochasti-cally deteriorating system", Reliability Engineering &System Safety, 157, pp. 54{63 (2017).

Page 18: A POMDP framework to nd optimal policy in sustainable ...

R. Ghandali et al./Scientia Iranica, Transactions E: Industrial Engineering 27 (2020) 1544{1561 1561

39. Papakonstantinou, K.G. and Shinozuka, M. \Planningstructural inspection and maintenance policies viadynamic programming and Markov processes. Part I:theory", Reliability Engineering & System Safety, 130,pp. 202{213 (2014).

40. Kumar, A. and Meenakshi, N., Marketing Manage-ment, Vikas Publishing House (2011).

41. Ahmadi-Javid, A. and Ghandali, R. \An e�cientoptimization procedure for designing a capacitateddistribution network with price-sensitive demand",Optimization and Engineering, 15(3), pp. 801{817(2014).

42. Pak, P.K., Kim, D.W., and Jeong, B.H. \Machinemaintenance policy using partially observable Markovdecision process", Journal of the KSQC, 16(2), pp. 1{9(1988).

43. Kaelbling, L.P., Littman, M.L., and Cassandra, A.R.\Planning and acting in partially observable stochasticdomains", Arti�cial Intelligence, 101(1), pp. 99{134(1998).

44. Pineau, J., Gordon, G., and Thrun, S. \Point-basedvalue iteration: An anytime algorithm for POMDPs",IJCAI, 3, pp. 1025{1032 (2003).

45. Spaan, M.T. and Vlassis, N. \Perseus: Randomizedpoint-based value iteration for POMDPs", Journalof Arti�cial Intelligence Research, 24, pp. 195{220(2005).

46. Raphael, C. and Shani, G. \The skyline algorithm forPOMDP value function pruning", Annals of Math-ematics and Arti�cial Intelligence, 65(1), pp. 61{77(2012).

47. Karmokar, A.K., Senthuran, S., and Anpalagan, A.\POMDP-based cross-layer power adaptation tech-niques in cognitive radio networks", Global Commu-nications Conference, IEEE, pp. 1380{1385 (2012).

48. Roijers, D.M., Vamplew, P., Whiteson, S., et al. \Asurvey of multi-objective sequential decision-making",Journal of Arti�cial Intelligence Research, 48, pp. 67{113 (2013).

49. Li, D. and Jayaweera, S.K. \Machine-learning aidedoptimal customer decisions for an interactive smartgrid", IEEE Systems Journal, 9(4), pp. 1529{1540(2015).

50. Qian, W., Liu, Q., Zhang, Z., et al. \Policy graphpruning and optimization in Monte Carlo value itera-tion for continuous-state POMDPs", In ComputationalIntelligence (SSCI), Symposium Series on IEEE, pp.1{8 (2016).

51. Walraven, E. and Spaan, M.T. \Accelerated VectorPruning for Optimal POMDP Solvers", In AAAI, pp.3672{3678 (2017).

52. Bellman, R., Dynamic Programming, Courier Corpo-ration (2013).

53. Ahuja. D.https://www.codeproject.com/Articles/9898/Heap-Walker (2005).

54. Agrawal, R., Real�, M.J., and Lee, J.H. \MILP basedvalue backups in partially observed Markov decisionprocesses (POMDPs) with very large or continuous ac-tion and observation spaces", Computers & ChemicalEngineering, 56, pp. 101{113 (2013).

55. Smallwood, R.D. and Sondik, E.J. \The optimal con-trol of partially observable Markov processes over a�nite horizon", Operations Research, 21(5), pp. 1071{1088 (1973).

56. �Ozgen, S. and Demirekler, M. \A fast elimina-tion method for pruning in POMDPs", Joint Ger-man/Austrian Conference on Arti�cial Intelligence,Springer International Publishing, pp. 56{68 (2016).

57. Feng, Z. and Zilberstein, S. \Region-based incrementalpruning for POMDPs", 20th Conf. on Uncertaintyin Arti�cial Intelligence, AUAI Press, pp. 146{153(2004).

58. Cassandra, A., Littman, M.L., and Zhang, N.L. \In-cremental pruning: A simple, fast, exact method forpartially observable Markov decision processes", 13thConf. on Uncertainty in Arti�cial Intelligence, MorganKaufmann Publishers Inc., pp. 54{61 (1997).

59. White, C.C. \A survey of solution techniques for thepartially observed Markov decision process", Annals ofOperations Research, 32(1), pp. 215{230 (1991).

60. Littman, M.L., The Witness Algorithm: Solving Par-tially Observable Markov Decision Processes, BrownUniversity, Providence, RI. (1994).

Biographies

Razieh Ghandali is currently a PhD Candidate ofIndustrial Engineering at Yazd University, Yazd, Iran.She obtained an MS degree in Industrial Engineeringfrom Amir Kabir University of Technology (Tehran,Iran) in 2011 and a BS degree in Industrial Engineeringfrom Alzahra University (Tehran, Iran) in 2006. Herresearch interests include operations research, supplychain, time series, dynamic programming, uncertaintymodeling, and Markov models.

Mohammad Hossein Abooie is currently an As-sistant Professor of Industrial Engineering at YazdUniversity. He received his BS, MS, and PhD degreesin Industrial Engineering from Amir Kabir Universityof Technology in 1993, 1995, and 2009, respectively.His research interests include quality control andmanagement, customer relationship management andmarketing, economics, and applied operations research.

Mohammad Saber Fallah Nezhad is currently anAssociate Professor of Industrial Engineering at YazdUniversity. He received his BS, MS, and PhD degreesin Industrial Engineering from Sharif University ofTechnology in 2003, 2005, and 2008, respectively. Hisresearch interests include quality control, Bayesianinference, dynamic programming, and Markov models.