A machine learning approach to the optimization of inventory management policies

N/A
N/A
Protected

Academic year: 2021

Faculdade de Engenharia da Universidade do Porto

A Machine Learning Approach to the

Optimization of Inventory Management

Policies

Álvaro Silva de Melo

FOR JURY EVALUATION

Programa Mestrado Integrado em Engenharia Eletrotécnica e de Computadores

Supervisor: Gonçalo Figueira
Co-supervisor: Rui L. Lopes


Abstract

One of the key areas of operations and supply chain management is inventory control. Inventory control determines when and what quantity of a product should be ordered, given some objective, such as minimizing cost. Over time, different algorithms have been proposed to calculate the optimal parameters given the demand characteristics and a fixed cost structure, as well as several heuristics and meta-heuristics that calculate approximations with varying accuracy.

This project revolves around two main models that are very present in the inventory management field: a single-stage system and a multi-stage system, both with stochastic demand, in this case demand that follows a Poisson distribution.

In this work, we present an alternative approach to inventory control optimization based on Genetic Programming. Genetic Programming is an optimization method that applies the principles of natural evolution to optimization problems. More specifically, we adopt a Cooperative Coevolution approach, which allows us to evolve closed-form expressions for the inventory control policy parameters under study. The evolved expressions surpass some of the heuristics related to this subject and constitute near-optimal solutions.


Acknowledgments

I would first like to thank my thesis advisor, Gonçalo Figueira. His door was always open whenever I ran into a trouble spot or had a question about my research or writing. He consistently allowed this paper to be my own work, but steered me in the right direction whenever he thought I needed it. I would also like to thank my co-advisor, Rui L. Lopes, whose knowledge of Genetic Programming and guidance through the programming field in general were crucial for the development of this project.

I would also like to acknowledge Cristiane Ferreira, researcher at INESC, who gave me extremely important and essential help during this work, explaining to me the basics of a framework needed for this project.

I would like to thank my friends for supporting me throughout my academic years and this specific time. Without them, this would have been a much more stressful process, and I cannot thank them enough.

Finally, I must express my very profound gratitude to my parents and to my sister for providing me with unfailing support and continuous encouragement throughout my years of study and through the process of researching and writing this thesis. This accomplishment would not have been possible without them. Thank you.

Álvaro Silva de Melo


“I have been impressed with the urgency of doing. Knowing is not enough; we must apply. Being willing is not enough; we must do.”

Leonardo da Vinci


Contents

1 Introduction
2 Background on Inventory Management
  2.1 Context
    2.1.1 Problem
    2.1.2 General Process
    2.1.3 Replenishment policies
3 Machine Learning
  3.1 Machine Learning types
  3.2 Genetic Programming
    3.2.1 Coevolution
4 Continuous-review policies with Poisson demand
  4.1 Single Stage
  4.2 Multi Stage
5 Methodology and Solution Proposal
6 Tests and Results
  6.1 Single-Stage Model
    6.1.1 Numerical Tests
    6.1.2 Results
  6.2 Multi-Stage Model
    6.2.1 Numerical Tests
    6.2.2 Results
7 Conclusions and Future Work
A Appendix A
  A.1 Best Individuals expressions
    A.1.1 Single-Stage
    A.1.2 Multi-stage
  A.2 Table with run results for eCCGP
References


List of Figures

2.1 A-B-C items histogram [1]
2.2 Forecast of demand [33]
2.3 Serial System [11]
3.1 Machine learning types [32]
3.2 Example of GP syntax tree representation [27]
3.3 Creation of a five-node tree using the grow initialization method [27]
3.4 Example of subtree one-point crossover [27]
3.5 Example of subtree mutation [27]
6.1 Evolution of fitness value during the best run
6.2 Sensitivity analysis over the problem parameters for the eCCGP solution and Gallego's heuristic (λL: bottom left; h: bottom right; p: top left; k: top right)
6.3 Evolution of fitness value during the best run
6.4 Sensitivity analysis over the problem parameters for the GP solution
6.5 Sensitivity analysis over the problem parameters for the GP solution


List of Tables

5.1 First Function set and Terminal set for the single-echelon model
5.2 Extended version of the Function Set and Terminal Set defined for the single-echelon model
5.3 Function Set and Terminal Set defined for the multi-echelon model
6.1 Zheng's data set
6.2 Data set for the single-echelon model proposed by Kleinau and Thonemann
6.3 Results and comparison of our GP solutions (R_CCGP and R_eCCGP), Zheng's heuristic (R_Zg) and Gallego's heuristic (R_g)
6.4 Comparison of the computational effort, in milliseconds, of our GP solutions
6.5 Data set for the multi-echelon model
6.6 Results and comparison of our GP solution with the De Bodt and Schwarz heuristics
6.7 GP results in which our solution gets close to the heuristics or bests them


Abbreviations and Symbols

CCGP  Cooperative Coevolutionary Genetic Programming
eCCGP Extended Cooperative Coevolutionary Genetic Programming
FEUP  Faculty of Engineering of the University of Porto
GB    Gigabytes
GP    Genetic Programming
JR    Joint Replenishment
KS    Kleinau Set
ML    Machine Learning
NR    Normal Replenishment
SKU   Stock-Keeping Unit
ZS    Zheng Set


Chapter 1

Introduction

In most cases, inventory management is one of the key areas an industrial enterprise must develop in order to reduce costs and improve performance. This explains the growing exploration of this topic in the field of operations management.

One of the reasons inventory management is difficult is that market demand is a stochastic and very dynamic phenomenon, and therefore hard to predict. The presence of complex probability distributions and multi-stage systems makes this problem even more complex. Researchers have developed sophisticated methods and algorithms to optimize inventory management policies.

Our problem mainly explores the continuous-review model, with an (r, Q) inventory control policy for the single-stage system and an (r, nQ) policy for the multi-stage system. Here we assume the demand pattern follows a Poisson distribution, and this is where most of the complexity comes from. Because demand is not a deterministic process, the variability of the whole system increases and the analysis of these models becomes much harder. In the literature, various methods were developed to analyze these specific models, such as several heuristics (Zheng [37], Gallego [15], De Bodt and Graves, Schwarz [20]) and an optimal algorithm for the single-stage situation (Federgruen and Zheng [13]); they represent an essential part of our project. What transpires from this study is that closed-form analytical expressions are far more practical, simple and efficient than the somewhat extensive algorithm that gives us the optimal solution.

In this thesis, we essentially explore and implement a new method, based on machine learning, for the approximation of the reorder point and order quantity (r, Q) policy parameters through closed-form expressions. This kind of approach has already been explored in other works on this matter (Kleinau and Thonemann). Additionally, we present an alternative approach based on Genetic Programming. Genetic Programming is an optimization method that applies the principles of natural evolution to optimization problems, which fits our situation perfectly. More precisely, we present the concept of Cooperative Coevolutionary Genetic Programming, which allows us to evolve closed-form expressions for all our policy parameters simultaneously, not only in a single-stage situation (r, Q), but also in a multi-echelon system (r, nQ). Basically, Cooperative Coevolution learns all parameters in parallel populations that adapt to each other along the evolutionary run by sharing the fitness assessment. Knowing that each of these models has an associated long-run average total cost, our main goal is to take advantage of the powerful tool that machine learning is and find the best solutions for our policy parameters, so that we can minimize the system's total costs.

The results are, in the majority of our tests, compared with the optimal algorithm proposed by Federgruen and Zheng and with the hybrid meta-heuristic proposed by Gallego. Two main data sets were used in this project, which are also the most relevant and best described in the literature associated with this work: Zheng's set and Kleinau's set.

With this project, using a machine learning approach, we intend to generate, test, discard or improve hypotheses in order to optimize the expressions of inventory management policies. In this way, companies can improve their operational performance and, at the same time, work in a simpler, more efficient and more transparent way, applying the final policies to numerous items quickly and flexibly.

Essentially, in Chapter 2 we give the reader background on the vast and complex field of Inventory Management. Then, in Chapter 3, we proceed to detail and provide important information about Machine Learning and its intricacies, which was crucial for the development of this project. In Chapter 4 we introduce our problem, which encompasses the optimization of the (r, Q) policy parameters in a single-stage system as well as the (r, nQ) policy parameters in a multi-stage situation. Our methodology and how we approach this problem are detailed in Chapter 5. In Chapter 6 we describe how our experimental tests were conducted and the results that came from them. Finally, in Chapter 7 we present our conclusions from this work and give an idea of what could be done in the future regarding the continuation of this project.


Chapter 2

Background on Inventory Management

2.1 Context

Inventories have an important impact on the usual aggregate scorecards of management performance, namely on the balance sheet and the income statement. Inventories are classified as one of the current assets of an organization [33].

Inventory control determines what quantity of a product should be ordered in order to achieve some objective, such as minimizing cost. It also encompasses decisions regarding purchasing, distribution and logistics, and specifically addresses when and how much to order.

Traditional inventory models focus on risk-neutral decision makers, i.e., characterizing replenishment strategies that maximize expected total profit or, equivalently, minimize expected total cost over a planning horizon [9]. In this case, we will focus on minimizing expected total costs.

Decision making in production, inventory and supply chain management is therefore basically a problem of coping with large numbers and with a diversity of factors external and internal to the organization. Given that a specific item is to be stocked at a particular location, three basic issues must be resolved [33]:

1. How often the inventory status should be determined
2. When a replenishment order should be placed
3. How large the replenishment order should be

2.1.1 Problem

2.1.1.1 Costs and other important factors

Through empirical studies and deductive mathematical modeling, a number of factors have been identified that are important for inventory management.

The unit value (denoted by the symbol v) of an item is expressed in dollars per unit [33]. For a merchant, it is simply the price (including freight) paid to the supplier, plus any cost incurred to make it ready for sale. It can depend, via quantity discounts, on the size of the replenishment. The unit value is important for two reasons. First, the total acquisition (or production) costs per year clearly depend on its value. Second, the cost of carrying an item in inventory depends on v.

Up to three types of costs are usually considered in inventory management[17]: holding costs, fixed costs and penalty costs.

First, we have the holding cost, i.e., the cost of holding an item in storage. The holding cost is usually expressed as a percentage of the item's value and applies for the time that an item is kept in storage. Part of the holding cost is the return of an alternative investment, handling, storage, damage, obsolescence, insurance, taxes, etc. [17].

Then we have the fixed cost (independent of the size of the replenishment) associated with a replenishment. Generally, it represents costs related to ordering and receiving inventory [34]. For a merchant, it is called an ordering cost and includes the cost of order forms, postage, telephone calls, authorization, typing of orders, receiving, inspection, following up on unexpected situations, and handling of vendor invoices [33].

Shortages occur when stocks are very low and demand therefore cannot be fully satisfied. These are charged as penalty costs [6]. If the customer order is backlogged, there are often extra costs for administration [4], price discounts for late deliveries, material handling, and transportation. If the sale is lost, the contribution of the sale is also lost. In any case, a shortage usually means a loss of goodwill that may affect sales in the long run. Most of these costs are difficult to estimate. These costs could also be called the costs of avoiding stockouts and the costs incurred when stockouts take place. In the case of a producer, they include the expenses that result from changing over equipment to run emergency orders and the attendant costs of expediting, rescheduling, split lots, and so forth. For a merchant, they include emergency shipments or substitution of a less profitable item. System control costs are those associated with the operation of the particular decision system selected. These include the costs of data acquisition, data storage and maintenance, and computation [33].

2.1.1.2 Variability of demand

When the demand rate varies with time [33], we can no longer assume that the best strategy is always to use the same replenishment quantity; in fact, this will seldom be the case. Matching supply and demand is particularly challenging when supply must be chosen before observing demand and demand is uncertain [8]. The demand can be either continuous with time or can occur only at discrete, equispaced points in time. Another element of the problem that is important in selecting the appropriate replenishment quantities is whether replenishments must be scheduled at specific discrete points in time (e.g., replenishments can only be scheduled at intervals that are integer multiples of a week), or whether they can be scheduled at any point in continuous time [33]. The essential issue is that you must take a firm bet (how much inventory to order) before some random event occurs (demand), and then you learn that you either bet too much (demand was less than your order) or you bet too little (demand exceeded your order) [8]. Still another factor that can materially influence the logic in selecting the replenishment quantities is the duration of the demand pattern. A pattern with a clearly specified end resulting from a production contract is very different from the demand situation where no well-defined end is in sight [33].

2.1.2 General Process

2.1.2.1 The A-B-C Classification System

Managerial decisions regarding inventories must ultimately be made at the level of an individual item or product. The specific unit of stock to be controlled will be called a stock-keeping unit (or SKU), where an SKU will be defined as an item of stock that is completely specified as to function, style, size, color, and, often, location[33].

These SKUs will be assigned a higher priority in the allocation of management time and financial resources in any decision system we design. It is common to use three priority ratings: A (most important), B (intermediate in importance), and C (least important). The number of categories appropriate for a particular company depends on its circumstances and the degree to which it wishes to differentiate the amount of effort allocated to various groupings of SKUs[33].

Figure 2.1: A-B-C items histogram [1]

According to Ramakrishnan Ramanathan [30]:

• A-items are relatively few in number but constitute a relatively large amount of annual use value.

• C-items are, on the contrary, relatively large in number but constitute a small amount of annual use value.

• B-items are the interclass items, with a medium consumption value.

2.1.2.2 The X-Y-Z Inventory Analysis

• X – category includes the materials whose use is relatively constant or fluctuates rarely, and for which the ability to schedule or make correct predictions is very high;

• Y – category includes the materials with substantial demand fluctuations caused by seasonal reasons or by trends related to certain products;

• Z – category includes the materials that have a very irregular usage and the biggest demand fluctuations, for which making a reliable demand prediction is almost impossible.

2.1.2.3 Forecast of demand

Inventory decisions nearly always involve the allocation of resources in the presence of demand uncertainty [33]. For inventory decisions, financial resources must be deployed to procure goods in anticipation of a future sale of those goods. Because of that, we need to be able to forecast future demand. A demand forecast [4] is an estimated average of the demand size over some future period.

Figure 2.2: Forecast of demand [33]

2.1.2.4 Replenishment

A stockout [33] can only occur during periods when the inventory on hand is "low". The decision about when an order should be placed will always be based on how low the inventory should be allowed to be depleted before the order arrives. The idea is to place an order early enough so that the expected number of units demanded during a replenishment lead time will not result in a stockout very often. We define the replenishment lead time as the time that elapses from the moment at which it is decided to place an order until it is physically on the shelf, ready to satisfy customers' demands.

2.1.2.5 Inventory Categories

When demand is probabilistic, it is useful to conceptually categorize the inventories as follows [33]:

1. On-hand (OH) stock. This is stock that is physically on the shelf; it can never be negative. This quantity is relevant in determining whether a particular customer demand is satisfied directly from the shelf.

2. Net stock, defined by the following equation: Net stock = (On hand) - (Backorders). This quantity can become negative (namely, if there are backorders). It is used in some mathematical derivations and is also a component of the following important definition.

3. Inventory position (sometimes also called the available stock), defined by the relation: Inventory position = (On hand) + (On order) - (Backorders) - (Committed). The on-order stock is stock that has been requisitioned but not yet received by the stocking point under consideration.

4. Safety stock (SS). The SS (or buffer) is defined as the average level of the net stock just before a replenishment arrives.
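These bookkeeping definitions translate directly into code; a minimal sketch (the function names are ours, not from the thesis):

```python
def net_stock(on_hand, backorders):
    # Net stock = (On hand) - (Backorders); may be negative when backlogged.
    return on_hand - backorders

def inventory_position(on_hand, on_order, backorders, committed=0):
    # Inventory position = (On hand) + (On order) - (Backorders) - (Committed).
    return on_hand + on_order - backorders - committed

# Example: 5 units on the shelf, 10 on order, 3 backordered.
position = inventory_position(5, 10, 3)   # 12
net = net_stock(5, 3)                     # 2
```

Note that replenishment decisions are taken on the inventory position, not on the net stock, precisely because it already accounts for orders in the pipeline.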

2.1.3 Replenishment policies

There are two approaches to how often the inventory status should be determined: periodic review and continuous review.

With a continuous-review policy, the inventory position is continuously monitored and new orders are triggered by certain events [36]. With periodic review, as the name implies, the stock status is determined only every R time units; the length of R is always some integer multiple of the base period [36]. Between the moments of review, there may be considerable uncertainty as to the value of the stock level [33]. Continuous review will reduce the needed safety stock. When using a continuous-review system, the inventory position when ordering should guard against demand variations during the lead time L [4].

Periodic review has advantages, especially when we want to coordinate orders for different items [4]. In such a case, periodic review is particularly appealing because all items in a coordinated group can be given the same review interval. Periodic review also allows a reasonable prediction of the level of the workload on the staff involved [33].

In contrast, under continuous review, a replenishment decision can be made at practically any moment in time; hence the load is less predictable. A rhythmic, rather than random, pattern is usually appealing to the staff. Another disadvantage of continuous review is that it is generally more expensive in terms of reviewing costs and reviewing errors.

Once the manager has determined whether the item falls in the A, B, or C category, and he or she has settled the question of continuous versus periodic review, it is time to specify the form of the inventory control policy. The form of the inventory policy will begin to resolve the second and third issues: when should an order be placed, and what quantity should be ordered [33].

2.1.3.1 Order Point, Order Quantity (s, Q) policy

This is a continuous-review system (i.e., R = 0). This model assumes that an order is placed when the inventory level reaches the reorder point s, i.e., there is no overshoot of the reorder point [18]. The advantages of the fixed order-quantity (s, Q) system include that it is quite simple for the stock clerk to understand, particularly in the two-bin form, that errors are less likely to occur, and that the production requirements for the supplier are predictable [33].

A key property of (s, Q) policies is that, under mild conditions, the distribution of the inventory position is uniform over (s, s + Q]. Managerially, (s, Q) policies are attractive because the more restricted order size facilitates packaging, transportation and coordination [15].

The primary disadvantage of an (s, Q) system is that, in its unmodified form, it may not be able to cope effectively with situations where individual transactions are large [33].
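To make the continuous-review logic concrete, here is a small discrete-time simulation of an (s, Q) policy under Poisson demand with backlogging. This is a sketch under illustrative assumptions (unit periods, deterministic lead time, demand drawn via Knuth's method); none of the parameter values or names come from the thesis:

```python
import math
import random

def simulate_sQ(s, Q, lam, lead_time, periods, seed=0):
    """Simulate a continuous-review (s, Q) policy in discrete time:
    order Q whenever the inventory position drops to s or below.
    Unmet demand is backlogged (net stock may go negative)."""
    rng = random.Random(seed)
    net = s + Q                      # starting net stock
    pipeline = []                    # outstanding orders: (arrival, qty)
    orders = 0
    for t in range(periods):
        # receive any orders due this period
        net += sum(q for arr, q in pipeline if arr == t)
        pipeline = [(arr, q) for arr, q in pipeline if arr != t]
        # draw Poisson(lam) demand (Knuth's method)
        limit, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                break
            k += 1
        net -= k
        # the reorder decision is based on the inventory position
        if net + sum(q for _, q in pipeline) <= s:
            pipeline.append((t + 1 + lead_time, Q))
            orders += 1
    return net, orders
```

A cost evaluation would simply accumulate, per period, holding cost on positive net stock, penalty cost on backorders, and the fixed cost per order placed.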

2.1.3.2 Order-Point, Order-Up-to-Level (s, S) policy

This system again assumes continuous review. Under an (s, S) policy, an order is placed to increase the item's inventory position (= inventory on hand + orders outstanding - backlogs) to the level S as soon as this inventory position reaches or drops below the level s [38]. The (s, S) system is frequently referred to as a min-max system because the inventory position, except for a possible momentary drop below the reorder point, is always between a minimum value of s and a maximum value of S [33].

2.1.3.3 Periodic-Review, Order-Up-to-Level (R, S) policy

In this control procedure, each item is reviewed at fixed and constant time intervals: the inventory is reviewed every R units of time and an item is ordered up to the level S [35]. In addition, the (R, S) system offers a regular opportunity (every R units of time) to adjust the order-up-to level S, a desirable property if the demand pattern is changing with time [33].

2.1.3.4 (R, s, S) policy

This is a combination of the (s, S) and (R, S) systems [33]. The idea is that every R units of time we check the inventory position. If it is at or below the reorder point s, we order enough to raise it to S. If the position is above s, nothing is done until at least the next review instant. It has been shown that, under quite general assumptions concerning the demand pattern and the cost factors involved, the best (R, s, S) system produces a lower total of replenishment, carrying, and shortage costs than does any other system. However, the computational effort to obtain the best values of the three control parameters is more intense than that for other systems, certainly for class B items [33].

2.1.3.5 (r, nQ) policy - Problem extension

This type of policy can be applied to a multi-stage situation. As depicted in Figure 2.3, the stages represent the different stocking points in the production-distribution process.

Figure 2.3: Serial System [11]

Material flow from one stage to the next requires a lead time and incurs a setup cost (in addition to a variable cost proportional to the flow quantity) [11]. Demand unsatisfied from on-hand inventory is backlogged, incurring penalty costs. Thus, an optimal policy, even if it exists and is identified, would not be easy to implement. In other words, the "optimal" policy is no longer optimal or even attractive once the managerial effort of implementation is taken into account. Therefore, we turn to simple, cost-effective heuristic policies. Specifically, we consider the echelon-stock (r, nQ) policy, which is a natural generalization of the power-of-two policy. An (r, nQ) policy operates as follows: whenever the inventory position is at or below the reorder point r, order nQ units, where n is the minimum integer required to increase the inventory position to above r. We call Q the base quantity. Combining the (r, nQ) policy with the echelon-stock concept leads to the echelon-stock (r, nQ) policy, whereby every stage uses an (r, nQ) policy based on its echelon stock.
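The order-sizing rule of the (r, nQ) policy, i.e., raising the inventory position to just above r in multiples of Q, reduces to a one-line computation. A sketch with an illustrative function name, assuming integer quantities:

```python
def rnQ_order(position, r, Q):
    """Return the order quantity nQ, where n is the smallest integer
    such that position + n*Q > r; zero if no order is triggered."""
    if position > r:
        return 0
    n = (r - position) // Q + 1
    return n * Q

# position 4 with r = 10 and base quantity Q = 5 needs n = 2, so order 10
qty = rnQ_order(4, 10, 5)
```

In the echelon-stock variant, each stage applies this same rule to its echelon inventory position rather than to its local one.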


Chapter 3

Machine Learning

Machine learning (ML) is basically programming computers to optimize a performance criterion using example data or past experience [3]. The goal of machine learning is to build algorithms that receive input data and use statistical analysis to predict an output, updating outputs as new data becomes available.

3.1 Machine Learning types

Machine learning algorithms are often categorized as supervised, unsupervised or reinforcement learning. In supervised algorithms [25], a training set with correct responses is provided and, based on this training set, the algorithm generalises to respond correctly to all possible inputs. Unsupervised machine learning algorithms infer patterns from a data set without reference to known, or labeled, outcomes [3]. Unlike supervised machine learning, unsupervised methods cannot be directly applied to a regression or a classification problem because the values for the output data are unknown, making it impossible to train the algorithm in the usual way. In reinforcement learning [5], the algorithm learns a policy of how to act given an observation of the world. Every action has some impact on the environment, and the environment provides feedback that guides the learning algorithm.


Figure 3.1: Machine learning types [32]

Although machine learning in general, and supervised methods in particular, is usually associated with predictive analytics, it can also be applied to prescriptive tasks. In some problems (e.g. transaction acceptance), converting a prediction into a prescription is straightforward (if the transaction is predicted to be fraudulent, it should be rejected). When that is not the case, there are two main machine learning paradigms to devise a prescription. The first is to generate good/optimal solutions and use a supervised method to infer some rule from those solutions (e.g. [2]). The second is to generate, combine and simulate different rules, and output the best one (e.g. [20]).

Within supervised ML we have:

• Decision trees. These models use observations about certain actions and identify an optimal path for arriving at a desired outcome[5].

• Neural networks can actually perform a number of regression and/or classification tasks at once, although commonly each network performs only one. In the vast majority of cases, therefore, the network will have a single output variable, although in the case of many-state classification problems this may correspond to a number of output units [7].

• In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression [5].

As unsupervised ML methods we have:

• K-means clustering [19][7] is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. The procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters) fixed a priori. This model groups data points into a specified number of groupings based on like characteristics.

• Dimensionality reduction. As the name suggests, this method is used to remove the least important information (sometimes redundant columns) from a data set. Basically, it is the process of discovering compact representations of high-dimensional data [31].
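The k-means procedure described above can be sketched in a few lines of plain Python (the 2-D data, the squared Euclidean distance, and the fixed iteration count are illustrative choices, not from the thesis):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm over 2-D points given as (x, y) tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)       # k initial centers picked a priori
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest center (squared Euclidean)
            j = min(range(k), key=lambda i: (p[0] - centers[i][0]) ** 2
                                            + (p[1] - centers[i][1]) ** 2)
            clusters[j].append(p)
        for j, c in enumerate(clusters):
            if c:  # recompute each non-empty cluster's centroid
                centers[j] = (sum(p[0] for p in c) / len(c),
                              sum(p[1] for p in c) / len(c))
    return centers, clusters

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, clusters = kmeans(points, 2)
```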

3.2 Genetic Programming

The evolutionary algorithm is a highly parallel mathematical algorithm that transforms a set (population) of individual mathematical objects (typically fixed-length character strings patterned after chromosome strings), each with an associated fitness value, into a new population (i.e., the next generation) using operations patterned after the Darwinian principle of reproduction and survival of the fittest and after naturally occurring genetic operations (notably sexual recombination) [21]. GP is an evolutionary algorithm (EA) whose goal is the automatic programming of computers. The programs are represented by trees, as shown in Figure 3.2. The nodes represent functions and the leaves represent terminals.

Figure 3.2: Example of GP syntax tree representation [27]

GP [20] applies the principles of natural selection to optimization problems. As in other evolutionary algorithms [27], the individuals in the initial GP population are typically randomly generated. There are a number of different approaches to generating this random initial population. Briefly, we will describe two of the simplest (and earliest) methods (the full and grow methods) and a widely used combination of the two known as ramped half-and-half. The full method tends to produce full/bushy trees [27] by choosing functions until the depth limit is reached. The grow method does not require functions to be chosen at any point; if the maximum depth has not yet been reached, either a function or a terminal may be selected, producing more diverse tree structures with some branches longer than others [27]. A scheme of the grow initialization method is represented in Figure 3.3.

Because neither the grow or full method provide a very wide array of sizes or shapes on their own, Koza [21] proposed a combination called ramped half-and-half. Half the initial population is constructed using full and the other half is constructed using grow. This is done using a range



Figure 3.3: Creation of a five node tree using the grow initialization method [27]

of depth limits (hence the term “ramped”) to help ensure that we generate trees having a variety of sizes and shapes [27].
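These initialization methods can be sketched with a minimal nested-list tree encoding. The function and terminal names below mirror the single-echelon sets used later in this work, but the code itself is an illustrative sketch, not ECJ's implementation:

```python
import random

FUNCTIONS = ['+', '-', '*', '/']          # arity-2 function set
TERMINALS = ['lam', 'h', 'k', 'p', 'L']   # problem inputs as terminals

def gen_tree(depth, method):
    # full: keep choosing functions until the depth limit is reached;
    # grow: a terminal may be selected early, yielding irregular shapes
    if depth == 0 or (method == 'grow' and random.random() < 0.3):
        return random.choice(TERMINALS)
    f = random.choice(FUNCTIONS)
    return [f, gen_tree(depth - 1, method), gen_tree(depth - 1, method)]

def ramped_half_and_half(pop_size, min_depth=2, max_depth=6):
    # half the trees are built with full, half with grow,
    # over a ramp of depth limits between min_depth and max_depth
    pop = []
    for i in range(pop_size):
        depth = min_depth + i % (max_depth - min_depth + 1)
        method = 'full' if i % 2 == 0 else 'grow'
        pop.append(gen_tree(depth, method))
    return pop
```

The 0.3 probability of stopping early in grow is an arbitrary illustrative choice; real systems draw functions and terminals from the joint primitive set.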

As described in [16], a selection operator is intended to improve the average quality of the population by giving the high-quality chromosomes a better chance of being copied into the next generation. The most commonly employed method for selecting individuals in GP is tournament selection [27], discussed below, followed by fitness-proportionate selection, but any standard evolutionary algorithm selection mechanism can be used. In tournament selection, a number of individuals are chosen at random from the population. These are compared with each other and the best of them is chosen to be the parent. In double tournament, individuals must pass two layers of tournaments (one by size, one by fitness) to be selected [27].
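The two selection schemes can be sketched as follows. Note that this is a simplified sketch with a deterministic size layer; the original double tournament of Luke and Panait uses a probabilistic size tournament:

```python
import random

def tournament(pop, fitness, k=7):
    # standard tournament: sample k individuals, return the fittest
    # (fitness is treated as a cost here, so lower is better)
    return min(random.sample(pop, k), key=fitness)

def double_tournament(pop, fitness, size, k_fitness=7, k_size=2):
    # layer 1: k_fitness small size-tournaments favour parsimonious trees;
    # layer 2: one fitness tournament over those winners picks the parent
    finalists = [min(random.sample(pop, k_size), key=size)
                 for _ in range(k_fitness)]
    return min(finalists, key=fitness)
```

The size layer is what gives double tournament its bloat-limiting effect: large trees must be exceptionally fit to survive both layers.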

GP departs significantly from other evolutionary algorithms in the implementation of the crossover and mutation operators. The most commonly used form of crossover is subtree crossover [27]. The crossover operator is a mechanism for incorporating the best attributes of two parents into a new individual [14]. Given two parents, subtree crossover randomly (and independently) selects a crossover point (a node) in each parent tree. Then, it creates the offspring by replacing the subtree rooted at the crossover point in a copy of the first parent with a copy of the subtree rooted at the crossover point in the second parent, as illustrated in Figure 3.4.

The mutation operator, described in [14], is a mechanism for introducing necessary attributes into an individual when those attributes do not already exist within the current population. The most commonly used form of mutation in GP (which we will call subtree mutation) randomly selects a mutation point in a tree and substitutes the subtree rooted there with a randomly generated subtree. This is illustrated in Figure 3.5.
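Continuing the illustrative nested-list encoding, subtree crossover and subtree mutation can be sketched as follows (a self-contained sketch, not ECJ's implementation):

```python
import copy
import random

FUNCS = ['+', '-', '*', '/']
TERMS = ['lam', 'h', 'k', 'p', 'L']

def paths(tree, prefix=()):
    # enumerate the positions of all nodes in a nested-list tree
    yield prefix
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def set_(tree, path, sub):
    # graft `sub` at `path`; an empty path replaces the whole tree
    if not path:
        return sub
    parent = get(tree, path[:-1])
    parent[path[-1]] = sub
    return tree

def subtree_crossover(p1, p2):
    # replace a random subtree of a copy of p1 with a copy of
    # a random subtree of p2; both parents stay untouched
    child = copy.deepcopy(p1)
    cut = random.choice(list(paths(child)))
    donor = copy.deepcopy(get(p2, random.choice(list(paths(p2)))))
    return set_(child, cut, donor)

def subtree_mutation(p, max_depth=3):
    # replace a random subtree with a freshly grown random subtree
    def grow(d):
        if d == 0 or random.random() < 0.3:
            return random.choice(TERMS)
        return [random.choice(FUNCS), grow(d - 1), grow(d - 1)]
    return set_(copy.deepcopy(p), random.choice(list(paths(p))), grow(max_depth))
```

Selecting the crossover/mutation point uniformly over all nodes is itself a design choice; Koza-style systems typically bias the choice towards internal (function) nodes.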

To better understand how the evolution process unfolds, the sequential steps of a generic GP algorithm are listed below [20]:



Figure 3.4: Example of subtree one point crossover [27]

Figure 3.5: Example of subtree mutation [27]

1. In the first step of the algorithm, the trees of the initial generation are created randomly. Each tree consists of nodes from a function set F and a terminal set T that are defined before the start of the algorithm. F contains the functions that can be used to construct the trees, such as {+, -, × and ÷}. T contains the input parameters and state variables of the problem, such as demand rates and inventory levels. The first node of a tree is always a function; subsequent nodes are selected from the union F ∪ T .

2. A fitness value is assigned to each tree. Trees that solve the underlying problem well receive a higher fitness value than trees that solve the underlying problem poorly.


3. In the third step of the algorithm, the trees of the next generation are created by applying the genetic operators crossover, mutation, and reproduction to the trees of the current population. The probability that a tree is selected for a genetic operation depends on its fitness value. Trees with a high fitness value have a higher probability of being selected than trees with a low fitness value.

4. Finally, it is checked whether a certain termination criterion has been met. If a pre-defined limit is reached, the algorithm terminates and the best tree found during the run is reported as the result. Otherwise, steps 2 to 4 are repeated.

3.2.1 Coevolution

Coevolutionary Algorithms (CoEA) [23] try to mimic synergies between different species (either competitive, cooperative, or both) as the means to solve complex problems. In co-evolution, there are two (or more) populations of individuals.

The use of multiple interacting subpopulations has also been explored as an alternative mechanism for representing the coevolution of species, but has focused primarily on a fixed number of competing subpopulations [29]. The environment for the first population consists of the second population and, conversely, the environment for the second population consists of the first population. The coevolutionary process typically starts with both populations being highly unfit. Then, the first population tries to adapt to the environment consisting of the second population. Simultaneously, the second population tries to adapt to the environment consisting of the first population [21].

Coevolutionary algorithms, as described in [28], are typically applied to interactive domains. Such domains generally lack an objective function giving a value to each potential solution. Rather, interactive domains encode the outcomes of interactions between two or more entities; depending on the domain, individual entities or the interaction itself may receive value as a result of an interaction. An algorithm must then decide how to use these outcomes to make decisions about which entities to promote in the next generation and which entities to demote or discard [28].

In One-Population Competitive Coevolution [24], the whole population competes in various games. The outcomes of those games determine the fitness of the individuals in the population. Thus the fitness of an individual depends on which other individuals it plays against.

In N-Population Cooperative Coevolution [24], a problem is broken into N subparts, and solutions for each subpart are optimized in separate subpopulations. Each individual in a subpopulation is tested by combining it with the representatives of each other species to form a joint solution, which is then evaluated. The inner procedures of a cooperative CoEA do not differ much from those of traditional EAs:

1. randomly define N initial populations of solution candidates;


2. evaluate each candidate solution by combining it with the representatives of the other populations and assessing the resulting joint solution;

3. define the survivors and the representatives (typically, the best individual) for the next generation;

4. repeat steps (2) and (3) until some termination condition is fulfilled.

The result is the co-adaptation of the different populations by reciprocally reacting to changes in the representatives, towards solutions with improved fitness.
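The four steps above can be sketched generically. In this toy sketch, each subpopulation holds one real-valued component of the joint solution; all function and parameter names are illustrative, and `joint_cost` stands in for whatever joint evaluation the problem defines (lower is better):

```python
import random

def cooperative_coevolution(n_pops, pop_size, random_individual, vary,
                            joint_cost, generations=50):
    # one subpopulation per component of the solution; each candidate is
    # evaluated by joining it with the best-so-far representatives of the
    # other subpopulations
    pops = [[random_individual(s) for _ in range(pop_size)] for s in range(n_pops)]
    reps = [pop[0] for pop in pops]           # initial representatives
    for _ in range(generations):
        for s in range(n_pops):
            def fit(ind):
                trial = list(reps)
                trial[s] = ind                # plug the candidate into the team
                return joint_cost(trial)
            pops[s].sort(key=fit)
            reps[s] = pops[s][0]              # representative = current best
            survivors = pops[s][:pop_size // 2]
            pops[s] = survivors + [vary(random.choice(survivors))
                                   for _ in range(pop_size - len(survivors))]
    return reps, joint_cost(reps)
```

A usage example: minimizing the separable cost (x − 3)² + (y + 1)² with two subpopulations, one for x and one for y, recovers the co-adaptation described above, with each population reacting to the other's representative.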


Chapter 4

Continuous-review policies with Poisson demand

4.1 Single Stage

Under an (r, Q) policy, the inventory position (= inventory on-hand + orders outstanding − backorders) of the item in question is continuously reviewed, and an order of fixed quantity Q is placed as soon as the inventory position drops to a reorder point r. Such (r, Q) policies are widely used in inventory systems with uncertain demands and lead times. In this case, the demand pattern follows a Poisson distribution.

The advantages of the fixed order-quantity (r,Q) system include that it is quite simple for the stock clerk to understand, that errors are less likely to occur, and that the production requirements for the supplier are predictable.

It is well known [23] that for single-item systems, under standard assumptions, an optimal policy exists within the class of (r, Q) policies. Federgruen and Zheng observed that the long-run average cost C(r, Q) of an (r, Q) policy is as follows [13]:

C(r, Q) = \frac{\xi + \sum_{y=r+1}^{r+Q} G(y)}{Q}    (4.1)

where ξ > 0 is the total ordering cost and G(·) is a unimodal function with lim_{|y|→∞} G(y) = ∞. Assuming a Poisson distribution with parameter λL (the demand rate λ multiplied by the lead time L) and a cost structure that consists of a fixed cost K per order (hence ξ = Kλ), an inventory carrying cost of h per unit in stock per unit of time, and a backlogging cost of p per unit of backlogged demand, they demonstrate that the cost-rate function G(y) may be written as follows:

G(y) = (h + p) \sum_{j=0}^{y-1} P_j + p(\lambda L - y)    (4.2)

where G(y) is the total holding and back-order cost rate for an inventory level y, and P_j = Prob[LD(∞) ≤ j], that is, the probability of the lead-time demand being less than or equal to j. From here they derive an efficient algorithm for computing an optimal (r, Q) policy for continuous-review stochastic inventory systems, which will be referred to as the Federgruen–Zheng algorithm hereafter.

Algorithm 1 Federgruen and Zheng's efficient algorithm for computing an optimal (r, Q) policy in continuous-review stochastic inventory systems.

Data: ξ
Result: r, Q
L ← 0
while ∆G(L) < 0 do
    L ← L + 1
end while
S ← ξ + G(L); Q ← 1; C* ← S
r ← L − 1; Z ← L + 1
while True do
    if G(r) ≤ G(Z) then
        if C* ≤ G(r) then stop
        else S ← S + G(r); r ← r − 1
        end if
    else
        if C* ≤ G(Z) then stop
        else S ← S + G(Z); Z ← Z + 1
        end if
    end if
    Q ← Q + 1; C* ← S/Q
end while

A sensitivity analysis was performed, and the results were compared with both the Economic Order Quantity model and Federgruen and Zheng's algorithm. Furthermore, Zheng [37] derived approximate expressions (r^d, Q^d) for the two parameters (r, Q), as shown in Equations 4.3 and 4.4.

Q^d = \sqrt{\frac{2\lambda K(h+p)}{hp}}    (4.3)

r^d = \lambda L - \frac{h}{h+p}\, Q^d    (4.4)
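A sketch of these deterministic approximations in Python, assuming Equation 4.4 takes the standard planned-backorder form r^d = λL − h/(h+p)·Q^d:

```python
import math

def zheng_approximation(lam, L, h, p, K):
    # Eq. 4.3: EOQ-with-backorders batch size
    Qd = math.sqrt(2 * lam * K * (h + p) / (h * p))
    # Eq. 4.4: reorder point = mean lead-time demand minus planned backorders
    rd = lam * L - (h / (h + p)) * Qd
    return rd, Qd
```

These (r^d, Q^d) values are cheap to evaluate, which is why Q^d also appears later as a terminal in the extended GP primitive sets.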


Gallego [15] built on Federgruen and Zheng's work to define new bounds and a new heuristic for finding (r, Q) policies. He proposed a closed-form expression for the order quantity which applies a correction to Zheng's Q^d heuristic, followed by the numerical optimisation of r, as shown in Equation 4.5.

Q^g = Q^d \sqrt{\min\left(2,\ \sqrt{1 + \left(\frac{(h+p)L}{2K}\right)^{2}}\right)}    (4.5)

4.2 Multi Stage

Our problem is essentially a two-stage serial system where stage 1 orders from stage 2, and stage 2 orders from an outside supplier with unlimited stock. There are economies of scale at each stage for placing orders. The transportation lead time from stage i + 1 to stage i is a constant Li for i =

1, . . . , N, with stage N + 1 being the outside supplier. The demand process follows a Poisson distribution with an average rate λ. We assume that the demand sizes only take integer values. Excess demand is backlogged with backorder cost rate p. Let hi > 0 be the echelon holding cost

rate at stage i for i = 1, . . . , N. The planning horizon is infinite, and the objective is to minimize the long-run average total cost.

For any time t, define [11]:

• B(t) = backorder level at stage 1;
• Ii(t) = echelon inventory at stage i = on-hand inventory at stage i plus inventories at, or in transit to, stages 1, . . . , i − 1;
• ILi(t) = echelon inventory level at stage i = Ii(t) − B(t);
• IPi(t) = echelon inventory position at stage i = ILi(t) plus inventories in transit to stage i;
• ESi(t) = echelon stock at stage i = ILi(t) plus stage i's outstanding orders, in transit or backlogged at stage i + 1.

The variables mentioned above only take integer values. The system's inventory is controlled by an echelon-stock (r, nQ) policy. That is, stage i orders nQi units from stage i + 1 whenever stage i's echelon stock falls to or below ri, where n is the minimum integer such that stage i's echelon stock after ordering is above ri. We call Qi (a positive integer) the base quantity and ri (an integer) the reorder point at stage i. The base quantities at the different stages are coordinated in the sense that Qi+1 = ni Qi, making the on-hand inventory at stage i always an integer multiple of Qi−1, for i = 2, . . . , N. At stage i we assess a setup cost Ki for each Qi ordered. Thus, the long-run average setup cost is

\sum_{i=1}^{2} \frac{\lambda K_i}{Q_i}    (4.6)

Note that the rate at which the system-wide holding and backorder costs accrue at time t is

\sum_{i=1}^{N} h_i\, IL_i(t) + (p + H_1)\, B(t)    (4.7)

where H_1 is the installation holding cost rate at stage 1, H_1 = \sum_{i=1}^{N} h_i.

Here we develop an approximation for the holding and backorder costs of echelon-stock (r, nQ) policies, based on the idea of De Bodt and Graves [12]. This approximation rests on a 'nestedness' assumption [10]: whenever stage 2 receives a batch, it immediately ships a sub-batch to stage 1, with Q2 = nQ1. The sub-batch that is sent down immediately is called a joint replenishment (JR), whereas the remaining n − 1 sub-batches are called normal replenishments (NR). The holding costs at stage 2 are straightforward and there is no need for approximation:

G_3(y) = E[h_2 (y - D[L_2])]    (4.8)

The average holding and backorder costs incurred by the JRs (at stage 1) are exactly the same as the average costs of the single-location system [10]:

\frac{1}{Q_1} \sum_{y=r_2+1}^{r_2+Q_1} G_2(y)    (4.9)

where

G_2(y) = E\left[h_1 (y - D[L_1 + L_2]) + (H_1 + p)\,(y - D[L_1 + L_2])^{-}\right]    (4.10)

Now we consider the NRs. Similarly, a single-location analogy leads to the average holding and backorder costs incurred by each NR [10]:

\frac{1}{Q_1} \sum_{y=r_1+1}^{r_1+Q_1} G_1(y)    (4.11)

where

G_1(y) = E\left[h_1 (y - D[L_1]) + (H_1 + p)\,(y - D[L_1])^{-}\right]    (4.12)

So the average holding and backorder costs at stage 1 are a weighted average of the holding and backorder costs incurred by the JRs and the NRs [10]:

\frac{n-1}{n} \frac{1}{Q_1} \sum_{y=r_1+1}^{r_1+Q_1} G_1(y) + \frac{1}{n} \frac{1}{Q_1} \sum_{y=r_2+1}^{r_2+Q_1} G_2(y)    (4.13)

Now we are able to identify the expression that defines the long-run average total cost of the echelon-stock (r, nQ) policy:

C(r, nQ) = \sum_{i=1}^{2} \frac{\lambda K_i}{Q_i} + \frac{n-1}{n} \frac{1}{Q_1} \sum_{y=r_1+1}^{r_1+Q_1} G_1(y) + \frac{1}{n} \frac{1}{Q_1} \sum_{y=r_2+1}^{r_2+Q_1} G_2(y) + \frac{1}{Q_2} \sum_{y=r_2+1}^{r_2+Q_2} G_3(y)    (4.14)

Our main goal is to find an (r, nQ) policy that minimizes the cost function represented above. The point of this section is to demonstrate how we went about finding an approximation of our main parameters, which are Q1, Q2, R1, R2 and n. We utilized two main heuristics strongly associated with multi-echelon serial systems for this purpose: the De Bodt and Graves heuristic [12] and the Schwarz heuristic. Our model being a two-stage, continuous-review system with simple Poisson demand, the De Bodt and Graves heuristic is based on the following assumption: whenever stage 2 receives a batch, a sub-batch is immediately sent to stage 1. From this, we take that each stage-2 batch is the equivalent of n sub-batches, each of size Q1, i.e., Q2 = nQ1. With L1 and L2 the lead times for stages 1 and 2 respectively, it is implicit that if we order Q2 units at time t at stage 2, the order will arrive at t + L2. Q1, R1, R2 and Q2 will be solutions of the following equations [20]:

Q_1 = \sqrt{\frac{2\lambda\,[k_1 + k_2/n + p\, z(R_1, R_2, n)]}{h_1 + n h_2}}    (4.15)

\sum_{i=R_1}^{\infty} f_1(i) = \frac{h_1 Q_1}{\lambda p}    (4.16)

\sum_{i=R_2}^{\infty} f_3(i) = \frac{(h_1 + n h_2) Q_1}{\lambda p}    (4.17)

z(R_1, R_2, n) = \frac{n-1}{n} \sum_{i=R_1}^{\infty} (i - R_1) f_1(i) + \frac{1}{n} \sum_{i=R_2}^{\infty} (i - R_2) f_3(i)    (4.18)

In order to obtain Q1, R1, R2 and Q2 = nQ1, we follow these steps:

1. Given that this is a discrete process, we had to set an upper limit for our sum expressions instead of using ∞. After some analysis, we set the limit for Equation 4.16 to 20, and for Equation 4.17 to 25.

2. We start by fixing the R1 value using h1/p as a reference, and the R2 value using (h1 + nh2)/p as a reference.

3. Then, after we get R1 and R2 and calculate z, we proceed to compute Q1.

4. Given Q1, we recalculate R1 and R2.

5. We compute Q1 again, and the algorithm iterates until all the parameters stabilize.

For n, we can either select candidate values and run this procedure for each of them to see which one best minimizes the long-run average cost, or we can use a variant of this heuristic and adopt the optimal value of n for the deterministic problem, as proposed by Schwarz:



n = \sqrt{\left(\frac{k_2}{k_1}\right)\left(\frac{h_1}{h_2}\right)}    (4.19)


Chapter 5

Methodology and Solution Proposal

To be able to evolve closed-form expressions for the inventory control policy parameters simultaneously, we decided that Cooperative Coevolution would be an efficient and adequate approach. Using ECJ, a Java-based evolutionary computation research system, an algorithm was implemented to evolve the (r, Q) policy expressions for the single-stage situation and, applying the same strategy, the (r, nQ) inventory control policy expressions for the multi-stage system. The evolved solutions generate the appropriate policy parameters as a function of the demand characteristics and the cost structure.

For the single-stage (r, Q) policy, we created two separate species, one representing the reorder point (r) and the other representing the order quantity (Q). Both were generated by the ramped half-and-half algorithm, a mixed method that incorporates both the full and the grow methods.

The species were evolved in parallel and combined to produce a pair of (r, Q) policy parameters used in the fitness evaluation, which is defined by the long-run average total cost. In order to produce the offspring of each species, its individuals were subjected to one-point crossover and uniform mutation.

ECJ uses Koza-style tree structures [21][22], which represent the parse trees of Lisp s-expressions. In an initial experiment, we used a typical function set with basic terminals. These sets are represented in Table 5.1, where λ is the average demand rate of a Poisson distribution with parameter λL (L being the lead time), h represents the carrying cost per unit in stock per unit of time, p the backlogging cost per unit of backlogged demand, and K a fixed ordering cost.

Table 5.1: First function set and terminal set for the single-echelon model

Function set: +, −, /, ×, √
Terminal set: λ, h, k, p, L


In the implementation, we use protected versions of the square-root and division functions, which return 1.0 in the case of division by zero or of the square root of a negative value.

In a second experimentation phase, in order to improve our results and the generalisability of the expressions, our initial terminal and function sets were enlarged, as shown in Table 5.2. This extended version of Cooperative Coevolutionary Genetic Programming (eCCGP) uses some of the functions and combinations of parameters that appear in the expressions from the literature (Zheng, Gallego), and they are specific to each policy parameter, r and Q. Additionally, a new feature was implemented: the order quantity value (Q) is included in the terminal set for the reorder point (r).

In addition, we introduce the inverse survival function of the Poisson distribution, isf(x), with rate λL, which allows us to obtain the inventory level that corresponds to a given survival (i.e., stockout) probability. For proper use, the argument of this survival function is protected against values outside the [0, 1] domain, including the values 0 and 1, by clamping it to x ← max(10^−6, min(1 − 10^−6, x)).
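The protected primitives described above can be sketched as follows (stdlib-only; a library routine such as SciPy's `poisson.isf` could replace the loop):

```python
import math

def protected_div(a, b):
    # protected division: returns 1.0 when the denominator is zero
    return a / b if b != 0 else 1.0

def protected_sqrt(a):
    # protected square root: returns 1.0 for negative arguments
    return math.sqrt(a) if a >= 0 else 1.0

def protected_isf(x, mu, ub=10_000):
    # inverse survival function of a Poisson(mu) lead-time demand:
    # smallest level y whose stockout probability P[D > y] is <= x,
    # with x clamped to [1e-6, 1 - 1e-6] as described in the text
    x = max(1e-6, min(1 - 1e-6, x))
    cdf, pmf = 0.0, math.exp(-mu)
    for y in range(ub + 1):
        cdf += pmf
        if 1.0 - cdf <= x:
            return y
        pmf *= mu / (y + 1)
    return ub
```

Protecting primitives this way keeps every randomly generated tree numerically evaluable, so no individual has to be discarded for producing an invalid operation.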

Table 5.2: Extended version of the function set and terminal set defined for the single-echelon model

Param. | Function set | Terminal set
Q | +, −, /, ×, √, ², max, min | h, p, L, 2, k, Q^d
r | +, −, /, ×, √, ², max, min, isf | h, p, L, 2, λ, Q, λL, hQ/(h+p), hQ/(λp)

For the multi-stage situation, we applied almost exactly the same structure of the ECJ algorithm for cooperative coevolution, but with an adjusted representation. We now have to evolve four subpopulations in parallel to obtain the main parameters of an (r, nQ) policy applied to a multi-stage system: Q1, r1, r2 and n, with Q2 = nQ1.

Our function and terminal sets were also modified accordingly. We created a second inverse survival function of the Poisson distribution, with rate λ(L1 + L2). We added the holding cost parameters, setup cost parameters and lead times of each stage, as well as some of the expressions from the heuristics implemented in this work (De Bodt and Graves, Schwarz). Hoping to improve the results, we used n in the evolution of the Q1 and r2 expressions, and we added Q1 to the terminal sets of both reorder points.


Table 5.3: Function set and terminal set defined for the multi-echelon model

Param. | Function set | Terminal set
Q1 | +, −, /, ×, √, ², max, min | h1, h2, k1, k2, p, L1, L2, n, Q^d
r1 | +, −, /, ×, √, ², max, min, isf1 | h1, h2, p, L1, L2, 2, λ, Q1, h1Q1/(λp)
r2 | +, −, /, ×, √, ², max, min, isf2 | h1, h2, p, L1, L2, n, 2, λ, Q1, (h1 + nh2)Q1/(λp)


Chapter 6

Tests and Results

For better understanding and structure, this chapter is divided into two sections: the first covers the single-echelon model, where we demonstrate how the tests were conducted, the data sets used and other relevant parameters; the next section presents our experimental tests regarding the multi-stage model.

The programs were executed on FEUP's recent addition to its computational resources, the powerful Avalanche cluster, enabling us to run several programs simultaneously in a very practical and fast way. The Avalanche specifications used for this work were:

• Walltime/runtime: 65 hours
• Memory: 4 GB
• Number of cores: 8
• DDR3 RAM clocked at 1333 MHz

6.1 Single-Stage Model

6.1.1 Numerical Tests

To develop our numerical tests, we used two main pre-existing data sets. The first set (Table 6.1) was created and presented by Federgruen and Zheng, with a total of 135 instances, and was used for the verification of the optimal algorithm and the heuristics implemented. The other set (Table 6.2), proposed by Kleinau and Thonemann, with a total of 720 instances, was used for analysing our evolutionary computing results.

For the Zheng set, all instances with p < h were discarded.

We now specify the control parameters used for our test runs. After GP tuning, we defined the parameter values so that our program could work adequately. The maximum number of generations was set to 600 and the size of each subpopulation was set to 400 individuals. The maximum tree depth was set to 17 to complement the effect of the double tournament in limiting bloat. The best solution was found with a mixture of one-point crossover and uniform


Table 6.1: Zheng's data set

λ: 5, 25, 50
h: 1, 10, 25
k: 1, 5, 25, 100, 1000
p: 5, 10, 25, 100
L: 1

mutation, both with a low probability rate (20%). For the training phase, we made 20 runs, each using a training set composed of 10% of the full data set. In the end, the best individual of the whole run was submitted to the full data set for validation. During the evolutionary process, if an individual's Q value was < 1, Q was set to 1.

To summarize:

• Number of tests: 20
• Generations: 600
• Subpopulation size: 400
• Initialization method: Ramped half-and-half
• Crossover operation: One point
• Crossover prob. rate: 20%
• Mutation operation: Uniform
• Mutation prob. rate: 20%
• Selection operation: Double tournament
• Max depth: 17

6.1.2 Results

In this section we present the closed-form expressions for each policy parameter that resulted from our cooperative coevolutionary program. The size and complexity of the final expressions were higher than we were expecting. Table 6.3 shows the average gap to the optimum (Federgruen and Zheng) over the Kleinau and Zheng data sets for the CCGP version, for the

Table 6.2: Data set for the single-echelon model proposed by Kleinau and Thonemann

λ: 1, 5, 10
h: 1, 5, 10, 50, 100
k: 1, 10, 100, 500
p: h×5, h×10, h×25, h×100
L: 1, 2, 5


extended version (eCCGP), and for Zheng's and Gallego's heuristics. Due to their excessive size, the expressions of the best individuals are placed in Appendix A, along with some of the values from a validation run of the best eCCGP individual on the Kleinau set. Although not the ideal results we were expecting, especially for the CCGP version, these gaps still constitute solid evidence that, through this approach, it is possible to obtain very near-optimal solutions.

Table 6.3: Results and comparison of our GP solutions (R_CCGP and R_eCCGP), Zheng's heuristic (R_Zg) and Gallego's heuristic (R_g)

R_CCGP  R_eCCGP  R_Zg   R_g
4.34%   1.42%    8.17%  0.97%

We now compare the average time needed to compute one instance in both cases (CCGP and eCCGP); this information is shown in milliseconds in Table 6.4. We can clearly see that the eCCGP solution performs a bit faster than the CCGP version.

Table 6.4: Comparison of the computational effort, in milliseconds, of our GP solutions

R_CCGP  R_eCCGP  Gallego
1.40    1.06     65.4

Due to the stochastic nature of the problem, it was necessary to run the program several times in order to get the best individuals, with an entire run taking 60 hours on average.

In Figure 6.1, we can see the evolution of the fitness during the run that gave us our best individual in eCCGP; it is similar to the fitness evolution in CCGP. As already mentioned, the fitness is calculated by summing the total costs of all training instances.

Figure 6.1: Evolution of the fitness value over the generations of the best eCCGP run


In Figure 6.2, we compare our best evolved solution with Gallego's heuristic in more detail, presenting a sensitivity analysis of our eCCGP solution and of Gallego's heuristic with respect to the problem parameters. The heuristic clearly thrives when, for instance, the lead-time demand (λL) or the backorder penalty (p) assumes higher values. However, when the setup cost (k) increases or the lead-time demand is lower, Gallego's heuristic noticeably presents a larger dispersion, which results in higher average optimality gaps. Focusing on the eCCGP solution, we conclude that, in general, its average optimality gaps are more consistent and regular than the heuristic's. One major difference from the heuristic is that eCCGP performs well with low lead-time demands, and it actually has a better average optimality gap when k increases.


Figure 6.2: Sensitivity analysis over the problem parameters for the eCCGP solution and Gallego's heuristic (λL: bottom left; h: bottom right; p: top left; k: top right)



6.2 Multi-Stage Model

6.2.1 Numerical Tests

For this model, we applied the same approach as before (Cooperative Coevolution), but now we need to evolve four subpopulations in parallel in order to obtain the four closed-form expressions for the policy parameters: Q1, r1, r2 and n, with Q2 = nQ1. We now use a new data set, suited for

the multi-stage situation, with a total of 972 instances, also proposed by Kleinau and Thonemann. In terms of parameters, we maintained the majority of the specifications utilized for the single-stage model. For the training phase we also used 10% of the entire set and, after obtaining the best individual, we validated it on the whole set. We ran several tests with different numbers of generations, changing the subpopulation size accordingly. Because of limited time, it was not possible to settle on a solid specification for the number of generations and the subpopulation size. Our best result was found by setting the number of generations to 200 and the subpopulation size to 600 individuals.

For coherent and valid results, we protected the order quantity Q1 and n against values < 1, setting them to 1.

Table 6.5: Data set for the multi-echelon model

λ: 1, 2, 5
h1: 1, 5, 10, 50
h2: h1×0.2, h1×0.5, h1×1
k1: 1, 10, 50
k2: k1×2, k1×5, k1×10
p: (h1+h2)×2, (h1+h2)×5, (h1+h2)×10
L1: 1
L2: 2

6.2.2 Results

Similarly to the single-stage situation, the final expressions of the best individuals for our parameters are somewhat extensive and complex, so, for better visualization and organization, they are placed in Appendix A. Table 6.6 presents the average gaps between our GP solution and the De Bodt and Graves and Schwarz heuristics.

Table 6.6: Results and comparison of our GP solution with the De Bodt and Graves and Schwarz heuristics

De Bodt and Graves  Schwarz
10.67%              13.93%

The gaps between our solution and the heuristics are neither the best nor the expected ones. However, during the validation of the best individual, in some instances our GP solution gets very


close to the heuristics' values, and in some it even surpasses both heuristics, as we can see in Table 6.7:

Table 6.7: GP results in which our solution gets close the heuristics or bests them

λ h1 h2 k1 k2 p L1 L2 Q1 Q2 r1 r2 n Costs RDB(%) RSZ(%)

1 10 10 10 20 110 1 2 2 4 1 3 2 83.88 0.757 2.132 1 10 10 10 50 55 1 2 2 4 0 2 2 79.93 -2.71 -7.35 1 10 5 50 250 105 1 2 5 10 0 2 2 112.83 -9.18 -6.24 2 1 0.2 10 50 10.4 1 2 9 27 0 1 3 13.18 -0.93 2.003 In Figure6.3, we present the evolution of our best individual’s fitness value, that in comparison to the single-stage model, does not stabilize as efficiently but it seems that gets very near to the point of stabilization regardless.

Figure 6.3: Evolution of fitness value during the best run

In Figures 6.4 and 6.5, we present a sensitivity analysis of our GP solution with respect to the problem parameters. We see that when the setup costs (k1 and k2) get higher, our GP solution's average gap (to the Schwarz heuristic) improves. Also, when the demand rate takes lower values, the GP can surpass both heuristics, presenting a negative average gap. But when the lead-time demand (λ(L1 + L2)) increases, our GP solution shows higher gaps.


Figure 6.4: Sensitivity analysis of the GP solution's gap over h1, k1, k2 and λ(L1 + L2)


Figure 6.5: Sensitivity analysis of the GP solution's gap over h2 and p


Chapter 7

Conclusions and Future Work

Following the successful implementation and verification of the basic models and algorithms, a new meta-heuristic was proposed to find closed-form expressions for both parameters, by means of GP and coevolution.

We presented an alternative approach for solving inventory-control problems. CCGP was applied to the search for closed-form expressions for each inventory-control policy parameter that would give us the nearest-to-optimal solutions. This method, based on GP, manages to surpass Zheng's heuristic. In order to improve the usability of CCGP's expressions, an extended primitive set was also used (eCCGP). This allowed smaller solutions to evolve and improved performance. This version of CCGP presents a significantly better average gap to the optimal value (1.42%) than CCGP, proving that the changes made to the function and terminal sets were crucial. In the multi-stage model, we applied the same approach (Cooperative Coevolution), but evolving four subpopulations in parallel in order to obtain the four closed-form expressions for the policy parameters. The results were not what we were expecting, since the average gaps to both heuristics (De Bodt and Graves, Schwarz) were relatively high. Nonetheless, we could see that in many instances our best individual manages to get very close to both heuristics and even obtain better results.

For future work, the algorithm can always be optimized and worked on to give us better solutions with even lower gaps to the optimal solution. The continuation of this work can focus, for instance, on changing the terminal and function sets by adding new functions or parameters, and on tuning the GP parameters, especially for the multi-stage model. Also, finding or constructing a new dataset can help enhance the final results.

The problem can also be extended by applying the same approach to a system with stochastic lead times, or by exploring the continuous-review version of the problem. In the latter case, mathematical derivation of (parts of) the expressions could help improve the final results.
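The stochastic-lead-time extension could be prototyped with a simulation-based cost evaluator for the (Q, r) policy. The sketch below is an illustration under assumed dynamics, not the thesis implementation: lead times are drawn from an exponential distribution (`mean_lt` is a hypothetical parameter), demand is Poisson via Knuth's sampler, and costs follow the usual h/p/k structure:

```python
import math
import random

def poisson(lam, rng):
    # Knuth's inversion-by-multiplication sampler for Poisson(lam).
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1

def simulate_cost(Q, r, lam, h, p, k, mean_lt, periods=5000, seed=1):
    """Average per-period cost of a (Q, r) policy with Poisson demand and
    exponentially distributed (hence stochastic) lead times."""
    rng = random.Random(seed)
    on_hand, pipeline = r + Q, []   # pipeline holds remaining lead times
    cost = 0.0
    for _ in range(periods):
        # Receive every outstanding order whose lead time has elapsed.
        pipeline = [t - 1 for t in pipeline]
        arrived = sum(1 for t in pipeline if t <= 0)
        on_hand += arrived * Q
        pipeline = [t for t in pipeline if t > 0]
        # Demand and costs (negative on_hand counts as backorders).
        on_hand -= poisson(lam, rng)
        cost += h * max(on_hand, 0) + p * max(-on_hand, 0)
        # Reorder when the inventory position drops to r or below.
        position = on_hand + Q * len(pipeline)
        if position <= r:
            pipeline.append(1 + rng.expovariate(1.0 / mean_lt))
            cost += k
    return cost / periods

print(simulate_cost(10, 5, 2.0, 1.0, 9.0, 50.0, 1.5, periods=2000))
```

Such a simulator could replace the analytical cost evaluation as the GP fitness function when no closed-form cost expression is available.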


Appendix A

A.1 Best Individuals' Expressions

A.1.1 Single-Stage

Here we present the expressions of the best individual produced by our evolutionary code. The derived terminals used inside them are:

• rate = λL
• divlbd = hQ/(λp)
• divhp = hQ/(h + p)
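Reading the flattened fractions above as rate = λL, divlbd = hQ/(λp) and divhp = hQ/(h + p) (our interpretation of the damaged typesetting, not a quote from the thesis), the derived terminals can be computed as:

```python
# Derived terminals supplied to the GP as extra inputs. The formulas below
# assume the flattened fractions in the appendix read as rate = lam*L,
# divlbd = h*Q/(lam*p), divhp = h*Q/(h+p); this is an interpretation of
# the damaged typesetting, not a quote from the thesis.
def derived_terminals(lam, L, h, p, Q):
    rate = lam * L              # expected demand over the lead time
    divlbd = h * Q / (lam * p)
    divhp = h * Q / (h + p)
    return rate, divlbd, divhp

print(derived_terminals(lam=2.0, L=1.5, h=1.0, p=9.0, Q=10.0))
```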

Q = Qd + sqrt(((L ∗ (L ∗ ((square(((h min (p+L)) min L) − p)/k) min sqrt((sqrt(Qd)∗2) max sqrt((p/k)∗L))))) min ((((square((Qd−L) min (square(L∗p)/(square(h) − (k − sqrt(sqrt(2)))))) ∗ ((Qd−L) + (((Qd−L) + ((k max p)/k))/k))) + L) ∗ ((((Qd+Qd)∗Qd)+Qd)+Qd)) min (((square((2 max ((square((L max Qd)∗(Qd+Qd))/k)∗Qd))∗2)/k) min ((L ∗ sqrt(sqrt((sqrt(Qd)∗2) max sqrt((p/k)∗L)))) min (((h min 2)∗2)∗Qd))) min (square(((L max square((Qd+Qd)∗(Qd+Qd))) max L) − (square((Qd−L) min ((square(h)/k)/(square(h)−(2+k))))∗2)) min (square((2 max (2 max (L∗Qd)))∗2)/k))))) min ((((p max h) max (L ∗ (square(((L ∗ (square(Qd+Qd)/k)) min sqrt(p max ((p∗2)∗Qd))) ∗ h)/(Qd+Qd))))/k) min (square((square(square(((square(Qd∗(Qd+Qd))/k)∗Qd) − (L max (Qd∗sqrt(k)))))/k)∗h)/k)))

r = (isf((rate + sqrt(isf(((2/square((p−Q) min sqrt((divlbd∗k)/L))) max (square(h max (2/Lam)) ∗ square(2/Lam)))/k)))/isf(((divhp/(p−divlbd)) max (rate max 2)) ∗ isf(isf(Q/divhp)))) − ((h/(h+p)) ∗ sqrt(sqrt((((isf(k)−2) − (divhp/((p−rate)−divlbd))) − (divhp/isf(((((p−rate)−divlbd)−k) + (k/divlbd))/2))) − isf(((2/Lam)−k) + (h min rate))) ∗ isf((((square(L)/k) max isf(((((p−rate)−divlbd)−k) + sqrt(divlbd))/((rate max (((p−rate)−divlbd)−k)) min L)))/Lam) max ((((h/2)/Lam) max isf(((((p−rate)−divlbd)−k) + (k/divlbd))/((2 − (divhp/((p−rate)−divlbd))) min L)))/Lam))))) − divhp
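To evaluate expressions like the ones above, the partial operations need protected versions, and min/max are ordinary binary tree nodes that the listings print infix. The helpers below follow common GP conventions and are only a sketch: the thesis's exact protected operators, and the definition of isf, are not given in this excerpt.

```python
import math

# Protected primitives so every evolved tree yields a finite value.
# These follow common GP conventions and are assumptions, not the
# thesis's exact definitions (isf, in particular, is not defined here).
def pdiv(a, b):
    return a / b if abs(b) > 1e-9 else 1.0   # protected division

def psqrt(a):
    return math.sqrt(abs(a))                 # protected square root

def square(a):
    return a * a

node_min, node_max = min, max  # "A min B" in the listings is min(A, B)

# Example: the sub-expression square(((h min (p+L)) min L) - p) / k
h, p, L, k = 1.0, 9.0, 1.5, 50.0
val = pdiv(square(node_min(node_min(h, p + L), L) - p), k)
print(val)  # square(1.0 - 9.0) / 50 = 1.28
```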


A.1.2 Multi-stage

• heur1 = h1Q1/(λp)
• heur2 = (h1 + n·h2)Q1/(λp)

n = sqrt(sqrt(square(k2/(sqrt((h1 max ((k1−L) ∗ (((k2/sqrt(h2)) max ((h2/Lam) − sqrt(h1))) ∗ L))) − h1) min (L max (((((k2/sqrt(Lam + (h2+k2))) max (k2/sqrt((k2/sqrt(h2)) max ((k2/Lam) − (L∗Lam))))) max ((sqrt((L max (k2/Lam)) max h1)/sqrt(sqrt(h1)))/(square(L)∗p))) + (((L−h1) max (k2/Lam))/sqrt((p∗sqrt(h1)) + (L − ((h2+sqrt(h2)) max k1))))) min (((k2 ∗ ((((Lam+L2) max (k2/Lam))/sqrt(L+k2)) − L)) ∗ (h2/(p∗sqrt(h1)))) ∗ h1)))))) + sqrt(square(k2/(((((k2/sqrt(h2 max ((((k2/Lam)−k2) max sqrt(square(h1)))/sqrt(Lam+(h2+k2))))) − k1) max (k1−L)) min (((h2+L2) max k1) min (square(h2+L2) max Lam))) min (h2 max (k2/sqrt(h2)))))))

Q1 = Lam max ((Lam + (((2 + (Lam + (h2 max Lam))) − (Lam max (((Lam+Lam) + ((((Lam/(h2 − (L−Lam))) min 2) − 2) max ((h2 + (((Lam min (Lam/L)) ∗ (k1−h1)) + ((Lam ∗ (k1−h1)) + square(2)))) max (square(Lam) max (((Lam + (square(Lam)∗Lam)) − ((p+p) + ((Lam∗p) + L))) + (h1 + (square(Lam)∗Lam))))))) − (((k1 + ((((Lam min (Lam/L)) ∗ (k1−h1)) + ((Lam ∗ (k1−h1)) + (((L−Lam) max n) min L2))) − 2)) − (((Lam+h2) min ((Lam/L)/L)) + (h2 max Lam))) + square(Lam))))) max Lam)) − h2)

r1 = (h1 + (((square(sqrt(2))∗L) min sqrt((((2 max (p min (isf1(Q1)/(L2 min Lam)))) − L) max Lam) max (L2 − square(heur1)))) max (sqrt((((h1/heur1) min (sqrt(((2−heur1)/sqrt(p−L)) − Q1) min sqrt((h2/square(h1)) min ((h1 min (L/Lam)) max (sqrt(Lam) ∗ ((2 max p) ∗ sqrt(isf1(heur1)))))))) ∗ h2) − (isf1((isf1(Q1+L)∗L2) − ((p+2)/L2)) min (L min sqrt(Lam min ((isf1(heur1+h2) min Lam) min (2 max heur1)))))) min sqrt(Q1−p)))) min (square(sqrt((sqrt(Lam min (((heur1∗Q1) − (Lam/h2)) min isf1(L))) max (square(sqrt(2)) min square(Lam))) max (((2 max p) − ((L2−p) max (L−p))) min Lam))) − (((square(p) − (square(p) − ((h1 min Lam) max (Lam min L2)))) min (sqrt(square(L2) − (Lam min ((2 max p) − (((isf1(Q1)−Lam)/p) ∗ ((Lam min (sqrt(Lam) ∗ ((2 max Q1) ∗ sqrt(Q1−L)))) ∗ sqrt(Q1−L)))))) max isf1((L min isf1((L min isf1(heur1/(sqrt(h1)∗Q1)))/sqrt(Lam ∗ (Lam+Q1)))) min ((L min ((isf1(Q1)−2)/h1)) min (sqrt((isf1(Q1)/(L2 min Lam)) − Q1) min sqrt((h2/square(Lam+L2)) min ((2 max p) − (((isf1(Q1)−Lam)/h1) ∗ ((Lam min Q1) ∗ sqrt(Q1−L))))))))) min isf1(heur1)))

r2 = sqrt(((((((((((h1/(2∗heur2)) − h1) − h1) max ((Q1 max p) − (((square(isf2(p) min Q1) max (n − (Q1/L2)))/(n max ((Q1 max p) − (heur2∗L2)))) max square(L2)))) max ((((((h1/p) − h1) − h1) max ((Q1 max p) − (L2 ∗ ((Lam max (Lam max L)) − (2∗heur2))))) max (heur2/h2)) min (((square(isf2(p) min Q1) max (n − (h2∗2)))/(n max ((Q1 max p) − (heur2∗L2)))) max square(L2)))) min (((square(isf2(p) min Q1) max (n − (Q1/L2)))/(n max ((Q1 max p) − (((Lam max L)∗h1) min Q1)))) max square(L2))) ∗ sqrt(isf2((((square(Q1) max (heur2/h2)) − ((2∗heur2)/2)) min (((square(Q1 max L) max (isf2(n)∗h1)) − isf2(heur2/L)) min (Q1/(heur2∗2)))) − ((2 min (n − ((2∗heur2)∗2))) ∗ ((Q1 − ((2∗heur2)/L)) min h2))))) min square(((square((n − (heur2∗2)) min (((Lam max ((L2 max Q1) − heur2)) − heur2) min (Q1/(heur2∗2)))) min (isf2((n max h2)/L) ∗ (square(heur2 ∗ (((heur2−Q1) − h2) − square(Lam))) − (L+Q1)))) min (isf2(heur2) ∗ (square((n max h2) min Q1) − (L+Q1)))) min ((h2 min (Lam min ((((h1 max (L2∗sqrt(p))) − (2∗L2)) min (L2∗2)) min ((square(Lam) − (L+Q1)) min (L2 ∗ (Q1 − (Lam−Q1))))))) min (p ∗ ((square((Lam−Q1) − (h2∗2)) max Lam) − (L+Q1)))))) ∗ (2/heur2)) min square(square(Lam))) + ((2/heur2) ∗ ((L2 ∗ ((Q1 − ((2∗heur2)/L)) min h2)) min (Lam max (Lam max ((Lam − (L2 min L2)) − (h1/(2∗heur2))))))))

