4.2 A Machine Learning-based framework for predicting supply delay risk using Big Data
4.2.5 Material and methods
4.2.5.6 Evaluation
We use a realistic and robust RW scheme to evaluate the classification models, as illustrated in Figure 46. The scheme is realistic in the sense that it simulates the real environment in which a model would be used, producing several training and test iterations over time. It is robust because each iteration produces a set of predictions, so the model is evaluated several times through time, as opposed to the popular single hold-out train and test scheme.
[Figure 46 (diagram): rolling window data of D samples along the time axis. In each iteration 1, 2, 3, ..., U, a training window of size W is followed by a test window of size T, and the window slides by S samples between iterations.]
Figure 46: Rolling window scheme adapted from (Oliveira et al., 2017)
This scheme works as follows. In the first iteration ($u = 1$), the model is trained using a fixed training window of size $W$ containing the oldest samples and predicts the subsequent $T$ samples. Then, in the second iteration ($u = 2$), the training window slides by $S$ instances, replacing the $S$ oldest instances of the training window with the $S$ most recent ones. A new model is fit and then predicts the next $T$ samples. This process is repeated until the last iteration of the RW scheme. The total number of iterations $U$ is determined from the number of available samples $D$ using the following formula:
$$U = \frac{D - (W + T)}{S}.$$
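As an illustration, the rolling window procedure can be sketched in a few lines of Python. This is a minimal sketch and not the implementation used in this work: the function name `rolling_window_splits` and the example values are hypothetical, samples are assumed to be ordered by time and indexed from 0, and the floor of the formula is assumed when the division is not exact.

```python
from typing import Iterator, Tuple


def rolling_window_splits(D: int, W: int, T: int, S: int) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, test_indices) for each rolling window iteration."""
    if min(D, W, T, S) <= 0 or W + T > D:
        raise ValueError("D, W, T and S must be positive and W + T must not exceed D")
    # Number of iterations, following U = (D - (W + T)) / S from the text.
    # Assumption: the result is floored when the division is not exact.
    U = (D - (W + T)) // S
    for u in range(U):
        start = u * S  # the window slides by S samples per iteration
        train = range(start, start + W)
        test = range(start + W, start + W + T)
        yield train, test


# Hypothetical example values.
splits = list(rolling_window_splits(D=100, W=50, T=10, S=10))
for train, test in splits:
    print(f"train [{train.start}, {train.stop}) -> test [{test.start}, {test.stop})")
```

Each yielded pair would be used to fit a new model on the training indices and to score it on the test indices, so that one set of predictions is collected per iteration.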
4.2.5.6.1 Measuring model performance
The evaluation criteria are a key factor for quantifying model performance (Galar et al., 2012; Witten et al., 2016). In this work, the overall performance of the classification models is given by the AUC of the ROC analysis, also known as AUC-ROC (Fawcett, 2006). The ROC analysis is obtained by considering the predictions as probabilities ($p$) for a binary class. The class is assumed true if $p > D$, where $D$ is a decision threshold. When using a fixed $D$, the predicted class labels can be used to compute the well-known confusion matrix. Figure 47 illustrates an example of such a matrix, which matches predicted outcomes with the actual values and includes four main statistics for the binary classification task (Spark, 2020):
• True Positives (TP) - number of positive class instances correctly classified;
• True Negatives (TN) - number of negative class instances correctly classified;
• False Positives (FP) - number of negative class instances incorrectly classified as positive; and
• False Negatives (FN) - number of positive class instances incorrectly classified as negative.
                      Prediction outcome
                      N         P
Actual value    N     TN        FP
                P     FN        TP

N - Negative; P - Positive
Figure 47: Confusion matrix for a binary classification task.
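For a fixed threshold $D$, the four counts of the confusion matrix can be obtained as in the following sketch. It is only illustrative: the function name `confusion_counts`, the default threshold of 0.5 and the example labels and probabilities are hypothetical, and the positive class is assumed to be encoded as 1.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], p: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels and predicted probabilities."""
    tp = tn = fp = fn = 0
    for actual, prob in zip(y_true, p):
        predicted = 1 if prob > threshold else 0  # class is assumed true if p > D
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


# Hypothetical example data.
y_true = [1, 0, 1, 1, 0, 0]
p = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]
counts = confusion_counts(y_true, p, threshold=0.5)
print(counts)
```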
Several statistics and insights can be obtained from the confusion matrix. Metrics such as True Positive Rate (TPR), True Negative Rate (TNR), Positive Predictive Value (PPV), Negative Predictive Value (NPV), False Positive Rate (FPR), F1 score and accuracy (ACC) are defined using the following formulas (Larose, 2005; Sun et al., 2009):
• $TPR = \frac{TP}{TP + FN}$, also known as recall, hit rate or sensitivity;
• $TNR = \frac{TN}{TN + FP}$, also known as specificity or selectivity;
• $PPV = \frac{TP}{TP + FP}$, also known as precision;
• $FPR = \frac{FP}{FP + TN}$, also known as fall-out;
• $NPV = \frac{TN}{TN + FN}$;
• $ACC = \frac{TP + TN}{TP + TN + FP + FN}$;
• $F1 = 2 \times \frac{PPV \times TPR}{PPV + TPR}$.
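The formulas above translate directly into code. The following is a minimal sketch, assuming the four counts are already available; the function name `classification_metrics`, the example counts, and the choice of returning NaN for an undefined ratio are assumptions made for illustration.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute the confusion-matrix metrics listed in the text."""
    def ratio(num: float, den: float) -> float:
        # Assumption: an undefined ratio (zero denominator) is reported as NaN.
        return num / den if den else float("nan")

    tpr = ratio(tp, tp + fn)  # recall / sensitivity
    ppv = ratio(tp, tp + fp)  # precision
    return {
        "TPR": tpr,
        "TNR": ratio(tn, tn + fp),
        "PPV": ppv,
        "FPR": ratio(fp, fp + tn),
        "NPV": ratio(tn, tn + fn),
        "ACC": ratio(tp + tn, tp + tn + fp + fn),
        "F1": ratio(2 * ppv * tpr, ppv + tpr),
    }


# Hypothetical counts.
m = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```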
The ROC curve is a two-dimensional graphical technique for visualizing, organizing and selecting classifiers based on their performance. It summarizes the trade-off between the TPR (y-axis) and the FPR (x-axis) for different threshold values ($D$) between 0.0 and 1.0 (Fawcett, 2006; Sun et al., 2009). The AUC-ROC measures the quality of the probabilistic classifier and is calculated using Equation (4.10). A random classifier has an AUC-ROC of 0.5, while a perfect classifier has an AUC-ROC of 1.
$$AUC\text{-}ROC = \int_{0}^{1} \frac{TP}{TP + FN} \, d\!\left(\frac{FP}{FP + TN}\right) = \int_{0}^{1} \frac{TP}{P} \, d\!\left(\frac{FP}{N}\right) \tag{4.10}$$
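In practice, the integral in Equation (4.10) is approximated from a finite set of predictions. The sketch below builds the ROC points by sweeping the threshold over the distinct predicted probabilities and integrates TPR over FPR with the trapezoidal rule. It is an illustrative approximation under stated assumptions, not the procedure used in this work: the function names and example data are hypothetical, and predicting positive when the score is at least a given distinct score is used as the equivalent of $p > D$ for a threshold just below that score.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], p: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points, from the strictest to the loosest threshold."""
    P = sum(y_true)        # number of actual positives (TP + FN)
    N = len(y_true) - P    # number of actual negatives (FP + TN)
    if P == 0 or N == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for d in sorted(set(p), reverse=True):
        tp = sum(1 for y, s in zip(y_true, p) if s >= d and y == 1)
        fp = sum(1 for y, s in zip(y_true, p) if s >= d and y == 0)
        points.append((fp / N, tp / P))
    return points


def auc_roc(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal approximation of the area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical example data.
y_true = [1, 0, 1, 1, 0, 0]
p = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]
auc = auc_roc(roc_points(y_true, p))
print(f"AUC-ROC = {auc:.4f}")
```

For this toy example, eight of the nine positive-negative pairs are ranked correctly, which matches the area of 8/9 obtained by the trapezoidal rule.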
4.2.5.6.2 Measuring misclassification impact on inventory performance
In order to measure the impact of model misclassifications on the inventory performance of the company concerned, we design a cost matrix that determines the cost of classifying samples from one class as another, as shown in Figure 48. Following the notation of Elkan (2001) and Sun et al. (2009), $C(i, j)$ denotes the cost of predicting an instance from class $i$ as class $j$. Hence, $C(1,0)$ represents the cost of misclassifying a positive instance as a negative one, whereas $C(0,1)$ is the cost of misclassifying a negative instance as a positive one.

                      Prediction outcome
                      N         P
Actual value    N     C(0,0)    C(0,1)
                P     C(1,0)    C(1,1)

N - Negative; P - Positive
Figure 48: Cost matrix for the binary classification task.

In this work, the cost matrix is based on inventory-related costs, namely the special freight costs, calculated using the business domain expert's knowledge, and the unitary holding costs. Inventory Holding Costs (IHC) are the costs incurred to hold inventory and include capital costs and storage costs. They are calculated using the following formula:

$$IHC_m = (I_m \times P_m) \times V_m \tag{4.11}$$

where $I_m$ is the holding rate for raw material $m$ per unit of time, $P_m$ denotes the raw material standard unit price and $V_m$ represents the order volume. On the other hand, special or premium freight is a type of shipment offered by transportation providers for urgent deliveries. This type of shipment tends to be very expensive and is normally performed by air. In general, special freights are caused by inventory mismanagement that leads to a high stockout risk (Avci & Selim, 2017).
In the context of the case study company, the calculation of the special freight costs follows specific business-oriented rules that depend essentially on two factors: the supplier location and the transport mode. In addition, the company sets a penalty cost that differs according to each combination of these factors. Concretely, national special freights are typically made by land, where the carriers define the price from the number of load units to be transported. Here, the penalty cost ($s_{n,l}$) depends on the distance in kilometers from the company to the national supplier. For In-Europe special freights, the carriers provide shipments by land and by air. For shipments by land, the price is defined by the distance from the company. In contrast, for shipments by air, the weight to be transported is the main cost driver. In both cases, the magnitude of the penalty costs $s_{i,l}$ and $s_{i,a}$ depends on the urgency required for receiving the supply order. Finally, Out-Europe special freights, typically more costly to manage, are made by air, and the corresponding cost function is also determined from the weight to be transported and the transit time necessary for shipping the order. Table 21 summarizes the algebraic expressions used by the analyzed company to determine the special freight costs.
Recalling the cost matrix presented in Figure 48, it is noteworthy that while the costs related to $C(1,0)$ involve only the special freight component, those related to $C(0,1)$ comprise both the special freight cost and the holding cost component. The latter case is motivated by the fact that the classifier triggers a special freight that is unnecessary in light of the actual production requirements. As such, in addition to the special freight cost, the corresponding chartered quantity will cause an increase in the on-hand inventory, which represents extra stock to be stored.
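The composition of the two misclassification costs can be sketched as follows. This is only an illustration of the reasoning above, not the company's costing procedure: the `Order` class, its field names and the example values are hypothetical, and correct predictions, $C(0,0)$ and $C(1,1)$, are assumed to cost nothing.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Order:
    actual: int          # 1 = positive class, 0 = negative class
    predicted: int       # predicted class label
    freight_cost: float  # special freight cost of the order
    holding_rate: float  # I_m, holding rate per unit of time
    unit_price: float    # P_m, standard unit price
    volume: float        # V_m, order volume


def holding_cost(o: Order) -> float:
    """IHC_m = (I_m * P_m) * V_m, as in Equation (4.11)."""
    return (o.holding_rate * o.unit_price) * o.volume


def misclassification_cost(o: Order) -> float:
    if o.actual == 1 and o.predicted == 0:
        return o.freight_cost                    # C(1,0): special freight only
    if o.actual == 0 and o.predicted == 1:
        return o.freight_cost + holding_cost(o)  # C(0,1): freight plus holding
    return 0.0  # assumption: correct predictions incur no cost


def total_cost(orders: Iterable[Order]) -> float:
    return sum(misclassification_cost(o) for o in orders)


# Hypothetical orders.
orders = [
    Order(actual=1, predicted=0, freight_cost=500.0, holding_rate=0.25, unit_price=4.0, volume=100.0),
    Order(actual=0, predicted=1, freight_cost=300.0, holding_rate=0.25, unit_price=4.0, volume=100.0),
    Order(actual=1, predicted=1, freight_cost=500.0, holding_rate=0.25, unit_price=4.0, volume=100.0),
]
print(total_cost(orders))
```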
Table 21: Special freight cost functions according to the transportation mode and supplier location.

Supplier location    Transport mode    Cost function
National             Land              $C = N_q \times P_p \times s_{n,l}$
In-Europe            Land              $C = D \times P_{km} \times s_{i,l}$
In-Europe            Air               $C = W_q \times P_{kgE} \times s_{i,a}$
Out-Europe           Air               $C = W_q \times P_{kgO} \times s_{o,a}$

Legend: $C$ - special freight cost; $q$ - quantity to be transported; $N_q$ - number of load units required to transport $q$; $W_q$ - total weight (packaging weight + loading weight) of quantity $q$; $D$ - distance from the company to the supplier; $P_p$ - price per load unit; $P_{km}$ - price per kilometer; $P_{kgE}$ - price per kilogram for suppliers located in Europe; $P_{kgO}$ - price per kilogram for suppliers located out of Europe; $s_{n,l}$ - penalty cost for national express services carried out by land; $s_{i,l}$ - penalty cost for In-Europe express services carried out by land; $s_{i,a}$ - penalty cost for In-Europe express services carried out by air; $s_{o,a}$ - penalty cost for Out-Europe express services carried out by air.
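The cost functions in Table 21 share the same structure: a cost driver multiplied by a unit price and a penalty cost. A minimal sketch of this lookup is given below. The dictionary keys, the function name `special_freight_cost` and all numeric values are hypothetical and serve only to illustrate the structure of the table; the actual prices and penalties are set by the company and its carriers.

```python
from typing import Dict, Tuple

# (supplier location, transport mode) -> (cost driver, unit price), following Table 21.
COST_DRIVERS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("national", "land"): ("load_units", "price_per_load_unit"),   # N_q, P_p
    ("in-europe", "land"): ("distance_km", "price_per_km"),        # D, P_km
    ("in-europe", "air"): ("weight_kg", "price_per_kg_europe"),    # W_q, P_kgE
    ("out-europe", "air"): ("weight_kg", "price_per_kg_out"),      # W_q, P_kgO
}


def special_freight_cost(location: str, mode: str, shipment: Dict[str, float],
                         prices: Dict[str, float], penalty: float) -> float:
    """C = driver * unit price * penalty cost for the given location and mode."""
    key = (location, mode)
    if key not in COST_DRIVERS:
        raise ValueError(f"no cost function defined for {key}")
    driver_key, price_key = COST_DRIVERS[key]
    return shipment[driver_key] * prices[price_key] * penalty


# Hypothetical shipment, prices and penalties.
shipment = {"load_units": 3, "distance_km": 1200.0, "weight_kg": 250.0}
prices = {"price_per_load_unit": 80.0, "price_per_km": 1.5,
          "price_per_kg_europe": 4.0, "price_per_kg_out": 9.0}

national = special_freight_cost("national", "land", shipment, prices, penalty=1.5)
europe_air = special_freight_cost("in-europe", "air", shipment, prices, penalty=2.0)
print(national, europe_air)
```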
4.2.6 Experiments and Results
This section describes the experimental and modeling setup, as well as the results of the model performance comparison and the misclassification cost analysis.