
[Figure 5.9 plot. Feature labels as extracted: Traffic Network, Is Group, Dep Day, TP_to, Dep Month, Day, TP_from, Age, Sch Connection Time. Colour bar: feature value, from low to high.]

Figure 5.9: SHAP summary plot of the Pre-Tactical model.

It can be observed that age is the feature with the second-highest importance value.

However, as seen in the results, the passenger features do not improve the model's performance. This can be explained by the range of the SHAP values: although their mean absolute value is far from zero, the value varies little from instance to instance. In other words, the feature contributes almost the same amount to every prediction, whereas a feature whose contribution varies strongly with its value is more informative.
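The distinction between a large mean absolute SHAP value and a small spread can be illustrated with a minimal sketch. The SHAP matrix below is synthetic and invented purely for illustration (it is not taken from the model), and the function name `shap_summary_stats` is hypothetical.

```python
import numpy as np

def shap_summary_stats(shap_values):
    """Return (mean absolute value, standard deviation) per feature column."""
    shap_values = np.asarray(shap_values, dtype=float)
    mean_abs = np.abs(shap_values).mean(axis=0)  # usual SHAP "importance"
    spread = shap_values.std(axis=0)             # variation between instances
    return mean_abs, spread

# Synthetic SHAP matrix (rows = instances, columns = features).
# Column 0: large but nearly constant contribution.
# Column 1: contribution that changes sign and size between instances.
toy_shap = np.array([
    [0.50, -0.60],
    [0.52,  0.55],
    [0.49, -0.45],
    [0.51,  0.50],
])

mean_abs, spread = shap_summary_stats(toy_shap)
print("mean |SHAP| per feature:", mean_abs)
print("std of SHAP per feature:", spread)
```

Both columns have a similar mean absolute value, but only the second one separates instances from each other, which is the behaviour described above for an informative feature.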

Of the new features introduced, it is worth highlighting the behaviour of the binary variable which indicates whether the passenger travels in a group (represented in pink) or not (in blue). As can be seen from Figure 5.9, there is a clear division between the two classes near the SHAP value zero. Samples in blue have positive values, meaning that passengers travelling alone are more likely to be predicted as missed connecting passengers.

Analysing the feature which represents the gender of the passenger, it can be concluded that the model cannot distinguish any trends regarding the influence of this feature.

After applying the encoding, the different categories of classes in which the passengers travel in the arrival flights (‘Class From’) are numerically ordered as follows: Groups<Business<Economy<R1<Allots. Interpreting the results of the SHAP summary plot, it can be observed that, in fact, high values correspond to categories that are often associated with positive targets, producing the expected results.

Regarding the classes of the departure flights (‘Class To’), after the encoding the categories are ordered as Groups<Business<Economy<Allots<R1. In this case, low values have a positive impact on the model output and, therefore, the corresponding passengers are more likely to be predicted as missed connecting passengers.

5.4 Tactical Decision Support Model

5.4.1 Baseline

As already stated, the feature used as baseline for the Tactical DSM was the perceived connection time.

Figure 5.10 shows the ROC and PR curves of the baseline model.

Regarding the ROC space, the AUCROC is 0.93 and the threshold which gives the greatest G-mean value is 60 minutes. In fact, this value is the same as the one used by TAP (see Figure 5.10(a)). This point corresponds to a G-mean value of 0.88.
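The threshold selection in the ROC space can be sketched as follows. This is a minimal illustration, assuming the baseline rule "predict a missed connection when the connection time is below the threshold"; the data, the candidate thresholds and the function names are hypothetical and do not come from the data set used in this work.

```python
import numpy as np

def gmean_for_threshold(conn_time, missed, threshold):
    """G-mean of the rule 'predict missed if connection time < threshold'."""
    conn_time = np.asarray(conn_time, dtype=float)
    missed = np.asarray(missed, dtype=bool)
    pred = conn_time < threshold
    tp = int(np.sum(pred & missed))
    fn = int(np.sum(~pred & missed))
    tn = int(np.sum(~pred & ~missed))
    fp = int(np.sum(pred & ~missed))
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity
    tnr = tn / (tn + fp) if (tn + fp) else 0.0  # specificity
    return float(np.sqrt(tpr * tnr))

def best_gmean_threshold(conn_time, missed, candidates):
    """Return the candidate threshold with the highest G-mean and its score."""
    scores = [gmean_for_threshold(conn_time, missed, t) for t in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]

# Toy data invented for illustration: connection times in minutes and labels
# (1 = missed connection, 0 = connection made).
conn_time = [30, 45, 50, 55, 65, 70, 80, 90, 120, 150]
missed = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
candidates = [40, 50, 60, 70, 80]

best_t, best_g = best_gmean_threshold(conn_time, missed, candidates)
print(f"best threshold: {best_t} min, G-mean: {best_g:.2f}")
```

The G-mean is the geometric mean of sensitivity and specificity, so the selected threshold balances the two error types regardless of the class imbalance.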

Figure 5.10(b) shows the performance of the baseline model in the PR space. The optimal threshold is 70 minutes, with an F1 score of 0.67. The AUCPR obtained was 0.70.
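The equivalent selection in the PR space maximises the F1 score instead of the G-mean. The sketch below assumes the same baseline rule (a connection is predicted as missed when its connection time is below the threshold); the imbalanced toy data and the function names are hypothetical.

```python
import numpy as np

def f1_for_threshold(conn_time, missed, threshold):
    """F1 score of the rule 'predict missed if connection time < threshold'."""
    conn_time = np.asarray(conn_time, dtype=float)
    missed = np.asarray(missed, dtype=bool)
    pred = conn_time < threshold
    tp = int(np.sum(pred & missed))
    fp = int(np.sum(pred & ~missed))
    fn = int(np.sum(~pred & missed))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_f1_threshold(conn_time, missed, candidates):
    """Return the candidate threshold with the highest F1 score and its score."""
    scores = [f1_for_threshold(conn_time, missed, t) for t in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]

# Imbalanced toy data invented for illustration: 2 missed connections out of 12.
conn_time = [20, 40, 45, 50, 55, 70, 80, 90, 100, 110, 120, 130]
missed = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
candidates = [30, 60]

best_t, best_f1 = best_f1_threshold(conn_time, missed, candidates)
print(f"best threshold: {best_t} min, F1: {best_f1:.2f}")
```

Because F1 ignores true negatives, false positives are penalised more heavily on imbalanced data than with the G-mean, so the two criteria need not select the same threshold.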

The overall performance of this model is better than that of the Strategic and Pre-Tactical baselines. This may be an indicator that taking the arrival flight delays into account significantly improves the model performance.

Figure 5.10: Baseline performance of the Tactical model. The colormap represents the threshold value in minutes. Figure (a) represents the ROC curve of the Tactical baseline classifier. The best threshold for this curve is 60 minutes. Figure (b) represents the PR curve of the Tactical baseline classifier. The best threshold for this curve is 30 minutes. TAP’s threshold corresponds to the value currently used as MCT, i.e., 60 minutes.

5.4.2 Model

Oversampling

The obtained results for the AIC and BIC scores are very similar to those of the previous models (Figure 5.11). Thus, for the same reasons, the GMM was chosen to be defined by a diagonal covariance with 200 components.
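As a reminder of how the two criteria penalise model size, the sketch below computes AIC and BIC from a total log-likelihood using their standard definitions, together with the number of free parameters of a GMM with diagonal covariance. The feature dimension of 10 used in the example is an assumption made only for illustration, and the function names are hypothetical.

```python
import math

def diag_gmm_n_params(n_components, n_features):
    """Free parameters of a GMM with diagonal covariance matrices."""
    means = n_components * n_features
    variances = n_components * n_features
    weights = n_components - 1  # mixing weights sum to one
    return means + variances + weights

def aic(total_log_likelihood, n_params):
    """Akaike information criterion (lower is better)."""
    return 2 * n_params - 2 * total_log_likelihood

def bic(total_log_likelihood, n_params, n_samples):
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_samples) - 2 * total_log_likelihood

# 200 components as in the text; 10 features is an assumed dimension.
k = diag_gmm_n_params(n_components=200, n_features=10)
print("free parameters:", k)
```

With a diagonal covariance the parameter count grows linearly with the number of features, which keeps the penalty terms of both criteria moderate even for a large number of components.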

Hyperparameters

Figure 5.11: GMM selection using AIC and BIC scores for the Tactical model.

For the grid of hyperparameters searched, the best-performing set was a learning rate of 0.5, a maximum depth of 15 and a median of 137 boosting rounds. The best F1 score is located in one of the corners of the grid, which suggests that increasing the parameter values could improve the results. However, as already stated, increasing the maximum depth makes the model more complex and sharply increases memory consumption during training. Therefore, other sets of parameters were not tested.
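The observation that the optimum lies on the edge of the search grid can be checked programmatically. The following is a minimal sketch: the grid values other than the reported optimum (learning rate 0.5, maximum depth 15) are assumptions, and `toy_score` is a hypothetical stand-in for the cross-validated F1 score, not the actual training procedure.

```python
import itertools

def best_grid_point(grid, score_fn):
    """Exhaustively evaluate every combination and return the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def on_grid_boundary(grid, params):
    """Names of parameters whose best value is the smallest or largest tried."""
    return [n for n, v in params.items() if v in (min(grid[n]), max(grid[n]))]

# Assumed grid, for illustration only.
grid = {"learning_rate": [0.1, 0.3, 0.5], "max_depth": [13, 14, 15]}

def toy_score(params):
    # Hypothetical score that keeps increasing with both parameters.
    return 0.5 + 0.2 * params["learning_rate"] + 0.01 * params["max_depth"]

best_params, best_score = best_grid_point(grid, toy_score)
edge_params = on_grid_boundary(grid, best_params)
print("best:", best_params, "on boundary for:", edge_params)
```

A non-empty boundary list indicates that the grid may be too narrow in those directions, which has to be weighed against the training cost of extending it.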


Figure 5.12: Tuning the hyperparameters of the Tactical classification model with cross-validation.

Table 5.7 presents the results for the tuning process of the majority chunks for the Tactical model.

Analysing the results, it appears that the trend is very similar to the previous model. That is, with 3 and 4 chunks the results are the same, but 4 is preferable to avoid introducing too much noise in the data.

Between 4 and 5 chunks, it is preferable to choose 4, as the loss in F1 score is considered more important than the gain in G-mean.

Table 5.7: Tuning the number of majority chunks for the Tactical model.

Chunks (c)    G-Mean    F1 Score
3             0.94      0.87
4             0.94      0.87
5             0.95      0.84

Results

With the test set, the final results are computed. As shown in Figure 5.13, the AUCROC obtained is 0.99 and the AUCPR is 0.94.

Figure 5.13: ROC and PR curves of the Tactical model.

The G-mean and F1 scores of the Tactical DSM can be seen in Table 5.8 as well as the baseline results.

As expected, the results obtained with the proposed model are significantly better than those obtained with the baseline.

It can be concluded that this model performs better than the previous ones, which is in line with the intuition that the predictive capacity increases as more information becomes available.

Model Explanation

The SHAP summary plot presented in Figure 5.14 shows that the overall behaviour of the features is the same as in the previous models.

Note that, concerning the traffic network, the division of the feature classes is more evident and there is a clear separation between the NS connections (in pink) and the other types of connections. In fact, NS samples are the only ones with positive SHAP values.

Regarding the perceived connection time, it is interesting to note that, unlike the scheduled connection time, most of the samples have negative SHAP values. This means that the majority of the samples are more likely to be mapped to the negative class, i.e., not missed connections.

[Figure 5.14 plot. Horizontal axis: SHAP value (impact on model output). Visible feature label: Class To.]

Figure 5.14: SHAP summary plot of the Tactical model.

5.5 Post-Operations Decision Support Model

5.5.1 Baseline

The Post-Operations DSM uses the actual connection time as the feature for the baseline model.

In the ROC space, the best threshold is 70 minutes with a G-mean value of 0.87, whereas in the PR space the optimal threshold is 40 minutes for an F1 score of 0.68. These results are similar to those of the Tactical baseline, which indicates that the departure flight delay (the new information added) does not improve the performance of the model. A possible reason is that most departure delays occur after boarding ends, so passengers have already boarded and are no longer at risk of missing their flight.

The AUCROC and AUCPR values obtained were 0.93 and 0.72, respectively.


Figure 5.15: Baseline performance of the Post-Operations model. The colormap represents the threshold value in minutes. Figure (a) represents the ROC curve of the baseline classifier. The best threshold for this curve is 70 minutes. Figure (b) represents the PR curve of the baseline classifier. The best threshold for this curve is 40 minutes. TAP’s threshold corresponds to the value currently used as MCT, i.e., 60 minutes.
