[Figure: SHAP value per feature (y-axis labels include Sch Connection Time, Traffic Network and TP_from), coloured by feature value from Low to High.]

Figure 5.5: SHAP summary plot of the Strategic model.

The SHAP summary plot combines feature effects and feature importance, providing a global, post hoc explanation of the model.

Each point in the plot is the Shapley value of one feature for one instance. As shown in Figure 5.5, the x-axis represents the Shapley value and the position along the y-axis is determined by the feature. The features are sorted by importance in descending order (the most important one at the top), where the importance of a feature is defined as the mean absolute value of its SHAP values.
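The importance ranking described above can be reproduced directly from a matrix of SHAP values. The sketch below is a minimal plain-Python illustration; the feature names and SHAP values are hypothetical placeholders, not values from the Strategic model.

```python
def shap_importance(shap_values, feature_names):
    """Rank features by mean absolute SHAP value, most important first.

    shap_values: one row per instance; each row holds one SHAP value per
    feature, in the same order as feature_names.
    """
    n_instances = len(shap_values)
    importance = {
        name: sum(abs(row[j]) for row in shap_values) / n_instances
        for j, name in enumerate(feature_names)
    }
    # Descending order, matching the top-to-bottom order of the summary plot.
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values for three instances and three features.
features = ["feature_a", "feature_b", "feature_c"]
values = [
    [2.0, -0.5, 0.1],
    [-1.0, 0.5, 0.0],
    [3.0, -1.0, -0.2],
]
ranking = shap_importance(values, features)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking the absolute value before averaging matters: positive and negative contributions of a feature would otherwise cancel out. In practice, the `shap` library's `summary_plot` function applies this same ordering when drawing the plot.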

The colour map represents the feature value on a scale from low to high, i.e. from the minimum to the maximum feature value. For the categorical variables, which are target encoded, the value assigned to each category is the frequency with which the positive target occurs among the samples of that category.
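To make the colour scale of the categorical features concrete, the sketch below shows a basic, unsmoothed target encoding: each category is replaced by the fraction of its samples with a positive target. The category labels and targets are hypothetical, and the exact encoding variant used in this work (for example, any smoothing) is not specified here.

```python
from collections import defaultdict


def target_encode(categories, targets):
    """Map each category to the fraction of its samples with a positive target.

    categories: one category label per sample.
    targets: one 0/1 label per sample (1 = positive class, e.g. missed).
    """
    counts = defaultdict(int)
    positives = defaultdict(int)
    for category, target in zip(categories, targets):
        counts[category] += 1
        positives[category] += target
    return {category: positives[category] / counts[category] for category in counts}


# Hypothetical samples: a categorical feature and a binary target.
cats = ["NN", "NN", "SS", "SS", "NS", "NS", "NS", "SN"]
ys = [0, 0, 0, 1, 1, 1, 0, 0]
encoding = target_encode(cats, ys)
encoded_column = [encoding[c] for c in cats]  # numeric feature fed to the model
print(encoding)
```

A high encoded value therefore means "this category is frequently associated with the positive class". In practice, such an encoding is normally fitted on the training split only, so that the test targets do not leak into the features.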

For each feature, overlapping points are jittered along the y-axis, which gives a better sense of the distribution of the Shapley values per feature.

Analysing Figure 5.5, the feature with the greatest importance is the scheduled connection time. Samples with a low connection time have positive SHAP values, meaning that a short connection time tends to push the prediction towards the positive class, i.e. towards classifying the connection as missed. This is expected, since it is consistent with the criterion already used by TAP. A large number of samples have connection time values that push the prediction towards the negative class. Even so, a considerable number of samples have SHAP values between approximately 0 and 3. This indicates that the initially scheduled connection time could in fact be improved to prevent passengers from missing their connecting flights.

Since this feature alone defines the baseline, any improvement in the model's performance over the baseline comes from the remaining features.

The feature with the second greatest importance is the passenger traffic network within the airport. This highlights the importance of accounting for the airport's bottlenecks (passport and security control) that the passenger has to pass through. After encoding, the traffic network categories are numerically ordered as NN < SN < SS < NS. Notably, the model captures that SS connections (shown in purple in Figure 5.5) have the most negative impact on the model output, meaning that the absence of bottlenecks makes these samples more likely to be classified as not missed. The model also perceives that passengers on an NS connection (shown in pink), the traffic network with the greatest number of bottlenecks, are more likely to be classified as missed connections.

For the arrival and departure flight designators, high values correspond to categories that are often associated with positive targets, so the encoding produces the expected result. Both features have a large number of samples whose impact on the model output is approximately zero. The remaining samples indicate that the origin or destination can sometimes influence the result. This influence may be related to predictable flight delays, if some flights have a historical tendency to be delayed.

Unlike the other features, the departure day of the week and day of the month do not have a clear interpretation in terms of SHAP values. This does not mean that these features are irrelevant, since their SHAP values are not always zero, but rather that their influence may depend on other factors in a way that is not noticeable in this plot.

**5.3** **Pre-Tactical Decision Support Model**

**5.3.1** **Baseline**

As stated in Section 4.1.2, compared with the Strategic DSM, the only newly introduced information concerns the passengers; no information regarding flight delays was added. Therefore, the Pre-Tactical baseline model again uses only the scheduled connection time. All the results obtained in Section 5.2.1 remain valid and serve as the point of comparison for the Pre-Tactical model's results.

**5.3.2** **Model**

**Oversampling**

Figure 5.6 presents the AIC and BIC results obtained for the Pre-Tactical model. The overall behaviour of the different covariance matrix types is very similar to that observed for the previous model: diagonal and full covariances behave similarly, but the diagonal type achieves lower (better) scores.

As for the previous model, the improvement in the scores between 200 and 1000 components is considered not significant. Since the computational cost increases significantly with the number of components, the number of components was chosen to be 200.
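Both criteria in Figure 5.6 penalise the model's log-likelihood by its number of free parameters, which is why diagonal covariances become much cheaper than full ones as the number of components grows. The sketch below computes AIC and BIC from a fitted mixture's total log-likelihood. The parameter counts follow the standard GMM parameterisation (the same one behind scikit-learn's `GaussianMixture.aic` and `GaussianMixture.bic`). The 10-dimensional example is a hypothetical choice, not the feature count used in this work.

```python
import math


def gmm_n_parameters(n_components, n_features, covariance_type):
    """Count the free parameters of a Gaussian mixture model."""
    k, d = n_components, n_features
    if covariance_type == "full":
        cov_params = k * d * (d + 1) // 2  # one symmetric d x d matrix per component
    elif covariance_type == "diag":
        cov_params = k * d  # one variance vector per component
    elif covariance_type == "tied":
        cov_params = d * (d + 1) // 2  # a single shared symmetric matrix
    elif covariance_type == "spherical":
        cov_params = k  # one scalar variance per component
    else:
        raise ValueError(f"unknown covariance type: {covariance_type}")
    mean_params = k * d
    weight_params = k - 1  # the mixture weights sum to one
    return cov_params + mean_params + weight_params


def aic_bic(total_log_likelihood, n_params, n_samples):
    """Return (AIC, BIC); lower values are better for both criteria."""
    aic = -2.0 * total_log_likelihood + 2.0 * n_params
    bic = -2.0 * total_log_likelihood + n_params * math.log(n_samples)
    return aic, bic


# Parameter counts for 200 components in a hypothetical 10-dimensional space.
for cov in ("diag", "full"):
    print(cov, gmm_n_parameters(200, 10, cov))
```

Because the BIC penalty grows with the logarithm of the sample size, it favours smaller models more strongly than AIC. This matches the practice of stopping at the point where further components yield only marginal gains.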


Figure 5.6: GMM selection using AIC and BIC scores for the Pre-Tactical model.

**Hyperparameters**

Four combinations of hyperparameters achieve the optimal cross-validation score (Figure 5.7). Of these, the combination with a learning rate of 0.5 and a maximum depth of 15 requires the lowest median number of boosting rounds, 137, and is therefore the fastest to train.
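The tie-breaking rule above can be stated as a small selection routine. Among the combinations that reach the best cross-validation score, it picks the one with the lowest median number of boosting rounds across folds. This is a minimal sketch. It assumes a score where higher is better and a per-fold best round (for example, obtained with early stopping). The dictionary layout and all numbers are hypothetical illustrations, not the results behind Figure 5.7.

```python
from statistics import median


def pick_fastest_optimal(cv_results, tol=1e-12):
    """Among the combinations with the best CV score, return the one that
    needs the fewest boosting rounds (median over folds).

    cv_results: dict mapping (learning_rate, max_depth) ->
        (cv_score, [best boosting round in each fold]); higher score is better.
    """
    best_score = max(score for score, _ in cv_results.values())
    optimal = {
        params: rounds
        for params, (score, rounds) in cv_results.items()
        if abs(score - best_score) <= tol
    }
    return min(optimal, key=lambda params: median(optimal[params]))


# Hypothetical results: three combinations tie on the CV score.
results = {
    (0.1, 13): (0.95, [300, 320, 310]),
    (0.3, 14): (0.95, [180, 200, 190]),
    (0.5, 15): (0.95, [120, 140, 130]),
    (0.5, 13): (0.93, [90, 100, 95]),  # fewer rounds, but not optimal
}
print(pick_fastest_optimal(results))
```

The median is less sensitive than the mean to a single fold that stops unusually late.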


Figure 5.7: Tuning the hyperparameters of the Pre-Tactical classification model with cross-validation.

Regarding the number of majority chunks, 3 and 4 chunks give the same results (Table 5.5), and 4 was considered the better option because it minimizes the noise introduced in the data.

The final choice is therefore between 4 and 5 chunks. As shown in Table 5.5, binning the majority class into 5 chunks slightly improves the G-mean. However, as stated in Section 2.2, precision and recall provide a more accurate assessment of the classification performance. Since the F1 score is the harmonic mean of the model's precision and recall, this metric is considered more informative than the G-mean. As the decrease in the F1 score (from 0.80 to 0.75) was considerable compared with the increase in the G-mean (from 0.91 to 0.93), the number of chunks was chosen to be 4.
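The text does not detail how the majority chunks are built. One common scheme is assumed here purely for illustration. The (already shuffled) majority class is split into `c` nearly equal chunks, and each chunk is paired with the full minority class, which is then oversampled to balance the pair. Under that assumption, more chunks mean smaller chunks and fewer synthetic samples, which is consistent with the noise argument for preferring 4 chunks over 3. The helper names and class sizes below are hypothetical.

```python
def split_majority(majority_indices, n_chunks):
    """Split (already shuffled) majority-class indices into nearly equal chunks."""
    size, remainder = divmod(len(majority_indices), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < remainder else 0)
        chunks.append(majority_indices[start:end])
        start = end
    return chunks


def synthetic_needed(n_majority, n_minority, n_chunks):
    """Synthetic minority samples needed to balance each chunk, assuming each
    chunk is paired with the whole minority class (hypothetical scheme)."""
    chunks = split_majority(list(range(n_majority)), n_chunks)
    return [max(len(chunk) - n_minority, 0) for chunk in chunks]


# Hypothetical class sizes, not the dataset used in this work.
for c in (3, 4, 5):
    print(c, synthetic_needed(n_majority=10_000, n_minority=1_000, n_chunks=c))
```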

Table 5.5: Tuning the number of majority chunks for the Pre-Tactical model.

| Majority chunks (c) | G-Mean | F1 Score |
|---|---|---|
| 3 | 0.91 | 0.80 |
| 4 | 0.91 | 0.80 |
| 5 | 0.93 | 0.75 |
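The trade-off in Table 5.5 follows from the definitions of the two metrics. The G-mean depends on specificity, which is dominated by the large number of true negatives in an imbalanced problem. The F1 score ignores true negatives and depends on precision, which reacts strongly to false positives. The sketch below computes both from a confusion matrix; the two confusion matrices are hypothetical and chosen only to illustrate this sensitivity.

```python
import math


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (recall) and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


def f1_score(tp, fn, fp):
    """Harmonic mean of precision and recall (true negatives play no role)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Two hypothetical confusion matrices that differ only in false positives.
for tp, fn, tn, fp in [(90, 10, 880, 20), (90, 10, 840, 60)]:
    print(f"FP={fp}: G-mean={g_mean(tp, fn, tn, fp):.3f}, F1={f1_score(tp, fn, fp):.3f}")
```

In this example, tripling the false positives lowers the G-mean by only about 0.02, but lowers the F1 score by about 0.14.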

**Results**

After all the model's parameters were established, the model was trained on the complete training set, and the test set was then used to compute the final results.

Figure 5.8 shows the ROC and PR curves, which have an $\mathrm{AUC}_{\mathrm{ROC}}$ of 0.98 and an $\mathrm{AUC}_{\mathrm{PR}}$ of 0.89, respectively.
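For reference, the sketch below computes the two summary areas from labels and scores in plain Python. The ROC AUC is computed as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting one half. Average precision is used as one common estimator of the area under the PR curve. The text does not state which PR-area estimator was used, so this choice is an assumption. The labels and scores in the test are hypothetical, and scikit-learn offers the equivalent `roc_auc_score` and `average_precision_score` functions.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random
    negative (ties count one half). Quadratic time: fine for illustration."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision at the rank of each positive sample.
    Assumes distinct scores and at least one positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` samples
    return total / hits
```

Unlike the ROC AUC, average precision depends on the class balance, which is why it is usually the more demanding of the two on imbalanced data.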

Figure 5.8: ROC and PR curves of the Pre-Tactical model.

Since the baseline of the Pre-Tactical DSM is the same as that of the Strategic model, the previous analysis also applies: the model's performance improves significantly when more information is considered.

Comparing the Strategic and Pre-Tactical models, the results are similar even though new information was introduced. This suggests that the new passenger-related features do not add information relevant to the predictions.

**Model Explanation**

Figure 5.9 gives a global explanation of the Pre-Tactical DSM.
