
**6.1 Future Work**

The model explainability analysis indicated that some features are both more important and easier to interpret than others. An ablation study could therefore be performed to quantify the contribution of the “weak” components to the overall model.
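A leave-one-out ablation could be organized roughly as in the sketch below. It is only an illustration: `train_and_score` is a hypothetical callable standing in for retraining and validating the model, the feature names are made up, and `toy_scorer` replaces real training with fixed, invented contributions.

```python
def ablation_study(features, train_and_score):
    """Score the model with all features, then with each one left out.

    `train_and_score` is a hypothetical callable: it receives a list of
    feature names, trains a model on them, and returns a validation
    score where higher is better.
    Returns a dict mapping each feature to the score lost by removing it.
    """
    baseline = train_and_score(list(features))
    drops = {}
    for feature in features:
        reduced = [f for f in features if f != feature]
        drops[feature] = baseline - train_and_score(reduced)
    return drops


def toy_scorer(feature_subset):
    # Stand-in for real training: each feature adds a fixed, made-up amount.
    contribution = {
        "scheduled_connection_time": 0.20,  # hypothetical feature name
        "arrival_delay": 0.10,              # hypothetical feature name
        "weak_feature": 0.01,               # hypothetical feature name
    }
    return 0.5 + sum(contribution[f] for f in feature_subset)


if __name__ == "__main__":
    names = ["scheduled_connection_time", "arrival_delay", "weak_feature"]
    for name, drop in ablation_study(names, toy_scorer).items():
        print(f"{name}: score drop {drop:.3f}")
```

A feature whose removal barely changes the score is a candidate “weak” component. The same loop could remove groups of features at once if the weak components are expected to matter only jointly.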

The developed models predict whether a given passenger will make the connection successfully. That is, the task is framed as a classification problem. As future work, it would be interesting to predict the MCT directly. Better predictions of the time passengers need to traverse the airport would enable airlines to minimize the number of missed connections and to better serve their passengers. However, airlines and airports have little knowledge of passengers’ whereabouts in the airport and, within the collected data, there is no feature that can be used as a target to learn the predictions. Thus, the task cannot be framed as a simple regression problem.

Given the airport’s structure graph and the routes that passengers have to take on the different connections, it is possible to formulate the problem as the task of estimating the distributions of the edge values. Figure 6.1 presents a simplified version of the SS connections.

[Figure 6.1 image not reproduced. Node labels: Plat 10, Plat 12, Plat 14N, Plat 14S, Plat 20-22, ..., Gate S, 7-13, 14-16, 17-21, 22-23, 24-25, Bus, Jet Bridge, Terminal 1.]

Figure 6.1: Graph of the airport structure and diagram of the Schengen-Schengen connection procedures.

As can be seen in Figure 6.1, the edges of the graph are bidirectional, so it is an undirected graph.

Each edge represents a distance, and the main objective of this problem is to obtain the time it takes to travel those distances. Although the problem may seem to have a trivial solution, it is very complex, since the time a passenger takes to travel a given distance is influenced by a large number of factors, such as the flow of passengers at the airport at a certain time or whether the passenger travels in a group. Much of this information is unavailable or unknown. Therefore, it can be assumed that the time it takes to travel each edge has a typical value (mean) and a given standard deviation around that value. With this assumption, each edge can be modeled by a normal distribution, and the objective of the problem is to find the parameters that best define these distributions, taking into account all the samples in the dataset.
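One possible way to write this assumption down is sketched here. It is not a formulation taken from this work: the symbols are introduced only for illustration, and the independence of edge times is an additional assumption. Let $T_e$ be the time to travel edge $e$, $r_i$ the set of edges on the route of passenger $i$, $y_i \in \{0, 1\}$ the observed connection outcome, and $a_i$ a hypothetical available connection time for that passenger.

```latex
\begin{align}
T_e &\sim \mathcal{N}(\mu_e, \sigma_e^2), \\
T_{r_i} = \sum_{e \in r_i} T_e &\sim \mathcal{N}\Big(\sum_{e \in r_i} \mu_e,\; \sum_{e \in r_i} \sigma_e^2\Big), \\
P(y_i = 1) = P(T_{r_i} \le a_i) &= \Phi\!\left(\frac{a_i - \sum_{e \in r_i} \mu_e}{\sqrt{\sum_{e \in r_i} \sigma_e^2}}\right).
\end{align}
```

Here $\Phi$ is the standard normal cumulative distribution function. Under these assumptions, the parameters $\mu_e$ and $\sigma_e$ could be estimated by maximizing the Bernoulli likelihood of the observed outcomes $y_i$, which avoids the need for a directly measured travel time as a regression target.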

[Figure 6.2 image not reproduced. Node labels: 0, 1, 2, 3, 4, 5.]

Figure 6.2: Simplified representation of the graph of the airport structure, schematized as an undirected graph in which each edge follows a given normal distribution.
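The sketch below illustrates how such a graph could be used once edge parameters are available. The node labels follow Figure 6.2, but the edges, the parameter values (in minutes), and the function names are hypothetical, and edge times are assumed to be independent so that means and variances add along a route.

```python
from statistics import NormalDist

# Hypothetical (mean, standard deviation) per edge, in minutes.
# Keys are frozensets so that (a, b) and (b, a) refer to the same edge.
EDGE_PARAMS = {
    frozenset({0, 2}): (4.0, 1.0),
    frozenset({1, 2}): (6.0, 2.0),
    frozenset({2, 3}): (3.0, 0.5),
    frozenset({2, 4}): (5.0, 1.5),
    frozenset({2, 5}): (8.0, 2.5),
}


def route_time_distribution(route):
    """Return the normal distribution of the total time along a route.

    `route` is a sequence of node labels. Assuming independent edge
    times, the route mean is the sum of the edge means and the route
    variance is the sum of the edge variances.
    """
    mean = 0.0
    variance = 0.0
    for a, b in zip(route, route[1:]):
        mu, sigma = EDGE_PARAMS[frozenset({a, b})]  # undirected lookup
        mean += mu
        variance += sigma ** 2
    return NormalDist(mean, variance ** 0.5)


def connection_probability(route, available_minutes):
    """Probability that the route is completed within the available time."""
    return route_time_distribution(route).cdf(available_minutes)


if __name__ == "__main__":
    route = [0, 2, 4]
    dist = route_time_distribution(route)
    print(f"mean = {dist.mean:.1f} min, std = {dist.stdev:.2f} min")
    print(f"P(on time | 12 min) = {connection_probability(route, 12.0):.3f}")
```

In practice the edge parameters would be the unknowns to estimate from the dataset. The sketch only shows the forward direction, from assumed parameters to the distribution of a route's travel time.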
