Decision Support Models

4.1 Proposed Models

4.1.4 Post-Operations

The last proposed model is the Post-Operations DSM. At this stage, all the information regarding the departure is known, namely the delay of the flight.

Since the events have already occurred, there is no decision left to make about them. Therefore, unlike the other models described, this model works as an analysis model rather than a predictive one. It can be useful to evaluate, through historical data, whether it is possible to make decisions for similar future events.

As the connection time is defined as the interval between ‘on-blocks’ and ‘off-blocks’, the boarding stage and the number of buses used to carry the passengers from the boarding gate to the platform fall within this interval. Therefore, the boarding time interval and the number of buses used on the departure flight were added to the feature set already in use.

It was decided to use the actual connection time feature instead of the scheduled time and the arrival and departure delays separately. Hence, the time that the passengers actually had to make the connection is fed directly into the ML model, instead of being deduced from the other mentioned features.

After applying listwise deletion, the input data consists of 3,451,979 samples, a reduction of 3.90%. Of these samples, 5.51% belong to the positive class.
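As a minimal illustration of this preparation step, the sketch below applies listwise deletion with pandas and reports the resulting reduction and class balance. The file name and all column names except the ‘Missed’ target (introduced in Section 4.2) are hypothetical.

import pandas as pd

# Hypothetical input file; the feature columns follow Table 4.1.
df = pd.read_csv("connections.csv")

# Listwise deletion: drop every sample with at least one missing value.
complete = df.dropna()

reduction = 1 - len(complete) / len(df)      # 3.90% in this work
positive_rate = complete["Missed"].mean()    # 5.51% in this work
print(f"{len(complete)} samples kept "
      f"(reduction {reduction:.2%}, positive class {positive_rate:.2%})")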

Table 4.1 summarizes the features available for each model.

4.2 Baselines

It is important to establish baseline performances in order to provide a point of comparison for each model. The baseline classifiers are inspired by the criterion used by TAP, which applies the MCT established by ANA - Aeroportos de Portugal, defined as 60 minutes.

The baseline classifier uses only one feature, the connection time. The target is the variable indicating whether a passenger misses the connecting flight (‘Missed’). A connection is predicted to be missed if the time between flights is smaller than the MCT.

Table 4.1: Features considered for each problem.

Columns (models): Strategic, Pre-Tactical, Tactical, Post-Operations.
Rows (features): TP From; TP To; Traffic Network; Dep. Day; Dep. Month Day; Boarding Delta; N Bus; Sex; Age; Is Group; Class From; Class To; Sch. Conn. Time; Perceived Conn. Time; Actual Conn. Time.

Instead of fixing a single connection-time threshold, each baseline classifier sweeps a range of values from 0 to 500 minutes in steps of 10 minutes and selects the one with the best performance.

Depending on the problem, the connection time considered by the baseline classifier can be the scheduled, perceived, or actual one.
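The sketch below shows this threshold sweep, assuming scikit-learn is available; the choice of F1 score as the selection metric is an assumption, since the performance measure is not specified in this section.

import numpy as np
from sklearn.metrics import f1_score

def best_threshold_baseline(conn_time, missed):
    """Sweep thresholds from 0 to 500 min in steps of 10 and keep the best.
    The F1 selection metric is an assumption, not taken from the text."""
    best_t, best_score = 0, -1.0
    for t in range(0, 501, 10):
        pred = (np.asarray(conn_time) < t).astype(int)  # 1 = missed connection
        score = f1_score(missed, pred, zero_division=0)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score

Setting the threshold to the 60-minute MCT recovers TAP's criterion as a special case of this baseline.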

4.3 Model

As already stated, the models addressing each problem share the same structure, consisting of two stages: oversampling and ensemble learning. Each stage is described in more detail in the following sections.

4.3.1 Oversampling

This is an imbalanced classification problem (see Section 2.2); therefore, the proposed approach is to oversample the minority class of the training set. The procedure shown in Figure 4.3 was applied.

The majority class was divided into c chunks, where the value of c is considered a hyperparameter of the model and is therefore analysed in Section 5.1.

All the minority class examples were added to each majority class chunk. Since c is chosen so that the data remains imbalanced (to avoid creating too many chunks, which would increase the computational cost, and to ensure each chunk retains statistical power), new minority class samples must be generated to balance the data.

This procedure has some advantages compared to the method typically used to oversample data (i.e. using all the majority class samples at once instead of dividing them into chunks):

Figure 4.3: Schematic diagram of the oversampling procedure. (The training data is split into majority and minority classes; the majority class is divided into chunks train data 1 … train data c; newly generated minority samples are concatenated to each chunk; the resulting balanced sets feed the ensemble learning stage, which produces predictions for the validation/test data.)

• Dividing the majority class into chunks can be interpreted as applying an undersampling technique in which no samples are deleted, so all the information is kept;

• As each majority chunk is smaller than the original class, fewer synthetic samples need to be generated, which may reduce the oversampling of noise. It also avoids a large increase in the size of the training set and, consequently, in the training time and the amount of memory required to hold it;

• Using all the original minority class examples in each set of balanced data ensures that all the “rare” cases are included in the small sets of training data. The aim is to guarantee that each classifier is robust to the minority class;

• Ensemble learning is employed, where simple classifiers learn from the sub-training sets; combining their outputs improves the final predictions (see Section 4.3.2);

• As stated in Section 2.3, target encoding tends to overfit the data. A solution to this problem is not to use the entire training set to compute the encoding. Since the training set is divided into chunks and the encoding procedure is applied to each chunk separately, the overfitting problem can be mitigated (see the sketch after this list).
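A minimal sketch of this per-chunk encoding follows, using a plain mean (target) encoding with pandas; the column names are hypothetical and any smoothing used in the actual implementation is omitted.

import pandas as pd

def target_encode_chunk(chunk: pd.DataFrame, cat_col: str, target: str = "Missed"):
    """Compute the encoding from this chunk only, never from the full
    training set, so the encoder sees less of the data it is fitted on."""
    means = chunk.groupby(cat_col)[target].mean()
    prior = chunk[target].mean()  # fallback for categories unseen in the chunk
    encoded = chunk[cat_col].map(means).fillna(prior)
    return encoded, means, prior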

The oversampling technique used in this work was GMM due to the theoretical reasons explained in Section 2.2.1.
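Putting the pieces together, the sketch below implements the procedure of Figure 4.3 under stated assumptions: scikit-learn's GaussianMixture stands in for the GMM oversampler of Section 2.2.1, a shallow decision tree stands in for the simple base classifier (the actual ensemble is described in Section 4.3.2), and predicted probabilities are averaged.

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.tree import DecisionTreeClassifier

def balanced_chunks(X_maj, X_min, c, n_components=8, seed=0):
    """Split the majority class into c chunks, add all minority samples to
    each chunk, and top the chunk up with synthetic minority samples drawn
    from a GMM fitted on the minority class."""
    rng = np.random.default_rng(seed)
    gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(X_min)
    for X_chunk in np.array_split(rng.permutation(X_maj), c):
        # c is chosen so that each chunk is still larger than the minority class.
        n_new = len(X_chunk) - len(X_min)
        X_new, _ = gmm.sample(n_new)
        X = np.vstack([X_chunk, X_min, X_new])
        y = np.concatenate([np.zeros(len(X_chunk)), np.ones(len(X_chunk))])
        yield X, y

def fit_ensemble(X_maj, X_min, c):
    """Train one simple classifier per balanced chunk."""
    return [DecisionTreeClassifier(max_depth=5).fit(X, y)
            for X, y in balanced_chunks(X_maj, X_min, c)]

def predict_proba(models, X):
    """Average the per-model probabilities of the positive (missed) class."""
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)

Note that in this sketch the GMM is fitted once on the full minority class; whether it is fitted globally or per chunk is not specified in this section.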
