
6.5 Mixed Models

6.5.2 Clinical data and Selected Biom24h (ClinBiom24h) Model Selection

ClinBiom24h included all variables whose p-values are shown in bold in tables A.25 and A.27, totalling 43 features. Table 6.26 details how the models performed based on their AUC scores. ClinBiom24h models achieved better results than the Biom24h models and, unlike ClinBiom0h, were also better than the best base clinical models. The best ClinBiom24h model is an SVC with kernel transformation trained on non-augmented data, which achieved med(AUC) = 0.88, nearly a 10% improvement over the best clinical model. This group did not show clear improvements with data augmentation; only LGBM and Linear SVC showed noticeable improvements in their median AUC values. Metric differences among the top-performing models were small, with CIs strongly overlapping and the maximum upper CI bound being reached by seven different models. These results suggest there were too many noisy features in ClinBiom0h for the SVC with kernel transformation to fit properly.
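As an illustration of how such a ranking can be produced, the sketch below (not the thesis's actual pipeline) computes the median, MAD and a simple percentile interval of cross-validated AUCs for an RBF-kernel SVC; the kernel choice, the CV scheme and the interval method are assumptions, and X_clinbiom24h / y_outcome are placeholder names.

```python
# Hedged sketch: summarising a model's cross-validated AUCs as
# median, MAD and a percentile interval, for an RBF-kernel SVC.
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def summarise_auc(model, X, y, n_splits=5, n_repeats=10, seed=0):
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats,
                                 random_state=seed)
    aucs = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    median = np.median(aucs)
    mad = np.median(np.abs(aucs - median))                 # median absolute deviation
    ci_lower, ci_upper = np.percentile(aucs, [2.5, 97.5])  # simple percentile interval
    return median, mad, (ci_lower, ci_upper)

# RBF kernel as one plausible "kernel transformation"; scaling matters for SVMs,
# and the roc_auc scorer uses SVC's decision_function.
svc = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
# med, mad, ci = summarise_auc(svc, X_clinbiom24h, y_outcome)
```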

TABLE 6.26: Comparison of the best ClinBiom24h tabular models against augmented models and clinical dataset models.

Model                                   mean rank  median  MAD   CI lower  CI upper  effect size  magnitude
Random Forests (Base)                        6.10    0.77  0.08      0.71      0.86         0.00  negligible
XGBClassifier (ClinBiom 24h)                 6.19    0.79  0.08      0.71      0.88        -0.16  negligible
XGBoost (Base)                               6.23    0.80  0.08      0.71      0.86        -0.22  small
LGBMClassifier (ClinBiom 24h)                6.39    0.81  0.10      0.74      0.88        -0.30  small
XGBClassifier (ClinBiom 24h AUG)             6.46    0.79  0.12      0.71      0.92        -0.13  negligible
LogisticRegression (ClinBiom 24h)            7.04    0.82  0.10      0.71      0.92        -0.33  small
Logistic Regression (Base)                   7.20    0.80  0.08      0.74      0.88        -0.22  small
LogisticRegression (ClinBiom 24h AUG)        7.24    0.81  0.10      0.75      0.92        -0.29  small
LinearSVC (ClinBiom 24h AUG)                 7.31    0.81  0.11      0.75      0.92        -0.28  small
SVC (ClinBiom 24h AUG)                       7.51    0.83  0.08      0.75      0.88        -0.49  small
LinearSVC (ClinBiom 24h)                     7.58    0.83  0.08      0.75      0.92        -0.49  small
LGBMClassifier (ClinBiom 24h AUG)            7.61    0.83  0.08      0.75      0.92        -0.49  small
SVC (ClinBiom 24h)                           8.14    0.88  0.08      0.79      0.92        -0.83  large

These new models were stat. sig. different from the two base models, base RF and XGBM. Although metrics improved considerably, there is still no stat. sig. difference from base LR, the overall best clinical model, as shown in figure A.23.
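The exact statistical procedure behind these comparisons is detailed elsewhere in the thesis; purely as a hedged illustration, the sketch below shows one common way to compare two models' per-fold AUCs with a Wilcoxon signed-rank test and to grade an effect size with Cliff's delta, whose conventional thresholds roughly match the negligible/small/large labels in table 6.26.

```python
# Hedged sketch: pairwise model comparison on per-fold AUCs.
# Not necessarily the thesis's exact test or effect-size definition.
import numpy as np
from scipy.stats import wilcoxon

def cliffs_delta(a, b):
    """Cliff's delta: P(a > b) - P(a < b) over all pairs of fold scores."""
    a, b = np.asarray(a), np.asarray(b)
    greater = (a[:, None] > b[None, :]).sum()
    smaller = (a[:, None] < b[None, :]).sum()
    return (greater - smaller) / (len(a) * len(b))

def magnitude(delta):
    d = abs(delta)  # common thresholds for Cliff's delta magnitudes
    return ("negligible" if d < 0.147 else
            "small" if d < 0.33 else
            "medium" if d < 0.474 else "large")

# aucs_svc, aucs_rf: arrays of per-fold AUCs for two models on the same folds
# stat, p = wilcoxon(aucs_svc, aucs_rf)
# delta = cliffs_delta(aucs_svc, aucs_rf)
# print(p, delta, magnitude(delta))
```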

Figures 6.15 and 6.16 show ClinBiom24h models' ROCs on non-augmented and augmented test data, respectively. All models but the Linear SVC trained without data augmentation dropped AUC scores on the test set, but only the SVC with kernel transformation shows clear signs of overfitting, considering its values dropped below their validation CIs, AUC_{SVC, val} = [0.79, 0.92] and AUC_{SVC AUG, val} = [0.75, 0.88], with AUC_{SVC, test} = 0.74 and AUC_{SVC AUG, test} = 0.70.
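For clarity, the overfitting criterion used above amounts to checking whether the test AUC falls below the lower bound of the validation CI; a trivial sketch of that check:

```python
# Minimal sketch of the overfitting flag described above.
def flags_overfitting(test_auc, val_ci):
    lower, _ = val_ci
    return test_auc < lower

# flags_overfitting(0.74, (0.79, 0.92))  # True for the non-augmented SVC
# flags_overfitting(0.70, (0.75, 0.88))  # True for the augmented SVC
```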

6.5.3 Clinical models with Hemispheric Contrast (HC) Imaging Biomarker (IM) (ClinCA0h) and ClinCA0h with Feature Selection (ClinCA0h FS)

Most clinical models benefited from the addition of the absolute HC, but statistical comparison showed no improvement over the best baseline LR, figure A.30.

FIGURE 6.15: ROCs comparing ClinBiom24h models.

FIGURE 6.16: ROCs comparing ClinBiom24h models trained on augmented data.

While conducting HP analysis, figure 6.4, the absolute HC appeared as the fifth most important variable by its ANOVA F-value, preceded only by mRS before event, NIHSS at admission to hospital, history of heart failure, and whether the AIS was detected while waking up. The best median over all models' cross-validated AUCs was achieved by adding previous intake of anti-hypertensors and whether the AIS' aetiology was prothrombotic, which also allowed LGBM to achieve med(AUC) = 0.86 ± 0.08. The highest LR AUC was achieved with 11 variables, med(AUC) = 0.83 ± 0.08, adding patient's age, history of chronic renal disease, AIS and previous intake of medication for diabetes.
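A minimal sketch of how such an ANOVA F-value ranking can be obtained with scikit-learn follows; the dataframe and column names are placeholders, not the thesis's actual variables.

```python
# Hedged sketch: ranking features by ANOVA F-value (and unadjusted p-value).
import pandas as pd
from sklearn.feature_selection import f_classif

def rank_by_anova_f(X: pd.DataFrame, y):
    f_values, p_values = f_classif(X, y)
    return (pd.DataFrame({"feature": X.columns, "F": f_values, "p": p_values})
              .sort_values("F", ascending=False))

# rank_by_anova_f(X_clinical_with_hc, y_outcome).head(5)
# would be expected to place mRS before event, NIHSS at admission, heart
# failure history, AIS at wake-up and the absolute HC near the top.
```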

The F1-score's regularizing effect while conducting GS keeps the model peaking at AUC = 0.80, with no model surpassing LR on the clinical ds. after renewed GS. As such, the original FS model is selected due to its better final metrics, table A.17, although those better metrics are not stat. sig. different from the best previous models (A.20). It should be noted that both SVCs improved the most with FS, showing they were assigning too much weight to irrelevant features.
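As a hedged illustration of scoring a grid search on F1 while still tracking AUC (the estimator, grid and CV scheme below are assumptions, not the thesis's exact GS configuration), consider:

```python
# Hedged sketch: multi-metric grid search refit on F1, with AUC also recorded.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

param_grid = {"C": [0.01, 0.1, 1, 10], "penalty": ["l1"], "solver": ["liblinear"]}
gs = GridSearchCV(LogisticRegression(max_iter=1000),
                  param_grid,
                  scoring={"f1": "f1", "auc": "roc_auc"},
                  refit="f1",                      # hyperparameters chosen by F1 ...
                  cv=StratifiedKFold(n_splits=5))
# gs.fit(X_clinical, y_outcome)
# gs.cv_results_["mean_test_auc"]                  # ... while AUC is still tracked
```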

6.5.4 ClinBiom24h with Feature Selection (FS) and Hemispheric Contrast (HC)

ClinBiom24h FS did not improve either the central tendency or the dispersion metrics over the base ClinBiom24h, med(AUC_{ClinBiom24h FS}) = 0.87 ± 0.09, although this LGBM achieves it with only three variables: mRS before event, NIHSS, and patient's age. Adding HC at admission (ClinBiom24hCA0h FS) does not improve the best model either, since the absolute HC is only the seventh variable to be selected, therefore creating a model with more requirements, and because metrics further decrease, med(AUC_{ClinBiom24hCA0h FS}) = 0.83, as can be seen in table A.18. Despite lower metrics, these models are not stat. sig. different from the best model found so far for post-thrombectomy data, an SVC with kernel transformation trained on ClinBiom24h, as seen in figure A.21.

Discussion

7.1 Clinical Models

The variables most correlated with the outcome were the clinical scores for patients' functional evaluation: mRS before event and NIHSS at admission. This makes sense, since incoming patients with previous AIS damage have a worse baseline to recover from, and any subsequent damage stacks up. While conducting FS, it was possible to verify that FWER corrections were too aggressive on these ds., since models continued to improve peak scores up to 16 variables (dummy classes included), which suggests the extra information from those variables is relevant to the outcome and therefore correlated with it in some way, with ANOVA F-values sorting variables in a way similar to unadjusted p-values.
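The FWER point can be made concrete with a small sketch comparing unadjusted univariate p-values against Holm- or Bonferroni-adjusted ones; the screening test and variable handling here are illustrative assumptions, not the thesis's exact FS procedure.

```python
# Hedged sketch: per-feature univariate p-values, with and without FWER control.
from scipy.stats import f_oneway
from statsmodels.stats.multitest import multipletests

def univariate_screen(X, y, alpha=0.05, method="holm"):
    p_raw = [f_oneway(X.loc[y == 0, col], X.loc[y == 1, col]).pvalue
             for col in X.columns]
    reject, p_adj, _, _ = multipletests(p_raw, alpha=alpha, method=method)
    # Aggressive FWER control can discard features whose unadjusted p-values
    # (and ANOVA F-values) still rank them as informative for the outcome.
    return p_raw, p_adj, reject
```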

Although neurology's motto is "Time is brain", and the ds. contained detailed timing information from incident to thrombectomy, time differences were not successful indicators, mostly due to two factors: differences in patients' specific tolerance to ischaemia, and the bias created by imputing event times for cases with symptoms upon waking up. A related boolean variable, AIS at wake-up (WkUp), was among the most relevant variables.

Clinical models provide a baseline for modelling, considering the ds. is composed of information acquired once the patient arrives at the hospital, much of which can be automatically obtained from the patient's health records. Baseline models trained on this ds. had good predictive capacity, with LR dominating the first modelling phase. This can be due to the high bias of this modelling strategy, which conforms to linear DBs, its ℓ1 regularization technique, and its ability to deal moderately well with collinear variables. Initial FS was minimal, to allow evaluating modelling strategies on their ability to deal with collinearity, expecting models that integrate good FS methods, such as tree-based models, to differentiate themselves positively. This strategy was sub-optimal, considering HP improved metrics while reducing model complexity and training time. Multicollinearity is a particular problem for some modelling strategies, particularly NNs, LDA and SVM, which are sensitive to this phenomenon; those strategies could have had better results if multicollinear variables had been handled more strictly beforehand.
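One common stricter handling of multicollinearity is iteratively dropping variables with a high variance inflation factor (VIF); the sketch below is illustrative only, with an assumed threshold of 10.

```python
# Hedged sketch: iterative VIF-based pruning of collinear variables.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def drop_high_vif(X: pd.DataFrame, threshold=10.0):
    X = X.copy()
    while True:
        Xc = sm.add_constant(X)  # intercept column so VIFs are well defined
        vifs = pd.Series(
            [variance_inflation_factor(Xc.values, i) for i in range(1, Xc.shape[1])],
            index=X.columns)
        if vifs.max() <= threshold:
            return X
        X = X.drop(columns=[vifs.idxmax()])  # remove the most collinear variable
```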

Minority class balancing and data augmentation were not effective on the best model trained on this ds., the hyperparametrized LR, but data augmentation did help XGBM and LGBM come closer to the original LR, which suggests the regularizing effect of data augmentation was more effective in models with high variance, probably because they were overfitting noisy features.
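The thesis's exact balancing and augmentation techniques are not restated in this excerpt; purely as an example of how minority oversampling is typically applied inside a CV-safe pipeline, a SMOTE-based sketch follows.

```python
# Hedged example only: SMOTE-style minority oversampling in a pipeline so that
# synthetic samples are generated from training folds only. The thesis's actual
# augmentation method may differ.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from lightgbm import LGBMClassifier

aug_lgbm = Pipeline([
    ("smote", SMOTE(random_state=0)),  # applied only when fitting, not at predict time
    ("model", LGBMClassifier()),
])
# cross_val_score(aug_lgbm, X_clinical, y_outcome, scoring="roc_auc", cv=5)
```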