Parkinson’s disease detection using voice recordings and Deep Learning

The detection and diagnosis of Parkinson's during its early stages are considered crucial for the progression and management of the disease. The dataset contains extracted features from voice recordings collected from 252 subjects (188 PD – 64 control), with each subject providing three examples of the sustained phonation of the vowel /a/.

Parkinson's disease

Introduction

Epidemiological data

Pathophysiology of the disease

Diagnosis

Treatment and monitoring

Deep Learning

Introduction

TabNet

Introduction
Network architecture
Feature processing
Selection of salient features

Interpretability of the model

Decoder architecture

Materials and methods

Data

Training procedure

Extracting the model's predictions

Results

Classification using TabNet

Best classification results
Hyperparameter selection
TabNet's initialization method
Feature reduction

Interpretability of TabNet

Discussion

They used multiple classifiers to assess the ability to discriminate healthy individuals from patients with Parkinson's disease, achieving an accuracy of 97.62% with the single nearest neighbour (1NN) classifier. Olivares et al. used the Bat Algorithm in Extreme Learning Machine (BA-ELM) method, which yielded the highest prediction accuracy. Another study uses SMOTE (Synthetic Minority Over-sampling Technique) and a Random Forest classifier, which achieves a classification accuracy of 94.9%. ... to 94%.[31] Finally, the literature also reports classification based on a subset of features selected with Wrappers, which reached an accuracy of 94.7% with the SVM (RBF) classifier.[32]

Summary and future work

Summary

Future work

According to studies, aging is the main cause of these disorders, with the main scientifically proven factors being damage to the nuclear DNA of neurons, increased oxidative stress and chronic neuroinflammation.[1] Parkinson's disease (PD) was first described by James Parkinson in 1817 in his publication An Essay on the Shaking Palsy. PD is a progressive neurodegenerative disease caused by the necrosis of dopaminergic neurons in the substantia nigra, the region of the brain responsible for synthesizing the neurotransmitter dopamine.[1, 2] It is the second most common neurodegenerative disease in the elderly, after Alzheimer's disease, affecting men and women of all races and social classes.[1, 3]

Epidemiology

In most populations, Parkinson's disease is twice as common in men as in women, with a few exceptions, including one study from Japan in which no difference or a small excess in women was observed. The protective effect of female sex hormones, a sex-associated genetic mechanism, or sex-specific differences in exposure to environmental risk factors may explain the male predominance observed in PD. Hereditary forms of Parkinson's disease are estimated to account for only 5-10% of all cases.[2, 5] Interactions between environment and genes modify the risk of sporadic PD.

The incidence of Parkinson's disease is greatly increased in individuals exposed to certain environmental factors, such as pesticides and traumatic brain injury, but is lower in smokers and caffeine consumers.[2, 4, 5] Higher dietary intake of dairy products is also associated with a higher risk of Parkinson's disease, which may be due to the concentration of toxic substances in milk.[5] Conversely, physical activity and a healthy diet with high amounts of fruits, vegetables, and whole grains are associated with a lower risk of PD.[5]

Pathophysiology

Diagnosis

Vocal disorder based diagnosis

Management

AI is one of the oldest fields in computer science.[10] By definition, artificial intelligence (AI) is "a field of science and engineering concerned with the computational understanding of what is commonly called intelligent behavior, and with the creation of artifacts that exhibit such behavior".[11, 12] Alan Turing is considered one of the founders of modern computer science and AI, as he defined intelligent behavior in a computer as the ability to achieve human-level performance in cognitive tasks, a criterion later referred to as the "Turing Test".[12] This fact has led to a growing need for the use of analytical instruments in medicine.[12] Researchers have been investigating the potential applications of AI in all fields of medicine for more than 50 years.[12] Medical artificial intelligence is used to develop AI programs that help clinicians formulate a diagnosis, make therapeutic decisions and predict an outcome.[12] These systems, including Artificial Neural Networks (ANNs), fuzzy expert systems, evolutionary computation and hybrid intelligent systems, aim to support healthcare workers in everyday tasks related to the manipulation of data and knowledge.[12] Gunn was the first to investigate the application of AI in surgery. He specifically attempted to diagnose acute abdominal pain with computer analysis.[12] Furthermore, more than 100 studies have been published since the outbreak of the SARS-CoV-2 pandemic using Artificial Intelligence to screen for, diagnose and predict the course of the coronavirus disease.

Artificial Neural Networks

ANNs are built from networks of highly interconnected processing units, known as "neurons".[12, 13] Neurons can perform parallel computations related to data processing and knowledge representation.[12] These processing units are organized into input, hidden and output layers, and each neuron is connected to the neurons in adjacent layers via a corresponding weight value.[13] Each neuron sums its weighted inputs, and the sum is then transformed as it passes through a chosen activation function, which is usually a sigmoid, hyperbolic tangent (tanh), or rectified linear unit (ReLU) function.[13] The aforementioned functions are easily differentiable. This property makes them well suited to computing the partial derivatives of the error delta with respect to individual weights.[13] Activation functions transform the input value into a narrow output range, usually [0, 1] or [-1, 1]. The output of each activation function is then sent as input to the subsequent neuron in the following layer.[13] The output of the last layer (output layer) is considered the solution to the initial problem.[13] The capabilities of ANNs, such as learning from historical examples, analyzing nonlinear data, generalizing, classifying, and recognizing patterns, have made them an attractive analytical tool in medicine.[12] ANNs have been successfully applied in clinical diagnosis, image analysis in radiology and histopathology, data interpretation in intensive care settings, diagnosis of carotid atherosclerosis, and waveform analysis.
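As a rough illustration of the computation described above, the following minimal Python sketch passes one sample through a tiny fully connected network. The layer sizes, weights and inputs are hypothetical values chosen only for illustration, and the bias terms are an assumption, since the text does not mention them.

```python
import math
from typing import Callable, List, Sequence


def sigmoid(x: float) -> float:
    """Squash x into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x: float) -> float:
    """Rectified linear unit: pass positive values, clip negatives to 0."""
    return max(0.0, x)


def neuron(inputs: Sequence[float], weights: Sequence[float], bias: float,
           activation: Callable[[float], float]) -> float:
    """Weighted sum of the inputs, transformed by the activation function."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)


def layer(inputs: Sequence[float], weight_rows: Sequence[Sequence[float]],
          biases: Sequence[float],
          activation: Callable[[float], float]) -> List[float]:
    """One layer: every neuron sees all outputs of the previous layer."""
    return [neuron(inputs, w, b, activation)
            for w, b in zip(weight_rows, biases)]


# Hypothetical toy network: 2 inputs -> 2 hidden neurons (tanh) -> 1 output (sigmoid).
sample = [0.5, -1.0]
hidden = layer(sample, [[0.1, 0.4], [-0.3, 0.2]], [0.0, 0.1], math.tanh)
output = layer(hidden, [[0.7, -0.5]], [0.05], sigmoid)
print(hidden, output)
```

The hidden activations stay within (-1, 1) because of tanh, and the single sigmoid output lies in (0, 1), so it can be read as a score for the positive class.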

Deep Neural Networks

In recent times, deep learning has been among the most significant developments in computer science.[13] Deep learning has already surpassed human-level capabilities and performance in many areas, including predicting movie ratings, decisions about approving loan applications, car delivery time, etc.[13] It also seems to have the ability to improve people's lives, providing more accurate diagnosis of diseases such as cancer, the discovery of new drugs or the prediction of natural disasters.[13]

TabNet

TabNet Encoder architecture

Feature processing
Feature selection

Each decision step takes the same $B \times D$ feature matrix as input, where $B$ is the batch size and $D$ is the feature dimension. The output of the Feature transformer is split into two parts. The first part, $\mathbf{d}[i] \in \mathbb{R}^{B \times n_d}$, is passed through a ReLU (or other activation function) and then given as the output of the current decision step.
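The split performed inside a decision step can be sketched as below. This is a simplified illustration rather than the Pytorch-TabNet implementation: it assumes, following the original TabNet paper, that each row of the Feature transformer output has width n_d + n_a, that the first n_d values form d[i], and that the remaining n_a values form a[i], which is passed on to the next step. The function name and the toy matrix are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def split_decision_step(transformed: Matrix, n_d: int) -> Tuple[Matrix, Matrix]:
    """Split a (B x (n_d + n_a)) matrix into the step output and a[i].

    Returns (ReLU(d[i]), a[i]); a[i] is assumed to feed the next step.
    """
    d_part = [row[:n_d] for row in transformed]
    a_part = [row[n_d:] for row in transformed]
    step_output = [[max(0.0, v) for v in row] for row in d_part]
    return step_output, a_part


# Hypothetical Feature transformer output for B = 2, n_d = 2, n_a = 2.
transformed = [[-1.0, 2.0, 0.5, -0.5],
               [3.0, -4.0, 1.0, 1.5]]
step_output, a_next = split_decision_step(transformed, n_d=2)
print(step_output)  # ReLU applied to the first n_d columns
print(a_next)       # remaining n_a columns, left unchanged
```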

As expected, all the fully connected layers of the Feature transformer except the first have $(n_d + n_a)$ inputs and $2(n_d + n_a)$ outputs.[19] The output of the $h_i$ layer is multiplied by the prior scale, $\mathbf{P}[i-1]$, from the previous decision step. The prior scale represents how much each feature has been used in the previous steps.
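The role of the prior scale can be sketched for a single sample as follows. The sketch assumes the formulation of the original TabNet paper, in which the mask is M[i] = sparsemax(P[i-1] · h_i(a[i-1])) and the prior scale is updated as P[i] = P[i-1] · (γ - M[i]). The relaxation parameter gamma, the function names and the toy scores are assumptions for illustration, not the Pytorch-TabNet code.

```python
from typing import List, Sequence


def sparsemax(z: Sequence[float]) -> List[float]:
    """Project z onto the probability simplex; unlike softmax, entries can be exactly 0."""
    tau = 0.0
    cumulative = 0.0
    for k, value in enumerate(sorted(z, reverse=True), start=1):
        cumulative += value
        if 1.0 + k * value > cumulative:  # value is still in the support
            tau = (cumulative - 1.0) / k
    return [max(x - tau, 0.0) for x in z]


def attentive_step(h_out: Sequence[float], prior: Sequence[float], gamma: float):
    """Compute the feature mask for one step and the updated prior scale."""
    mask = sparsemax([p * h for p, h in zip(prior, h_out)])
    new_prior = [p * (gamma - m) for p, m in zip(prior, mask)]
    return mask, new_prior


# Hypothetical h_i outputs for D = 3 features; the prior starts at 1 for every feature.
h_out = [1.0, 2.0, 3.0]
prior0 = [1.0, 1.0, 1.0]
mask1, prior1 = attentive_step(h_out, prior0, gamma=1.0)
mask2, prior2 = attentive_step(h_out, prior1, gamma=1.0)
print(mask1, prior1)
print(mask2, prior2)
```

With gamma = 1.0, a feature that was fully selected in one step has its prior scale set to zero and cannot be selected again, so the second step moves to the next strongest feature. Larger values of gamma would allow features to be reused across steps.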

Interpretability of the model

TabNet Decoder architecture

The model was trained with data available in the UCI Machine Learning Repository at the University of California, Irvine. The dataset consists of features extracted from voice recordings of 188 patients with Parkinson's disease (107 men and 81 women) aged 33 years and older, collected at the Department of Neurology of Cerrahpaşa Faculty of Medicine, Istanbul University. Each subject was first examined by a clinician, and then the sustained phonation of the vowel /a/ was recorded with three repetitions. All subjects participating in the study were informed about the data collection process, signed informed consent, and took part voluntarily, in accordance with the approval of the Clinical Research Ethics Committee of Bahcesehir University.[3]

As shown in Table 4.1, the dataset includes baseline features, time-frequency features, Mel-frequency cepstral coefficients (MFCCs), vocal fold features and additional features extracted with the tunable Q-factor wavelet transform (TQWT) related to the fundamental frequency, proposed by Sakar et al.

Training method

Model

We set the number of these epochs to twice the early-stopping patience.
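The relation between the epoch budget and the patience can be sketched as follows, assuming that "these epochs" refers to the maximum number of training epochs and that early stopping monitors a validation loss. Both are assumptions, and the loss values below are made up for illustration rather than taken from the experiments.

```python
from typing import Sequence, Tuple


def run_with_early_stopping(val_losses: Sequence[float],
                            patience: int) -> Tuple[int, int]:
    """Simulate training with an epoch budget of 2 * patience.

    val_losses stands in for the validation loss measured after each epoch.
    Returns (index of the best epoch, number of epochs actually run).
    """
    max_epochs = 2 * patience  # assumed meaning of "double the patience"
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    epochs_run = 0
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        epochs_run = epoch + 1
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_epoch, epochs_run


# Hypothetical validation losses.
stalled = run_with_early_stopping([0.9, 0.7, 0.8, 0.75, 0.72, 0.71, 0.6], patience=3)
improving = run_with_early_stopping([1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3], patience=3)
print(stalled, improving)
```

In the first run training stops after three epochs without improvement; in the second the loss keeps improving, so training ends only when the budget of 2 × patience epochs is used up.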

Hyperparameter tuning

Classification predictions

Evaluation metrics

Best results
Hyperparameter tuning
TabNet’s initialization method
Feature reduction

More precisely, the model was set up with the selected hyperparameters (Table 4.2) and a validation set size equal to 10% of the total training samples of each ensemble classifier. Figures 5.4 and 5.5 show the change in the average performance of the model in the indicative cases of a single-step and a three-step architecture. The model's performance was tested with validation set sizes equal to 10%, 20% and 30% of the total training samples of each ensemble classifier.

For each combination of hyperparameters in the grid search, we repeated LOSO cross-validation training four or more times to obtain a more general evaluation of TabNet's performance. The results show that the validation set affects the classification ability of the model, given that TabNet's results are fully reproducible for a specific seed. In the last part of this section, we investigated how the initial number of features from which TabNet selects the salient ones affects classification.
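The LOSO (Leave-One-Subject-Out) procedure can be sketched as a generator of train/test index sets in which all recordings of one subject are held out together. This is a minimal illustration with hypothetical subject IDs, not the code used in the experiments.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_subject_out(
    subject_ids: Sequence[int],
) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (held-out subject, train indices, test indices) for every subject.

    All samples of the held-out subject go to the test set, so no subject
    contributes recordings to both sides of a split.
    """
    for held_out in sorted(set(subject_ids)):
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical layout: 3 subjects with 3 recordings each.
subject_ids = [0, 0, 0, 1, 1, 1, 2, 2, 2]
folds = list(leave_one_subject_out(subject_ids))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

With 252 subjects and three recordings per subject, this scheme would produce 252 folds, each with three test samples.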

Interpretability of the model

There have been numerous studies in the literature (Table 6.1) on PD classification based on features extracted from voice recordings. This is because each subject has multiple recordings, and the training and test sets used to train the models included sound samples of the same subject. Separating the subjects into distinct (train/test) groups using the Leave-One-Subject-Out cross-validation technique results in a dramatic drop in evaluation metrics.[3, 33] According to our results, TabNet outperforms the classifiers in the literature, with average and maximum per-sample accuracy reaching 94.54% and 95.24%, respectively, when using all available features in the dataset [3] and LOSO cross-validation.

In the case of predictions for each subject, the average and maximum observed results were 95.93% and 97.22%. Furthermore, TabNet internally selects the best features for the classification of each subject, achieving interpretability (global and local), in contrast to the studies in the literature. Pytorch-TabNet, the selected DNN model, outperformed the other classifiers in the literature using the leave-one-subject-out cross-validation technique.
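One simple way to move from per-sample to per-subject predictions is a majority vote over the recordings of each subject. The text does not state which aggregation rule was used, so the rule, the 0.5 threshold, the function name and the probabilities below are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Sequence


def subject_predictions(subject_ids: Sequence[int],
                        sample_probs: Sequence[float],
                        threshold: float = 0.5) -> Dict[int, int]:
    """Aggregate per-sample PD probabilities into one label per subject.

    A subject is labelled 1 (PD) when more than half of its recordings
    are classified as PD, and 0 (control) otherwise.
    """
    votes = defaultdict(list)
    for sid, prob in zip(subject_ids, sample_probs):
        votes[sid].append(1 if prob >= threshold else 0)
    return {sid: int(2 * sum(v) > len(v)) for sid, v in votes.items()}


# Hypothetical probabilities for two subjects with three recordings each.
ids = [100, 100, 100, 251, 251, 251]
probs = [0.9, 0.4, 0.8, 0.2, 0.6, 0.1]
per_subject = subject_predictions(ids, probs)
print(per_subject)
```

With three recordings per subject a tie cannot occur, and a single misclassified recording does not change the subject-level label.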

Future work

This thesis explored the potential of using deep learning in the classification of Parkinson's disease based on features extracted from voice recordings.

“A Comparative Analysis of Speech Signal Processing Algorithms for Parkinson's Disease Classification and the Use of the Tunable Q-Factor Wavelet Transform”.
“Automatic detection of early-stage Parkinson's disease as a pre-diagnosis tool using classifiers and a small set of vocal features”.
“A novel framework of two successive feature selection levels using weight-based procedure for voice loss detection in Parkinson's disease”.
“Simple logistics hybrid system based on greedy stepwise feature analysis algorithm to diagnose Parkinson's disease based on gender”.

Artificial neural network architecture.[43]

TabNet Encoder architecture

Topological structure of the Feature transformer block

Topological structure of the Attentive transformer block

TabNet Decoder architecture

Best results obtained with TabNet in the cases of Parkinson’s disease

Confusion matrix that presents the correct and incorrect classification

TabNet’s classification results with single-step architecture and differ-

TabNet’s classification results with three-step architecture and different

TabNet’s classification box plots with single-step architecture and dif-

TabNet’s classification box plots with three-step architecture and dif-

Box plots of TabNet’s predictions for the test samples with constant and

Average TabNet’s evaluation metrics for classification of individual

Average TabNet’s evaluation metrics for classification of subjects during

Masks for the samples of a subject with Parkinson’s disease (ID:100)

Explain matrices for the samples of a subject with Parkinson’s disease

Masks for the samples of a subject from the control group (ID:251)

Explain matrices for the samples of a subject from the control group

Selected values for TabNet's hyperparameters

Results reported in the literature for the classi-

Extracted features from spectrograms of speech signals.[3, 32]

Selected hyperparameters and training parameters for TabNet

Evaluation metrics.[44]

Parkinson’s disease classification results in the literature. All the