Diagnosis and Prognosis of Occupational Disorders Based on Machine Learning Techniques Applied to Occupational Profiles


PHYSICS

DIAGNOSIS AND PROGNOSIS OF OCCUPATIONAL DISORDERS BASED ON MACHINE LEARNING TECHNIQUES APPLIED TO OCCUPATIONAL PROFILES

NAFISEH MOLLAEI

Master of Science in Electrical Engineering

DOCTORATE IN BIOMEDICAL ENGINEERING

NOVA University Lisbon


Adviser: Hugo Filipe Silveira Gamboa

Associate Professor with Habilitation, NOVA School of Science and Technology

Examination Committee:

Chair: José Paulo Moreira dos Santos

Full Professor, NOVA School of Science and Technology

Rapporteurs: Miguel Ângelo Fernandes Carvalho

Assistant Professor, Universidade do Minho

André Valério Raposo Carreiro

Researcher, Associação Fraunhofer Portugal Research

Adviser: Hugo Filipe Silveira Gamboa

Associate Professor, NOVA University Lisbon

Members: Miguel Ângelo Fernandes Carvalho

Assistant Professor, Universidade do Minho

Liliana Maria da Silva Cunha

Assistant Professor, Faculdade de Psicologia e de Ciências da Educação da Universidade do Porto

José Paulo Moreira dos Santos

Full Professor, NOVA School of Science and Technology

André Valério Raposo Carreiro

Researcher, Associação Fraunhofer Portugal Research


The NOVA School of Science and Technology and the NOVA University Lisbon have the right, perpetual and without geographical boundaries, to file and publish this dissertation through printed copies reproduced on paper or on digital form, or by any other means known or that may be invented, and to disseminate through scientific repositories and admit its copying and distribution for non-commercial, educational or research purposes, as long as credit is given to the author and editor.


This thesis document is the final result of a long period of work. This PhD would not have been possible without the contribution and collaboration of many remarkable people who allowed me to grow professionally and personally. I would also like to thank my supervisor, Hugo Gamboa, for his help, advice, and mentoring throughout my graduate career.

Without his support and guidance, none of the work presented in this thesis would have been possible. Moreover, I am deeply indebted to him for showing me how to do research successfully and communicate the results effectively. To Carlos Fujao, for welcoming me at Volkswagen Autoeuropa and introducing me to an industrial environment and to other ergonomists. Even though he comes from a totally different field, he was always available to hear my ideas on ergonomic topics and interested in understanding how the engineering part of this thesis worked. Carlos was also responsible for preparing the studies involved, which made the large data set of this thesis possible.

He also helped me improve my writing skills by revising the academic documents I prepared. My deepest gratitude also goes to all the staff of the Physics department of the Faculty of Science and Technology (FCT-UNL). I have also had the great opportunity to meet many encouraging and kind friends during my PhD studies. Thanks to Volkswagen Autoeuropa and to the Portuguese National Funding Agency FCT for the financial support.

I would like to thank the whole Multimorbidity and Complexity Research Group for the experience of working in such a multidisciplinary team, with whom I could participate in interesting collaborative projects. Thanks to Joao and Catia, who guided the direction of the studies. This PhD also gave me a friend for life. Even from another country and a different field, I am grateful to share the life experience of doing a PhD. Speaking of friends, I cannot forget the Libphys team. They were, for sure, a huge contribution to this whole process. On bad days, it was certainly a good reason to go to the lab and find David, Daniel, Ricardo and, more recently, Philipe, Luis and Mariana. They were always available to hear my problems, to inspire me and to brainstorm at the whiteboard to discuss new ideas and find solutions. Thanks for all the collaborations, your support, patience and friendship. Behind the wall but not forgotten, a special thanks to Ana Rita for her persuasion and support. My biggest thank you goes to my parents, sister and


am today. Thanks for teaching me that everything is possible and that there is a solution for everything in life. Thanks for your help and support in every part of my life and for always believing in me.


discover it in himself.” (Galileo)


Work-related disorders have a global influence on people’s well-being and quality of life and are a financial burden for organizations because they reduce productivity, increase absenteeism, and promote early retirement. Work-related musculoskeletal disorders, in particular, represent a significant fraction of the total in all occupational contexts. In automotive and industrial settings where workers are exposed to work-related musculoskeletal disorder risk factors, occupational physicians are responsible for monitoring workers’ health protection profiles. Occupational technicians refer to the Occupational Health Protection Profiles database to understand which exposure to work-related musculoskeletal disorder risk factors should be ensured for a given worker. In these databases, the occupational physician states which exposure they consider necessary to ensure the worker’s health protection in terms of functional work ability. The application of Human-Centered explainable artificial intelligence can support decision making by going from a worker’s Functional Work Ability to explanations, integrating explainability into medical restrictions and supporting two decision contexts: the prognosis and the diagnosis of individual, work-related and organizational risk conditions. Although previous machine learning approaches provided good predictions, their application in an actual occupational setting is limited because their predictions are difficult to interpret and hence not actionable.

In this thesis, injured body parts in which the ability changed in a worker’s functional work ability status are targeted. On the one hand, artificial intelligence algorithms can help technical teams, occupational physicians, and ergonomists determine a worker’s workplace risk via the diagnosis and prognosis of body part injuries; on the other hand, these approaches can help prevent work-related musculoskeletal disorders by identifying which processes are lacking in working condition improvement and which workplaces have a better match between the remaining functional work abilities. A sample of 2025 Occupational Health Protection Profiles for the prognosis part (from 2019 to 2020) and 7857 for the diagnosis part was used, based on Functional Work Ability textual reports in the Portuguese language from an automotive industry factory. Machine learning-based Natural Language Processing methods were implemented to extract standardized information. The prognosis and diagnosis of Occupational Health Protection Profiles factors were developed within a reliable and trustworthy Human-Centered explainable artificial intelligence system (entitled Industrial microErgo application). The most suitable regression models to predict the next medical appointment for the injured body regions were those based on CatBoost regression, with an R square of 0.84 and an RMSLE of 1.23 weeks. In parallel, for the prediction of the next injured body parts, CatBoost was the best regression model for most body parts according to these two metrics. This information can help technical industrial teams understand potential risk factors for Occupational Health Protection Profiles and identify warning signs of the early stages of musculoskeletal disorders.

Keywords: Natural Language Processing, Functional Work Ability, Occupational Health Protection Profiles, Human-Centered explainable artificial intelligence, Musculoskeletal Disorders, Prognosis, Diagnosis


Work-related disorders have a global influence on people’s well-being and quality of life and are a financial burden for organizations, because they reduce productivity, increase absenteeism, and promote early retirement. Work-related musculoskeletal disorders, in particular, represent a significant fraction of the total in all occupational contexts. In automotive and industrial settings where workers are exposed to risk factors for work-related musculoskeletal disorders, occupational physicians are responsible for monitoring workers’ health protection profiles. Occupational technicians refer to the Occupational Health Protection Profiles database to understand which exposure to risk factors for work-related musculoskeletal disorders should be ensured for a given worker. Occupational Health Protection Profiles databases describe the occupational physician’s statements and which exposures the physicians consider necessary to ensure the worker’s health protection in terms of functional work ability. The application of human-centered explainable artificial intelligence can support decision making in going from the worker’s functional work ability to explanations, integrating explainability into the medical (restriction) and supporting two decision contexts: prognosis and diagnosis of the individual, work-related and organizational risk condition. Although previous machine learning approaches have provided good predictions, their application in a real occupational setting is limited because their predictions are difficult to interpret and therefore not actionable. In this thesis, the injured body parts in which the ability changed in the worker’s functional work ability status are targeted. On the one hand, artificial intelligence algorithms can help technical teams, occupational physicians and ergonomists determine a worker’s workplace risk through the diagnosis and prognosis of body part injuries; on the other hand, these approaches can help prevent work-related musculoskeletal disorders by identifying which processes are lacking in working condition improvement and which workplaces have a better match between the remaining functional work abilities. For this thesis, use was made of


textual Fitness for Work reports in the Portuguese language, from a factory in the automotive industry (Auto Europa). A sample of 2025 files was used for the prognosis part (from 2019 to 2020) and a sample of 7857 files was used for the diagnosis part. Machine learning methods based on Natural Language Processing were implemented to extract standardized information. The prognosis and diagnosis of the Occupational Health Protection Profiles factors were developed in a reliable human-centered explainable artificial intelligence system (entitled Industrial microErgo application). The most suitable regression models for predicting the next medical appointment for the injured body regions were those based on CatBoost regression, with an R square and an RMSLE of 0.84 and 1.23 weeks, respectively. In parallel, for the prediction of the next injured body parts based on these two errors, CatBoost was reported as the best regression model for most body parts. This information can help industrial technical teams understand the potential risk factors for Occupational Health Protection Profiles and identify warning signs of the early stages of musculoskeletal disorders.

Keywords: Natural Language Processing, Functional Work Ability, Workers’ Health Protection Profiles, Human-centered Explainable Artificial Intelligence, Musculoskeletal Diseases, Prognosis, Diagnosis


List of Figures
List of Tables
Acronyms
1 Introduction
1.1 Motivation and Background
1.2 Problem Statement
1.3 Research Objectives
1.4 Thesis Structure
2 Theoretical Concepts
2.1 Introduction
2.2 Artificial Intelligence in Medicine
2.3 Supervised Learning
2.3.1 Ensemble Algorithms
2.3.2 Machine learning and Gradient Boosting Decision Tree
2.3.3 Artificial Neural Network
2.4 Unsupervised Learning
2.5 Reinforcement Learning
2.6 Natural Language Processing
2.6.1 Definitions
2.6.2 Theory, Methods and Tools of NLP
3 Literature Review
3.1 Introduction
3.2 Association Rules Mining in Medicine
3.3 GBDT, ANN and Ensemble Algorithms in Medicine
3.4 Machine Learning-based Natural Language Processing in Medical Database
3.5 Application microErgo for Industrial Context
3.5.1 Introduction
3.5.2 Related Work
3.5.3 Design microErgo for Industrial Context
3.5.4 microErgo contribution in both EAWS and OHPPs parameters
4 Research Methodology
4.1 Introduction
4.2 Research Design
4.3 Proposed Methods
4.3.1 Data Collection
4.3.2 Case Studies
4.4 Setup Implementation
4.4.1 Unsupervised Learning Approach
4.4.2 Supervised Learning Approach
5 Design and construction Algorithms
5.1 Introduction
5.2 Diagnosis Part
5.2.1 Assembly Area
5.2.2 Body Construction Area
5.2.3 Special Projects Area
5.2.4 Quality Assurance Area
5.2.5 Paint Area
5.2.6 Metal Stamping Area
5.3 Prognosis outcomes
5.3.1 Regression Models Result
5.3.2 Second Scenario
6 Industrial microErgo application
6.1 Introduction
6.2 Application of Designed Interface of microErgo
6.3 How to formulate microErgo Reinforcement Learning-based recommendation?
7 Conclusion
7.1 Introduction
7.2 Objectives
7.3 Overall Results
7.4 Contribution
7.4.1 List of Contribution
7.4.2 Journal Articles
7.4.3 Conference Papers
7.4.4 Book Chapter
7.5 Future Work
7.5.1 Research Extension
7.5.2 Future implications
Bibliography

List of Figures

1.1 Steps of whole chapters conceptual vs theoretical framework
2.1 AI applicability in healthcare
2.2 In XGBoost trees, level-wise growth occurs. The blue circles represent an older level of leaves that have already been calculated, while the red circles represent the level of newly added leaves to the tree. The purple circles depict end-nodes, which are connected with parent nodes [8].
2.3 Leaf-wise growth occurs in LightGBM. The blue circles denote older leaves, which have already been explored. The red circles denote the current leaf that is being considered. The leaves that are expanded upon are the ones that have the highest predicted max delta loss. If the new nodes have a lower max delta loss than one of the previous branches, the algorithm backtracks to the earlier leaf that now has the highest max delta loss. The purple circles depict end-nodes, which are connected with parent nodes [8].
2.4 CatBoost builds symmetric trees and is also a level-wise algorithm. This symmetric structure distinguishes it from other Gradient Boosting Decision Trees (GBDT) algorithms.
2.5 Structure of ANN
2.6 Input, hidden and output layers
2.7 The simplest form of Feed Forward Neural Network, where input data travels in one direction to output nodes.
2.8 FFNN Multi-Layer Perceptron.
2.9 Recurrent Neural Networks
2.10 LSTM structure
2.11 Modular Neural Network
2.12 Skip-gram and CBOW models of the word2vec approach. Input and output words (w_i and w_j), vocabulary size of V, word-embedding matrix W and output context matrix W’. Images taken from [168]
2.13 Transformer Encoder Layer with masked embeddings [76].
2.14 BERT input representation [39].
4.1 Human-Centered design (marked with dashed lines) research lies over a broad spectrum, as shown here. The intersection of Artificial Intelligence (AI) Research and Human-Centered Design is the domain we identify as Human-Centered Explainable Artificial Intelligence (HCXAI).
4.2 Outline of the main idea of this research, which uses the mined association rules in the diagnosis of musculoskeletal injuries: (a) injured worker; (b) the occupational physicians examine the worker and, after (c) diagnosing the type of injury, (d) enter information about the worker’s medical status and the medical restrictions relating to the workplace in the worker’s OHPPs; (e) the information in the texts is extracted by NLP techniques and (f) stored in structured databases; (g) association rules mining and regression methods yield the diagnosis and prognosis of body part injuries, which are used in subsequent examinations by occupational physicians for medical recommendations.
4.3 Distribution of data records among the areas.
4.4 Phases of the methodical process used in the investigation.
4.5 >7k OHPPs between 2019 and 2021 in Assembly, Body Construction, Special Projects, Quality Assurance, Metal Stamping and Paint.
4.6 Phases of the methodical process used in the investigation.
4.7 Distribution of records based on week and medical appointment times.
4.8 Statistics of mined association rules from different areas with Supp = 0.01, Conf = 0.1 and Lift > 1.
5.1 Scheme of several Association Rules mined from areas: a) Assembly, b) Body Construction, c) Special Projects, d) Quality Assurance, e) Paint, and f) Metal Stamping (the numbers on each shape correspond to that area’s mined Association Rules table row).
5.2 Learning curves of models for (a) right elbow and (b) left fingers.
5.3 Average body injuries in men for (a) train data and (b) test data.
5.4 Average body injuries in women for (a) train data and (b) test data.
5.5 Average body injuries in seniority of 1 - 10 for (a) train data and (b) test data.
5.6 Average body injuries in seniority of 10 - 20 for (a) train data and (b) test data.
5.7 Average body injuries in seniority over 20 for (a) train data and (b) test data.
5.8 Average body injuries in Special Projects for (a) train data and (b) test data.
5.9 Average body injuries in Assembly for (a) train data and (b) test data.
5.10 Average body injuries in Body Construction for (a) train data and (b) test data.
5.11 Average body injuries in Metal Stamping for (a) train data and (b) test data.
5.12 Average body injuries in Paint for (a) train data and (b) test data.
5.13 Average body injuries in Quality Assurance for (a) train data and (b) test data.
5.14 Decision Tree Regressor (a) learning curve and (b) true vs predicted data plots.
5.15 Random Forest Regressor (a) learning curve and (b) true vs predicted data plots.
5.16 Gradient Boosting Regressor (a) learning curve and (b) true vs predicted data plots.
5.17 Light Gradient Boosting Regressor (a) learning curve and (b) true vs predicted data plots.
5.18 XG Boosting Regressor (a) learning curve and (b) true vs predicted data plots.
5.19 CatBoost Regressor (a) learning curve and (b) true vs predicted data plots.
5.20 Bayesian optimization curve.
5.21 ANN model learning curve.
5.22 Summary plot per body parts for train data.
5.23 Summary plot per body parts for test data.
5.24 Force plot for (a) one record and (b) all records of train data.
5.25 Force plot for (a) one record and (b) all records of test data.
5.26 Bar plot for a sample record of train data.
5.27 Bar plot for 5 sample records of test data.
5.28 Decision plot for a sample record of train data.
5.29 Decision plot for 5 sample records of test data.
5.30 Embedding plot for shoulder_R feature in train data.
5.31 Embedding plot for shoulder_R feature in test data.
6.1 Capacity (work ability) vs Exposure [11]
6.2 Range of the dataset used as input to the framework
6.3 Dataset of framework
6.4 Application of Industrial microErgo.
7.1 Overall view of project

List of Tables

3.1 Summary of the tools of NLP.
3.2 Levels of intervention and risk factors [18]
4.1 Two rows of final tabular structured data.
4.2 Seniority based on M: Male and F: Female and their min, max, mean and SD.
4.3 Scoring of injuries for each body region.
4.4 Statistical description of the injured body parts variables. The first, second and third quartiles are shown as 25%, 50% and 75% respectively.
5.1 Mined association rules from Assembly area.
5.2 Mined association rules from Body Construction area.
5.3 Mined association rules from Special Projects area.
5.4 Mined association rules from Quality Assurance area.
5.5 Mined association rules from Paint area.
5.6 Mined association rules from Metal Stamping area.
5.7 Results of injury severity for the left shoulder.
5.8 Results of injury severity for the right shoulder.
5.9 Results of injury severity for the left elbow.
5.10 Results of injury severity for the right elbow.
5.11 Results of injury severity for the left wrist.
5.12 Results of injury severity for the right wrist.
5.13 Results of injury severity for the left fingers.
5.14 Results of injury severity for the right fingers.
5.15 Results of injury severity for the left knee.
5.16 Results of injury severity for the right knee.
5.17 Results of injury severity for the left foot.
5.18 Results of injury severity for the right foot.
5.19 Results of injury severity for the trunk.
5.20 Results of injury severity for the neck.
5.21 Results of vibration using forbiddance.
5.22 Results of weight lifting forbiddance.
5.23 Best overall model per each body part.
5.24 ML regression models results.
5.25 Hyperparameter tuning result.
5.26 ANN model regression results on the second scenario.

Acronyms

ACTRE Activity Record
ADRs Adverse Drug Reactions
AI Artificial Intelligence
ANN Artificial Neural Networks
Apache cTAKES clinical Text and Knowledge Extraction
ARs Association Rules
AUC Area Under Curve
BERT Bidirectional Encoder Representations from Transformers
BiLSTM-CRF Bi-directional Long Short-Term Memory-Conditional Random Field
BioNER Biomedical Named Entity Recognition
BoW Bag of Words
CANS Complaints of Arm Neck and Shoulder
CART Classification and Regression Tree
CNN Convolutional Neural Network
CRNN Convolutional Recurrent Neural Network
CV Cross-validation
DL Deep Learning
DT Decision Tree
DUNs unified networks
EAWS European Assembly Worksheet
EHRs Electronic Health Records
ELSA Longitudinal Study on Aging
EU-OSHA European Agency for Safety and Health at Work
FDA Food and Drug Administration
FFNN Feed Forward Neural Network
FWA Functional Work Ability
GB Gradient Boosting
GBDT Gradient Boosting Decision Trees
GDPR General Data Protection Regulation
GloVe Global Vector
GMLP Good Machine Learning Practice
GMM Gaussian Mixture Model
GOSS Gradient-based One-Side Sampling
GP General Physician
HCAI Human-Centered Artificial Intelligence
HCXAI Human-Centered Explainable Artificial Intelligence
i2b2 Informatics for Integrating Biology and the Bedside
ICA Independent Component Analysis
iDASH Integrating Data for Analysis, Anonymization, and Sharing
IMDRF International Medical Device Regulators Forum
KNN K-Nearest Neighbors
LG Logistic Regression
LightGBM Light Gradient Boosting Machine
LIME Local Interpretable Model-agnostic Explanations
LRP Layer-wise Relevance Propagation
LSTM Long Short-Term Memory
MAE Mean Absolute Error
MDP Markov decision process
MHRA Medicines and Healthcare Products Regulatory Agency
ML Machine Learning
ML-NLP Machine Learning-based Natural Language Processing
MLP Multilayer Perceptron
MNN Modular Neural Network
MRI Magnetic Resonance Imaging
MRs Medical Restrictions
MSDs Musculoskeletal Disorders
MUEQ Maastricht Upper Extremity Questionnaire
NEARM Natural Language Enhanced Association Rules mining
NIH National Institutes of Health
NLP Natural Language Processing
NSP Next sentence prediction
OCRA OCcupational Repetitive Action
OHPPs Occupational Health Protection Profiles
OiRA Online interactive Risk Assessment
PCA Principal component analysis
POS Part of Speech Tagging
R2 R Square
REBA The Rapid Entire Body Assessment
ReLU Rectified Linear Activation Function
RF Random Forest
RGF Regularized Greedy Forests
RL Reinforcement Learning
RMSLE Root Mean Squared Logarithmic Error
RNN Recurrent Neural Networks
ROSA Rapid Office Strain Assessment
RPNI Regular in Positive and Negative
RULA Rapid Upper Limb Assessment
SD Standard Deviation
SDM Shared Decision Making
SHAP SHapley Additive exPlanations
SVM Support Vector Machine
t-SNE t-Distributed Stochastic Neighbor Embedding
TF-IDF Term Frequency/Inverse Document Frequency
TKM Traditional Korean Medicine
TLU Threshold Logic Unit
UMLS Unified Medical Language System
WAI Work Ability Index
XAI eXplainable AI


1 Introduction

1.1 Motivation and Background

Every year, 400,000 people die as a result of work-related risks such as workplace injuries [162]. In the automotive industry, human operators in manufacturing processes perform repetitive and physically demanding work tasks at high frequencies.

Industry executives have therefore concluded that taking care of the health of industrial workers can increase the efficiency of the industry. Predicting the occurrence of these problems has been a focus of the scientific community, mainly in the fields of ergonomics and biomedical engineering.

In automotive industrial settings where workers are exposed to work-related Musculoskeletal Disorders (MSDs) risk factors [129], occupational physicians have the responsibility to monitor workers’ health. On the one hand, Occupational Health Protection Profiles (OHPPs) have been introduced in various companies. On the other hand, workers’ health protection protocols can differ between organizations [163]. OHPPs are described by workers’ Functional Work Ability (FWA) status. Artificial Intelligence (AI) algorithms, especially Machine Learning (ML) methods, can support the prevention of MSDs by identifying which processes are lacking working condition improvements and which workplaces have a better match between the remaining FWA and the job demands.

The causality and explainability of these AI algorithms refer to how understandable a model is to humans, measured in terms of efficacy, satisfaction related to causal understanding, and transparency for industrial technical teams such as ergonomists [61]. My personal motivation for joining this project is to learn how computational techniques can help society.

1.2 Problem Statement

In the automotive industry, production lines are designed with multiple workplaces [139] where different physical tasks are executed by different workers. Each worker is expected to occupy the same workplace for many years, which makes the task more harmful or more difficult to execute over time because of the accumulated workload of repetitive tasks or simply because of ageing.

One of the critical issues in supporting the health of the industrial workforce is the creation and maintenance of workers’ occupational health profiles [102]. Preventing injuries to workers’ body parts has many benefits for industries and their workers [33]. By protecting workers’ health, industries prevent the loss of skilled labor due to medical absences. This is so important that some large industries have developed systems to create workers’ OHPPs. Occupational physicians monitor the health status of workers and record Medical Restrictions (MRs) for them in the system according to the diagnosis or prognosis, e.g., of injuries, so that technical managers can take them into account when assigning workers’ functions and tasks to a specific workplace. In addition, OHPPs inform the occupational technician’s understanding of which exposure to occupational risk factors (mainly MSD risk factors) should be ensured for a given worker: the occupational physician states which biomechanical exposure they consider necessary to ensure the worker’s health protection by:

• i) Establishing two levels of severity: a) MN (Must Not Use), when the FWA is at its lowest level; b) SN (Should Not Use), when the FWA still allows a given exposure dose.

• ii) Identifying the body part(s) in which the ability changed; for example, in occupational incidents, Kakhki et al. [86] distinguished the type of injury of the body part(s) based on medical, permanent partial disabilities, and temporary total or partial disabilities.

• iii) Characterizing the FWA itself: rotation/bending; flexion/extension; movements above the shoulder; torsion; application of force; manual handling of loads; use of vibration-transmitting tools, and so on.

However, this project focused on injured body parts. Diagnosing these injuries is one of the most challenging topics in the automotive industry, because injuries to workers’ body parts are sometimes asymptomatic during a medical examination, or a body part may be developing conditions that will soon reveal an injury. In parallel, the prognosis of injured body parts [138] also has advantages in the automotive industry: it helps us understand and identify warning signs of the early stages of these injuries.
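The three elements that the occupational physician records (severity level, affected body part, and FWA characteristic) can be pictured as a small record type. The following Python sketch is illustrative only: the class and field names, the `restricted_body_parts` helper, and the example entries are assumptions made here for clarity, not the schema of the OHPP system used in this work.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    """The two severity levels named in the text."""
    MUST_NOT = "MN"    # FWA is at its lowest level
    SHOULD_NOT = "SN"  # FWA still allows a given exposure dose


@dataclass(frozen=True)
class MedicalRestriction:
    """One restriction in a worker's profile (hypothetical schema)."""
    severity: Severity
    body_part: str       # label style such as "shoulder_R" is an assumption
    characteristic: str  # e.g. "movements above the shoulder"


def restricted_body_parts(restrictions, severity):
    """Return the sorted, de-duplicated body parts restricted at `severity`."""
    return sorted({r.body_part for r in restrictions if r.severity is severity})


# Made-up example profile, for illustration only.
example_profile = [
    MedicalRestriction(Severity.MUST_NOT, "shoulder_R", "movements above the shoulder"),
    MedicalRestriction(Severity.SHOULD_NOT, "wrist_L", "use of vibration-transmitting tools"),
    MedicalRestriction(Severity.MUST_NOT, "trunk", "manual handling of loads"),
]

must_not_parts = restricted_body_parts(example_profile, Severity.MUST_NOT)
```

Keeping the severity as an enumeration rather than free text is one possible way to make the MN/SN distinction explicit before any later numeric scoring is applied.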

Development in the healthcare industry has tremendous potential to improve healthcare quality and lower its costs. Occupational physicians have access to health data on thousands of workers, giving them the opportunity to develop new information about comparative effectiveness and to perform other forms of innovation [43]. Furthermore, as healthcare costs continue to rise, ML tools aimed at simplification and efficiency will become increasingly important in both diagnosis and prognosis. ML models offer a holistic approach to monitoring OHPPs based on injured body parts. These models can also help prevent injuries to workers, avoiding disruptions to work ability and/or work-related MSDs.

1.3 Research Objectives

Automotive organizations throughout Europe are being challenged to adapt and promote several procedures to ensure that each worker has a workplace where he/she is fully able to work and to protect musculoskeletal health (e.g. [136]). Production and logistics, in particular, have already set up several initiatives that are now part of the daily work in their plant areas. One example is the rotation system [128], in which each worker has to change his/her workplace four times during each shift. However, these procedures to prevent injuries caused by excessive workload were found to be insufficient.

Hence, the automotive industry established complementary initiatives targeted at monitoring each worker’s exposure to work-related musculoskeletal risk factors, in order to manage that exposure and avoid disruptions to work ability and/or work-related MSDs. The information from the OHPPs is available to visually analyze the worker’s MR history and the individual, organizational, and work-related factors that contribute to work-related injuries.

The ability to predict the occurrence of work-related injuries has driven the scientific community to design several studies to understand the complex associations among many variables. This complexity increases when these studies take place in real working environments, where assessing workers’ exposure to occupational risk factors can be a complex procedure. Further complexity arises from the health field, as diagnosis protocols have themselves been under the scientific community’s scrutiny.

AI techniques, particularly ML, are used to infer the current condition (diagnosis tasks) [115] and to predict potential injuries (prognosis tasks) [144]. This is implemented in the following three phases:

• to identify the complexity on different levels and to design procedures in order to select the most relevant parameters collected in real industry working environments

• to describe health-related procedures towards injury diagnosis/prognosis and to select the most suitable diagnosis protocols for real working environments to recognize which prognostic variables could be used

• to design the "trigger point" that should be recognized as the identifiable moment before injuries develop.

Regarding the first phase, investigation, the current state of the art was explored for text and/or tabular data gathered in real working environments to infer work-related injuries [86,81,23]. Research on design techniques and principles for biomechanical and occupational profile data was therefore used or developed for diagnosis and prognosis purposes [38]. For the second and third phases, ML techniques coupled with Natural Language Processing (NLP) [111] were designed. In the second phase, a rule-based learning tool was designed that can assist occupational physicians in monitoring workers’ health protection. It determined which parameters are most relevant for evaluating workload-related potential injuries in real working environments. This approach found significant and meaningful hidden relationships among large volumes of OHPPs based on FWA status (text). In the third phase, concerning prognosis, the previously built predictive models are applied and tested so that occupational technicians’ decision-making can be accelerated in the context of MSD prevention.

Work-related musculoskeletal disorders (WRD) have a high prevalence among office workers, and tools that assess the work setup in order to create awareness and reduce ergonomic risks are needed. microErgo is a concept for an easy-to-use tool that combines existing ergonomic models with statistical data. The industrial microErgo risk assessment tool concept was introduced as an "alert point": in the automotive industry, it combines existing ergonomic models with statistical data to evaluate the risk and to provide recommendations based on the risks found.

1.4 Thesis Structure

This thesis documents the work developed during the PhD program. It is organized into seven chapters. Figure 1.1 presents the overall vision of this thesis across its chapters. It shows that information on workers’ health protection is critical and is shared weekly across a technical team, especially the ergonomics department; however, there is great variability in the quality of the FWA status of workers’ health protection. In the next step, text mining was implemented to find the organizational, individual, and work-related features in the FWA status of workers. The outcomes of this task relate to changes in work ability (FWA features in capacity), as this data is directly reported in the OHPPs. The subsequent stages cover the theoretical parts, namely the algorithms for the diagnosis and prognosis of workers’ risk. All of these processes are gathered in multidisciplinary research to discover improvement guidance for occupational physicians.

Finally, this learning framework proposes industrial microErgo as a risk assessment application (described in Chapter 6).

Chapter 1 introduces the background and motivation of this study and then presents the research problem, including the main goals of this thesis. Finally, the structure of the thesis is presented.

Chapter 2 introduces the theoretical concepts addressed throughout the present thesis. This chapter defines the different AI approaches that were used in this thesis.

Chapter 3 covers an introduction to occupational physicians’ decision-making tasks, considering diagnosis and prognosis in various medical domains. It details a literature review of AI-based analysis of an individual’s health, including medical data, the problems encountered, and decision-making measurements.

Figure 1.1: Steps of whole chapters, conceptual vs theoretical framework. (Diagram boxes: Chapter 1, A Learning Framework Introduction; Chapter 2, Theoretical Concepts; Chapter 3, Literature review; Chapter 4, Research methodology; Chapter 5, Research and construction of explainable AI algorithms; Chapter 6, Industrial microErgo application; Chapter 7, Conclusion.)

In Chapter 4, the mined text data and the specific adaptations performed according to the thesis context are presented. The scientific computation required to process the designed data is also introduced. Finally, the workers and the procedure for testing the three parameters of work ability (individual, organizational, and work-related) carried out in this thesis are described in this chapter.

The technical approaches used to process and extract useful information from the data processed during the different studies of this thesis are presented in Chapter 5. These OHPP characteristics are then combined using machine learning algorithms to build predictive models, whose results are also described in that chapter. The produced models are finally applied to medical decision-making contexts and their results are presented.

In Chapter 6, the industrial microErgo application, a risk assessment tool, is proposed.

Chapter 7 describes how this study contributes to the current academic literature on this topic. Future work is also suggested in the final chapter.

2 Theoretical Concepts

2.1 Introduction

Chapter 2 lays out the theoretical dimensions of the research and looks at how AI can help medicine in the diagnosis and prognosis domains. The main families of AI algorithms, such as supervised learning, unsupervised learning, and reinforcement learning, are described according to their advantages and shortcomings.

2.2 Artificial Intelligence in Medicine

AI is a broad topic that refers to the use of a computer to model intelligent behavior with the least amount of human interaction. Medical AI development is booming, and it needs more developers. The key to growing medical AI is trustworthiness, and the key to growing health care is to grow the developer talent that builds the products and engines on which it runs. AI solutions can help improve healthcare professionals’ ability to access patients, improve patients’ ability to access healthcare professionals, simplify communications, documentation, or reporting, and help patients overcome barriers in a complex healthcare system.

In medicine, AI is divided into two categories: virtual and physical. The virtual branch encompasses informatics approaches ranging from deep learning data management to health management system control, including electronic health records and active physician guidance in treatment decisions. Robots that aid the elderly patient or the attending surgeon best reflect the physical branch. The societal and ethical complexity of these applications needs more consideration, as do the demonstration of their medical utility and economic worth and the development of interdisciplinary strategies for their wider deployment [67].

Inspired by [143], Figure 2.1 shows the integration of AI in healthcare [141]. Jobs and tasks in healthcare are placed along two axes. The red axis distinguishes a) tasks where compassion is needed from b) tasks where it is not: compassion is needed for tasks that involve emotional support, whereas mechanical tasks do not require touch or compassion. The blue axis is divided into three categories: a) prevention and data gathering, b) diagnosis, and c) treatment. For example, hematologists, radiologists [22], and the General Physician (GP) may never contact the patient directly. Someone who reports a blood test to the hematologist does not need to be contacted either. In addition, chatbots can assist the GP, since human interaction [173] is not entirely necessary: chatbots can automatically answer questions asked by patients on online platforms. Finally, it is preferable that emotions and fatigue have no influence on the reports written by radiologists. Other jobs do need touch and emotion; these include nurses, elderly caretakers, and elderly companions.

AI systems are mostly focused on diagnosis, and there are fewer studies on treatment. In treatment, jobs can be divided into two groups: a) specialists and b) development and research. The first group, specialists, includes jobs such as i) neurologists, ii) surgeons, iii) psychiatrists, and iv) care specialists. The second group consists of drug development and medical research (improving new medical techniques).

The first job group can use AI without any human interaction, with high accuracy, and on a daily basis. For example, AI can analyze Magnetic Resonance Imaging (MRI) images and highlight abnormalities in them for experts. In the second group, AI cannot be deployed on its own, but it can be integrated with human interaction; in other words, human supervision supports AI with additional insight. For example, in a care home, cameras using "fall detection" can detect when patients fall and inform the nurses taking care of them. The third group comprises the specialists involved in treatment. Here humans can assist AI, which can record all of a patient’s data throughout her/his life, so that patients have a clinical history available when they make an appointment with a specialist. So, even if the specialist has no snapshot from the appointment day, AI can provide the full history of that patient. The fourth group can have an equal contribution from human and AI assistants; e.g., nanorobots and novel drug discovery are included in this category (Figure 2.1).

Another sub-field of AI is eXplainable AI (XAI), which has the potential to make AI frameworks interpretable and explainable for humans. Medical AI needs not only precise algorithms, but also interpretable and explainable methodologies, so that medical experts can verify why a given choice was made. Interestingly, Neves et al. [113] proposed XAI aiming to go from medical questions to explanations by integrating explainability in medical AI and supporting clinicians in three clinical decision contexts: prognosis, diagnosis, and epidemiological prediction in chronic diseases. As an asset, that project departed from three existing clinical longitudinal datasets: (1) epidemiological national data; (2) diagnosis of amyotrophic lateral sclerosis based on electromyographic recording; and (3) prognosis based on patient-reported outcome measures of cardiac surgery interventions.

2.3 Supervised Learning

Supervised algorithms were tested for each set of workers and their health in order to train the model. In the implementation of the current research, regression methods were employed as restrictive algorithms, whereas they were used as more flexible algorithms.

Figure 2.1: AI applicability in healthcare. (Diagram axes: compassion needed vs compassion not needed; prevention and data gathering, diagnosis, treatment. Roles shown: elderly caretaker, elderly companion, nurses, care specialist, surgery, neurologist, psychiatrist, hematologist, GP, radiologist, medical research, drug development. Legend: Human, AI, Human + AI assistant.)

2.3.1 Ensemble Algorithms

Bastos used ensemble Decision Tree (DT) models in clinical decision-making for patients with multiple acute or chronic diseases (i.e. multimorbidity). The study showed that one factor that can influence how decisions are made under conditions of risk and uncertainty is the decision maker’s personality. The variables were well modelled by at least one of the sets of features extracted [14]. These algorithms are described below:

Decision Tree. When the relationship between the features and the output is highly non-linear and complex, DTs outperform traditional approaches such as linear regression. To construct a DT, the predictor space is divided into J unique and non-overlapping regions (R1, R2, ..., RJ). Every observation that falls into a region receives the same prediction, and the regions are chosen so as to minimize the residual sum of squares [83].
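As an illustration of the region-splitting idea, the following minimal sketch searches for the single threshold on one predictor that minimizes the residual sum of squares over the two resulting regions. The function names and the toy data are hypothetical and are not taken from the thesis experiments; a full DT would apply the same search recursively within each region.

```python
def rss(values):
    """Residual sum of squares of values around their mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_rss) for the split x < threshold that
    minimizes the RSS summed over the two resulting regions."""
    best = (None, float("inf"))
    points = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct x values.
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        total = rss(left) + rss(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical toy data: the response jumps between x = 3 and x = 10.
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
threshold, total_rss = best_split(xs, ys)
```

On this toy data the chosen threshold falls between the two clusters, so each region can be summarized by its own mean.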

In a nutshell, a DT’s final structure is a flowchart, with each internal node representing a "test" on an input variable, each branch representing the test’s outcome, each leaf node representing a label (i.e. a final decision), and the paths from root to leaf representing rules. This structure makes the decision process behind the DT easier to understand and describe. However, DTs are extremely sensitive to the training data, and even minor changes can have a big impact on the outcome [83].

Random Forest is an ensemble that combines many DTs to provide a more accurate and reliable prediction. To overcome the DTs’ sensitivity, each tree in a Random Forest (RF) is trained on a different set of data obtained by bagging, which is a method of randomly sampling a data set with replacement. Each node considers only a random selection of features, which produces an uncorrelated forest of trees whose prediction is more accurate than that of any single tree. Otherwise, if highly predictive features exist, they would appear at the top of every tree and produce similar trees [20]. According to [59], the subset of variables considered at each node, which is used to tune the model, should be one-third of the total features.
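The two sources of randomness described above can be sketched as follows. This is only an illustration under simple assumptions: the helper names are hypothetical, the rows are stand-ins for worker records, and the one-third rule follows the recommendation cited from [59].

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random with replacement (bagging)."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, rng):
    """Pick about one-third of the feature indices (at least one), without replacement."""
    k = max(1, n_features // 3)
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(0)   # fixed seed so the sketch is repeatable
rows = list(range(10))   # stand-ins for ten worker records (hypothetical)
sample = bootstrap_sample(rows, rng)      # training set for one tree
features = random_feature_subset(9, rng)  # features considered at one node
```

Each tree would receive its own bootstrap sample, and a fresh feature subset would be drawn at every node, which is what decorrelates the trees.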

Gradient Boosting is an ensemble of DTs, similar to RF. What sets it apart from RF is the technique of tree growth: instead of bagging, it uses boosting. Unlike bagging, where the data set is randomly sampled, boosting uses a weighted data set in which observations that earlier trees predicted poorly are more likely to be included in new sets. As a result, each tree is trained using information from previously trained trees, ensuring that they grow sequentially and that weak learners become strong learners. Gradient Boosting (GB) uses the gradients of the loss function, a measure of how well the model fits the data, to identify where the weak learners fall short. Unlike in RF, the number of trees in boosting should be limited to minimize over-fitting. Furthermore, the shrinkage parameter, which controls the learning rate, should be chosen with the number of trees in mind, as many trees are required to obtain good performance with a small learning rate [53,70,80].
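The sequential nature of boosting can be sketched with depth-1 trees (stumps) fitted to residuals. The sketch assumes a squared-error loss, for which the negative gradient is simply the residual; the function names, toy data, and parameter values are hypothetical, and the data must contain at least two distinct predictor values.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree to the residuals by minimizing squared error."""
    best = None
    points = sorted(set(xs))
    for lo, hi in zip(points, points[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x < t]
        right = [r for x, r in zip(xs, residuals) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm


def boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Sequentially add stumps, each fitted to the residuals of the current model."""
    base = sum(ys) / len(ys)
    trees = []

    def predict(x):
        # Shrinkage: every tree contributes only a fraction of its prediction.
        return base + learning_rate * sum(tree(x) for tree in trees)

    for _ in range(n_trees):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return predict


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = boost(xs, ys)
```

With a learning rate of 0.1, each tree removes only a tenth of the remaining residual, which illustrates why a small learning rate needs many trees to reach a good fit.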

2.3.2 Machine learning and Gradient Boosting Decision Tree

The Bayes theorem, which was proposed by Bayes in 1763 [15], is regarded as a foundation of ML. For reference, its mathematical formula is as follows:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{2.1}$$
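As a small numeric illustration of Equation 2.1, the sketch below computes the probability that a worker has an injury (A) given a positive screening result (B). All numbers are hypothetical and chosen only to make the arithmetic easy to follow; P(B) is expanded with the law of total probability.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A|B) from the Bayes theorem, where B is a positive screening result.

    prior               = P(A)
    likelihood          = P(B|A)
    false_positive_rate = P(B|not A)
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)  # P(B)
    return likelihood * prior / evidence


# Hypothetical values: 10% of workers are injured, the screening detects 80%
# of injuries, and it raises a false alarm for 20% of healthy workers.
p = posterior(prior=0.10, likelihood=0.80, false_positive_rate=0.20)
```

Here P(B) = 0.08 + 0.18 = 0.26, so the posterior is 0.08 / 0.26, roughly 0.31: even after a positive result, the injury remains less likely than not because the prior is low.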

Equation 2.1 can be used to assess the probability of an event occurring based on previous observations. Although the Bayes theorem may appear simple today, it was a groundbreaking scientific accomplishment when it was originally discovered and has made significant contributions to a wide range of fields. Turing (1950) [158] and Rosenblatt (1958) [131], for example, began research in the 1950s that would eventually be connected with ML. However, Samuel (1959) [135] was the first to use the term ML, in his groundbreaking study on using ML to construct a program that could compete with a human player in a game of checkers. Since then, ML has advanced significantly, and it is now used in virtually every aspect of life.

When evaluating and choosing ML models in the past, performance, speed, and flexibility were often emphasised. However, some researchers, such as [98], suggest that the General Data Protection Regulation (GDPR) in the European Union, as well as the public’s increasing knowledge of their privacy rights, has made choosing a model far more difficult. The GDPR is designed to prevent, or at the very least discourage, the employment of algorithms that might exploit, oppress, marginalize, discriminate, or otherwise infringe on an individual’s liberty and right to privacy.

This is especially true in situations where the planned solution deals with personal information. As a result, Lepri et al. (2018) [98] argue that attributes like transparency, intelligibility, and accountability have become important, if not mandatory, considerations when choosing an ML model. Models that have the potential to be highly efficient, such as Artificial Neural Networks (ANNs), often lack these transparency properties, posing legal and ethical problems if used in certain decision-making contexts. Various ML model types and tools are available when it comes to employing regression approaches for decision making. GBDTs are a type of regression ML model that can provide a high level of transparency and intelligibility while also performing well in terms of correct predictions, scalability, and efficiency. Friedman first proposed the concept of GBDTs [54], introducing a new method that merges numerous weak classifiers to form a single robust classifier.

Condorcet [35] proposed that as the number of predictors increases, the probability of reaching a correct conclusion increases, as long as the predictors are more likely to be correct than incorrect. In essence, an ensemble of weak predictors works in a similar way, with individual weak predictors in place of people. In GBDTs, according to [87], a decision tree is a flowchart diagram with a number of different nodes. These nodes are further divided into three types: decision nodes, chance nodes, and end nodes.

• Decision nodes are sub-nodes that have been subdivided into new sub-nodes.

• Chance nodes are nodes that indicate a group of uncontrollable probable events.

• End nodes (also known as "leaf nodes") are connected to the parent nodes and generally signify a result or decision.
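Condorcet’s argument about combining many predictors, stated before the list of node types, can be checked numerically. The sketch below computes the probability that a simple majority of n predictors is correct, assuming that the predictors are independent and equally accurate; this assumption is an idealization, since the trees of a boosted ensemble are not independent, and the function name is hypothetical.

```python
from math import comb


def majority_correct(n, p):
    """Probability that more than half of n independent predictors, each
    correct with probability p, are correct (n is assumed to be odd)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


three = majority_correct(3, 0.6)    # 3 * 0.6^2 * 0.4 + 0.6^3 = 0.648
eleven = majority_correct(11, 0.6)  # larger than with three predictors
worse = majority_correct(3, 0.4)    # below 0.4: weak predictors must beat chance
```

The majority improves on a single predictor only when each predictor is correct more often than not, which matches the condition stated in the text.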

GBDTs also have various advantages over other ML model types, one of which is that they are very good at avoiding overfitting. Overfitting is a typical issue that occurs when a model has learnt to recognize the training data by memorisation rather than abstraction. It is undesirable because it causes models to fail to make predictions from new data. The study in [10] provides an indication of where to begin when developing a GBDT model. Chen and Guestrin [32], Ke et al. [88], and Prokhorenkova et al. [121] provide three state-of-the-art GBDTs: XGBoost, Light Gradient Boosting Machine (LightGBM), and CatBoost, respectively. Owing to its good decision performance, fast computing speed, and other features, the DT-based extreme gradient boosting algorithm XGBoost has attracted considerable research interest in the industrial machinery, power system, and industrial infrastructure domains [171,176].

The XGBoost-based feature importance ranking, in particular, can analyze the relationship between output results and input features, assisting network operators in comprehending failure detection results. XGBoost is an integrated model algorithm that uses the Classification and Regression Tree (CART) as its base learner. XGBoost is made up of simple sub-units that are coupled to form a system with a high model complexity and learning ability. Unlike typical ANNs, XGBoost’s base learner is made up of a root node, branches, and leaf nodes, and it enhances the model’s performance by greedily adding trees. While creating the CART DT, the feature and splitting point that offer the largest gain in the loss function are chosen to split the node [32].
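The greedy choice of a splitting point can be sketched with the split-gain expression given in the XGBoost paper [32], which scores a split from the sums of first-order gradients (G) and second-order gradients (H) on each side. The sketch below is a single-feature illustration only: the function names and toy data are hypothetical, `lam` and `gamma` stand for the regularization terms of that expression, and a squared-error loss is assumed so that every second-order gradient equals 1.

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Gain of a split computed from gradient sums on the left and right sides."""
    def score(g, h):
        return g * g / (h + lam)

    return 0.5 * (
        score(g_left, h_left)
        + score(g_right, h_right)
        - score(g_left + g_right, h_left + h_right)
    ) - gamma


def best_split_by_gain(xs, grads, hess, lam=1.0, gamma=0.0):
    """Scan the sorted values of one feature and return (best_gain, threshold)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    g_total, h_total = sum(grads), sum(hess)
    g_left = h_left = 0.0
    best = (float("-inf"), None)
    for pos in range(len(order) - 1):
        i, j = order[pos], order[pos + 1]
        g_left += grads[i]
        h_left += hess[i]
        if xs[i] == xs[j]:
            continue  # cannot split between equal feature values
        gain = split_gain(g_left, h_left, g_total - g_left, h_total - h_left, lam, gamma)
        if gain > best[0]:
            best = (gain, (xs[i] + xs[j]) / 2)
    return best


# Hypothetical toy data with squared-error loss and an initial prediction of 3.0:
# gradient = prediction - target, second-order gradient = 1.
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
grads = [3.0 - y for y in ys]
hess = [1.0] * len(ys)
gain, threshold = best_split_by_gain(xs, grads, hess)
```

In a full implementation this scan is repeated for every feature, and the feature and threshold with the largest gain are used to split the node.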

Furthermore, the node splitting procedure is parallelized, which increases the model’s processing speed. Figure 2.2 shows that the trees grow one after the other, each with its own prediction score, and the final result is calculated by adding the scores of all individual trees together.

Figure 2.2: In XGBoost trees, level-wise growth occurs. The blue circles represent an older level of leaves that have already been calculated, while the red circles represent the level of newly added leaves to the tree. The purple circles depict end-nodes, which are connected with parent nodes [8].

An XGBoost tree grows in a level-wise manner (Figure 2.2). XGBoost is one of the GBDT algorithms [10] that generates exponential leaf growth. When working with larger datasets, it is vital to properly optimize XGBoost, as it otherwise tends to quickly use up the available memory. As a result, other GBDT alternatives are likely to be better choices in instances where applications must scale to large datasets.

Chen et al. outline some of the novel features that contribute to XGBoost’s scalability, which can be summarized as follows [32,31]:

• A new tree learning approach for sparse datasets has been developed.


• A theoretically justified weighted quantile sketch approach that allows approximate tree learning to handle instance weights.

• Using distributed computing, data scientists can process vast amounts of data more quickly.

LightGBM is a popular GBDT model developed by Ke et al. [88] that consistently proves to be quite capable of tackling classification challenges. They argue in their study that, up until then, GBDTs lacked efficiency and scalability in scenarios involving large amounts of data and a large number of features. The LightGBM model uses XGBoost as a baseline but takes a different approach to classification by introducing and combining two new techniques, Gradient-based One-Side Sampling and Exclusive Feature Bundling.

Gradient-based One-Side Sampling means that the model ignores the vast majority of cases in which the gradient is expected to be lower. This can prevent the algorithm from travelling down branches that are thought to be less important. The authors came up with this strategy after noticing a common trend in which data with different gradient values had a different impact on the expected information gain. As seen in Figure 2.3, the expansion of the trees in LightGBM is performed leaf by leaf.
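The sampling step of Gradient-based One-Side Sampling can be sketched as follows, following the description in the LightGBM paper [88]: the instances with the largest absolute gradients are always kept, a random fraction of the remaining instances is sampled, and the sampled instances are re-weighted by (1 - a) / b so that the overall gradient statistics stay approximately unbiased. The function name, the parameter names `top_rate` (a) and `other_rate` (b), and the toy gradients are assumptions made for this illustration.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, rng=None):
    """Return {row_index: weight} for the rows kept by one-side sampling."""
    rng = rng or random.Random(0)
    n = len(gradients)
    # Rank rows by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = round(top_rate * n)
    n_other = round(other_rate * n)
    top = order[:n_top]                         # always kept, weight 1
    rest = rng.sample(order[n_top:], n_other)   # random part of the low-gradient rows
    amplify = (1 - top_rate) / other_rate       # compensates for the dropped rows
    weights = {i: 1.0 for i in top}
    weights.update({i: amplify for i in rest})
    return weights


gradients = [0.05 * i for i in range(20)]  # hypothetical gradients, largest last
weights = goss_sample(gradients)
```

With 20 rows and these rates, only 6 rows are used to evaluate splits, yet the 4 rows with the largest gradients are guaranteed to be among them.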

Figure 2.3: Leaf-wise growth in LightGBM. The blue circles denote older leaves, which have already been explored. The red circles denote the current leaf that is being considered. The leaves that are expanded upon are the ones that have the highest predicted max delta loss. If the new nodes have a lower max delta loss than one of the previous branches, the algorithm backtracks to the earlier leaf that now has the highest max delta loss. The purple circles depict end-nodes, which are connected with parent nodes [8].

The Exclusive Feature Bundling approach, which is used by LightGBM, helps to reduce feature sparsity in circumstances where one-hot encoding is used, as the resulting encodings of one feature are virtually always exclusive with the others. By combining these sparse features and reducing their total number, the model’s training time can be reduced and it can be deployed considerably faster than other GBDTs, such as XGBoost.
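The bundling idea can be illustrated with a deliberately simplified sketch: columns that are never non-zero in the same row are merged into one column that records which of them was active. This is not LightGBM’s actual implementation, which works on histogram bins and tolerates a small number of conflicts; the function name and the toy one-hot columns are hypothetical.

```python
def bundle_exclusive(columns):
    """Merge mutually exclusive sparse columns into a single column.

    The bundled value is 0 when every column is zero in that row, and
    1 + (index of the non-zero column) otherwise.
    """
    n_rows = len(columns[0])
    bundled = []
    for r in range(n_rows):
        nonzero = [c for c, col in enumerate(columns) if col[r] != 0]
        if len(nonzero) > 1:
            raise ValueError("columns are not mutually exclusive in row %d" % r)
        bundled.append(nonzero[0] + 1 if nonzero else 0)
    return bundled


# Three one-hot columns (hypothetical), four rows; row 2 has no active column.
one_hot = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
bundle = bundle_exclusive(one_hot)
```

Three sparse columns become one dense column, so the tree learner has a third as many features to scan for this group.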

In summary, this results in a GBDT that is extremely fast in terms of training, tolerant in situations where memory is limited, and capable of making correct predictions. LightGBM’s key disadvantage is that it overfits the data when used on smaller datasets. In instances where there is only a small amount of data available for the model to train on, other models should be chosen.

CatBoost (Figure 2.4) uses balanced oblivious trees to predict labels, which also significantly reduces parameter tuning time [40,121].

Figure 2.4: CatBoost builds symmetric trees and is a level-wise algorithm. This symmetry distinguishes it from the other GBDT algorithms.

Prokhorenkova et al. proposed utilizing GBDT algorithms on heterogeneous data. "GB has been the primary method for learning problems with heterogeneous features, noisy data, and complex dependencies for many years: web search, recommendation systems, weather forecasting, and many others," they write. Heterogeneous datasets include features of various data types; tables in relational databases are frequently heterogeneous. Homogeneous data, by contrast, is data that is all of the same type. CatBoost is designed with categorical features in mind. The CatBoost method adds two new fundamental functions: a permutation-driven ordered boosting algorithm and a novel algorithmic approach specifically for processing categorical features. CatBoost deals with the exponential growth of feature combinations by employing a greedy technique for each new split in the existing tree. CatBoost can also handle cases where the number of categories is too large for current GBDT models to manage. CatBoost takes three measures to address this issue [8]:

• Divide the data into random subsets initially.

• The labels are then converted to integers.

• Finally, the remaining categorical attributes are converted to numerical values.
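The three measures above can be sketched as an ordered target statistic: the rows are visited in a random order, and each categorical value is replaced by a number computed only from the labels of rows visited earlier. The sketch assumes integer (0/1) labels and uses the smoothed ratio (count of positive labels so far + prior) / (count of rows so far + 1); the function name, the `prior` value, and the toy data are assumptions for illustration and not the thesis configuration.

```python
import random


def ordered_target_stats(categories, labels, prior=0.5, rng=None):
    """Encode a categorical column numerically using only earlier rows of a random order."""
    rng = rng or random.Random(0)
    order = list(range(len(categories)))
    rng.shuffle(order)  # step 1: random permutation of the rows
    sums, counts = {}, {}
    encoded = [0.0] * len(categories)
    for i in order:
        c = categories[i]
        s, n = sums.get(c, 0), counts.get(c, 0)
        # step 3: numerical value from the labels seen so far for this category
        encoded[i] = (s + prior) / (n + 1)
        sums[c] = s + labels[i]  # step 2: labels are integers (0 or 1)
        counts[c] = n + 1
    return encoded


categories = ["a", "a", "b"]  # hypothetical categorical feature
labels = [1, 1, 0]            # hypothetical integer labels
encoded = ordered_target_stats(categories, labels)
```

Because a row never sees its own label, the encoding avoids leaking the target into the feature, which is the motivation for the permutation-driven approach.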

Another feature of CatBoost is the ability to choose the maximum number of iterations, the maximum depth of the constituent DTs, and the maximum number of categorical feature pairings to combine. These are all values that the user can change to trade resource usage for performance. Furthermore, the settings that researchers select for these hyperparameters may help explain why CatBoost performs differently than other learners. Section 3.3 discusses CatBoost further in terms of its state of the art.


2.3.3 Artificial Neural Network

ANNs outperform many of the earlier ML methods in the field of AI. For example, Belo [17] argued that the healthcare system places a burden on physicians, reducing the effectiveness of patient data collection. Different architecture configurations were explored for signal processing and decision making. A Recurrent Neural Network (RNN)-based architecture was able to autonomously replicate three types of biosignals with a high degree of confidence.

Deep Learning (DL) within AI is represented by ANNs, which step in and fill the gap where earlier ML methods fall short. ANNs are based on the biological neurons in the human body, which activate when certain conditions are met, causing the body to respond with a certain activity.

Artificial neural nets are made up of multiple layers of interconnected artificial neurons that are controlled by activation functions that turn them on and off. In the training phase, neural nets learn particular values, just as traditional machine learning algorithms do.

In a nutshell, each neuron receives its inputs multiplied by weights, which are initialized randomly. The weighted inputs are combined with a static bias value (which is unique to each neuron layer), and the result is passed to an appropriate activation function, which determines the final value to be output by the neuron. Depending on the nature of the input values (X_n), different activation functions are available. Once the final neural net layer’s output is generated, the loss function (predicted output vs target output) is evaluated, and backpropagation is used to change the weights to make the loss as small as possible (Figure 2.5). The overall operation revolves around determining the best weight values (W_n).

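Figure 2.5: Structure of ANN. (Diagram: the inputs X₀, X₁, X₂ are multiplied by the weights W₀, W₁, W₂ to give X₀W₀, X₁W₁, X₂W₂, which feed the activation function, AF(XW) + Bias, to produce the output O.)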

The inputs are multiplied by the weights, which are numerical values (X_nW_n). The weights are modified in backpropagation to minimise the loss. In basic terms, weights are the values that an ANN learns: the gap between the predicted outputs and the training targets causes them to self-adjust. The Activation Function (AF) is a mathematical function that assists in the ON/OFF switching of neurons (AF(XW) + Bias). The input layer (I) represents the input vector’s dimensions. The hidden layer represents the intermediate nodes that divide the input space into regions with (soft) edges; it takes in a set of weighted inputs and, using an activation function, generates an output. The output layer (O) represents the neural network’s output (Figure 2.6).

Figure 2.6: Input (I), hidden, and output (O) layers
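The single-neuron computation and one weight update described above can be sketched as follows. The sketch makes several assumptions that are not taken from the thesis: a sigmoid activation, a squared-error loss, the bias added to the weighted sum before the activation is applied (the common convention), and hypothetical input, weight, and target values.

```python
import math


def sigmoid(z):
    """Smooth activation function that maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus bias, passed through the activation function."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


def train_step(inputs, weights, bias, target, learning_rate=0.5):
    """One gradient-descent step on the loss 0.5 * (output - target)^2."""
    out = neuron_output(inputs, weights, bias)
    # Chain rule: dLoss/dz = (out - target) * sigmoid'(z), with sigmoid'(z) = out * (1 - out).
    delta = (out - target) * out * (1.0 - out)
    new_weights = [w - learning_rate * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - learning_rate * delta
    return new_weights, new_bias


inputs, weights, bias, target = [1.0, 0.5], [0.2, -0.4], 0.1, 1.0
before = neuron_output(inputs, weights, bias)
weights2, bias2 = train_step(inputs, weights, bias, target)
after = neuron_output(inputs, weights2, bias2)
```

After the update the output moves toward the target, which is the self-adjustment of the weights that backpropagation performs layer by layer in a full network.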

2.3.3.1 Types of Neural Networks

There are many different types of ANNs that now exist or are in development. They can be categorized based on their structure, data flow, neuron density, layers, depth, activation filters, etc. The following neural networks will be discussed:

• Perceptron

• Feed Forward Neural Network

• Multilayer Perceptron

• Convolutional Neural Network (CNN)

• Long Short-Term Memory (LSTM)


• RNN

• Modular Neural Network

Perceptron. The Minsky-Papert [106] perceptron model is one of the simplest and oldest neuron models. It is the smallest unit of an ANN that performs certain computations in order to discover features in input data. It takes weighted inputs and applies the activation function to produce the final result. Threshold Logic Unit (TLU) is another name for the perceptron. The perceptron is a supervised binary classifier that divides data into two groups: it splits the input space into two categories using a hyperplane.

Logic gates such as AND, OR, and NAND can be implemented with perceptrons. Only linearly separable tasks, such as the boolean AND problem, can be learned by perceptrons. They do not work for non-linear problems such as the boolean XOR problem.
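The difference between the AND and XOR problems can be seen with a minimal perceptron trained by the classic perceptron learning rule. The function names, the 0/1 output encoding, the number of epochs, and the learning rate are choices made for this sketch.

```python
def step(z):
    """Threshold activation: the neuron fires (1) when z reaches the threshold 0."""
    return 1 if z >= 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Perceptron learning rule for two inputs, starting from zero weights."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - step(w[0] * x1 + w[1] * x2 + b)
            w[0] += learning_rate * error * x1
            w[1] += learning_rate * error * x2
            b += learning_rate * error
    return w, b


def accuracy(samples, w, b):
    """Fraction of samples classified correctly by the hyperplane (w, b)."""
    hits = sum(step(w[0] * x1 + w[1] * x2 + b) == t for (x1, x2), t in samples)
    return hits / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_acc = accuracy(AND, *train_perceptron(AND))  # linearly separable: all four correct
xor_acc = accuracy(XOR, *train_perceptron(XOR))  # no single hyperplane separates XOR
```

Training on AND converges to a separating line after a few epochs, whereas for XOR no choice of weights can classify all four points, so the accuracy stays below 1 however long the training runs.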

Feed Forward Neural Network. This is the most basic version of ANNs, in which input data flows in one direction only, passing through the artificial neural nodes and out through the output nodes. Input and output layers are always present, whereas hidden layers may or may not be present. On this basis, the network can be characterized as a single-layered or multi-layered feed-forward neural network (see Figure 2.7). The number of layers is determined by the function’s complexity. Propagation is unidirectional: there is forward propagation but no backward propagation. Here, the weights are fixed. Inputs are multiplied by weights and sent into an activation function; a classification activation function or a step activation function is used for this. For example, if the threshold (typically 0) is exceeded, the neuron is activated and produces 1 as an output. If the neuron is below the threshold (typically 0), it is not activated and its output is -1. The Feed Forward Neural Network (FFNN) is simple to maintain and is equipped to deal with data that contains a lot of noise.

The advantages and disadvantages of this model are as follows: a) it is easier to build and manage because it is less complicated, b) one-way propagation is quick and efficient, c) it responds well to noisy data, d) due to the lack of dense layers and backpropagation, it cannot be used for DL. FFNN applications include:

• Face recognition (simple, straightforward image processing)

• Computer vision (for difficult-to-classify target classes)

• Speech Recognition

Multilayer Perceptron A multilayer perceptron is a type of FFNN and an entry point to more sophisticated neural networks. As Figure 2.8 presents, input data is transmitted through multiple layers of artificial neurons. It is a fully connected neural network, since every node is connected to all neurons in the following layer. In addition to the input and output layers it has one or more hidden layers, i.e. at least three layers in total. It possesses bidirectional propagation, which means it can propagate both forward and backward. Inputs are multiplied by weights and passed to the activation function, and the weights are adjusted during backpropagation to minimise the loss. Put simply, weights are the values learned by the ANN: they self-adjust depending on the difference between the predicted outputs and the training targets. Nonlinear activation functions are used, with softmax as the output-layer activation function.

Figure 2.7: The simplest form of feed-forward neural network, where input data travels in one direction to the output nodes.

Figure 2.8: FFNN multilayer perceptron.

The advantages and disadvantages of this model are as follows: a) due to the presence of dense, fully connected layers and backpropagation, it can be used for DL, b) design and maintenance are comparatively difficult, c) it is comparatively slow (depending on the number of hidden layers). Multilayer perceptrons are applied in speech recognition, machine translation, and complex classification.
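As a minimal sketch of forward and backward propagation in a multilayer perceptron, the example below trains one hidden layer with a softmax output by plain gradient descent. The tanh hidden activation, the hidden-layer size, the learning rate, the XOR toy data, and all function names are assumptions made for illustration only.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X, params):
    W1, b1, W2, b2 = params
    H = np.tanh(X @ W1 + b1)   # hidden layer with nonlinear activation
    P = softmax(H @ W2 + b2)   # output layer: class probabilities
    return H, P

def cross_entropy(P, Y):
    return -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))

def backward(X, Y, H, P, params):
    W1, b1, W2, b2 = params
    dZ2 = (P - Y) / X.shape[0]       # gradient at the softmax input
    dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
    dZ1 = (dZ2 @ W2.T) * (1.0 - H ** 2)  # chain rule through tanh
    dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)
    return dW1, db1, dW2, db2

def train(X, Y, hidden=4, lr=0.1, steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    params = [rng.normal(0, 0.5, (X.shape[1], hidden)), np.zeros(hidden),
              rng.normal(0, 0.5, (hidden, Y.shape[1])), np.zeros(Y.shape[1])]
    losses = []
    for _ in range(steps):
        H, P = forward(X, params)
        losses.append(cross_entropy(P, Y))
        grads = backward(X, Y, H, P, params)
        # Weights self-adjust against the gradient of the loss.
        params = [p - lr * g for p, g in zip(params, grads)]
    return params, losses

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[1., 0.], [0., 1.], [0., 1.], [1., 0.]])  # one-hot XOR labels
params, losses = train(X, Y)
print("loss before/after training:", losses[0], losses[-1])
```

The hidden layer is what gives the network a chance on data such as XOR, which a single perceptron cannot separate; how well it fits depends on the initialisation and the training budget.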

Convolutional Neural Network Instead of a two-dimensional array, a CNN has a three-dimensional layout of neurons. The first layer is a convolutional layer, in which each neuron analyses data from only a small part of the image field. Input features are taken in batches, as through a filter. The network decodes images in chunks and can repeat these operations many times until the entire image has been processed. During processing, the image is converted from RGB to grayscale. Variations in pixel value then aid in the detection of edges, allowing images to be classified into several categories. Propagation is unidirectional where the CNN contains one or more convolutional layers followed by pooling, and bidirectional where the output of the convolution layer goes to a fully connected neural network for classification. As in the Multilayer Perceptron (MLP), the inputs are weighted and passed to the activation function. The convolutional layers perform convolution, while the MLP part employs a nonlinear activation function followed by softmax. CNNs produce excellent results in image and video recognition, semantic parsing, and paraphrase detection.

The advantages and disadvantages of this model are as follows: a) with fewer parameters than fully connected ANNs, it is used for deep learning, b) compared to a fully connected layer, there are fewer parameters to learn, c) it is comparatively complex to design and maintain, d) it is very slow (depending on the number of hidden layers). CNNs are applied in image processing, computer vision, speech recognition, and machine translation.
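The convolution and pooling steps can be sketched on a toy image as follows. The luminance weights used for the grayscale conversion, the hand-set vertical-edge kernel, and the function names are assumptions for illustration; in a trained CNN the filters are learned rather than fixed.

```python
import numpy as np

def rgb_to_gray(img):
    # Weighted sum of the R, G, B channels (commonly used luminance weights).
    return img @ np.array([0.299, 0.587, 0.114])

def conv2d(image, kernel):
    # "Valid" sliding-window filter. As in most CNN code the kernel is not
    # flipped, so strictly speaking this is a cross-correlation.
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output value depends only on a small patch of the image.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    # Keep the strongest response in each non-overlapping size x size block.
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    trimmed = fmap[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

# Toy 6x6 RGB image: left half black, right half white.
rgb = np.zeros((6, 6, 3))
rgb[:, 3:, :] = 1.0

gray = rgb_to_gray(rgb)
kernel = np.array([[-1., 0., 1.]] * 3)  # responds to dark-to-bright edges
fmap = conv2d(gray, kernel)
pooled = max_pool(fmap)
print(fmap)
```

The feature map is non-zero only in the columns where the window straddles the black-to-white boundary, which is how variations in pixel value expose edges.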

Recurrent Neural Network (RNN) As shown in Figure 2.9, an RNN takes the output of a layer and feeds it back to the input to help predict the layer's outcome.

The first layer is usually a feed-forward layer, followed by a recurrent layer in which a memory function retains some information from the previous time step. Forward propagation is used at this stage, and the network stores information that will be needed at later time steps. If the prediction is incorrect, the weights are adjusted in small steps governed by the learning rate, so that the backpropagation learning algorithm gradually increases the probability of making the correct prediction.
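The memory function can be sketched as a hidden state that is carried from one time step to the next. The tanh activation, the layer sizes, the random weights, and the function name are assumptions for illustration; training by backpropagation is omitted.

```python
import numpy as np

def rnn_forward(xs, Wx, Wh, b, h0=None):
    # xs: sequence of input vectors, one per time step.
    h = np.zeros(Wh.shape[0]) if h0 is None else h0
    states = []
    for x in xs:
        # The new state mixes the current input with the previous state,
        # so information from earlier time steps is carried forward.
        h = np.tanh(x @ Wx + h @ Wh + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
Wx = rng.normal(0, 0.5, (2, 3))  # input-to-hidden weights
Wh = rng.normal(0, 0.5, (3, 3))  # hidden-to-hidden (recurrent) weights
b = np.zeros(3)

xs = rng.normal(size=(5, 2))     # 5 time steps, 2 features each
states = rnn_forward(xs, Wx, Wh, b)

# Changing only the first input also changes the final hidden state.
xs_changed = xs.copy()
xs_changed[0] += 1.0
states_changed = rnn_forward(xs_changed, Wx, Wh, b)
print(states[-1], states_changed[-1])
```

The same recurrent weights are applied at every time step, and this repeated multiplication is where vanishing and exploding gradients originate during training.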

The advantages and disadvantages of this model are as follows: a) it models sequential data, in which each sample can be assumed to depend on the previous ones, b) it can be used in conjunction with convolutional layers to extend the effective pixel neighbourhood, c) it suffers from vanishing and exploding gradients, d) recurrent neural networks may be challenging to train, e) processing long sequences of data is difficult when the Rectified Linear Activation Function (ReLU) is used as the activation function. Text