Setup Implementation - Diagnosis and Prognosis of Occupational disorders based on Machine Learning Techniques applied to Occupational Profiles

categories: good (8–10) and poor (0–7). In this thesis, because the workers assigned to medical appointments were injured, the range 0–7 was taken into account. Accordingly, as in Table 4.4, which was built based on 4.1, each injured body part was weighted based on the severity of MN_used and SN_used. For example, one of the workers had the following medical restrictions:

• must not perform tasks that involve performing movements above the shoulder line both

• should not perform tasks that involve performing flexion/rotation movements of the trunk

• must not perform tasks that involve using tools with associated vibration

• should not perform tasks that involve left and right wrist rotation movements

• must not perform tasks that require force with the application point on the fingers of both hands

Categorization for each medical restriction starts with SN_used = 1 and MN-Must Not Use = 0.5. Scoring MN-Must Not Use: there were 2 MN-Must Not Use restrictions, so their scoring point is 2. This 2 has to be divided between the 2 sentences and their injured body parts. In this case, 0.5 goes to each of shoulder_R, shoulder_L, finger_L and finger_R. There are two SN-Should Not Use restrictions, which give a score of 5 to the injured body parts.

So, 2.5 should be divided between the first sentence and the second one. We have to score the trunk 1.66, and wrist_R and wrist_L 0.8 each. The scoring is presented in Table 4.3:

Table 4.3: Scoring to each body region(s) injuries.

          Trunk   Shoulder_L   Shoulder_R   Finger_L   Finger_R   Wrist_R   Wrist_L
worker    1.66    0.5          0.5          0.5        0.5        0.8       0.8
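The equal split used for the MN-Must Not Use restrictions can be sketched as follows. This is only a minimal illustration, assuming that the points are shared equally between the restrictions and then equally between the body parts named in each restriction. The function name `split_score` and the list layout are hypothetical and do not come from the thesis, and only the MN part of the example is reproduced.

```python
from collections import defaultdict


def split_score(total_points, restrictions):
    """Share total_points equally between restrictions, then between the
    body parts of each restriction (assumed equal-split rule)."""
    scores = defaultdict(float)
    per_restriction = total_points / len(restrictions)
    for body_parts in restrictions:
        per_part = per_restriction / len(body_parts)
        for part in body_parts:
            scores[part] += per_part
    return dict(scores)


# The two MN-Must Not Use restrictions of the example worker that name body parts.
mn_restrictions = [
    ["Shoulder_L", "Shoulder_R"],  # movements above the shoulder line
    ["Finger_L", "Finger_R"],      # force applied on the fingers of both hands
]

mn_scores = split_score(2, mn_restrictions)
print(mn_scores)
```

Under these assumptions each of the four body parts receives 0.5, which matches the MN entries of Table 4.3.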

Table 4.4: Statistical description of the injured body parts variables. The first, second and third quartiles are shown as 25%, 50% and 75% respectively.

             count   mean   std    min    25%    50%    75%    max
Shoulder_L   544     0.72   0.64   0.19   0.25   0.50   1.00   3.50
Shoulder_R   651     0.74   0.61   0.19   0.28   0.50   1.00   3.50
Elbow_L      186     1.05   0.76   0.19   0.00   0.00   0.00   3.50
Elbow_R      241     1.09   0.83   0.19   0.50   0.75   1.58   3.50
Wrist_L      231     1.16   0.84   0.25   0.50   1.00   1.58   3.60
Wrist_R      237     1.09   0.95   0.25   0.50   0.75   1.50   5.33
Finger_L     165     0.73   0.70   0.21   0.25   0.50   0.88   3.50
Finger_R     185     0.76   0.72   0.21   0.25   0.50   1.00   3.50
Knee_L       72      0.81   0.60   0.25   0.38   0.50   1.00   3.50
Knee_R       94      1.18   0.88   0.25   0.50   1.00   1.50   4.00
Foot_L       154     0.85   0.43   0.25   0.50   0.83   1.25   2.50
Foot_R       154     0.85   0.43   0.25   0.50   0.83   1.25   2.50
Trunk        292     1.73   1.09   0.38   0.67   1.50   2.75   3.50
Neck         103     1.23   0.98   0.38   0.50   0.67   1.65   3.50
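The columns of Table 4.4 are the usual summary statistics of one score variable. The sketch below shows how such a row could be computed with the Python standard library. It assumes the sample standard deviation and linearly interpolated quartiles, since the thesis does not state which tool produced the table. The function name `describe` and the example scores are hypothetical.

```python
import statistics


def describe(values):
    """Return count, mean, sample std, min, quartiles and max of a score list."""
    # method="inclusive" interpolates linearly between the observed values.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }


# Hypothetical scores of one body part; not taken from the thesis data set.
example_scores = [0.5, 0.5, 1.0, 1.5]
summary = describe(example_scores)
for name, value in summary.items():
    print(f"{name:>5}: {value:.2f}")
```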

from the Assembly area, 166 rules from the Body Construction area, 488 rules from the Special Projects area, 63 rules from the Quality Assurance area, 25 rules from the Paint area, and 103 rules from the Metal Stamping area (Figure 4.8). We only considered body parts labeled "MN-Must Not Use" so that we could detect the links between high-risk injuries. Also, we only list the rules with Confidence = 1 in the tables in this section.

In areas where many rules were discovered (Assembly, Body Construction, and Special Projects), we imposed stricter limits and listed only the stronger rules in the tables in this section. Figure 5.1 shows some of the rules extracted from each work area. In the following subsections, we evaluate the association rules mined from each area separately.
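The three measures used to filter the rules (support, confidence and lift) can be illustrated with a short sketch. It uses the standard definitions of these measures and is not the mining procedure of the thesis. The function name `rule_metrics` and the worker records are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_antecedent = sum(antecedent <= t for t in transactions)
    n_consequent = sum(consequent <= t for t in transactions)
    n_both = sum((antecedent | consequent) <= t for t in transactions)
    if n_antecedent == 0 or n_consequent == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = n_both / n
    confidence = n_both / n_antecedent
    lift = confidence / (n_consequent / n)
    return support, confidence, lift


# Hypothetical records: each set holds the body parts labeled MN for one worker.
records = [
    {"Shoulder_L", "Shoulder_R"},
    {"Shoulder_L", "Shoulder_R", "Trunk"},
    {"Trunk"},
    {"Wrist_R"},
]

support, confidence, lift = rule_metrics(records, {"Shoulder_L"}, {"Shoulder_R"})
# Mining thresholds of Figure 4.8, plus the Confidence = 1 filter used in the tables.
keep = support >= 0.01 and lift > 1 and confidence == 1
print(support, confidence, lift, keep)
```

In this toy example the rule Shoulder_L -> Shoulder_R holds for every worker with a left-shoulder restriction, so it would pass the Confidence = 1 filter.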

4.4.2 Supervised Learning Approach

We designed a pipeline using the Scikit-Learn package⁵ for each type of regression model. It consists of just two steps: in the first step, the data are normalized using a standard scaler from the same package; in the second step, the regressor performs the actual regression.

For the regression methods, we selected the following models, whose settings are presented below.
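A two-step pipeline of this kind can be sketched as follows. The example is a minimal illustration and not the code of the thesis: the synthetic data from `make_regression`, the step names "scaler" and "regressor", and the `random_state` values are assumptions made so that the snippet is self-contained and reproducible.

```python
from sklearn.datasets import make_regression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic data stands in for the thesis data set, which is not available here.
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

pipeline = Pipeline(steps=[
    ("scaler", StandardScaler()),                          # step 1: normalization
    ("regressor", DecisionTreeRegressor(random_state=0)),  # step 2: regression
])

pipeline.fit(X, y)
predictions = pipeline.predict(X[:5])
print(predictions)
```

Any of the regressors listed in this section could take the place of the decision tree in the second step, since the scaler is fitted and applied inside the pipeline.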

• Decision Tree Regressor: splitter = ’best’, which indicates the strategy used to choose the split at each node, min_samples_split = 2 as the minimum number of samples required to split an internal node, min_weight_fraction_leaf = 0.0 as the minimum weighted fraction of the total sample weights required at a leaf node (samples have equal weight when no sample weights are provided), min_samples_leaf = 1 so that at least 1 sample is required at a leaf node, ccp_alpha = 0.0 so that no cost-complexity pruning is performed.

⁵ https://scikit-learn.org/

Figure 4.8: Statistics of mined association rules from different areas with Supp = 0.01, Conf = 0.1 and Lift > 1.

• Random Forest Regressor: n_estimators = 100 as the number of trees in the forest, criterion = ’poisson’ as the function to measure the quality of a split, min_samples_split = 2, min_samples_leaf = 1, min_weight_fraction_leaf = 1, bootstrap = True to use bootstrap samples when building trees, oob_score = False for not using out-of-bag samples to estimate the generalization score, ccp_alpha = 0.0.

• Gradient Boosting Regressor: loss = ’squared_error’ as the loss function to be optimized, learning_rate = 0.1 as a way of shrinking the contribution of each tree, n_estimators = 100 as the number of boosting stages to perform, subsample = 1.0 defining the fraction of samples used for fitting the individual base learners, criterion = ’friedman_mse’ used to measure the quality of a split, min_samples_split = 2, min_samples_leaf = 1, max_depth = 3, min_impurity_decrease = 1, alpha = 0.9 as the alpha-quantile of the Huber loss function and the quantile loss function, warm_start = False to erase the previous solution instead of reusing the solution of the previous call to fit and adding more estimators to the ensemble, validation_fraction = 0.1 indicating the proportion of training data to set aside as a validation set for early stopping, tol = 1e-4 as the tolerance for early stopping.

• LightGBM regressor from the Python lightgbm package⁶: boosting_type = ’gbdt’ for selecting the traditional Gradient Boosting Decision Tree, num_leaves = 31 as the maximum number of tree leaves for base learners, max_depth = -1 to set no depth limit, learning_rate = 0.1, n_estimators = 100, subsample_for_bin = 200000, objective = ’regression’, min_child_samples = 20, which specifies the minimum number of data points needed in a child (leaf).

⁶ https://lightgbm.readthedocs.io/

• H2O Gradient Boosting Estimator from the Python h2o.estimators package⁷: ntrees = 50, max_depth = 5, min_rows = 10.0, nbins = 20, nbins_top_level = 1024, stopping_tolerance = 0.001, quantile_alpha = 0.5, col_sample_rate_per_tree = 1.0 as the column sample rate per tree.

• CatBoost Regressor from the Python catboost package⁸. The default, non-tuned version of this regressor, which we used for comparing the performance of all regression models, has the following parameter setup: iterations = 99, learning_rate = 0.05, grow_policy = ’Depthwise’.

• XGB Regressor from the Python xgboost package⁹: colsample_bytree = 1.0, gamma = 4.5, max_depth = 6, min_child_weight = 18.8, subsample = 1.0.

• ANN regressor: we use the TensorFlow Keras package¹⁰, which acts as an interface for the TensorFlow library¹¹, to define the layers of our model. The ANN model consists of 3 dense layers. The first layer has an input size equal to the number of features in each record of the data set, and its output size is equal to 20. The second layer has an input size equal to the output size of the previous layer and an output size of 10. The activation function of the first two layers is the hyperbolic tangent (TANH) function. A single-node dense layer forms the last layer of our ANN model and produces the final regression estimate. The other parameters of our designed model are as follows:

loss = ’MAE’, optimizer = ’sgd’, epochs = 500.
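The ANN architecture described in the last item can be sketched in Keras as follows. This is a minimal sketch under stated assumptions and not the code of the thesis: the number of features `n_features = 8` is hypothetical, and the loss is written with its full Keras name, "mean_absolute_error", for the MAE loss named above.

```python
from tensorflow import keras

n_features = 8  # hypothetical; the thesis uses the number of features per record

model = keras.Sequential([
    keras.Input(shape=(n_features,)),
    keras.layers.Dense(20, activation="tanh"),  # first dense layer, output size 20
    keras.layers.Dense(10, activation="tanh"),  # second dense layer, output size 10
    keras.layers.Dense(1),                      # single-node layer for the estimate
])

# MAE loss and SGD optimizer, as listed in the parameter setup above.
model.compile(loss="mean_absolute_error", optimizer="sgd")
model.summary()
```

Training with the listed setting would then amount to calling model.fit on the normalized features and the target with epochs = 500.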

⁷ https://docs.h2o.ai/

⁸ https://catboost.ai/

⁹ https://xgboost.readthedocs.io/

¹⁰ https://keras.io/

¹¹ https://www.tensorflow.org/

5 Design and construction Algorithms
