
2.5 Statistical modeling for feature classification

2.5.4 Performance evaluation of feature classification for damage detection

The performance evaluation of damage detection is a fundamental aspect of comparing models and methods. For the two-class problem (binary classification) in SHM, in which cases are labeled as damaged (positive, P) or undamaged (negative, N), and assuming a given threshold, there are four possible outcomes, as summarized in Table 1 and Figure 7. For a positive outcome, the case is either a true positive (TP), if the observed condition is positive, or a false positive (FP), if the observed condition is negative. On the other hand, for a negative outcome, the case is either a false negative (FN), if the observed condition is positive, or a true negative (TN), if the observed condition is negative. The shaded portion of Table 1 represents the confusion matrix (also known as contingency table), where the numbers along the main diagonal represent correct classifications and the off-diagonal numbers represent misclassifications, also known as Type I (FP) and Type II (FN) errors.

Table 1 – Accuracy of binary classification.

                        Observed
Outcome     Positive               Negative               Total
Positive    True Positive (TP)     False Positive (FP)    TP+FP
Negative    False Negative (FN)    True Negative (TN)     FN+TN
Total       TP+FN                  FP+TN                  TP+FN+FP+TN

Figure 7 – Distributions from the undamaged and damaged conditions. [Axes: Damage Indicator (DI) versus probability; one distribution for the undamaged condition and one for the damaged condition, with regions labeled TN, FP, FN and TP.]

Therefore, false indications of damage fall into two categories: (i) false-positive (indication of damage when none is present, Type I error) and (ii) false-negative (no indication of damage when damage is present, Type II error). Errors of the first type are undesirable, as they cause unnecessary downtime and consequent loss of revenue as well as loss of confidence in the monitoring system. More importantly, there are clear safety issues if misclassifications of the second type occur (FARRAR; WORDEN, 2007). Pattern recognition algorithms allow one to weigh one type of error above the other; this weighting may be one of the questions answered at the operational evaluation phase.

Receiver operating characteristic (ROC) curves also provide a comprehensive, graphical way to summarize the performance of different methods (BRADLEY, 1997). ROC curves were introduced in signal detection theory by electrical and radar engineers during World War II for detecting enemy objects on battlefields. Since then, they have become increasingly common in fields such as finance, atmospheric science, engineering and medicine. In machine learning, these curves have become a standard tool for evaluating the performance of binary classifiers.

ROC curves focus on the tradeoff between sensitivity and 1 − specificity. As shown in Figure 8, the sensitivity, also called the true-positive rate, TPR = TP/(TP+FN), is the fraction of damaged cases that are correctly detected. The quantity 1 − specificity, also called the false-positive rate, FPR = FP/(FP+TN), is the fraction of undamaged cases that raise a false alarm.
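To make the two rates concrete, the following minimal sketch counts the four outcomes of Table 1 for a list of damage indicators and a threshold, then evaluates TPR and FPR. The function names, the DI values and the rule "a case is flagged when its DI exceeds the threshold" are illustrative assumptions, not taken from the thesis.

```python
def confusion_counts(di_values, is_damaged, threshold):
    """Count TP, FP, FN, TN for DIs classified against a single threshold."""
    tp = fp = fn = tn = 0
    for di, damaged in zip(di_values, is_damaged):
        flagged = di > threshold  # assumed rule: positive outcome when DI exceeds threshold
        if flagged and damaged:
            tp += 1
        elif flagged and not damaged:
            fp += 1  # Type I error (false alarm)
        elif damaged:
            fn += 1  # Type II error (missed damage)
        else:
            tn += 1
    return tp, fp, fn, tn


def rates(tp, fp, fn, tn):
    """Return (TPR, FPR) with TPR = TP/(TP+FN) and FPR = FP/(FP+TN)."""
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical DIs: five undamaged cases followed by five damaged cases.
di_values = [0.2, 0.4, 0.3, 0.9, 0.5, 1.4, 0.7, 1.1, 1.8, 1.2]
is_damaged = [False] * 5 + [True] * 5

tp, fp, fn, tn = confusion_counts(di_values, is_damaged, threshold=0.8)
tpr, fpr = rates(tp, fp, fn, tn)
print(tp, fp, fn, tn, tpr, fpr)
```

With these made-up values, one undamaged case exceeds the threshold and one damaged case stays below it, so each error type occurs once.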

Each point on the ROC curve corresponds to a specific threshold, although the values of thresholds are not evident from the square plot. The diagonal line divides the ROC space into two parts and represents a classifier that performs random classifications. Any point in the upper-left triangle means that the classifier has some understanding of the classes. Moreover, the closer the ROC plot is to the upper-left corner, the higher the overall accuracy of the classifier. On the other hand, any point in the lower-right triangle means that the classifier is performing worse than random, i.e., the classifier has some underlying information about the classes but applies it in the opposite manner.
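The correspondence between thresholds and ROC points can be sketched as follows. Each candidate threshold yields one (FPR, TPR) pair, and sweeping the threshold from high to low traces the curve from (0, 0) to (1, 1). The function name and the DI values are hypothetical, and a case is assumed to be flagged when its DI exceeds the threshold.

```python
def roc_points(di_values, is_damaged):
    """Return the (FPR, TPR) pairs obtained by sweeping the threshold over the DIs."""
    n_pos = sum(is_damaged)
    n_neg = len(is_damaged) - n_pos
    # Candidate thresholds: each distinct DI in descending order, then minus infinity
    # so that the last point flags every case.
    thresholds = sorted(set(di_values), reverse=True) + [float("-inf")]
    points = []
    for c in thresholds:
        tp = sum(1 for di, d in zip(di_values, is_damaged) if di > c and d)
        fp = sum(1 for di, d in zip(di_values, is_damaged) if di > c and not d)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical DIs: five undamaged cases followed by five damaged cases.
di_values = [0.2, 0.4, 0.3, 0.9, 0.5, 1.4, 0.7, 1.1, 1.8, 1.2]
is_damaged = [False] * 5 + [True] * 5

points = roc_points(di_values, is_damaged)
for fpr, tpr in points:
    print(f"FPR={fpr:.1f}  TPR={tpr:.1f}")
```

For this toy data every point lies on or above the diagonal, which is the behavior described above for a classifier that has some understanding of the classes.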

Figure 8 – Example of a ROC curve; the diagonal line divides the ROC space into two parts and represents a classifier which performs random classifications. [Axes: False Alarm (FPR) on the horizontal axis and True Detection (TPR) on the vertical axis, both from 0 to 1; curves labeled No Predictive Value, Actual Test and Ideal Test.]

2.6 Challenges for statistical modeling for feature classification

In this chapter, an overview of the SPR paradigm for SHM solutions was presented. The challenges of the first two phases of this paradigm were discussed in terms of planning for the deployment and operation of SHM systems. In the feature extraction phase, the need for damage-sensitive features that are correlated with damage and completely uncorrelated with everything else was demonstrated, and the techniques used in this thesis were briefly highlighted. A major focus was placed on the fourth phase (statistical modeling for feature classification), because the mathematical formulations of the state-of-the-art machine learning algorithms for data normalization and of the statistical models for damage detection, as well as other important issues discussed in this phase, are the basis for understanding the novel methods proposed in Appendices A to F.

Finally, based on the major focus of this study, some of the challenges for statistical modeling for feature classification are listed below:

∙ The damage detection process is currently posed in the context of false-positive and false-negative indications of damage. This framing recognizes that a false-positive classification may have different consequences than a false-negative one. Analytical approaches to defining threshold levels must therefore balance the tradeoff between false-positive and false-negative indications of damage, minimizing false positives when economic concerns drive the SHM application and minimizing false negatives when life-safety issues motivate the deployment of the SHM system;

∙ Updating statistical models as new training data become available;

∙ Managing the massive volumes of data that will be produced hourly or daily by an online monitoring system;

∙ Learning the normal condition of a structure considering all normal variability (e.g., temperature and traffic loading) and yielding reliable results, i.e., estimating the undamaged model with a method that takes into account minimal loss of information and avoids dependence on the initial parameters;

∙ Choosing a method for a specific application as a function of the damage-sensitive features used, as well as of the distribution of these features when influenced by linear or nonlinear operational and environmental variability.
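The first challenge in the list, balancing false positives against false negatives when setting a threshold, can be illustrated with a small cost-weighted search. This is only a sketch under strong assumptions: labeled DIs from both conditions are available (rarely the case in practice), the costs `cost_fp` and `cost_fn` are hypothetical user-chosen weights, and the candidate thresholds are the observed DI values.

```python
def pick_threshold(di_values, is_damaged, cost_fp, cost_fn):
    """Return (threshold, cost) minimizing cost_fp * FP + cost_fn * FN.

    A case is flagged as damaged when its DI is strictly above the threshold.
    Ties are resolved in favor of the smallest candidate threshold.
    """
    best_c, best_cost = None, float("inf")
    for c in sorted(set(di_values)):
        fp = sum(1 for di, d in zip(di_values, is_damaged) if di > c and not d)
        fn = sum(1 for di, d in zip(di_values, is_damaged) if di <= c and d)
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_c, best_cost = c, cost
    return best_c, best_cost


# Hypothetical DIs: five undamaged cases followed by five damaged cases.
di_values = [0.2, 0.4, 0.3, 0.9, 0.5, 1.4, 0.7, 1.1, 1.8, 1.2]
is_damaged = [False] * 5 + [True] * 5

# Economic concerns dominate: false alarms are weighted five times heavier.
economic = pick_threshold(di_values, is_damaged, cost_fp=5, cost_fn=1)
# Life-safety concerns dominate: missed damage is weighted five times heavier.
safety = pick_threshold(di_values, is_damaged, cost_fp=1, cost_fn=5)
print(economic, safety)
```

In this toy case the economically driven choice settles on a higher threshold (fewer false alarms) and the safety-driven choice on a lower one (fewer missed detections).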

3 Summary of original work and discussion

This chapter summarizes the original methods proposed in this study for damage identification in SHM. First, the overall methodology for damage detection and quantification is presented, and important rules for accomplishing it are discussed. Second, the papers that compose this thesis are summarized. A brief comparison between the proposed methods is then performed on natural frequencies from the Z-24 Bridge. Finally, a list of publications in the context of this thesis is highlighted.

3.1 Methodology for damage detection and quantification

The overall methodology for damage detection and quantification used in all papers (Appendices A to F) is depicted in Figure 9. The flow chart shows that, after several vibration signals have been acquired, the feature extraction phase estimates the damage-sensitive features, which can be natural frequencies or parameters/residuals from an AR model. Afterwards, part of the undamaged features is used as training data to estimate the model and the threshold. Note that this selected part should cover all operational and environmental variability. The test phase is then accomplished by computing a DI (i.e., quantifying the damage) for any new feature, with support from the undamaged model estimated in the training phase. Finally, damage detection is performed by classifying the DI as undamaged or damaged according to the threshold.

A graphic representation of the expected results from the damage detection process is shown in Figure 10. In this case, the test matrix is composed of a block of undamaged observations followed by a block of damaged observations. After the test phase, the DIs computed from new features are classified according to a threshold with a given confidence level. The Type I errors are DIs that exceed the threshold value in the undamaged condition domain (the first block of test numbers). On the other hand, the Type II errors are DIs that do not surpass the threshold value in the damaged condition domain (the remaining test numbers). Additionally, note that the amplitude of the DIs makes it possible to quantify the damage in relative terms. This fact can be used to discover which test case is the most severe one, and to determine whether a method can remove operational and environmental variability in the training and test phases (for example, see Appendix A).
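The training and test phases described above can be sketched end to end. This is a deliberately minimal stand-in, not one of the methods proposed in the appendices. It assumes the undamaged model is just the mean feature vector, the DI is the Euclidean distance to that mean, and the threshold is an empirical percentile of the training DIs. All names and feature values (loosely imitating two natural frequencies) are hypothetical.

```python
import math


def train(features):
    """Learn a minimal undamaged model: the mean feature vector."""
    n_features = len(features[0])
    return [sum(row[k] for row in features) / len(features) for k in range(n_features)]


def damage_indicator(x, mean):
    """Euclidean distance between a feature vector and the learned mean."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, mean)))


def threshold_from_training(features, mean, confidence=0.95):
    """Ad hoc threshold: training DI at the given empirical confidence level."""
    dis = sorted(damage_indicator(x, mean) for x in features)
    idx = min(len(dis) - 1, math.ceil(confidence * len(dis)) - 1)
    return dis[idx]


# Training matrix: undamaged observations only (hypothetical values).
X = [[4.00, 5.00], [4.02, 5.01], [3.98, 4.99], [4.01, 5.02], [3.99, 4.98]]
# Test matrix: undamaged block first, then damaged block.
Z = [[4.01, 5.00], [3.99, 5.01], [4.03, 5.03],
     [3.90, 4.85], [3.95, 4.93], [3.99, 4.99]]
n_undamaged = 3  # size of the undamaged block in Z (assumed known for evaluation)

mean = train(X)
c = threshold_from_training(X, mean)
dis = [damage_indicator(z, mean) for z in Z]
flags = [di > c for di in dis]

type_i = sum(flags[:n_undamaged])                   # undamaged cases flagged as damaged
type_ii = sum(not f for f in flags[n_undamaged:])   # damaged cases not flagged
most_severe = max(range(len(dis)), key=dis.__getitem__)  # 0-based index of largest DI
print(type_i, type_ii, most_severe)
```

Here the third test observation produces a Type I error, the last one a Type II error, and the fourth has the largest DI, which marks it as the most severe case in relative terms.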

Figure 9 – Flow chart of the methodology for damage detection and quantification. [Flow chart blocks: START, data acquisition (structural response data), feature extraction, training data and test data, training phase (learned model, threshold), test phase (damage indicators), outlier detection, FINISH.]

Figure 10 – Graphic representation of the statistical modeling for feature classification and results related to damage detection and quantification. [Diagram elements: training data X (m × n) and test data Z (l × n) feeding a machine learning algorithm; damage indicators DI_j, j = 1, ..., l; outlier detection plot of DI versus test number with a threshold c, BC (test) and DC regions, and outliers.]

A wide range of algorithms can be employed in the training phase, test phase and threshold estimation; thus, many combinations of different algorithms are available to compose a method. To avoid infeasible combinations and to easily define the training data matrix and thresholds, some general rules applied in all papers are enumerated below.

1. The training data matrix should cover all operational and environmental variability under which the structure of interest operates. Therefore, the machine learning algorithms were assumed to require training with almost one year of baseline data, covering one seasonal cycle and thus the temperature and humidity effects of winter and summer;

2. If the training phase is performed by kernel- or PCA-based methods, there are three possible results from this procedure, depending on the availability of the residual matrix E during the test phase (see subsections 2.5.2 and 2.5.3):

a) If a method explicitly computes the matrix E, one can either estimate an ad hoc threshold and compute the DIs with the Euclidean distance, or estimate a threshold based on the central Chi-square hypothesis and compute the DIs with the MSD;

b) If a method implicitly estimates the matrix E, one can estimate an ad hoc threshold and compute the DIs with the Euclidean distance;

c) If a method does not estimate the matrix E, one can estimate an ad hoc threshold and compute the DIs with the Euclidean distance or MSD (depending on the undamaged model available from the training phase);

3. When the training phase or data normalization is performed by clustering, there are two possible results from this procedure: only a mean vector (also known as centroid) for each cluster, or a mean vector and a covariance matrix for each cluster;

a) For the first case, one should estimate an ad hoc threshold and compute the DIs with the Euclidean distance;

b) For the second case, one should estimate a threshold based on the central Chi-square hypothesis and compute the DIs with the MSD. Note that the second case includes the first one when only the centroids are used.

In terms of correspondence with the appended papers, all cases from rule 2 apply to the methods proposed in Appendix A. The first case from rule 3 applies to the methods proposed in Appendices B to F. The second case from rule 3 applies only to the methods proposed in Appendices C and D.
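The two cases of rule 3 can be contrasted with a small sketch, reading MSD as the Mahalanobis squared distance. It assumes a single cluster with two features, so the 2 × 2 covariance matrix can be inverted by hand. For two degrees of freedom the central Chi-square quantile has the closed form −2 ln(α), which avoids any statistics library. The cluster parameters, the test vectors and the significance level α = 0.05 are hypothetical.

```python
import math


def euclidean_di(z, mean):
    """Rule 3a: DI from the centroid only."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z, mean)))


def msd_di(z, mean, cov):
    """Rule 3b: Mahalanobis squared distance for a 2-feature vector."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # inverse of the 2x2 covariance
    dx = [z[0] - mean[0], z[1] - mean[1]]
    return (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))


def chi2_threshold_2dof(alpha):
    """Upper-alpha quantile of a central Chi-square with 2 degrees of freedom."""
    return -2.0 * math.log(alpha)


mean = [4.0, 5.0]                             # hypothetical cluster centroid
cov = [[0.0004, 0.0002], [0.0002, 0.0004]]    # hypothetical cluster covariance
z_along = [4.03, 5.03]     # deviation along the direction of normal variability
z_across = [4.03, 4.97]    # deviation across that direction

threshold = chi2_threshold_2dof(alpha=0.05)
msd_along = msd_di(z_along, mean, cov)
msd_across = msd_di(z_across, mean, cov)
print(euclidean_di(z_along, mean), euclidean_di(z_across, mean))
print(msd_along, msd_across, threshold)
```

Both test vectors lie at the same Euclidean distance from the centroid, yet only the second exceeds the Chi-square threshold, because the covariance matrix encodes which directions of variation are normal for the cluster.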