Classification of Clinical Data using Sequential Patterns: A case study in Amyotrophic Lateral Sclerosis.


André V. Carreiro*†, Susana Pinto‡, Mamede de Carvalho‡, Sara C. Madeira*†, Cláudia Antunes*

Abstract

Until recently, knowledge discovery was restricted to static analysis, disregarding any temporal or sequential relations within the data. In the last decade, temporal data mining has become a hot research topic, looking for those temporal dependencies and unveiling new insights in various areas of interest, including bioinformatics. Sequential pattern mining pursues these goals by finding frequent patterns within a population and returning them to the user. However, to our knowledge, its use as the basis for a direct classification problem with clinical data has never been studied. Hence, this work uses discovered sequential patterns as features for standard classifiers, using a clinical dataset obtained from Amyotrophic Lateral Sclerosis (ALS) patients. The preliminary results are very promising, achieving a prediction accuracy over 83% with a very reduced set of features, both from original data and sequential patterns.

Future work includes advancing from a classification problem to prognosis prediction.

1 Introduction

Knowledge discovery has been applied for many decades, with the goal of extracting useful hidden information from data. Nonetheless, until recently, most approaches in this area were limited to static analysis, disregarding possible (and sometimes clear) temporal dependencies within the data. In recent decades, however, significant advances have been made in this regard, resulting in several techniques for temporal mining [1, 2, 3, 4, 5, 6].

There are many applications of temporal mining in real-life problems [7], ranging from market analysis [1], telecommunications [5] and multimedia [8], to the evolution of financial data [6], genetics [5] and healthcare, including treatment effectiveness [2, 3], prognosis prediction and diagnosis support [3, 4]. These last domains are particularly important, since a faster response to disease progression, or even a response prior to its onset, is of great value to patients, their families and healthcare providers.

* Instituto Superior Técnico (IST), Technical University of Lisbon, Portugal

† Knowledge Discovery and Bioinformatics (KDBIO) Group, INESC-ID, Portugal

‡ Neuromuscular Unit, Institute of Molecular Medicine and Faculty of Medicine, University of Lisbon, Portugal

In this work we focus on sequential pattern mining (SPM), and its application in medical problems.

We use a clinical dataset from Amyotrophic Lateral Sclerosis (ALS) patients with the same goal as Amaral [9], where standard machine learning methods are used to predict the need for non-invasive ventilation (NIV), but without sequential patterns (SPs) as features.

This paper is organized as follows: first, the related work is presented, including some techniques for SPM.

Next, we describe the dataset used and present the main results from the application of SPM to classification, discussing the main conclusions and future work.

2 Background

2.1 Sequential Pattern Mining

In this context, a sequence is an ordered list of sets of elements called items, and the sum of the number of items in each set corresponds to the sequence length. The sets of items are named itemsets, also called transactions due to the application of SPM to market data, or evaluations (a batch of exams within a time window, corresponding to a time point) in the clinical context. Allowing a gap in sequence matching, a sequence $a = \langle a_1 a_2 \ldots a_n \rangle$ is a $\delta$-distance subsequence of another sequence $b = \langle b_1 b_2 \ldots b_m \rangle$ if there are integers $i_1 < i_2 < \ldots < i_n$ such that $a_1 \subseteq b_{i_1}, a_2 \subseteq b_{i_2}, \ldots, a_n \subseteq b_{i_n}$ and $i_k - i_{k-1} \le \delta$ for all $k \in \{2, \ldots, n\}$; we then write $a \subseteq b$ [12]. Note that a 1-distance subsequence corresponds to a contiguous subsequence.
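The δ-distance subsequence test can be sketched in Python as follows. This is a minimal illustration, assuming sequences are lists of itemsets stored as `frozenset`s of item strings; the function name `is_delta_subsequence` is hypothetical, and the search is a plain backtracking check rather than the matching procedure used in [12].

```python
from typing import FrozenSet, Sequence

Itemset = FrozenSet[str]


def is_delta_subsequence(a: Sequence[Itemset], b: Sequence[Itemset], delta: int) -> bool:
    """Return True if `a` is a delta-distance subsequence of `b`.

    Each a[k] must be a subset of some b[i_k], with increasing positions and
    consecutive positions at most `delta` apart (delta = 1 means contiguous).
    """
    n, m = len(a), len(b)
    if n == 0:
        return True

    def match_from(k: int, prev: int) -> bool:
        # Place a[k] at a position after `prev` that respects the gap limit.
        if k == n:
            return True
        for i in range(prev + 1, min(m, prev + delta + 1)):
            if a[k] <= b[i] and match_from(k + 1, i):
                return True
        return False

    # The first itemset may match anywhere; only consecutive matches are constrained.
    return any(a[0] <= b[i] and match_from(1, i) for i in range(m))


x, y, z = frozenset({"x"}), frozenset({"y"}), frozenset({"z"})
print(is_delta_subsequence([x, z], [x, y, z], delta=1))  # False: x and z are 2 positions apart
print(is_delta_subsequence([x, z], [x, y, z], delta=2))  # True
```

The backtracking is exponential in the worst case, which is acceptable for a definition check but not for mining.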

Given a database of sequences, $D$, and a minimum support threshold, $\sigma$, a sequence is considered frequent if it is a subsequence of (or is supported by) at least $\sigma$ sequences in $D$. In practice, this threshold is usually expressed as a proportion of the total number of sequences in $D$. Finally, a sequential pattern is a frequent maximal sequence, and the goal of SPM is to find all the SPs [10, 11, 12].
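Support counting with a proportional threshold can be illustrated with a small sketch. It assumes the same itemset representation and ignores gap constraints (a pattern only needs to appear in order, with any gap); the names `contains`, `support` and `is_frequent`, as well as the items in the toy database, are illustrative only.

```python
from typing import FrozenSet, Sequence

Itemset = FrozenSet[str]


def contains(b: Sequence[Itemset], a: Sequence[Itemset]) -> bool:
    """True if pattern `a` is a subsequence of `b` (no gap limit).

    Greedily matching each itemset of `a` at the earliest possible position
    is sufficient when gaps are unconstrained.
    """
    i = 0
    for itemset in a:
        while i < len(b) and not itemset <= b[i]:
            i += 1
        if i == len(b):
            return False
        i += 1  # the next itemset of `a` must match strictly later
    return True


def support(db: Sequence[Sequence[Itemset]], a: Sequence[Itemset]) -> int:
    """Number of sequences (patients) in the database that contain `a`."""
    return sum(contains(b, a) for b in db)


def is_frequent(db: Sequence[Sequence[Itemset]], a: Sequence[Itemset], min_sup: float) -> bool:
    """`min_sup` is expressed as a proportion of |D|."""
    return support(db, a) >= min_sup * len(db)


f = frozenset
db = [
    [f({"MND=2"}), f({"R<11"})],
    [f({"MND=2", "R<11"})],
    [f({"MND=2"}), f({"MIP<60"}), f({"R<11"})],
]
pattern = [f({"MND=2"}), f({"R<11"})]
print(support(db, pattern))  # 2
print(is_frequent(db, pattern, 0.5), is_frequent(db, pattern, 0.7))  # True False
```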

2.1.1 Apriori-Based Methods

Generalized Sequential Patterns (GSP) [10] was one of the first approaches for mining sequential patterns. It is based on the Apriori method for mining association rules [11], and the idea is to generate candidate sequences and test them. The main difference from Apriori lies in the candidate generation method: two patterns are joined to form a new candidate whenever the maximal prefix of one is equal to the maximal suffix of the other. Additionally, GSP introduced a set of features respecting the ordered nature of the data, including gap constraints.

These constraints are useful for limiting the number of patterns, by only considering $\delta$-distance subsequences with a fixed value of $\delta$.
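The GSP join step can be sketched as below, under simplifying assumptions: each element of a pattern is a single item (so a pattern is a tuple of strings) and no gap constraints are applied. Following the Apriori principle, a candidate is also discarded if any subsequence obtained by deleting one of its elements is not frequent. The name `gsp_candidates` is hypothetical.

```python
from typing import Set, Tuple

Pattern = Tuple[str, ...]


def gsp_candidates(frequent_k: Set[Pattern]) -> Set[Pattern]:
    """Generate (k+1)-candidates from frequent k-patterns (single-item elements)."""
    candidates: Set[Pattern] = set()
    for s1 in frequent_k:
        for s2 in frequent_k:
            # Join when s1 without its first element equals s2 without its last element.
            if s1[1:] == s2[:-1]:
                cand = s1 + s2[-1:]
                # Apriori pruning: every k-subsequence of the candidate must be frequent.
                if all(cand[:i] + cand[i + 1:] in frequent_k for i in range(len(cand))):
                    candidates.add(cand)
    return candidates


print(gsp_candidates({("a", "b"), ("b", "c"), ("a", "c")}))  # {('a', 'b', 'c')}
print(gsp_candidates({("a", "b"), ("b", "c")}))  # set(): ('a', 'c') is not frequent
```

With gap constraints, GSP restricts the pruning check to contiguous subsequences, which this sketch does not model.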

2.1.2 Pattern-Growth Methods

These methods were developed more recently with the goal of avoiding the candidate generation step completely, while focusing the search on a reduced set of the initial database [12].

One such algorithm is PrefixSpan [13], based on a recursive construction of the SPs. This is possible by means of projected databases: an $\alpha$-projected database is the set of subsequences in the database that are suffixes of sequences with prefix $\alpha$, which can be simply an itemset or a subsequence. To account for gap constraints, Generalized PrefixSpan (GenPrefixSpan) was created [12]. This generalization redefines the construction of the projected database: instead of restricting the search to the first occurrence of the element, every occurrence of the element is considered. Because of its performance and its support for gap constraints, GenPrefixSpan was chosen for this work.
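A compact sketch of pattern growth with gap constraints, in the spirit of GenPrefixSpan, is shown below. It simplifies each transaction to a single item and represents a projected database as the list of every end position of the current prefix in each sequence, which is what makes the gap check possible. The function name and this representation are assumptions for illustration, not the published implementation of [12].

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Pattern = Tuple[str, ...]


def gen_prefixspan(db: Sequence[Sequence[str]], min_count: int, delta: int) -> Dict[Pattern, int]:
    """Frequent patterns whose consecutive elements are at most `delta` positions apart.

    Simplification: each transaction holds a single item. The projected database
    of a prefix keeps EVERY occurrence (sequence id, end position), not only the first.
    """
    results: Dict[Pattern, int] = {}

    def grow(prefix: Pattern, occurrences: List[Tuple[int, int]]) -> None:
        extensions: Dict[str, Set[Tuple[int, int]]] = defaultdict(set)
        for sid, pos in occurrences:
            seq = db[sid]
            # Only positions within the allowed gap can extend this occurrence.
            for nxt in range(pos + 1, min(len(seq), pos + delta + 1)):
                extensions[seq[nxt]].add((sid, nxt))
        for item, occ in sorted(extensions.items()):
            count = len({sid for sid, _ in occ})  # support counts sequences, not occurrences
            if count >= min_count:
                new_prefix = prefix + (item,)
                results[new_prefix] = count
                grow(new_prefix, sorted(occ))

    # Start from every occurrence of every single item.
    initial: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for sid, seq in enumerate(db):
        for pos, item in enumerate(seq):
            initial[item].append((sid, pos))
    for item, occ in sorted(initial.items()):
        count = len({sid for sid, _ in occ})
        if count >= min_count:
            results[(item,)] = count
            grow((item,), occ)
    return results


db = [["a", "b", "c"], ["a", "c", "b"], ["a", "b"]]
print(gen_prefixspan(db, min_count=2, delta=1))  # contiguous patterns only
print(gen_prefixspan(db, min_count=2, delta=2))  # also finds ('a', 'c')
```

Keeping all occurrences matters: with delta = 2, the second sequence supports ('a', 'b') only through the occurrence of b at its last position.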

2.2 Pruning Sequential Patterns

The number of found SPs can exceed several million for lower minimum support thresholds. This quantity is impractical to analyse, whether for interpretation or for classification purposes, which leads to the need to reduce the set of found SPs.

Considering the subsequent classification problem, the first reduction is to remove the trivial SPs, which contain only a single transaction with a single item (a single evaluation of a single feature) and therefore carry no sequential information nor any grouping of features.

A possible method of further pruning is based on a minimum improvement criterion [14]. The idea, shared with other methods, is to prune the SPs which are not significantly more informative than other returned SPs.

More specifically, for a pattern $p$, if the following criterion is met for any (proper) supersequence $q$ of $p$, then $p$ is excluded:

$\forall q, p: \; p \subsetneq q \wedge \mathrm{support}(p) \le \mathrm{support}(q) + minImp \Rightarrow \text{exclude } p$

Note that $\mathrm{support}(p)$ represents the number of sequences (or patients) in the database that contain the subsequence $p$. Interestingly, for $minImp = 0$ the algorithm returns exactly the set of closed SPs, whereas $minImp = \infty$ returns exactly the set of maximal SPs (theorems and proofs in [14]).
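The minimum improvement criterion can be illustrated with a direct, quadratic-time sketch over a dictionary mapping each frequent pattern to its support. As simplifying assumptions, patterns are tuples of single items and `prune_min_improvement` is a hypothetical name; the toy example shows that minImp = 0 keeps the closed patterns and minImp = infinity keeps only the maximal ones.

```python
import math
from typing import Dict, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(p: Pattern, q: Pattern) -> bool:
    """True if p occurs in q in order (consuming one iterator preserves order)."""
    it = iter(q)
    return all(x in it for x in p)


def prune_min_improvement(patterns: Dict[Pattern, int], min_imp: float) -> Dict[Pattern, int]:
    """Exclude p if some proper supersequence q has support(p) <= support(q) + min_imp."""
    kept = {}
    for p, sup_p in patterns.items():
        dominated = any(
            q != p and is_subsequence(p, q) and sup_p <= sup_q + min_imp
            for q, sup_q in patterns.items()
        )
        if not dominated:
            kept[p] = sup_p
    return kept


frequent = {("a",): 5, ("b",): 4, ("c",): 2, ("a", "b"): 4,
            ("a", "c"): 2, ("b", "c"): 2, ("a", "b", "c"): 2}
print(sorted(prune_min_improvement(frequent, 0)))         # closed patterns
print(sorted(prune_min_improvement(frequent, math.inf)))  # maximal patterns
```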

At this point, it is worth discussing these two types of SPs. Maximal SPs [1] are frequent patterns that are not a subsequence of any other SP. Their major limitation is that many highly supported (and possibly crucial) patterns are excluded from the result set. Another approach is to mine closed SPs, which are not a subsequence of any other pattern with exactly the same support. One of the main advantages of closed patterns is that all frequent patterns (and their respective supports) can be generated from them, thus forming a condensed representation of the result set. Nonetheless, mining only closed patterns may lead to an almost unnoticeable reduction, especially for sparse datasets [14].

2.3 Related Work of SPM on clinical and biological data

In 1970, Lasker [15] studied sequences of symptoms for a set of patients and was able to identify some of the diseases based on SP recognition. In the beginning of the last decade, Ramirez et al. [16] applied temporal pattern discovery to an HIV dataset.

The main goal was to determine whether people with the same chronic illness presented a similar experience across evaluations. Although the authors use a Decision Tree classifier to determine the health status of a patient, the goal is still to mine significant patterns, not to use the patterns to classify the health status. In a similar context, Lin et al. [17] presented a new SPM technique aimed at discovering time-dependency patterns in the clinical pathways of brain stroke patients. With information on such patterns, healthcare procedures for new patients can, in turn, be more effective and efficient.

Concaro et al. [18], in 2007, applied SPM to unveil association patterns of diagnoses shared by hospitals in the USA. The identified SPs can provide information about the most frequent healthcare episodes in the country, and even expose temporal precedence between diseases, suggesting a possible causal relationship.

More recently, Choi et al. [19] studied the possibility of predicting a patient’s revisit to a public health center, and applied SPM to anticipate diseases of patients who do revisit such centers. Some of the found SPs among the diseases of revisiting patients can provide a better and earlier understanding of foreseeable diseases, as well as adequate precautionary measures.

Tseng and Lee [20] introduced a method called Classifying-By-Sequence (CBS), where probabilistic induction is used to extract inherent SPs that can then be efficiently used for classification, although this was tested only on synthetic time series data.

Nonetheless, to our knowledge, classification of clinical data using SPM has never been studied. It is crucial to improve the efficiency of healthcare providers, and to develop and improve techniques that unveil increasingly specific temporal relations (between patients, symptoms, etc.), in order to achieve both a faster diagnosis and a more confident prognosis.

3 Methods

In this section, the dataset used (and its preprocessing) is described. Then, we discuss the classification strategies.

3.1 Amyotrophic Lateral Sclerosis Dataset

This dataset consists of clinical and laboratory data. Following the work of [9], with expert validation, it comprises 29 different features (excluding the patient identification code) for 506 patients, resulting in 2694 evaluations (or transactions) in total.

The distribution of the number of transactions per patient is shown in Figure 1 (left). The ALS dataset presents an average of 5.3 transactions per sequence (or temporal evaluations per patient), ranging from 1 to 22 time points in length, which suggests that this dataset is appropriate for SPM, since many of its time series are relatively long. Figure 1 (right) shows the distribution of the mean number of items per patient evaluation (or transaction), which approximately follows a normal curve; on average, 18 different features were assessed.
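The averages reported above can be checked against the bar heights of Figure 1. The counts below are read from the figure; for the right panel, each bar is assumed to correspond to one integer value from 9 to 29, so that estimate is only approximate. The script is a plain arithmetic check.

```python
# Figure 1 (left): number of patients with k transactions, k = 1..22
left = [69, 63, 70, 55, 54, 45, 44, 18, 19, 14, 13, 9, 8, 8, 3, 4, 2, 2, 3, 1, 1, 1]
transactions_per_patient = dict(zip(range(1, 23), left))

n_patients = sum(transactions_per_patient.values())
n_transactions = sum(k * c for k, c in transactions_per_patient.items())
print(n_patients, n_transactions, round(n_transactions / n_patients, 2))  # 506 2694 5.32

# Figure 1 (right): number of patients per (binned) mean number of items, 9..29
right = [1, 2, 2, 4, 10, 21, 41, 53, 75, 66, 71, 42, 37, 28, 23, 15, 4, 7, 0, 3, 1]
items_per_evaluation = dict(zip(range(9, 30), right))
approx_mean_items = sum(v * c for v, c in items_per_evaluation.items()) / sum(right)
print(round(approx_mean_items, 1))  # 18.4
```

Both totals (506 patients, 2694 evaluations) agree with the dataset description, which supports this reading of the bars.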

3.2 Data Preprocessing

One of the best-known applications of SPM is market basket data, where a sequence can represent, for instance, the list of products bought by a customer:

(Book1; Book2)(Book3; Reading Glasses)

which means that the customer is associated with two transactions: two books and, later, a pair of reading glasses with a third book. To apply such methods to our clinical datasets, the data have to go through some transformations so that the representation resembles a registry of transactions. Hence, the first step was to discretize the data, according to expert guidelines already studied in [9]. This is crucial in SPM, since it can only be applied to categorical data.

Then, all features were binarized, so that the end result consists of sequences such as

$(Att_1 = Val_{11}; Att_2 = Val_{21})(Att_1 = Val_{12})$

where $Val_{ij}$ is the $j$-th value of the $i$-th attribute.
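A minimal sketch of the discretization and binarization steps is given below. The cut-points (R < 11, MIP < 60) are borrowed from the patterns in Table 1 purely for illustration and are an assumption; the actual expert guidelines are those of [9]. Nominal attributes such as MND are passed through as Att=Val items, and missing values simply produce no item.

```python
from typing import Dict, FrozenSet, List, Mapping, Optional, Union

Value = Union[int, float, str]

# Hypothetical cut-points, for illustration only (not the expert guidelines of [9]).
CUT_POINTS: Dict[str, float] = {"R": 11, "MIP": 60}


def to_item(attr: str, value: Value, cut_points: Mapping[str, float]) -> str:
    """Map one attribute value to a binary item such as 'R<11' or 'MND=2'."""
    if attr in cut_points:
        threshold = cut_points[attr]
        return f"{attr}<{threshold}" if value < threshold else f"{attr}>={threshold}"
    return f"{attr}={value}"  # attribute is already categorical


def to_sequence(evaluations: List[Mapping[str, Optional[Value]]],
                cut_points: Mapping[str, float] = CUT_POINTS) -> List[FrozenSet[str]]:
    """Turn a patient's time-ordered evaluations into a sequence of itemsets."""
    return [
        frozenset(to_item(a, v, cut_points) for a, v in ev.items() if v is not None)
        for ev in evaluations
    ]


patient = [{"MND": 2, "R": 12, "MIP": 55}, {"MND": 2, "R": 9, "MIP": None}]
print(to_sequence(patient))
```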

3.3 Classification

The main goal of this work is to perform classification on clinical datasets, using SPM to achieve the predictions. First, however, we must define which feature will be used as the class. In this case, there is a feature associated with the patient's need for NIV. The class thus indicates whether a patient, at some point during the evaluations, requires NIV or not. The class distribution is as follows:

228 patients never needed NIV throughout their evaluations (45%), and 278 patients need NIV in the final evaluation (55%, considered the baseline for performance).
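The class definition and the majority-class baseline can be written down as follows. The item name `NIV=1` is a hypothetical encoding of the NIV feature after binarization; the baseline is simply the proportion of the majority class.

```python
from typing import FrozenSet, Sequence


def needs_niv(sequence: Sequence[FrozenSet[str]], niv_item: str = "NIV=1") -> int:
    """Class label: 1 if the patient requires NIV at some evaluation, else 0."""
    return int(any(niv_item in itemset for itemset in sequence))


n_niv, n_no_niv = 278, 228  # class counts reported above
baseline = max(n_niv, n_no_niv) / (n_niv + n_no_niv)
print(f"{baseline:.1%}")  # 54.9%
```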

This is a simple classification task, leaving further problems such as prognosis for future work. The idea is to start by using SPs as features for standard classifiers, comparing the performance with the case where the original features are used, and then checking whether a combination of the two types of data brings some improvement. The standard classifiers used (initially with default parameters) are from the machine learning package Weka, available at http://www.cs.waikato.ac.nz. These include Decision Tree (DT) J48, k-Nearest Neighbor (KNN), Support Vector Machines (SVM) with the SMO implementation, and Radial Basis Function (RBF) Network. A 5x10-fold cross-validation scheme is used, with random seeds {1, 11, 21, 31, 41}. For performance evaluation, we collect several metrics, such as prediction accuracy, confusion matrix, kappa statistic, F-measure, precision and recall.
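The 5x10-fold cross-validation protocol can be sketched in pure Python as below. Weka performs its own randomization, so these folds will not coincide with Weka's; the sketch only illustrates the scheme, with stratified folds (an assumption here) and one repetition per seed in {1, 11, 21, 31, 41}. The function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def stratified_folds(labels: Sequence[int], k: int, seed: int) -> List[List[int]]:
    """Split sample indices into k folds, keeping class proportions roughly equal."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    position = 0
    for cls in sorted(by_class):
        indices = by_class[cls][:]
        rng.shuffle(indices)
        for idx in indices:  # deal indices round-robin across folds
            folds[position % k].append(idx)
            position += 1
    return folds


def repeated_cv_splits(labels: Sequence[int], k: int = 10,
                       seeds: Sequence[int] = (1, 11, 21, 31, 41)) -> List[Tuple[List[int], List[int]]]:
    """Return (train, test) index lists: k folds for each seed."""
    splits = []
    for seed in seeds:
        for test in stratified_folds(labels, k, seed):
            test_set = set(test)
            train = [i for i in range(len(labels)) if i not in test_set]
            splits.append((train, sorted(test)))
    return splits


labels = [1] * 278 + [0] * 228  # NIV / no NIV, as in the ALS dataset
splits = repeated_cv_splits(labels)
print(len(splits))  # 50
```

Reported accuracies would then be averaged over the 50 test folds.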

3.3.1 Using Original features after temporal aggregation

Since the original dataset consists of different rows corresponding to the different evaluations of each patient, the representation had to be transformed so that each patient's information was included in a single row, for comparison purposes (and for use in the enrichment phase). Permanent features, such as demographic data, were maintained. The variable features were handled with two different approaches. The first consisted of analysing the nominal variation between the initial and final evaluations, assigning a symbol accordingly: {U - Up, D - Down, N - No change}.

The second approach was based on calculating the variation between initial and final records, and considering the number of evaluations to get a (normalized) slope per time point. The resulting matrix is given as input to standard classifiers. An example of such transformations can be found in Figure 2.
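The two aggregation schemes can be reproduced on the example of Figure 2. Dividing the first-to-last change by the number of evaluations is inferred from the numbers in that figure (for ALS-FRS, (8 - 30) / 3 = -7.333); the function names are illustrative.

```python
from typing import Sequence


def nominal_variation(values: Sequence[float]) -> str:
    """'U' (up), 'D' (down) or 'N' (no change) between first and last evaluation."""
    first, last = values[0], values[-1]
    if last > first:
        return "U"
    if last < first:
        return "D"
    return "N"


def slope(values: Sequence[float]) -> float:
    """Change between first and last record, normalized by the number of evaluations."""
    return (values[-1] - values[0]) / len(values)


# Variable features of the patient in Figure 2, over three evaluations
als_frs = [30, 30, 8]
r_score = [11, 12, 12]
print(nominal_variation(als_frs), nominal_variation(r_score))  # D U
print(round(slope(als_frs), 3), round(slope(r_score), 3))      # -7.333 0.333
```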

3.3.2 Using Sequential Patterns

This approach consists of using the found SPs to build a binary matrix, M, with P rows and N columns (respectively, the number of patients and of SPs), where $M_{ij} = 1$ if and only if patient i supports pattern j. Matrix M is then used as input to the standard classifiers mentioned above, and each patient is classified as requiring NIV or not. It is important to stress that any SP containing the NIV attribute is removed from the analysis, since it would directly reveal the class.

[Figure 1 (plot data). Left, "Distribution of Transactions per Patient": number of patients with 1, 2, ..., 22 transactions: 69, 63, 70, 55, 54, 45, 44, 18, 19, 14, 13, 9, 8, 8, 3, 4, 2, 2, 3, 1, 1, 1. Right, "Distribution of Mean Number of Items per Patient": number of patients with a mean of 9, 10, ..., 29 items: 1, 2, 2, 4, 10, 21, 41, 53, 75, 66, 71, 42, 37, 28, 23, 15, 4, 7, 0, 3, 1.]

Figure 1: Left - Distribution of the number of transactions (temporal evaluations) per patient for the ALS dataset. Right - Distribution of the mean number of items per patient for the ALS dataset. The mean number corresponds to the average size (number of items) of all of a patient's transactions.

Figure 2 (table):
                     Gender  BMI    Age at onset  ALS-FRS  R
Original data        2       20.83  67            30       11
                     2       20.83  67            30       12
                     2       20.83  67            8        12
Nominal variation    2       20.83  67            D        U
Slope                2       20.83  67            -7.333   0.333

Figure 2: Example of transformation of the original data: nominal variation and numerical slope. Gender = 2 is Female; BMI stands for Body Mass Index (at first visit); ALS-FRS is the Functional Rating Scale and R is a parameter related to this scale.
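Building the binary matrix M can be sketched as follows. For illustration, patterns are matched without a gap limit, and an SP is considered to contain the NIV attribute when any of its items starts with the prefix "NIV"; both the helper names and that item-naming convention are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

Itemset = FrozenSet[str]
Seq = Sequence[Itemset]


def contains(seq: Seq, pattern: Seq) -> bool:
    """Greedy, gap-unconstrained subsequence test over itemsets."""
    i = 0
    for itemset in pattern:
        while i < len(seq) and not itemset <= seq[i]:
            i += 1
        if i == len(seq):
            return False
        i += 1
    return True


def build_matrix(patients: List[Seq], patterns: List[Seq],
                 class_prefix: str = "NIV") -> Tuple[List[Seq], List[List[int]]]:
    """Drop SPs mentioning the class attribute, then set M[i][j] = 1 iff patient i supports SP j."""
    kept = [p for p in patterns
            if not any(item.startswith(class_prefix) for itemset in p for item in itemset)]
    matrix = [[int(contains(seq, p)) for p in kept] for seq in patients]
    return kept, matrix


f = frozenset
patients = [[f({"MND=2"}), f({"R<11"})], [f({"MND=2", "R<11"})]]
patterns = [[f({"MND=2"}), f({"R<11"})], [f({"R<11"})], [f({"NIV=1"})]]
kept, M = build_matrix(patients, patterns)
print(M)  # [[1, 1], [0, 1]]
```

The second patient does not support (MND=2)(R<11), because both items occur in the same evaluation rather than in two successive ones.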

3.3.3 Using Enriched Data

Considering the improvement obtained in [21], we decided to assess whether the classification results would improve when the original features were enriched with the SPs. Thus, both of the previous matrices are merged and then given as input to the standard classifiers, as in the previous situations.

4 Results and Discussion

In this section the results of applying SPM are presented and discussed, accounting for the data characteristics.

4.1 Sequential Patterns

As explained in Section 2, the main parameters of the SPM algorithm are the minimum support and the allowed gap. Several experiments were performed to assess the impact of changing these parameters on the resulting sequences, both in number and in length. Figure 3 shows the total number of discovered frequent sequences and their distribution according to their length (number of time points) for variable support (a) and gap (b).

Note that, for easier visualization, the vertical axis in Figure 3(a) is in a logarithmic scale, which compresses the differences between support values: a minimum support of 0.2 (20%) returns a much higher number of SPs, over 3 million in total.

On the other hand, a value of 0.6 returns a total of 97 SPs. The log scale must also be considered when reading the sequence lengths: it may seem that SPs of a single time point are the most numerous, when that is not the case. In fact, for a support of 0.2 (see Figure 3(a)), the most common length is 4, whereas it is 3 for 0.3, 2 for 0.4, and finally 1 for 0.5 and 0.6.

The analysis presented in Figure 3(b) does not require a vertical log scale, so one can easily compare and interpret the total number of SPs and their distribution according to length. As expected, higher values of allowed gap yield a higher number of SPs, although this total tends to stabilize towards the end. However, the distribution according to sequence length is largely preserved, with the greatest variations occurring for lengths 2 and 3.
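The totals and modal lengths discussed above can be recomputed from the values plotted in Figure 3; the dictionaries below simply transcribe the figure.

```python
# Figure 3(a): frequent sequences by length (keys) for each minimum support, gap = 0
SUPPORTS = (0.2, 0.3, 0.4, 0.5, 0.6)
BY_LENGTH_SUPPORT = {
    7: (5420, 0, 0, 0, 0),
    6: (114666, 20, 0, 0, 0),
    5: (934220, 3281, 11, 0, 0),
    4: (1086182, 14594, 260, 3, 0),
    3: (655448, 16005, 926, 91, 5),
    2: (220347, 13118, 1306, 187, 28),
    1: (25371, 3925, 814, 250, 64),
}
# Figure 3(b): frequent sequences by length for each allowed gap, support = 0.3
GAPS = (0, 1, 3, 5, 7)
BY_LENGTH_GAP = {
    6: (20, 24, 28, 28, 28),
    5: (3281, 3865, 3885, 3893, 3893),
    4: (14594, 17719, 18715, 18893, 19088),
    3: (16005, 20729, 25720, 27138, 27654),
    2: (13118, 15849, 18614, 19963, 20527),
    1: (3925, 3925, 3925, 3925, 3925),
}

support_totals = {s: sum(row[j] for row in BY_LENGTH_SUPPORT.values())
                  for j, s in enumerate(SUPPORTS)}
modal_length = {s: max(BY_LENGTH_SUPPORT, key=lambda n: BY_LENGTH_SUPPORT[n][j])
                for j, s in enumerate(SUPPORTS)}
gap_totals = {g: sum(row[j] for row in BY_LENGTH_GAP.values()) for j, g in enumerate(GAPS)}

print(support_totals)  # {0.2: 3041654, 0.3: 50943, 0.4: 3317, 0.5: 531, 0.6: 97}
print(modal_length)    # {0.2: 4, 0.3: 3, 0.4: 2, 0.5: 1, 0.6: 1}
print(gap_totals)      # {0: 50943, 1: 62111, 3: 70887, 5: 73840, 7: 75115}
```

The shrinking increments between successive gap values (11168, 8776, 2953, 1275) are what the text describes as the total stabilizing.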

Figure 3 (plot data): number of frequent sequences by length.

a) Minimum support (fixed gap = 0; log scale in the original plot)
Length   0.2       0.3     0.4    0.5   0.6
7        5420      0       0      0     0
6        114666    20      0      0     0
5        934220    3281    11     0     0
4        1086182   14594   260    3     0
3        655448    16005   926    91    5
2        220347    13118   1306   187   28
1        25371     3925    814    250   64

b) Allowed gap (fixed minimum support = 0.3)
Length   0       1       3       5       7
6        20      24      28      28      28
5        3281    3865    3885    3893    3893
4        14594   17719   18715   18893   19088
3        16005   20729   25720   27138   27654
2        13118   15849   18614   19963   20527
1        3925    3925    3925    3925    3925

Figure 3: Total number of frequent sequences and distribution according to their length: a) with variable minimum support and a fixed gap = 0 (log scale); b) with variable allowed gap and a fixed minimum support of 0.3.

We therefore proceeded with a fixed gap and a variable support threshold, since the support proved to be the greater source of variation in the found SPs.

4.2 Pruning the Sequential Patterns

As mentioned above, the number of obtained SPs can be extremely high (see Figure 3), so these data have to be reduced before classification can be performed. The first observation is that the number of SPs obtained with a minimum support of 0.2 is far beyond what can be handled, so we restrict our further analyses to the values {0.3, 0.4, 0.5, 0.6}. However, even for these values of minimum support, we still end up with a large number of SPs, many of which may be of little use for classification. Hence, pruning is applied, as discussed before. The threshold for the minimum improvement was tested for only two values, 0 and Infinity (in practice, the largest possible integer), corresponding, respectively, to finding the closed and the maximal SPs [14].

The results of pruning the SPs can be found in Figure 4. As expected from the minimum improvement criterion, the number of closed SPs is higher than that of maximal ones. However, the reduction is not very significant, and it remains somewhat consistent across the different values of minimum support, except for the value of 0.6, which yields far fewer original SPs (under 100). Consequently, there are still thousands of SPs for a minimum support of 0.3 and 0.4, which would be intractable for classification. This led to the decision of introducing a parameter to greatly reduce the number of SPs used: only the N most supported SPs are kept. This value was varied in the set {30, 100, 200, 500, 1000, 2000, 5000}, although, according to the interestingness of the results, further analyses were restricted to the set {30, 200, 1000}.
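Selecting the N most supported SPs amounts to a sort by support. In this sketch, ties are broken lexicographically, which is an assumption since the tie-breaking rule is not stated, and the patterns and supports are toy values; when fewer than N patterns are available, all of them are returned.

```python
from typing import Dict, Tuple

Pattern = Tuple[str, ...]


def top_n_most_supported(patterns: Dict[Pattern, int], n: int) -> Dict[Pattern, int]:
    """Keep at most n patterns with the highest support (ties broken lexicographically)."""
    ranked = sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:n])


sps = {("a", "b"): 310, ("a", "a"): 420, ("b", "c"): 310}
print(top_n_most_supported(sps, 2))
print(len(top_n_most_supported(sps, 200)))  # 3: the maximum is not reached
```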

Figure 4 (plot data): number of sequential patterns after pruning (log scale in the original plot).
Minimum support   0.3     0.4    0.5   0.6
Original          50943   3317   531   97
Maximal SPs       37370   2272   373   60
Closed SPs        50875   3277   498   79

Figure 4: Results of pruning with a minimum improvement threshold of 0 (closed patterns) and Infinity (maximal patterns) vs minimum support (log scale).

Figure 4 also shows that, for a minimum support of 0.6, allowing a maximum of 200 or 1000 most supported SPs yields the same pruned sets (79 closed and 60 maximal SPs), since fewer patterns are available than allowed. Thus, when we state that the 200 most supported SPs were used, this is in fact the allowed maximum.
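The size of the reduction, and the number of SPs actually available to the classifiers, can be quantified from the values in Figure 4; the effective count is simply the smaller of the allowed maximum N and the number of patterns that survive pruning.

```python
# Figure 4: (original, maximal, closed) number of SPs per minimum support
FIG4 = {0.3: (50943, 37370, 50875), 0.4: (3317, 2272, 3277),
        0.5: (531, 373, 498), 0.6: (97, 60, 79)}

for support, (original, maximal, closed) in FIG4.items():
    print(f"support {support}: maximal keeps {maximal / original:.1%}, "
          f"closed keeps {closed / original:.1%}")


def effective_n(max_allowed: int, available: int) -> int:
    """Number of SPs actually used when at most `max_allowed` are requested."""
    return min(max_allowed, available)


# For support 0.6, allowing 200 or 1000 SPs gives the same sets (79 closed, 60 maximal)
print([effective_n(n, 79) for n in (30, 200, 1000)])  # [30, 79, 79]
print([effective_n(n, 60) for n in (30, 200, 1000)])  # [30, 60, 60]
```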

4.3 Classification

One of the primary goals of this work was to evaluate the performance of different standard classifiers when SPs are used as features, thus directly introducing the temporal information. To our knowledge, this kind of application has not yet been performed on clinical data.


First, the results of the classification problem using the original features after temporal data aggregation are presented. Then, we compare these results to the ones obtained when the found SPs are used as features. Finally, we assess the classifiers' behavior when we enrich the feature space with both original features and the discovered SPs.

4.3.1 Temporal Data Aggregation

Figure 5 shows the prediction accuracy values of the classification problem using the original features, after an appropriate temporal aggregation, so that each patient is characterized by a single row reflecting the whole set of temporal evaluations.

Figure 5 (plot data): prediction accuracy (%), temporal data aggregation.
                    DT     KNN    SVM    RBF
Nominal variation   77.67  65.02  73.12  71.34
Slope               71.74  52.17  68.18  67.19

Figure 5: Prediction accuracy obtained when using the original features, after temporal data aggregation with nominal variation (Up, Down, No change) or slope values (change per evaluation). Weka classifiers Decision Tree J48 (DT), k-Nearest Neighbor (KNN), Support Vector Machine (SVM) with SMO implementation and Radial Basis Function (RBF) Network were used, under 5x10-fold cross-validation. Standard deviation bars are also shown.

It is clear from Figure 5 that when the aggregation is performed with nominal variation, using an alphabet {U, D, N} to account for changes from the initial to the last evaluation, the classification performance is considerably better than when the (numerical) slope values are used. The same holds for other metrics, such as confusion matrices, kappa statistics, precision and recall, which are not shown here due to space restrictions (and given the approximately balanced data).

4.3.2 Sequential Patterns

Figure 6 shows the prediction accuracy values obtained when using the discovered SPs as features for different standard classifiers, for a minimum support of 0.5, which returned the best overall results. It is worth discussing some particular behavior that is not visible in Figure 6, since it only shows the results for this minimum support.

Figure 6 (plot data): prediction accuracy (%), classification using only sequential patterns.
           DT     KNN    SVM    RBF
30 MS      78.02  72.47  77.26  75.26
200 MS     78.64  69.50  80.61  75.08
1000 MS    78.23  68.52  80.21  72.54

Figure 6: Prediction accuracy obtained when using only a maximum of 30, 200 and 1000 most supported (MS) sequential patterns. Weka classifiers Decision Tree J48 (DT), k-Nearest Neighbor (KNN), Support Vector Machine (SVM) with SMO implementation and Radial Basis Function (RBF) Network were used, under 5x10-fold cross-validation. Standard deviation bars are also shown.

In particular, it is interesting to analyse these results with those of Figure 4 in mind. For all values of minimum support, the results change considerably only when the maximal SPs are used, because the reduction to closed patterns is much less significant. However, there are also changes when the maximum allowed is higher than the total number of returned SPs. This happens, for instance, when a maximum of 200 most supported SPs is allowed: since the total number for a minimum support of 0.6 is under 100 (see Figure 4), in this situation there is a significant change in the performance.

Similarly, a maximum of 1000 most supported SPs is higher than the total number of patterns for a minimum support of 0.5 and 0.6. As mentioned above, given that the results for the values 0.3 and 0.4 are considerably worse, the prediction accuracy values shown in Figure 6 correspond to a minimum support of 0.5, for which all patterns are used when the maximum is 1000, since this maximum is never reached. Finally, regarding the minimum improvement as the pruning criterion, there is no clear advantage in using either closed or maximal SPs. In fact, for a maximum of 30 most supported SPs, the best result is associated with the maximal patterns, whereas for 200 and 1000 most supported SPs, the best results are obtained using closed patterns.

With expert validation, the most supported SPs can be studied to verify if any of them can bring new insights on disease progression.

Figure 7 (plot data): prediction accuracy (%), original features enriched with sequential patterns.
Slope + Sequential Patterns:
           DT     KNN    SVM    RBF
30 MS      74.11  69.37  77.27  73.72
200 MS     77.08  69.57  78.06  73.72
1000 MS    76.88  68.97  78.46  72.92
Nominal Var. + Sequential Patterns:
           DT     KNN    SVM    RBF
30 MS      78.06  72.92  78.46  74.90
200 MS     78.66  71.34  79.64  76.09
1000 MS    77.47  70.55  80.24  74.31

Figure 7: Prediction accuracy obtained when using the original features, after temporal data aggregation, enriched with the information from sequential patterns (30, 200 and 1000 most supported (MS)). Weka classifiers Decision Tree J48 (DT), k-Nearest Neighbor (KNN), Support Vector Machine (SVM) with SMO implementation and Radial Basis Function (RBF) Network were used, under 5x10-fold cross-validation. Standard deviation bars are shown.

4.3.3 Enrichment

As stated in a previous section, enrichment with temporal information in the form of SPs has been shown to improve results [21]. Figure 7 shows the prediction accuracy values obtained with both types of data: (left) nominal variation or (right) slope, and a maximum of 30, 200 and 1000 SPs.

The first observation is that, again, nominal variation yields higher prediction accuracy values than the numerical slope, although in this case the differences are smaller. Regarding a possible improvement in classification performance, the results in Figure 7 are higher than those in Figure 5, obtained using only the original features. However, the differences from the results in Figure 6, obtained using only the SPs, are negligible. This could mean that the classifiers are almost completely discarding the original features and relying solely on the SPs.

To further analyse this question, feature selection (FS) was applied in order to verify if, in fact, the original features would be discarded as being irrelevant to the classification, and, on the other hand, if there is a reduced set of SPs, still useful for the classification problem. Figure 8 shows the corresponding results, when using the BestFirst algorithm in Weka, although other FS algorithms were also tested.
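Weka's BestFirst is a search strategy that must be paired with a subset evaluator, which the text does not name. The sketch below shows only the search: a forward best-first search that stops after a number of consecutive non-improving expansions (5 here, assumed to mirror Weka's default termination setting). The merit function and its weights are hypothetical placeholders for whatever evaluator scores a feature subset.

```python
import heapq
from itertools import count
from typing import Callable, FrozenSet, Iterable, Tuple

Subset = FrozenSet[str]


def best_first_forward(features: Iterable[str],
                       merit: Callable[[Subset], float],
                       max_stale: int = 5) -> Tuple[Subset, float]:
    """Forward best-first search over feature subsets.

    The most promising open subset is expanded by adding one feature at a time;
    the search stops after `max_stale` consecutive expansions without improvement.
    """
    features = list(features)
    tie = count()  # tie-breaker so the heap never compares frozensets
    start: Subset = frozenset()
    best_subset, best_merit = start, merit(start)
    open_heap = [(-best_merit, next(tie), start)]
    visited = {start}
    stale = 0
    while open_heap and stale < max_stale:
        _, _, subset = heapq.heappop(open_heap)
        improved = False
        for feat in features:
            if feat in subset:
                continue
            child = subset | {feat}
            if child in visited:
                continue
            visited.add(child)
            m = merit(child)
            heapq.heappush(open_heap, (-m, next(tie), child))
            if m > best_merit:
                best_subset, best_merit, improved = child, m, True
        stale = 0 if improved else stale + 1
    return best_subset, best_merit


# Toy merit: per-feature relevance minus a small cost per selected feature
weights = {"R": 0.5, "(MND=2;R<11)": 0.3, "Gender": 0.0, "BMI": -0.2}


def toy_merit(subset: Subset) -> float:
    return sum(weights[f] for f in subset) - 0.05 * len(subset)


print(best_first_forward(weights, toy_merit))
```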

The results obtained with FS are considerably better than before for most of the classifiers, especially KNN. As somewhat anticipated, the selected features include only one to three of the original features.

Although only slightly higher than those obtained using only SPs, these results are very interesting, given that the whole set of SPs is reduced to only about seven. The selected features are shown in Table 1 (note that the kept features were similar for nominal variation and slope). Interestingly, the item (MND = 2), meaning that the patient has no family history of Motor Neuron Disease (MND), is present in most kept SPs, while the original feature MND is discarded.

Table 1: Features selected by Weka's BestFirst algorithm. * Only for slope aggregation. ** Not present for the 30 most supported SPs. MND stands for family history of Motor Neuron Disease (1 - yes, 2 - no). MIP stands for Maximal Inspiratory Pressure (%). R is the respiratory section of the ALS-FRS-R (Revised Functional Rating Scale). SpO2mean is the mean value of oxygen saturation, Dips/h being another respiratory parameter.

Original features:
R
Weight*

Sequential patterns:
(MND=2;R<11)
(MND=2)(MND=2)(MND=2)(MND=2)
(MND=2;MIP<60)(MND=2)
(R<11;ALS-FRS-R<36)**
(SpO2min>80;Dips/h<4)**
(MND=2)(MND=2)**

4.3.4 Sensitivity to Parameter Optimization

Since all these results were obtained with the default parameters of the Weka classifiers, we studied how sensitive the classifiers based on SPs are to parameter variation. Thus, a simple grid search was performed for each of the classifiers and classification approaches (original features, SPs and enriched data). The best prediction accuracy values can be found in Figure 9, where it can be seen that the results are close to those obtained with default parameters. Note that the complexity factor c of the SVM classifier seems to be the parameter to which the results are most sensitive. For example, for the enriched data with FS, we obtain a prediction accuracy of 83.36% with SVM (c = 100; polynomial degree = 3), whereas the default parameters (c = 1; polynomial degree = 1) returned a prediction accuracy under 79%.

Figure 8 (plot data): prediction accuracy (%), enriched data with feature selection (FS).
Nominal Var. + Seq. Patterns (FS):
           DT     KNN    SVM    RBF
30 MS      78.46  80.83  78.85  79.45
200 MS     81.82  81.62  78.66  79.05
1000 MS    81.82  81.62  78.66  79.05
Slope + Seq. Patterns (FS):
           DT     KNN    SVM    RBF
30 MS      78.85  67.19  78.26  77.47
200 MS     78.66  75.10  77.27  77.67
1000 MS    78.66  75.10  77.27  77.67

Figure 8: Prediction accuracy obtained when using the original features, after temporal data aggregation, enriched with the information from sequential patterns (30, 200 and 1000 most supported (MS)). Feature Selection (FS) is performed (BestFirst in Weka). Weka classifiers Decision Tree J48 (DT), k-Nearest Neighbor (KNN), Support Vector Machine (SVM) with SMO implementation and Radial Basis Function (RBF) Network were used, under 5x10-fold cross-validation. Standard deviation bars are shown.
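The simple grid search can be sketched generically as below. The grid values are hypothetical (the text only reports the best SVM setting, c = 100 with polynomial degree 3, against the defaults c = 1 and degree 1), and `toy_accuracy` is a placeholder for a function returning the cross-validated accuracy of a configuration. In Weka's SMO, these two settings would correspond to the complexity constant and the exponent of the polynomial kernel.

```python
from itertools import product
from typing import Callable, Dict, Mapping, Sequence, Tuple


def grid_search(evaluate: Callable[[Dict[str, float]], float],
                grid: Mapping[str, Sequence[float]]) -> Tuple[Dict[str, float], float]:
    """Evaluate every combination in `grid` and return the best one and its score."""
    names = sorted(grid)
    best_params: Dict[str, float] = {}
    best_score = float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid and a toy scoring function standing in for 5x10-fold CV accuracy
grid = {"c": [0.01, 0.1, 1, 10, 100], "degree": [1, 2, 3]}


def toy_accuracy(params: Dict[str, float]) -> float:
    return 80.0 - abs(params["degree"] - 3) - 0.01 * abs(params["c"] - 100)


print(grid_search(toy_accuracy, grid))  # ({'c': 100, 'degree': 3}, 80.0)
```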

Figure 9 (plot data): prediction accuracy (%), results after parameter optimization.
             DT     SVM    KNN    RBF
Enrich_FS    82.21  83.36  81.42  78.85
Enrich       79.09  81.54  75.85  76.25
SP_FS        78.93  77.55  79.21  78.30
SP           79.80  80.71  73.95  75.49
Nom_FS       75.38  76.48  74.19  74.74
Nom          77.51  73.04  70.59  72.37

Figure 9: Prediction accuracy obtained with simple parameter optimization, with original features after nominal variation aggregation (Nom), sequential patterns (SP) and enriched (Enrich) data, with and without feature selection (FS). Weka classifiers Decision Tree J48 (DT), k-Nearest Neighbor (KNN), Support Vector Machine (SVM) with SMO implementation and Radial Basis Function (RBF) Network were used, under 5x10-fold cross-validation. Standard deviation bars are shown.

5 Conclusions and Future Work

SPM has been used successfully in several different domains, including bioinformatics. However, to our knowledge, it has never been successfully applied to the classification of clinical time series data.

The data statistics can be a good indicator of whether a dataset is a suitable candidate for classification based on SPM. As seen for the ALS dataset, there is a significant number of longer transactions (more than two time points) and a reasonable mean number of items per transaction. Another crucial aspect for the application of SPM is the possibility of an appropriate discretization, which, in this case, was based on expert knowledge.

We then presented several statistics for the obtained SPs, using the original set as well as closed and maximal patterns, to assess their influence. In fact, because the dataset is somewhat sparse, we conclude that pruning non-closed SPs has an almost unnoticeable effect. Regarding the classification method, the representation is very simple: each feature is a binary indicator of whether a patient contains a given SP. Nonetheless, the obtained results were very interesting, especially in comparison with those of [9], where comparable prediction accuracy and other metrics were obtained, although only after extensive parameter search and optimization.
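The binary SP representation can be sketched as follows, assuming each patient is a time-ordered list of itemsets (one per evaluation) and each SP is a list of itemsets. The containment rule below (each pattern itemset must be a subset of a distinct, later transaction, with gaps allowed) is the standard sequential-pattern definition; whether the study used exactly this rule is not restated here, so treat it as an assumption. The item names are hypothetical.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def contains(sequence: Sequence[Itemset], pattern: Sequence[Itemset]) -> bool:
    """True if `pattern` is a subsequence of `sequence`: each pattern itemset is a
    subset of a distinct transaction, in the same order (gaps allowed)."""
    pos = 0
    for itemset in pattern:
        # Greedily advance to the earliest transaction containing this itemset;
        # matching as early as possible never rules out a later match.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match a strictly later transaction
    return True


def sp_features(
    patients: Sequence[Sequence[Itemset]], patterns: Sequence[Sequence[Itemset]]
) -> List[List[int]]:
    """One row per patient, one binary column per SP (1 = patient contains the SP)."""
    return [[int(contains(seq, p)) for p in patterns] for seq in patients]


def seq(*transactions: Sequence[str]) -> List[Itemset]:
    """Convenience helper: build a sequence of itemsets from lists of item names."""
    return [frozenset(t) for t in transactions]


# Hypothetical discretized items "a", "b", "c"; one transaction per evaluation.
patient = seq(["a", "b"], ["c"], ["a"])
patterns = [seq(["a"], ["c"]), seq(["c"], ["b"]), seq(["a", "b"], ["a"])]
print(sp_features([patient], patterns))  # [[1, 0, 1]]
```

The resulting 0/1 matrix can be fed to any standard classifier, optionally concatenated with the original features to obtain the enriched data.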

As expected, taking the sequential nature of the data into account improved the results compared with a simple variation of the original features between the initial and final evaluations. Nevertheless, when both types of data were initially combined, the results were similar to those obtained using only the SPs. To assess the influence of each type of data, FS was performed, resulting in a (marginally) better performance with a significant reduction in the number of features (only one or two of the original features and three to seven SPs were kept).

This is one of the most interesting results, since these features can be interpreted directly, which might bring new insights into the mechanisms behind ALS progression. Moreover, with optimized parameters, the enriched data with FS reached a prediction accuracy over 83%, whereas using only SPs reached 80.71% (77.55% with FS). Nonetheless, as this is preliminary work, the classification approach might be improved, for example, by using a method similar to Classify-By-Sequence [20], in which the classifiable sequences are extracted from each class group separately rather than from the whole dataset.
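The per-class idea can be illustrated with a simplified sketch; it is not an implementation of Classify-By-Sequence [20] itself. Candidate patterns are assumed to come from a prior SPM run, and support is computed within each class rather than over the whole dataset. Function names, toy data and the threshold are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, List, Sequence, Tuple

Itemset = FrozenSet[str]
Pattern = Tuple[Itemset, ...]


def contains(sequence: Sequence[Itemset], pattern: Pattern) -> bool:
    """Ordered, gap-tolerant subsequence test over itemsets (greedy earliest match)."""
    pos = 0
    for itemset in pattern:
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def per_class_patterns(
    sequences: Sequence[Sequence[Itemset]],
    labels: Sequence[Hashable],
    candidates: Sequence[Pattern],
    min_support: float,
) -> Dict[Hashable, List[Pattern]]:
    """For each class, keep the candidates whose support *within that class*
    reaches min_support (instead of support over the whole dataset)."""
    groups: Dict[Hashable, List[Sequence[Itemset]]] = defaultdict(list)
    for s, y in zip(sequences, labels):
        groups[y].append(s)
    selected: Dict[Hashable, List[Pattern]] = {}
    for y, group in groups.items():
        selected[y] = [
            p for p in candidates
            if sum(contains(s, p) for s in group) / len(group) >= min_support
        ]
    return selected


def fs(*items: str) -> Itemset:
    return frozenset(items)


# Hypothetical toy data: two classes with two patients each.
sequences = [[fs("a"), fs("c")], [fs("a"), fs("b"), fs("c")], [fs("b")], [fs("b"), fs("a")]]
labels = ["pos", "pos", "neg", "neg"]
P_ac, P_b, P_ba = (fs("a"), fs("c")), (fs("b"),), (fs("b"), fs("a"))
result = per_class_patterns(sequences, labels, [P_ac, P_b, P_ba], min_support=1.0)
print(result)
```

On this toy data, a single global threshold of 1.0 would retain neither `P_ac` nor `P_b`, since their whole-dataset supports are 0.5 and 0.75; per class, each is kept for the class it characterizes.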

The J48 Decision Tree also performed well, even when compared with the SVM classifier. This favours interpretability, since the tree can be inspected and possibly used for decision making.

Finally, our future goal is to move from simple classification to prognosis prediction, so that we can answer clinical questions such as: "What is the probability that a particular patient will require NIV within the following 9 months?" or "What is the safe time frame for the next appointment without risk of respiratory failure?".

Acknowledgments. This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT), under project contract PTDC/EIA-EIA/111239/2009 to Neuroclinomics and doctoral grant SFRH/BD/82042/2011 to AVC.

References

[1] R. Agrawal and R. Srikant, Mining sequential patterns., Proc. Int Conf. Data Engineering (ICDE), (1995), pp. 3–14.

[2] J. Caraça-Valente and I. Chavarrías, Discovering Similar Patterns in Time Series., KDD, (2000), pp. 497–505.

[3] A. V. Carreiro, O. Anunciação, J. A. Carriço and S. C. Madeira, Prognostic Prediction through Biclustering-Based Classification of Clinical Gene Expression Time Series., Journal of Integrative Bioinformatics, 8:3 (2011), pp. 175–191.

[4] A. V. Carreiro, A. J. Ferreira, M. A. T. Figueiredo and S. C. Madeira, Towards a Classification Approach using Meta-Biclustering: Impact of Discretization in the Analysis of Expression Time Series., Journal of Integrative Bioinformatics, 9:3 (2012), pp. 207–222.

[5] G. Das, H. Mannila and P. Smyth, Rule Discovery from Time Series., KDD, (1998), pp. 16–22.

[6] M. Gavrilov, D. Anguelov, P. Indyk and R. Motwani, Mining the Stock Market: Which Measure is Best?, KDD, (2000), pp. 487–496.

[7] C. M. Antunes and A. L. Oliveira, Temporal Data Mining: an Overview., KDD Workshop on Temporal Data Mining, (2001).

[8] V. C. Tseng, P. C. Tseng and K. W. C. Lin, Mining Temporal and Spatial Object Relations in Multimedia Contents., Int Conf. on Wireless Networks, Communications and Mobile Computing, 2 (2005), pp. 1371–1376.

[9] P. M. T. Amaral, S. Pinto, M. de Carvalho, P. Tomás and S. C. Madeira, Predicting the need for non-invasive ventilation in patients with Amyotrophic Lateral Sclerosis, ACM SIGKDD Workshop on Health Informatics (HI-KDD), (2012).

[10] R. Srikant and R. Agrawal, Mining Sequential Patterns: Generalizations and Performance Improvements., Proc. Int'l Conf. Extending Database Technology, (1996), pp. 3–17.

[11] R. Agrawal and R. Srikant, Fast Algorithms for Mining Association Rules., Proc. 20th Int'l Conf. on Very Large Data Bases (VLDB), (1994), pp. 487–499.

[12] C. M. Antunes and A. L. Oliveira, Generalization of Pattern-Growth Methods for Sequential Pattern Mining with Gap Constraints., Machine Learning and Data Mining in Pattern Recognition, 2734 (2003), pp. 239–251.

[13] J. Pei, J. Han et al., PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth., Proc. Int'l Conf. Data Engineering (ICDE 01), (2001).

[14] S. Prinke, M. Wojciechowski and M. Zakrzewicz, Pruning discovered Sequential Patterns using Minimum Improvement Threshold., ADBIS Workshop on Data Mining and Knowledge Discovery (ADMKD'2005), (2005).

[15] G. E. Lasker, Application of Sequential Pattern-Recognition Technique to Medical Diagnostics., Int Journal of Bio-Medical Computing, 1:3 (1970), pp. 173–186.

[16] J. C. G. Ramirez et al., Temporal Pattern Discovery in Course-of-Disease Data., IEEE Engineering in Medicine and Biology, 19:4 (2000), pp. 63–71.

[17] F. Lin, S. Chou, S. Pan and Y. Chen, Mining time dependency patterns in clinical pathways., Int Journal of Medical Informatics, 62 (2001), pp. 11–25.

[18] S. Concaro, L. Sacchi and R. Bellazzi, Temporal data mining methods for the analysis of the AHRQ archives., Proc Am Med Inform Assoc Annual Symposium, (2007).

[19] K. Choi, S. Chung, H. Rhee and Y. Suh, Classification and Sequential Pattern Analysis for Improving Managerial Efficiency and Providing Better Medical Service in Public Healthcare Centers., Healthc Inform Res., 16 (2010), pp. 67–76.

[20] V. S. M. Tseng and C.-H. Lee, CBS: A new classification method by using sequential patterns., Proc. of the 2005 SIAM International Data Mining Conference, (2005), pp. 596–600.

[21] J. Barracosa and C. M. Antunes, Anticipating Teachers' Performance., Proc. International Workshop on Knowledge Discovery on Educational Data in the ACM International Conference on Knowledge Discovery and Data Mining (KDinED@KDD), (2011).


Classification and Diagnosis of Myopathy from EMG Signals*

Brian D. Bue†, Erzsébet Merényi‡†, James M. Killian§

Abstract

We present a methodology to predict the presence of myopathy (muscle disease) from intramuscular electromyography (EMG) signals. Myopathy is a form of neuromuscular disease affecting skeletal muscle fibers resulting in muscle weakness. Many types of myopathy are serious and debilitating conditions that are difficult to diagnose and treat. Early detection of such diseases can potentially reduce both patient suffering and medical costs. Intramuscular electromyography is a standard clinical method used to diagnose neuromuscular disorders such as myopathies. By evaluating the shape and frequency of electrical action potentials produced by muscular fibers and captured in EMG measurements, a physician can often detect both the presence and the severity of such disorders. However, EMG measurements can vary significantly across different subjects, different muscles, and according to session-specific characteristics such as muscle fatigue and degree of contraction. By considering normalized, fixed-duration (0.5-2 sec) samples of regions known to be diagnostic in EMG signals measured at full muscle contraction, we can automatically detect the presence of myopathies across different subjects and muscles with ~90% accuracy. We argue that our methodology is more generally applicable than existing methods that depend upon accurate segmentation of individual motor unit action potential (MUAP) waveforms. We present a rigorous evaluation of our technique across different subjects and several different muscles.

Keywords

EMG, myopathy, classification, diagnosis, FFT, frequency domain

1 Automated Diagnosis of Myopathy from EMG Signals

Myopathy (muscle disease) is a form of neuromuscular disorder that results in muscle weakness due to dysfunctional skeletal muscle fibers [2]. A wide variety of both acquired and hereditary myopathies have been identified, many of which are serious and often debilitating conditions that are difficult to accurately diagnose and treat [3]. Early detection of these diseases by clinical examination and laboratory tests can greatly reduce patient suffering and medical costs. Moreover, data gathered during such examinations may lead to an improved understanding of the nature and treatment of such diseases, and allow the development of automated systems that assist diagnosis.

In clinical practice, intramuscular EMG is a standard method used to assess neurophysiologic characteristics of skeletal muscles in order to diagnose neuromuscular diseases. EMG records the electrical action potentials generated by groups of muscle fibers controlled by the same motor neuron; each such group is called a motor unit. These motor units are the basic functional units of the muscle that can be voluntarily activated. The shape of an individual motor unit action potential (MUAP) waveform reflects the status and structure of the corresponding motor unit. EMG measurements from patients with myopathy differ from those of healthy subjects in that their recruited MUAPs usually have shorter duration, lower amplitude and increased polyphasicity. Figure 1 illustrates the difference between 0.5 second samples of EMG traces from the deltoid muscle of a healthy subject and of a subject with myopathy. These and many additional subtleties characterize differences between healthy and abnormal subjects, depending on the nature and severity of the pathology, and are extensively discussed in the literature.

In recent years, a number of techniques have been proposed to classify EMG signals for medical diagnosis. Several authors (e.g., [8, 9, 4, 5, 7, 10]) propose segmenting the EMG data in the temporal domain into individual MUAP waveforms, which are then labeled and classified based upon (features derived from) the segmented waveforms. However, such techniques are limited in that they assume that individual MUAPs can be extracted from the data in a consistent and reliable manner. Extracting individual MUAPs may be difficult or impossible, since MUAPs at high muscle contraction are often in superposition, while pathologies of interest may not be observable at low muscle contraction. Most previous works analyze data obtained with low (less than full) muscle contraction. Moreover, the ratio of recruited MUAPs is another indicator of the presence or absence of myopathy and should also be taken into account, which is often not the case in previous work.

Figure 1: 0.5 second samples of EMG traces from the deltoid muscle from a healthy subject (top) vs. a subject with myopathy (bottom).

* This work was partially supported by the Wheeler fund from the Baylor College of Medicine.

† Department of Electrical and Computer Engineering, Rice University, Houston, TX. (bbue@rice.edu)

‡ Department of Statistics, Rice University, Houston, TX. (erzsebet@rice.edu)

§ Department of Neurology, Baylor College of Medicine, Houston, TX. (jkillian@bcm.edu)

Given the issues with time-domain MUAP segmentation, classifying EMG data in the frequency domain may be a more robust approach. Some recent work has shown good results in classifying neuromuscular disease from EMG data in the frequency domain.

For instance, [6] demonstrated 85% overall accuracy in classifying EMG signals of 59 subjects in the frequency domain into Normal, Myopathy and Neuropathy classes. In this work, we also analyze EMG data in the frequency domain. Our work, however, differs from previous research in the following respects:

a) We consider EMG data measured at full muscle contraction, which improves the objective evaluation of per-subject and per-muscle characteristics;

b) We classify diagnostic regions of the full EMG signal in the frequency domain, rather than pre-segmented and manually-labeled individual MUAP waveforms; and

c) In addition to evaluating classification performance on data across different subjects, we also evaluate the characteristics of different muscles for diagnostic purposes.

We provide a rigorous evaluation of generalization capabilities via cross-validation, in contrast to a number of existing works [e.g., 8, 9, 5, 6].

2 EMG Data Description

Our EMG data was collected at the EMG Laboratory in the Department of Neurology of the Baylor College of Medicine in Houston, TX, by (or under the direction of) Dr. James Killian, M.D. The data we consider consists of 15 EMG sessions from 8 different subjects measured in one or more different muscles.

Three of the subjects are female and the remaining subjects are male. The mean age of the subjects is 56.63 years (std. dev. = 16.4). The currently available data are from the biceps brachii, triceps brachii, deltoid and vastus lateralis (VL) muscles, selected for their diagnostic utility by the physician. We use the term trace to denote a record of a “full” EMG session for a single subject on a single muscle. Each trace is collected using the following methodology: a monopolar needle electrode is inserted into a designated skeletal muscle in a proximal arm or leg. The signal is processed through the differential preamplifier to a Cadwell Sierra EMG machine amplifier (Cadwell Laboratories, WA, USA), which transfers the signal to a computer display and loudspeaker for clinical evaluation. The subject then exerts maximum contraction of the muscle under study as the electrode is moved by several millimeters until an adequate interferential muscle pattern of firing motor units is noted on the screen. A 60 sec sample is then recorded. The process is repeated on 4 to 6 separate muscles, and the captured traces from each muscle are stored for subsequent signal analysis.

In a post-labeling session, the physician designates each trace as a member of one of the following five classes, based upon the observed severity of the pathology in the EMG signal: Healthy/Normal (Nor), Borderline Myopathy (Myo1), Mild Myopathy (Myo2), Moderate Myopathy (Myo3) and Severe Myopathy (Myo4). The clinical diagnostic grading of abnormal myopathic motor units (individual motor units with durations of activity (?) under 6 ms) is based on the estimated percentage of myopathic units relative to the total number of firing motor units: borderline, 0-10% abnormal units; mild, 10-25%; moderate, 25-50%; severe, above 50%. This is a subjective grading based on visual and auditory analysis of the different muscle samples by co-author JK. Although our broader work addresses discrimination among all of these classes, in this short paper we focus on the methodology of classifying full signals (as opposed to MUAPs) in the frequency domain, and demonstrate its effectiveness on two classes. Classification results for all five classes, obtained with the methodology described here, will be presented in a subsequent paper.
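The grading thresholds above can be written as a small rule, shown only to make the class boundaries explicit; in the study the labels are assigned subjectively by the physician, not computed. Treating each upper bound as inclusive (e.g., exactly 10% counts as borderline) and applying the 6 ms duration cut-off to per-unit durations are assumptions, and the function names are hypothetical.

```python
from typing import Sequence

# Grade boundaries from the text; treating each upper bound as inclusive is an assumption.
GRADE_UPPER_BOUNDS = ((10.0, "Myo1"), (25.0, "Myo2"), (50.0, "Myo3"))


def percent_abnormal(durations_ms: Sequence[float], threshold_ms: float = 6.0) -> float:
    """Percentage of firing motor units whose duration is below threshold_ms."""
    if not durations_ms:
        raise ValueError("need at least one motor unit")
    abnormal = sum(1 for d in durations_ms if d < threshold_ms)
    return 100.0 * abnormal / len(durations_ms)


def myopathy_grade(pct: float) -> str:
    """Map the percentage of abnormal units to Myo1 (borderline) ... Myo4 (severe)."""
    if not 0.0 <= pct <= 100.0:
        raise ValueError("percentage must lie in [0, 100]")
    for upper, grade in GRADE_UPPER_BOUNDS:
        if pct <= upper:
            return grade
    return "Myo4"  # above 50%


def superclass(label: str) -> str:
    """Collapse Myo1-Myo4 into the Myo* superclass used for the two-class problem."""
    return "Nor" if label == "Nor" else "Myo*"


print(myopathy_grade(percent_abnormal([3.0, 4.5, 7.0, 8.0])))  # Myo3
```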

Portions of the traces are not diagnostic and/or are saturated due to insertional activity or instrument tuning effects. To eliminate the non-diagnostic portions of each trace, the physician manually defines the diagnostic regions in each trace, which are temporally contiguous segments of varying length. While automated identification and separation of non-diagnostic regions is important, it is outside the scope of the present work.

3 Methodology

3.1 Data Preprocessing

We split the diagnostic regions of each trace into fixed slices of ns seconds in duration. We subsequently refer to each of these slices as a sample. Each sample is an m-dimensional vector capturing a temporally contiguous portion of a diagnostic region. We normalize each sample by its L² norm, which maps the amplitudes of the samples to a common range. This allows us to reconcile, to some degree, amplitude differences between measurements on different muscles and different subjects at varying contraction levels, while retaining other differences between the waveforms. We then map each normalized sample into the frequency domain using the Fast Fourier Transform in MATLAB. We discard the symmetric portion of the frequency-domain samples, resulting in sample vectors of dimensionality m/2. Table 1 gives a summary of the samples we consider with sample duration ns = 0.5 sec.
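A minimal sketch of these preprocessing steps, under stated assumptions: the sampling rate `fs` is a parameter (it is not given in this section), trailing points that do not fill a whole sample are dropped, and each frequency-domain sample is represented by the magnitudes of its Fourier coefficients (the text does not say whether magnitudes or complex values are used). A naive O(m²) DFT keeps the example dependency-free; it yields the same coefficients as an FFT such as MATLAB's `fft` or `numpy.fft.fft`, only more slowly.

```python
import cmath
import math
from typing import List, Sequence


def slice_samples(region: Sequence[float], m: int) -> List[List[float]]:
    """Cut one diagnostic region into consecutive, non-overlapping m-point samples;
    trailing points that do not fill a whole sample are dropped."""
    return [list(region[i:i + m]) for i in range(0, len(region) - m + 1, m)]


def l2_normalize(x: Sequence[float]) -> List[float]:
    """Divide a sample by its L2 norm so that amplitudes share a common scale."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else list(x)


def half_spectrum(x: Sequence[float]) -> List[float]:
    """Magnitudes of the first m/2 DFT coefficients of a real sample; for real input
    the remaining coefficients are complex conjugates of these (naive O(m^2) DFT)."""
    m = len(x)
    return [
        abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / m) for n in range(m)))
        for k in range(m // 2)
    ]


def preprocess(regions: Sequence[Sequence[float]], fs: float, ns: float) -> List[List[float]]:
    """Slice every diagnostic region into ns-second samples (m = ns * fs points),
    normalize each sample, and map it to an m/2-dimensional frequency vector."""
    m = int(round(ns * fs))
    return [
        half_spectrum(l2_normalize(sample))
        for region in regions
        for sample in slice_samples(region, m)
    ]


# Toy example (hypothetical numbers): one region of 8 points, fs = 4 Hz, ns = 1 s.
features = preprocess([[1, 1, 1, 1, 2, 0, 2, 0]], fs=4, ns=1)
print([[round(v, 3) for v in f] for f in features])
```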


3.2 Classification

In this study, we consider the problem of classifying the frequency-domain samples as Normal or Myopathic. To achieve this, we group all of the samples labeled Myo1-Myo4 into a single superclass Myo*.

However, several classes are poorly represented in terms of the number of samples, particularly the Normal and the borderline myopathy (Myo1) classes, which represent only 14.94% and 9.2% of the total samples, respectively. To mitigate this issue, we first balance the sampling distributions of the five classes (Normal, Myo1-Myo4) by augmenting the training data with N_resamp,j = N_max − N_j samples, drawn with replacement from the training samples of each class j, where N_max is the number of samples in the largest class and N_j is the number of samples in class j. This balancing step ensures that samples of varying severity are equally represented, but introduces a sampling bias between the Normal class and the Myo* superclass. Consequently, we perform an additional balancing step by adding N_normal = N_all − N_Myo* samples from the Normal class to the training set, again sampling with replacement, where N_all is the total number of samples and N_Myo* is the number of samples in the Myo* class. After balancing, we have 524 samples in each of the Normal and Myo* classes, with the Myo* class consisting of 131 samples from each of the Myo1-Myo4 classes.
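The two balancing steps can be sketched as below. The second function reflects our reading that Normal samples are drawn with replacement until the Normal class matches the Myo* superclass, which is consistent with the 1048 balanced samples reported for ns = 0.5 in Table 2 (4 × 131 = 524 Myo* samples, plus as many Normal ones). The function names and toy labels are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence, Tuple, TypeVar

X = TypeVar("X")


def oversample_classes(
    samples: Sequence[X], labels: Sequence[Hashable], rng: random.Random
) -> Tuple[List[X], List[Hashable]]:
    """Step 1: for every class j, append N_max - N_j samples drawn with replacement
    from class j, so that all classes end up with N_max samples."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    n_max = max(len(xs) for xs in by_class.values())
    out_x, out_y = list(samples), list(labels)
    for y, xs in by_class.items():
        extra = rng.choices(xs, k=n_max - len(xs))
        out_x.extend(extra)
        out_y.extend([y] * len(extra))
    return out_x, out_y


def balance_normal(
    samples: Sequence[X], labels: Sequence[Hashable], normal: Hashable, rng: random.Random
) -> Tuple[List[X], List[Hashable]]:
    """Step 2 (our reading): draw extra Normal samples with replacement until the
    Normal class is as large as the Myo* superclass (all non-Normal labels)."""
    normals = [x for x, y in zip(samples, labels) if y == normal]
    n_myo = sum(1 for y in labels if y != normal)
    extra = rng.choices(normals, k=max(0, n_myo - len(normals)))
    return list(samples) + extra, list(labels) + [normal] * len(extra)


# Hypothetical toy labels: 2 Normal, 1 Myo1, 3 Myo2 samples (samples are just IDs here).
rng = random.Random(0)
x1, y1 = oversample_classes(list(range(6)), ["Nor", "Nor", "Myo1", "Myo2", "Myo2", "Myo2"], rng)
x2, y2 = balance_normal(x1, y1, "Nor", rng)
print(sorted(Counter(y2).items()))
```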

EMG signals may vary between different subjects or different muscles. Consequently, it is crucial to evaluate classification accuracy when data from different subjects and/or muscles are used as training and test data. To achieve this, after balancing the samples as described above, we perform ten cross-validation splits, where in each split we use data from half of the subjects as test data and divide the remaining samples into training (3/8 of the total samples) and validation (1/8 of the total samples) sets. We use random stratified sampling to ensure that the training, test and validation sets each contain instances from each of the Normal and Myo* classes and from each muscle group. The classifier we use is a linear Support Vector Machine (SVM). We select the SVM regularization parameter C from the set {0.01, 0.1, 1, 10, 100, 1000} as the value that yields the highest accuracy on the validation set. We report the mean and standard deviation of the classification accuracies obtained on the test data across the splits.
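One cross-validation split and the choice of C can be sketched as follows, assuming each sample carries a subject ID and a class label. Stratification is applied here only to the train/validation division by class; the additional guarantee that every class and muscle group appears in every set is omitted for brevity. `validation_accuracy` is a hypothetical callback that would train a linear SVM with a given C and score it on the validation set.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

C_GRID = (0.01, 0.1, 1, 10, 100, 1000)  # candidate values of C listed in the text


def subject_split(
    subject_ids: Sequence[Hashable], labels: Sequence[Hashable], rng: random.Random
) -> Tuple[List[int], List[int], List[int]]:
    """Return (train, validation, test) sample indices for one split: all samples of
    half of the subjects form the test set; the rest is split 3:1 within each class."""
    subjects = sorted(set(subject_ids))
    rng.shuffle(subjects)
    test_subjects = set(subjects[: len(subjects) // 2])
    test = [i for i, s in enumerate(subject_ids) if s in test_subjects]

    remaining_by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for i, s in enumerate(subject_ids):
        if s not in test_subjects:
            remaining_by_class[labels[i]].append(i)

    train: List[int] = []
    val: List[int] = []
    for indices in remaining_by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * 3 / 4)  # 3:1 split of the non-test samples
        train.extend(indices[:cut])
        val.extend(indices[cut:])
    return sorted(train), sorted(val), sorted(test)


def select_c(validation_accuracy: Callable[[float], float]) -> float:
    """Pick the C from C_GRID with the highest validation accuracy (first wins ties)."""
    return max(C_GRID, key=validation_accuracy)


# Hypothetical toy data: 4 subjects with 4 samples each, alternating labels.
subject_ids = [f"S{i // 4}" for i in range(16)]
labels = ["Nor", "Myo*"] * 8
train, val, test = subject_split(subject_ids, labels, random.Random(0))
print(len(train), len(val), len(test))
```

Repeating `subject_split` with ten different seeds and averaging the test accuracies would mirror the ten splits described above.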

4 Classification Results and Evaluation

4.1 Classification Accuracy vs. Sample Duration

We first evaluate the classification accuracy with respect to the sample duration ns. We consider ns values in the set {0.05, 0.1, 0.2, 0.5, 1, 2} sec. Table 2 gives the number of balanced samples and the dimensionality m/2 of each sample for each value of ns, together with the corresponding mean and standard deviation of classification accuracies across the ten cross-validation splits.

We observe that classification accuracy increases with increasing sample duration. The standard deviation also typically decreases, with the exception of ns = 2, where the high dimensionality and small number of samples produce slightly less stable results. Overall, this suggests that longer sample durations are desirable, despite the high dimensionality of the resulting feature space. Additionally, our results indicate that it is possible to predict the presence or absence of myopathy from relatively short portions of a full EMG trace.

4.2 Per-class, Per-muscle and Per-subject Evaluation

We now evaluate the performance of our methodology on the individual classes, muscles and subjects considered in this work. For this evaluation, we fix the sample duration ns to 0.5 sec, as this duration yields a reasonable number of samples (1024) to evaluate, at fairly high dimensionality (16000 dimensions/sample), and gives very good classification accuracy (90.4% on average).

With respect to the Normal vs. Myo* classes, we observe considerably higher classification accuracy on the Myo* class (mean = 0.959, std. dev. = 0.023) than on the Normal class (mean = 0.822, std. dev. = 0.070). This is due to the fact that our data include significantly fewer subjects with normal conditions. When we consider individual muscles (Table 3), we observe that samples from the biceps and deltoid muscles tend to be misclassified more often than those from the triceps and VL muscles. A possible reason is that the biceps and deltoid muscles appear similar to one another in terms of EMG signals, but different from the triceps and VL muscles. This is also suggested by the results in Bischoff et al. [1], but further investigation on additional data is necessary to confirm this hypothesis in our case.

Table 1: Summary of EMG data for each muscle with sample duration ns = 0.5. The total number of seconds of data for each class is provided. Values in parentheses give the number of unique subjects for each muscle with respect to each class.

ns (sec)   # samp   m/2     Accuracy (std. dev.)
0.05       10528    1600    0.760 (0.058)
0.1        5256     3200    0.815 (0.059)
0.2        2616     8000    0.878 (0.042)
0.5        1048     16000   0.904 (0.033)
1          512      32000   0.966 (0.028)
2          256      64000   0.971 (0.041)

Table 2: Number of balanced samples and sample dimensionality (m/2) with respect to sample duration ns, and the corresponding mean classification accuracy and standard deviation.

Table 4 gives the classification accuracies for the individual subjects and their respective traces. Most notable are the results for subject S10, whose biceps and deltoid traces are classified with 28.5 and 16.1 percentage points lower accuracy than their respective mean muscle accuracies (shown in Table 3). Subject S10 represents a case where some muscles exhibit no observable pathology, while other muscles show signs of myopathy. Although it is difficult to draw firm conclusions without data from additional patients with similarly mixed pathologies, according to the physician this case may be the result of a borderline myopathy, and the training labels may need revision once sufficient evidence is available.
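The two gaps quoted for S10 are absolute differences between the per-muscle mean accuracy in Table 3 and the S10 trace accuracy in Table 4:

```latex
\begin{align*}
\text{Biceps:}\quad  & 0.907 - 0.622 = 0.285 \;\Rightarrow\; 28.5 \text{ percentage points} \\
\text{Deltoid:}\quad & 0.852 - 0.691 = 0.161 \;\Rightarrow\; 16.1 \text{ percentage points}
\end{align*}
```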

5 Discussion and Future Work

In this work, we evaluated a novel methodology for classifying EMG signals in the frequency domain.

By considering, as training samples, Fourier transforms of normalized, fixed-length segments of diagnostic regions of the full signals (as opposed to extracted MUAPs) measured at full contraction, we demonstrated high average generalization performance by a linear SVM classifier across individual subjects and different muscles. The average classification accuracy on test data increases from 80% to 97% with the duration of the samples (0.1 to 2 sec, respectively), while the reliability, determined from ten cross-validation splits, generally increases (the standard deviation generally decreases). Our analysis also suggests that detecting the presence of myopathy can be accomplished with very short samples of a full EMG trace.

The long-term, primary goal of our work is to develop a system that captures the physician’s capability to diagnose a variety of neuromuscular disorders from EMG data, as well as to distinguish among the severity degrees of diseases such as the classes of myopathies listed in Section 2. While our classification accuracies are fairly high, this is of course a two-class case. Classifying the samples according to their severities is a more challenging task and will require more elaborate and sophisticated experiments.

We also aim to classify EMG signals of patients with neurogenic disorders using our methodology.

Because our methodology yields results comparable to previous analyses of EMG data from myopathic and neurogenic diseases (e.g., [6]), and based upon our preliminary experiments with 5 classes, we anticipate that our method will generalize well to such scenarios.

While the results presented here are encouraging, much additional analysis and development is needed in order to achieve the above goals and to make our system useful for clinicians. This includes systematically designed experiments with increasing amounts and complexity of data (increased variety of subjects, muscles, diseases), testing increasingly sophisticated classification techniques to better align with real-life circumstances such as highly imbalanced sample sets, and intelligent identification of feature subsets necessary for producing high-quality (high-accuracy and high-fidelity) classifications. For fully automated processing, developing techniques to segment an EMG signal into diagnostic and non-diagnostic regions, or to incorporate learning constraints that identify various non-disease-related conditions, is also necessary.

Acknowledgements. The authors thank Penny Gregg at the EMG Laboratory of the Department of Neurology, Baylor College of Medicine, for her assistance with data collection, and Rice University graduate students Kai Du and Du Nguyen for data preprocessing and software modification efforts in the early stages of this work.

References

[1] C. Bischoff, E. Stålberg, B. Falck, and K. E. Eeg-Olofsson. Reference values of motor unit action potentials obtained with multi-MUAP analysis. Muscle & Nerve, vol. 17, no. 8, pp. 842–851, Aug. 1994.

[2] A. S. Blum and S. B. Rutkove. The clinical neurophysiology primer, vol. 388. Humana Press, 2007.

[3] F. Buchthal. An introduction to electromyography. Copenhagen: Gyldendal, 1957.

Biceps          Deltoid         Triceps         VL
0.907 (0.087)   0.852 (0.072)   1.000 (0.000)   1.000 (0.000)

Table 3: Per-muscle accuracies (mean (std. dev.)) from all subjects for ns = 0.5.

Subject   Average          Trace     Class   Trace Accuracy
S02       0.936 (0.050)    Biceps    Myo*    0.936 (0.050)
S03       0.958 (0.037)    Deltoid   Myo*    0.937 (0.055)
                           Triceps   Myo*    1.000 (0.000)
S04       1.000 (0.000)    VL        Myo*    1.000 (0.000)
S07       0.986 (0.022)    Biceps    Myo*    0.972 (0.043)
                           Deltoid   Myo*    0.984 (0.025)
                           VL        Myo*    1.000 (0.000)
S08       0.888 (0.007)    Deltoid   Nor     0.888 (0.007)
S09       0.975 (0.035)    Biceps    Myo*    0.951 (0.068)
                           Deltoid   Myo*    1.000 (0.000)
                           Triceps   Myo*    1.000 (0.000)
S10       0.789 (0.128)    Biceps    Myo*    0.622 (0.171)
                           Deltoid   Nor     0.691 (0.056)
                           VL        Myo*    1.000 (0.000)
S15       0.852 (0.028)    Deltoid   Nor     0.852 (0.028)

Table 4: Per-subject/trace classification accuracies for ns = 0.5.

[4] C. I. Christodoulou and C. S. Pattichis. Unsupervised pattern recognition for the classification of EMG signals. IEEE Transactions on Biomedical Engineering, vol. 46, no. 2, pp. 169–178, Feb. 1999.

[5] N. F. Güler and S. Koçer. Classification of EMG Signals Using PCA and FFT. J Med Syst, vol. 29, no. 3, pp. 241–250, Jun. 2005.

[6] N. F. Güler and S. Koçer. Use of Support Vector Machines and Neural Network in Diagnosis of Neuromuscular Disorders. J Med Syst, vol. 29, no. 3, pp. 271–284, Jun. 2005.

[7] R. Merletti and D. Farina. Analysis of intramuscular electromyogram signals. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 367, no. 1887, pp. 357–368, Jan. 2009.

[8] C. S. Pattichis, C. N. Schizas, and L. T. Middleton. Neural network models in EMG diagnosis. IEEE Transactions on Biomedical Engineering, vol. 42, no. 5, May 1995.

[9] C. S. Pattichis and C. N. Schizas. Genetics-based machine learning for the assessment of certain neuromuscular disorders. IEEE Transactions on Neural Networks, vol. 7, no. 2, pp. 427–439, Jan. 1996.

[10] M. B. I. Reaz, M. S. Hussain, and F. Mohd-Yasin. Techniques of EMG signal analysis: detection, processing, classification and applications. Biological Procedures Online, vol. 8, no. 1, pp. 11–35, Dec. 2006.