LAMB MEAT TENDERNESS PREDICTION USING
NEURAL NETWORKS AND SENSITIVITY ANALYSIS
∗Paulo Cortez Manuel Portelinha, Sandra Rodrigues, Vasco Cadavez, Alfredo Teixeira Departamento de Sistemas de Informac¸˜ao Escola Superior Agr´aria
University of Minho Braganc¸a Polytechnic Institute
4800-058 Guimar˜aes, PORTUGAL 5301-855 Braganc¸a, PORTUGAL
Email: pcortez@dsi.uminho.pt Emails: {mportelinha, srodrigues, vcadavez, teixeira}@ipb.pt
KEYWORDS: Regression, Multilayer Perceptrons, Multiple Regression, Meat Quality, Ensembles, Data Mining.
ABSTRACT
The assessment of quality is a key factor for the meat indus-try, where the aim is to fulfill the consumer’s needs. In par-ticular, tenderness is considered the most important charac-teristic affecting consumer perception of taste. In this paper, a Neural Network Ensemble, with feature selection based on a Sensitivity Analysis procedure, is proposed to predict lamb meat tenderness. This difficult real-world problem is defined in terms of two regression tasks, by using instrumental mea-surements and a sensory panel. In both cases, the proposed solution outperformed other neural approaches and the
Mul-tiple Regression method.
1. Introduction
A top priority factor in the success of meat industry relies on the ability to deliver specialties that satisfy the consumer’s taste requirements. Although there are several factors that influence meat quality (e.g. juiciness or appearance),
ten-derness is considered the most important attribute (Huffman
et al., 1997). The ideal method for measuring tenderness should be accurate, fast, automated and noninvasive. In the past, two major approaches have been proposed (Arvanitoy-annis and Houwelingen-Koukaliaroglou, 2003):
instrumen-tal and sensory analysis. The former is based in an
objec-tive test, such as the Instron instrument, which measures the
Warner-Bratzler Shear (WBS) force and is the most
com-monly used device. On the other hand, sensory methods are based in subjective information, usually given by a hu-man taste panel. Both approaches are invasive, expensive and time demanding, since they require laboratory work. For instance, the WBS values can only be obtained 72 hours af-ter slaughaf-tering, while the preparation and execution of con-sumer taste panel may take several days.
An alternative is to use cheap and non invasive car-cass measurements that can be collected within the first 24 ∗In Proceedings of the European Simulation Multiconference -ESM’2005, Oporto, Portugal, October, 2005, SCS.
hours after slaughtering (e.g. pH and color). Under this scheme, the classic animal science approach is based on
Mul-tiple Regression models (Arvanitoyannis and
Houwelingen-Koukaliaroglou, 2003), using meat features as independent (or input) variables and the WBS or sensory measures as the depended (or output) ones. Yet, these linear models will fail when strong nonlinear relationships are present. In such cases, a better option is to use Neural Networks (NNs), due to their nonlinear mapping and noise tolerance capabilities (Haykin, 1999). Indeed, NNs are gaining an attention within the Data Mining field, due to their performance in terms of predictive knowledge. Another promising research area is based in the use of Ensembles, where several models are combined in some way in order to produce an answer (Di-etterich, 2001). This interest arose due to the discovery that ensembles are often more accurate than single models.
In Data Mining applications, besides obtaining a high predictive performance, it is often useful to provide explana-tory knowledge. In particular, the measure of input im-portance is relevant within this domain. Since carcass fea-tures are often highly correlated, Principal Component
Anal-ysis has been proposed to reduce the input dimensionality
(Arvanitoyannis and Houwelingen-Koukaliaroglou, 2003). However, the principal components are compressed variables and they do not represent a direct meaning for the meat user. A better approach is to use Sensitivity Analysis (Kewley et al., 2000), which has outperformed other input selection tech-niques (e.g. Forward Selection and Genetic algorithms).
In the past, several studies have used NNs to assess meat quality (e.g. beef, pork, poultry or sausages) (Arvanitoyannis and Houwelingen-Koukaliaroglou, 2003). However, regard-ing tenderness prediction, the literature seems scarce and it is primarily oriented towards beef (Hill et al., 2000). In this work, a Neural Network Ensemble, in conjunction with a fea-ture selection procedure based on a Sensitivity Analysis, is proposed to predict lamb meat tenderness. This real-world problem will be modeled in the R simulation environment (R Development Core Team, 2004) in terms of two regression tasks, using instrumental and sensory measurements. The proposed strategy will be tested on animal data, collected from the Tr´as-os-Montes region of Portugal, and compared with other NNs approaches and a Multiple Regression.
2. Materials and Methods
2.1 Lamb Meat Data
This study considered lamb animals with the Protected
Des-ignation of Origin (PDO) certificate, from the Tr´as-os-Montes northeast region of Portugal. The database was collected from November/2002 until November/2003, with each instance denoting the readings obtained from a slaugh-tered animal. Since each animal presents considerable costs (around 6 euros per carcass), the dataset is quite small, with a total of 81 examples. Table 1 presents the data attributes. The HCW is obtained one hour after slaughter, exfoliation and evisceration. The former two attributes (Breed and Sex) are also registered at slaughterhouse, while the others are measured in laboratory. Due to their visual nature, the color attributes (a*, b*, dE, dL and dB*) have a high impact in consumer’s perception. In most of the situations, these are the only attributes that the consumer can judge.
Table 1: The Dataset Main Attributes
Attribute Description Domain
Breed Breed type {1, 2}a
Sex Lamb sex {1, 2}b
HCW Hot carcass weight (kg) [4.1, 14.8] STF2 Sternal fat thickness [6.0, 27.8]
C Subcutaneous fat depth [0.3, 5.1]
pH1 pH 1 hour after slaughtering [5.5, 6.8] pH24 pH 24 hours after slaughtering [5.5, 5.9]
a* Color red index [11.5, 22.2]
b* Color yellow index [6.5, 12.5]
dE Total color difference [46.5, 60.9]
dL Luminosity differential [−56, −39]
dB* Yellow differential [15.3, 22.5]
WBS Warner-Bratzler Shear force [9.5, 57.0]
STP Sensory Taste Panel [0.7, 7.1]
a1 – Braganc¸ana, 2 – Mirandesa;b1 – Male, 2 – Female
The WBS force is the major index for measuring meat tenderness. It can only be obtained in laboratory, no sooner than 72 hours after slaughter, by using an invasive device called Instron. On the other hand, a more elaborated scheme was devised to obtain the sensory values (STP). A panel of 12 trained individuals, from the Braganc¸a Polytechnic
Insti-tute, was selected. Then, meat samples from the longissinus thoracis muscle were collected and defrost at 4◦C in a re-frigerator. Next, each sample was randomly encoded with a 3 digit number, wrapped in an aluminum sheet and heated at 100◦C. Then, each panel member was set in an individual compartment, performing a taste proof, under similar condi-tions, of random selected samples. Between different tastes, mouths were cleaned by using water and by eating small golden apple pieces. Each sample was ranked from 0 (the most tender) to 10 (the most tough). Finally, the STP
at-tribute was measured as the average of the grades from the panel. Since the original data contained missing values (2 for the WBS and 10 for the STP), two new datasets were created by discarding these entries. The first contains 79 rows (for the WBS task), while the second has 71 examples (STP).
2.2 Learning Models
A regression dataset D is made up of k ∈ {1, ..., N} examples, each mapping an input vector (xk1, . . . , xkI) to a given target yk. The error for a given k is: ek= yk−ybk, wherebyk
repsents the predicted value for k input pattern. The overall re-gression performance is computed by global metric, namely the Mean Absolute Error (MAE), Relative Mean Absolute
Er-ror (RMAE), Root Mean Squared (RMSE) and Relative Root Mean Squared (RRMSE), which can be computed as:
MAE = 1/N ×∑Ni=1|yi−byi|
RMAE = 1/N × MAE/∑Ni=1|yi− yi| × 100 (%)
RMSE = q ∑N i=1(yi−ybi) 2/N RRMSE = RMSE/ q ∑N i=1(yi− yi)2/N × 100 (%) (1)
In all these statistics, lower values result in better predictive models. The RMAE and RRMSE metrics are scale indepen-dent, where a 100% means that the regression method has similar performance as the constant average predictor.
A Multiple Regression (MR) model is defined by the equation (Hastie et al., 2001):
b y =β0+ I
∑
i=1 βixi (2)where {x1, . . . , xI} denotes the set of input variables and
{β0, . . . ,βI} the set of parameters to be adjusted, usually by applying a least squares algorithm. Due to is additive nature, this model is easy to interpret and has been widely used in regression applications.
Neural Networks (NNs) denote a set of connectionist
models inspired in the behavior of the central nervous sys-tem of living beings. In particular, the Multilayer
Percep-tron is the most popular neural architecture, where neurons
are grouped in layers and only forward connections exist (Haykin, 1999). The Multilayer Perceptrons used in this study make use of biases, one hidden layer with H hidden nodes and sigmoid activation functions (Fig. 1). When mod-eling regression tasks, the usual approach is to adopt one out-put node with a linear function, since outout-puts may lie out of the logistic output range ([0, 1]) (Hastie et al., 2001). Thus, each regression task (WBS and STP) will be modeled by a different NN and the overall model is given by the equation:
b y = wo,0+ o−1
∑
j=I+1 f ( I∑
i=1 xiwj,i+ wj,0)wo,i (3)where wi, jdenotes the weight of the connection from node j to i (if j = 0 then it is a bias connection), o denotes the output
node, f the logistic function (1+e1−x), and I the number of input nodes. i wi,0 wi,j j
Hidden Layer Output Layer Input Layer x x 1 2 +1 xI +1 +1 +1 ... ... ... ...
Fig. 1: The Multilayer Perceptron Architecture
Supervised learning is achieved by an iterative adjust-ment of the network connection weights (the training algo-rithm), in order to minimize an error function (typically the sum of squared errors), computed over the training examples (or cases). Before training, the data needs to be preprocessed. Hence, all attributes were standardized to a zero mean and one standard deviation domain (Hastie et al., 2001)
The performance will be sensitive to the NN topology choice: a small network will provide limited learning, while a large one will overfit the data. To solve this hurdle, one so-lution is to use a large number of hidden nodes (H) and train the NN with a regularization method (Hastie et al., 2001). In this work, regularization will be performed by a weight
de-cay procedure, where a weight penalty term (λ) shrinks the size of the neural weights. Under this scheme, the crucial parameter is the choice ofλ.
For a given network, the initial weights will be randomly set within the range [−0.7, +0.7]. Next, the training algo-rithm is applied and stopped when the error slope approaches zero or after a maximum of E epochs. After training, the
Sen-sitivity Analysis is performed. It is measured as the variance
(Va) produced in the output (by) when the input attribute (a) is
moved through its entire range (Kewley et al., 2000):
Va = ∑Li=1(ybi−y)/(L − 1)b Ra = Va/∑Ij=1Vj× 100 (%)
(4) where I denotes the number of input attributes and Rathe rel-ative importance of the a attribute. Thebyioutput is obtained by holding all input variables at their average values. The exception is xa, which varies through its entire range with L levels. In this work, L was set to 2 for the binary attributes and 7 for the continuous inputs.
Since the NN cost function is nonconvex (with multiple minima), the quality of the trained network depends on the choice of the starting weights. Thus, R runs will be applied to each neural configuration and the selected NN will be the one with the lowest penalized error. This setup will be called
Multiple Neural Network (MNN). Another option is to use a
Neural Network Ensemble (NNE), consisting of R networks
trained with random weights. The final prediction is given as the average of the individual predictions.
2.3 Simulation Environment
All experiments were conducted with an Intel Centrino 1.60
GHz processor, under the Linux operating system. The
sim-ulations were programmed in the R environment (R Devel-opment Core Team, 2004), an open source and high-level matrix programming language that provides a powerful suite of tools for statistical and graphical analysis.
The R functions that were used by the written code in-clude: lm, nnet and crossval. The former function is de-fined in the R base distribution (R Development Core Team, 2004) and fits a Multiple Regression. The second procedure is available in the nnet package (Venables and Ripley, 2002) and trains a multilayer network with the BFGS algorithm, from the family of quasi-Newton methods, allowing also the use of weight decay. Finally, the last function implements the K-fold estimation procedure and it can be found in the
bootstrap package (Efron and Tibshirani, 1993). For
demon-strative purposes, a small piece of the main R code is shown:
library(bootstrap) # load this package library(nnet)
source("code.R") # load the written R code # read the WBS dataset from a file
d<-read.table("wbs.csv",header=T,sep=’;’) # set the input and output variables Inputs<-d[,1:12] # matrix with the 12 inputs Output<-d[,13] # vector with the WBS values Runs<-5 # number of runs
for(i in 1:Runs) {
# display current run and time print(paste("Run:",i,date()))
# fit the MR model (uses lm and crossval) MR<-lm.ktest(Inputs,Output)
# get the MAE, RMAE, RMSE, RRMSE errors eMR<-errors(MR,Output)
# fit the MNN model (uses nnet and crossval) MNN<-mlp.ktest(Inputs,Output)
eMMN<-errors(MNN,Output)
# fit the NNE model (uses nnet and crossval) NNE<-mlpens.ktest(Inputs,Output)
eNNE<-errors(NNE,Output) }
3. Results
After preliminary experiments, the maximum number of training epochs was set to E = 10, the number of hidden nodes was set to H = 24 and the number of runs/ensemble networks was set to R = 5. The most important param-eter (λ) is tuned by applying a coarse grid-search. The first grid level searches all discrete values within the range
{0.00, 0.01, . . . , 0.20} and the configuration with the
low-est prediction error (λ1) is selected. Then, the second
level proceeds with a fine tune within the rangeλ2∈ {λ1−
0.005, . . . ,λ1− 0.001,λ1+ 0.001, . . . ,λ1+ 0.004} ∧λ2≥ 0.
Therefore, the number of searches is equal to 21 + 9 = 30 (or 21 + 5 = 26 ifλ1= 0).
To estimate the NN prediction accuracy for the grid-search, a 10-fold cross-validation (Efron and Tibshirani, 1993) will be adopted, where the training set is divided in 10 subsets of equal size. Sequentially, one different subset is tested (with 10% of the data) and the remaining data used for ajusting the NN weights. At the end of 10 trainings, the predictor has been tested on all training data and the final estimate is given by the RMSE (Equation 1) computed over the 10 test sets. As an example, Fig. 2 plots the error evolu-tion for a given execuevolu-tion of the two level grid-search (WBS task). The figure clearly illustrates that the error curve is nonconvex, thus justifying the use of the grid search. In this case, the highest predictive decay (RMSE = 6.75) was found forλ= 0.097. After obtaining the best decay, the final NNs
are retrained with the all the data from the training set.
0.00 0.05 0.10 0.15 0.20 7.0 7.2 7.4 7.6 7.8 0.096 0.098 0.100 0.102 0.104 6.8 7.0 7.2 7.4 7.6
Fig. 2: The Decay (x-axis) vs RMSE (y-axis) Values for the First Level (left) and Second Level (right) Grid Searches
At a higher level, and to compare the different models, 5 runs of a 10-fold cross-validation (computed over all avail-able data) were executed. This means that in each of these 50 experiments, 90% of the data is used for learning and 10% for testing. The results are shown in Table 2, in terms of the average of the test errors obtained over the 50 experiments.
Table 2: The Lamb Meat Tenderness Regression Results
Task Model MAE RMAE RMSE RRMSE
WBS MR 9.2 134.7% 11.6 130.4% MNN 6.2 90.1% 8.1 91.2% NNE 5.9 86.6% 7.8 87.3% NNESA 5.5 81.4% 7.5 83.7% STP MR 1.6 119.3% 2.1 131.7% MNN 1.4 99.9% 1.7 104.1% NNE 1.3 92.4% 1.6 96.7% NNESA 1.2 84.9% 1.5 89.5%
The Multiple Regression (MR) results are worst than the trivial average forecast. The differences between the MR and the NN methods suggest that both tasks present nonlinear-ity. The Multiple Neural Network (MNN) works better than the MR, although it is outperformed by the ensemble
ver-sion (NNE). Regarding the computational effort, the MR re-sults were obtained after 1 second, while the MNN and NNE configurations required 1 hour each. Since both neural ap-proaches demand a similar computation, the last setup will be favored due to its best performance.
Table 3 shows the average relative importance (Equation 4) of the most important input variables for the NNE method. For the feature analysis, it was decided to select the attributes with a relative importance ≥ 3%, which allows an input re-duction to around half the inputs. Despite the difference in the percentage values, the selected features are quite simi-lar for both problems. It is also interesting to notice that the
Sex attribute is the least relevant factor, with a relevance of
0.08% (WBS) and 1.48% (STP). Apparently, this contrasts with the known knowledge that gender affects tenderness. However, female meat often presents a higher weight and fat-ness, thus the sex information may be indirectly represented in the HCW and STF2 variables.
Since non relevant inputs may affect the performance, an-other setup, called Neural Network Ensemble based on
Sen-sitivity Analysis (NNESA), was devised by considering the
most important inputs of Table 3. Indeed, the NNESA method managed to obtain the best results (Table 2), outperforming the NNE approach, specially for the STP task. In Table 3, the sensitivity values were also presented for this last method. For the WBS task, the red color (a*) seems to be the most important attribute, followed by the weight (HCW) and to-tal color difference (dE). Regarding the STP problem, the most relevant features are the Breed, red color index (a*) and sternal fat thickness (STF2). The differences obtained between the two tasks may be explained by psychological factors. For instance, the Breed importance increased from 2.8% (WBS) to 36.8% (STP). As an example, Fig. 3 shows the scatter plots of the predicted values vs the observed ones for the WBS task, where the diagonal line denotes the per-fect forecast. The NNESA approach clearly presents a better performance, with more predictions along the line than the
MR method. 10 20 30 40 50 60 10 20 30 40 50 60 10 20 30 40 50 60 10 20 30 40 50 60
Fig. 3: The Predicted (x-axis) vs Observed (y-axis) Values for the MR (left) and the NNESA (right)
Table 3: The Relative Importance of the Input Variables (in percentage)
Task Model Attribute
Breed HCW STF2 pH1 a* dE dL dB* WBS NNE 4.3 5.8 7.6 – 50.3 11.1 5.5 8.5 NNESA 2.8 21.4 7.7 – 41.7 11.7 6.2 8.5 STP NNE 41.0 – 5.1 6.6 22.6 7.5 – 3.8 NNESA 36.8 – 20.1 9.3 22.4 9.7 – 1.7 4. Conclusions
In this work, a Neural Network Ensemble based on
Sensitiv-ity Analysis (NNESA) algorithm is proposed, aiming at the
prediction of lamb meat tenderness. This real-world prob-lem was addressed by two distinct regression tasks by using instrumental and sensory measurements. In both cases, the
NNESA outperformed other Neural Network approaches, as
well as a Multiple Regression. Furthermore, the final neu-ral solution is much simpler, requiring only half the num-ber of inputs (7/6 instead of 12). In addiction, the proposed method is noninvasive, much cheaper than the WBS or STP procedures, and can be computed just 24 hours after slaugh-ters. This opens the room for the development of automatic tools for decision support (Turban et al., 2004). One draw-back may be the obtained accuracy, which is still high when compared with the simple constant average predictor. Never-theless, it should be stressed that the tested datasets are very small. Furthermore, as argued by D´ıez et al. (2004) , mod-eling sensory preferences is a very difficult regression task. To our knowledge, this is the first time lamb meat tender-ness is approached by neural regression models and further exploratory research needs to be performed.
Another relevant issue regards the high importance of the
Breed attribute in the STP task, which seems to contradict the
animal science theory. The obtained results were discussed with the experts, which discovered that the Mirandesa lambs were considered less stringy and more odor intense. This be-havior may be due to animal stress during slaughter, although further research needs to be addressed towards this issue. In future work, it is also intended to enrich the datasets by gath-ering more meat samples. Moreover, other nonlinear tech-niques (e.g. Support Vector Machines) will also be explored (Hastie et al., 2001).
References
Arvanitoyannis, I. and Houwelingen-Koukaliaroglou, M. 2003. Implementation of Chemometrics for Quality Control and Authentication of Meat and Meat Prod-ucts. Critical Reviews in Food Science and Nutrition, 43(2):173–218.
Dietterich, T. 2001. Ensemble methods in machine learning. In Kittler, J. and Roli, F., editors, Multiple Classifier
Systems, LNCS 1857, 1–15. Springer.
Di´ez, J., Bay´on, G., Quevedo, J., Coz, J., Luaces, O., Alonso, J., and Bahamonde, A. 2004. Discovering relevancies in very difficult regression problems: applications to sensory data analysis. In Proc. of ECAI, 993–994. Efron, B. and Tibshirani, R. 1993. An Introduction to the
Bootstrap. Chapman & Hall, USA.
Hastie, T., Tibshirani, R., and Friedman, J. 2001. The
Ele-ments of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, NY, USA.
Haykin, S. 1999. Neural Networks - A Compreensive
Foun-dation. Prentice-Hall, New Jersey, 2nd edition.
Hill, B., Jones, S., Robertson, W., and Major, I. 2000. Neural Network Modeling of Carcass Measurements to Pre-dict Beef Tenderness. Canadian Journal of Animal
Science, (80):311–318.
Huffman, K., Miller, M., Hoover, L., Wu, C., Brittin, H., and Ramsey, C. 1997. Effect of beef tenderness on con-sumer satisfaction with steaks consumed in the home and restaurant. Journal of Animal Sc., 74(1):91–97. Kewley, R., Embrechts, M., and Breneman, C. 2000. Data
Strip Mining for the Virtual Design of Pharmaceuticals with Neural Networks. IEEE Transactions on Neural
Networks, 11(3):668–679.
R Development Core Team 2004. R: A language and
envi-ronment for statistical computing. R Foundation for
Statistical Computing, Vienna, Austria, http://www.R-project.org.
Turban, E., Aronson, J., and Liang, T. 2004. Decision
Sup-port Systems and Intelligent Systems. Prentice-Hall.
Venables, W. and Ripley, B. 2002. Modern Applied Statistics
with S. Springer, New York, USA, fourth edition.
Author Biography
PAULO CORTEZ was born in Braga, Portugal and went to the
University of Minho where he obtained an Engineering Degree (1995), a M.Sc. (1998) and a Ph.D (2002) in Computer Science. Currently, he is a Lecturer at the Department of Information Sys-tems. He is co-author of more than forty publications in int. confer-ences, journals and book chapters. Research interests: Data Mining and Machine Learning; Neural and Evolutionary Computation; and Forecasting. Web-page: http://www.dsi.uminho.pt/˜pcortez