• Nenhum resultado encontrado

Application of time-series analysis for prediction of molding sand properties in production cycle

N/A
N/A
Protected

Academic year: 2017

Share "Application of time-series analysis for prediction of molding sand properties in production cycle"

Copied!
5
0
0

Texto

(1)

ARCHIVES

Of

FOUNDRY ENGINEERING

Published quarterly as the organ of the Foundry Commission of the Polish Academy of Sciences

ISSN (1897-3310)

Volume 11

Issue 2/2011

95–100

19/2

Application of time-series analysis

for prediction of molding sand properties

in production cycle

M. Perzyk

P a,

P

*, S. Maciejak

P a

P

, J. Koz

ł

owski

P a

P

P

a

P

Institute of Manufacturing Technologies, Warsaw University of Technology, Narbutta 85, 02-524 Warszawa, Polska

* Corresponding author. E-mail address: M.Perzyk@aster.pl

Received 11.04.2011; accepted in revised form 26.04.2011

Abstract

Time-series analysis is characterized, as a data mining tool which facilitates understanding nature of manufacturing processes and permits prediction of future values of the process parameters or production results on the basis of the past data, recorded in regular intervals. The main methods and problems of the time-series analysis are presented. The authors’ research results, based on green molding sand proper-ties data collected in a foundry with Disamatic molding line, are presented. The work was aimed at finding optimal settings and models of the time-series analysis for that data as well as detection of possible periodicities appearing in the sand properties. It is concluded that although the time-series analysis requires individual approach to each particular problem, some general recommendations can be also formulated. It can be a useful tool for analysis and predictions of outcomes of foundry processes.

Keywords:Tdata mining,time-series analysis, foundry technology, molding sand propertiesT.

1. Introduction

Time-series analysis deals with series of data recorded in a chronological order, usually in regular time intervals or in another sequences. The main goal of the analysis, being one of the data mining techniques, is to facilitate finding some important charac-teristics of processes as well as prediction of the future results. The time-series prediction can be considered as a particular case of a regression task, where the input and output variables are the same quantity but measured at different time moments.

It has been recognized that an application of these methods in manufacturing industry can bring various benefits (see e.g. [1-4]). Detection of periodicity in products’ or materials’ properties in a foundry can facilitate identification of the irregularities in the manufacturing processes. Prediction of the process parameters or

the product properties in a near future can help to prevent un-wanted tendencies or suggest required changes in the process control.

(2)

by taking the above mentioned input – output sets, shifted by one point in the series. The composition of the records for regression model should be preceded by subtraction of the general trend (usually linear), the periodical component and, possibly, the variability amplitude trend. The idea of this methodology is to use a regression model for modeling finer changes than those which can be easily described by simple trends and periodicity.

For the calculation of the trend of variability amplitude, the authors own methodology was applied, presented in [6].

Calculation of the periodical component should be preceded by the significance analysis of the periodicity appearing in the time-series. The appropriate analysis is based on the autocorrela-tion coefficient and is described in [6]. After determinaautocorrela-tion of the periodicity, a centered periodical component is calculated and subtracted from the time-series (without trends); details can be found in [5].

Thus obtained residual data can be used for regression model-ing. However, a question arises if these data contain important information or they are simply a random noise. In the second case the modeling would be certainly unsuccessful. To answer this question some statistical tests can be applied. One of them is the non-parametric runs-test that verifies a randomness hypothesis for a binary-valued data sequence; in case of time-series, the signs of the residual values of the time-series. Another test as-sumes, that if the residual component is of random nature then it is characterized by a low autocorrelation coefficient. In particular, the autocorrelation of the first order is tested (i.e. for time lag equal to 1), using the Durbin-Watson test. The description of the both tests can be found in [7]).

The purpose of the present research was to find optimal set-tings and models of the time-series analysis for the real foundry data including recorded properties of green molding send as well as detection of possible periodicities appearing in the sand prop-erties.

2. Research plan and methodology

2.1. Data sets

All data used in the present work have been obtained from a Polish grey cast iron foundry, equipped with two Disamatic molding lines (2110 Mk3 and 2110 Mk5 with a sand cooling system. The original source data included 624 records collected during three summer months, in 39 working days (16 records per day in 1 hour intervals). The following green sand properties were measured and recorded: moisture content of used and fresh sands, compression strength, permeability, compactibility, and temperatures of used and fresh sands.

Two types of series data were used. The first one, further re-ferred to as ‘hourly based’, assumes 1 hour time intervals and utilizes all the source data, placed consecutively in order of their recording.

The second type of series data, further referred to as ‘daily based’, was aimed at analysis of the sand properties variability on a daily basis. For each sand property the following three quanti-ties were calculated from the values recorded each day: the first measurement of a given day, the average of the 3 first measure-ments of the day and the average value of the whole day. Because some periodicity can be expected, related to week periods, it was

essential that only repeatable sequences of week days were in-cluded in the sets used for the analysis. However, on some work-ing days the production was not run and it was necessary to re-duce the data, so that only 6 week periods, each containing 4 first days (from Monday to Thursday), could be selected.

Summarizing, the total number of serial data sets prepared for the time-series analysis for the hourly based type data was 7 (each containing 624 elements) and total number of serial data sets for the daily based type data was 21 (each containing 24 elements).

2.2. Details of time-series analysis

There are several decisions which should be made before time-series analysis can be performed. First, the number of points used as inputs and the distance of prediction (from the last input) should be determined. In the present work two numbers of inputs were applied: 5 and 9, and 3 distances of prediction equal 1, 2 and 3 were assumed.

The periodicity appearing in the data can be statistically sig-nificant or non sigsig-nificant. In the latter case a question arises if the periodical component should be subtracted or not. In the present work the accuracy of prediction was tested for such cases and the appropriate conclusion was drawn.

The information content included in the residual data, i.e. af-ter subtraction of trends and periodical component, can be evalu-ated on the basis of the two statistical tests discussed earlier. However, they can give divergent results and also each of them can provide uncertain information. For the purpose of the present work the variable called ‘information content’, valued between 0 and 1, was proposed. It is calculated as a sum of the values ob-tained from the two tests, given in Table 1.

Table 1. Residual data information contents dependent on the statistical tests results

Runs-test Noise Rather noise

Information rather

signifi-cant

Information significant

Information

content 0 1/6 1/3 1/2

Durbin

-Watson test Noise Ambiguous result

Information

significant

Information

content 0 1/4 1/2

The basic model used for residual data was simple linear mul-tivariate regression. However, for the 10 cases with largest pre-diction errors, also a regression tree model, based on the C&RT algorithm [8], was utilized. The regression trees are non-parametric, non-linear models which are particularly suitable for small data sets with unknown characteristics. Because of the small number of records for the daily based data, the regression tree algorithm was set to keep only 2 records in a leave. The prediction results were compared with those obtained from the linear model.

(3)

3. Results

In Fig. 1 exemplary results of initial steps in time-series analysis are shown, illustrating the magnitude of particular com-ponents described in section 1.

For majority of the hourly based type data sets the periodicity appeared to be statistically significant and equal to 2, except for compression strength, where it was insignificant. The probable interpretation is that 2 hours period is strongly related to the molding sand processing cycle time. By contrast, for the majority of daily based type data sets, the periodicity was insignificant, except for the used sand temperature, where it was significant and equal to 4. This result may indicate that the molding sand proc-essing was stable and resistant to the temperature perturbations,

possibly related to the position of a day in a week (e.g. after weekends).

For all hourly based data sets the information content in re-sidual data was equal 1. It means that both statistical tests explic-itly indicated that the residual data may include important infor-mation and therefore it is advisable to apply a regression model for finding desired predictions. By contrast, for most of the daily based data sets the information content in residual data was below 0,5, (in almost half of the cases it was equal to zero). Taking into account that the residual data are important fraction of the whole variability in those sets (an example is shown in Fig. 1), these results indicate that the time-series analysis can give prediction results with relatively large errors.

-1 -0,5 0 0,5 1 1,5 2

0 5 10 15 20 25

Consecutive measumerements U s e d s a nd m o is ture c o nte n ts , % Original values

General trend (linear)

General trend substracted

Trend of amplitude substracted

Periodicial component substracted - values for regression modeling

Fig. 1. Example of the consecutive steps of the time-series analysis: the daily-type data for used sand moisture content

An evaluation of the magnitude of relative error of prediction can be made by its comparison to relative absolute deviations appearing in the data. For all hourly based data the average rela-tive error of predictions was about 2,7% and the average relarela-tive absolute deviation for that data was about 12,1%. This result means that the time-series analysis can be a useful prediction tool for that type of data. By contrast, for the daily based data the average relative error of predictions was about 34% and the aver-age relative absolute deviation for that data was about 13%. This result confirms the low usefulness of the time-series analysis for prediction of future values when the noise component is relatively large. 0 1 2 3 4 5 6

5-1 5-2 5-3 9-1 9-2 9-3 Number of inputs - distance of predicted

output A v e rag e r at io of pr e di c tion er ror s f or no n-s ign ifi c an t per iodi c ity ignor ed /s ubtr a c ted

Fig. 2. Effect of subtraction of non significant periodicity on prediction errors for all data sets, for various numbers of inputs

and prediction distances

The subtraction of periodical component can be made in spite of its statistical significance, just to reduce the variability in the data used for the regression model. The predictions were made for all data sets with non-significant periodicities in two versions: with the periodicity ignored and subtracted.

In Fig. 2 the ratio of prediction errors for all data sets with non- significant periodicity for the above two versions are shown. It can be seen that the prediction errors are substantially reduced by subtraction of the insignificant periodicity before regression modeling. 0 1 2 3 4 5 6 7 8 9 10

1 point forward 2 points forward 3 points forward Distance of predicted output

R at io of pr edic tio n er ror s f or num ber s o f input s 9/ 5

Fig. 3. Effect of number of inputs on prediction errors for daily based data (open circles) and hourly based data (solid black

(4)

It was found that increasing the number of input points not only makes the regression model more complex, but can lead to remarkable larger prediction errors. This is shown in Fig. 3, where the ratio of prediction errors for numbers of inputs equal to 9 to that equal to 5 is plotted. for both types of data sets.

In Fig. 4 the relative prediction errors are plotted vs average relative errors for training data. It can be seen that increasing the number of inputs significantly decreases only the training error. It can be also seen that the cases which are more difficult to learn (train) are not necessarily more difficult to predict (the correlation between the two errors is very low).

R2

= 0,045 R2

= 0,001

0% 10% 20% 30% 40% 50% 60%

0% 5% 10% 15% 20%

Relative training error

R

e

la

tiv

e

pr

ed

ic

tion

er

ro

r

Number of inputs = 5

Number of inputs = 9

Fig. 4. Relative prediction errors vs average relative errors for training data, for all data sets of both types

For the data sets with the largest prediction errors (of daily type) a more advanced regression model was tried out. In Fig. 5 the regression modeling relative errors obtained from the linear and tree-type models are compared; note that these values are no prediction errors but are merely related to the residual data.

0% 100% 200%

1 2 3 4 5 6 7 8 9 10 Ordinal numbers of data sets with largest errors

Linear regression Regression tree

Fig. 5. Relative regression modeling errors for the 10 cases with the largest prediction errors

A radical rise of the modeling accuracy can be observed due to application of the tree type regression model instead of the simple linear one.

4. Summary and conclusions

The presented research has demonstrated some important fea-tures of time-series analysis from the standpoint of its application to real industrial manufacturing processes. It was shown that application of that kind of data mining tool can provide a signifi-cant insight in the process and predict its future results, i.e. the

product properties or process parameters. Although the research was based on data obtained in specific conditions, some observa-tions can be treated as applicable to a wider range of processes.

In particular, it can be recommended, that the periodical com-ponent of the series data should be always subtracted prior to final regression modeling, even when the periodicity is statistically non-significant. This can facilitate the regression modeling and can lead to lower prediction errors.

Another finding is that the simple linear regression model may be unsatisfactory for modeling more complex relationships included in the residual data. The regression trees turned out to be a promising type of model, especially when the number of train-ing records is small. However, for larger sets, also some other advanced models should be considered, e.g. artificial neural net-works [5].

In spite of above general findings, it should be emphasized, that the time-series analysis always requires individual approach to the problem. In particular it applies to such problems like set-ting number of input points, choosing type of the general trend type (linear or curved), use the final regression model when the information content in residual data is relatively low, etc.

Acknowledgment

This work was partly supported by Ministry of Science and Higher Education of Poland, project No. N R07 0015 04.

References

[1]S. Bisgaard, M. Kulahci, Quality quandaries: Using a time-series model for process adjustment and control, Quality En-gineering, vol. 20 (2008) 134–141.

[2]Z.C. Lin, D.Y. Chang, A band-type network model for the time-series problem used for IC leadframe dam-bar shearing process, International Journal of Advanced Manufacturing Technology, vol. 40, No. 11-12 (2009).

[3]S.G. Kapoor, R.W. Terry, Time-series analysis of tensile strength in a die casting process, Time-series Analysis: The-ory and Practice 4, Proc. 8th International Time-series Meet-ing and 3rd American Conference, 1983, 221-228.

[4]F.R. Camisani-Calzolari, I.K. Craig, P.C. Pistoriust, Quality prediction in continuous casting of stainless steel slabs, Jour-nal of The South African Institute of Mining and Metallurgy, vol. 103, No. 10 (2003), 651-665.

[5]T. Masters, Sieci neuronowe w praktyce. WNT Warszawa 1996.

[6]M. Perzyk, K. Krawiec, J. Kozłowski, Application of time-series analysis in foundry production, Archives of Foundry Engineering, vol. 9, No.3 (2009), 109-114.

[7]B. Borkowski, H. Dudek, W. Szczęsny. Ekonometria. Wybra-ne zagadnienia. PWN, 2004

(5)

Zastosowanie analizy szeregów czasowych do przewidywania w

ł

a

ś

ciwo

ś

ci mas formierskich

w cyklu produkcyjnym

Streszczenie

Imagem

Table 1. Residual data information contents dependent on the  statistical tests results
Fig. 1. Example of the consecutive steps of the time-series analysis: the daily-type data for used sand moisture content
Fig. 5. Relative regression modeling errors for the 10 cases with  the largest prediction errors

Referências

Documentos relacionados

O Tratado de Não Proliferação de Armas, acordado entre as nações do mundo para evitar a disseminação de armas nucleares e viabilizar o uso pacífico de tecnologia nuclear da

2 Overlap integral of the output 90 0 - bend PSW mode profile with a CWG mode profile with different waveguide widths and considering its core as being Si or being the

O termo praça é igualmente utilizado para designar as aglomerações. Ele pode significar: 1) um grande espaço público de uso múltiplo em uma localidade. Trata-se geralmente de

The probability of attending school four our group of interest in this region increased by 6.5 percentage points after the expansion of the Bolsa Família program in 2007 and

social assistance. The protection of jobs within some enterprises, cooperatives, forms of economical associations, constitute an efficient social policy, totally different from

The application of statistical methods for the analysis of frequencies in time series depends fundamentally on the statistical validity of the series, which is linked to the

Foram inventariadas cinco comunidades pratenses de Stipo giganteae-Agrostietea castellanae, quatro das quais inéditas: Arrhenathero erianthi-Celticetum giganteae, da aliança

Entre eles, destacam-se os Scores de Alerta Precoce, que são Sistemas “Track and Trigger” agregados e ponderados que consistem na atribuição ponderada de scores aos