CONSTRUCTIVE NEURAL NETWORKS: A REVIEW

Sudhir Kumar Sharma

Ansal Institute of Technology, GGS Indraprastha University Gurgaon-1220033, Haryana, India

Pravin Chandra

Institute of Informatics & Communication, University of Delhi South Campus, New Delhi, India

Abstract:

In conventional neural networks, the architecture has to be defined prior to training, whereas in constructive neural networks the architecture is constructed during the training process. In this paper, we review constructive neural network algorithms that construct feedforward architectures for regression problems. The Cascade-Correlation algorithm (CCA) is a well-known and widely used constructive algorithm. The Cascade 2 algorithm, a variant of CCA that has been found more suitable for regression problems, is also reviewed. We then review our two recently proposed constructive algorithms, which emphasize architectural adaptation and functional adaptation during training. To achieve functional adaptation, the slope of the sigmoidal function is adapted during learning. Each algorithm determines not only the optimum number of hidden nodes but also the optimum value of the slope parameter of the sigmoidal function. The role of the adaptive sigmoidal activation function in constructive neural networks is verified in terms of better generalization performance and shorter training time.

Keywords: Adaptive slope sigmoidal function; Constructive neural networks; Constructive algorithm; Cascade 2 algorithm.

1. Introduction

Many types of neural network models have been proposed for function approximation (pattern classification and regression problems). Among them the class of multilayer feedforward neural networks (FNNs) is the most popular due to the flexibility in structure, good representational capabilities (universal approximators), and large number of available training algorithms.

In general, the learning accuracy, the generalization ability and the training time of supervised learning in FNNs depend on various factors such as the chosen network architecture (the number of hidden nodes and the connection topology between nodes), the choice of activation function for each node, the choice of optimization method and the other training parameters (learning rate, initial weights, etc.). The architecture of the network is either fixed empirically prior to training or adjusted dynamically during training of the network for solving a specific problem.

If the chosen architecture of a fixed-size network is not appropriate, under-fitting or over-fitting takes place. For better generalization performance and shorter training time, neither too small nor too large a network architecture is desirable. A sufficient number of trainable parameters (weights, biases and parameters associated with the activation function) is needed to capture the unknown mapping function from the training data.

Single hidden layer FNNs (SLFNNs) with sufficient number of hidden nodes are universal approximators (UAPs) i.e. these models are capable of approximating any continuous function to any desired degree of accuracy [1], [2]. These results do not give any idea about the selection of optimum number of hidden nodes. There are, however, a number of situations where two hidden layers have been more effective in terms of generalization ability and training time. There are no known efficient methods for determining optimum network architecture for a problem at hand. The selection of the optimal network architecture remains an open problem.

The adaptive structure neural network framework is a collection of techniques in which the network structure is adapted during training according to the given problem. The structure adaptation may be applied at three levels, namely architecture adaptation, functional adaptation and training-parameter adaptation. These approaches can be classified into two groups: evolutionary and non-evolutionary.


The evolutionary approaches adapt the network structure using global search techniques such as genetic algorithms and evolutionary strategies [3], [4]. Global search methods like ant colony optimization and particle swarm optimization are also widely used nowadays to determine the optimum architecture during learning [5], [6]. However, the evolutionary approach is quite demanding in both time and user-defined parameters [7].

2. Non-Evolutionary Adaptive Structure Neural Networks

Unlike conventional neural network (NN) algorithms, which require the definition of the NN architecture before training starts, adaptive structure neural networks enable the network architecture to be constructed along with the training process.

Many methods have been proposed to determine the optimal network architecture during training, such as constructive, pruning, constructive-pruning, and regularization algorithms. A constructive algorithm adds hidden layers, nodes, and connections to a minimal NN architecture during training. A pruning algorithm does the opposite, i.e., it deletes redundant hidden layers, nodes, and connections from a larger NN during training. A constructive-pruning algorithm is a hybrid approach in which the NN may be pruned after completion of the constructive process, or pruning may be interleaved with the constructive process. A regularization method adds a penalty term to the error function to be minimized, so that the effect of unimportant network connection weights is decreased in the trained network. The modified error function is

$\tilde{E}(W) = E(W) + \lambda R(W)$,

where $E(W)$ is the training error function, $R(W)$ is the regularization term and $\lambda$ is a regularization parameter that controls the influence of the regularization term. The difficulty in using such a modified error function lies in choosing a suitable regularization parameter, which often requires trial and error. The regularization framework can be used with constructive and pruning algorithms [7], [8].
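To make the regularized error concrete, the sketch below (our illustration, not an algorithm from the reviewed literature) computes a modified error of this form with a sum-of-squares E(W) and an L2 weight-decay penalty as R(W); the function and parameter names are assumptions.

```python
import numpy as np

def regularized_error(predictions, targets, weights, lam=1e-3):
    """Modified error E~(W) = E(W) + lambda * R(W).

    E(W): sum-of-squares training error over all patterns and outputs.
    R(W): L2 penalty over all connection weights (one common choice).
    lam : regularization parameter controlling the penalty's influence.
    """
    train_error = 0.5 * np.sum((targets - predictions) ** 2)   # E(W)
    penalty = 0.5 * sum(np.sum(w ** 2) for w in weights)       # R(W)
    return train_error + lam * penalty

# Usage: weights is the list of weight matrices of the network, e.g.
# regularized_error(y_hat, y, [W_hidden, W_output], lam=0.001)
```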

Constructive algorithms have the following major advantages over the pruning algorithms:

(1) It is relatively easier to specify an initial network architecture in constructive algorithms, whereas in pruning algorithms one usually does not know a priori how large the initial network should be. Therefore, an initial network that is much larger than actually required by the underlying problem is usually chosen in pruning algorithms, leading to a computationally expensive network training process.

(2) Constructive algorithms tend to build small networks due to their incremental learning nature. Networks are constructed whose size corresponds to the complexity of the given problem, whereas in pruning algorithms a great deal of effort may be spent removing the redundant weights and hidden nodes of an oversized network. Thus, constructive algorithms are generally more economical (in terms of training time and network complexity/structure) than pruning algorithms.

(3) In constructive algorithms, a smaller number of parameters (weights) is updated in the initial stage of the training process, thus requiring less training data for good generalization, whereas a sufficiently large training set is required by pruning algorithms.

(4) One common feature of constructive algorithms is the assumption that the hidden nodes already installed in the network are useful in modeling part of the underlying function. In that case, the weights feeding into these installed nodes can be frozen to avoid the moving-target problem. The number of weights to be optimized at a time is reduced, so that time and memory requirements are decreased.

(5) In pruning algorithms and regularization methods, several problem-dependent parameters need to be properly specified or selected in order to obtain an acceptable network yielding satisfactory performance. This requirement makes these algorithms more difficult to use in real-life applications.

3. Constructive Neural Networks

Constructive neural networks (CoNN) are a group of algorithms that alter the network structure as learning proceeds, automatically producing a network of appropriate size. The learning algorithms used in CoNN are called constructive algorithms. A constructive algorithm starts with a minimal network architecture and adds layers, nodes and connections during training, as required by the given problem. The architecture adaptation process continues until the training algorithm finds a near-optimal architecture that gives a satisfactory solution to the problem.

Six motivations for using constructive algorithms are listed with explanations in [Parekh et al., 2000]. These are:


(2) Potential for matching the intrinsic complexity of the learning task
(3) Estimation of expected case complexity of the learning task
(4) Tradeoffs among performance measures
(5) Incorporation of prior knowledge
(6) Lifelong learning.

Most function approximation algorithms can solve both classification and regression problems; however, their efficacy may depend on the type of problem. Classification problems can be seen as a special case of regression problems in which only discrete outputs are allowed. This is why all algorithms for regression problems can also be used for classification problems, while the reverse is not always true.

Kwok and Yeung, 1997a surveyed the major constructive algorithms for regression problems. In their proposed taxonomy, based on the perspective of a state-space search, they grouped the algorithms into six categories, each named after its most representative algorithm, as follows:

(1) Cascade-Correlation algorithm (CCA), which mostly groups variants of the cascade architecture proposed by Fahlman & Lebiere, 1990 [10].

(2) Dynamic node creation (DNC) algorithm proposed by Ash, 1989 [11].

(3) Projection pursuit regression, based on the statistical technique proposed by Friedman & Stuetzle, 1981 [12].

(4) Resource-allocating network proposed by Platt, 1991 [13].

(5) Group method of data handling, a class of algorithms inspired by the GMDH proposed by Ivakhnenko and described in Farlow, 1984 [14].

(6) Hybrid algorithm, which employs both a constructive and a pruning strategy, proposed by Nabhan & Zomaya, 1994 [15].

Among these, the most popular for function approximation problems is the CCA, and the next in popularity is DNC. The latter algorithm constructs an SLFNN, whereas the former constructs a cascade architecture.

Many CoNN algorithms suitable only for classification problems have been proposed in the neural network literature. The well-known CoNN algorithms for two-class classification are the Tower and Pyramid [Gallant, 1986], the Tiling [Mezard and Nadal, 1989], the Upstart [Frean, 1990] and the Perceptron Cascade [Burgess, 1994]. The multi-class versions of these algorithms are the MTower, MPyramid, MTiling, MUpstart and MPerceptron Cascade. These algorithms can be seen in [Parekh et al., 1997; Parekh et al., 2000].

Nicoletti and Bertini, 2007 empirically evaluated several two-class and multi-class CoNN algorithms. Nicoletti et al., 2009 reviewed several well-known CoNN algorithms suitable for classification tasks that construct feedforward architectures as a result of adaptive structure learning. They classified the algorithms into two groups: those directed by the minimization of classification errors and those based on a sequential learning model. The algorithms listed above are based on the minimization of classification errors. The CoNN algorithms based on the sequential learning model add a hidden node and train it as a partial classifier during training. The well-known CoNN algorithms of this group are the Irregular Partitioning algorithm [Marchand et al., 1990], the Carve algorithm [Young and Downs, 1998], the Target Switch algorithm [Campbell and Vicente, 1995], the Oil Spot algorithm [Mascioli and Martinelli, 1995], the Constraint Based Decomposition algorithm [Draghici, 2001], and the Decomposition Algorithm for Synthesis and Generalization [Subirats et al., 2008]. Many recently proposed CoNN algorithms for classification tasks can be found in [22] and the references therein.

4. Constructive Algorithms for Feedforward Neural Networks

In general, constructive algorithms produce three common resulting architectures as a result of structured learning: SLFNNs, multilayer FNNs and cascade architectures. For example, a constructive algorithm may start with an SLFNN having one or zero nodes in the hidden layer and then add one or a few hidden nodes to the current network at each step. The node(s) is/are added in the same hidden layer when designing SLFNNs, or in different hidden layers when designing multilayer FNNs. The cascade architecture is a special class of multilayer FNN with one node in each hidden layer, where each hidden node receives inputs from the network inputs and all previously added hidden nodes.
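The cascade wiring can be summarized by a short forward-pass sketch (our illustration, assuming a single linear output node and omitting biases): the i-th hidden node sees the network inputs plus the outputs of the earlier hidden nodes, and the output node sees the inputs and all hidden nodes.

```python
import numpy as np

def cascade_forward(x, hidden_weights, output_weights, act=np.tanh):
    """Forward pass through a cascade architecture (single linear output).

    x              : input vector, shape (n_in,)
    hidden_weights : list of 1-D weight vectors; the first has length n_in and
                     each subsequent one is longer by one (inputs + earlier nodes)
    output_weights : vector of length n_in + n_hidden feeding the output node
    """
    z = list(x)                       # signals visible to later nodes
    for w in hidden_weights:
        z.append(act(np.dot(w, z)))   # each node sees inputs + earlier nodes
    return np.dot(output_weights, z)
```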

The DNC algorithm is probably the first constructive algorithm for designing SLFNNs dynamically. In the DNC algorithm, hidden nodes with sigmoidal activation functions are added to the SLFNN iteratively. Either the whole network or only the weights associated with the newly added node are trained after each addition step. A large number of constructive algorithms following the DNC have been developed, e.g., [29], [30].
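The growth loop can be sketched as follows (a minimal illustration in the spirit of DNC rather than the exact procedure of [11]; biases are omitted, plain batch gradient descent is assumed, and the whole network is retrained after each addition):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(W1, W2, X, y, epochs=200, lr=0.01):
    """Batch gradient descent on the sum-of-squares error."""
    for _ in range(epochs):
        H = sigmoid(X @ W1)                                  # hidden activations (P, n_hidden)
        err = H @ W2 - y                                     # residual errors (P, 1)
        grad_W2 = H.T @ err                                  # output-layer gradient
        grad_W1 = X.T @ ((err @ W2.T) * H * (1 - H))         # hidden-layer gradient
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
    H = sigmoid(X @ W1)
    return W1, W2, 0.5 * np.sum((H @ W2 - y) ** 2)

def grow_slfnn(X, y, max_hidden=10, target_error=1e-2):
    """Add one sigmoidal hidden node at a time until the error is small enough."""
    rng = np.random.default_rng(0)
    n_in = X.shape[1]
    W1 = rng.normal(scale=0.1, size=(n_in, 1))               # start with one hidden node
    W2 = rng.normal(scale=0.1, size=(1, 1))
    while True:
        W1, W2, E = train(W1, W2, X, y)
        if E <= target_error or W1.shape[1] >= max_hidden:
            return W1, W2, E
        # grow: append a freshly initialized hidden node, then retrain everything
        W1 = np.hstack([W1, rng.normal(scale=0.1, size=(n_in, 1))])
        W2 = np.vstack([W2, rng.normal(scale=0.1, size=(1, 1))])
```

Here X has shape (P, n_in) and y has shape (P, 1); the single linear output and the fixed epoch budget per growth step are simplifying assumptions.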


The addition of nodes in different hidden layers, however, is not straightforward, because one has to decide whether a node will be added to an existing hidden layer or to a new hidden layer. To tackle this issue, most existing algorithms add a predefined and fixed number of nodes to the first hidden layer, then add the same number of nodes to the second hidden layer, and so on [Ma and Khorasani, 2003; Monirul Islam et al., 2009]. As mentioned above, this number is crucial for the performance of NNs, and restricting it to a small value limits the ability of a hidden layer to form complicated feature detectors.

Any standard training algorithm based on local optimization methods for a fixed-size network architecture may be used in conjunction with the constructive approach to determine the optimum set of weights of the network. The usual choice is a local optimization method based on first-order gradient descent, like the standard backpropagation algorithm [37] or its variants such as the QuickProp algorithm [38] and the RPROP algorithm [39], or a second-order method (using the information of the Hessian matrix in some form or other), like the quasi-Newton method [40] or the Levenberg–Marquardt algorithm [41].

There are a variety of ways of training the resulting network after each hidden node addition in constructive algorithms. These can be classified into two general methods. The first consists of training the whole network after the addition of a new hidden node. The second consists of training only the newly added node, with the remaining weights frozen. The method for adding a new hidden node is standard across many constructive algorithms and in general consists of either adding a new hidden node when the error fails to decrease by a set amount over a given period, or testing for some criterion such as a local minimum. Halting network construction is equivalent to finding the best model for a given problem, and hence techniques such as early stopping are employed.

5. Cascade-Correlation algorithm and its variants

The cascade-correlation algorithm (CCA) was designed to overcome the local minima problem, the step-size problem and the moving-target problem [Fahlman and Lebiere, 1990], and to avoid having to define the number of hidden nodes in advance. CCA is widely used for classification and function approximation tasks.

CCA adds one hidden node to the cascade architecture at a time; the hidden node is connected to all inputs as well as to the previously trained hidden nodes. After the training of the input weights of the current hidden node is completed, the node is connected to the output nodes with its input weights frozen, and all inputs to the output nodes are trained again. In the following sections, several CCA variants and similar constructive algorithms are presented [10].

5.1. Cascade 2 algorithm

The Cascade 2 algorithm was also first proposed by Fahlman, the author of the CCA. Cascade 2 differs from CCA in that a new hidden node is trained to directly minimize the residual error rather than to maximize the covariance between the hidden node output and the residual error at the output nodes. In addition, the hidden node has adjustable output connections to all of the output nodes; everything else is common to both algorithms. Several authors have demonstrated that CCA is effective for classification but not very successful on regression tasks, because the correlation term tends to drive the hidden node activations to their extreme values, making it hard for the network to produce a smoothly varying output [31], [42], [43].
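The difference between the two training objectives can be made concrete with a small sketch (our notation; a single candidate node with activations v over P patterns and residual output errors e is assumed): CCA maximizes the magnitude of the covariance between v and e, whereas Cascade 2 minimizes the squared difference between e and the candidate's contribution to the output nodes.

```python
import numpy as np

def cca_objective(v, e):
    """CCA: covariance magnitude between candidate activations v (P,) and
    residual errors e (P, N_o), summed over output nodes (to be maximized)."""
    v_c = v - v.mean()
    e_c = e - e.mean(axis=0)
    return np.sum(np.abs(v_c @ e_c))

def cascade2_objective(v, e, w_out):
    """Cascade 2: squared difference between residual errors e (P, N_o) and the
    candidate's contribution v * w_out, where w_out (N_o,) holds its output
    weights (to be minimized)."""
    return 0.5 * np.sum((e - np.outer(v, w_out)) ** 2)
```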

For the sake of clarity, we use a flow-chart (Fig. 1) to describe the Cascade 2 algorithm; the concrete contents of each step are listed below:

(1) Step A: Initializing NN

(i) Create the training and testing sets and fix the training parameters for the given problem.

(ii) Determine the number of nodes in the input and output layers according to the characteristics of the given problem.

(iii) Input nodes are fully connected to output nodes.

(iv) Initialize all connection weight values.

(2) Step B1: Calculating the error over an epoch by

$E = \frac{1}{2}\sum_{p=1}^{P}\sum_{k=1}^{N_o}\left(d_{kp} - f_{kp}\right)^{2}$   (1)

where $d_{kp}$ is the desired output and $f_{kp}$ is the actual output at the k-th output node for the p-th pattern, $P$ is the total number of exemplars and $N_o$ is the number of output nodes.


(3) Step B2: Judging whether the error E has reached the specified error or not.

(4) Step B3: Judging whether the overall (output nodes) stopping criterion, which depends on overall patience (the percentage change in the network error required to continue training and the length of “patient time”) is satisfied or not.

(5) Step B4: Updating all weights connected to the output nodes by gradient descent method, to minimize the objective function described by (1).

(6) Step C1: Initializing candidate

(i) Connect all input nodes and previously installed hidden nodes to the candidate and also connect candidate to output nodes.

(ii) Initializing the new weights of the candidate.

(7) Step C2: Calculating the difference between the error of the output nodes and the input from the candidate to these nodes defined as:

$S = \frac{1}{2}\sum_{p=1}^{P}\sum_{k=1}^{N_o}\left(e_{kp} - ow_{kn}\, O_{n}\right)^{2}$   (2)

where $e_{kp}$ is the residual error at the k-th output node for the pattern p, and $ow_{kn}\, O_{n}$ is the input from the n-th candidate node to the k-th output node, $O_{n}$ being the output of the n-th candidate node and $ow_{kn}$ the connection weight from the n-th candidate node to the k-th output node.

(8) Step C3: Judging whether the local (adding hidden node) stopping criterion, which depends on local patience (the percentage change in the network error required to continue training and the length of “patient time”) is satisfied or not.

(9) Step C4: Updating all input and output weights of the added hidden node by the gradient-descent method, to minimize the objective function described by (2), while the main NN is frozen.

(10)Step BC: Installing the trained candidate into the NN and this is the interface between B series and C series.

(i) All input connections to the candidate are frozen.

(ii) The output weights of the candidate are inserted with inverted sign.

5.2. Modified CCA using different objective functions

There are many ways to modify the CCA algorithm. One of them is to change the objective function used for training hidden nodes. Kwok and Yeung, 1997b conducted a careful investigation of the objective functions for training hidden nodes in constructive algorithms for regression tasks, aiming at deriving a class of objective functions whose value and corresponding weight update can be computed in O(P) time for a training set with P patterns. Any such objective function may be used in place of the covariance objective function in CCA [30].

5.3. Fixed Cascade Error

CCA has also inspired the Fixed Cascade Error algorithm, described in [Lahnajarvi et al., 1999; Lahnajarvi et al., 2002]. While the general structure of both algorithms is the same, they differ in the way the hidden nodes are created. The candidate hidden nodes are trained to maximize the following objective function for a single output node:

$S = \sum_{p} e_{p}\, y_{p}$   (3)

where p ranges over the training patterns, $y_{p}$ is the activation of the candidate hidden node for the pattern p, and $e_{p}$ is the error at the output node for the pattern p.


Fig. 1. Cascade 2 algorithm

5.4. Modeling with constructive backpropagation

Lehtokangas, 1999 proposed constructive backpropagation (CBP), an approach similar to CCA. CBP is computationally just as efficient as CCA even though the error needs to be back-propagated through no more than one hidden layer. Further, CBP has the same constructive benefits as CCA, but in addition it benefits from a simpler implementation and the ability to utilize stochastic optimization routines.

Moreover, it is shown how CBP can be extended to allow the addition of multiple new nodes simultaneously and how it can be used to perform continuous automatic structure adaptation, including both addition and deletion of nodes in batches. The performance of CBP learning was studied with time series modeling experiments, which demonstrated that CBP can provide significantly better modeling capabilities than CCA learning [33].

5.5. A cascade network employing progressive RPROP

Treadgold and Gedeon, 1997 proposed a new cascade network algorithm employing Progressive RPROP (CasPer). CasPer is a constructive learning algorithm that builds a cascade network by adding new hidden nodes to the NN and then training the whole network with different step sizes for different parts of the network, instead of freezing the input weights of the current hidden node. CasPer uses a variation of RPROP to train the whole network and is known to produce more compact networks with very promising results.


the node addition. Lastly, they can be classified on the basis of how the connection weights are frozen and retrained [34].

Prechelt, 1997 investigated problems and improvements in CCA. He developed six variants of CCA, one of them being the Cascade 2 algorithm. These variants were empirically compared using 42 different datasets from the PROBEN1 benchmark [31].

Thivierge et al., 2003 implemented an algorithm that simultaneously grows and prunes cascade networks. The pruning is done by removing irrelevant connections using the Optimal Brain Damage procedure [45].

Islam and Murase, 2000 proposed a cascade neural network design algorithm for two-hidden-layer FNNs. The method automatically determines the number of nodes in each hidden layer and can also reduce a two-hidden-layer network to a single-hidden-layer network. It is based on the use of a temporary weight-freezing technique [46].

The fast constructive-covering algorithm for neural network construction proposed in [Wang, 2008] is based on geometrical expansion. It has the advantage that each training example needs to be learned only once, which allows the algorithm to work faster than traditional training algorithms [47].

6. Adaptive slope sigmoidal function constructive algorithms

There are five major issues involved in constructive algorithms for regression tasks. These issues are as follows:

(1) The choice of minimal architecture and network growing strategy: How to connect a new hidden node in the existing network?

(2) The choice of activation function: Which activation function to use at the hidden and output nodes?

(3) The choice of weight freezing: Train the entire network or only the newly added hidden node?

(4) The choice of optimization technique: Which optimization method to use to determine the optimum weights during training?

(5) The choice of training stoppage criteria: When to stop the addition of new hidden nodes, or, in other words, what is the optimal number of hidden nodes to be installed in the network?

The generalization ability and training time of constructive algorithms for regression tasks depend on each choice discussed above.

In this section, we review our recently developed adaptive slope sigmoidal function constructive algorithms [Sharma and Chandra, 2010a, Sharma and Chandra 2010b].

The number of nodes in the input and output layers is defined according to the characteristics of the given problem. We start from a minimal SLFNN with one hidden node, where the input and output nodes are not directly connected. The algorithm starts from this minimal architecture, and during training one hidden node is added to the current network at a time. In the first algorithm, the hidden node is added in a separate hidden layer and is connected to the inputs, the output node and all previously added hidden nodes, thus constructing a cascade architecture [37]. In the second algorithm, the hidden node is added in the single hidden layer, thus constructing an SLFNN dynamically [38].

In both algorithms, we use an adaptive slope sigmoidal function at the non-linear hidden nodes, defined as:

$g(x, b) = \frac{1}{1 + e^{-bx}}$   (4)

where b is the slope parameter, adapted in the same way that the other weights are adapted during training. For very small slope values the activation function effectively behaves as a constant output function, reducing the effect of the node to that of a threshold node (similar to the zero-th node of a layer); for large values of the slope, the functional map of the output effectively becomes equivalent to the step function.

We start the slope parameter with a value of unity and update it so that it reaches its optimal value. To avoid the saturation problem of the log-sigmoid function and to make the best use of its non-linearity, we restrict the slope parameter to lie in the interval [0.1, 10].
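A minimal sketch of the activation in (4) and of a gradient-descent step on the slope parameter, clipped to [0.1, 10] as described above (single node, squared error; the helper names and the learning rate are illustrative):

```python
import numpy as np

def adaptive_sigmoid(x, b):
    """g(x, b) = 1 / (1 + exp(-b * x)), with a trainable slope b."""
    return 1.0 / (1.0 + np.exp(-b * x))

def update_slope(b, x, upstream_grad, lr=0.01):
    """One gradient-descent step on the slope b.

    dg/db = x * g * (1 - g); upstream_grad is dE/dg for this node.
    The slope is kept in [0.1, 10] to avoid the constant-output and
    step-function regimes described above.
    """
    g = adaptive_sigmoid(x, b)
    b = b - lr * upstream_grad * x * g * (1.0 - g)
    return float(np.clip(b, 0.1, 10.0))
```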

In both algorithms, the n-th hidden node is added to the current network in the n-th iteration. Only the input and output connection weights of the newly added node are trained, to further reduce the residual error. Weight freezing is used here to make computation faster and to circumvent the moving-target problem.

In both algorithms, we update the input and output connection weights of the newly added hidden node, the slope parameter of its sigmoidal function and the bias of the output node by using the gradient-descent optimization method in sequential mode, minimizing the squared-error objective function to further reduce the residual error.

Each individual hidden node is trained for a fixed number of epochs. The optimal number of hidden nodes is selected on the basis of cross-validation in the form of early stopping.
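The node-selection loop can be sketched as follows (our illustration; train_one_node and val_error are placeholders for the problem-specific training routine and validation-error measure):

```python
def grow_with_early_stopping(train_one_node, val_error, max_nodes=30, patience=3):
    """Select the number of hidden nodes by early stopping on a validation set.

    train_one_node(n) -- adds the n-th hidden node and trains it for a fixed
                         number of epochs (supplied by the caller).
    val_error()       -- returns the current validation error of the network.
    """
    best_err, best_n, stale = float("inf"), 0, 0
    for n in range(1, max_nodes + 1):
        train_one_node(n)
        err = val_error()
        if err < best_err:
            best_err, best_n, stale = err, n, 0
        else:
            stale += 1
            if stale >= patience:   # no improvement over `patience` additions
                break
    return best_n, best_err
```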


The convergence properties, smoothness of learning and generalization performance of the adaptive slope sigmoidal variant are superior to those of the fixed-shape sigmoid variant [48], [49].

7. Conclusion

This paper presents an overview of non-evolutionary constructive neural networks. Constructive neural networks are a group of methods that enable the network architecture to be constructed along with the training process. We have focused on constructive algorithms that construct feedforward architectures for regression problems. In general, a constructive algorithm has two integral components: a pre-specified network growing strategy and a local optimization technique for updating the weights during learning. The role of the adaptive sigmoidal activation function in constructive neural networks is justified by better generalization performance and shorter training time.

8. References

[1] Cybenko, G. (1989): Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems 2: 303-314.

[2] Hornik, K.; Stinchcombe, M.; White, H. (1989): Multilayer feedforward networks are universal approximators. Neural Networks, vol. 2(5), 359-366.

[3] Koza, J. R.; Rice, J. P. (1991): Genetic generation of both the weights and architecture for a neural network. in Proc. IEEE, IJCNN, Seattle, WA, 1991, vol. 2, 397-404.

[4] Yao, X.; Liu, Y. (1997): A new evolutionary system for evolving artificial neural networks. Transactions on Neural Networks, vol. 8, no. 3, 694-713.

[5] Wei, G. (2008): Evolutionary Neural Network Based on New Ant Colony Algorithm. International symposium on Computational Intelligence and Design IEEE, 318-321.

[6] Huang, R.; Tong, S. (2009): Evolving Product Unit Neural Networks with Particle Swarm Optimization. Fifth International Conference on Image and Graphics, IEEE Computer Society.

[7] Kwok, T. Y.; Yeung, D. Y. (1997a): Constructive Algorithms for Structure Learning in feedforward Neural Networks for Regression Problems. IEEE Transactions on Neural Networks, 8 (3), 630-645.

[8] Reed, R. (1993): Pruning algorithms-A Survey. IEEE Transactions on Neural Networks, vol. 4, 740-747.

[9] Parekh, R.; Yang, J.; Honavar, V. (2000): Constructive neural-network learning algorithms for pattern classification. IEEE Transaction on Neural Networks, vol. 11, no. 2, pp. 436-451.

[10] Fahlman, S. E.; Lebiere, C. (1990): The cascade-correlation learning architecture. Advances in Neural Information Processing Systems 2, D. S. Touretzky, Ed. CA: Morgan Kaufmann, 524-532.

[11] Ash, T. (1989): Dynamic node creation in backpropagation networks. Connection Science, vol. 1, no. 4, 365-375.

[12]Friedman, J. H.; Stuetzle, W. (1981): Projection pursuit regression. J. Amer. Statist. Assoc., vol. 76, no. 376, pp. 817-823.

[13] Platt, J. (1991): A resource-allocating network for function interpolation. Neural Computation, vol. 3, pp. 213-225.

[14] Farlow, S. J. Eds. (1984): Self-Organizing Methods in Modeling: GMDH Type Algorithms, vol. 54 of Statistics: Textbooks and Monographs. New York: Marcel Dekker.

[15] Nabhan, T. M.; Zomaya A. Y. (1994): Toward generating neural network structures for function approximation. Neural Networks, vol. 7, no. 1, pp. 89-90.

[16] Gallant, S. I. (1986): Three constructive algorithms for network learning. in IEEE Proc. 8th Conf. on Pattern Recognition, pages 849-852.

[17] Mezard, M.; Nadal, J. P. (1989): Learning in feedforward layered networks: The Tiling algorithm. Journal of Physics A: Math. Gen., vol. 22, no. 12, pp. 2191- 2203.

[18] Frean, M (1990): The Upstart algorithm: A method for constructing and training feed-forward neural networks. Neural Networks, vol. 2, pp. 198-209.

[19] Burgess, N. (1994): A constructive algorithm that converges for real-valued input patterns. Int. journal of Neural Systems 5(1), pp. 59-66.

[20] Parekh, R.; Yang, J.; Honavar, V. (1997): Constructive neural network learning algorithms for multi-category pattern classification. Artificial Intelligence Research Group, Department of Computer Science, 26 Atanasoff Hall, Iowa State University, Ames, Iowa, USA.

[21] Nicoletti, M. C.; Bertini, J. R (2007): An empirical evaluation of constructive neural network algorithms in classification tasks. in Int. J. Innovative Computing and Applications, vol. 1, no.1

[22] Nicoletti, M. C. et al. (2009): Constructive neural network algorithms for feedforward architectures suitable for classification tasks. pp. 1-23, in Leonardo Franco et al. (Eds.): Constructive Neural Networks (Studies in Computational Intelligence vol. 258), Springer.

[23] Marchand, M. et al. (1990): A convergence theorem for sequential learning in two layer perceptrons. Europhysics Letters 11(6), pp. 487-492.


[25] Campbell, C.; Vicente, C. P. (1995): Constructing feed-forward neural networks for binary classification tasks. Advanced Computing Research Centre, Bristol University, United Kingdom.

[26] Mascioli, F. M. F.; Martinelli, G. (1995): A Constructive algorithm for binary neural networks: the oil-spot algorithm. IEEE Transaction on Neural Networks, 6, pp. 794-797.

[27] Draghici, S. (2001): The Constraint based decomposition (CBD) training architecture. Neural Networks, 14, pp. 527-550.

[28] Subirats, J. L.; Jerez, J. M. ; Franco L. (2008): A new decomposition algorithm for threshold synthesis and generalization of Boolean functions. IEEE Transaction on Circuits and Systems I 55, pp. 3188-3196.

[29] Setiono, R.; Hui, L. C. K (1995): Use of a Quasi-Newton Method in a Feedforward Neural Network Construction Algorithm. IEEE Transactions on Neural Networks, vol. 6, no. 1.

[30] Kwok, T. Y.; Yenug D. Y. (1997b): Objective functions for training new hidden units in constructive neural networks. IEEE Transactions on Neural Networks, vol. 8, no. 5, 1131-1148

[31] Prechelt, L. (1997): Investigation of the CasCor family of learning algorithms. Neural Networks 10 (5), pp. 885-896.

[32] Lahnajarvi, J. J. T.; Lehtokangas, M. I.; Saarinen, J. P. P. (1999): Fixed Cascade Error - a novel constructive neural network for structure learning. Proceedings of the Artificial Neural Networks in Engineering Conference, ANNIE'99, St. Louis, Missouri, USA.

[33] Lehtokangas, M. (1999): Modeling with constructive backpropagation. Neural Networks, vol. 12, 707-716.

[34] Treadgold, N. K.; Gedeon, T. D. (1997): A Cascade Network employing Progressive RPROP. International conference on Artificial and Natural Neural Networks, pp. 733-742.

[35] Ma, L.; Khorasani, K. (2003): A new strategy for adaptively constructing multilayer feedforward neural networks. Neurocomputing 51, pp. 361-385.

[36] Islam, M. M. et al. (2009): A New Adaptive Merging and Growing Algorithm for Designing Artificial Neural Networks. IEEE Transactions on Systems, Man, and Cybernetics- Part B: Cybernetics, vol. 39, no. 3.

[37] Rumelhart, D. E.; Hinton, G. E.; Williams R. J. (1986): Learning internal representations by error propagation. Parallel Distributed Processing, vol. I, D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA: MIT Press, 318-362.

[38] Fahlman, S. E. (1989): An empirical study of learning speed in backpropagation networks. Carnegie Mellon Univ., Pittsburg, PA, Tech. Rep. CMU-CS-88-162.

[39] Riedmiller, M.; Braun, H. (1993): A direct adaptive method for faster backpropagation learning: The RPROP Algorithm. Proc. of the IEEE Int. Conf. on Neural Networks, San Francisco, CA, 586-591

[40] Setiono, R.; Hui L. C. K. (1995): Use of a Quasi-Newton Method in a Feedforward Neural Network Construction Algorithm. IEEE Transactions on Neural Networks, vol. 6, no. 1

[41] Hagan, M. T.; Menhaj, M. B. (1994): Training Feedforward Networks with the Marquardt algorithm. IEEE Transactions on Neural Networks, vol. 5, no. 6, 989- 993

[42] Nechyba, M. C.; Xu, Y. (1994): Neural network approach to control system identification with variable activation functions. IEEE International Symposium on Intelligent Control, Columbus, Ohio, USA.

[43] Hwang, J. N.; Shien, S.; Lay, S. R. (1996): The Cascade – Correlation Learning: A Projection Pursuit Learning Perspective. IEEE Transactions on Neural Networks, vol. 7, no. 2.

[44] Lahnajarvi, J. J. T.; Lehtokangas, M. I.; Saarinen, J. P. P. (2002): Evaluation of constructive neural networks with cascaded architectures. Neurocomputing, vol. 48, pp. 573-607.

[45] Thivierge, J. -P.; Rivest F.; Shultz, T. R. (2003): A dual-phase technique for pruning constructive networks. In proceedings of the IEEE International Joint Conference on Neural Networks, vol. 1, pp. 559-564.

[46] Islam, M.; Murase, K. (2000): A new algorithm to design compact two-hidden-layer artificial neural networks. Neural Networks, vol. 14, no. 9, pp. 1265-1278.

[47] Wang, D. (2008): Fast constructive-covering algorithm for neural networks and its implementation in classification. Applied Soft Computing 8, pp. 166-173.

[48] Sharma, S. K.; Chandra, P. (2010a): An adaptive slope sigmoidal function cascading neural networks algorithm. Proc. of the IEEE, Third International Conference on Emerging Trends in Engineering and Technology (ICETET 2010), India, pp. 139-144, doi: 10.1109/ICETET.2010.71.
