MAPPING NEURAL NETWORKS INTO RULE SETS AND MAKING THEIR HIDDEN KNOWLEDGE EXPLICIT

APPLICATION TO SPATIAL LOAD FORECASTING

Adriana Castro (INESC & UFPA), Porto, Portugal, acastro@inescporto.pt

Vladimiro Miranda (INESC & FEUP), Porto, Portugal, vmiranda@inescporto.pt

Federal University of Para - Brazil

Sponsored by CAPES – Brazilian Government

Abstract – This paper presents a mathematical transform that maps artificial neural networks into rule-based fuzzy inference systems. This allows one to make explicit the knowledge implicitly captured by a trained neural network. This result is exact and its application is illustrated with data from a spatial load forecasting problem.

Keywords: Artificial Neural Networks, Fuzzy Inference Systems, rule based systems, spatial load forecasting.

1. INTRODUCTION

Artificial Neural Networks (ANNs) are recognized for their powerful capacity to express relationships between the variables of a problem. They constitute a powerful interpolation tool, and their presently most used form is considered a universal approximator, meaning that it is always possible to design an ANN that approximates a given function with some precision.

Furthermore, ANNs are exceptionally apt for efficient computational implementations.

However, there is still much distrust of ANNs, for a number of reasons, some better than others. One often heard argument is that ANNs do not have explaining capability – they deliver, but they don't tell why. In a number of ways, this is certainly true.

In many cases, ANNs are sufficient and there is no real need to make knowledge explicit. But in some application areas this will be felt as a must. A good example would be building diagnoses of system or equipment failures. Human understanding would be greatly enhanced if the relation between variables or symptoms and equipment condition were explicit, and engineers or technicians would also gain more confidence in the diagnoses produced.

On the other hand, rule-based systems (Fuzzy Inference Systems, FIS) have precisely the desired characteristics of an explicit form of knowledge.

However, their construction is not always straightforward.

One of the more important problems in the design of FIS from input-output data is a form of "curse of dimensionality": the number of rules of the system grows exponentially as the number of inputs increases, and the computational complexity of implementations for practical problems increases accordingly. Besides, if the number of rules is excessive, understanding becomes more difficult for the human specialist.

To couple the advantages of neural networks and rule-based systems, there have been a number of works trying to establish relations between them [1]-[4]. Some of these relationships have been built on the basis of progressive approximations.

In this paper, instead of approximating an ANN by some arbitrarily built rule-based system, we will present a mathematical transform that maps a certain type of ANN into Takagi-Sugeno (TS) fuzzy inference systems – not an approximation, but an equivalence process that allows one to replace an ANN by a TS fuzzy inference system and obtain exactly the same results.

The paper will present the derivation and definition of the transform, together with one practical example. The example will illustrate how the transform makes explicit, and brings to light, knowledge that was hidden in the ANN architecture.

The practical case used to illustrate the technique is related to Spatial Load Forecasting. However, the technique has the potential to contribute to the development of better diagnosis systems, such as for machine failure (transformers, generators…) or relay failure, or even for complex power system problems like voltage collapse – or any other problem where an ANN produces a suitable solution.

2. ARTIFICIAL NEURAL NETWORKS

An ANN is characterized by having in its architecture many low-level processing units with a high degree of interconnectivity via weighted connections.

Motivated by the different networks found in biological systems, several ANN architectures have been proposed in the literature. Among these, the most commonly used is the Multilayer Feedforward Neural Network (Figure 1).

In its most basic form, this is a model consisting of a finite number of successive layers, where each layer consists of a finite number of processing units called neurons. Each neuron of each layer is connected to each neuron of the subsequent layer through synaptic weights.

Figure 1: ANN Architecture

Considering the ANN of Figure 1, every neuron in the hidden layer calculates:

$$s_j = f\left(\sum_{i=1}^{n} x_i w_{ij} + \theta_j\right) \qquad (1)$$

where xi is the i-th input to the net, wij is the weight of the connection from input neuron i to hidden neuron j, θj is the bias of the j-th hidden neuron and f(.) is the activation function of the neuron.

For the output layer, each neuron calculates:

$$y_k = g\left(\sum_{j=1}^{m} s_j \beta_{jk}\right) \qquad (2)$$

where $\beta_{jk}$ is the weight of the connection from hidden neuron j to output neuron k, $y_k$ is the k-th output of the net and g(.) is the activation function of the neuron.
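To make (1)-(2) concrete, here is a minimal sketch of this forward pass in Python (the naming and the choice of NumPy are ours, not the paper's; the sigmoid basis activation adopted later in Section 4.1 is used for f, and the identity for g):

```python
import numpy as np

def sigmoid_basis(x):
    # f(x) = 1 - exp(-x) for x >= 0, and 0 otherwise (equation (4), Section 4.1)
    return np.where(x >= 0, 1.0 - np.exp(-x), 0.0)

def ann_forward(x, W, theta, beta):
    """x: inputs (n,); W: input-to-hidden weights (n, m);
    theta: hidden biases (m,); beta: hidden-to-output weights (m,)."""
    s = sigmoid_basis(x @ W + theta)   # equation (1), hidden layer
    return float(beta @ s)             # equation (2), linear output neuron
```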

Among all properties of the ANNs, the most important is their function approximation capability. It has been extensively demonstrated that a Multilayer Feedforward Neural Network working with arbitrary squashing functions in hidden neurons can approximate virtually any function of interest to any desired degree of accuracy [5]-[6].

However, a barrier to a more widespread acceptance of ANNs is that they don't have explaining capability. It is impossible for the human specialist to understand how the neural network arrives at a particular decision.

ANNs are considered black boxes, and nothing can be revealed about the knowledge encoded within them.

3. FUZZY INFERENCE SYSTEMS

Fuzzy Inference Systems or Fuzzy Rule Based Systems (FRBS) have precisely the desired characteristics of an explicit form of knowledge. Like ANNs, FIS are dynamic, parallel processing systems that estimate input-output functions.

In a FIS, the relationship between variables is represented by means of fuzzy IF-THEN rules in the form:

IF (antecedent) THEN (consequent)

The antecedent is a fuzzy proposition of the type "x is A", where x is a linguistic variable and A is a linguistic term defined by a fuzzy set.

Basically, FIS can be categorized into two families:

1) the family that includes linguistic models based on collections of IF-THEN rules, whose antecedents and consequents utilize fuzzy values, and

2) the family that uses a rule structure that has fuzzy antecedent and functional (crisp) consequent.

The second category, based on Takagi-Sugeno (TS) fuzzy inference systems, is built with rules in the following form:

Rule $R_l$: IF $x_1$ is $C_{l1}$ and … and $x_n$ is $C_{ln}$ THEN $y_l = c_0 + c_1 x_1 + \dots + c_n x_n \qquad (3)$

where $C_{li}$ are fuzzy sets, $x_i$ are the inputs of the system and $c_i$ are constants.

The consequent of the rule is an affine linear function of the input variables, and the output of the TS model is computed as the weighted average $y = \sum_l v_l y_l / \sum_l v_l$, where $v_l$ is the firing strength of rule $R_l$.

When $y_l$ is a constant, the fuzzy inference system is called a zero-order TS fuzzy model, which can be viewed as a special case of the Mamdani fuzzy system in which each rule's consequent is specified by a fuzzy singleton (or a pre-defuzzified consequent).
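As an illustration, a minimal sketch of this weighted-average inference for a zero-order TS model (the firing strengths and singleton consequents are invented for the example):

```python
import numpy as np

def ts_zero_order(v, c):
    # y = sum_l(v_l * c_l) / sum_l(v_l), with singleton consequents c_l
    v, c = np.asarray(v, float), np.asarray(c, float)
    return float(v @ c / v.sum())

# two rules with firing strengths 0.8 and 0.2 and consequents 1.0 and 3.0
print(ts_zero_order([0.8, 0.2], [1.0, 3.0]))  # -> 1.4
```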

Figure 2 illustrates the reasoning mechanism of the zero-order TS model, which is the model of interest in this paper.

Figure 2: A two-input Zero-order Takagi-Sugeno Model

4. MAPPING NEURAL NETWORKS INTO RULE SETS

4.1. Definition of the topology of the ANN

For the purpose of this paper, consider the ANN in Figure 3. This ANN has one neuron in the output layer, with a linear activation function, and only one hidden layer, whose activation function for each neuron is the sigmoid basis approximation function, shown in Figure 4 and defined as:

$$f(x) = \begin{cases} 1 - e^{-x}, & x \geq 0 \\ 0, & x < 0 \end{cases} \qquad (4)$$

This function is selected mainly because, in fuzzy logic, it can represent a fuzzy set interpreted as "is greater than 2.3": since the sigmoid basis approximation can reach 1 only asymptotically, the activation level considered is 0.9, and f(2.3) = 0.9.
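Indeed, $f(2.3) = 1 - e^{-2.3} \approx 1 - 0.1003 \approx 0.9$.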

Figure 3: ANN topology

Figure 4: Sigmoid Basis Approximation Function

4.2. Introducing the concept of f-duality

The concept of f-duality was introduced by Benitez, Castro and Requena in [3]. In their work, they used this concept to find a convenient operator, the logical interactive-or, to give a proper interpretation of ANNs.

To produce the mapping of an ANN into rule sets as proposed in this paper, the same concept introduced by those authors will be used to find the mathematical operation equivalent to equation (1) – the operation calculated by the hidden neuron.

The following propositions and lemmas are useful:

Proposition 1: Let f: X → Y be a bijective function and let ⊕ be an operation defined in the domain of f, X. Then there is one and only one operation ⊗, defined in the range of f, Y, verifying:

$$f\left(\bigoplus_{i=1}^{n} x_i\right) = \bigotimes_{i=1}^{n} f(x_i) \qquad (5)$$

Definition 1: Let f be a bijective function and let ⊕ be an operation defined in the domain of f. The operation ⊗ whose existence is proven in Proposition 1 is called the f-dual of ⊕.

Lemma 1: If ⊗ is the f-dual of ⊕, then ⊕ is the f⁻¹-dual of ⊗.

The sigmoid basis approximation function defined in (4) is a bijective function for x > 0. Then, considering this function as f and ⊕ as the operation + in ℝ, we have:

Lemma 2: The f-dual of + is ∗, defined as:

$$f(x_1 + x_2 + \dots + x_n) = f(x_1) \ast f(x_2) \ast \dots \ast f(x_n)$$
$$a \ast b \ast \dots \ast p = 1 - (1-a)(1-b)\dots(1-p) \qquad (6)$$

Therefore, applying the concept of f-duality to (1), the output signal of the hidden neurons can also be calculated by:

$$s_j = f\left(\sum_{i=1}^{n} x_i w_{ij}\right) = f(x_1 w_{1j}) \ast \dots \ast f(x_n w_{nj}) = 1 - (1 - f(x_1 w_{1j}))\dots(1 - f(x_n w_{nj})) \qquad (7)$$

Since the function f(.), the Sigmoid Basis Approximation Function, can represent a membership function in Fuzzy Logic, and the operation in (7) is the well-known Algebraic Sum operator, qualified as an S-norm (union) in Fuzzy Logic, rules can be extracted from the ANN.
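This equivalence between the summed form and the algebraic-sum form in (7) is easy to verify numerically; a small sketch (our own naming):

```python
import numpy as np

def f(x):
    # sigmoid basis approximation, equation (4)
    return np.where(x >= 0, 1.0 - np.exp(-x), 0.0)

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 3.0, size=5)         # positive arguments, where f is bijective

lhs = f(x.sum())                          # f(x1 + ... + xn)
rhs = 1.0 - np.prod(1.0 - f(x))           # f(x1) * ... * f(xn): the algebraic sum
print(np.isclose(lhs, rhs))               # True
```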

4.3. Extracting Rules from an ANN

From the ANN shown in Figure 3, and considering the hidden neurons without bias (the function of the bias will be explained later), one rule can be extracted for each neuron in the hidden layer:

Rule $R_j$: If $\sum_{i=1}^{n} x_i w_{ij}$ is A then $y_j = \beta_j \qquad (8)$

where A is a fuzzy set whose membership function is the activation function of the hidden neuron.

From equation (7), rules as in (8) can be expressed as:

Rule $R_j$: If $x_1 w_{1j}$ is A ∗ … ∗ $x_i w_{ij}$ is A ∗ … ∗ $x_n w_{nj}$ is A then $y_j = \beta_j \qquad (9)$

As the expression "$x_i w_{ij}$ is A" might also be interpreted as "$x_i$ is $A_i$", where the fuzzy set $A_i$ has membership function $\mu_{A_i} = f(x_i w_{ij})$, with the weight $w_{ij}$ acting as a scaling of the slope of f(.), and since the operation ∗ can be considered a logical OR operator, we can rewrite (9) as:

Rule $R_j$: If $x_1$ is $A_1$ or … or $x_i$ is $A_i$ or … or $x_n$ is $A_n$ then $y_j = \beta_j \qquad (10)$

with the firing strength for each rule $R_j$ given by:

$$v_j = \mu_{A_1} \ast \dots \ast \mu_{A_i} \ast \dots \ast \mu_{A_n} \qquad (11)$$

Finally, from the output neuron in Figure 3, the output of the fuzzy system can be extracted:

$$y = \sum_{j=1}^{m} \beta_j s_j \qquad (12)$$

and, since $s_j = v_j$:

$$y = \sum_{j=1}^{m} \beta_j v_j \qquad (13)$$

The inference system extracted from the neural net is similar to a zero-order Takagi-Sugeno model, except that here the fuzzy logic operator used to calculate the firing strength of each rule is an S-norm (OR) and not a T-norm (AND, product).

However, for each S-norm there is a T-norm associated with it; that is, there is a fuzzy complement such that the three together satisfy DeMorgan's Law [7]. Specifically, the S-norm s(a,b), the T-norm t(a,b) and the fuzzy complement c(a) form an associated class if:

$$c(s(a,b)) = t[c(a), c(b)] \qquad (14)$$

and, as is well known, the T-norm associated with the Algebraic Sum is the Algebraic Product operator, defined as:

$$t(a,b) = a \times b \qquad (15)$$

From (14), if the extracted fuzzy system is to use the algebraic product to calculate the rule firing strengths $v_j$, then the rule set must use the negation of all membership functions extracted in (10), i.e. "$x_i$ is NOT $A_i$", and the new output of the FIS will be:

$$y = \sum_{j=1}^{m} \beta_j (1 - v_j) \qquad (16)$$

Therefore, the fuzzy system represented by the rule set in (10) and output (13) can also be represented by the output (16) and the rule set:

Rule $R_j$: If $x_1$ is Not $A_1$ and … and $x_i$ is Not $A_i$ and … and $x_n$ is Not $A_n$ then $y_j = \beta_j \qquad (17)$

with the firing strength for each rule $R_j$ given by:

$$v_j = (1 - \mu_{A_1}) \times \dots \times (1 - \mu_{A_i}) \times \dots \times (1 - \mu_{A_n}) \qquad (18)$$

The extracted fuzzy system and the ANN are equivalent, since their outputs are the same for any input.

Figure 5 illustrates the extraction/transformation process for two rules.

4.4. Comments

The process explained so far contains the basic idea to produce a mapping of ANNs into FIS.

However, the rule antecedents and consequents extracted from ANNs have to make sense: they must be meaningful and subject to interpretation. For this reason, some considerations will now be presented so that the extracted rule set takes an appropriate form.

Figure 5: Extraction/Transformation process for two rules.

Condition 1: For every extracted membership function $\mu_{A_i} = f(x_i w_{ij})$, the weight $w_{ij}$ can be seen as a scaling factor of f(.); since $\mu_{A_i}(2.3/w_{ij}) = f(2.3) = 0.9$, the interpretation of the fuzzy set is "greater than $2.3/w_{ij}$", and it only makes sense if $0 < 2.3/w_{ij} \leq 1$, which leads to $w_{ij} \geq 2.3$.

This consideration results from the usual practice of training an ANN with normalized inputs; therefore all extracted membership functions have to be defined over the respective input interval.

With $w_{ij} \geq 2.3$ and $0 \leq x_i \leq 1$, the correct use of equation (7) is guaranteed, since we will always have $x_i w_{ij} \geq 0$.

Condition 2: The consequent of the rule, whose value is a constant (singleton), has to lie within the output interval of the system. Therefore, a scaling change in $\beta_{jk}$ (the extracted consequent) has to be made after the training of the net. This scaling change can be embedded in the output (16) of the FIS.

4.5. ANN with bias

If bias is used in the hidden neurons, equation (7) is rewritten as:

$$s_j = f\left(\sum_{i=1}^{n} x_i w_{ij} + \theta_j\right) = f(x_1 w_{1j}) \ast \dots \ast f(x_n w_{nj}) \ast f(\theta_j) \qquad (19)$$

Therefore, the firing strength of rule j as in (18) will now be:

$$v_j = (1 - \mu_{A_1}) \times (1 - \mu_{A_2}) \times \dots \times (1 - \mu_{A_n}) \times (1 - f(\theta_j)) \qquad (20)$$

If $\theta_j > 0$, we can consider the term $1 - f(\theta_j)$ as the weight of rule j; then the rules as in (17) will now be of the form:

Rule $R_j$: If $x_1$ is Not $A_1$ and … and $x_i$ is Not $A_i$ and … and $x_n$ is Not $A_n$ then $y_j = \beta_j$, with $r_j = 1 - f(\theta_j) \qquad (21)$

4.6. ANN Learning Restrictions

The following restrictions have to be introduced in the learning algorithm of the ANN to guarantee Condition 1 and the use of the biases as rule weights:

a) All weights between the input layer and the hidden layer have to be at least 2.3, that is, $w_{ij} \geq 2.3$.

b) All biases of the hidden neurons have to be greater than zero, that is, $\theta_j > 0$.

5. APPLICATION TO SPATIAL LOAD FORECASTING

Spatial Load Forecasting (SLF) methods have been developed to predict where, when and how much load growth will occur in a utility service area. They have been used to model the process of load growth in order to predict load evolution on a spatial and temporal basis, and have proved their value in distribution expansion planning.

One of the best ways to implement SLF methods is to use a Geographical Information System (GIS). The GIS is an ideal environment for SLF models because of its ability to manage spatial information, model and simulate the behaviour of the phenomena, visualise data and simulation results, and establish the interaction between the planner and the simulation environment.

To demonstrate the process of the extraction of the FIS from ANN we have selected the study region of the island of Santiago (Republic of Cabo Verde, Africa) illustrated in Figure 6 [8]-[9].

The region size is 39 km × 50.5 km. The resolution of the GIS spatial representation was square cells of 250 m, aggregated in a cell-based map with 31512 cells.

Each cell contains information about influence factors, used as inputs, and the potential for development, used as the output. We divided the data defined in each cell into a training set and a test set of equal size (15756 points each).

Figure 6: Island of Santiago, Republic of Cabo Verde, Africa.

5.1. ANN Results

An ANN with three normalized inputs ($x_1$: distance to main urban centre, $x_2$: distance to roads, $x_3$: distance to secondary urban centre), 14 hidden neurons and one output neuron (potential for development, $y \in [0, 5.4]$) was trained with the training set of 15756 points.

This data set was acquired directly from the map of the region stored in the GIS, together with data about demand development (number and type of consumers) at each point of the map.

The objective of building an ANN was to derive a relationship between influence factors and potential for development that would allow, later on, the use of the ANN for spatial demand forecasting, when applied to the same region in future scenarios or when applied in different, although similar, regions.

The training of the ANN was done with a backpropagation algorithm while guaranteeing the limit constraints on all weights $w_{ij}$ and biases $\theta_j$. Table 1 shows all the weights and biases after the training of the ANN.

Figure 7 shows the results of the ANN for the 15756 points trained. The average error of the ANN was 0.1048.

j | w1j | w2j | w3j | θj | βj
1 | 3.618 | 3.400 | 5.565 | 1.953 | 1.693
2 | 28.058 | 9.419 | 25.48 | 0.279 | -26.1
3 | 4.692 | 5.456 | 20.39 | 0.199 | 12.00
4 | 32.687 | 42.014 | 23.63 | 0.002 | 32.88
5 | 5.725 | 4.572 | 7.965 | 3.199 | 2.346
6 | 6.662 | 2.571 | 17.32 | 0.828 | -6.654
7 | 11.294 | 5.069 | 15.55 | 0.486 | 8.332
8 | 5.789 | 3.542 | 5.240 | 3.579 | 3.375
9 | 7.679 | 6.180 | 43.29 | 1.560 | 10.19
10 | 20.529 | 9.048 | 20.42 | 0.074 | -12.93
11 | 4.112 | 5.937 | 21.24 | 0.430 | 14.13
12 | 3.696 | 58.470 | 12.03 | 0.190 | -22.11
13 | 5.672 | 6.668 | 9.983 | 0.389 | -4.926
14 | 21.592 | 2.388 | 20.38 | 1.092 | -11.81

Table 1: Weights wij between the input and hidden layers, biases θj of the hidden neurons, and weights βj between the hidden layer and the output.

Figure 7: Potential for development in all cells in the map of the study region (x axis: no. of cells) for the three-input neural network: (+) target output and (*) ANN output.

5.2. Extraction of the FIS

With the ANN trained and all weights and biases obtained, we can start the process of extracting the rule set that composes a FIS.

The FIS will have three normalized inputs (distance to main urban centre, distance to roads and distance to secondary urban centre) and one output (potential for development); since the trained ANN has 14 neurons in its hidden layer, 14 rules will be extracted.

Since the methodology to extract the rule antecedent from a hidden neuron is the same for all neurons in the hidden layer, we will demonstrate the method only for the first neuron in the layer.

For this neuron, from Table 1, the weights and bias are: $w_{11} = 3.618$, $w_{21} = 3.4$, $w_{31} = 5.565$ and $\theta_1 = 1.953$. Then:

a) For $w_{11} = 3.618$ we can extract the membership function $\mu_{A_1} = f(3.618 x_1)$, illustrated in Figure 8(a), whose interpretation is "$x_1$ is greater than 0.6356", since $2.3/3.618 = 0.6356$.

b) For $w_{21} = 3.4$ we can extract the membership function $\mu_{A_2} = f(3.4 x_2)$, illustrated in Figure 8(b), whose interpretation is "$x_2$ is greater than 0.676", since $2.3/3.4 = 0.676$.

c) For $w_{31} = 5.565$ we can extract the membership function $\mu_{A_3} = f(5.565 x_3)$, illustrated in Figure 8(c), whose interpretation is "$x_3$ is greater than 0.413", since $2.3/5.565 = 0.413$.

d) For $\theta_1 = 1.953$ we can extract the weight of rule 1, that is, $r_1 = 1 - f(1.953) = 0.1418$.
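These steps are mechanical. A short sketch reproducing them for the first hidden neuron of Table 1 (the names are ours):

```python
import numpy as np

w1 = np.array([3.618, 3.400, 5.565])   # w_11, w_21, w_31 from Table 1
theta1 = 1.953                          # bias of the first hidden neuron

thresholds = 2.3 / w1                   # "x_i is greater than 2.3/w_i1"
r1 = np.exp(-theta1)                    # rule weight r_1 = 1 - f(theta_1) = e^(-theta_1)

print(np.round(thresholds, 3))          # [0.636 0.676 0.413]
print(round(r1, 3))                     # 0.142
```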

Consider now the weights $\beta \in [-9.881, 26.57]$ that will be used to extract the consequents of the rules. We must first rescale their values into the real output range of the system, that is, map $[-9.881, 26.57] \to [0, 5.4]$.

The scaling change is made through:

$$\bar{\beta}_j = 0.1481 \beta_j + 1.4634 \qquad (22)$$

Applying this scaling change to the $\beta$ of Table 1, we have found:

$$\bar{\beta} = [2.56\ 0\ 3.50\ 5.4\ 2.62\ 1.80\ 3.16\ 2.71\ 3.33\ 1.23\ 3.69\ 0.39\ 1.96\ 1.33] \qquad (23)$$

With $\bar{\beta}$, and applying the method to extract all antecedents from the hidden neurons, the complete set of rules can be obtained. Table 2 shows the extracted FIS.
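A sketch of this rescaling step, equation (22), applied to the whole vector of extracted consequents (the function name is ours):

```python
import numpy as np

def rescale_consequents(beta, slope=0.1481, intercept=1.4634):
    # beta_bar_j = 0.1481 * beta_j + 1.4634, mapping [-9.881, 26.57] onto [0, 5.4]
    return slope * np.asarray(beta, float) + intercept
```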

Figure 8: Membership Functions extracted from the first hidden neuron. (a) w11=3.618 (b) w21=3.4 (c) w31=5.565.

Finally, from the output neuron of the ANN, the output of the FIS can be extracted as in (16); since the values of the weights $\beta$ were changed to $\bar{\beta}$, the scaling change must be included in the system output, which results in:

$$y = 6.75194 \sum_{j=1}^{14} \bar{\beta}_j (1 - v_j r_j) + 9.8811 \sum_{j=1}^{14} v_j r_j - 128.6382 \qquad (24)$$

where the firing strengths $v_j$ of the rules are calculated through the algebraic product as in (18) and $r_j$ is the weight of rule j.
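Before the scaling change, the extracted FIS can also be evaluated directly with the unscaled consequents $\beta_j$, in which case its output reproduces the ANN exactly; a sketch under our own naming, following (16) and (19)-(21):

```python
import numpy as np

def f(x):
    return np.where(x >= 0, 1.0 - np.exp(-x), 0.0)

def fis_output(x, W, theta, beta):
    """x: (3,) normalized inputs; W: (3, 14) weights; theta: (14,) biases;
    beta: (14,) unscaled consequents -- all as listed in Table 1."""
    mu = f(x[:, None] * W)              # membership degrees mu_Ai = f(x_i w_ij)
    v = np.prod(1.0 - mu, axis=0)       # firing strengths, equation (18)
    r = 1.0 - f(theta)                  # rule weights r_j = 1 - f(theta_j), as in (21)
    return float(beta @ (1.0 - v * r))  # y = sum_j beta_j (1 - v_j r_j), from (19)
```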

6. CONCLUSION

In this paper, we have presented the derivation and definition of the transform that maps an ANN to a TS-FIS. We have explained how to proceed in a practical case and have demonstrated how the transform makes explicit, and brings to light, knowledge that was hidden in the ANN architecture.

The transformation methodology was based on the concept of f-duality, which allowed us to find the equivalent mathematical operation for a hidden neuron; this can be considered the foundation of the whole rule-extraction process.


Rule | IF x1 is not | AND x2 is not | AND x3 is not | THEN potential for development is | Rule weight
1 | greater than 0.63 | greater than 0.67 | greater than 0.41 | 2.56 | 0.14
2 | greater than 0.08 | greater than 0.24 | greater than 0.09 | 0 | 0.75
3 | greater than 0.49 | greater than 0.42 | greater than 0.11 | 3.50 | 0.81
4 | greater than 0.07 | greater than 0.05 | greater than 0.09 | 5.4 | 0.99
5 | greater than 0.40 | greater than 0.50 | greater than 0.29 | 2.62 | 0.03
6 | greater than 0.34 | greater than 0.89 | greater than 0.13 | 1.80 | 0.43
7 | greater than 0.20 | greater than 0.45 | greater than 0.14 | 3.16 | 0.61
8 | greater than 0.40 | greater than 0.65 | greater than 0.43 | 2.71 | 0.02
9 | greater than 0.29 | greater than 0.37 | greater than 0.05 | 3.33 | 0.21
10 | greater than 0.11 | greater than 0.25 | greater than 0.11 | 1.23 | 0.92
11 | greater than 0.55 | greater than 0.38 | greater than 0.10 | 3.69 | 0.65
12 | greater than 0.62 | greater than 0.03 | greater than 0.19 | 0.39 | 0.82
13 | greater than 0.40 | greater than 0.34 | greater than 0.23 | 1.96 | 0.67
14 | greater than 0.10 | greater than 0.96 | greater than 0.11 | 1.33 | 0.33

Table 2: Rules extracted from the ANN for the SLF problem (x1: distance to main urban centre; x2: distance to roads; x3: distance to secondary urban centre).

It is important to emphasize that any method for the extraction of rules from an ANN is valuable only to the degree to which the extracted rules are meaningful and comprehensible to a human expert. Therefore, in order to give the extracted rules a more human-friendly form and logical sense, we had to introduce restrictions in the ANN learning algorithm and to make scaling changes in the extracted consequents.

For illustration, the Spatial Load Forecasting problem was chosen; we have demonstrated that the method for extracting the FIS is simple and direct.

For this practical case we have obtained a FIS with only 14 rules. In [9] we constructed a FIS for the same problem using the well-known ANFIS (adaptive neuro-fuzzy inference system) and obtained 124 rules for the zero-order Takagi-Sugeno model. Both systems give good results, but in terms of computational complexity for practical implementation, working with the FIS extracted from the ANN is better than working with the FIS obtained from ANFIS.

Besides, it becomes easier for the human specialist to understand and analyze a system with a smaller number of rules.

7. REFERENCES

[1] S. Mitra and Y. Hayashi, "Neuro-Fuzzy Rule Generation: Survey in Soft Computing Framework", IEEE Transactions on Neural Networks, vol. 11, no. 3, pp. 748-768, May 2000.

[2] D. Nauck, F. Klawonn and R. Kruse, "Foundations of Neuro-Fuzzy Systems", Wiley, 1997, ISBN 0-471-97151-0.

[3] J. M. Benitez, J. L. Castro and I. Requena, "Are Artificial Neural Networks Black Boxes?", IEEE Transactions on Neural Networks, vol. 8, no. 5, pp. 1156-1164, September 1997.

[4] J. S. R. Jang and C. T. Sun, "Functional Equivalence between Radial Basis Function Networks and Fuzzy Inference Systems", IEEE Transactions on Neural Networks, vol. 4, pp. 156-158, 1992.

[5] F. Scarselli and A. C. Tsoi, "Universal Approximation using Feedforward Neural Networks: A Survey of Some Existing Methods, and Some New Results", Neural Networks, vol. 11, no. 1, pp. 15-37, 1998.

[6] A. Pinkus, "Approximation Theory of the MLP Model in Neural Networks", Acta Numerica, pp. 143-195, Cambridge University Press, 1999.

[7] L. X. Wang, "A Course in Fuzzy Systems and Control", Prentice-Hall International, 1997, ISBN 0-13-593005-7.

[8] C. Monteiro, V. Miranda and T. P. Leão, "Scenario Identification Process on Spatial Load Forecasting", Proceedings of PMAPS 2000 - 6th Int. Conference on Probabilistic Methods Applied to Power Systems, ed. INESC Porto, Funchal, Portugal, September 2000.

[9] T. Konjic, I. Kapetanovic, V. Miranda and A. Castro, "Uncertainty in Spatial Load Forecasting Models - A Comparison of Neural Networks and Several Fuzzy Inference Systems", Proceedings of RIMAPS 2001, Euro Conf. on Risk Management in Power System Planning and Operation, Porto, Portugal, September 2001.

ANNEX - Proof of Lemma 2:

Only two input variables in the domain X will be considered to prove Lemma 2.

Let $a, b \in [0, 1[$ and let $x_1, x_2 \in \mathbb{R}$ be such that $a = f(x_1)$ and $b = f(x_2)$. For the sigmoid basis approximation defined in equation (4) we then have:

For $x = x_1$: $x_1 = -\ln(1 - f(x_1)) = -\ln(1 - a)$

For $x = x_2$: $x_2 = -\ln(1 - f(x_2)) = -\ln(1 - b)$

then

$$x_1 + x_2 = -\ln(1-a) - \ln(1-b) = -\ln[(1-a)(1-b)]$$

For $x = x_1 + x_2$: $x_1 + x_2 = -\ln(1 - f(x_1 + x_2))$, then

$$-\ln[(1-a)(1-b)] = -\ln(1 - f(x_1 + x_2))$$
$$(1-a)(1-b) = 1 - f(x_1 + x_2)$$

and $f(x_1 + x_2) = 1 - (1-a)(1-b)$.

Then $f(x_1 + x_2) = f(x_1) \ast f(x_2) = a \ast b = 1 - (1-a)(1-b)$, and generalizing to n inputs proves Lemma 2.
