

2.7 Experimental results

2.7.1 Linear autoregressive process with circular noise

This means that the matrix ${}^{H}H_k$ has the form
\[
{}^{H}H_k = \begin{pmatrix}
H_k^1 & H_k^2 & H_k^3 & H_k^4 \\
\imath H_k^2 \imath^{-1} & \imath H_k^1 \imath^{-1} & \imath H_k^4 \imath^{-1} & \imath H_k^3 \imath^{-1} \\
\jmath H_k^3 \jmath^{-1} & \jmath H_k^4 \jmath^{-1} & \jmath H_k^1 \jmath^{-1} & \jmath H_k^2 \jmath^{-1} \\
\kappa H_k^4 \kappa^{-1} & \kappa H_k^3 \kappa^{-1} & \kappa H_k^2 \kappa^{-1} & \kappa H_k^1 \kappa^{-1}
\end{pmatrix},
\]

where
\begin{align*}
H_k^1 &= (J_k^1)^H J_k^1 + \imath (J_k^2)^H J_k^2 \imath^{-1} + \jmath (J_k^3)^H J_k^3 \jmath^{-1} + \kappa (J_k^4)^H J_k^4 \kappa^{-1} + \mu_k I_N, \\
H_k^2 &= (J_k^1)^H J_k^2 + \imath (J_k^2)^H J_k^1 \imath^{-1} + \jmath (J_k^3)^H J_k^4 \jmath^{-1} + \kappa (J_k^4)^H J_k^3 \kappa^{-1}, \\
H_k^3 &= (J_k^1)^H J_k^3 + \imath (J_k^2)^H J_k^4 \imath^{-1} + \jmath (J_k^3)^H J_k^1 \jmath^{-1} + \kappa (J_k^4)^H J_k^2 \kappa^{-1}, \\
H_k^4 &= (J_k^1)^H J_k^4 + \imath (J_k^2)^H J_k^3 \imath^{-1} + \jmath (J_k^3)^H J_k^2 \jmath^{-1} + \kappa (J_k^4)^H J_k^1 \kappa^{-1}.
\end{align*}
Furthermore, we have that
\[
{}^{H}g_k = \begin{pmatrix} g_k \\ \imath g_k \imath^{-1} \\ \jmath g_k \jmath^{-1} \\ \kappa g_k \kappa^{-1} \end{pmatrix},
\]
where $g_k = (J_k^1)^H e_k + \imath (J_k^2)^H e_k \imath^{-1} + \jmath (J_k^3)^H e_k \jmath^{-1} + \kappa (J_k^4)^H e_k \kappa^{-1}$. Now we have all the necessary ingredients to compute the update rule given in (2.6.3).

Up until now, we have worked with vectors from $\mathbb{H}^{4N}$. Ideally, we would like to work with vectors directly in $\mathbb{H}^N$. Considering the definition of ${}^{H}q$ for $q \in \mathbb{H}^N$, this is done by taking the first $N$ elements of the vector ${}^{H}q$. By using the Banachiewicz inversion formula [237], relation (2.6.3) thus becomes [150]:

\[
w_{k+1} = w_k - (H_k^1)^{-1} g_k + (H_k^1)^{-1}
\begin{pmatrix} H_k^2 & H_k^3 & H_k^4 \end{pmatrix} T^{-1}
\begin{pmatrix}
-\imath H_k^2 \imath^{-1} (H_k^1)^{-1} g_k + \imath g_k \imath^{-1} \\
-\jmath H_k^3 \jmath^{-1} (H_k^1)^{-1} g_k + \jmath g_k \jmath^{-1} \\
-\kappa H_k^4 \kappa^{-1} (H_k^1)^{-1} g_k + \kappa g_k \kappa^{-1}
\end{pmatrix},
\]
where
\[
T = \begin{pmatrix}
\imath H_k^1 \imath^{-1} & \imath H_k^4 \imath^{-1} & \imath H_k^3 \imath^{-1} \\
\jmath H_k^4 \jmath^{-1} & \jmath H_k^1 \jmath^{-1} & \jmath H_k^2 \jmath^{-1} \\
\kappa H_k^3 \kappa^{-1} & \kappa H_k^2 \kappa^{-1} & \kappa H_k^1 \kappa^{-1}
\end{pmatrix}
- \begin{pmatrix} \imath H_k^2 \imath^{-1} \\ \jmath H_k^3 \jmath^{-1} \\ \kappa H_k^4 \kappa^{-1} \end{pmatrix}
(H_k^1)^{-1}
\begin{pmatrix} H_k^2 & H_k^3 & H_k^4 \end{pmatrix},
\]
which represents the quaternion-valued Levenberg-Marquardt (LM) algorithm.
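To make the reduction above concrete, the following small numerical check (a sketch only, with randomly generated real blocks standing in for the quaternion-valued ones) verifies the Banachiewicz-type identity that allows working with the first $N$ components only; all names in it are illustrative.

```python
# Hedged sketch: verifying, on real random matrices, the block identity used above -
# the first N entries of H^{-1}[g; h] depend only on A^{-1} and the Schur complement
# T = D - C A^{-1} B, as in the Banachiewicz inversion formula.
import numpy as np

rng = np.random.default_rng(0)
N = 4
A = rng.standard_normal((N, N)) + 5 * np.eye(N)        # well-conditioned top-left block
B = rng.standard_normal((N, 3 * N))
C = rng.standard_normal((3 * N, N))
D = rng.standard_normal((3 * N, 3 * N)) + 5 * np.eye(3 * N)
g, h = rng.standard_normal(N), rng.standard_normal(3 * N)

H = np.block([[A, B], [C, D]])
full = np.linalg.solve(H, np.concatenate([g, h]))[:N]   # first N entries of H^{-1}[g; h]

Ainv_g = np.linalg.solve(A, g)
T = D - C @ np.linalg.solve(A, B)
reduced = Ainv_g - np.linalg.solve(A, B @ np.linalg.solve(T, h - C @ Ainv_g))

print(np.allclose(full, reduced))                       # True
```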

In this case too, the gradient of the error function at different steps is computed using the well-known backpropagation scheme.

The algorithms compared were the gradient descent algorithm (GD), the QCP, RPR, DBD, and SAB algorithms, the conjugate gradient algorithms (CGHS, CGPR, CGFR, CGPB), the SCG algorithm, the quasi-Newton algorithm with symmetric rank-one updates (SR1), the quasi-Newton algorithm with Davidon-Fletcher-Powell updates (DFP), the quasi-Newton algorithm with Broyden-Fletcher-Goldfarb-Shanno updates (BFGS), the one step secant method (OSS), and the Levenberg-Marquardt algorithm (LM).

The tap input of the filter was 4, so the networks had 4 inputs, 4 hidden neurons on a single hidden layer, and one output. The activation function for the hidden layer was the fully quaternion hyperbolic tangent function, given by $G^2(q) = \tanh q = \frac{e^q - e^{-q}}{e^q + e^{-q}}$, and the activation function for the output layer was the identity $G^3(q) = q$. Training was done for 5000 epochs with 5000 randomly generated training samples.
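As an aside, a minimal sketch of the fully quaternion hyperbolic tangent is given below, assuming quaternions are stored as 4-component numpy arrays; it is not the code used in the experiments, and all function names are illustrative.

```python
# Hedged sketch (not the thesis code): the fully quaternion hyperbolic tangent
# G(q) = tanh q = (e^q - e^{-q}) (e^q + e^{-q})^{-1}, with quaternions stored as
# numpy arrays [a, b, c, d] = a + b*i + c*j + d*k.
import numpy as np

def qmul(p, q):
    """Hamilton product of two quaternions given as length-4 arrays."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ])

def qinv(q):
    """Inverse q^{-1} = conj(q) / |q|^2."""
    conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return conj / np.dot(q, q)

def qexp(q):
    """Quaternion exponential: e^q = e^a (cos|v| + (v/|v|) sin|v|)."""
    a, v = q[0], q[1:]
    theta = np.linalg.norm(v)
    out = np.zeros(4)
    out[0] = np.cos(theta)
    if theta > 0:
        out[1:] = v / theta * np.sin(theta)
    return np.exp(a) * out

def qtanh(q):
    """Fully quaternion tanh; e^q and e^{-q} commute, so the quotient is well defined."""
    ep, em = qexp(q), qexp(-q)
    return qmul(ep - em, qinv(ep + em))

# quick check against the real tanh on a quaternion with zero vector part
print(qtanh(np.array([0.5, 0.0, 0.0, 0.0]))[0], np.tanh(0.5))
```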

To evaluate the effectiveness of the algorithms, we used a measure of performance called prediction gain, defined by $R_p = 10\log_{10}\frac{\sigma_x^2}{\sigma_e^2}$, where $\sigma_x^2$ represents the variance of the input signal and $\sigma_e^2$ represents the variance of the prediction error. The prediction gain is given in dB, and, because of the way it is defined, a higher prediction gain means better performance. After running each algorithm 50 times, the prediction gains are given in Table 2.1.
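A direct computation of the prediction gain, under the definition above, could look as follows (illustrative variable names, not the experimental code):

```python
import numpy as np

def prediction_gain(x, e):
    """R_p = 10*log10(var(x)/var(e)), in dB; x is the input signal, e the prediction error."""
    return 10.0 * np.log10(np.var(x) / np.var(e))

# toy usage: a predictor whose error has 10x smaller variance than the input
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
e = 0.316 * rng.standard_normal(5000)   # var(e) ~ 0.1 * var(x)
print(prediction_gain(x, e))            # ~10 dB
```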

We can see that QCP, SAB, and RPR performed approximately the same, followed by DBD, but all of them performed better than the gradient descent algorithm. CGHS and CGPR gave approximately the same results, with CGFR performing better and CGPB worse. The SCG algorithm was better than the conjugate gradient algorithms. Among the quasi-Newton methods, DFP and SR1 gave approximately the same results, with BFGS performing better and OSS worse. The best overall was the LM algorithm.

Table 2.1: Experimental results for the linear autoregressive process with circular noise

Algorithm     Prediction gain (dB)
GD            4.51 ± 6.64e-2
QCP           6.37 ± 1.08e-1
RPR           6.41 ± 1.47e-1
DBD           5.46 ± 1.48e-1
SAB           6.40 ± 1.31e-1
CGHS          5.17 ± 1.30e-1
CGPR          5.19 ± 8.14e-2
CGFR          6.91 ± 2.51e-1
CGPB          5.00 ± 9.57e-2
SCG           7.36 ± 9.25e-2
SR1           6.73 ± 2.34e-1
DFP           6.61 ± 2.15e-1
BFGS          7.23 ± 3.80e-1
OSS           5.11 ± 2.04e-1
LM            8.94 ± 3.33e-1
QESN [229]    3.57
AQESN [229]   3.51

2.7.2 3D Lorenz system

The 3D Lorenz system is given by the ordinary differential equations
\begin{align*}
\frac{dx}{dt} &= \alpha(y - x), \\
\frac{dy}{dt} &= -xz + \rho x - y, \\
\frac{dz}{dt} &= xy - \beta z,
\end{align*}
where $\alpha = 10$, $\rho = 28$, and $\beta = 2/3$. This represents a chaotic time series prediction problem, and was used to test quaternion-valued neural networks in [7, 23, 209, 212, 33, 229].

The tap input of the filter was 4, and so the networks had 4 inputs, 4 hidden neurons, and one output neuron. The networks were trained for 5000 epochs with 1337 training samples, which result from solving the 3D Lorenz system on the interval $[0, 25]$, with the initial conditions $(x, y, z) = (1, 2, 3)$.
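A possible way to generate this training series, sketched below with SciPy, assumes an evenly spaced sampling grid and the default Runge-Kutta integrator; these choices are assumptions, not the procedure used in the thesis, while the parameter values follow the text above.

```python
# Hedged sketch: generating the Lorenz training series with SciPy.
import numpy as np
from scipy.integrate import solve_ivp

alpha, rho, beta = 10.0, 28.0, 2.0 / 3.0

def lorenz(t, s):
    x, y, z = s
    return [alpha * (y - x), -x * z + rho * x - y, x * y - beta * z]

t_eval = np.linspace(0.0, 25.0, 1337)                # 1337 samples on [0, 25]
sol = solve_ivp(lorenz, (0.0, 25.0), [1.0, 2.0, 3.0], t_eval=t_eval, rtol=1e-9)
samples = sol.y.T                                     # shape (1337, 3): (x, y, z) per step
```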

The results after 50 runs of each algorithm are given in Table 2.2. The measure of performance was the prediction gain, as in the above experiment.

In this case, QCP and RPR performed best, SAB followed, and DBD was again last of the four, but still better than GD. Next, CGHS and CGPB performed approximately the same, CGFR slightly better, and the best was CGPR. In this experiment also, SCG had better results than the conjugate gradient algorithms. Among the quasi-Newton methods, DFP and SR1 performed approximately the same, OSS slightly better, and the best was BFGS. The best overall performance was attained by the LM algorithm.

Table 2.2: Experimental results for the 3D Lorenz system

Algorithm     Prediction gain (dB)
GD            7.56 ± 7.42e-1
QCP           10.59 ± 5.29e-1
RPR           11.07 ± 7.08e-1
DBD           9.35 ± 6.17e-1
SAB           10.33 ± 7.09e-1
CGHS          10.04 ± 6.65e-1
CGPR          11.31 ± 8.34e-1
CGFR          10.69 ± 5.81e-1
CGPB          10.12 ± 7.35e-1
SCG           12.58 ± 6.44e-1
SR1           11.74 ± 6.82e-1
DFP           11.27 ± 7.76e-1
BFGS          13.74 ± 6.30e-1
OSS           12.09 ± 7.80e-1
LM            31.45 ± 1.21e0
QESN [229]    17.73
AQESN [229]   18.92

2.7.3 4D Saito chaotic circuit

Lastly, we experimented on the 4D Saito chaotic circuit given by
\begin{align*}
\begin{pmatrix} \frac{dx_1}{dt} \\[4pt] \frac{dy_1}{dt} \end{pmatrix} &=
\begin{pmatrix} -1 & 1 \\ -\alpha_1 & \alpha_1\beta_1 \end{pmatrix}
\begin{pmatrix} x_1 - \eta\rho_1 h(z) \\[2pt] y_1 - \eta\frac{\rho_1}{\beta_1} h(z) \end{pmatrix}, \\[6pt]
\begin{pmatrix} \frac{dx_2}{dt} \\[4pt] \frac{dy_2}{dt} \end{pmatrix} &=
\begin{pmatrix} -1 & 1 \\ -\alpha_2 & \alpha_2\beta_2 \end{pmatrix}
\begin{pmatrix} x_2 - \eta\rho_2 h(z) \\[2pt] y_2 - \eta\frac{\rho_2}{\beta_2} h(z) \end{pmatrix},
\end{align*}
where
\[
h(z) = \begin{cases} 1, & z \ge -1 \\ -1, & z \le 1 \end{cases}
\]
is the normalized hysteresis value, $z = x_1 + x_2$, and $\rho_1 = \frac{\beta_1}{1 - \beta_1}$, $\rho_2 = \frac{\beta_2}{1 - \beta_2}$. The parameters are given by $(\alpha_1, \beta_1, \alpha_2, \beta_2, \eta) = (7.5, 0.16, 15, 0.097, 1.3)$. This is also a chaotic time series prediction problem, and was used as a benchmark for quaternion-valued neural networks in [5, 6, 7, 31, 210, 211, 32].

The networks had the same architectures as the ones described earlier, and were trained for 5000 epochs with 5249 training samples, which result from solving the 4D Saito chaotic circuit on the interval $[0, 10]$, with the initial conditions $(x_1, y_1, x_2, y_2) = (1, 0, 1, 0)$.
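A rough sketch of how such a series could be generated is given below; the fixed-step Euler scheme, the step size, and the initial hysteresis value are assumptions, not the thesis' procedure, while the equations, parameters, and $\rho_1, \rho_2$ follow the definitions above.

```python
# Hedged sketch: integrating the Saito circuit with a simple fixed-step scheme.
import numpy as np

a1, b1, a2, b2, eta = 7.5, 0.16, 15.0, 0.097, 1.3
r1, r2 = b1 / (1.0 - b1), b2 / (1.0 - b2)

def deriv(s, h):
    x1, y1, x2, y2 = s
    dx1 = -(x1 - eta * r1 * h) + (y1 - eta * r1 / b1 * h)
    dy1 = -a1 * (x1 - eta * r1 * h) + a1 * b1 * (y1 - eta * r1 / b1 * h)
    dx2 = -(x2 - eta * r2 * h) + (y2 - eta * r2 / b2 * h)
    dy2 = -a2 * (x2 - eta * r2 * h) + a2 * b2 * (y2 - eta * r2 / b2 * h)
    return np.array([dx1, dy1, dx2, dy2])

n, dt = 5249, 10.0 / 5249                    # 5249 samples on [0, 10]
s, h = np.array([1.0, 0.0, 1.0, 0.0]), 1.0   # initial state and hysteresis value (assumed)
samples = np.empty((n, 4))
for i in range(n):
    samples[i] = s
    s = s + dt * deriv(s, h)                 # forward Euler step (illustrative)
    z = s[0] + s[2]
    if h == 1.0 and z < -1.0:                # hysteresis switches only at the thresholds
        h = -1.0
    elif h == -1.0 and z > 1.0:
        h = 1.0
```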

The prediction gains after 50 runs of each algorithm are given in Table 2.3.

In this last experiment, the performances were similar to those in the previous experiments:

QCP, SAB, and RPR had approximately the same performance, followed by DBD, and finally by GD. CGPR had the best performance, followed closely by CGPB, and lastly by CGFR and CGHS. The performance of the SCG algorithm was similar to that in the previous experiments. Among the quasi-Newton algorithms, OSS had the best performance, followed closely by BFGS, and lastly by SR1 and DFP. The conclusion is the same: the LM algorithm had the best performance among all the tested algorithms.

Table 2.3: Experimental results for the 4D Saito chaotic circuit

Algorithm     Prediction gain (dB)
GD            5.76 ± 1.70e-1
QCP           11.49 ± 6.47e-1
RPR           11.58 ± 7.91e-1
DBD           6.28 ± 3.36e-1
SAB           11.55 ± 4.96e-1
CGHS          11.59 ± 4.09e-1
CGPR          13.64 ± 3.67e-1
CGFR          12.08 ± 5.30e-1
CGPB          13.02 ± 4.93e-1
SCG           15.32 ± 9.35e-1
SR1           11.71 ± 6.73e-1
DFP           11.10 ± 6.32e-1
BFGS          16.24 ± 5.06e-1
OSS           16.94 ± 7.70e-1
LM            25.36 ± 9.63e-1