
2.7.3 4D Saito chaotic circuit

Lastly, we experimented on the 4D Saito chaotic circuit, given by
\[
\begin{pmatrix} \dfrac{dx_1}{dt} \\[6pt] \dfrac{dy_1}{dt} \end{pmatrix}
=
\begin{pmatrix} -1 & 1 \\ -\alpha_1 & \alpha_1\beta_1 \end{pmatrix}
\begin{pmatrix} x_1 - \eta\rho_1 h(z) \\[4pt] y_1 - \eta\dfrac{\rho_1}{\beta_1} h(z) \end{pmatrix},
\qquad
\begin{pmatrix} \dfrac{dx_2}{dt} \\[6pt] \dfrac{dy_2}{dt} \end{pmatrix}
=
\begin{pmatrix} -1 & 1 \\ -\alpha_2 & \alpha_2\beta_2 \end{pmatrix}
\begin{pmatrix} x_2 - \eta\rho_2 h(z) \\[4pt] y_2 - \eta\dfrac{\rho_2}{\beta_2} h(z) \end{pmatrix},
\]
where
\[
h(z) = \begin{cases} 1, & z \geq -1, \\ -1, & z \leq 1, \end{cases}
\]
is the normalized hysteresis value, $z = x_1 + x_2$, $\rho_1 = \dfrac{\beta_1}{1 - \beta_1}$, and $\rho_2 = \dfrac{\beta_2}{1 - \beta_2}$. The parameters are given by $(\alpha_1, \beta_1, \alpha_2, \beta_2, \eta) = (7.5, 0.16, 15, 0.097, 1.3)$. This is also a chaotic time series prediction problem, and was used as a benchmark for quaternion-valued neural networks in [5, 6, 7, 31, 210, 211, 32].

The networks had the same architectures as the ones described earlier, and were trained for 5000 epochs with 5249 training samples, obtained by solving the 4D Saito chaotic circuit on the interval $[0, 10]$ with the initial conditions $(x_1, y_1, x_2, y_2) = (1, 0, 1, 0)$.
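As a rough illustration of how such training data can be generated, the sketch below integrates the circuit with a fixed-step fourth-order Runge-Kutta scheme. The step size, the initial hysteresis branch, and the switching logic for $h(z)$ are our assumptions made for illustration; the original samples may have been produced by a different solver.

```python
import numpy as np

# Sketch: generating the training samples by integrating the 4D Saito circuit
# with a fixed-step classical Runge-Kutta (RK4) scheme. The sample count and
# interval follow the text; the solver, the initial hysteresis branch, and the
# switching logic are assumptions.

alpha1, beta1, alpha2, beta2, eta = 7.5, 0.16, 15.0, 0.097, 1.3
rho1 = beta1 / (1.0 - beta1)
rho2 = beta2 / (1.0 - beta2)

def saito_rhs(state, h):
    """Right-hand side of the circuit for a fixed hysteresis value h."""
    x1, y1, x2, y2 = state
    u1, v1 = x1 - eta * rho1 * h, y1 - eta * rho1 / beta1 * h
    u2, v2 = x2 - eta * rho2 * h, y2 - eta * rho2 / beta2 * h
    return np.array([-u1 + v1,
                     -alpha1 * u1 + alpha1 * beta1 * v1,
                     -u2 + v2,
                     -alpha2 * u2 + alpha2 * beta2 * v2])

def simulate(n_samples=5249, t_end=10.0):
    dt = t_end / (n_samples - 1)
    state = np.array([1.0, 0.0, 1.0, 0.0])  # (x1, y1, x2, y2) at t = 0
    h = 1.0                                  # initial hysteresis branch (assumed)
    trajectory = [state.copy()]
    for _ in range(n_samples - 1):
        k1 = saito_rhs(state, h)
        k2 = saito_rhs(state + 0.5 * dt * k1, h)
        k3 = saito_rhs(state + 0.5 * dt * k2, h)
        k4 = saito_rhs(state + dt * k3, h)
        state = state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        z = state[0] + state[2]
        if z < -1.0:     # leaving the domain of the +1 branch
            h = -1.0
        elif z > 1.0:    # leaving the domain of the -1 branch
            h = 1.0
        trajectory.append(state.copy())
    return np.array(trajectory)  # shape (n_samples, 4)
```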

The prediction gains after 50 runs of each algorithm are given in Table 2.3.
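For reference, a minimal sketch of the prediction gain metric is given below, assuming the standard definition $R_p = 10\log_{10}(\hat{\sigma}_x^2/\hat{\sigma}_e^2)$ from the nonlinear prediction literature (the ratio of the signal variance to the prediction error variance, in dB); the definition given earlier in the chapter, if different, takes precedence.

```python
import numpy as np

# Hedged sketch: prediction gain as the signal-to-error variance ratio in dB,
# assuming the standard definition R_p = 10 * log10(var(x) / var(e)).
# Higher values indicate better predictions.

def prediction_gain(target, prediction):
    error = np.asarray(target) - np.asarray(prediction)
    return 10.0 * np.log10(np.var(target) / np.var(error))
```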

In this last experiment, the performances were similar to those in the previous experiments: QCP, SAB, and RPR had approximately the same performance, followed by DBD, and finally by GD. Among the conjugate gradient algorithms, CGPR had the best performance, followed closely by CGPB, and lastly by CGFR and CGHS. The performance of the SCG algorithm was similar to its performance in the previous experiments. Among the quasi-Newton algorithms, OSS had the best performance, followed closely by BFGS, and lastly by SR1 and DFP. The conclusion is the same: the LM algorithm had the best performance among all the tested algorithms.

Table 2.3: Experimental results for the 4D Saito chaotic circuit

Algorithm    Prediction gain
GD           5.76 ± 1.70e-1
QCP          11.49 ± 6.47e-1
RPR          11.58 ± 7.91e-1
DBD          6.28 ± 3.36e-1
SAB          11.55 ± 4.96e-1
CGHS         11.59 ± 4.09e-1
CGPR         13.64 ± 3.67e-1
CGFR         12.08 ± 5.30e-1
CGPB         13.02 ± 4.93e-1
SCG          15.32 ± 9.35e-1
SR1          11.71 ± 6.73e-1
DFP          11.10 ± 6.32e-1
BFGS         16.24 ± 5.06e-1
OSS          16.94 ± 7.70e-1
LM           25.36 ± 9.63e-1

2.7.4 Discussion

n(k), whereas the 3D Lorenz system and the 4D Saito chaotic circuit are real-valued problems cast into the domain of quaternions. The 3D Lorenz system uses only the three quaternion imaginary units for its three variables, and the 4D Saito chaotic circuit uses a full quaternion for its four variables. It was shown, for example in [7], that quaternion-valued feedforward neural networks perform better on these problems than real-valued ones. This gives rise to the possibility that, in the future, more high-dimensional problems from the real domain will be treated using quaternion-valued neural networks.

Although we only used time series prediction problems to illustrate the effectiveness of the deduced algorithms, pattern recognition problems such as color image compression [89] and color night vision [104] could also benefit from these learning methods. The domain of quaternion-valued neural networks is only starting to gain interest, which means that the future might bring more applications in the pattern recognition domain as well.

The performances of the algorithms were similar to those in the real-valued case, mainly because these learning methods were derived starting from their real-valued counterparts. The main difference is quaternion multiplication, which gives rise to the specific formulations of these algorithms in the quaternion domain. However, the relative performances of the algorithms within each class differ from those in the real-valued case, mainly due to the quaternion dynamics, giving one more reason to extend these algorithms to the quaternions.
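To make the role of quaternion multiplication concrete, a minimal sketch of the Hamilton product is given below; the component formulas are the standard ones, while the function name and layout are ours. Its non-commutativity is the main source of the differences from the real-valued derivations.

```python
import numpy as np

# Minimal sketch of the Hamilton product of two quaternions, each stored as
# an array [real, i, j, k]. The component formulas are the standard ones.

def hamilton_product(p, q):
    a, b, c, d = p
    e, f, g, h = q
    return np.array([
        a * e - b * f - c * g - d * h,  # real part
        a * f + b * e + c * h - d * g,  # i part
        a * g - b * h + c * e + d * f,  # j part
        a * h + b * g - c * f + d * e,  # k part
    ])

p = np.array([1.0, 2.0, 3.0, 4.0])
q = np.array([0.5, -1.0, 2.0, 0.0])
# Quaternion multiplication is not commutative: in general pq != qp.
print(hamilton_product(p, q))
print(hamilton_product(q, p))
```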

The computational costs of these algorithms are on the order of those of the same algorithms in the real-valued domain. In all cases, the learning methods scale well: the computational cost is four times that of the real-valued version of the same method, which is the best that can be obtained, given that every quaternion is composed of four real numbers.
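One way to see why the cost stays within a constant factor of the real-valued case (a standard fact stated here for illustration, not part of the original derivation): left multiplication by a fixed quaternion $q = a + bi + cj + dk$ is a linear map on the four real components of its argument,
\[
qp \;\longleftrightarrow\;
\begin{pmatrix}
a & -b & -c & -d \\
b & a & -d & c \\
c & d & a & -b \\
d & -c & b & a
\end{pmatrix}
\begin{pmatrix} p_a \\ p_b \\ p_c \\ p_d \end{pmatrix},
\qquad p = p_a + p_b i + p_c j + p_d k,
\]
so every quaternion operation in the algorithms amounts to a fixed, small number of real operations.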

The HR calculus was used to deduce the algorithms starting from the real-valued case. Two other methods could have been used for the deduction. One is to split the quaternions into their four real components and give the formulation of the algorithms separately for each of the four components. However, this would have meant losing the specific relations between components in the quaternion domain, and the formulations would have been much more cumbersome, because of the need to split the fully quaternion activation function into components. In principle, these algorithms could also have been deduced directly in the quaternion domain, but this type of deduction would have required quaternion-domain derivatives and would have been much more error prone. Nonetheless, all three methods give equivalent formulations, thus the easiest and most natural method is preferred.
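For reference, and assuming the usual convention from the HR calculus literature (this formula is our addition, not taken from the section above), the HR derivative with respect to $q = q_a + q_b i + q_c j + q_d k$ can be written in terms of the real partial derivatives as
\[
\frac{\partial f}{\partial q} = \frac{1}{4}\left(\frac{\partial f}{\partial q_a} - \frac{\partial f}{\partial q_b}\, i - \frac{\partial f}{\partial q_c}\, j - \frac{\partial f}{\partial q_d}\, k\right),
\]
which makes the equivalence between the HR-calculus deduction and the component-wise deduction explicit.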

Complex-valued deep learning

3.1 Complex-valued convolutional neural networks

Convolutional neural networks have become one of the most successful models for solving virtually any image recognition task. Proposed for the first time in [107], where they were used for handwritten digit recognition, they were later applied to handwriting recognition in [108]. By 1995, applications of this type of network had appeared in image recognition, speech recognition, and time series prediction [106]. Convolutional neural networks represent a particularization of feedforward neural networks, in which matrix multiplication is replaced by convolution and the weight matrix is replaced by many convolution kernels of much smaller dimension than the weight matrices. Although they had many applications in computer vision [109], convolutional networks started gaining more popularity only with the increase in available computational resources and their implementation using parallel computing on graphics processors (GPUs).

The use of these computational resources allowed training times to be reduced by a factor of 100, giving way to models with an increasing number of layers and parameters, thus inaugurating the domain of deep learning [15, 17, 193, 64]. The basis of this domain is represented by convolutional networks, for which an increase in the number of layers gives better performance, as opposed to feedforward networks, whose performance degrades for a large number of layers. The same field includes other network models, for which it has been proved mathematically that performance is directly proportional to the size of the model.

The domain of complex-valued deep learning has appeared in the last few years. Although feedforward complex-valued neural networks have been applied to image recognition [137], and a single-layer complex-valued convolutional neural network was used in [72] for object detection in Polarimetric Synthetic Aperture Radar (PolSAR) images, only in the last few years have deep learning algorithms using complex numbers been derived and used. For example, in [10] complex-valued autoencoders are proposed, a type of model belonging to the deep learning paradigm. In [20] a wavelet scattering network is proposed, which uses complex numbers. Neuron synchrony in a complex-valued Deep Boltzmann Machine (DBM) was discussed in [179], showing performance superior to the real-valued case.

Very recent works discuss complex-valued recurrent neural network models [8], as well as learning time series representations using complex-valued recurrent networks [188], both with results similar to, if not better than, those of the real-valued ones, and with certain properties that do not appear in the real-valued case, which makes them suitable for certain types of applications. One of the most important papers in this domain is [215], which gives a mathematical motivation for complex-valued convolutional neural networks, showing that they can be seen as nonlinear multiwavelet packets, thus making the mathematical analysis from the signal processing domain available for a rigorous formulation of the properties of complex-valued convolutional networks. Following in the footsteps of this paper, it is expected that research in the complex-valued deep learning domain will increase in the coming years.

Taking the above discussion into consideration, a natural idea is to use complex-valued convolutional neural networks for image recognition, especially since some imaging devices produce images directly in complex form [105].
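As a concrete illustration of the core building block (a sketch under our own naming, not an implementation from the cited works), a complex-valued 2D convolution can be realized with four real-valued convolutions, using the identity $(a + ib) \ast (c + id) = (a \ast c - b \ast d) + i(a \ast d + b \ast c)$, where $\ast$ denotes real convolution:

```python
import numpy as np
from scipy.signal import convolve2d

# Sketch: a complex-valued 2D convolution built from four real-valued ones.
# The name complex_conv2d is illustrative, not taken from any library.

def complex_conv2d(x, w):
    """Convolve a complex image x with a complex kernel w (valid mode)."""
    real = convolve2d(x.real, w.real, mode="valid") - convolve2d(x.imag, w.imag, mode="valid")
    imag = convolve2d(x.real, w.imag, mode="valid") + convolve2d(x.imag, w.real, mode="valid")
    return real + 1j * imag

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))  # toy complex "image"
w = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))  # toy complex kernel

# Sanity check: matches convolving directly in the complex domain.
assert np.allclose(complex_conv2d(x, w), convolve2d(x, w, mode="valid"))
```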

Synthetic Aperture Radars (SAR) are imaging systems that produce complex-valued images of the ground [51, 187, 30]. They can be Interferometric Synthetic Aperture Radars (InSAR) [204] or Polarimetric Synthetic Aperture Radars (PolSAR) [3, 71, 72]. Complex-valued neural networks have been used for noise reduction, compression, and recognition of this type of complex-valued image. To date, to the best of our knowledge, there have been only two attempts at recognizing this type of image directly in the complex domain, one using complex-valued feedforward neural networks, and one using a complex-valued convolutional neural network with a single layer, both models being more suitable for this problem than real-valued neural networks, which ignore the dependencies present in the data in the complex plane [71, 72]. This led to the idea that deep convolutional neural networks could bring even better performance in this area.

On the other hand, functional Magnetic Resonance Imaging (fMRI) also collects data in complex form, but the majority of studies on this type of imaging use only the amplitude of the data in their analysis, ignoring the frequency information [86]. Complex-valued neural networks have proven their superior performance in the reconstruction and analysis of such images [86, 73]. Complex-valued independent component analysis was also successfully applied to this type of image [182, 111, 234]. These observations open up the possibility of applying complex-valued convolutional neural networks for the recognition of different patterns that may appear in these images, taking into account all the information provided by the imaging devices, and not ignoring the frequency information as has been done until now.

In this section, we start by applying complex-valued convolutional neural networks to real-valued image recognition, leaving complex-valued image recognition using these networks as future work. The presentation in this section follows that in the author's papers [153] and [168].