
7 Uncertainty Treatment: NN Ensembles

7.1 Construction of an Ensemble of NNs

In general terms, it is well known that an ensemble of different predictors can generate predictions that are more accurate than those obtained by individual predictors [148].

Specifically, a NN ensemble is a learning paradigm where a certain number of NNs are combined to estimate the desired output for the target of interest (see Figure 11) [148].

Typically, a NN ensemble is constructed in two steps: i) training a number of individual NNs and ii) combining the predictions yielded by these NNs. The aim of assembling a number of NNs into an ensemble is to improve the generalization ability and estimation accuracy of the prediction model.

Considerable research has been carried out on ensembles in general and on ensembles of NNs in particular. Traditional NN ensembles have been built via several strategies, such as randomly varying the topology (number of hidden layers and neurons) of each individual NN, setting different initial weights or parameters, using different training datasets (e.g. bagging, CV, etc.) or different learning algorithms [148]-[151].

Bagging and boosting are the most prevalent approaches for producing ensembles [148], [151]. Bagging is based on bootstrap sampling [152]: it produces replicate training sets by sampling with replacement from the training samples [149], [153], [154]. The method works by training the multiple (m) models on different data splits (generated by sampling with replacement from the original training dataset), and by averaging their outcomes to obtain the final prediction results on the testing set [153]. The bootstrap method is one of the most widely used statistical methods for estimating standard errors and for constructing CIs and PIs for the response variable. This is due to its ease of use and robustness, to the fact that it requires no assumptions about the underlying probability distribution, and to its efficiency even when only a small dataset is available [97], [154], [155]. The bootstrap is a computational procedure that uses resampling with replacement in order to quantify uncertainty [97].
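The bagging procedure described above can be sketched as follows. This is a minimal illustration in which a least-squares linear fit stands in for training one NN (the toy data, ensemble size and aggregation by the simple mean are assumptions for the example, not taken from the cited papers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = 2x + noise (illustrative only).
X = rng.uniform(0, 1, size=200)
y = 2.0 * X + rng.normal(0, 0.1, size=200)

def fit_linear(x, t):
    """Least-squares slope/intercept -- stands in for training one NN."""
    A = np.vstack([x, np.ones_like(x)]).T
    coef, *_ = np.linalg.lstsq(A, t, rcond=None)
    return coef  # (slope, intercept)

m = 25  # number of ensemble members
models = []
for _ in range(m):
    # Bootstrap replicate: sample n indices with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    models.append(fit_linear(X[idx], y[idx]))

# Aggregate: average the member predictions on a test point.
x_test = 0.5
preds = np.array([a * x_test + b for a, b in models])
y_bagged = preds.mean()
```

Each replicate omits, on average, about 37% of the original samples, which is what makes the individual models diverse enough for averaging to help.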

In boosting ensembles, the patterns that the earlier classifiers in the series recognized incorrectly are over-represented in the composition of a particular training set, i.e. training samples that were incorrectly predicted by previous classifiers in the series are chosen more often than samples that were correctly predicted [149], [151], [156]. Thus, boosting aims at producing new classifiers that are more capable of predicting samples for which the current ensemble performance is poor [151].
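A single reweighting step of this kind can be sketched as follows, in an AdaBoost-style formulation (the thresholding "classifier", the toy data and the flipped-label noise are illustrative assumptions; the cited works may use different weak learners and update rules):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy binary classification on one feature: label = sign(x), with a few
# labels flipped so that no single classifier is perfect.
x = rng.uniform(-1, 1, size=100)
t = np.where(x >= 0, 1, -1)
flip = rng.choice(100, size=10, replace=False)
t[flip] *= -1

# First "classifier" in the series: threshold at zero (stands in for a NN).
pred = np.where(x >= 0, 1, -1)

w = np.full(len(x), 1.0 / len(x))      # uniform sample weights to start
err = w[pred != t].sum()               # weighted error of this classifier
alpha = 0.5 * np.log((1 - err) / err)  # its vote in the final ensemble

# Boosting step: samples this classifier got wrong are up-weighted, so
# they are sampled/emphasised more when training the next classifier.
w *= np.exp(-alpha * t * pred)
w /= w.sum()
```

After the update, the misclassified (flipped) samples carry strictly larger weights than the correctly classified ones, which is exactly the over-representation described above.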

Regarding the combination of the estimated predictions (outputs) of the individual NNs, different techniques can be adopted, such as a simple arithmetic mean, a weighted mean, a median, a linear combination, local fusion (LF), dynamic integration, etc. [149], [157]. As an example, Baraldi et al. [157] have explored LF strategies for the aggregation of the outcomes of different ensemble models, whereas Khosravi et al. [150] have combined individual PI forecasts through mean and median calculations.
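The three simplest combination rules mentioned above can be sketched as follows (the member outputs and the weights are illustrative numbers, not results from the cited papers; in practice the weights might come from each member's validation error):

```python
import numpy as np

# Hypothetical outputs of m = 5 individual NNs for one test pattern.
member_outputs = np.array([10.2, 9.8, 10.5, 9.9, 12.0])

# Simple arithmetic mean of the member outputs.
simple_mean = member_outputs.mean()

# Median: more robust to a single outlying member (here, 12.0).
median = np.median(member_outputs)

# Weighted mean: weights are illustrative assumptions, e.g. derived
# from each member's validation performance.
weights = np.array([0.3, 0.3, 0.2, 0.15, 0.05])
weighted_mean = np.dot(weights, member_outputs) / weights.sum()
```

Note how the median discards the influence of the outlying member, whereas the simple mean is pulled towards it; this robustness is one reason the median is sometimes preferred for aggregating PI forecasts.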

Figure 11. A basic scheme of a NN ensemble.

7.2 Application to Wind Power Prediction Intervals Estimation with Interval Wind Speed Inputs

In Paper VI, we propose a novel approach to short-term (1-h ahead) wind power forecasting with uncertainty quantification. The approach can be schematized in two steps: first, short-term estimation of wind speed PIs is performed within a multi-objective optimization framework solved by NSGA-II; then, the uncertainty in the wind speed and the uncertainty in the power curve are combined via a bootstrap sampling technique.

[Figure 11 scheme: Input → Training of m individual NNs → outputs ŷ1, ŷ2, …, ŷm → Combination of multiple individual model outputs → ŷcomb]

In the present work, we treat the power curve parameters as random variables and account for the epistemic uncertainty by bootstrapping [158], which also allows combining the aleatory uncertainty in the wind speed. The stochasticity assigned to the power curve is motivated by the fact that different wind turbines have different power curve parameters, which leads to an imprecise and imperfect knowledge of the power curve transformation. A plot of the power curve with its parameters, i.e. cut-in speed, rated speed, cut-off speed and rated power, is shown in Figure 12.

In the case study, we consider the cut-off speed and the rated power to be fixed (deterministic) values, equal to 30 m/s and 20 MW, respectively [159], [160], while the cut-in speed and the rated speed are random variables. More precisely, we sample the two parameters either from uniform distributions or from Gaussian distributions centered around average values of 3.5 and 14.5 m/s, respectively, with ranges of uncertainty of [3, 4] and [12, 17] m/s, respectively, defining the domains of the associated distributions (see Figure 12).
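The sampling of the two stochastic parameters can be sketched as follows. The Gaussian standard deviations below are illustrative assumptions (the paper's exact values for the Gaussian case are not reproduced here), and clipping to the stated ranges is used as a simple stand-in for restricting samples to the distributions' domains:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_params(dist="uniform"):
    """Draw one realisation of the two stochastic power-curve parameters:
    cut-in speed in [3, 4] m/s and rated speed in [12, 17] m/s."""
    if dist == "uniform":
        v_cut_in = rng.uniform(3.0, 4.0)
        v_rated = rng.uniform(12.0, 17.0)
    else:
        # Gaussian centred on the average values 3.5 and 14.5 m/s; the
        # standard deviations are assumptions, and clipping keeps the
        # samples inside the stated uncertainty ranges.
        v_cut_in = np.clip(rng.normal(3.5, 0.17), 3.0, 4.0)
        v_rated = np.clip(rng.normal(14.5, 0.83), 12.0, 17.0)
    return v_cut_in, v_rated
```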

The estimation of the wind power PIs from the estimated wind speed PIs is performed as follows:

i. Given the estimated hourly wind speed PIs on the testing set, sample two values for the stochastic parameters (the cut-in and rated speeds) from the corresponding distributions, and transform all the wind speed PIs into wind power PIs via the power curve transformation. In the case study, this procedure has been repeated 1000 times.

ii. Aggregate the results of the bootstrap phase by computing, for each element of the testing set, the bootstrapped average wind power PI and the 5th and 95th percentiles of the wind power PI bootstrapped distribution.
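Steps i and ii above can be sketched as follows. The cubic-ramp shape of the power curve between cut-in and rated speed is an assumption for illustration (the paper's exact parametrisation may differ), and the wind speed PIs below are hypothetical numbers:

```python
import numpy as np

rng = np.random.default_rng(7)

V_CUT_OFF = 30.0  # m/s, fixed (deterministic)
P_RATED = 20.0    # MW, fixed (deterministic)

def power_curve(v, v_ci, v_r):
    """Piecewise power curve: zero below cut-in and beyond cut-off, rated
    power above rated speed, and a cubic ramp in between (assumed shape)."""
    if v < v_ci or v >= V_CUT_OFF:
        return 0.0
    if v >= v_r:
        return P_RATED
    return P_RATED * (v**3 - v_ci**3) / (v_r**3 - v_ci**3)

# Hypothetical hourly wind speed PIs [lower, upper] on a small testing set.
speed_pis = [(6.0, 9.0), (10.0, 14.0), (13.0, 16.0)]

B = 1000  # bootstrap repetitions, as in the case study
boot = np.empty((B, len(speed_pis), 2))
for b in range(B):
    # Step i: sample the two stochastic parameters and transform the PIs.
    v_ci = rng.uniform(3.0, 4.0)
    v_r = rng.uniform(12.0, 17.0)
    for i, (lo, hi) in enumerate(speed_pis):
        boot[b, i] = (power_curve(lo, v_ci, v_r), power_curve(hi, v_ci, v_r))

# Step ii: bootstrapped average PI and 5th/95th percentiles per element.
avg_pi = boot.mean(axis=0)
pctl_5, pctl_95 = np.percentile(boot, [5, 95], axis=0)
```

The spread between the 5th and 95th percentiles reflects the epistemic uncertainty in the power curve parameters, while the PI width itself carries the aleatory uncertainty in the wind speed.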

Figure 12. Plot of the power curve as a function of wind speed. Solid vertical lines correspond to the values of the two stochastic parameters, the cut-in and rated speeds. Dashed vertical lines identify the domains of their respective distributions.

The user-specified parameters of the NN and of NSGA-II, and the plots of the resulting average bootstrapped PIs for 1-h ahead wind power prediction, are given in the paper.

In short, considering that wind-integrated distributed power generation systems are subject to both epistemic and aleatory uncertainties, this paper presents a novel approach for an adequate treatment (quantification) of both types of uncertainty. The proposed approach quantifies the aleatory uncertainty by estimating wind speed PIs, and then transforms them into wind power PIs through the power curve. In doing so, the epistemic uncertainty arising from the imperfect knowledge of the power curve parameters is also taken into account through bootstrap sampling. The procedure effectively decouples aleatory and epistemic uncertainty, and shows good robustness with respect to the parametric assumptions implicit in the bootstrap. The invariance of the coverage probability when passing from wind speed to wind power PIs has also been shown.

7.3 Application to Short-term Wind Speed Prediction Intervals Estimation

In Paper VII, we address the problem of short-term wind speed prediction for wind power production. PIs are considered to account for the uncertainties in the predictions, and two non-parametric methods are proposed to construct ensemble models made of NNs to estimate the PIs.

The proposed method is an enhanced version of the non-parametric MOGA-NN method [38], [39], here extended to build an ensemble of MLP NNs as base learners. We then apply this method to the problem of short-term wind speed prediction.

We propose two strategies for the construction of the NN ensemble, which differ in whether or not the training dataset is partitioned, and which embed the k-nearest neighbors (k-nn) approach in the aggregation phase for the identification of the neighborhood of a test pattern [157], [161]. The first strategy splits the training dataset into sub-sets with an equal number of samples, and each individual NN is then trained on a different sub-training set; the second strategy, instead, uses the entire training dataset for the training of each individual NN. The two methods also differ in how the outputs of the individual NNs are combined. Note that in method 2 we obtain an overall Pareto front, hereafter called the combined Pareto front, obtained by applying non-dominated sorting to the Pareto fronts resulting from the training of each network.
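The difference in data handling between the two strategies can be sketched as follows (the dataset size and ensemble size are illustrative assumptions; the actual training of the NNs and the k-nn aggregation are omitted):

```python
import numpy as np

rng = np.random.default_rng(3)
train = np.arange(120)  # indices of the training patterns (illustrative)
m = 4                   # ensemble size (illustrative)

# Strategy 1: split the training set into m equally sized, disjoint
# sub-sets; each individual NN trains on a different sub-set.
shuffled = rng.permutation(train)
subsets = np.array_split(shuffled, m)

# Strategy 2: every individual NN trains on the full training set;
# diversity then comes from the training itself (e.g. random initial
# weights) rather than from the data.
full_sets = [train for _ in range(m)]
```

Strategy 1 trades per-member training data for data diversity across members, while strategy 2 keeps each member maximally informed and relies on other sources of diversity.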

Each individual NN in the ensemble is trained independently to minimize the prediction error with respect to the target. We have used the same architecture (i.e. number of hidden neurons) for each individual NN, with the number of hidden neurons determined by a trial-and-error method. In both methods, the validation set has been used to screen the NNs with respect to their performance in PI estimation.

On the real data considered as a case study, both methods have obtained results superior to those yielded by the individual networks selected in the respective ensembles.

Compared to literature methods conceptually and methodologically similar to the present ones, the results obtained show a significant improvement in terms of the quality of the predicted PIs. We can, then, conclude that both proposed ensemble modeling frameworks yield a reliable estimation of the PIs, characterized by a high coverage probability and a small interval size. The reported results demonstrate the practical efficiency of the proposed methods for quantifying the uncertainties associated with wind speed prediction.

Conclusions
