Theoretical Overview
2.3 Surrogate Models
2.3.1 Sampling Techniques
Sampling strategies evolved in the past years, consequence of the increasing in using these strategies to explore an input space. In the beginning, experiments were planned by using the one factor or variable factor at a time approach or by using a trial and error approach [73]. These strategies were too expensive and the latter did not even guarantee to find the best sample points location [37].
Fisher [72] introduced the Design of Experiments to plan the experiments in order to understand the behaviour of agricultural crops systems. He defined three basic principles, randomization, replication and blocking [37]. These new strategies were planned when only physical experiments (which can be biased) were made. With the emergence of computational experiments, the need of replication was lost, since simulation-based experiments are deterministic [73].
Montgomery [73] outlined these three basic principles of DOE:
• Randomization - The order of the runs and each run material experiment have to be randomly determined. This principle ensures an alleviation of bias by external factors.
• Blocking - Is a technique that reduces the unexplained variability, i.e, the variability that cannot be overcome is mixed with an interaction to eliminate its influence on the experiment. This is done
by using groups/blocks of runs that have similar nuisance factors (that are not the scope of the experiment).
• Replication - This concept has two properties. First, it allows for a determination of an eventual ex- perimental error. Second, by using the mean of one parameter, replication gives the experimenter the chance to obtain a more precise value for the real response of the parameter.
The DOE sampling strategies can be classified into two main groups [73]:
• Classical DOE – Examples: Full/fractional factorial design, Central Composite Design (CCD), Box- Behnken design and Orthogonal Arrays (OA). The first is explained below and the others can be found in [73];
• Modern DOE – Main groups: Random designs, Quasi-random designs, Projection-based designs, Uniform designs, Miscellaneous designs and Hybrid designs [73].
2.3.1.1 Classical Design of Experiments
Classic DOE are sampling techniques that assume an uniform distribution of training points. This en- sures a fixed number of samples and so the influence of the random error term in posterior simulations is minimized. In this category, the Full and Fractional Factorial Designs will be presented.
Full and Fractional Factorial Designs
In a Full Factorial (FF) design, in each run of the experiment, every possible combination of factor levels has to be tested. For example, if the problem has two factors, one withxlevels and another withylevels, in each run there arexypossible combinations.
This sampling technique can be implemented in Python, using the Surrogate Modelling Toolbox referred in [74]. The number of levels used in industrial experiments is commonly 2 or 3, i.e, 2-levels (2k) and 3-levels (3k) factorial designs, wherekis the number of factors/design variables. A geometrical view of a23full factorial design is presented in Figure 2.9.
Figure 2.9: 23Full-factorial design geometrical view [73]
These designs are used when the optimizer desires to fully explore the design space. However they have some limitations [37]:
• The computational cost is high because the size of the experiment depends on the number of design variables.
• The response of a 2-levels full factorial design is approximately linear over the range of the design variables.
As a way to reduce the size of the experiment, and if the designer knowsa priori that certain high-order interactions are negligible, instead of running the whole experiment, only a fraction of it may be tested.
Thus, the relevant information can still be obtained and the time to run the experiment is reduced.
These designs are known as fractional factorial designs and are usually represented by 2(k−p), for a 2-levels fractional factorial design with k design variables and where1/(2p)represents the fraction of the full factorial design. To generate a fractional factorial design, the experimenter has to choose the criteria to select the fractions. The most used ones are the highest resolution possible and the minimum aberration [73].
2.3.1.2 Modern Design of Experiments
Modern DOE are also known as space-filling or exploratory designs. These DOE have been replacing the classical ones mainly because of their expensive cost. These techniques tend to locate more training points in the interior of the design space [73].
Random Designs
The methods in this category generate training points which have an equal probability to appear in any samplenfrom a populationN. [87]
The most common methods from this group are the Monte Carlo methods. Such methods include for example the Markov Chain Monte Carlo (MCMC) samplings, Gibbs sampling and the Metropolis Hast- ings sampling [73].
Monte Carlo Methods
These methods are used to evaluate mathematical/physical expressions through the use of experiments with random numbers. [87]
The methods can only be used when there is no missing data because they tend to show poor coverage regions in high dimensions. They are advantageous when used in complex systems, particularly with a high level of simultaneous correlations among the design variables. [37]
Latin Hypercube Sampling
Latin Square isN×N grid containing sample points in which there is only one point per each row and column. Latin Hypercube Sampling (LHS) [37] is a generalization of this concept to higher dimensions.
This design is a variant of stratified random samplings, which uses a constraint Monte Carlo sampling to solve the uncertainty quantification of some problems.
This is the most used simulation-based design in aerodynamics because of its flexibility. However it
has some disadvantages, for example it may not always be space-filling. This drawback can be solved by using it in combination with orthogonal arrays, suitable space-filling criteria or by the construction of another LHS within small regions of the design space.
2.3.1.3 Hybrid Designs
Hybrid designs are a combination of some techniques aforementioned, either classical or modern, to overcome their individual drawbacks. There is a great effort to improve LHS and so there are several hybrid designs which are based on it such as Nearly Orthogonal Latin Hypercubes and Minimax/Maximin Latin Hypercubes [37].
The maximin LHS uses space-filling metrics to provide a good cover of the design space. The metric is based on the pairwise distances between any two points (xand x0) of the sampling plan. All the distances, equation (2.28), are calculated based on the Euclidean norm [86].
d(x, x0) = v u u u t
Pk i=1
xi−x0i ximax−ximin
2
k (2.28)
wherekis the number of dimensions andximaxandximinare the maximum and minimum values of the ithdimension ofx.
Using the maximin criterion, a set of different sampling plans is built, and then, the one that maximizes the minimum distance between every two points is selected.
2.3.1.4 Adaptive sampling
The aforementioned techniques are used to generate a sampling plan which is static, i.e which does not change during the evolution of an optimization. However, some points can be added to some areas of interest, for example where the objective function is expected to be improved or where there are few sample points. When points are added to the initial sampling plan, according to some criterion at choice, the sampling is called adaptive. Some criteria that can be used for this purpose are explained in section 2.4.