6.1.2 Optimization Performance

In this subsection, the performance of the optimization methods is tested with the Gaussian prior. As the underlying MNP distribution has no effect on the optimal activation placements, the expected reconstruction error is used as the metric of reconstruction quality. This metric is also used with Gaussian priors in Burger et al. (2021). The expected reconstruction error is defined as (1/N)·ΦA, where N is the parameter describing the reconstruction discretization, which is N = 75 in the experiments conducted in this thesis.
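As an illustration, the sketch below evaluates this metric for a linear Gaussian observation model. It assumes that ΦA is an A-optimality-type criterion, namely the trace of the posterior covariance of the discretized MNP distribution; the forward matrix K, the noise level and the prior covariance are placeholder names rather than quantities defined in the thesis.

```python
import numpy as np

def expected_reconstruction_error(K, noise_std, prior_cov, N):
    """Sketch of the expected reconstruction error (1/N) * Phi_A.

    Assumes a linear Gaussian model y = K x + e with e ~ N(0, noise_std^2 I),
    a zero-mean Gaussian prior with covariance prior_cov, and that Phi_A is
    the trace of the posterior covariance (an A-optimality criterion).
    """
    # Posterior precision = prior precision + (1 / sigma^2) * K^T K
    posterior_precision = np.linalg.inv(prior_cov) + (K.T @ K) / noise_std**2
    posterior_cov = np.linalg.inv(posterior_precision)
    return np.trace(posterior_cov) / N
```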

The maximum number of iterations for gradient descent and Newton’s method is chosen as Niter = 50. The stopping condition threshold is set to ϵ = 10⁻⁵. For Newton’s method, the parameters used when modifying non-positive definite matrices are set as ϵλ = 10⁻⁵ and ξ = 5·10⁻⁵. As the optimal step size is determined with Algorithm 6 based on the Wolfe conditions, the initial step size is set to α = 1 at each iteration. The step size determining algorithm is run for a maximum of Nwolfe = 15 iterations. The scaling parameters are set as β1 = 10⁻¹⁰ and β2 = 0.9.

The small value of β1 means that step sizes are accepted even if they yield only a small decrease in the function value. This decision is based on Figure 9, which shows many peaks and valleys in the objective function. A larger upper bound on the step size should allow the optimization algorithms to jump between the different peaks and valleys more robustly.
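Algorithm 6 is not reproduced here, but the sketch below illustrates how the parameters above enter a generic line search based on the weak Wolfe conditions; the bisection-style bracketing is a common textbook scheme and may differ from the actual Algorithm 6.

```python
import numpy as np

def wolfe_line_search(f, grad, x, direction,
                      alpha0=1.0, beta1=1e-10, beta2=0.9, max_iter=15):
    """Generic weak-Wolfe line search sketch (not Algorithm 6 itself).

    beta1 is the sufficient-decrease (Armijo) parameter; the very small value
    used in the experiments accepts any step that decreases the objective.
    beta2 is the curvature parameter, and max_iter corresponds to Nwolfe.
    """
    phi0 = f(x)
    dphi0 = grad(x) @ direction          # directional derivative at x
    alpha, lo, hi = alpha0, 0.0, np.inf
    for _ in range(max_iter):
        if f(x + alpha * direction) > phi0 + beta1 * alpha * dphi0:
            hi = alpha                   # sufficient decrease fails: shrink
            alpha = 0.5 * (lo + hi)
        elif grad(x + alpha * direction) @ direction < beta2 * dphi0:
            lo = alpha                   # curvature condition fails: grow
            alpha = 2.0 * lo if np.isinf(hi) else 0.5 * (lo + hi)
        else:
            break                        # both Wolfe conditions satisfied
    return alpha
```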

The expected reconstruction errors acquired with gradient descent and Newton’s method with respect to the optimization iterations are shown in Figure 11. In the figure, the expected reconstruction errors correspond to the means of 10 runs of the algorithms. Each run uses a different random initialization for the activations. The optimal activation placements suggested by gradient descent in one of these runs can be seen in Figure 12. The target used is the Shepp-Logan phantom. However, recall that the underlying MNP distribution has no effect on the optimal activation placements.

Figure 11: The expected L2(Ω) reconstruction errors with respect to optimization iterations for gradient descent and Newton’s method. The errors correspond to the means of 10 separate runs of the optimization algorithms. The Shepp-Logan phantom is used in this experiment.

As can be seen from Figure 11, Newton’s method is outperformed by gradient descent. This seems counter-intuitive, as Newton’s method is a second-order method, while gradient descent is a first-order method. The reason behind this behavior is an issue with Hessian matrices that are not positive definite. In order to make such Hessians positive definite, a scalar multiple of the identity matrix is added to them. The size of the scalar multiple needed depends on the number of optimized parameters: having more optimized parameters necessitates a larger scalar multiple, which makes the search directions noisy. This is not an issue with online optimization, as only one activation is optimized at a time. Regardless, the optimization methods seem to provide an improved experimental design in comparison to having only randomized activation placements. The initial activation positions correspond to the randomized activation placements, so the corresponding expected L2(Ω) reconstruction error can be seen in Figure 11 at iteration zero. The running times of the algorithms are given in Table 1. These times correspond to a single run of the optimization algorithms.
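The Hessian modification described above can be sketched as follows. This is a generic eigenvalue-shift scheme in which a scalar multiple of the identity is added until the smallest eigenvalue exceeds a threshold; interpreting ϵλ as that threshold and ξ as the increment is an assumption, and the modification actually used in the thesis may differ in its details.

```python
import numpy as np

def make_positive_definite(H, eps_lambda=1e-5, xi=5e-5):
    """Add tau * I to the Hessian H until it is sufficiently positive definite.

    eps_lambda: lower bound required for the smallest eigenvalue (assumed role).
    xi: increment by which the shift tau is increased (assumed role).
    """
    dim = H.shape[0]
    lam_min = np.min(np.linalg.eigvalsh(H))
    if lam_min >= eps_lambda:
        return H                              # already positive definite enough
    tau = eps_lambda - lam_min                # shift the spectrum up to eps_lambda
    while np.min(np.linalg.eigvalsh(H + tau * np.eye(dim))) < eps_lambda:
        tau += xi                             # nudge further if numerics fall short
    return H + tau * np.eye(dim)
```

In line with the discussion above, the shift needed tends to grow with the number of optimized parameters, which blurs the curvature information carried by the resulting search directions.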

Figure 12: The optimal activation placements suggested by gradient descent.

Method              Prior      Running Time
Gradient Descent    Gaussian   3125 s
Newton’s Method     Gaussian   8687 s

Table 1: The running times of the optimization algorithms with the Gaussian prior.

With the Gaussian prior, the activations are optimized simultaneously. Hence, the Hessian matrix requires the calculation of second-order partial derivatives with respect to all 3 × Na = 48 optimization parameters, which starts to dominate the computational complexity. Thus, one iteration of Newton’s method is much more expensive than an iteration of gradient descent. With the chosen stopping condition, both optimization algorithms ran the full 50 iterations. Newton’s method did not converge in fewer iterations due to the noisy search directions, which result from the modification of Hessians that are not positive definite. Therefore, gradient descent is almost three times as fast as Newton’s method when the Gaussian prior is used.
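For reference, the ratio of the measured running times in Table 1 is 8687 s / 3125 s ≈ 2.8, which quantifies this speed difference.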

The reconstructions of the three phantoms with the Gaussian prior can be seen in Figure 13. These reconstructions were created by using the activation placements suggested by Newton’s method. The reconstruction of the letter P has the best quality when it is compared to the underlying MNP distribution. The shape of the letter is clearly visible in the figure. For the Shepp-Logan phantom, the overall shape is barely retained. In addition, the edges become blurry and practically all of the interior features are lost in the reconstruction. With the Gaussian prior, the reconstruction of the phantom representing a tumor is also of low quality. While the reconstruction has the shape of a disk, the vein is lost. The reconstruction implies that there are MNPs in the area where the vein is, which is not the case. As a side note, while the optimization methods provide an improved experimental design over a randomized one, the visual difference between the reconstructions acquired with a randomized setup and those in Figure 13 is not significant.

(a) Letter P Reconstruction (b) Shepp-Logan Reconstruction

(c) Tumor with Vein Reconstruction

Figure 13: The reconstructions of the three phantoms with the Gaussian prior.

There is one key feature that can be seen when the reconstructions in Figure 13 are compared to the corresponding underlying MNP distributions in Figure 5: the bulk of the edges are lost in the reconstructions. A Gaussian prior does not promote edges; instead, it smooths them out, as smooth MNP distributions are favored under such a prior. Hence, there is a lot of uncertainty around the edges in the reconstructions. This motivates the use of edge-promoting priors when the inverse problem of MRXI is considered.
