
6.2 Online Optimization

6.2.2 Optimization Performance

with many activations, Newton’s method indeed converges to a local optimum. One possible improvement to the optimization process would be to choose the initial activation placements heuristically. Additionally, if the computational cost of the optimization procedure is of lower importance, an exhaustive algorithm could be used instead of the ones considered in this thesis. An exhaustive algorithm evaluates all possible activation positions (with some discretization) and then chooses the position that achieves the smallest objective function value.
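The exhaustive approach described above can be sketched as follows. This is only a minimal illustration and not an implementation used in the thesis: the function name `exhaustive_search`, the angular parametrization of the candidate positions, and the toy objective are all assumptions made for the example.

```python
import numpy as np

def exhaustive_search(objective, candidates):
    """Evaluate the objective at every candidate and return the best one."""
    values = np.array([objective(c) for c in candidates])
    best = int(np.argmin(values))
    return candidates[best], values[best]

# Hypothetical setup: candidate positions are angles on a circle,
# discretized into 360 steps. The toy objective has several local minima.
angles = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)

def toy_objective(angle):
    return np.cos(3.0 * angle) + 0.5 * np.cos(angle)

best_angle, best_value = exhaustive_search(toy_objective, angles)
```

The cost of such a search grows with the number of candidates, and with the number of design parameters if several are discretized jointly, which is why it is only attractive when running time is of lower importance.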

(a) TV Prior

(b) Perona-Malik Prior

Figure 16: The relative L²(Ω) reconstruction errors with the TV prior and the Perona-Malik prior. The errors correspond to the means of 10 separate runs of the optimization algorithms. The target MNP distribution is the Shepp-Logan phantom.

According to Figure 16, gradient descent performs almost as well as Newton’s method. However, when the edge-promoting priors are used, Newton’s method has one clear advantage over gradient descent: a significantly shorter running time.

The running times of the optimization algorithms with the TV prior and the Perona-Malik prior can be seen in Table 2. They were measured during the reconstruction error experiment and correspond to a single run of the optimization algorithms.

(a) TV Prior (b) Perona-Malik Prior

Figure 17: The optimal activation placements suggested by Newton’s method with the edge-promoting priors.

Method             Prior           Running Time
Gradient Descent   TV              1699 s
Newton’s Method    TV              844 s
Gradient Descent   Perona-Malik    1560 s
Newton’s Method    Perona-Malik    875 s

Table 2: The running times of the optimization algorithms with the two edge-promoting priors.

A single step of Newton’s method is computationally more expensive than a step of gradient descent. However, when the activations are optimized sequentially, only two parameters are optimized at a time. Hence, the calculation of the Hessian matrix does not dominate the optimization process, and the cost of an optimization step is not much higher than for gradient descent. Furthermore, as a second-order method, Newton’s method should converge in fewer iterations than gradient descent, which is a first-order method. In the considered setting of Bayesian experimental design for MRXI, this is indeed the case. Consequently, as can be seen from the running times presented in Table 2, the optimization process with Newton’s method is approximately twice as fast as with gradient descent (844 s versus 1699 s with the TV prior and 875 s versus 1560 s with the Perona-Malik prior). As a side note, the choice of the prior has little effect on the running time of the optimization algorithm.
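The difference in iteration counts can be illustrated on a small two-parameter problem. The sketch below is not the MRXI objective: the toy function, the Armijo backtracking line search, and all names (`minimize`, `f`, `grad`, `hess`) are assumptions chosen only to show how a second-order method typically needs far fewer iterations when the Hessian is a cheap 2 × 2 matrix.

```python
import numpy as np

# Hypothetical smooth, ill-conditioned objective with two parameters.
def f(x):
    return 0.5 * (x[0] ** 2 + 50.0 * x[1] ** 2) + 0.25 * np.sum(x ** 4)

def grad(x):
    return np.array([x[0], 50.0 * x[1]]) + x ** 3

def hess(x):
    return np.diag([1.0, 50.0]) + 3.0 * np.diag(x ** 2)

def minimize(x0, use_newton, tol=1e-8, max_iter=10000):
    """Gradient descent or Newton's method with Armijo backtracking."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            return x, k
        # Newton solves a 2x2 linear system; gradient descent uses -g directly.
        p = -np.linalg.solve(hess(x), g) if use_newton else -g
        t = 1.0
        while t > 1e-12 and f(x + t * p) > f(x) + 1e-4 * t * (g @ p):
            t *= 0.5
        x = x + t * p
    return x, max_iter

x_newton, n_newton = minimize([2.0, 1.0], use_newton=True)
x_gd, n_gd = minimize([2.0, 1.0], use_newton=False)
print(f"Newton iterations: {n_newton}, gradient descent iterations: {n_gd}")
```

With only two parameters, the extra cost per Newton step is a single 2 × 2 solve, so the saving in iterations translates almost directly into a saving in running time.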

In Figure 18, the reconstructions of the three phantoms with the TV prior can be seen after a full set of 16 activations. The reconstructions are of better quality than the ones acquired with the Gaussian prior (refer to Figure 13). For example, the shape of the letter P is much more defined. Furthermore, the interior features of the Shepp-Logan phantom, which were lost with the Gaussian prior, are now more present in the reconstruction. The most drastic improvement can be seen in the tumor phantom reconstruction. With the Gaussian prior, the vein is not detected. When the TV prior is used, the vein is clearly present in the reconstruction.

(a) Letter P Reconstruction (b) Shepp-Logan Reconstruction

Figure 18: The reconstructions of the three phantoms with 16 activations and the TV prior.

The reconstructions with the Perona-Malik prior are presented in Figure 19. Again, the reconstruction quality is an improvement over the results acquired with the Gaussian prior. The shapes of the letter P and the Shepp-Logan phantom are reproduced more accurately. For example, the edges of the Shepp-Logan phantom are more defined. In addition, the reconstruction of some of the inner features is even better than with the TV prior. Finally, the vein in the tumor phantom is not lost when the Perona-Malik prior is used.

(a) Letter P Reconstruction (b) Shepp-Logan Reconstruction

Figure 19: The reconstructions of the three phantoms with 16 activations and the Perona-Malik prior.

The relative L²(Ω) reconstruction errors for the TV prior and the Perona-Malik prior implied that the TV prior achieves a better reconstruction quality with the Shepp-Logan phantom. When a visual comparison is made between the reconstructions in Figures 18 and 19, this quality difference is much more difficult to determine. However, the reconstructions of the other two phantoms seem better with the TV prior. In any case, the reconstructions acquired by using the edge-promoting priors are better than the ones acquired with the Gaussian prior. The most significantly improved characteristic is edge detection. While the Gaussian prior mainly smoothed out the edges, many of the edges are retained in Figures 18 and 19. Therefore, the supposed edge-promoting characteristic of the TV prior and the Perona-Malik prior is verified by the experiments.

7 Concluding Remarks

In this thesis, Bayesian experimental design for magnetorelaxometry imaging was considered. Magnetorelaxometry imaging is a noninvasive technique that can be used to determine the density of magnetic nanoparticles in a subject. It has applications, for example, in cancer treatment. Determining the density of magnetic nanoparticles requires solving a linear inverse problem. In the majority of the literature on the topic, the corresponding inverse problem is solved with classical methods. In contrast, this thesis utilized the probabilistic approach. Additionally, there exists virtually no previous research on Bayesian experimental design for magnetorelaxometry imaging.

In this thesis, the optimal experimental design was determined with respect to the placements of the activations. The numerical experiments used A-optimality as the optimality criterion. It is important to mention that extending the experiments to use D-optimality is not mathematically difficult. D-optimality was omitted in order to keep the scope of the experiments reasonable.
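To make the two criteria concrete, the following sketch evaluates them for a linear model with Gaussian noise and a Gaussian prior. It assumes the common definitions in which A-optimality minimizes the trace and D-optimality minimizes the log-determinant of the posterior covariance; the exact formulation and any weighting used in the thesis may differ, and the function names and the small test matrices are hypothetical.

```python
import numpy as np

def posterior_covariance(A, noise_cov, prior_cov):
    """Posterior covariance for y = A x + e with Gaussian prior and noise."""
    precision = A.T @ np.linalg.solve(noise_cov, A) + np.linalg.inv(prior_cov)
    return np.linalg.inv(precision)

def a_optimality(A, noise_cov, prior_cov):
    """A-optimality objective: trace of the posterior covariance."""
    return np.trace(posterior_covariance(A, noise_cov, prior_cov))

def d_optimality(A, noise_cov, prior_cov):
    """D-optimality objective: log-determinant of the posterior covariance."""
    _, logdet = np.linalg.slogdet(posterior_covariance(A, noise_cov, prior_cov))
    return logdet

# Hypothetical designs: observing one unknown versus all three unknowns.
prior = np.eye(3)
A_one = np.eye(3)[:1]
A_all = np.eye(3)
print(a_optimality(A_one, np.eye(1), prior), a_optimality(A_all, np.eye(3), prior))
```

Note that neither function takes the measurement data as an argument. In this Gaussian setting, the objective depends only on the design through the system matrix `A`.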

When the probabilistic approach is used for solving the inverse problem of magnetorelaxometry imaging, choosing a suitable prior is crucial. Because of this, three different prior models were evaluated. The first prior was Gaussian. In this case, the underlying magnetic nanoparticle distribution has no effect on the optimal design parameters. As a result, the optimization process can be implemented offline, i.e., without taking any measurements from the subject. This is not a desirable characteristic, since the design then cannot adapt to the measurements. In order to incorporate the measurements into the model, two edge-promoting priors were also considered: the total variation prior and the Perona-Malik prior. Out of the box, there are some issues with these priors. For instance, the total variation prior is not differentiable at zero. This can be resolved with a slight modification of the prior. From the computational standpoint, the most considerable drawback of these priors is that neither of them is Gaussian. Hence, A- or D-optimality cannot be used directly as the optimality criterion. To overcome this issue, the priors can be approximated iteratively with lagged diffusivity iteration. As a result, a maximum a posteriori estimate can be derived. However, the measurements still affect the posterior distribution with the total variation prior and the Perona-Malik prior. Therefore, the optimization procedure has to be implemented online.
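The idea behind lagged diffusivity iteration can be sketched on a one-dimensional denoising problem. This is a simplified stand-in for the MRXI setting, not the implementation of the thesis. It assumes the smoothed total variation functional given by the sum of sqrt((Dx)_i² + β²), which is one common way of making the prior differentiable at zero, and the names `lagged_diffusivity_denoise`, `alpha` and `beta` are hypothetical.

```python
import numpy as np

def smoothed_tv(x, beta):
    """Smoothed total variation of a 1D signal (assumed form of the prior)."""
    return np.sum(np.sqrt(np.diff(x) ** 2 + beta ** 2))

def lagged_diffusivity_denoise(y, alpha=1.0, beta=1e-3, n_iter=20):
    """Approximately minimize 0.5*||x - y||^2 + alpha * smoothed_tv(x)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), axis=0)  # forward differences, shape (n - 1, n)
    x = y.copy()
    for _ in range(n_iter):
        # Freeze the weights at the current iterate. The prior term then
        # becomes quadratic, i.e. it behaves like a Gaussian prior.
        w = 1.0 / np.sqrt((D @ x) ** 2 + beta ** 2)
        x = np.linalg.solve(np.eye(n) + alpha * D.T @ (w[:, None] * D), y)
    return x

# Hypothetical test signal: a step with a small deterministic perturbation.
y = np.concatenate([np.zeros(10), np.ones(10)]) + 0.1 * np.sin(np.arange(20))
x = lagged_diffusivity_denoise(y, alpha=0.5)
```

Because the weights `w` are recomputed from the current estimate, which in turn depends on the data, the resulting Gaussian approximation changes with the measurements. This is consistent with the need for an online procedure noted above.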

As A- and D-optimality can be used as the optimality criterion with all three priors, the optimal design parameters can be determined by differentiating the corresponding objective function and applying a derivative-based optimization algorithm. The first- and second-order partial derivatives for A-optimality were derived in this thesis. In addition, the first-order partial derivatives of the system matrix related to magnetorelaxometry imaging were provided. The nature of the forward model allows the partial derivatives to be calculated explicitly. With these results, gradient descent can be implemented to find the optimal design parameters, namely, the location and orientation of the activations. For Newton’s method, the second-order partial derivatives of the system matrix are still needed. In this thesis, they are not provided explicitly in order to improve readability. In any case, calculating these derivatives is a straightforward task starting from the first-order partial derivatives, which were provided in this thesis.

In the numerical experiments conducted in this thesis, the reconstruction performance was evaluated with the three priors. With the Gaussian prior, the different activations were optimized simultaneously. When the edge-promoting priors are used, the measurements are incorporated into the model. Hence, with these priors, the activations were optimized sequentially. The Gaussian prior has difficulties detecting edges, as it prefers smooth distributions. In contrast, with the edge-promoting priors, the edges are seen more prominently in the reconstructions. Furthermore, the overall reconstruction quality with the edge-promoting priors is superior. The total variation prior was determined to achieve the highest reconstruction quality in the experiments. In addition, the derivative-based optimization methods achieved an improved reconstruction quality over randomized activations. Besides the randomized activation initialization, a heuristic one was considered with the edge-promoting priors. With this initialization, the initial location of each activation was chosen to be as far from the previously optimized activation as possible. Using this initialization did not provide notable improvements with respect to reconstruction quality.
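A heuristic initialization of this kind could look roughly as follows. The sketch assumes that the activations are placed on a circle of a given radius and that the candidate locations are a uniform discretization of that circle. The function name and its parameters are hypothetical and are not taken from the thesis.

```python
import numpy as np

def farthest_initial_location(previous, radius=1.0, n_candidates=360):
    """Pick the candidate on the circle that is farthest from `previous`."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_candidates, endpoint=False)
    candidates = radius * np.column_stack((np.cos(angles), np.sin(angles)))
    distances = np.linalg.norm(candidates - np.asarray(previous), axis=1)
    return candidates[np.argmax(distances)]

# Example: the previously optimized activation sits at (1, 0), so the
# initial guess for the next one is the opposite point of the circle.
initial = farthest_initial_location([1.0, 0.0])
```

On a circle this simply selects the antipodal point. The same function would also work for other candidate sets if `candidates` were passed in instead of being generated.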

The benefit of using Newton’s method over gradient descent with respect to reconstruction quality seemed only minor with the Perona-Malik prior. With the TV prior, the benefit was slightly more noticeable. The most significant advantage of using Newton’s method with the edge-promoting priors is that it was approximately twice as fast as gradient descent in the numerical experiments. The reason for this is that Newton’s method converged in considerably fewer iterations. Furthermore, the computational cost of an optimization step does not differ much between the two methods when the activations are optimized sequentially. This is not the case for simultaneous optimization, where gradient descent is faster. In that setting, the computational cost of calculating the inverse of the Hessian matrix for Newton’s method starts to slow down the optimization. Also, the simultaneous optimization approach made the search directions for Newton’s method noisy. This was a result of the method used for modifying Hessian matrices that are not positive definite to be positive definite. Therefore, gradient descent achieved better performance than Newton’s method with respect to the expected reconstruction quality when the Gaussian prior was used. These results imply that the Newton’s method implementation used with the Gaussian prior should be improved.
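For illustration, one common way of modifying a Hessian that is not positive definite is to replace its eigenvalues by their absolute values and bound them from below. The sketch below shows this technique only as an example. The modification actually used in the thesis is not specified in this section, and the function name and the `floor` parameter are hypothetical.

```python
import numpy as np

def make_positive_definite(H, floor=1e-6):
    """Flip negative eigenvalues and bound all eigenvalues from below."""
    H = 0.5 * (H + H.T)                    # symmetrize against round-off
    w, V = np.linalg.eigh(H)
    w = np.maximum(np.abs(w), floor)
    return (V * w) @ V.T                   # V @ diag(w) @ V.T

# An indefinite Hessian: the plain Newton direction need not be a descent
# direction, but the modified one always is.
H = np.array([[1.0, 0.0], [0.0, -2.0]])
g = np.array([1.0, 1.0])
H_mod = make_positive_definite(H)
p = -np.linalg.solve(H_mod, g)
```

Such a modification guarantees a descent direction, but it can distort the step considerably when many eigenvalues are altered. This may be related to the noisy search directions observed in the simultaneous optimization, although that connection is only a conjecture here.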

One key issue with gradient descent and Newton’s method is that the solution given by them is not necessarily the global optimum. In this thesis, some analysis was conducted on the objective function corresponding to A-optimality. The results of the analysis suggest that there are many local optima in the objective function. In order to find the global optimum, an exhaustive search algorithm could be used instead of the derivative-based algorithms considered in this thesis. However, the major drawback of such an approach is its computational cost. In future research, an exhaustive implementation could be compared with the optimization algorithms tested in this thesis.

Another option that could be considered in future research is not to restrict the instrument domain to a circle. For example, if the region of interest is a square, the instrument domain could be a slightly larger square. This way, the instruments can be placed much closer to the corners of the region of interest, which should improve the overall reconstruction quality. Finally, the two-dimensional experiments conducted in this thesis could be extended to three-dimensional space and to realistic measurement geometries.

