Optimization methods applied to Stellar Astrophysics


Sérgio Filipe Assunção Batista

Master's dissertation presented to the Faculdade de Ciências da Universidade do Porto in Mathematical Engineering

2013


Department of Mathematics, 2013

Supervisors:

Sérgio António Gonçalves de Sousa, Researcher, Centro de Astrofísica da Universidade do Porto

Nuno Cardoso Santos, Coordinating Researcher, Faculdade de Ciências da Universidade do Porto


Special Thanks

First, I would like to thank my parents, grandparents, uncles and girlfriend for all their support, patience and help.

I would also like to thank my supervisor, Dr. Sérgio Sousa, and my co-supervisor, Dr. Nuno Santos, for accepting me as a master's student and for all the support and help they gave me during this thesis. Thanks to both of my supervisors, I was able to learn to program in the C language and to learn more about optimization methods and strategies. I am also grateful for the fellowship grant CAUP2012-04UnF-BI, funded under the FP7 through Starting Grant agreement number 239953, supported by the European Research Council/European Community. I am also grateful to have had the opportunity to give two talks at conferences during this work.

I would also like to thank Prof. Dr. João Nuno Tavares for his help and availability in the search for optimization methods.


Abstract

The analysis of spectroscopic data of solar-type stars provides a powerful tool to derive stellar parameters, such as effective temperature, surface gravity, microturbulence and metallicity. This can be done by measuring weak Fe I and Fe II absorption lines, which are then used to compute abundances assuming local thermodynamic equilibrium. In this work, the stellar parameters are derived by forcing the excitation and ionization equilibrium of iron. Knowing these parameters is of crucial importance in several hot topics in astrophysics, such as the characterization of the solar neighborhood, the study of exoplanets, the composition and kinematics of the Milky Way, and asteroseismology.

The process implemented to derive the stellar parameters is based on comparing observations with theoretical models in an iterative process, which stops when the model that best fits the observations is found.

In this work, I aim to identify the best optimization method for this problem. To this end, I will implement and test several optimization methods. I will also derive stellar parameters with all the implemented methods, to test their efficiency and to compare the results with tabulated values.


Summary

Spectral analysis of solar-type stars provides a powerful tool to determine stellar parameters, such as effective temperature, surface gravity, microturbulence or metallicity. These determinations are made from measurements of the equivalent widths of neutral and ionized iron spectral lines. These measurements are then used to compute abundances, assuming local thermodynamic equilibrium. Accordingly, in this work, the stellar parameters are determined by forcing the excitation and ionization equilibrium of iron. The determination of stellar parameters is of crucial importance in several areas of astrophysics, such as the characterization of the stars in the solar neighborhood, the study of the composition and kinematics of our Galaxy, or even asteroseismology.

The process implemented to determine the stellar parameters is based on the comparison between observations and theoretical models, through an iterative process. This process ends when the model that best fits the observations is found.

In this work, I aim to identify the best minimization method that can be applied to the problem described above. To this end, I will implement and test several optimization methods. I will also determine the stellar parameters of several stars using all the methods studied, and test their efficiency. In addition, I will compare the stellar parameters derived with the methods implemented in this study with tabulated values.


Contents

List of Figures

List of Tables

List of Abbreviations

Introduction

1. Fundamentals of stellar spectroscopy

1.1. Basic Concepts of Spectroscopy

1.1.1. The physics of spectral lines

1.1.2. Origin of stellar spectra

1.1.3. Spectral Sequence: The classification of stars

1.2. A spectroscopic method to derive stellar parameters

1.2.1. MOOG

2. Deterministic Optimization methods

2.1. Direct search methods

2.1.1. Hooke and Jeeves method

2.1.2. Downhill Simplex method

2.1.3. Rosenbrock’s method

2.2. Conjugate direction methods: Powell’s quadratically convergent method

2.3. Conjugate Gradient methods: The Fletcher-Reeves algorithm

2.4. Quasi-Newton Methods

3. Stochastic Optimization methods

3.1. Simulated Annealing

3.1.1. Modified Downhill Simplex method with a cooling scheme

3.2. Genetic Algorithms

3.3. Particle Swarm Optimization

4. Implementation procedure of optimization methods and Results

4.1. Objective Function and the Amoeba implementation default version

4.2. Sample of spectroscopic parameters for 451 stars in the HARPS GTO planet search program

4.3. Implementation of deterministic methods and Results

4.3.1. Amoeba

4.3.2. Amebsa

4.4. Implementation of stochastic methods: Particle Swarm Optimization (PSO) and Results

4.5. Implementation of a combination of the PSO and some deterministic methods and Results

4.5.1. PSO and Amoeba

4.5.2. PSO and Amebsa

4.5.3. PSO and Powell conjugate directions method

4.6. Analysis of convergence rates and times

4.6.1. Performance of the Amoeba (Fortran) and Amoeba (C)

5. Conclusions

References

Annexes

A. Glossary

B. Tables of Results

C. Communications in Conferences

C.1. Abstract of the oral communication at the 1st Portuguese Meeting on Mathematics for Industry (6-8 June 2013)

C.2. Abstract of the oral communication at the XXIII ENAA (Encontro Nacional de Astronomia e Astrofísica, 16-17 July 2013)


List of Figures

Fig.1.1 – Zoom of a high-resolution stellar spectrum of the star HD225097, obtained from the ÉLODIE catalogue, available online at http://atlas.obs-hp.fr/elodie/. The spectral line Hβ can be observed at around 4860 Å.

Fig.1.2 – Example of how stellar spectra are formed. An incandescent light source is represented by the dot. Looking directly at the light source, a blackbody spectrum is observed (spectrograph 1). Photons of all wavelengths enter the box filled with hydrogen. Photons whose energies do not match the energy differences between the orbits of hydrogen pass through the box. For example, photons with wavelengths corresponding to the Balmer series of hydrogen have a high probability of being absorbed. This is observed by spectrograph 2. The dotted line represents the original continuum and shows evidence of some continuous Balmer absorption. As a result, Balmer emission lines and some continuum emission are produced (spectrograph 3).

Fig.1.3 – Zoom of a high-resolution spectrum of the star HD102117.

Fig.1.4 – Temperature dependence of sodium lines: as temperature increases, the line strength decreases.

Fig.1.5 – Derived profile of an FeI line for several values of surface gravity, showing the change in the EW of the line as a function of surface gravity.

Fig.1.6 – Top: theoretical curve of growth of an FeI line as a function of the chemical abundance of the absorbing species. The letter “A” represents the iron abundance relative to hydrogen. Bottom: profile of an FeI line as a function of the chemical abundance of the absorbing species. The dots on the curve of growth (top panel) correspond to the profiles in the bottom panel.

Fig.1.7 – Top: FeI abundances from individual lines versus excitation potential. Bottom: FeI abundances from individual lines versus reduced EW. The figure also contains information about the stellar model atmosphere used in this computation. The abundances on both vertical axes are logarithmic number densities on the standard scale in which log ε(H) = 12.

Fig.2.1 – Extrema of an objective function on a given interval. Points A, C and E illustrate local maxima. Points B and F illustrate local minima. Point G illustrates the global maximum and point D the global minimum.

Fig.2.2 – A sequence of reflections, each of which failed to replace the best vertex, bringing the simplex back to its starting position.

Fig.2.3 – Left: original simplex with a reflection, an expansion and two possible contractions. Right: shrink step towards the best vertex, when all the other moves have failed.

Fig.2.4 – Sequence of possible moves executed by the Downhill Simplex method. The initial simplex, here a tetrahedron, is shown on top. Possible simplex moves: (a) reflection away from the highest point; (b) reflection and expansion away from the highest point; (c) contraction along one dimension from the highest point; (d) contraction along all dimensions towards the lowest point.

Fig.2.5 – Rosenbrock’s method in action.

Fig.3.1 – Successful run of the GA minimizing a 2-dimensional objective function. The initial cluster of possible solutions converged to the global minimum after some iterations.

Fig.3.2 – Crossover operator under bit coding. Case A: one randomly chosen crossover point divides the bits into two sections, after which the chromosomes are interchanged. Case B: two randomly chosen crossover points divide the bits into three sections, and only the middle section is interchanged.

Fig.3.3 – Mutation process affecting the third bit of the first child chromosome, in a binary coding.

Fig.4.1 – Flowchart of the Amoeba implementation default version.

Fig.4.2 – Histogram of the metallicity distribution of the sample of stars.

Fig.4.3 – Flowchart of the new implementation of the Amoeba optimization method.

Fig.4.4 – Comparison between the stellar parameters derived with the Amoeba implementation and the tabulated values. Top right: derived Teff (in K) versus the tabulated values. Top left: derived surface gravity (in cm.s-2) versus the tabulated values. Bottom right: derived microturbulence (referred to as vt or ξ, in km.s-1) versus the tabulated values. Bottom left: derived [Fe/H] versus the tabulated values. The tabulated values were taken from the publicly available database http://vizier.cfa.harvard.edu/viz-bin/VizieR-3.

Fig.4.5 – Comparison between the stellar parameters derived with the Amoeba implementation and the tabulated values. Top right: derived Teff (in K) versus the tabulated values. Top left: derived surface gravity (in cm.s-2) versus the tabulated values. Bottom right: derived microturbulence (referred to as vt or ξ, in km.s-1) versus the tabulated values. Bottom left: derived [Fe/H] versus the tabulated values. The tabulated values were taken from the publicly available database http://vizier.cfa.harvard.edu/viz-bin/VizieR-3.

Fig.4.6 – Flowchart of the implementation of the PSO optimization method.

Fig.4.7 – Comparison between the stellar parameters derived with the Amoeba implementation and the tabulated values. Top right: derived Teff (in K) versus the tabulated values. Top left: derived surface gravity (in cm.s-2) versus the tabulated values. Bottom right: derived microturbulence (referred to as vt or ξ, in km.s-1) versus the tabulated values. Bottom left: derived [Fe/H] versus the tabulated values. The tabulated values were taken from the publicly available database http://vizier.cfa.harvard.edu/viz-bin/VizieR-3.

Fig.4.8 – Flowchart of the implementation of the PSO+Amoeba optimization method.

Fig.4.9 – Comparison between the stellar parameters derived with the Amoeba implementation and the tabulated values. Top right: derived Teff (in K) versus the tabulated values. Top left: derived surface gravity (in cm.s-2) versus the tabulated values. Bottom right: derived microturbulence (referred to as vt or ξ, in km.s-1) versus the tabulated values. Bottom left: derived [Fe/H] versus the tabulated values. The tabulated values were taken from the publicly available database http://vizier.cfa.harvard.edu/viz-bin/VizieR-3.

Fig.4.10 – Comparison between the stellar parameters derived with the Amoeba implementation and the tabulated values. Top right: derived Teff (in K) versus the tabulated values. Top left: derived surface gravity (in cm.s-2) versus the tabulated values. Bottom right: derived microturbulence (referred to as vt or ξ, in km.s-1) versus the tabulated values. Bottom left: derived [Fe/H] versus the tabulated values. The tabulated values were taken from the publicly available database http://vizier.cfa.harvard.edu/viz-bin/VizieR-3.

Fig.4.11 – Top: boxplots of the convergence times for the K-, G- and F-type stars, respectively, using the Amoeba implementation. Bottom: same, but with the Amebsa implementation.

Fig.4.12 – Right: standard deviation (sigma) as a function of the convergence time (in minutes), for the stellar parameters derived with the Amoeba C version. Left: standard deviation (sigma) as a function of the derived Teff (in K) with the Amoeba C version. In both plots, only the stars that converged to the solution were considered.

Fig.4.13 – Right: standard deviation (sigma) as a function of the convergence time (in minutes), for the stellar parameters derived with the Amebsa C version. Left: standard deviation (sigma) as a function of the derived Teff (in K) with the Amebsa C version. In both plots, only the stars that converged to the solution were considered.


List of Tables

Table 1.1 – Summary of the relationship between the spectral sequence and the surface temperature of stars.

Table 4.1 – Mean convergence times for the six selected stars. The selected stars with Teff lower than 5300 K were HD10166, HD21209A and HD63454; those with Teff higher than 5300 K were HD142, HD1461 and HD16141. Different values of n were tested. The convergence times are listed in minutes.

Table 4.2 – Mean convergence times for the selected cooler stars HD2025, HD10166 and HD50590. Different values of the initial temperature (T), different cooling schemes and different values of n were tested. The convergence times are listed in minutes.

Table 4.3 – Mean convergence times for the stars HD142, HD361 and HD1581. Different values of the initial temperature (T), different cooling schemes and different values of n were tested. The convergence times are listed in minutes.

Table 4.4 – Mean convergence times for the selected hotter stars HD1461, HD3823 and HD23456. Different values of n and N were tested. The convergence times are listed in minutes.

Table 4.5 – Mean convergence times for the selected cooler stars HD40105, HD52919 and HD283. Different values of n and N were tested. The convergence times are listed in minutes.

Table 4.6 – Number of stars that converged and that did not converge to the solution, from the subsample of 149 stars, for the default version and for the new implementation of the Amoeba.

Table 4.7 – Mean convergence times for the stars that converged to the solution, from the subsample of 149 stars, for the default version and for the new implementation of the Amoeba method. The mean convergence times are listed in minutes. The ranges of Teff were discussed in section 4.2.

Table 4.8 – Summary of the number of stars that converged and that did not converge to the solution in the sample of 451 stars, with the different optimization methods.

Table 4.9 – Summary of the mean convergence times for the stars that converged to the solution. The stars were divided into three ranges of effective temperature, following Sousa et al. (2008). The convergence times are listed in minutes. The ranges of Teff were discussed in section 4.2.

Table B1 – Derived results of the 451 HARPS GTO program stars with Amoeba (C implementation). The convergence time (Conver. Time) for each star is listed in minutes.

Table B2 – Derived results of the 451 HARPS GTO program stars with Amebsa. The convergence time (Conver. Time) for each star is listed in minutes.

Table B3 – Derived results of the 451 HARPS GTO program stars with the Particle Swarm Optimization. The convergence time (Conver. Time) for each star is listed in minutes.

Table B4 – Derived results of the 451 HARPS GTO program stars with the combination of the Particle Swarm Optimization strategy and Amoeba (C version). The convergence time (Conver. Time) for each star is listed in minutes.

Table B5 – Derived results of the 451 HARPS GTO program stars with the combination of the Particle Swarm Optimization strategy and Amebsa. The convergence time (Conver. Time) for each star is listed in minutes.

Table B6 – Selected 149 stars from the 451 HARPS GTO sample of Sousa et al. (2008), used to test the convergence times of the Amoeba default version and of the Amoeba C version.


List of Abbreviations

ARES: Automatic Routine for line Equivalent widths in stellar Spectra

EW: equivalent width

IRAF: Image Reduction and Analysis Facility

LTE: local thermodynamic equilibrium

CPU: central processing unit

BFGS: Broyden-Fletcher-Goldfarb-Shanno minimization method

SA: simulated annealing

EC: evolutionary computation

GA: genetic algorithms


Introduction

At night, many stars can be seen in the sky; some look reddish and others bluish. Since antiquity, stars have played an important role in many civilizations, for religious beliefs, navigation or orientation. Today, humankind looks at them differently. Advances in technology and instrumentation, which now make it possible to obtain high-resolution spectroscopic data, have made it possible to classify stars in detail. Spectroscopy plays a central role in this field: it helps us construct the identity card of each star.

Spectroscopy makes it possible to determine the chemical composition of a star. By analysing spectroscopic data, it is possible to derive stellar parameters such as effective temperature, surface gravity, microturbulence and metallicity, which will be defined in Chapter 1. The physical information contained in the shapes and strengths of spectral lines can be used to derive these parameters. The idea behind this approach is to compare theoretical models with the observations and find the best fit.

In this work, I aim to implement and test several optimization methods and strategies for this problem. The goal is to find a method that is more efficient and less time-consuming than the one currently in use.

This thesis is divided into five chapters. In the first chapter, I will give a brief introduction to the fundamentals of spectroscopy and the motivation for the problem. In the second chapter, I will describe several deterministic optimization methods. In the third chapter, I will describe some stochastic methods. In the fourth chapter, I will describe the objective function and the default method currently in use, discuss the implementation and efficiency of the methods tested in this study, and present and discuss the results obtained with each of them. In the fifth chapter, I will present the conclusions of this work.


1. Fundamentals of stellar spectroscopy

The past few years have been fruitful in the derivation of stellar parameters of nearby stars [e.g., Santos et al. 2004, Valenti, J. and Fischer, D. 2005, Casagrande, L. et al. 2011, Sousa et al. 2008]. Thanks to new spectrographs, it has become possible to obtain larger samples of stellar spectra. Some examples are HARPS, mounted on the 3.6-meter telescope at La Silla; UVES, at the Nasmyth B focus of UT2; FLAMES, on the Very Large Telescope; DEIMOS, at the Keck Observatory; AAOmega, on the Anglo-Australian Telescope; and HYDRA, on the Blanco Telescope of the Cerro Tololo Inter-American Observatory. Several surveys that derive stellar parameters are now under way, with their data made available online, for example the Gaia-ESO survey carried out with the European Southern Observatory [Gilmore, G. et al. 2012]; the APOGEE survey at the Apache Point Observatory [Allende Prieto et al. 2008]; and the RAVE survey at the Anglo-Australian Observatory [Steinmetz, M. et al. 2006].

From a stellar spectrum, it is possible to derive the star's atmospheric parameters, such as effective temperature (Teff), surface gravity (log(g)) and microturbulence (ξ), as well as the abundances of several chemical species [Adibekyan, V. et al. 2012]. These parameters can then be used to derive other, indirect parameters. This technique is powerful, but it can also be time-consuming. The method used in this work requires precise measurements of the equivalent widths (EWs) of many Fe spectral lines in solar-type stars. This task can be accomplished with an automatic routine called ARES (Automatic Routine for line Equivalent widths in stellar Spectra) [Sousa, S. et al. 2007]. The atmospheric parameters can then be derived with the help of the MOOG¹ routine. The derivation of stellar parameters is of crucial importance in several hot topics in astrophysics, such as the study and characterization of exoplanets, the composition and kinematics of the Milky Way, or the derivation of other indirect stellar parameters (e.g., stellar ages). For an example of the importance of deriving stellar parameters, see the work of Batista and Fernandes (2012).
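To make the quantity that ARES measures concrete: the equivalent width of an absorption line is W = ∫ (1 − F(λ)/F_c) dλ, where F_c is the continuum flux, i.e. the width of a completely dark rectangle that removes the same flux as the line. The short Python sketch below evaluates this definition with the trapezoidal rule on a synthetic Gaussian line. The line centre, depth and width are hypothetical values, and this illustrates only the definition, not the ARES algorithm, which among other things must estimate the local continuum from the observed spectrum.

```python
import math


def equivalent_width(wavelengths, flux, continuum=1.0):
    """Trapezoidal estimate of W = integral of (1 - F/F_c) d(lambda).

    The result has the same units as `wavelengths` (here, Angstrom).
    """
    width = 0.0
    for i in range(1, len(wavelengths)):
        depth_left = 1.0 - flux[i - 1] / continuum
        depth_right = 1.0 - flux[i] / continuum
        width += 0.5 * (depth_left + depth_right) * (wavelengths[i] - wavelengths[i - 1])
    return width


def gaussian_line(wavelengths, center, depth, sigma, continuum=1.0):
    """Synthetic absorption line with a Gaussian profile (hypothetical test data)."""
    return [
        continuum * (1.0 - depth * math.exp(-0.5 * ((wl - center) / sigma) ** 2))
        for wl in wavelengths
    ]


# Hypothetical line: centre 6065 A, central depth 0.3, Gaussian sigma 0.08 A.
grid = [6060.0 + 0.01 * i for i in range(1001)]        # 6060-6070 A in 0.01 A steps
flux = gaussian_line(grid, center=6065.0, depth=0.3, sigma=0.08)
w_numeric = equivalent_width(grid, flux)
w_analytic = 0.3 * 0.08 * math.sqrt(2.0 * math.pi)     # depth * sigma * sqrt(2 pi)
print(f"W = {1000 * w_numeric:.1f} mÅ (analytic: {1000 * w_analytic:.1f} mÅ)")
```

For a Gaussian line, the numerical value should agree with depth × σ × √(2π), which the script prints for comparison.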

Because stellar parameters (effective temperature, Teff; surface gravity, log(g); metallicity, [X/H]; and microturbulence, ξ) must be derived quickly and efficiently for ever larger numbers of stellar spectra, automatic tools capable of handling large amounts of data had to be built. Before describing this problem in detail, however, it is necessary to introduce some basic concepts of stellar spectroscopy. In the first section of this chapter, I will introduce some basic concepts about stellar spectra. I will then describe the spectroscopic method used in this study to derive stellar parameters, and how sensitive these parameters are to several constraints. In the last section of this chapter, I will introduce the MOOG program.

1.1. Basic Concepts of Spectroscopy

Spectroscopy usually refers to the measurement of radiation intensity as a function of wavelength. Spectroscopic data are represented by a spectrum. In Fig.1.1, I show a high-resolution stellar spectrum of the star HD225097, in which several absorption lines can be seen. The instrument used to obtain a spectrum is called a spectrograph. To better understand what absorption lines are, how stellar spectra originate and how stars are classified, a brief description is provided in the next three sections.

Fig.1.1 – Zoom of a high-resolution stellar spectrum of the star HD225097, obtained from the ÉLODIE catalogue, available online at http://atlas.obs-hp.fr/elodie/. The spectral line Hβ can be observed at around 4860 Å.

1.1.1. The physics of spectral lines

According to quantum mechanics, the energy of a microscopic system such as an atom is quantized: its electrons can only occupy states with specific, discrete energies. In the Bohr model of the atom, an atom is composed of a small, positively charged nucleus surrounded by electrons moving in circular orbits. Each orbit corresponds to a quantized energy level, labelled by the principal quantum number n. Electrons are not confined to a single energy level and can jump between levels. Consider, for example, hydrogen, which is composed of one proton and one electron. The energy of level n is given by:

$$E_n = -\frac{13.6\ \mathrm{eV}}{n^2}, \qquad n = 1, 2, 3, \ldots$$

In the hydrogen atom, the level $n = 1$ is the lowest energy level, also called the ground state.

Consider now an electron in the energy level $n_i$ that suddenly jumps to a lower level $n_f < n_i$. As a consequence of this transition, a photon is emitted with an energy exactly equal to the energy difference between the two levels. The wavelength of the emitted photon is given by:

$$\lambda = \frac{hc}{E_{n_i} - E_{n_f}},$$

where $h$ is Planck's constant and $c$ is the speed of light. The spectral lines produced by these emissions are called emission lines.

All transitions whose lower level is $n = 2$ belong to the Balmer series. For example, the transition from $n = 3$ to $n = 2$ is called Balmer α or Hα; from $n = 4$ to $n = 2$, Hβ; from $n = 5$ to $n = 2$, Hγ; and from $n = 6$ to $n = 2$, Hδ.

Electrons can also absorb photons. If an electron in level $n_f$ absorbs a photon with an energy exactly equal to the energy difference between levels $n_f$ and $n_i$, it jumps to the higher level $n_i$. In this case, an absorption line appears in the spectrum.
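The Bohr-model energy levels of hydrogen, E_n = −13.6 eV/n², together with λ = hc/(E_upper − E_lower), are enough to compute the Balmer wavelengths. The Python sketch below is a minimal illustration, assuming the rounded value 13.6 eV and hc ≈ 1239.84 eV nm; the function names are hypothetical. It gives about 656.4, 486.2, 434.1 and 410.2 nm for Hα to Hδ, and the Hβ value corresponds to the line near 4860 Å in Fig.1.1. Small differences from tabulated wavelengths come from the rounded constant, the neglected finite mass of the nucleus, and the fact that stellar wavelengths are often quoted in air rather than in vacuum.

```python
HC_EV_NM = 1239.84198   # Planck constant times speed of light, in eV nm
E_ION_H_EV = 13.6       # hydrogen ground-state binding energy (Bohr model), in eV


def energy_level(n):
    """Energy of level n of hydrogen in the Bohr model, E_n = -13.6 eV / n^2."""
    return -E_ION_H_EV / n**2


def transition_wavelength_nm(n_upper, n_lower):
    """Wavelength (nm) of the photon emitted or absorbed between two levels."""
    delta_e = energy_level(n_upper) - energy_level(n_lower)   # > 0 when n_upper > n_lower
    return HC_EV_NM / delta_e


# Balmer series: all transitions whose lower level is n = 2.
for name, n_upper in (("H-alpha", 3), ("H-beta", 4), ("H-gamma", 5), ("H-delta", 6)):
    print(f"{name}: {transition_wavelength_nm(n_upper, 2):.1f} nm")
```

The same function describes absorption: a photon of exactly this wavelength can lift an electron from the lower to the upper level.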

To understand how a spectrum forms, consider the example given by Kaler, J. (1997). Take a box filled with low-density hydrogen gas (Fig.1.2) and illuminate it with a source of continuous radiation, for example a metal that has been heated to a high temperature. The resulting spectrum can be observed in three different ways: by looking only at the continuous source (Fig.1.2, number 1); by looking at the source through the box, so that the radiation passes through the gas (Fig.1.2, number 2); or by looking at the box alone, from a direction away from the source (Fig.1.2, number 3).

Looking only at the light source, one sees a spectrum similar to that of a blackbody (Fig.1.2, number 1); this continuum is shown in the accompanying plot of the spectrum. Let us now shine this continuum through the gas contained in the box. The temperature in the box controls the speed of the gas particles: as the temperature increases, the rate of collisions between the gas particles increases, and in each collision an electron can jump from one orbit to another.

Looking through the box (Fig.1.2, spectrograph number 2, line of sight), photons whose energies match the energy differences between pairs of levels of the atoms in the box may be absorbed, which prevents them from reaching the spectrograph. The fraction of these photons that is actually absorbed, and thus removed from the beam, depends on the number of atoms whose electrons have been raised by collisions to the appropriate level, and on the length of the path through the box. The result is an absorption spectrum. Such a spectrum is shown in Fig.1.2, number 2: the continuum is interrupted at the wavelengths of the hydrogen Balmer lines.
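The dependence of the absorbed fraction on the number of atoms in the absorbing level and on the path length can be made quantitative with the Beer-Lambert law, I/I₀ = exp(−τ), where the optical depth is τ = n σ L (n: number density of absorbers in the lower level; σ: absorption cross-section at the given wavelength; L: path length). The Python sketch below assumes a uniform slab and ignores re-emission and stimulated emission; all numbers are hypothetical.

```python
import math


def transmitted_fraction(n_absorbers, cross_section, path_length):
    """Fraction of photons at one wavelength that cross a uniform slab unabsorbed.

    Beer-Lambert law: I/I0 = exp(-tau), with optical depth tau = n * sigma * L.
    Units must be consistent, e.g. cm^-3, cm^2 and cm.
    Stimulated emission and re-emission into the line of sight are ignored.
    """
    tau = n_absorbers * cross_section * path_length
    return math.exp(-tau)


# Hypothetical numbers: tau = 0.5 for the base case.
base = transmitted_fraction(1.0e10, 1.0e-12, 50.0)
more_atoms = transmitted_fraction(2.0e10, 1.0e-12, 50.0)    # twice the absorbers
longer_path = transmitted_fraction(1.0e10, 1.0e-12, 100.0)  # twice the path
print(f"base {base:.3f}, twice the atoms {more_atoms:.3f}, twice the path {longer_path:.3f}")
```

Doubling either n or L doubles τ and therefore has the same effect on the transmitted light.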

On the other hand, looking only at the radiation emitted by the box itself (that is, at the spectrum of the box alone), the electrons that were raised to higher levels, by collisions or by absorbing photons, eventually jump back down and produce an emission spectrum (Fig.1.2, number 3).

This box-of-gas analogy can also be used to explain the formation of stellar spectra, as described in the next section.


Fig.1.2 – Example of how stellar spectra are formed. An incandescent light source is represented by the dot. Looking directly at the light source, a blackbody spectrum is observed (spectrograph 1). Photons of all wavelengths enter the box filled with hydrogen. Photons whose energies do not match the energy differences between the orbits of hydrogen pass through the box. For example, photons with wavelengths corresponding to the Balmer series of hydrogen have a high probability of being absorbed. This is observed by spectrograph 2. The dotted line represents the original continuum and shows evidence of some continuous Balmer absorption. As a result, Balmer emission lines and some continuum emission are produced (spectrograph 3). Source: Kaler, J. (1997).

1.1.2. Origin of stellar spectra

In the deeper layers of a star, gases at high pressure produce a continuous spectrum, like the incandescent metal in the previous example (Fig.1.2). Moving outwards through the upper layers and into the stellar atmosphere, however, the pressure and density drop. Depending on whether an electron jumps to a higher level or downwards towards the ground state, an absorption or an emission occurs, respectively. When electrons jump to higher energy levels, absorption lines appear in the stellar spectrum. This is broadly similar to what is seen in the previous example when looking at the radiation that has passed through the box (Fig.1.2, number 2).

As Kaler, J. (1997) points out in his book, the continuum and the absorption lines of stellar spectra are created in the same place, but at different depths; each spectral line forms at a different depth in the stellar atmosphere. For example, lines of ions may be produced in deeper layers of the stellar atmosphere than lines of neutral atoms, because higher temperatures are needed to strip electrons from atoms through collisions. The strength of each absorption line depends, for example, on the probability of the absorption occurring and on the surface temperature of the star. Some emission lines may also be present in stellar spectra.

In Fig.1.3, I show an example of a high-resolution stellar spectrum of the star HD102117, plotted with the “splot” routine within the echelle package in IRAF².

Fig.1.3 – Zoom on a section of a high-resolution spectrum of the star HD102117.

² IRAF is distributed by the National Optical Astronomy Observatories, operated by the Association of Universities for Research in Astronomy, Inc., under contract with the National Science Foundation, USA.

Absorption lines differ in shape and strength according to the physical conditions of the stellar atmosphere. In particular, to a good approximation, the EWs of absorption lines are modified by the effective temperature, the pressure and the element abundance. The behaviour of spectral lines can therefore be used to infer fundamental properties of stars, such as the effective temperature, surface gravity and chemical composition. Gray, D. (1992) presents a good chapter on the behaviour of spectral lines in his book.

Temperature is the parameter that most strongly affects line strength, because of the exponential and power-law dependence of the excitation and ionization processes on temperature. As the temperature rises, a line may strengthen, mainly because of an increase in the excitation rate, but it may also weaken once ionization depletes the absorbing species, as happens for the neutral sodium lines of Fig.1.4. In Fig.1.4, I show an example of how the line strength varies between a cool and a hot star.

Pressure can also affect absorption lines. For example, pressure changes the ratio of line absorbers to the continuous opacity, through its effect on the ionization equilibrium. In solar-type stars (i.e., F-, G- and K-type stars), the pressure dependence can be expressed as a dependence on surface gravity. In Fig.1.5, I show a line of FeII as an illustrative example of the gravity dependence of spectral lines in solar-type stars: as the surface gravity increases, the strength of the FeII absorption line decreases.

Chemical abundance is another important factor that affects line strength. As expected, the line strength increases with the chemical abundance (Fig.1.6, bottom panel). However, this relation is non-linear. In the top panel of Fig.1.6, I show the typical curve of growth, in which three well-defined regimes can be seen. In the first regime, called the weak-line regime, the EW grows linearly with the abundance. In the second regime, the central depth of the line approaches its maximum value and the line saturates; the EW then grows only very slowly with the abundance. In the third regime, the absorption in the line wings becomes significant compared with the continuous absorption, and the EW increases again, approximately as the square root of the abundance.

Fig.1.4 – Temperature dependence of sodium lines. It can be observed that as temperature increases, the line strength decreases. Source: Gray, D. (1992).

Fig.1.5 – Derived profile of the FeI line located at …, for several values of surface gravity, showing the change in the EW of the line as a function of surface gravity. Model parameters: … and … . Source: …

Fig.1.6 – Top: Theoretical curve of growth of a FeI line (…) as a function of the chemical abundance of the absorbing species; “A” denotes the iron abundance relative to hydrogen. Bottom: Profile of the same FeI line (…) as a function of the chemical abundance of the absorbing species. The dots on the curve of growth (top panel) correspond to the profiles in the bottom panel. Parameters of the photospheric model: … = 0.87 and … (cm/s²). Source: Gray, D. (1992).

1.1.3. Spectral sequence: The classification of stars

Stars are classified into spectral types according to the appearance of their spectra. The first astronomer to classify stars into spectral types was Angelo Secchi, in the 1860s, a decade usually regarded as the birth of stellar spectroscopy.

Nowadays, stars are spectrally classified into the ordered sequence OBAFGKMLT. To refine this sequence, each spectral type is divided into ten subtypes, indicated by appending an integer from 0 to 9 to the spectral type; the only exception is the O type, which is divided into only five subtypes. For example, the Sun is classified as a G2-type star, and its spectrum shows prominent lines of ionized calcium and of metals such as iron.

As explained in the previous section, the surface temperature of a star affects its spectrum. The OBAFGKMLT spectral sequence is, in fact, a temperature sequence. The hottest stars are the O-type stars, while the M-type stars are the coolest stars listed in Table 1.1, with surface temperatures of around 3000 K. In Table 1.1, I summarize the relationship between the spectral sequence and the surface temperature, the color and some spectral features of stars.

Spectral class   Temperature (K)   Spectral lines                                          Color
O                28000 – 35000     Ionized atoms, especially helium                        Blue-violet
B                10000 – 28000     Neutral helium, some hydrogen                           Blue-white
A                7500 – 10000      Strong hydrogen, some ionized metals                    White
F                6000 – 7500       Hydrogen and ionized metals such as calcium and iron    Yellow-white
G                5000 – 6000       Ionized calcium and both neutral and ionized metals     Yellow
K                3500 – 5000       Neutral metals                                          Orange
M                2500 – 3500       Strong titanium oxide and some neutral calcium          Red-orange

Table 1.1 – Summary of the relationship between the spectral sequence and the surface temperature of stars.

1.2. A spectroscopic method to derive stellar parameters

The main stellar atmospheric parameters (namely Teff, log(g), ξ and [Fe/H]) are subject to four constraints [Santos et al. 2004; Mucciarelli et al. 2013]:

1. Effective temperature: it is derived by imposing that there is no correlation between the excitation potential and the abundance derived from the FeI lines (in other words, by imposing excitation equilibrium). According to the Boltzmann equation, the population of each energy level is a function of Teff. If the adopted Teff is too high, the model under-populates the low energy levels, so the predicted lines of low excitation potential transitions are too weak. To reproduce the observed EWs, these lines then require higher abundances, which induces an anticorrelation between abundance and excitation potential. Conversely, if the adopted Teff is too low, the predicted lines of low excitation potential transitions are too strong, and a positive correlation is introduced.

2. Surface gravity: it is derived by imposing that, for a given chemical species, the abundance derived from its neutral lines equals the abundance derived from its ionized lines (within the uncertainties). This requirement is the so-called ionization equilibrium. Surface gravity is a direct measure of the photospheric pressure. Hence, variations in the surface gravity induce changes in the ionized lines, as they are more sensitive to the electron pressure. The method assumes that the energy levels of a given chemical species are populated according to the Saha and Boltzmann equations (i.e., in local thermodynamic equilibrium, hereafter LTE).

3. Microturbulence velocity: it is derived by imposing that there is no correlation between the iron abundance and the line strength. Microturbulence mostly affects the moderate and strong lines that lie on the flat part of the curve of growth (Fig.1.6), whereas lines in the linear regime are mostly sensitive to the element abundance. Microturbulence is introduced to account for non-thermal effects that are, in general, not well described by 1-dimensional static model atmospheres. It also acts as a corrective factor that minimizes the line-to-line scatter.

4. Metallicity: the overall metallicity is, in general, approximated by the iron abundance, $\mathrm{[Fe/H]} = \log_{10}(N_{\mathrm{Fe}}/N_{\mathrm{H}})_\star - \log_{10}(N_{\mathrm{Fe}}/N_{\mathrm{H}})_\odot$, as iron has a large number of spectral lines that can be measured.


In solar-type stars, Fe lines can be used to derive the stellar atmospheric parameters (namely Teff, log(g), vt and [Fe/H]). The obtained solution should satisfy the three conditions of the standard method:

• Ionization equilibrium: [FeI/H] = [FeII/H];

• Excitation equilibrium, which means that the iron abundance derived from individual lines must be independent of the excitation potential;

• Independence of the iron abundance derived from individual lines with respect to the EWs.
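Each of these three conditions can be measured from a list of lines. The C sketch below assumes that per-line FeI and FeII abundances, excitation potentials and reduced EWs (log of EW/λ) are already available, for example from MOOG's abfind output. It computes ordinary least-squares slopes of the FeI abundance against excitation potential and against reduced EW, and the difference between the mean FeI and FeII abundances. The names fe_diagnostics and fe_check are hypothetical; the standard method asks for all three numbers to be close to zero.

```c
#include <stddef.h>

/* Mean of n values (n > 0). */
static double mean(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += v[i];
    return s / (double)n;
}

/* Ordinary least-squares slope of y against x (n > 0);
 * returns 0.0 if all x values are equal. */
double ls_slope(const double *x, const double *y, size_t n)
{
    double mx = mean(x, n), my = mean(y, n), sxy = 0.0, sxx = 0.0;
    for (size_t i = 0; i < n; i++) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
    }
    return sxx > 0.0 ? sxy / sxx : 0.0;
}

/* The three diagnostics of the standard method (hypothetical layout). */
typedef struct {
    double slope_ep;   /* A(FeI) vs excitation potential: excitation equilibrium */
    double slope_rew;  /* A(FeI) vs reduced EW: microturbulence                  */
    double delta_ion;  /* <A(FeI)> - <A(FeII)>: ionization equilibrium           */
} fe_diagnostics;

fe_diagnostics fe_check(const double *ep, const double *rew, const double *a_fe1,
                        size_t n1, const double *a_fe2, size_t n2)
{
    fe_diagnostics d;
    d.slope_ep  = ls_slope(ep, a_fe1, n1);
    d.slope_rew = ls_slope(rew, a_fe1, n1);
    d.delta_ion = mean(a_fe1, n1) - mean(a_fe2, n2);
    return d;
}
```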

To measure the EWs of the iron lines, one can use ARES (Automatic Routine for line Equivalent widths in stellar Spectra) [Sousa et al., 2007]. Then, to derive the stellar parameters, the program MOOG can be used [Sneden, 1973]. I describe this program in the next section.

1.2.1. MOOG

MOOG is a code written in FORTRAN that performs several LTE line analysis and spectral synthesis tasks. The basic equations of LTE stellar line analysis follow the formulation of Edmonds, F. N. (1969). Most of the MOOG code follows the WIDTH and SYNTHE codes by Kurucz, R., available online at http://kurucz.harvard.edu/. Further details about other options of the MOOG code, and how to download it, are provided online at http://www.as.utexas.edu/~chris/moog.html. MOOG is typically used to help derive the stellar atmospheric parameters, through the MOOG driver called abfind. For each line, abfind computes the abundance that reproduces the observed EW, previously measured with other software. In Fig.1.7, I show two typical output plots of MOOG. The red points represent the abundances from individual lines, the yellow dashed line represents the mean FeI abundance, and the blue dashed lines represent the linear trends of abundance with three variables.

Let us consider two slopes: the slope of the blue dashed line in the plot of FeI abundances from individual lines versus excitation potential, and the slope of the blue dashed line in the plot of FeI abundances from individual lines versus reduced EW.

Fig.1.7 – Top: FeI abundances from individual lines versus excitation potential. Bottom: FeI abundances from individual lines versus reduced EW. The plot also contains information about the stellar model atmosphere used in the computation. The abundances on both vertical axes are logarithmic number densities on the standard scale in which log(ε(H)) = 12. Source: http://www.as.utexas.edu/~chris/moog.html.

The best values of the stellar atmospheric parameters are found when both of these slopes are zero (within the uncertainties). I therefore aim to construct a routine that iteratively minimizes these slopes and finds the best atmospheric parameters.
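One simple way to turn these conditions into a single quantity that a minimization routine can work on is a weighted sum of squares of the slopes, optionally with the FeI − FeII abundance difference. This is offered only as an assumption, not necessarily the choice made in this work; in practice, each evaluation would require a new MOOG run with updated parameters. The function name param_cost and the weights are hypothetical.

```c
/* Hypothetical objective for the parameter search: weighted sum of squares of
 * the slope of A(FeI) vs excitation potential, the slope of A(FeI) vs reduced
 * EW and, if w_ion > 0, the FeI - FeII abundance difference.
 * It is zero only when all included diagnostics vanish. */
double param_cost(double slope_ep, double slope_rew, double delta_ion,
                  double w_ep, double w_rew, double w_ion)
{
    return w_ep * slope_ep * slope_ep
         + w_rew * slope_rew * slope_rew
         + w_ion * delta_ion * delta_ion;
}
```

With w_ion = 0, this reduces to the two-slope criterion stated above.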


2. Deterministic Optimization methods

In mathematics, the term optimization refers to the study of problems in which one seeks to minimize or maximize a function. Optimization is a very large field of numerical research, with applications in many domains, such as physics, astrophysics, biology, economics and engineering.

Let f be an N-dimensional objective function to be minimized. This problem can be expressed in mathematical terms as follows:

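$$\min_{\mathbf{x} \in \mathbb{R}^N} f(\mathbf{x}) \qquad (2.1)$$

In other words, the aim is to find a vector $\mathbf{x}^*$ that satisfies:

$$f(\mathbf{x}^*) \le f(\mathbf{x}), \quad \forall \mathbf{x} \in \mathbb{R}^N \qquad (2.2)$$

A minimum can be either global (i.e., truly the lowest objective function value) or local (i.e., the lowest value in a finite neighborhood and not on the boundary of that neighborhood) (Fig.2.1).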

Computationally, the aim is to solve an optimization problem as fast as possible and with as little memory as possible. One issue in minimization problems is that the optimization algorithm may get stuck in a local minimum. This problem can sometimes be avoided by adapting the optimization algorithm to the problem at hand, or by changing to an algorithm capable of escaping local minima. In some cases, these adaptations also make the algorithm more efficient and more robust.

In the world of optimization, we would feel lucky if we could find the “perfect” minimization algorithm for our problem. Finding the best algorithm for a given minimization problem is often an arduous task. One solution may be to try more than one minimization method and compare their efficiency and robustness on the problem. However, how should we choose the most appropriate optimization methods for a given problem? Our initial choice can be based on knowledge of the behavior of the objective function f or, for example, on whether or not we can compute its derivatives. In general, minimization methods are iterative: a starting point is given and the minimization algorithm then decides in which direction to proceed and how far.

Fig.2.1 – Extrema of an objective function in a given interval. Points A, C and E are local maxima. Points B and F are local minima. Point G is the global maximum and point D is the global minimum. Source: Press et al. (2007).

In unconstrained multidimensional minimization (i.e., when there are no limitations on the allowed values of the independent variables), numerical optimization methods are classified according to how many terms of the Taylor expansion of the objective function they exploit:

a) Zero-order methods: these methods only require evaluations of the objective function, and their storage requirement is of the order of N², where N is the number of variables (the dimension of the problem). In the literature, they are also called direct search methods.

b) First-order methods: these methods also require the computation of the gradient of the objective function (i.e., its Jacobian) and a one-dimensional minimization sub-algorithm. Their storage requirement is of the order of a few times N.

c) Second-order methods: these methods also require the computation of the Hessian matrix and a one-dimensional minimization sub-algorithm. Their storage requirement is of the order of N².

In this chapter, I describe some deterministic methods for unconstrained optimization. The chapter is divided into three sections: in section 2.1, I describe some direct search methods; in section 2.2, some gradient-based methods; and in section 2.3, the quasi-Newton methods.

2.1. Direct search methods

Direct search methods are reasonably straightforward to implement and can be adapted to solve many nonlinear optimization problems. The term direct search was coined by Hooke, R. and Jeeves, T. (1961).

A good overview of this class of minimization methods is given by Lewis, R. et al. (2000). According to this work, direct search methods can be classified as pattern search methods, simplex methods³ and methods with adaptive sets of search directions. Pattern search methods are characterized by a series of exploratory moves that consider the behavior of the objective function at a pattern of points, all lying on a rational lattice. Simplex methods are characterized by the simple geometric device, a simplex, that guides their search. Methods with adaptive sets of search directions attempt to accelerate the search by constructing search directions that exploit curvature information gathered from the objective function. Several derivative-free minimization methods can be found in the literature: the Hooke and Jeeves method [Hooke, R. and Jeeves, T. 1961]; the Downhill Simplex method [Spendley et al. 1962; Nelder and Mead 1965]; Rosenbrock’s method [Rosenbrock, H. 1960]; and Powell’s method [Powell, M. 1964].

In this section, I describe three direct search methods: the Hooke and Jeeves method (2.1.1), the Downhill Simplex method (2.1.2) and Rosenbrock’s method (2.1.3).

2.1.1. Hooke and Jeeves method

The Hooke and Jeeves method [Hooke, R. and Jeeves, T. 1961] is characterized by two main moves in parameter space:

• Exploratory move;

• Pattern search.

First, an exploratory move is conducted in the vicinity of the current point in order to find the best nearby point. Then, these points are used to conduct a pattern move. How, then, are the exploratory and pattern moves defined?

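Let $\mathbf{x}^c$ be the current solution and $\Delta_i$ a small perturbation applied to the variable $x_i$. In the exploratory move, each variable $x_i$ is perturbed in both the positive and the negative directions, and the best solution found in this process is recorded. If the point found after all variables have been perturbed is different from the initial point, the move is a success; otherwise, it is a failure. The exploratory move can be summarized as:

1. Set $i = 1$ and $\mathbf{x} = \mathbf{x}^c$.

2. Compute $f = f(\mathbf{x})$, $f^+ = f(x_i + \Delta_i)$ and $f^- = f(x_i - \Delta_i)$, keeping all other variables fixed.

3. Find $f_{\min} = \min(f, f^+, f^-)$ and set $\mathbf{x}$ to the corresponding point.

4. If $i = N$, return $\mathbf{x}$ as the result: if $\mathbf{x} \ne \mathbf{x}^c$, return success; otherwise, return failure. If $i < N$, set $i = i + 1$ and return to step 2.

Then, a pattern move takes place. A new point is found by jumping from the current best point $\mathbf{x}^{(k)}$ along the direction connecting the previous best point $\mathbf{x}^{(k-1)}$ and $\mathbf{x}^{(k)}$, as follows:

$$\mathbf{x}_p^{(k+1)} = \mathbf{x}^{(k)} + \left(\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}\right) \qquad (2.4)$$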

Kalyanmoy, D. (2005) presents a summarized view of the Hooke and Jeeves algorithm:

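1. Define a starting point $\mathbf{x}^{(0)}$, variable increments $\Delta_i$ ($i = 1, \dots, N$), a step reduction factor $\alpha > 1$ and a termination tolerance $\varepsilon$. Set $k = 0$. (Usually, it is recommended to set $\alpha = \dots$)

2. Perform an exploratory move with $\mathbf{x}^{(k)}$ as the base point and let $\mathbf{x}$ be the point found. If the move is a success, set $\mathbf{x}^{(k+1)} = \mathbf{x}$ and go to step 4; otherwise, go to step 3.

3. If $\|\boldsymbol{\Delta}\| < \varepsilon$, terminate the process. Otherwise, set $\Delta_i = \Delta_i / \alpha$ for $i = 1, \dots, N$ and return to step 2.

4. Set $k = k + 1$ and perform a pattern move: $\mathbf{x}_p^{(k+1)} = \mathbf{x}^{(k)} + (\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)})$.

5. Perform another exploratory move around $\mathbf{x}_p^{(k+1)}$ and let $\mathbf{x}^{(k+1)}$ be the point found.

6. If $f(\mathbf{x}^{(k+1)}) < f(\mathbf{x}^{(k)})$, return to step 4. Otherwise, return to step 3.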

As described above, the numerical calculations involved are very simple. However, the Hooke and Jeeves method can converge prematurely to incorrect solutions, especially if the objective function has highly nonlinear interactions between its variables. The method can also cycle between steps 3 and 4 or between steps 5 and 6, performing endless exploratory moves and pattern searches. Convergence to an accurate solution may also require a large number of objective function evaluations.

2.1.2. Downhill Simplex method

The Downhill Simplex method was initially proposed by Spendley et al. (1962) and later modified by Nelder and Mead (1965). This method can work as a very robust hill-climbing scheme. It only requires objective function evaluations, but it can converge slowly, since it may need a large number of them. It is the most popular technique among the direct search methods [Barton, R. and Ivey, J. 1996]. However, there is no general convergence property for the deterministic version of this method [Barton, R. and Ivey, J. 1996]. In fact, examples of non-convergence for particular functions or classes of functions can be found in the literature: Lagarias, J. et al. (1998); Kelly, C. (1999).

A simplex is an N-dimensional geometrical figure defined by N+1 vertices. If N = 2, the simplex is a triangle, and if N = 3, it is a tetrahedron, though not necessarily a regular one. In general, the simplex is required to be nondegenerate (i.e., it encloses a finite inner N-dimensional volume). In other words, if any vertex is taken as the origin, the edges connecting it to the other N vertices span the N-dimensional search space, so any point can be reached through linear combinations of these edges.

The Downhill Simplex algorithm is very easy to implement. Initially, an N-dimensional vector with an initial guess is used to construct the initial simplex. The algorithm then decides where to move in parameter space, until it finds a global or at least a local minimum. If the algorithm gets stuck in a local minimum, it is a good idea to restart it, by constructing a new initial simplex around the local minimum found and letting the algorithm run again.

The downhill simplex is started from an N-dimensional vector, which is used to construct the initial simplex. Let $\mathbf{P}_0$ be the vector of initial guesses. The initial simplex is then constructed as follows:

$$\mathbf{P}_i = \mathbf{P}_0 + \lambda \mathbf{e}_i, \qquad i = 1, \dots, N \qquad (2.3)$$

where the $\mathbf{e}_i$ are N unit vectors and $\lambda$ is a constant that depends on the problem’s characteristic length scale. Sometimes, different values of $\lambda$ can be set for each vector direction. The algorithm evaluates the objective function at each vertex of the initial simplex and stores these values in a vector ζ.
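Equation (2.3) translates directly into code. The sketch below is a minimal C version, assuming the simplex is stored as an (N+1) × N array in row-major order and allowing a different λ for each direction, as mentioned above. The function names init_simplex and sum_sq are hypothetical.

```c
#include <stddef.h>

typedef double (*objfn)(const double *x, int n);

/* Initial simplex of eq. (2.3): vertex 0 is the initial guess p0 and vertex i
 * (i = 1..n) is p0 displaced by lambda[i-1] along the i-th unit vector.
 * simplex: (n+1) x n array, row-major. zeta: receives f at each vertex. */
void init_simplex(objfn f, const double *p0, const double *lambda, int n,
                  double *simplex, double *zeta)
{
    for (int i = 0; i <= n; i++) {
        double *p = simplex + (size_t)i * (size_t)n;
        for (int j = 0; j < n; j++)
            p[j] = p0[j];
        if (i > 0)
            p[i - 1] += lambda[i - 1];
        zeta[i] = f(p, n);
    }
}

/* Example objective: sum of squares. */
double sum_sq(const double *x, int n)
{
    double s = 0.0;
    for (int j = 0; j < n; j++) s += x[j] * x[j];
    return s;
}
```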

The downhill simplex method then performs a series of iterations. The original algorithm proposed by Spendley et al. (1962) only allowed the simplex to perform reflection moves. First, the algorithm identifies the least desirable point (i.e., the worst point), the next-to-least desirable point (i.e., the next-worst point) and the most desirable point (i.e., the best point) of the simplex. The centroid of the face opposite the worst point is also computed. The worst point is then reflected through this centroid. If the reflected vertex is still the worst vertex of the simplex, the next-worst vertex is selected instead and the reflection process is repeated. In Fig.2.2, I show a sequence of consecutive reflections that illustrates this scenario.

Nelder and Mead (1965) proposed additional moves in order to accelerate the search: expansion and contraction moves (Fig.2.3), which deform the simplex so that it adapts better to the features of the objective function. After reflecting the worst point through the centroid of the opposite face, the algorithm has to decide what the next move will be. Let $\mathbf{P}_r$ be the reflected point and $\bar{\mathbf{P}}$ the centroid. Then:

a) If the objective function value at $\mathbf{P}_r$ is better than that of the best point of the simplex, the reflection took the simplex into a good search region. In this case, an expansion takes place along the direction from the centroid to the reflected point. The expansion is controlled by a factor γ, called the expansion coefficient.

b) If the objective function value at $\mathbf{P}_r$ is worse than that of the worst point of the simplex, the reflection took the simplex into a bad search region. In this case, a contraction takes place along the line through the centroid and the reflected point, to the point $\bar{\mathbf{P}} + \beta(\mathbf{P}_r - \bar{\mathbf{P}})$. This contraction is controlled by a factor β, called the contraction coefficient, which here is set to a negative value, so that the new point lies between the centroid and the worst point.

c) If the objective function value at $\mathbf{P}_r$ is better than that of the worst point but worse than that of the next-worst point, a contraction with a positive β takes place, so that the new point lies between the centroid and the reflected point.

Fig.2.2 – A sequence of reflections, each of which fails to replace the best vertex, bringing the simplex back to its starting position. Source: Lewis, R. et al. (2000).

The new point replaces the worst point in the simplex and the algorithm keeps going with the new simplex.


Fig.2.3 – Left: Original simplex with a reflection, an expansion and two possible contractions. Right: Shrink step towards the best vertex, when all other moves have failed. Source: Lewis, R. et al. (2000).

Since the 1960s, this method has become the most popular direct search method and has undergone several improvements.

Press et al. (2007) present an implementation of the simplex method in a routine coded in C, called amoeba. In this implementation, the initial simplex is created as before. Then, in most steps, amoeba moves the highest point of the simplex (the vertex where the objective function is largest) through the opposite face to a lower point, by reflection. These reflections conserve the volume of the simplex and hence maintain its nondegeneracy. When the simplex reaches a valley floor, it contracts itself in the transverse direction and moves down the valley until it reaches the lowest point (i.e., the minimum of the valley). The basic moves performed by amoeba are summarized in Fig.2.4.

As in any minimization method, the stopping criterion is a delicate matter. In this case, the routine stops when the fractional difference between the highest and lowest objective function values in the simplex falls below a given tolerance ftol, or when the maximum number of allowed iterations is reached.
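In the amoeba routine of Press et al. (2007), the tolerance ftol is applied to the fractional range of the function values over the simplex, 2|f_hi − f_lo| / (|f_hi| + |f_lo| + tiny). A C sketch of this test is shown below. The guard value tiny = 1e-10, which avoids a division by zero when the function values vanish, and the function name are assumptions.

```c
#include <math.h>

/* Returns 1 when the simplex has converged, i.e. when the fractional range
 * 2 |f_hi - f_lo| / (|f_hi| + |f_lo| + tiny) of the nvert function values
 * is below ftol, and 0 otherwise. */
int simplex_converged(const double *fvals, int nvert, double ftol)
{
    const double tiny = 1.0e-10;  /* assumed guard against division by zero */
    double lo = fvals[0], hi = fvals[0];
    for (int i = 1; i < nvert; i++) {
        if (fvals[i] < lo) lo = fvals[i];
        if (fvals[i] > hi) hi = fvals[i];
    }
    return 2.0 * fabs(hi - lo) / (fabs(hi) + fabs(lo) + tiny) < ftol;
}
```

Because the test is relative, it behaves the same whether the objective function values are of order 1 or of order 10⁶, which is the reason for preferring it over an absolute threshold on f.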


Fig.2.4 – Sequence of possible moves executed by the Downhill Simplex method. The initial simplex, here a tetrahedron, is shown on top. Possible simplex moves: (a) reflection away from the highest point. (b) a reflection and expansion away from the highest point. (c) a contraction along one dimension from the highest point. (d) a contraction along all dimensions to the lowest point. Source: Press et al. (2007).

2.1.3. Rosenbrock’s method

Rosenbrock, H. (1960) proposed a method with adaptive sets of search directions. This algorithm approximates a gradient search, merging the best strategies of both zero and first order methods.


In the first stage of the method, a search is conducted along the directions of the basis vectors of the N-dimensional space. If a step is a success, i.e., it produces a lower objective function value, the step width along that direction is increased. Otherwise, the step width is decreased and the search continues in the opposite direction. Once a success has been found and exploited in each basis direction, the coordinate system is rotated so that the first basis vector points along the direction of overall progress, which approximates the (negative) gradient direction. This is usually done through the Gram-Schmidt orthogonalization procedure, which can be very time-consuming as the number of dimensions increases. The step widths are then reinitialized and the process is repeated in the new rotated coordinate system, until the minimum of the objective function is found. Rosenbrock’s method can become very unstable in some extreme cases, leading to premature failure. However, adapting the search directions according to what the method has learnt about the objective function at each stage can be very fruitful.

In Fig.2.5, I show how Rosenbrock’s method works; all new iterates are marked with a square. At each new stage, the method has a new set of search directions available.


Note how quickly the method adapts to the narrow valley (Fig.2.5). The first three stages clearly show unsuccessful search directions, but in the following stages the algorithm converges to the minimum of the narrow valley.

2.2. Conjugate direction methods: Powell’s quadratically convergent method

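Any objective function f can be approximated by its Taylor series around a given point P, taken as the origin of the coordinates x, as follows:

$$f(\mathbf{x}) = f(\mathbf{P}) + \sum_i \frac{\partial f}{\partial x_i} x_i + \frac{1}{2} \sum_{i,j} \frac{\partial^2 f}{\partial x_i \partial x_j} x_i x_j + \cdots \approx c - \mathbf{b} \cdot \mathbf{x} + \frac{1}{2} \mathbf{x} \cdot \mathbf{A} \cdot \mathbf{x} \qquad (2.5)$$

where $c \equiv f(\mathbf{P})$, $\mathbf{b} \equiv -\nabla f|_{\mathbf{P}}$ and $[\mathbf{A}]_{ij} \equiv \frac{\partial^2 f}{\partial x_i \partial x_j}\Big|_{\mathbf{P}}$. The matrix A is called the Hessian matrix of the objective function evaluated at the point P.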

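The gradient vector of the quadratic form (2.5) is defined as:

$$\nabla f = \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_N} \right]^T \qquad (2.6)$$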

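The gradient is a vector field that, at a given point P, points in the direction of the greatest increase of the objective function f. If A is a symmetric matrix, the gradient of the quadratic form (2.5) is:

$$\nabla f = \mathbf{A} \cdot \mathbf{x} - \mathbf{b} \qquad (2.7)$$

The objective function f is minimized by setting $\nabla f = 0$. In other words, the aim is to solve the linear system of equations:

$$\mathbf{A} \cdot \mathbf{x} = \mathbf{b} \qquad (2.8)$$

or, explicitly,

$$\begin{bmatrix} A_{11} & \cdots & A_{1N} \\ \vdots & \ddots & \vdots \\ A_{N1} & \cdots & A_{NN} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_N \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_N \end{bmatrix} \qquad (2.9)$$

where A is an N × N positive-definite matrix (i.e., $\mathbf{x}^T \mathbf{A} \mathbf{x} > 0$ for any nonzero vector x, which guarantees that the solution of (2.8) is a minimum), and x and b are both N-dimensional vectors.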

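How does the gradient of the objective function f change as we move along some direction? From equation (2.7), a displacement δx changes the gradient by:

$$\delta(\nabla f) = \mathbf{A} \cdot (\delta \mathbf{x}) \qquad (2.10)$$

Suppose that we have minimized f along a direction u and now move along another direction v. For the motion along v not to spoil the minimization along u, the change in the gradient must remain perpendicular to u. So, from equation (2.10):

$$0 = \mathbf{u} \cdot \delta(\nabla f) = \mathbf{u} \cdot \mathbf{A} \cdot \mathbf{v} \qquad (2.11)$$

Vectors u and v satisfying equation (2.11) are called conjugate vectors, and a set of such vectors is called a conjugate set.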
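As a small numerical illustration of condition (2.11), the C sketch below evaluates u·A·v and, in two dimensions, builds a direction conjugate to a given u by taking v perpendicular to A·u. This construction is valid for a symmetric A, since then u·A·v = (A·u)·v. The matrix values and function names are illustrative assumptions.

```c
/* Returns u . A . v for an n x n matrix A stored row-major. */
double u_A_v(const double *u, const double *A, const double *v, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            s += u[i] * A[i * n + j] * v[j];
    return s;
}

/* For a symmetric 2 x 2 matrix A (row-major), writes into v a direction
 * conjugate to u: v is perpendicular to A.u, so u . A . v = (A.u) . v = 0. */
void conjugate_2d(const double *A, const double *u, double *v)
{
    double au0 = A[0] * u[0] + A[1] * u[1];
    double au1 = A[2] * u[0] + A[3] * u[1];
    v[0] = -au1;
    v[1] = au0;
}
```

For A = [[2, 1], [1, 3]] and u = (1, 0), this gives v = (−1, 2). Two successive line minimizations, along u and then along v, therefore reach the minimum of a quadratic form with this Hessian.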

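Powell, M. (1964) introduced a direction set method that produces N mutually conjugate directions. The basic idea of this method is quite simple. First, initialize the set of directions $\mathbf{u}_i$ to the basis vectors:

$$\mathbf{u}_i = \mathbf{e}_i, \qquad i = 1, \dots, N \qquad (2.12)$$

Then, repeat the following procedure until the objective function stops decreasing:

1. Save the starting position as $\mathbf{P}_0$.

2. For $i = 1, \dots, N$, move $\mathbf{P}_{i-1}$ to the minimum along the direction $\mathbf{u}_i$ and call this point $\mathbf{P}_i$.

3. For $i = 1, \dots, N-1$, set $\mathbf{u}_i \leftarrow \mathbf{u}_{i+1}$.

4. Set $\mathbf{u}_N \leftarrow \mathbf{P}_N - \mathbf{P}_0$.

5. Move $\mathbf{P}_N$ to the minimum along the direction $\mathbf{u}_N$ and call this new point $\mathbf{P}_0$.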
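The basic procedure can be sketched in C as below. This is a minimal illustration under several assumptions: a golden-section search on the fixed interval t ∈ [−10, 10] replaces a proper bracketing line minimization (such as linmin in Press et al. 2007), the stopping test is a relative decrease test, and the basic direction replacement is kept, without the safeguards against linear dependence discussed next. All names are hypothetical.

```c
#include <math.h>
#include <string.h>

#define PW_MAXN 8  /* assumed maximum number of variables */

typedef double (*objfn)(const double *x, int n);

/* f restricted to the line p + t d. */
static double f_line(objfn f, const double *p, const double *d, double t, int n)
{
    double x[PW_MAXN];
    for (int i = 0; i < n; i++) x[i] = p[i] + t * d[i];
    return f(x, n);
}

/* Golden-section search for the minimum along d, assuming it lies in
 * t in [-tmax, tmax]; moves p to the minimizer found. */
static void line_min(objfn f, double *p, const double *d, int n, double tmax)
{
    const double g = 0.5 * (sqrt(5.0) - 1.0);  /* 0.618... */
    double a = -tmax, b = tmax;
    double c = b - g * (b - a), e = a + g * (b - a);
    double fc = f_line(f, p, d, c, n), fe = f_line(f, p, d, e, n);
    while (b - a > 1e-10) {
        if (fc < fe) { b = e; e = c; fe = fc; c = b - g * (b - a); fc = f_line(f, p, d, c, n); }
        else         { a = c; c = e; fc = fe; e = a + g * (b - a); fe = f_line(f, p, d, e, n); }
    }
    double t = 0.5 * (a + b);
    for (int i = 0; i < n; i++) p[i] += t * d[i];
}

/* Powell's basic procedure (steps 1-5), repeated until f stops decreasing.
 * p: starting point on entry, best point on return. Returns passes used. */
int powell_basic(objfn f, double *p, int n, double ftol, int max_pass)
{
    double u[PW_MAXN][PW_MAXN], p0[PW_MAXN];
    size_t bytes = (size_t)n * sizeof(double);
    for (int i = 0; i < n; i++)                     /* eq. (2.12): u_i = e_i */
        for (int j = 0; j < n; j++) u[i][j] = (i == j) ? 1.0 : 0.0;
    int pass = 0;
    while (pass < max_pass) {
        pass++;
        double f0 = f(p, n);
        memcpy(p0, p, bytes);                                      /* step 1 */
        for (int i = 0; i < n; i++) line_min(f, p, u[i], n, 10.0); /* step 2 */
        for (int i = 0; i < n - 1; i++) memcpy(u[i], u[i + 1], bytes); /* step 3 */
        for (int j = 0; j < n; j++) u[n - 1][j] = p[j] - p0[j];    /* step 4 */
        line_min(f, p, u[n - 1], n, 10.0);                         /* step 5 */
        double f1 = f(p, n);
        if (2.0 * (f0 - f1) <= ftol * (fabs(f0) + fabs(f1)) + 1e-25) break;
    }
    return pass;
}

/* Example non-separable quadratic with its minimum at (1, -2). */
double pw_example(const double *x, int n)
{
    (void)n;
    double a = x[0] - 1.0, b = x[1] + 2.0;
    return a * a + 10.0 * b * b + 2.0 * a * b;
}
```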

According to Powell, M. (1964), this procedure does a good job of finding the minimum of a quadratic form such as equation (2.5). However, the method as described above has a problem: discarding $\mathbf{u}_1$ in favor of $\mathbf{P}_N - \mathbf{P}_0$ at each iteration tends to produce sets of directions that become linearly dependent. When this happens, the method only finds the minimum of f over a subspace of the full N-dimensional space, and therefore gives a wrong answer. This linear dependence can be avoided in several ways.

Press et al. (2007) present a modified version of Powell’s method. It still takes $\mathbf{P}_N - \mathbf{P}_0$ as the new direction: in a valley whose long direction changes slowly, this direction will probably give a good run along the new long direction. The change introduced in Powell’s method is to discard the old direction along which the objective function f made its largest decrease. Although this may seem paradoxical, that direction is likely to be a major component of the new direction, so discarding it helps to avoid linear dependence. However, sometimes it is better not to add a new direction at all. So, how does the algorithm decide whether or not to add a new direction?

For this purpose, define

$$f_0 = f(\mathbf{P}_0), \qquad f_N = f(\mathbf{P}_N), \qquad f_E = f(2\mathbf{P}_N - \mathbf{P}_0)$$

where $f_E$ is the function value at an “extrapolated” point further along the new direction. In addition, let $\Delta f$ be the magnitude of the largest decrease along any single direction during the current iteration. Then:

1. If $f_E \ge f_0$, keep the old set of directions for the next iteration, because the average direction $\mathbf{P}_N - \mathbf{P}_0$ is exhausted.

2. If $2(f_0 - 2f_N + f_E)\left[(f_0 - f_N) - \Delta f\right]^2 \ge (f_0 - f_E)^2 \Delta f$, keep the old set of directions for the next iteration, because either:

a. the decrease along the average direction was not mainly due to the decrease along any single direction; or

b. there seems to be a substantial second derivative along the average direction, and we appear to be close to its minimum.
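The two tests above can be written as a small decision function. This is a C sketch with a hypothetical name. In a full implementation such as powell.c, a return value of 1 would lead to a line minimization along P_N − P_0 and to the replacement of the direction of largest decrease by this new direction.

```c
/* Decides whether the modified Powell method should add the average direction
 * P_N - P_0 (returns 1) or keep the old direction set (returns 0).
 * f0 = f(P_0), fN = f(P_N), fE = f(2 P_N - P_0), del = largest decrease
 * along a single direction during the current iteration (del >= 0). */
int powell_add_direction(double f0, double fN, double fE, double del)
{
    if (fE >= f0)
        return 0;                       /* condition 1 */
    double t1 = (f0 - fN) - del;
    double t2 = f0 - fE;
    if (2.0 * (f0 - 2.0 * fN + fE) * t1 * t1 >= t2 * t2 * del)
        return 0;                       /* condition 2 */
    return 1;
}
```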

This modified version of Powell’s method was coded in C and presented by Press et al. (2007), in a routine called powell.c. The inputs of the routine are: an N-dimensional starting point; a matrix xi, whose columns contain the initial set of directions; and a tolerance ftol. The outputs are: the point P, which is the best point found; the matrix xi, which contains the current direction set; fret, the objective function value at P; and iter, the number of iterations taken. The routine uses a line minimization, also described in Press et al. (2007). It is a subroutine called linmin and uses methods of one-dimensional minimization, such as