UNICAMP LIBRARY SYSTEM
REPOSITORY OF THE SCIENTIFIC AND INTELLECTUAL PRODUCTION OF UNICAMP
Version of attached file: Published Version
Further information on publisher's website:
http://www.compama.co.usb.ve/node/18
Publisher's copyright statement:
©2016 by Universidad Simon Bolivar. All rights reserved.
A globally convergent method for nonlinear least-squares problems based on the Gauss-Newton model with spectral correction
Douglas S. Gonçalves∗, Sandra A. Santos†
CompAMa Vol. 4, No. 2, pp. 7-26, 2016 - Accepted June 3, 2016
In Honor of the 60th Birthday of Professor Marcos Raydan
Abstract
This work addresses a spectral correction for the Gauss-Newton model in the solution of nonlinear least-squares problems within a globally convergent algorithmic framework. The nonmonotone line search of Zhang and Hager is the chosen globalization tool. We show that the search directions obtained from the corrected Gauss-Newton model satisfy the conditions that ensure global convergence under such a line search scheme. A numerical study assesses the impact of using the spectral correction for solving two sets of test problems from the literature.
Keywords: Nonlinear least squares, spectral parameter, Gauss-Newton method, global convergence, numerical tests.
1 Introduction
Nonlinear least-squares (NLS) problems are a class of structured unconstrained minimization problems that has practical importance in several scenarios. From data fitting [1], parameter identification [2], and data assimilation [3] to regularization of ill-posed problems [4], to name a few, many applications may be addressed within the NLS framework.
∗ Department of Mathematics, CCFM, Federal University of Santa Catarina, 88040-900, Florianópolis, SC, Brazil (douglas.goncalves@ufsc.br).
† Department of Applied Mathematics, IMECC, University of Campinas, 13083-970,
Concerning the iterative numerical techniques employed to solve NLS problems, the Gauss-Newton (GN) method is very popular. The associated linear system that must be solved at each iteration may not be safely positive definite, so a scalar and positive correction is adopted, giving rise to the Levenberg-Morrison-Marquardt approach [5-7]. Recent works have proposed adaptive strategies to update the regularizing parameter; see [8-11] and references therein. Newton's method is also a possibility, as long as the second-order derivatives are available [12,13].
The aforementioned scalar correction, however, may be allowed to take either sign (sign-free) in case it plays the role of the second-order derivative matrix. This is precisely where the spectral correction fits in. Such a correction was exploited for NLS with quadratic residues, and from the local perspective [14]. The spectral correction for the Gauss-Newton method was motivated by the spectral step size for the gradient method, proposed by Barzilai and Borwein [15] and Raydan [16]. In the former work, the authors presented a convergence analysis for bidimensional quadratics, whereas in the latter, the convergence was extended to $n$-dimensional strictly convex quadratic problems. General unconstrained minimization problems were addressed by means of the spectral perspective by Raydan in [17]. Building upon these ideas, Spectral Projected Gradient (SPG) methods were proposed by Birgin, Martínez and Raydan [18], applicable to large-scale convex constrained problems in which the projection onto the feasible set can be inexpensively computed. The surveys [19,20] provide a broad perspective of the spectral projected gradient methods. Concerning the solution of nonlinear systems of equations, La Cruz and Raydan [21] used the spectral approach for such a goal, further analyzed by La Cruz, Martínez and Raydan [22] in a gradient-free scenario.
This work addresses the general NLS problem by means of the GN method with the spectral correction. Notation and basic definitions are given in Section 2, including the spectral correction. Section 3 presents the algorithmic framework, discussing some implementation details and the global convergence analysis. The numerical performance is investigated along with two distinct globalization strategies, based on monotone and nonmonotone line searches, as well as with a benchmark. The numerical results for two sets of test problems from the literature are shown and analyzed in Section 4. Final remarks are given in Section 5.
2 Preliminaries and the spectral correction
Given the residual function $F : \mathbb{R}^n \to \mathbb{R}^m$, $m \ge n$, with $F_i$ twice continuously differentiable functions for $i = 1, \dots, m$, the NLS problem is stated as
$$\min_{x \in \mathbb{R}^n} f(x) \tag{1}$$
where $f(x) := \frac{1}{2}\|F(x)\|^2$ and $\|\cdot\|$ is the Euclidean vector norm and the induced matrix operator norm. Denoting the Jacobian of the residual function $F$ computed at $x$ as the matrix $J(x) \in \mathbb{R}^{m \times n}$, the derivatives of the objective function $f$ computed at $x$ are given by $\nabla f(x) = J(x)^T F(x)$ and $\nabla^2 f(x) = J(x)^T J(x) + S(x)$, where $S(x) = \sum_{i=1}^{m} F_i(x) \nabla^2 F_i(x)$. We also adopt the reduced notation $F_k \equiv F(x_k)$, $J_k \equiv J(x_k)$ and $S_k \equiv S(x_k)$.
The iterative scheme of Newton's method applied to the nonlinear equations $\nabla f(x) = J(x)^T F(x) = 0$ in its pure local form is written as
$$x_{k+1} = x_k - (J_k^T J_k + S_k)^{-1} J_k^T F_k. \tag{2}$$
The Gauss-Newton and the Levenberg-Morrison-Marquardt methods are popular alternatives to circumvent the need of computing the matrix $S_k$. In the former, this matrix is just dropped from (2), whereas in the latter a scalar matrix (i.e. diagonal with identical elements) of the form $\mu_k I$ is added to $J_k^T J_k$, leading to the globalized iteration
$$x_{k+1} = x_k - t_k (J_k^T J_k + \mu_k I)^{-1} J_k^T F_k, \tag{3}$$
where $\mu_k \ge 0$ is interpreted as a regularization parameter and $t_k \in (0, 1]$ is the step size, computed to accomplish a sufficient decrease condition upon $f$.
In this work we address the usage of a spectral choice for the parameter $\mu_k$, which leads to a simple second order correction for the Gauss-Newton (GN) model based on the curvature information contained in approximations for the residual Hessians at the current iterate. Notice that $\mu_k$ may be negative in certain iterations, according to the residual values and Hessians. Thus, the second term of the matrix $\nabla^2 f(x_k)$ is approximated by
$$S_k \approx \sum_{i=1}^{m} F_i(x_k)\, \sigma_{i,k}\, I.$$
The scalar $\sigma_{i,k}$ is a spectral parameter (cf. [15-17]), updated by
$$\sigma_{i,k} = \frac{y_{i,k-1}^T s_{k-1}}{s_{k-1}^T s_{k-1}}, \quad i = 1, \dots, m,$$
where $y_{i,k-1} = \nabla F_i(x_k) - \nabla F_i(x_{k-1})$ and $s_{k-1} = x_k - x_{k-1}$. Therefore, $S_k \approx \mu_k I$, and the spectral parameter $\mu_k$ is defined by
$$\mu_k := \sum_{i=1}^{m} F_i(x_k)\, \frac{(\nabla F_i(x_k) - \nabla F_i(x_{k-1}))^T s_{k-1}}{s_{k-1}^T s_{k-1}}, \tag{4}$$
which may be alternatively stated in matrix form as
$$\mu_k = \frac{F_k^T (J_k - J_{k-1})\, s_{k-1}}{s_{k-1}^T s_{k-1}}. \tag{5}$$
By applying the Taylor expansion with integral remainder to (4), we obtain
$$\mu_k = \sum_{i=1}^{m} F_i(x_k)\, \frac{s_{k-1}^T \left( \int_0^1 \nabla^2 F_i(x_{k-1} + t s_{k-1})\, dt \right) s_{k-1}}{s_{k-1}^T s_{k-1}}, \tag{6}$$
so that the parameter $\mu_k$ may be interpreted as well as a weighted sum of the Rayleigh quotients of average residual Hessians, for which the weights are the corresponding components of the residual functions computed at the current iterate.
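As an illustration of formula (5), the following minimal sketch computes the spectral parameter from two consecutive Jacobians and clips it to $[-\mu_{\max}, \mu_{\max}]$. It is not the authors' Fortran code: the function names `spectral_mu` and `safeguarded_mu` and the toy residuals are assumptions made only for this example. The toy residuals have Hessians $2I$ and $-I$, so the weighted sum of Rayleigh quotients in (6) reduces to $2F_1(x_k) - F_2(x_k)$.

```python
def spectral_mu(F_k, J_k, J_prev, s_prev):
    """mu_k = F_k^T (J_k - J_{k-1}) s_{k-1} / (s_{k-1}^T s_{k-1}), as in (5)."""
    ss = sum(v * v for v in s_prev)
    if ss == 0.0:
        raise ValueError("s_{k-1} must be nonzero")
    num = 0.0
    for F_i, row_k, row_prev in zip(F_k, J_k, J_prev):
        # (grad F_i(x_k) - grad F_i(x_{k-1}))^T s_{k-1}
        y_dot_s = sum((a - b) * s for a, b, s in zip(row_k, row_prev, s_prev))
        num += F_i * y_dot_s
    return num / ss


def safeguarded_mu(mu, mu_max):
    """Clip mu to the interval [-mu_max, mu_max]."""
    return max(min(mu, mu_max), -mu_max)


# Hypothetical toy residuals: F_1(x) = ||x||^2 and F_2(x) = 3 - 0.5 ||x||^2.
def residual(x):
    n2 = sum(v * v for v in x)
    return [n2, 3.0 - 0.5 * n2]


def jacobian(x):
    return [[2.0 * v for v in x], [-v for v in x]]


x_prev, x_k = [0.0, 1.0], [1.0, 2.0]
s_prev = [a - b for a, b in zip(x_k, x_prev)]
mu_k = spectral_mu(residual(x_k), jacobian(x_k), jacobian(x_prev), s_prev)
# Here F(x_k) = [5, 0.5], so mu_k = 2 * 5 - 1 * 0.5 = 9.5.
```

Only Jacobian differences are needed, so no residual Hessian is ever formed.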
From a somewhat different perspective, due to the relationship
$$S_k (x_k - x_{k-1}) \approx (J_k - J_{k-1})^T F_k,$$
under the assumption that the effect of a quasi-Newton approximation $B_k$ on the vector $s_{k-1}$ should be similar to the one of the matrix $S_k$ (cf. [23]), one must impose that
$$B_k s_{k-1} = (J_k - J_{k-1})^T F_k.$$
Relaxing such a condition we have
$$s_{k-1}^T B_k s_{k-1} = s_{k-1}^T (J_k - J_{k-1})^T F_k,$$
so that, with the scalar approximation $B_k = \mu_k I$, the expression of $\mu_k$ as defined in (5) is also obtained. Hence, a weak secant condition is verified [12], which means that the matrix $\mu_k I$ carries some information on how the matrix $S_k$ contributes to the curvature of the model along the segment that joins $x_{k-1}$ and $x_{k-1} + s_{k-1}$.
3 A globally convergent algorithmic framework
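Next we present the main algorithm of this work. We call the iteration scheme (3) with the spectral choice for the parameter $\mu_k$ the Gauss-Newton method with spectral correction (GN+SC).
Algorithm 1. Global framework for the GN+SC method
Input: $x_0 \in \mathbb{R}^n$, $\mu_0 \in \mathbb{R}$, $\mu_{\max} > 0$, $\beta > 1$, $\Delta_{\max} > 0$, $\gamma \in (0, 1)$, $Q_0 = 1$, $0 \le \eta_{\min} \le \eta_{\max} \le 1$
1. Set $k = 0$, evaluate $F_k$, $J_k$ and set $C_k = \frac{1}{2}\|F_k\|^2$
2. while the stopping criteria are not satisfied do
3.   if $\mu_k \ge 0$ then
4.     Solve $(J_k^T J_k + \mu_k I) d = -J_k^T F_k$, and set $d_k = d$, $\alpha_k = 0$
5.   else
6.     Choose $\Delta_k \in \left[ \frac{1}{\beta}\|J_k^T F_k\|,\ \min\{\beta \|J_k^T F_k\|, \Delta_{\max}\} \right]$.
7.     Compute $(d_k, \alpha_k)$ as a primal-dual solution of $\min \frac{1}{2}\|J_k d + F_k\|^2 + \frac{\mu_k}{2}\|d\|^2$ s.t. $\|d\| \le \Delta_k$.
8.   end if
9.   Set $t = 1$
10.  while $\frac{1}{2}\|F(x_k + t d_k)\|^2 > C_k + \gamma t\, d_k^T J_k^T F_k$ do
11.    $t = t/2$
12.  end while
13.  $t_k = t$
14.  $x_{k+1} = x_k + t_k d_k$, $s_k = x_{k+1} - x_k$
15.  Evaluate $F_{k+1}$, $J_{k+1}$ and compute $\mu_{k+1} = \max\left\{ \min\left\{ \frac{s_k^T (J_{k+1} - J_k)^T F_{k+1}}{s_k^T s_k},\ \mu_{\max} \right\},\ -\mu_{\max} \right\}$
16.  Choose $\eta_k \in [\eta_{\min}, \eta_{\max}]$ and set $Q_{k+1} = \eta_k Q_k + 1$, $C_{k+1} = \left( \eta_k Q_k C_k + \frac{1}{2}\|F_{k+1}\|^2 \right) / Q_{k+1}$
17.  Set $k = k + 1$
18. end while
Remarks about Algorithm 1.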
1. The algorithm may start with the pure Gauss-Newton step, i.e. $\mu_0 = 0$.
2. The verification at line 3 immediately implies the positive definiteness of $J_k^T J_k + \mu_k I$ whenever $\mu_k > 0$, and in such a case, the linear system of the normal equations at line 4 might be solved by the Cholesky factorization. We have adopted the more stable alternative proposed by Moré in [24], despite its higher computational cost, which consists in solving the linear least-squares problem
$$\min_{d \in \mathbb{R}^n} \frac{1}{2} \left\| \begin{bmatrix} J_k \\ \sqrt{\mu_k}\, I \end{bmatrix} d + \begin{bmatrix} F_k \\ 0 \end{bmatrix} \right\|^2$$
by a QR factorization of the augmented matrix.
Besides, as the current Jacobian might not have full rank, whenever $\mu_k = 0$, first we compute a QR factorization of $J_k$, to verify if the GN step is an option, i.e. if the factor $R$ is safely full-rank. In case it is not, or if the sign-free spectral parameter is negative, then the trust-region subproblem of line 7 is solved to produce a descent direction.
3. The quadratic norm constrained subproblem of line 7 is solved by the Moré-Sorensen strategy [25], so that the pair $(d_k, \alpha_k) \in \mathbb{R}^n \times \mathbb{R}_+$ verifies
$$\left( J_k^T J_k + (\mu_k + \alpha_k) I \right) d_k = -J_k^T F_k \tag{7}$$
with
$$\alpha_k \ge \max\left\{ 0,\ -\lambda_{\min}(J_k^T J_k + \mu_k I) \right\} \tag{8}$$
($\lambda_{\min}(A)$ denotes the smallest eigenvalue of the symmetric matrix $A$), $\|d_k\|^2 \le \Delta_k^2$, $\alpha_k (\|d_k\|^2 - \Delta_k^2) = 0$, and the matrix $J_k^T J_k + (\mu_k + \alpha_k) I$ is positive semidefinite.
The safeguarding scheme that defines the radius $\Delta_k$ takes into account the stationarity measure at the current iterate. It aims to allow the Newton step for the subproblem, namely $d_k = -(J_k^T J_k + \mu_k I)^{-1} (J_k^T F_k)$, to be feasible in case it is a descent direction.
4. The line search procedure of lines 9-13 is based on the work of Zhang and Hager [26], with the updating scheme of line 16. The choice $\eta_k = 0$ provides a monotone line search, whereas any choice $\eta_k \in (0, 1]$ generates a nonmonotone line search. Concerning the sequences $Q_k$ and $C_k$, from the updating expressions at line 16, it is easy to see that $C_{k+1}$ is a convex combination of $C_k$ and $\frac{1}{2}\|F_{k+1}\|^2$. Since $C_0 = \frac{1}{2}\|F_0\|^2$, it follows that $C_k$ is a convex combination of the function values $\frac{1}{2}\|F_0\|^2, \frac{1}{2}\|F_1\|^2, \dots, \frac{1}{2}\|F_k\|^2$. For further details and properties of the (non)monotone line search we refer the reader to [26].
5. From the previous reasoning about the computation of the (descent) step $s_k$, we stress that Algorithm 1 is well defined, that is, it always computes a descent direction $d_k$ and the line search (lines 9-13) finishes after a finite number of steps.
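The backtracking loop of lines 9-13 and the reference-value update of line 16 can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the names `nonmonotone_backtracking` and `update_reference`, the lower bound `t_min` and the one-dimensional test function are assumptions introduced for the example.

```python
def nonmonotone_backtracking(f, x, d, slope, C, gamma=1e-4, t_min=1e-15):
    """Halve t from 1 until f(x + t d) <= C + gamma * t * slope (lines 9-13).

    `slope` stands for d^T J^T F, which must be negative for a descent direction.
    """
    t = 1.0
    while t > t_min:
        trial = [xi + t * di for xi, di in zip(x, d)]
        if f(trial) <= C + gamma * t * slope:
            return t, trial
        t *= 0.5
    raise RuntimeError("line search failed")  # hypothetical failure handling


def update_reference(C, Q, f_new, eta):
    """Line 16: Q_{k+1} = eta Q_k + 1 and C_{k+1} = (eta Q_k C_k + f_new) / Q_{k+1}."""
    Q_new = eta * Q + 1.0
    C_new = (eta * Q * C + f_new) / Q_new
    return C_new, Q_new


def f(x):
    # f(x) = 0.5 * ||F(x)||^2 with the hypothetical residual F(x) = x
    return 0.5 * x[0] ** 2


x0, d0 = [1.0], [-4.0]          # a descent direction that overshoots the minimizer
slope0 = x0[0] * d0[0]          # g^T d with g = J^T F = x
t0, x1 = nonmonotone_backtracking(f, x0, d0, slope0, C=f(x0))
C1, Q1 = update_reference(f(x0), 1.0, f(x1), eta=1.0)
```

With `eta=0.0` the update returns `C_new = f_new`, which recovers the monotone Armijo test; with `eta=1.0` the reference value is the running average of the function values seen so far.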
The global convergence of Algorithm 1 is analyzed in the following, based on the results of Zhang and Hager [26]. We prove that their fundamental assumptions upon the directions are satisfied.
Lemma 1. If $\|J(x)\| \le \zeta$, $\zeta > 0$, for any $x \in \mathbb{R}^n$, then the directions generated by Algorithm 1 satisfy the conditions
a) $g_k^T d_k \le -c_1 \|g_k\|^2$,
b) $\|d_k\| \le c_2 \|g_k\|$,
for all $k$, where $g_k := J_k^T F_k$, and $c_1$ and $c_2$ are positive constants.
Proof. For a given threshold $\mu_+ > 0$, and a fixed iteration $k$, let us split the analysis in two cases.
Case 1. The spectral parameter is safely positive, that is, $\mu_k$ is such that $\mu_+ \le \mu_k \le \mu_{\max}$. It is clear that $\lambda_{\min}(J_k^T J_k + \mu_k I) \ge \mu_+$. Thus
$$\|d_k\| = \|(J_k^T J_k + \mu_k I)^{-1} g_k\| \le \|(J_k^T J_k + \mu_k I)^{-1}\| \|g_k\| \le \frac{1}{\mu_+} \|g_k\|.$$
Additionally
$$g_k^T d_k = -g_k^T (J_k^T J_k + \mu_k I)^{-1} g_k \le \frac{-1}{\zeta^2 + \mu_{\max}} \|g_k\|^2,$$
where the last inequality follows from
$$\lambda_{\max}(J_k^T J_k) + \mu_k = \|J_k^T J_k\| + \mu_k \le \|J_k\|^2 + \mu_{\max} \le \zeta^2 + \mu_{\max},$$
where $\lambda_{\max}(A)$ denotes the largest eigenvalue of the symmetric matrix $A$. Therefore (a) and (b) hold with $c_1 = 1/(\zeta^2 + \mu_{\max})$ and $c_2 = 1/\mu_+$.
Case 2. The spectral parameter is below the threshold, i.e., $-\mu_{\max} \le \mu_k < \mu_+$. In this second case, the following trust-region subproblem is solved
$$\min_{d}\ \frac{1}{2}\|J_k d + F_k\|^2 + \frac{\mu_k}{2} \|d\|^2 \quad \text{s.t.}\ \|d\| \le \Delta_k. \tag{9}$$
Let $(d_k, \alpha_k)$ be a primal-dual solution of (9), and consider the SVD decomposition
$$(J_k^T J_k + \mu_k I + \alpha_k I) = Q_k \Sigma_k Q_k^T,$$
where $Q_k = [q_k^1 \cdots q_k^n] \in \mathbb{R}^{n \times n}$ is orthogonal, $\Sigma_k := |\Lambda_k + \mu_k I + \alpha_k I|$ with $\Lambda_k = \mathrm{diag}(\lambda_k^1, \dots, \lambda_k^n)$, $\lambda_k^1 \ge \dots \ge \lambda_k^n \ge 0$, so that $\sigma_k^i := |\lambda_k^i + \mu_k + \alpha_k|$, and $\sigma_k^1 \ge \cdots \ge \sigma_k^r > 0 = \sigma_k^{r+1} = \cdots = \sigma_k^n$. Notice that $(d_k, \alpha_k)$ is a solution of (9), and due to (8),
$$\lambda_k^i + \mu_k + \alpha_k \ge 0, \quad \forall i.$$
Hence, for all $i$ it holds $|\lambda_k^i + \mu_k + \alpha_k| = \lambda_k^i + \mu_k + \alpha_k$.
The direction $d_k$ may be expressed as
$$d_k = -(J_k^T J_k + \mu_k I + \alpha_k I)^{+} g_k + v_k = -\sum_{\sigma_k^i \ne 0} \frac{(q_k^i)^T g_k}{\lambda_k^i + \mu_k + \alpha_k}\, q_k^i + v_k,$$
where $A^{+}$ denotes the generalized inverse of the matrix $A$ and $v_k$ is a vector in the null space of
$$B_k := (J_k^T J_k + \mu_k I + \alpha_k I),$$
so that $v_k \in \mathrm{span}\{q_k^{r+1}, \dots, q_k^n\}$, whereas, due to (7), $g_k \in \mathrm{span}\{q_k^1, \dots, q_k^r\}$. Therefore $v_k^T g_k = 0$ and we obtain
$$g_k^T d_k = -\sum_{\sigma_k^i \ne 0} \frac{((q_k^i)^T g_k)^2}{\lambda_k^i + \mu_k + \alpha_k}.$$
If $\|d_k\| < \Delta_k$, then $\alpha_k = 0$ and since $\lambda_k^i \le \lambda_{\max}(J_k^T J_k) = \|J_k^T J_k\| \le \zeta^2$, we have
$$g_k^T d_k = -\sum_{\sigma_k^i \ne 0} \frac{((q_k^i)^T g_k)^2}{\lambda_k^i + \mu_k} \le \frac{-1}{\zeta^2 + \mu_+} \sum_{\sigma_k^i \ne 0} ((q_k^i)^T g_k)^2 = \frac{-1}{\zeta^2 + \mu_+} \|g_k\|^2.$$
Now, if $\|d_k\| = \Delta_k$, from (7), reasoning as in the proof of Lemma 2.3 of Nocedal and Yuan [27], and due to the conditions $\frac{1}{\beta}\|g_k\| \le \Delta_k$ and $-\mu_{\max} \le \mu_k$ it follows that
$$\|B_k d_k\| = \|g_k\| \ \Rightarrow\ \lambda_{\min}(B_k) \|d_k\| \le \|g_k\| \ \Rightarrow\ \lambda_{\min}(B_k) \le \frac{\|g_k\|}{\Delta_k}$$
$$\Rightarrow\ \min_i \{\lambda_k^i + \mu_k + \alpha_k\} = \min_i \{\lambda_k^i\} + \mu_k + \alpha_k \le \frac{\|g_k\|}{\Delta_k}$$
$$\Rightarrow\ \alpha_k \le \frac{\|g_k\|}{\Delta_k} - \left( \min_i \{\lambda_k^i\} + \mu_k \right) \le \beta + \mu_{\max}.$$
Consequently,
$$\lambda_k^i + \mu_k + \alpha_k \le \zeta^2 + \mu_+ + \beta + \mu_{\max}.$$
Therefore
$$g_k^T d_k = -\sum_{\sigma_k^i \ne 0} \frac{((q_k^i)^T g_k)^2}{\lambda_k^i + \mu_k + \alpha_k} \le \frac{-\|g_k\|^2}{\zeta^2 + \mu_+ + \beta + \mu_{\max}}.$$
Moreover, because $\Delta_k \le \beta \|g_k\|$, we have
$$\|d_k\| = \Delta_k \le \beta \|g_k\|,$$
and thus the conditions (a) and (b) hold in this case for $c_1 = 1/(\zeta^2 + \mu_+ + \beta + \mu_{\max})$ and $c_2 = \beta$.
Finally, since condition (a) must hold with the smallest of the constants obtained above and condition (b) with the largest, observing that
$$\min\left\{ \frac{1}{\zeta^2 + \mu_+},\ \frac{1}{\zeta^2 + \mu_{\max}},\ \frac{1}{\zeta^2 + \mu_+ + \beta + \mu_{\max}} \right\} = \frac{1}{\zeta^2 + \mu_+ + \beta + \mu_{\max}},$$
setting $c_1 = \frac{1}{\zeta^2 + \mu_+ + \beta + \mu_{\max}}$ and $c_2 = \max\left\{ \frac{1}{\mu_+},\ \beta \right\}$ completes the proof.
✷
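A small numerical check of the Case 1 bounds may help fix ideas. The sketch below is only an illustration under stated assumptions: the $2 \times 2$ data, the values chosen for $\mu_+$ and $\mu_{\max}$, and the helper `regularized_gn_direction_2x2` are all hypothetical, and the linear system is solved by Cramer's rule rather than by the QR-based approach described in the remarks.

```python
import math


def regularized_gn_direction_2x2(J, F, mu):
    """Solve (J^T J + mu I) d = -J^T F for n = 2 by Cramer's rule."""
    g = [sum(J[i][j] * F[i] for i in range(len(F))) for j in range(2)]
    a11 = sum(r[0] * r[0] for r in J) + mu
    a12 = sum(r[0] * r[1] for r in J)
    a22 = sum(r[1] * r[1] for r in J) + mu
    det = a11 * a22 - a12 * a12
    d = [(-g[0] * a22 + g[1] * a12) / det,
         (-g[1] * a11 + g[0] * a12) / det]
    return d, g


# Hypothetical data: J = diag(1, 2), so ||J|| = zeta = 2.
J = [[1.0, 0.0], [0.0, 2.0]]
F = [1.0, 1.0]
zeta, mu_plus, mu_max = 2.0, 0.5, 10.0
mu_k = 0.5                      # satisfies mu_plus <= mu_k <= mu_max (Case 1)

d, g = regularized_gn_direction_2x2(J, F, mu_k)
g_dot_d = sum(gi * di for gi, di in zip(g, d))
g_norm = math.sqrt(sum(gi * gi for gi in g))
d_norm = math.sqrt(sum(di * di for di in d))

c1 = 1.0 / (zeta ** 2 + mu_max)   # Case 1 constant for condition (a)
c2 = 1.0 / mu_plus                # Case 1 constant for condition (b)
holds_a = g_dot_d <= -c1 * g_norm ** 2
holds_b = d_norm <= c2 * g_norm
```

For this data $g_k = (1, 2)$ and $d_k \approx (-0.667, -0.444)$, so both inequalities hold with a comfortable margin.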
As in [26], when the conditions (a) and (b) of Lemma 1 are satisfied by a set of search directions, we say that the Direction Assumption holds. Next, for the reader's convenience, we restate the result of Zhang and Hager [26, Theorem 2.2] that assures the global convergence of Algorithm 1.
Theorem 1. Suppose $f(x)$ is bounded from below and the Direction Assumption holds. Assume that the gradient of the objective function is Lipschitz continuous on an open convex set $\Omega$ that contains the level set $L(x_0) = \{x \in \mathbb{R}^n \mid f(x) \le f(x_0)\}$, where $x_0 \in \mathbb{R}^n$ is a given initial iterate. Then the iterates $x_k$ generated by the nonmonotone line search of Algorithm 1 have the property that
$$\liminf_{k \to \infty} \|\nabla f(x_k)\| = 0.$$
Moreover, if $\eta_{\max} < 1$ then
$$\lim_{k \to \infty} \|\nabla f(x_k)\| = 0.$$
Hence, every convergent subsequence of the iterates approaches a point $x^*$, where $\nabla f(x^*) = 0$.
Proof. We remark that the boundedness from below for the objective function clearly holds for problem (1) and, by Lemma 1, the search directions generated by Algorithm 1 satisfy the Direction Assumption. The remainder of the proof follows as in Theorem 2.2 of [26].
✷
It is worth noticing that the previous analysis is valid for any choice of a bounded sequence $\{\mu_k\}$. Nevertheless, the spectral correction provided at line 15 of Algorithm 1 is a legitimate choice, whose practical performance is examined next.
4 Numerical results
To investigate the efficiency and the robustness of Algorithm 1, we have implemented it in Fortran and solved two collections of problems from the literature: (i) the first 18 nonlinear least squares problems listed in Table 1 were presented by Moré, Garbow and Hillstrom [28]; (ii) the last 22 problems were proposed by Lukšan [29]. Table 1 also brings the number of variables $n$, the number of residual functions (equations) $m$ and a classification according to the residual size at the solution.
Although the test set is composed of small-scale problems ($n \le 100$ and $m \le 500$), it is important to stress the wide variety of such problems: zero (47.5%), small (22.5%) and large (30%) residual problems; badly scaled problems and problems with rank deficient Jacobian.
The tests were performed using the GNU Fortran compiler (64-bit), version 5.3.0, on an Intel MacBook Pro with 2.4 GHz, 8 GB of RAM, 256 KB of L2 cache (per core) and 6 MB of L3 cache.
Tab. 1: Nonlinear least squares test problems
 #  Problem                              n    m  Residual size
 1  Rosenbrock                           2    2  zero
 2  Powell singular                      4    4  zero
 3  Bard                                 3   15  small
 4  Chebyquad                            9    9  zero
 5  Brown and Dennis                     4   20  large
 6  Watson                              12   31  zero
 7  Jennrich and Sampson                 2   10  large
 8  Kowalik and Osborne                  4   11  small
 9  Freudenstein and Roth                2    2  large
10  Box 3D                               3   10  zero
11  Helical valley                       3    3  zero
12  Brown almost linear                 10   10  zero
13  Osborne 1                            5   33  small
14  Osborne 2                           11   65  small
15  Meyer                                3   16  large
16  Linear full rank                    10   10  zero
17  Linear rank one                     10   10  small
18  Linear rank one with zeros           3    3  small
19  Chained Rosenbrock                 100  198  zero
20  Chained Wood                       100  294  zero
21  Chained Powell                     100  196  zero
22  Chained Cragg and Levy             100  245  small
23  Generalized Broyden tridiagonal    100  100  zero
24  Chained Broyden Banded             100  100  zero
25  Extended Freudenstein and Roth     100  198  large
26  Wright and Holt zero residual      100  500  zero
27  Toint quadratic merging            100  294  large
28  Chained exponential                100  199  large
29  Chained serpentine                 100  198  zero
30  Chained and modified HS-47          98  192  large
31  Chained and modified HS-48          98  224  large
32  Chained and modified HS-53          98  224  large
33  Sparse sigmoidal                   100  196  small
34  Sparse exponential                 100  196  small
35  Sparse trigonometric               100  196  zero
36  Countercurrent reactors problem 1  100  100  zero
37  Tridiagonal system                 100  100  zero
38  Structured Jacobian problem        100  100  zero
The parametric and algorithmic choices were
$$\beta = \begin{cases} 100, & \text{if } \|J_0^T F_0\| \|F_0\| \le 10^3, \\ 10, & \text{if } 10^3 < \|J_0^T F_0\| \|F_0\| \le 10^6, \\ 4, & \text{otherwise,} \end{cases}$$
$\gamma = 10^{-4}$, $\mu_0 = 0$, $\mu_{\max} = 10^6$, $\Delta_{\max} = \min\{100,\ 2\|J_0^T F_0\|\}$, $\Delta_0 = \beta \|J_0^T F_0\|$, $\Delta_k = \max\{(1/\beta)\|J_k^T F_k\|,\ \min\{\beta \|J_k^T F_k\|,\ \beta \|s_{k-1}\|,\ \Delta_{\max}\}\}$ and $\eta_k \in \{0, 1\}$.
The starting points were selected as in the references [28,29] and the trust-region subproblems were solved by the routine GQTPAR of MINPACK-2 (see http://ftp.mcs.anl.gov/pub/MINPACK-2/).
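The rules for $\beta$ and for the safeguarded radius $\Delta_k$ stated above can be written compactly as below. This is a sketch of the stated formulas only; the function names `choose_beta` and `trust_radius` are hypothetical and the inputs are plain norms supplied by the caller.

```python
def choose_beta(g0_norm, F0_norm):
    """beta as a function of ||J_0^T F_0|| * ||F_0|| (thresholds 1e3 and 1e6)."""
    prod = g0_norm * F0_norm
    if prod <= 1e3:
        return 100.0
    if prod <= 1e6:
        return 10.0
    return 4.0


def trust_radius(beta, g_norm, s_prev_norm, delta_max):
    """Delta_k = max{ ||g_k|| / beta, min{ beta ||g_k||, beta ||s_{k-1}||, Delta_max } }."""
    return max(g_norm / beta,
               min(beta * g_norm, beta * s_prev_norm, delta_max))


beta_small = choose_beta(10.0, 10.0)        # product 1e2
beta_medium = choose_beta(1e2, 1e2)         # product 1e4
beta_large = choose_beta(1e4, 1e4)          # product 1e8
radius = trust_radius(10.0, 2.0, 0.5, 100.0)
```

The outer `max` keeps the lower bound $\frac{1}{\beta}\|J_k^T F_k\| \le \Delta_k$ used in the proof of Lemma 1 even when $\Delta_{\max}$ is smaller than that bound.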
Setting $\varepsilon_{\text{mach}}$ as the machine precision, the implemented stopping criteria were as follows.
• Convergence to a stationary point was reached (flag 2): $\|J_k^T F_k\| \le \text{gtol}\ (= 10^{-8})$;
• The computed search direction is too small (flag 3): $\|d_k\| \le \text{xtol}\ (= 10^{-14})$;
• The variation between two consecutive iterates is too small (flag 4): $\|s_k\| = \|t_k d_k\| \le \text{xtol}\,(\sqrt{\varepsilon_{\text{mach}}} + \|x_k\|)$;
• The line search failed (flag 5): $t_k \le \text{tolstep}\ (= 10^{-15})$;
• The variation of the objective function is too small (flag 6): $|\|F_{k+1}\|^2 - \|F_k\|^2| \le \text{tolres}\,\|F_k\|^2$, with (i) $\text{tolres} = 10^{-12}$ for the first set of problems; (ii) $\text{tolres} = 10^{-8}$ for the second set, due to problems with very large residual;
• The maximum allowed number of iterations was reached (flag 99): itmax = 400.
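The coded criteria above can be collected in a single routine, as in the following sketch. The function `stopping_flag`, the return value 0 for "keep iterating", the order in which the tests are applied and the exact comparison `k >= itmax` are assumptions of this example; the paper only lists the criteria and their flags.

```python
import math
import sys


def stopping_flag(g_norm, d_norm, s_norm, x_norm, t, res2_new, res2_old, k,
                  gtol=1e-8, xtol=1e-14, tolstep=1e-15, tolres=1e-12, itmax=400):
    """Return the flag of the first satisfied criterion, or 0 to continue."""
    eps_mach = sys.float_info.epsilon
    if g_norm <= gtol:                                   # stationary point
        return 2
    if d_norm <= xtol:                                   # tiny search direction
        return 3
    if s_norm <= xtol * (math.sqrt(eps_mach) + x_norm):  # tiny step
        return 4
    if t <= tolstep:                                     # line search failed
        return 5
    if abs(res2_new - res2_old) <= tolres * res2_old:    # stagnation of ||F||^2
        return 6
    if k >= itmax:                                       # iteration budget
        return 99
    return 0


flag_converged = stopping_flag(1e-9, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 5)
flag_stagnated = stopping_flag(1e-3, 1.0, 1.0, 1.0, 1.0, 100.0, 100.0, 5)
flag_budget = stopping_flag(1e-3, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 400)
flag_continue = stopping_flag(1e-3, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 5)
```

For the second problem set, `tolres=1e-8` would be passed instead of the default.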
Table 2 presents a comparison between the monotone and nonmonotone line-search strategies for Algorithm 1 when applied to the 40 problems listed in Table 1. It contains the number of iterations (IT) and function evaluations (FE) required for each variant of Algorithm 1 to reach one of the coded stopping criteria, indicated in the column flag. The squared residual norm $\|F_k\|^2$ and the gradient norm $\|J_k^T F_k\|$ at the last iterate are also presented.
As we can see from the figures of Table 2, Algorithm 1 integrated with both line-search strategies could solve almost all of the problems, stopping by the tolerance in the gradient norm, which happens for most of the zero and small residual problems, whereas flag 6 (too small variation in the objective function) predominates among the large residual problems. Algorithm 1 with the monotone line search failed only on problem 36 (Countercurrent reactors problem 1), for which the maximum number of iterations was reached. Although the nonmonotone strategy slightly increases (by less than 8) the number of iterations in 9 out of 40 problems, it considerably reduces the number of iterations in the harder problems when compared with the monotone line search (especially problems 15 and 36).
Tab. 2: Numerical results for the monotone and nonmonotone strategies
     monotone                                        nonmonotone
 #   IT   FE  ||F_k||^2    ||J_k^T F_k||  flag       IT   FE  ||F_k||^2    ||J_k^T F_k||  flag
 1   22   32  0.00000E+00  0.00E+00          2       18   25  1.34353E-30  2.45E-14          2
 2   19   20  2.60254E-12  4.11E-09          2       19   20  2.60254E-12  4.11E-09          2
 3    9   10  8.21488E-03  2.55E-10          2        9   10  8.21488E-03  2.55E-10          2
 4   12   25  1.92146E-22  5.50E-11          2       16   33  7.32440E-23  2.29E-11          2
 5   24   30  8.58222E+04  3.13E-03          6       21   24  8.58222E+04  1.25E-02          6
 6   15   16  4.72527E-10  1.70E-10          2       15   16  4.72527E-10  1.70E-10          2
 7    8   11  1.24362E+02  6.30E-08          6        8   11  1.24362E+02  6.30E-08          6
 8   14   16  3.07506E-04  7.09E-10          2       14   16  3.07506E-04  7.09E-10          2
 9   25   27  4.89843E+01  2.22E-05          6       26   27  4.89843E+01  1.35E-05          6
10    8    9  2.25414E-19  1.10E-10          2        8    9  2.25414E-19  1.10E-10          2
11   13   19  2.39151E-19  3.29E-09          2       15   22  6.91772E-33  3.66E-16          2
12    9   19  4.11690E-21  6.92E-11          2        9   19  4.11690E-21  6.92E-11          2
13   31   42  5.46489E-05  1.12E-10          2       23   24  5.46489E-05  2.39E-10          2
14   14   17  4.01377E-02  3.15E-09          2       18   22  4.01377E-02  4.10E-09          2
15  158  261  8.79459E+01  1.28E-04          6       35   53  8.79459E+01  5.46E-04          6
16    1    2  7.14905E-30  2.67E-15          2        1    2  7.14905E-30  2.67E-15          2
17    2    3  2.14286E+00  1.01E-06          6        2    3  2.14286E+00  1.01E-06          6
18    1    2  2.00000E+00  0.00E+00          2        1    2  2.00000E+00  0.00E+00          2
19  180  207  9.23929E-19  2.19E-09          2      169  170  4.80886E-20  1.36E-09          2
20   45   60  4.44440E-19  4.60E-09          2       39   40  4.38771E-20  1.17E-09          2
21   18   19  4.40338E-12  8.60E-09          2       18   19  4.40338E-12  8.60E-09          2
22   20   21  2.52061E+01  5.91E-04          6       20   21  2.52061E+01  5.91E-04          6
23    5    6  1.08621E-22  3.13E-11          2        5    6  1.08621E-22  3.13E-11          2
24    6    7  7.39953E-20  1.27E-09          2        6    7  7.39953E-20  1.27E-09          2
25   18   23  1.19646E+04  2.25E-02          6       25   26  1.19646E+04  1.59E-02          6
26   23   31  1.21017E-09  9.77E-09          2       23   31  1.21017E-09  9.77E-09          2
27   58   75  4.41616E+02  5.04E-03          6       63   65  4.41616E+02  1.11E-02          6
28   10   12  3.87395E+01  1.71E-05          6       10   12  3.87395E+01  1.71E-05          6
29  381  477  4.93038E-32  2.22E-16          2      323  343  2.95274E-20  1.67E-09          2
30   13   31  4.29220E+03  7.64E-03          6       17   26  4.29220E+03  8.51E-03          6
31   24   39  2.51889E+04  7.41E-02          6       23   30  2.51889E+04  6.81E-02          6
32    7   13  2.13121E+01  2.21E-05          6        9   11  2.13121E+01  1.26E-03          6
33    6    7  3.56970E+00  7.36E-07          6        6    7  3.56970E+00  7.36E-07          6
34   11   12  4.93161E-01  1.14E-06          6       11   12  4.93161E-01  1.14E-06          6
35   15   16  1.64116E-21  2.40E-11          2       15   16  1.64116E-21  2.40E-11          2
36  401  569  8.70870E-01  3.27E+00         99       53   54  1.10351E-25  3.64E-12          2
37   13   14  3.82880E-19  2.80E-09          2       13   14  3.82880E-19  2.80E-09          2
38   11   18  8.30418E-19  9.93E-11          2        7   13  7.87067E-16  3.06E-09          2
39   17   48  1.23924E+02  1.94E-02          6       18   20  1.23924E+02  1.05E-03          6
To contextualize our results, we have also solved both sets of test problems using the routine LMDER of MINPACK, with the aforementioned stopping criteria and the scaling option inhibited. LMDER is an implementation of the Levenberg-Marquardt algorithm due to Moré [24], within the trust-region philosophy, based on the model
$$\min\ \frac{1}{2}\|J_k d + F_k\|^2 \quad \text{s.t.}\ \|d\| \le \Delta_k.$$
The results are shown in Table 3, which has the same structure of presentation as Table 2.
Both robustness and efficiency may be better appreciated from the performance profiles [30] of Figure 1. We have considered the number of iterations (left) and the number of function evaluations (right) demanded by the algorithm GN+SC with each variant of the line search, together with LMDER. Concerning the first performance measure, both variants of the line search are comparable in terms of efficiency: 37% and 40% of the problems were solved with the fewest number of iterations by GN+SC with the monotone and the nonmonotone line search, respectively. LMDER is the most efficient one, solving 62.5% of the problems with the fewest number of iterations. When it comes to robustness, however, only the nonmonotone variant of GN+SC managed to solve the whole set of test problems. The monotone GN+SC and LMDER each failed to solve exactly one problem within the 400-iteration budget, reaching a success rate of 97.5%. With respect to the efficiency of the second performance measure, the difference between the three analyzed strategies is more significant: 30%, 45% and 62.5% of the problems were solved with the fewest number of function evaluations by the monotone GN+SC, the nonmonotone GN+SC and LMDER, respectively. In terms of robustness, the outcome is similar to the one of the first performance measure.
Concerning the ratio FE/IT, which assesses the effectiveness of the computed step, the box plots of Figure 2 depict a comparative overview of the strategies under analysis. Considering the two variants of GN+SC, the more
Tab. 3: Numerical results for the LMDER routine
     LMDER (MINPACK)
 #   IT   FE  ||F_k||^2    ||J_k^T F_k||  flag
 1   16   21  0.00000E+00  0.00E+00          2
 2   13   13  5.71987E-13  3.29E-09          2
 3    6    6  8.21488E-03  3.04E-09          2
 4    9   12  6.63760E-26  6.31E-13          2
 5   23   35  8.58222E+04  1.87E-02          6
 6    6    6  1.70822E-09  1.07E-10          2
 7   13   22  1.24362E+02  3.82E-04          6
 8   25   28  3.07506E-04  6.95E-09          2
 9   20   32  4.89843E+01  8.49E-05          6
10    6    6  1.13586E-19  4.88E-10          2
11   11   15  9.54175E-29  1.84E-13          2
12    7    8  2.28724E-25  1.42E-12          2
13   18   21  5.46489E-05  3.23E-08          6
14   11   15  4.01377E-02  1.15E-07          6
15  114  128  8.79459E+01  3.59E-05          6
16    2    2  1.14385E-29  3.38E-15          2
17    2    2  2.14286E+00  1.10E-09          2
18    2    2  2.00000E+00  0.00E+00          2
19  133  136  6.51223E-22  3.72E-10          2
20  102  122  9.52118E-29  3.00E-13          2
21   13   13  1.27500E-12  4.49E-09          2
22   29   32  2.52061E+01  2.36E-03          6
23    6    6  1.67941E-30  6.92E-15          2
24    7    7  9.08124E-31  3.30E-15          2
25   18   25  1.19646E+04  5.33E-02          6
26   13   15  3.05290E-11  1.39E-09          2
27   26   27  4.34919E+02  4.49E-03          6
28    9   19  3.87395E+01  3.94E-04          6
29  265  276  0.00000E+00  0.00E+00          2
30   13   22  4.29220E+03  5.58E-01          6
31   19   31  2.51889E+04  4.82E-02          6
32    6   13  2.13121E+01  1.93E-02          6
33    6    6  3.56970E+00  5.33E-04          6
34   11   20  4.93161E-01  8.38E-04          6
35   12   15  1.60010E-22  1.45E-11          2
36  401  412  8.99843E-01  1.06E+01         99
37   15   15  4.48088E-19  3.02E-09          2
38   10   12  4.68425E-22  3.27E-12          2
39   20   22  1.23924E+02  1.90E-03          6
40  117  127  8.73941E+02  2.20E-02          6
[Figure 1: two panels, "Iterations" (left) and "Function evaluations" (right); curves for GN+SC (monotone), GN+SC (nonmonotone) and LMDER; logarithmic horizontal axis, vertical axis from 0 to 1.]
Fig. 1: Performance profiles (log scaled) of the number of iterations (left) and the number of function evaluations (right).
nonmonotone line search suggests that this variant is preferable to the monotone one. Furthermore, one can see that GN+SC nonmonotone and LMDER behave quite similarly for this measure.
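For readers unfamiliar with performance profiles [30], the sketch below shows how the fraction of problems solved within a factor $\tau$ of the best solver can be computed. It is an illustration only: the function `performance_profile` is hypothetical, and the data are the iteration counts of just three problems (1, 15 and 36) taken from Tables 2 and 3, with failures (flag 99) encoded as infinity, so the output does not reproduce Figure 1.

```python
import math


def performance_profile(costs, taus):
    """costs: dict solver -> list of costs per problem (math.inf marks a failure).

    Returns dict solver -> list with the fraction of problems whose cost is
    within a factor tau of the best cost, for each tau in taus.
    """
    solvers = list(costs)
    n_prob = len(costs[solvers[0]])
    best = [min(costs[s][p] for s in solvers) for p in range(n_prob)]
    profile = {}
    for s in solvers:
        ratios = [costs[s][p] / best[p] for p in range(n_prob)]
        profile[s] = [sum(r <= tau for r in ratios) / n_prob for tau in taus]
    return profile


# Iteration counts for problems 1, 15 and 36 (Tables 2 and 3).
iterations = {
    "GN+SC (monotone)": [22, 158, math.inf],
    "GN+SC (nonmonotone)": [18, 35, 53],
    "LMDER": [16, 114, math.inf],
}
profile = performance_profile(iterations, taus=[1.0, 2.0])
```

The value at $\tau = 1$ measures efficiency (fraction of problems on which a solver is the best), while the limit for large $\tau$ measures robustness (fraction of problems solved at all).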
5 Final remarks
We have proposed a spectral correction for the Gauss-Newton method as an option for solving general nonlinear least-squares problems for which the residual Hessians are not easily available. We have adopted the nonmonotone line-search framework of Zhang and Hager [26] as the globalization tool. The directions computed by our algorithm were proved to satisfy the descent condition assumed by Zhang and Hager, so that their global convergence result applies. A numerical study with problems from the literature corroborates the reliability of adopting the spectral correction for the Gauss-Newton model within a nonmonotone line-search framework. This approach turned out to be robust and competitive with the benchmark LMDER.
Acknowledgements: This work was partially supported by CNPq (Grant
[Figure 2: box plots for GN+SC mon., GN+SC nonmon. and LMDER.]
Fig. 2: Box plots of the ratios FE/IT for the results of GN+SC with the monotone and the nonmonotone line-search strategies (Table 2) as well as the results of LMDER (Table 3). The corresponding medians are 1.285, 1.143 and 1.121.
References
[1] K. Schittkowski. Numerical Data Fitting in Dynamical Systems: A Practical Introduction with Applications and Software. Kluwer Academic Publishers, Norwell (MA), 2002. doi: 10.1007/978-1-4419-5762-7.
[2] Q. Jin and M. Zhong. On the iteratively regularized Gauss-Newton method in Banach spaces with applications to parameter identification problems. Numerische Mathematik, 124:647-683, 2013. doi: 10.1007/s00211-013-0529-5.
[3] S. Gratton, A.S. Lawless, and N.K. Nichols. Approximate Gauss-Newton methods for nonlinear least squares problems. SIAM J. Optim., 18(1):106-132, 2007. doi: 10.1137/050624935.
[4] D. Pradeep and M.P. Rajan. A simplified Gauss-Newton iterative
ill-posed problems. International Journal of Applied and Computational Mathematics, 2:97-112, 2016. doi: 10.1007/s40819-015-0050-x.
[5] K. Levenberg. A method for the solution of certain non-linear problems in least squares. The Quarterly of Applied Mathematics, 2(2):164-168, 1944. http://www.citeulike.org/user/iranmehr/article/10796881.
[6] D. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. SIAM Journal on Applied Mathematics, 11(2):431-441, 1963. doi: 10.1137/0111030.
[7] D.D. Morrison. Methods for nonlinear least squares problems and convergence proofs. In J. Lorell and F. Yagi, editors, Proceedings of the Seminar on Tracking Programs and Orbit Determination, pages 1-9. Jet Propulsion Laboratory, Pasadena, CA, USA, 1960.
[8] S. Bellavia and B. Morini. Strong local convergence properties of adaptive regularized methods for nonlinear least squares. IMA Journal of Numerical Analysis, 35(2):947-968, 2015. doi: 10.1093/imanum/dru021.
[9] J. Fan. Accelerating the modified Levenberg-Marquardt method for nonlinear equations. Mathematics of Computation, 83(287):1173-1187, 2014. doi: 10.1090/S0025-5718-2013-02752-4.
[10] E.W. Karas, S.A. Santos, and B.F. Svaiter. Algebraic rules for computing the regularization parameter of the Levenberg-Marquardt method. Computational Optimization and Applications, online first:1-29, 2016. doi: 10.1007/s10589-016-9845-x.
[11] Y.-X. Yuan. Recent advances in numerical methods for nonlinear equations and nonlinear least squares. Numerical Algebra, Control and Optimization, 1(1):15-34, 2011. doi: 10.3934/naco.2011.1.15.
[12] J.E. Dennis, Jr. and R.B. Schnabel. Numerical methods for unconstrained optimization and nonlinear equations, volume 16 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1996. Corrected reprint of the 1983 original,
[13] J. Nocedal and S.J. Wright. Numerical optimization. Springer Series in Operations Research. Springer-Verlag, New York, 1999. ISBN: 0-387-98793-2.
[14] D.S. Gonçalves and S.A. Santos. Local analysis of a spectral correction for the Gauss-Newton model applied to quadratic residual problems. Numerical Algorithms, online first:1-25, 2016. doi: 10.1007/s11075-016-0101-3.
[15] J. Barzilai and J. Borwein. Two-point step size gradient methods. IMA Journal of Numerical Analysis, 8(1):141-148, 1988. doi: 10.1093/imanum/8.1.141.
[16] M. Raydan. On the Barzilai and Borwein choice of steplength for the gradient method. Technical Report TR90-11, Rice University, Houston, USA, 1990. Revised October, 1991. http://www.caam.rice.edu/tech_reports/1990/TR90-11.pdf.
[17] M. Raydan. The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM Journal on Optimization, 7(1):26-33, 1997. doi: 10.1137/S1052623494266365.
[18] E.G. Birgin, J.M. Martínez, and M. Raydan. Nonmonotone spectral projected gradient methods on convex sets. SIAM Journal on Optimization, 10(4):1196-1211, 2000. doi: 10.1137/S1052623497330963.
[19] E.G. Birgin, J.M. Martínez, and M. Raydan. Spectral projected gradient methods. In C.A. Floudas and P.M. Pardalos, editors, Encyclopedia of Optimization, pages 3652-3659. Springer, 2009.
[20] E.G. Birgin, J.M. Martínez, and M. Raydan. Spectral projected gradient methods: Review and perspectives. Journal of Statistical Software, 60(3), 2014. doi: 10.18637/jss.v060.i03, https://www.ime.usp.br/~egbirgin/publications/bmr5.pdf.
[21] W. La Cruz and M. Raydan. Nonmonotone spectral methods for large-scale nonlinear systems. Optimization Methods and Software, 18(5):583-599, 2003. doi: 10.1080/10556780310001610493.
[22] W. La Cruz, J.M. Martínez, and M. Raydan. Spectral residual method
of equations. Mathematics of Computation, 75(255):1429-1448, 2006. http://www.jstor.org/stable/4100282.
[23] J.E. Dennis, Jr., D.M. Gay, and R.E. Welsch. An adaptive nonlinear least-squares algorithm. ACM Transactions on Mathematical Software, 7:348-368, 1981. doi: 10.1145/355958.355965.
[24] J.J. Moré. The Levenberg-Marquardt algorithm: Implementation and theory. In Numerical Analysis, volume 630 of Lecture Notes in Mathematics, pages 105-116. 1978. doi: 10.1007/BFb0067700.
[25] J.J. Moré and D.C. Sorensen. Computing a trust region step. SIAM J. Sci. Statist. Comput., 4(3):553-572, 1983. doi: 10.1137/0904038.
[26] H. Zhang and W.W. Hager. A nonmonotone line search technique and its application to unconstrained optimization. SIAM Journal on Optimization, 14(4):1043-1056, 2004. doi: 10.1137/S1052623403428208.
[27] J. Nocedal and Y.-X. Yuan. Advances in Nonlinear Programming: Proceedings of the 96 International Conference on Nonlinear Programming, chapter Combining Trust Region and Line Search Techniques, pages 153-175. Springer US, Boston, MA, 1998. doi: 10.1007/978-1-4613-3335-7_7.
[28] J.J. Moré, B.S. Garbow, and K.E. Hillstrom. Testing unconstrained optimization software. ACM Trans. Math. Softw., 7:17-41, March 1981. doi: 10.1145/355934.355936.
[29] L. Lukšan. Hybrid methods for large sparse nonlinear least squares. Journal of Optimization Theory and Applications, 89(3):575-595, 1996. doi: 10.1007/BF02275350.
[30] E.D. Dolan and J.J. Moré. Benchmarking optimization software with performance profiles. Mathematical Programming, 91(2):201-213, 2002.