Insurance: Mathematics and Economics

Full Bayesian analysis of claims reserving uncertainty

Gareth W. Peters (a), Rodrigo S. Targino (a), Mario V. Wüthrich (b,c,*)

(a) University College London, Department of Statistical Science, UK
(b) ETH Zurich, RiskLab, Department of Mathematics, Switzerland
(c) Swiss Finance Institute, Switzerland
Article history: Received May 2016; received in revised form September 2016; accepted 30 December 2016; available online 5 January 2017.

Keywords: chain-ladder method; claims reserving uncertainty; claims development result; Mack's formula; Merz–Wüthrich formula; conditional mean square error of prediction; run-off uncertainty; full Bayesian chain-ladder model
Abstract

We revisit the gamma–gamma Bayesian chain-ladder (BCL) model for claims reserving in non-life insurance. This claims reserving model is usually applied in an empirical Bayesian way, using plug-in estimates for the variance parameters. The advantage of this empirical Bayesian framework is that it allows for closed-form solutions. The main purpose of this paper is to develop the full Bayesian case, also considering prior distributions for the variance parameters, and to study the resulting sensitivities.
© 2017 Elsevier B.V. All rights reserved.
1. Introduction
The chain-ladder (CL) algorithm is probably the most popular method to set the reserves for non-life insurance claims. Originally, the CL method was introduced in a purely algorithmic fashion and it was not based on a stochastic model. Stochastic models underpinning the CL algorithm with a statistical model were only developed much later. The two most commonly used stochastic representations are Mack's (1993) distribution-free CL model and the over-dispersed Poisson (ODP) model of Renshaw and Verrall (1998) and England and Verrall (2002). In this paper we study the gamma–gamma Bayesian chain-ladder (BCL) model which provides, in its non-informative prior limit, another stochastic representation of the CL method. This model was first considered in a claims reserving context by Gisler (2006) and Gisler and Wüthrich
(2008). The typical application of the gamma–gamma BCL model was done under fixed (given) variance parameters, using plug-in estimates for these variance parameters; see Example 2.13 in Wüthrich and Merz (2015) for such an empirical Bayesian analysis. Of course, this (partially) contradicts the Bayesian paradigm. In a full Bayesian approach one should also model these variance parameters with prior distributions. The aim of this paper is to study the influence of such a full Bayesian modeling approach and to compare it to the empirical Bayesian modeling approach used in Wüthrich and Merz (2015). In particular, we aim at analyzing the sensitivity of the prediction uncertainty to the choice of these variance parameters. This is crucial in solvency considerations and it improves the crude estimates usually used in practice.

* Corresponding author. E-mail address: mario.wuethrich@math.ethz.ch (M.V. Wüthrich).
We remark that Bayesian models have recently become very popular in claims reserving. This popularity is partly explained by the fact that Bayesian models can be solved very efficiently with Markov chain Monte Carlo (MCMC) simulation methods. In this paper we aim at finding a class of Bayesian models that provides an analytical closed-form solution for the posterior quantities of interest. This has the advantage of giving deeper mathematical insight and of accelerating the calculations. For an overview of Bayesian methods in claims reserving and corresponding MCMC applications we refer to the literature from this growing area.

Organization. In the next section we introduce the gamma–gamma BCL model and we show that we recover the classical CL reserves in its non-informative prior limit. In Section 3 we provide the prediction uncertainty formulas in the long-term view and the short-term view, respectively. In Section 4 we give several real data examples and analyze the resulting sensitivities to the choice of the prior distributions. All proofs are provided in Appendix A.
http://dx.doi.org/10.1016/j.insmatheco.2016.12.007
2. Gamma–gamma Bayesian chain-ladder model
We introduce the gamma–gamma BCL model in this section. In contrast to Model Assumptions 2.6 in Wüthrich and Merz (2015), we also model the variance parameters in a Bayesian way. We then derive the claims predictors in the non-informative prior limit, which turn out to be identical to the classical CL predictors, as seen in Theorem 2.2.
We denote accident years by $1 \le i \le I$ and development years by $0 \le j \le J$. Throughout, we assume $I > J$. The cumulative claim of accident year $i$ after development year $j$ is denoted by $C_{i,j}$, and $C_{i,J}$ denotes the ultimate claim of accident year $i$. For more background information on the claims reserving problem and the corresponding notation we refer to Chapter 1 in Wüthrich and Merz (2015). We make the following model assumptions.

Model Assumptions 2.1 (Gamma–Gamma BCL Model).
(a) Conditionally given parameter vectors $\boldsymbol{\Theta} = (\Theta_0, \ldots, \Theta_{J-1})$ and $\boldsymbol{\sigma} = (\sigma_0, \ldots, \sigma_{J-1})$, the cumulative claims $(C_{i,j})_{0 \le j \le J}$ are independent (in accident year $i$) Markov processes (in development year $j$) with conditional distributions
\[
C_{i,j+1} \,\big|\, \{C_{i,j}, \boldsymbol{\Theta}, \boldsymbol{\sigma}\} \;\sim\; \Gamma\!\big( C_{i,j}\,\sigma_j^{-2},\; \Theta_j\,\sigma_j^{-2} \big),
\]
for all $1 \le i \le I$ and $0 \le j \le J-1$.

(b) The parameter vectors $\boldsymbol{\Theta}$ and $\boldsymbol{\sigma}$ are independent.

(c) The components $\Theta_j$ of $\boldsymbol{\Theta}$ are independent and $\Gamma(\gamma_j, f_j(\gamma_j - 1))$-distributed with prior parameters $f_j > 0$ and $\gamma_j > 1$ for $0 \le j \le J-1$.

(d) The components $\sigma_j$ of $\boldsymbol{\sigma}$ are independent and $\pi_j$-distributed, having support in $(0, d_j)$ for given constants $0 < d_j < \infty$, for all $0 \le j \le J-1$.

(e) $(\boldsymbol{\Theta}, \boldsymbol{\sigma})$ and $C_{1,0}, \ldots, C_{I,0}$ are independent and $C_{i,0} > 0$, $P$-a.s., for all $1 \le i \le I$.

These model assumptions imply that we have the following CL properties
\[
E\big[ C_{i,j+1} \,\big|\, C_{i,0}, \ldots, C_{i,j}, \boldsymbol{\Theta}, \boldsymbol{\sigma} \big] = \Theta_j^{-1} C_{i,j}, \tag{2.1}
\]
\[
\mathrm{Var}\big( C_{i,j+1} \,\big|\, C_{i,0}, \ldots, C_{i,j}, \boldsymbol{\Theta}, \boldsymbol{\sigma} \big) = \Theta_j^{-2} \sigma_j^2\, C_{i,j}. \tag{2.2}
\]
Thus, for given parameter vectors $\boldsymbol{\Theta}$ and $\boldsymbol{\sigma}$ we obtain a distributional example of Mack's (1993) distribution-free CL model with CL factors $\Theta_j^{-1}$ and variance parameters $\Theta_j^{-2}\sigma_j^2$. Moreover, for $\pi_j(\cdot)$ being single point masses for all $0 \le j \le J-1$ we exactly obtain Model Assumptions 2.6 of Wüthrich and Merz (2015), assuming given (known) variance parameters $\sigma_j^2$.
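As a quick numerical sanity check of the CL properties (2.1)–(2.2), one can simulate a single development step of the gamma–gamma transition in (a); the parameter values below are hypothetical and chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed parameters for one development period j.
theta_j = 1.0 / 1.5   # CL factor is Theta_j^{-1} = 1.5
sigma_j = 2.0
c_ij = 100.0          # current cumulative claim C_{i,j}

# C_{i,j+1} | {C_{i,j}, Theta, sigma} ~ Gamma(shape = C_{i,j}/sigma_j^2,
#                                             rate  = Theta_j/sigma_j^2);
# numpy's gamma sampler takes a scale parameter, i.e. scale = 1/rate.
shape = c_ij / sigma_j**2
rate = theta_j / sigma_j**2
samples = rng.gamma(shape, 1.0 / rate, size=1_000_000)

# Empirical mean and variance should match (2.1)-(2.2):
# Theta_j^{-1} C_{i,j} = 150 and Theta_j^{-2} sigma_j^2 C_{i,j} = 900.
print(samples.mean(), c_ij / theta_j)
print(samples.var(), sigma_j**2 * c_ij / theta_j**2)
```

The shape/rate arithmetic makes the two moment identities exact: shape/rate $= C_{i,j}/\Theta_j$ and shape/rate$^2 = \Theta_j^{-2}\sigma_j^2 C_{i,j}$.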
The main task in claims reserving is to predict the ultimate claims $C_{i,J}$, given observations
\[
\mathcal{D}_t = \{ C_{i,j} :\; i + j \le t,\; 1 \le i \le I,\; 0 \le j \le J \},
\]
at time $t \ge I$. In complete analogy to the derivations in Section 2.2.1 of Wüthrich and Merz (2015), the application of Bayes' rule provides the posterior $\pi$ for the parameters $(\boldsymbol{\Theta}, \boldsymbol{\sigma})$, conditionally given observations $\mathcal{D}_t$, for $t \ge I$,
\[
\pi(\boldsymbol{\theta}, \boldsymbol{\sigma} \,|\, \mathcal{D}_t) \;\propto\; \prod_{j=0}^{J-1} \Bigg[ \theta_j^{\,\gamma_j + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j}/\sigma_j^2 \,-\, 1} \exp\Big\{ -\theta_j \Big( f_j(\gamma_j - 1) + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j+1}/\sigma_j^2 \Big) \Big\} \prod_{i=1}^{(t-j-1)\wedge I} \frac{ ( C_{i,j+1}/\sigma_j^2 )^{ C_{i,j}/\sigma_j^2 } }{ \Gamma( C_{i,j}/\sigma_j^2 ) } \; \pi_j(\sigma_j) \Bigg]. \tag{2.3}
\]
From this we see that the posteriors of $(\Theta_j, \sigma_j)$ are independent for different development periods $0 \le j \le J-1$. A non-informative prior limit for $\Theta_j$ corresponds here to letting $\gamma_j \to 1$ (the terms $f_j(\gamma_j - 1)$ then vanish in (2.3)). Therefore, we speak of the non-informative prior limit when the (component-wise) limits $\boldsymbol{\gamma} \to \mathbf{1}$ are taken, where we set $\boldsymbol{\gamma} = (\gamma_0, \ldots, \gamma_{J-1})$ and $\mathbf{1} = (1, \ldots, 1)$. We have the following theorem for the claims prediction under this non-informative prior limit (the proof is given in the Appendix).

Theorem 2.2 (CL Predictor). Under Model Assumptions 2.1 and for $t \ge I \ge i > t - n \ge t - J$, the Bayesian predictor for $C_{i,n}$ in its non-informative prior limit is given by
\[
\lim_{\boldsymbol{\gamma} \to \mathbf{1}} E\big[ C_{i,n} \,\big|\, \mathcal{D}_t \big] \;=\; C_{i,t-i} \prod_{j=t-i}^{n-1} f_j^{\mathrm{CL}(t)} \;\overset{\mathrm{def.}}{=}\; C_{i,n}^{\mathrm{CL}(t)},
\]
with CL factors $f_j^{\mathrm{CL}(t)}$ defined by
\[
f_j^{\mathrm{CL}(t)} \;=\; \frac{ \sum_{\ell=1}^{(t-j-1)\wedge I} C_{\ell,j+1} }{ \sum_{\ell=1}^{(t-j-1)\wedge I} C_{\ell,j} }.
\]
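For illustration, the CL factors and the CL predictor of Theorem 2.2 can be computed from a cumulative claims triangle as follows; the triangle below is made up and only serves to show the index conventions.

```python
import numpy as np

# Hypothetical cumulative claims triangle with I = 4 accident years and
# J = 3 development years; np.nan marks the part unobserved at time t = I.
C = np.array([
    [100.0, 150.0, 165.0, 170.0],
    [110.0, 160.0, 178.0, np.nan],
    [120.0, 175.0, np.nan, np.nan],
    [130.0, np.nan, np.nan, np.nan],
])
I, J = C.shape[0], C.shape[1] - 1
t = I  # prediction time point

# CL factors f_j^{CL(t)}: ratios of column sums over the accident years
# 1 <= l <= (t-j-1) ^ I observed in both columns j and j+1.
f = np.array([
    C[:min(t - j - 1, I), j + 1].sum() / C[:min(t - j - 1, I), j].sum()
    for j in range(J)
])

# CL predictor of Theorem 2.2: C_{i,J}^{CL(t)} = C_{i,t-i} prod_{j>=t-i} f_j.
ultimates = []
for i in range(1, I + 1):      # accident years, 1-based as in the text
    j_last = t - i             # last observed development year of year i
    ultimates.append(C[i - 1, j_last] * np.prod(f[j_last:]))
print(np.round(ultimates, 2))
```

For the first accident year the product is empty and the ultimate equals the observed $C_{1,J}$.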
Theorem 2.2 states that in the non-informative prior limit $\boldsymbol{\gamma} \to \mathbf{1}$ we exactly obtain the classical CL predictor, see Mack (1993). Thus, we have found another stochastic representation that underpins the CL algorithm with a statistical model. Therefore, we (may) use this model in its non-informative prior limit to analyze the prediction uncertainty of the CL algorithm. In contrast to Remark 2.11 of Wüthrich and Merz (2015), we now obtain this result in the full Bayesian framework, also considering prior distributions for the standard deviation parameters $\boldsymbol{\sigma}$.

3. Prediction uncertainty formulas

3.1. Long-term prediction uncertainty formula

The main purpose of this paper is to analyze the influence of the standard deviation parameters $\boldsymbol{\sigma}$ on the ultimate claim prediction uncertainty, where, in contrast to Chapter 2 of Wüthrich and Merz (2015), these standard deviation parameters $\boldsymbol{\sigma}$ are also modeled with prior distributions. We analyze the prediction uncertainty at time $t \ge I$ in terms of the conditional mean square error of prediction (MSEP) given by
\[
\mathrm{msep}_{C_{i,J}|\mathcal{D}_t}\big( E[ C_{i,J} | \mathcal{D}_t ] \big) = E\Big[ \big( C_{i,J} - E[ C_{i,J} | \mathcal{D}_t ] \big)^2 \,\Big|\, \mathcal{D}_t \Big] = \mathrm{Var}\big( C_{i,J} \,\big|\, \mathcal{D}_t \big) = \mathrm{Var}\big( E[ C_{i,J} | \mathcal{D}_t, \boldsymbol{\sigma} ] \,\big|\, \mathcal{D}_t \big) + E\big[ \mathrm{Var}( C_{i,J} | \mathcal{D}_t, \boldsymbol{\sigma} ) \,\big|\, \mathcal{D}_t \big]. \tag{3.1}
\]
We aim at calculating this conditional MSEP in the gamma–gamma BCL model which provides, in its non-informative prior limit $\boldsymbol{\gamma} \to \mathbf{1}$, an uncertainty estimate for the CL algorithm, that is, we aim at calculating the limit
\[
\mathrm{msep}_{C_{i,J}|\mathcal{D}_t}\big( C_{i,J}^{\mathrm{CL}(t)} \big) \;\overset{\mathrm{def.}}{=}\; \lim_{\boldsymbol{\gamma} \to \mathbf{1}} \mathrm{msep}_{C_{i,J}|\mathcal{D}_t}\big( E[ C_{i,J} | \mathcal{D}_t ] \big).
\]
For $0 \le j \le J-1$ and $t \ge I$, we define on the intervals $(0, d_j)$ the functions
\[
h_j(\sigma_j \,|\, \mathcal{D}_t) \;=\; \frac{ \Gamma\Big( 1 + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j}/\sigma_j^2 \Big) }{ \Big( \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j+1}/\sigma_j^2 \Big)^{1 + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j}/\sigma_j^2} } \; \prod_{i=1}^{(t-j-1)\wedge I} \frac{ ( C_{i,j+1}/\sigma_j^2 )^{ C_{i,j}/\sigma_j^2 } }{ \Gamma( C_{i,j}/\sigma_j^2 ) } \; \pi_j(\sigma_j). \tag{3.2}
\]
This function (3.2) is the un-normalized marginal posterior density of $\sigma_j$, given $\mathcal{D}_t$, in the non-informative prior limit (we refer to Proposition A.2, and in particular to formulas (A.5)–(A.6), for more details). The following lemma proves integrability of $h_j(\sigma_j | \mathcal{D}_t)$ on the interval $(0, d_j)$
(the proof is provided in the Appendix).

Lemma 3.1. Choose $0 \le j \le J-1$ and $t \ge I$. Assume that either $(t-j-1) \wedge I = 1$ or that for the observed data $\mathcal{D}_t$ there exists at least one accident year $1 \le i \le (t-j-1) \wedge I$ with
\[
\frac{C_{i,j+1}}{C_{i,j}} \;\neq\; f_j^{\mathrm{CL}(t)}. \tag{3.3}
\]
Then, we obtain integrability
\[
k_j(\mathcal{D}_t) \;=\; \int_0^{d_j} h_j(\sigma_j \,|\, \mathcal{D}_t)\, d\sigma_j \;<\; \infty.
\]
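The un-normalized density (3.2) is numerically delicate for small $\sigma_j$, so in practice it is convenient to evaluate it on the log scale. The sketch below (with made-up column data) uses `scipy.special.gammaln`; the prior contribution is passed in as a log-density value `log_prior`.

```python
import numpy as np
from scipy.special import gammaln

def log_h_j(sigma, C_col, C_next, log_prior=0.0):
    """Log of the un-normalized marginal posterior h_j in (3.2).

    C_col  : observed cumulative claims (C_{i,j})_i in column j,
    C_next : corresponding claims (C_{i,j+1})_i in column j+1,
    both restricted to accident years 1 <= i <= (t-j-1) ^ I.
    """
    a = C_col / sigma**2                 # gamma shapes C_{i,j}/sigma_j^2
    A = a.sum()
    B = (C_next / sigma**2).sum()        # sum_i C_{i,j+1}/sigma_j^2
    return (gammaln(1.0 + A) - (1.0 + A) * np.log(B)
            + np.sum(a * np.log(C_next / sigma**2) - gammaln(a))
            + log_prior)
```

With a single accident year, $h_j$ collapses to the constant $C_{1,j}/C_{1,j+1}\,\pi_j(\sigma_j)$, which is exactly the boundary case (A.3) used in the proof of Lemma 3.1.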
To obtain a finite conditional MSEP at time $t \ge I$ in the non-informative prior limit we require for the observed data $\mathcal{D}_t$ that
\[
\sum_{\ell=1}^{(t-j-1)\wedge I} C_{\ell,j} \;>\; d_j^2 \qquad \text{holds for all } t-I \le j \le J-1, \tag{3.4}
\]
see also the proof of Theorem 3.2. We then define the variables
\[
\Psi_j^{(t)} \;=\; \frac{ \sigma_j^2 }{ \sum_{\ell=1}^{(t-j-1)\wedge I} C_{\ell,j} - \sigma_j^2 }.
\]
Under (3.4) we obtain, $P$-a.s.,
\[
0 \;<\; \Psi_j^{(t)} \;<\; \frac{ d_j^2 }{ \sum_{\ell=1}^{(t-j-1)\wedge I} C_{\ell,j} - d_j^2 }.
\]
This implies that under (3.3) and (3.4) the following two expectations are well-defined
\[
\widehat{E}\Big[ \sigma_j^2 \big( 1 + \Psi_j^{(t)} \big) \,\Big|\, \mathcal{D}_t \Big] = k_j(\mathcal{D}_t)^{-1} \int_0^{d_j} \sigma_j^2 \big( 1 + \Psi_j^{(t)} \big)\, h_j(\sigma_j | \mathcal{D}_t)\, d\sigma_j, \tag{3.5}
\]
\[
\widehat{E}\Big[ \Psi_j^{(t)} \,\Big|\, \mathcal{D}_t \Big] = k_j(\mathcal{D}_t)^{-1} \int_0^{d_j} \Psi_j^{(t)}\, h_j(\sigma_j | \mathcal{D}_t)\, d\sigma_j, \tag{3.6}
\]
where $\widehat{E}$ denotes the expectation w.r.t. the density $\widehat{\pi}_j(\sigma_j) = k_j(\mathcal{D}_t)^{-1} h_j(\sigma_j | \mathcal{D}_t)$ on the interval $(0, d_j)$. We have the following estimate of the conditional MSEP for the CL algorithm (the proof is provided in the Appendix).

Theorem 3.2. Under Model Assumptions 2.1, (3.3) and (3.4), we have for $t \ge I \ge i > t - J$
\[
\mathrm{msep}_{C_{i,J}|\mathcal{D}_t}\big( C_{i,J}^{\mathrm{CL}(t)} \big) = \big( C_{i,J}^{\mathrm{CL}(t)} \big)^2 \sum_{j=t-i}^{J-1} \frac{1}{ C_{i,j}^{\mathrm{CL}(t)} } \, \widehat{E}\Big[ \sigma_j^2 \big( 1 + \Psi_j^{(t)} \big) \,\Big|\, \mathcal{D}_t \Big] \prod_{n=j+1}^{J-1} \Big( 1 + \widehat{E}\big[ \Psi_n^{(t)} \,\big|\, \mathcal{D}_t \big] \Big) \;+\; \big( C_{i,J}^{\mathrm{CL}(t)} \big)^2 \Bigg[ \prod_{j=t-i}^{J-1} \Big( 1 + \widehat{E}\big[ \Psi_j^{(t)} \,\big|\, \mathcal{D}_t \big] \Big) - 1 \Bigg];
\]
and the non-informative prior limit of the aggregated conditional MSEP is given by
\[
\mathrm{msep}_{\sum_i C_{i,J}|\mathcal{D}_t}\Big( \sum_i C_{i,J}^{\mathrm{CL}(t)} \Big) = \sum_i \mathrm{msep}_{C_{i,J}|\mathcal{D}_t}\big( C_{i,J}^{\mathrm{CL}(t)} \big) + 2 \sum_{i<m} C_{i,J}^{\mathrm{CL}(t)}\, C_{m,J}^{\mathrm{CL}(t)} \Bigg[ \prod_{j=t-i}^{J-1} \Big( 1 + \widehat{E}\big[ \Psi_j^{(t)} \,\big|\, \mathcal{D}_t \big] \Big) - 1 \Bigg],
\]
where the summations run over $t-J+1 \le i \le I$ and $t-J+1 \le i < m \le I$, respectively.

We remark that one could derive similar prediction uncertainty formulas for informative priors with $\boldsymbol{\gamma} > \mathbf{1}$. However, the formulas in the non-informative prior limit $\boldsymbol{\gamma} \to \mathbf{1}$ provide a conditional MSEP estimate for the classical CL algorithm because we have recovered exactly the classical CL predictor in this limit. To evaluate these prediction uncertainty formulas we still need to calculate the expected values given in (3.5) and (3.6). This can be done efficiently with importance sampling or using numerical integration (note that the corresponding density (3.2) is known up to the normalizing constant $k_j(\mathcal{D}_t)$), since one only needs to compute one-dimensional integrals.

3.2. Short-term prediction uncertainty formula
Theorem 3.2 evaluates the long-term prediction uncertainty of the CL algorithm because it considers the prediction uncertainty until all claims are finally settled. Recent solvency developments require a second, short-term uncertainty view. This second uncertainty view studies changes of predictions (one-year updates) when new information becomes available. These one-year updates directly influence the earning statements of insurance companies and are therefore crucial for the financial stability of the companies (for more details we refer to European Commission, 2009, Swiss Financial Market, 2006). We recall the claims development result. Choose $t \ge I \ge i > t - J$ and define the claims development result of accident year $i$ at time $t+1$ by
\[
\mathrm{CDR}_i^{(t+1)} \;=\; E\big[ C_{i,J} \,\big|\, \mathcal{D}_t \big] - E\big[ C_{i,J} \,\big|\, \mathcal{D}_{t+1} \big].
\]
Due to the martingale property of successive predictions (assuming integrability) we have
\[
E\big[ \mathrm{CDR}_i^{(t+1)} \,\big|\, \mathcal{D}_t \big] \;=\; 0.
\]
For this reason the claims development result is predicted by 0. The conditional MSEP viewed from time $t$ for this prediction is then defined by (subject to existence)
\[
\mathrm{msep}_{\mathrm{CDR}_i^{(t+1)}|\mathcal{D}_t}(0) \;=\; E\Big[ \big( \mathrm{CDR}_i^{(t+1)} - 0 \big)^2 \,\Big|\, \mathcal{D}_t \Big] \;=\; \mathrm{Var}\big( E[ C_{i,J} | \mathcal{D}_{t+1} ] \,\big|\, \mathcal{D}_t \big). \tag{3.7}
\]
The following lemma (which holds for any claims reserving method) relates the short-term view (3.7) to the long-term view (3.1) (the proof is provided in the Appendix).
Lemma 3.3. Assume (3.1) exists. Then, for $t \ge I \ge i > t - J$ we have
\[
\mathrm{msep}_{C_{i,J}|\mathcal{D}_t}\big( E[ C_{i,J} | \mathcal{D}_t ] \big) \;\ge\; \mathrm{msep}_{\mathrm{CDR}_i^{(t+1)}|\mathcal{D}_t}(0).
\]
The aim is to study (3.7) in the non-informative prior limit $\boldsymbol{\gamma} \to \mathbf{1}$ of the gamma–gamma BCL model. We define the credibility weights for $t \ge I \ge t - j \ge 1$ and $0 \le j \le J-1$ by
\[
\alpha_j^{(t)} \;=\; \frac{ C_{t-j,j} }{ \sum_{\ell=1}^{t-j} C_{\ell,j} } \;\in\; (0, 1].
\]
We have the following theorem (the proof is provided in the Appendix).

Theorem 3.4. Under Model Assumptions 2.1, (3.3) and (3.4), we have for $t \ge I \ge i > t - J$
\[
\widehat{\mathrm{msep}}_{\mathrm{CDR}_i^{(t+1)}|\mathcal{D}_t}(0) \;\overset{\mathrm{def.}}{=}\; \lim_{\boldsymbol{\gamma} \to \mathbf{1}} \mathrm{msep}_{\mathrm{CDR}_i^{(t+1)}|\mathcal{D}_t}(0) = \big( C_{i,J}^{\mathrm{CL}(t)} \big)^2 \Bigg[ \Big( 1 + \big( \alpha_{t-i}^{(t)} \big)^{-1}\, \widehat{E}\big[ \Psi_{t-i}^{(t)} \,\big|\, \mathcal{D}_t \big] \Big) \prod_{j=t-i+1}^{J-1} \Big( 1 + \alpha_j^{(t)}\, \widehat{E}\big[ \Psi_j^{(t)} \,\big|\, \mathcal{D}_t \big] \Big) - 1 \Bigg];
\]
and the non-informative prior limit of the aggregated conditional MSEP is given by
\[
\widehat{\mathrm{msep}}_{\sum_i \mathrm{CDR}_i^{(t+1)}|\mathcal{D}_t}(0) \;\overset{\mathrm{def.}}{=}\; \lim_{\boldsymbol{\gamma} \to \mathbf{1}} \mathrm{msep}_{\sum_i \mathrm{CDR}_i^{(t+1)}|\mathcal{D}_t}(0) = \sum_i \widehat{\mathrm{msep}}_{\mathrm{CDR}_i^{(t+1)}|\mathcal{D}_t}(0) + 2 \sum_{i<m} C_{i,J}^{\mathrm{CL}(t)}\, C_{m,J}^{\mathrm{CL}(t)} \Bigg[ \Big( 1 + \widehat{E}\big[ \Psi_{t-i}^{(t)} \,\big|\, \mathcal{D}_t \big] \Big) \prod_{j=t-i+1}^{J-1} \Big( 1 + \alpha_j^{(t)}\, \widehat{E}\big[ \Psi_j^{(t)} \,\big|\, \mathcal{D}_t \big] \Big) - 1 \Bigg],
\]
where the summations run over $t-J+1 \le i \le I$ and $t-J+1 \le i < m \le I$, respectively.

Theorem 3.4 provides a prediction uncertainty for the classical CL predictor (obtained by considering the non-informative prior limit $\boldsymbol{\gamma} \to \mathbf{1}$ in the gamma–gamma BCL model).

4. Numerical sensitivities

In this section we study several examples in order to analyze the sensitivities of the prediction uncertainty to the choice of the prior distributions $\pi_j(\cdot)$ for $\sigma_j$. We start by describing the empirical Bayesian case considered in Wüthrich and Merz (2015). In the empirical Bayesian case we use plug-in estimates $\widehat{\sigma}_j$ for $\sigma_j$, which is equivalent to choosing a degenerate prior distribution function $\pi_j(\sigma_j) = \delta_{\{\sigma_j = \widehat{\sigma}_j\}}$ that has a point mass of size 1 in $\widehat{\sigma}_j$. Therefore, we first need to provide the empirical estimates $\widehat{\sigma}_j$. Observe that in the classical distribution-free CL model of Mack (1993) one identifies $s_j^2 = \Theta_j^{-2} \sigma_j^2$; compare formula (3) in Mack (1993) to formula (2.2).

Table 1. CL reserves $R_i^{\mathrm{CL}}(I)$ and rooted conditional long-term MSEPs in the non-informative prior limit of the gamma–gamma BCL model of Theorem 3.2 with priors $\pi_j(\sigma_j) = \delta_{\{\sigma_j = \widehat{\sigma}_j\}}$, Mack's formula (Mack, 1993), and the conditional short-term MSEP of Theorem 3.4, for the data of Table 2.

Accident year i | CL reserves R_i^CL(I) | emp. Bayes msep^{1/2} | Mack's msep^{1/2} | emp. CDR msep^{1/2}
1     |        0 |         |         |
2     |   15.126 |   0.267 |   0.267 |   0.267
3     |   26.257 |   0.914 |   0.914 |   0.884
4     |   34.538 |   3.058 |   3.058 |   2.948
5     |   85.302 |   7.628 |   7.628 |   7.018
6     |  156.494 |  33.341 |  33.341 |  32.470
7     |  286.121 |  73.467 |  73.467 |  66.178
8     |  449.167 |  85.399 |  85.398 |  50.296
9     | 1043.242 | 134.338 | 134.337 | 104.311
10    | 3950.815 | 410.850 | 410.817 | 385.773
Total | 6047.061 | 462.990 | 462.960 | 420.220
If we useMack’s(1993) estimates for these variance parameters we obtain at time t
=
I, for 0≤
j≤
(
J−
1) ∧ (
I−
3)
, the result
s 2 j=
1 I−
j−
2 I−j−1
ℓ=1 Cℓ,j
Cℓ,j+1 Cℓ,j−
fjCL(I)
2,
and if j=
J−
1=
I−
2, we set
s 2 J−1=
min
s 2 J−3,
s2J−2,
s4J−2/
s 2 J−3 =
min
s 2 J−3,
s4J−2/
s 2 J−3
.
SinceΘj−1plays the role of the CL factor, see(2.1), we set for the empirical standard deviation estimate of
σ
j
σ
j=
s 2 j
f CL(I) j,
for 0≤
j≤
J−
1.The model is now fully specified for obtaining the empiri-cal Bayesian estimate (Wüthrich and Merz, 2015) and Mack’s distribution-free CL estimate (Mack, 1993) for the CL reserves and the corresponding conditional MSEPs. We calculate these for the data given in Table A.1 ofWüthrich and Merz(2015), see also
Ta-ble 2in theAppendix(where we additionally provide the
parame-ter estimates
fCL(I)
j ,
sjand
σ
j).This then provides the results given inTable 1where the CL reserves are defined as the prediction of the outstanding loss liabilities at time t
=
I, for I≥
i>
I−
J given byRiCL(I)
=
CCL(I) i,J
−
Ci,I−i.
Note that these results coincide with Tables 2.4 and 2.5 ofWüthrich
and Merz(2015). Moreover, we remark that
I−j−1
ℓ=1 Cℓ,j>
σ
2 j for all 0≤
j≤
J−
1,which, in particular, means that (3.4) holds for any dj
∈
(
σ
j,
I−j−1
ℓ=1 Cℓ,j
)
.

The goal now is to challenge these results by replacing the empirical estimates $\widehat{\sigma}_j$ by non-trivial prior densities $\pi_j(\cdot)$ for $\sigma_j$. To study the sensitivities we make different choices of the prior density $\pi_j(\cdot)$; namely, we choose integers $k \in \{2, \ldots, 20\}$ and select uniform priors for $\pi_j(\cdot)$ on the intervals
\[
(0, d_j) \;=\; (0,\; k \cdot \widehat{\sigma}_j) \qquad \text{for } 0 \le j \le J-1. \tag{4.1}
\]
We emphasize that $k \ge 2$; these choices guarantee that the support $(0, d_j)$ of $\pi_j(\cdot)$ contains the empirical estimate $\widehat{\sigma}_j$, and for increasing $k$ the prior distribution $\pi_j(\cdot)$ of $\sigma_j$ gets less informative; for $k = 20$ there is a positive probability that $\sigma_j$ is up to 20 times as large as the empirical estimate $\widehat{\sigma}_j$. Moreover, we note that in all our examples relation (3.4) is fulfilled for the maximal $d_j = 20 \cdot \widehat{\sigma}_j$.

Fig. 1. Un-normalized posteriors $h_j(\sigma_j | \mathcal{D}_I)$ for $0 \le j \le 8$ and uniform priors $\pi_j(\cdot)$ with $k = 2, 5$; the x-axis is truncated at $6 \cdot \widehat{\sigma}_j$. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
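A sketch of how the plug-in estimates $\widehat{\sigma}_j$ and the uniform prior supports (4.1) could be computed from a full triangle ($I = J+1$); the triangle below is made up for illustration.

```python
import numpy as np

def sigma_hat(C):
    """Plug-in estimates sigma_hat_j = s_hat_j / f_j^{CL(I)} at time t = I,
    with Mack's variance estimates s_hat_j^2; C is an I x (J+1) cumulative
    triangle with np.nan below the anti-diagonal and I = J + 1."""
    I = C.shape[0]
    J = C.shape[1] - 1
    f, s2 = [], []
    for j in range(J):
        n = I - j - 1                     # rows observed in columns j and j+1
        f_j = C[:n, j + 1].sum() / C[:n, j].sum()
        f.append(f_j)
        if n >= 2:                        # Mack's estimator, valid for j <= I-3
            s2.append((C[:n, j] * (C[:n, j + 1] / C[:n, j] - f_j) ** 2).sum()
                      / (n - 1))
        else:                             # j = J-1 = I-2: Mack's extrapolation
            s2.append(min(s2[-2], s2[-1], s2[-1] ** 2 / s2[-2]))
    return np.sqrt(np.array(s2)) / np.array(f)

# Uniform prior supports (0, d_j) = (0, k * sigma_hat_j) as in (4.1):
C = np.array([
    [100.0, 150.0, 165.0, 170.0],
    [110.0, 160.0, 178.0, np.nan],
    [120.0, 175.0, np.nan, np.nan],
    [130.0, np.nan, np.nan, np.nan],
])
k = 5
d = k * sigma_hat(C)
print(d)
```

One should also verify condition (3.4), i.e. that each $d_j^2$ stays below the corresponding column sum, before using these supports.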
For these choices $k \in \{2, \ldots, 20\}$ in (4.1) we calculate the rooted conditional long-term MSEP for aggregated accident years provided in Theorem 3.2 and the rooted conditional CDR MSEP (short-term view) for aggregated accident years provided in Theorem 3.4. By a slight abuse of notation we identify $k = 1$ with the empirical Bayesian estimate provided in Table 1 (note that this corresponds to the priors $\pi_j(\sigma_j) = \delta_{\{\sigma_j = \widehat{\sigma}_j\}}$). The only remaining difficulty for this analysis is the calculation of the one-dimensional integrals (3.5)–(3.6). This can be done either with importance sampling or with numerical integration. We have seen that in our examples it was sufficiently precise to divide the intervals $(0, d_j)$ into $M = 1000$ equally spaced sub-intervals $(a_j^{(m)}, a_j^{(m+1)})$ with $a_j^{(1)} = 0$ and $a_j^{(M+1)} = d_j$, and then to approximate the denominator by the discrete (midpoint) sum
\[
k_j(\mathcal{D}_I) = \int_0^{d_j} h_j(\sigma_j | \mathcal{D}_I)\, d\sigma_j \;\approx\; \sum_{m=1}^{M} \big( a_j^{(m+1)} - a_j^{(m)} \big)\; h_j\Big( \big( a_j^{(m)} + a_j^{(m+1)} \big)/2 \,\Big|\, \mathcal{D}_I \Big),
\]
and analogously for the remaining terms. In Fig. 1 we provide these (un-normalized) posteriors $h_j(\sigma_j | \mathcal{D}_I)$ for $0 \le j \le J-1$ and for uniform priors $\pi_j(\cdot)$ having $k = 2$ (in red) and $k = 5$ (in blue); the dotted green line provides the empirical estimate $\widehat{\sigma}_j$. We see that the bigger the development period $j$, the more volatile is the posterior of $\sigma_j$. The extreme case $j = J-1 = I-2$ means that we have only one single observation $C_{1,J}$ from which we cannot estimate a variance parameter; hence, the posterior of $\sigma_{J-1}$ remains a uniform distribution, see Fig. 1 (bottom right-hand side); this is also seen in derivation (A.3).

Fig. 2. Example of Tables 1 and 2; the x-axis displays the different choices $k \in \{2, \ldots, 20\}$ in (4.1) and $k = 1$ corresponds to the empirical Bayesian estimate: (left-hand side) CL reserves $\sum_i R_i^{\mathrm{CL}(I)}$ aggregated over all accident years $2 \le i \le I$ and confidence bounds of $\pm 2$ rooted conditional MSEPs in the long-term view (total MSEP over all development periods) and the short-term view (CDR MSEP); (right-hand side) rooted conditional MSEPs for $k \in \{2, \ldots, 20\}$ divided by the corresponding empirical rooted conditional MSEPs (which corresponds to $k = 1$).

Fig. 3. Motor third party liability (MTPL) insurance portfolio of a $22 \times 16$ trapezoid; relative increase is 18% (rooted total MSEP) and 16% (rooted CDR MSEP).
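The midpoint-rule approximation of $k_j(\mathcal{D}_I)$ and of the moments (3.5)–(3.6) can be sketched as follows (made-up column data; a uniform prior on $(0, d_j)$ is assumed, whose constant density cancels in the normalized moments).

```python
import numpy as np
from scipy.special import gammaln

def posterior_moments(C_col, C_next, d_j, M=1000):
    """Midpoint-rule approximations of (3.5) and (3.6) for one development
    period under a uniform prior on (0, d_j); requires d_j**2 < sum(C_col),
    i.e. condition (3.4)."""
    edges = np.linspace(0.0, d_j, M + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])     # midpoints (a_m + a_{m+1})/2
    w = np.diff(edges)                       # sub-interval widths

    # log of the un-normalized posterior h_j (3.2), vectorized over the grid;
    # the constant uniform prior density 1/d_j cancels in the ratios below
    a = C_col[:, None] / mid**2
    A = a.sum(axis=0)
    B = (C_next[:, None] / mid**2).sum(axis=0)
    log_h = (gammaln(1.0 + A) - (1.0 + A) * np.log(B)
             + (a * np.log(C_next[:, None] / mid**2) - gammaln(a)).sum(axis=0))
    h = np.exp(log_h - log_h.max())          # stabilized; scale cancels too

    psi = mid**2 / (C_col.sum() - mid**2)    # Psi_j^{(t)}
    k_j = (w * h).sum()                      # ~ k_j(D) up to the common scale
    return ((w * h * mid**2 * (1.0 + psi)).sum() / k_j,   # (3.5)
            (w * h * psi).sum() / k_j)                    # (3.6)
```

Plugging these moments into Theorems 3.2 and 3.4 then yields the rooted conditional MSEPs for each choice of $d_j = k\,\widehat{\sigma}_j$.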
The conditional MSEP results are presented in Fig. 2. The left-hand side of Fig. 2 gives the CL reserves $\sum_i R_i^{\mathrm{CL}(I)}$ aggregated over all accident years $2 \le i \le I$ and the corresponding confidence bounds of $\pm 2\, \mathrm{msep}^{1/2}_{\sum_i C_{i,J}|\mathcal{D}_I}\big( \sum_i C_{i,J}^{\mathrm{CL}(I)} \big)$, see Theorem 3.2, and $\pm 2\, \widehat{\mathrm{msep}}^{1/2}_{\sum_i \mathrm{CDR}_i^{(I+1)}|\mathcal{D}_I}(0)$, see Theorem 3.4, for the different values of $k \in \{2, \ldots, 20\}$. As expected, we see that the confidence bounds increase with increasing prior uncertainty in $\pi_j(\cdot)$ (i.e. increasing $k$); however, the graph seems to stabilize rather quickly (around $k = 5$) in this example. The right-hand side of Fig. 2 shows this increase relative to the empirical Bayesian case. We observe an overall increase of roughly 34% (rooted total MSEP) and 29% (rooted CDR MSEP) relative to the empirical Bayesian one. Thus, we observe an increase, but not a dramatic one. We have chosen uniform priors for $\pi_j(\sigma_j)$ to obtain these findings. If alternatively we had chosen priors with a unique mode in $\widehat{\sigma}_j$ (and the same support as the uniform ones), the results would be closer to the empirical Bayesian ones because in this second case the priors are more informative (around the empirical estimates $\widehat{\sigma}_j$).

In the remainder we present more examples of different lines of business. In Fig. 3 we consider a motor third party liability (MTPL) insurance portfolio. For this portfolio we have $I = 22$ accident years and final development period $J = 15$. We see a similar picture as in the previous example: the increase of the rooted conditional MSEP stabilizes around $k = 5$ and the total increase relative to the empirical Bayesian estimate is roughly 18% and 16%, respectively. The reason for this fast convergence is that the posterior uncertainty in $\sigma_j$, given $\mathcal{D}_I$, is rather small for all development periods $j$, regardless of the choice of the prior distribution. This comes from the fact that we have a large trapezoid ($22 \times 16$) and even for the latest development period $J = 15$ we have 7 observations.

In Fig. 4 we consider a general liability insurance portfolio of size $I = 21$ and $J = 13$; the relative increase amounts to roughly 14% and 12%, respectively, and the picture is very similar to MTPL insurance because the posterior uncertainty in $\sigma_j$ is again small due to the size of the trapezoid and the fact that we have 8 observations in the last column $J = 13$.

Fig. 4. General liability insurance portfolio of a $21 \times 14$ trapezoid; relative increase is 14% (rooted total MSEP) and 12% (rooted CDR MSEP).

Fig. 5. Industrial property insurance portfolio of a $15 \times 7$ trapezoid; relative increase is 14% (rooted total MSEP) and 13% (rooted CDR MSEP).

In Fig. 5 we consider an industrial property insurance portfolio of size $I = 15$ and $J = 6$; the relative increase amounts to roughly 14% and 13%, respectively. Thus, again we obtain a rather similar picture to the previous examples. Note that here we have 9 observations in the last column $J = 6$.

As a rule of thumb we can say that if the right-end point of the posterior functions $h_j(\sigma_j | \mathcal{D}_I)$, given in Fig. 1, is sufficiently smooth, then increasing the support of the uniform prior $\pi_j(\cdot)$ does not substantially increase the uncertainty estimates. This holds true as long as we do not get too close to the threshold (3.4), because at the threshold the conditional MSEP becomes infinite; see also Theorem 2.12 in Wüthrich and Merz (2015).

5. Conclusions
We have considered the full Bayesian version of the gamma–gamma BCL model. This model provides the CL reserves in its non-informative prior limit. Therefore, it can be used to analyze the prediction uncertainty of the CL claims reserving method. The main purpose of this paper was to analyze the contribution of the variance parameters to the prediction uncertainty in the case where these variance parameters are also modeled in a Bayesian way. In our examples this additional source of uncertainty increases the rooted conditional MSEP between 10% and 40% (compared to the empirical Bayesian version presented in Wüthrich and Merz, 2015). This increase is bigger for triangles ($I = J + 1$) and becomes smaller for trapezoids the bigger the difference $I - J$ is.

Appendix A. Proofs
Proof of Theorem 2.2. For explicit details we refer to Section 2.2.1 in Wüthrich and Merz (2015). Since, conditionally on $\boldsymbol{\sigma}$, the Model Assumptions 2.6 in Wüthrich and Merz (2015) are fulfilled, we obtain the following formula from Lemma 2.7, Corollary 2.8 and Theorem 2.10 of Wüthrich and Merz (2015), for $t \ge I \ge i > t - n \ge t - J$,
\[
E\big[ C_{i,n} \,\big|\, \mathcal{D}_t, \boldsymbol{\sigma} \big] \;=\; C_{i,t-i} \prod_{j=t-i}^{n-1} \Big( \omega_j^{(t)} f_j^{\mathrm{CL}(t)} + \big( 1 - \omega_j^{(t)} \big) f_j \Big), \tag{A.1}
\]
with credibility weights
\[
\omega_j^{(t)} \;=\; \frac{ \sum_{\ell=1}^{(t-j-1)\wedge I} C_{\ell,j} }{ \sum_{\ell=1}^{(t-j-1)\wedge I} C_{\ell,j} + \sigma_j^2 (\gamma_j - 1) } \;\in\; (0, 1).
\]
Using the tower property and posterior independence (2.3), this implies
\[
E\big[ C_{i,n} \,\big|\, \mathcal{D}_t \big] = E\Big[ E\big[ C_{i,n} \,\big|\, \mathcal{D}_t, \boldsymbol{\sigma} \big] \,\Big|\, \mathcal{D}_t \Big] = C_{i,t-i} \prod_{j=t-i}^{n-1} E\Big[ \omega_j^{(t)} f_j^{\mathrm{CL}(t)} + \big( 1 - \omega_j^{(t)} \big) f_j \,\Big|\, \mathcal{D}_t \Big] = C_{i,t-i} \prod_{j=t-i}^{n-1} \Big( E\big[ \omega_j^{(t)} \,\big|\, \mathcal{D}_t \big]\, f_j^{\mathrm{CL}(t)} + \big( 1 - E\big[ \omega_j^{(t)} \,\big|\, \mathcal{D}_t \big] \big) f_j \Big).
\]
There remains the consideration of the conditional expectations of the credibility weights. The bounded support assumption on $\sigma_j$ implies, $P$-a.s.,
\[
\underline{\omega}_j^{(t)} \;\overset{\mathrm{def.}}{=}\; \frac{ \sum_{\ell=1}^{(t-j-1)\wedge I} C_{\ell,j} }{ \sum_{\ell=1}^{(t-j-1)\wedge I} C_{\ell,j} + d_j^2 (\gamma_j - 1) } \;\le\; \omega_j^{(t)} \;\le\; 1. \tag{A.2}
\]
Since the $\mathcal{D}_{t-1}$-measurable lower bound $\underline{\omega}_j^{(t)}$ converges to 1 as $\gamma_j \to 1$, the claim follows. $\square$

Next we prove Lemma 3.1, which provides integrability of $h_j(\sigma_j | \mathcal{D}_t)$ on the interval $(0, d_j)$. We anticipate that the function $h_j(\cdot | \mathcal{D}_t)$ can be seen as the non-informative prior limit of the un-normalized posterior density of $\sigma_j$, conditionally given the observations $\mathcal{D}_t$. We provide the explicit explanation, the relevant notation and the corresponding convergence statement in Proposition A.2 and formula (A.6).

Proof of Lemma 3.1. The case
$(t-j-1) \wedge I = 1$ can easily be treated because in that case we have
\[
k_j(\mathcal{D}_t) \;=\; \int_0^{d_j} \frac{ C_{1,j} }{ C_{1,j+1} }\, \pi_j(\sigma_j)\, d\sigma_j \;=\; \frac{ C_{1,j} }{ C_{1,j+1} } \;<\; \infty. \tag{A.3}
\]
To prove the integrability on $(0, d_j)$ in the case $(t-j-1) \wedge I > 1$ we need to check that $h_j(\sigma_j | \mathcal{D}_t)$ is well-behaved for $\sigma_j \to 0$. We consider the following logarithm
\[
\log h_j(\sigma_j | \mathcal{D}_t) = \log\Bigg[ \Big( \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j+1}/\sigma_j^2 \Big)^{ -\big( 1 + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j}/\sigma_j^2 \big) } \prod_{i=1}^{(t-j-1)\wedge I} \big( C_{i,j+1}/\sigma_j^2 \big)^{ C_{i,j}/\sigma_j^2 } \Bigg] + \log \Gamma\Big( 1 + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j}/\sigma_j^2 \Big) - \sum_{i=1}^{(t-j-1)\wedge I} \log \Gamma\big( C_{i,j}/\sigma_j^2 \big) + \log \pi_j(\sigma_j). \tag{A.4}
\]
The first term on the right-hand side behaves for $\sigma_j \to 0$ as
\[
\log \sigma_j^2 \;+\; \sum_{i=1}^{(t-j-1)\wedge I} \frac{ C_{i,j} }{ \sigma_j^2 } \log\Bigg( \frac{ C_{i,j+1} }{ \sum_{\ell=1}^{(t-j-1)\wedge I} C_{\ell,j+1} } \Bigg) \;+\; O(1).
\]
The terms involving the gamma functions $\Gamma(\cdot)$ are treated by Stirling's formula
\[
\log \Gamma(x) = \tfrac{1}{2} \log(2\pi) + \big( x - \tfrac{1}{2} \big) \log x - x + o(1), \qquad \text{as } x \to \infty.
\]
The identity $\Gamma(1 + x) = x\, \Gamma(x)$ and Stirling's formula applied to the two remaining gamma terms of (A.4) provide for $\sigma_j \to 0$
\[
\log\Big( \sum_i C_{i,j}/\sigma_j^2 \Big) + \Big( \sum_i C_{i,j}/\sigma_j^2 - \tfrac{1}{2} \Big) \log\Big( \sum_i C_{i,j}/\sigma_j^2 \Big) - \sum_i C_{i,j}/\sigma_j^2 - \sum_i \Big[ \big( C_{i,j}/\sigma_j^2 - \tfrac{1}{2} \big) \log\big( C_{i,j}/\sigma_j^2 \big) - C_{i,j}/\sigma_j^2 \Big] + O(1) = -\log( \sigma_j^2 ) + \sum_i \frac{ C_{i,j} }{ \sigma_j^2 } \log\Bigg( \frac{ \sum_\ell C_{\ell,j} }{ C_{i,j} } \Bigg) + \frac{ 1 - (t-j-1) \wedge I }{ 2 } \log \sigma_j^2 + O(1),
\]
where all summations run over $1 \le i, \ell \le (t-j-1) \wedge I$. If we merge these two terms we obtain for $\sigma_j \to 0$ the behavior
\[
\sum_i \frac{ C_{i,j} }{ \sigma_j^2 } \log\Bigg( \frac{ C_{i,j+1} }{ C_{i,j} } \, \frac{ \sum_\ell C_{\ell,j} }{ \sum_\ell C_{\ell,j+1} } \Bigg) + \frac{ 1 - (t-j-1) \wedge I }{ 2 } \log( \sigma_j^2 ) + O(1) = \frac{ 1 }{ \sigma_j^2 } \Bigg[ \sum_i C_{i,j} \log\Bigg( \frac{ C_{i,j+1} }{ C_{i,j} } \Bigg) - \sum_i C_{i,j} \log f_j^{\mathrm{CL}(t)} \Bigg] + \frac{ 1 - (t-j-1) \wedge I }{ 2 } \log( \sigma_j^2 ) + O(1).
\]
By assumption, for at least one accident year $1 \le i \le (t-j-1) \wedge I$ we have $C_{i,j+1}/C_{i,j} \neq f_j^{\mathrm{CL}(t)}$. Therefore, Jensen's inequality is strict in the following derivation
\[
\sum_i C_{i,j} \log\Bigg( \frac{ C_{i,j+1} }{ C_{i,j} } \Bigg) = \sum_\ell C_{\ell,j} \, \sum_i \frac{ C_{i,j} }{ \sum_\ell C_{\ell,j} } \log\Bigg( \frac{ C_{i,j+1} }{ C_{i,j} } \Bigg) < \sum_\ell C_{\ell,j} \, \log\Bigg( \sum_i \frac{ C_{i,j} }{ \sum_\ell C_{\ell,j} } \, \frac{ C_{i,j+1} }{ C_{i,j} } \Bigg) = \sum_i C_{i,j} \log f_j^{\mathrm{CL}(t)}.
\]
Therefore, we can define the strictly positive constant
\[
\varepsilon \;=\; \sum_i C_{i,j} \log f_j^{\mathrm{CL}(t)} - \sum_i C_{i,j} \log\Bigg( \frac{ C_{i,j+1} }{ C_{i,j} } \Bigg) \;>\; 0.
\]
This implies for $\sigma_j \to 0$
\[
h_j(\sigma_j | \mathcal{D}_t) \;=\; \exp\Bigg\{ -\frac{ \varepsilon }{ \sigma_j^2 } + \frac{ 1 - (t-j-1) \wedge I }{ 2 } \log( \sigma_j^2 ) + O(1) \Bigg\} \; \pi_j(\sigma_j).
\]
This provides integrability in 0 and completes the proof. $\square$

Lemma A.1. Under Model Assumptions 2.1, we have in the non-informative prior limit for $t \ge I \ge i > t - J$ (subject to existence)
\[
\mathrm{msep}_{C_{i,J}|\mathcal{D}_t}\big( C_{i,J}^{\mathrm{CL}(t)} \big) = \lim_{\boldsymbol{\gamma} \to \mathbf{1}} \mathrm{msep}_{C_{i,J}|\mathcal{D}_t}\big( E[ C_{i,J} | \mathcal{D}_t ] \big) = \lim_{\boldsymbol{\gamma} \to \mathbf{1}} E\big[ \mathrm{Var}( C_{i,J} | \mathcal{D}_t, \boldsymbol{\sigma} ) \,\big|\, \mathcal{D}_t \big].
\]
Proof of Lemma A.1. For the first term in the conditional MSEP decomposition (3.1) we have, using (A.1) (see also Lemma 2.7 and Corollary 2.8 in Wüthrich and Merz (2015)) and using posterior independence (2.3),
\[
\mathrm{Var}\big( E[ C_{i,J} | \mathcal{D}_t, \boldsymbol{\sigma} ] \,\big|\, \mathcal{D}_t \big) = C_{i,t-i}^2 \, \mathrm{Var}\Bigg( \prod_{j=t-i}^{J-1} \Big( \omega_j^{(t)} f_j^{\mathrm{CL}(t)} + \big( 1 - \omega_j^{(t)} \big) f_j \Big) \,\Bigg|\, \mathcal{D}_t \Bigg) = C_{i,t-i}^2 \Bigg[ \prod_{j=t-i}^{J-1} E\Big[ \Big( \omega_j^{(t)} f_j^{\mathrm{CL}(t)} + \big( 1 - \omega_j^{(t)} \big) f_j \Big)^2 \,\Big|\, \mathcal{D}_t \Big] - \Bigg( \prod_{j=t-i}^{J-1} E\Big[ \omega_j^{(t)} f_j^{\mathrm{CL}(t)} + \big( 1 - \omega_j^{(t)} \big) f_j \,\Big|\, \mathcal{D}_t \Big] \Bigg)^2 \Bigg].
\]
Since $\omega_j^{(t)}$ is sandwiched, $P$-a.s., between the $\mathcal{D}_{t-1}$-measurable lower bound $\underline{\omega}_j^{(t)}$, which converges to 1 as $\gamma_j \to 1$, and 1, see (A.2), we obtain that the right-hand side of the last identity converges to 0 as $\boldsymbol{\gamma} \to \mathbf{1}$. This proves the lemma. $\square$

Formula (2.3) provides the posterior (marginal) distribution of $\sigma_j$, conditionally given $\mathcal{D}_t$. This posterior distribution depends on the explicit choice of the prior parameter $\gamma_j > 1$; to illustrate this we use the following notation $\pi_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t) = \pi_j(\sigma_j | \mathcal{D}_t)$.

Proposition A.2. Choose $0 \le j \le J-1$ and $t \ge I$, and assume that either $(t-j-1) \wedge I = 1$ or that for the observed data $\mathcal{D}_t$ there exists at least one accident year $1 \le i \le (t-j-1) \wedge I$ with $C_{i,j+1}/C_{i,j} \neq f_j^{\mathrm{CL}(t)}$. Consider $S_j^{(\gamma_j)} \sim \pi_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t)$. We have the following convergence in distribution
\[
\lim_{\gamma_j \to 1} S_j^{(\gamma_j)} \;\overset{(d)}{=}\; S_j \;\sim\; \widehat{\pi}_j(\sigma_j) = k_j(\mathcal{D}_t)^{-1}\, h_j(\sigma_j | \mathcal{D}_t).
\]
We use the notation $S_j^{(\gamma_j)}$ for the standard deviation parameter $\sigma_j$ in the above proposition to indicate more clearly the dependence of the law of the corresponding random variable $S_j^{(\gamma_j)}$ on the choice of the prior parameter $\gamma_j$, and to clearly distinguish the random variables $S_j^{(\gamma_j)}$ from their potential realizations $\sigma_j$ in the domain $(0, d_j)$.

Proof of Proposition A.2. Convergence in distribution is implied by the corresponding convergence of the moment generating functions, that is, for any $r \in \mathbb{R}$ we aim at proving
\[
\lim_{\gamma_j \to 1} E\big[ \exp\{ r S_j^{(\gamma_j)} \} \,\big|\, \mathcal{D}_t \big] \;=\; E\big[ \exp\{ r S_j \} \,\big|\, \mathcal{D}_t \big].
\]
From (2.3) we obtain the posterior distribution of $\boldsymbol{\sigma}$, conditionally given $\mathcal{D}_t$ (note that we have kernels of gamma densities of $\theta_j$ in formula (2.3)),
\[
\pi(\boldsymbol{\sigma} | \mathcal{D}_t) = \int \pi(\boldsymbol{\theta}, \boldsymbol{\sigma} | \mathcal{D}_t)\, d\boldsymbol{\theta} \;\propto\; \prod_{j=0}^{J-1} \Bigg[ \Big( f_j(\gamma_j - 1) + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j+1}/\sigma_j^2 \Big)^{ -\big( \gamma_j + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j}/\sigma_j^2 \big) } \, \Gamma\Big( \gamma_j + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j}/\sigma_j^2 \Big) \prod_{i=1}^{(t-j-1)\wedge I} \frac{ ( C_{i,j+1}/\sigma_j^2 )^{ C_{i,j}/\sigma_j^2 } }{ \Gamma( C_{i,j}/\sigma_j^2 ) } \; \pi_j(\sigma_j) \Bigg]. \tag{A.5}
\]
This again provides posterior independence between the different development years $0 \le j \le J-1$. We define the corresponding marginal functions on the intervals $(0, d_j)$
\[
h_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t) = \frac{ \Gamma\Big( \gamma_j + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j}/\sigma_j^2 \Big) }{ \Big( f_j(\gamma_j - 1) + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j+1}/\sigma_j^2 \Big)^{ \gamma_j + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j}/\sigma_j^2 } } \; \prod_{i=1}^{(t-j-1)\wedge I} \frac{ ( C_{i,j+1}/\sigma_j^2 )^{ C_{i,j}/\sigma_j^2 } }{ \Gamma( C_{i,j}/\sigma_j^2 ) } \; \pi_j(\sigma_j); \tag{A.6}
\]
thus the density of $S_j^{(\gamma_j)}$ is given by
\[
\pi_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t) \;=\; \frac{ h_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t) }{ \int_0^{d_j} h_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t)\, d\sigma_j },
\]
and we aim at proving the following convergence of moment generating functions
\[
\lim_{\gamma_j \to 1} E\big[ \exp\{ r S_j^{(\gamma_j)} \} \,\big|\, \mathcal{D}_t \big] = \lim_{\gamma_j \to 1} \frac{ \int_0^{d_j} e^{r \sigma_j}\, h_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t)\, d\sigma_j }{ \int_0^{d_j} h_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t)\, d\sigma_j } = \frac{ \int_0^{d_j} e^{r \sigma_j}\, h_j(\sigma_j | \mathcal{D}_t)\, d\sigma_j }{ k_j(\mathcal{D}_t) } = E\big[ \exp\{ r S_j \} \,\big|\, \mathcal{D}_t \big].
\]
Firstly, observe that we have the following point-wise (in $\sigma_j$) convergence on the interval $(0, d_j)$:
\[
\lim_{\gamma_j \to 1} h_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t) \;=\; h_j(\sigma_j | \mathcal{D}_t).
\]
Thus, there remains to prove the existence of an integrable upper bound, uniformly in $\gamma_j$, so that one can apply Lebesgue's dominated convergence theorem, which proves the claim.

To find an integrable upper bound we first remark that for all $r \in \mathbb{R}$ we have the uniform bound $e^{r \sigma_j} \le e^{|r| d_j}$ on $(0, d_j)$. There remains the consideration of $h_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t)$. We set $c_j = \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j}$ and $c_{j+1} = \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j+1}$, and
\[
p(\sigma_j) \;=\; \prod_{i=1}^{(t-j-1)\wedge I} \big( C_{i,j+1}/\sigma_j^2 \big)^{ C_{i,j}/\sigma_j^2 } \, \Gamma\big( C_{i,j}/\sigma_j^2 \big)^{-1} \, \pi_j(\sigma_j).
\]
Then we have
\[
h_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t) = \Big( f_j(\gamma_j - 1) + c_{j+1}/\sigma_j^2 \Big)^{ -\big( \gamma_j + c_j/\sigma_j^2 \big) } \, \Gamma\big( \gamma_j + c_j/\sigma_j^2 \big)\, p(\sigma_j) = p(\sigma_j) \int_0^\infty x^{ \gamma_j + c_j/\sigma_j^2 - 1 } \exp\big\{ -\big( f_j(\gamma_j - 1) + c_{j+1}/\sigma_j^2 \big)\, x \big\}\, dx = p(\sigma_j) \int_0^\infty \exp\big\{ (\gamma_j - 1)\big( \log x - f_j x \big) \big\}\, x^{ c_j/\sigma_j^2 } \exp\big\{ -x\, c_{j+1}/\sigma_j^2 \big\}\, dx.
\]
Define the function $\ell(x) = \log x - f_j x$. It satisfies $\lim_{x \to 0} \ell(x) = -\infty$ and $\lim_{x \to \infty} \ell(x) = -\infty$. There are two cases:

(i) $\ell(x) \le 0$ for all $x \in [0, \infty)$. In this case we have for $\gamma_j \ge 1$
\[
h_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t) = p(\sigma_j) \int_0^\infty \exp\big\{ (\gamma_j - 1)\, \ell(x) \big\}\, x^{ c_j/\sigma_j^2 } \exp\big\{ -x\, c_{j+1}/\sigma_j^2 \big\}\, dx \le p(\sigma_j) \int_0^\infty x^{ c_j/\sigma_j^2 } \exp\big\{ -x\, c_{j+1}/\sigma_j^2 \big\}\, dx = p(\sigma_j)\, \frac{ \Gamma\big( 1 + c_j/\sigma_j^2 \big) }{ \big( c_{j+1}/\sigma_j^2 \big)^{ 1 + c_j/\sigma_j^2 } } = h_j(\sigma_j | \mathcal{D}_t).
\]
Lemma 3.1 then provides the integrable upper bound in this first case (i), and we can apply Lebesgue's dominated convergence theorem, which proves the claim.

(ii) There exists $x \in (0, \infty)$ such that $\ell(x) > 0$. In this case concavity of $\log x$ and of $\ell(x)$, respectively, implies that there exist $0 < x_1 < x_2 < \infty$ such that $\ell(x) \le 0$ on $[0, x_1] \cup [x_2, \infty)$ and $\ell(x) > 0$ on $[x_1, x_2]$. We then have
\[
h_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t) \le p(\sigma_j) \int_0^\infty x^{ c_j/\sigma_j^2 } \exp\big\{ -x\, c_{j+1}/\sigma_j^2 \big\}\, dx + p(\sigma_j) \int_{x_1}^{x_2} \Big( \exp\big\{ (\gamma_j - 1)\, \ell(x) \big\} - 1 \Big)\, x^{ c_j/\sigma_j^2 } \exp\big\{ -x\, c_{j+1}/\sigma_j^2 \big\}\, dx.
\]
Consider the function $x \mapsto \exp\{ (\gamma_j - 1)\, \ell(x) \}$ for $\gamma_j > 1$; it takes its maximum in $x = f_j^{-1} \in [x_1, x_2]$ and we have
\[
\exp\big\{ (\gamma_j - 1)\, \ell( f_j^{-1} ) \big\} \;=\; \exp\big\{ (\gamma_j - 1) \big( \log f_j^{-1} - 1 \big) \big\}.
\]
Note that in case (ii) we have by assumption $\ell(x) > 0$ for some $x$. This implies $\ell( f_j^{-1} ) > 0$, thus $\log f_j^{-1} - 1 > 0$ and $f_j^{-1} > e$. Since we consider the limit $\gamma_j \to 1$ we may (and will) w.l.o.g. assume that $\gamma_j \le 2$. This implies in case (ii)
\[
K \;=\; \sup_{ 1 \le \gamma_j \le 2,\; x_1 \le x \le x_2 } \exp\big\{ (\gamma_j - 1)\, \ell(x) \big\} \;=\; ( f_j e )^{-1} \;\in\; (1, \infty).
\]
Thus, in this second case (ii) we have for $1 \le \gamma_j \le 2$
\[
h_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t) \;\le\; K\, h_j(\sigma_j | \mathcal{D}_t),
\]
and we have an integrable upper bound also in this case. This finishes the proof of Proposition A.2. $\square$
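A small numerical illustration of the point-wise convergence $h_j^{(\gamma_j)} \to h_j$ used above; the column data and the prior mean parameter $f_j$ are made up, and the (bounded) prior density factor $\pi_j(\sigma_j)$ is omitted since it does not depend on $\gamma_j$.

```python
import numpy as np
from scipy.special import gammaln

def log_h_gamma(sigma, gamma_j, f_j, C_col, C_next):
    """Log of the un-normalized marginal posterior (A.6), prior factor omitted."""
    a = C_col / sigma**2                 # gamma shapes C_{i,j}/sigma_j^2
    A = a.sum()
    B = f_j * (gamma_j - 1.0) + (C_next / sigma**2).sum()
    return (gammaln(gamma_j + A) - (gamma_j + A) * np.log(B)
            + np.sum(a * np.log(C_next / sigma**2) - gammaln(a)))

C_col, C_next = np.array([100.0, 110.0]), np.array([150.0, 160.0])
# gamma_j -> 1 recovers the non-informative prior limit h_j of (3.2),
# evaluated here at the fixed point sigma_j = 0.3:
for g in (2.0, 1.5, 1.1, 1.0):
    print(g, log_h_gamma(0.3, g, f_j=1.4, C_col=C_col, C_next=C_next))
```

At $\gamma_j = 1$ the expression coincides exactly with $\log h_j$, since the term $f_j(\gamma_j - 1)$ vanishes and $\Gamma(1 + A)/B^{1+A}$ is recovered.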
Proof of Theorem 3.2. Due to Lemma A.1 it is sufficient to study the limit
\[
\lim_{\boldsymbol{\gamma} \to \mathbf{1}} E\big[ \mathrm{Var}( C_{i,J} | \mathcal{D}_t, \boldsymbol{\sigma} ) \,\big|\, \mathcal{D}_t \big].
\]
The inner conditional variance $\mathrm{Var}( C_{i,J} | \mathcal{D}_t, \boldsymbol{\sigma} )$ exactly corresponds to the situation of known standard deviation parameters $\boldsymbol{\sigma}$. We can therefore directly apply Theorem 2.12 of Wüthrich and Merz (2015) to obtain this conditional variance. Assumption (3.4) implies that this conditional variance is finite, $P$-a.s., in $\boldsymbol{\sigma}$; and posterior independence of the parameters, see (A.5), implies that we can decouple the corresponding (outer) conditional expectations. This provides the following identity
\[
E\big[ \mathrm{Var}( C_{i,J} | \mathcal{D}_t, \boldsymbol{\sigma} ) \,\big|\, \mathcal{D}_t \big] = C_{i,t-i} \sum_{j=t-i}^{J-1} \Bigg( \prod_{m=t-i}^{j-1} E\big[ f_m^{\sigma(t)} \,\big|\, \mathcal{D}_t \big] \Bigg) \, E\Big[ \sigma_j^2 \big( f_j^{\sigma(t)} \big)^2 \big( 1 + \Phi_j^{\gamma_j(t)} \big) \,\Big|\, \mathcal{D}_t \Big] \prod_{n=j+1}^{J-1} E\Big[ \big( f_n^{\sigma(t)} \big)^2 \big( 1 + \Phi_n^{\gamma_n(t)} \big) \,\Big|\, \mathcal{D}_t \Big] + C_{i,t-i}^2 \Bigg[ \prod_{j=t-i}^{J-1} E\Big[ \big( f_j^{\sigma(t)} \big)^2 \big( 1 + \Phi_j^{\gamma_j(t)} \big) \,\Big|\, \mathcal{D}_t \Big] - \prod_{j=t-i}^{J-1} E\Big[ \big( f_j^{\sigma(t)} \big)^2 \,\Big|\, \mathcal{D}_t \Big] \Bigg],
\]
with
\[
\Phi_j^{\gamma_j(t)} \;=\; \frac{ \sigma_j^2 }{ \sum_{\ell=1}^{(t-j-1)\wedge I} C_{\ell,j} + \sigma_j^2 (\gamma_j - 2) },
\]
and conditional BCL factor
\[
f_j^{\sigma(t)} \;=\; \omega_j^{(t)} f_j^{\mathrm{CL}(t)} + \big( 1 - \omega_j^{(t)} \big) f_j. \tag{A.8}
\]
Recall (A.2), which says $\underline{\omega}_j^{(t)} \le \omega_j^{(t)} \le 1$, $P$-a.s., with the $\mathcal{D}_{t-1}$-measurable lower bound $\underline{\omega}_j^{(t)}$ converging to 1 as $\gamma_j \to 1$. This implies, $P$-a.s.,
\[
\underline{\omega}_j^{(t)} f_j^{\mathrm{CL}(t)} \;\le\; f_j^{\sigma(t)} \;\le\; f_j^{\mathrm{CL}(t)} + \big( 1 - \underline{\omega}_j^{(t)} \big) f_j, \tag{A.9}
\]
with both bounds being $\mathcal{D}_t$-measurable. Therefore, we obtain
\[
\lim_{\boldsymbol{\gamma} \to \mathbf{1}} E\big[ f_j^{\sigma(t)} \,\big|\, \mathcal{D}_t \big] = f_j^{\mathrm{CL}(t)} \qquad \text{and} \qquad \lim_{\boldsymbol{\gamma} \to \mathbf{1}} E\Big[ \big( f_j^{\sigma(t)} \big)^2 \,\Big|\, \mathcal{D}_t \Big] = \big( f_j^{\mathrm{CL}(t)} \big)^2.
\]
Using (A.9) once more we have (note that (3.4) implies $\Phi_j^{\gamma_j(t)}$