Insurance: Mathematics and Economics

Full Bayesian analysis of claims reserving uncertainty

Gareth W. Peters (a), Rodrigo S. Targino (a), Mario V. Wüthrich (b,c,*)

(a) University College London, Department of Statistical Science, UK
(b) ETH Zurich, RiskLab, Department of Mathematics, Switzerland
(c) Swiss Finance Institute, Switzerland
Article history: Received May 2016; received in revised form September 2016; accepted 30 December 2016; available online 5 January 2017.

Keywords: chain-ladder method; claims reserving uncertainty; claims development result; Mack's formula; Merz–Wüthrich formula; conditional mean square error of prediction; run-off uncertainty; full Bayesian chain-ladder model
Abstract

We revisit the gamma–gamma Bayesian chain-ladder (BCL) model for claims reserving in non-life insurance. This claims reserving model is usually applied in an empirical Bayesian way, using plug-in estimates for the variance parameters. The advantage of this empirical Bayesian framework is that it allows for closed-form solutions. The main purpose of this paper is to develop the full Bayesian case, also considering prior distributions for the variance parameters, and to study the resulting sensitivities.
© 2017 Elsevier B.V. All rights reserved.
1. Introduction
The chain-ladder (CL) algorithm is probably the most popular method to set the reserves for non-life insurance claims. Originally, the CL method was introduced in a purely algorithmic fashion and it was not based on a stochastic model. Stochastic models underpinning the CL algorithm with a statistical model were only developed much later. The two most commonly used stochastic representations are Mack's (1993) distribution-free CL model and the over-dispersed Poisson (ODP) model of Renshaw and Verrall (1998) and England and Verrall (2002). In this paper we study the gamma–gamma Bayesian chain-ladder (BCL) model which provides, in its non-informative prior limit, another stochastic representation of the CL method. This model was first considered in a claims reserving context by Gisler (2006) and Gisler and Wüthrich
(2008). The typical application of the gamma–gamma BCL model was done under fixed (given) variance parameters, using plug-in estimates for these variance parameters; see Example 2.13 in Wüthrich and Merz (2015) for such an empirical Bayesian analysis. Of course, this (partially) contradicts the Bayesian paradigm. In a full Bayesian approach one should also model these variance parameters with prior distributions. The aim of this paper is to study the influence of such a full Bayesian modeling approach and to compare it to the empirical Bayesian modeling approach used in Wüthrich and Merz (2015). In particular, we aim at analyzing the sensitivity of the prediction uncertainty to the choice of these variance parameters. This is crucial in solvency considerations and it improves the crude estimates usually used in practice.

* Corresponding author. E-mail address: mario.wuethrich@math.ethz.ch (M.V. Wüthrich).
We remark that Bayesian models have recently become very popular in claims reserving. This popularity is partly explained by the fact that Bayesian models can be solved very efficiently with Markov chain Monte Carlo (MCMC) simulation methods. In this paper we aim at finding a class of Bayesian models that provides an analytical closed-form solution for the posterior quantities of interest. This has the advantage of giving deeper mathematical insight and of accelerating the calculations. For an overview of Bayesian methods in claims reserving and corresponding MCMC applications we refer to the literature from this growing area.

Organization. In the next section we introduce the gamma–gamma BCL model and we show that we recover the classical CL reserves in its non-informative prior limit. In Section 3 we provide the prediction uncertainty formulas in the long-term view and the short-term view, respectively. In Section 4 we give several real data examples and analyze the resulting sensitivities to the choice of the prior distributions. All proofs are provided in Appendix A.
http://dx.doi.org/10.1016/j.insmatheco.2016.12.007
2. Gamma–gamma Bayesian chain-ladder model
We introduce the gamma–gamma BCL model in this section. In contrast to Model Assumptions 2.6 in Wüthrich and Merz (2015), we also model the variance parameters in a Bayesian way. We then derive the claims predictors in the non-informative prior limit, which turn out to be identical to the classical CL predictors, as seen in Theorem 2.2.
We denote accident years by $1 \le i \le I$ and development years by $0 \le j \le J$. Throughout, we assume $I > J$. The cumulative claim of accident year $i$ after development year $j$ is denoted by $C_{i,j}$, and $C_{i,J}$ denotes the ultimate claim of accident year $i$. For more background information on the claims reserving problem and the corresponding notation we refer to Chapter 1 in Wüthrich and Merz (2015). We make the following model assumptions.

Model Assumptions 2.1 (Gamma–Gamma BCL Model).
(a) Conditionally given parameter vectors $\boldsymbol{\Theta} = (\Theta_0, \ldots, \Theta_{J-1})$ and $\boldsymbol{\sigma} = (\sigma_0, \ldots, \sigma_{J-1})$, the cumulative claims $(C_{i,j})_{0 \le j \le J}$ are independent (in accident year $i$) Markov processes (in development year $j$) with conditional distributions
\[
C_{i,j+1} \,\big|\, \{C_{i,j}, \boldsymbol{\Theta}, \boldsymbol{\sigma}\} \;\sim\; \Gamma\!\big( C_{i,j}\,\sigma_j^{-2},\; \Theta_j\,\sigma_j^{-2} \big),
\]
for all $1 \le i \le I$ and $0 \le j \le J-1$.

(b) The parameter vectors $\boldsymbol{\Theta}$ and $\boldsymbol{\sigma}$ are independent.

(c) The components $\Theta_j$ of $\boldsymbol{\Theta}$ are independent and $\Gamma(\gamma_j, f_j(\gamma_j - 1))$-distributed with prior parameters $f_j > 0$ and $\gamma_j > 1$ for $0 \le j \le J-1$.

(d) The components $\sigma_j$ of $\boldsymbol{\sigma}$ are independent and $\pi_j$-distributed, having support in $(0, d_j)$ for given constants $0 < d_j < \infty$, for all $0 \le j \le J-1$.

(e) $(\boldsymbol{\Theta}, \boldsymbol{\sigma})$ and $C_{1,0}, \ldots, C_{I,0}$ are independent and $C_{i,0} > 0$, $P$-a.s., for all $1 \le i \le I$.

These model assumptions imply that we have the following CL properties
\[
E\big[ C_{i,j+1} \,\big|\, C_{i,0}, \ldots, C_{i,j}, \boldsymbol{\Theta}, \boldsymbol{\sigma} \big] = \Theta_j^{-1} C_{i,j}, \tag{2.1}
\]
\[
\mathrm{Var}\big( C_{i,j+1} \,\big|\, C_{i,0}, \ldots, C_{i,j}, \boldsymbol{\Theta}, \boldsymbol{\sigma} \big) = \Theta_j^{-2} \sigma_j^2\, C_{i,j}. \tag{2.2}
\]
Thus, for given parameter vectors $\boldsymbol{\Theta}$ and $\boldsymbol{\sigma}$ we obtain a distributional example of Mack's (1993) distribution-free CL model with CL factors $\Theta_j^{-1}$ and variance parameters $\Theta_j^{-2}\sigma_j^2$. Moreover, for $\pi_j(\cdot)$ being single point masses for all $0 \le j \le J-1$ we exactly obtain Model Assumptions 2.6 of Wüthrich and Merz (2015), assuming given (known) variance parameters $\sigma_j^2$.
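As a quick numerical sanity check of the CL properties (2.1)–(2.2), one can simulate a single development step of the gamma–gamma transition in (a); the parameter values below are hypothetical and chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed parameters for one development period j.
theta_j = 1.0 / 1.5   # CL factor is Theta_j^{-1} = 1.5
sigma_j = 2.0
c_ij = 100.0          # current cumulative claim C_{i,j}

# C_{i,j+1} | {C_{i,j}, Theta, sigma} ~ Gamma(shape = C_{i,j}/sigma_j^2,
#                                             rate  = Theta_j/sigma_j^2);
# numpy's gamma sampler takes a scale parameter, i.e. scale = 1/rate.
shape = c_ij / sigma_j**2
rate = theta_j / sigma_j**2
samples = rng.gamma(shape, 1.0 / rate, size=1_000_000)

# Empirical mean and variance should match (2.1)-(2.2):
# Theta_j^{-1} C_{i,j} = 150 and Theta_j^{-2} sigma_j^2 C_{i,j} = 900.
print(samples.mean(), c_ij / theta_j)
print(samples.var(), sigma_j**2 * c_ij / theta_j**2)
```

The shape/rate arithmetic makes the two moment identities exact: shape/rate $= C_{i,j}/\Theta_j$ and shape/rate$^2 = \Theta_j^{-2}\sigma_j^2 C_{i,j}$.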
The main task in claims reserving is to predict the ultimate claims $C_{i,J}$, given observations
\[
\mathcal{D}_t = \{ C_{i,j} :\; i + j \le t,\; 1 \le i \le I,\; 0 \le j \le J \},
\]
at time $t \ge I$. In complete analogy to the derivations in Section 2.2.1 of Wüthrich and Merz (2015), the application of Bayes' rule provides the posterior $\pi$ for the parameters $(\boldsymbol{\Theta}, \boldsymbol{\sigma})$, conditionally given observations $\mathcal{D}_t$, for $t \ge I$,
\[
\pi(\boldsymbol{\theta}, \boldsymbol{\sigma} \,|\, \mathcal{D}_t) \;\propto\; \prod_{j=0}^{J-1} \Bigg[ \theta_j^{\,\gamma_j + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j}/\sigma_j^2 \,-\, 1} \exp\Big\{ -\theta_j \Big( f_j(\gamma_j - 1) + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j+1}/\sigma_j^2 \Big) \Big\} \prod_{i=1}^{(t-j-1)\wedge I} \frac{ ( C_{i,j+1}/\sigma_j^2 )^{ C_{i,j}/\sigma_j^2 } }{ \Gamma( C_{i,j}/\sigma_j^2 ) } \; \pi_j(\sigma_j) \Bigg]. \tag{2.3}
\]
From this we see that the posteriors of $(\Theta_j, \sigma_j)$ are independent for different development periods $0 \le j \le J-1$. A non-informative prior limit for $\Theta_j$ corresponds here to letting $\gamma_j \to 1$ (the terms $f_j(\gamma_j - 1)$ then vanish in (2.3)). Therefore, we speak of the non-informative prior limit when the (component-wise) limits $\boldsymbol{\gamma} \to \mathbf{1}$ are taken, where we set $\boldsymbol{\gamma} = (\gamma_0, \ldots, \gamma_{J-1})$ and $\mathbf{1} = (1, \ldots, 1)$. We have the following theorem for the claims prediction under this non-informative prior limit (the proof is given in the Appendix).

Theorem 2.2 (CL Predictor). Under Model Assumptions 2.1 and for $t \ge I \ge i > t - n \ge t - J$, the Bayesian predictor for $C_{i,n}$ in its non-informative prior limit is given by
\[
\lim_{\boldsymbol{\gamma} \to \mathbf{1}} E\big[ C_{i,n} \,\big|\, \mathcal{D}_t \big] \;=\; C_{i,t-i} \prod_{j=t-i}^{n-1} f_j^{\mathrm{CL}(t)} \;\overset{\mathrm{def.}}{=}\; C_{i,n}^{\mathrm{CL}(t)},
\]
with CL factors $f_j^{\mathrm{CL}(t)}$ defined by
\[
f_j^{\mathrm{CL}(t)} \;=\; \frac{ \sum_{\ell=1}^{(t-j-1)\wedge I} C_{\ell,j+1} }{ \sum_{\ell=1}^{(t-j-1)\wedge I} C_{\ell,j} }.
\]
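For illustration, the CL factors and the CL predictor of Theorem 2.2 can be computed from a cumulative claims triangle as follows; the triangle below is made up and only serves to show the index conventions.

```python
import numpy as np

# Hypothetical cumulative claims triangle with I = 4 accident years and
# J = 3 development years; np.nan marks the part unobserved at time t = I.
C = np.array([
    [100.0, 150.0, 165.0, 170.0],
    [110.0, 160.0, 178.0, np.nan],
    [120.0, 175.0, np.nan, np.nan],
    [130.0, np.nan, np.nan, np.nan],
])
I, J = C.shape[0], C.shape[1] - 1
t = I  # prediction time point

# CL factors f_j^{CL(t)}: ratios of column sums over the accident years
# 1 <= l <= (t-j-1) ^ I observed in both columns j and j+1.
f = np.array([
    C[:min(t - j - 1, I), j + 1].sum() / C[:min(t - j - 1, I), j].sum()
    for j in range(J)
])

# CL predictor of Theorem 2.2: C_{i,J}^{CL(t)} = C_{i,t-i} prod_{j>=t-i} f_j.
ultimates = []
for i in range(1, I + 1):      # accident years, 1-based as in the text
    j_last = t - i             # last observed development year of year i
    ultimates.append(C[i - 1, j_last] * np.prod(f[j_last:]))
print(np.round(ultimates, 2))
```

For the first accident year the product is empty and the ultimate equals the observed $C_{1,J}$.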
Theorem 2.2 states that in the non-informative prior limit $\boldsymbol{\gamma} \to \mathbf{1}$ we exactly obtain the classical CL predictor, see Mack (1993). Thus, we have found another stochastic representation that underpins the CL algorithm with a statistical model. Therefore, we (may) use this model in its non-informative prior limit to analyze the prediction uncertainty of the CL algorithm. In contrast to Remark 2.11 of Wüthrich and Merz (2015), we now obtain this result in the full Bayesian framework, also considering prior distributions for the standard deviation parameters $\boldsymbol{\sigma}$.

3. Prediction uncertainty formulas

3.1. Long-term prediction uncertainty formula

The main purpose of this paper is to analyze the influence of the standard deviation parameters $\boldsymbol{\sigma}$ on the ultimate claim prediction uncertainty, where, in contrast to Chapter 2 of Wüthrich and Merz (2015), these standard deviation parameters $\boldsymbol{\sigma}$ are also modeled with prior distributions. We analyze the prediction uncertainty at time $t \ge I$ in terms of the conditional mean square error of prediction (MSEP) given by
\[
\mathrm{msep}_{C_{i,J}|\mathcal{D}_t}\big( E[ C_{i,J} | \mathcal{D}_t ] \big) = E\Big[ \big( C_{i,J} - E[ C_{i,J} | \mathcal{D}_t ] \big)^2 \,\Big|\, \mathcal{D}_t \Big] = \mathrm{Var}\big( C_{i,J} \,\big|\, \mathcal{D}_t \big) = \mathrm{Var}\big( E[ C_{i,J} | \mathcal{D}_t, \boldsymbol{\sigma} ] \,\big|\, \mathcal{D}_t \big) + E\big[ \mathrm{Var}( C_{i,J} | \mathcal{D}_t, \boldsymbol{\sigma} ) \,\big|\, \mathcal{D}_t \big]. \tag{3.1}
\]
We aim at calculating this conditional MSEP in the gamma–gamma BCL model which provides, in its non-informative prior limit $\boldsymbol{\gamma} \to \mathbf{1}$, an uncertainty estimate for the CL algorithm, that is, we aim at calculating the limit
\[
\mathrm{msep}_{C_{i,J}|\mathcal{D}_t}\big( C_{i,J}^{\mathrm{CL}(t)} \big) \;\overset{\mathrm{def.}}{=}\; \lim_{\boldsymbol{\gamma} \to \mathbf{1}} \mathrm{msep}_{C_{i,J}|\mathcal{D}_t}\big( E[ C_{i,J} | \mathcal{D}_t ] \big).
\]
For $0 \le j \le J-1$ and $t \ge I$, we define on the intervals $(0, d_j)$ the functions
\[
h_j(\sigma_j \,|\, \mathcal{D}_t) \;=\; \frac{ \Gamma\Big( 1 + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j}/\sigma_j^2 \Big) }{ \Big( \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j+1}/\sigma_j^2 \Big)^{1 + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j}/\sigma_j^2} } \; \prod_{i=1}^{(t-j-1)\wedge I} \frac{ ( C_{i,j+1}/\sigma_j^2 )^{ C_{i,j}/\sigma_j^2 } }{ \Gamma( C_{i,j}/\sigma_j^2 ) } \; \pi_j(\sigma_j). \tag{3.2}
\]
This function (3.2) is the un-normalized marginal posterior density of $\sigma_j$, given $\mathcal{D}_t$, in the non-informative prior limit (we refer to Proposition A.2, and in particular to formulas (A.5)–(A.6), for more details). The following lemma proves integrability of $h_j(\sigma_j | \mathcal{D}_t)$ on the interval $(0, d_j)$
(the proof is provided in the Appendix).

Lemma 3.1. Choose $0 \le j \le J-1$ and $t \ge I$. Assume that either $(t-j-1) \wedge I = 1$ or that for the observed data $\mathcal{D}_t$ there exists at least one accident year $1 \le i \le (t-j-1) \wedge I$ with
\[
\frac{C_{i,j+1}}{C_{i,j}} \;\neq\; f_j^{\mathrm{CL}(t)}. \tag{3.3}
\]
Then, we obtain integrability
\[
k_j(\mathcal{D}_t) \;=\; \int_0^{d_j} h_j(\sigma_j \,|\, \mathcal{D}_t)\, d\sigma_j \;<\; \infty.
\]
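The un-normalized density (3.2) is numerically delicate for small $\sigma_j$, so in practice it is convenient to evaluate it on the log scale. The sketch below (with made-up column data) uses `scipy.special.gammaln`; the prior contribution is passed in as a log-density value `log_prior`.

```python
import numpy as np
from scipy.special import gammaln

def log_h_j(sigma, C_col, C_next, log_prior=0.0):
    """Log of the un-normalized marginal posterior h_j in (3.2).

    C_col  : observed cumulative claims (C_{i,j})_i in column j,
    C_next : corresponding claims (C_{i,j+1})_i in column j+1,
    both restricted to accident years 1 <= i <= (t-j-1) ^ I.
    """
    a = C_col / sigma**2                 # gamma shapes C_{i,j}/sigma_j^2
    A = a.sum()
    B = (C_next / sigma**2).sum()        # sum_i C_{i,j+1}/sigma_j^2
    return (gammaln(1.0 + A) - (1.0 + A) * np.log(B)
            + np.sum(a * np.log(C_next / sigma**2) - gammaln(a))
            + log_prior)
```

With a single accident year, $h_j$ collapses to the constant $C_{1,j}/C_{1,j+1}\,\pi_j(\sigma_j)$, which is exactly the boundary case (A.3) used in the proof of Lemma 3.1.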
To obtain a finite conditional MSEP at time $t \ge I$ in the non-informative prior limit we require for the observed data $\mathcal{D}_t$ that
\[
\sum_{\ell=1}^{(t-j-1)\wedge I} C_{\ell,j} \;>\; d_j^2 \qquad \text{holds for all } t-I \le j \le J-1, \tag{3.4}
\]
see also the proof of Theorem 3.2. We then define the variables
\[
\Psi_j^{(t)} \;=\; \frac{ \sigma_j^2 }{ \sum_{\ell=1}^{(t-j-1)\wedge I} C_{\ell,j} - \sigma_j^2 }.
\]
Under (3.4) we obtain, $P$-a.s.,
\[
0 \;<\; \Psi_j^{(t)} \;<\; \frac{ d_j^2 }{ \sum_{\ell=1}^{(t-j-1)\wedge I} C_{\ell,j} - d_j^2 }.
\]
This implies that under (3.3) and (3.4) the following two expectations are well-defined
\[
\widehat{E}\Big[ \sigma_j^2 \big( 1 + \Psi_j^{(t)} \big) \,\Big|\, \mathcal{D}_t \Big] = k_j(\mathcal{D}_t)^{-1} \int_0^{d_j} \sigma_j^2 \big( 1 + \Psi_j^{(t)} \big)\, h_j(\sigma_j | \mathcal{D}_t)\, d\sigma_j, \tag{3.5}
\]
\[
\widehat{E}\Big[ \Psi_j^{(t)} \,\Big|\, \mathcal{D}_t \Big] = k_j(\mathcal{D}_t)^{-1} \int_0^{d_j} \Psi_j^{(t)}\, h_j(\sigma_j | \mathcal{D}_t)\, d\sigma_j, \tag{3.6}
\]
where $\widehat{E}$ denotes the expectation w.r.t. the density $\widehat{\pi}_j(\sigma_j) = k_j(\mathcal{D}_t)^{-1} h_j(\sigma_j | \mathcal{D}_t)$ on the interval $(0, d_j)$. We have the following estimate of the conditional MSEP for the CL algorithm (the proof is provided in the Appendix).

Theorem 3.2. Under Model Assumptions 2.1, (3.3) and (3.4), we have for $t \ge I \ge i > t - J$
\[
\mathrm{msep}_{C_{i,J}|\mathcal{D}_t}\big( C_{i,J}^{\mathrm{CL}(t)} \big) = \big( C_{i,J}^{\mathrm{CL}(t)} \big)^2 \sum_{j=t-i}^{J-1} \frac{1}{ C_{i,j}^{\mathrm{CL}(t)} } \, \widehat{E}\Big[ \sigma_j^2 \big( 1 + \Psi_j^{(t)} \big) \,\Big|\, \mathcal{D}_t \Big] \prod_{n=j+1}^{J-1} \Big( 1 + \widehat{E}\big[ \Psi_n^{(t)} \,\big|\, \mathcal{D}_t \big] \Big) \;+\; \big( C_{i,J}^{\mathrm{CL}(t)} \big)^2 \Bigg[ \prod_{j=t-i}^{J-1} \Big( 1 + \widehat{E}\big[ \Psi_j^{(t)} \,\big|\, \mathcal{D}_t \big] \Big) - 1 \Bigg];
\]
and the non-informative prior limit of the aggregated conditional MSEP is given by
\[
\mathrm{msep}_{\sum_i C_{i,J}|\mathcal{D}_t}\Big( \sum_i C_{i,J}^{\mathrm{CL}(t)} \Big) = \sum_i \mathrm{msep}_{C_{i,J}|\mathcal{D}_t}\big( C_{i,J}^{\mathrm{CL}(t)} \big) + 2 \sum_{i<m} C_{i,J}^{\mathrm{CL}(t)}\, C_{m,J}^{\mathrm{CL}(t)} \Bigg[ \prod_{j=t-i}^{J-1} \Big( 1 + \widehat{E}\big[ \Psi_j^{(t)} \,\big|\, \mathcal{D}_t \big] \Big) - 1 \Bigg],
\]
where the summations run over $t-J+1 \le i \le I$ and $t-J+1 \le i < m \le I$, respectively.

We remark that one could derive similar prediction uncertainty formulas for informative priors with $\boldsymbol{\gamma} > \mathbf{1}$. However, the formulas in the non-informative prior limit $\boldsymbol{\gamma} \to \mathbf{1}$ provide a conditional MSEP estimate for the classical CL algorithm because we have recovered exactly the classical CL predictor in this limit. To evaluate these prediction uncertainty formulas we still need to calculate the expected values given in (3.5) and (3.6). This can be done efficiently with importance sampling or using numerical integration (note that the corresponding density (3.2) is known up to the normalizing constant $k_j(\mathcal{D}_t)$), since one only needs to compute one-dimensional integrals.

3.2. Short-term prediction uncertainty formula
Theorem 3.2 evaluates the long-term prediction uncertainty of the CL algorithm because it considers the prediction uncertainty until all claims are finally settled. Recent solvency developments require a second, short-term uncertainty view. This second uncertainty view studies changes of predictions (one-year updates) when new information becomes available. These one-year updates directly influence the earning statements of insurance companies and are therefore crucial for the financial stability of the companies (for more details we refer to European Commission, 2009, Swiss Financial Market, 2006). We recall the claims development result. Choose $t \ge I \ge i > t - J$ and define the claims development result of accident year $i$ at time $t+1$ by
\[
\mathrm{CDR}_i^{(t+1)} \;=\; E\big[ C_{i,J} \,\big|\, \mathcal{D}_t \big] - E\big[ C_{i,J} \,\big|\, \mathcal{D}_{t+1} \big].
\]
Due to the martingale property of successive predictions (assuming integrability) we have
\[
E\big[ \mathrm{CDR}_i^{(t+1)} \,\big|\, \mathcal{D}_t \big] \;=\; 0.
\]
For this reason the claims development result is predicted by 0. The conditional MSEP viewed from time $t$ for this prediction is then defined by (subject to existence)
\[
\mathrm{msep}_{\mathrm{CDR}_i^{(t+1)}|\mathcal{D}_t}(0) \;=\; E\Big[ \big( \mathrm{CDR}_i^{(t+1)} - 0 \big)^2 \,\Big|\, \mathcal{D}_t \Big] \;=\; \mathrm{Var}\big( E[ C_{i,J} | \mathcal{D}_{t+1} ] \,\big|\, \mathcal{D}_t \big). \tag{3.7}
\]
The following lemma (which holds for any claims reserving method) relates the short-term view (3.7) to the long-term view (3.1) (the proof is provided in the Appendix).
Lemma 3.3. Assume (3.1) exists. Then, for $t \ge I \ge i > t - J$ we have
\[
\mathrm{msep}_{C_{i,J}|\mathcal{D}_t}\big( E[ C_{i,J} | \mathcal{D}_t ] \big) \;\ge\; \mathrm{msep}_{\mathrm{CDR}_i^{(t+1)}|\mathcal{D}_t}(0).
\]
The aim is to study (3.7) in the non-informative prior limit $\boldsymbol{\gamma} \to \mathbf{1}$ of the gamma–gamma BCL model. We define the credibility weights for $t \ge I \ge t - j \ge 1$ and $0 \le j \le J-1$ by
\[
\alpha_j^{(t)} \;=\; \frac{ C_{t-j,j} }{ \sum_{\ell=1}^{t-j} C_{\ell,j} } \;\in\; (0, 1].
\]
We have the following theorem (the proof is provided in the Appendix).

Theorem 3.4. Under Model Assumptions 2.1, (3.3) and (3.4), we have for $t \ge I \ge i > t - J$
\[
\widehat{\mathrm{msep}}_{\mathrm{CDR}_i^{(t+1)}|\mathcal{D}_t}(0) \;\overset{\mathrm{def.}}{=}\; \lim_{\boldsymbol{\gamma} \to \mathbf{1}} \mathrm{msep}_{\mathrm{CDR}_i^{(t+1)}|\mathcal{D}_t}(0) = \big( C_{i,J}^{\mathrm{CL}(t)} \big)^2 \Bigg[ \Big( 1 + \big( \alpha_{t-i}^{(t)} \big)^{-1}\, \widehat{E}\big[ \Psi_{t-i}^{(t)} \,\big|\, \mathcal{D}_t \big] \Big) \prod_{j=t-i+1}^{J-1} \Big( 1 + \alpha_j^{(t)}\, \widehat{E}\big[ \Psi_j^{(t)} \,\big|\, \mathcal{D}_t \big] \Big) - 1 \Bigg];
\]
and the non-informative prior limit of the aggregated conditional MSEP is given by
\[
\widehat{\mathrm{msep}}_{\sum_i \mathrm{CDR}_i^{(t+1)}|\mathcal{D}_t}(0) \;\overset{\mathrm{def.}}{=}\; \lim_{\boldsymbol{\gamma} \to \mathbf{1}} \mathrm{msep}_{\sum_i \mathrm{CDR}_i^{(t+1)}|\mathcal{D}_t}(0) = \sum_i \widehat{\mathrm{msep}}_{\mathrm{CDR}_i^{(t+1)}|\mathcal{D}_t}(0) + 2 \sum_{i<m} C_{i,J}^{\mathrm{CL}(t)}\, C_{m,J}^{\mathrm{CL}(t)} \Bigg[ \Big( 1 + \widehat{E}\big[ \Psi_{t-i}^{(t)} \,\big|\, \mathcal{D}_t \big] \Big) \prod_{j=t-i+1}^{J-1} \Big( 1 + \alpha_j^{(t)}\, \widehat{E}\big[ \Psi_j^{(t)} \,\big|\, \mathcal{D}_t \big] \Big) - 1 \Bigg],
\]
where the summations run over $t-J+1 \le i \le I$ and $t-J+1 \le i < m \le I$, respectively.

Theorem 3.4 provides a prediction uncertainty for the classical CL predictor (obtained by considering the non-informative prior limit $\boldsymbol{\gamma} \to \mathbf{1}$ in the gamma–gamma BCL model).

4. Numerical sensitivities

In this section we study several examples in order to analyze the sensitivities of the prediction uncertainty to the choice of the prior distributions $\pi_j(\cdot)$ for $\sigma_j$. We start by describing the empirical Bayesian case considered in Wüthrich and Merz (2015). In the empirical Bayesian case we use plug-in estimates $\widehat{\sigma}_j$ for $\sigma_j$, which is equivalent to choosing a degenerate prior distribution function $\pi_j(\sigma_j) = \delta_{\{\sigma_j = \widehat{\sigma}_j\}}$ that has a point mass of size 1 in $\widehat{\sigma}_j$. Therefore, we first need to provide the empirical estimates $\widehat{\sigma}_j$. Observe that in the classical distribution-free CL model of Mack (1993) one identifies $s_j^2 = \Theta_j^{-2} \sigma_j^2$; compare formula (3) in Mack (1993) to formula (2.2).

Table 1. CL reserves $R_i^{\mathrm{CL}}(I)$ and rooted conditional long-term MSEPs in the non-informative prior limit of the gamma–gamma BCL model of Theorem 3.2 with priors $\pi_j(\sigma_j) = \delta_{\{\sigma_j = \widehat{\sigma}_j\}}$, Mack's formula (Mack, 1993), and the conditional short-term MSEP of Theorem 3.4, for the data of Table 2.

Accident year i | CL reserves R_i^CL(I) | emp. Bayes msep^{1/2} | Mack's msep^{1/2} | emp. CDR msep^{1/2}
1     |        0 |         |         |
2     |   15.126 |   0.267 |   0.267 |   0.267
3     |   26.257 |   0.914 |   0.914 |   0.884
4     |   34.538 |   3.058 |   3.058 |   2.948
5     |   85.302 |   7.628 |   7.628 |   7.018
6     |  156.494 |  33.341 |  33.341 |  32.470
7     |  286.121 |  73.467 |  73.467 |  66.178
8     |  449.167 |  85.399 |  85.398 |  50.296
9     | 1043.242 | 134.338 | 134.337 | 104.311
10    | 3950.815 | 410.850 | 410.817 | 385.773
Total | 6047.061 | 462.990 | 462.960 | 420.220
If we useMack’s(1993) estimates for these variance parameters we obtain at time t
=
I, for 0≤
j≤
(
J−
1) ∧ (
I−
3)
, the result
s 2 j=
1 I−
j−
2 I−j−1
ℓ=1 Cℓ,j
Cℓ,j+1 Cℓ,j−
fjCL(I)
2,
and if j=
J−
1=
I−
2, we set
s 2 J−1=
min
s 2 J−3,
s2J−2,
s4J−2/
s 2 J−3 =
min
s 2 J−3,
s4J−2/
s 2 J−3
.
SinceΘj−1plays the role of the CL factor, see(2.1), we set for the empirical standard deviation estimate of
σ
j
σ
j=
s 2 j
f CL(I) j,
for 0≤
j≤
J−
1.The model is now fully specified for obtaining the empiri-cal Bayesian estimate (Wüthrich and Merz, 2015) and Mack’s distribution-free CL estimate (Mack, 1993) for the CL reserves and the corresponding conditional MSEPs. We calculate these for the data given in Table A.1 ofWüthrich and Merz(2015), see also
Ta-ble 2in theAppendix(where we additionally provide the
parame-ter estimates
fCL(I)
j ,
sjand
σ
j).This then provides the results given inTable 1where the CL reserves are defined as the prediction of the outstanding loss liabilities at time t
=
I, for I≥
i>
I−
J given byRiCL(I)
=
CCL(I) i,J
−
Ci,I−i.
Note that these results coincide with Tables 2.4 and 2.5 ofWüthrich
and Merz(2015). Moreover, we remark that
I−j−1
ℓ=1 Cℓ,j>
σ
2 j for all 0≤
j≤
J−
1,which, in particular, means that (3.4) holds for any dj
∈
(
σ
j,
I−j−1
ℓ=1 Cℓ,j
)
.

The goal now is to challenge these results by replacing the empirical estimates $\widehat{\sigma}_j$ by non-trivial prior densities $\pi_j(\cdot)$ for $\sigma_j$. To study the sensitivities we make different choices of the prior density $\pi_j(\cdot)$; namely, we choose integers $k \in \{2, \ldots, 20\}$ and select uniform priors for $\pi_j(\cdot)$ on the intervals
\[
(0, d_j) \;=\; (0,\; k \cdot \widehat{\sigma}_j) \qquad \text{for } 0 \le j \le J-1. \tag{4.1}
\]
We emphasize that $k \ge 2$; these choices guarantee that the support $(0, d_j)$ of $\pi_j(\cdot)$ contains the empirical estimate $\widehat{\sigma}_j$, and for increasing $k$ the prior distribution $\pi_j(\cdot)$ of $\sigma_j$ gets less informative; for $k = 20$ there is a positive probability that $\sigma_j$ is up to 20 times as large as the empirical estimate $\widehat{\sigma}_j$. Moreover, we note that in all our examples relation (3.4) is fulfilled for the maximal $d_j = 20 \cdot \widehat{\sigma}_j$.

Fig. 1. Un-normalized posteriors $h_j(\sigma_j | \mathcal{D}_I)$ for $0 \le j \le 8$ and uniform priors $\pi_j(\cdot)$ with $k = 2, 5$; the x-axis is truncated at $6 \cdot \widehat{\sigma}_j$. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
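A sketch of how the plug-in estimates $\widehat{\sigma}_j$ and the uniform prior supports (4.1) could be computed from a full triangle ($I = J+1$); the triangle below is made up for illustration.

```python
import numpy as np

def sigma_hat(C):
    """Plug-in estimates sigma_hat_j = s_hat_j / f_j^{CL(I)} at time t = I,
    with Mack's variance estimates s_hat_j^2; C is an I x (J+1) cumulative
    triangle with np.nan below the anti-diagonal and I = J + 1."""
    I = C.shape[0]
    J = C.shape[1] - 1
    f, s2 = [], []
    for j in range(J):
        n = I - j - 1                     # rows observed in columns j and j+1
        f_j = C[:n, j + 1].sum() / C[:n, j].sum()
        f.append(f_j)
        if n >= 2:                        # Mack's estimator, valid for j <= I-3
            s2.append((C[:n, j] * (C[:n, j + 1] / C[:n, j] - f_j) ** 2).sum()
                      / (n - 1))
        else:                             # j = J-1 = I-2: Mack's extrapolation
            s2.append(min(s2[-2], s2[-1], s2[-1] ** 2 / s2[-2]))
    return np.sqrt(np.array(s2)) / np.array(f)

# Uniform prior supports (0, d_j) = (0, k * sigma_hat_j) as in (4.1):
C = np.array([
    [100.0, 150.0, 165.0, 170.0],
    [110.0, 160.0, 178.0, np.nan],
    [120.0, 175.0, np.nan, np.nan],
    [130.0, np.nan, np.nan, np.nan],
])
k = 5
d = k * sigma_hat(C)
print(d)
```

One should also verify condition (3.4), i.e. that each $d_j^2$ stays below the corresponding column sum, before using these supports.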
For these choices $k \in \{2, \ldots, 20\}$ in (4.1) we calculate the rooted conditional long-term MSEP for aggregated accident years provided in Theorem 3.2 and the rooted conditional CDR MSEP (short-term view) for aggregated accident years provided in Theorem 3.4. By a slight abuse of notation we identify $k = 1$ with the empirical Bayesian estimate provided in Table 1 (note that this corresponds to the priors $\pi_j(\sigma_j) = \delta_{\{\sigma_j = \widehat{\sigma}_j\}}$). The only remaining difficulty for this analysis is the calculation of the one-dimensional integrals (3.5)–(3.6). This can be done either with importance sampling or with numerical integration. We have seen that in our examples it was sufficiently precise to divide the intervals $(0, d_j)$ into $M = 1000$ equally spaced sub-intervals $(a_j^{(m)}, a_j^{(m+1)})$ with $a_j^{(1)} = 0$ and $a_j^{(M+1)} = d_j$, and then to approximate the denominator by the discrete (midpoint) sum
\[
k_j(\mathcal{D}_I) = \int_0^{d_j} h_j(\sigma_j | \mathcal{D}_I)\, d\sigma_j \;\approx\; \sum_{m=1}^{M} \big( a_j^{(m+1)} - a_j^{(m)} \big)\; h_j\Big( \big( a_j^{(m)} + a_j^{(m+1)} \big)/2 \,\Big|\, \mathcal{D}_I \Big),
\]
and analogously for the remaining terms. In Fig. 1 we provide these (un-normalized) posteriors $h_j(\sigma_j | \mathcal{D}_I)$ for $0 \le j \le J-1$ and for uniform priors $\pi_j(\cdot)$ having $k = 2$ (in red) and $k = 5$ (in blue); the dotted green line provides the empirical estimate $\widehat{\sigma}_j$. We see that the bigger the development period $j$, the more volatile is the posterior of $\sigma_j$. The extreme case $j = J-1 = I-2$ means that we have only one single observation $C_{1,J}$ from which we cannot estimate a variance parameter; hence, the posterior of $\sigma_{J-1}$ remains a uniform distribution, see Fig. 1 (bottom right-hand side); this is also seen in derivation (A.3).

Fig. 2. Example of Tables 1 and 2; the x-axis displays the different choices $k \in \{2, \ldots, 20\}$ in (4.1) and $k = 1$ corresponds to the empirical Bayesian estimate: (left-hand side) CL reserves $\sum_i R_i^{\mathrm{CL}(I)}$ aggregated over all accident years $2 \le i \le I$ and confidence bounds of $\pm 2$ rooted conditional MSEPs in the long-term view (total MSEP over all development periods) and the short-term view (CDR MSEP); (right-hand side) rooted conditional MSEPs for $k \in \{2, \ldots, 20\}$ divided by the corresponding empirical rooted conditional MSEPs (which corresponds to $k = 1$).

Fig. 3. Motor third party liability (MTPL) insurance portfolio of a $22 \times 16$ trapezoid; relative increase is 18% (rooted total MSEP) and 16% (rooted CDR MSEP).
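The midpoint-rule approximation of $k_j(\mathcal{D}_I)$ and of the moments (3.5)–(3.6) can be sketched as follows (made-up column data; a uniform prior on $(0, d_j)$ is assumed, whose constant density cancels in the normalized moments).

```python
import numpy as np
from scipy.special import gammaln

def posterior_moments(C_col, C_next, d_j, M=1000):
    """Midpoint-rule approximations of (3.5) and (3.6) for one development
    period under a uniform prior on (0, d_j); requires d_j**2 < sum(C_col),
    i.e. condition (3.4)."""
    edges = np.linspace(0.0, d_j, M + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])     # midpoints (a_m + a_{m+1})/2
    w = np.diff(edges)                       # sub-interval widths

    # log of the un-normalized posterior h_j (3.2), vectorized over the grid;
    # the constant uniform prior density 1/d_j cancels in the ratios below
    a = C_col[:, None] / mid**2
    A = a.sum(axis=0)
    B = (C_next[:, None] / mid**2).sum(axis=0)
    log_h = (gammaln(1.0 + A) - (1.0 + A) * np.log(B)
             + (a * np.log(C_next[:, None] / mid**2) - gammaln(a)).sum(axis=0))
    h = np.exp(log_h - log_h.max())          # stabilized; scale cancels too

    psi = mid**2 / (C_col.sum() - mid**2)    # Psi_j^{(t)}
    k_j = (w * h).sum()                      # ~ k_j(D) up to the common scale
    return ((w * h * mid**2 * (1.0 + psi)).sum() / k_j,   # (3.5)
            (w * h * psi).sum() / k_j)                    # (3.6)
```

Plugging these moments into Theorems 3.2 and 3.4 then yields the rooted conditional MSEPs for each choice of $d_j = k\,\widehat{\sigma}_j$.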
The conditional MSEP results are presented in Fig. 2. The left-hand side of Fig. 2 gives the CL reserves $\sum_i R_i^{\mathrm{CL}(I)}$ aggregated over all accident years $2 \le i \le I$ and the corresponding confidence bounds of $\pm 2\, \mathrm{msep}^{1/2}_{\sum_i C_{i,J}|\mathcal{D}_I}\big( \sum_i C_{i,J}^{\mathrm{CL}(I)} \big)$, see Theorem 3.2, and $\pm 2\, \widehat{\mathrm{msep}}^{1/2}_{\sum_i \mathrm{CDR}_i^{(I+1)}|\mathcal{D}_I}(0)$, see Theorem 3.4, for the different values of $k \in \{2, \ldots, 20\}$. As expected, we see that the confidence bounds increase with increasing prior uncertainty in $\pi_j(\cdot)$ (i.e. increasing $k$); however, the graph seems to stabilize rather quickly (around $k = 5$) in this example. The right-hand side of Fig. 2 shows this increase relative to the empirical Bayesian case. We observe an overall increase of roughly 34% (rooted total MSEP) and 29% (rooted CDR MSEP) relative to the empirical Bayesian one. Thus, we observe an increase, but not a dramatic one. We have chosen uniform priors for $\pi_j(\sigma_j)$ to obtain these findings. If alternatively we had chosen priors with a unique mode in $\widehat{\sigma}_j$ (and the same support as the uniform ones), the results would be closer to the empirical Bayesian ones because in this second case the priors are more informative (around the empirical estimates $\widehat{\sigma}_j$).

In the remainder we present more examples of different lines of business. In Fig. 3 we consider a motor third party liability (MTPL) insurance portfolio. For this portfolio we have $I = 22$ accident years and final development period $J = 15$. We see a similar picture as in the previous example: the increase of the rooted conditional MSEP stabilizes around $k = 5$ and the total increase relative to the empirical Bayesian estimate is roughly 18% and 16%, respectively. The reason for this fast convergence is that the posterior uncertainty in $\sigma_j$, given $\mathcal{D}_I$, is rather small for all development periods $j$, regardless of the choice of the prior distribution. This comes from the fact that we have a large trapezoid ($22 \times 16$) and even for the latest development period $J = 15$ we have 7 observations.

In Fig. 4 we consider a general liability insurance portfolio of size $I = 21$ and $J = 13$; the relative increase amounts to roughly 14% and 12%, respectively, and the picture is very similar to MTPL insurance because the posterior uncertainty in $\sigma_j$ is again small due to the size of the trapezoid and the fact that we have 8 observations in the last column $J = 13$.

Fig. 4. General liability insurance portfolio of a $21 \times 14$ trapezoid; relative increase is 14% (rooted total MSEP) and 12% (rooted CDR MSEP).

Fig. 5. Industrial property insurance portfolio of a $15 \times 7$ trapezoid; relative increase is 14% (rooted total MSEP) and 13% (rooted CDR MSEP).

In Fig. 5 we consider an industrial property insurance portfolio of size $I = 15$ and $J = 6$; the relative increase amounts to roughly 14% and 13%, respectively. Thus, again we obtain a rather similar picture to the previous examples. Note that here we have 9 observations in the last column $J = 6$.

As a rule of thumb we can say that if the right-end point of the posterior functions $h_j(\sigma_j | \mathcal{D}_I)$, given in Fig. 1, is sufficiently smooth, then increasing the support of the uniform prior $\pi_j(\cdot)$ does not substantially increase the uncertainty estimates. This holds true as long as we do not get too close to the threshold (3.4), because at the threshold the conditional MSEP becomes infinite; see also Theorem 2.12 in Wüthrich and Merz (2015).

5. Conclusions
We have considered the full Bayesian version of the gamma–gamma BCL model. This model provides the CL reserves in its non-informative prior limit. Therefore, it can be used to analyze the prediction uncertainty of the CL claims reserving method. The main purpose of this paper was to analyze the contribution of the variance parameters to the prediction uncertainty in the case where these variance parameters are also modeled in a Bayesian way. In our examples this additional source of uncertainty increases the rooted conditional MSEP between 10% and 40% (compared to the empirical Bayesian version presented in Wüthrich and Merz, 2015). This increase is bigger for triangles ($I = J + 1$) and becomes smaller for trapezoids the bigger the difference $I - J$ is.

Appendix A. Proofs
Proof of Theorem 2.2. For explicit details we refer to Section 2.2.1 in Wüthrich and Merz (2015). Since, conditionally on $\boldsymbol{\sigma}$, the Model Assumptions 2.6 in Wüthrich and Merz (2015) are fulfilled, we obtain the following formula from Lemma 2.7, Corollary 2.8 and Theorem 2.10 of Wüthrich and Merz (2015), for $t \ge I \ge i > t - n \ge t - J$,
\[
E\big[ C_{i,n} \,\big|\, \mathcal{D}_t, \boldsymbol{\sigma} \big] \;=\; C_{i,t-i} \prod_{j=t-i}^{n-1} \Big( \omega_j^{(t)} f_j^{\mathrm{CL}(t)} + \big( 1 - \omega_j^{(t)} \big) f_j \Big), \tag{A.1}
\]
with credibility weights
\[
\omega_j^{(t)} \;=\; \frac{ \sum_{\ell=1}^{(t-j-1)\wedge I} C_{\ell,j} }{ \sum_{\ell=1}^{(t-j-1)\wedge I} C_{\ell,j} + \sigma_j^2 (\gamma_j - 1) } \;\in\; (0, 1).
\]
Using the tower property and posterior independence (2.3), this implies
\[
E\big[ C_{i,n} \,\big|\, \mathcal{D}_t \big] = E\Big[ E\big[ C_{i,n} \,\big|\, \mathcal{D}_t, \boldsymbol{\sigma} \big] \,\Big|\, \mathcal{D}_t \Big] = C_{i,t-i} \prod_{j=t-i}^{n-1} E\Big[ \omega_j^{(t)} f_j^{\mathrm{CL}(t)} + \big( 1 - \omega_j^{(t)} \big) f_j \,\Big|\, \mathcal{D}_t \Big] = C_{i,t-i} \prod_{j=t-i}^{n-1} \Big( E\big[ \omega_j^{(t)} \,\big|\, \mathcal{D}_t \big]\, f_j^{\mathrm{CL}(t)} + \big( 1 - E\big[ \omega_j^{(t)} \,\big|\, \mathcal{D}_t \big] \big) f_j \Big).
\]
There remains the consideration of the conditional expectations of the credibility weights. The bounded support assumption on $\sigma_j$ implies, $P$-a.s.,
\[
\underline{\omega}_j^{(t)} \;\overset{\mathrm{def.}}{=}\; \frac{ \sum_{\ell=1}^{(t-j-1)\wedge I} C_{\ell,j} }{ \sum_{\ell=1}^{(t-j-1)\wedge I} C_{\ell,j} + d_j^2 (\gamma_j - 1) } \;\le\; \omega_j^{(t)} \;\le\; 1. \tag{A.2}
\]
Since the $\mathcal{D}_{t-1}$-measurable lower bound $\underline{\omega}_j^{(t)}$ converges to 1 as $\gamma_j \to 1$, the claim follows. $\square$

Next we prove Lemma 3.1, which provides integrability of $h_j(\sigma_j | \mathcal{D}_t)$ on the interval $(0, d_j)$. We anticipate that the function $h_j(\cdot | \mathcal{D}_t)$ can be seen as the non-informative prior limit of the un-normalized posterior density of $\sigma_j$, conditionally given the observations $\mathcal{D}_t$. We provide the explicit explanation, the relevant notation and the corresponding convergence statement in Proposition A.2 and formula (A.6).

Proof of Lemma 3.1. The case
$(t-j-1) \wedge I = 1$ can easily be treated because in that case we have
\[
k_j(\mathcal{D}_t) \;=\; \int_0^{d_j} \frac{ C_{1,j} }{ C_{1,j+1} }\, \pi_j(\sigma_j)\, d\sigma_j \;=\; \frac{ C_{1,j} }{ C_{1,j+1} } \;<\; \infty. \tag{A.3}
\]
To prove the integrability on $(0, d_j)$ in the case $(t-j-1) \wedge I > 1$ we need to check that $h_j(\sigma_j | \mathcal{D}_t)$ is well-behaved for $\sigma_j \to 0$. We consider the following logarithm
\[
\log h_j(\sigma_j | \mathcal{D}_t) = \log\Bigg[ \Big( \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j+1}/\sigma_j^2 \Big)^{ -\big( 1 + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j}/\sigma_j^2 \big) } \prod_{i=1}^{(t-j-1)\wedge I} \big( C_{i,j+1}/\sigma_j^2 \big)^{ C_{i,j}/\sigma_j^2 } \Bigg] + \log \Gamma\Big( 1 + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j}/\sigma_j^2 \Big) - \sum_{i=1}^{(t-j-1)\wedge I} \log \Gamma\big( C_{i,j}/\sigma_j^2 \big) + \log \pi_j(\sigma_j). \tag{A.4}
\]
The first term on the right-hand side behaves for $\sigma_j \to 0$ as
\[
\log \sigma_j^2 \;+\; \sum_{i=1}^{(t-j-1)\wedge I} \frac{ C_{i,j} }{ \sigma_j^2 } \log\Bigg( \frac{ C_{i,j+1} }{ \sum_{\ell=1}^{(t-j-1)\wedge I} C_{\ell,j+1} } \Bigg) \;+\; O(1).
\]
The terms involving the gamma functions $\Gamma(\cdot)$ are treated by Stirling's formula
\[
\log \Gamma(x) = \tfrac{1}{2} \log(2\pi) + \big( x - \tfrac{1}{2} \big) \log x - x + o(1), \qquad \text{as } x \to \infty.
\]
The identity $\Gamma(1 + x) = x\, \Gamma(x)$ and Stirling's formula applied to the two remaining gamma terms of (A.4) provide for $\sigma_j \to 0$
\[
\log\Big( \sum_i C_{i,j}/\sigma_j^2 \Big) + \Big( \sum_i C_{i,j}/\sigma_j^2 - \tfrac{1}{2} \Big) \log\Big( \sum_i C_{i,j}/\sigma_j^2 \Big) - \sum_i C_{i,j}/\sigma_j^2 - \sum_i \Big[ \big( C_{i,j}/\sigma_j^2 - \tfrac{1}{2} \big) \log\big( C_{i,j}/\sigma_j^2 \big) - C_{i,j}/\sigma_j^2 \Big] + O(1) = -\log( \sigma_j^2 ) + \sum_i \frac{ C_{i,j} }{ \sigma_j^2 } \log\Bigg( \frac{ \sum_\ell C_{\ell,j} }{ C_{i,j} } \Bigg) + \frac{ 1 - (t-j-1) \wedge I }{ 2 } \log \sigma_j^2 + O(1),
\]
where all summations run over $1 \le i, \ell \le (t-j-1) \wedge I$. If we merge these two terms we obtain for $\sigma_j \to 0$ the behavior
\[
\sum_i \frac{ C_{i,j} }{ \sigma_j^2 } \log\Bigg( \frac{ C_{i,j+1} }{ C_{i,j} } \, \frac{ \sum_\ell C_{\ell,j} }{ \sum_\ell C_{\ell,j+1} } \Bigg) + \frac{ 1 - (t-j-1) \wedge I }{ 2 } \log( \sigma_j^2 ) + O(1) = \frac{ 1 }{ \sigma_j^2 } \Bigg[ \sum_i C_{i,j} \log\Bigg( \frac{ C_{i,j+1} }{ C_{i,j} } \Bigg) - \sum_i C_{i,j} \log f_j^{\mathrm{CL}(t)} \Bigg] + \frac{ 1 - (t-j-1) \wedge I }{ 2 } \log( \sigma_j^2 ) + O(1).
\]
By assumption, for at least one accident year $1 \le i \le (t-j-1) \wedge I$ we have $C_{i,j+1}/C_{i,j} \neq f_j^{\mathrm{CL}(t)}$. Therefore, Jensen's inequality is strict in the following derivation
\[
\sum_i C_{i,j} \log\Bigg( \frac{ C_{i,j+1} }{ C_{i,j} } \Bigg) = \sum_\ell C_{\ell,j} \, \sum_i \frac{ C_{i,j} }{ \sum_\ell C_{\ell,j} } \log\Bigg( \frac{ C_{i,j+1} }{ C_{i,j} } \Bigg) < \sum_\ell C_{\ell,j} \, \log\Bigg( \sum_i \frac{ C_{i,j} }{ \sum_\ell C_{\ell,j} } \, \frac{ C_{i,j+1} }{ C_{i,j} } \Bigg) = \sum_i C_{i,j} \log f_j^{\mathrm{CL}(t)}.
\]
Therefore, we can define the strictly positive constant
\[
\varepsilon \;=\; \sum_i C_{i,j} \log f_j^{\mathrm{CL}(t)} - \sum_i C_{i,j} \log\Bigg( \frac{ C_{i,j+1} }{ C_{i,j} } \Bigg) \;>\; 0.
\]
This implies for $\sigma_j \to 0$
\[
h_j(\sigma_j | \mathcal{D}_t) \;=\; \exp\Bigg\{ -\frac{ \varepsilon }{ \sigma_j^2 } + \frac{ 1 - (t-j-1) \wedge I }{ 2 } \log( \sigma_j^2 ) + O(1) \Bigg\} \; \pi_j(\sigma_j).
\]
This provides integrability in 0 and completes the proof. $\square$

Lemma A.1. Under Model Assumptions 2.1, we have in the non-informative prior limit for $t \ge I \ge i > t - J$ (subject to existence)
\[
\mathrm{msep}_{C_{i,J}|\mathcal{D}_t}\big( C_{i,J}^{\mathrm{CL}(t)} \big) = \lim_{\boldsymbol{\gamma} \to \mathbf{1}} \mathrm{msep}_{C_{i,J}|\mathcal{D}_t}\big( E[ C_{i,J} | \mathcal{D}_t ] \big) = \lim_{\boldsymbol{\gamma} \to \mathbf{1}} E\big[ \mathrm{Var}( C_{i,J} | \mathcal{D}_t, \boldsymbol{\sigma} ) \,\big|\, \mathcal{D}_t \big].
\]
Proof of Lemma A.1. For the first term in the conditional MSEP decomposition (3.1) we have, using (A.1) (see also Lemma 2.7 and Corollary 2.8 in Wüthrich and Merz (2015)) and using posterior independence (2.3),
\[
\mathrm{Var}\big( E[ C_{i,J} | \mathcal{D}_t, \boldsymbol{\sigma} ] \,\big|\, \mathcal{D}_t \big) = C_{i,t-i}^2 \, \mathrm{Var}\Bigg( \prod_{j=t-i}^{J-1} \Big( \omega_j^{(t)} f_j^{\mathrm{CL}(t)} + \big( 1 - \omega_j^{(t)} \big) f_j \Big) \,\Bigg|\, \mathcal{D}_t \Bigg) = C_{i,t-i}^2 \Bigg[ \prod_{j=t-i}^{J-1} E\Big[ \Big( \omega_j^{(t)} f_j^{\mathrm{CL}(t)} + \big( 1 - \omega_j^{(t)} \big) f_j \Big)^2 \,\Big|\, \mathcal{D}_t \Big] - \Bigg( \prod_{j=t-i}^{J-1} E\Big[ \omega_j^{(t)} f_j^{\mathrm{CL}(t)} + \big( 1 - \omega_j^{(t)} \big) f_j \,\Big|\, \mathcal{D}_t \Big] \Bigg)^2 \Bigg].
\]
Since $\omega_j^{(t)}$ is sandwiched, $P$-a.s., between the $\mathcal{D}_{t-1}$-measurable lower bound $\underline{\omega}_j^{(t)}$, which converges to 1 as $\gamma_j \to 1$, and 1, see (A.2), we obtain that the right-hand side of the last identity converges to 0 as $\boldsymbol{\gamma} \to \mathbf{1}$. This proves the lemma. $\square$

Formula (2.3) provides the posterior (marginal) distribution of $\sigma_j$, conditionally given $\mathcal{D}_t$. This posterior distribution depends on the explicit choice of the prior parameter $\gamma_j > 1$; to illustrate this we use the following notation $\pi_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t) = \pi_j(\sigma_j | \mathcal{D}_t)$.

Proposition A.2. Choose $0 \le j \le J-1$ and $t \ge I$, and assume that either $(t-j-1) \wedge I = 1$ or that for the observed data $\mathcal{D}_t$ there exists at least one accident year $1 \le i \le (t-j-1) \wedge I$ with $C_{i,j+1}/C_{i,j} \neq f_j^{\mathrm{CL}(t)}$. Consider $S_j^{(\gamma_j)} \sim \pi_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t)$. We have the following convergence in distribution
\[
\lim_{\gamma_j \to 1} S_j^{(\gamma_j)} \;\overset{(d)}{=}\; S_j \;\sim\; \widehat{\pi}_j(\sigma_j) = k_j(\mathcal{D}_t)^{-1}\, h_j(\sigma_j | \mathcal{D}_t).
\]
We use the notation $S_j^{(\gamma_j)}$ for the standard deviation parameter $\sigma_j$ in the above proposition to indicate more clearly the dependence of the law of the corresponding random variable $S_j^{(\gamma_j)}$ on the choice of the prior parameter $\gamma_j$, and to clearly distinguish the random variables $S_j^{(\gamma_j)}$ from their potential realizations $\sigma_j$ in the domain $(0, d_j)$.

Proof of Proposition A.2. Convergence in distribution is implied by the corresponding convergence of the moment generating functions, that is, for any $r \in \mathbb{R}$ we aim at proving
\[
\lim_{\gamma_j \to 1} E\big[ \exp\{ r S_j^{(\gamma_j)} \} \,\big|\, \mathcal{D}_t \big] \;=\; E\big[ \exp\{ r S_j \} \,\big|\, \mathcal{D}_t \big].
\]
From (2.3) we obtain the posterior distribution of $\boldsymbol{\sigma}$, conditionally given $\mathcal{D}_t$ (note that we have kernels of gamma densities of $\theta_j$ in formula (2.3)),
\[
\pi(\boldsymbol{\sigma} | \mathcal{D}_t) = \int \pi(\boldsymbol{\theta}, \boldsymbol{\sigma} | \mathcal{D}_t)\, d\boldsymbol{\theta} \;\propto\; \prod_{j=0}^{J-1} \Bigg[ \Big( f_j(\gamma_j - 1) + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j+1}/\sigma_j^2 \Big)^{ -\big( \gamma_j + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j}/\sigma_j^2 \big) } \, \Gamma\Big( \gamma_j + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j}/\sigma_j^2 \Big) \prod_{i=1}^{(t-j-1)\wedge I} \frac{ ( C_{i,j+1}/\sigma_j^2 )^{ C_{i,j}/\sigma_j^2 } }{ \Gamma( C_{i,j}/\sigma_j^2 ) } \; \pi_j(\sigma_j) \Bigg]. \tag{A.5}
\]
This again provides posterior independence between the different development years $0 \le j \le J-1$. We define the corresponding marginal functions on the intervals $(0, d_j)$
\[
h_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t) = \frac{ \Gamma\Big( \gamma_j + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j}/\sigma_j^2 \Big) }{ \Big( f_j(\gamma_j - 1) + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j+1}/\sigma_j^2 \Big)^{ \gamma_j + \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j}/\sigma_j^2 } } \; \prod_{i=1}^{(t-j-1)\wedge I} \frac{ ( C_{i,j+1}/\sigma_j^2 )^{ C_{i,j}/\sigma_j^2 } }{ \Gamma( C_{i,j}/\sigma_j^2 ) } \; \pi_j(\sigma_j); \tag{A.6}
\]
thus the density of $S_j^{(\gamma_j)}$ is given by
\[
\pi_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t) \;=\; \frac{ h_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t) }{ \int_0^{d_j} h_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t)\, d\sigma_j },
\]
and we aim at proving the following convergence of moment generating functions
\[
\lim_{\gamma_j \to 1} E\big[ \exp\{ r S_j^{(\gamma_j)} \} \,\big|\, \mathcal{D}_t \big] = \lim_{\gamma_j \to 1} \frac{ \int_0^{d_j} e^{r \sigma_j}\, h_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t)\, d\sigma_j }{ \int_0^{d_j} h_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t)\, d\sigma_j } = \frac{ \int_0^{d_j} e^{r \sigma_j}\, h_j(\sigma_j | \mathcal{D}_t)\, d\sigma_j }{ k_j(\mathcal{D}_t) } = E\big[ \exp\{ r S_j \} \,\big|\, \mathcal{D}_t \big].
\]
Firstly, observe that we have the following point-wise (in $\sigma_j$) convergence on the interval $(0, d_j)$:
\[
\lim_{\gamma_j \to 1} h_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t) \;=\; h_j(\sigma_j | \mathcal{D}_t).
\]
Thus, there remains to prove the existence of an integrable upper bound, uniformly in $\gamma_j$, so that one can apply Lebesgue's dominated convergence theorem, which proves the claim.

To find an integrable upper bound we first remark that for all $r \in \mathbb{R}$ we have the uniform bound $e^{r \sigma_j} \le e^{|r| d_j}$ on $(0, d_j)$. There remains the consideration of $h_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t)$. We set $c_j = \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j}$ and $c_{j+1} = \sum_{i=1}^{(t-j-1)\wedge I} C_{i,j+1}$, and
\[
p(\sigma_j) \;=\; \prod_{i=1}^{(t-j-1)\wedge I} \big( C_{i,j+1}/\sigma_j^2 \big)^{ C_{i,j}/\sigma_j^2 } \, \Gamma\big( C_{i,j}/\sigma_j^2 \big)^{-1} \, \pi_j(\sigma_j).
\]
Then we have
\[
h_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t) = \Big( f_j(\gamma_j - 1) + c_{j+1}/\sigma_j^2 \Big)^{ -\big( \gamma_j + c_j/\sigma_j^2 \big) } \, \Gamma\big( \gamma_j + c_j/\sigma_j^2 \big)\, p(\sigma_j) = p(\sigma_j) \int_0^\infty x^{ \gamma_j + c_j/\sigma_j^2 - 1 } \exp\big\{ -\big( f_j(\gamma_j - 1) + c_{j+1}/\sigma_j^2 \big)\, x \big\}\, dx = p(\sigma_j) \int_0^\infty \exp\big\{ (\gamma_j - 1)\big( \log x - f_j x \big) \big\}\, x^{ c_j/\sigma_j^2 } \exp\big\{ -x\, c_{j+1}/\sigma_j^2 \big\}\, dx.
\]
Define the function $\ell(x) = \log x - f_j x$. It satisfies $\lim_{x \to 0} \ell(x) = -\infty$ and $\lim_{x \to \infty} \ell(x) = -\infty$. There are two cases:

(i) $\ell(x) \le 0$ for all $x \in [0, \infty)$. In this case we have for $\gamma_j \ge 1$
\[
h_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t) = p(\sigma_j) \int_0^\infty \exp\big\{ (\gamma_j - 1)\, \ell(x) \big\}\, x^{ c_j/\sigma_j^2 } \exp\big\{ -x\, c_{j+1}/\sigma_j^2 \big\}\, dx \le p(\sigma_j) \int_0^\infty x^{ c_j/\sigma_j^2 } \exp\big\{ -x\, c_{j+1}/\sigma_j^2 \big\}\, dx = p(\sigma_j)\, \frac{ \Gamma\big( 1 + c_j/\sigma_j^2 \big) }{ \big( c_{j+1}/\sigma_j^2 \big)^{ 1 + c_j/\sigma_j^2 } } = h_j(\sigma_j | \mathcal{D}_t).
\]
Lemma 3.1 then provides the integrable upper bound in this first case (i), and we can apply Lebesgue's dominated convergence theorem, which proves the claim.

(ii) There exists $x \in (0, \infty)$ such that $\ell(x) > 0$. In this case concavity of $\log x$ and of $\ell(x)$, respectively, implies that there exist $0 < x_1 < x_2 < \infty$ such that $\ell(x) \le 0$ on $[0, x_1] \cup [x_2, \infty)$ and $\ell(x) > 0$ on $[x_1, x_2]$. We then have
\[
h_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t) \le p(\sigma_j) \int_0^\infty x^{ c_j/\sigma_j^2 } \exp\big\{ -x\, c_{j+1}/\sigma_j^2 \big\}\, dx + p(\sigma_j) \int_{x_1}^{x_2} \Big( \exp\big\{ (\gamma_j - 1)\, \ell(x) \big\} - 1 \Big)\, x^{ c_j/\sigma_j^2 } \exp\big\{ -x\, c_{j+1}/\sigma_j^2 \big\}\, dx.
\]
Consider the function $x \mapsto \exp\{ (\gamma_j - 1)\, \ell(x) \}$ for $\gamma_j > 1$; it takes its maximum in $x = f_j^{-1} \in [x_1, x_2]$ and we have
\[
\exp\big\{ (\gamma_j - 1)\, \ell( f_j^{-1} ) \big\} \;=\; \exp\big\{ (\gamma_j - 1) \big( \log f_j^{-1} - 1 \big) \big\}.
\]
Note that in case (ii) we have by assumption $\ell(x) > 0$ for some $x$. This implies $\ell( f_j^{-1} ) > 0$, thus $\log f_j^{-1} - 1 > 0$ and $f_j^{-1} > e$. Since we consider the limit $\gamma_j \to 1$ we may (and will) w.l.o.g. assume that $\gamma_j \le 2$. This implies in case (ii)
\[
K \;=\; \sup_{ 1 \le \gamma_j \le 2,\; x_1 \le x \le x_2 } \exp\big\{ (\gamma_j - 1)\, \ell(x) \big\} \;=\; ( f_j e )^{-1} \;\in\; (1, \infty).
\]
Thus, in this second case (ii) we have for $1 \le \gamma_j \le 2$
\[
h_j^{(\gamma_j)}(\sigma_j | \mathcal{D}_t) \;\le\; K\, h_j(\sigma_j | \mathcal{D}_t),
\]
and we have an integrable upper bound also in this case. This finishes the proof of Proposition A.2. $\square$
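A small numerical illustration of the point-wise convergence $h_j^{(\gamma_j)} \to h_j$ used above; the column data and the prior mean parameter $f_j$ are made up, and the (bounded) prior density factor $\pi_j(\sigma_j)$ is omitted since it does not depend on $\gamma_j$.

```python
import numpy as np
from scipy.special import gammaln

def log_h_gamma(sigma, gamma_j, f_j, C_col, C_next):
    """Log of the un-normalized marginal posterior (A.6), prior factor omitted."""
    a = C_col / sigma**2                 # gamma shapes C_{i,j}/sigma_j^2
    A = a.sum()
    B = f_j * (gamma_j - 1.0) + (C_next / sigma**2).sum()
    return (gammaln(gamma_j + A) - (gamma_j + A) * np.log(B)
            + np.sum(a * np.log(C_next / sigma**2) - gammaln(a)))

C_col, C_next = np.array([100.0, 110.0]), np.array([150.0, 160.0])
# gamma_j -> 1 recovers the non-informative prior limit h_j of (3.2),
# evaluated here at the fixed point sigma_j = 0.3:
for g in (2.0, 1.5, 1.1, 1.0):
    print(g, log_h_gamma(0.3, g, f_j=1.4, C_col=C_col, C_next=C_next))
```

At $\gamma_j = 1$ the expression coincides exactly with $\log h_j$, since the term $f_j(\gamma_j - 1)$ vanishes and $\Gamma(1 + A)/B^{1+A}$ is recovered.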
Proof of Theorem 3.2. Due to Lemma A.1 it is sufficient to study the limit
\[
\lim_{\boldsymbol{\gamma} \to \mathbf{1}} E\big[ \mathrm{Var}( C_{i,J} | \mathcal{D}_t, \boldsymbol{\sigma} ) \,\big|\, \mathcal{D}_t \big].
\]
The inner conditional variance $\mathrm{Var}( C_{i,J} | \mathcal{D}_t, \boldsymbol{\sigma} )$ exactly corresponds to the situation of known standard deviation parameters $\boldsymbol{\sigma}$. We can therefore directly apply Theorem 2.12 of Wüthrich and Merz (2015) to obtain this conditional variance. Assumption (3.4) implies that this conditional variance is finite, $P$-a.s., in $\boldsymbol{\sigma}$; and posterior independence of the parameters, see (A.5), implies that we can decouple the corresponding (outer) conditional expectations. This provides the following identity
\[
E\big[ \mathrm{Var}( C_{i,J} | \mathcal{D}_t, \boldsymbol{\sigma} ) \,\big|\, \mathcal{D}_t \big] = C_{i,t-i} \sum_{j=t-i}^{J-1} \Bigg( \prod_{m=t-i}^{j-1} E\big[ f_m^{\sigma(t)} \,\big|\, \mathcal{D}_t \big] \Bigg) \, E\Big[ \sigma_j^2 \big( f_j^{\sigma(t)} \big)^2 \big( 1 + \Phi_j^{\gamma_j(t)} \big) \,\Big|\, \mathcal{D}_t \Big] \prod_{n=j+1}^{J-1} E\Big[ \big( f_n^{\sigma(t)} \big)^2 \big( 1 + \Phi_n^{\gamma_n(t)} \big) \,\Big|\, \mathcal{D}_t \Big] + C_{i,t-i}^2 \Bigg[ \prod_{j=t-i}^{J-1} E\Big[ \big( f_j^{\sigma(t)} \big)^2 \big( 1 + \Phi_j^{\gamma_j(t)} \big) \,\Big|\, \mathcal{D}_t \Big] - \prod_{j=t-i}^{J-1} E\Big[ \big( f_j^{\sigma(t)} \big)^2 \,\Big|\, \mathcal{D}_t \Big] \Bigg],
\]
with
\[
\Phi_j^{\gamma_j(t)} \;=\; \frac{ \sigma_j^2 }{ \sum_{\ell=1}^{(t-j-1)\wedge I} C_{\ell,j} + \sigma_j^2 (\gamma_j - 2) },
\]
and conditional BCL factor
\[
f_j^{\sigma(t)} \;=\; \omega_j^{(t)} f_j^{\mathrm{CL}(t)} + \big( 1 - \omega_j^{(t)} \big) f_j. \tag{A.8}
\]
Recall (A.2), which says $\underline{\omega}_j^{(t)} \le \omega_j^{(t)} \le 1$, $P$-a.s., with the $\mathcal{D}_{t-1}$-measurable lower bound $\underline{\omega}_j^{(t)}$ converging to 1 as $\gamma_j \to 1$. This implies, $P$-a.s.,
\[
\underline{\omega}_j^{(t)} f_j^{\mathrm{CL}(t)} \;\le\; f_j^{\sigma(t)} \;\le\; f_j^{\mathrm{CL}(t)} + \big( 1 - \underline{\omega}_j^{(t)} \big) f_j, \tag{A.9}
\]
with both bounds being $\mathcal{D}_t$-measurable. Therefore, we obtain
\[
\lim_{\boldsymbol{\gamma} \to \mathbf{1}} E\big[ f_j^{\sigma(t)} \,\big|\, \mathcal{D}_t \big] = f_j^{\mathrm{CL}(t)} \qquad \text{and} \qquad \lim_{\boldsymbol{\gamma} \to \mathbf{1}} E\Big[ \big( f_j^{\sigma(t)} \big)^2 \,\Big|\, \mathcal{D}_t \Big] = \big( f_j^{\mathrm{CL}(t)} \big)^2.
\]
Using (A.9) once more we have (note that (3.4) implies $\Phi_j^{\gamma_j(t)}$