A Catalog of Noninformative Priors
Ruoyong Yang and James O. Berger Department of Statistics, Purdue University
August, 1996
Abstract
PRELIMINARY DRAFT: This draft of the catalog is incomplete, with much remaining to be lled in and/or added. We are circulating this crude draft in the hopes that readers will know of relevant information that should be added. ]
A variety of methods of deriving noninformative priors have been developed, and applied to a wide variety of statistical models. In this paper we provide a catalog of many of the resulting priors, and list known properties of the priors. Emphasis is given to reference priors and the Jereys prior, although other approaches are also considered.
Key words and phrases
. Je reys prior, reference prior, maximal data informa- tion prior.1 Introduction
1.1 Motivation
The literature on noninformative priors has grown enormously over recent years. There have been several excellent books or review articles that have been concerned with discussing or comparing di erent approaches to developing noninformative priors (e.g., Kass and Wasserman, 1993), but there has been no systematic e ort to catalog the noninformative priors that have been developed. Since use of noninformative priors is becoming routine in Bayesian practice, preparation of such a catalog seemed in order.
ThisresearchwassupportedbyNSFgrantsDMS-8923071andDMS-9303556atPurdueUniversity.
Although general discussion is not the purpose of this catalog, it is useful to review the numerous reasons that noninformative priors are important to Bayesian analysis:
(i) Frequently, elicitation of subjective prior distributions is impossible, because of time or cost limitations, or resistance or lack of training of clients. Automatic or default prior distribu- tions are then needed.
(ii) The statistical analysis is often required to appear objective. Of course, true objectivity is virtually never attainable, and the prior distribution is usually the least of the problems in terms of objectivity, but use of a subjectively elicited prior signicantly reduces the appearance of objectivity. Noninformative priors not only preserve this appearance, but can be argued to result in analyses that are more objective than most classical analyses.
(iii) Subjective elicitation can easily result in poor prior distributions, because of systematic elicitation bias and the fact that elicitation typically yields only a few features of the prior, with the rest of the prior (e.g, its functional form) being chosen in a convenient, but possibly inappropriate, way. It is thus good practice to compare answers from a subjective analysis with answers from a noninformative prior analysis. If there are substantial di erences, it is important to check that the di erences are due to features of the prior that are trusted, and not due to either unelicited \convenience" features of the prior, or suspect elicitations.
(iv) In high dimensional problems, the best one can typically hope for is to develop subjective priors for the \important" parameters, with the unimportant or \nuisance" parameters being given noninformative priors.
(v) Good noninformative priors can be somewhat magical in multiparameter problems. As an example, the Je reys prior seems to almost always yield a proper posterior distribution. This is \magical," in that the common constant (or uniform) prior will much more frequently fail to yield a proper posterior. Even better, the reference prior approach has repeatedly yielded multiparameter priors that overcome limitations of the Je reys prior, and yield surprisingly good performance from almost any perspective. The point here is that, in multiparameter problems, inappropriate aspects of priors (even proper ones) can accumulate across dimensions in very detrimental ways reference priors seem to \magically" avoid such inappropriate accumulation.
(vi) Bayesian analysis with noninformative priors is being increasingly recognized as a method for classical statisticians to obtain good classical procedures. For instance, the frequentist-
matching approach to developing noninformative priors is based on ensuring that one has Bayesian credible sets with good frequentist properties, and it turns out that this is proba- bly the best way to nd good frequentist condence sets.
1.2 Approaches to Development of Noninformative Priors
We do not attempt a thorough discussion of the various approaches. See, e.g., Kass and Wasserman (1993), for such discussion. We primarily will just dene the various approaches, and give relevant references.
The Uniform Prior: By this, we just mean the constant density, with the constant typically being chosen to be 1 (unless the constant can be chosen to yield a proper density). This choice was, of course, popularized by Laplace (1812).
The Je reys Prior: This is dened as (
) = pdet(I
()), whereI
() is the Fisher infor- mation matrix. This was proposed in Je reys (1961), as a solution to the problem that the uniform prior does not yield an analysis invariant to choice of parameterization. Note that, in specic situations, Je reys often recommended noninformative priors that di ered from the formal Je reys prior.The Reference Prior: This approach was developed in Bernardo (1979), and modied for multiparameter problems in Berger and Bernardo (1992c). The approach cannot be simply described, but it can be roughly thought of as trying to modify the Je reys prior by reducing the dependence among parameters that is frequently induced by the Je reys prior there are many well-known examples in which the Je reys prior yields poor performance (even inconsistency) because of this dependence.
The Maximal Data Information Prior (MDIP): This approach was developed in Zellner (1971), based on an information argument. It is given by (
) = expfRp
(x
j)logp
(x
j)dx
g, wherep
(x
j) is the data density function.2 Organization and Notation
The catalog is organized around statistical models, with the models being listed in alpha- betical order. Each model-entry is kept as self-contained as possible. Listed for each are (i) the
model density (ii) various noninformative priors and (iii) certain of the resulting posteriors and their properties. Category (iii) information is often very limited.
Notation is standard. This include (
jD
), jA
j =determinant ofA
, () is a density w.r.t.d
.Important Notation: Noninformative priors that are proper (i.e., integrate to 1) are given in bold type. Others are improper. (The distinction is important for testing problems, where proper distributions are typically needed for estimation and prediction, improper noninformative priors are typically ne.)
3 AR(1)
The AR(1) model, in which the data
X
= (X
1:::X
T) follow the modelX
t=X
t;1+twhere the
t are i.i.d.N
(0 2).The expressions below are for
2 known. If2 is unknown, multiply by 1d
(or 12d
2).Prior (
) (Marginal) PosteriorUniform 1
Je reys h1;T2 +1;1;22TfEX202];1;12gi1=2
Reference1 expf12
E
log(PTi=1X
i2;1)]g all are properReference
28
>
<
>
:
1
=
2 p1;2] if jj<
1 1=
2 jjp2;1] if jj>
1:
1. Nonasymptotic reference prior.2. Symmetrized reference prior, recommended for typical use. See Berger and Yang (1992) for comparison of the noninformative priors.
4 Behrens-Fisher Problem
Let
x
1:::x
n be i.i.d. observations fromN
(2 andy
1:::y
n be i.i.d. observations fromN
(2. The parameters of interest are =; and =+.Liseo (1993) computed the Je reys prior and reference prior for this problem as Prior (
22) (Marginal) PosteriorUniform 1
Je reys 1
=
()3 properReference1 1
=
()21. Independent of the group ordering of the parameters.
5 Beta
The
Be
( ),>
0,>
0, density isf
(x
j ) = ;(+);(
);()x
;1(1;x
);1I
01](x
):
The Fisher information matrix isI
( ) =0
B
@
PG
(1);PG
(1+) ;PG
(1+);
PG
(1+)PG
(1);PG
(1+)1
C
A
where
PG
(1x
) = P1i=0(x
+i
);2 is the PolyGamma function. The Je reys prior is thus the square root of the Fisher information matrix.6 Binomial
The
B
(n p
), 0p
1, density isf
(x
jn p
) =0
B
@
n x
1
C
A
p
x(1;p
)(n;x):
Case 1: Priors forp
, givenn
Prior (
p
jn
) (Marginal) PosteriorUniform
1Be
(p
jx
+ 1n
;x
+ 1)Je reys
1p
;1=2(1;p
);1=2Be
(p
jx
+12n
;x
+12)Reference
MDIP
1:
6186p
p(1;p
)(1;p) proper Novick and Hall's1p
;1(1;p
);1Be
(p
jx n
;x
)1. See Novick and Hall (1965). Note this prior is uniform in
= logp=
(1;p
).Case 2: Priors for
n
Prior (
n
) (Marginal) PosteriorUniform 1
Je reys
n
;1=2Reference1 1/n
Universal
2 2:865nlog (n)loglog (1n)loglog:::log (n)1. Discussed by Alba and Mendoza (1995).
2. Last term for which loglog
:::
log(n
)>
1. See Rissanen (1983).7 Bivariate Binomial
f
(rs
jpqm
) =0
B
@
m r
1
C
A
p
r(1;p
)m;r0
B
@
r s
1
C
A
q
s(1;q
)r;sfor
r
= 1::: m
ands
= 1:::r
. Polson and Wasserman (1990) compute the Fisher information matrix of this distribution asI
(p q
) =m
diag(fp(1;p)g;1 pfq(1;q)g;1).Prior (
pq
jm
) (Marginal) PosteriorUniform
1Je reys
21(1;p
);=2q
;1=2(1;q
);1=2 properReference
1 12p
;1=2(1;p
);1=2q
;1=2(1;q
);1=2Reference
2 12(1;p
);1=2q
;1=2(1;q
);1=2(1;pq
);1=2 Crowder and Sweeting's3p
;1(1;p
);1q
;1(1;q
);11. Parameter of interest is
p
orq
.2. Parameter of interest is
=pq
or=p
(1;q
)=
(1;pq
).3. See Crowder and Sweeting (1989).
8 Box-Cox Power Transformed Linear Model
Given observationsf
y
1:::y
ng, the model isz
() =8
>
<
>
:
yi;1
6= 0
ln
y
i = 0 =+x
ti+iwhere
is a (k
1) vector of regression coecients,x
iis a vector of covariates, andiN
(0 1) truncated at ;(+x
ti)=
.Je reys prior was obtained by Pericchi (1981),
(
)/p
() k+1 wherep
() is some unspecied prior for .Box and Cox (1964) proposed the prior
(
)/g
;(k+1)(;1);1 whereg
is the geometric mean of they
0s
.Based on the so called data-translated parameterization,
z
() =8
>
<
>
:
;1 +
x
ti+i 6= 0 ln+x
ti+i = 0corresponding to
= (;1)=
or ln, Wixley (1993) proposed the following two priors, ( )/g
;(k+1)(;1);1p
()where
g
is the geometric mean of they
0s
andp
() is some unspecied prior for . This prior is apart from the prior of Box and Cox (1964) only by a factorp
(). It is also the prior used inBox and Tiao (1992).
(
) / 1 +];(k+1)(1;1=);1p
()/
;(k+1)(;1);1p
()where
g
is the geometric mean of they
0s
andp
() is some unspecied prior for . This prior will give a very close resultant posterior distribution to that of Box and Cox (1964).9 Cauchy
The
C
( ), ;1< <
1,>
0, density isf
(x
j ) = 2+ (x
;)2]:
This is a location-scale parameter problem see that section for priors. Posterior analysis can be found in Spiegelhalter (1985) and Howlader and Weiss (1988).
10 Dirichlet
The
D
() density, wherePki=1x
i= 1, 0x
i1, and= (1:::
k)t, i>
0 for alli
, is given byf
(x
j) = ;(0)Qki=1;(
i)k
Y
i=1
x
(ii;1) where0=Pki=1i,The Fisher information matrix is
I
(1:::
k) =0
B
B
B
B
@
PG
(11);PG
(10) ;PG
(10) ...;
PG
(10)PG
(1k);PG
(10)1
C
C
C
C
A
where
PG
(1x
) =P1i=0(x
+i
);2is the PolyGamma function. The Je reys prior isjI
(1:::
k)j1=2.11 Exponential Regression Model
See Ye and Berger (1991).
Y
ijN
(+x+xia 2)where 2
R
, 0< <
1,x
0a >
0,x
anda
known constants, 0i
k
;1 1j
m
, thex
i's are known nonnegative regressors withx
i 6=x
j fori
6=j
and the variance 2>
0 is an unknown constant. It is assumed thatx
i< x
j fori < j
.Prior (
) (Marginal) PosteriorUniform 1
adhoc 1
=
improperJe reys j
j2x;1p
1(a)=
4 proper 2-dimensional numerical integration Reference1 x;1p
(a)=
proper 1-dimensional Reference2 x;1p
(a)=
2 numerical integration Reference3 x;1p
(a)=
3where
p
(a) =p
1(a)=p
2(a), andp
1(w
) = (kX;1i=0
w
2xi;k
1(k;1
X
i=0
w
xi)2)(kX;1i=0
x
2iw
2xi ;1k
(kX;1
i=0
x
iw
xi)2);(kX;1
i=0
x
iw
2xi; 1k
kX;1
i=0
w
xikX;1i=0
x
iw
xi)2p
2(w
) = kX;1i=0
w
2xi;k
1(kX;1 i=0
w
xi)2:
1. Group ordering f
gorf ( ) gor with all permutations of
this is recommended for typical use, and appears to be approximately frequentist matching.
2. Group ordering f ( )g and with all permutations of
. 3. Group ordering f (
)g and with all permutations of
.
The marginal posterior of
for the prior;sx;1p
(a) is given by (jy
)p
(a) 1+ax0(1;a)h
(as y
) whereh
(s y
) =fs
2yy;md
2k(y
)=p
22()]km+s;3p
22() 2x0(1;)2g1=2with
d
k(y
) =kX;1i=0(
y
i:;y
)xi:
12 F Distribution
The
F
( ),>
0,>
0, density isf
(x
j ) = ;(+)=
2]=2=2;(
=
2);(=
2)x
=2;1(
+x
)(+)=2I
(01)(x
):
13 Gamma
The
G
( ),>
0,>
0, density isf
(x
j ) = 1;()x
;1e
;x=I
(01)(x
):
If
is known, is a scale parameter. The Je reys, reference, and MDIP priors are all 1=
, and the posterior is thereforeIG
(n
1=
Pni=1x
i).The Fisher information matrix is
I
() =0
B
@
PG
(1) 1=
1
= =
21
C
A
wherePG
(1x
) =P1i=0(x
+i
);2 is the PolyGamma function.Prior (
) (Marginal) PosteriorUniform 1
Je reys p
PG
(1);1=
Reference1 p(
PG
(1);1)==
proper3 Reference2 pPG
(1)=
1. Group ordering f g. 2. Group ordering f g.
3. See Liseo (1993) and Sun and Ye (1994b) for marginal posterior.
14 Generalized Linear Model
Let
y
1:::y
n be independent observations, with the exponential densityf
(y
iji ) = expfa
;1i ()(y
ii;b
(i)) +c
(y
i)gwhere the
a
i(),b
() andc
() are known functions, anda
i() is of the forma
i() ==w
i, where thew
i's are known weights. Thei's are related to regression coecients, = (1:::
p)t, by the link function i =(i)i
= 1:::n
where
i=x
ti, andx
ti= (x
i1:::x
ip) is a 1p
vector denoting thei
th row of then
p
matrix of covariatesX
, and is a monotonic di erentiable function.This model includes a large class of regression models, such as normal linear regression, logistic and probit regression, Poisson regression, gamma regression, and some proportional hazards models.
Ibrahim and Laud (1991) studied this model, using the Je reys prior. They focus on the case where
is known. The Fisher information matrix they obtained for is given byI
() = ;1(X
tWV
()2()X
), whereW
is ann
n
diagonal matrix withi
th diagonal elementw
i.V
() and () aren
n
diagonal matrices withi
th diagonal elementsv
i=v
(x
ti) =d
2b
(i)=d
2iand
i =(x
ti) =d
i=d
i, respectively. Je reys prior is thus given by ()/jX
tWV
()2()X
j1=2:
Ibrahim and Laud (1991) show that the Je reys prior can lead to an improper posterior for this model, but that the posterior is proper for most GLM's. A sucient condition for the posterior to be proper is that the integral
Z
Sexpf
;1w
(yr
;b
(r
))g(d
2b
(r
)dr
2 )1=2dr
be nite, the likelihood function of
be bounded above, andX
be of full rank. HereS
denotes the range of.In addition, Ibrahim and Laud (1991) give a sucient and necessary condition for the Je reys prior to be proper, namely
Z
S(
d
2b
(r
)dr
2 )1=2dr <
1:
When is unknown, the Fisher information matrix isI
( ) =0
B
@
I
1( ) 00
I
2( )1
C
A
whereI
1( );1(X
tWV
()2()X
) as above, andI
2( ) =;Xni=1
f2
w
i;3(_b
(i)i;b
(i)) +E
(c
(y
i))ghere _
b
(i) =db
(i)=d
i, andc
=@
2c
(y
i)=@
2. The Je reys prior is then the square root of the determinant ofI
( ).15 Inverse Gamma
The
IG
( ),>
0,>
0, density isf
(x
j ) = 1;(
)x
(+1)e
;1=xI
(01)(x
):
If
is known, 1=
is a scale parameter. The Je reys, reference, and MDIP prior is 1=
, and the posterior isIG
(n
1=
Pni=11=x
i).If
is unknown, the Fisher information matrix is the same as that of the Gamma distribution.Thus the reference prior and the Je reys prior are the same as for the Gamma distribution.
16 Inverse Normal or Gaussian
The
IN
( ),>
0,>
0, density isf
(x
j) = (2 )1=2
x
;3=2expf;x
2 (
;x
1)2gI
(01)(x
):
The Fisher information matrix is given byI
( ) = diag(=
1=
22).Prior (
) (Marginal) PosteriorUniform 1
Je reys 1
=
pScale 1
=
proper2Reference1
;1=2;1 1. See Liseo (1993).2. See Sun and Ye (1994b) for marginal posteriors.
Dene
=n
;1, = (u x=
);1=2,q
= (x
);1,x
= Px
i=n
. Banerjee and Bhattacharyya (1979) show that the marginal posterior of has a left-truncated t-distribution with degrees of freedom, location parameter 1= x
, and scale parameterq
, the point of truncation being zero.i.e.,
(
jD
)/(nu=
2);n=2f1 + 1q
2(;1x
)2g;(+1)=2where
u
=x
r;1= x
, andx
r = 1nP(1=x
i). The marginal posterior of is the modied gamma distribution(
jD
)/ (nu=
2)=2;(
=
2) !((n= x
)1=2)H
() exp(;nu=
2)=2;1where !() is the standard normal cdf, and
H
() denotes the cdf of Student's t-distribution with degrees of freedom. The posterior mean and variance are also available see Banerjee and Bhattacharyya (1979) for more detail.Another parametrization of inverse Gaussian distribution, the
IN
(2),>
0, 2>
0, density isf
(x
j2) = (2 2);1=2x
;3=2expf;(x
;)2=
(222x
)gI
(01)(x
):
The Fisher information matrix is given byI
(2) = diag(;3;2(24);1).Prior (
2) (Marginal) PosteriorUniform 1
Je reys
;3=2;3 Reference1 ;3=2;21. See Datta and Ghosh (1993).
17 Linear Calibration
Consider the model
y
i=+x
i+1ii
= 1:::n y
0j =+x
0+2jj
= 1:::k
where
,y
i, andy
0j are (p
1) vectors,x
i's are known values of the precise measurements, the 1i and2j are i.i.d.N
p(02I
p). The object is to predictx
0.Denote
x
=Pni=1x
i=n
,y
=Pni=1y
i=n
,y
0=Pkj=1y
0j=k
,c
x=Pni=1(x
i;x
)2, ^ =Pni=1(x
i;x
)(y
i;y
)=c
x, ands
1 ands
2 be the two sums of residual squared errors based on the calibration and predication experiments, respectively. The statistics ^,y
,y
0, ands
=s
1+s
2 are minimal sucient and mutually independent. Kubokawa and Robert (1993) then consider the reduced modely
N
p(2I
p)z
N
p(x
02I
p)s
22qwhere
y
=c
1x=2^,z
= (y
0 ;y
)(n
;1 +k
;1);1=2,q
= (n
+k
;3)p
, =c
1x=2, andx
0 = (x
0;x
)c
;1x =2(n
;1+k
;1);1=2.The Fisher information matrix for the reduced model is given by
I
( 2x
0) =0
B
B
B
B
B
B
B
B
B
B
B
@
1+x02
2
I
p0...
0
x
0=
2 0 0 q2+24p 0x
0t=
2 0 jjjj2=
21
C
C
C
C
C
C
C
C
C
C
C
A
:
Prior (
x
02) (Marginal) PosteriorUniform 1
Je reys (1 +
x
02)(p;1)=2jjjj(2);(p+3)=2Reference1 (
2);(p+2)=2=
q1 +x
02 proper1. w.r.t. the group ordering f
x
0(2)g. See Kubokawa and Robert (1993).18 Location-Scale Parameter Models
Location Parameter Models
: TheLP
(), 2R
p, density isf
(y
j) =g
(y
;X
)where
X
is a (n
p
) constant matrix andg
is an
-dimensional density function.Prior (
) (Marginal) PosteriorUniform1 1 proper if
rank
(X
tX
) =p
Reference2 t(X
tX
)];(p;2) proper ifrank
(X
tX
) =p >
2 1. Also Je reys and the usual reference prior.2. Baranchik (1964) \shrinkage" prior also, the reference prior for (jj
jjO
), where =O
(jjjj0:::
0)t. This also arises from admissibility considerations see Berger and Strawder-man (1993).
Location-Scale Parameter Models
: TheLSP
(), 2R
p,>
0 density isf
(y
j) = 1ng
(1(y
;X
))where
X
is a (n
p
) constant matrix andg
is an
-dimensional density function.Prior (
) (Marginal) PosteriorUniform 1
Je reys 1
=
(p+1)Reference1 1
=
Reference2 1
t(X
tX
)];(p;2) Reference3 () /;1(c
12+c
2)1. Reference with respect to the group orderingf
g also the prior actually recommended by Je reys (1961).2. Baranchik (1964) \shrinkage" prior also, the reference prior with respect to the parame- ters fjj
jjO
g (see Location Parameter Models).3.
==
is parameter of interest, selecting rectangular compacts on and nuisance parameterc
12+c
22. See Datta and Ghosh (1993).Scale Parameter Models
:The
SP
(1:::
p), i>
0 for 8i
, density isf
(y
j1:::
p) = (Ypi=1
;i ni)g
( 11y
~112
y
~2:::
1py
~p)where
y
~i are (n
i1) vectors,i
= 1:::p
,g
is an
1+ +n
p-dimensional density function.Prior (
1:::
p) (Marginal) PosteriorUniform 1
Je reys
Reference Qpi=1
i;1 MDIP19 Logit Model
Poirier (1992) analyzes the Je reys prior for the conditional logit model as follows. An experiment consists of
N
trials. On trialn
exactly one ofJ
n discrete alternatives is observed.Let
y
nj be a binary variable which equals unity i alternativej
is observed on trialn
otherwisey
nj equals zero. Letz
n =z
tn1:::z
tnJn]t, with eachz
nj being aK
1 vector. is aK
1 vector of unknown parameters. The probability of alternativej
on trialn
is specied to bep
nj() =Prob
(y
nj = 1jz
n ) = exp(z
tnj)PJi=1n exp(
z
tni):
When
J
nJ
andz
nj =e
jx
n, wheree
j is a (J
;1)1 vector with all elements equal to zero except thej
th which equals unity andx
n is aM
1 vector of observable characteristics of trialn
, this is the multinomial logit model.Poirier (1992) gives the Je reys prior for this model as (
)/
N
X
n=1 Jn
X
j=1
p
nj()z
nj;z
~n()]z
nj;z
~n()]t
1=2
where ~
z
n() =PJi=1np
ni()z
ni. The following special cases are also given therein.Multinomial Logit Without Covariates
:Suppose
K
=J
;1 andz
nj =e
j then the Je reys prior reduces to the proper density () = ;(J=
2)J=2 ]YJ
j=1
p
j()]1=2where
p
j =p
nj for anyn
. The density forp
= (p
1:::p
J;1) is the Dirichlet density (p
) = ;(J=
2)J=2 J
Y
j=1
p
;1j=2:
In the binomial case,(
) = ;1fp
1()1;p
1()]g1=2 = exp(=
2) 1 + exp()]and
(
p
) = ;1p
1(1;p
1)];1=2:
The other competing noninformative prior are: the uniform prior on
the uniform prior onp
1 and the MDIP prior forp
1. See Geisser (1984) for discussion.Logistic Regression
:Suppose
y
ij i.i.d. Bernouli ( i), withi=
e
xti1 +
e
xti: i
= 1:::n:
Then the likelihood function is
f
(y
j) =e
Pni=1yixti=
Yni=1(1 +
e
xti) and the Fisher Information matrix isI
() = (x
1:::x
n)0
B
B
B
B
@
1(1; 1) ...
n(1; n)
1
C
C
C
C
A 0
B
B
B
B
@
x
t1x
...tn1
C
C
C
C
A
:
The Je reys prior is the square root of the determinant of above matrix.
For more special cases of the logit model, see Poirier (1992).
20 Mixed Model
Consider mixed model
y
ijk=B
ijkb+W
ijkw+T
i+C
j+ijki
= 1:::I j
= 1:::J k
= 1:::t
ijassuming
ijkN
(02), and independently,C
jN
(0 2c), with b, w,T
i,2, and 2c the parameters of interest.Box and Tiao (1992) used the following prior in the balanced mixed model case, with
t
ij =n
8ij
,(
c22)/ 1 2(2+n
2c):
Chaloner (1987) used three priors for this model, (
2+c2);1,;2(2+2c);1, and;2(2+ c2);3=2.Yang and Pyne (1996) derived the Je reys prior, (
2c2) /2
6
4Det
J
X
j=1
0
B
@
t2j
2(tj 2c+ 2)2 tj
2(tj 2c+ 2)2 tj
2(tj 2c+ 2)2 tj;1
2
4 +2(tj c21+ 2)2
1
C
A
3
7
5 1=2
wheret
j =PIi=1t
ij, and reference prior,(
c22)/ 1 2v
u
u
t
J
X
j=1
t
2j (t
jc2+2)2:
21 Mixture Model
For arbitrary density functions
p
1(x
) andp
2(x
), consider the modelp
(x
j) =p
1(x
) + (1;)p
2(x
):
Bernardo and Gir$on (1988) discussed the reference prior for this model. They found that the reference prior is always proper.