A Catalog of Noninformative Priors

Ruoyong Yang and James O. Berger
Department of Statistics, Purdue University

August, 1996

Abstract

[PRELIMINARY DRAFT: This draft of the catalog is incomplete, with much remaining to be filled in and/or added. We are circulating this crude draft in the hopes that readers will know of relevant information that should be added.]

A variety of methods of deriving noninformative priors have been developed, and applied to a wide variety of statistical models. In this paper we provide a catalog of many of the resulting priors, and list known properties of the priors. Emphasis is given to reference priors and the Jeffreys prior, although other approaches are also considered.

Key words and phrases. Jeffreys prior, reference prior, maximal data information prior.

1 Introduction

1.1 Motivation

The literature on noninformative priors has grown enormously over recent years. There have been several excellent books or review articles that have been concerned with discussing or comparing different approaches to developing noninformative priors (e.g., Kass and Wasserman, 1993), but there has been no systematic effort to catalog the noninformative priors that have been developed. Since use of noninformative priors is becoming routine in Bayesian practice, preparation of such a catalog seemed in order.

This research was supported by NSF grants DMS-8923071 and DMS-9303556 at Purdue University.

Although general discussion is not the purpose of this catalog, it is useful to review the numerous reasons that noninformative priors are important to Bayesian analysis:

(i) Frequently, elicitation of subjective prior distributions is impossible, because of time or cost limitations, or resistance or lack of training of clients. Automatic or default prior distributions are then needed.

(ii) The statistical analysis is often required to appear objective. Of course, true objectivity is virtually never attainable, and the prior distribution is usually the least of the problems in terms of objectivity, but use of a subjectively elicited prior significantly reduces the appearance of objectivity. Noninformative priors not only preserve this appearance, but can be argued to result in analyses that are more objective than most classical analyses.

(iii) Subjective elicitation can easily result in poor prior distributions, because of systematic elicitation bias and the fact that elicitation typically yields only a few features of the prior, with the rest of the prior (e.g., its functional form) being chosen in a convenient, but possibly inappropriate, way. It is thus good practice to compare answers from a subjective analysis with answers from a noninformative prior analysis. If there are substantial differences, it is important to check that the differences are due to features of the prior that are trusted, and not due to either unelicited "convenience" features of the prior, or suspect elicitations.

(iv) In high dimensional problems, the best one can typically hope for is to develop subjective priors for the "important" parameters, with the unimportant or "nuisance" parameters being given noninformative priors.

(v) Good noninformative priors can be somewhat magical in multiparameter problems. As an example, the Jeffreys prior seems to almost always yield a proper posterior distribution. This is "magical," in that the common constant (or uniform) prior will much more frequently fail to yield a proper posterior. Even better, the reference prior approach has repeatedly yielded multiparameter priors that overcome limitations of the Jeffreys prior, and yield surprisingly good performance from almost any perspective. The point here is that, in multiparameter problems, inappropriate aspects of priors (even proper ones) can accumulate across dimensions in very detrimental ways; reference priors seem to "magically" avoid such inappropriate accumulation.

(vi) Bayesian analysis with noninformative priors is being increasingly recognized as a method for classical statisticians to obtain good classical procedures. For instance, the frequentist-matching approach to developing noninformative priors is based on ensuring that one has Bayesian credible sets with good frequentist properties, and it turns out that this is probably the best way to find good frequentist confidence sets.

1.2 Approaches to Development of Noninformative Priors

We do not attempt a thorough discussion of the various approaches. See, e.g., Kass and Wasserman (1993), for such discussion. We primarily will just define the various approaches, and give relevant references.

The Uniform Prior: By this, we just mean the constant density, with the constant typically being chosen to be 1 (unless the constant can be chosen to yield a proper density). This choice was, of course, popularized by Laplace (1812).

The Jeffreys Prior: This is defined as $\pi(\theta) = \sqrt{\det(I(\theta))}$, where $I(\theta)$ is the Fisher information matrix. This was proposed in Jeffreys (1961), as a solution to the problem that the uniform prior does not yield an analysis invariant to choice of parameterization. Note that, in specific situations, Jeffreys often recommended noninformative priors that differed from the formal Jeffreys prior.
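To make the definition concrete, here is a minimal sketch (an illustration added for this catalog entry, not code from any cited reference) that computes the Fisher information of a Binomial(n, p) observation directly as the expected squared score, and takes its square root to obtain the unnormalized Jeffreys prior. The function names are hypothetical.

```python
from math import comb, sqrt


def binomial_fisher_information(n: int, p: float) -> float:
    """Fisher information E[(d/dp log f(X | n, p))^2] for X ~ Binomial(n, p)."""
    total = 0.0
    for x in range(n + 1):
        pmf = comb(n, x) * p**x * (1 - p) ** (n - x)
        # Score: derivative of the log-likelihood with respect to p.
        score = x / p - (n - x) / (1 - p)
        total += pmf * score**2
    return total


def jeffreys_prior_unnormalized(n: int, p: float) -> float:
    """Square root of the (scalar) Fisher information, i.e. sqrt(det I(p))."""
    return sqrt(binomial_fisher_information(n, p))
```

The sum should agree with the closed form n / (p(1 - p)), so the resulting prior is proportional to p^(-1/2) (1 - p)^(-1/2), which is the Jeffreys entry listed for the Binomial model.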

The Reference Prior: This approach was developed in Bernardo (1979), and modified for multiparameter problems in Berger and Bernardo (1992c). The approach cannot be simply described, but it can be roughly thought of as trying to modify the Jeffreys prior by reducing the dependence among parameters that is frequently induced by the Jeffreys prior; there are many well-known examples in which the Jeffreys prior yields poor performance (even inconsistency) because of this dependence.

The Maximal Data Information Prior (MDIP): This approach was developed in Zellner (1971), based on an information argument. It is given by $\pi(\theta) = \exp\{\int p(x \mid \theta) \log p(x \mid \theta)\, dx\}$, where $p(x \mid \theta)$ is the data density function.
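As an illustration of the MDIP formula, the sketch below (an added example; the function names and the midpoint-rule integration are assumptions of this sketch, not part of Zellner's development) evaluates the MDIP kernel for a single Bernoulli trial, where the integral becomes a two-term sum, and then estimates the normalizing constant numerically.

```python
from math import exp, log


def mdip_kernel(p: float) -> float:
    """exp{ sum_x f(x | p) log f(x | p) } for one Bernoulli(p) trial.

    This equals p^p (1 - p)^(1 - p), with the convention 0 * log(0) = 0.
    """
    def xlogx(t: float) -> float:
        return t * log(t) if t > 0.0 else 0.0

    return exp(xlogx(p) + xlogx(1.0 - p))


def mdip_normalizing_constant(steps: int = 20_000) -> float:
    """Reciprocal of the integral of the kernel over (0, 1), by the midpoint rule."""
    h = 1.0 / steps
    integral = sum(mdip_kernel((i + 0.5) * h) for i in range(steps)) * h
    return 1.0 / integral
```

The estimated constant should come out close to the value 1.6186 quoted for the MDIP row of the Binomial entry in this catalog.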

2 Organization and Notation

The catalog is organized around statistical models, with the models being listed in alphabetical order. Each model-entry is kept as self-contained as possible. Listed for each are (i) the model density; (ii) various noninformative priors; and (iii) certain of the resulting posteriors and their properties. Category (iii) information is often very limited.

Notation is standard. This includes $\pi(\theta \mid D)$; $|A|$ = determinant of $A$; and $\pi(\theta)$ is a density w.r.t. $d\theta$.

Important Notation: Noninformative priors that are proper (i.e., integrate to 1) are given in bold type. Others are improper. (The distinction is important for testing problems, where proper distributions are typically needed; for estimation and prediction, improper noninformative priors are typically fine.)

3 AR(1)

In the AR(1) model, the data $X = (X_1, \ldots, X_T)$ follow

$$X_t = \rho X_{t-1} + \epsilon_t,$$

where the $\epsilon_t$ are i.i.d. $N(0, \sigma^2)$.

The expressions below are for $\sigma^2$ known. If $\sigma^2$ is unknown, multiply by $\frac{1}{\sigma}\, d\sigma$ (or $\frac{1}{\sigma^2}\, d\sigma^2$).

Prior          $\pi(\rho)$
Uniform        $1$
Jeffreys       $\left[ \frac{T}{1-\rho^2} + \frac{1-\rho^{2T}}{1-\rho^2} \left\{ E[X_0^2] - \frac{1}{1-\rho^2} \right\} \right]^{1/2}$
Reference^1    $\exp\left\{ \frac{1}{2} E\left[ \log\left( \sum_{i=1}^{T} X_{i-1}^2 \right) \right] \right\}$
Reference^2    $1/[2\pi\sqrt{1-\rho^2}]$ if $|\rho| < 1$;  $1/[2\pi|\rho|\sqrt{\rho^2-1}]$ if $|\rho| > 1$

(Marginal) Posterior: all are proper.

1. Nonasymptotic reference prior.

2. Symmetrized reference prior, recommended for typical use. See Berger and Yang (1992) for comparison of the noninformative priors.
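As a quick check on the symmetrized reference prior, the following worked integral (a sketch added here, not taken from Berger and Yang) confirms that the density $1/[2\pi\sqrt{1-\rho^2}]$ on $|\rho| < 1$, together with $1/[2\pi|\rho|\sqrt{\rho^2-1}]$ on $|\rho| > 1$, integrates to one, so that this prior is proper.

```latex
\begin{align*}
\int_{-1}^{1} \frac{d\rho}{2\pi\sqrt{1-\rho^{2}}}
  &= \frac{1}{2\pi}\Big[\arcsin\rho\Big]_{-1}^{1} = \frac{1}{2}, \\
\int_{|\rho|>1} \frac{d\rho}{2\pi|\rho|\sqrt{\rho^{2}-1}}
  &= \frac{2}{2\pi}\Big[\operatorname{arcsec}\rho\Big]_{1}^{\infty}
   = \frac{1}{\pi}\cdot\frac{\pi}{2} = \frac{1}{2}.
\end{align*}
```

The two regions each carry mass 1/2, which is the sense in which the prior is symmetrized between the stationary and explosive cases.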

4 Behrens-Fisher Problem

Let $x_1, \ldots, x_n$ be i.i.d. observations from $N(\mu, \sigma^2)$ and $y_1, \ldots, y_n$ be i.i.d. observations from $N(\eta, \tau^2)$. The parameters of interest are $\theta = \mu - \eta$ and $\lambda = \mu + \eta$.

Liseo (1993) computed the Jeffreys prior and reference prior for this problem as

Prior          $\pi(\mu, \eta, \sigma^2, \tau^2)$     (Marginal) Posterior
Uniform        $1$
Jeffreys       $1/(\sigma\tau)^3$                     proper
Reference^1    $1/(\sigma\tau)^2$

1. Independent of the group ordering of the parameters.

5 Beta

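The $Be(\alpha, \beta)$, $\alpha > 0$, $\beta > 0$, density is

$$f(x \mid \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1} I_{[0,1]}(x).$$

The Fisher information matrix is

$$I(\alpha, \beta) = \begin{pmatrix} PG(1,\alpha) - PG(1,\alpha+\beta) & -PG(1,\alpha+\beta) \\ -PG(1,\alpha+\beta) & PG(1,\beta) - PG(1,\alpha+\beta) \end{pmatrix},$$

where $PG(1, x) = \sum_{i=0}^{\infty} (x+i)^{-2}$ is the PolyGamma function. The Jeffreys prior is thus the square root of the determinant of the Fisher information matrix.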
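The Jeffreys prior for the Beta model has no simpler closed form, but it is easy to evaluate numerically. The sketch below is an added illustration using only the standard library; the helper names are hypothetical, and the trigamma routine (recurrence plus a truncated asymptotic series) is one common way to compute $PG(1, x)$, chosen here as an assumption of the sketch.

```python
from math import sqrt


def trigamma(x: float) -> float:
    """PG(1, x) = sum_{i >= 0} (x + i)^(-2), for x > 0."""
    value = 0.0
    # Shift x upward with the recurrence PG(1, x) = PG(1, x + 1) + 1 / x^2.
    while x < 6.0:
        value += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # Asymptotic series: 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7).
    value += inv + 0.5 * inv2 + inv * inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 / 42.0))
    return value


def beta_jeffreys_unnormalized(alpha: float, beta: float) -> float:
    """Square root of the determinant of the 2 x 2 Fisher information matrix."""
    t_a, t_b, t_ab = trigamma(alpha), trigamma(beta), trigamma(alpha + beta)
    det = (t_a - t_ab) * (t_b - t_ab) - t_ab * t_ab
    return sqrt(det)
```

For example, `beta_jeffreys_unnormalized(0.5, 2.0)` evaluates the unnormalized prior density at $(\alpha, \beta) = (0.5, 2)$; by symmetry of the information matrix the same value is obtained with the arguments swapped.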

6 Binomial

The $B(n, p)$, $0 \le p \le 1$, density is

$$f(x \mid n, p) = \binom{n}{x} p^x (1-p)^{n-x}.$$

Case 1: Priors for $p$, given $n$

Prior                 $\pi(p \mid n)$                            (Marginal) Posterior
Uniform               $1$                                        $Be(p \mid x+1,\ n-x+1)$
Jeffreys              $\pi^{-1} p^{-1/2}(1-p)^{-1/2}$            $Be(p \mid x+\frac{1}{2},\ n-x+\frac{1}{2})$
Reference             (same as Jeffreys)
MDIP                  $1.6186\, p^{p}(1-p)^{(1-p)}$              proper
Novick and Hall's^1   $p^{-1}(1-p)^{-1}$                         $Be(p \mid x,\ n-x)$

1. See Novick and Hall (1965). Note this prior is uniform in $\theta = \log[p/(1-p)]$.

Case 2: Priors for $n$

Prior          $\pi(n)$
Uniform        $1$
Jeffreys       $n^{-1/2}$
Reference^1    $1/n$
Universal^2    $\frac{1}{2.865\, n} \cdot \frac{1}{\log(n)} \cdot \frac{1}{\log\log(n)} \cdots$

1. Discussed by Alba and Mendoza (1995).

2. Last term for which $\log\log\cdots\log(n) > 1$. See Rissanen (1983).

7 Bivariate Binomial

$$f(r, s \mid p, q, m) = \binom{m}{r} p^r (1-p)^{m-r} \binom{r}{s} q^s (1-q)^{r-s}$$

for $r = 1, \ldots, m$ and $s = 1, \ldots, r$. Polson and Wasserman (1990) compute the Fisher information matrix of this distribution as $I(p, q) = m\, \mathrm{diag}(\{p(1-p)\}^{-1},\ p\{q(1-q)\}^{-1})$.

Prior                       $\pi(p, q \mid m)$                                                  (Marginal) Posterior
Uniform                     $1$
Jeffreys                    $(2\pi)^{-1}(1-p)^{-1/2} q^{-1/2}(1-q)^{-1/2}$                      proper
Reference^1                 $\pi^{-2} p^{-1/2}(1-p)^{-1/2} q^{-1/2}(1-q)^{-1/2}$
Reference^2                 $\pi^{-2}(1-p)^{-1/2} q^{-1/2}(1-q)^{-1/2}(1-pq)^{-1/2}$
Crowder and Sweeting's^3    $p^{-1}(1-p)^{-1} q^{-1}(1-q)^{-1}$

1. Parameter of interest is $p$ or $q$.

2. Parameter of interest is $\psi = pq$ or $\phi = p(1-q)/(1-pq)$.

3. See Crowder and Sweeting (1989).
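The Jeffreys row of the table follows directly from the diagonal Fisher information matrix of Polson and Wasserman (1990). The short derivation below is an added worked step (not quoted from that paper) showing both the functional form and the normalizing constant $1/(2\pi)$.

```latex
\begin{align*}
|I(p,q)| &= m^{2}\cdot\frac{1}{p(1-p)}\cdot\frac{p}{q(1-q)}
          = \frac{m^{2}}{(1-p)\,q\,(1-q)}, \\
\sqrt{|I(p,q)|} &\propto (1-p)^{-1/2}\,q^{-1/2}(1-q)^{-1/2}, \\
\int_{0}^{1}(1-p)^{-1/2}\,dp &= 2, \qquad
\int_{0}^{1} q^{-1/2}(1-q)^{-1/2}\,dq = B\!\left(\tfrac12,\tfrac12\right) = \pi .
\end{align*}
```

The product of the two integrals is $2\pi$, so the Jeffreys prior is proper with constant $(2\pi)^{-1}$.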

8 Box-Cox Power Transformed Linear Model

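Given observations $\{y_1, \ldots, y_n\}$, the model is

$$z_i^{(\lambda)} = \begin{cases} (y_i^{\lambda} - 1)/\lambda, & \lambda \neq 0 \\ \ln y_i, & \lambda = 0 \end{cases} \;=\; \beta_0 + x_i^t \beta + \sigma \epsilon_i,$$

where $\beta$ is a $(k \times 1)$ vector of regression coefficients, $x_i$ is a vector of covariates, and $\epsilon_i \sim N(0, 1)$ truncated at $-(\beta_0 + x_i^t \beta)/\sigma$.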

Jeffreys prior was obtained by Pericchi (1981),

$$\pi(\beta_0, \beta, \sigma, \lambda) \propto \frac{p(\lambda)}{\sigma^{k+1}},$$

where $p(\lambda)$ is some unspecified prior for $\lambda$.

Box and Cox (1964) proposed the prior

$$\pi(\beta_0, \beta, \sigma, \lambda) \propto g^{-(k+1)(\lambda-1)}\, \sigma^{-1},$$

where $g$ is the geometric mean of the $y$'s.

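Based on the so-called data-translated parameterization,

$$z_i^{(\lambda)} = \begin{cases} (\phi^{\lambda} - 1)/\lambda + x_i^t \beta + \sigma \epsilon_i, & \lambda \neq 0 \\ \ln \phi + x_i^t \beta + \sigma \epsilon_i, & \lambda = 0, \end{cases}$$

corresponding to $\beta_0 = (\phi^{\lambda} - 1)/\lambda$ or $\ln \phi$, Wixley (1993) proposed the following two priors. The first is

$$\pi(\beta_0, \beta, \sigma, \lambda) \propto g^{-(k+1)(\lambda-1)}\, \sigma^{-1}\, p(\lambda),$$

where $g$ is the geometric mean of the $y$'s and $p(\lambda)$ is a prior for $\lambda$. This prior differs from the prior of Box and Cox (1964) only by the factor $p(\lambda)$. It is also the prior used in Box and Tiao (1992). The second is

$$\pi(\phi, \beta, \sigma, \lambda) \propto [1 + \lambda\beta_0]^{-(k+1)(1-1/\lambda)}\, \sigma^{-1}\, p(\lambda) \propto \phi^{-(k+1)(\lambda-1)}\, \sigma^{-1}\, p(\lambda),$$

with $p(\lambda)$ again a prior for $\lambda$. This prior will give a resultant posterior distribution very close to that of Box and Cox (1964).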

9 Cauchy

The $C(\alpha, \beta)$, $-\infty < \alpha < \infty$, $\beta > 0$, density is

$$f(x \mid \alpha, \beta) = \frac{\beta}{\pi[\beta^2 + (x-\alpha)^2]}.$$

This is a location-scale parameter problem; see that section for priors. Posterior analysis can be found in Spiegelhalter (1985) and Howlader and Weiss (1988).

10 Dirichlet

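The $D(\alpha)$ density, where $\sum_{i=1}^{k} x_i = 1$, $0 \le x_i \le 1$, and $\alpha = (\alpha_1, \ldots, \alpha_k)^t$, $\alpha_i > 0$ for all $i$, is given by

$$f(x \mid \alpha) = \frac{\Gamma(\alpha_0)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \prod_{i=1}^{k} x_i^{(\alpha_i - 1)},$$

where $\alpha_0 = \sum_{i=1}^{k} \alpha_i$. The Fisher information matrix is

$$I(\alpha_1, \ldots, \alpha_k) = \begin{pmatrix} PG(1,\alpha_1) - PG(1,\alpha_0) & \cdots & -PG(1,\alpha_0) \\ \vdots & \ddots & \vdots \\ -PG(1,\alpha_0) & \cdots & PG(1,\alpha_k) - PG(1,\alpha_0) \end{pmatrix},$$

where $PG(1, x) = \sum_{i=0}^{\infty} (x+i)^{-2}$ is the PolyGamma function. The Jeffreys prior is $|I(\alpha_1, \ldots, \alpha_k)|^{1/2}$.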
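The determinant in the Jeffreys prior for the Dirichlet model can be written in closed form. The step below is an added remark (a standard application of the matrix determinant lemma, not a formula stated in the original entry), using the fact that the Fisher information matrix is a diagonal matrix minus a constant multiple of the all-ones matrix.

```latex
\begin{align*}
I(\alpha) &= D - c\,\mathbf{1}\mathbf{1}^{t},
  \qquad D = \mathrm{diag}\big(PG(1,\alpha_1),\ldots,PG(1,\alpha_k)\big),
  \quad c = PG(1,\alpha_0), \\
|I(\alpha)| &= |D|\,\big(1 - c\,\mathbf{1}^{t}D^{-1}\mathbf{1}\big)
  = \Big[\prod_{i=1}^{k} PG(1,\alpha_i)\Big]
    \Big[1 - PG(1,\alpha_0)\sum_{i=1}^{k}\frac{1}{PG(1,\alpha_i)}\Big].
\end{align*}
```

For $k = 2$ this reduces to the determinant of the $2 \times 2$ matrix given in the Beta entry.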

11 Exponential Regression Model

See Ye and Berger (1991). The model is

$$Y_{ij} \sim N(\alpha + \beta\rho^{x + x_i a},\ \sigma^2),$$

where $\alpha, \beta \in R$, $0 < \rho < 1$, $x \ge 0$, $a > 0$, $x$ and $a$ known constants, $0 \le i \le k-1$, $1 \le j \le m$, the $x_i$'s are known nonnegative regressors with $x_i \neq x_j$ for $i \neq j$, and the variance $\sigma^2 > 0$ is an unknown constant. It is assumed that $x_i < x_j$ for $i < j$.

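Prior          $\pi(\alpha, \beta, \sigma, \rho)$                  (Marginal) Posterior
Uniform        $1$
ad hoc         $1/\sigma$                                          improper
Jeffreys       $|\beta|\, \rho^{2x-1} p_1(\rho^a)/\sigma^4$        proper; 2-dimensional numerical integration
Reference^1    $\rho^{x-1} p(\rho^a)/\sigma$                       proper; 1-dimensional numerical integration
Reference^2    $\rho^{x-1} p(\rho^a)/\sigma^2$                     (as for Reference^1)
Reference^3    $\rho^{x-1} p(\rho^a)/\sigma^3$                     (as for Reference^1)

where $p(\rho^a) = p_1(\rho^a)/p_2(\rho^a)$, and

$$p_1(w) = \Big(\sum_{i=0}^{k-1} w^{2x_i} - \frac{1}{k}\Big(\sum_{i=0}^{k-1} w^{x_i}\Big)^2\Big)\Big(\sum_{i=0}^{k-1} x_i^2 w^{2x_i} - \frac{1}{k}\Big(\sum_{i=0}^{k-1} x_i w^{x_i}\Big)^2\Big) - \Big(\sum_{i=0}^{k-1} x_i w^{2x_i} - \frac{1}{k}\sum_{i=0}^{k-1} w^{x_i} \sum_{i=0}^{k-1} x_i w^{x_i}\Big)^2,$$

$$p_2(w) = \sum_{i=0}^{k-1} w^{2x_i} - \frac{1}{k}\Big(\sum_{i=0}^{k-1} w^{x_i}\Big)^2.$$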

1. Group ordering ^f

^gor^f

( ) ^gor with all permutations of

this is recommended for typical use, and appears to be approximately frequentist matching.

2. Group ordering ^f ( )^g and with all permutations of

. 3. Group ordering ^f

(

)^g and with all permutations of

.

The marginal posterior of $\rho$ for the prior $\sigma^{-s} \rho^{x-1} p(\rho^a)$ is given by

$$\pi(\rho \mid y) \propto \frac{p(\rho^a)}{\rho^{1+ax_0}(1-\rho^a)\, h(\rho^a, s, y)},$$

where

$$h(w, s, y) = \{[s_{yy}^2 - m\, d_k^2(y)/p_2^2(w)]^{km+s-3}\, p_2^2(w)\, w^{2x_0}(1-w)^2\}^{1/2}$$

with

$$d_k(y) = \sum_{i=0}^{k-1} (\bar{y}_{i\cdot} - \bar{y})\, w^{x_i}.$$

12 F Distribution

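The $F(\alpha, \beta)$, $\alpha > 0$, $\beta > 0$, density is

$$f(x \mid \alpha, \beta) = \frac{\Gamma[(\alpha+\beta)/2]\, \alpha^{\alpha/2} \beta^{\beta/2}}{\Gamma(\alpha/2)\Gamma(\beta/2)} \cdot \frac{x^{\alpha/2-1}}{(\beta + \alpha x)^{(\alpha+\beta)/2}}\, I_{(0,\infty)}(x).$$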

13 Gamma

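The $G(\alpha, \beta)$, $\alpha > 0$, $\beta > 0$, density is

$$f(x \mid \alpha, \beta) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, x^{\alpha-1} e^{-x/\beta}\, I_{(0,\infty)}(x).$$

If $\alpha$ is known, $\beta$ is a scale parameter. The Jeffreys, reference, and MDIP priors are all $1/\beta$, and the posterior is therefore $IG(n\alpha,\ 1/\sum_{i=1}^{n} x_i)$.

If $\alpha$ is unknown, the Fisher information matrix is

$$I(\alpha, \beta) = \begin{pmatrix} PG(1,\alpha) & 1/\beta \\ 1/\beta & \alpha/\beta^2 \end{pmatrix},$$

where $PG(1, x) = \sum_{i=0}^{\infty} (x+i)^{-2}$ is the PolyGamma function.

Prior          $\pi(\alpha, \beta)$                                   (Marginal) Posterior
Uniform        $1$
Jeffreys       $\sqrt{\alpha PG(1,\alpha) - 1}\,/\beta$
Reference^1    $\sqrt{(\alpha PG(1,\alpha) - 1)/\alpha}\,/\beta$      proper^3
Reference^2    $\sqrt{PG(1,\alpha)}\,/\beta$

1. Group ordering $\{\alpha, \beta\}$.

2. Group ordering $\{\beta, \alpha\}$.

3. See Liseo (1993) and Sun and Ye (1994b) for marginal posterior.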
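The Jeffreys row for the Gamma model can be checked in one line from the Fisher information matrix. This is an added worked step, using the matrix with entries $PG(1,\alpha)$, $1/\beta$, $1/\beta$, and $\alpha/\beta^2$ as given in this entry.

```latex
\begin{align*}
|I(\alpha,\beta)|
  &= PG(1,\alpha)\cdot\frac{\alpha}{\beta^{2}} - \frac{1}{\beta^{2}}
   = \frac{\alpha\,PG(1,\alpha) - 1}{\beta^{2}}, \\
\pi_{J}(\alpha,\beta) &= |I(\alpha,\beta)|^{1/2}
   = \frac{\sqrt{\alpha\,PG(1,\alpha) - 1}}{\beta}.
\end{align*}
```

Both reference priors share the factor $1/\beta$ and differ from the Jeffreys prior only in the function of $\alpha$.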

14 Generalized Linear Model

Let $y_1, \ldots, y_n$ be independent observations, with the exponential density

$$f(y_i \mid \theta_i, \phi) = \exp\{a_i^{-1}(\phi)(y_i\theta_i - b(\theta_i)) + c(y_i, \phi)\},$$

where the $a_i(\cdot)$, $b(\cdot)$ and $c(\cdot)$ are known functions, and $a_i(\phi)$ is of the form $a_i(\phi) = \phi/w_i$, where the $w_i$'s are known weights. The $\theta_i$'s are related to regression coefficients, $\beta = (\beta_1, \ldots, \beta_p)^t$, by the link function

$$\theta_i = \theta(\eta_i), \quad i = 1, \ldots, n,$$

where $\eta_i = x_i^t\beta$, and $x_i^t = (x_{i1}, \ldots, x_{ip})$ is a $1 \times p$ vector denoting the $i$th row of the $n \times p$ matrix of covariates $X$, and $\theta$ is a monotonic differentiable function.

This model includes a large class of regression models, such as normal linear regression, logistic and probit regression, Poisson regression, gamma regression, and some proportional hazards models.

Ibrahim and Laud (1991) studied this model, using the Jeffreys prior. They focus on the case where $\phi$ is known. The Fisher information matrix they obtained for $\beta$ is given by $I(\beta) = \phi^{-1}(X^t W V(\beta)\Delta^2(\beta) X)$, where $W$ is an $n \times n$ diagonal matrix with $i$th diagonal element $w_i$. $V(\beta)$ and $\Delta(\beta)$ are $n \times n$ diagonal matrices with $i$th diagonal elements $v_i = v(x_i^t\beta) = d^2 b(\theta_i)/d\theta_i^2$ and $\delta_i = \delta(x_i^t\beta) = d\theta_i/d\eta_i$, respectively. Jeffreys prior is thus given by

$$\pi(\beta) \propto |X^t W V(\beta)\Delta^2(\beta) X|^{1/2}.$$
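To see how the general formula specializes, consider Poisson regression with the canonical (log) link. This worked case is an added illustration rather than an example quoted from Ibrahim and Laud (1991); it assumes $\phi = 1$ and unit weights $w_i = 1$.

```latex
\begin{align*}
f(y_i \mid \theta_i) &= \exp\{y_i\theta_i - e^{\theta_i} - \log y_i!\},
  \qquad b(\theta) = e^{\theta}, \\
\theta_i &= \eta_i = x_i^{t}\beta
  \;\Rightarrow\; \delta_i = \frac{d\theta_i}{d\eta_i} = 1,
  \qquad v_i = \frac{d^{2}b(\theta_i)}{d\theta_i^{2}} = e^{x_i^{t}\beta}, \\
\pi(\beta) &\propto \Big|X^{t}\,\mathrm{diag}\big(e^{x_1^{t}\beta},\ldots,e^{x_n^{t}\beta}\big)\,X\Big|^{1/2}.
\end{align*}
```

With a canonical link the matrix $\Delta(\beta)$ is always the identity, so only the variance function $V(\beta)$ enters the prior.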

Ibrahim and Laud (1991) show that the Jeffreys prior can lead to an improper posterior for this model, but that the posterior is proper for most GLM's. A sufficient condition for the posterior to be proper is that the integral

$$\int_S \exp\{\phi^{-1} w (yr - b(r))\} \left(\frac{d^2 b(r)}{dr^2}\right)^{1/2} dr$$

be finite, the likelihood function of $\beta$ be bounded above, and $X$ be of full rank. Here $S$ denotes the range of $\theta$.

In addition, Ibrahim and Laud (1991) give a sufficient and necessary condition for the Jeffreys prior to be proper, namely

$$\int_S \left(\frac{d^2 b(r)}{dr^2}\right)^{1/2} dr < \infty.$$

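When $\phi$ is unknown, the Fisher information matrix is

$$I(\beta, \phi) = \begin{pmatrix} I_1(\beta, \phi) & 0 \\ 0 & I_2(\beta, \phi) \end{pmatrix},$$

where $I_1(\beta, \phi) = \phi^{-1}(X^t W V(\beta)\Delta^2(\beta) X)$ as above, and

$$I_2(\beta, \phi) = -\sum_{i=1}^{n} \{2 w_i \phi^{-3}(\dot{b}(\theta_i)\theta_i - b(\theta_i)) + E(\ddot{c}(y_i, \phi))\};$$

here $\dot{b}(\theta_i) = db(\theta_i)/d\theta_i$, and $\ddot{c} = \partial^2 c(y_i, \phi)/\partial\phi^2$. The Jeffreys prior is then the square root of the determinant of $I(\beta, \phi)$.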

15 Inverse Gamma

The $IG(\alpha, \beta)$, $\alpha > 0$, $\beta > 0$, density is

$$f(x \mid \alpha, \beta) = \frac{1}{\Gamma(\alpha)\beta^{\alpha} x^{(\alpha+1)}}\, e^{-1/(\beta x)}\, I_{(0,\infty)}(x).$$

If $\alpha$ is known, $1/\beta$ is a scale parameter. The Jeffreys, reference, and MDIP prior is $1/\beta$, and the posterior is $IG(n\alpha,\ 1/\sum_{i=1}^{n} 1/x_i)$.

If $\alpha$ is unknown, the Fisher information matrix is the same as that of the Gamma distribution.

Thus the reference prior and the Jeffreys prior are the same as for the Gamma distribution.

16 Inverse Normal or Gaussian

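The $IN(\psi, \lambda)$, $\psi > 0$, $\lambda > 0$, density is

$$f(x \mid \psi, \lambda) = \left(\frac{\lambda}{2\pi}\right)^{1/2} x^{-3/2} \exp\left\{-\frac{\lambda x}{2}\left(\psi - \frac{1}{x}\right)^2\right\} I_{(0,\infty)}(x).$$

The Fisher information matrix is given by $I(\psi, \lambda) = \mathrm{diag}(\lambda/\psi,\ 1/(2\lambda^2))$.

Prior          $\pi(\psi, \lambda)$            (Marginal) Posterior
Uniform        $1$
Jeffreys       $1/\sqrt{\psi\lambda}$
Scale          $1/\lambda$                     proper^2
Reference^1    $\psi^{-1/2}\lambda^{-1}$

1. See Liseo (1993).

2. See Sun and Ye (1994b) for marginal posteriors.

Define $\nu = n - 1$, $\xi = (u\bar{x}/\nu)^{-1/2}$, $q = (\xi\bar{x})^{-1}$, $\bar{x} = \sum x_i/n$. Banerjee and Bhattacharyya (1979) show that the marginal posterior of $\psi$ has a left-truncated t-distribution with $\nu$ degrees of freedom, location parameter $1/\bar{x}$, and scale parameter $q$, the point of truncation being zero; i.e.,

$$\pi(\psi \mid D) \propto (nu/2)^{-n/2} \left\{1 + \frac{1}{\nu q^2}\left(\psi - \frac{1}{\bar{x}}\right)^2\right\}^{-(\nu+1)/2},$$

where $u = \bar{x}_r - 1/\bar{x}$, and $\bar{x}_r = \frac{1}{n}\sum(1/x_i)$. The marginal posterior of $\lambda$ is the modified gamma distribution

$$\pi(\lambda \mid D) \propto \frac{(nu/2)^{\nu/2}}{\Gamma(\nu/2)} \cdot \frac{\Phi((n\lambda/\bar{x})^{1/2})}{H(\xi)}\, \exp(-nu\lambda/2)\, \lambda^{\nu/2-1},$$

where $\Phi(\cdot)$ is the standard normal cdf, and $H(\cdot)$ denotes the cdf of Student's t-distribution with $\nu$ degrees of freedom. The posterior mean and variance are also available; see Banerjee and Bhattacharyya (1979) for more detail.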

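In another parametrization of the inverse Gaussian distribution, the $IN(\mu, \sigma^2)$, $\mu > 0$, $\sigma^2 > 0$, density is

$$f(x \mid \mu, \sigma^2) = (2\pi\sigma^2)^{-1/2} x^{-3/2} \exp\{-(x-\mu)^2/(2\sigma^2\mu^2 x)\}\, I_{(0,\infty)}(x).$$

The Fisher information matrix is given by $I(\mu, \sigma^2) = \mathrm{diag}(\mu^{-3}\sigma^{-2},\ (2\sigma^4)^{-1})$.

Prior          $\pi(\mu, \sigma^2)$
Uniform        $1$
Jeffreys       $\mu^{-3/2}\sigma^{-3}$
Reference^1    $\mu^{-3/2}\sigma^{-2}$

1. See Datta and Ghosh (1993).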

17 Linear Calibration

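Consider the model

$$y_i = \alpha + \beta x_i + \epsilon_{1i}, \quad i = 1, \ldots, n,$$
$$y_{0j} = \alpha + \beta x_0 + \epsilon_{2j}, \quad j = 1, \ldots, k,$$

where $\alpha$, $\beta$, $y_i$, and $y_{0j}$ are $(p \times 1)$ vectors, the $x_i$'s are known values of the precise measurements, and the $\epsilon_{1i}$ and $\epsilon_{2j}$ are i.i.d. $N_p(0, \sigma^2 I_p)$. The object is to predict $x_0$.

Denote $\bar{x} = \sum_{i=1}^{n} x_i/n$, $\bar{y} = \sum_{i=1}^{n} y_i/n$, $\bar{y}_0 = \sum_{j=1}^{k} y_{0j}/k$, $c_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$, $\hat{\beta} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})/c_x$, and let $s_1$ and $s_2$ be the two sums of residual squared errors based on the calibration and prediction experiments, respectively. The statistics $\hat{\beta}$, $\bar{y}$, $\bar{y}_0$, and $s = s_1 + s_2$ are minimal sufficient and mutually independent. Kubokawa and Robert (1993) then consider the reduced model

$$y \sim N_p(\beta, \sigma^2 I_p), \quad z \sim N_p(x_0\beta, \sigma^2 I_p), \quad s \sim \sigma^2\chi_q^2,$$

where $y = c_x^{1/2}\hat{\beta}$, $z = (\bar{y}_0 - \bar{y})(n^{-1} + k^{-1})^{-1/2}$, $q = (n + k - 3)p$, and, in the reduced model, $\beta$ and $x_0$ denote the rescaled quantities $c_x^{1/2}\beta$ and $(x_0 - \bar{x})\, c_x^{-1/2}(n^{-1} + k^{-1})^{-1/2}$.

The Fisher information matrix for the reduced model is given by

$$I(\beta, \sigma^2, x_0) = \begin{pmatrix} \frac{1+x_0^2}{\sigma^2} I_p & 0 & x_0\beta/\sigma^2 \\ 0 & \frac{q+2p}{2\sigma^4} & 0 \\ x_0\beta^t/\sigma^2 & 0 & \|\beta\|^2/\sigma^2 \end{pmatrix}.$$

Prior          $\pi(x_0, \beta, \sigma^2)$                                      (Marginal) Posterior
Uniform        $1$
Jeffreys       $(1+x_0^2)^{(p-1)/2}\, \|\beta\|\, (\sigma^2)^{-(p+3)/2}$
Reference^1    $(\sigma^2)^{-(p+2)/2}/\sqrt{1+x_0^2}$                           proper

1. W.r.t. the group ordering $\{x_0, (\beta, \sigma^2)\}$. See Kubokawa and Robert (1993).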
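The Jeffreys row can be verified from the block structure of the Fisher information matrix of the reduced model. The derivation below is an added worked step (not reproduced from Kubokawa and Robert), writing $a = (1+x_0^2)/\sigma^2$, $b = x_0/\sigma^2$, and $c = \|\beta\|^2/\sigma^2$ for the blocks involving $\beta$ and $x_0$; these three shorthand symbols are introduced only for this sketch.

```latex
\begin{align*}
|I(\beta,\sigma^{2},x_0)|
  &= \frac{q+2p}{2\sigma^{4}}\,
     \begin{vmatrix} a I_p & b\beta \\ b\beta^{t} & c \end{vmatrix}
   = \frac{q+2p}{2\sigma^{4}}\; a^{p}\Big(c - \frac{b^{2}\|\beta\|^{2}}{a}\Big) \\
  &= \frac{q+2p}{2\sigma^{4}}\; a^{p-1}\,\frac{\|\beta\|^{2}}{\sigma^{4}}
   \;\propto\; (1+x_0^{2})^{p-1}\,\|\beta\|^{2}\,(\sigma^{2})^{-(p+3)}, \\
\pi_{J} &\propto (1+x_0^{2})^{(p-1)/2}\,\|\beta\|\,(\sigma^{2})^{-(p+3)/2}.
\end{align*}
```

The middle step uses $ac - b^2\|\beta\|^2 = \|\beta\|^2/\sigma^4$, which follows directly from the definitions of $a$, $b$, and $c$.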

18 Location-Scale Parameter Models

Location Parameter Models: The $LP(\beta)$, $\beta \in R^p$, density is

$$f(y \mid \beta) = g(y - X\beta),$$

where $X$ is an $(n \times p)$ constant matrix and $g$ is an $n$-dimensional density function.

Prior          $\pi(\beta)$                              (Marginal) Posterior
Uniform^1      $1$                                       proper if $\mathrm{rank}(X^tX) = p$
Reference^2    $[\beta^t(X^tX)\beta]^{-(p-2)}$           proper if $\mathrm{rank}(X^tX) = p > 2$

1. Also Jeffreys and the usual reference prior.

2. Baranchik (1964) "shrinkage" prior; also, the reference prior for $(\|\beta\|, O)$, where $\beta = O(\|\beta\|, 0, \ldots, 0)^t$. This also arises from admissibility considerations; see Berger and Strawderman (1993).

Location-Scale Parameter Models: The $LSP(\beta, \sigma)$, $\beta \in R^p$, $\sigma > 0$, density is

$$f(y \mid \beta, \sigma) = \frac{1}{\sigma^n}\, g\!\left(\frac{1}{\sigma}(y - X\beta)\right),$$

where $X$ is an $(n \times p)$ constant matrix and $g$ is an $n$-dimensional density function.

Prior          $\pi(\beta, \sigma)$
Uniform        $1$
Jeffreys       $1/\sigma^{(p+1)}$
Reference^1    $1/\sigma$
Reference^2    $\frac{1}{\sigma}[\beta^t(X^tX)\beta]^{-(p-2)}$
Reference^3    $\pi(\mu, \sigma) \propto \sigma^{-1}(c_1\mu^2 + c_2\sigma^2)$

1. Reference with respect to the group ordering $\{\beta, \sigma\}$; also the prior actually recommended by Jeffreys (1961).

2. Baranchik (1964) "shrinkage" prior; also, the reference prior with respect to the parameters $\{\|\beta\|, O, \sigma\}$ (see Location Parameter Models).

3. $\theta = \mu/\sigma$ is the parameter of interest, selecting rectangular compacts on $\theta$ and the nuisance parameter $c_1\mu^2 + c_2\sigma^2$. See Datta and Ghosh (1993).

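Scale Parameter Models: The $SP(\sigma_1, \ldots, \sigma_p)$, $\sigma_i > 0$ for all $i$, density is

$$f(y \mid \sigma_1, \ldots, \sigma_p) = \Big(\prod_{i=1}^{p} \sigma_i^{-n_i}\Big)\, g\!\left(\frac{1}{\sigma_1}\tilde{y}_1, \frac{1}{\sigma_2}\tilde{y}_2, \ldots, \frac{1}{\sigma_p}\tilde{y}_p\right),$$

where the $\tilde{y}_i$ are $(n_i \times 1)$ vectors, $i = 1, \ldots, p$, and $g$ is an $(n_1 + \cdots + n_p)$-dimensional density function.

Prior                        $\pi(\sigma_1, \ldots, \sigma_p)$     (Marginal) Posterior
Uniform                      $1$
Jeffreys, Reference, MDIP    $\prod_{i=1}^{p} \sigma_i^{-1}$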

19 Logit Model

Poirier (1992) analyzes the Jeffreys prior for the conditional logit model as follows. An experiment consists of $N$ trials. On trial $n$ exactly one of $J_n$ discrete alternatives is observed.

Let $y_{nj}$ be a binary variable which equals unity iff alternative $j$ is observed on trial $n$; otherwise $y_{nj}$ equals zero. Let $z_n = [z_{n1}^t, \ldots, z_{nJ_n}^t]^t$, with each $z_{nj}$ being a $K \times 1$ vector. $\beta$ is a $K \times 1$ vector of unknown parameters. The probability of alternative $j$ on trial $n$ is specified to be

$$p_{nj}(\beta) = \mathrm{Prob}(y_{nj} = 1 \mid z_n, \beta) = \frac{\exp(z_{nj}^t\beta)}{\sum_{i=1}^{J_n} \exp(z_{ni}^t\beta)}.$$

When $J_n = J$ and $z_{nj} = e_j \otimes x_n$, where $e_j$ is a $(J-1) \times 1$ vector with all elements equal to zero except the $j$th, which equals unity, and $x_n$ is an $M \times 1$ vector of observable characteristics of trial $n$, this is the multinomial logit model.

Poirier (1992) gives the Jeffreys prior for this model as

$$\pi(\beta) \propto \Big|\sum_{n=1}^{N} \sum_{j=1}^{J_n} p_{nj}(\beta)[z_{nj} - \tilde{z}_n(\beta)][z_{nj} - \tilde{z}_n(\beta)]^t\Big|^{1/2},$$

where $\tilde{z}_n(\beta) = \sum_{i=1}^{J_n} p_{ni}(\beta) z_{ni}$. The following special cases are also given therein.

Multinomial Logit Without Covariates: Suppose $K = J - 1$ and $z_{nj} = e_j$; then the Jeffreys prior reduces to the proper density

$$\pi(\beta) = \frac{\Gamma(J/2)}{\pi^{J/2}} \Big[\prod_{j=1}^{J} p_j(\beta)\Big]^{1/2},$$

where $p_j = p_{nj}$ for any $n$. The density for $p = (p_1, \ldots, p_{J-1})$ is the Dirichlet density

$$\pi(p) = \frac{\Gamma(J/2)}{\pi^{J/2}} \prod_{j=1}^{J} p_j^{-1/2}.$$

In the binomial case,

$$\pi(\beta) = \pi^{-1}\{p_1(\beta)[1 - p_1(\beta)]\}^{1/2} = \frac{\exp(\beta/2)}{\pi[1 + \exp(\beta)]}$$

and

$$\pi(p) = \pi^{-1}[p_1(1 - p_1)]^{-1/2}.$$

The other competing noninformative priors are: the uniform prior on $\beta$; the uniform prior on $p_1$; and the MDIP prior for $p_1$. See Geisser (1984) for discussion.
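The two binomial-case densities, one for $\beta$ and one for $p_1$, are the same prior expressed in two parameterizations. The change of variables below is an added worked step (it is not spelled out in the entry) and illustrates the invariance property that motivates the Jeffreys prior.

```latex
\begin{align*}
p_1 &= \frac{e^{\beta}}{1+e^{\beta}}, \qquad
\frac{dp_1}{d\beta} = p_1(1-p_1), \\
\pi(p_1) &= \pi(\beta)\,\Big|\frac{d\beta}{dp_1}\Big|
  = \frac{1}{\pi}\,\frac{\{p_1(1-p_1)\}^{1/2}}{p_1(1-p_1)}
  = \frac{1}{\pi}\,[p_1(1-p_1)]^{-1/2}, \\
\{p_1(1-p_1)\}^{1/2} &= \frac{e^{\beta/2}}{1+e^{\beta}} .
\end{align*}
```

The last line recovers the explicit form $\exp(\beta/2)/(\pi[1+\exp(\beta)])$ for the density on the logit scale.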

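Logistic Regression: Suppose the $y_i \mid \theta_i$ are independent Bernoulli($\theta_i$), with

$$\theta_i = \frac{e^{x_i^t\beta}}{1 + e^{x_i^t\beta}}, \quad i = 1, \ldots, n.$$

Then the likelihood function is

$$f(y \mid \beta) = e^{\sum_{i=1}^{n} y_i x_i^t\beta} \Big/ \prod_{i=1}^{n} (1 + e^{x_i^t\beta})$$

and the Fisher information matrix is

$$I(\beta) = (x_1, \ldots, x_n) \begin{pmatrix} \theta_1(1-\theta_1) & & \\ & \ddots & \\ & & \theta_n(1-\theta_n) \end{pmatrix} \begin{pmatrix} x_1^t \\ \vdots \\ x_n^t \end{pmatrix}.$$

The Jeffreys prior is the square root of the determinant of the above matrix.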
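The logistic regression prior is straightforward to evaluate at any given coefficient vector. The following is a minimal sketch using only the standard library; the function names, the list-of-rows representation of the covariate matrix, and the use of Gaussian elimination for the determinant are all choices made for this illustration rather than anything prescribed by Poirier (1992).

```python
from math import exp, sqrt


def logistic_fisher_information(X, beta):
    """I(beta) = sum_i theta_i (1 - theta_i) x_i x_i^t, with x_i the rows of X."""
    p = len(beta)
    info = [[0.0] * p for _ in range(p)]
    for x in X:
        eta = sum(xj * bj for xj, bj in zip(x, beta))
        theta = 1.0 / (1.0 + exp(-eta))  # success probability for this row
        weight = theta * (1.0 - theta)
        for a in range(p):
            for b in range(p):
                info[a][b] += weight * x[a] * x[b]
    return info


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def logistic_jeffreys_unnormalized(X, beta):
    """Square root of det I(beta); unnormalized Jeffreys prior density."""
    return sqrt(determinant(logistic_fisher_information(X, beta)))
```

As a usage example, with an intercept and one covariate taking the values 0, 1, 2, one would call `logistic_jeffreys_unnormalized([[1, 0], [1, 1], [1, 2]], [0.0, 0.0])`. At beta = 0 every weight is 1/4, so the information matrix is one quarter of X^t X.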

For more special cases of the logit model, see Poirier (1992).

20 Mixed Model

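Consider the mixed model

$$y_{ijk} = B_{ijk}\beta_b + W_{ijk}\beta_w + T_i + C_j + \epsilon_{ijk}, \quad i = 1, \ldots, I,\ j = 1, \ldots, J,\ k = 1, \ldots, t_{ij},$$

assuming $\epsilon_{ijk} \sim N(0, \sigma^2)$, and independently, $C_j \sim N(0, \sigma_c^2)$, with $\beta_b$, $\beta_w$, $T_i$, $\sigma^2$, and $\sigma_c^2$ the parameters of interest.

Box and Tiao (1992) used the following prior in the balanced mixed model case, with $t_{ij} = n$ for all $i, j$,

$$\pi(\sigma_c^2, \sigma^2) \propto \frac{1}{\sigma^2(\sigma^2 + n\sigma_c^2)}.$$

Chaloner (1987) used three priors for this model: $(\sigma^2 + \sigma_c^2)^{-1}$, $\sigma^{-2}(\sigma^2 + \sigma_c^2)^{-1}$, and $\sigma^{-2}(\sigma^2 + \sigma_c^2)^{-3/2}$.

Yang and Pyne (1996) derived the Jeffreys prior,

$$\pi(\sigma_c^2, \sigma^2) \propto \left[\mathrm{Det} \sum_{j=1}^{J} \begin{pmatrix} \frac{t_j^2}{2(t_j\sigma_c^2 + \sigma^2)^2} & \frac{t_j}{2(t_j\sigma_c^2 + \sigma^2)^2} \\ \frac{t_j}{2(t_j\sigma_c^2 + \sigma^2)^2} & \frac{t_j - 1}{2\sigma^4} + \frac{1}{2(t_j\sigma_c^2 + \sigma^2)^2} \end{pmatrix}\right]^{1/2},$$

where $t_j = \sum_{i=1}^{I} t_{ij}$, and the reference prior,

$$\pi(\sigma_c^2, \sigma^2) \propto \frac{1}{\sigma^2} \sqrt{\sum_{j=1}^{J} \frac{t_j^2}{(t_j\sigma_c^2 + \sigma^2)^2}}.$$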

21 Mixture Model

For arbitrary density functions $p_1(x)$ and $p_2(x)$, consider the model

$$p(x \mid \lambda) = \lambda p_1(x) + (1 - \lambda) p_2(x).$$

Bernardo and Girón (1988) discussed the reference prior for this model. They found that the reference prior is always proper.