Attention-dependent modulation of cortical taste circuits revealed by Granger causality with signal-dependent noise.

(1)

Circuits Revealed by Granger Causality with

Signal-Dependent Noise

Qiang Luo1,2, Tian Ge3,4, Fabian Grabenhorst5, Jianfeng Feng1,3,4*, Edmund T. Rolls4,6*

1Shanghai Center for Mathematical Sciences, Fudan University, Shanghai, PR China,2College of Information Systems and Management, National University of Defense Technology, Hunan, PR China,3Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, PR China,4Department of Computer Science, University of Warwick, Coventry, United Kingdom,5Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, United Kingdom,6Oxford Centre for Computational Neuroscience, Oxford, United Kingdom

Abstract

We show, for the first time, that in cortical areas, for example the insular, orbitofrontal, and lateral prefrontal cortex, there is signal-dependent noise in the fMRI blood-oxygen level dependent (BOLD) time series, with the variance of the noise increasing approximately linearly with the square of the signal. Classical Granger causal models are based on autoregressive models with time invariant covariance structure, and thus do not take this signal-dependent noise into account. To address this limitation, here we describe a Granger causal model with signal-dependent noise, and a novel, likelihood ratio test for causal inferences. We apply this approach to the data from an fMRI study to investigate the source of the top-down attentional control of taste intensity and taste pleasantness processing. The Granger causality with signal-dependent noise analysis reveals effects not identified by classical Granger causal analysis. In particular, there is a top-down effect from the posterior lateral prefrontal cortex to the insular taste cortex during attention to intensity but not to pleasantness, and there is a top-down effect from the anterior and posterior lateral prefrontal cortex to the orbitofrontal cortex during attention to pleasantness but not to intensity. In addition, there is stronger forward effective connectivity from the insular taste cortex to the orbitofrontal cortex during attention to pleasantness than during attention to intensity. These findings indicate the importance of explicitly modeling signal-dependent noise in functional neuroimaging, and reveal some of the processes involved in a biased activation theory of selective attention.

Citation:Luo Q, Ge T, Grabenhorst F, Feng J, Rolls ET (2013) Attention-Dependent Modulation of Cortical Taste Circuits Revealed by Granger Causality with Signal-Dependent Noise. PLoS Comput Biol 9(10): e1003265. doi:10.1371/journal.pcbi.1003265

Editor:Olaf Sporns, Indiana University, United States of America

ReceivedJanuary 19, 2013;AcceptedAugust 20, 2013;PublishedOctober 24, 2013

Copyright:ß2013 Luo et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding:QL is partly supported by grants from the National Natural Sciences Foundation of China (No. 11101429, No. 11271121, No. 71171195), Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20114307120019), and National Basic Research Program of China (No. 2011CB707802). FG was supported by the Gottlieb-Daimler- and Karl Benz-Foundation. JF is a Royal Society Wolfson Research Merit Award holder, partially supported by the National Centre for Mathematics and Interdisciplinary Sciences (NCMIS) of the Chinese Academy of Sciences and Key Program of National Natural Science Foundation of China (No. 91230201). The fMRI investigation was supported by the McDonnell Centre for Cognitive Neuroscience at the University of Oxford, and was performed at the Centre for Functional Magnetic Resonance Imaging of the Brain (FMRIB) at the University of Oxford. Support was also received from the Oxford Centre for Computational Neuroscience. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests:The authors have declared that no competing interests exist.

* E-mail: [email protected] (JF); [email protected] (ETR)

Introduction

In the past decade, Granger causality (GC) has emerged as a widely used method for causal inferences, and has been applied to biological time series obtained from many different types of investigation, for example, to the fMRI blood-oxygen level dependent (BOLD) signals to detect effective connectivity between brain areas and thus to shed light on how the brain works [1–4]. The basic idea of GC can be traced back to Wiener [5], who conceived the notion that if the prediction of one time series can be improved by incorporating the past history of a second one, then the second time series has a causal influence on the first. Granger later formulated this idea in the context of linear autoregressive (AR) models [6]. GC is completely data-driven and based on time precedence. The interactions discovered by GC may be unidirectional or reciprocal. GC is easy to implement, relies on a small set of straightforward assumptions, and does not need any knowledge about how the data are generated. Therefore,

it can be applied directly to almost any time series data [7]. However, over-simplification of the model may result in an incorrect use or interpretation of GC and even spurious causal inferences in some situations [8–10]. Care is therefore needed in the use of GC.

(2)

classical GC based on a simple AR model, which does not deal with time series data with changing volatility (variance). Moreover, although it has been widely observed and investigated that the signal-dependent noise plays important roles in neuronal activities [14–16], it is still unclear whether this property carries through to fMRI BOLD signals, after the neuronal signals are delayed and smoothed by the haemodynamic response function.

In this paper, we provide empirical evidence that the variance of the noise in the fMRI BOLD time series increases linearly with the square of the signal in a number of cortical areas, such as the insular taste, orbitofrontal, and lateral prefrontal cortical areas. In this context we present a Granger causal model with signal-dependent noise to detect GC in both the mean and variance of data with time varying volatility. We also propose a likelihood ratio test to infer GC with signal-dependent noise accurately and efficiently. We show, by simulation studies, that this novel method substantially outperforms classical GC when signal-dependent noise is present.

The new method is evaluated with an fMRI investigation [17] to identify the source of the top-down selective attentional control that differentially biases brain systems involved in affective vs sensory analysis [17–19]. Instructions to pay attention to and later rate the pleasantness of a taste increased the activations to taste stimuli measured with fMRI in the orbitofrontal and pregenual cingulate cortices [17], where the subjective pleasantness of taste is represented [20–24], but not the primary taste cortex in the anterior insula [17], where the subjective intensity and identity of taste are represented [20–22,24–26]. Instructions to pay attention to and later rate the intensity of a taste increased the activations to taste in the insular taste cortex but not in the orbitofrontal and pregenual cingulate cortices [17]. Our new method reveals how the effective top-down connectivity changes when attention is paid to the pleasantness vs the intensity of a taste, and helps in the interpretation of the source of the signals that implement top-down attention.

Materials and Methods

Granger causality with signal-dependent noise

Classical Granger causality. We start with a brief review of classical GC. Consider the following zero-mean vector autore-gressive model (VAR) of orderp:

xt~ Xp

j~1

Ax,jxt{jzEEx,t, t~1,2, , ð1Þ

where xt is a dx-dimensional column random vector,

Ax,j,j~_1,_,_pare fixeddx|dx coefficient matrices, andEE_x,tis adx-dimensional independent identically distributed (i.i.d) white

noise or innovation process, with a positive definite and time invariant covariance matrix Sx. We require that this VAR(p)

process is stable, that is,

det I{Ax,1z{ {Ax,pzp

=0, DzDƒ1, ð2Þ

wheredet(:)is the determinant of a matrix,Iis an identity matrix, andzis a complex variable. This stability condition implies that the VAR(p) process is weakly stationary, i.e., its first and second order moments exist and are time invariant [27].

Now, assumext andyt admit a jointly stable VAR

represen-tation.xt can thus be modeled as

xt~ Xp

j~1

Axx,jxt{jz Xp

j~1

Axy,jyt{jzEExy,t, t~1,2, , ð3Þ

wherey_tis ady-dimensional column random vector, andEE_xy,tis a white noise process with a covariance matrixSxy.

Classical GC depends on temporal precedence and predictabil-ity. The idea is that a cause cannot come after the effect. Thus, if

y_t affectsxt, including the past information ofytshould improve

the predictions ofxt. More formally, if the prediction error ofxtis

reduced when the past information ofy_t is taken into account, then y_t has a causal influence on xt in the sense of Granger.

Formulating the idea in the context of a VAR model, the causal influence fromy_t toxt in the time domain can be quantified as

[28,29]

Fy?_x~log

det(Sx) det(Sxy)

: ð4Þ

Fy?_xw0indicates a causal influence fromy_ttox_t, andF_y?_x~0

otherwise. Note that model (1) is a restricted version of model (3), and that y_t does not cause xt if and only if Axy,j:0 for all

j~1, ,p[27].

When the white noise is Gaussian distributed, it has been shown that the GC measure in Eq. (4) is equivalent to the likelihood ratio test statistic [30]

Ry?_x~

L bhh_restrictedDfxtgTt~1

L bhhfullDfxtgTt~1,fytg T t~1

, ð5Þ

whereL bhh_restrictedDfxtgTt~1

:Pr fxtgTt~1Dbhhrestricted

is the

likeli-hood function, i.e., the probability of the observed time series

fxtgTt~1, given the maximum likelihood estimate of the parameters bhh_restricted of the restricted model (1). L bhhfullDfxtgTt~1,fytg

T t~1

:

Pr fxtgTt~1Dbhhfull,fytg T t~1

is interpreted similarly under the full

model (3). Therefore, a likelihood ratio test can be used for causal inference:

Author Summary

(3)

ry?x~{2

logL bhhrestrictedjfxtgTt~1

{_log_L bhhfulljfxtgTt~1,fytg T t~1

h i

: ð6Þ

The test statisticry?_xis approximately chi-squared distributed, with

degrees of freedomdffull{dfrestricted, wheredffullanddfrestrictedare

the number of free parameters of the full model (3) and the restricted model (1), respectively.

The signal-dependent noise model. To relax the assump-tion of a time invariant covariance matrix in the AR model, Engle invented the first changing volatility model — the autoregressive conditional heteroskedasticity (ARCH) model [31], which was then extended to generalized ARCH (GARCH) models [32,33] as well as multivariate cases [34–36]. Assumert is ad-dimensional

zero mean, serially uncorrelated process, which may be the residual process of some dynamic model and can be represented as

rt~S1tDt={21EEt, ð7Þ

whereEE_tis ad-dimensional i.i.d white noise process,S_tDt{1is the conditional covariance matrix ofrt, givenVt{1~frt{1,rt{2, g,

andS1_tDt=_{12 is the symmetric positive definite square root ofStDt{1.

Then a multivariate ARCH process of orderqtakes the form

vech S_tjt{1

~

c₀zC1vech rt{1rt{1

z zCqvech rt{qrt{q

, ð8Þ

wherevech(:)denotes the half-vectorization operator which stacks the columns of a square matrix from the diagonal downwards in a vector,c0is a12d(dz1)-dimensional column vector of constants

andCj,j~_1,_,_qare1

2d(dz1)|12d(dz1)coefficient matrices.

It can be seen that even for a bivariate series with a low order, this general model has a fairly large number of parameters. Therefore, more restricted models were proposed. For example, Bollerslev et al. considered diagonal ARCH processes where all theCjmatrices

are diagonal [35]. To guarantee the positive definiteness of the conditional covariance matrix StDt{1, Baba, Engle, Kraft and

Kroner investigated the following variant of a multivariate ARCH model, known as the BEKK model [37,38]

StDt{1~C0zC1rt{1rt{1C1z zzCqrt{qrt{qCq, ð9Þ

where is the matrix transpose, C0~C0C0 is positive definite,

and all Cj,j~1, ,q are d|d matrices. In contrast to the diagonal model, the BEKK model produces interactions between second order moments and can generate rich volatility dynamics. We now present a Granger causal model with signal-dependent noise [13]. Consider the following multivariate model with time varying volatility, in particular, signal-dependent noise:

xt~Ppj~1Ax,jxt{jzrx,t, rx,t~S1x,tDt=2{1EEx,t, Sx,tDt{1~CxCxzPPj~1q Bx,jxt{jxt{jBx,j,

ð10Þ

wherextis adx-dimensional column random vector,ex,t is adx

-dimensional Gaussian distributed white noise process with zero

mean and unit variance, p and q are the model orders,

Ax,j,j~1, ,p,Bx,j,j~1, ,qandCx are coefficient

matri-ces. The volatility model is a modification of the BEKK model

[38] in which the conditional covariance matrixSx,tDt{1does not

regress on the residual processrx,t but only depends on the past

history of the process xt before time t. Hence, the covariance

(second order statistics) of the noise process is coupled to the mean (first order statistics). This form also guarantees the positive definiteness of Sx,tDt{1. Clearly, when Bx,j:0 for all j, the

conditional covariance is time invariant and the model reduces to the AR model. In the light of these points, we term our model (10) the AR-BEKK model.

We now summarize our use of the terms ‘signal’ and ‘noise’ in the reminder of the paper for the model and in the empirical analysis of the fMRI data. We assume that the observed time series xtis a realization from the following general process:

xt~f(Ut{1)zs1=2(Ut{1)Et, ð11Þ whereUt~fxt,xt{1, g, Et is an i.i.d white noise process,f(:)

ands(:)can be any continuous functions,s(:)is nonnegative. We define f(Ut{1) to be the ‘signal’, and s1=2(Ut{1)E_t to be the ‘noise’. The variance of the noise is s(Ut{1)~varðxtDUt{1)~ var(xt{f(Ut{1)Þ. Given the past history of the time series, the

signal is a deterministic process while the noise is what cannot be predicted and produces the variation across realizations. Empir-ically, since f(Ut{1)~E(xtDUt{1), the signal is estimated by

projecting xt onto the subspace spanned by Ut{1. The noise is

estimated by the residual of the projection. In the Results section, we investigate different subspaces spanned by Ut{1 to provide

empirical evidence for the signal-dependent noise in fMRI BOLD time series. In the model, we specify particular forms of the functionsf(:)ands(:). In particular, according to model (10), we assumef(Ut{1)is a linear function ofxt{1, i.e.,f(Ut{1)~axt{1,

and we assume s(Ut{1) is a quadratic function of xt{1, i.e., s(Ut{1)~c2zb2x2_t{1. Therefore, in the model, the signal is

estimated byaxt{1and the variance of the noise is estimated by

c2z_b2_x2_t{1. In the Results section, we show the concordance of the definitions of ‘signal’ and ‘noise’ in the model and in the empirical data analysis, that is, in spite of the simplified forms of f(:)ands(:), our model captures a large portion of the variance in the empirical signal and noise. We note that classical Granger causality assumes that the variance of the noise as just defined is constant across the time course of the process (e.g., the time course of an fMRI trial), and that Granger causality with signal-dependent noise allows causality to be calculated more powerfully if the variance of the noise within a trial is not constant.

Granger causality with signal-dependent noise. To define the causal relationship between xt and another dy

-dimensional time seriesy_t, consider the following joint AR-BEKK model:

zt~ Xp

j~1

Ajzt{jzrt, ð12Þ

where zt~ xt ,yt

, rt~ rxy,t,ryx,t

, rxy,t~S1xy,tDt=2 {1EExy,t, ryx,t~S1yx,tDt{1=2 EEyx,t, EE_xy,t and EE_yx,t are d_x-dimensional and d_y -dimensional independent Gaussian distributed white noise pro-cesses with zero mean and unit variance respectively, and

Sxy,tDt{1~CxyCxyz Pq

j~1Bxy,jzt{jzt{jBxy,j, Syx,tDt{1~CyxCyxzPqj~1Byx,jzt{jzt{jByx,j:

ð13Þ

(4)

all coefficient matrices. The causal influence fromy_ttoxtcan be

defined as [13]

Fy?_x~log

det C_xCx

det C_xyCxy

2 4

3

5: ð14Þ

Fy?_xw0ify_thas a causal effect onx_t, andF_y?_x~0otherwise. It

can been seen thaty_t can help improve the prediction of xt by

impacting on either its mean activity through the coefficients inAj,

or its variance through the coefficients inBxy,j. These two cases

correspond to causality in the mean and variance respectively. When the noise inxt is not signal-dependent, i.e.,Bxy,j:0and Byx,j:0 for allj, the model reduces to a VAR model and the definition of causality coincides with classical GC in the time domain (See Eq. (4)).

Stability conditions. To guarantee the stability of the AR-BEKK model, we provide the stability condition for a simple first-order model, i.e.,p~_q~₁in Eq. (12). This is the model that we usually use for fMRI data analysis considering the poor temporal resolution of BOLD signals, and the relatively fast signal transmission between groups of neurons [2,4,39,40]. In the remainder of this paper, both in simulations and real data analysis, we focus on this first-order model unless otherwise specified. We also assume thatEE_xy,tandEE_yx,tare uncorrelated. The stability of the model involves both the first and second order stability conditions, i.e., the unconditional mean and covariance of

zt exist and are time invariant. For the first order stability

condition, it follows from the theory of the AR model that all the eigenvalues ofA1have modulus less than 1. For the second order

stability, note that

cov(ztDUt{1)~cov(rtDUt{1)~diag Sxy,tDt{1,Syx,tDt{1

, ð15Þ

and

cov(ztDUt{1)~E(ztztDUt{1){E(ztDUt{1)E(z_t DUt{1), ð16Þ

whereUt{1~(zt{1,zt{2, ). Therefore,

E(ztzt jUt{1) ~

A1zt{1zt{1A1zdiagfCxyCxyzBxy,1zt{1zt{1Bxy,1, C_yxCyxzByx,1zt{1z_t{1Byx,1

o :

ð17Þ

Taking the expectation on both sides yields

E(ztzt ) ~

A1E zt{1zt{1

A₁zdiagfC_xyCxyzBxy,1E zt{1zt{1

Bxy,1,

C_yxCyxzByx,1E zt{1zt{1

Byx,1 :

ð18Þ

This can be transformed into the following equation using the vectorization operator, which stacks the columns of a square matrix into a column vector:

vec E(ztzt )

~

A16A1vec E zt{1zt{1

zvec diag C_xyCxy,C_yxCyx

n o

h i

zhdiagnB_xy,1,B_yx,1o6diagnB_xy,1,B_yx,1oi:T:vec E zt{1zt{1

:~AAezBBevec E zt{1z_t{1

zCCe,

where6is the Kronecker product,

T~ I 0 6 I 0 z 0 I 6 0 I :

Therefore, it is required that all the eigenvalues ofAAezBBe have modulus less than 1.

Model estimation. Using Bayes’ theorem, the joint density function ofr1, ,rT is

f(r1, ,rT)~f(r1)f(r2Dr1) f(rTDrT{1, ,r1): ð19Þ

Thus, the conditional distribution ofrtgivenUt{1is Gaussian and

if theutare observed quantities, the log-likelihood function of the

AR-BEKK model described by Eq. (12), for a sampleu1, ,uTis

given by

logL hfulljfxtgTt~1,fytg T t~1

~

XT

t~1

logLt hfulljfxtgTt~1,fytg T t~1

,

ð20Þ

wherehfullis a vector of all unknown parameters of the model (12)

and

logLt hfullDfxtgtT~1,fytgTt~1

~{dxzdy

2 log 2p{ 1

2logDSxy,tDt{1D{ 1 2ut S

{1 xy,tDt{1ut,

ð21Þ

where the required initial values for specifyingStDt{1are assumed

to be available. The likelihood function may be maximized with respect to the parameters h_full by using numerical methods. Specifically, the initial values ofA1are given by the least square

estimates ofzt~A1zt{1zrt, assuming a simple AR model, and Bxy,1,Byx,1and Cxy,Cyxare then initialized to diagonal matrices

whosel-th element on the diagonal is the least square estimate of

r2_xy,t h i

l ~ C2_xy

h i

ll z B2_xy,1

h i

ll z 2 t{1

l, ð22Þ

and

r2_yx,t h i

l ~hC2_yxi

ll

zhB2_yx,1i ll z

2 t{1

l, ð23Þ

using the residuals rt from the AR fitting. The constrained

maximum likelihood estimation of the model parameters can be obtained by solving the optimization problem

bhh_full~argmaxh full

XT

t~1

(5)

while satisfying the first and second order stability conditions derived above. We use Matlab function fmincon with the interior-point algorithm to tackle this restricted optimization problem. (Note that sometimes fmincon with the interior-point algorithm may fail to converge to a reasonable solution. In this case, we use the active-set algorithm as an alternative.) The parameters of the restricted model (10) can be estimated similarly.

Causal inferences. The nonparametric bootstrap method for causal inferences used in [13] is time-consuming. Here we develop an analog of the likelihood ratio test for classical GC (See Eq. (6)) to improve computational efficiency. Similarly, the likelihood ratio test statistic takes the form

ry?x~{2 logLL bhhrestrictedjfxtgTt~1

{logLL bhh_full_jf_x_t_gT

t~1,fytg T t~1

h i

~{2PPT_t~1 logLLt bhhrestrictedjfxtgTt~1

{log_L_Lt bhhfulljfxtgTt~1,fytg T t~1

h i

:

ð25_Þ

The test statistic approximately follows a chi-squared distribution, and the degrees of freedom are 2dxdy. Therefore, a parametric

chi-squared test can be carried out to test the significance of the causal influence. This likelihood ratio test also has a connection to the transfer entropy between time series [30]. However, since the residual process in the AR-BEKK model is not a Gaussian white noise, the likelihood ratio test is not equivalent to the measure defined in Eq. (14).

To test the difference between the causalities in the opposite directions between brain areas, note that the difference of the two causality measures isrdiff~1₂ry?x{

1

2rx?y, wherery?xandrx?y

are two chi-squared distributed random variables with the same degrees of freedom. Therefore, the distribution function ofrdiff is

Tm(x)~ 1 2mpffiffiffi_p_C _m_z1

2

xmKm(x), ð26Þ

whereKmis a modified Bessel function,C(:)is a Gamma function,

and m~dxdy{1₂ [41]. A table for the two-sided one and five

percent quantile of this distribution can be found in [41]. For example, in the investigation of a pair of univariate time series, i.e., dx~dy~1, a difference measure of 4.61 implies ap-value of 0.01.

Simulation studies

Methodology assessment. We illustrate the Granger causal model with signal-dependent noise and the likelihood ratio test by a simulation study. Consider the following first-order AR-BEKK model for two univariate time seriesxt andyt:

xt

yt

~ 0:1 0

0 0:1pffiffiffi2

_x

t{1

yt{1

z rxy,t

ryx,t

, rxy,t

ryx,t

~

S1_xy,tDt{1=2 EE_xy,t S1yx,tDt{1=2 EEyx,t 2

4

3 5,

S1_xy=2_,_tDt{1 ~ 1z½q1, (1{q1)q3½xt{1,yt{1½xt{1,yt{1 ½q1, (1{q1)q3 ,

S1_yx=2_,_tDt{1 ~ 1z½(1{q2)q4,q2½xt{1,yt{1½xt{1,yt{1 ½(1{q2)q4,q2 ,

where q1 and q2 are random numbers uniformly distributed in

½0, 1, andq2andq4are random numbers with the probability of

0.6 to be 0 and 0.4 to be 1. It is clear that there is a causal influence fromyttoxtif and only if(1{q1)q3=0, and fromxtto

ytif and only if(1{q2)q4=0.

We generated 100 models with differentqi,i~_1,_{, 4}, and for each model we generated time series of 1000 points with 2 replicates. We then fitted both the classical Granger causal model and the signal-dependent noise model to the data. Using different p-value thresholds, the performance of the two models was compared by the ROC (Receiver Operating Characteristic) curve [42].

An illustration of how signal-dependent noise may arise in BOLD activations. In the Results section below, we provide empirical evidence for the presence of signal-dependent noise in fMRI BOLD time series. In order to illustrate how signal-dependent noise may arise in BOLD activations from the underlying neuronal firing, we performed the following simula-tions. These simulations were on a simple model developed for the purposes of illustration. The concept is to investigate how the close to Poisson firing of neuronal spikes of neurons in the cortex for a given mean firing rate [11,12] might be reflected in a signal produced by feeding the spiking neuronal activity into a widely used generative biophysical model describing the hemodynamic response [43]. This hemodynamic model links neuronal activity to blood flow and incorporates the well established Balloon model [44,45]. For the Poisson spiking, the variance of the spike counts in a time window increases linearly with (and is equal to) the mean spike count.

We simulated spike trains of neurons following Poisson processes with mean firing rates of 5 Hz, 40 Hz and 80 Hz respectively for 1 second. The spike trains were then fed into the following biophysical model describing the hemodynamic response induced by the neuronal activity [43]:

ds

dt~z{ks{c(f{1) df

dt~s tdv

dt~f{v 1=a tdq_dt~f

r 1{(1{r)

1=f

h i

{qv1=a{1 8 > > > > > < > > > > > :

ð27Þ

wherezis the input spike trains;sis the vasodilatory signal;f is the cerebral blood inflow (CBF);vis the cerebral blood volume (CBV); qis the deoxyhemoglobin (dHb) content;1=k is time constant of signal decay; 1=c is the time constant of the feedback auto-regulatory mechanism; t is the mean transit time in the post-capillary venous compartment;ais the Grubb’s parameter andr

is the resting net oxygen extraction fraction by the capillary bed. Finally, the observed BOLD time series is a nonlinear function of the CBV and dHb content:

xt~V0½k1(1{q)zk2(1{q=v)zk3(1{v)zzt, ð28Þ

wherez_t is the observation noise, V0&0:02 is the resting blood volume fraction,k1&7r,k2&2,k3&2r{0:2for 1.5-T scanners. All biophysical parameters were set to their typical values (k~0:65;c~0:41;t~0:98;a~0:32; r~0:34) [46]. For each of these firing rates, the simulation was repeated 100 times, producing 100 simulated trials of the fMRI BOLD time series with temporal resolution 1 ms for a total of 25 seconds. We downsampled the time series to a sampling rate of 1 Hz to reflect the temporal resolution of real BOLD signals. We then empirically estimated the signal and noise as defined above. In particular, we investigated different subspaces spanned by Ut{1, including (1)

(6)

the projection space. Any non-random relationship between the empirically estimated signal E(xtDUt{1) and the empirically

estimated variance of the noise var(xtDUt{1) indicates the

presence of signal-dependent noise. In particular, we tested if there exists a significant correlation between ½E(xtDUt{1)2 and var(xtDUt{1).

fMRI experiment

The fMRI dataset is the same as that obtained and used in previous investigations [17,47,48]. We describe key imaging acquisition, preprocessing and psychophysiological interaction (PPI) analyses for completeness. We refer the readers to previous publications for the full details.

Participants and ethics statement. Twelve healthy volun-teers (6male and6female, age range21{35) participated in the study. Ethical approval (Central Oxford Research Ethics Com-mittee) and written informed consent from all subjects were obtained before the experiment. The subjects had not eaten for three hours before the investigation.

Experimental design. We used the identical taste stimulus,

0:1 M monosodium glutamate (MSG) with 0:005 M inosine monophosphate (see [49]), referred to throughout this paper for brevity as monosodium glutamate, in two different types of trial. A trial started 5 seconds before the taste delivery with the visual attentional instruction either ‘‘Remember and Rate Pleasantness’’ or ‘‘Remember and Rate Intensity’’, which was shown until the end of the taste period. The0:75 mltaste stimulus was delivered at t~5 s. The taste period was fromt~5 suntilt~14 s, and in this period a red cross was also present indicating that swallowing should not occur. The differences between the activations in this period were a measure of the effects of the top-down selective attention instructions while the taste was being delivered. (We note that in order to utilize top-down attention, one needs to hold the object of attention in mind, in this case pleasantness or intensity. This requires a short-term memory. Short-term memory is thus a sine qua non of selective attention [50,51], and it is the source of this top-down bias from a short-term memory system in which we are interested in this investigation.) After the end of the taste period, the visual instruction and red cross were turned off, and a green cross was shown cueing the subject to swallow. After2 sa tasteless rinse was delivered with a red cross, and the rinse period was fromt~16 suntilt~23 s, when the green cross appeared to cue a swallow. After this the rating of pleasantness or intensity was made using button-press operated visual analog, rating scales ranging continuously from z2 (very pleasant) to {2 (very unpleasant) for pleasantness, and4(intense) to0(very weak) for intensity as described previously [52]. These two trial types were interspersed in random permuted sequence with other trials that were part of a different investigation, and each was presented9

times. As different trial types were being run in the scanner at the same time, and included different stimuli, and no instructions were given about the number of stimuli being used, or that the stimuli were the same on the ‘‘Remember and Rate Intensity’’ and ‘‘Remember and Rate Pleasantness’’ trials, the participants simply had to concentrate on following the instructions about what aspect of the taste stimulus, intensity or pleasantness, had to be rated on that trial. The protocol and design are described in [17], and have been used successfully in previous studies to investigate taste cortical areas [22,53–55].

fMRI data acquisition. Images were acquired with a3:0-T VARIAN/SIEMENS whole-body scanner at the Centre for Functional Magnetic Resonance Imaging at Oxford (FMRIB), where27T2weighted EPI coronal slices with in-plane resolution of3|3 mmand between plane spacing of4 mmwere acquired

every 2 seconds (TR~_{2 s}). We used the techniques that we have developed over a number of years [49,53], and as described in detail by [56] we carefully selected the imaging parameters in order to minimize susceptibility and distortion artefact in the orbitofrontal cortex. The relevant factors include imaging in the coronal plane, minimizing voxel size in the plane of the imaging, as high a gradient switching frequency as possible (960 Hz), a short echo time of28 ms, and local shimming for the inferior frontal area. The matrix size was 64|64 and the field of view was

192|192 mm. Continuous coverage was obtained fromz62(A/ P) to{46(A/P). A whole brainT2_{weighted EPI volume of the}

above dimensions, and an anatomicalT1volume with coronal plane slice thickness3 mmand in-plane resolution of 1|1 mm

were also acquired.

fMRI data preprocessing. The imaging data were analyzed using SPM5 (Statistical Parametric Mapping, Wellcome Trust Centre for Neuroimaging, London. http://www.fil.ion.ucl.ac.uk/ spm/). Preprocessing of the data used SPM5 realignment, reslicing with sinc interpolation, normalization to the Montreal Neurolog-ical Institute (MNI) coordinate system [57], and spatial smoothing with a 6 mm full width at half maximum (FWHM) isotropic Gaussian kernel. Time series non-sphericity at each voxel was estimated and corrected for [58], and a high-pass filter with a cut-off period of128seconds was applied.

fMRI data analysis. To investigate task dependent activa-tions of brain areas during the taste period, a Finite Impulse Response (FIR) analysis was performed as implemented in SPM, in order to make no assumption about the time course based on the temporal filtering property of the haemodynamic response function [59,60]. Thea prioridefined areas of interest (ROI) for which we reported results [17] included brain areas where activations to taste stimuli have been found in previous studies including the medial and lateral orbitofrontal cortex, the pregenual part of the cingulate cortex, and the taste and oral somatosensory parts of the insular cortex [22,49,53–55,61,62]; and areas of the lateral prefrontal cortex where activations related to task set, attentional instructions, and remembering rules that guide task performance have been found, including specifically parts of the middle and inferior frontal gyrus [63–70]. A contrast of trials where attention was being paid to taste pleasantness with trials where attention was to intensity revealed significant effects in the orbitofrontal cortex [26, 14,220]. The reverse contrast of trials where attention was to intensity vs trials where attention was to pleasantness revealed significant effects in the right anterior insular taste cortex [42, 18,214] [17].

We then performed PPI analyses [71,72], using the above two brain areas as seed regions, to investigate task-dependent functional connectivity of these areas with other brain areas, that might provide the source of the top-down modulation [47]. We identified an anterior lateral prefrontal cortex (AntLPFC) region at Y~53 mm in which the correlation with activity in the orbito-frontal cortex (OFC) seed region was greater when attention was to pleasantness than to intensity [47]. Conversely, in a more posterior region of lateral prefrontal cortex (PostLPFC) atY~34 mm the correlation with activity in the anterior insula (AntINS) seed region was greater when attention was to intensity than to pleasantness [47]. The locations of the seed regions and the identified foci in AntLPFC and PostLPFC are shown in Figure 1.

(7)

basis functions including (1) linear bases with different time lags; (2) second-order polynomial bases with different time lags; and (3) sixth-order Fourier bases with different time lags, to ensure that the observed signal-dependent noise phenomenon does not depend on the selection of the projection space. Any non-random relationship between the estimated signal E(xtDUt{1) and the

estimated variance of the noise var(xtDUt{1) indicates the

presence of signal-dependent noise. In particular, we tested if there exists a significant correlation between ½E(xtDUt{1)2 and

var(xtDUt{1). We also investigated the correlation between the empirically estimated signal, ½E(xtDUt{1)2, and the model estimate of the signal, ^aaxt{1; and the correlation between the

empirically estimated variance of the noise,var(xtDUt{1), and the

variance of the noise estimated by the model,^cc2_z_^_b_b2_x2

t{1, where

^ a

a,^bband^ccare estimates of the model parameters, to test whether there is good concordance between the model and the empirical data analysis with respect to the signal and noise.

Granger causal analysis of fMRI BOLD signals. The PPI analyses described above do not show the directionality of the influences, as they are based on correlations, and for that reason we applied Granger causal analysis to each pair of the four brain areas (OFC, AntINS, AntLPFC and PostLPFC) [48]. We extracted the mean BOLD signals from33voxels within a sphere of radius2voxels centered at the seed voxels in OFC and AntINS, and the peak voxels identified with the largest PPI effect in

AntLPFC and PostLPFC, for Granger causal analysis. For each of the two experimental conditions (attention to intensity vs attention to pleasantness), the time series for a single subject consisted of9

trials, each with18 BOLD signal data points (2 sapart), starting on each trial at the onset of the instruction to pay attention to the pleasantness or to the intensity of the taste. Each trial was denoised by wavelet using a Matlab routine,mswden, with the Daubechies 2 (db2) wavelet, and threshold options sqtwolog (universal threshold at sqrt½2 log(:)) and sln (rescaling using a single estimation of level noise, based on first level coefficients). Each trial was also detrended and centered to zero mean before causal analyses. For each experimental condition and each pair of the four brain areas, we pooled data from all subjects (12|9trials) to fit the signal-dependent noise model, i.e., we treated the 108 trials as repeated realizations from a common underlying model. We detected unidirectional causal influences as well as significant difference of the causalities in opposite directions [2] to identify the dominant causal influences in a particular direction [48]. We also applied classical GC to the same data set as a comparison.

Results

Simulation results

Figure 2 shows a comparison of performance for the classical Granger causal model and the Granger causal model with signal-dependent noise by ROC (receiver operating characteristic)

Figure 1. Results of the PPI analysis.A. The seed areas for the PPI analysis in the orbitofrontal cortex (1) [26, 14,220], and insular taste cortex (2) [42, 18, 214]. B. The region of the anterior lateral prefrontal cortex (AntLPFC) [240, 54, 14] identified by PPI analysis as correlated with the orbitofrontal cortex seed area when attention was to pleasantness (pv0:029). C. The region of the posterior lateral prefrontal cortex [238, 34, 14] identified by PPI analysis as correlated with the insular taste cortex seed area when attention was to intensity (pv0:049). The full details of the PPI analysis are provided in [47].

(8)

analysis. Clearly, classical GC cannot capture the causal influences well in the presence of dependent noise, while the signal-dependent noise Granger causal model substantially outperforms the classical GC model, and shows a good sensitivity and specificity. Figure 3A shows the mean BOLD signals calculated across the trials for the three firing rates before downsampling. As expected, higher firing rates evoked larger mean modelled haemodynamic responses. However, the variability of the modelled BOLD response was considerable, as illustrated for the mean firing rate of 40 Hz in Figure 3A for 10 randomly selected trials. Figure 3B shows the relation between the empirically estimated variance of the noise and the squared empirically estimated signal at different time points within a trial, using the projection space spanned by the second-order polynomial basis, i.e.,spanfxt{1,x2

t{1,xt{2,x2t{2g. This shows that

the variance of the noise at any point in the time course of a trial is approximately linearly related to the square of the signal. We obtained consistent results using different projection spaces. Consistent results with those just described can also be obtained with a simpler model in which the spike trains are convolved with the canonical haemodynamic response function to generate the BOLD signal, as described previously [73,74]. This simple generative model of BOLD signals thus confirms that Poisson spike trains could produce fMRI BOLD time series in which the variance of the noise across the time course of a trial would be linearly related to the squared signal. We show below that this is also exactly what was found empirically in the fMRI data.

Empirical evidence for signal-dependent noise in BOLD signals

Figure 4 shows the empirically estimated variance of the noise in the fMRI BOLD time series obtained in this investigation as a

function of the squared empirically estimated signal at each time point within a trial, using the projection space spanned by the second-order polynomial basis, i.e., spanfxt{1,x2

t{1,xt{2,

x2

t{2g. Significant correlations are observed for both experimental

conditions by pooling data from the four brain regions (attention to intensity, C~0:56, p~4:17|10{6_{, attention to pleasant,}

C~0:42,p~7:75|₁₀{4_{), which clearly indicates the presence of}

signal-dependent noise in the fMRI BOLD time series. In particular, the results shown in Figure 4 show that the variance of the noise in BOLD time series is approximately linearly related to the squared signal. A similar effect was also found for each brain region when analyzed separately. The results were consistent using different projection spaces. In particular, we observed significant correlations when the project space was spanned by (1) linear bases up to 9 time lags; (2) second-order polynomial bases up to 6 time lags; and (3) sixth-order Fourier bases up to 2 time lags. These results provide strong evidence for the presence of signal-dependent noise in fMRI BOLD time series. Moreover, when fitting our signal-dependent noise model to the real data, we observed excellent concordance and significant correlation between the empirically estimated signal,½E(xtDUt{1)2, and the model estimate of the signal, ^aaxt{1 (attention to intensity,

C~0:35, p~4:00|10{3_{, attention to pleasantness,} _C_~_0:78_,

p~2:88|10{14_{), and between the empirically estimated variance}

of the noise,var(xtDUt{1), and the variance of the noise estimated

by the model, ^cc2_z_^_b_b2_x2

t{1 (attention to intensity, C~0:56,

p~1:17|10{6_{, attention to pleasantness,} _C_~_0:67_, _p_~_1:90_| 10{9_{). The results were also consistent using different projection}

spaces. This indicates that the AR-BEKK model is a good

Figure 2. Comparison by simulations of the performance of the classical Granger causal analysis and the Granger causality with signal-dependent noise analysis by ROC (receiver operating characteristic) analysis.The sensitivity of the methods is plotted against 1{specificity for differentp-value thresholds. The sensitivity is defined as the proportion of actual causal influences that are correctly identified. The specificity measures the proportion of non-causal influences that are correctly identified. By setting differentp-value thresholds for causality, each method gives different sensitivity and specificity. Therefore, the best model is expected to have its performance ROC curve go through the upper left corner, while a random classification algorithm has its performance curve as a diagonal line. The signal-dependent noise model outperforms the classical Granger causal model substantially and consistently.

(9)

description of the data and captures a large portion of the variance in the empirical signal and noise.

fMRI data investigation

Table 1 shows the causal influences between the four brain areas (OFC, AntINS, AntLPFC, PostLPFC) detected by the Granger causality with signal-dependent noise analysis. First, we consider attention to intensity. There are significant (top-down) causal influences from both the AntLPFC and PostLPFC to the insular taste cortex (AntINS). Second, we consider attention to pleasantness.

There are significant (top-down) causal influences from both the AntLPFC and PostLPFC to the OFC, and a significant effect from the OFC to the antLPFC. There is also a (top-down) effect of the PostLPFC on the taste insula (AntINS). Very interestingly too, during attention to pleasantness, there is increased effective connectivity from the insular taste cortex to the OFC, suggesting that information is routed especially to the OFC during attention to pleasantness.

For comparison, Table 2 shows the causal influences between the four brain areas detected by the classical Granger causal model. Only one effective connectivity influence (PostLPFC to

Figure 3. An illustrative model of signal-dependent noise in BOLD signals.A. The mean BOLD signals for the different time points within a trial (calculated across the trials) for three firing rates, with mean rates of 5, 40 and 80 spikes/sec. 10 randomly selected trials of the BOLD signals with the input firing rate of 40 Hz are also shown (gray). B. The empirically estimated variance of the noise in the simulated BOLD time series plotted against the squared empirically estimated signal using the projection space spanned by the second-order polynomial basis, with the input firing rate of 40 Hz. The relation is approximately linear.

(10)

AntLPFC, when paying attention to intensity) was identified as significant. The greater power of the signal-dependent noise model can be clearly observed.

Table 3 shows the difference of the causalities in opposite directions by the Granger causality with signal-dependent noise analysis. In the pleasantness condition, consistent with the hypothesis that the lateral prefrontal cortex is the source of the top-down modulation of activations in the OFC, there are significantly stronger effects from both the AntLPFC and the PostLPFC to the OFC than vice versa. It is also of interest that in the pleasantness condition, a significantly stronger forward influence was detected from the antINS to the OFC. Only one significant difference was detected for the intensity condition, that is the effect from the PostLPFC to AntINS is greater than in the reverse direction. This is consistent with the hypothesis that the major top-down effect on the taste insula during attention to intensity is from the PostLPFC. The bi-directional interaction in the pleasantness condition between the AntLPFC and OFC (Table 1) may be interpreted in the context that there is a significant difference of the causality with AntLPFC to OFC greater than OFC to AntLPFC, thus indicating a stronger influence of AntLPFC on OFC than vice versa (Table 3).

These analyses provide evidence for the effective connectivities in the attention to intensity and pleasantness conditions that are summarized in Figure 5.

Discussion

In this paper, we for the first time provide empirical evidence for signal-dependent noise in fMRI BOLD signals in several cortical areas, such as the insular, orbitofrontal, and lateral prefrontal cortical areas. We then developed a Granger causal model with

signal-dependent noise that can appropriately model BOLD signals and detect causal influences in both mean and variance. By simulation studies, we showed that our Granger causality with signal-dependent noise analysis substantially outperforms classical Granger causal analysis, when signal-dependent noise is present in the time series. We applied our Granger causal model with signal-dependent noise to the data from an fMRI study to investigate the source of the top-down attentional influences on taste processing when attention was to the intensity vs the pleasantness of the taste. We found a top-down effect from the PostLPFC to the insular taste cortex during attention to intensity but not to pleasantness; and a top-down effect from the AntLPFC and PostLPFC to the OFC during attention to pleasantness but not to intensity. In addition, there was stronger forward effective connectivity from the insular taste cortex to the OFC during attention to pleasantness than during attention to intensity.

Assessment of the measurement of Granger causality taking into account signal-dependent noise

Conditionally heteroskedastic data often show volatility cluster-ing and outliers. In particular, the unconditional distribution of the data is leptokurtic, which means that it has more mass around zero and in the tails than the normal distribution and, hence, it can produce occasional outliers [27]. Therefore, models with time varying volatility can better capture the nature of the data, and it is expected that more reliable causal inferences can be made. Comparing to the earlier approaches of causal inferences in data with time varying volatility [75–78], including [13], the model presented in this paper that takes into account signal-dependent noise provides an accurate, efficient and unified method to detect causality in both the mean and variance. The model has a corresponding frequency domain representation [13], which may

Figure 4. Empirical evidence for signal-dependent noise in BOLD signals.For each of the four brain areas and each subject, the empirically estimated variance of noise in the observed fMRI BOLD time series is plotted against the squared empirically estimated signal for each time point within a trial using the projection space spanned by the second-order polynomial basis.Utis defined in the text, and reflects the past history of times eriesxt. Significant correlations are observed for both experimental conditions (attention to intensity,C~0:56,p~4:17|10{6_{, and attention to} pleasantness,C~0:42,p~7:75|10{4_).

(11)

further shed light on frequency-specific interactions. The model described here applies when the variance of the noise is proportional to the square of the signal, which is what we observed from real fMRI BOLD time series, but could in principle be extended to deal with other cases. There are alternative measures of Granger-type causality such as partial directed coherence (PDC) [79], relative power contribution (RPC) [80] and directed transfer function (DTF) [81] that do not explicitly use the noise covariance function to define causality but are based on the transfer function or model coefficients. However, because they are typically formulated under the simple AR model, all these methods are unable to capture causal influences in the second order statistics such as signal-dependent noise.

It is not easy to tease apart ‘signal’ and ‘noise’ from an observed time series. In this paper, we define ‘signal’ as the part of the observations that can be well predicted from the past history of the time series, and ‘noise’ as what is completely unpredictable and produces the variation across realizations. We therefore empirically

estimate the signal by projecting the current state of the time series onto the subspace spanned by its past history. In practice, if the projection space is not constructed appropriately, part of the signal that does not lie in the projection space may migrate to the residual process and produce artificial signal-dependent noise phenomena. In this case, expanding the projection space, i.e., using a more complex model to describe the mean activity of the time series, may mitigate the issues of signal-dependent noise. However, if the variance of the noise is indeed dependent on the signal, simply increasing the complexity of the model in the mean structure will not remove this dependence. In the present paper, we investigated a number of projection spaces, spanned by linear or nonlinear basis functions with different time lags, and always observed signal-dependent noise. Therefore, there is strong evidence for the presence of signal-dependent noise in fMRI BOLD time series. In particular, the variance of the noise is approximately linearly related to the square of the signal. When constructing our signal-dependent noise model, we made use of this relationship and specified a linear Table 1.Causality results by Granger causality with signal-dependent noise analysis.

Intensity

OFC AntINS AntLPFC PostLPFC

OFC – 4.51 (0.10) 0.96 (0.62) 0.68 (0.71)

AntINS 2.15 (0.34) – 28.57 (v10{4₎ _{13.59 (}_1:1_|₁₀{3₎

AntLPFC 0.89 (0.64) 28.44 (v10{4₎ _– _{5.17 (0.08)}

PostLPFC 0.62 (0.73) 29.62 (v10{4₎ _{4.22 (0.12)} _–

Pleasantness

OFC – 16.92 (2|10{4₎ _{44.94 (}_v₁₀{4₎ _{7.12 (0.03)}

AntINS 94.79 (v10{4₎ _– _{2.09 (0.35)} _{17.56 (}₂_|₁₀{4₎

AntLPFC 95.08 (v10{4₎ _{6.72 (0.035)} _– _{6.93 (0.03)*}

PostLPFC 89.16 (v10{4₎ _{22.81 (}_v₁₀{4₎ _{5.70 (0.06)} _–

The causal influence is from row to column. The causality is given for each direction, and the correspondingp-value is presented in brackets. If the uncorrectedp-value is less than10{4_{(surviving Bonferroni correction), the causal influence is identified as significant and indicated in bold in the table. (*: The active-set algorithm was used.)}

doi:10.1371/journal.pcbi.1003265.t001

Table 2.Causality results by classical Granger causal analysis.

Intensity

OFC – 0.0039 (0.0061) 0.0006 (0.30) 0.0001 (0.62)

AntINS 0.0015 (0.09) – 0.0002 (0.56) 0.0003 (0.42)

AntLPFC 0.0000 (0.99) 0.0056 (0.0009) – 0.0043 (0.0041)

PostLPFC 0.0003 (0.43) 0.0122 (v10{4₎ _{0.0013 (0.12)} _–

Pleasantness

OFC – 0.0000 (0.99) 0.0001 (0.71) 0.0000 (0.98)

AntINS 0.0001 (0.74) – 0.0023 (0.03) 0.0018 (0.06)

AntLPFC 0.0000 (0.99) 0.0002 (0.51) – 0.0059 (0.0007)

PostLPFC 0.0008 (0.22) 0.0034 (0.0099) 0.0002 (0.59) –

The causal influence is from row to column. The causality is given for each direction and the correspondingp-value is presented in brackets. If the uncorrectedp-value is less than10{4_{(surviving Bonferroni correction), the causal influence is identified as significant and indicated in bold in the table.}

(12)

function to describe the mean activity of the observations, and a quadratic function for the variance of the noise. Although the model appears to be simple, we have shown that it captures a large portion of the variance in the signal and noise in the empirical BOLD time series. Future studies will be of interest to provide more evidence on signal-dependent noise in different brain areas and different data sets, to further examine the relationship between the variance of the noise and the signal, and to develop more complex models, e.g., using nonlinear functions or kernels, accordingly.

Although we only applied our model to fMRI time series, it is clear that the model can be applied to very many types of data that might exhibit signal-dependent noise, including neurophysiologi-cal data such as single or multi-neuron recordings, magnetoen-cephalography, local field potentials, and beyond neuroscience also to any possibly causal system where there are time series of data from two or many sources. Indeed, the significance of detecting causality from data with time varying volatility might be partly demonstrated in the 2003 Nobel Prize in Economics shared by Granger, who set up the foundation of Granger causal analysis

[6], and Engle, who invented the first changing volatility model [31].

Although our initial implementation of the signal-dependent noise model appears to be successful, due to the highly nonlinear form of the log-likelihood function and optimization problem, fast and robust optimization algorithms deserve future investigation. Also, although a low-order low-dimensional AR-BEKK model is a relatively parsimonious representation of the conditional covari-ance structure of a process, the number of parameters still grows quickly with the dimension of the underlying system. This impedes the application of the model to a modest number of time series. Future studies are needed to find more restricted models that ensure uniqueness of the parameterization, guarantee the positive definiteness of the conditional covariance, while at the same time still produce rich dynamics.

In spite of the wide and successful applications in neurophys-iological data, there is still an ongoing debate on applying GC to fMRI data [10,82–87]. Inferring causality from fMRI time series — an indirect measure of neuronal activities – imposes many more challenges than direct electrophysiological recordings. Granger causal models use the observed fMRI data as a surrogate for the underlying neuronal activity, which is a potential flaw of the method and the main controversy against the application of GC to fMRI data, since the BOLD signal is a blurred and delayed representation of the original neuronal signal, and it is now widely recognized that there is intra- and inter-subject variability of haemodynamic responses [88–91]. However, there have been a series of numerical and theoretical works showing that GC is quite robust to the difference in haemodynamic delays [92–94]. Moreover, as in [4], we calculated the cross-correlation function for each pair of time series used in our Granger causal analysis, and most of the cross-correlation peaks appeared at zero lag, indicating that differences in the regional haemodynamic responses may not be a significant factor in this study. We therefore feel that the application of Granger type causal inferences in the analysis of this particular fMRI data set is justified. However, given the complexity of the brain, much work Table 3.Difference of the causalities in opposite directions

by the Granger causality with signal-dependent noise analysis.

Direction Intensity Pleasantness

PostLPFCRAntLPFC – AntLPFCRPostLPFC 20.48 20.614*

PostLPFCRAntINS – AntINSRPostLPFC 8.02 2.62

PostLPFCROFC – OFCRPostLPFC 20.03 41.03

AntLPFCRAntINS – AntINSRAntLPFC 20.06 2.32

AntLPFCROFC – OFCRAntLPFC 20.03 25.07

AntINSROFC – OFCRAntINS 21.18 38.93

Differences with ap-value smaller than 0.01 (a difference measure greater than 4.61) are indicated in bold. (*: We used the active-set algorithm for this particular link.)

doi:10.1371/journal.pcbi.1003265.t003

Figure 5. Neural circuits revealed by Granger causality with signal-dependent noise.A. Attention to taste intensity. B. Attention to taste pleasantness. Larger arrows represent a stronger influence. The values of significant likelihood ratio test statistics are indicated.

(13)

remains to do to provide reliable and accurate causal analyses for neuroscience.

Neural interpretation

The interpretation of the effective connectivity revealed with our signal-dependent noise model is that during attention to pleasantness, the AntLPFC and PostLPFC regions identified by PPI analysis exert a top-down control of the responsiveness of

the OFC to its taste-related inputs, and indeed to how strongly information is routed to the OFC from its preceding area, the AntINS taste cortex. In contrast, during attention to intensity, the PostLPFC identified by PPI analysis exerts a top-down control of the responsiveness of the insular taste cortex to its taste-related inputs. This interpretation is strengthened by the findings with our componential Granger causal analysis [48], which provides evidence that the top-down effects depend on

Figure 6. A Biased activation theory of selective attention.The short-term memory systems that provide the source of the top-down activations may be separate (as shown), or could be a single network with different attractor states for the different selective attention conditions. The top-down short-term memory systems hold what is being paid attention to active by continuing firing in an attractor state, and bias separately either cortical processing system 1, or cortical processing system 2. This weak top-down bias interacts with the bottom up input to the cortical stream and produces an increase of activity that can be supralinear [98]. Thus the selective activation of separate cortical processing streams can occur. In the example, stream 1 might process the affective value of a stimulus with the areas involved including the anterior lateral prefrontal cortex with a top-down influence on the orbitofrontal cortex, and stream 2 might process the intensity and physical properties of the stimulus with the areas involved including the posterior lateral prefrontal cortex with a top-down influence on the insular taste cortex. The outputs of these separate processing streams then must enter a competition system, which could be for example a cortical attractor decision-making network that makes choices between the two streams, with the choice biased by the activations in the separate streams [19].

(14)

the level of activity in the areas on which there is a top-down effect.

The way that we think of top-down biased competition as operating normally in, for example, visual selective attention [95] is that within an area, e.g. a cortical region, some neurons receive a weak top-down input that increases their response to the bottom-up stimuli [95], potentially sbottom-upra-linearly if the bottom-bottom-up stimuli are weak [50,51,63]. The enhanced firing of the biased neurons then, via the local inhibitory neurons, inhibits the other neurons in the local area from responding to the bottom-up stimuli. This is a local mechanism, in that the inhibition in the neocortex is primarily local, being implemented by cortical inhibitory neurons that typically have inputs and outputs over no more than a few mm [50,51,96]. This model of biased competition is illustrated in [47]. That locally implemented biased competition situation may not apply in the present case, where we have facilitation of processing in a whole cortical area (e.g. orbitofrontal cortex) or even cortical processing stream (e.g. the linked orbitofrontal and pregenual cingulate cortex [47]) in which the activity of taste neurons may reflect pleasantness and not intensity. So the

attentional effect might more accurately be described in the present case as biased activation, without local competition being part of the effect. This biased activation theory and model of attention, illustrated in Figure 6, is a rather different way to implement attention in the brain than biased competition, and each mechanism may apply in different cases, or both mechanisms in some cases [19,47,97].

Acknowledgments

We are grateful to Dr. Wanlu Deng for many helpful discussions. We have made some of the software used in the analysis of Granger causality with signal-dependent noise available at http://www.dcs.warwick.ac.uk/ ,feng/causality.html.

Author Contributions

Conceived and designed the experiments: QL TG FG JF ETR. Performed the experiments: QL TG FG ETR. Analyzed the data: QL TG FG ETR. Contributed reagents/materials/analysis tools: QL TG FG JF ETR. Wrote the paper: QL TG FG JF ETR.

References

1. Goebel R, Roebroeck A, Kim DS, Formisano E (2003) Investigating directed cortical interactions in time-resolved fMRI data using vector autoregressive modeling and Granger causality mapping. Magnetic Resonance Imaging 21: 1251–1261.

2. Roebroeck A, Formisano E, Goebel R, et al. (2005) Mapping directed influence over the brain using Granger causality and fMRI. NeuroImage 25: 230–242.

3. Hwang K, Velanova K, Luna B (2010) Strengthening of top-down frontal cognitive control networks underlying the development of inhibitory control: a functional magnetic resonance imaging effective connectivity study. Journal of Neuroscience 30: 15535–15545.

4. Wen X, Yao L, Liu Y, Ding M (2012) Causal interactions in attention networks predict behavioral performance. Journal of Neuroscience 32: 1284–1292. 5. Wiener N (1956) The theory of prediction. Modern mathematics for engineers.

New York: McGraw-Hill: 165–190.

6. Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society 37: 424–438.

7. Ding M, Chen Y, Bressler S (2006) Granger causality: Basic theory and application to neuroscience. In: Schelter B, Winterhalder M, Timmer J, editors. Handbook of Time Series Analysis. Weinheim: Wiley-VCH.

8. Friston K, Moran R, Seth A (2012) Analysing connectivity with Granger causality and dynamic causal modelling. Current Opinion in Neurobiology 23: 1–7. 9. Ge T, Kendrick KM, Feng J (2009) A novel extended Granger causal model

approach demonstrates brain hemispheric differences during face recognition learning. PLoS Computational Biology 5: e1000570.

10. David O, Guillemain I, Saillet S, Reyt S, Deransart C, et al. (2008) Identifying neural drivers with functional MRI: an electrophysiological validation. PLoS Biology 6: e315.

11. Gerstein GL, Mandelbrot B (1964) Random walk models for the spike activity of a single neuron. Biophysical Journal 4: 41–68.

12. McAdams CJ, Maunsell JHR (1999) Effects of attention on the reliability of individual neurons in monkey visual cortex. Neuron 23: 765–773.

13. Luo Q, Ge T, Feng J (2011) Granger causality with signal-dependent noise. NeuroImage 57: 1422–1429.

14. Harris CM, Wolpert DM (1998) Signal-dependent noise determines motor planning. Nature 394: 780–784.

15. Todorov E (2005) Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system. Neural Computation 17: 1084–1108.

16. Selen LPJ, Franklin DW,Wolpert DM (2009) Impedance control reduces instability that arises from motor noise. The Journal of Neuroscience 29: 12606– 12616.

17. Grabenhorst F, Rolls ET (2008) Selective attention to affective value alters how the brain processes taste stimuli. European Journal of Neuroscience 27: 723–729. 18. Rolls ET, Grabenhorst F, Margot C, da Silva M, Velazco M (2008) Selective

attention to affective value alters how the brain processes olfactory stimuli. Journal of Cognitive Neuroscience 20: 1815–1826.

19. Grabenhorst F, Rolls ET (2011) Value, pleasure and choice in the ventral prefrontal cortex. Trends in Cognitive Sciences 15: 56–67.

20. de Araujo IET, Kringelbach ML, Rolls ET, McGlone F (2003) Human cortical responses to water in the mouth, and the effects of thirst. Journal of Neurophysiology 90: 1865–1876.

21. Kringelbach ML, O’Doherty J, Rolls ET, Andrews C (2003) Activation of the human orbitofrontal cortex to a liquid food stimulus is correlated with its subjective pleasantness. Cerebral Cortex 13: 1064–1071.

22. Grabenhorst F, Rolls ET, Bilderbeck A (2008) How cognition modulates affective responses to taste and flavor: top-down influences on the orbitofrontal and pregenual cingulate cortices. Cerebral Cortex 18: 1549–1559.

23. Grabenhorst F, Rolls ET, Parris BA, d’Souza AA (2010) How the brain represents the reward value of fat in the mouth. Cerebral Cortex 20: 1082–1091. 24. Rolls ET, Grabenhorst F (2008) The orbitofrontal cortex and beyond: from

affect to decision-making. Progress in Neurobiology 86: 216–244.

25. Small DM, Gregory MD, Mak YE, Gitelman D, Mesulam MM, et al. (2003) Dissociation of neural representation of intensity and affective valuation in human gustation. Neuron 39: 701–711.

26. Haase L, Cerf-Ducastel B, Murphy C (2009) Cortical activation in response to pure taste stimuli during the physiological states of hunger and satiety. NeuroImage 44: 1008–1021.

27. Lu¨tkepohl H (2005) New introduction to multiple time series analysis. Cambridge Univ Press.

28. Geweke JF (1982) Measurement of linear dependence and feedback between multiple time series. Journal of the American Statistical Association 77: 304–313. 29. Geweke JF (1984) Measures of conditional linear dependence and feedback between time series. Journal of the American Statistical Association 79: 907–915. 30. Barnett L, Bossomaier T (2012) Transfer entropy as a log-likelihood ratio.

Physical Review Letters 109: 138105.

31. Engle R (1982) Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica: Journal of the Econometric Society : 987–1007.

32. Bollerslev T (1986) Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31: 307–327.

33. Taylor S (1986) Modelling financial time serie. World Scientific 113: 266. 34. Engle R, Granger CWJ, Kraft D (1984) Combining cos. mpeting forecasts of

inflation using a bivariate ARCH model. Journal of Economic Dynamics and Control 8: 151–165.

35. Bollerslev T, Engle RF, Wooldridge J (1988) A capital asset pricing model with time-varying covariances. The Journal of Political Economy : 116–131. 36. Diebold FX, Nerlove M (1989) The dynamics of exchange rate volatility: a

multivariate latent factor ARCH model. Journal of Applied Econometrics 4: 1– 21.

37. Baba Y, Engle RF, Kraft DF, Kroner KF (1991) Multivariate simultaneous generalized ARCH. Discussion paper, University of California, San Diego, Department of Economics 11: 122–150.

38. Engle RF, Kroner KF (1995) Multivariate simultaneous generalized ARCH. Econometric theory 11: 122–150.

39. Bressler SL, Tang W, Sylvester CM, Shulman GL, Corbetta M (2008) Top-down control of human visual cortex by frontal and parietal cortex in anticipatory visual spatial attention. The Journal of Neuroscience 28: 10056– 10061.

40. Hamilton JP, Chen G, Thomason M, Schwartz M, Gotlib I (2010) Investigating neural primacy in major depressive disorder: multivariate Granger causality analysis of resting-state fMRI time-series data. Molecular Psychiatry 16: 763– 772.