A Bayesian spatiotemporal model for very large data sets




NeuroImage 50 (2010) 1126–1141


Technical Note

L.M. Harrison ⁎, G.G.R. Green
York Neuroimaging Centre, The Biocentre, York Science Park, University of York, YO10 5DG, UK

Article history: Received 5 September 2009; Revised 5 December 2009; Accepted 9 December 2009; Available online 21 December 2009.

Keywords: fMRI; Spatiotemporal models; Gaussian Markov random fields; Graph–Laplacian; Conditional autoregressive priors; Spatially-informed voxel-wise priors; Variational Bayes

Abstract

Functional MRI provides a unique perspective of neuronal organization; however, these data include many complex sources of spatiotemporal variability, which require spatial preprocessing and statistical analysis. For the latter, Bayesian models provide a promising alternative to classical inference, which uses results from Gaussian random field theory to assess the significance of spatially correlated statistic images. A Bayesian approach generalizes the application of these ideas in that (1) random fields are used to model all spatial parameters, not solely observation error, (2) their smoothness is optimized, and (3) a broader class of models can be compared. The main problem, however, is computational, due to the large number of voxels in a brain volume. Sampling methods are time-consuming; however, approximate inference using variational Bayes (VB) offers a principled and transparent way to specify the assumptions necessary for computational tractability. Penny et al. (2005b) described such a scheme using a joint spatial prior and approximated the joint posterior density with one that factorized over voxels. However, a further computational bottleneck is encountered when evaluating the log model evidence used to compare models. This has led in practice to dividing a brain volume into slices and treating each independently, which amounts to approximating the spatial prior over a full volume with stacked 2D priors; that is, smoothness along the z-axis is not included in the model. Here we describe a VB scheme that approximates the zero-mean joint spatial prior with a non-zero-mean empirical prior that factors over voxels, thereby overcoming this problem. We do this by modifying the original VB algorithm of Penny et al., using the conditional form of a so-called conditional autoregressive (CAR) prior to update a marginal prior over voxels. We refer to this as a spatially-informed voxel-wise prior (SVP) and use these to spatially regularise general linear model (GLM) and autoregressive (AR) coefficients (the latter modelling serial correlations over time). The algorithm scales more favourably with the number of voxels, providing a truly 3D spatiotemporal model over volumes containing tens of thousands of voxels. We compare the scaling of compute times with the number of voxels, and compare performance with a joint prior applied to synthetic and single-subject data.

© 2009 Elsevier Inc. All rights reserved.

Introduction

There has been much interest in modelling the spatial response of fMRI over the past decade (Flandin and Penny, 2007; Friman et al., 2003; Gossl et al., 2001; Harrison et al., 2008a; Hartvig and Jensen, 2000; Kiebel et al., 2000; Kim et al., 2005; Penny et al., 2005b; Sole et al., 2001; Tabelow et al., 2006; Wink and Roerdink, 2004; Woolrich et al., 2004b). In particular, a number of authors have suggested Bayesian models using ideas from Gaussian process priors (GPPs) (Groves et al., 2009; Harrison et al., 2007), Gauss-Markov random fields (GMRFs) (Descombes et al., 1998; Gossl et al., 2001; Penny et al., 2005b; Woolrich et al., 2004b), and spatial bases (Flandin and Penny, 2007; Harrison et al., 2008a). Benefits of the Bayesian approach include making explicit all assumptions underlying a model, and the fact that inference proceeds naturally over three levels, producing posterior densities over parameters, hyperparameters (including smoothness), and the model itself.

The second level of inference enables the optimization of spatial smoothness, and the third allows comparison of different models (Rosa et al., 2010). The classical analogue of the third level is the F-contrast (Friston et al., 1995b) used to compute maps over physical space, which quantify the additional explanatory power of a full compared to a reduced design matrix (this is an example of a nested model; the Bayesian approach, however, generalizes this to non-nested models, e.g., different parametric designs that cannot be combined in one design matrix). Starting from the perspective of continuous random fields, exact Bayesian inference is possible using GPPs (Bishop, 2006; MacKay, 2003; Rasmussen and Williams, 2006) over observation error and spatial parameters, e.g., general linear model (GLM) coefficients; we refer to such a model as a Gaussian process model (GPM). Point estimates of covariance parameters, e.g., smoothness, are typically optimized using gradient ascent on the (log) marginal likelihood. This is an elegant scheme; however, the cost is increased computation, as representing spatial dependence involves manipulating covariance matrices with the same number of dimensions as there are measurements in time and space throughout a brain volume. This


has limited their application to one or two slices (∼4000 voxels) of fMRI data (Harrison et al., 2008a), or ∼1000 voxels for arterial spin labelling (ASL) data (Groves et al., 2009). If only maximum a posteriori estimates of spatial parameters are required, the problem reduces to solving a linear system of equations (see the first line of Eq. (3)), for which there are many efficient schemes, e.g., relaxation methods (Ashburner, 2007; Press et al., 2007). However, if inferences are to be made about these estimates, then a measure of uncertainty over physical space has to be optimized. Herein lies a computational bottleneck, which requires approximations to facilitate a practical application of the Bayesian approach. The main point here is that exact inference is not feasible and approximations are crucial. The challenge therefore is to determine the appropriate set of assumptions, given the resources, the data, and the questions neuroimagers would like to ask of them. Such constraints have led to the classical approach to fMRI data, the so-called mass-univariate approach, where ordinary least squares (OLS) estimates of GLM coefficients are computed for each voxel independently. This produces spatial maps of cortical responses, or statistical parametric maps (SPMs), from which statistic images such as T maps can be computed. Assessing the significance of statistic images requires choosing a threshold above which a voxel is declared active, which is a non-trivial problem due to spatial correlations in the data. A way to address this is to use results from Gaussian random field (GRF) theory to map densities over a continuous function in space to discrete topological features, such as the number of clusters or blobs above threshold (Worsley et al., 1996), referred to as the RF-correction (Friston et al., 2006). This is used to control the false positive rate of features (as opposed to voxels) and requires estimating the smoothness of the residual errors, which are assumed to be a sample from a continuous random field. This is possible if the continuous field is adequately approximated using this discrete sample, referred to as a good lattice approximation, as in practice measurements are made at discrete locations, e.g., on a lattice throughout physical space. Smoothness of the residuals relates to a discrete number of resolution elements (RESELs), closely linked to the effective number of observations. Given an adequate estimate of this, a statistical threshold can be computed using a result from GRF theory relating the number of RESELs to the threshold required to achieve an expected number of false positive features. For example, given independent samples on a 100-by-100 grid smoothed by a Gaussian kernel with a full width at half maximum of 10 pixels, the number of RESELs is 100 and a Z threshold of 3.8 is required to ensure that the probability of observing one or more blobs above this threshold is 0.049. As such, the RF-correction provides a principled way to assess the significance of spatially correlated SPMs. Such an approach can be motivated from a GPM, where (1) the same random field is assumed over observation errors and the spatial parameters of a GLM, (2) smoothness is approximated given the residuals, not optimized, and (3) the smoothness is used to assess statistic images in a classical framework, as opposed to using the posterior density over spatial parameters. The latter relies on theoretical results from continuous random fields and a good lattice approximation, which motivates smoothing data so as to increase the validity of its assumptions.
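The worked example above can be checked numerically. The sketch below evaluates the expected Euler characteristic of a thresholded 2D Gaussian random field, which at high thresholds approximates the corrected p-value; this is a minimal illustration with our own naming, not code from the paper.

```python
import numpy as np

def expected_ec_2d(resels, z):
    """Expected Euler characteristic (number of suprathreshold blobs) of a
    thresholded, smooth 2D Gaussian random field (Worsley et al., 1996)."""
    return resels * 4 * np.log(2) * (2 * np.pi) ** -1.5 * z * np.exp(-z ** 2 / 2)

# 100-by-100 grid smoothed with FWHM = 10 pixels gives (100/10)^2 = 100 RESELs
print(expected_ec_2d(resels=100, z=3.8))  # ~0.049, as quoted above
```

For z = 3.8 this returns approximately 0.049, matching the probability quoted above.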
Practically, this means that the smoothness of the data should be on a scale approximately three times the voxel size. This is a two-stage procedure, where the second stage corrects for assuming spatial independence at the first. The important point is that this pragmatic and generic framework is the consequence of one possible set of decisions starting from a GPM. This motivates considering others that preserve the Bayesian framework; in particular, ones that generalize the use of random fields over all spatial parameters (not solely observation error), optimize their smoothness, and explore model space using established techniques in Bayesian model comparison. A continuous formulation is very flexible in that the model is not constrained to fixed locations and can be used to make predictions in regions where measurements have not been taken (Cressie, 1993).


However, given the density of measurements of fMRI data over space, this may not be necessary, and a discrete formulation based on GMRF priors (Rue and Held, 2005) may be sufficient. Sampling methods have been used to estimate such models for fMRI data (Gossl et al., 2001; Woolrich et al., 2004b); however, compute times can be several hours for one slice. This has motivated authors (Penny et al., 2005b; Woolrich et al., 2004a) to use a variational approach (Bishop, 2006; MacKay, 2003) to make approximate instead of exact inference. Here we follow the work of Penny et al. (2007), who used joint GMRF priors to spatially regularize the GLM and autoregressive (AR) coefficients of a GLM-AR model (Penny et al., 2003). They approximated the joint posterior density of coefficients throughout space using the so-called mean field approximation (MFA) (Ghahramani and Beal, 2000) to achieve an approximate posterior density that factored over voxels. This addresses the computational issues associated with the first and second levels of inference; however, a further bottleneck is encountered when evaluating a lower bound to the log model evidence, which is important for comparing models. The reason for this is that the eigenvalues of the spatial precision matrix are required, which limits the number of voxels in any one model to a few thousand and has led in practice to approximating the prior with one that factors over slices. An alternative to GMRF priors is to use an explicit spatial basis (Flandin and Penny, 2007); however, in general, their computation can be prohibitive (this is not an issue if the basis is known, e.g., a discrete sine or cosine set, which are the eigenmodes of a Laplacian matrix on a regular graph, depending on the boundary conditions; however, if an irregular graph is used, then eigenmodes have to be computed). The key to a scheme that enables a 3D prior over the full volume is to avoid such computations. This can be achieved using a conditional autoregressive (CAR) prior (Besag and Kooperberg, 1995; Congdon, 2001) to update a marginal density at each voxel, which we refer to as a spatially-informed voxel-wise prior (SVP). This enables their application to be extended to very large data sets. The aim of this work is to address the computational issue at the third level of inference. Our contribution is to introduce SVPs and use them in a modification of Penny's spatial GLM-AR model to achieve a truly 3D spatiotemporal model over a full volume. The paper is divided into three sections. The Background includes two asides to show how an underlying GPM relates to (1) classical random field theory (RFT) and (2) previous work on Bayesian 2D fMRI analysis. A discursive path through these models not only provides insight into the links between them, but also serves to motivate the current approach. We take time to consider a GPM and the need for approximations. The classical approach and the joint prior with MFA are both motivated as such, and the cause of the computational bottleneck using the latter is identified. This motivates replacing the joint prior with SVPs, which we derive in the Theory section. These lead to a tractable expression for the log evidence at each voxel, in contrast to a joint spatial prior, and enable the analysis of full volumes without assuming independence over slices. We then show, using synthetic data, that compute times scale more favourably with the number of voxels for SVPs, and compare their performance with the MFA approach using synthetic and real data from a single subject.

Notation

We use $\vec{A} = \mathrm{vec}(A)$, where $A \in \mathbb{R}^{r \times c}$ is a matrix of size r × c. The indices i, j, k, n will always be used to indicate row, column, level of hierarchy, and a general index. These are used in sub- and superscripts, i.e., $A^{(k)}_{i,j} = A^{(\mathrm{level})}_{\mathrm{row,column}}$, and the i-th row and j-th column are written $A^{(k)}_{i,*}$ and $A^{(k)}_{*,j}$, respectively. Row and column covariances of priors at the k-th level are $S^{(k)}$ and $K^{(k)}$. The expectation of a function, f(x), with respect to the probability density function (pdf) q(x) is written $\langle f(x)\rangle_x$, or $\hat{x} = \langle x\rangle_x$ when using the identity function.

The same expression written using the integral symbol will appear as $\int_x q(x) f(x)$. Likelihood and prior pdfs are denoted by p(x), while q(x) is used for posterior pdfs. An estimated variable is denoted by an overbar. A multivariate normal density over a random vector of length r with mean μ and covariance Σ is denoted by $N_r(x; \mu, \Sigma)$. $N_{r,c}$ stands for a matrix-variate normal (MVN) density, where the matrix $A \in \mathbb{R}^{r \times c}$ has pdf $p(A) \sim N_{r,c}(M, S \otimes K)$, with mean M of size r × c and two covariances, S and K, of size r × r and c × c for rows and columns, respectively:

$$N_{r,c}(M, S \otimes K) = (2\pi)^{-rc/2} |S|^{-c/2} |K|^{-r/2} \exp\left(-\tfrac{1}{2}\,\mathrm{tr}\left(S^{-1}(A - M) K^{-1} (A - M)^T\right)\right)$$

We use the expression

$$\mathrm{Ga}(\alpha; a, b) = \frac{1}{\Gamma(a)\, b^a}\, \alpha^{a-1} \exp\left(-\frac{\alpha}{b}\right)$$

to represent a gamma density.

Background

Here we consider a GPM along with classical RFT and the use of the MFA given a GMRF prior. In summary, the classical approach can be derived from a GPM. However, this requires assuming that the spatial covariance structures of observation noise and GLM parameters are the same. This brings with it the computational saving that enables its use on large volumes of data. If, however, we choose not to accept this assumption, then an alternative means of achieving a pragmatic algorithm is needed. A possible solution is to use the MFA. However, if we would like to use the powerful machinery of Bayesian model comparison, this provides only a partial solution. The procedure described in the Theory section is a way to augment the MFA approach and thereby achieve a full solution. The mathematical details supporting this line of reasoning are considered next, though they can be omitted on a first reading.

Gaussian process models

Given observations at discrete points in space and time, and using a GLM with spatial priors over observation error and GLM coefficients, a two-level hierarchical model under Gaussian assumptions can be written using matrix-variate normal densities

$$Y = X\beta + e^{(1)}$$
$$\beta = e^{(2)}$$
$$p(Y|\beta) = N_{T,N}\left(Y; X\beta, S^{(1)} \otimes K^{(1)}\right)$$
$$p(\beta) = N_{r,N}\left(\beta; 0, S^{(2)} \otimes K^{(2)}\right) \quad (1)$$

Here, Y is a T × N data matrix and X is a T × r design matrix with an associated unknown parameter matrix β of size r × N. Errors at both levels are assumed to be sampled from a spatial Gaussian process; that is, for N samples, the joint distribution $p(\epsilon_i^{(k)}(v_1), \ldots, \epsilon_i^{(k)}(v_N))$, where the subscript indicates a point in time or a regressor depending on the level (k), is multivariate Gaussian. For a zero-mean GPP, this is defined by the covariance function $K^{(k)}(v_i, v_j)$ (Rasmussen and Williams, 2006). The consequence of using a continuous function is that it leads to a continuous random field formulation. This is to be contrasted with discrete random fields, where a covariance matrix is used at a fixed set of points in space, and for which there may not be a continuous function from which to compute matrix elements. The continuous formulation is more flexible in that the covariance function can be used to compute a covariance matrix over an arbitrary set of points in space and to make predictions at locations where measurements have not been taken. Note that the covariances S^(1) and S^(2) are over time and regressors, respectively. Assuming stationarity, i.e., $K^{(k)}(v_i, v_j) = K^{(k)}(d_{ij})$, where $d_{ij} = |v_i - v_j|$ is the distance between two voxels, and a Gaussian form, the covariance functions are parameterized by power and smoothness ($\rho^{(k)2}$ and $\sigma^{(k)2}$; $\sigma^{(k)}$ is referred to as the characteristic length scale in the GPP literature (Rasmussen and Williams, 2006))

$$K^{(k)}(d_{ij}) = \rho^{(k)2} \exp\left(-\frac{d_{ij}^2}{2\sigma^{(k)2}}\right) \quad (2)$$

Using Bayes rule for Gaussian densities (see Appendix I), closed-form expressions are known for the posterior mean and precision of $\vec{\beta}$

$$\hat{\vec{\beta}} = \Pi^{-1} Z^T \Sigma^{(1)-1} y$$
$$\Pi = \Sigma^{(2)-1} + Z^T \Sigma^{(1)-1} Z \quad (3)$$

where $Z = I_N \otimes X$, $\Sigma^{(k)}(\alpha) = K^{(k)} \otimes S^{(k)}$, $y = \mathrm{vec}(Y)$, and $\alpha = (\rho^{(1)}, \rho^{(2)}, \sigma^{(1)}, \sigma^{(2)})$. Point estimates of the covariance parameters, α, can be optimized using gradient ascent on the (log) marginal likelihood, F(α) = log p(y|α) (sometimes referred to as evidence optimization (Groves et al., 2009), as p(y|α) is also known as the evidence of the hyperparameters), using

$$p(y|\alpha) = (2\pi)^{-TN/2} |\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2}\, y^T \Sigma^{-1} y\right)$$
$$\Sigma = Z \Sigma^{(2)} Z^T + \Sigma^{(1)} \quad (4)$$

as in Harrison et al. (2007). However, this is computationally burdensome, as it involves computing with matrices of size TN and rN squared. This is partially due to the general form allowed for the spatial covariances, i.e., K^(1) ≠ K^(2), which means that Σ cannot be written using a single Kronecker product. A way to deal with this is to factor the prior over subvolumes (Harrison et al., 2008b). This strategy is not used here, because we are interested in constructing priors over the full volume. The model is simplified if the spatial covariances are constrained to be the same. In other words, assuming that the signal and the error have the same spatial covariance structure, i.e., K^(1) = K^(2) = K, with a Jeffreys prior (Jeffreys, 1946; Minka, 2001), $p(\beta) = N_{r,N}\left(\beta; 0, \delta^{-1}(X^T X)^{-1} \otimes K\right)$, δ ≥ 0, along with S^(1) = I_T for ease of description, leads to simplified expressions for the posterior mean and precision

$$\hat{\beta} = (\delta + 1)^{-1} (X^T X)^{-1} X^T Y$$
$$\Pi = K^{-1} \otimes (\delta + 1) X^T X \quad (5)$$

and marginal likelihood

$$p(Y|\alpha) = (2\pi)^{-TN/2} \left(\frac{\delta}{\delta + 1}\right)^{N/2} |K|^{-T/2} \exp\left(-\tfrac{1}{2}\,\mathrm{tr}\left(Y^T S Y K^{-1}\right)\right)$$
$$S = I_T - (\delta + 1)^{-1} X (X^T X)^{-1} X^T \quad (6)$$

where α = (ρ, σ, δ). However, this still requires the inversion of an N × N matrix, K, to optimize power and smoothness, which is prohibitive for very large numbers of voxels.

Classical random field theory

A computationally tractable scheme can be achieved, however, by setting δ = 0 to give OLS estimates of the GLM parameters, i.e., a mass-univariate approach, and choosing not to optimize power and smoothness (note that p(Y|δ = 0) = 0, which means that the marginal likelihood cannot be used to optimize the remaining covariance parameters (ρ, σ)), but to approximate them given the residuals, $\hat{\epsilon} = Y - X\hat{\beta}_{\mathrm{OLS}}$. This is achieved using the variance of the residuals and of their spatial derivative, $\hat{\epsilon}'$ (Kiebel et al., 1999), where

$$\hat{\rho}^2 = \left\langle |\hat{\epsilon}|^2 \right\rangle$$
$$\hat{\sigma}^2 = \left\langle \frac{|\hat{\epsilon}|^2}{|\hat{\epsilon}'|^2} \right\rangle \quad (7)$$

These could be used to give an estimate of K^(2) in Eq. (5), and the posterior density employed to make inferences; however, this is limited, as the marginal likelihood is zero (substitute δ = 0 into Eq. (6)) and, as a consequence, different models cannot be compared.
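To make the scaling problem concrete, here is a toy evaluation of Eqs. (2) and (3) under stated assumptions (random stand-in design and data, a 4 × 4 "slice"); even at N = 16 voxels, Σ^(1) is a dense TN × TN matrix, which is why the exact GPM is infeasible at brain-volume scale.

```python
import numpy as np

def sq_exp_cov(coords, rho, sigma):
    """Squared-exponential covariance of Eq. (2):
    K(d_ij) = rho^2 * exp(-d_ij^2 / (2 sigma^2))."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return rho ** 2 * np.exp(-d ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
T, r = 32, 3
coords = np.argwhere(np.ones((4, 4))).astype(float)   # N = 16 voxel positions
N = len(coords)
X = rng.standard_normal((T, r))                        # stand-in design matrix
Y = rng.standard_normal((T, N))                        # stand-in data

# Sigma^(k) = K^(k) (x) S^(k), taking S^(1) = I_T and S^(2) = I_r for simplicity
S1 = np.kron(sq_exp_cov(coords, 1.0, 2.0), np.eye(T))  # TN x TN, dense
S2 = np.kron(sq_exp_cov(coords, 1.0, 3.0), np.eye(r))  # rN x rN, dense
Z = np.kron(np.eye(N), X)

# Eq. (3): posterior precision and mean of vec(beta)
Pi = np.linalg.inv(S2) + Z.T @ np.linalg.solve(S1, Z)
beta_hat = np.linalg.solve(Pi, Z.T @ np.linalg.solve(S1, Y.T.ravel()))
```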

If, however, a classical perspective is taken, then the posterior density is not the object of interest. Instead, inference proceeds by testing the null hypothesis that there is no effect, i.e., H0: β⁎,j = 0. This leads to a multiple testing problem, as there are as many tests as there are voxels. This is addressed by controlling the family-wise error, i.e., the probability of falsely declaring one voxel active within a volume comprised of many. This is where the issue of spatial dependence reappears, and it is dealt with by using the approximate smoothness of the residuals in the RF-correction to assess the significance of statistic images. From this perspective, we see that there are a number of steps necessary to arrive at the RF-correction from a GPM. These are essential given the amount of data; however, this perspective motivates considering other routes to tractability. In particular, those that preserve the Bayesian framework, generalize the application of random fields to all spatial parameters, i.e., retain K^(1) ≠ K^(2), and optimize smoothness. Variational Bayes provides a principled and transparent way to formulate the assumptions necessary for tractability. Here we consider using GMRF priors, where the covariance function of Eq. (2) is replaced by the inverse precision matrix, K^(2) → Q^{-1}. Note that a matrix is used instead of a function, where matrix elements encode pairwise dependence between a fixed number of locations, i.e., a discrete random field formulation is used instead of a continuous one.
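For illustration, the following sketch (our own construction, not the paper's code) builds the unweighted graph-Laplacian L = D − W used below as the fixed precision matrix, for a 3D lattice with 6-nearest-neighbour connectivity; because it is sparse, it scales to large N.

```python
import numpy as np
import scipy.sparse as sp

def chain_adjacency(n):
    """Adjacency matrix of a 1D chain of n nodes."""
    return sp.diags([np.ones(n - 1), np.ones(n - 1)], [-1, 1], format="csr")

def lattice_laplacian(nx, ny, nz):
    """Unweighted graph-Laplacian L = D - W of a 3D lattice
    (6-nearest-neighbour connectivity, all edge weights unity)."""
    Ix, Iy, Iz = (sp.identity(n, format="csr") for n in (nx, ny, nz))
    W = (sp.kron(sp.kron(chain_adjacency(nx), Iy), Iz)
         + sp.kron(sp.kron(Ix, chain_adjacency(ny)), Iz)
         + sp.kron(sp.kron(Ix, Iy), chain_adjacency(nz)))
    D = sp.diags(np.asarray(W.sum(axis=1)).ravel())
    return (D - W).tocsr()

L = lattice_laplacian(38, 38, 38)   # ~55,000 voxels, still cheap to store
print(L.shape, L.nnz)               # at most 7 non-zeros per row
```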

Gaussian Markov random field priors and the mean field approximation

We now consider the second aside, on the MFA described by Penny et al. (2005b). We describe this making explicit the dependence on iteration, i.e., current or previous, which will be indicated using square brackets, i.e., [t] and [t − 1]. This is different to the original formulation; however, it is useful later when deriving the SVP prior. We make the distinction between a "joint" and an SVP prior because it is the non-zero off-diagonal elements of a joint pdf over voxels that are the source of the computational issue. This is problematic, as it is also this coupling that makes a prior "spatial", i.e., neighbouring voxels are coupled. The GMRF priors used by Penny et al. (2005b, 2007) are examples of a joint prior. The MFA factorizes this spatial coupling over voxels, but only solves the problem when computing the posterior pdf over GLM coefficients, not the posterior (or likelihood) of the model. The latter is achieved using the procedure leading to the SVP prior described in the Theory section. Given a random vector $x \in \mathbb{R}^{N \times 1}$, a GMRF prior can be written as

$$p(x) \propto \exp\left(-\frac{\alpha}{2} \sum_{i,j} V_{i,j} \left(x_i - x_j\right)^2\right) \quad (8)$$

This prior penalizes differences between neighbouring values, which are weighted by the elements of V, a spatial interaction matrix (Congdon, 2001), and scaled globally by the scalar precision parameter α. In other words, smooth samples are more probable. Eq. (8) can be written as

$$p(x) \propto \exp\left(-\frac{\alpha}{2}\, x^T Q x\right) \quad (9)$$

where Q = U − V is the spatial precision matrix and U is a diagonal matrix whose j-th element is $U_{j,j} = \sum_{j'=1}^{N} V_{j,j'}$. This is a joint multivariate Gaussian density, referred to as an improper or intrinsic prior, because Q is positive semi-definite (this is easily seen, as the row sum, i.e., Q times a column of ones, is the zero vector; in other words, the all-ones vector has an eigenvalue of zero). An example of such a matrix is the unweighted graph-Laplacian (UGL, where all edge weights on the graph are unity), L = D − W, where D is diagonal, containing the degrees of the node set, and W contains non-zero off-diagonals encoding the connectivity of the graph. The spatial precision matrix can be constructed as a function of the Laplacian matrix, Q = f(L), e.g., f(L) = L, f(L) = L² or f(L) = D^{-1/2} L D^{-1/2}, where the second and third are referred to as a bi-Laplacian matrix and a normalized graph-Laplacian (Chung, 1997). The latter has been used by Woolrich et al. (2004a) and Penny et al. (2007). Note here that the matrix is fixed, i.e., its smoothness is not parameterized. The density is called proper if Q is positive definite, e.g., f(L) = U − τV or f(L) = exp(Lτ), where the latter is the matrix exponential giving the inverse of a diffusion kernel (Harrison et al., 2008a). In these cases, the spatial precision matrix is no longer fixed but parameterized. Here we use an intrinsic GMRF prior with a fixed precision matrix, i.e., Q = L.

Using a joint GMRF prior along with S^(1) = I_T, K^(1) = diag(α^(1))^{-1}, S^(2) = diag(α^(2))^{-1} in Eq. (1) leads to the likelihood and prior densities

$$Y = X\beta + e^{(1)}$$
$$\beta = e^{(2)}$$
$$p(Y|\beta) = \prod_{j=1}^{N} N_T\left(Y_{*,j}; X\beta_{*,j}, \alpha_j^{(1)-1} I_T\right)$$
$$p(\beta) = \prod_{i=1}^{r} N_N\left(\beta_{i,*}; 0, \alpha_i^{(2)-1} Q^{-1}\right) \quad (10)$$

The likelihood factors over voxels; however, the prior, and therefore the posterior, does not, which leads to the same scenario as for the GPM, in that it involves storing and inverting matrices of size rN squared. However, the posterior over the j-th voxel at the t-th iteration can be computed by integrating out uncertainty in all neighbouring voxels, given the posterior density at the previous iteration (t − 1). This is an example of the MFA used in variational schemes (Penny et al., 2005b). This is best appreciated by using the general VB update rule (see Appendix II) to derive the posterior over the GLM coefficients at the j-th voxel, outlined below; e.g., the GLM coefficients at the j-th voxel and t-th iteration are denoted by β⁎,j[t]. This will be of use later. The log of the posterior is

$$\log q\left(\beta_{*,j}[t]\right) = \left\langle \log p\left(Y|\beta, \alpha^{(1)}\right) p\left(\beta|\alpha^{(2)}\right) \right\rangle_{q(\beta_{*,\backslash j}[t-1])\, q(\alpha^{(1)})\, q(\alpha^{(2)})} + c$$
$$= \left\langle \log p\left(Y_{*,j}|\beta_{*,j}[t], \alpha_j^{(1)}\right) \right\rangle_{\alpha^{(1)}} + \left\langle \log p\left(\beta|\alpha^{(2)}\right) \right\rangle_{\beta_{*,\backslash j}[t-1],\, \alpha^{(2)}} + c \quad (11)$$

where the subscript \j indicates all indices except j and c is a constant. Terms that include β⁎,j[t] are

$$\left\langle \log p\left(Y_{*,j}|\beta_{*,j}[t], \alpha_j^{(1)}\right) \right\rangle_{\alpha^{(1)}} = -\frac{1}{2}\left[\hat{\alpha}_j^{(1)} \beta_{*,j}[t]^T X^T X \beta_{*,j}[t] - 2\beta_{*,j}[t]^T \hat{\alpha}_j^{(1)} X^T Y_{*,j}\right] + c \quad (12)$$

$$\left\langle \log p\left(\beta|\alpha^{(2)}\right) \right\rangle_{\beta_{*,\backslash j}[t-1],\, \alpha^{(2)}} = -\frac{1}{2}\left[\beta_{*,j}[t]^T U_{j,j}\, \mathrm{diag}\left(\hat{\alpha}^{(2)}\right) \beta_{*,j}[t] - 2\beta_{*,j}[t]^T \mathrm{diag}\left(\hat{\alpha}^{(2)}\right) \hat{\beta}[t-1] V_{j,*}^T\right] + c$$

$$\log q\left(\beta_{*,j}[t]\right) = -\frac{1}{2}\left[\beta_{*,j}[t]^T \left(\hat{\alpha}_j^{(1)} X^T X + U_{j,j}\, \mathrm{diag}\left(\hat{\alpha}^{(2)}\right)\right) \beta_{*,j}[t] - 2\beta_{*,j}[t]^T \left(\hat{\alpha}_j^{(1)} X^T Y_{*,j} + \mathrm{diag}\left(\hat{\alpha}^{(2)}\right) \hat{\beta}[t-1] V_{j,*}^T\right)\right] + c$$

which leads to a Gaussian posterior pdf with mean and covariance at the t-th iteration

$$\hat{\beta}_{*,j}[t] = \Sigma_{\beta_{*,j}} \left(\hat{\alpha}_j^{(1)} X^T Y_{*,j} + R_{*,j}\right)$$
$$\Sigma_{\beta_{*,j}} = \left(\hat{\alpha}_j^{(1)} X^T X + U_{j,j}\, \mathrm{diag}\left(\hat{\alpha}^{(2)}\right)\right)^{-1}$$
$$R_{*,j} = \mathrm{diag}\left(\hat{\alpha}^{(2)}\right) \hat{\beta}[t-1] V_{j,*}^T \quad (13)$$

That is, the current posterior pdf depends on the expectation, $\hat{\beta}[t-1]$, from the previous iteration. See Fig. 1A for a summary of the VB updates (and Penny et al., 2005b), which is given for comparison with those derived using SVPs later. An equivalent expression for the last line of Eq. (13) is $R_{i,j} = U_j \hat{\alpha}_i^{(2)} M_{i,j}$, where $M_{i,j} = \hat{\beta}_{i,*}[t-1] V_{j,*}^T U_j^{-1}$ (see Eq. (17)), which we use in the derivation of SVPs. The MFA provides a computationally efficient way to optimize posterior densities over effect size, as the problem of inverting an rN × rN matrix has been reduced to N problems, each involving an r × r matrix. Despite this advance, a further issue awaits when evaluating the lower bound on the log model evidence. This is due to a term involving the log determinant of the spatial precision matrix.
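Before turning to the bound, here is a minimal sketch of the voxel-wise MFA update of Eq. (13); variable names are ours, and the hyperparameter updates of Fig. 1A are omitted.

```python
import numpy as np

def mfa_update_voxel(j, X, Y, beta_prev, alpha1, alpha2, U, V):
    """One MFA update of the GLM coefficients at voxel j, Eq. (13).
    X: T x r design; Y: T x N data; beta_prev: r x N posterior means from
    iteration t-1; alpha1: (N,) noise precisions; alpha2: (r,) spatial
    precisions; U: (N,) diagonal of Q; V: sparse N x N interaction matrix."""
    v_j = np.asarray(V.getrow(j).todense()).ravel()    # V_{j,*}
    R_j = np.diag(alpha2) @ (beta_prev @ v_j)          # diag(a2) b[t-1] V_{j,*}^T
    Sigma_j = np.linalg.inv(alpha1[j] * X.T @ X + U[j] * np.diag(alpha2))
    beta_j = Sigma_j @ (alpha1[j] * X.T @ Y[:, j] + R_j)
    return beta_j, Sigma_j
```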

Fig. 1. VB updates using the joint prior and MFA (A) and spatially-informed voxel-wise priors (SVPs) (B). (A) Update equations for the GLM coefficients β and precisions α of the model in Eq. (10), where $\hat{\epsilon} = Y - X\hat{\beta}$ and $R_{i,j} = \hat{\alpha}_i^{(2)} \hat{\beta}_{i,*}[t-1] V_{j,*}^T = U_j \hat{\alpha}_i^{(2)} M_{i,j}$ (see main text). (B) Update equations for the GLM coefficients β^(1), auxiliary variables $\beta_{i,*}^{(2,j)}$ and precisions α of the model in Eq. (23), where $\hat{\epsilon}^{(1)} = Y - X^{(1)}\hat{\beta}^{(1)}$, $\hat{\epsilon}^{(2)} = \hat{\beta}^{(1)} - \hat{f}$ and $f_{i,j} = f(\beta_{i,*}^{(2,j)})$ (see main text).

To see this, we consider the lower bound (see Appendix II), where all hidden variables are represented by Θ = (β, α):

$$F(q) = \int_\Theta q(\Theta) \log\left(\frac{p(Y, \Theta)}{q(\Theta)}\right) = \left\langle \log\frac{p(Y, \beta|\alpha)}{q(\beta)} \right\rangle_{\beta,\alpha} - \left\langle \log\frac{q(\alpha)}{p(\alpha)} \right\rangle_{\alpha} \quad (14)$$

The first term of the last line decomposes into

$$\left\langle \log\frac{p(Y, \beta|\alpha)}{q(\beta)} \right\rangle_{\beta,\alpha} = \left\langle \log p\left(Y|\beta, \alpha^{(1)}\right) \right\rangle_{\beta,\alpha^{(1)}} + \left\langle \log p\left(\beta|\alpha^{(2)}\right) \right\rangle_{\beta,\alpha^{(2)}} - \left\langle \log q(\beta) \right\rangle_{\beta} \quad (15)$$

The second term of this expression is the source of the bottleneck, due to the log determinant of the spatial precision matrix

$$\left\langle \log p\left(\beta|\alpha^{(2)}\right) \right\rangle_{\beta,\alpha^{(2)}} = \frac{1}{2}\left[r \log|Q| - \sum_{i=1}^{r} \hat{\alpha}_i^{(2)} \left(\hat{\beta}_{i,*} Q \hat{\beta}_{i,*}^T + \mathrm{tr}\left(Q\, \Sigma_{\beta_{i,*}}\right)\right) + N \sum_{i=1}^{r} \left(\psi\left(a_i^{(2)}\right) + \log b_i^{(2)}\right) - rN \log 2\pi\right] \quad (16)$$

where we have used standard results for the integrals, which are given in the latter part of Appendix II. This stems from the fact that the joint GMRF prior cannot, in its current form, be factorized over voxels, i.e., $p(\beta) \neq \prod_{j=1}^{N} p(\beta_{*,j})$. This motivates using a conditional autoregressive prior to achieve such a factored form.

Theory

From joint to spatially-informed voxel-wise priors

A way around the computational issues described above is to factorize the joint spatial prior over voxels. We describe a procedure to achieve this next. We consider only a GLM in this section; however, equations for the full GLM-AR model are given in the Supplementary material. An approximation that circumvents the bottleneck described above is to use a conditional prior that does factor over voxels to compute a marginal pdf at each voxel. Given an improper joint GMRF prior, a standard result (Congdon, 2001) is that the intrinsic conditional autoregressive density at the j-th voxel is given by (see Appendix III for more details)

$$p\left(\beta_{i,j}|\beta_{i,\backslash j}\right) = N\left(\beta_{i,j}; M_{i,j}, C_{i,j}\right)$$
$$M_{i,j} = \frac{\sum_{j'=1}^{N} V_{j,j'} \beta_{i,j'}}{\sum_{j'=1}^{N} V_{j,j'}} = \beta_{i,*} V_{j,*}^T U_j^{-1}$$
$$C_{i,j} = \left(\alpha_i^{(2)} \sum_{j'=1}^{N} V_{j,j'}\right)^{-1} = \left(\alpha_i^{(2)} U_j\right)^{-1} \quad (17)$$

This can be used as a transition density to update a marginal prior density at the j-th voxel. Using the subscript J, which contains the indices of all neighbours of the j-th voxel, i.e., its Markov blanket, and making the iteration explicit in square brackets, i.e., $p(\beta_{i,j}|\beta_{i,\backslash j}) \to p(\beta_{i,j}[t]|\beta_{i,J}[t-1])$, this is achieved using the following integral

$$p\left(\beta_{i,j}[t]\right) = \int_{\beta_{i,J}[t-1]} p\left(\beta_{i,j}[t]|\beta_{i,J}[t-1]\right) p\left(\beta_{i,J}[t-1]\right) \quad (18)$$

In this paper, we choose the marginal density at the previous iteration to be the posterior pdf, i.e.,

$$p\left(\beta_{i,J}[t-1]\right) = q\left(\beta_{i,J}[t-1]\right) = N_{n_j}\left(\beta_{i,J}[t-1]; \hat{\beta}_{i,J}[t-1], \Sigma_{\beta_{i,J}}[t-1]\right) \quad (19)$$

where $\Sigma_{\beta_{i,J}}[t-1]$ is a diagonal matrix containing the posterior variances over neighbouring voxels for the i-th regressor and $n_j = |J|$ is the number of neighbours of the j-th voxel. Given Gaussian densities, the computation in Eq. (18) is achieved in closed form using Bayes rule (see the latter part of Appendix I) to give

$$p\left(\beta_{i,j}[t]\right) = N\left(\beta_{i,j}[t]; M_{i,j}, S_{i,j}\right)$$
$$M_{i,j} = \hat{\beta}_{i,J}[t-1]\, X_j^{(2)}$$
$$S_{i,j} = X_j^{(2)T} \Sigma_{\beta_{i,J}}[t-1]\, X_j^{(2)} + \left(U_j \alpha_i^{(2)}\right)^{-1} \quad (20)$$

where $X_j^{(2)} = V_{j,J}^T U_j^{-1}$. We refer to this as a spatially-informed voxel-wise prior (SVP). This leads to the observation model

$$Y_{*,j} = X \beta_{*,j} + e_{*,j}^{(1)}$$
$$\beta_{i,j} = M_{i,j} + \tilde{e}_{i,j}^{(2)}$$
$$p\left(Y|\beta, \alpha^{(1)}\right) = \prod_{j=1}^{N} N_T\left(Y_{*,j}; X\beta_{*,j}, \alpha_j^{(1)-1} I_T\right)$$
$$p\left(\beta|\alpha^{(2)}\right) = \prod_{j=1}^{N} \prod_{i=1}^{r} N\left(\beta_{i,j}; M_{i,j}, S_{i,j}\right) \quad (21)$$

where $\mathrm{cov}(\tilde{e}_{i,j}^{(2)}) = S_{i,j}$, given in Eq. (20). This is an empirical prior, because of its dependence on the previous posterior. A point estimate of the precision parameter $\alpha_i^{(2)}$ could be optimized using gradient ascent on the lower bound of the log model evidence; however, a posterior density can be estimated more conveniently by dividing the second level of Eq. (21) in two using the auxiliary variable $\beta_{i,*}^{(2,j)}$, where

$$\beta_{i,j} = f\left(\beta_{i,*}^{(2,j)}\right) + e_{i,j}^{(2)}$$
$$\mathrm{cov}\left(\beta_{i,*}^{(2,j)}\right) = \Sigma_{\beta_{i,J}}[t-1]$$
$$\mathrm{cov}\left(e_{i,j}^{(2)}\right) = \left(U_j \alpha_i^{(2)}\right)^{-1} \quad (22)$$

This leads to a three-level model, which we write using the augmented notation $X \to X^{(1)}$, $\beta \to \beta^{(1)}$, and introduce $f(\beta_{i,*}^{(2,j)}) = M_{i,j} + \beta_{i,*}^{(2,j)} X_j^{(2)}$:

$$Y_{*,j} = X^{(1)} \beta_{*,j}^{(1)} + e_{*,j}^{(1)}$$
$$\beta_{i,j}^{(1)} = f\left(\beta_{i,*}^{(2,j)}\right) + e_{i,j}^{(2)}$$
$$\beta_{i,*}^{(2,j)} = e_{i,*}^{(3,j)}$$
$$p\left(Y|\beta^{(1)}, \alpha^{(1)}\right) = \prod_{j=1}^{N} N_T\left(Y_{*,j}; X^{(1)}\beta_{*,j}^{(1)}, \alpha_j^{(1)-1} I_T\right)$$
$$p\left(\beta^{(1)}|\beta^{(2)}, \alpha^{(2)}\right) = \prod_{j=1}^{N} \prod_{i=1}^{r} N\left(\beta_{i,j}^{(1)}; f\left(\beta_{i,*}^{(2,j)}\right), \left(U_j \alpha_i^{(2)}\right)^{-1}\right)$$
$$p\left(\beta^{(2)}\right) = \prod_{j=1}^{N} \prod_{i=1}^{r} N_{n_j}\left(\beta_{i,*}^{(2,j)}; 0, \Sigma_{\beta_{i,J}}[t-1]\right) \quad (23)$$

As the prior and likelihood factor over voxels, so does the posterior pdf. Importantly, the evaluation of the lower bound now does not


depend on the determinant of the precision matrix, as the expression in Eq. (16) is replaced by

$$\left\langle \log p\left(\beta^{(1)}|\beta^{(2)}, \alpha^{(2)}\right) \right\rangle_{\beta^{(1)},\beta^{(2)},\alpha^{(2)}} = \frac{1}{2}\left[\sum_{j=1}^{N} r \log U_j - \sum_{j=1}^{N}\sum_{i=1}^{r} U_j \hat{\alpha}_i^{(2)} \left(\hat{e}_{i,j}^{(2)2} + \Sigma_{\beta_{i,j}^{(1)}} + n_{i,j}\right) + N \sum_{i=1}^{r} \left(\psi\left(a_i^{(2)}\right) + \log b_i^{(2)}\right) - rN \log 2\pi\right] \quad (24)$$

where $n_{i,j} = X_j^{(2)T} \Sigma_{\beta_{i,*}^{(2,j)}} X_j^{(2)}$. This is easily computed for volumes containing a very large number of voxels.

VB algorithm

Details of the algorithm for the GLM described in the previous section are given in Appendix V, divided into five steps: (1) the observation model, where all hidden random variables to be estimated are denoted collectively by Θ; (2) the densities used to represent the complete data, i.e., the joint probability of the data and Θ, likelihood, and priors; (3) the factorized posterior density; (4) expressions used to update the marginal posterior pdfs; and (5) a lower bound. The update equations (step 4) are summarized in Fig. 1B. These are updated sequentially until the lower bound, F(q), converges. Comparing them to the updates using a joint prior and MFA (Fig. 1A), we note that the expressions for $q(\alpha_j^{(1)})$ are the same, while those for $q(\beta_{*,j})$ and $q(\alpha_j^{(2)})$ are augmented by terms involving the posterior uncertainty in neighbouring coefficients, i.e., $\Sigma_{\beta_{i,J}^{(1)}}[t-1]$. The same five steps for the full GLM-AR model are provided in the Supplementary material.
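A corresponding sketch of the SVP construction of Eqs. (17)-(20), for a single regressor i at one voxel, is given below; again, names are ours and only the prior update is shown.

```python
import numpy as np

def svp_prior(neighbours, v_row, beta_hat, beta_var, alpha2_i):
    """Spatially-informed voxel-wise prior, Eq. (20). neighbours: indices J
    of the voxel's Markov blanket; v_row: the voxel's row of V; beta_hat,
    beta_var: posterior means and variances of the coefficients at
    iteration t-1; alpha2_i: spatial precision for regressor i."""
    U_j = v_row[neighbours].sum()           # degree, U_j = sum_j' V_{j,j'}
    x2 = v_row[neighbours] / U_j            # X_j^(2) = V_{j,J}^T U_j^-1
    M_ij = beta_hat[neighbours] @ x2        # empirical prior mean
    S_ij = x2 @ (beta_var[neighbours] * x2) + 1.0 / (U_j * alpha2_i)
    return M_ij, S_ij                       # prior variance carries
                                            # neighbour uncertainty
```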

Results

Here we compare compute times and the accuracy of estimates using a joint prior and SVPs on synthetic data. This is followed by a comparison using fMRI data from a single subject. Both the joint and SVP priors used here are derived from the same spatial precision matrix, i.e., a UGL. The only difference is the procedure used to factorize the joint prior over voxels, which is the defining feature of the SVP prior. All results were obtained using a standard laptop (CPU 2.2 GHz and 2 GB RAM) and Matlab (version 7.5, The MathWorks Inc., Natick, MA).

Synthetic fMRI data

Two sets of synthetic data were generated on a regular 3D lattice with an equal number of voxels in each dimension. These were sampled using a two-level model (see Eq. (1)) with the generative model shown in Fig. 2A. The design matrix (X) is shown on the right, which contained 3 columns over 64 time points: discrete sine and cosine sets at one frequency (one period over the 64 time points) and an offset (a column of ones). The covariance matrices used were $S^{(1)} = \alpha^{(1)-1} I_T$, K^(1) = I_N, S^(2) = I_r, and a diffusion kernel (Harrison et al., 2008a) for the spatial covariance over the GLM coefficients, i.e., K^(2) = exp(−Lτ), where L was a UGL and 'exp' denotes the matrix exponential. The values used for α^(1) and τ are given below. Details of how the data were generated are given in Appendix IV. The first data set was designed to compare compute times using a joint prior and SVPs. As such, data on lattices containing 8³ to 38³ voxels (in steps of 2 per dimension) were generated with covariance parameter values τ = 4 and α^(1) = 1. Compute times are compared in Fig. 2B and demonstrate a clear difference. Firstly, the increase in compute times is superlinear for both, although the rate of increase is less for SVPs. More striking, however, is that the joint prior model is unable to run for volumes above ~4000 voxels, due to limited memory when computing the eigenvalues of the spatial precision matrix. In contrast, the SVP model is able to estimate data with ~55,000 voxels in less than 20 minutes.

The purpose of the second data set was to compare the accuracy of estimates using a joint prior and SVPs under different regimes of GLM coefficient smoothness and magnitude of observation noise. These data were generated on a 24³ regular lattice with GLM coefficients at three different spatial scales and three levels of observation error: τ ∈ (2, 3, 4) and α^(1) ∈ (10, 1, 0.1). Examples of the generated GLM coefficients over space (upper row) and synthetic data from one voxel over time (lower row) are shown in Fig. 2C. This produced a total of 9 regimes, each containing a different combination of smoothness and observation error. A data set was sampled from each combination and fit using both priors (initially we used 8 volumes per regime; however, differences in MSE were extremely slight because the number of voxels in each volume was so large, so we report only 1 volume). The volume was divided into independent slices for the joint prior, and the full volume was modelled using SVPs. The accuracy of estimates was assessed using the mean squared error (MSE) in the GLM coefficients, i.e., $\mathrm{MSE} = \frac{1}{N}\sum_{j=1}^{N} |\hat{\beta}_j - \beta_j^{\mathrm{true}}|^2$. These are shown in Fig. 2D. The main feature of note is the increased relative accuracy using SVPs over a joint prior with increasing smoothness and noise. Interestingly, the MSE is lower for the joint prior when the smoothness and observation error are low (see Table 1). Receiver operating characteristic (ROC) curves are shown in Fig. 2E, along with those for OLS estimates. These are plots of sensitivity (or true-positive rate) versus one minus specificity (false-positive rate) as a parameter is varied. They show how specificity is preserved as sensitivity increases; the closer the curve is to the top left of the figure, the better. Both priors are better than OLS estimates; however, SVPs show increased relative improvement with greater smoothness and noise. GLM coefficient estimates from both priors, compared to known values through slices along the xy and yz planes, are shown in Fig. 2F. While estimates for the joint prior are isotropic in the xy plane, this is not so in the yz plane. This is due to independence along the z-axis producing a "stacked" appearance of the estimates, which is not the case for the SVP model. This is also seen in Fig. 2G, showing plots along the z-axis at a voxel within the xy plane. The general pattern is one of SVPs producing smoother estimates, which outperform the joint prior given increased smoothness of the GLM coefficients and observation error.

fMRI data set

Here we show an analysis of the same data set reported elsewhere (Flandin and Penny, 2007; Penny et al., 2005b); as such, we give only a brief description. These data are from a single subject who participated in a 2 × 2 factorial event-related fMRI experiment studying a repetition priming effect for famous and non-famous faces (Henson et al., 2002). Functional images were acquired using a continuous echo-planar imaging (EPI) sequence on a 2T VISION system (Siemens, Erlangen, Germany) with TE = 40 ms and TR = 2 s, producing 351 T2⁎-weighted full-brain scans. Each volume is composed of 24 transverse slices of dimension 64 × 64, with a voxel size of 3 × 3 × 3 mm³ and a 1.5 mm gap between slices. During the same experiment, a T1-weighted volume was also acquired for anatomical information about the subject. We performed the standard preprocessing steps using SPM8 (SPM, 2009), following the tutorial in the SPM manual.
Functional images were realigned (Friston et al., 1995a) and slice-time corrected; the mean functional image and the structural image were then coregistered using a mutual information criterion (Friston et al., 1995a); a nonlinear transformation was then estimated to register the anatomical image with a standard T1-weighted template image in MNI space (Ashburner and Friston, 2005), and the estimated deformation field was applied to the 351 realigned, slice-time-corrected functional scans.

Fig. 2. Synthetic data. (A) Graphical representation of the generative model used to sample data, and the design matrix (covariates over time). (B) Comparison of compute times for the joint prior with MFA and SVPs. (C) Slices through sampled GLM coefficients over space (upper row) and data from one voxel over time (lower row; the continuous line shows noiseless data) at three different scales of smoothness and observation error, respectively. (D) Accuracy of GLM coefficient estimates using mean squared error (relative to known GLM coefficient values). (E) Receiver operating characteristic curves using the joint prior, SVPs, and OLS estimates (see upper right for axis labels). (F) Slices through the xy and yz planes of the known values of contrast images (covariance parameters τ = 3 and α^(1) = 1), $c^T\beta$, where c = [1 1 0]/2, and estimates. (G) Same data set as in (F), showing plots of known contrast values and estimates along the z-dimension at a fixed point in the xy plane (see upper right for axis labels).

These images were not smoothed, as this step was replaced by a spatial prior. Data were scaled such that regression coefficient values corresponded to "percentage of global mean signal".
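The scaling step amounts to a one-line transformation; a minimal sketch, assuming the common convention of grand-mean scaling to 100:

```python
import numpy as np

def scale_to_percent_global_mean(Y):
    """Scale a T x N data matrix so that GLM coefficients read as a
    percentage of the global mean signal."""
    return 100.0 * Y / Y.mean()
```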

The experimental paradigm was then modelled using a GLM with the design matrix shown in Fig. 3A. This contained five regressors: one for each of the four event types (first or second presentation of images of famous and non-famous faces) and an offset. Each event regressor corresponds to stimulus onsets convolved with a canonical haemodynamic response function.
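As an illustration of this step, the sketch below builds such a design matrix from hypothetical onset times; the onsets, HRF parameterization, and names are assumptions, not the experiment's actual values.

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr, duration=32.0):
    """Double-gamma approximation to the canonical HRF (peak ~5 s,
    undershoot ~15 s), sampled every TR seconds."""
    t = np.arange(0.0, duration, tr)
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0

def event_regressor(onsets_s, n_scans, tr):
    """Stick function at the event onsets convolved with the HRF."""
    u = np.zeros(n_scans)
    u[(np.asarray(onsets_s) / tr).astype(int)] = 1.0
    return np.convolve(u, canonical_hrf(tr))[:n_scans]

# hypothetical onsets (seconds) for the four event types; TR = 2 s, 351 scans
onsets_by_type = [[10, 80, 300], [24, 150], [46, 210, 500], [60, 400]]
X = np.column_stack([event_regressor(o, 351, 2.0) for o in onsets_by_type]
                    + [np.ones(351)])       # offset column
print(X.shape)                              # (351, 5)
```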

Table 1. Mean squared error of estimates relative to known GLM coefficients. Each cell contains two numbers: the first is for the joint prior model and the second for SVPs. All numbers are scaled by 10⁻³.

Smoothness \ Noise level | Low       | Medium     | High
Low                      | 3.0 / 6.5 | 12.2 / 8.2 | 82.1 / 20.1
Medium                   | 2.1 / 2.3 | 9.5 / 3.7  | 77.4 / 15.4
High                     | 1.6 / 1.1 | 8.8 / 2.2  | 76.3 / 15.0

Serial correlations were modelled using an autoregressive model of order 2, AR(2). The joint prior and SVPs were used over the GLM and AR coefficients. The spatial precision matrix, Q, used was a UGL. A graphical representation of the generative model is shown in Fig. 3B. See the Supplementary material for full details of the model and update equations. Compute times using a joint prior (applied to each slice independently) and SVPs (applied to a full volume) were 25 and 30 minutes, respectively, where the volume contained ∼55,000 voxels. Contrast images of a transverse and a coronal slice from the joint prior and SVPs are shown in Fig. 3C, along with plots of estimates along the z-axis through a voxel in the xy plane. The stacked nature of estimates using the joint prior is easily seen (similar features are seen in the synthetic data, Figs. 2F and G). This is not the case when using SVPs, whose estimates are smooth in all three dimensions. Tissue probability maps (TPMs; cerebrospinal fluid (CSF), white and grey matter) and the estimated AR coefficients of the AR(2) model are shown in Fig. 3D. The TPMs, shown on the top row, were computed using the segmentation routines in SPM (Ashburner and Friston, 2005) and smoothed with an 8 mm Gaussian kernel. The spatial variability in the estimated AR coefficient images is complex; however, there are some features of note. The first AR coefficient image is shown below on the left. This shares features with the CSF probability map, in that large coefficients tend to occur where the probability of a voxel containing CSF is high. The second AR coefficient image (middle, lower row) is less clear. Hypointense regions may be associated with an increased probability of white matter. This is more pronounced in the next image (lower row, last column), which is a function of the two AR coefficients. To understand the significance of this image, we first have to consider the autocorrelation function (ACF) of an AR process, which we describe next. An AR process is a random process (Chatfield, 1996) that can be represented by its AR coefficients or, alternatively, by an autocorrelation function (ACF). The process is stationary if the probability distribution from which samples are generated does not change with time. The stationary zone of an AR(2) process is shown in Fig. 3E; it is the triangular region marked in bold. In other words, all AR coefficients within this triangle could be used to generate samples, i.e., time series, of stationary AR processes. This space is further divided into two regimes, depending on whether the shape of the ACF is exponential or periodic. All estimated AR coefficients from the volume of fMRI data are shown overlaid on the stationary zone. The average ACF from each regime is shown in the lower portion of Fig. 3E, with periodic [exponential] decay on the left [right]. Note that as these are close to the border between the two regimes, the periodicity of the ACF on the left is difficult to appreciate. In both cases the ACF decays rapidly and is near zero by 3 time (scan) lags. The period of the average oscillatory process was approximately 4 scans, or ~1/8 Hz, as the scan interval was 2 seconds (1/2 Hz). The criterion for distinguishing between these two regimes is a function of the AR coefficients. Using a₁ and a₂ to denote the AR coefficients of an AR(2) process at one voxel, the shape of the ACF depends on the value of $a_1^2 + 4a_2$: exponentially and periodically damped ACFs have values above and below zero, respectively (Chatfield, 1996, p. 39). This function is shown over space in Fig. 3D (lower row, last column). Darker regions, i.e., negative values (periodic ACFs), are associated with higher probabilities of white matter, though the spatial variability in this map is far from being wholly explained by tissue type. Fig. 3F shows the use of log evidence maps to compare the full design matrix with a reduced design containing only the offset. This is the Bayesian equivalent of an F map (Rosa et al., 2010). Log evidence maps were smoothed using a 6 mm Gaussian kernel and show an increased response for SVPs compared to the joint prior.
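The regime criterion is simple to apply; a sketch using our own helper function:

```python
import numpy as np

def ar2_regime(a1, a2):
    """Classify an AR(2) process x_t = a1*x_{t-1} + a2*x_{t-2} + e_t.
    Stationarity triangle: a1 + a2 < 1, a2 - a1 < 1, a2 > -1. The ACF is
    periodically damped when the characteristic roots are complex, i.e.
    when a1**2 + 4*a2 < 0, and exponentially damped otherwise."""
    stationary = (a1 + a2 < 1) and (a2 - a1 < 1) and (a2 > -1)
    shape = "periodic" if a1 ** 2 + 4 * a2 < 0 else "exponential"
    return stationary, shape

print(ar2_regime(0.3, 0.1))    # (True, 'exponential')
print(ar2_regime(0.3, -0.2))   # (True, 'periodic')
```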

Discussion

The naïve application of continuous random field priors, or Gaussian process priors, to fMRI data does not lead to a computationally tractable algorithm, due to the very large number of voxels in a brain volume. Approximations are therefore necessary. We have shown here how the classical approach to inference using the RF-correction can be motivated starting from a Gaussian process model. This places it in a broader context and motivates finding alternative routes to tractability that retain the Bayesian formulation. Gauss–Markov random field priors within a variational scheme provide a transparent way to formulate the assumptions necessary for tractability; however, the use of a joint spatial prior still leads to a computational bottleneck when computing log evidence maps, which are needed to compare models. One may argue that, in practice, the log determinant could be omitted from the lower bound; however, this limits comparison to models with the same spatial precision matrix. This is undesirable, as there are many different and biologically relevant means of coupling neuronal responses over space, and their comparison is important if a Bayesian framework is to generalize the functionality currently available in the classical framework. Given this, we have described a modification of the variational Bayes algorithm of Penny et al. that overcomes the computational issue using a procedure to factorize the joint prior over voxels. We refer to the result as a spatially-informed voxel-wise prior. The updated prior is computed at the j-th voxel by integrating with respect to the posterior densities of neighbouring coefficients at the previous iteration, similar to the mean field approximation (MFA). Given the composite nature of the resulting variance term of this prior, it was convenient to split the second level of the hierarchical model in two, to facilitate the derivation of a density estimate over the precision parameters, α^(2). Comparing the update equations for the MFA and SVPs (see Fig. 1), we see that the expressions for the precision parameters of observation noise, α^(1), are the same. However, the updates for the GLM coefficients and second-level precision parameters are augmented by terms involving uncertainty over the GLM coefficients at the previous iteration. This is in contrast to the MFA, which uses only the mean. We demonstrated that compute times for SVPs increase more slowly with the number of voxels than for a joint prior. More importantly, SVPs can be used on volumes containing more than ∼4000 voxels, which is a limit imposed by computing the eigenvalues of the spatial precision matrix in Matlab. SVPs are able to model very large data sets by replacing this computation with storing interim estimates of GLM and AR coefficients and cross-covariance terms (Penny et al., 2005a). This requires time to save and reload; however, these operations are possible no matter how large the data set, whereas computing eigenvalues limits the model to a few thousand voxels. In practice, this is dealt with for the joint prior by forgoing smoothness along the z-axis. A consequence is noisy estimates along this dimension. This is not so for SVPs. They produce smoother results, which leads to improved performance given smooth underlying fields and noisy data. Here we used GMRF priors based on an unweighted graph-Laplacian; however, a nonstationary spatial prior based on a weighted graph could be used to preserve the magnitude of responses in regions of activation.

Fig. 3. fMRI data set. (A) Design matrix of the GLM. (B) Graphical representation of the spatial GLM-AR model. (C) Transverse and coronal slices through the anatomical image (left) and contrast images, $c^T\hat{\beta}$ where c = [1 1 1 1 0]/4, for the joint prior (middle) and SVPs, along with a plot along the z-axis (see white arrows) to demonstrate the effect of independence over slices for the joint prior model. (D) Tissue probability maps (CSF, white and grey matter, from left to right, top row) and AR coefficients (first, second, and the function of both used to discriminate between exponentially and periodically damped autocorrelation functions (see main text), from left to right). Note that the grayscale is the same for each image on the top row, but different on the lower row. (E) Stationary zone of an AR(2) process, partitioned into two regimes corresponding to exponential and oscillatory autocorrelation functions (ACFs). Estimates of AR coefficients from the whole volume are overlaid. The mean AR coefficients in each regime are indicated by an 'x' and an 'o', with the associated ACFs shown below. (F) Bayesian model comparison maps for the joint prior (left) and SVPs (thresholds of 0.5 and 0.99 used for the upper and lower rows).



This is, however, a moot point, as the additional smoothness of single-subject estimates may well be beneficial for group studies, to accommodate inter-subject variability in functional anatomy. The compute time for the SVP model of the fMRI data set presented here was approximately 30 minutes on a standard laptop computer. This was longer than for the synthetic data (∼20 minutes), due to a larger design matrix (5 instead of 3 columns) and a spatial AR(2) noise model instead of independent, non-identical noise. The compute time was also longer than for the joint prior model (25 minutes); however, the latter required dividing the volume into slices, which is not a true 3D model and resulted in the 'stacked' appearance of the GLM coefficient estimates. As such, SVPs provide a truly 3D spatiotemporal model over a full-brain volume that avoids dividing it into slices. It has long been known that AR processes depend on position in the brain (Bullmore et al., 1996; Penny et al., 2003; Woolrich et al., 2001; Worsley et al., 2002); in particular, that the variability in autocorrelation depends on tissue type, e.g., larger AR(1) coefficients


in CSF (Penny et al., 2003). However, a further study using Bayesian model comparison (Penny et al., 2007) showed that tissue type is not sufficient to explain this spatial variability. An AR model of order 1 was used in that study and was extended here to order 2. Images of the first AR coefficient here concur with those reported by Penny et al. (2003), in that large values tended to occur in regions containing CSF. However, because we used an AR(2) model, we can visualize the spatial variability in the two regimes of the autocorrelation function, i.e., exponential or oscillatory. The hint of a tissue-specific pattern in the shape of the ACF is intriguing and requires further investigation. A source of periodicity is physiological noise, e.g., effects due to respiration (∼1/3 Hz) and heart rate (∼1 Hz). However, given the low sampling rate (1/2 Hz), these sources could not be distinguished. We refer to SVPs as empirical priors. This is because the prior at the current iteration depends on the posterior of neighbouring coefficients at the previous iteration. In other words, it is not a true generative model. However, the current formulation could be used to produce generative models by using the prior at the previous iteration instead of the posterior pdf. An example of this is to derive a diffusion-based spatial prior (Harrison et al., 2007) using the transition density $p(x(t)|x(t - dt)) = N\left(x(t); \sqrt{K(dt)}\, x(t - dt), 0\right)$, where the discrete random field $x(t - dt) \in \mathbb{R}^{N \times 1}$ is propagated over time dt using $\sqrt{K(dt)}$, with $K(dt) = \exp(-\tfrac{1}{2} L\, dt)$ (the matrix exponential). Given the prior at the previous time point, p(x(t − dt)) = N(x(t − dt); 0, K(t − dt)), the updated prior is p(x(t)) = N(x(t); 0, K(t)), which is the same as that used in Harrison et al. (2007). There is no computational saving here, as the updated prior is a joint density over space; however, such a saving can be achieved by integrating over a factorized density, e.g., the posterior pdf. Presenting training samples sequentially and propagating a pdf using a transition density is characteristic of "on-line" algorithms such as the Kalman filter; however, here algorithmic time is used, not physical time. These ideas have also been used to approximate Gaussian process models (Csato and Opper, 2002) and applied to very large data sets (Cornford et al., 2005). We have used a discrete formulation over algorithmic time here; however, this can be generalized to continuous time using a transition density based on the diffusion kernel, similar to the above. The computational saving here, compared to using the diffusion kernel as a spatial covariance matrix, is that it involves a matrix–vector product, which can be computed much more efficiently, e.g., using Expokit (Sidje, 1998), than the matrix exponential (Moler and Van Loan, 2003). Generalizing further to a continuous formulation over space, transition densities used in stochastic partial differential equations, e.g., the Fokker–Planck equation (Risken, 1996), could also be used (Lindgren and Rue, 2007).

Acknowledgments

The Medical Research Council funded this work.

Appendix I

Using Gaussian densities in Bayes rule leads to closed-form expressions for the posterior and marginal likelihood

$$p(y|x)\, p(x) = p(x|y)\, p(y)$$
$$N(y; Ax, S_y)\, N(x; m_x, S_x) = N\left(x; m_{x|y}, S_{x|y}\right) N(y; A m_x, S)$$
$$S = A S_x A^T + S_y$$
$$m_{x|y} = S_{x|y}\left(A^T S_y^{-1} y + S_x^{-1} m_x\right)$$
$$S_{x|y} = \left(S_x^{-1} + A^T S_y^{-1} A\right)^{-1} \quad (\mathrm{I.1})$$


where N(y; Ax, S_y), N(x; m_x, S_x), N(x; m_{x|y}, S_{x|y}), and N(y; Am_x, S) are the likelihood, prior, posterior, and marginal likelihood, respectively. Alternative expressions commonly seen use $C = S_{x|y} A^T S_y^{-1}$, which can be written as $C = S_x A^T S^{-1}$; applying the matrix inversion lemma gives $S_{x|y} = (I - CA) S_x$. This leads to the expression $m_{x|y} = m_x + C(y - A m_x)$, typically seen in formulations of the Kalman filter, where C is the Kalman gain matrix. If we take $N(y; Ax, S_y) = p(\beta_{i,j}[t]|\beta_{i,J}[t-1])$ and $N(x; m_x, S_x) = p(\beta_{i,J}[t-1])$, and substitute $y = \beta_{i,j}[t]$, $x = \beta_{i,J}[t-1]^T$, $A = X_j^{(2)T} = U_j^{-1} V_{j,J}$, $S_y = (U_j \alpha_i^{(2)})^{-1}$, $m_x = \hat{\beta}_{i,J}[t-1]^T$, $S_x = \Sigma_{\beta_{i,J}}[t-1]$ into Eq. (I.1), this yields a closed-form expression for the current marginal prior, $N(y; Am_x, S) = p(\beta_{i,j}[t])$:

$$p\left(\beta_{i,j}[t]\right) = N\left(\beta_{i,j}[t]; M_{i,j}, S_{i,j}\right)$$
$$M_{i,j} = \hat{\beta}_{i,J}[t-1]\, X_j^{(2)}$$
$$S_{i,j} = X_j^{(2)T} \Sigma_{\beta_{i,J}}[t-1]\, X_j^{(2)} + \left(U_j \alpha_i^{(2)}\right)^{-1} \quad (\mathrm{I.2})$$
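The substitution above is mechanical; the following sketch implements Eq. (I.1) and checks the equivalent Kalman form (names are ours):

```python
import numpy as np

def gaussian_bayes(y, A, Sy, mx, Sx):
    """Closed-form posterior and marginal for the linear-Gaussian model of
    Eq. (I.1): y = A x + e, e ~ N(0, Sy), x ~ N(mx, Sx)."""
    S = A @ Sx @ A.T + Sy                                 # marginal covariance
    Sxy = np.linalg.inv(np.linalg.inv(Sx) + A.T @ np.linalg.solve(Sy, A))
    mxy = Sxy @ (A.T @ np.linalg.solve(Sy, y) + np.linalg.solve(Sx, mx))
    C = Sx @ A.T @ np.linalg.inv(S)                       # Kalman gain
    assert np.allclose(mxy, mx + C @ (y - A @ mx))        # equivalent form
    return mxy, Sxy, S
```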

Appendix II

Given the factorized density, $q(\Theta) = \prod_n q(\theta_n)$, as an approximation of the true posterior pdf, $p(\Theta|y) = p(y,\Theta)/p(y)$, the log model evidence, $L$, its lower bound, $F$, and the KL divergence between $q(\Theta)$ and $p(\Theta|y)$ are related by

$$
\begin{aligned}
L &= F + KL\\
L &= \log p(y)\\
F &= \int_\Theta q(\Theta) \log\!\left(\frac{p(y,\Theta)}{q(\Theta)}\right) d\Theta\\
KL &= \int_\Theta q(\Theta) \log\!\left(\frac{q(\Theta)}{p(\Theta|y)}\right) d\Theta
\end{aligned}
\tag{II-1}
$$

The KL divergence between $q(\Theta)$ and $p(\Theta|y)$ is minimized by maximizing $F$ using the following update rule for each factor

$$
\begin{aligned}
q(\theta_n) &= \frac{\exp(I_n)}{\int \exp(I_n)\, d\theta_n}\\
I_n &= \int q(\theta_{\setminus n}) \log p(y, \Theta)\, d\theta_{\setminus n}
\end{aligned}
\tag{II-2}
$$

so that $\log q(\theta_n) = I_n$ up to a normalizing constant. The mathematical form of the factors is typically chosen such that the integral involved in $I_n$ has a closed form solution. Two frequently used examples are the integration of a quadratic form with respect to a multivariate normal pdf

$$
\begin{aligned}
\int q(x)\,(y - Ax)^T K^{-1} (y - Ax) &= (y - A\mu)^T K^{-1} (y - A\mu) + \mathrm{tr}\!\left(A^T K^{-1} A\, \Sigma\right)\\
q(x) &= N_r(x;\ \mu,\ \Sigma)
\end{aligned}
$$

where $x \in \mathbb{R}^{r \times 1}$, and the log of a positive scalar ($x > 0$) with respect to a Gamma pdf

$$
\begin{aligned}
\int q(x) \log x &= \psi(a) + \log b\\
q(x) &= Ga(x;\ a,\ b)
\end{aligned}
$$

where $\psi(x) = d\log\Gamma(x)/dx$ is the digamma function.
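The quadratic-form identity can be verified numerically; the sketch below is our illustration under assumed toy dimensions, comparing the closed form against a Monte Carlo estimate over $x \sim q(x)$:

```python
import numpy as np

rng = np.random.default_rng(1)
r, n = 4, 6
A   = rng.standard_normal((n, r))
K   = np.diag(rng.uniform(0.5, 2.0, n))    # noise covariance
mu  = rng.standard_normal(r)               # mean of q(x)
L_  = rng.standard_normal((r, r))
Sig = L_ @ L_.T + r * np.eye(r)            # covariance of q(x)
y   = rng.standard_normal(n)
Kinv = np.linalg.inv(K)

# Closed form: (y - A mu)^T K^-1 (y - A mu) + tr(A^T K^-1 A Sigma)
resid  = y - A @ mu
closed = resid @ Kinv @ resid + np.trace(A.T @ Kinv @ A @ Sig)

# Monte Carlo estimate of the same expectation
x = rng.multivariate_normal(mu, Sig, size=200_000)
e = y - x @ A.T
mc = np.mean(np.einsum('ij,jk,ik->i', e, Kinv, e))
print(closed, mc)   # agree to within Monte Carlo error
```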

Appendix III

The conditional density can be derived from the joint by partitioning the vector $\beta_{i,\ast} = [\beta_{i,j}\ \beta_{i,\setminus j}]$ and the matrix $Q$, to produce a Gaussian over $\beta_{i,j}$

$$
\begin{aligned}
p(\beta_{i,j}\,|\,\beta_{i,\setminus j}) &\propto \exp\!\left(-\frac{\alpha_i^{(2)}}{2}
\begin{bmatrix} \beta_{i,j} & \beta_{i,\setminus j} \end{bmatrix}
\begin{bmatrix} Q_{j,j} & Q_{j,\setminus j}\\ Q_{\setminus j,j} & Q_{\setminus j,\setminus j} \end{bmatrix}
\begin{bmatrix} \beta_{i,j}\\ \beta_{i,\setminus j}^T \end{bmatrix}\right)\\
&\propto \exp\!\left(-\frac{\alpha_i^{(2)}}{2}\left(\beta_{i,j}^2\, Q_{j,j} + 2\,\beta_{i,j}\, Q_{j,\setminus j}\, \beta_{i,\setminus j}^T\right)\right)\\
&\propto \exp\!\left(-\frac{\alpha_i^{(2)}}{2}\left(\beta_{i,j} - M_{i,j}\right)^2 Q_{j,j}\right)\\
M_{i,j} &= -Q_{j,j}^{-1}\, Q_{j,\setminus j}\, \beta_{i,\setminus j}^T
\end{aligned}
\tag{III-1}
$$

This leads to the conditional density $p(\beta_{i,j}\,|\,\beta_{i,\setminus j}) = N(\beta_{i,j};\ M_{i,j},\ (\alpha_i^{(2)} U_j)^{-1})$, where $-Q_{j,j}^{-1} Q_{j,\setminus j}\, \beta_{i,\setminus j}^T = \beta_{i,\ast} V_{j,\ast}^T U_j^{-1}$ and $Q_{j,j} = U_j$ (see main text).
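A minimal numerical sketch of this partitioning, assuming a toy symmetric precision matrix $Q$ (names are hypothetical), checks the precision-based conditional mean against the covariance-based one:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
# Toy symmetric, diagonally dominant precision matrix Q (graph-Laplacian-like)
W = (rng.random((n, n)) < 0.4).astype(float)
W = np.triu(W, 1); W = W + W.T
Q = np.diag(W.sum(1) + 1.0) - W

j = 2
mask = np.arange(n) != j
beta = rng.standard_normal(n)                 # values at the other nodes

# Conditional mean and precision from the partition of Q (Eq. III-1)
M_j    = -Q[j, mask] @ beta[mask] / Q[j, j]
prec_j = Q[j, j]

# Reference: conditional of the joint N(0, Q^{-1}) via covariance partitioning
Sigma = np.linalg.inv(Q)
M_ref = Sigma[j, mask] @ np.linalg.inv(Sigma[np.ix_(mask, mask)]) @ beta[mask]
assert np.isclose(M_j, M_ref)
```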

Appendix IV

A sample from a multivariate normal density with covariance matrix $K^{(2)}(\tau)$ can be obtained by explicitly computing this matrix, sampling an initial random field, $x(0)$, from an i.i.d. process (i.e., each element of $x(0) \in \mathbb{R}^{N \times 1}$ is drawn independently from a standard normal) and computing the matrix–vector product $x(\tau) = \sqrt{K^{(2)}(\tau)}\, x(0)$. However, the matrix exponential is burdensome to compute for very large dimensions. Instead, we can use the fact that it is the kernel of a diffusive process on a graph, governed by the heat equation $\dot{x} = -\tfrac{1}{2} L x$, where $\dot{x}$ denotes the rate of change of the random field with time. Given $x(0)$ and $L$, the sample $x(\tau)$ can be computed iteratively using an Euler step (Smith, 1985), i.e., $x(t + dt) = x(t) - dt \cdot \tfrac{1}{2} L\, x(t)$. As $L$ is sparse, the matrix–vector product can be computed easily. This step is repeated until the total time of diffusion equals $\tau$, thereby providing a viable alternative to the matrix exponential as a means of sampling a random field on a very large graph.
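A sketch of this scheme is given below, assuming a toy chain-graph Laplacian in place of the brain-volume graph. The Euler iterates are compared with the action of the matrix exponential computed by SciPy's `expm_multiply`, a Krylov-style routine in the spirit of Expokit (this stands in for Expokit itself, which the paper cites):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import expm_multiply

rng = np.random.default_rng(3)
n = 1000
# Sparse graph Laplacian of a 1D chain (toy stand-in for the volume graph)
main = 2.0 * np.ones(n); main[0] = main[-1] = 1.0
L = sp.diags([main, -np.ones(n - 1), -np.ones(n - 1)], [0, -1, 1], format='csr')

tau = 1.0
x0 = rng.standard_normal(n)        # i.i.d. initial field, x(0) ~ N(0, I)

# Euler integration of the heat equation dx/dt = -0.5 L x up to time tau
dt, x = 1e-3, x0.copy()
for _ in range(int(tau / dt)):
    x = x - dt * 0.5 * (L @ x)

# Reference: exp(-0.5 tau L) x0, computed without forming the dense exponential
x_ref = expm_multiply(-0.5 * tau * L, x0)
print(np.max(np.abs(x - x_ref)))   # small discretisation error
```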

Appendix V

Observation model

Starting with the model in Eq. (23) and using $f_{i,j} = f(\beta_{i,\ast}^{(2,j)})$

$$
\begin{aligned}
Y_{\ast,j} &= X \beta_{\ast,j}^{(1)} + e_{\ast,j}^{(1)}\\
\beta_{i,j}^{(1)} &= f_{i,j} + e_{i,j}^{(2)}\\
\beta_{i,\ast}^{(2,j)} &= e_{i,\ast}^{(3,j)}
\end{aligned}
\tag{V-1}
$$

where $\mathrm{cov}(e_{\ast,j}^{(1)}) = \alpha_j^{(1)-1} I_T$, $\mathrm{cov}(e_{i,j}^{(2)}) = (U_j \alpha_i^{(2)})^{-1}$, $\mathrm{cov}(e_{i,\ast}^{(3,j)}) = \Sigma_{\beta_{i,J}^{(1)}}[t-1]$, $f_{i,j} = M_{i,j} + \beta_{i,\ast}^{(2,j)} X_j^{(2)}$, $M_{i,j} = \hat{\beta}_{i,J}^{(1)}[t-1]\, X_j^{(2)}$, and the set of hidden random variables is $\Theta = \{\beta^{(1)}, \beta^{(2)}, \alpha^{(1)}, \alpha^{(2)}\}$.
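For orientation, a toy forward simulation of the three-level hierarchy in Eq. (V-1) might look as follows; all sizes, and the stand-ins for $U_j$, $X_j^{(2)}$, $M_{i,j}$ and $\Sigma_{\beta_{i,J}}[t-1]$, are our assumptions for illustration, not values from the model fits:

```python
import numpy as np

rng = np.random.default_rng(4)
T, N, r, nj = 120, 40, 2, 6       # scans, voxels, regressors, neighbourhood size (toy values)
X = rng.standard_normal((T, r))   # first-level design matrix
alpha1 = 10.0                     # observation-noise precision alpha_j^(1)
alpha2 = np.array([4.0, 8.0])     # per-regressor precisions alpha_i^(2)

Y = np.zeros((T, N))
for j in range(N):
    u_j = 1.0 + rng.random()              # stand-in for U_j (j-th diagonal of U)
    X2j = rng.standard_normal(nj)         # stand-in for the neighbourhood weights X_j^(2)
    beta1 = np.zeros(r)
    for i in range(r):
        beta2 = rng.standard_normal(nj)   # level 3: beta(2) ~ N(0, Sigma[t-1]); identity stand-in
        M_ij = rng.standard_normal()      # stand-in for the SVP mean M_ij
        f_ij = M_ij + beta2 @ X2j                                   # f in Eq. (V-1)
        beta1[i] = f_ij + rng.standard_normal() / np.sqrt(u_j * alpha2[i])  # level 2
    Y[:, j] = X @ beta1 + rng.standard_normal(T) / np.sqrt(alpha1)          # level 1
```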

Complete data, likelihood, and prior

These are chosen to be

$$
\begin{aligned}
p(Y, \Theta) &= p(Y\,|\,\beta^{(1)}, \alpha^{(1)})\, p(\beta^{(1)}\,|\,\beta^{(2)}, \alpha^{(2)})\, p(\beta^{(2)})\, p(\alpha)\\
p(Y\,|\,\beta^{(1)}, \alpha^{(1)}) &= \prod_{j=1}^{N} N_T\!\left(Y_{\ast,j};\ X \beta_{\ast,j}^{(1)},\ \alpha_j^{(1)-1} I_T\right)\\
p(\beta^{(1)}\,|\,\beta^{(2)}, \alpha^{(2)}) &= \prod_{j=1}^{N} \prod_{i=1}^{r} N\!\left(\beta_{i,j}^{(1)};\ f(\beta_{i,\ast}^{(2,j)}),\ (U_j \alpha_i^{(2)})^{-1}\right)\\
p(\beta^{(2)}) &= \prod_{j=1}^{N} \prod_{i=1}^{r} N_{n_j}\!\left(\beta_{i,\ast}^{(2,j)};\ 0,\ \Sigma_{\beta_{i,J}}[t-1]\right)\\
p(\alpha^{(k)}) &= \prod_{n=1}^{n_k} Ga\!\left(\alpha_n^{(k)};\ a_n^{(k)},\ b_n^{(k)}\right)
\end{aligned}
\tag{V-2}
$$

where $n_k$ is the number of elements in the vector of precision parameters $\alpha^{(k)}$.

Factorized posterior density

We assume that the posterior density factorizes as follows

$$
q(\Theta) = \left(\prod_{j=1}^{N} q(\beta_{\ast,j}^{(1)})\, q(\alpha_j^{(1)})\right)\left(\prod_{j=1}^{N}\prod_{i=1}^{r} q(\beta_{i,\ast}^{(2,j)})\right)\left(\prod_{i=1}^{r} q(\alpha_i^{(2)})\right)
\tag{V-3}
$$

Marginal posterior densities

Components of the posterior pdf, i.e., marginal densities $q(\theta_n)$, are computed using $\log q(\theta_n) = \int q(\theta_{\setminus n}) \log p(y, \Theta)\, d\theta_{\setminus n}$ (see Appendix II); below, $c$ denotes a constant. Update equations are summarized in Fig. 1. In more detail:

Regression coefficients

The log posterior is

$$
\log q(\beta_{\ast,j}^{(1)}) = \langle \log p(Y_{\ast,j}\,|\,\beta_{\ast,j}^{(1)}, \alpha_j^{(1)}) \rangle_{\alpha_j^{(1)}} + \langle \log p(\beta_{\ast,j}^{(1)}\,|\,\beta^{(2)}, \alpha^{(2)}) \rangle_{\alpha^{(2)}, \beta^{(2,j)}} + c
\tag{V-4}
$$

These integrals lead to the expressions

$$
\begin{aligned}
\langle \log p(Y_{\ast,j}\,|\,\beta_{\ast,j}^{(1)}, \alpha_j^{(1)}) \rangle_{\alpha^{(1)}} &= -\frac{1}{2}\, \hat{\alpha}_j^{(1)} \left(Y_{\ast,j} - X^{(1)} \beta_{\ast,j}^{(1)}\right)^T \left(Y_{\ast,j} - X^{(1)} \beta_{\ast,j}^{(1)}\right) + c\\
\langle \log p(\beta_{\ast,j}^{(1)}\,|\,\beta^{(2)}, \alpha^{(2)}) \rangle_{\alpha^{(2)}, \beta^{(2,j)}} &= -\frac{1}{2} \left(\beta_{\ast,j}^{(1)} - \hat{f}_{\ast,j}\right)^T U_j\, \mathrm{diag}(\hat{\alpha}^{(2)}) \left(\beta_{\ast,j}^{(1)} - \hat{f}_{\ast,j}\right) + c\\
\log q(\beta_{\ast,j}^{(1)}) &= -\frac{1}{2}\Big[\beta_{\ast,j}^{(1)T} \left(\hat{\alpha}_j^{(1)} X^{(1)T} X^{(1)} + U_j\, \mathrm{diag}(\hat{\alpha}^{(2)})\right) \beta_{\ast,j}^{(1)}\\
&\qquad - 2\, \beta_{\ast,j}^{(1)T} \left(\hat{\alpha}_j^{(1)} X^{(1)T} Y_{\ast,j} + U_j\, \mathrm{diag}(\hat{\alpha}^{(2)})\, \hat{f}_{\ast,j}\right)\Big] + c
\end{aligned}
\tag{V-5}
$$

which leads to the Gaussian posterior density

$$
\begin{aligned}
q(\beta_{\ast,j}^{(1)}) &= N_r\!\left(\beta_{\ast,j}^{(1)};\ \hat{\beta}_{\ast,j}^{(1)},\ \Sigma_{\beta_{\ast,j}^{(1)}}\right)\\
\hat{\beta}_{\ast,j}^{(1)} &= \Sigma_{\beta_{\ast,j}^{(1)}} \left(\hat{\alpha}_j^{(1)} X^{(1)T} Y_{\ast,j} + R_{\ast,j}\right)\\
\Sigma_{\beta_{\ast,j}^{(1)}} &= \left(\hat{\alpha}_j^{(1)} X^{(1)T} X^{(1)} + U_j\, \mathrm{diag}(\hat{\alpha}^{(2)})\right)^{-1}
\end{aligned}
\tag{V-6}
$$

where $R_{i,j} = U_j \hat{\alpha}_i^{(2)} \hat{f}_{i,j}$ and $\hat{f}_{i,j} = M_{i,j} + \hat{\beta}_{i,\ast}^{(2,j)} X_j^{(2)}$. Compare this to the algorithm in Penny et al., where $R_{i,j} = U_j \hat{\alpha}_i^{(2)} M_{i,j}$.

Auxiliary variables

The log posterior is

$$
\log q(\beta_{i,\ast}^{(2,j)}) = \langle \log p(\beta_{i,j}^{(1)}\,|\,\beta_{i,\ast}^{(2,j)}, \alpha_i^{(2)}) \rangle_{\beta^{(1)}, \alpha^{(2)}} + \log p(\beta_{i,\ast}^{(2,j)}) + c
\tag{V-7}
$$

Each term is

$$
\begin{aligned}
\langle \log p(\beta_{i,j}^{(1)}\,|\,\beta_{i,\ast}^{(2,j)}, \alpha_i^{(2)}) \rangle_{\beta^{(1)}, \alpha^{(2)}} &= -\frac{1}{2} \left(\hat{\beta}_{i,j}^{(1)} - f_{i,j}\right)^T U_j \hat{\alpha}_i^{(2)} \left(\hat{\beta}_{i,j}^{(1)} - f_{i,j}\right) + c\\
&= -\frac{1}{2}\Big[\beta_{i,\ast}^{(2,j)} X_j^{(2)} U_j \hat{\alpha}_i^{(2)} X_j^{(2)T} \beta_{i,\ast}^{(2,j)T} - 2\, \beta_{i,\ast}^{(2,j)} X_j^{(2)} U_j \hat{\alpha}_i^{(2)} \left(\hat{\beta}_{i,j}^{(1)} - M_{i,j}\right)\Big] + c\\
\log p(\beta_{i,\ast}^{(2,j)}) &= -\frac{1}{2}\, \beta_{i,\ast}^{(2,j)}\, \Sigma_{\beta_{i,J}^{(1)}}[t-1]^{-1}\, \beta_{i,\ast}^{(2,j)T} + c\\
\log q(\beta_{i,\ast}^{(2,j)}) &= -\frac{1}{2}\Big[\beta_{i,\ast}^{(2,j)} \left(U_j \hat{\alpha}_i^{(2)} X_j^{(2)} X_j^{(2)T} + \Sigma_{\beta_{i,J}^{(1)}}[t-1]^{-1}\right) \beta_{i,\ast}^{(2,j)T}\\
&\qquad - 2\, \beta_{i,\ast}^{(2,j)} X_j^{(2)} U_j \hat{\alpha}_i^{(2)} \left(\hat{\beta}_{i,j}^{(1)} - M_{i,j}\right)\Big] + c
\end{aligned}
\tag{V-8}
$$

because

$$
\begin{aligned}
\left(\hat{\beta}_{i,j}^{(1)} - f_{i,j}\right)^T U_j \hat{\alpha}_i^{(2)} \left(\hat{\beta}_{i,j}^{(1)} - f_{i,j}\right) &= f_{i,j}\, U_j \hat{\alpha}_i^{(2)} f_{i,j}^T - 2\, f_{i,j}\, U_j \hat{\alpha}_i^{(2)} \hat{\beta}_{i,j}^{(1)} + c\\
f_{i,j}\, U_j \hat{\alpha}_i^{(2)} f_{i,j}^T &= \left(M_{i,j} + \beta_{i,\ast}^{(2,j)} X_j^{(2)}\right) U_j \hat{\alpha}_i^{(2)} \left(M_{i,j} + \beta_{i,\ast}^{(2,j)} X_j^{(2)}\right)^T\\
&= \beta_{i,\ast}^{(2,j)} X_j^{(2)} U_j \hat{\alpha}_i^{(2)} X_j^{(2)T} \beta_{i,\ast}^{(2,j)T} + 2\, \beta_{i,\ast}^{(2,j)} X_j^{(2)} U_j \hat{\alpha}_i^{(2)} M_{i,j} + c\\
f_{i,j}\, U_j \hat{\alpha}_i^{(2)} \hat{\beta}_{i,j}^{(1)} &= \beta_{i,\ast}^{(2,j)} X_j^{(2)} U_j \hat{\alpha}_i^{(2)} \hat{\beta}_{i,j}^{(1)} + c
\end{aligned}
\tag{V-9}
$$

which leads to the Gaussian posterior density

$$
\begin{aligned}
q(\beta_{i,\ast}^{(2,j)}) &= N_{n_j}\!\left(\beta_{i,\ast}^{(2,j)};\ \hat{\beta}_{i,\ast}^{(2,j)},\ \Sigma_{\beta_{i,\ast}^{(2,j)}}\right)\\
\hat{\beta}_{i,\ast}^{(2,j)} &= \left(\hat{\beta}_{i,j}^{(1)} - M_{i,j}\right) U_j \hat{\alpha}_i^{(2)} X_j^{(2)T}\, \Sigma_{\beta_{i,\ast}^{(2,j)}}\\
\Sigma_{\beta_{i,\ast}^{(2,j)}} &= \left(U_j \hat{\alpha}_i^{(2)} X_j^{(2)} X_j^{(2)T} + \Sigma_{\beta_{i,J}^{(1)}}[t-1]^{-1}\right)^{-1}
\end{aligned}
\tag{V-10}
$$
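The two Gaussian updates just derived (Eqs. V-6 and V-10) can be sketched per voxel as follows; this is our illustration with hypothetical argument names, not the SPM implementation:

```python
import numpy as np

def update_regression_coefficients(Y_j, X, alpha1_hat, alpha2_hat, u_j, f_hat):
    """Sketch of Eq. (V-6): GLM coefficients at voxel j.

    alpha1_hat: posterior mean noise precision; alpha2_hat: length-r vector
    of regressor precisions; u_j: j-th diagonal entry of U; f_hat: length-r
    spatially-informed prior mean (f_hat = M + beta2_hat X_j^(2)).
    """
    R_j = u_j * alpha2_hat * f_hat                      # R_ij = U_j alpha_i f_ij
    Sigma = np.linalg.inv(alpha1_hat * (X.T @ X) + u_j * np.diag(alpha2_hat))
    beta_hat = Sigma @ (alpha1_hat * (X.T @ Y_j) + R_j)
    return beta_hat, Sigma

def update_auxiliary_variables(beta1_hat_ij, M_ij, u_j, alpha2_hat_i, X2j, Sigma_prev):
    """Sketch of Eq. (V-10): auxiliary variables for one (regressor, voxel) pair.

    X2j: neighbourhood weight vector X_j^(2); Sigma_prev: marginal prior
    covariance Sigma_{beta_iJ}[t-1] from the previous iteration.
    """
    prec = u_j * alpha2_hat_i * np.outer(X2j, X2j) + np.linalg.inv(Sigma_prev)
    Sigma = np.linalg.inv(prec)
    beta2_hat = (beta1_hat_ij - M_ij) * u_j * alpha2_hat_i * (X2j @ Sigma)
    return beta2_hat, Sigma
```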

Noise precisions

The log posterior is

$$
\log q(\alpha_j^{(1)}) = \langle \log p(Y_{\ast,j}\,|\,\beta_{\ast,j}^{(1)}, \alpha_j^{(1)}) \rangle_{\beta_{\ast,j}^{(1)}} + \log p(\alpha_j^{(1)}) + c
\tag{V-11}
$$

Computing these integrals leads to

$$
\begin{aligned}
\langle \log p(Y_{\ast,j}\,|\,\beta_{\ast,j}^{(1)}, \alpha_j^{(1)}) \rangle_{\beta_{\ast,j}^{(1)}} &= -\frac{1}{2}\left[-T \log \alpha_j^{(1)} + \alpha_j^{(1)} \left(\hat{e}_{\ast,j}^{(1)T} \hat{e}_{\ast,j}^{(1)} + \mathrm{tr}\!\left(X^{(1)T} X^{(1)} \Sigma_{\beta_{\ast,j}^{(1)}}\right)\right)\right] + c\\
\log p(\alpha_j^{(1)}) &= -\log \Gamma(a^{(1)}) - a^{(1)} \log b^{(1)} + \left(a^{(1)} - 1\right) \log \alpha_j^{(1)} - \frac{\alpha_j^{(1)}}{b^{(1)}}\\
\log q(\alpha_j^{(1)}) &= \left(\frac{T}{2} + a^{(1)} - 1\right) \log \alpha_j^{(1)} - \alpha_j^{(1)}\left(\frac{1}{2}\left[\hat{e}_{\ast,j}^{(1)T} \hat{e}_{\ast,j}^{(1)} + \mathrm{tr}\!\left(X^{(1)T} X^{(1)} \Sigma_{\beta_{\ast,j}^{(1)}}\right)\right] + \frac{1}{b^{(1)}}\right) + c
\end{aligned}
\tag{V-12}
$$

where $\hat{e}_{\ast,j}^{(1)} = Y_{\ast,j} - X^{(1)} \hat{\beta}_{\ast,j}^{(1)}$ and $a^{(1)}$, $b^{(1)}$ are the prior shape and scale parameters. This leads to the gamma density, shape parameters, and expectation at the j-th voxel

$$
\begin{aligned}
q(\alpha_j^{(1)}) &= Ga\!\left(\alpha_j^{(1)};\ a_j^{(1)},\ b_j^{(1)}\right)\\
a_j^{(1)} &= \frac{T}{2} + a^{(1)}\\
\frac{1}{b_j^{(1)}} &= \frac{1}{2}\left(\hat{e}_{\ast,j}^{(1)T} \hat{e}_{\ast,j}^{(1)} + \mathrm{tr}\!\left(X^{(1)T} X^{(1)} \Sigma_{\beta_{\ast,j}^{(1)}}\right)\right) + \frac{1}{b^{(1)}}\\
\hat{\alpha}_j^{(1)} &= a_j^{(1)} b_j^{(1)}
\end{aligned}
\tag{V-13}
$$
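The noise-precision update of Eq. (V-13) is a one-liner per voxel; a minimal sketch, with hypothetical names and treating $b$ as a scale parameter so that $\hat{\alpha} = ab$:

```python
import numpy as np

def update_noise_precision(Y_j, X, beta_hat, Sigma_beta, a0, b0):
    """Sketch of Eq. (V-13): observation-noise precision at voxel j.

    a0, b0 are the prior shape/scale parameters; returns the posterior
    shape, scale, and expectation alpha_hat = a * b.
    """
    T = Y_j.shape[0]
    resid = Y_j - X @ beta_hat                      # e_hat = Y - X beta_hat
    a = T / 2.0 + a0
    inv_b = 0.5 * (resid @ resid + np.trace(X.T @ X @ Sigma_beta)) + 1.0 / b0
    return a, 1.0 / inv_b, a / inv_b                # shape, scale, expectation
```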

Regression coefficient residual precisions

Following the previous section, we write out an expression for the vector of precision parameters, $\alpha^{(2)}$, using a factorization over regressors, which is an approximation if $X^{(1)T} X^{(1)}$ contains non-zero off-diagonals (which is typically the case). The log posterior is

$$
\begin{aligned}
\log q(\alpha^{(2)}) &= \langle \log p(\beta^{(1)}\,|\,\beta^{(2)}, \alpha^{(2)}) \rangle_{\beta^{(1)}, \beta^{(2)}} + \log p(\alpha^{(2)}) + c\\
\langle \log p(\beta^{(1)}\,|\,\beta^{(2)}, \alpha^{(2)}) \rangle_{\beta^{(1)}, \beta^{(2)}} &\approx \sum_{i=1}^{r} \langle \log p(\beta_{i,\ast}^{(1)}\,|\,\beta^{(2)}, \alpha_i^{(2)}) \rangle_{\beta^{(1)}, \beta^{(2)}}
\end{aligned}
\tag{V-14}
$$

This isolates terms involving $\alpha_i^{(2)}$, leading to the expressions

$$
\begin{aligned}
\langle \log p(\beta_{i,\ast}^{(1)}\,|\,\beta^{(2)}, \alpha_i^{(2)}) \rangle_{\beta^{(1)}, \beta^{(2)}} &= -\frac{1}{2}\left[-N \log \alpha_i^{(2)} + \alpha_i^{(2)}\left(\hat{e}_{i,\ast}^{(2)} U \hat{e}_{i,\ast}^{(2)T} + \mathrm{tr}\!\left(U\left(\Sigma_{\beta_{i,\ast}^{(1)}} + \mathrm{diag}(\xi_{i,\ast})\right)\right)\right)\right] + c\\
\log p(\alpha_i^{(2)}) &= -\log \Gamma(a^{(2)}) - a^{(2)} \log b^{(2)} + \left(a^{(2)} - 1\right) \log \alpha_i^{(2)} - \frac{\alpha_i^{(2)}}{b^{(2)}}\\
\log q(\alpha_i^{(2)}) &= \left(\frac{N}{2} + a^{(2)} - 1\right) \log \alpha_i^{(2)}\\
&\quad - \alpha_i^{(2)}\left(\frac{1}{2}\left(\hat{e}_{i,\ast}^{(2)} U \hat{e}_{i,\ast}^{(2)T} + \mathrm{tr}\!\left(U\left(\Sigma_{\beta_{i,\ast}^{(1)}} + \mathrm{diag}(\xi_{i,\ast})\right)\right)\right) + \frac{1}{b^{(2)}}\right) + c
\end{aligned}
\tag{V-15}
$$

where $\hat{e}^{(2)} = \hat{\beta}^{(1)} - \hat{f}$ and $\xi_{i,j} U_j = \mathrm{tr}\!\left(X_j^{(2)} U_j X_j^{(2)T} \Sigma_{\beta_{i,\ast}^{(2,j)}}\right)$, which implies $\xi_{i,j} = X_j^{(2)T} \Sigma_{\beta_{i,\ast}^{(2,j)}} X_j^{(2)}$. This leads to the gamma density, shape parameters, and expectation over row precisions

$$
\begin{aligned}
q(\alpha_i^{(2)}) &= Ga\!\left(\alpha_i^{(2)};\ a_i^{(2)},\ b_i^{(2)}\right)\\
a_i^{(2)} &= \frac{N}{2} + a^{(2)}\\
\frac{1}{b_i^{(2)}} &= \frac{1}{2}\left(\hat{e}_{i,\ast}^{(2)} U \hat{e}_{i,\ast}^{(2)T} + \mathrm{tr}\!\left(U\left(\Sigma_{\beta_{i,\ast}^{(1)}} + \mathrm{diag}(\xi_{i,\ast})\right)\right)\right) + \frac{1}{b^{(2)}}\\
\hat{\alpha}_i^{(2)} &= a_i^{(2)} b_i^{(2)}
\end{aligned}
\tag{V-16}
$$

Lower bound

A lower bound is given by the expression (see Appendix II)

$$
\begin{aligned}
F(q) &= \int_\Theta q(\Theta) \log\!\left(\frac{p(Y, \Theta)}{q(\Theta)}\right)\\
&= \left\langle \log\!\left(\frac{p(Y, \beta^{(1)}, \beta^{(2)}\,|\,\alpha)}{q(\beta^{(1)})\, q(\beta^{(2)})}\right)\right\rangle_{\beta^{(1)}, \beta^{(2)}, \alpha} - \left\langle \log\!\left(\frac{q(\alpha)}{p(\alpha)}\right)\right\rangle_\alpha
\end{aligned}
\tag{V-17}
$$

The first term of the last line decomposes into

$$
\begin{aligned}
&\int q(\beta^{(1)})\, q(\beta^{(2)})\, q(\alpha) \log\!\left(\frac{p(Y\,|\,\beta^{(1)}, \alpha^{(1)})\, p(\beta^{(1)}\,|\,\beta^{(2)}, \alpha^{(2)})\, p(\beta^{(2)})}{q(\beta^{(1)})\, q(\beta^{(2)})}\right)\\
&\quad = \langle \log p(Y\,|\,\beta^{(1)}, \alpha^{(1)}) \rangle_{\beta^{(1)}, \alpha^{(1)}} + \langle \log p(\beta^{(1)}\,|\,\beta^{(2)}, \alpha^{(2)}) \rangle_{\beta^{(1)}, \beta^{(2)}, \alpha^{(2)}}\\
&\qquad - \langle \log q(\beta^{(1)}) \rangle_{\beta^{(1)}} - KL\!\left(q(\beta^{(2)})\,\|\, p(\beta^{(2)})\right)
\end{aligned}
\tag{V-18}
$$

Each term is

$$
\begin{aligned}
\langle \log p(Y\,|\,\beta^{(1)}, \alpha^{(1)}) \rangle_{\beta^{(1)}, \alpha^{(1)}} &= \frac{1}{2} \sum_{j=1}^{N} \left[T\left(\psi(a_j^{(1)}) + \log b_j^{(1)}\right) - \hat{\alpha}_j^{(1)}\left(\hat{e}_{\ast,j}^{(1)T} \hat{e}_{\ast,j}^{(1)} + \mathrm{tr}\!\left(X^{(1)T} X^{(1)} \Sigma_{\beta_{\ast,j}^{(1)}}\right)\right) - T \log 2\pi\right]\\
\langle \log p(\beta^{(1)}\,|\,\beta^{(2)}, \alpha^{(2)}) \rangle_{\beta^{(1)}, \beta^{(2)}, \alpha^{(2)}} &= \frac{1}{2}\left[\sum_{i=1}^{r} N\left(\psi(a_i^{(2)}) + \log b_i^{(2)}\right) + \sum_{j=1}^{N} \sum_{i=1}^{r} \left(\log U_j - U_j \hat{\alpha}_i^{(2)}\left(\hat{e}_{i,j}^{(2)2} + \Sigma_{\beta_{i,j}^{(1)}} + \xi_{i,j}\right)\right) - rN \log 2\pi\right]
\end{aligned}
\tag{V-19}
$$

$$
\langle \log q(\beta^{(1)}) \rangle_{\beta^{(1)}} = -\frac{1}{2}\left[\sum_{j=1}^{N} \log\left|\Sigma_{\beta_{\ast,j}^{(1)}}\right| + rN + rN \log 2\pi\right]
\tag{V-20}
$$

$$
KL\!\left(q(\beta^{(2)})\,\|\, p(\beta^{(2)})\right) = \sum_{j=1}^{N} \sum_{i=1}^{r} KL\!\left(q(\beta_{i,\ast}^{(2,j)})\,\|\, p(\beta_{i,\ast}^{(2,j)})\right)
\tag{V-21}
$$

$$
\begin{aligned}
KL\!\left(q(\beta_{i,\ast}^{(2,j)})\,\|\, p(\beta_{i,\ast}^{(2,j)})\right) &= \frac{1}{2} \log\left|\Sigma_{\beta_{i,J}^{(1)}}[t-1]\, \Sigma_{\beta_{i,\ast}^{(2,j)}}^{-1}\right| + \frac{1}{2} \mathrm{tr}\!\left(\Sigma_{\beta_{i,J}^{(1)}}[t-1]^{-1}\left(\hat{\beta}_{i,\ast}^{(2,j)T} \hat{\beta}_{i,\ast}^{(2,j)} + \Sigma_{\beta_{i,\ast}^{(2,j)}} - \Sigma_{\beta_{i,J}^{(1)}}[t-1]\right)\right)\\
&= \frac{1}{2}\left(\log\left|\Sigma_{\beta_{i,J}^{(1)}}[t-1]\, \Sigma_{\beta_{i,\ast}^{(2,j)}}^{-1}\right| + \hat{\beta}_{i,\ast}^{(2,j)}\, \Sigma_{\beta_{i,J}^{(1)}}[t-1]^{-1}\, \hat{\beta}_{i,\ast}^{(2,j)T} + \mathrm{tr}\!\left(\Sigma_{\beta_{i,J}^{(1)}}[t-1]^{-1}\, \Sigma_{\beta_{i,\ast}^{(2,j)}}\right) - n_j\right)
\end{aligned}
\tag{V-22}
$$

where

$$
\begin{aligned}
KL\!\left(N_q\,\|\,N_p\right) &= \frac{1}{2} \log\left|\Sigma_p \Sigma_q^{-1}\right| + \frac{1}{2} \mathrm{tr}\!\left(\Sigma_p^{-1}\left(\left(\mu_q - \mu_p\right)\left(\mu_q - \mu_p\right)^T + \Sigma_q - \Sigma_p\right)\right)\\
N_q &= N(x;\ \mu_q,\ \Sigma_q)\\
N_p &= N(x;\ \mu_p,\ \Sigma_p)
\end{aligned}
\tag{V-23}
$$

and $\psi(x)$ is the digamma function. The second term of Eq. (V-17) is

$$
\left\langle \log\!\left(\frac{q(\alpha)}{p(\alpha)}\right)\right\rangle_\alpha = \sum_{k=1}^{2} \sum_{n=1}^{n_k} KL\!\left(q(\alpha_n^{(k)})\,\|\, p(\alpha_n^{(k)})\right)
\tag{V-24}
$$

where

$$
\begin{aligned}
KL\!\left(G_q\,\|\,G_p\right) &= a_p \ln b_p - a_q \ln b_q - \ln\!\left(\frac{\Gamma(a_q)}{\Gamma(a_p)}\right) + \left(a_q - a_p\right)\left(\psi(a_q) + \ln b_q\right) - a_q\left(1 - \frac{b_q}{b_p}\right)\\
G_q &= Ga(x;\ a_q,\ b_q)\\
G_p &= Ga(x;\ a_p,\ b_p)
\end{aligned}
\tag{V-25}
$$
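For computing $F$, the two divergences can be coded directly from Eqs. (V-23) and (V-25); the sketch below is ours and assumes the shape–scale Gamma parameterization used throughout (so that $E[x] = ab$):

```python
import numpy as np
from scipy.special import gammaln, digamma

def kl_gauss(mu_q, Sig_q, mu_p, Sig_p):
    """KL(N_q || N_p) as in Eq. (V-23)."""
    d = mu_q - mu_p
    Sp_inv = np.linalg.inv(Sig_p)
    return 0.5 * (np.linalg.slogdet(Sig_p)[1] - np.linalg.slogdet(Sig_q)[1]
                  + np.trace(Sp_inv @ (np.outer(d, d) + Sig_q - Sig_p)))

def kl_gamma(a_q, b_q, a_p, b_p):
    """KL(Ga_q || Ga_p), shape-scale parameterization, as in Eq. (V-25)."""
    return (a_p * np.log(b_p) - a_q * np.log(b_q)
            - (gammaln(a_q) - gammaln(a_p))
            + (a_q - a_p) * (digamma(a_q) + np.log(b_q))
            - a_q * (1.0 - b_q / b_p))
```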

Appendix B. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.neuroimage.2009.12.042.

References

Ashburner, J., 2007. A fast diffeomorphic image registration algorithm. NeuroImage 38, 95–113.
Ashburner, J., Friston, K.J., 2005. Unified segmentation. NeuroImage 26, 839–851.
Besag, J., Kooperberg, C., 1995. On conditional and intrinsic autoregressions. Biometrika 82, 733–746.
Bishop, C., 2006. Pattern Recognition and Machine Learning. Springer, New York.
Bullmore, E., Brammer, M., Williams, S.C., Rabe-Hesketh, S., Janot, N., David, A., Mellers, J., Howard, R., Sham, P., 1996. Statistical methods of estimation and inference for functional MR image analysis. Magn. Reson. Med. 35, 261–277.
Chatfield, C., 1996. The Analysis of Time Series. Chapman & Hall/CRC, London.
Chung, F., 1997. Spectral Graph Theory. American Mathematical Society, Providence, Rhode Island.
Congdon, P., 2001. Bayesian Statistical Modeling. John Wiley & Sons Ltd, Chichester.
Cornford, D., Csato, L., Opper, M., 2005. Sequential, Bayesian geostatistics: a principled method for large data sets. Geogr. Anal. 37, 183–199.
Cressie, N., 1993. Statistics for Spatial Data. Wiley.
Csato, L., Opper, M., 2002. Sparse on-line Gaussian processes. Neural Comput. 14, 641–668.
Descombes, X., Kruggel, F., von Cramon, D.Y., 1998. fMRI signal restoration using a spatiotemporal Markov random field preserving transitions. NeuroImage 8, 340–349.
Flandin, G., Penny, W.D., 2007. Bayesian fMRI data analysis with sparse spatial basis function priors. NeuroImage 34, 1108–1125.
Friman, O., Borga, M., Lundberg, P., Knutsson, H., 2003. Adaptive analysis of fMRI data. NeuroImage 19, 837–845.
Friston, K., Ashburner, J., Frith, C.D., Poline, J.B., Heather, J.D., Frackowiak, R.S., 1995a. Spatial registration and normalization of images. Hum. Brain Mapp. 3, 165–189.
Friston, K., Ashburner, J., Kiebel, S., Nichols, T., Penny, W., 2006. Statistical Parametric Mapping: The Analysis of Functional Brain Images. Elsevier, London.
Friston, K.J., Holmes, A.P., Worsley, K.J., Poline, J.P., Frith, C.D., Frackowiak, R.S.J., 1995b. Statistical parametric maps in functional imaging: a general linear approach. Hum. Brain Mapp. 2, 189–210.
Ghahramani, Z., Beal, M., 2000. Graphical Models and Variational Methods. MIT Press.
Gossl, C., Auer, D.P., Fahrmeir, L., 2001. Bayesian spatiotemporal inference in functional magnetic resonance imaging. Biometrics 57, 554–562.
Groves, A.R., Chappell, M.A., Woolrich, M.W., 2009. Combined spatial and non-spatial prior for inference on MRI time-series. NeuroImage 45, 795–809.
Harrison, L.M., Penny, W., Ashburner, J., Trujillo-Barreto, N., Friston, K.J., 2007. Diffusion-based spatial priors for imaging. NeuroImage 38, 677–695.
Harrison, L.M., Penny, W., Daunizeau, J., Friston, K.J., 2008a. Diffusion-based spatial priors for functional magnetic resonance images. NeuroImage 41, 408–423.
Harrison, L.M., Penny, W., Flandin, G., Ruff, C.C., Weiskopf, N., Friston, K.J., 2008b. Graph-partitioned spatial priors for functional magnetic resonance images. NeuroImage 43, 694–707.

1141

Hartvig, N.V., Jensen, J.L., 2000. Spatial mixture modeling of fMRI data. Hum. Brain Mapp. 11, 233–248.
Henson, R.N., Shallice, T., Gorno-Tempini, M.L., Dolan, R.J., 2002. Face repetition effects in implicit and explicit memory tests as measured by fMRI. Cereb. Cortex 12, 178–186.
Jeffreys, H., 1946. An invariant form for the prior probability in estimation problems. Proc. R. Soc. Lond. A 186, 453–461.
Kiebel, S.J., Poline, J.B., Friston, K.J., Holmes, A.P., Worsley, K.J., 1999. Robust smoothness estimation in statistical parametric maps using standardized residuals from the general linear model. NeuroImage 10, 756–766.
Kiebel, S.J., Goebel, R., Friston, K.J., 2000. Anatomically informed basis functions. NeuroImage 11, 656–667.
Kim, H.Y., Javier, G., Cho, Z.H., 2005. Robust anisotropic diffusion to produce enhanced statistical parametric map from noisy fMRI. Comput. Vis. Image Underst. 99, 435–452.
Lindgren, F., Rue, H., 2007. Explicit construction of GMRF approximations to generalised Matern fields on irregular grids. Prepr. Math. Sci. 12.
MacKay, D.J.C., 2003. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, Cambridge.
Minka, T., 2001. Bayesian Linear Regression.
Moler, C., Van Loan, C., 2003. Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Rev. 45, 3–49.
Penny, W., Kiebel, S., Friston, K., 2003. Variational Bayesian inference for fMRI time series. NeuroImage 19, 727–741.
Penny, W., Flandin, G., Trujillo-Barreto, N., 2007. Bayesian comparison of spatially regularised general linear models. Hum. Brain Mapp. 28, 275–293.
Penny, W.D., Trujillo-Bareto, N., Flandin, G., 2005a. Bayesian Analysis of Single-Subject fMRI Data: SPM5 Implementation. Wellcome Trust Centre for Neuroimaging, UCL, London.
Penny, W.D., Trujillo-Barreto, N.J., Friston, K.J., 2005b. Bayesian fMRI time series analysis with spatial priors. NeuroImage 24, 350–362.
Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P., 2007. Numerical Recipes, 3rd ed. Cambridge University Press, Cambridge, UK.
Rasmussen, C., Williams, C., 2006. Gaussian Processes for Machine Learning. The MIT Press, Cambridge, Massachusetts.
Risken, H., 1996. The Fokker–Planck Equation, 3rd ed. Springer-Verlag, Berlin, Heidelberg, New York.
Rosa, M.J., Bestmann, S., Harrison, L., Penny, W., 2010. Bayesian model selection maps for group studies. NeuroImage 49, 217–224.
Rue, H., Held, L., 2005. Gaussian Markov Random Fields: Theory and Application, Vol. 104. Chapman & Hall, London.
Sidje, R.B., 1998. Expokit: a software package for computing matrix exponentials. ACM Trans. Math. Softw. 24, 130–156.
Smith, G., 1985. Numerical Solutions of Partial Differential Equations: Finite Difference Methods. Clarendon Press, Oxford.
Sole, A.F., Ngan, S.C., Sapiro, G., Hu, X., Lopez, A., 2001. Anisotropic 2-D and 3-D averaging of fMRI signals. IEEE Trans. Med. Imaging 20, 86–93.
SPM, 2009. Wellcome Trust Centre for Neuroimaging. Available from: http://www.fil.ion.ucl.ac.uk/spm/.
Tabelow, K., Polzehl, J., Voss, H.U., Spokoiny, V., 2006. Analyzing fMRI experiments with structural adaptive smoothing procedures. NeuroImage 33, 55–62.
Wink, A.M., Roerdink, J.B., 2004. Denoising functional MR images: a comparison of wavelet denoising and Gaussian smoothing. IEEE Trans. Med. Imaging 23, 374–387.
Woolrich, M.W., Behrens, T.E., Smith, S.M., 2004a. Constrained linear basis sets for HRF modelling using variational Bayes. NeuroImage 21, 1748–1761.
Woolrich, M.W., Jenkinson, M., Brady, J.M., Smith, S.M., 2004b. Fully Bayesian spatiotemporal modeling of FMRI data. IEEE Trans. Med. Imaging 23, 213–231.
Woolrich, M.W., Ripley, B.D., Brady, M., Smith, S.M., 2001. Temporal autocorrelation in univariate linear modeling of FMRI data. NeuroImage 14, 1370–1386.
Worsley, K.J., Marrett, S., Neelin, P., Vandal, A.C., Friston, K.J., Evans, A.C., 1996. A unified statistical approach for determining significant voxels in images of cerebral activation. Hum. Brain Mapp. 4, 58–73.
Worsley, K.J., Liao, C.H., Aston, J., Petre, V., Duncan, G.H., Morales, F., Evans, A.C., 2002. A general statistical analysis for fMRI data. NeuroImage 15, 1–15.
