The geometry of the balanced ANOVA model (with fixed effects)

Posted on June 6, 2017 by Stéphane Laurent
Tags: maths, statistics

Most usually, the mathematical treatment of Gaussian linear models starts with the matricial writing \(Y=X\beta+\sigma G\), where \(Y\) is a random vector modelling the \(n\) response values, \(X\) is a known matrix, \(\beta\) is the vector of unknown parameters, and \(G\) has the standard normal distribution on \(\mathbb{R}^n\).

There are good reasons to use this matricial writing. However it is cleaner to treat the theory with the equivalent vector space notation \(Y = \mu + \sigma G\), where \(\mu\) is assumed to lie in a linear subspace \(W\) of \(\mathbb{R}^n\), corresponding to \(\text{Im}(X)\) in the matricial notation. For example, denoting by \(P_W\) the orthogonal projection on \(W\), the least-squares estimate \(\hat\mu\) of \(\mu\) is simply given by \(\hat\mu=P_Wy\), and \(P_W^\perp y\) is the vector of residuals, denoting by \(P^\perp_W\) the projection on the orthogonal complement of \(W\). Thus there is no need to consider \(W=\text{Im}(X)\) to derive the general principles of the theory. The balanced one-way ANOVA model, which is the topic of this article, illustrates this approach.

Standard normal distribution on a vector space

The main tool used to treat the theory of Gaussian linear models is the standard normal distribution on a linear space.

Theorem and definition

Let \(X\) be a \(\mathbb{R}^n\)-valued random vector, and \(W \subset \mathbb{R}^n\) be a linear space. Say that \(X\) has the standard normal distribution on the vector space \(W\), and then note \(X \sim SN(W)\), if it takes its values in \(W\) and its characteristic function is given by \[\mathbb{E} \textrm{e}^{i\langle w, X \rangle} = \textrm{e}^{-\frac12{\Vert w \Vert}^2} \quad \text{for all } w \in W.\] The three following assertions are equivalent (and this is easy to prove):
1. \(X \sim SN(W)\);
2. the coordinates of \(X\) in some orthonormal basis of \(W\) are i.i.d. standard normal random variables;
3. the coordinates of \(X\) in any orthonormal basis of \(W\) are i.i.d. standard normal random variables.

Of course we retrieve the standard normal distribution on \(\mathbb{R}^n\) when taking \(W=\mathbb{R}^n\).

From this definition-theorem, the so-called Cochran’s theorem is an obvious statement. More precisely, if \(U \subset W\) is a linear space, and \(Z=U^\perp \cap W\) is the orthogonal complement of \(U\) in \(W\), then the projection \(P_UX\) of \(X\) on \(U\) has the standard normal distribution on \(U\), similarly the projection \(P_ZX\) of \(X\) on \(Z\) has the standard normal distribution on \(Z\), and moreover \(P_UX\) and \(P_ZX\) are independent. This is straightforward to see from the definition-theorem of \(SN(W)\), and it is also easy to see that \({\Vert P_UX\Vert}^2 \sim \chi^2_{\dim(U)}\).

The balanced ANOVA model

The balanced ANOVA model is used to model a sample \(y=(y_{ij})\) with a tabular structure: \[ y=\begin{pmatrix} y_{11} & \ldots & y_{1J} \\ \vdots & y_{ij} & \vdots \\ y_{I1} & \ldots & y_{IJ} \end{pmatrix}, \] \(y_{ij}\) denoting the \(j\)-th measurement in group \(i\). It is assumed that the \(y_{ij}\) are independent and the population mean depends on the group index \(i\). More precisely, the \(y_{ij}\) are modelled by random variables \(Y_{ij} \sim_{\text{iid}} {\cal N}(\mu_i, \sigma^2)\).

So, how to write this model as \(Y=\mu + \sigma G\) where \(G \sim SN(\mathbb{R}^n)\) and \(\mu\) lies in a linear space \(W \subset \mathbb{R}^n\) ?

Tensor product

Here \(n=IJ\) and one could consider \(Y\) as the vector obtained by stacking the \(Y_{ij}\). For example if \(I=2\) and \(J=3\), we should write \[Y={(Y_{11}, Y_{12}, Y_{13}, Y_{21}, Y_{22}, Y_{23})}'.\]

Actually this is not a good idea to loose the tabular structure. The appropriate approach for writing the balanced ANOVA model involves the tensor product. We keep the tabular structure of the data: \[Y = \begin{pmatrix} Y_{11} & Y_{12} & Y_{13} \\ Y_{21} & Y_{22} & Y_{23} \end{pmatrix}\] and we take \[G \sim SN(\mathbb{R}^I\otimes\mathbb{R}^J)\] where the tensor poduct \(\mathbb{R}^I\otimes\mathbb{R}^J\) of \(\mathbb{R}^I\) and \(\mathbb{R}^J\) is nothing but the space of matrices with \(I\) rows and \(J\) columns. Here \[ \mu = \begin{pmatrix} \mu_1 & \mu_1 & \mu_1 \\ \mu_2 & \mu_2 & \mu_2 \end{pmatrix}, \] lies, as we will see, in a linear space \(W \subset \mathbb{R}^I\otimes\mathbb{R}^J\) which is convenient to define with the help of the tensor product \(x \otimes y\) of two vectors \(x \in \mathbb{R}^I\) and \(y \in \mathbb{R}^J\), defined as the element of \(\mathbb{R}^I\otimes\mathbb{R}^J\) given by \[ {(x \otimes y)}_{ij}=x_iy_j. \] Not all vectors of \(\mathbb{R}^I\otimes\mathbb{R}^J\) can be written \(x \otimes y\), but the vectors \(x \otimes y\) span \(\mathbb{R}^I\otimes\mathbb{R}^J\). In consistence with this notation, the tensor product \(U \otimes V\) of two vector spaces \(U \subset \mathbb{R}^I\) and \(V \subset \mathbb{R}^J\) is defined as the vector space spanned by the vectors of the form \(x \otimes y\), \(x \in U\), \(y \in V\).

\[ \mu = (\mu_1, \mu_2) \otimes (1,1,1), \] and \(\mu\) is assumed to lie in the linear space \[ W = \mathbb{R}^I \otimes [{\bf 1}_J]. \]

Moreover, there is a nice orthogonal decomposition of \(W\) corresponding to the usual other parameterization of the model: \[\boxed{\mu_i = m + \alpha_i} \quad \text{with } \sum_{i=1}^I\alpha_i=0.\] Indeed, writing \(\mathbb{R}^I=[{\bf 1}_I] \oplus {[{\bf 1}_I]}^\perp\) yields the following decomposition of \(\mu\): \[ \begin{align*} \mu = (\mu_1, \ldots, \mu_I) \otimes {\bf 1}_J & = \begin{pmatrix} m & m & m \\ m & m & m \end{pmatrix} + \begin{pmatrix} \alpha_1 & \alpha_1 & \alpha_1 \\ \alpha_2 & \alpha_2 & \alpha_2 \end{pmatrix} \\ & = \underset{\in \bigl([{\bf 1}_I]\otimes[{\bf 1}_J]\bigr)}{\underbrace{m({\bf 1}_I\otimes{\bf 1}_J)}} + \underset{\in \bigl([{\bf 1}_I]^{\perp}\otimes[{\bf 1}_J] \bigr)}{\underbrace{(\alpha_1,\ldots,\alpha_I)\otimes{\bf 1}_J}} \end{align*} \]

Least-squares estimates

With the theory introduced above, the least-squares estimates of \(m\) and the \(\alpha_i\) are given by \(\hat m({\bf 1}_I\otimes{\bf 1}_J) = P_U y\) and \(\hat\alpha\otimes{\bf 1}_J = P_Zy\) where \(U = [{\bf 1}_I]\otimes[{\bf 1}_J]\) and \(Z = {[{\bf 1}_I]}^{\perp}\otimes[{\bf 1}_J] = U^\perp \cap W\), and we also know that \(\hat m\) and the \(\hat\alpha_i\) are independent. The least-squares estimates of the \(\mu_i\) are given by \(\hat\mu_i=\hat m +\hat\alpha_i\). Deriving the expression of these estimates and their distribution is left as an exercise to the reader. As another exercise, check that \[ {\Vert P_Z \mu \Vert}^2 = J \sum_{i=1}^I {(\mu_i - \bar\mu_\bullet)}^2 = J \sum_{i=1}^I \alpha_i^2. \]