# The chi-square approximation of Pearson's statistic

Our goal is to derive the asymptotic distribution of Pearson’s statistic for goodness-of-fit testing. We follow the method given in David Williams’s book *Weighing the odds*, but we provide a bit more details.

Let \(Y_1\), \(\ldots\), \(Y_n\) be independent and identically distibuted random variables in \(\{1, \ldots, b\}\) whose common distribution is given by \[ \Pr(Y_m = k) = p_k \] with \(\sum_{k=1}^b p_k = 1\).

Denoting by \(N_k\) the number of \(Y_m\) equal to \(k\), and setting \[
W_k = \frac{N_k - np_k}{\sqrt{np_k}},
\] the *Pearson statistic* is \(\sum_{k=1}^b W_k^2\). To derive its asymptotic distribution, we firstly derive the one of the random vector \({(W_1, \ldots, W_k)}'\).

Define the random variables \[ X_k^{(m)} = \begin{cases} 1 & \text{if } Y_m = k \\ 0 & \text{if } Y_m \neq k \end{cases}, \] so that \[ N_k = X_k^{(1)} + \cdots + X_k^{(n)}. \]

For any \(m_1\), \(m_2\), the random vectors \((X_1^{(m_1)}, \ldots, X_b^{(m_1)})\) and \((X_1^{(m_2)}, \ldots, X_b^{(m_2)})\) have the same distribution. It is easy to see that \[ \mathbb{E}X_k^{(m)} = p_k, \quad \mathbb{E}X_k^{(m)}X_\ell^{(m)} = \begin{cases} p_k & \text{if } k = \ell \\ 0 & \text{if } k \neq \ell \end{cases}, \] leading to \[ V_{k\ell} := \text{Cov}\bigl(X_k^{(m)},X_\ell^{(m)}\bigr) = \begin{cases} p_k(1-p_k) & \text{if } k = \ell \\ -p_k p_\ell & \text{if } k \neq \ell \end{cases}. \]

One can write \[ \begin{pmatrix} N_1 \\ \vdots \\ N_b \end{pmatrix} = \begin{pmatrix} X_1^{(1)} \\ \vdots \\ X_b^{(1)} \end{pmatrix} + \cdots + \begin{pmatrix} X_1^{(n)} \\ \vdots \\ X_b^{(n)} \end{pmatrix}. \] and this is a sum of independent and identically distributed random vectors. Therefore we have, for large \(n\), \[ \begin{pmatrix} N_1 \\ \vdots \\ N_b \end{pmatrix} \approx \mathcal{M}\mathcal{N}\left(\begin{pmatrix} n p_1 \\ \vdots \\ n p_b \end{pmatrix}, n V\right) \] by the multivariate central limit theorem.

Recall that \[ W_k = \frac{N_k - np_k}{\sqrt{np_k}}. \] Then one has \[ \begin{pmatrix} W_1 \\ \vdots \\ W_b \end{pmatrix} \approx \mathcal{M}\mathcal{N}\left(\begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}, C\right) \] where \[ C_{k\ell} = \begin{cases} 1-p_k & \text{if } k = \ell \\ -\sqrt{p_k p_\ell} & \text{if } k \neq \ell \end{cases}. \]

Now we are going to derive the characteristic function of \(\mathcal{M}\mathcal{N}(\mathbf{0}, C)\).

The covariance matrix \(C\) is not a strictly positive definite matrix since \(\sum_{k=1}^b \sqrt{np_k} W_k = 0\). But let us firstly give an expression of \(\mathbb{E}\mathbf{e}^{-\alpha \sum_{k=1}^b W_k^2}\) when \({(W_1, \ldots, W_b)}' \sim \mathcal{M}\mathcal{N}(\mathbf{0}, C)\) in the case of a strictly positive definite covariance matrix \(C\). Then we will argue that this expression still holds for our \(C\). In the strictly positive case, by using the *pdf* of the multivariate normal distribuion, we get \[\begin{align}
\mathbb{E}\mathbf{e}^{-\alpha \sum_{k=1}^b W_k^2} & =
\frac{1}{{(2\pi)}^\frac{b}{2}\sqrt{\det(C)}}
\int_{\mathbb{R}^b} \mathbf{e}^{-\frac12\mathbf{w}'
(C^{-1} + 2 \alpha I)
\mathbf{w}}\mathrm{d}\mathbf{w} \\
& = \frac{{\bigl(\det(C^{-1} + 2\alpha I)\bigr)}^{-\frac12}}{\sqrt{\det(C)}} \\
& = {\bigl(\det(I+2\alpha C)\bigr)}^{-\frac12}.
\end{align}\] Now, by a continuity argument, this equality holds when \(C\) is non-negative definite. Let us detail this point. Let \(\mathbf{W} = {(W_1, \ldots, W_b)}' \sim \mathcal{M}\mathcal{N}(\mathbf{0}, C)\) where \(C\) is non-negative and not strictly positive. Let \(\mathbf{G} = {(G_1, \ldots, G_b)}'\) be a standard normal distribution on \(\mathbb{R}^b\) and let \(\epsilon > 0\). Then \[
\mathbf{W} + \sqrt{\epsilon}\mathbf{G} \sim
\mathcal{M}\mathcal{N}(\mathbf{0}, C + \epsilon I),
\] and since \(C + \epsilon I\) is strictly positive, we know by the previous result that \[
\mathbb{E}\mathbf{e}^{-\alpha \sum_{k=1}^b {(W_k + \sqrt{\epsilon}G_k)}^2}
= {\Bigl(\det\bigl(I+2\alpha (C+\sqrt{\epsilon}I)\bigr)\Bigr)}^{-\frac12},
\] and we get the announced result by letting \(\epsilon \to 0\).

Thus we have to derive \(\det(I+2\alpha C)\) now. Observe that \[ I-C = \mathbf{u} \mathbf{u}' \] where \(\mathbf{u} = {\bigl(\sqrt{p_1}, \ldots, \sqrt{p_b}\bigr)}'\). Since \(\mathbf{u}' \mathbf{u} = 1\), \[ I-C = \mathbf{u} {\bigl(\mathbf{u}'\mathbf{u})}^{-1} \mathbf{u}' \] is the matrix of the orthogonal projection on the line directed by \(\mathbf{u}\), therefore \(C\) is the matrix of the orthogonal projection on \({[\mathbf{u}]}^\perp\).

Thus \(C\) has one eigenvalue \(0\), with associated eigenvector \(\mathbf{u}\), and \(b-1\) eigenvalues equal to \(1\). Consequently, for every real number \(\alpha\), the matrix \(I + 2\alpha C\) has one eigenvalue \(1\) and \(b-1\) others \(1+2\alpha\). Therefore \[ \det(I+2\alpha C) = {(1+2\alpha)}^{b-1}. \] We finally get \[ \mathbb{E}\mathbf{e}^{-\alpha \sum_{k=1}^b W_k^2} \approx {(1+2\alpha)}^{-\frac{b-1}{2}}, \] and we recognize the characteristic function of the \(\chi^2_{b-1}\) distribution.