Edgeworth Expansions for \(\chi^2_2\) Random Variables

Compute the empirical distribution of \(W_n = \sqrt{n}\frac{\bar{X} - \mu}{\sigma}\), where \(\bar{X}\) is the mean of \(n\) i.i.d. \(\chi^2_2\) random variables, each with mean \(\mu\) and standard deviation \(\sigma\).
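For reference, the constants used in the simulation follow from the standard moments of a \(\chi^2_k\) random variable. The block below records them for general \(k\) and for \(k = 2\); the symbol \(k\) for the degrees of freedom is introduced here only for illustration.

```latex
\[
\mu = k, \qquad \sigma^2 = 2k, \qquad
\rho_3 = \frac{E(X-\mu)^3}{\sigma^3} = \frac{8k}{(2k)^{3/2}} = \sqrt{8/k}
\]
\[
k = 2: \qquad \mu = 2, \qquad \sigma^2 = 4, \qquad \rho_3 = 2
\]
```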

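## simulation parameters
N <- 5000 ## number of simulated samples used to estimate the density
df <- 2 ## degrees of freedom
mu <- df ## mean of chi^2_df
sigma2 <- 2*df ## variance of chi^2_df
rho3 <- sqrt(8/df) ## skewness of chi^2_df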

## distribution of normalized sums
n <- 5 ## sample size
x <- matrix(rchisq(n*N,df=df),ncol=n) ## one sample of size n per row
muhat <- rowMeans(x)
wn <- sqrt(n)*(muhat-mu)/sqrt(sigma2)
hist(wn,freq=FALSE,breaks=30,
     xlab="Wn",ylab="Density",main="Empirical Distribution")
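As a quick sanity check on the standardization above, the simulated \(W_n\) should have mean near 0, variance near 1, and skewness near \(\rho_3/\sqrt{n}\), which is the quantity that drives the second Edgeworth term. The following is a minimal sketch; the helper `check_wn_moments` and the seed are assumptions introduced for illustration and are not part of the original script.

```r
## Hypothetical helper: simulate W_n and return its sample mean, variance
## and skewness, together with the theoretical skewness rho3/sqrt(n).
check_wn_moments <- function(n, N = 5000, df = 2) {
  mu    <- df            # mean of chi^2_df
  sigma <- sqrt(2 * df)  # standard deviation of chi^2_df
  rho3  <- sqrt(8 / df)  # skewness of chi^2_df
  x  <- matrix(rchisq(n * N, df = df), ncol = n)
  wn <- sqrt(n) * (rowMeans(x) - mu) / sigma
  z  <- wn - mean(wn)
  c(mean = mean(wn),
    var = var(wn),
    skew = mean(z^3) / mean(z^2)^1.5,  # sample skewness
    skew_theory = rho3 / sqrt(n))
}

set.seed(1)  # arbitrary seed, for reproducibility only
moments <- check_wn_moments(n = 5)
round(moments, 3)
```

The sample skewness is noisy at this number of replications, so only rough agreement with `skew_theory` should be expected.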

Overlay the one-term (normal) and two-term Edgeworth approximations to the pdf of \(W_n\).
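Written out, the two density approximations take the standard form below, where \(\phi\) is the standard normal pdf, \(\rho_3\) is the skewness of a single observation, and \(He_3\) is the third probabilists' Hermite polynomial. The labels \(f_1\) and \(f_2\) are introduced here only for illustration.

```latex
\[
f_1(x) = \phi(x), \qquad
f_2(x) = \phi(x)\left[1 + \frac{\rho_3}{6\sqrt{n}}\,He_3(x)\right], \qquad
He_3(x) = x^3 - 3x
\]
```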

EdgeApprox1 <- function(x,n){
  return(dnorm(x))
}
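EdgeApprox2 <- function(x,n){
  ## He_3(x) = x^3 - 3x (third probabilists' Hermite polynomial), written out
  ## so that no external hermite() function is required
  he3 <- x^3 - 3*x
  return(dnorm(x)*(1 + rho3*he3/(6*sqrt(n))))
}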
hist(wn,freq=FALSE,breaks=30,
     xlab="Wn",ylab="Density",main="Edgeworth Approximations")
curve(EdgeApprox1(x,n),-3,3,add=TRUE,lwd=2)
curve(EdgeApprox2(x,n),-3,3,add=TRUE,col="red",lty=2,lwd=2)
legend("topright",c("1-term Edgeworth (N(0,1))","2-term Edgeworth"),col=c("black","red"),lty=1:2,lwd=2,cex=0.75)

Sample Size

The first term in the Edgeworth expansion (the normal density) is \(O(1)\) and the second term is \(O(n^{-1/2})\), so the second term contributes less and less as the sample size grows. Equivalently, the normal (CLT) approximation improves as \(n\) increases. Below we compare the one- and two-term Edgeworth expansions at several sample sizes.

## distribution of normalized sums
par(mfcol=c(1,3),mar=c(5,5,1,1))
ns <- c(5,30,100) ## sample sizes
for(ii in seq_along(ns)){
  x <- matrix(rchisq(ns[ii]*N,df=df),ncol=ns[ii])
  muhat <- rowMeans(x)
  wn <- sqrt(ns[ii])*(muhat-mu)/sqrt(sigma2)
  hist(wn,freq=FALSE,breaks=30,
       xlab="Wn",ylab="Density",main=paste0("n=",ns[ii]))
  curve(EdgeApprox1(x,ns[ii]),-3,3,add=TRUE,lwd=2)
  curve(EdgeApprox2(x,ns[ii]),-3,3,add=TRUE,col="red",lty=2,lwd=2)
}
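The shrinking role of the second term can also be checked without simulation. A sum of \(n\) independent \(\chi^2_2\) variables is \(\chi^2_{2n}\), so the exact density of \(W_n\) is available in closed form by a change of variables. The sketch below tabulates the maximum absolute error of each approximation on a grid over \([-3, 3]\); the helper names (`dwn_exact`, `edge1`, `edge2`, `sup_err`) and the grid are assumptions made for this illustration.

```r
## Constants for chi^2_2, matching the simulation parameters
df    <- 2
mu    <- df            # mean
sigma <- sqrt(2 * df)  # standard deviation
rho3  <- sqrt(8 / df)  # skewness

## Exact density of W_n: the sum S of n chi^2_df variables is chi^2_(n*df),
## and W_n = (S - n*mu) / (sqrt(n)*sigma) is a linear transformation of S.
dwn_exact <- function(w, n) {
  sqrt(n) * sigma * dchisq(n * mu + sqrt(n) * sigma * w, df = n * df)
}
edge1 <- function(w, n) dnorm(w)
edge2 <- function(w, n) dnorm(w) * (1 + rho3 * (w^3 - 3 * w) / (6 * sqrt(n)))

grid <- seq(-3, 3, length.out = 601)
ns   <- c(5, 30, 100)

## One row per sample size: maximum absolute error over the grid
sup_err <- t(sapply(ns, function(n) {
  c(n = n,
    one_term = max(abs(edge1(grid, n) - dwn_exact(grid, n))),
    two_term = max(abs(edge2(grid, n) - dwn_exact(grid, n))))
}))
sup_err
```

Both error columns should decrease as \(n\) grows, with the two-term column smaller at each sample size, which is the numerical counterpart of the plots above.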