Sampling Distribution of a Correlation

This program illustrates the concept of sampling distributions. You can change the population correlation (true.r) and the sample size (n), both set at the top of the program below. Given these two values, the program shows the distribution of sample correlations you would observe over the long run if you were to draw repeated random samples of size n from the population.
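To build a population in which the two variables correlate true.r, the program below draws x1 from a standard normal distribution and sets x2 to true.r times x1 plus independent normal noise with standard deviation sqrt(1 - true.r^2). Writing ρ for true.r and E for an independent standard normal variable (and assuming -1 ≤ ρ ≤ 1), a short derivation shows why this construction has population correlation ρ:

```latex
\begin{aligned}
X_2 &= \rho X_1 + \sqrt{1-\rho^2}\,E, \qquad X_1, E \sim N(0,1) \text{ independent} \\
\operatorname{Var}(X_2) &= \rho^2 \operatorname{Var}(X_1) + (1-\rho^2)\operatorname{Var}(E) = \rho^2 + 1 - \rho^2 = 1 \\
\operatorname{Cov}(X_1, X_2) &= \rho \operatorname{Var}(X_1) + \sqrt{1-\rho^2}\,\operatorname{Cov}(X_1, E) = \rho \\
\operatorname{Corr}(X_1, X_2) &= \frac{\operatorname{Cov}(X_1, X_2)}{\sqrt{\operatorname{Var}(X_1)\operatorname{Var}(X_2)}} = \frac{\rho}{\sqrt{1 \cdot 1}} = \rho
\end{aligned}
```

Any single sample of size n will still yield a correlation that differs somewhat from ρ; that sample-to-sample variability is what the sampling distribution displays.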

The mean of this sampling distribution is the expected value of r: the sample correlation you would expect, on average, if you drew one random sample of size n from a population in which the two variables correlate true.r. The standard deviation of this sampling distribution is the standard error; it indicates how far an observed correlation will typically fall from the true correlation.
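As a rough check on these two quantities, the sketch below (an illustration, not part of the original program) simulates r for an assumed population with true.r = 0.5 and n = 100, then compares the standard deviation of the simulated r's with a common large-sample approximation to the standard error, (1 - ρ²)/√(n - 1). The mean of the simulated r's should land close to ρ; for normal data it tends to be very slightly closer to zero.

```r
# Sketch: compare the simulated standard error of r with the
# large-sample approximation (1 - rho^2) / sqrt(n - 1).
# rho, n, and m are illustrative assumptions, not values from the program.
set.seed(1)
rho <- 0.5
n   <- 100
m   <- 5000

sim.r <- numeric(m)
for (i in 1:m) {
  x1 <- rnorm(n, mean = 0, sd = 1)
  x2 <- rho * x1 + rnorm(n, mean = 0, sd = sqrt(1 - rho^2))
  sim.r[i] <- cor(x1, x2)
}

sim.mean  <- mean(sim.r)                 # estimated expected value of r
sim.se    <- sd(sim.r)                   # estimated standard error of r
approx.se <- (1 - rho^2) / sqrt(n - 1)   # textbook approximation

cat("Mean of simulated r:", sim.mean, "\n")
cat("Simulated standard error:", sim.se, "\n")
cat("Approximate standard error:", approx.se, "\n")
```

With n = 100 the approximation and the simulated value should agree closely; the approximation becomes less accurate for small samples.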

Recall that in a nil hypothesis significance test, the sampling distribution is used to determine which observations would be unlikely (e.g., which sample correlations would occur less than 5% of the time) if the true correlation were zero (i.e., true.r = 0).
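A minimal sketch of that logic, assuming n = 100 and 10,000 simulated samples under true.r = 0: the middle 95% of the simulated null distribution gives approximate two-tailed 5% cutoffs, which can be compared with the classical cutoff implied by the t test for a correlation, |r| > t/√(n - 2 + t²), where t is the 97.5th percentile of a t distribution with n - 2 degrees of freedom.

```r
# Sketch: two-tailed 5% critical values of r under the nil hypothesis,
# from simulation and from the t-based formula. n and m are assumptions.
set.seed(2)
n <- 100
m <- 10000

null.r <- numeric(m)
for (i in 1:m) {
  x1 <- rnorm(n)
  x2 <- rnorm(n)            # true.r = 0, so x2 is independent of x1
  null.r[i] <- cor(x1, x2)
}

# Middle 95% of the simulated null distribution
sim.cut <- quantile(null.r, c(0.025, 0.975))

# Classical cutoff, obtained by solving t = r * sqrt(n - 2) / sqrt(1 - r^2) for r
t.crit <- qt(0.975, df = n - 2)
r.crit <- t.crit / sqrt(n - 2 + t.crit^2)

print(sim.cut)
cat("t-based critical |r|:", r.crit, "\n")
```

Sample correlations outside these cutoffs would be deemed statistically significant at the 5% level for this sample size.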

Note: The sampling distribution in this program is based on 1000 trials (m in the program) rather than an infinite number of trials. Thus, the curves will not always be perfectly smooth.
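To see how much of that roughness comes from the finite number of trials, the sketch below uses a hypothetical helper, simulate.r() (not part of the original program), to run the same simulation with 1000 and with 10,000 trials. It compares the Monte Carlo standard error of the estimated expected value, roughly sd(r)/√m, which shrinks as the number of trials grows.

```r
# Sketch: simulation error shrinks as the number of trials m grows.
# true.r and n are illustrative assumptions.
set.seed(3)
true.r <- 0
n <- 100

# Hypothetical helper: returns m simulated sample correlations
simulate.r <- function(m) {
  out <- numeric(m)
  for (i in 1:m) {
    x1 <- rnorm(n)
    x2 <- true.r * x1 + rnorm(n, sd = sqrt(1 - true.r^2))
    out[i] <- cor(x1, x2)
  }
  out
}

r.1000  <- simulate.r(1000)
r.10000 <- simulate.r(10000)

# Monte Carlo standard error of the estimated mean of r
mc.se.1000  <- sd(r.1000)  / sqrt(1000)
mc.se.10000 <- sd(r.10000) / sqrt(10000)

cat("Monte Carlo SE with 1000 trials:", mc.se.1000, "\n")
cat("Monte Carlo SE with 10000 trials:", mc.se.10000, "\n")
```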

Exercise:  Copy the program below and paste it into a script window in S-Plus or R. Alter the sample size and see how that affects the width of the distribution.  What range of sample correlations would be deemed “statistically significant” in each case?  Does this change as a function of sample size?

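# Fraley August 2001
# Program for illustrating the basics of sampling distributions

true.r <- .0      # population correlation (change this)
n      <- 100     # sample size (change this)
m      <- 1000    # number of simulated samples (trials)

cor.vec <- numeric(m)   # storage for the m sample correlations

for(i in 1:m){
  # Draw x1 from a standard normal distribution
  x1 <- rnorm(n, mean=0, sd=1)
  # Build x2 so that its population correlation with x1 is true.r
  x2 <- true.r*x1 + rnorm(n, mean=0, sd=sqrt(1-true.r^2))
  # Record the correlation observed in this sample
  cor.vec[i] <- cor(x1, x2)
}

par(pty="s")   # square plotting region

# Smoothed density of the m simulated sample correlations
plot(density(cor.vec), type="l", xlim=c(-1,1), xlab="sample r's, given n")

cat("Expected value of r", mean(cor.vec), "\n")

# Standard deviation of the sample correlations (sqrt(var()) works in both S-Plus and R)
cat("Expected deviation from true r (i.e., standard error)", sqrt(var(cor.vec)), "\n")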
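For the exercise above, one way to compare sample sizes is to repeat the simulation under true.r = 0 for several values of n and record the middle 95% of each null sampling distribution. The sketch below assumes sample sizes of 20, 50, 100, and 500 and 5000 trials each; these values are illustrative, not part of the original program.

```r
# Sketch: null (true.r = 0) sampling distributions at several sample sizes.
# sizes and m are illustrative assumptions.
set.seed(4)
sizes <- c(20, 50, 100, 500)
m <- 5000

cutoffs <- matrix(NA, nrow = length(sizes), ncol = 2,
                  dimnames = list(paste("n =", sizes), c("2.5%", "97.5%")))

for (j in seq_along(sizes)) {
  n <- sizes[j]
  r <- numeric(m)
  for (i in 1:m) {
    x1 <- rnorm(n)
    x2 <- rnorm(n)            # independent, so the population correlation is 0
    r[i] <- cor(x1, x2)
  }
  # Middle 95% of the simulated sample correlations for this n
  cutoffs[j, ] <- quantile(r, c(0.025, 0.975))
}

print(round(cutoffs, 3))
```

Sample correlations falling outside a row's interval would be deemed statistically significant (two-tailed, 5% level) for that sample size; the interval narrows as n grows.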