INFO & DEC SCI 460                    TEXT: SCHEAFFER/MENDENHALL/OTT
SAMPLE SURVEYS                        PROF. SCLOVE

SOLUTIONS TO REVIEW QUESTIONS

(Reminder: The final exam will be 3:30 - 5:20, Wednesday, May 3rd.)

1. Define the following kinds of averages: arithmetic average, root-mean-square, geometric mean.

   The "arithmetic average" is the usual mean, i.e., the sum divided by the number of values. The "root-mean-square" is the square root of the arithmetic average of the squares. The root-mean-square is meaningful for positive numbers. Also, the geometric mean of a set of n positive numbers is the n-th root of their product.

2. In terms of n sample measurements, define the sample mean, mean (absolute) deviation, root-mean-square deviation, and standard deviation.

   The sample mean is the arithmetic average of the n measurements. The mean (absolute) deviation is the arithmetic average of the absolute deviations from the sample mean. The root-mean-square deviation and the standard deviation are both square roots of an averaged sum of squared deviations from the sample mean: the root-mean-square deviation has a divisor of n; the standard deviation, a divisor of n-1.

3. Define the population mean and variance.

   See NtsCh2. Also, be familiar with the definition of mean and variance of a discrete probability distribution.

4. State Markov's inequality and Tchebysheff's inequality.

   Markov's inequality states that for a nonnegative random variable x and any a > 0, P(x >= a) <= E(x)/a. Taking x to be [y-E(y)]**2 and a to be (k*sigma)**2 gives Tchebysheff's inequality: P(|y-E(y)| >= k*sigma) <= 1/k**2. See NtsCh2 for further details.

5. Define "simple random sample." What is the probability that a specific element (pair of elements, triplet of elements) is included in the sample?

   n/N; n(n-1)/[N(N-1)]; n(n-1)(n-2)/[N(N-1)(N-2)]

6. What is a Likert scale?

   A scale consisting of ordered semantic categories, such as "strongly agree, agree somewhat, disagree somewhat, strongly disagree."

7. If ordered categories such as Strongly Agree, Agree Somewhat, Neutral, Disagree Somewhat, Disagree Strongly are coded as 1, 2, 3, 4, 5, what is the maximum possible standard deviation?

   The distribution with half the responses at 1 and half at 5 gives the maximum variance. Its mean is 3 and every response deviates from the mean by 2, so the variance is 4 and the max std dev is 2.
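The definitions in Questions 1 and 2 and the claim in Question 7 can be checked numerically. The following is a minimal sketch in Python, not part of the original solutions; the function names and the six-respondent brute-force search are illustrative assumptions.

```python
import math
from itertools import combinations_with_replacement


def arithmetic_mean(xs):
    """Sum of the values divided by their number."""
    return sum(xs) / len(xs)


def root_mean_square(xs):
    """Square root of the arithmetic average of the squares."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))


def geometric_mean(xs):
    """n-th root of the product, computed via logs; needs positive values."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def mean_abs_deviation(xs):
    """Arithmetic average of the absolute deviations from the mean."""
    m = arithmetic_mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


def rms_deviation(xs):
    """Root-mean-square deviation: squared deviations averaged with divisor n."""
    m = arithmetic_mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def standard_deviation(xs):
    """Standard deviation: squared deviations averaged with divisor n-1."""
    m = arithmetic_mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def max_likert_rms_deviation(n_resp=6):
    """Brute-force check of Question 7 (hypothetical helper): try every
    multiset of n_resp responses coded 1..5 and return the largest
    divisor-n standard deviation found."""
    codes = [1, 2, 3, 4, 5]
    return max(rms_deviation(r)
               for r in combinations_with_replacement(codes, n_resp))


if __name__ == "__main__":
    sample = [1, 2, 4, 8]  # made-up illustrative data
    print("mean:", arithmetic_mean(sample))
    print("rms:", root_mean_square(sample))
    print("geometric mean:", geometric_mean(sample))
    print("max Likert std dev:", max_likert_rms_deviation())
```

The search uses the divisor-n form because the answer to Question 7 treats the response distribution as a population; with an even number of respondents the maximum is attained by putting half at 1 and half at 5.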
8. What is a summated rating scale?

   This is a summation across several Likert-scale items. E.g., the sum across five 5-point Likert scales would have possible values between 5 and 25.

9. What is PPS sampling?

   "Probability Proportional to Size": units are selected with probabilities proportional to an auxiliary size variable x.

9. Express the population variance in terms of the stratum variances and means.

   This comes from the analysis-of-variance decomposition of N times the population variance. With all variances defined with divisor equal to the number of units, N(sigma**2) = [sum over strata of N_h(sigma_h**2)] + [sum over strata of N_h(mu_h - mu)**2].

10. What is the "fpc"?

    The "finite population correction": for the variance of the sample mean, it is (N-n)/(N-1).

11. What is the proportional allocation of observations to strata? the Neyman allocation? the optimal allocation?

    See NtsCh5.

12. The expected value ("E") of the sample variance is [N/(N-1)] times the population variance. Hence the sample variance is (A) unbiased; (B) biased upwards; (C) biased downwards.

    (B). The sample variance is too large, on the average: it is "biased upwards."

13. If the sample variance is biased, what then is an unbiased estimate of the population variance?

    [(N-1)/N] x (sample variance)

14. Define the binomial, geometric, and hypergeometric distributions and mention some ways in which they arise in connection with sample surveys.

    Binomial: distribution of the sum of independent (0,1)-variables.
    Geometric: number of trials to get the first success; inverse sampling.
    Hypergeometric: distribution of the number of marked units in the second sample in the capture/recapture method.

15. (difference estimation) How do you construct two-sigma confidence limits for the true difference, say D, between the means of y and x? How would you use these limits to make a preliminary test to decide whether to apply the difference adjustment to the mean of x in estimating the mean of y?
    The two-sigma limits are the sample mean difference plus or minus two times its estimated standard error. A preliminary test is given by testing whether the regression coefficient of y on x is 1, to decide whether to use differences in the first place. A second preliminary test is to decide whether the true mean difference is zero or not; this can be done by seeing whether zero is in the confidence interval for D.

16. Given (x,y) = (1,1), (2,1), (3,4), (4,5), find the sample regression coefficient.

    Sum of x = 10
    Sum of y = 11
    Sum of sqs of x = 1 + 4 + 9 + 16 = 30
    Sum of x*y = 1 + 2 + 12 + 20 = 35
    Sum of cross-products of deviations = 35 - 10(11)/4 = 7.5
    Sum of squared deviations of x = 30 - 10**2/4 = 5
    regression coeff. = 7.5/5 = 1.5

17. (regression estimation) How do you construct two-sigma confidence limits for beta, the true coefficient in the regression of y on x? How would you use these limits to make a preliminary test to decide whether to use the regression adjustment?

    See whether the confidence interval includes 0 or not.

18. How might you use regression estimation in the so-called "double sampling" situation, where x is a variable at time 1 and y is the same variable at a later time 2?

    At time 2, revisit only a sample of the original units.

19. How might you use analysis-of-variance in connection with interpenetrating subsamples to decide whether one or more subsamples should be discarded before combining the results into an overall estimate of mu?

    If one subsample seems to be an outlier, relative to the others, you might consider discarding it. Recall that in an important example of "interpenetrating subsamples" the different subsamples correspond to different interviewers.

20. (Bayes' Theorem) Sixty percent of widgets come from assembly Line 1, the other 40% from Line 2. Five percent of the widgets from Line 1 are defective; ten percent of those from Line 2 are defective.
What is the probability that a defective widget came from Line 2?

    P(Line 2|def) = P(Line 2)P(def|Line 2) / [P(Line 1)P(def|Line 1) + P(Line 2)P(def|Line 2)]
                  = (.4)(.10) / [(.6)(.05) + (.4)(.10)] = .04/.07 = .571

21. (random response design) In a lecture there were 126 students. Two questions were written down so that everyone could see them:

    Question 1. Have you ever smoked pot?
    Question 2. Is the last digit of your social security number odd?

    A sheet of random numbers was given to each student. Instructions were given:

    (i) Choose a two-digit random number. If your random number is 70 or above, answer Question 1.
    (ii) If your random number is 69 or below, answer Question 2.

    There were 62 responses of "yes" (49.2%) and 64 responses of "no" (50.8%). From this, estimate the percentage of the students who had smoked pot.

    Let p denote the proportion who have smoked pot. Then

    Pr(Yes) = Pr(Q1)Pr(Yes|Q1) + Pr(Q2)Pr(Yes|Q2) = .3p + .7(.5) = .3p + .35

    Let "p-hat" denote the estimate of p. Then
    .492 = "yes" proportion = .3(p-hat) + .35;
    .3(p-hat) = .142;
    p-hat = .142/.3 = .473

1995: Apr 27 SLS:ss/frevslns.ids460
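The arithmetic in Questions 20 and 21 can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original solutions; the function and parameter names are illustrative assumptions, and the inputs are the figures stated in the two questions.

```python
def posterior(priors, likelihoods, k):
    """Bayes' Theorem for a finite set of sources: probability that an
    observed event came from source k, given prior source probabilities
    and the probability of the event under each source."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    return joint[k] / sum(joint)


def randomized_response_estimate(yes_count, n, prob_sensitive,
                                 prob_yes_innocuous):
    """Solve  yes_prop = prob_sensitive * p
                         + (1 - prob_sensitive) * prob_yes_innocuous
    for p, the proportion answering "yes" to the sensitive question."""
    yes_prop = yes_count / n
    return (yes_prop
            - (1 - prob_sensitive) * prob_yes_innocuous) / prob_sensitive


if __name__ == "__main__":
    # Question 20: Lines 1 and 2 supply 60% and 40% of widgets,
    # with defect rates 5% and 10%; index 1 is Line 2.
    p_line2 = posterior([0.60, 0.40], [0.05, 0.10], 1)
    print("P(Line 2 | defective) =", round(p_line2, 3))

    # Question 21: 62 "yes" out of 126; random numbers 70-99 (30 of 100)
    # select the sensitive question; an odd last digit has probability .5.
    p_hat = randomized_response_estimate(62, 126, 0.30, 0.50)
    print("estimated proportion =", round(p_hat, 3))
```

The estimate for Question 21 uses the unrounded proportion 62/126, so it can differ in the third decimal place from a hand calculation that starts from the rounded value .492.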