University of Illinois at Chicago
Instructor           Prof. Stanley L. Sclove

### SOLUTIONS TO EXERCISES IN STATISTICS REVIEW

#### PART 1.   COMPUTATIONS FOR A SAMPLE

For a sample of n = 3 observations, the sum is 6 and the sum of squares is 50.
1.1. The sum of squared deviations (from the sample mean) is ?

SOLUTION: Use the computational formula for the SSD:
```
SSD = SSQ - SUM^2/n
= 50 - 6^2/3
= 50 - 12 = 38

PART 2.   CALCULATIONS FROM A SAMPLE

2.1. A sample of three observations has a sum of 6 and a sum of squares
equal to 50. One observation is -3. What are the other two observations?

[1]: X1 + X2 + X3 = 6

[2]: X1^2 + X2^2 + X3^2 = 50

[3]: X3 = -3

Use [3] in [1] and [2]:

[1']: X1 + X2 = 9; X2 = 9 - X1

[2']: X1^2 + X2^2 = 41

[2'']: X1^2 + (9 - X1)^2 = 41

X1^2 + 81 - 18*X1 + X1^2 = 41

2*X1^2 - 18*X1 + 40 = 0

X1^2 - 9*X1 + 20 = 0

(X1 - 4)(X1 - 5) = 0

X1 = 4 or 5

Say X1 = 4; then X2 = 9 - X1 = 5.

(Many people obtain the solution by trial-and-error.)

2.2. A sample of 10 observations has a mean of 100. The sum
of 9 of the observations is 900. What is the value of the other observation?
SOLUTION: Sum of all ten = 10 x mean = 10 x 100 = 1000. Sum of nine = 900.
Difference = value of other observation = 1000 - 900 = 100.

2.3. A sample of n = 14 has a standard deviation of 3.1.
What is the sum of squared deviations?
SOLUTION:
Sample variance, s^2 = 3.1^2 = 9.61 = SSD/(n-1) = SSD/13; SSD = (9.61)(13) = 124.93

2.4.    Consider the following table.

TABLE.  Distribution of Number of Magazine Subscriptions in Households

-----------------------------------------------
Number of subscriptions    0    1    2    3
Number of households      10   40   30    f
-----------------------------------------------
Find the value of f such that the mean number of subscriptions per household is 2.0. f = ?
SOLN: 2.0 = mean = sum/number
= [10(0) + 40(1) + 30(2) + 3f] / (10 + 40 + 30 + f)
= (100 + 3f)/(80 + f);
160 + 2f = 100 + 3f; f = 60

PART 3.  PROBABILITY: EQUALLY LIKELY CASES

A custodian is asked to rank four brands (A, B, C, D) of common household
cleanser according to his preference, number 1 being the cleanser he prefers
most, and so on. Suppose the custodian really has no preference among the
four brands and hence all orders are equally likely to occur.

3.1. What is the probability that C is first and D is third in the ranking?

 C    ___    D    ___
1st   2nd   3rd   4th

There are two ways to fill in the blanks: A then B, or B then A. The total
number of possibilities is 4!, or 24. The prob. is 2/24 = 1/12 = .0833.
3.2.  What is the probability that A is ranked either second
or third? These are 2 out of 4 equally probable rankings for A; hence the
prob is 2/4, or 1/2.

PART 4. PROBABILITY: COMPOUND EVENTS

4.1.  A state highway department has contracted for the delivery of
sand, gravel, and cement at a construction site. Due to other work commitments
and labor force problems, contracting firms cannot always deliver items
on the agreed delivery date. Based on past evidence, the probabilities
that sand, gravel, and cement will be delivered on the promised delivery
dates by the contracting firms are .3, .6 and .8, respectively. Assume
that the delivery or nondelivery of one material is independent of another.
Find the probability that all three materials will be delivered on time.

By independence, P(all three on time) = .3 x .6 x .8 = .144

PART 5. EXPECTED VALUE OF A RANDOM VARIABLE

5.1. Consider the following probability distribution of a random variable x.
-----------------------------
v         -3    5     10
P(x = v)  .2    p   .8-p
-----------------------------

What is the value of p so that the expected value of x is 5.0?
5.0 = E(x) = .2(-3) + 5p + 10(.8-p) = 7.4 - 5p; 5p = 2.4; p = .48

PART 6.  DISCRETE DISTRIBUTIONS

6.1.   Often the values 1, 2, 3, 4 and 5 are assigned
to categories such as "Strongly Disagree," "Disagree Somewhat," "Neutral,"
"Agree Somewhat," "Strongly Agree."
This question has to do with such a
"scaling." Of all random variables taking the values 1, 2, 3, 4, 5, the
one with P(x=1) = 1/2 and P(x=5) = 1/2 has maximum standard deviation.

What is the value of this standard deviation?
SOLUTION:
The mean is .5(1) + .5(5) = 3.
Variance of a random variable = probability-weighted average of the squared deviations from the mean
= .5(1-3)^2 + .5(5-3)^2 = 4;
std.dev. = 2.

PART 7.   BINOMIAL DISTRIBUTIONS
7.1.   Suppose a binomial distribution has a mean of
6 and a variance of 3. Then what are the values of the parameters n and
p of the distribution?
mean = np = 6; variance = npq = 6q = 3; q = 1/2, p = 1/2, n = 6/p = 12

PART 8. SAMPLING FROM A FINITE POPULATION

8.1.   A sample of n = 400 is to be drawn (without replacement)
from a population of N = 2000 with a standard deviation of $4000.
What is the standard deviation of the sample mean?
Var(ybar) = [pop.var./n][(N-n)/(N-1)] = [4000^2/400][(2000-400)/(2000-1)]
= [4000^2/400](1600/1999); SD(ybar) = (4000/20)*sqrt(1600/1999) = 200*sqrt(.8004)
= 200*.8947 = $178.93

PART 9. NORMAL DISTRIBUTIONS: PERCENTILES
9.1.   What is the 95th percentile of the standard normal distribution?
95th pctile = value exceeded by only 5% of the distribution: z = 1.645
9.2.   What is the 75th percentile of the standard normal distribution?
z = 0.6745

PART 10. RANDOM SAMPLING FOR MEASURED CHARACTERISTICS;  THE NORMAL
DISTRIBUTION
In quality control, samples are selected from a production line and
various quality characteristics are measured in order to check that the
process is "in control." Suppose that a bottling process is intended to
fill bottles with, on average, 21 fluid ounces of beverage. Variation around
this mean follows the normal distribution with a standard deviation of
0.5 fluid ounces.
10.1.  If a technician samples 25 bottles (when the process
is "in control") and measures the amount of beverage in each, what is the
probability that the sample average (for the 25 bottles) will exceed 21.2
fluid ounces?
s.e. mean = 0.5/sqrt(25) = 0.1 oz.; z = (21.2-21)/0.1 = +2.00, P(z > 2.00)
= .0228

PART 11. NORMAL APPROXIMATION TO THE BINOMIAL
Statistics released by the National Highway Traffic Safety Administration
and the National Safety Council show that on an average weekend night,
1 out of every 10 drivers on the road is drunk. If 400 drivers are randomly
checked next Saturday night, what is the probability that the number of
drunk drivers will be
11.1.   More than 49?
mean = np = 400(.1) = 40; variance = npq = 400(.1)(.9) = 36; SD = sqrt(36)
= 6. z = (49.5-40)/6 = 9.5/6 = 1.583, P(z > 1.583) = .057
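11.2.   Exactly 40?
With the continuity correction, "exactly 40" is the interval from 39.5 to 40.5.
z = (40.5-40)/6 = .5/6 = 1/12 = .0833; by symmetry the lower limit is z = -.0833.
P(0 < z < .0833) = .033; P(-.0833 < z < .0833) = 2(.033) = .066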

11.3.   Exactly 49 ?

z = (49.5-40)/6 = 9.5/6 = 1.583, P(z > 1.583) = .057
z = (48.5-40)/6 = 8.5/6 = 1.4167, P(z > 1.4167) = .078
P(exactly 49) is approx. .078-.057 = .021 .

PART 12. INTERVAL ESTIMATION OF A BINOMIAL PROBABILITY

In a random sample of n = 625 persons,
three hundred (300)   favor Paul Parrot for President.
12.1.   What is the estimate of the standard deviation
of the sampling distribution of the sample proportion ("p-hat")?
Sample proportion = 300/625 = .48; Var = pq/n, est'd by .48(.52)/625
= .000399; est of SD is sqrt of this, or .01998, very close to .02.

PART 13.  SAMPLE SIZE DETERMINATION FOR A CONFIDENCE INTERVAL

Suppose that GMAT scores have a known standard deviation of 100.
A sample of scores of UIC MBA students is to be taken to estimate the mean
in that population. Determine how large a sample is required to form a
95% confidence interval with a margin of error (half-width) of 25 points.
13.1.    The required sample size is about ?
25 = B =  1.96 SD(y) = (1.96)(100)/sqrt(n); sqrt(n) = (1.96)(100)/25 = 7.84;
n = 61.47 (use 61 or 62)

PART 14.  OBSERVED SIGNIFICANCE LEVEL: p-VALUE

14.1.   In a test of the null hypothesis that the mean
is 18 versus alternatives that the mean is greater than 18, a sample of
n = 100 observations gave a mean of 19.5 and standard deviation of 6.00.
What is the p-value (i.e., the achieved level of significance)?

s.e. mean = 6.00/sqrt(100) = 0.6; z = (19.5 - 18)/0.6 = +2.5, p-value = P(z > 2.5) = .006

PART 15. CONFIDENCE INTERVAL FOR A DIFFERENCE BETWEEN TWO MEANS

Lifetimes of two types of batteries were compared. Summary statistics are
given in the table.
Note that the means are given in hours and the standard
deviations are given in minutes.

TABLE. Statistics from samples of battery lifetimes

--------------------------------------------------
                     n        mean       std.dev.
--------------------------------------------------
Battery A           64        7 hr         30 min
Battery B          100        6 hr         15 min
--------------------------------------------------

In what follows, do not pool the variances since,
judging from the ratio
of 2 between the two sample standard deviations, it appears that the population
variances differ. Also, use the normal distribution (rather than t) due
to the large sample sizes.
15.1.   What is the estimate of the standard deviation of
the difference between sample means?
Var(diff of uncorrelated variables) = sum of variances = Var(mean1)
+ Var(mean2) = 30^2/64 + 15^2/100 = 14.0625 + 2.25 = 16.3125 min^2;
SD(mean1-mean2) = sqrt(16.3125) = 4.039 min.
15.2.   What is the 95% confidence interval for the difference
between means?  95% C.I. = (diff-B,diff+B), where diff = mean1-mean2
= 7 hr. - 6 hr.= 1 hr. = 60 min., and B = 1.96(4.039 min.) = 7.9 min.,
so 95% C.I. = (52.1, 67.9) min.

PART 16. CHI-SQUARE GOODNESS OF FIT TEST

16.1.  The numbers of accidents per day were recorded in a
plant for 100 days. The data are tabulated below. Compute the value of
chi-square for testing the hypothesis that the following data came from
the distribution (.4, .3, .2, .1) over the values, 0, 1, 2, 3 or more.

Number of accidents                            0     1     2    more than 2
Number of days with this many accidents       45    25    20    10
Hypothesized prob of this no. of accidents    .4    .3    .2    .1

The value of the chi-square test statistic is ?

(E = 100 x hypothesized probability)

O             45       25     20    10
E             40       30     20    10
O-E            5       -5      0     0
(O-E)^2       25       25      0     0
(O-E)^2/E  0.625    0.833      0     0

Value of chi-sq test statistic = .625 + .833 = 1.46

[The number of degrees of freedom is   3 and the p-value is .6915, which is much
greater than conventional cut-off values such as .10 or .05 and
as such indicates a satisfactory fit.]

PART 17. CONTINGENCY TABLES

             |    OWN A CAR?    |
EMPLOYMENT   |   Yes      No    | Total
-------------|------------------|------
Full time    |    16       1    |   17
Part time    |    68      15    |   83
None         |    50      19    |   69
-------------|------------------|------
Total        |   134      35    |  169

17.1.  Of all 169 people, what percentage are employed full time and own a car?
16/169 = 9.47%
Of those who are employed full time, what percentage own a car?
16/17 = 94%
17.2.  Of those who own a car, what percentage are employed full time?
16/134 = 11.9%
17.3.  What is the number of degrees of freedom associated
with the chi-square test statistic for this table?
(r-1)(c-1) = (3-1)(2-1) = 2
(continuation)  Compute the value of the chi-square test statistic. (Answer: 4.586)

MTB >  chisquare c1 c2

Expected counts are printed below observed counts

             C1       C2    Total
    1        16        1       17
          13.48     3.52

    2        68       15       83
          65.81    17.19

    3        50       19       69
          54.71    14.29

Total       134       35      169

ChiSq = 0.471 + 1.805 + 0.073 + 0.279 + 0.405 + 1.552 = 4.586
df = 2
1 cells with expected counts less than 5.0

17.4.  (continuation)  Find the corresponding p-value.

MTB > cdf 4.586;
SUBC> chisq 2.
    4.5860    0.8990
MTB > let k1 = 1-.8990
MTB > print k1
K1    0.101000

PART 18.  REGRESSION
Suppose that for speeds between 5 and 95 mph the miles per gallon (G) and
speed (S) are related according to the regression equation

Y = 7.216 - 1.073 X ,

where Y = ln G and X = ln S.
18.1.   If the speed (S) is 65, then the predicted
gasoline mileage (G) is ?
pred.val. of ln G is 7.216 - 1.073 ln S = 7.216 - 1.073(4.174) = 2.737;
pred.val. of G is exp(2.737) = 15.44 mpg

Created     1995:  Feb 12

latest update       2004:  Dec 24

```
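
Several of the numerical answers above can be re-derived with a short script. What follows is a minimal sketch in Python using only the standard library; it is not part of the original solutions, which use hand calculation and Minitab. All variable names are illustrative assumptions. The p-value step relies on the fact that, for 2 degrees of freedom, the chi-square upper-tail probability is exp(-x/2).

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

# 1.1: computational formula SSD = SSQ - SUM^2/n
ssd = 50 - 6**2 / 3

# 2.1: roots of (X1 - 4)(X1 - 5) = X1^2 - 9*X1 + 20 = 0, by the quadratic formula
disc = math.sqrt(9**2 - 4 * 20)
roots = ((9 - disc) / 2, (9 + disc) / 2)

# 8.1: SD of the sample mean with the finite population correction
sd_mean = (4000 / math.sqrt(400)) * math.sqrt((2000 - 400) / (2000 - 1))

# Part 9: 95th and 75th percentiles of the standard normal
z95 = std_normal.inv_cdf(0.95)
z75 = std_normal.inv_cdf(0.75)

# 10.1: P(sample mean of 25 bottles exceeds 21.2 oz)
p_fill = 1 - std_normal.cdf((21.2 - 21) / (0.5 / math.sqrt(25)))

# Part 17: chi-square statistic for the employment-by-car-ownership table
observed = [[16, 1], [68, 15], [50, 19]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_sq = 0.0
for row, r in zip(observed, row_totals):
    for o, c in zip(row, col_totals):
        expected = r * c / grand_total  # expected count under independence
        chi_sq += (o - expected) ** 2 / expected

# Upper-tail probability for df = 2 only: P(chi-square > x) = exp(-x/2)
p_value = math.exp(-chi_sq / 2)

# 18.1: predicted mileage at 65 mph from ln G = 7.216 - 1.073 ln S
mpg = math.exp(7.216 - 1.073 * math.log(65))

results = {
    "1.1 SSD": ssd,
    "2.1 roots": roots,
    "8.1 SD of sample mean": sd_mean,
    "95th percentile": z95,
    "75th percentile": z75,
    "10.1 tail probability": p_fill,
    "17 chi-square": chi_sq,
    "17.4 p-value": p_value,
    "18.1 predicted mpg": mpg,
}
for label, value in results.items():
    print(f"{label}: {value}")
```

Small differences from the printed answers in the last decimal place are expected, since the solutions round intermediate values and read probabilities from tables.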