
Comparing more than 2 population means

Analysis of Variance (ANOVA) is a statistical test used to determine whether more than two population means are equal. The test uses the F-distribution (a probability distribution) and information about the variance of each population (within) and the variation of the population means about the grand mean (between) to help decide whether the variability between and within the populations is significantly different.
So the method of ANOVA tests the hypotheses:
H_{0}: μ_{1} = μ_{2} = ... = μ_{k}
H_{a}: Not all the means are equal
1. Know the purpose of the analysis of variance test.
The analysis of variance (ANOVA) test statistic is used to test whether more than two population means are equal.
2. Know the difference between the within-sample estimate of the variance and the between-sample estimate of the variance, and how to calculate them.
When comparing two or more populations there are several ways to estimate the variance. The within-sample (or error) variance is the average of all the sample variances and is an estimate of σ^{2} whether the null hypothesis, H_{0}, is true or not:
s^{2}_{W} = Σ s^{2}_{j} / k, for j = 1 to k, where k is the number of samples or populations.
Example: Given 5 populations with sample variances 2.0, 2.2, 2.35, 2.41 and 2.45, the within-sample variance is:
s^{2}_{W} = (2.0 + 2.2 + 2.35 + 2.41 + 2.45) / 5 = 11.41 / 5 = 2.282
The within-sample variance is often called the unexplained variation.
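The arithmetic above is easy to check in a few lines of code; this is a minimal sketch in Python (variable names are ours, not from the text):

```python
# Within-sample variance: the average of the k sample variances
# from the example above.
sample_variances = [2.0, 2.2, 2.35, 2.41, 2.45]

k = len(sample_variances)              # number of samples, k = 5
s2_within = sum(sample_variances) / k  # (2.0 + ... + 2.45) / 5

print(round(s2_within, 3))  # 2.282
```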
The between-sample (or treatment) variance is the average of the squared deviations of each sample mean from the mean of all the data (the Grand Mean, x̿) and is an estimate of σ^{2} only if the null hypothesis, H_{0}, is true. When the null hypothesis is false this variance is relatively large, and by comparing it with the within-sample variance we can tell statistically whether H_{0} is true or not. The between-sample variance is associated with the explained variation of our experiment.
Example: Given the means of 5 samples: 210, 212, 213, 214 and 214, an estimate of the variance of the sample means is:
s^{2}_{B} = Σ(x̄_{j} - x̿)^{2} / (k - 1), for j = 1 to k, where k is the number of samples or populations.
Sample | Sample mean, x̄_{j} (1) | Mean - Grand Mean, x̄_{j} - x̿ (2) | Sum of Squares, (x̄_{j} - x̿)^{2} (3)
1 | 210 | -2.6 | 6.76
2 | 212 | -0.6 | 0.36
3 | 213 | 0.4 | 0.16
4 | 214 | 1.4 | 1.96
5 | 214 | 1.4 | 1.96
  | Average (1) = 212.6 |   | Total (3) = 11.2

Grand Mean, x̿ = 212.6, k = 5, and s^{2}_{B} (between) = Σ(x̄_{j} - x̿)^{2} / (k - 1) = 11.2 / 4 = 2.8
When the null hypothesis, H_{0}, is true, the within-sample variance and the between-sample variance will be about the same; however, if the between-sample variance is much larger than the within-sample variance, we would reject H_{0}. If the data from both examples above are from the same 5 samples or populations, then the ratio of the two estimates of the variance would give:
F = s^{2}_{B} / s^{2}_{W} = 2.8 / 2.282 ≈ 1.23
This ratio has an F-distribution.
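The between-sample estimate and the ratio of the two estimates can be sketched the same way, assuming (as the text does) that the two examples describe the same five samples:

```python
# Between-sample variance: variance of the k sample means about the grand mean.
sample_means = [210, 212, 213, 214, 214]
k = len(sample_means)

grand_mean = sum(sample_means) / k                                       # 212.6
s2_between = sum((m - grand_mean) ** 2 for m in sample_means) / (k - 1)  # 11.2 / 4

s2_within = 2.282               # within-sample estimate from the earlier example
f_ratio = s2_between / s2_within

print(round(s2_between, 1))  # 2.8
print(round(f_ratio, 2))     # 1.23
```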
3. Know the properties of an F Distribution.
There is an infinite number of F-distributions, one for each combination of the degrees of freedom (df_{1}) of the between-sample variance and the degrees of freedom (df_{2}) of the within-sample variance; together with the alpha significance level, these determine the critical value used in the test.
The F statistic, which follows the F-distribution, is the ratio of the between-sample estimate of σ^{2} to the within-sample estimate:
F = s^{2}_{B} / s^{2}_{W}
If there are k populations and n total data values across all the samples, then the degrees of freedom of the between-sample variance is df_{1} = k - 1, and the degrees of freedom of the within-sample variance is df_{2} = n - k.
The graph of an F probability distribution starts at 0 and extends indefinitely to the right. It is skewed to the right, similar to the graphs shown below.
[Figure: F-Distribution graphs]
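For reference, the density itself can be evaluated numerically; the sketch below codes the standard F probability density function with `math.gamma` (the (2, 12) degrees of freedom match the worked example later in this section):

```python
from math import gamma

def f_pdf(x, d1, d2):
    """Probability density of the F-distribution with (d1, d2) degrees of freedom."""
    beta = gamma(d1 / 2) * gamma(d2 / 2) / gamma((d1 + d2) / 2)
    num = (d1 * x) ** d1 * d2 ** d2
    den = (d1 * x + d2) ** (d1 + d2)
    return (num / den) ** 0.5 / (x * beta)

# The curve starts near 0, rises to a peak, then trails off in a long right tail.
for x in (0.5, 1.0, 2.0, 4.0):
    print(x, round(f_pdf(x, 2, 12), 4))
```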
4. Know how sums of squares relate to Analysis of Variance.
Remember back in Chapter 3 (Regression) we introduced the concept that the total sum of squares is equal to the sum of the explained and unexplained variation; this section is an extension of that discussion.
Total Variation = Explained Variation + Unexplained Variation
The sum of squares for the between-sample variation is given by either the symbol SSB (sum of squares between) or SSTR (sum of squares for treatments) and is the explained variation. To calculate SSB or SSTR, we take the squared deviation of each sample (treatment) mean from the grand mean, multiply it by the number of observations in that sample, and sum. The sum of squares for the within-sample variation is given by either the symbol SSW (sum of squares within) or SSE (sum of squares for error) and is the unexplained variation. To calculate SSW we first obtain the sum of squares for each sample and then sum them.
The Total Sum of Squares, SSTO = SSB + SSW
The between-sample variance, s^{2}_{B} = SSB / (k - 1), where k is the number of samples or treatments, is often called the Mean Square Between, MSB. The within-sample variance, s^{2}_{W} = SSW / (n - k), where n is the total number of observations in all the samples, is often called the Mean Square Within or Mean Square Error, MSW or MSE.
Example: Compute the SSB, SSE and SSTO for the following samples (treatments j = 1 to k, k = 3):

Rows, i | Sample 1 | Sample 2 | Sample 3
1 | 77 | 83 | 80
2 | 79 | 91 | 82
3 | 87 | 94 | 86
4 | 85 | 88 | 85
5 | 78 | 85 | 80
Mean, x̄_{j} | 81.2 | 88.2 | 82.6

Grand Mean, x̿ = (81.2 + 88.2 + 82.6) / 3 = 84

Rows, i | SS (Sample 1) | SS (Sample 2) | SS (Sample 3)
1 | (77 - 81.2)^{2} = 17.64 | (83 - 88.2)^{2} = 27.04 | (80 - 82.6)^{2} = 6.76
2 | (79 - 81.2)^{2} = 4.84 | (91 - 88.2)^{2} = 7.84 | (82 - 82.6)^{2} = 0.36
3 | (87 - 81.2)^{2} = 33.64 | (94 - 88.2)^{2} = 33.64 | (86 - 82.6)^{2} = 11.56
4 | (85 - 81.2)^{2} = 14.44 | (88 - 88.2)^{2} = 0.04 | (85 - 82.6)^{2} = 5.76
5 | (78 - 81.2)^{2} = 10.24 | (85 - 88.2)^{2} = 10.24 | (80 - 82.6)^{2} = 6.76
SS_{j} (Col) | 80.8 | 78.8 | 31.2
Sample size, n_{j} | 5 | 5 | 5

SSW = SSE = Σ SS_{j} = 80.8 + 78.8 + 31.2 = 190.8
SSB = SSTR = Σ n_{j}(x̄_{j} - x̿)^{2} = 5[(81.2 - 84)^{2} + (88.2 - 84)^{2} + (82.6 - 84)^{2}] = 137.2
SSTO = SSB + SSW = 137.2 + 190.8 = 328.0
MSB = SSB / (k - 1) = 137.2 / 2 = 68.6 and MSW = MSE = SSW / (n - k) = 190.8 / 12 = 15.9
F = MSB / MSE = 68.6 / 15.9 ≈ 4.31
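The sums of squares in the tables above can be reproduced with a short script; this is a minimal sketch using the example data (names are ours, not from the text):

```python
# One-way ANOVA sums of squares for the worked example (k = 3 samples, n = 15).
samples = [
    [77, 79, 87, 85, 78],   # Sample 1
    [83, 91, 94, 88, 85],   # Sample 2
    [80, 82, 86, 85, 80],   # Sample 3
]

k = len(samples)
n = sum(len(s) for s in samples)
grand_mean = sum(sum(s) for s in samples) / n    # 84.0
means = [sum(s) / len(s) for s in samples]       # 81.2, 88.2, 82.6

# Within-sample (unexplained): squared deviations inside each sample, summed.
ssw = sum((x - sum(s) / len(s)) ** 2 for s in samples for x in s)

# Between-sample (explained): squared deviation of each sample mean from the
# grand mean, weighted by that sample's size.
ssb = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))

msb = ssb / (k - 1)      # 68.6
mse = ssw / (n - k)      # 15.9
f_stat = msb / mse       # about 4.31

print(round(ssw, 1), round(ssb, 1), round(f_stat, 2))  # 190.8 137.2 4.31
```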
5. Know how to construct an ANOVA Table.
The various statistics computed from the analysis of variance above can be summarized in an ANOVA table as shown below. These summaries are then used to draw inferences about the various samples or treatments we are studying.
ANOVA Table - Basic layout:

Source | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square | F-Statistic | P-value
Between Samples (Explained) | SSB | k - 1 | MSB = SSB / (k - 1) | F = MSB / MSE | value from table
Within Samples (Unexplained) | SSE | n - k | MSE = SSE / (n - k) |  |
Total | SSTO | n - 1 |  |  |
The ANOVA table is easily constructed from the ANOVA program by entering the observations for each sample in the appropriate columns and deleting any division by 0 in selected regions of the program. For the data above the ANOVA table is:

Source | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square | F-Statistic | P-value
Between Samples (Explained) | 137.2 | 2 | 68.6 | 4.31 | 0.039
Within Samples (Unexplained) | 190.8 | 12 | 15.9 |  |
Total | 328.0 | 14 |  |  |
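Given the sums of squares, the rest of the table follows mechanically; this sketch (the `anova_table` helper is our own name, not part of the ANOVA program mentioned above) fills in df, the mean squares and the F statistic:

```python
def anova_table(ssb, sse, k, n):
    """Return ANOVA table rows as (source, SS, df, MS) tuples plus the F statistic."""
    msb = ssb / (k - 1)
    mse = sse / (n - k)
    rows = [
        ("Between (Explained)", ssb, k - 1, msb),
        ("Within (Unexplained)", sse, n - k, mse),
        ("Total", ssb + sse, n - 1, None),   # no mean square for the total row
    ]
    return rows, msb / mse

rows, f_stat = anova_table(ssb=137.2, sse=190.8, k=3, n=15)
for source, ss, df, ms in rows:
    print(f"{source:22s} SS={ss:6.1f} df={df:2d}" + (f" MS={ms:5.2f}" if ms else ""))
print("F =", round(f_stat, 2))
```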
6. Know how to interpret the data in the ANOVA table against the null hypothesis.
The ANOVA table program computes the necessary statistics for evaluating the null hypothesis that the means are equal, H_{0}: μ_{1} = μ_{2} = ... = μ_{k}. Use the degrees of freedom and an alpha significance level to obtain the expected F-distribution statistic from the lookup table or from the ANOVA program.
Acceptance Criteria for the Null Hypothesis:
If the F statistic computed in the ANOVA table is less than the F-table statistic, or the P-value is greater than the alpha level of significance, then there is no reason to reject the null hypothesis that all the means are the same.
That is, accept H_{0} if: F-Statistic < F-table or P-value > alpha.
For the example above we would reject the null hypothesis at the 5% significance level and conclude that the means are not equal, since F-Stat = 4.31 > F-Table = 3.89.
7. Know the procedure for testing the null hypothesis that the means of more than two populations are equal.
Step 1. Formulate the hypotheses: H_{0}: μ_{1} = μ_{2} = ... = μ_{k} and H_{a}: Not all the means are equal.
Step 2. Select the F-statistic test for equality of more than two means.
Step 3. Obtain or decide on a significance level for alpha, say alpha = 0.05.
Step 4. Compute the test statistic from the ANOVA table.
Step 5. Identify the critical region: the region of rejection of H_{0} is obtained from the F-table with alpha and degrees of freedom (k - 1, n - k).
Step 6. Make a decision: accept H_{0} if F-Statistic < F-table or P-value > alpha; otherwise reject H_{0}.
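The six steps can be sketched end to end for the worked example. One caveat: the exact P-value below uses the closed-form survival function (1 + df1*F/df2)^(-df2/2), which is valid only when df1 = 2 (true here, since k = 3); for other degrees of freedom you would use an F-table or a statistics library.

```python
# Steps 1-2: H0: mu1 = mu2 = mu3, tested with the F statistic.
# Step 3: significance level.
alpha = 0.05

# Step 4: test statistic from the ANOVA table of the worked example.
f_stat = 68.6 / 15.9          # MSB / MSE, about 4.31
df1, df2 = 2, 12              # k - 1 and n - k for k = 3, n = 15

# Step 5: P-value from the df1 = 2 closed form (valid only because df1 == 2).
p_value = (1 + df1 * f_stat / df2) ** (-df2 / 2)

# Step 6: decision.
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(round(p_value, 3), decision)   # 0.039 reject H0
```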
Worksheet for Computing Analysis of Variance Statistics:

Rows, i | Sum of Squares, j = 1 | j = 2 | j = 3 | j = 4 | j = 5   (treatments j = 1 to k)
1 |  |  |  |  |
2 |  |  |  |  |
3 |  |  |  |  |
4 |  |  |  |  |
5 |  |  |  |  |
6 |  |  |  |  |
7 |  |  |  |  |
8 |  |  |  |  |
9 |  |  |  |  |
10 |  |  |  |  |
11 |  |  |  |  |
12 |  |  |  |  |
13 |  |  |  |  |
14 |  |  |  |  |
SS_{j} (Column Total) |  |  |  |  |
Sample size, n_{j} |  |  |  |  |

SSW = SSE = Σ SS_{j} =
SSB = SSTR = Σ n_{j}(x̄_{j} - x̿)^{2} =
Source | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square | F-Statistic | P-value
Between Samples (Explained) | SSB = | k - 1 = | MSB = SSB / (k - 1) = | F = MSB / MSE = |
Within Samples (Unexplained) | SSE = | n - k = | MSE = SSE / (n - k) = |  |
Total | SSTO = | n - 1 = |  |  |