Statistical Inference: Working with Confidence Intervals

Content
 
Terms / Definitions
Objectives
Confidence Interval
Statistical Estimates
Example of Confidence Interval
 

Terms / Definitions
 

TERM                    DEFINITION
Confidence interval     An interval or range computed from the sample data that has a known probability (e.g. 95%) of containing the unknown population parameter
Parameter               A number that describes the population, e.g. a mean or percent
Sample proportion       The proportion, p-hat, of the members of a sample with a certain characteristic
Statistic               A number that describes a sample; an estimate of the population parameter
sqrt                    Square root
 
Chapter Objectives:

This chapter answers the question: how sure are we about a statistic calculated from a sample of size n?

With a stated level of assurance or confidence, say 95%, we make a statement like:

"We are 95% sure that the true value we seek from the population is within a certain range of the statistic we have calculated."

For example, if 60% of a sample say they will vote for a candidate for national leader, we may say that the true percent in the population is really between 50% and 70%, and that we are 95% sure of this.
 
Confidence Intervals:

The confidence interval (CI) is an interval or range computed from the sample data
by a method that captures the true population value in about 95% of all possible samples.

CI = estimate (statistic) +/- margin of error

or CI = statistic (such as the mean) +/- 2 x standard deviation of the statistic

Example: the sample mean is 45% and the standard deviation is 5%, from a sample of size 500.

Then CI = 45 +/- 2 x 5 = 45 +/- 10, or 35% to 55%.

What does this mean? We are 95% confident that the true mean is between 35% and 55%.
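
As a minimal sketch, the rule "statistic +/- 2 x standard deviation" can be written as a small Python function. The function name and the multiplier parameter are illustrative assumptions, not part of the course material.

```python
def confidence_interval(estimate, std_dev, multiplier=2):
    """Return (low, high) for estimate +/- multiplier * std_dev."""
    margin = multiplier * std_dev  # margin of error
    return estimate - margin, estimate + margin


# Worked example from the text: sample mean 45%, standard deviation 5%
low, high = confidence_interval(45, 5)
print(f"{low} to {high} %")  # prints "35 to 55 %"
```

The multiplier of 2 is the rounded value this chapter uses for 95% confidence.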

Estimates

A population parameter, a number that describes the whole population, is typically unknown.

Rather than studying the entire population, we take a sample to save COST or TIME.

Depending on how we take our sample we will get a different estimate of the "true" population value;
therefore, the quality of our estimate depends on the following factors:

        1. Sample size, n: How large is the sample? The larger the sample, the better our estimate.

        2. How random is our sample? The more random the selection, the more confident we are
                                                        that the sample represents the population.

        3. What is the question we are trying to answer? How we ask the question
                                                        determines how we calculate our statistic.

        4. What is the error or standard deviation of our sampling? Together with the sample size, n,
                                                        this gives us our confidence interval.
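
To illustrate factor 1, the sketch below shows how the margin of error shrinks as the sample size grows. It uses the formula sqrt(p(100-p)/n) that this chapter gives for proportions. The sample proportion of 40%, the sample sizes, and the function name are hypothetical choices made only for illustration.

```python
import math


def margin_of_error_percent(p_percent, n, multiplier=2):
    """Margin of error, in percent, for a sample proportion given in percent."""
    return multiplier * math.sqrt(p_percent * (100 - p_percent) / n)


# Hypothetical sample proportion of 40% at increasing sample sizes
for n in (100, 400, 1600):
    print(n, round(margin_of_error_percent(40, n), 2))
```

Because n sits under a square root, the sample size must be quadrupled to cut the margin of error in half.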

 
Table of comparisons for x-bar
Sample statistic              Population parameter
x-bar                         mu
s (standard deviation)        sigma, or sigma/sqrt(n), i.e. the population standard deviation divided by the square root of the sample size. In this class we will give you s.

The distribution of x-bar is a normal distribution.
 
Table of comparisons for proportion, p-hat
Sample statistic              Population parameter
p-hat                         p
sigmap                        sqrt(p(100-p)/n), calculated as follows:

1. Convert the proportion to percent, e.g. p = 0.12 = 12%

2. Subtract p (in percent) from 100

3. Multiply the result of step 2 by p (in percent)

4. Divide the result of step 3 by the sample size, n

5. Take the square root of the result of step 4

6. Show the answer as a percent

See the examples below.

The sampling distribution of p-hat is approximately normal and gets closer to a normal distribution as the sample size, n, gets larger.
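
The six steps for sigmap can be sketched in Python as follows. The function and variable names are assumptions chosen for readability; the input p is a proportion between 0 and 1, and the result is in percent.

```python
import math


def sigma_p_percent(p, n):
    """Standard deviation of p-hat, in percent, following the six steps."""
    p_percent = p * 100                # step 1: convert proportion to percent
    complement = 100 - p_percent       # step 2: subtract p from 100
    product = complement * p_percent   # step 3: multiply step 2 by p in percent
    per_sample = product / n           # step 4: divide step 3 by sample size n
    return math.sqrt(per_sample)       # steps 5 and 6: square root, in percent


# p = 0.4 from a sample of n = 1500
print(round(sigma_p_percent(0.4, 1500), 3))  # prints 1.265
```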
 
 
Examples of Confidence Interval calculations:
 

Example 1: From a random study of 5,000 households we found that the average income is
                    $40,000. If the standard deviation was calculated to be $5,000, what can we say about
                    the average income in the community that was studied?

        Give an interval for the average income in the community.

        Answer: x-bar, the mean income, is $40,000 and s, the standard deviation, is $5,000

        CI = 40,000 +/- 2(5,000) = 40,000 +/- 10,000, or $30,000 to $50,000
 
        So we are 95% confident that the average household income in this community is between $30,000 and $50,000.
 

Example 2: From a random study of 1,500 adults, 600 said that they fear going out at night.

        Give an interval that shows the percent of adults who fear going out at night.

        Answer: p-hat, the proportion of adults who fear going out at night, = 600 / 1500 = 0.4

        or 40% (p converted to percent)

        sigmap is equal to sqrt(p(100-p)/n)

        or sqrt((40 x 60)/1500) = sqrt(1.6) = 1.265%

        CI = 40 +/- 2(1.265) = 40 +/- 2.53, or 37.47% to 42.53%

        So we are 95% confident that between 37.47% and 42.53% of adults fear going out at night.
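
The arithmetic in Example 2 can be recomputed from the raw counts with a short sketch. The function name and the multiplier parameter are illustrative assumptions.

```python
import math


def proportion_ci_percent(count, n, multiplier=2):
    """Interval, in percent, for a proportion: p-hat +/- multiplier * sigmap."""
    p_percent = 100 * count / n                              # p-hat in percent
    sigma_p = math.sqrt(p_percent * (100 - p_percent) / n)   # sigmap in percent
    margin = multiplier * sigma_p                            # margin of error
    return p_percent - margin, p_percent + margin


# Example 2: 600 of 1500 adults fear going out at night
low, high = proportion_ci_percent(600, 1500)
print(f"{low:.2f}% to {high:.2f}%")  # prints "37.47% to 42.53%"
```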

Example 3: From a random study of 200 rolls of a die, it is observed that 105
                of the rolls give even numbers. What can we say about the chance or
                probability of getting even numbers from this study?

                Answer: proportion of even numbers = 105 / 200 = 0.525

                or p-hat in percent is 52.5%

                sigmap is equal to sqrt(p(100-p)/n) = sqrt((52.5 x 47.5)/200) = 3.53%

                CI = 52.5 +/- 2(3.53) = 52.5 +/- 7.06, or 45.44% to 59.56%

                So we are 95% confident that the chance of getting an even number with a roll of this die is between 45.44% and 59.56%.
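
To see what "95% confident" means in practice, the sketch below repeats the 200-roll study many times and counts how often the computed interval contains the true value. It assumes a fair die, so the true chance of an even number is 50%. The number of repetitions, the random seed, and the function name are arbitrary illustrative choices.

```python
import math
import random


def covers_true_value(n_rolls, rng, true_percent=50.0):
    """Simulate one study of a fair die and report whether
    p-hat +/- 2 * sigmap contains the true percent of even rolls."""
    evens = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) % 2 == 0)
    p_percent = 100 * evens / n_rolls
    sigma_p = math.sqrt(p_percent * (100 - p_percent) / n_rolls)
    return abs(p_percent - true_percent) <= 2 * sigma_p


rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 2000           # number of simulated 200-roll studies
coverage = sum(covers_true_value(200, rng) for _ in range(trials)) / trials
print(f"Intervals containing 50%: {coverage:.1%}")
```

The printed share should be close to 95%, which is the sense in which any single interval is trusted at the 95% level.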