Statistics / Sampling / Experiments

General Statistics

Introduction

Statistics

Statistics is a way of summarizing data, small or large, either graphically or with a single value or a set of values (called parameters for a population and statistics for a sample). When data are collected and organized by graphical means (charts or graphs) or summarized by some numerical value or set of values, we call this descriptive statistics. When the data gathered and summarized are used to make inferences about the entire group (population) being studied, we call this inferential statistics.

1. Know and identify the components or building blocks for statistical calculations.

Data, such as values from experiments, numerical or non-numerical information, and observations, are the inputs for many statistical calculations.

Examples: Results of voting polls, heights of basketball players, and average rainfall in New York City are all data used to determine statistics.

2. Know the meaning and recognize examples of a population and variable value. *

A variable is a quantity capable of assuming any of a set of values. A symbol representing such a quantity is also called a variable.

Example: (1) Height of men is a variable whose possible values could range from 4 feet to 6 feet 6 inches, represented by the symbol H. (Height is an example of a quantitative variable.)

(2) Response to a question is a variable whose possible answers may be Yes, No, or Not Applicable (NA). Yes, No, and NA are values of the variable, and since these values are non-numeric, Response is an example of a qualitative variable.

3. Know the meaning and recognize examples of qualitative and quantitative variables.

Quantitative variables are variables whose values are numerical, so the difference between two values is a measurable amount.

Qualitative variables have non-numerical values, so they can show only a relative difference between values, not a numerical amount.

Example: (1) Snowfall of 6 inches on day 1 and 4 inches on day 7 are quantitative values of the variable snowfall in inches. The difference between day 1 and day 7 is 2 inches, which is a numerical amount.

(2) The color of the sky on day 1 is blue and the color on day 12 is dark blue. These values give no quantitative degree of difference, but day 12 is relatively darker than day 1, so color of the sky is a qualitative variable.
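The distinction can be sketched in a few lines of Python. This is only an illustration using the values from the two examples above; the names snowfall_inches, sky_color, and is_quantitative are made up for this sketch.

```python
# Values taken from the two examples above; the variable names are hypothetical.
snowfall_inches = {"day 1": 6, "day 7": 4}             # quantitative variable
sky_color = {"day 1": "blue", "day 12": "dark blue"}   # qualitative variable

# A quantitative variable supports arithmetic on its values.
difference = snowfall_inches["day 1"] - snowfall_inches["day 7"]
print(f"Snowfall difference: {difference} inches")

# A qualitative variable supports comparing labels, but not subtracting them.
same_color = sky_color["day 1"] == sky_color["day 12"]
print(f"Same sky color on both days? {same_color}")


def is_quantitative(values):
    """Return True when every value is a number (int or float, but not bool)."""
    return all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )


print(is_quantitative(snowfall_inches.values()))  # numeric values
print(is_quantitative(sky_color.values()))        # text labels
```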

4. Know the broad distinction between the fields of Statistics and Probability.

Statistics is a branch of mathematics concerned with the collection, summary, and description of data, and with making inferences based on data analysis.

Probability is a branch of mathematics that deals with the laws of chance or rules of uncertainty.

Example: (1) Using the results of a study to calculate the difference between the highest and lowest values (the range) is statistics, as is drawing a graph to show how often certain values of a variable appear.

(2) Looking at past data to estimate the chance of rain tomorrow is probability.
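As a rough illustration of the two examples above, the sketch below computes a range and a frequency count (statistics) and then uses the relative frequency of past rainy days as a simple estimate of the chance of rain (probability). The data values are invented for illustration only.

```python
from collections import Counter

# Hypothetical study results (for example, test scores).
scores = [72, 85, 85, 90, 64, 72, 85, 98]
# Hypothetical record of the last 10 days: True means it rained.
rained = [True, False, False, True, True, False, False, False, True, False]

# Statistics: summarize the data that were collected.
value_range = max(scores) - min(scores)   # highest value minus lowest value
frequencies = Counter(scores)             # how often each value appears

# Probability: use the relative frequency of past rain as a simple
# estimate of the chance of rain tomorrow.
chance_of_rain = sum(rained) / len(rained)

print(f"Range of scores: {value_range}")
for value, count in sorted(frequencies.items()):
    print(f"{value}: {'*' * count}")      # a crude text bar graph
print(f"Estimated chance of rain: {chance_of_rain:.0%}")
```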

5. Know the distinction between the two branches of statistics: Descriptive and Inferential Statistics.

Descriptive statistics concerns itself with collecting, organizing, summarizing and describing data.

Inferential statistics concerns itself with drawing conclusions or making judgments based on analysis of data.

Example: (1) Conducting a study to find the percentage of students in each ethnic group is an example of descriptive statistics.

(2) Gathering data to make predictions about future events is an example of inferential statistics.

6. Know the meaning of a descriptive value and an inference. *

A descriptive value is a number that describes a population or a sample. A parameter is a descriptive value of a population, and a statistic is a descriptive value of a sample.

An inference in statistics is an estimate of a population parameter based on a statistic computed from a sample of the population.

Example: (1) A histogram or bar graph of sample frequencies gives an estimated frequency, that is, an inference, for each possible value of a variable in the population. (In other words, inference is often a procedure for making generalizations about a population based on the results of a sample.)
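The idea of estimating a parameter from a statistic can be sketched as follows. The population here is simulated, and the figures (1,000 heights centered near 68 inches, a sample of 50) are assumptions chosen only to make the sketch runnable.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: simulated heights (in inches) of 1,000 people.
population = [random.gauss(68, 3) for _ in range(1000)]
mu = statistics.mean(population)        # parameter: describes the population

# Draw a random sample of 50 and compute the same summary on it.
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)         # statistic: describes the sample

# The inference: use the sample statistic as an estimate of the parameter.
print(f"Population mean (parameter): {mu:.2f}")
print(f"Sample mean (statistic):     {x_bar:.2f}")
```

In practice the population mean is usually unknown, which is why the sample mean is used to estimate it.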

7. Know the meaning and recognize examples of a population and sample. *

A population is the entire set of members or observations which have some common observable characteristics or attributes.

A sample is a subset, subgroup, or portion of a population. When the entire population is included in the sample, it is called a census.

Example: (1) A survey of all 8th graders in a school is a sample of the entire student body.

(2) Opinions from every member of the school board are a census of the school board.

8. Know the meaning and recognize characteristics and examples of parameters and statistics. *

A parameter is a numerical property of a population.

A statistic is a numerical property of a sample.

Example: The result of polling one regional group at a conference is a statistic, since many regional groups make up the entire conference. The result of polling all faculty at a university is a parameter if one is concerned only with the results for the university being polled.

9. Know the letters used to indicate parameters and statistics. *

Parameters are symbolized by Greek letters (π, σ, μ, α, etc.), and

Statistics are symbolized by Roman letters (p, s, m, a, etc.).

Example: The standard deviation of a population is given by σ, and the standard deviation of a sample is given by s.
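Python's standard statistics module makes the same distinction: pstdev computes a population standard deviation and stdev computes a sample standard deviation. The sketch below uses made-up data and simply treats one list as a whole population and a shorter list as a sample drawn from it.

```python
import statistics

# Hypothetical data: treat this list as an entire population.
population = [2, 4, 4, 4, 5, 5, 7, 9]
# Treat these four values as a sample taken from that population.
sample = [2, 4, 5, 9]

# Parameter (Greek letter): population standard deviation, divides by N.
sigma = statistics.pstdev(population)

# Statistic (Roman letter): sample standard deviation, divides by n - 1.
s = statistics.stdev(sample)

print(f"sigma = {sigma:.3f}")
print(f"s     = {s:.3f}")
```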

Sampling

Sampling is the technique used to gather data for study, or to examine or determine some experimental outcome. Often a sample is taken as a portion of a larger group, called the population, in an attempt to estimate some parameter of the population. Two important features of most sampling strategies are:

1. Randomness - each item in the population is equally likely to be selected as part of the sample.
2. Representation - samples should be representative of the population characteristics being studied or observed.

1. Know two important characteristics about sampling.

Two main characteristics of a good sample are:

(a) The sample must be representative of the population if it is being used to make inferences about the population. Representation means that the sample should resemble the population in the characteristics being measured.

(b) The sample is taken randomly. That is, all samples of the same size have an equal chance of being selected. A method of obtaining samples with an equal chance of selection is called random sampling.

Example: Each ball in a numbered lottery game has an equal chance of being selected for the draw. Otherwise the game would be considered biased.

Bias - an inclination or preference, especially one that interferes with impartial judgment: prejudice

2. Know how to label samples for random drawing.

If the members of a population are not already numbered in some sequence, one may label each member from 1 to n, where n is the total number of subjects in the population. A random number generator then produces numbers from 1 to n, with each number equally likely. Select members from the population based on their numbered labels.

Example: Suppose we have 100 candidates for the prize in a random drawing. We label each candidate according to the table below, then generate a random number between 1 and 100.

Label      1    2    3    ..  32   33   ..  97    98   99    100
Candidate  Sam  Joe  Kay  ..  Kim  Bob  ..  Mike  Jay  Noel  Sue

If the number 33 is drawn, then Bob wins the prize. (This is called simple random sampling.)
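The labeling and drawing procedure can be sketched in Python as below. Only the nine names shown in the table come from the example; the remaining labels are filled with placeholder names, and draw_winner is a hypothetical helper written for this sketch.

```python
import random

# Names shown in the table; every other label gets a placeholder name.
known = {1: "Sam", 2: "Joe", 3: "Kay", 32: "Kim", 33: "Bob",
         97: "Mike", 98: "Jay", 99: "Noel", 100: "Sue"}
candidates = {
    label: known.get(label, f"Candidate {label}") for label in range(1, 101)
}


def draw_winner(rng):
    """Pick one label from 1 to 100, each equally likely, and look up the name."""
    label = rng.randint(1, 100)   # randint includes both endpoints
    return label, candidates[label]


label, winner = draw_winner(random.Random())
print(f"Label {label} was drawn, so {winner} wins the prize.")
```

To draw several distinct winners instead of one, random.sample(range(1, 101), k) returns k different labels.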

3. Know the meaning of and recognize examples of various types of samples.

I. Stratified sample - If a population is divided into subgroups and we take a random sample from each, we have a stratified sample. Each subgroup is called a stratum (plural: strata), and the members of a stratum share similar characteristics or attributes.

Subgroups are often homogeneous internally.

Example: Suppose the racial makeup of a town is 60% Whites, 20% Blacks, 10% Native American Indians, and the rest other or mixed ethnic groups. A stratified sample of 100 people could be 60 Whites, 20 Blacks, 10 Native American Indians, and 10 Others.
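A proportional stratified sample like the one in this example can be sketched as follows. The town size of 1,000 and the function name stratified_sample are assumptions made for illustration; the percentages are the ones given in the example.

```python
import random


def stratified_sample(strata, total, rng):
    """Randomly sample each stratum in proportion to its share of the population."""
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(total * len(members) / population_size)
        sample[name] = rng.sample(members, k)   # random draw within the stratum
    return sample


# Hypothetical town of 1,000 people with the makeup from the example.
counts = {"White": 600, "Black": 200, "Native American Indian": 100, "Other": 100}
strata = {
    name: [f"{name} #{i}" for i in range(1, count + 1)]
    for name, count in counts.items()
}

sample = stratified_sample(strata, 100, random.Random(42))
sizes = {name: len(chosen) for name, chosen in sample.items()}
print(sizes)
```

With shares that do not divide evenly, the rounded stratum sizes may not add up exactly to the requested total, so a real design would need a rule for the leftover.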

II. Cluster sample - The population is divided into subgroups (clusters), and only some of the subgroups are sampled, often by selecting all members of the selected subgroups.

Subgroups are often diverse internally.

Example: In the stratified sample example above, if only some of the subgroups were selected (for instance, 30 Whites and 10 Blacks, with no one from the remaining subgroups), we would have a cluster sample.
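A minimal sketch of cluster sampling, assuming the subgroups are natural groupings such as town blocks: pick a few subgroups at random and keep every member of each one. The block and household names, and the helper cluster_sample, are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, rng):
    """Choose n_clusters subgroups at random and keep all of their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [member for name in chosen for member in clusters[name]]
    return chosen, members


# Hypothetical town: 10 blocks with 5 households each.
clusters = {
    f"block {b}": [f"block {b} / household {h}" for h in range(1, 6)]
    for b in range(1, 11)
}

chosen, members = cluster_sample(clusters, 3, random.Random(7))
print("Selected blocks:", chosen)
print("Households surveyed:", len(members))
```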

A special case of cluster sampling is multistage cluster sampling, in which a sample of subgroups is taken, then a sample of subgroups within each selected subgroup, and so on.

Example: Given in text - A US census may perform multistage sampling in this manner:

First a sample of some Counties

Then a sample of towns within the counties

Then a sample of streets within the towns

Then a sample of households on the streets

III. Systematic sampling - Sampling from a list of the members of a population by starting at a randomly chosen point and then selecting every nth member of the list, where n is a predetermined interval.

Example: A survey of the members of a company may label each member on a list and then, starting at some random point, select every 5th member on the list, where n is 5.
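Systematic sampling maps naturally onto list slicing. The sketch below assumes a company list of 100 placeholder names and, for simplicity, picks the random starting point within the first n entries; systematic_sample is a hypothetical helper.

```python
import random


def systematic_sample(members, n, rng):
    """Start at a random position among the first n entries, then take every nth."""
    start = rng.randrange(n)      # 0, 1, ..., n - 1, each equally likely
    return members[start::n]


# Hypothetical company list of 100 members.
members = [f"Employee {i}" for i in range(1, 101)]

sample = systematic_sample(members, 5, random.Random(3))
print(sample[:4], "...")
print("Sample size:", len(sample))
```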

IV. Samples of convenience - Samples that are not taken at random. Usually the sample is chosen because it is readily available, not by any chance process.

Example: I may sample students taking this course to evaluate the effectiveness of online lectures.

4. Understand bias and the role it plays in sampling

Bias - an inclination or preference, especially one that interferes with impartial judgment or introduces prejudice.

I. Nonresponse bias - Nonresponse bias occurs when certain members of the population are not represented in the sample. If one can show that the omitted group is similar to the sample, this bias may be minimized. Nonresponse bias should be avoided.

Example: A telephone survey that omits respondents with cell phones.

II. Response bias - Response bias occurs when either the respondents falsify their responses or the methodology for obtaining the responses is flawed.

Example: Asking respondents to answer Yes or No to the question, "Have you ever used an illegal drug?" or "Have you ever committed a crime?"

Design of Experiments

An experiment is any activity designed to study outcomes, results, or the possibilities of chance. The set of all possible outcomes is called the sample space. Design of experiments is a scientific approach to setting up and administering experimental studies so as to form unbiased inferences from the results of the experiment.

1. Know the scientific method of experiment.

1. Problem statement - describe the problem

2. Hypothesis - State a prediction or expectation about the outcome of the experiment

3. Methods - Select appropriate methods to carry out the experiment

4. Procedure - Describe the methods and carry out the experiment using the selected methodology

5. Data / Results - Summarize the data obtained with the selected methodology to present the results

6. Conclusion - Draw a conclusion from the results of the experiment

2. Know the method used in statistical hypothesis testing.

1. Problem Statement - formulate problem statement

2. Make a hypothesis about the problem; state the null hypothesis, H₀

3. Select an appropriate test statistic or statistical methodology

4. Conduct the experiment

5. Summarize the data according to the statistical methodology

6. Compare the test statistic with the value expected under the hypothesis

7. Draw a conclusion

8. Make an inference
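The steps above can be sketched with a small worked case. Everything specific here is an assumption made for illustration: the problem (is a coin fair?), the data (60 heads in 100 flips), the 0.05 significance level, and the choice of an exact two-sided binomial test, which is only one of several reasonable methodologies.

```python
from math import comb


def binomial_two_sided_p_value(successes, trials, p0=0.5):
    """Exact two-sided p-value: the total probability, under H0, of every
    outcome that is no more likely than the observed one."""
    def pmf(k):
        return comb(trials, k) * p0**k * (1 - p0) ** (trials - k)

    observed = pmf(successes)
    # The small tolerance keeps outcomes tied with the observed one.
    return sum(pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1 + 1e-9))


# Steps 1-2. Problem: is the coin fair?  H0: P(heads) = 0.5.
# Step 3.    Test statistic: number of heads in 100 flips.
# Steps 4-5. Hypothetical experiment result, summarized as a count.
heads, flips, alpha = 60, 100, 0.05

# Step 6. Compare the result with what H0 predicts.
p_value = binomial_two_sided_p_value(heads, flips)

# Steps 7-8. Conclusion and inference.
reject = p_value < alpha
print(f"p-value = {p_value:.4f}")
print("Reject H0" if reject else "Do not reject H0")
```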

3. Know the meaning of treatment and the types of treatment groups.

The treatment is the property or characteristic being studied.

The treatment group is the group possessing the property being studied.

The control group is the group not possessing the property being studied. Every experiment should have a control group.

4. Know what is meant by confounding factor.

A confounding factor is a property or characteristic, other than the treatment, that can affect an experimental study.

Example: If the study is about the effect of time of day (the treatment) on mood swings and the experiment does not consider time of year, which could also affect mood swings, then time of year is a confounding factor.

5. Know the various types of experiments.

A controlled experiment is an experiment in which the researcher has control over which subjects receive the treatment.

Example: Giving or not giving a drug to certain subjects can be set up as a controlled experiment.

An observational study is a study in which the researcher does not have control over which subjects do or do not get the treatment. All studies that are not controlled experiments are observational studies.

Example: Observing behavior on a beach after 12 noon.

Even though both of the above types of studies may have a control group, a controlled experiment requires that the researcher decide who does or does not get the treatment.

A randomized controlled experiment is a controlled experiment in which subjects are assigned to the treatment group or the control group at random.

Example: Passersby are assigned at random either to taste and judge a new soft drink (treatment group) or to taste the old soft drink (control group).
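One simple way to let chance decide who gets the treatment is to shuffle the subjects and split the list in half. The sketch below assumes 20 placeholder subjects and an even split; assign_groups is a hypothetical helper, not a standard routine.

```python
import random


def assign_groups(subjects, rng):
    """Shuffle a copy of the subjects and split it into treatment and control."""
    shuffled = list(subjects)     # copy, so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical passersby recruited for the taste test.
subjects = [f"Passerby {i}" for i in range(1, 21)]

treatment, control = assign_groups(subjects, random.Random())
print("New soft drink (treatment):", treatment)
print("Old soft drink (control):  ", control)
```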

A double-blind randomized controlled experiment is an experiment in which neither the subjects nor those administering the treatment know which subjects are assigned to the treatment group or the control group.

Example: Consumer product samples are wrapped in similar packaging, and the treatment and control samples are labeled so that only the researcher can tell them apart, not the subjects or those evaluating the subjects' responses.

6. Know the difference between the two types of studies.

A cross-sectional study is a study in which different subjects are compared to one another at the same point in time.

Example: A study of the effect of coffee on health that compares coffee drinkers and non-coffee drinkers by studying both groups at the same time.

A longitudinal study is a study in which subjects are compared to themselves at different points in time (comparing the same subjects over time).

Example: A study of the effects of coffee on health that examines coffee drinkers before and after they stopped drinking coffee.

Studying two different groups for the same treatment at different points in time is not a longitudinal study.
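The two designs lead to different comparisons, which a short sketch can make concrete. The measurements below (resting heart rates) are invented purely for illustration: the cross-sectional comparison is between two different groups, while the longitudinal comparison is a per-subject change.

```python
from statistics import mean

# Cross-sectional: two different groups measured at the same point in time.
coffee_drinkers = [72, 75, 78, 80]   # hypothetical resting heart rates
non_drinkers = [68, 70, 74, 76]
cross_sectional_gap = mean(coffee_drinkers) - mean(non_drinkers)

# Longitudinal: the same subjects measured before and after stopping coffee.
before = {"A": 72, "B": 75, "C": 78, "D": 80}
after = {"A": 70, "B": 74, "C": 75, "D": 77}
changes = {subject: after[subject] - before[subject] for subject in before}
mean_change = mean(list(changes.values()))

print(f"Difference between groups: {cross_sectional_gap}")
print(f"Average change within subjects: {mean_change}")
```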