Yeah Right, mm mmm?
Tony 
 
Yen and the art of learning statistics without really trying

A Chance Meeting

Statistics is Fun!
Rose 

Introduction to Probability and Statistics
Terms are defined by clicking on the blue text.
Worked out Examples

This is the story of Tony, who met a stranger in the cafeteria next to Bell Hall; the rest is statistics.

Tony: "You and another person are walking down the street. The other person turns to you and says: 'Let's flip a coin to see who will buy lunch. What will it be, Heads or Tails?' (If it is a Head, the coin flipper buys; if it is a Tail, the other person buys.)"

Rose: "I will call Heads. Some people think that the coin, a quarter, is fair, favoring neither heads nor tails, but it favors heads because the quarter is flatter on the tail side; therefore it is biased towards heads."

Rose: "Here is a question for you. What are the possible outcomes (sample space) for flipping a coin (a quarter), and what are the probabilities of each outcome?"

Tony: "That's simple, a Head or a Tail and the probabilities are 50% or 0.50 or ½ for each."

Rose: "What about the possibility of getting neither Heads nor Tails? I once observed a coin standing on its edge after it was tossed."
 
Note: the probability of an event, say getting a Tail when tossing a fair coin, is the number of ways a Tail can occur divided by the total number of equally likely possible outcomes.

Probability of A = (Number of ways A can occur) / (Total number of possible outcomes).

When a coin is flipped, the probability of Tails is Prob(Tail) or Pr(T) or P(T) = ½.

A Tail can occur in 1 way, and since the possible outcomes are Head or Tail, the total number of possible outcomes is 2.

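If H denotes Head and T denotes Tail, what is the probability of getting two Heads if you toss two coins? (It does not matter which coin gets heads or tails.)

Ignoring which coin shows which face, the possible results are HH, HT, TT. These three results are not equally likely, however, because one Head and one Tail can occur in 2 ways (HT or TH). The sample space of equally likely outcomes is HH, HT, TH, TT, so its size is 4, and there is 1 way of getting two heads, so Prob(2 Heads) or P(HH) = ¼.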
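The two-coin question can be checked by listing every equally likely outcome and counting. The following is a minimal Python sketch of that counting, assuming two fair coins; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Every ordered outcome of tossing two fair coins is equally likely.
sample_space = ["".join(toss) for toss in product("HT", repeat=2)]

# Event of interest: both coins show Heads.
favorable = [outcome for outcome in sample_space if outcome == "HH"]

p_two_heads = Fraction(len(favorable), len(sample_space))
print(sample_space)   # ['HH', 'HT', 'TH', 'TT']
print(p_two_heads)    # 1/4
```

Listing HT and TH separately is what keeps the outcomes equally likely, even though the question does not care which coin shows the Head.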

What is the probability of getting an even number, P(even), when you roll a die? (Answer: ½, since the sample space is 1, 2, 3, 4, 5, 6 and the even numbers among these are 2, 4, 6, so P(even) = 3/6 = ½.)
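The rule "number of ways the event can occur divided by the total number of possible outcomes" can be written as a small helper. This is only a sketch: the function name `probability` is hypothetical, and it assumes every outcome in the sample space is equally likely.

```python
from fractions import Fraction

def probability(event, sample_space):
    """Classical probability: favorable outcomes / total equally likely outcomes."""
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))

die = [1, 2, 3, 4, 5, 6]
even = {2, 4, 6}
print(probability(even, die))            # 1/2
print(probability({"T"}, ["H", "T"]))    # 1/2
```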

Rose: "Why are you so interested in probability, the science of chance?"

Tony: "I was on vacation in Atlantic City, New Jersey, and watching people play card games got me thinking about what is certain and uncertain about the research experiment I am conducting."

Note: Tony's Experiment: "I was asked to find out whether the people in the ABC company knew about their department's safety policies, and this is what I found:"
 
Type of Persons Surveyed | Number of people who know about Safety Policies | Number of people who don't know about Safety Policies
Regular Employees | 13 | 10
Staff Supervisors | 5 | 0
Managers | 4 | 1
Part-time Employees | 10 | 5
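The counts in Tony's table can be summarized as the proportion of each type of person who knows about the safety policies. The sketch below uses only the counts from the table; the function name `summarize` and the output layout are illustrative assumptions.

```python
# Survey counts from Tony's table: (know, don't know) per type of person.
survey = {
    "Regular Employees": (13, 10),
    "Staff Supervisors": (5, 0),
    "Managers": (4, 1),
    "Part-time Employees": (10, 5),
}

def summarize(counts):
    """Return {group: (number surveyed, proportion who know the policies)}."""
    summary = {}
    for group, (know, dont_know) in counts.items():
        total = know + dont_know
        summary[group] = (total, know / total)
    return summary

summary = summarize(survey)
for group, (total, share) in summary.items():
    print(f"{group}: {total} surveyed, {share:.0%} know the policies")

total_know = sum(know for know, _ in survey.values())
total_surveyed = sum(know + dont for know, dont in survey.values())
print(f"Overall: {total_know} of {total_surveyed} know the policies")
```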

Rose: "Were the results from all the people in the ABC company (the population) or just a portion (the sample)?"

Tony: "It was about ¼ of the total company, which makes it a sample, and since I surveyed the same proportions of employees, staff, and managers as there were in the company, it is a stratified sample." If it had been the entire population, the data collected would be a census.

"Since it is a sample, and I am asked to use the information gathered from it to draw some conclusions by summarizing and describing the data collected (descriptive statistics) and later to use this summary to make predictions or judgments about the population (inference), I will have to generate summary statistics and graphs to help describe how many of each type of company personnel know about their safety policies."

Rose: "It seems like the Type of Person you surveyed is a qualitative characteristic (qualitative variable) with attributes or values such as Employee, Staff, and Manager, and Safety Policy Response is also qualitative, with attributes such as Yes (knows about the safety policy) or No (doesn't know about the safety policy)."

"Can you give me two examples of a quantitative variable (one that can be represented by a numerical value) and the varieties or possible values of each?"

For example, Time is a quantitative variable, and 1 ¼, 12 ½, and 17 ¼ hours (or 1:15 a.m., 12:30 p.m., and 5:15 p.m.) are some possible values of Time. (Can you think of 2 more examples?)

Rose: "Was the sample, that is, each person surveyed, selected randomly? In other words, did all members of the company (population) have an equally likely chance of being selected?"

Tony: "No, it was not a random sample, for I took a sample of convenience; however, it was representative of the company in that it had the same proportions of all the types of members. I understand that a good sample needs to be both (1) random and (2) representative, that is, it reflects the population's characteristics. How could I have selected my sample or collected my responses randomly?"

Rose: "To select a random sample from the company, you must be able to survey any member of the company (randomness) from a list of all members."

"If the sample is homogeneous (not stratified, i.e., you do not care about the types of members who responded), you would select a random sample by following these steps:"

Step 1. Label each member 1 to N, where N is the population size, and let n be the sample size. If the company size is 200, then N = 200, and if you want to select a sample of ¼ of the company, then n = ¼ N = ¼ x 200 = 50.

Example:
 
Label | Random Number (start column 5, row 1) | Member's Name
37 | 37 | John Boxer
25 | 25 | Mary Smith
-- | -- | --
n | etc. | Mita Black

Step 2. Generate a set of n random numbers, or select n numbers between 1 and N from a table of random numbers.

Random Numbers Table or Random Number Generator
(A more powerful program can generate as many random numbers as you need, n, in the range you specify; in this course it is called the n-random number generator. Try it!)

So we need to use the random number table to select 50 numbers between 1 and 200, which requires 3 digits (xxx).

One way to use the table to select 50 numbers from 1 to 200 is to take the first 3 digits after the decimal point and, starting at any point in the table, move in an orderly manner until you have selected 50 numbers. Ignore any 3-digit number from the table that is less than 1 or greater than 200.

Example, starting from the 2nd column, first row (0.708988), and moving downward through each consecutive column; acceptable numbers are listed under Selected Label:
 
Table value | 0.708 | 0.28 | 0.928 | 0.815 | 0.952 | 0.191 | 0.708 | 0.866 | 0.129 | 0.674 | 0.867 | 0.027
Random (3 digits) | 708 | 280 | 928 | 815 | 952 | 191 | 708 | 866 | 129 | 674 | 867 | 27
Selected Label | -- | -- | -- | -- | -- | 191 | -- | -- | 129 | -- | -- | 27

So the first 3 persons out of 50 that would be given the survey are the members whose labels correspond to 191, 129, and 27.
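The table lookup in this example can be sketched in Python. The function name `labels_from_table` is hypothetical, the table values are the twelve entries from the example, and the range is taken as 1 to N with N = 200 as in Step 1.

```python
def labels_from_table(table_values, population_size, digits=3):
    """Keep the first `digits` digits after the decimal point of each table
    value, in order, skipping out-of-range numbers and repeats."""
    selected = []
    for value in table_values:
        fraction = value.split(".")[1]
        # Pad short entries (0.28 means 0.280) and truncate; never round.
        number = int(fraction.ljust(digits, "0")[:digits])
        if 1 <= number <= population_size and number not in selected:
            selected.append(number)
    return selected

# Table values from the example, kept as strings so that no floating-point
# rounding can alter the digits.
table_values = ["0.708", "0.28", "0.928", "0.815", "0.952", "0.191",
                "0.708", "0.866", "0.129", "0.674", "0.867", "0.027"]

print(labels_from_table(table_values, population_size=200))  # [191, 129, 27]
```

Reading the digits from strings rather than multiplying by 1000 is a deliberate choice here, since it truncates exactly as the hand method does.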

Step 3. Continue step 2 until you have selected 50 numbers between 1 and N (here N = 200).

So the first 15 persons out of 50 that would be given the survey are the members whose labels correspond to 191, 129, 27, 35, 51, 3, 199, 80, 39, 92, 111, 85, 141, 55, and 21.

Note: if a number appears more than once, discard the repeats, and do not round digits.
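When a program is used instead of a printed table, Steps 1 to 3 reduce to drawing n distinct labels from 1 to N. The sketch below uses Python's standard `random` module; the function name `n_random_numbers` and the seed value are assumptions for illustration, not the course's own program.

```python
import random

def n_random_numbers(n, population_size, seed=None):
    """Draw n distinct labels between 1 and population_size, each equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(range(1, population_size + 1), n)

labels = n_random_numbers(n=50, population_size=200, seed=1)
print(sorted(labels))
```

Because `sample` draws without replacement, no label can appear twice, which matches the rule of discarding repeated numbers.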
 

Problem Example (a): How would you select 5 random numbers from 1 to 1,500? (Hint, Use 4 digits).

Solution: n = 5, N = 1,500

Random Number Between 1 and N (n such numbers) (1) | Sample Label, same as (1) | Sample Name or Description | Sample Response(s)
0.0947 | 947 | John Boxer | Yes
0.0805 | 805 | Mary Smith | No
0.1207 | 1207 | Jose' Lopez | Yes
0.1291 | 1291 | Ysen, Wu | Yes
0.0273 | 273 | Josephine Williams | No

(b) How would you select a random sample of 50 from among Tony's stratified groups?

(Hint: Divide the 50 among the types of members in proportion to their share of the company, then label and select each group's numbers separately. E.g., if managers represent 10% of the company, then we need to survey 5 of 20 managers, since 10% of 200 (company members) is 20 and 10% of the sample of 50 is 5. Label all 20 managers 1 to 20 and select the first 5 random numbers between 1 and 20.)
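The hint can be sketched as proportional allocation followed by a separate random draw within each group. In the sketch below only the 20 managers (10% of 200) come from the text; the other group sizes are hypothetical values chosen so that the company totals 200, and the function name `stratified_sample` is illustrative.

```python
import random

def stratified_sample(strata_sizes, sample_size, seed=None):
    """Proportional allocation: each stratum contributes the same share of the
    sample as it holds in the population, then labels are drawn at random."""
    rng = random.Random(seed)
    population_size = sum(strata_sizes.values())
    sample = {}
    for stratum, size in strata_sizes.items():
        # Integer arithmetic; assumes the shares divide evenly, as they do here.
        quota = sample_size * size // population_size
        sample[stratum] = sorted(rng.sample(range(1, size + 1), quota))
    return sample

# Hypothetical company make-up (only the Managers figure is from the text).
company = {"Regular Employees": 100, "Staff Supervisors": 20,
           "Managers": 20, "Part-time Employees": 60}

sample = stratified_sample(company, sample_size=50, seed=1)
for stratum, labels in sample.items():
    print(stratum, len(labels), labels)
```

Each group is labeled from 1 to its own size, so the same label may appear in two groups and still refer to different people.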

Use Programs:

Tony: "If I had sampled only managers and part-time employees, that is, a select number of types of members instead of all types, I would have taken a cluster sample. Would my sample be biased or unbiased?"

Rose: "Bias is an inclination or preference, especially one that interferes with impartial judgment or introduces prejudice. Did you do any of these?"

Tony: "Not on purpose, but if I had omitted Staff I may have introduced a selection bias (undercoverage), since not all Member Types (such as Staff) would have been included in my survey."

Rose: "Did you follow appropriate experimental procedures in conducting your experiment?

Did you apply any treatment with a control group? That is, was this an observational study or a controlled experiment?"

Tony: "Mm-mm? I only wanted to observe with a survey, so my experiment was an observational study: no treatment was given, that is, I did not do anything deliberate to control each member's response. There was no control group in my experiment."

"However, since almost ½ of the Staff were on vacation, maybe there is a factor I have not yet considered for this experiment: the effect of vacation season on my survey results (a confounding factor)."

Rose: "Another type of experiment is a double-blind randomized controlled experiment, an experiment in which neither the subjects nor those administering the treatment know which subjects are assigned to the treatment group or the control group, and subjects are selected randomly."

Tony: "I have heard about such experiments, usually in testing the effects of drugs on the treatment of certain ailments. Here the drugs are labeled in such a manner that only the researchers who hold the key know their true description. Subjects with the illness are randomly assigned to receive either the real medicine (the treatment group) or a look-alike placebo (the control group), but neither the subjects nor the folks giving out the drugs know which labeled container contains the real medicine."
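The random assignment step of such an experiment can be sketched as a shuffle followed by a split. Everything named here (the function `assign_groups`, the container codes "A" and "B", and the 20 subject numbers) is a hypothetical illustration rather than a procedure taken from the story.

```python
import random

def assign_groups(subject_ids, seed=None):
    """Randomly split subjects into equal-sized treatment and control groups
    and return a coded key that hides the assignment behind neutral labels."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    treatment, control = shuffled[:half], shuffled[half:]
    # Coded container labels: only the key holder knows "A" is the real drug.
    key = {subject: "A" for subject in treatment}
    key.update({subject: "B" for subject in control})
    return treatment, control, key

treatment, control, key = assign_groups(range(1, 21), seed=1)
print("Treatment:", sorted(treatment))
print("Control:  ", sorted(control))
```

Subjects and the people handing out containers would see only the codes, while the key linking codes to the real medicine stays with whoever holds it until the study ends.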

Rose: "I understood from reading about your experiment that they want you to do a follow-up survey of the same company next April, after the company has had an opportunity to reeducate members on Safety Policies."

"This would be a longitudinal study, since you are studying the same group but at different times. If the company educated some groups and not others on safety, do you see possibilities for control groups and treatment groups, since the treatment is educating or not educating on Safety Policies?"

"If you study, say, two groups at the same time, (a) company members who have been educated on Safety Policies and (b) those who have not, could this study be a cross-sectional study?"

Tony: "You seem to know a lot about Statistics and Probability. All I want is for someone like yourself to guide me so that I am able to do my Ph.D. research with understanding and avoid the tedious drudgery of computations. If I want details, I guess I will have to study a detailed Learning Module on General Statistics and Probability for Research."

Rose: "I will be your mentor, but you can only learn by doing with understanding."

"I will provide you with all the necessary tools for statistical computations and will walk you through each concept of statistical description, summary, or inference you must undertake; however, to maximize this experience it would be helpful if you have Microsoft Internet Explorer 4.01 or above, some knowledge of Excel, or the ability to work out computations using a worksheet with examples like the one below. We will have to meet weekly, and you must work several problems weekly to reinforce your learning experiences."

"Mathematics, especially statistics, is one of those disciplines that are best learned in sequence. A good sequence is one that also follows the typical way research data is collected, summarized, and evaluated, and in which conclusions and inferences are drawn."

Tony: "We will use problems from my various research projects as examples."

"I am Tony, what is your name?"

Rose: "Rose, but my friends call me Rosey."



Programs:

Random Number Generator:

(1)  Web

(2)  Interactive Web (IE). What version of IE are you using? Press Help, then About Internet Explorer, and it will show you the version.

(3)   Excel  (Excel) or

(4) Worksheet.