STATISTICS is about how to COLLECT, ORGANIZE, SUMMARIZE, ANALYZE, and INTERPRET collected data.

Example: Suppose we are conducting a study about the people who have climbed Mt. Apo.

Q: Who are the individuals of interest here?

If possible, all the people who have actually made it to the summit of Mt. Apo. This is the population of interest here.

Q: What are the characteristics of these individuals we would like to observe or measure which are possibly relevant to our study?

Height, Age, Weight, Gender, Nationality, Income, etc.

These are the variables in our study.

BASIC IDEAS

In a statistical study, we concentrate on individuals or objects which are pertinent to its goals.

The POPULATION is the complete collection of all individuals or objects which are of interest to the study.

Since it is not always possible to use the entire population for a study (due to expense, time, size, etc.), we select only certain portions of the population which possess the same or similar characteristics of interest to the study.

A SAMPLE is a group of individuals or objects selected from a population (to represent the larger population).

A characteristic of individuals or objects being measured in the study is called a VARIABLE. The values of these variables are called DATA or DATA VALUES.

Example: In a study about the people who have climbed Mt. Apo,

Variable       Data Values
Height         5.5ft, 5.7ft, 5.9ft, 6ft, …
Weight         75kg, 62kg, 68kg, 70.3kg, …
Age            22yrs, 28yrs, 29yrs, …
Gender         M (Male), F (Female)
Nationality    Filipino, Chinese, …

As we can see from the example, we have two kinds of variables, depending on the kind of data they yield:

QUALITATIVE VARIABLES yield data which describe, label, or categorize an element of the population. Also called ATTRIBUTE or CATEGORICAL VARIABLES.

QUANTITATIVE VARIABLES yield data which numerically measure an element of the population. Also called NUMERICAL VARIABLES.

Example: A nationwide survey of adults asks, “How many times per week do you eat in a fast-food restaurant?”

A. What is the implied population of this survey?

B. Identify the variable.

C. Is the variable qualitative or quantitative?

Quantitative variables are more interesting since they yield data which can meaningfully undergo arithmetic operations. We have two types of quantitative variables:

The DISCRETE type yields data values which are countable.

Examples:

A. The number of dependents an employee has

B. The number of topics in a three-unit algebra course

C. The number of times per week a student eats in McDonald’s

The CONTINUOUS type yields data values which can lie anywhere in an interval (and are, therefore, not countable).

Examples:

A. The height of a mountain climber

B. The weight of a power lifter

VARIABLE / DATA
    QUALITATIVE
    QUANTITATIVE
        DISCRETE
        CONTINUOUS

Besides this classification of variables (according to the type of data they yield), we can also distinguish the scale, level, or depth of measuring variables (that is, the depth of arithmetic we can perform on their data). We have four levels of measurement:

Data on the NOMINAL LEVEL consists of names, labels, or categories, where no ranking or ordering can be applied.

Examples: Gender, Eye color, Religion, Nationality, Zip code

Data on the ORDINAL LEVEL also consists of names, labels, or categories, but ranking or ordering can be applied.

Examples: Grade (A, B+, B, …, F), Performance rating (Poor, Fair, Good, Very Good, Excellent), T-shirt size (XS, S, M, L, XL)

Qualitative variables belong to either the nominal or the ordinal level of measurement.
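The ranking that separates ordinal data from nominal data can be sketched in a few lines of Python. This is only an illustration: the list `shirts` is made-up data, and `SIZE_ORDER` and `rank` are hypothetical names introduced here. The point is that the order comes from the meaning of the categories, not from the spelling of their labels.

```python
# T-shirt sizes are ordinal: the categories have an agreed order.
SIZE_ORDER = ["XS", "S", "M", "L", "XL"]

def rank(size):
    """Return the position of a size in the agreed order (0 = smallest)."""
    return SIZE_ORDER.index(size)

# Made-up observations, for illustration only.
shirts = ["L", "XS", "XL", "M", "S", "M"]

# Sorting by rank respects the ordinal scale.
ranked = sorted(shirts, key=rank)

# Sorting the labels alphabetically does not; it only orders the spelling.
alphabetical = sorted(shirts)

print("ranked by size:  ", ranked)
print("alphabetical:    ", alphabetical)
```

For nominal data such as eye color or nationality, no list like `SIZE_ORDER` can be written down, which is exactly what "no ranking or ordering can be applied" means.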

Data on the INTERVAL LEVEL can be ranked or ordered, and precise differences between these values are meaningful.

Examples:

A. IQ scores (50 – 200)

IQ scores can be ranked (obviously) from lowest to highest.

A difference of 20 in the IQ scores of two students (say, Student 1 has an IQ score of 122 and Student 2 has 142) means: Student 2 is able to achieve more academically.

B. Temperature (ºC)

Temperature can be ranked from lowest to highest.

A difference of 23 ºC in the temperatures of two objects (say, Object 1 has a temperature of 54ºC and Object 2 has 31ºC) means: Object 1 is warmer.

Data on the RATIO LEVEL can be ranked or ordered, and both differences and ratios (quotients) between these values are meaningful.

Examples:

A. Height

Height can be ranked from shortest to tallest.

Suppose Object 1 is 3ft tall and Object 2 is 12ft tall. The difference, 12 – 3 = 9ft, means Object 2 is taller by 9ft.

We can also divide the values. The quotient, 12 / 3 = 4, means Object 2 is four times as tall as Object 1.

B. Other examples are: Weight, Age, Salary

Quantitative variables belong to either interval or ratio level of measurement.
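One way to see why ratios are meaningful at the ratio level but not at the interval level is to change the unit of measurement. The sketch below reuses the temperatures (54ºC, 31ºC) and heights (3ft, 12ft) from the examples; the conversions to Fahrenheit and metres, and the function names, are additions made here for illustration and are not part of the original slides.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

def ft_to_m(feet):
    """Convert feet to metres."""
    return feet * 0.3048

# Interval level: the quotient of two temperatures depends on the unit,
# because neither 0 ºC nor 0 ºF means "no temperature at all".
t1_c, t2_c = 54.0, 31.0
ratio_c = t1_c / t2_c
ratio_f = c_to_f(t1_c) / c_to_f(t2_c)

# Ratio level: the quotient of two heights is the same in any unit,
# because a height of 0 means "no height" in feet and in metres alike.
h1_ft, h2_ft = 3.0, 12.0
ratio_ft = h2_ft / h1_ft
ratio_m = ft_to_m(h2_ft) / ft_to_m(h1_ft)

print(f"temperature quotient: {ratio_c:.3f} in Celsius, {ratio_f:.3f} in Fahrenheit")
print(f"height quotient:      {ratio_ft:.3f} in feet, {ratio_m:.3f} in metres")
```

The two temperature quotients disagree, so "Object 1 is so many times warmer" has no fixed meaning, while the height quotient stays at 4 whichever unit is used.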

Qualitative variables:   NOMINAL LEVEL, ORDINAL LEVEL
Quantitative variables:  INTERVAL LEVEL, RATIO LEVEL

THE “AVERAGE”

What we call the “average” is the most popular statistical measurement we know of.

average = (sum of scores) / (no. of scores)

Example: The results of a 50-item diagnostic exam in Math 1 on a class of 30 students are as follows:

43 23 34 40 36 47

45 25 32 35 32 31

22 41 33 34 23 24

35 37 41 32 22 20

33 36 40 32 35 39
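The formula for the average can be applied to these 30 scores directly. The following is a minimal Python sketch; the function name `average` and the variable names are chosen here for illustration.

```python
# Scores copied from the table above (30 students, 50-item exam).
scores = [
    43, 23, 34, 40, 36, 47,
    45, 25, 32, 35, 32, 31,
    22, 41, 33, 34, 23, 24,
    35, 37, 41, 32, 22, 20,
    33, 36, 40, 32, 35, 39,
]

def average(values):
    """Return the sum of the values divided by the number of values."""
    return sum(values) / len(values)

population_average = average(scores)

# Prints: 30 scores, total 1002, average 33.4
print(f"{len(scores)} scores, total {sum(scores)}, average {population_average:.1f}")
```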

A. The population is: the class of 30 students.

B. The population average is: 33.4

C. Select randomly a sample of size 10 and find the average. Sample ave. = 34.5

D. Again. Sample ave. = 30.3

E. And again. Sample ave. = 36.4

Take note:

The population average is fixed at 33.4.

The sample averages vary, but stay close to the population average. In other words, the sample averages give a good estimate of the population average.

This is why, in general, when the population of interest to the study is very large, we collect and study adequately selected, smaller samples instead.
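The behaviour noted above can be tried out with a short simulation. This is only a sketch: the seed, the number of repetitions (5), and the use of Python's `random` module are assumptions made here, so the sample averages it prints will generally differ from the three quoted in the example.

```python
import random

# The 30 exam scores from the example (the whole population).
scores = [
    43, 23, 34, 40, 36, 47,
    45, 25, 32, 35, 32, 31,
    22, 41, 33, 34, 23, 24,
    35, 37, 41, 32, 22, 20,
    33, 36, 40, 32, 35, 39,
]

# The population average is computed once from all the data and never changes.
population_average = sum(scores) / len(scores)

# Arbitrary seed (an assumption) so that repeated runs give the same samples.
rng = random.Random(2024)

sample_averages = []
for _ in range(5):
    sample = rng.sample(scores, 10)  # 10 scores drawn without replacement
    sample_averages.append(sum(sample) / len(sample))

print(f"population average: {population_average:.1f}")
for i, value in enumerate(sample_averages, start=1):
    print(f"sample {i} average:   {value:.1f}")
```

Each run of the loop picks a different set of ten students, which is why the sample averages move around while the population average stays put.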

The “average” is one of the central ideas in Statistics. We have other statistical measures of similar importance, such as the median, mode, variance, standard deviation, etc.

For a given population, a numerical measurement obtained from all its data (such as the “average”) is called a PARAMETER. The parameters of a population are fixed quantities, being the actual values of such measurements.

For a sample, a numerical measurement obtained from all data in the sample is called a SAMPLE STATISTIC. Sample statistics are variable (they vary), yet may give good estimates of the corresponding parameter.

HOW DO WE RANDOMIZE SAMPLE SUBJECTS?

We use samples to represent a large population. To do this, the selection of sample subjects must be unbiased; that is, each subject must have an equal chance of being selected.

We have four basic methods of unbiased sampling:

In RANDOM SAMPLING, each member of the population is labeled with a number, then a computer (or calculator) is tasked to generate random numbers from the set of all these numerical labels, as many as desired for the sample size.

In SYSTEMATIC SAMPLING, each member is also labeled with a number and every kth member from the first random subject is selected. For example, suppose there are 2000 subjects in the population, and a sample of size 50 is desired; then k = 2000/50 = 40. The first subject is selected randomly (say member #23); and the next ones would be #63, #103, #143, and so on.
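Both methods can be sketched with Python's standard `random` module, using the numbers from the example (2000 subjects, sample size 50). The seed and the variable names are assumptions made here, and drawing the random starting label from the first k labels is one common convention rather than something the text specifies.

```python
import random

N = 2000                         # population size from the example
n = 50                           # desired sample size
labels = list(range(1, N + 1))   # members labeled #1 to #2000

rng = random.Random(7)           # arbitrary seed (an assumption) for repeatable runs

# RANDOM SAMPLING: let the computer pick n distinct labels.
random_sample = rng.sample(labels, n)

# SYSTEMATIC SAMPLING: pick a random start, then take every kth label.
k = N // n                       # 2000 / 50 = 40
start = rng.randint(1, k)        # assumed convention: start among the first k labels
systematic_sample = list(range(start, N + 1, k))

# The worked example from the text, with the first subject fixed at #23.
example_sample = list(range(23, N + 1, k))

print("random sample (first 5):     ", random_sample[:5])
print("systematic sample (first 5): ", systematic_sample[:5])
print("example from the text:       ", example_sample[:4])  # [23, 63, 103, 143]
```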

In STRATIFIED SAMPLING, the population is divided into groups (called strata) and samples within the strata (of any desired size) are randomly selected.

In CLUSTER SAMPLING, the population is also divided into groups (called clusters), then some of these clusters are selected and all the members of these selected clusters are used as sample subjects.

In all of these methods, a random element is involved. RANDOM does not mean a haphazard personal selection. This is why computers are employed in implementing random selections.
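Stratified and cluster sampling can be contrasted in a short Python sketch. Everything in it is hypothetical: the three groups, their sizes, the choice of 3 subjects per stratum and 2 clusters, and the seed are made up for illustration only.

```python
import random

rng = random.Random(11)  # arbitrary seed (an assumption) for repeatable runs

# A made-up population of 60 labeled members, already divided into groups.
groups = {
    "group_A": list(range(1, 21)),    # members #1 to #20
    "group_B": list(range(21, 51)),   # members #21 to #50
    "group_C": list(range(51, 61)),   # members #51 to #60
}

# STRATIFIED SAMPLING: every group (stratum) is used,
# and a random sample is drawn inside each one.
per_stratum = 3
stratified_sample = {
    name: rng.sample(members, per_stratum)
    for name, members in groups.items()
}

# CLUSTER SAMPLING: only some groups (clusters) are chosen at random,
# and ALL members of the chosen clusters become sample subjects.
chosen_clusters = rng.sample(sorted(groups), 2)
cluster_sample = [member for name in chosen_clusters for member in groups[name]]

print("stratified sample:", stratified_sample)
print("chosen clusters:  ", chosen_clusters)
print("cluster sample size:", len(cluster_sample))
```

The difference lies in where the randomness enters: stratified sampling randomizes within every group, while cluster sampling randomizes the choice of groups.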