STA 291 Spring 2010

24
STA 291 Spring 2010 Lecture 1 Dustin Lueker

description

STA 291 Spring 2010. Lecture 1 Dustin Lueker. Topics. Statistical terminology Descriptive statistics Probability and distribution functions Inferential statistics Estimation (confidence intervals) Hypothesis testing Simple linear regression and correlation. Why study Statistics?. - PowerPoint PPT Presentation

Transcript of STA 291 Spring 2010

Page 1: STA 291 Spring 2010

STA 291Spring 2010

Lecture 1Dustin Lueker

Page 2: STA 291 Spring 2010

Statistical terminology Descriptive statistics Probability and distribution functions Inferential statistics

◦ Estimation (confidence intervals)◦ Hypothesis testing

Simple linear regression and correlation

Topics

STA 291 Spring 2010 Lecture 1

Page 3: STA 291 Spring 2010

Research in all fields is becoming more quantitative◦ Research journals◦ Most graduates will need to be familiar with basic

statistical methodology and terminology Newspapers, advertising, surveys, etc.

◦ Many statements contain statistical arguments Computers make complex statistical

methods easier to use

Why study Statistics?

STA 291 Spring 2010 Lecture 1

Page 4: STA 291 Spring 2010

Many times statistics are used in an incorrect and misleading manner

Purposely misused◦ Companies/people wanting to further their

agenda Cooking the data

Completely making up data Massaging the numbers

Altering values to get desired result Incidentally misused

◦ Using inappropriate methods Vital to understand a method before using it

Lies, Damn Lies, and Statistics

STA 291 Spring 2010 Lecture 1

Page 5: STA 291 Spring 2010

Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data

Applicable to a wide variety of academic disciplines◦ Physical sciences◦ Social sciences◦ Humanities

Statistics are used for making informed decisions◦ Business◦ Government

What is Statistics?

STA 291 Spring 2010 Lecture 1

Page 6: STA 291 Spring 2010

Design •Planning research studies•How to best obtain the required data•Assuring that our data is representational of the entire population

Description •Summarizing data•Exploring patterns in the data•Extract/condense information

Inference •Make predictions based on the data•‘Infer’ from sample to population•Summarize results

General Statistical Methodology

STA 291 Spring 2010 Lecture 1

Page 7: STA 291 Spring 2010

Population◦ Total set of all subjects of interest

Entire group of people, animals, products, etc. about which we want information

Elementary Unit◦ Any individual member of the population

Sample◦ Subset of the population from which the study

actually collects information◦ Used to draw conclusions about the whole

population

Basic Terminology

STA 291 Spring 2010 Lecture 1

Page 8: STA 291 Spring 2010

Variable◦ A characteristic of a unit that can vary among

subjects in the population/sample Ex: gender, nationality, age, income, hair color, height,

disease status, state of residence, grade in STA 291 Parameter

◦ Numerical characteristic of the population Calculated using the whole population

Statistic◦ Numerical characteristic of the sample

Calculated using the sample

Basic Terminology

STA 291 Spring 2010 Lecture 1

Page 9: STA 291 Spring 2010

Why take a sample? Why not take a census? Why not measure all of the units in the population?◦ Accuracy

May not be able to find every unit in the population◦ Time

Speed of response from units◦ Money◦ Infinite Population◦ Destructive Sampling or Testing

Data Collection and Sampling Theory

STA 291 Spring 2010 Lecture 1

Page 10: STA 291 Spring 2010

University Health Services at UK conducts a survey about alcohol abuse among students◦ 200 of the students are sampled and asked to

complete a questionnaire◦ One question is “have you regretted something

you did while drinking?” What is the population? What is the sample?

Example

STA 291 Spring 2010 Lecture 1

Page 11: STA 291 Spring 2010

‘Flavors’ of Statistics Descriptive Statistics

◦ Summarizing the information in a collection of data

Inferential Statistics◦ Using information from a sample to make

conclusions/predictions about the population Ex: using a sample statistic to estimate a population

parameter

STA 291 Spring 2010 Lecture 1

Page 12: STA 291 Spring 2010

Example The Current Population Survey of about 60,000

households in the United States in 2002 distinguishes three types of families: Married-couple (MC), Female householder and no husband (FH), Male householder and no wife (MH)

It indicated that 5.3% of “MC”, 26.5% of “FH”, and 12.1% of “MH” families have annual income below the poverty level◦ Are these numbers statistics or parameters?

The report says that the percentage of all “FH” families in the USA with income below the poverty level is at least 25.5% but no greater than 27.5%◦ Is this an example of descriptive or inferential statistics?

STA 291 Spring 2010 Lecture 1

Page 13: STA 291 Spring 2010

Univariate vs. Multivariate Univariate data

◦ Consists of observations on a single attribute Multivariate data

◦ Consists of observations on several attributes Special case

Bivariate Data Consists of observations on two attributes

STA 291 Spring 2010 Lecture 1

Page 14: STA 291 Spring 2010

Quantitative or Numerical◦ Variable with numerical values associated with

them Qualitative or Categorical

◦ Variables without numerical values associated with them

Scales of Measurement

STA 291 Spring 2010 Lecture 1

Page 15: STA 291 Spring 2010

Ordinal◦ Disease status, company rating, grade in STA 291

Ordinal variables have a scale of ordered categories, they are often treated in a quantitative manner (A = 4.0, B = 3.0, etc.) One unit can have more of a certain property than

another unit Nominal

◦ Gender, nationality, hair color, state of residence Nominal variables have a scale of unordered

categories It does not make sense to say, for example, that green

hair is greater/higher/better than orange hair

Qualitative Variables

STA 291 Spring 2010 Lecture 1

Page 16: STA 291 Spring 2010

Quantitative◦ Age, income, height

Quantitative variables are measured numerically, that is, for each subject a number is observed The scale for quantitative variables is called interval

scale

Quantitative Variables

STA 291 Spring 2010 Lecture 1

Page 17: STA 291 Spring 2010

A study about oral hygiene and periodontal conditions among institutionalized elderly measured the following◦ Nominal (Qualitative): Requires assistance from staff?

Yes No

◦ Ordinal (Qualitative): Plaque score No visible plaque Small amounts of plaque Moderate amounts of plaque Abundant plaque

◦ Interval (Quantitative): Number of teeth

Example

STA 291 Spring 2010 Lecture 1

Page 18: STA 291 Spring 2010

A birth registry database collects the following information on newborns◦ Birth weight: in grams◦ Infant’s Condition:

Excellent Good Fair Poor

◦ Number of prenatal visits◦ Ethnic background:

African-American Caucasian Hispanic Native American Other

What are the appropriate scales? Quantitative (Interval) Qualitative (Ordinal, Nominal)

Example

STA 291 Spring 2010 Lecture 1

Page 19: STA 291 Spring 2010

Statistical methods vary for quantitative and qualitative variables

Methods for quantitative data cannot be used to analyze qualitative data

Quantitative variables can be treated in a less quantitative manner◦ Height: measured in cm/in

Interval (Quantitative) Can be treated at Qualitative

Ordinal: Short Average Tall

Nominal: <60in or >72in 60in-72in

Importance of Different Types of Data

STA 291 Spring 2010 Lecture 1

Page 20: STA 291 Spring 2010

Try to measure variables as detailed as possible◦ Quantitative

More detailed data can be analyzed in further depth

◦ Caution: Sometimes ordinal variables are treated as quantitative (ex: GPA)

Other Notes on Variable Types

STA 291 Spring 2010 Lecture 1

Page 21: STA 291 Spring 2010

A variable is discrete if it can take on a finite number of values◦ Gender◦ Nationality◦ Hair color◦ Disease status◦ Grade in STA 291◦ Favorite MLB team

Qualitative variables are discrete

Discrete Variables

STA 291 Spring 2010 Lecture 1

Page 22: STA 291 Spring 2010

Continuous variables can take an infinite continuum of possible real number values◦ Time spent studying for STA 291 per day

43 minutes 2 minutes 27.487 minutes 27.48682 minutes

Can be subdivided into more accurate values Therefore continuous

Continuous Variables

STA 291 Spring 2010 Lecture 1

Page 23: STA 291 Spring 2010

Number of children in a family Distance a car travels on a tank of gas % grade on an exam

Examples

STA 291 Spring 2010 Lecture 1

Page 24: STA 291 Spring 2010

Quantitative variables can be discrete or continuous

Age, income, height?◦ Depends on the scale

Age is potentially continuous, but usually measured in years (discrete)

Discrete or Continuous

STA 291 Spring 2010 Lecture 1