Important Ideas in Data Analysis for PreK-12 Students, Teachers, and Teacher Educators

Denise S. Mewborn

University of Georgia


Data analysis/statistics…

• helps us answer questions.

• helps us make better decisions.

• helps us describe and understand our world.

• helps us quantify variability.

What questions can we ask?

• Where are you from?

• How did you get here?

• How long are you staying?/What day are you leaving?

• How many times have you been to TEAM?

• What is your day job?

Answering our questions

• Collect data

• Make a graph

Where's home?

[Bar graph: number of people by location (North, South, East, West, Central, Out of State); vertical axis from 0 to 100]

Wait! There’s more!!!!

• Analyze and interpret data
  – Answer the original question
  – Make inferences
  – Make predictions
  – What other questions can we answer with this data display?

Standards 2000 (NCTM Principles and Standards for School Mathematics)

Instructional programs should enable all students to:

• formulate questions that can be addressed with data and collect, organize, and display relevant data to answer them;

• select and use appropriate statistical methods to analyze data;

• develop and evaluate inferences and predictions that are based on data;

GAISE (Guidelines for Assessment and Instruction in Statistics Education): http://www.amstat.org/education/gaise/

Statistical Problem Solving

• Formulate Questions
  – clarify the problem at hand
  – formulate question(s) that can be answered with data

• Collect Data
  – design a plan to collect appropriate data
  – employ the plan to collect the data

• Analyze Data
  – select appropriate graphical or numerical methods
  – use these methods to analyze the data

• Interpret Results
  – interpret the analysis
  – relate the interpretation to the original question

Main Points

• We are not asking enough of students!!!

• We are not providing them with rich enough experiences in data analysis to enable them to move confidently into higher grades or to make sense of the world.

• Statistics is an opportunity to APPLY lots of other mathematical ideas in a context.

• We need to end the “mean-median-mode ad nauseam” pattern we’ve been using.

Big ideas that need more attention

• Context
  – Why do we want to know these things?

• Variability
  – natural vs. induced

• Inference, prediction

THE GAISE FRAMEWORK MODEL 

Formulate Question
– Level A: Beginning awareness of the statistics question distinction
– Level B: Increased awareness of the statistics question distinction
– Level C: Students can make the statistics question distinction

Collect Data
– Level A: Do not yet design for differences
– Level B: Awareness of design for differences
– Level C: Students make designs for differences

Analyze Data
– Level A: Use particular properties of distributions in the context of a specific example
– Level B: Learn to use particular properties of distributions as tools of analysis
– Level C: Understand and use distributions in analysis as a global concept

Interpret Results
– Level A: Do not look beyond the data
– Level B: Acknowledge that looking beyond the data is feasible
– Level C: Able to look beyond the data in some contexts

THE FRAMEWORK MODEL 

Nature of Variability
– Level A: Measurement variability, natural variability, induced variability
– Level B: Sampling variability
– Level C: Chance variability

Focus on Variability
– Level A: Variability within a group
– Level B: Variability within a group and variability between groups; co-variability
– Level C: Variability in model fitting

Classroom Census

• Most common and most appropriate type of data collection for PreK-5

• Involves collecting and analyzing data about us/our classroom

• Examples
  – Favorite ______
  – Type of shoes
  – Lunch count
  – Weather
  – Birthdays
  – Bus riders/car riders/walkers

Type of shoes we’re wearing

• What is the most popular type of shoe in our class today?

Pushing to higher levels

• Formulate questions
  – Allow children to generate questions from a context
    • Tie shoes vs. not tie shoes
    • Tie shoes, slip-on shoes, buckle shoes
    • Shoe color
    • Type of soles
    • Material from which shoe is made

Pushing…

• Collect data
  – What data do we need in order to answer our question?
  – How could we get this data?
    • Use actual shoes
    • Raise hands and count
    • Use Unifix cubes to make towers
    • Use sticky notes to make a graph

Pushing…

• Analyze data
  – Decide on an appropriate graphical representation
  – Describe the shape of the distribution
  – Locate individuals within group data

Pushing…

• Interpret results
  – Answer the original question
  – Make inferences
    • Why might so many people be wearing tie shoes today?
  – Make predictions
    • Would you expect the same results if we collected this data in December?
    • Would we get the same results if we collected data from Ms. Murphy’s class?
    • Would we get the same results if we went to <local business> and collected data?

Pushing…

• Extending to new problems
  – What other questions could we answer with this data?
    • How many more people are wearing tie shoes than slip-on shoes?
    • How many people are wearing tie shoes or buckle shoes?
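All of the shoe questions reduce to counting. The minimal Python sketch below uses a hypothetical list of shoe types for a class of twelve. This list is invented for illustration and is not data from the talk. The standard library's `Counter` tallies the categories and picks out the most popular type, and subtraction and addition answer the two extension questions.

```python
from collections import Counter

# Hypothetical class data: one entry per child (invented for illustration).
shoes = [
    "tie", "slip-on", "tie", "buckle", "tie", "slip-on",
    "tie", "buckle", "tie", "slip-on", "tie", "tie",
]

tally = Counter(shoes)  # how many children wear each type of shoe

# "What is the most popular type of shoe in our class today?"
most_common_type, most_common_count = tally.most_common(1)[0]

# "How many more people are wearing tie shoes than slip-on shoes?"
tie_minus_slip_on = tally["tie"] - tally["slip-on"]

# "How many people are wearing tie shoes or buckle shoes?"
# Each child is in exactly one category, so the two counts can simply be added.
tie_or_buckle = tally["tie"] + tally["buckle"]

print(tally)
print(most_common_type, most_common_count)
print(tie_minus_slip_on, tie_or_buckle)
```

Simple addition answers the “tie or buckle” question only because no child is counted twice. Categories that can overlap, such as shoe color and sole type, would require subtracting the overlap.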

Simple Experiment

• Science experiment
  – Beans grown in dark or light

• Comparison of 2 existing items
  – Sugar content in bubble gum vs. minty gum

Simple experiment

• Formulate questions
  – What things affect how well a bean plant grows? (light, soil, water, temperature)
  – What does it mean that a bean “grows well”?
  – Which condition are we most interested in investigating?

Simple experiment

• Collect Data
  – Plan the experiment
    • Decide what data to collect (height of beans)
    • How will we collect it? (ruler in inches vs. centimeters, Unifix cubes, string)
    • When will we collect it?
  – Conduct the experiment

Simple experiment

• Analyze Data
  – Dot plot
  – Did all beans from one condition grow better than all beans from the other condition?
  – Answer the original question.

Simple experiment

• Interpret Results
  – Does this fit with what you know and observe about growing flowers, plants, and vegetables?
  – Why didn’t some beans in the light sprout at all?
  – Does this mean we can’t grow plants inside?

• Predict
  – Does it matter what kind of seeds we use?

• Extend
  – How much taller was the tallest bean than the shortest bean?
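Two questions from the bean experiment can be checked with a short sketch: whether every bean in one condition beat every bean in the other, and how much taller the tallest bean was than the shortest. The heights below are hypothetical and were invented for illustration. The names `all_taller` and `height_range` are also illustrative and do not come from the talk.

```python
# Hypothetical bean heights in centimeters on the same measuring day
# (invented for illustration). A height of 0 means the bean never sprouted.
light = [12.0, 9.5, 0.0, 14.0, 11.5, 10.0]
dark = [6.0, 8.0, 5.5, 7.0, 0.0, 9.0]

def all_taller(group_a, group_b):
    """True only if every bean in group_a is taller than every bean in group_b."""
    return min(group_a) > max(group_b)

def height_range(heights):
    """How much taller the tallest bean is than the shortest bean."""
    return max(heights) - min(heights)

# Check both directions: did either condition beat the other completely?
print(all_taller(light, dark), all_taller(dark, light))
print(height_range(light), height_range(dark))
```

The “all beans beat all beans” question is answered by comparing the shortest bean in one group with the tallest in the other. A dot plot makes the same overlap visible at a glance.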

Evolution of the mean

• Level A: fair share

• Level B: balance point of a distribution

• Level C: distribution of sample means

• The Family Size Problem: How large are families today?

Level A

• 9 children each represent their family size with cubes

2 3 3 4 4 5 6 7 9

How many people would be in each family if they were all the same size (i.e., no variability)?

All 43 Family Members

Results

• Fair share value

• Leads to algorithm for the mean
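The fair-share idea leads directly to the algorithm for the mean: pool every cube, then deal the cubes out evenly. The minimal Python sketch below uses the nine family sizes from the Level A slide. `fair_share` is an illustrative name, and `Fraction` keeps the leftover cubes visible instead of rounding them away.

```python
from fractions import Fraction

# Family sizes from the Level A example (one value per child's family).
family_sizes = [2, 3, 3, 4, 4, 5, 6, 7, 9]

def fair_share(values):
    """Pool all the cubes and share them evenly: total divided by number of stacks."""
    return Fraction(sum(values), len(values))

total = sum(family_sizes)  # all the family members together
whole_cubes, leftover = divmod(total, len(family_sizes))

print(total)                  # 43
print(whole_cubes, leftover)  # each stack gets 4 cubes, with 7 left over
print(fair_share(family_sizes), float(fair_share(family_sizes)))
```

Each of the nine stacks receives 4 whole cubes, and the remaining 7 cubes must also be shared. The fair share is therefore 43/9, or about 4.78 people per family.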

Upside down and backward

• What if the mean is 6?

• What could the 9 families look like?
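Working backward, nine families with a mean of 6 must contain 9 × 6 = 54 people in total, so any nine sizes that add to 54 are a possible answer. One way to produce such lists is to run the fair-share process in reverse. Start with nine stacks of 6 cubes, then move single cubes between stacks, which never changes the total. The function name, the random moves, and the rule that no family drops below one member are all assumptions made for this sketch.

```python
import random

def families_with_mean(mean, n_families, n_moves, seed=None):
    """Build n_families whole-number sizes whose mean is exactly `mean`.

    Start with every stack at `mean` (perfectly fair), then repeatedly move one
    cube from a randomly chosen stack to another. Each move keeps the total at
    mean * n_families, so the mean never changes. A stack is never reduced
    below 1, because every family has at least one member.
    """
    rng = random.Random(seed)
    sizes = [mean] * n_families
    for _ in range(n_moves):
        giver, taker = rng.sample(range(n_families), 2)  # two different stacks
        if sizes[giver] > 1:
            sizes[giver] -= 1
            sizes[taker] += 1
    return sorted(sizes)

example = families_with_mean(6, 9, n_moves=10, seed=1)
print(example, sum(example))  # the total is always 9 * 6 = 54
```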

Two Examples with Fair Share Value of 6.

Which group is “closer” to being “fair”?

How might we measure “how close” a group of numeric data is to being fair?

Which group is “closer” to being “fair”?

The blue group is closer to fair since it requires only one “step” to make it fair. The lower group requires two “steps.”

How do we define a “step”?

When a snap cube is removed from a stack higher than the fair share value and placed on a stack lower than the fair share value, we count a step.

“fairness” ~ number of steps to make it fair

Fewer steps is closer to fair

Number of Steps to Make Fair: 8

Number of Steps to Make Fair: 9
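The step-counting rule can be simulated literally. Repeatedly take one cube from a stack above the fair share, place it on a stack below, and count each move. This Python sketch assumes a whole-number fair share, as in the mean-of-6 examples. The two lists are our reconstruction of the distributions behind the 8-step and 9-step captions, read from the distances labelled on the Level B dot plots. They are an assumption, not values printed on the slides.

```python
def steps_to_make_fair(sizes):
    """Move one cube at a time from a stack above the fair share to one below it.

    Returns the number of moves needed before every stack equals the fair share.
    Assumes the fair share (total / number of stacks) is a whole number.
    """
    total, n = sum(sizes), len(sizes)
    if total % n != 0:
        raise ValueError("this sketch needs a whole-number fair share")
    share = total // n
    stacks = list(sizes)  # copy, so the caller's list is not changed
    steps = 0
    while any(s > share for s in stacks):
        # Some stack is above the share, so some stack must be below it.
        high = next(i for i, s in enumerate(stacks) if s > share)
        low = next(i for i, s in enumerate(stacks) if s < share)
        stacks[high] -= 1
        stacks[low] += 1
        steps += 1
    return steps

# Reconstructed nine-family lists with mean 6 (assumption, see lead-in).
distribution_1 = [2, 4, 5, 5, 6, 7, 8, 8, 9]
distribution_2 = [2, 3, 4, 6, 6, 6, 8, 9, 10]

print(steps_to_make_fair(distribution_1), steps_to_make_fair(distribution_2))
```

Each move lowers the total excess above the fair share by exactly one. The step count therefore always equals that total excess, which is why fewer steps means a group is closer to fair.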

Students completing Level A understand:

• the notion of “fair share” for a set of numeric data

• the fair share value is also called the mean value

• the algorithm for finding the mean

• the notion of “number of steps” to make fair as a measure of variability about the mean

• the fair share/mean value provides a basis for comparing two groups of numerical data of different sizes (comparing totals would be misleading)

Level B

• Balance point

• Developing measures of variation about the mean

Create different dot plots of nine families with a mean of 6.

[Three blank dot-plot axes, family size from 2 to 10]

In which group do the data (family sizes) vary (differ) more from the mean value of 6?

[Dot plots of Distribution 1 and Distribution 2 (family size, 2 to 10), each point labelled with its distance from the mean of 6. Distribution 1: 4, 2, 1, 1, 0, 1, 2, 2, 3. Distribution 2: 4, 3, 2, 0, 0, 0, 2, 3, 4.]

In Distribution 1, the Total Distance from the Mean is 16.

In Distribution 2, the Total Distance from the Mean is 18.

[Dot plot of Distribution 1, each point labelled with its distance from the mean of 6: 4, 2, 1, 1, 0, 1, 2, 2, 3]

The total distance for the values below the mean of 6 is 8, the same as the total distance for the values above the mean. So, the distribution will “balance” at 6 (the mean).
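The balance claim can be checked numerically. The distances below the mean add up to the same total as the distances above it, which is equivalent to saying that the signed deviations sum to zero. The list below is our reconstruction of Distribution 1 from the labelled distances. It is an assumption, not values printed on the slide.

```python
# Our reconstruction of Distribution 1 (assumption; see lead-in).
distribution_1 = [2, 4, 5, 5, 6, 7, 8, 8, 9]

mean = sum(distribution_1) / len(distribution_1)  # 54 / 9 = 6.0

# Total distance on each side of the mean.
below = sum(mean - x for x in distribution_1 if x < mean)
above = sum(x - mean for x in distribution_1 if x > mean)

# Signed deviations cancel exactly when the two sides balance.
signed_total = sum(x - mean for x in distribution_1)

print(mean, below, above, signed_total)
```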

The SAD is defined to be:

The Sum of the Absolute Deviations

Relationship between SAD and Number of Steps to Fair from Level A:

SAD = 2 × number of steps

Number of Steps to Make Fair: 8

Number of Steps to Make Fair: 9
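The relationship between SAD and the number of steps follows from the balance property. The steps count only the distance above the mean, while the SAD also counts the equal distance below it. The short Python check below uses our reconstruction of Distributions 1 and 2 from the labelled distances, which is an assumption. `sad` and `steps_to_fair` are illustrative names.

```python
def sad(values):
    """Sum of the absolute deviations from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values)

def steps_to_fair(values):
    """Cubes that must move to make the stacks fair: the excess above the mean."""
    mean = sum(values) / len(values)
    return sum(x - mean for x in values if x > mean)

# Reconstructed Level B distributions (assumption; see lead-in).
distribution_1 = [2, 4, 5, 5, 6, 7, 8, 8, 9]
distribution_2 = [2, 3, 4, 6, 6, 6, 8, 9, 10]

for data in (distribution_1, distribution_2):
    # The distances below the mean balance those above, so
    # SAD = (distance below) + (distance above) = 2 * (distance above).
    print(sad(data), steps_to_fair(data), sad(data) == 2 * steps_to_fair(data))
```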

An Illustration where the SAD doesn’t work!

[Dot plots of two distributions with mean 6 (family size, 2 to 10), each point labelled with its distance from the mean. First distribution: 4, 4. Second distribution: 1, 1, 1, 1, 1, 1, 1, 1.]

The SAD is 8 for each distribution, but in the first distribution the data vary more from the mean.

Why doesn’t the SAD work?

Adjusting the SAD for group sizes yields the:

MAD = Mean Absolute Deviation
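Dividing the SAD by the number of values puts groups of different sizes on the same footing. In this sketch, the two groups follow our reading of the garbled slide: two families each 4 away from the mean, and eight families each 1 away. The actual family sizes are assumptions, chosen so that both means are 6: 2 and 10 for the first group, and four 5s with four 7s for the second.

```python
def sad(values):
    """Sum of the absolute deviations from the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values)

def mad(values):
    """Mean absolute deviation: the SAD shared out over the number of values."""
    return sad(values) / len(values)

# Assumed family sizes (see lead-in); both groups have mean 6.
group_1 = [2, 10]
group_2 = [5, 5, 5, 5, 7, 7, 7, 7]

print(sad(group_1), sad(group_2))  # the same SAD for both groups
print(mad(group_1), mad(group_2))  # the MAD shows group 1 varies more
```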

Measuring Variation about the Mean

• SAD = Sum of Absolute Deviations

• MAD = Mean of Absolute Deviations

• Variance = Mean of Squared Deviations

• Standard Deviation = Square Root of Variance
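The four measures differ only in how the deviations from the mean are combined. This Python sketch assumes our reconstruction of Distributions 1 and 2, which are not values printed on the slides. The hand-written formulas are cross-checked against the standard library's `statistics.pvariance` and `statistics.pstdev`, which also divide by the number of values.

```python
import math
import statistics

def variation_measures(values):
    """SAD, MAD, variance, and standard deviation about the mean."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]
    sad = sum(abs(d) for d in deviations)           # Sum of Absolute Deviations
    mad = sad / n                                   # Mean of Absolute Deviations
    variance = sum(d * d for d in deviations) / n   # Mean of Squared Deviations
    sd = math.sqrt(variance)                        # Square Root of Variance
    return {"SAD": sad, "MAD": mad, "variance": variance, "SD": sd}

distribution_1 = [2, 4, 5, 5, 6, 7, 8, 8, 9]   # assumed reconstruction
distribution_2 = [2, 3, 4, 6, 6, 6, 8, 9, 10]  # assumed reconstruction

for data in (distribution_1, distribution_2):
    m = variation_measures(data)
    # Cross-check against the standard library's population formulas.
    assert math.isclose(m["variance"], statistics.pvariance(data))
    assert math.isclose(m["SD"], statistics.pstdev(data))
    print({k: round(v, 3) for k, v in m.items()})
```

Dividing by n matches the slide's “mean of squared deviations”, which is a population variance. Many textbooks divide by n − 1 when estimating from a sample, and that is what `statistics.variance` does.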

Summary of Level B and Transitions to Level C

• Mean as the balance point of a distribution

• Mean as a “central” point

• Various measures of variation about the mean

Level C

• Sampling distribution of the sample means
  – Links probability and statistics
  – Transition from descriptive to inferential statistics

Eighty Circles/What is the Mean Diameter?

Activity

• Choose 10 circles that you think have a diameter close to the mean. Find the mean diameter of your 10 circles.

vs.

• Select random samples of 10 circles and find the mean.

Dotplot of Random Selection versus Self Selection

[Dot plot of sample means on an axis from 1.0 to 2.2, one row for Random Selection and one for Self Selection]

Population Mean = 1.25

Sampling Distributions provide the link to two important concepts in statistical inference:

• Margin of Error

• Statistical Significance
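The random-selection half of the Eighty Circles activity can be simulated to watch a sampling distribution take shape. The 80 diameters below are a hypothetical population, invented so that its mean matches the slide's population mean of 1.25. The self-selection half is not simulated, because it depends on human choices. The spread of the simulated sample means is the quantity a margin of error is built on. The middle 95% of those means gives a rough sense of how far a single random sample's mean tends to land from 1.25.

```python
import random
import statistics

# Hypothetical population of 80 circle diameters (invented for illustration),
# chosen only so that its mean matches the slide's population mean of 1.25.
population = [0.5] * 20 + [1.0] * 30 + [1.5] * 15 + [2.0] * 10 + [3.5] * 5

def sample_means(pop, sample_size, repetitions, seed=None):
    """Draw many random samples (without replacement) and record each mean."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(pop, sample_size)) for _ in range(repetitions)]

means = sample_means(population, sample_size=10, repetitions=1000, seed=2024)

print(statistics.mean(population))  # population mean
print(statistics.mean(means))       # center of the sampling distribution
print(statistics.pstdev(means))     # spread of the sample means

# Cut points that hold roughly the middle 95% of the simulated sample means.
cut = statistics.quantiles(means, n=40)
print(cut[0], cut[-1])
```

`rng.sample` draws without replacement, which mirrors picking 10 distinct circles from the sheet.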

Resources

• NCTM Principles and Standards

• GAISE Framework

• NCTM Navigations series

• Quantitative Literacy series

Statistical Problem Solving

• Formulate Questions
  – clarify the problem at hand
  – formulate question(s) that can be answered with data

• Collect Data
  – design a plan to collect appropriate data
  – employ the plan to collect the data

• Analyze Data
  – select appropriate graphical or numerical methods
  – use these methods to analyze the data

• Interpret Results
  – interpret the analysis
  – relate the interpretation to the original question