Descriptive Statistics and Exploratory Data Analysis
Dean’s Faculty and Resident Development Series
UT College of Medicine Chattanooga
Probasco Auditorium at Erlanger
January 14, 2008
Marc Loizeaux, PhD
Department of Mathematics
University of Tennessee at Chattanooga
What is descriptive statistics?
Descriptive statistics describes your data.
– Visual and numerical
Inferential statistics draws inferences about a larger population.
– Estimation and hypothesis testing
The Big Picture
Statistics
– Descriptive: visual, numerical
– Inferential: estimation, hypothesis testing
Why descriptive statistics?
To summarize our data
To help us get to know our data
To help us describe our data to an audience
To help us explore our data
What is Exploratory Data Analysis?
“Exploratory data analysis is detective work – numerical detective work – or counting detective work – or graphical detective work”
– John Wilder Tukey, Exploratory Data Analysis, page 1
Exploring our data
Gives us an overall view
Helps us consider basic assumptions
Helps us spot oddball values
Helps us avoid embarrassing oversights
May help us decide on the next step
Visual Descriptions (tools for exploring your data visually)
Charts and graphs
– Histogram
– Dotplot
– Stem-and-leaf plot
– Boxplot
– Scatterplot
– And many more
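A histogram is built by sorting values into equal-width bins and counting how many fall into each. A minimal sketch, assuming integer data, a chosen starting edge and bin width, and every value at or above that starting edge; `histogram_counts` and `print_histogram` are hypothetical helper names, and the sample values are made up purely for illustration:

```python
from collections import Counter

def histogram_counts(values, start, width):
    """Count values per bin [start + k*width, start + (k+1)*width).

    Assumes every value is >= start; smaller values would be dropped.
    Returns an ordered dict keyed by each bin's lower edge, empty bins included.
    """
    counts = Counter((v - start) // width for v in values)
    return {start + k * width: counts[k] for k in range(max(counts) + 1)}

def print_histogram(bins, width):
    # One text bar per bin; labels assume integer data.
    for lower, count in bins.items():
        print(f"{lower:>3}-{lower + width - 1:<3} | {'*' * count}")

# Hypothetical values, used only to illustrate the binning.
sample = [12, 15, 21, 22, 27, 33, 38, 41]
bins = histogram_counts(sample, start=10, width=10)
print_histogram(bins, width=10)
```

The bin width is a choice: wider bins smooth the picture, narrower bins show more detail, so it can pay to try more than one.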
A simple example: grades on the first exam
84 75 83 48 70 31 39 51 57 68 55
84 89 45 53 55 69 93 54 65 75 78
88 90 91 95 88 55 55 41 47 78
[Figure: plot of the exam grades on a scale from 20 to 100]
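A stem-and-leaf plot of these 32 grades can be drawn by hand or with a few lines of code. The sketch below uses a hypothetical helper `stem_and_leaf`, assumes non-negative integer scores, and takes the tens digit as the stem and the units digit as the leaf:

```python
from collections import defaultdict

# The 32 exam grades listed above.
grades = [84, 75, 83, 48, 70, 31, 39, 51, 57, 68, 55,
          84, 89, 45, 53, 55, 69, 93, 54, 65, 75, 78,
          88, 90, 91, 95, 88, 55, 55, 41, 47, 78]

def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative integers.

    Stem = value // 10 (tens digit), leaf = value % 10 (units digit).
    Leaves are sorted within each stem; empty stems are kept as empty rows.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):
        rows.append(f"{stem} | {' '.join(str(leaf) for leaf in leaves[stem])}")
    return rows

for row in stem_and_leaf(grades):
    print(row)
```

Each row reads like a sideways histogram bar that still shows every individual score; for these grades the 50s row is the longest, with eight scores.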
Numerical Descriptions (univariate, interval data)
We want to describe…
– The central tendency of the data: What is a center point for the data? What is a typical score?
– The variation of the data: How much spread is there in the data? How far apart are the data values from each other?
Measures of Central Tendency
The mean is the arithmetic average.
– Easy to calculate, easy to understand
– The balance point of the data
The median is the score in the middle.
– Resistant to extreme scores
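Both properties can be checked on the exam grades. This sketch computes the mean and median, confirms that the deviations from the mean sum to zero (the balance point), and then introduces a hypothetical data-entry error, 95 typed as 950, to show that the mean moves while the median does not:

```python
import statistics

# The 32 exam grades from the simple example.
grades = [84, 75, 83, 48, 70, 31, 39, 51, 57, 68, 55,
          84, 89, 45, 53, 55, 69, 93, 54, 65, 75, 78,
          88, 90, 91, 95, 88, 55, 55, 41, 47, 78]

mean = sum(grades) / len(grades)
median = statistics.median(grades)  # average of the two middle scores (n is even)

# Balance point: deviations above and below the mean cancel out.
balance = sum(x - mean for x in grades)

# Hypothetical data-entry error: the single 95 is typed as 950.
typo = [950 if x == 95 else x for x in grades]
mean_typo = sum(typo) / len(typo)
median_typo = statistics.median(typo)

print(f"mean:   {mean:.2f} -> {mean_typo:.2f}")
print(f"median: {median:.2f} -> {median_typo:.2f}")
```

With the typo, the mean rises from about 67.2 to about 93.9, while the median stays at 68.5.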
Measures of Dispersion
The range
– Easy and quick to calculate
– Range = high score - low score
– Limited: only considers two scores
The standard deviation
– More complicated, but…
– Indicates a “typical” deviation from the mean
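The slide does not say which version of the standard deviation it means; the sketch below assumes the sample version (dividing by n - 1), which is what `statistics.stdev` in Python's standard library computes. `data_range` and `sample_sd` are illustrative names, applied to the exam grades:

```python
import math
import statistics

# The 32 exam grades from the simple example.
grades = [84, 75, 83, 48, 70, 31, 39, 51, 57, 68, 55,
          84, 89, 45, 53, 55, 69, 93, 54, 65, 75, 78,
          88, 90, 91, 95, 88, 55, 55, 41, 47, 78]

def data_range(values):
    # Range = high score - low score: only the two extremes are used.
    return max(values) - min(values)

def sample_sd(values):
    # Square root of the sum of squared deviations from the mean, divided by n - 1.
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)
    return math.sqrt(ss / (n - 1))

print("range:", data_range(grades))
print("standard deviation:", round(sample_sd(grades), 2))
print("statistics.stdev:", round(statistics.stdev(grades), 2))  # same value
```

The range depends on two scores only, while every score contributes to the standard deviation.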
Childhood Respiratory Disease (playing with the data)
Data available from OzDASL, StatSci.org
FEV (forced expiratory volume) is an index of pulmonary function that measures the volume of air expelled after one second of constant effort.
The data: determinations of FEV on 654 children ages 6–22 who were seen in the Childhood Respiratory Disease Study in 1980 in East Boston, Massachusetts. The data are part of a larger study to follow the change in pulmonary function over time in children.
Source:
– Tager, I. B., Weiss, S. T., Rosner, B., and Speizer, F. E. (1979). Effect of parental cigarette smoking on pulmonary function in children. American Journal of Epidemiology, 110, 15–26.
– Rosner, B. (1990). Fundamentals of Biostatistics, 3rd Edition. PWS-Kent, Boston, Massachusetts.
Some of the Data
ID     Age  FEV (L)  Height (in)  Sex     Smoker
46951  12   3.082    63.5         Female  Non
47051  13   3.297    65           Female  Current
47052  11   3.258    63           Female  Non
72901  12   2.935    65.5         Male    Non
73041  16   4.27     67           Male    Current
73042  15   3.727    68           Male    Current
73751  18   2.853    60           Female  Non
75852  16   2.795    63           Female  Current
77151  15   3.211    66.5         Female  Non
Descriptive Statistics
Age (years) FEV (L) Height (in)
Mean 9.93 2.64 61.14
Median 10.00 2.55 61.50
Mode 9 3.08 63
Standard Deviation 2.95 0.87 5.70
Range 16 5.00 28
Minimum 3 0.79 46
Maximum 19 5.79 74
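A table like this one can be generated column by column. The sketch below uses a hypothetical `describe` helper built on Python's `statistics` module and applies it to the ages in the nine-row excerpt of the data, not to all 654 children, so its numbers will not match the table. It reports every tied mode via `statistics.multimode` (Python 3.8+), because a column can have more than one most frequent value:

```python
import statistics

# Ages from the nine-row excerpt of the FEV data (not the full 654 children).
ages = [12, 13, 11, 12, 16, 15, 18, 16, 15]

def describe(values):
    """Return the summary statistics used in the Descriptive Statistics table."""
    return {
        "Mean": statistics.mean(values),
        "Median": statistics.median(values),
        "Mode": statistics.multimode(values),  # all most-frequent values, in first-seen order
        "Standard Deviation": statistics.stdev(values),  # sample (n - 1) version
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
    }

def fmt(value):
    # Round floats for display; print ints and lists as they are.
    return f"{value:.2f}" if isinstance(value, float) else str(value)

for name, value in describe(ages).items():
    print(f"{name:<20} {fmt(value)}")
```

In this excerpt the ages 12, 15, and 16 each appear twice, so the mode is not a single number.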
Pictures may say more
The ages look like this
And again
One variable, then two…
A univariate exploration
– Explore each data column individually
A multivariate exploration
– Explore the relationships between two data columns
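For two numeric columns the usual picture is a scatterplot, and one common numerical companion is the Pearson correlation coefficient, which measures the strength of a linear relationship on a scale from -1 to 1. A minimal sketch on the Age and FEV values from the nine-row excerpt (far too few rows to draw conclusions); `pearson_r` is an illustrative name:

```python
import math

# Age and FEV (L) from the nine-row excerpt, in the same row order.
ages = [12, 13, 11, 12, 16, 15, 18, 16, 15]
fev = [3.082, 3.297, 3.258, 2.935, 4.27, 3.727, 2.853, 2.795, 3.211]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(f"r(Age, FEV) = {pearson_r(ages, fev):.2f}")
```

A single coefficient can hide curvature or oddball points, so it is best read alongside the scatterplot rather than instead of it.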
Consider natural subgroups
Raising more questions?
It starts to make sense
Something else to study?
Differentiating Subgroups
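One simple way to differentiate subgroups numerically is to summarize a column separately within each level of a grouping variable. The sketch below uses a hypothetical `summarize_by` helper to group the nine-row excerpt by Smoker and report the group size and mean of both FEV and Age. Putting Age next to FEV is one way to check whether the subgroups also differ on a variable that may itself be related to FEV. Nine rows are only an illustration, not evidence:

```python
from collections import defaultdict

# (Age, FEV in L, Smoker) from the nine-row excerpt.
rows = [
    (12, 3.082, "Non"), (13, 3.297, "Current"), (11, 3.258, "Non"),
    (12, 2.935, "Non"), (16, 4.27, "Current"), (15, 3.727, "Current"),
    (18, 2.853, "Non"), (16, 2.795, "Current"), (15, 3.211, "Non"),
]

def summarize_by(rows, key_index, value_index):
    """Return {group: (n, mean)} for one column within each level of another."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[value_index])
    return {k: (len(v), sum(v) / len(v)) for k, v in groups.items()}

for label, col in [("FEV", 1), ("Age", 0)]:
    for group, (n, mean) in summarize_by(rows, 2, col).items():
        print(f"{label:<4} {group:<8} n={n}  mean={mean:.2f}")
```

The same grouping idea applies to any natural subgroup in the data, such as Sex.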
Preparing for an Audience
Some do’s
– Pick and choose your graphs
– Include appropriate numbers for your type of data
– Include narrative:
  – Does the histogram indicate asymmetry?
  – Are there unexpected values in the data set?
  – Are there special problems you had to deal with to describe the data?
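To look for unexpected values, one common convention (Tukey's rule, used by most boxplots) flags anything more than 1.5 interquartile ranges below the first quartile or above the third. The slides do not prescribe a rule, so treat this as one option; quartile conventions also differ between packages, so the fences may shift slightly. A sketch using `statistics.quantiles` (Python 3.8+, default "exclusive" method) on the exam grades, with a hypothetical data-entry error for contrast; `tukey_fences` and `unexpected_values` are illustrative names:

```python
import statistics

# The 32 exam grades from the simple example.
grades = [84, 75, 83, 48, 70, 31, 39, 51, 57, 68, 55,
          84, 89, 45, 53, 55, 69, 93, 54, 65, 75, 78,
          88, 90, 91, 95, 88, 55, 55, 41, 47, 78]

def tukey_fences(values, k=1.5):
    """Return (low, high) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def unexpected_values(values, k=1.5):
    # Values outside the fences are worth a second look, not automatic deletion.
    low, high = tukey_fences(values, k)
    return [v for v in values if v < low or v > high]

print(unexpected_values(grades))  # no grade falls outside the fences
typo = [950 if x == 95 else x for x in grades]  # hypothetical entry error
print(unexpected_values(typo))
```

A flagged value still has to be explained in the narrative: it may be a typo, or it may be the most interesting score in the data.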
Preparing for an Audience (2)
Some don’ts
– Don’t include everything; that just confuses us.
– Don’t be redundant; some graphs say the same thing.
– Don’t include descriptors you don’t understand (kurtosis?); ask the chauffeur.
Points to Remember (in no particular order)
Don’t skip the simple stuff!
Spend time playing with your data.
Pictures say a lot.
Describe the spread as well as the center.
Consider the natural subgroups in your data.
Next Time
Confidence Intervals, Hypothesis Tests, and Statistical Significance
2 x 2 tables
Monday, February 11