[Andy field] discovering_statistics_using_spss_(in(book_fi.org)

854
ANDY FIELD DISCOVERING STATISTICS USING SPSS THIRD EDITION (and sex and drugs and rock ’n’ roll)

description

If you dont like statistics, you have to read this book. If you do like statistics you have to read this book. Andy Field, is a nice professor who has described SPSS so interesting and easily. Read it!

Transcript of [Andy field] discovering_statistics_using_spss_(in(book_fi.org)

  • 1. DISCOVERING STATISTICS USING Spss T H I R D E D I T I O N (and sex and drugs and rock n roll)ANDY FIELD

2. Andy Field 2009 First edition published 2000 Second edition published 2005 Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, this publication may be reproduced, stored or transmitted in any form, or by any means, only with the prior permission in writing of the publishers, or in the case of reprographic reproduction, in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers. SAGE Publications Ltd 1 Olivers Yard 55 City Road London EC1Y 1SP SAGE Publications Inc. 2455 Teller Road Thousand Oaks, California 91320 SAGE Publications India Pvt Ltd B 1/I 1 Mohan Cooperative Industrial Area Mathura Road New Delhi 110 044 SAGE Publications Asia-Pacific Pte Ltd 33 Pekin Street #02-01 Far East Square Singapore 048763 Library of Congress Control Number: 2008930166 British Library Cataloguing in Publication data A catalogue record for this book is available from the British Library ISBN 978-1-84787-906-6 ISBN 978-1-84787-907-3Typeset by C&M Digitals (P) Ltd, Chennai, India Printed by Oriental Press, Dubai Printed on paper from sustainable resources 3. CONTENTSPrefacexixHow to use this bookxxivAcknowledgementsxxviiiDedicationxxxSymbols used in this bookxxxiSome maths revisionxxxiii1 Why is my evil lecturer forcing me to learn statistics? 1.1. 1.2. 1.3. 1.4. 1.5.What will this chapter tell me? 1 What the hell am I doing here? I dont belong here 1.2.1. The research process1.5.1. Variables1.5.2. Measurement error1.5.3. Validity and reliability1.6. 1.7. Initial observation: finding something that needs explaining Generating theories and testing them 1 Data collection 1: what to measure 1 111 1 1Data collection 2: how to measure 1.6.1. Correlational research methods111.6.2. Experimental research methods 1.6.3. RandomizationAnalysing data1111.7.1. Frequency distributions11.7.2. The centre of a distribution11.7.3. The dispersion in a distribution11.7.4. Using a frequency distribution to go beyond the data1.7.5. Fitting statistical models to the data What have I discovered about statistics? Key terms that Ive discovered Smart Alexs stats quiz Further reading Interesting real research1 1111 1 2 3 3 4 7 7 10 11 12 12 13 17 18 18 20 23 24 26 28 28 29 29 30 4. viD I S C O VE R I N G STAT I ST I C S US I N G S PSS2 Everything you ever wanted to know about statistics (well, sort of) 2.1. 2.2. 2.3. 2.4.What will this chapter tell me? Building statistical models 1 Populations and samples 1 Simple statistical models 1 12.4.1. The mean: a very simple statistical model131 31 32 34 35 352.4.2. Assessing the fit of the mean: sums of squares, variance and standard2.4.3. Expressing the mean as a modeldeviations2.5.1Going beyond the data12.5.1. The standard error2.5.2. Confidence intervals122.6.2Using statistical models to test research questions 2.6.1. Test statistics12.6.2. One- and two-tailed tests12.6.3. Type I and Type II errors2.6.4. Effect sizes2.6.5. Statistical power21 33.1. 3.2. 3.3. 3.4.The spss environment 2 3.5. 3.6. 3.7. 3.8. 3.9. What have I discovered about statistics? Key terms that Ive discovered Smart Alexs stats quiz Further reading Interesting real research159 59 59 60 603.4.1. Entering data into the data editor 3.4.2. The Variable View 111161 62 62 63 69 70 77 78 81 82 83 84The spss viewer 1 The spss SmartViewer The syntax window 3 Saving files 1 Retrieving a file 1 What have I discovered about statistics? 1 85 Key terms that Ive discovered 85 Smart Alexs tasks 85 Further reading 86 Online tutorials 864 Exploring data with graphs 4.1. 4.2. 61What will this chapter tell me? Versions of spss 1 Getting started 1 The data editor 1 3.4.3. Missing values1135 38 40 40 43 48 52 54 55 56 58What will this chapter tell me? The art of presenting data 1 87 14.2.1. What makes a good graph? 14.2.2. Lies, damned lies, and erm graphs187 88 88 90 5. viiC ontents 4.3. 4.4. 4.5. 4.6. 4.7. 4.8. The spss Chart Builder 1 Histograms: a good way to spot obvious problems Boxplots (boxwhisker diagrams) 1 Graphing means: bar charts and error bars 1 4.6.1. 4.6.2. 4.6.3. 4.6.4. 4.6.5.Simple bar charts for independent means 1 Clustered bar charts for independent means 1 Simple bar charts for related means 1 Clustered bar charts for related means 1 Clustered bar charts for mixed designs 1 Line charts 1 Graphing relationships: the scatterplot 4.8.1. 4.8.2. 4.8.3. 4.8.4. 4.8.5. 4.8.6.11Simple scatterplot 1 Grouped scatterplot 1 Simple and grouped 3-D scatterplots Matrix scatterplot 1 Simple dot plot or density plot 1 Drop-line graph 1 4.9. 55.1. 5.2. 5.3. 5.4.1Exploring assumptions 5.5. 5.6. 5.7. Editing graphs1What have I discovered about statistics? Key terms that Ive discovered Smart Alexs tasks Further reading Online tutorial Interesting real research1129 130 130 130 130 130131What will this chapter tell me? 1 What are assumptions? 1 Assumptions of parametric data 1 The assumption of normality 1 5.4.1. no, its that pesky frequency distribution again: checking Oh normality visually 1 5.4.2. Quantifying normality with numbers 1 5.4.3. Exploring groups of data 1 Testing whether a distribution is normal15.5.1. Doing the KolmogorovSmirnov test on spss 5.5.2. Output from the explore procedure 1 5.5.3. Reporting the KS test 1 Testing for homogeneity of variance 5.6.1. Levenes test 5.6.2. Reporting Levenes test1111Correcting problems in the data 5.7.1. 5.7.2. 5.7.3. 5.7.4.91 93 99 103 105 107 109 111 113 115 116 117 119 121 123 125 126 1262Dealing with outliers 2 Dealing with non-normality and unequal variances Transforming the data using spss 2 When it all goes horribly wrong 3 2131 132 132 133 134 136 140 144 145 146 148 149 150 152 153 153 153 156 162What have I discovered about statistics? 1 164 Key terms that Ive discovered 164 Smart Alexs tasks 165 Online tutorial 165 Further reading 165 6. viiiD I S C O VE R I N G STAT I ST I C S US I N G S PSS6 Correlation 6.1. 6.2. 6.3.166What will this chapter tell me? 1 Looking at relationships 1 How do we measure relationships?1166 167 167 167 169 171 172 173 174 175 175 177 179 181 182 186 186 188 190 191 191 191 192 1936.3.1. A detour into the murky world of covariance6.3.2. Standardization and the correlation coefficient16.3.3. The significance of the correlation coefficient6.3.4. Confidence intervals for r 6.4. 6.5. 6.6.3136.3.5. A word of warning about interpretation: causalityData entry for correlation analysis using spss Bivariate correlation 1 16.5.2. Pearsons correlation coefficient16.5.3. Spearmans correlation coefficient 6.5.4. Kendalls tau (non-parametric)11236.6.2. Partial correlation using spss6.6.3. Semi-partial (or part) correlations 6.8. 6.9. 6.5.5. Biserial and pointbiserial correlationsPartial correlation16.6.1. The theory behind part and partial correlation6.7.16.5.1. General procedure for running correlations on spssComparing correlations322 216.7.1. Comparing independent rs336.7.2. Comparing dependent rsCalculating the effect size 1 How to report correlation coefficentsWhat have I discovered about statistics? Key terms that Ive discovered Smart Alexs tasks Further reading Online tutorial Interesting real research1195 195 195 196 196 1967 Regression 7.1. 7.2. 7.3. 7.4. 7.5. 7.6.197What will this chapter tell me? 1 An introduction to regression 1 7.2.1. 7.2.2. 7.2.3. 7.2.4.Some important information about straight lines 1 The method of least squares 1 Assessing the goodness of fit: sums of squares, R and R2 Assessing individual predictors 1 Doing simple regression on spss 1 Interpreting a simple regression 1 7.4.1. Overall fit of the model 7.4.2. Model parameters 1 7.4.3. Using the model 1 1Multiple regression: the basics27.5.1. An example of a multiple regression model 7.5.2. Sums of squares, R and R2 2 7.5.3. Methods of regression 2 How accurate is my regression model?221197 198 199 200 201 204 205 206 206 207 208 209 210 211 212 214 7. ixC ontents 7.7. 7.8. How to do multiple regression using spss 7.7.1. 7.7.2. 7.7.3. 7.7.4. 7.7.5. 7.7.6.Interpreting multiple regression 7.8.1. 7.8.2. 7.8.3. 7.8.4. 7.8.5. 7.8.6. 7.8.7.22 2214 220 225 225 225 227 229 230 231 233 233 234 237 241 241 244 247 251 252 253 253 256Some things to think about before the analysis Main options 2 Statistics 2 Regression plots 2 Saving regression diagnostics 2 Further options 2 2Descriptives Summary of model 2 Model parameters 2 Excluded variables 2 Assessing the assumption of no multicollinearity Casewise diagnostics 2 Checking assumptions 2 27.9. What if I violate an assumption? 2 7.10. How to report multiple regression 2 7.11. Categorical predictors and multiple regression 7.6.1. Assessing the regression model I: diagnostics 2 7.6.2. Assessing the regression model II: generalization7.11.1. Dummy coding 3 7.11.2. Spss output for dummy variables332What have I discovered about statistics? Key terms that Ive discovered Smart Alexs tasks Further reading Online tutorial Interesting real research1261 261 262 263 263 2638 Logistic regression 8.1. 8.2. 8.3. 8.4. 8.5. 8.6.264What will this chapter tell me? 1 Background to logistic regression 1 What are the principles behind logistic regression? 8.3.1. 8.3.2. 8.3.3. 8.3.4. 8.3.5.Assumptions and things that can go wrong 8.4.1. 8.4.2. 8.4.3. 8.4.4.3Assessing the model: the log-likelihood statistic 3 Assessing the model: R and R2 3 Assessing the contribution of predictors: the Wald statistic The odds ratio: Exp(B) 3 Methods of logistic regression 2 4Assumptions 2 Incomplete information from the predictors Complete separation 4 Overdispersion 4 2 4Binary logistic regression: an example that will make you feel eel 8.5.1. 8.5.2. 8.5.3. 8.5.4. 8.5.5.The main analysis 2 Method of regression 2 Categorical predictors 2 Obtaining residuals 2 Further options 2 Interpreting logistic regression22264 265 265 267 268 269 270 271 273 273 273 274 276 277 278 279 279 280 281 282 8. xD I S C O VE R I N G STAT I ST I C S US I N G S PSS 8.7. 8.8. 8.9.8.6.1. The initial model238.6.3. Listing predicted probabilities 8.6.4. Interpreting residuals228.6.5. Calculating the effect size2How to report logistic regression 2 Testing assumptions: another example 8.8.1. Testing for linearity of the logit 8.8.2. Testing for multicollinearity323Predicting several categories: multinomial logistic regression8.9.1. Running multinomial logistic regression in spss8.9.2. Statistics 3282 284 291 292 294 294 294 296 297 300 301 304 305 306 3128.6.2. Step 1: intervention338.9.3. Other options38.9.4. Interpreting the multinomial logistic regression output38.9.5. Reporting the resultsWhat have I discovered about statistics? Key terms that Ive discovered Smart Alexs tasks Further reading Online tutorial Interesting real research1313 313 313 315 315 3159 Comparing two means 9.1. 9.2. 9.3. 9.4. 9.5. 9.6. 9.7. 9.8.316What will this chapter tell me? Looking at differences 1 19.2.1. problem with error bar graphs of repeated-measures designs A 9.2.2. Step 1: calculate the mean for each participant 9.2.3. Step 2: calculate the grand mean229.2.4. Step 3: calculate the adjustment factor29.2.5. Step 4: create adjusted values for each variableThe t-test129.3.1. Rationale for the t-test19.3.2. Assumptions of the t-testThe dependent t-test119.4.1. Sampling distributions and the standard error 9.4.2. The dependent t-test equation explained119.4.3. The dependent t-test and the assumption of normality 9.4.4. Dependent t-tests using spss19.4.5. Output from the dependent t-test 9.4.6. Calculating the effect size219.4.7. Reporting the dependent t-testThe independent t-test119.5.1. The independent t-test equation explained 9.5.2. The independent t-test using spss 9.5.3. Output from the independent t-test 9.5.4. Calculating the effect size21 112 9.5.5. Reporting the independent t-test1Between groups or repeated measures? 1 The t-test as a general linear model 2 What if my data are not normally distributed?11316 317 317 320 320 322 323 324 325 326 326 327 327 329 329 330 332 333 334 334 337 339 341 341 342 342 344 9. xiC ontents What have I discovered about statistics? Key terms that Ive discovered Smart Alexs task Further reading Online tutorial Interesting real research1345 345 346 346 346 34610 Comparing several means: anova (glm 1) 10.1. What will this chapter tell me? 10.2. The theory behind anova 2 10.2.2. 10.2.3.10.2.4. 10.2.5.10.2.6. 10.2.7.10.2.8. 10.2.9.Inflated error rates Interpreting f22Anova as regression Logic of the f-ratio22Total sum of squares (sst)2Model sum of squares (ssm) 2Mean squares The F-ratio22 210.2.12. Post hoc procedures 2Planned comparisons using spss Post hoc tests in spss210.4.1.Output for the main analysis22Options10.4.3.10.3.3.10.4.2.2310.4. Output from one-way anova10.2.11. Planned contrasts10.3.2.210.2.10. Assumptions of anova10.3.1.Residual sum of squares (ssr)10.3. Running one-way anova on spss10.2.1.134722Output for planned comparisons Output for post hoc tests22 10.5. Calculating the effect size 2 10.6. Reporting results from one-way independent anova 2 10.7. Violations of assumptions in one-way independent anova What have I discovered about statistics? Key terms that Ive discovered Smart Alexs tasks Further reading Online tutorials Interesting real research12392 392 393 394 394 39411 Analysis of covariance, ancova (glm 2) 11.1. What will this chapter tell me? 2 11.2. What is ancova? 2 11.3. Assumptions and issues in ancova 11.3.1. 11.3.2.Homogeneity of regression slopes11.4.1.Inputting data12395Independence of the covariate and treatment effect11.4. Conducting ancova on spss33311.4.2. Initial considerations: testing the independence of the independent variable and covariate2347 348 348 349 349 354 356 356 357 358 358 359 360 372 375 376 378 379 381 381 384 385 389 390 391395 396 397 397 399 399 399 400 10. xiiD I S C O VE R I N G STAT I ST I C S US I N G S PSS 211.6. 11.7. 11.8. 11.9. 11.10. 11.5.1. 11.5.2. 11.5.3. 11.5.4.401 401 404 404 405 407 408 408 413 415 417 41811.5. Interpreting the output from ancova 11.4.3. The main analysis 2 11.4.4. Contrasts and other options2What happens when the covariate is excluded? The main analysis 2 Contrasts 2 Interpreting the covariate 2 2Ancova run as a multiple regression 2 Testing the assumption of homogeneity of regression slopes Calculating the effect size 2 Reporting results 2 What to do when assumptions are violated in ancova 3 What have I discovered about statistics? Key terms that Ive discovered Smart Alexs tasks Further reading Online tutorials Interesting real research23418 419 419 420 420 42012 Factorial anova (glm 3) 12.1. What will this chapter tell me? 2 12.2. Theory of factorial anova (between-groups) 12.2.1.12.2.2. 12.2.3.12.2.4.12.2.5.12.2.6.Factorial designs2 12.3.1.12.3.2. 12.3.3.12.3.4. 12.3.5.Total sums of squares (sst) 12.4.1.12.4.2. 12.4.3.22The residual sum of squares (ssr) 22The model sum of squares (ssm) The F-ratios 22Entering the data and accessing the main dialog box Graphing interactions Contrasts222 2Output for the preliminary analysis Levenes test222The main anova table12.4.4. Contrasts2Post hoc tests Options212.4. Output from factorial anova2An example with two independent variables12.3. Factorial anova using spss4212312.4.5.Simple effects analysis12.4.6.Post hoc analysis2 12.5. 12.6. 12.7. 12.8. 12.9.Interpreting interaction graphs 2 Calculating effect sizes 3 Reporting the results of two-way anova 2 Factorial anova as regression 3 What to do when assumptions are violated in factorial anova What have I discovered about statistics? Key terms that Ive discovered Smart Alexs tasks23421 422 422 423 424 426 428 429 430 430 432 432 434 434 435 435 436 436 439 440 441 443 446 448 450 454 454 455 455 11. xiiiC ontents Further reading Online tutorials Interesting real research456 456 45613 Repeated-measures designs (glm 4) 45713.1. What will this chapter tell me? 2 13.2. Introduction to repeated-measures designs 13.10. 13.11. 13.12.457 458 13.2.1. The assumption of sphericity 2 459 13.2.2. How is sphericity measured? 2 459 13.2.3. Assessing the severity of departures from sphericity 2 460 3 13.2.4. What is the effect of violating the assumption of sphericity? 460 13.2.5. What do you do if you violate sphericity? 2 461 Theory of one-way repeated-measures anova 2 462 13.3.1. The total sum of squares (sst) 2 464 13.3.2. The within-participant (ssw) 2 465 2 13.3.3. The model sum of squares (ssm) 466 13.3.4. The residual sum of squares (ssr) 2 467 13.3.5. The mean squares 2 467 13.3.6. The F-ratio 2 467 13.3.7. The between-participant sum of squares 2 468 One-way repeated-measures anova using spss 2 468 13.4.1. The main analysis 2 468 13.4.2. Defining contrasts for repeated-measures 2 471 13.4.3. Post hoc tests and additional options 3 471 Output for one-way repeated-measures anova 2 474 13.5.1. Descriptives and other diagnostics 1 474 13.5.2. Assessing and correcting for sphericity: Mauchlys test 2 474 2 13.5.3. The main anova 475 13.5.4. Contrasts 2 477 13.5.5. Post hoc tests 2 478 Effect sizes for repeated-measures anova 3 479 Reporting one-way repeated-measures anova 2 481 2 Repeated-measures with several independent variables 482 13.8.1. The main analysis 2 484 13.8.2. Contrasts 2 488 13.8.3. Simple effects analysis 3 488 13.8.4. Graphing interactions 2 490 13.8.5. Other options 2 491 2 Output for factorial repeated-measures anova 492 13.9.1. Descriptives and main analysis 2 492 13.9.2. The effect of drink 2 493 13.9.3. The effect of imagery 2 495 13.9.4. The interaction effect (drink imagery) 2 496 13.9.5. Contrasts for repeated-measures variables 2 498 3 Effect sizes for factorial repeated-measures anova 501 Reporting the results from factorial repeated-measures anova 2 502 What to do when assumptions are violated in repeated-measures anova 3 503 What have I discovered about statistics? Key terms that Ive discovered 13.3. 13.4. 13.5. 13.6. 13.7. 13.8. 13.9. 22503 504 12. xivD I S C O VE R I N G STAT I ST I C S US I N G S PSS Smart Alexs tasks Further reading Online tutorials Interesting real research504 505 505 50514 Mixed design anova (glm 5) 14.1. 14.2. 14.3. 14.4. 506What will this chapter tell me? 1 Mixed designs 2 What do men and women look for in a partner? Mixed anova on spss 2 14.4.1. The main analysis 14.4.2.Other options214.5.1.The main effect of gender2The main effect of looks14.5.3.The main effect of charisma14.5.4. The interaction between gender and looks 14.5.6. 14.5.7.14.5.8.14.5.2.14.5.5.3214.5. Output for mixed factorial anova: main analysis2506 507 508 508 508 513 514 517 518 520 521 523 524 527 530 531 533 53622 2The interaction between gender and charisma2The interaction between attractiveness and charisma2The interaction between looks, charisma and gender3Conclusions3 14.6. Calculating effect sizes 3 14.7. Reporting the results of mixed anova 2 14.8. What to do when assumptions are violated in mixed ANOVA What have I discovered about statistics? Key terms that Ive discovered Smart Alexs tasks Further reading Online tutorials Interesting real research23536 537 537 538 538 53815 Non-parametric tests 53915.1. What will this chapter tell me? 1 15.2. When to use non-parametric tests 1 15.3. Comparing two independent conditions: the Wilcoxon rank-sum test and MannWhitney test 1 15.3.1.Theory15.3.2.Inputting data and provisional analysis15.3.3.Running the analysis 15.3.4. 15.3.5.2 1Output from the MannWhitney test Calculating an effect size2115.3.6.Writing the results15.4.1.Theory of the Wilcoxon signed-rank test1115.4. Comparing two related conditions: the Wilcoxon signed-rank test 15.4.2. 15.4.3.Running the analysis1Output for the ecstasy group115.4.4. Output for the alcohol group115.4.5.15.4.6.Calculating an effect size Writing the results1221539 540 540 542 545 546 548 550 550 552 552 554 556 557 558 558 13. xvC ontents15.5. Differences between several independent groups: the KruskalWallis test15.5.1.Theory of the KruskalWallis test15.5.2.Inputting data and provisional analysis115.5.3.Doing the KruskalWallis test on spss1 15.5.5.15.5.6. 15.5.7.15.5.8.Output from the KruskalWallis test1Post hoc tests for the KruskalWallis test2Testing for trends: the JonckheereTerpstra test Calculating an effect size22Writing and interpreting the results115.6. Differences between several related groups: Friedmans anova 15.5.4.2 15.6.1.15.6.2. 15.6.3.15.6.4. 15.6.5.15.6.6. 15.6.7.Theory of Friedmans anova2Inputting data and provisional analysis Doing Friedmans anova on spss Output from Friedmans anova112Post hoc tests for Friedmans anova Calculating an effect size1 2Writing and interpreting the results1What have I discovered about statistics? Key terms that Ive discovered Smart Alexs tasks Further reading Online tutorial Interesting real research116.3.1.Words of warning16.3.2.The example for this chapter16.4. Theory of manova 16.4.3.Introduction to matrices3Some important matrices and their functions16.4.4. Principle of the manova test statistic416.5.1.316.5.2. 16.5.3. 16.6.1.16.6.2. 16.6.3.Assumptions and how to check them Choosing a test statistic Follow-up analysis 23323The main analysisMultiple comparisons in Manova Additional options 33216.7.1.Preliminary analysis and testing assumptions16.7.2.Manova test statistics 16.7.3.16.7.4. 16.7.5.16.7. Output from manova3Calculating manova by hand: a worked example16.6. Manova on spss16.4.2.Univariate test statistics Sscp Matrices Contrasts3316.5. Practical issues when conducting manova16.4.1.2 233 2 559 560 562 562 564 565 568 570 571 573 573 575 575 576 577 579 58058416.1. What will this chapter tell me? 2 16.2. When to use manova 2 16.3. Introduction: similarities and differences to anova 2581 582 582 583 583 58316 Multivariate analysis of variance (manova) 1133584 585 585 587 587 588 588 590 591 598 603 603 604 605 605 606 607 607 608 608 608 609 611 613 14. xviD I S C O VE R I N G STAT I ST I C S US I N G S PSS 16.8. 16.9. 16.10. 16.11. 16.12.Reporting results from manova 2 Following up manova with discriminant analysis Output from the discriminant analysis 4 Reporting results from discriminant analysis 2 Some final remarks 4 16.12.1. The final interpretation316.12.2. Univariate anova or discriminant analysis?416.13. What to do when assumptions are violated in Manova What have I discovered about statistics? Key terms that Ive discovered Smart Alexs tasks Further reading Online tutorials Interesting real research23624 625 625 626 626 62617 Exploratory factor analysis 17.3.2. 17.3.3.Graphical representation of factors17.4.1.17.4.2. 17.4.3.2Mathematical representation of factors Factor scores2Communality22Choosing a method22Factor analysis vs. principal component analysis17.4.4. Theory behind principal component analysis 17.4.5.17.4.6. 17.5.1.3Improving interpretation: factor rotation 22Before you begin17.6. Running the analysis32Factor extraction: eigenvalues and the scree plot17.5. Research example17.3.1.17.4. Discovering factors62717.1. What will this chapter tell me? 1 17.2. When to use factor analysis 2 17.3. Factors 2 2217.6.1.Factor extraction on spss17.6.2.Rotation2Scores 17.6.4.Options17.7.1.Preliminary analysis217.7.2. 17.7.3.17.7.4. 17.7.5. 17.9.1.17.9.2. 17.9.3.17.9.4.2 Factor extraction2Factor rotation22Factor scores Summary22 217.8. How to report factor analysis 17.9. Reliability analysis 2 217.7. Interpreting output from spss 17.6.3.Measures of reliability13Interpreting Cronbachs (some cautionary tales ) Reliability analysis on spss Interpreting the output614 615 618 621 622 622 624 6242217.10. How to report reliability analysis22627 628 628 630 631 633 636 636 637 638 638 639 642 645 645 650 651 653 654 654 655 656 660 664 669 671 671 673 673 675 676 678 681 15. xviiC ontents What have I discovered about statistics? Key terms that Ive discovered Smart Alexs tasks Further reading Online tutorial Interesting real research218 Categorical data 18.4. 18.5. 18.6. 18.7. 18.8. 68618.1. What will this chapter tell me? 1 18.2. Analysing categorical data 1 18.3. Theory of analysing categorical data 18.9. 18.10. 18.11. 18.12. 686 687 1 687 18.3.1. Pearsons chi-square test 1 688 18.3.2. Fishers exact test 1 690 2 18.3.3. The likelihood ratio 690 18.3.4. Yates correction 2 691 Assumptions of the chi-square test 1 691 Doing chi-square on spss 1 692 1 18.5.1. Entering data: raw scores 692 18.5.2. Entering data: weight cases 1 692 18.5.3. Running the analysis 1 694 18.5.4. Output for the chi-square test 1 696 2 698 18.5.5. Breaking down a significant chi-square test with standardized residuals 18.5.6. Calculating an effect size 2 699 18.5.7. Reporting the results of chi-square 1 700 Several categorical variables: loglinear analysis 3 702 4 18.6.1. Chi-square as regression 702 18.6.2. Loglinear analysis 3 708 Assumptions in loglinear analysis 2 710 Loglinear analysis using spss 2 711 18.8.1. Initial considerations 2 711 18.8.2. The loglinear analysis 2 712 Output from loglinear analysis 3 714 Following up loglinear analysis 2 719 Effect sizes in loglinear analysis 2 720 Reporting the results of loglinear analysis 2 721What have I discovered about statistics? Key terms that Ive discovered Smart Alexs tasks Further reading Online tutorial Interesting real research19 Multilevel linear models 19.1. What will this chapter tell me? 19.2. Hierarchical data 2 1722 722 722 724 724 724725 119.2.1.The intraclass correlation19.2.2.Benefits of multilevel models219.3. Theory of multilevel linear models3682 682 683 685 685 6852725 726 728 729 730 16. xviiiD I S C O VE R I N G STAT I ST I C S US I N G S PSS 19.3.2.An example219.4.1.19.4.2.Fixed and random coefficients19.4. The multilevel model19.3.1.4Assessing the fit and comparing multilevel models Types of covariance structures19.5. Some practical issues3Assumptions19.5.2.Sample size and power19.5.3.Centring variables34Entering the data23419.6.2.Ignoring the data structure: anova19.6.3.Ignoring the data structure: ancova19.6.4.Factoring in the data structure: random intercepts19.6.6. 19.7.1.19.7.2. 19.7.3.19.7.4. 19.7.5.Adding an interaction to the model 4 24Growth curves (polynomials)4An example: the honeymoon period Restructuring the data3Running a growth model on spss Further analysis42 419.8. How to report a multilevel model 23Factoring in the data structure: random intercepts and slopes19.7. Growth models19.6.5.419.6. Multilevel modelling on spss 19.6.1.419.5.1.33What have I discovered about statistics? Key terms that Ive discovered Smart Alexs tasks Further reading Online tutorial Interesting real research24730 732 734 737 737 739 739 740 740 741 742 742 746 749 752 756 761 761 761 763 767 774 775 776 777 777 778 778 778Epilogue779Glossary781Appendix A.1. A.2. A.3. A.4.Table of the standard normal distribution Critical values of the t-distribution Critical values of the F-distribution Critical values of the chi-square distribution797 797 803 804 808References809Index816 17. PREFACEKarma Police, arrest this man, he talks in maths, he buzzes like a fridge, hes like a detuned radio. Radiohead (1997)Introduction Social science students despise statistics. For one thing, most have a non-mathematical background, which makes understanding complex statistical equations very difficult. The major advantage in being taught statistics in the early 1990s (as I was) compared to the 1960s was the development of computer software to do all of the hard work. The advantage of learning statistics now rather than 15 years ago is that Windows/MacOS enable us to just click on stuff rather than typing in horribly confusing commands (although, as you will see, we can still type in horribly confusing commands if we want to). One of the most commonly used of these packages is Spss; what on earth possessed me to write a book on it? You know that youre a geek when you have favourite statistics textbooks; my favourites are Howell (2006), Stevens (2002) and Tabachnick and Fidell (2007). These three books are peerless as far as I am concerned and have taught me (and continue to teach me) more about statistics than you could possibly imagine. (I have an ambition to be cited in one of these books but I dont think that will ever happen.) So, why would I try to compete with these sacred tomes? Well, I wouldnt and I couldnt (intellectually these people are several leagues above me). However, these wonderful and clear books use computer examples as addenda to the theory. The advent of programs like Spss provides the unique opportunity to teach statistics at a conceptual level without getting too bogged down in equations. However, many Spss books concentrate on doing the test at the expense of theory. Using Spss without any statistical knowledge at all can be a dangerous thing (unfortunately, at the moment Spss is a rather stupid tool, and it relies heavily on the users knowing what they are doing). As such, this book is an attempt to strike a good balance between theory and practice: I want to use Spss as a tool for teaching statistical concepts in the hope that you will gain a better understanding of both theory and practice. Primarily, I want to answer the kinds of questions that I found myself asking while learning statistics and using Spss as an undergraduate (things like How can I understand how this statistical test works without knowing too much about the maths behind it?, What does that button do?, What the hell does this output mean?). Like most academics Im slightly high on the autistic spectrum, and I used to get fed up with people telling me to ignore options or ignore that bit of the output. I would lie awake for hours in my bed every night wondering Why is that bit of Spss output there if we just ignore it? So that no student has to suffer the mental anguish that I did, I aim to explain what different options do, what bits of the output mean, and if we ignore something, why we ignore it. Furthermore, I want to be non-prescriptive. Too many books tell the reader what to do (click on this button, do this, do that, etc.) and this can create the impression that statistics and Spss are inflexible. Spss has many options designed to allow you to tailor a given test to your particular needs. Therefore, although I make recommendations, xix 18. xxD I S C O VE R I N G STAT I ST I C S US I N G S PSSwithin the limits imposed by the senseless destruction of rainforests, I hope to give you enough background in theory to enable you to make your own decisions about which options are appropriate for the analysis you want to do. A second, not in any way ridiculously ambitious, aim was to make this the only statistics textbook that anyone ever needs to buy. As such, its a book that I hope will become your friend from first year right through to your professorship. Ive tried, therefore, to write a book that can be read at several levels (see the next section for more guidance). There are chapters for first-year undergraduates (1, 2, 3, 4, 5, 6, 9 and 15), chapters for second-year undergraduates (5, 7, 10, 11, 12, 13 and 14) and chapters on more advanced topics that postgraduates might use (8, 16, 17, 18 and 19). All of these chapters should be accessible to everyone, and I hope to achieve this by flagging the level of each section (see the next section). My third, final and most important aim is make the learning process fun. I have a sticky history with maths because I used to be terrible at it:Above is an extract of my school report at the age of 11. The 27 in the report is to say that I came equal 27th with another student out of a class of 29. Thats almost bottom of the class. The 43 is my exam mark as a percentage! Oh dear. Four years later (at 15) this was my school report:What led to this remarkable change? It was having a good teacher: my brother, Paul. In fact I owe my life as an academic to Pauls ability to do what my maths teachers couldnt: teach me stuff in an engaging way. To this day he still pops up in times of need to teach me things (a crash course in computer programming some Christmases ago springs to mind). Anyway, the reason hes a great teacher is because hes able to make things interesting and relevant to me. Sadly he seems to have got the good teaching genes in the family (and he doesnt even work as a bloody teacher, so theyre wasted!), but his approach inspires my lectures and books. One thing that I have learnt is that people appreciate the human touch, and so in previous editions I tried to inject a lot of my own personality and sense of humour (or lack of ). Many of the examples in this book, although inspired by some of the craziness that you find in the real world, are designed to reflect topics that play on the minds of the average student (i.e. sex, drugs, rock and roll, celebrity, people doing crazy stuff). There are also some examples that are there just because they made me laugh. So, the examples are light-hearted (some have said smutty but I prefer light-hearted) and by the end, for better or worse, I think you will have some idea of what goes on in my head on a daily basis! 19. PREFAC EWhats new? Seeing as some people appreciated the style of the previous editions Ive taken this as a green light to include even more stupid examples, more smut and more bad taste. I apologise to those who think its crass, hate it, or think that Im undermining the seriousness of science, but, come on, whats not funny about a man putting an eel up his anus? Aside from adding more smut, I was forced reluctantly to expand the academic content! Most of the expansions have resulted from someone (often several people) emailing me to ask how to do something. So, in theory, this edition should answer any question anyone has asked me over the past four years! Mind you, I said that last time and still the questions come (will I never be free?). The general changes in the book are: MMMMMMMMMMMMMMMMMMMMMore introductory material: The first chapter in the last edition was like sticking your brain into a food blender. I rushed chaotically through the entire theory of statistics in a single chapter at the pace of a cheetah on speed. I didnt really bother explaining any basic research methods, except when, out of the blue, Id stick a section in some random chapter, alone and looking for friends. This time, I have written a brand-new Chapter 1, which eases you gently through the research process why and how we do it. I also bring in some basic descriptive statistics at this point too. More graphs: Graphs are very important. In the previous edition information about plotting graphs was scattered about in different chapters making it hard to find. What on earth was I thinking? Ive now written a self-contained chapter on how to use Spsss Chart Builder. As such, everything you need to know about graphs (and I added a lot of material that wasnt in the previous edition) is now in Chapter 4. More assumptions: All chapters now have a section towards the end about what to do when assumptions are violated (although these usually tell you that Spss cant do what needs to be done!). More data sets: You can never have too many examples, so Ive added a lot of new data sets. There are 30 new data sets in the book at the last count (although Im not very good at maths so it could be a few more or less). More stupid faces: I have added some more characters with stupid faces because I find stupid faces comforting, probably because I have one. You can find out more in the next section. Miraculously, the publishers stumped up some cash to get them designed by someone who can actually draw. More reporting your analysis: OK, I had these sections in the previous edition too, but then in some chapters I just seemed to forget about them for no good reason. This time every single chapter has one. More glossary: Writing the glossary last time nearly made me stick a vacuum cleaner into my ear to suck out my own brain. I thought I probably ought to expand it a bit. You can find my brain in the bottom of the vacuum cleaner in my house. New! Its colour: The publishers went full colour. This means that (1) I had to redo all of the diagrams to take advantage of the colour format, and (2) If you lick the orange bits they taste of orange (it amuses me that someone might try this to see whether Im telling the truth). New! Real-world data: Lots of people said that they wanted more real data to play with. The trouble is that real research can be quite boring. However, just for you, I trawled the world for examples of research on really fascinating topics (in my opinion). I then stalked the authors of the research until they gave me their data. Every chapter now has a real research example. New! Self-test questions: Everyone loves an exam, dont they? Well, everyone that is apart from people who breathe. Given how much everyone hates tests, I thought thexxi 20. xxiiD I S C O VE R I N G STAT I ST I C S US I N G S PSSbest way to commit commercial suicide was to liberally scatter tests throughout each chapter. These range from simple questions to test out what you have just learned to going back to a technique that you read about several chapters before and applying it in a new context. All of these questions have answers to them on the companion website. They are there so that you can check on your progress. MMMMMMMMMMMMNew! Spss tips: Spss does weird things sometimes. In each chapter, Ive included boxes containing tips, hints and pitfalls related to Spss. New! Spss 17 compliant: Spss 17 looks different to earlier versions but in other respects is much the same. I updated the material to reflect the latest editions of Spss. New! Flash movies: Ive recorded some flash movies of using Spss to accompany each chapter. Theyre on the companion website. They might help you if you get stuck. New! Additional material: Enough trees have died in the name of this book, but still it gets longer and still people want to know more. Therefore, Ive written nearly 300 pages, yes, three hundred, of additional material for the book. So for some more technical topics and help with tasks in the book the material has been provided electronically so that (1) the planet suffers a little less, and (2) you can actually lift the book. New! Multilevel modelling: Its all the rage these days so I thought I should write a chapter on it. I didnt know anything about it, but I do now (sort of). New! Multinomial logistic regression: It doesnt get much more exciting than this; people wanted to know about logistic regression with several categorical outcomes and I always give people what they want (but only if they want smutty examples).All of the chapters now have Spss tips, self-test questions, additional material (Oliver Twisted boxes), real research examples (Labcoat Leni boxes), boxes on difficult topics (Jane Superbrain boxes) and flash movies. The specific changes in each chapter are: MMMMMMMMMMMMMMMMChapter 1 (Research methods): This is a completely new chapter. It basically talks about why and how to do research. Chapter 2 (Statistics): I spent a lot of time rewriting this chapter but it was such a long time ago that I cant really remember what I changed. Trust me, though; its much better than before. Chapter 3 (Spss): The old Chapter 2 is now Spss 17 compliant. I restructured a lot of the material, and added some sections on other forms of variables (strings and dates). Chapter 4 (Graphs): This chapter is completely new. Chapter 5 (Assumptions): This retains some of the material from the old Chapter 4, but Ive expanded the content to include PP and QQ plots, a lot of new content on homogeneity of variance (including the variance ratio) and a new section on robust methods. Chapter 6 (Correlation): The old Chapter 4; I redid one of the examples, added some material on confidence intervals for r, the biserial correlation, testing differences between dependent and independent rs and how certain eminent statisticians hate each other. Chapter 7 (Regression): This chapter was already so long that the publishers banned me from extending it! Nevertheless I rewrote a few bits to make them clearer, but otherwise its the same but with nicer diagrams and the bells and whistles that have been added to every chapter. Chapter 8 (Logistic regression): I changed the main example from one about theory of mind (which is now an end of chapter task) to one about putting eels up your anus to cure constipation (based on a true story). Does this help you understand logistic regression? Probably not, but it really kept me entertained for days. Ive extended the 21. xxiiiPREFAC Echapter to include multinomial logistic regression, which was a pain because I didnt know how to do it. MMMMMMMMMMMMMMMMMMMMMMChapter 9 (t-tests): I stripped a lot of the methods content to go in Chapter 1, so this chapter is more purely about the t-test now. I added some discussion on median splits, and doing t-tests from only the means and standard deviations. Chapter 10 (GLM 1): Is basically the same as the old Chapter 8. Chapter 11 (GLM 2): Similar to the old Chapter 9, but I added a section on assumptions that now discusses the need for the covariate and treatment effect to be independent. I also added some discussion of eta-squared and partial eta-squared (Spss produces partial eta-squared but I ignored it completely in the last edition). Consequently I restructured much of the material in this example (and I had to create a new data set when I realized that the old one violated the assumption that I had just spent several pages telling people not to violate). Chapter 12 (GLM 3): This chapter is ostensibly the same as the old Chapter 10, but with nicer diagrams. Chapter 13 (GLM 4): This chapter is more or less the same as the old Chapter 11. I edited it down quite a bit and restructured material so there was less repetition. I added an explanation of the between-participant sum of squares also. The first example (tutors marking essays) is now an end of chapter task, and the new example is one about celebrities eating kangaroo testicles on television. It needed to be done. Chapter 14 (GLM 5): This chapter is very similar to the old Chapter 12 on mixed Anova. Chapter 15 (Non-parametric statistics): This chapter is more or less the same as the old Chapter 13. Chapter 16 (MAnova): I rewrote a lot of the material on the interpretation of discriminant function analysis because I thought it pretty awful. Its better now. Chapter 17 (Factor analysis): This chapter is very similar to the old Chapter 15. I wrote some material on interpretation of the determinant. Im not sure why, but I did. Chapter 18 (Categorical data): This is similar to Chapter 16 in the previous edition. I added some material on interpreting standardized residuals. Chapter 19 (Multilevel linear models): This is a new chapter.Goodbye The first edition of this book was the result of two years (give or take a few weeks to write up my Ph.D.) of trying to write a statistics book that I would enjoy reading. The second edition was another two years of work and I was terrified that all of the changes would be the death of it. Youd think by now Id have some faith in myself. Really, though, having spent an extremely intense six months in writing hell, I am still hugely anxious that Ive just ruined the only useful thing that Ive ever done with my life. I can hear the cries of lecturers around the world refusing to use the book because of cruelty to eels. This book has been part of my life now for over 10 years; it began and continues to be a labour of love. Despite this it isnt perfect, and I still love to have feedback (good or bad) from the people who matter most: you. Andy (My contact details are at www.statisticshell.com.) 22. How To Use This BookWhen the publishers asked me to write a section on How to use this book it was obviously tempting to write Buy a large bottle of Olay anti-wrinkle cream (which youll need to fend off the effects of ageing while you read), find a comfy chair, sit down, fold back the front cover, begin reading and stop when you reach the back cover. However, I think they wanted something more useful.What background knowledge do I need? In essence, I assume you know nothing about statistics, but I do assume you have some very basic grasp of computers (I wont be telling you how to switch them on, for example) and maths (although I have included a quick revision of some very basic concepts so I really dont assume anything).Do the chapters get more difficult as I go through the book? In a sense they do (Chapter 16 on MAnova is more difficult than Chapter 1), but in other ways they dont (Chapter 15 on non-parametric statistics is arguably less complex than Chapter 14, and Chapter 9 on the t-test is definitely less complex than Chapter 8 on logistic regression). Why have I done this? Well, Ive ordered the chapters to make statistical sense (to me, at least). Many books teach different tests in isolation and never really give you a grip of the similarities between them; this, I think, creates an unnecessary mystery. Most of the tests in this book are the same thing expressed in slightly different ways. So, I wanted the book to tell this story. To do this I have to do certain things such as explain regression fairly early on because its the foundation on which nearly everything else is built! However, to help you through Ive coded each section with an icon. These icons are designed to give you an idea of the difficulty of the section. It doesnt necessarily mean you can skip the sections (but see Smart Alex in the next section), but it will let you know whether a section is at about your level, or whether its going to push you. Ive based the icons on my own teaching so they may not be entirely accurate for everyone (especially as systems vary in different countries!): 12xxiv This means level 1 and I equate this to first-year undergraduate in the UK. These are sections that everyone should be able to understand. This is the next level and I equate this to second-year undergraduates in the UK. These are topics that I teach my second years and so anyone with a bit of background in statistics should be able to get to grips with them. However, some of these sections will be quite challenging even for second years. These are intermediate sections. 23. How To U se This B oo k3 This is level 3 and represents difficult topics. Id expect third-year (final-year) UK undergraduates and recent postgraduate students to be able to tackle these sections.4 This is the highest level and represents very difficult topics. I would expect these sections to be very challenging to undergraduates and recent postgraduates, but postgraduates with a reasonable background in research methods shouldnt find them too much of a problem.Why do I keep seeing stupid faces everywhere? Brian Haemorrhage: Brians job is to pop up to ask questions and look permanently confused. Its no surprise to note, therefore, that he doesnt look entirely different from the author. As the book progresses he becomes increasingly despondent. Read into that what you will. Curious Cat: He also pops up and asks questions (because hes curious). Actually the only reason hes here is because I wanted a cat in the book and preferably one that looks like mine. Of course the educational specialists think he needs a specific role, and so his role is to look cute and make bad cat-related jokes.Cramming Sam: Samantha hates statistics. In fact, she thinks its all a boring waste of time and she just wants to pass her exam and forget that she ever had to know anything about normal distributions. So, she appears and gives you a summary of the key points that you need to know. If, like Samantha, youre cramming for an exam, she will tell you the essential information to save you having to trawl through hundreds of pages of my drivel. Jane Superbrain: Jane is the cleverest person in the whole universe (she makes Smart Alex look like a bit of an imbecile). The reason she is so clever is that she steals the brains of statisticians and eats them. Apparently they taste of sweaty tank tops, but nevertheless she likes them. As it happens, she is also able to absorb the contents of brains while she eats them. Having devoured some top statistics brains she knows all the really hard stuff and appears in boxes to tell you really advanced things that are a bit tangential to the main text. (Readers should note that Jane wasnt interested in eating my brain. That tells you all that you need to know about my statistics ability.) Labcoat Leni: Leni is a budding young scientist and hes fascinated by real research. He says, Andy, man, I like an example about using an eel as a cure for constipation as much as the next man, but all of your examples are made up. Real data arent like that, we need some real examples, dude! So off Leni went; he walked the globe, a lone data warrior in a thankless quest for real data. He turned up at universities, cornered academics, kidnapped their families and threatened to put them in a bath of crayfish unless he was given real data. The generous ones relented, but others? Well, lets just say their families are sore. So, when you see Leni you know that you will get some real data, from a real research study to analyse. Keep it real. Oliver Twisted: With apologies to Charles Dickens, Oliver, like his more famous fictional London urchin, is always asking, Please sir, can I have some more? Unlike Master Twist, though, our young Master Twisted always wants more statistics information. Of course he does, who wouldnt? Let us not be the ones to disappoint a young, dirty, slightly smelly boy who dines on gruel, so when Oliver appears you can be certain of one thing: there is additional information to be found on the companion website. (Dont be shy; download it and bathe in the warm asps milk of knowledge.)xxv 24. xxviD I S C O VE R I N G STAT I ST I C S US I N G S PSSSatans Personal Statistics Slave: Satan is a busy boy he has all of the lost souls to torture in hell; then there are the fires to keep fuelled, not to mention organizing enough carnage on the planets surface to keep Norwegian black metal bands inspired. Like many of us, this leaves little time for him to analyse data, and this makes him very sad. So, he has his own personal slave, who, also like some of us, spends all day dressed in a gimp mask and tight leather pants in front of Spss analysing Satans data. Consequently, he knows a thing or two about Spss, and when Satans busy spanking a goat, he pops up in a box with Spss tips.Smart Alex: Alex is a very important character because he appears when things get particularly difficult. Hes basically a bit of a smart alec and so whenever you see his face you know that something scary is about to be explained. When the hard stuff is over he reappears to let you know that its safe to continue. Now, this is not to say that all of the rest of the material in the book is easy, he just lets you know the bits of the book that you can skip if youve got better things to do with your life than read all 800 pages! So, if you see Smart Alex then you can skip the section entirely and still understand whats going on. Youll also find that Alex pops up at the end of each chapter to give you some tasks to do to see whether youre as smart as he is.What is on the companion website? In this age of downloading, CD-ROMs are for losers (at least thats what the kids tell me) so this time around Ive put my cornucopia of additional funk on that worldwide interweb thing. This has two benefits: (1) The book is slightly lighter than it would have been, and (2) rather than being restricted to the size of a CD-ROM, there is no limit to the amount of fascinating extra material that I can give you (although Sage have had to purchase a new server to fit it all on). To enter my world of delights, go to www.sagepub.co.uk/field3e (see the image on the next page). How will you know when there are extra goodies on this website? Easy-peasy, Oliver Twisted appears in the book to indicate that theres something you need (or something extra) on the website. The website contains resources for students and lecturers alike: MMMMMMData files: You need data files to work through the examples in the book and they are all on the companion website. We did this so that youre forced to go there and once youre there you will never want to leave. There are data files here for a range of students, including those studying psychology, business and health sciences. Flash movies: Reading is a bit boring; its much more amusing to listen to me explaining things in my camp English accent. Therefore, so that you can all have laugh at Andy parties, I have created flash movies for each chapter that show you how to do the SPSS examples. Ive also done extra ones that show you useful things that would otherwise have taken me pages of drivel to explain. Some of these movies are open access, but because the publishers want to sell some books, others are available only to lecturers. The idea is that they can put them on their virtual learning environments. If they dont, put insects under their office doors. Podcast: My publishers think that watching a film of me explaining what this book is all about is going to get people flocking to the bookshop. I think it will have people flocking to the medicine cabinet. Either way, if you want to see how truly uncharismatic I am, watch and cringe. 25. How To U se This B oo kMMMMMMMMMMSelf-assessment multiple-choice questions: Organized by chapter, these will allow you to test whether wasting your life reading this book has paid off so that you can walk confidently into an examination much to the annoyance of your friends. If you fail said exam, you can employ a good lawyer and sue me. Flashcard glossary: As if a printed glossary wasnt enough, my publishers insisted that youd like one in electronic format too. Have fun here flipping about between terms and definitions that are covered in the textbook, its better than actually learning something. Additional material: Enough trees have died in the name of this book, but still it gets longer and still people want to know more. Therefore, Ive written nearly 300 pages, yes, three hundred, of additional material for the book. So for some more technical topics and help with tasks in the book the material has been provided electronically so that (1) the planet suffers a little less, and (2) you can actually lift the book. Answers: each chapter ends with a set of tasks for you to test your newly acquired expertise. The chapters are also littered with self-test questions. How will you know if you get these correct? Well, the companion website contains around 300 hundred pages (thats a different three hundred pages to the three hundred above) of detailed answers. Will I ever stop writing? Cyberworms of knowledge: I have used nanotechnology to create cyberworms that crawl down your broadband connection, pop out of the USB port of your computer then fly through space into your brain. They re-arrange your neurons so that you understand statistics. You dont believe me? Well, youll never know for sure unless you visit the companion website Happy reading, and dont get sidetracked by Facebook.xxvii 26. AcknowledgementsThe first edition of this book wouldnt have happened if it hadnt been for Dan Wright, who not only had an unwarranted faith in a then-postgraduate to write the book, but also read and commented on draft chapters in all three editions. Im really sad that he is leaving England to go back to the United States. The last two editions have benefited from the following people emailing me with comments, and I really appreciate their contributions: John Alcock, Aliza Berger-Cooper, Sanne Bongers, Thomas Brgger, Woody Carter, Brittany Cornell, Peter de Heus, Edith de Leeuw, Sanne de Vrie, Jaap Dronkers, Anthony Fee, Andy Fugard, Massimo Garbuio, Ruben van Genderen, Daniel Hoppe, Tilly Houtmans, Joop Hox, Suh-Ing (Amy) Hsieh, Don Hunt, Laura Hutchins-Korte, Mike Kenfield, Ned Palmer, Jim Parkinson, Nick Perham, Thusha Rajendran, Paul Rogers, Alf Schabmann, Mischa Schirris, Mizanur Rashid Shuvra, Nick Smith, Craig Thorley, Paul Tinsley, Keith Tolfrey, Frederico Torracchi, Djuke Veldhuis, Jane Webster and Enrique Woll. In this edition I have incorporated data sets from real research papers. All of these research papers are studies that I find fascinating and its an honour for me to have these researchers data in my book: Hakan etinkaya, Tomas Chamorro-Premuzic, Graham Davey, Mike Domjan, Gordon Gallup, Eric Lacourse, Sarah Marzillier, Geoffrey Miller, Peter Muris, Laura Nichols and Achim Schetzwohl. Jeremy Miles stopped me making a complete and utter fool of myself (in the book sadly his powers dont extend to everyday life) by pointing out some glaring errors; hes also been a very nice person to know over the past few years (apart from when hes saying that draft sections of my books are, and I quote, bollocks!). David Hitchin, Laura Murray, Gareth Williams and Lynne Slocombe made an enormous contribution to the last edition and all of their good work remains in this edition. In this edition, Zo Nightingales unwavering positivity and suggestions for many of the new chapters were invaluable. My biggest thanks go to Kate Lester who not only read every single chapter, but also kept my research laboratory ticking over while my mind was on this book. I literally could not have done it without her support and constant offers to take on extra work that she did not have to do so that I could be a bit less stressed. I am very lucky to have her in my research team. All of these people have taken time out of their busy lives to help me out. Im not sure what that says about their mental states, but they are all responsible for a great many improvements. May they live long and their data sets be normal. Not all contributions are as tangible as those above. With the possible exception of them not understanding why sometimes I dont answer my phone, I could not have asked for more loving and proud parents a fact that I often take for granted. Also, very early in my career Graham Hole made me realize that teaching research methods didnt have to be dull. My whole approach to teaching has been to steal all of his good ideas and Im pleased that he has had the good grace not to ask for them back! He is also a rarity in beingxxviii 27. Ac kno wl edge mentsbrilliant, funny and nice. I also thank my Ph.D. students Carina Ugland, Khanya PriceEvans and Saeid Rohani for their patience for the three months that I was physically away in Rotterdam, and for the three months that I was mentally away upon my return. I appreciate everyone who has taken time to write nice reviews of this book on the various Amazon sites around the world (or any other website for that matter!). The success of this book has been in no small part due to these people being so positive and constructive in their reviews. I continue to be amazed and bowled over by the nice things that people write and if any of you are ever in Brighton, I owe you a pint! The people at Sage are less hardened drinkers than they used to be, but I have been very fortunate to work with Michael Carmichael and Emily Jenner. Mike, despite his failings on the football field(!), has provided me with some truly memorable nights out and he also read some of my chapters this time around which, as an editor, made a pleasant change. Both Emily and Mike took a lot of crap from me (especially when I was tired and stressed) and Im grateful for their understanding. Emily Im sure thinks Im a grumpy sod, but she did a better job of managing me than she realizes. Also, Alex Lee did a fantastic job of turning the characters in my head into characters on the page. Thanks to Jill Rietema at Spss Inc. who has been incredibly helpful over the past few years; it has been a pleasure working with her. The book (obviously) would not exist without Spss Inc.s kind permission to use screenshots of their software. Check out their web pages (http://www.spss.com) for support, contact information and training opportunities. I wrote much of this edition while on sabbatical at the Department of Psychology at the Erasmus University, Rotterdam, The Netherlands. Im grateful to the clinical research group (especially the white ape posse!) who so unreservedly made me part of the team. Part of me definitely stayed with you when I left I hope it isnt annoying you too much. Mostly, though, I thank Peter (Muris), Birgit (Mayer), Jip and Kiki who made me part of their family while in Rotterdam. They are all inspirational. Im grateful for their kindness, hospitality, and for not getting annoyed when I was still in their kitchen having drunk all of their wine after the last tram home had gone. Mostly, I thank them for the wealth of happy memories that they gave me. I always write listening to music. For the previous editions, I owed my sanity to: Abba, AC/DC, Arvo Prt, Beck, The Beyond, Blondie, Busta Rhymes, Cardiacs, Cradle of Filth, DJ Shadow, Elliott Smith, Emperor, Frank Black and the Catholics, Fugazi, Genesis (Peter Gabriel era), Hefner, Iron Maiden, Janes Addiction, Love, Metallica, Massive Attack, Mercury Rev, Morrissey, Muse, Nevermore, Nick Cave, Nusrat Fateh Ali Khan, Peter Gabriel, Placebo, Quasi, Radiohead, Sevara Nazarkhan, Slipknot, Supergrass and The White Stripes. For this edition, I listened to the following, which I think tells you all that you need to know about my stress levels: 1349, Air, Angantyr, Audrey Horne, Cobalt, Cradle of Filth, Danzig, Dark Angel, Darkthrone, Death Angel, Deathspell Omega, Exodus, Fugazi, Genesis, High on Fire, Iron Maiden, The Mars Volta, Manowar, Mastodon, Megadeth, Meshuggah, Opeth, Porcupine Tree, Radiohead, Rush, Serj Tankian, She Said!, Slayer, Soundgarden, Taake, Tool and the Wedding Present. Finally, all this book-writing nonsense requires many lonely hours (mainly late at night) of typing. Without some wonderful friends to drag me out of my dimly lit room from time to time Id be even more of a gibbering cabbage than I already am. My eternal gratitude goes to Graham Davey, Benie MacDonald, Ben Dyson, Martin Watts, Paul Spreckley, Darren Hayman, Helen Liddle, Sam Cartwright-Hatton, Karina Knowles and Mark Franklin for reminding me that there is more to life than work. Also, my eternal gratitude to Gini Harrison, Sam Pehrson and Luke Anthony and especially my brothers of metal Doug Martin and Rob Mepham for letting me deafen them with my drumming on a regular basis. Finally, thanks to Leonora for her support while I was writing the last two editions of this book.xxix 28. Dedication Like the previous editions, this book is dedicated to my brother Paul and my cat Fuzzy, because one of them is a constant source of intellectual inspiration and the other wakes me up in the morning by sitting on me and purring in my face until I give him cat food: mornings will be considerably more pleasant when my brother gets over his love of cat food for breakfast. 29. Symbols used in this bookMathematical operators This symbol (called sigma) means add everything up. So, if you see something like xi it just means add up all of the scores youve collected.This symbol means multiply everything. So, if you see something like xi it just means multiply all of the scores youve collected.xThis means take the square root of x.Greek symbols The probability of making a Type I errorThe probability of making a Type II erroriStandardized regression coefficient2Chi-square test statistic2 FFriedmans Anova test statisticUsually stands for errorEta-squaredThe mean of a population of scoresThe correlation in the population22The variance in a population of dataThe standard deviation in a population of datax The standard error of the meanKendalls tau (non-parametric correlation coefficient)2Omega squared (an effect size measure). This symbol also means expel the contents of your intestine immediately into your trousers; you will understand why in due coursexxxi 30. xxxiiD I S C O VE R I N G STAT I ST I C S US I N G S PSSEnglish symbols biThe regression coefficient (unstandardized)dfDegrees of freedomeiThe error associated with the ith personFF-ratio (test statistic used in Anova)HKruskalWallis test statistickThe number of levels of a variable (i.e. the number of treatment conditions), or the number of predictors in a regression modellnNatural logarithmMSThe mean squared error (Mean Square). The average variability in the dataN, n, niThe sample size. N usually denotes the total sample size, whereas n usually denotes the size of a particular groupPProbability (the probability value, p-value or significance of a test are usually denoted by p)rPearsons correlation coefficientrsSpearmans rank correlation coefficientrb, rpbBiserial correlation coefficient and pointbiserial correlation coefficient respectivelyRThe multiple correlation coefficientR2The coefficient of determination (i.e. the proportion of data explained by the model)s2The variance of a sample of datasThe standard deviation of a sample of dataSSThe sum of squares, or sum of squared errors to give it its full titleSSAThe sum of squares for variable ASSMThe model sum of squares (i.e. the variability explained by the model fitted to the data)SSRThe residual sum of squares (i.e. the variability that the model cant explain the error in the model)SSTThe total sum of squares (i.e. the total variability within the data)tTest statistic for Students t-testTTest statistic for Wilcoxons matched-pairs signed-rank testUTest statistic for the MannWhitney testWsTest statistic for Wilcoxons rank-sum test X or xThe mean of a sample of scoreszA data point expressed in standard deviation units 31. Some maths revision1 Two negatives make a positive: Although in life two wrongs dont make a right, in mathematics they do! When we multiply a negative number by another negative number, the result is a positive number. For example, 2 4 = 8. 2 A negative number multiplied by a positive one make a negative number: If you multiply a positive number by a negative number then the result is another negative number. For example, 2 4 = 8, or 2 6 = 12. 3 BODMAS: This is an acronym for the order in which mathematical operations are performed. It stands for Brackets, Order, Division, Multiplication, Addition, Subtraction and this is the order in which you should carry out operations within an equation. Mostly these operations are self-explanatory (e.g. always calculate things within brackets first) except for order, which actually refers to power terms such as squares. Four squared, or 42, used to be called four raised to the order of 2, hence the reason why these terms are called order in BODMAS (also, if we called it power, wed end up with BPDMAS, which doesnt roll off the tongue quite so nicely). Lets look at an example of BODMAS: what would be the result of 1 + 3 52? The answer is 76 (not 100 as some of you might have thought). There are no brackets so the first thing is to deal with the order term: 52 is 25, so the equation becomes 1 + 3 25. There is no division, so we can move on to multiplication: 3 25, which gives us 75. BODMAS tells us to deal with addition next: 1 + 75, which gives us 76 and the equation is solved. If Id written the original equation as (1 + 3) 52, then the answer would have been 100 because we deal with the brackets first: (1 + 3) = 4, so the equation becomes 4 52. We then deal with the order term, so the equation becomes 4 25 = 100! 4 http://www.easymaths.com is a good site for revising basic maths.xxxiii 32. Why is my evil lecturer forcing me to learn statistics?1Figure 1.1 When I grow up, please dont let me be a statistics lecturer1.1. What will this chapter tell me?1I was born on 21 June 1973. Like most people, I dont remember anything about the first few years of life and like most children I did go through a phase of driving my parents mad by asking Why? every five seconds. Dad, why is the sky blue?, Dad, why doesnt mummy have a willy? etc. Children are naturally curious about the world. I remember at the age of 3 being at a party of my friend Obe (this was just before he left England to return to Nigeria, much to my distress). It was a hot day, and there was an electric fan blowing cold air around the room. As I said, children are natural scientists and my little scientific brain was working through what seemed like a particularly pressing question: What happens when you stick your finger into a fan? The answer, as it turned out, was that it hurts a lot.1 My point is this: my curiosity to explain the world never went away, and thats why 1 In the 1970s fans didnt have helpful protective cages around them to prevent idiotic 3 year olds sticking their fingers into the blades.1 33. 2D I S C O VE R I N G STAT I ST I C S US I N G S PSSIm a scientist, and thats also why your evil lecturer is forcing you to learn statistics. Its because you have a curious mind too and you want to answer new and exciting questions. To answer these questions we need statistics. Statistics is a bit like sticking your finger into a revolving fan blade: sometimes its very painful, but it does give you the power to answer interesting questions. This chapter is going to attempt to explain why statistics are an important part of doing research. We will overview the whole research process, from why we conduct research in the first place, through how theories are generated, to why we need data to test these theories. If that doesnt convince you to read on then maybe the fact that we discover whether Coca-Cola kills sperm will. Or perhaps not.1.2. What the hell am I doing here? I dont belong here 1 Youre probably wondering why you have bought this book. Maybe you liked the pictures, maybe you fancied doing some weight training (it is heavy), or perhaps you need to reach something in a high place (it is thick). The chances are, though, that given the choice of spending your hard-earned cash on a statistics book or something more entertaining (a nice novel, a trip to the cinema, etc.) youd choose the latter. So, why have you bought the book (or downloaded an illegal pdf of it from someone who has way too much time on their hands if they can scan an 800-page textbook)? Its likely that you obtained it because youre doing a course on statistics, or youre doing some research, and you need to know how to analyse data. Its possible that you didnt realize when you started your course or research that youd have to know this much about statistics but now find yourself inexplicably wading, neck high, through the Victorian sewer that is data analysis. The reason that youre in the mess that you find yourself in is because you have a curious mind. You might have asked yourself questions like why people behave the way they do (psychology) or why behaviours differ across cultures (anthropology), how businesses maximize their profit (business), how did the dinosaurs die (palaeontology), does eating tomatoes protect you against cancer (medicine, biology), is it possible to build a quantum computer (physics, chemistry), is the planet hotter than it used to be and in what regions (geography, environmental studies)? Whatever it is youre studying or researching, the reason youre studying it is probably because youre interested in answering questions. Scientists are curious people, and you probably are too. However, you might not have bargained on the fact that to answer interesting questions, you need two things: data and an explanation of those data. The answer to what the hell are you doing here? is, therefore, simple: to answer interesting questions you need data. Therefore, one of the reasons why your evil statistics lecturer is forcing you to learn about numbers is because they are a form of data and are vital to the research process. Of course there are forms of data other than numbers that can be used to test and generate theories. When numbers are involved the research involves quantitative methods, but you can also generate and test theories by analysing language (such as conversations, magazine articles, media broadcasts and so on). This involves qualitative methods and it is a topic for another book not written by me. People can get quite passionate about which of these methods is best, which is a bit silly because they are complementary, not competing, approaches and there are much more important issues in the world to get upset about. Having said that, all qualitative research is rubbish.2 This is a joke. I thought long and hard about whether to include it because, like many of my jokes, there are people who wont find it remotely funny. Its inclusion is also making me fear being hunted down and forced to eat my own entrails by a hoard of rabid qualitative researchers. However, it made me laugh, a lot, and despite being vegetarian Im sure my entrails will taste lovely. 2 34. 3CHAPTE R 1 Wh y is my evi l lecturer f orcing m e to l earn statistics ?1.2.1. The research process1How do I do How do you go about answering an interesting question? The research process is research? broadly summarized in Figure 1.2. You begin with an observation that you want to understand, and this observation could be anecdotal (youve noticed that your cat watches birds when theyre on TV but not when jellyfish are on3) or could be based on some data (youve got several cat owners to keep diaries of their cats TV habits and have noticed that lots of them watch birds on TV). From your initial observation you generate explanations, or theories, of those observations, from which you can make predictions (hypotheses). Heres where the data come into the process because to test your predictions you need data. First you collect some relevant data (and to do that you need to identify things that can be measured) and then you analyse those data. The analysis of the data may support your theory or give you cause to modify the theory. As such, the processes of data collection and analysis and generating theories are intrinsically linked: theories lead to data collection/analysis and data collection/analysis informs theories! This chapter explains this research process in more detail.DataInitial Observation (Research Question)Generate TheoryIdentify VariablesGenerate HypothesesMeasure VariablesCollect Data to Test TheoryGraph Data Fit a ModelAnalyse Data1.3. Initial observation: finding something that needs explaining 1 The first step in Figure 1.2 was to come up with a question that needs an answer. I spend rather more time than I should watching reality TV Every year I swear that I wont get . hooked on Big Brother, and yet every year I find myself glued to the TV screen waiting for 3My cat does actually climb up and stare at the TV when its showing birds flying about.Figure 1.2 The research process 35. 4D I S C O VE R I N G STAT I ST I C S US I N G S PSSthe next contestants meltdown (I am a psychologist, so really this is just research honestly). One question I am constantly perplexed by is why every year there are so many contestants with really unpleasant personalities (my money is on narcissistic personality disorder4) on the show. A lot of scientific endeavour starts this way: not by watching Big Brother, but by observing something in the world and wondering why it happens. Having made a casual observation about the world (Big Brother contestants on the whole have profound personality defects), I need to collect some data to see whether this observation is true (and not just a biased observation). To do this, I need to define one or more variables that I would like to measure. Theres one variable in this example: the personality of the contestant. I could measure this variable by giving them one of the many wellestablished questionnaires that measure personality characteristics. Lets say that I did this and I found that 75% of contestants did have narcissistic personality disorder. These data support my observation: a lot of Big Brother contestants have extreme personalities.1.4. Generating theories and testing them1The next logical thing to do is to explain these data (Figure 1.2). One explanation could be that people with narcissistic personality disorder are more likely to audition for Big Brother than those without. This is a theory. Another possibility is that the producers of Big Brother are more likely to select people who have narcissistic personality disorder to be contestants than those with less extreme personalities. This is another theory. We verified our original observation by collecting data, and we can collect more data to test our theories. We can make two predictions from these two theories. The first is that the number of people turning up for an audition that have narcissistic personality disorder will be higher than the general level in the population (which is about 1%). A prediction from a theory, like this one, is known as a hypothesis (see Jane Superbrain Box 1.1). We could test this hypothesis by getting a team of clinical psychologists to interview each person at the Big Brother audition and diagnose them as having narcissistic personality disorder or not. The prediction from our second theory is that if the Big Brother selection panel are more likely to choose people with narcissistic personality disorder then the rate of this disorder in the final contestants will be even higher than the rate in the group of people going for auditions. This is another hypothesis. Imagine we collected these data; they are in Table 1.1. In total, 7662 people turned up for the audition. Our first hypothesis is that the percentage of people with narcissistic personality disorder will be higher at the audition than the general level in the population. We can see in the table that of the 7662 people at the audition, Table 1.1 A table of the number of people at the Big Brother audition split by whether they had narcissistic personality disorder and whether they were selected as contestants by the producers No DisorderDisorderTotalSelected 3 9 12Rejected68058457650Total68088547662This disorder is characterized by (among other things) a grandiose sense of self-importance, arrogance, lack of empathy for others, envy of others and belief that others envy them, excessive fantasies of brilliance or beauty, the need for excessive admiration and exploitation of others. 4 36. CHAPTE R 1 Wh y is my evi l lecturer f orcing m e to l earn statistics ?5854 were diagnosed with the disorder, this is about 11% (854/7662 100) which is much higher than the 1% wed expect. Therefore, hypothesis 1 is supported by the data. The second hypothesis was that the Big Brother selection panel have a bias to choose people with narcissistic personality disorder. If we look at the 12 contestants that they selected, 9 of them had the disorder (a massive 75%). If the producers did not have a bias we would have expected only 11% of the contestants to have the disorder. The data again support our hypothesis. Therefore, my initial observation that contestants have personality disorders was verified by data, then my theory was tested using specific hypotheses that were also verified using data. Data are very important!JANE SUPERBRAIN 1.1 When is a hypothesis not a hypothesis?1A good theory should allow us to make statements about the state of the world. Statements about the world are good things: they allow us to make sense of our world, and to make decisions that affect our future. One current example is global warming. Being able to make a definitive statement that global warming is happening, and that it is caused by certain practices in society, allows us to change these practices and, hopefully, avert catastrophe. However, not all statements are ones that can be tested using science. Scientific statements are ones that can be verified with reference to empirical evidence, whereas non-scientific statements are ones that cannotbe empirically tested. So, statements such as The Led Zeppelin reunion concert in London in 2007 was the best gig ever,5 Lindt chocolate is the best food, and This is the worst statistics book in the world are all non-scientific; they cannot be proved or disproved. Scientific statements can be confirmed or disconfirmed empirically. Watching Curb Your Enthusiasm makes you happy, having sex increases levels of the neurotransmitter dopamine and Velociraptors ate meat are all things that can be tested empirically (provided you can quantify and measure the variables concerned). Non-scientific statements can sometimes be altered to become scientific statements, so The Beatles were the most influential band ever is non-scientific (because it is probably impossible to quantify influence in any meaningful way) but by changing the statement to The Beatles were the best-selling band ever it becomes testable (we can collect data about worldwide record sales and establish whether The Beatles have, in fact, sold more records than any other music artist). Karl Popper, the famous philosopher of science, believed that non-scientific statements were nonsense, and had no place in science. Good theories should, therefore, produce hypotheses that are scientific statements.I would now be smugly sitting in my office with a contented grin on my face about how my theories and observations were well supported by the data. Perhaps I would quit while Im ahead and retire. Its more likely, though, that having solved one great mystery, my excited mind would turn to another. After another few hours (well, days probably) locked up at home watching Big Brother I would emerge triumphant with another profound observation, which is that these personality-disordered contestants, despite their obvious character flaws, enter the house convinced that the public will love them and that they will win.6 My hypothesis would, therefore, be that if I asked the contestants if they thought that they would win, the people with a personality disorder would say yes. It was pretty awesome actually. One of the things I like about Big Brother in the UK is that year upon year the winner tends to be a nice person, which does give me faith that humanity favours the nice.5 6 37. 6Are Big Brother contestants odd?D I S C O VE R I N G STAT I ST I C S US I N G S PSSLets imagine I tested my hypothesis by measuring their expectations of success in the show, by just asking them, Do you think you will win Big Brother?. Lets say that 7 of 9 contestants with personality disorders said that they thought that they will win, which confirms my observation. Next, I would come up with another theory: these contestants think that they will win because they dont realize that they have a personality disorder. My hypothesis would be that if I asked these people about whether their personalities were different from other people they would say no. As before, I would collect some more data and perhaps ask those who thought that they would win whether they thought that their personalities were different from the norm. All 7 contestants said that they thought their personalities were different from the norm. These data seem to contradict my theory. This is known as falsification, which is the act of disproving a hypothesis or theory. Its unlikely that we would be the only people interested in why individuals who go on Big Brother have extreme personalities and think that they will win. Imagine these researchers discovered that: (1) people with narcissistic personality disorder think that they are more interesting than others; (2) they also think that they deserve success more than others; and (3) they also think that others like them because they have special personalities. This additional research is even worse news for my theory: if they didnt realize that they had a personality different from the norm then you wouldnt expect them to think that they were more interesting than others, and you certainly wouldnt expect them to think that others will like their unusual personalities. In general, this means that my theory sucks: it cannot explain all of the data, predictions from the theory are not supported by subsequent data, and it cannot explain other research findings. At this point I would start to feel intellectually inadequate and people would find me curled up on my desk in floods of tears wailing and moaning about my failing career (no change there then). At this point, a rival scientist, Fester Ingpant-Stain, appears on the scene with a rival theory to mine. In his new theory, he suggests that the problem is not that personality-disordered contestants dont realize that they have a personality disorder (or at least a personality that is unusual), but that they falsely believe that this special personality is perceived positively by other people (put another way, they believe that their personality makes them likeable, not dislikeable). One hypothesis from this model is that if personality-disordered contestants are asked to evaluate what other people think of them, then they will overestimate other peoples positive perceptions. To test this hypothesis, Fester Ingpant-Stain collected yet more data. When each contestant came to the diary room they had to fill out a questionnaire evaluating all of the other contestants personalities, and also answer each question as if they were each of the contestants responding about them. (So, for every contestant there is a measure of what they thought of every other contestant, and also a measure of what they believed every other contestant thought of them.) He found out that the contestants with personality disorders did overestimate their housemates view of them; in comparison the contestants without personality disorders had relatively accurate impressions of what others thought of them. These data, irritating as it would be for me, support the rival theory that the contestants with personality disorders know they have unusual personalities but believe that these characteristics are ones that others would feel positive about. Fester Ingpant-Stains theory is quite good: it explains the initial observations and brings together a range of research findings. The end result of this whole process (and my career) is that we should be able to make a general statement about the state of the world. In this case we could state: Big Brother contestants who have personality disorders overestimate how much other people like their personality characteristics.SELF-TEST Based on what you have read in this section, what qualities do you think a scientific theory should have? 38. 7CHAPTE R 1 Wh y is my evi l lecturer f orcing m e to l earn statistics ?1.5. Data collection 1: what to measure1We have seen already that data collection is vital for testing theories. When we collect data we need to decide on two things: (1) what to measure, (2) how to measure it. This section looks at the first of these issues.1.5.1. Variables11.5.1.1.Independent and dependent variables1To test hypotheses we need to measure variables. Variables are just things that can change (or vary); they might vary between people (e.g. IQ, behaviour) or locations (e.g. unemployment) or even time (e.g. mood, profit, number of cancerous cells). Most hypotheses can be expressed in terms of two variables: a proposed cause and a proposed outcome. For example, if we take the scientific statement Coca-Cola is an effective spermicide7 then proposed cause is CocaCola and the proposed effect is dead sperm. Both the cause and the outcome are variables: for the cause we could vary the type of drink, and for the outcome, these drinks will kill different amounts of sperm. The key to testing such statements is to measure these two variables. A variable that we think is a cause is known as an independent variable (because its value does not depend on any other variables). A variable that we think is an effect is called a dependent variable because the value of this variable depends on the cause (independent variable). These terms are very closely tied to experimental methods in which the cause is actually manipulated by the experimenter (as we will see in section 1.6.2). In crosssectional research we dont manipulate any variables, and we cannot make causal statements about the relationships between variables, so it doesnt make sense to talk of dependent and independent variables because all variables are dependent variables in a sense. One possibility is to abandon the terms dependent and independent variable and use the terms predictor variable and outcome variable. In experimental work the cause, or independent variable, is a predictor, and the effect, or dependent variable, is simply an outcome. This terminology also suits cross-sectional work where, statistically at least, we can use one or more variables to make predictions about the other(s) without needing to imply causality.CRAMMING SAMs Tips Some important termsWhen doing research there are some important generic terms for variables that you will encounter:Independent variable: A variable thought to be the cause of some effect. This term is usually used in experimental research to d