Notes on Jackson's Research Methods and Statistics 3rd edition Text


  • 7/31/2019 Notes on Jackson's Research Methods and Statistics 3rd edition Text


    Table of Contents

    Chapter 1: Thinking like a scientist

    Chapter 2: Getting started: ideas, resources, ethics

    Chapter 3: defining, measuring and manipulating variables

    Chapter 4: descriptive methods

    Chapter 5: data organization and descriptive statistics

    Chapter 6: correlational methods and statistics

    Chapter 7: probability and hypothesis testing

    Chapter 8: introduction to inferential statistics

    Chapter 9: the logic of experimental design

    Chapter 10: inferential statistics: two group designs

    Chapter 11: experimental designs with more than two levels of an independent variable

    Chapter 12: complex experimental designs

    Chapter 13: quasi-experimental and single-case designs

    Chapter 1: Thinking Like a Scientist

    Sources of knowledge: p.6

    Superstition; intuition (e.g. the belief that couples are more likely to conceive after adopting, which is an illusory correlation); authority (e.g. parents, actors); tenacity (repetition increases believability, as in advertising); rationalism (logical reasoning, e.g. syllogisms). A categorical syllogism consists of three parts: the major premise, the minor premise, and the conclusion. Each of the premises has one term in common with the conclusion: in a major premise, this is the major term (i.e., the predicate of the conclusion); in a minor premise, it is the minor term (the subject of the conclusion). For example:

    o Major premise: All men are mortal.
    o Minor premise: All Greeks are men.
    o Conclusion: All Greeks are mortal.

    Empiricism: knowledge through observation and experience. It yields a long list of observable facts, but rationalism is needed to assemble these facts logically. Aristotle was an empiricist, while Plato was a rationalist.

    Science: rationalism + empiricism. A hypothesis is a prediction regarding the outcome of a single study. Many hypotheses may be tested and several research studies conducted before a comprehensive theory on a topic is put forth.

    Publicly verifiable knowledge: research can be observed, replicated, criticized, and tested for veracity (p.11). Principle of falsifiability: a theory must be stated in such a way that it is possible to refute it.


    Scientific research has three basic goals: (1) to describe behavior, (2) to predict behavior, and (3) to explain behavior

    p.14

    Research Methods in Science

    Descriptive methods

    o Observational: naturalistic observation and laboratory observation (p.15)
    o Case study method: in-depth study of one or more individuals, e.g. Piaget's theory of cognitive development in children, developed by simply describing the individual(s) being studied
    o Survey method: question individuals on topic(s) and then describe their responses

    Predictive (relational) methods: we do not systematically manipulate the variables of interest; we only measure them. Since alternative explanations can't be ruled out, causation cannot be established.

    o Correlational method: assesses the degree of relationship between two measured variables
    o Quasi-experimental method: differs from the experimental method in that subjects choose to be members of the different groups being studied, i.e. the subject/participant variable can't be changed and is not a manipulated variable, e.g. sorority vs. non-sorority girls (p.17)

    Experimental method: Controls are very important in such experiments; you control who is in the study (get a representative sample), who participates in each group (control for differences in participants by random assignment between the control (baseline) group and the experimental group), and the treatment each group receives (e.g. some take Vit C and some do not). Other variables such as amount of sleep, type of diet, and amount of exercise might also need to be controlled (p.19).

    Chapter 2: Getting Started on a Research Idea

    Selecting a Problem: review past research on the problem OR review the pertinent chapter in the psychology text OR

    observe a problem in nature and decide how to address it

    Reviewing the Literature (p.33,34): A list of psychology journals is on p.32; Psych Abstracts, published by the APA, lists

    abstracts on a monthly basis of all published work; PsycINFO is an electronic database that provides abstracts and

    citations to the scholarly literature in the behavioral sciences and mental health. To help you choose appropriate

    keywords, use the APA's Thesaurus of Psychological Index Terms. Whereas Psych Abstracts finds articles published on a given topic within a given year, the Social Science Citation Index (SSCI) can help you to work from a given article (a key article) and see what has been published on that topic since the key article was published (p.34).

    PsycARTICLES is an online database that provides full-text articles from many psychology journals and is available through many academic libraries. ProQuest is an online database that searches both scholarly journals and popular media sources. Full-text articles are often available (p.35).

    Journal Article Structure (p.37)

    Research articles usually have five main sections: Abstract, Introduction, Method, Results, and Discussion. The Abstract

    is a brief description of the entire paper that typically discusses each section of the paper (Introduction, Method,

    Results, and Discussion). It should not exceed 120 words. The Introduction has 3 components: (1) intro to the problem

    (2) review of relevant previous research, and (3) purpose/rationale for study. The Method section includes participants (selection processes), materials/apparatus (testing materials, equipment), and procedure (groups used in study,

    instructions to participants, experimental manipulation, controls etc); The Results section summarizes the data collected

    and the type of statistic used to analyze the data. This section should include a description of the results only, not an

    explanation of the results. Discussion: The results are evaluated and interpreted in the Discussion section. It typically

    begins with a restatement of the predictions of the study and tells whether or not the predictions were supported.

    Institutional Review Boards (IRBs) oversee all federally funded research involving human participants. P.44 If the

    participants in a study are classified as at minimal risk, then an informed consent is not mandatory. P.47 In studies

    where anonymity and confidentiality are at risk, an informed consent form should be used.


    Chapter 3: Defining, Measuring, and Manipulating Variables (p.57)

    Operational definition: defining a variable in concrete terms e.g. hunger: >12hrs w/o

    food intake, anxiety: galvanic skin response or HR. Purpose: communicate clearly to

    others and measure/manipulate vars consistently.

    Properties of measurement are listed at the right p.58

    Scales of Measurement

    A nominal scale is one in which objects or individuals are assigned to categories that have no numerical properties. Nominal scales have the characteristic of identity. Such variables are categorical because the data are grouped into categories. E.g. ethnicity.

    In an ordinal scale, the categories form a rank order along a continuum and have the properties of identity and magnitude but lack equal unit size (the difference between ranks 1 and 2 need not equal the difference between ranks 2 and 3) and absolute zero. Also known as ranked data. E.g. class rank.

    In an interval scale, the units of measurement (intervals) between the numbers on the scale are all equal in size. When you use an interval scale, the criteria of identity, magnitude, and equal unit size are met. E.g. the Celsius and Fahrenheit temperature scales. The Fahrenheit scale does not have an absolute zero; because of this, you cannot form ratios based on this scale (for example, 100 degrees is not twice as hot as 50 degrees).

    Ratio data have all four properties of measurement: identity, magnitude, equal unit size, and absolute zero. Examples of ratio scales of measurement include weight, time, and height.

    Aptitude tests measure an individual's potential to do something, whereas achievement tests measure an individual's competence in an area (p.63). Behavioral measures are often referred to as observational measures because they involve observing anything that a participant does. Most physical measures, or measures of bodily activity, are not directly observable.

    Reliability refers to the consistency or stability of a measuring instrument (p.65). Sources of error include trait error (e.g. participant truthfulness) and method error (e.g. the operator's use of the equipment). In effect, a measurement is a combination of the true score and an error score: Observed score = True score + Measurement error

    Reliability is measured using correlation coefficients. A correlation coefficient measures the degree of relationship between two sets of scores and can vary between -1.00 and +1.00. To establish the reliability (or consistency) of a measure, we expect a strong correlation coefficient, usually in the .80s or .90s, between the two variables or scores being measured. A positive coefficient indicates that those who scored high on the measuring instrument at one time also scored high at another time, and those who scored low at one point scored low again (p.68).

    Types of reliability: test/retest reliability (lowered by practice effects, since a person can get better between testings from practice), alternate-forms reliability (different but equivalent questions on the tests), split-half reliability, and inter-rater reliability.
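
    As a rough illustration of test/retest reliability, the sketch below correlates two administrations of the same instrument. The scores are made up for illustration, and `pearson_r` is a hypothetical helper written for this example rather than anything from the text.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    # Average of the cross-products of the z-scores.
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / len(x)

# Hypothetical scores for six people tested twice on the same instrument.
time1 = [10, 12, 15, 18, 20, 25]
time2 = [11, 13, 14, 19, 21, 24]

test_retest_r = pearson_r(time1, time2)
print(round(test_retest_r, 2))  # a value in the .80s or .90s suggests good reliability
```

    People who scored high the first time also scored high the second time, so the coefficient comes out strongly positive.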

    Validity: a measure that is valid measures what it claims to measure (p.70). Content validity is assessed by a systematic examination of the test content to determine whether it covers a representative sample of the domain of behaviors to be measured. Criterion validity: the extent to which a measure can estimate present performance (concurrent validity) or predict future performance (predictive validity). The construct validity of a test is the extent to which a measuring instrument accurately measures the theoretical construct or trait that it is designed to measure.

    p.72: A test can be reliable but not valid, but it can never be valid without being reliable.


    Chapter 4: Descriptive Methods

    Descriptive methods: describe what has been observed in a group of people or animals, but don't allow one to make accurate predictions or determine cause-and-effect relationships. There are five different types of descriptive methods: observational methods, case studies, archival method, qualitative methods, and surveys (p.79).

    Observational Methods: two types, naturalistic and laboratory (systematic).

    Naturalistic: Ecological validity refers to the extent to which research can be generalized to real-life situations. In nonparticipant observation (e.g. Goodall), there is the issue of reactivity: participants reacting in an unnatural way to someone obviously watching them. Disguised observation mitigates this. Expectancy effects are the effect of the researcher's expectations on the outcome of the study. Naturalistic observation has greater flexibility but less control than laboratory observation.

    Laboratory: also concerned with reactivity and expectancy effects. The advantage is that the situation is contrived, so the likelihood that participants will perform the behavior is higher (p.83).

    Observational Methods: Data Collection

    Narrative records: full narrative descriptions of a participant's behavior. E.g. Piaget's studies of cognitive development in children.

    Checklists: A static item is a means of collecting data on characteristics that will not change while the observations are being made, e.g. number of people present, age, gender. An action item is used to record whether specific behaviors were present or absent during the observational time period. The disadvantage is missing behavior that is not on the checklist.

    Qualitative Methods (case studies, archival, interviews/focus groups, field studies, action research): These are

    distinguished from observational methods as follows: researchers are typically not interested in simplifying, objectifying

    or quantifying what they observe.

    Case Study Method: e.g. Piaget; an in-depth study of one or more individuals in the hope of revealing things that are

    true of all of us. Problems: atypical individual causes erroneous generalizations to population. Also, expectancy effects.

    P.85

    Archival Method: describing data that existed before the time of the study, e.g. whether more babies are born when the moon is full. Sources include the US Census Bureau etc. Risks are selection bias (cherry-picking data sources) and unknown reliability or validity, because you are using someone else's data.

    Interviews/focus group methods: 3 types of interviews: standardized interview (fixed questions), semistandardized, and

    unstandardized (unstructured)

    Field studies: similar to naturalistic observation; difference is that data are always collected in narrative form and left

    that way. P.90 text.

    Qualitative Method: Qualitative research focuses on phenomena that occur in natural settings, and the data are

    typically analyzed without the use of statistics. Both the naturalistic observational method and the case study method

    can be qualitative in nature.

    Surveys (summary table on p.102): closed-ended, open-ended, and partially open-ended (closed-ended questions with an additional "other" option). A Likert rating scale (most psychologists view it as interval, but some as ordinal) presents a statement rather than a question, and respondents are asked to rate their level of agreement with the statement (p.89). A loaded question is one that includes nonneutral or emotionally laden terms (e.g. "eliminating wasteful excesses"). A leading question is one that sways the respondent to answer in a desired manner, e.g. "most people believe...". A double-barreled question asks more than one thing in a single item. Survey questions should not be randomly arranged: put sensitive questions (e.g. drug use or sexual behavior) at the end, put demographic questions at the end because they are boring, and group questions on similar topics


    together (p.90). A socially desirable response is one that is given because participants believe it is deemed appropriate by society, rather than because it truly reflects their own views or behaviors.

    Mail survey: less sampling bias than phone/email because of wide availability; people are also more comfortable answering sensitive questions. Disadvantages: if questions are unclear, no clarification is possible; low response rate (20%).

    Sampling Techniques for Surveys: There are two ways to sample individuals from a population: probability sampling and

    nonprobability sampling.

    Probability sampling (p.95): each member of the population has an equal likelihood of being selected.

    o Random selection
    o Stratified random sampling: guarantees that the sample accurately represents the population on specific characteristics
    o Cluster sampling: e.g. sample from classes that are required of all students at the university, such as English composition

    Non-probability sampling: individual members of the population do not have an equal likelihood of being selected.

    o Convenience (haphazard) sampling: if you wanted a sample of 100 college students, you could stand outside of the library and ask people who pass by to participate
    o Quota sampling: quota sampling is to nonprobability sampling what stratified random sampling is to probability sampling, but still not much effort is devoted to creating a sample that is truly representative of the population
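
    To make the difference between simple random and stratified random sampling concrete, here is a minimal sketch. The population (80 psychology and 20 biology majors), the seed, and the helper `stratified_sample` are all assumptions invented for this example.

```python
import random

# Hypothetical population: 80 psychology majors and 20 biology majors.
population = [("psych", i) for i in range(80)] + [("bio", i) for i in range(20)]
rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Simple random sample: every member has an equal chance of selection,
# but the mix of majors in the sample is left to chance.
simple = rng.sample(population, 10)

def stratified_sample(pop, key, n, rng):
    """Sample each stratum in proportion to its share of the population."""
    strata = {}
    for member in pop:
        strata.setdefault(key(member), []).append(member)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(pop))
        sample.extend(rng.sample(members, k))
    return sample

stratified = stratified_sample(population, lambda m: m[0], 10, rng)
counts = {major: sum(1 for m in stratified if m[0] == major) for major in ("psych", "bio")}
print(counts)  # {'psych': 8, 'bio': 2}
```

    The stratified sample reproduces the 80/20 split by construction, which is the guarantee described above; the simple random sample only tends toward it.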

    Chapter 5: Data Organization and Descriptive Statistics

    In a class interval frequency distribution, individual scores are combined into categories, or intervals, and then listed

    along with the frequency of scores in each interval. P.106
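
    A class interval frequency distribution can be sketched in a few lines. The exam scores and the interval width of 10 below are assumptions chosen only for illustration.

```python
from collections import Counter

# Hypothetical exam scores.
scores = [45, 47, 54, 56, 59, 60, 60, 63, 65, 69, 70, 74,
          75, 76, 77, 78, 80, 82, 85, 86, 90, 93, 94, 95]
width = 10  # assumed interval width

# Map each score to the lower bound of its interval, then count per interval.
freq = Counter((s // width) * width for s in scores)

for lower in sorted(freq):
    print(f"{lower}-{lower + width - 1}: {freq[lower]}")
```

    Each printed row is one class interval with its frequency, e.g. the 60-69 interval holds five of the scores.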

    For nominal scale or qualitative data, a bar graph (graphical representation of frequency distribution) is most

    appropriate e.g. democrats, independents, republicans. For quantitative data in ordinal, interval, or ratio scales, a

    histogram is used. P.107 Unlike in a bar graph, in a histogram, the bars touch each other to indicate that the scores on

    the variable represent related, increasing values.

    Frequency polygon: a line graph of the frequencies of individual scores or intervals. Again, scores (or intervals) are shown on the x-axis and frequencies on the y-axis. After all the frequencies are plotted, the data points are connected. Use with quantitative, continuous data such as height and weight.

    Measures of Central Tendency: A measure of central tendency is a representative number that characterizes the

    middleness of an entire set of data. E.g. Mean, median, and mode p. 110

    Mean: The mean is appropriate for interval and ratio data but not for ordinal or nominal data. For the population mean, $\mu = \frac{\sum X}{N}$; for the sample mean, $\bar{X} = \frac{\sum X}{N}$.

    Median: not affected by extreme scores.
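
    The point about extreme scores is easy to check with Python's standard `statistics` module. The salary figures are hypothetical.

```python
from statistics import mean, median, mode

# Hypothetical salaries (in thousands); the last value is an extreme score.
salaries = [30, 32, 32, 35, 38, 40, 250]

print(mean(salaries))    # pulled upward by the extreme score
print(median(salaries))  # 35, the middle score
print(mode(salaries))    # 32, the most frequent score
```

    The single extreme score drags the mean well above most of the data, while the median stays at the middle score.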

    Measures of Variation: A measure of central tendency provides information about the middleness of a distribution of

    scores but not about the width or spread of the distribution. P.114 A measure of variation indicates the degree to

    which scores are either clustered or spread out in a distribution.

    Range: The simplest measure of variation is the range, the difference between the lowest and the highest scores in a distribution. The range is usually reported with the mean of the distribution.

    Standard Deviation for a population (p.117): $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$


    Standard Deviation for a sample: $S = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$

    Compare this to the average deviation for a population, which is given by $AD = \frac{\sum |X - \mu|}{N}$. Note that the standard deviation is never smaller than the average deviation, and is usually larger, because the squaring of the terms gives more weight to outlying values.

    If, however, sample data are being used to estimate the population standard deviation, then the unbiased estimator, which divides by N - 1, must be used (p.126): $s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$

    Notice that the symbol for the unbiased estimator of the population standard deviation is s (lowercase), whereas the symbol for the sample standard deviation is S (uppercase). The estimate has N - 1 in the denominator to compensate for small samples not containing as much variability as the real population.

    Variance is the square of the standard deviation.
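
    The three measures discussed above (average deviation, standard deviation dividing by N, and the estimate dividing by N - 1) can be compared on one small data set. The scores are made up; `pstdev` and `stdev` from the standard library are used only as a cross-check.

```python
from math import sqrt
from statistics import pstdev, stdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
n = len(scores)
m = sum(scores) / n

avg_dev = sum(abs(x - m) for x in scores) / n                        # average deviation
sd_n = sqrt(sum((x - m) ** 2 for x in scores) / n)                   # divide by N
sd_n_minus_1 = sqrt(sum((x - m) ** 2 for x in scores) / (n - 1))     # divide by N - 1
variance_n = sd_n ** 2                                               # variance = SD squared

print(avg_dev, sd_n, round(sd_n_minus_1, 3), variance_n)
```

    On these scores the average deviation is 1.5, the N-based standard deviation is 2.0, and the N - 1 version is slightly larger still, which matches the ordering described in the notes.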

    Normal distributions are bell-shaped, symmetrical, and have an identical mean, median, and mode. They are unimodal; most observations are centrally clustered. Last, when standard deviations are plotted on the x-axis, the percentage of scores falling between the mean and any point on the x-axis is the same for all normal curves (p.121).

    Kurtosis: how flat or peaked a normal distribution is. Platykurtic = short and wide (think: platypus = close to the ground, flat); mesokurtic = medium height/breadth; leptokurtic = tall and thin (think: lepto = leap).

    In a positively skewed distribution, the peak is to the left of the center point, and

    the tail extends toward the right. Reason for its name: few individuals have

    extremely high scores that pull the distribution in that direction. Negatively

    skewed is just the opposite (p.122). If your disease has a low median survival time, you would prefer a positive skew: this means some people live for a very long time post-diagnosis.

    The z-score (p.124): A z-score or standard score is a measure of how many standard deviation units an individual raw score falls from the mean of the distribution. Thus, when calculating a z-score for an individual in comparison to a sample, we use $z = \frac{X - \bar{X}}{S}$, while for a population, we use $z = \frac{X - \mu}{\sigma}$.

    If the distribution of scores for which you are calculating transformations (z-scores) is normal (symmetrical and unimodal), then the resulting distribution of z-scores is referred to as the standard normal distribution: a normal distribution with a mean of 0 and a standard deviation of 1 (p.126).

    The standard normal curve can also be used to determine an individual's percentile rank: the percentage of scores equal to or below the given raw score (p.131).
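
    A z-score and its percentile rank can be computed without a printed table by using `statistics.NormalDist` from the standard library. The population mean of 100 and standard deviation of 15 are assumed values on an IQ-style scale, not figures from the text.

```python
from statistics import NormalDist

mu, sigma = 100, 15   # assumed population mean and standard deviation
raw = 130             # hypothetical raw score

z = (raw - mu) / sigma
# Area at or below z under the standard normal curve, as a percentage.
percentile = NormalDist().cdf(z) * 100

print(z, round(percentile, 1))  # 2.0 97.7
```

    A raw score two standard deviations above the mean sits at roughly the 98th percentile of a normal distribution.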


    Chapter 6: Correlational Methods and Statistics

    Correlational Methods: determine whether two variables are related to one another (p.148). In addition to describing a relationship, correlations allow us to make predictions from one variable to another. If two variables are correlated, we can predict from one variable to the other with a certain degree of accuracy. The magnitude or strength of a relationship is determined by the absolute value of the correlation coefficient describing the relationship: 0 = no correlation; 0 to 0.29 = weak correlation; 0.3 to 0.69 = moderate correlation; 0.7 to 1.0 = strong correlation; 1.0 = perfect correlation. In a perfect correlation, an increase/decrease in one variable is always accompanied by a proportional increase/decrease in the other variable.

    Thus, in a graph, when there is a perfect correlation, the data points all fall exactly on a straight line (the slope is irrelevant unless it is zero). The accompanying scatterplot shows no relationship. Also, a correlation coefficient of zero can occur when the relationship is curvilinear: the positive and negative portions of the relationship nullify each other, e.g. anxiety vs. test performance, or memory and age (p.144).
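
    The way a curvilinear relationship can produce a correlation of zero is easy to demonstrate. In this sketch the anxiety scores are hypothetical and centered on zero, and performance follows an exact inverted U; `pearson_r` is a helper written for the example.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation from sums of deviation cross-products."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

anxiety = [-3, -2, -1, 0, 1, 2, 3]            # hypothetical, centered scores
performance = [9 - a ** 2 for a in anxiety]   # inverted U: best at moderate anxiety

print(pearson_r(anxiety, performance))  # 0.0
```

    Performance is completely determined by anxiety here, yet r is zero because the rising and falling halves of the curve cancel out.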

    Misinterpreting Correlations

    Causality refers to the assumption that the correlation indicates a causal relationship between two variables, whereas

    directionality refers to the inference made with respect to the direction of a causal relationship between two variables.

    P.146.

    Third variable effects: a strong correlation between two variables is not really a meaningful relationship and is really the product of a third variable. E.g. researchers found contraceptive use strongly correlated with the number of electrical appliances; the third variable was socioeconomic status. To remove the effect of the third variable, use partial correlation (p.148).

    Restrictive range: the correlated variables are examined over a range too narrow to reveal the correlation.

    Curvilinear relationships mask correlations.

    Pearson product-moment correlation coefficient: Pearson's r is used for data measured on an interval or ratio scale of measurement (p.151). E.g. consider a list of 20 individuals' heights and weights. Step 1: calculate the mean and S.D. for the heights and weights, then convert each value to a z-score. If the correlation is strong and positive, we should find that positive z-scores on one variable go with positive z-scores on the other variable, and negative with negative. Step 2: calculate the cross-products, i.e. multiply each individual's two z-scores together, and sum the products. If the two z-scores consistently share the same sign, the sum is a large positive value; if they consistently have opposite signs, it is a large negative value. Either way the correlation is strong. The overall formula is $r = \frac{\sum z_X z_Y}{N}$. General rule of thumb: at least 10 people per variable. An alternative, computational formula is listed on p.154.
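
    The two steps described above (convert to z-scores, then average the cross-products) can be sketched as follows. The six height and weight pairs are invented, and the standard deviation divides by N so that it is consistent with dividing the summed cross-products by N.

```python
from statistics import mean, pstdev

heights = [60, 62, 65, 68, 70, 72]        # hypothetical, inches
weights = [110, 120, 140, 155, 165, 180]  # hypothetical, pounds

def z_scores(values):
    """Step 1: convert raw scores to z-scores (SD computed with N)."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]

zx, zy = z_scores(heights), z_scores(weights)

# Step 2: sum the cross-products of the paired z-scores and divide by N.
r = sum(a * b for a, b in zip(zx, zy)) / len(heights)
print(round(r, 3))
```

    Because tall people in this made-up list are also the heavy ones, the paired z-scores share signs and r comes out close to +1.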

    Coefficient of Determination: Calculated by squaring the correlation coefficient, the coefficient of determination ($r^2$) is a measure of the proportion of the variance in one variable that is accounted for by another variable. $r^2$ is typically reported as a percentage (p.154).

    Correlations for Nominal or Ordinal Data:

    Spearman's rank-order correlation coefficient: both variables are on an ordinal (ranking) scale. If one variable is interval/ratio, it must first be converted to an ordinal scale.

    Point-biserial correlation coefficient: one variable is a two-value (dichotomous) nominal variable (e.g. gender) and the other is interval or ratio.

    Phi coefficient: both variables are dichotomous nominal variables.

    Regression Analysis (p.156): A tool that enables us to predict an individual's score on one variable based on knowing one or more other variables is regression analysis. Regression analysis involves determining the equation for the best-fitting


    line for a data set. The line has the form of y = mx + b, but is written as follows: $Y' = bX + a$, where $Y'$ is the predicted value on the Y variable, b is the slope of the line, X represents an individual's score on the X variable, and a is the y-intercept. To compute the slope: $b = r \left( \frac{S_Y}{S_X} \right)$. To compute the y-intercept: $a = \bar{Y} - b\bar{X}$, where the bars are the respective sample means. Multiple regression analysis involves combining several predictor variables in a single regression equation to increase the predictive accuracy because in the real world, it is unlikely that one variable is affected by only one other variable.
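
    As a sketch of simple regression, the code below computes the slope as r times the ratio of the standard deviations and the intercept from the two means, then predicts a new score. The data and the `predict` helper are assumptions made up for the example.

```python
from statistics import mean, pstdev

x = [1, 2, 3, 4, 5]   # hypothetical predictor scores
y = [2, 4, 5, 4, 5]   # hypothetical criterion scores

mx, my = mean(x), mean(y)
sx, sy = pstdev(x), pstdev(y)
r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sx * sy)

b = r * (sy / sx)   # slope of the best-fitting line
a = my - b * mx     # y-intercept

def predict(score):
    """Predicted Y for a given X on the regression line."""
    return b * score + a

print(round(b, 2), round(a, 2), round(predict(6), 2))  # 0.6 2.2 5.8
```

    For these scores the line is roughly Y' = 0.6X + 2.2, so a person scoring 6 on X is predicted to score about 5.8 on Y.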

    Chapter 7: Hypothesis Testing and Inferential Statistics

    Probability: multiplication and addition rule p.178

    It is impossible statistically, however, to demonstrate that something is true. In fact, statistical techniques are much better at demonstrating that something is not true. Whatever the research topic, the null hypothesis always predicts that there is no difference between the groups being compared.

    One-tailed test (p.184): E.g. do students in after-school programs have higher IQs than those in the general population? The null and alternative hypotheses are: $H_0: \mu_{\text{after-school}} \le \mu_{\text{general}}$ and $H_a: \mu_{\text{after-school}} > \mu_{\text{general}}$.

    Two-tailed test: e.g. the researcher just wants to show that there are IQ differences between the two groups, but isn't concerned with the direction of those differences.

    Errors: p.186

    The p-value or alpha level: When a result is statistically significant at the 0.05 (or 5%) level, it means that, if the null hypothesis were true, a difference as large as the one observed between the sample and the population would occur by chance no more than 5 out of every 100 times. In other words, the variation between groups is most likely due to true/real differences between them. In this case, the risk of a Type I error is 5%.

    Chapter 8: Introduction to Inferential Statistics

    Inferential Statistics (p.197): three tests: the z test, the t test, and the chi-square ($\chi^2$) test. The z test and the t test are used with interval or ratio data and are parametric: they involve assumptions about population parameters such as the population mean ($\mu$) and standard deviation ($\sigma$). The chi-square test is used with ordinal or nominal data and is non-parametric.

    The z test: a parametric inferential statistical test. It needs population parameters such as the mean and standard deviation, and determines the likelihood that the sample is part of the sampling distribution. It allows us to test the null hypothesis for a single sample when the population variance is known. Remember that a z-score tells us how many standard deviations above or below the mean of the distribution an individual score falls. But in the IQ problem above, we are not comparing


    an individual's score to the population mean; rather, a sample mean must be compared with a distribution of sample means, known as the sampling distribution.

    Standard Error of the Mean (p.198): the standard error of the mean, $\sigma_{\bar{X}}$ (the standard deviation of the sampling distribution), can never be as large as $\sigma$, the standard deviation for the distribution of individual scores, for samples of more than one score. Think about it this way: if the size of each of these samples were to approach the population size, their means would all be tightly clustered around the population mean and the standard deviation of the sampling distribution would be very small. Thus, the central limit theorem states that for any population with mean $\mu$ and standard deviation $\sigma$, the distribution of sample means for sample size N will have a mean of $\mu$ and a standard deviation of $\sigma / \sqrt{N}$ and will approach a normal distribution as N approaches infinity (p.198). Thus, $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$ and $z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$.

    The z-score will tell us how many standard deviation units a sample mean is from the population mean, or the likelihood that the sample is from that population (p.175). E.g. if we wind up with z = 2.06 for the one-tailed test, the critical z = 1.645, i.e. the area under the curve to the right of that value is 5%. The z-value would be significant and H0

    would be rejected. In APA style, report result as Z (n = 50) = 2.06, p
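
    The one-tailed z test can be sketched with the standard library. The population values (mean 100, standard deviation 15) and the sample mean of 104.37 for 50 students are assumptions chosen so that z lands near the 2.06 mentioned above; they are not taken from the text.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 100, 15            # assumed population parameters
sample_mean, n = 104.37, 50    # hypothetical sample

standard_error = sigma / sqrt(n)              # standard error of the mean
z = (sample_mean - mu) / standard_error       # distance of the sample mean in SE units
z_critical = NormalDist().inv_cdf(0.95)       # one-tailed cutoff, alpha = .05
p_one_tailed = 1 - NormalDist().cdf(z)        # area to the right of z

print(round(z, 2), round(z_critical, 3), z > z_critical)
```

    Since z exceeds the critical value, the null hypothesis would be rejected in this made-up case.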


    The estimated standard error of the mean: $s_{\bar{X}} = \frac{s}{\sqrt{N}}$, where $s_{\bar{X}}$ is the estimated standard error of the mean, i.e. an estimate of the standard deviation of the sampling distribution based on sample data, since the population standard deviation is not known, and s is the estimated standard deviation for a population, based on sample data: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$

    APA style: t(9) = 2.06, p
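
    A single-sample t test follows the same pattern as the z test, but with the standard error estimated from the sample. The ten scores and the population mean of 52 are hypothetical. The critical value 1.833 is the tabled one-tailed cutoff for 9 degrees of freedom at alpha = .05, hard-coded here because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

scores = [52, 55, 49, 60, 58, 54, 57, 53, 61, 51]  # hypothetical sample
mu = 52                                            # assumed population mean

n = len(scores)
s = stdev(scores)                   # estimated population SD (divides by N - 1)
estimated_se = s / sqrt(n)          # estimated standard error of the mean
t = (mean(scores) - mu) / estimated_se
df = n - 1

t_critical = 1.833                  # tabled value: df = 9, one-tailed, alpha = .05
print(df, round(t, 2), t > t_critical)
```

    The obtained t is larger than the critical value for 9 degrees of freedom, so this made-up sample would differ significantly from the assumed population mean.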


    History: change in the dependent variable due to external circumstances, e.g. stress reduction because exams fall at the start and vacation at the end of the study

    Maturation: participants mature physically, socially, and cognitively over the course of the study

    Testing: the testing effect is a change in performance due to familiarity with and practice on test items. Includes both the practice effect and the fatigue effect

    Regression to the mean: extreme scores that are the product of chance will moderate upon retesting

    Instrumentation effect: the observer becomes better at, or more fatigued with, taking measures

    Attrition/Mortality: e.g. the heaviest smokers in an experimental cessation group drop out; post-test measures would be unduly optimistic

    Diffusion of treatment: people receive treatment information from other participants

    Experimenter/Participant effects: experimenter bias or expectancy effects influence the outcome, e.g. Clever Hans, the mathematical horse receiving cues from his owners. Solve via single-blind (either the experimenter or the participants are blind to the manipulation being made) or double-blind (both unaware) procedures. Participant effects include reactivity (a change in behavior due to being watched) and the placebo effect.

    Floor and ceiling effects: e.g. measuring a rat's weight in pounds detects no change (floor effect); measuring an elephant with a bathroom scale that has a 350 lb maximum is a ceiling effect.


    Threats to External Validity

    Generalization to Populations: hampered by the college sophomore problem Generalizations from Lab settings: control maximized in lab settingsthe artificiality criticism; solve by

    conceptual replicationtest concepts via diff indep var or dep var.

    Correlated-group designs: participants in the experimental and control groups are related.

    Within-participants design: also known as a repeated measures design; all participants serve in all conditions. The benefit is that you need fewer participants (e.g. with 4 conditions and 15 ppl per condition, the between-participants design needs 60 ppl, whereas the within-participants design needs only 15), it takes less time, and it increases statistical power b/c it reduces variability due to individual differences. This design is popular in psychological research (p.240). Downside: b/c participants are tested at least twice, practice/fatigue effects arise. Solve via counterbalancing: systematically vary the order in which the conditions are presented (e.g. half the participants get the control condition first, half the experimental condition first). However, three conditions have 6 possible orderings and four conditions have 24, so complete counterbalancing (using all of the orderings of conditions) quickly becomes impractical. Also carry-over effects: e.g. a drug administered in one condition affects performance in subsequent conditions.
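
    The growth in the number of orderings can be checked with a short sketch. The condition labels and the function name are placeholders chosen for illustration, not taken from the text.

```python
from itertools import permutations

def counterbalanced_orders(conditions):
    """Return every possible presentation order of the given conditions."""
    return list(permutations(conditions))

# Hypothetical condition labels
orders_3 = counterbalanced_orders(["A", "B", "C"])
orders_4 = counterbalanced_orders(["A", "B", "C", "D"])

print(len(orders_3), len(orders_4))
```

    The count is k! for k conditions, which is why complete counterbalancing is rarely attempted beyond three or four conditions.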


    Matched-participants experimental design: for each participant in one condition, there is a participant in the other condition(s) who matches him or her on some relevant variable or variables. It has advantages over the between-participants design (groups are more similar) and over the within-participants design (fewer carryover and testing effects). Downsides: more people are needed; mortality effects (if one person drops out, the pair is compromised); and difficulty finding matching participants (p.242).

    Chapter 10: Inferential Statistics: Two-Group Designs

    The inferential statistics discussed in Chapter 7 compared single samples with populations (z test, t test, and χ² test).

    The statistics discussed in this chapter are designed to test differences between two equivalent groups or treatment

    conditions.

    The t Test for Independent Groups (Samples): p.251

    It indicates whether the two samples perform so similarly that we conclude they are likely from the same population, or whether they perform so differently that we conclude they represent two different populations (p.227). E.g. a researcher wants to determine whether spacing (studying the same amount of material at intervals) is superior to cramming. Thus, the independent variable is the type of studying (spaced vs. crammed) and the dependent variable is participants' scores on a test.

    Statistical significance indicates that an observed difference between two descriptive statistics (such as means) is

    unlikely to have occurred by chance.

    Rather than comparing a single sample mean to a population mean, we are comparing two

    sample means. To determine how far the difference between the sample means is from the difference between the

    population means, we convert the mean differences to standard errors.

    The standard error of the difference between the means does have a logical meaning. If we took thousands of pairs of samples from these two populations and found $\bar{X}_1 - \bar{X}_2$ for each pair, those differences between means would not all be the same. They would form a distribution. The mean of that distribution would be the difference between the means of the populations and its standard deviation would be $s_{\bar{X}_1 - \bar{X}_2}$. Thus,

    $t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$, where $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$.

    $s_1^2$ and $s_2^2$ are the variances of the two groups (p.252). The degrees of freedom for this independent-groups t test are (n1 - 1) + (n2 - 1). Refer to Table A.3 for the critical value of t. APA style: t(18) = 4.92, p < .05 (one-tailed).
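
    As a rough illustration of the calculation just described, here is a minimal sketch that assumes the standard error of the difference is the square root of s1²/n1 + s2²/n2. The scores are made up for the example and are not the textbook's data.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group1, group2):
    """Independent-groups t: difference between means over its standard error."""
    n1, n2 = len(group1), len(group2)
    # variance() is the unbiased sample variance (divides by n - 1)
    se_diff = sqrt(variance(group1) / n1 + variance(group2) / n2)
    t = (mean(group1) - mean(group2)) / se_diff
    df = (n1 - 1) + (n2 - 1)
    return t, df

# Hypothetical test scores for a spaced-study group and a cramming group
spaced = [4, 6, 8]
crammed = [1, 2, 3]
t_obt, df = independent_t(spaced, crammed)
print(round(t_obt, 3), df)
```

    The obtained t is then compared with the critical value for the given df from a t table.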


    Effect size: the proportion of variance in the dependent variable that is accounted for by the manipulation of the independent variable. It is an estimate of the effect of the independent variable, regardless of sample size (p.232). For the t test, one formula for effect size, known as Cohen's d, is

    $d = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{2} + \frac{s_2^2}{2}}}$

    According to Cohen (1988, 1992), a small effect size is one of at least 0.20, a medium effect size is at least 0.50, and a large effect size is at least 0.80. E.g. APA: t(18) = 4.92, p < .05 (one-tailed), d = 2.198.

    r²: the proportion of variance accounted for in the dependent variable based on knowing which treatment group the participants were assigned to for the independent variable: $r^2 = \frac{t^2}{t^2 + df}$.
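
    Both effect sizes are simple to compute once t and the group statistics are known. The sketch below assumes the usual textbook forms, d = (mean1 - mean2) / sqrt(s1²/2 + s2²/2) and r² = t² / (t² + df); the group means and variances are hypothetical, and the "negligible" label is added here only for values below Cohen's smallest benchmark.

```python
from math import sqrt

def cohens_d(mean1, mean2, var1, var2):
    """Cohen's d for two independent groups, from means and sample variances."""
    return (mean1 - mean2) / sqrt(var1 / 2 + var2 / 2)

def r_squared(t, df):
    """Proportion of variance accounted for, from an obtained t and its df."""
    return t ** 2 / (t ** 2 + df)

def effect_label(d):
    """Cohen's benchmarks for d; 'negligible' is a label assumed for this sketch."""
    d = abs(d)
    if d >= 0.80:
        return "large"
    if d >= 0.50:
        return "medium"
    if d >= 0.20:
        return "small"
    return "negligible"

d_example = cohens_d(6.0, 2.0, 4.0, 1.0)  # hypothetical summary statistics
r2_example = r_squared(4.92, 18)          # t and df from the APA example in the notes
print(round(d_example, 2), effect_label(d_example), round(r2_example, 2))
```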

    Confidence Intervals: Same formula as before (Ch. 7), except that rather than using the sample mean and the standard

    error of the mean, we use the difference between the means and the standard error of the difference between means.

    p.257

    t Test for Correlated Groups

    The same people are used in each group (a within-participants design) or different participants are matched between groups (a matched-participants design) (p.260). In a correlated-groups design, the sample includes two scores for each person, instead of just one. The null hypothesis is that there is no difference between the two scores. The degrees of freedom for a correlated-groups t test are equal to N - 1.

    Step 1: We compute a difference score for each person by subtracting one score from the other for that person (or the

    two individuals in a matched pair).

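    The standard error of the difference scores ($s_{\bar{D}}$) is the standard deviation of the sampling distribution of mean differences between dependent samples:

    $s_{\bar{D}} = \frac{s_D}{\sqrt{N}}$, where $s_D$ is the unbiased estimator of the standard deviation of the difference scores and N is the number of participants in each group. The test statistic is $t = \frac{\bar{D} - 0}{s_{\bar{D}}}$, where $\bar{D}$ is the mean of the difference scores.

    Effect size: Cohen's d and r² (p.262)

    $d = \frac{\bar{D}}{s_D}$. The r² formula is the same as that listed above.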
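
    The correlated-groups steps can be sketched as follows, assuming the standard error is s_D / sqrt(N), t is the mean difference divided by that standard error, and d is the mean difference divided by s_D. The recall scores are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def correlated_t(scores_a, scores_b):
    """Correlated-groups t test computed from paired scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]  # Step 1: difference scores
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)      # unbiased SD of the difference scores
    se = s_d / sqrt(n)      # standard error of the difference scores
    return {"mean_diff": d_bar, "se": se, "t": d_bar / se, "df": n - 1, "d": d_bar / s_d}

# Hypothetical words recalled by the same four people in two conditions
concrete = [5, 7, 9, 6]
abstract = [3, 4, 8, 5]
result = correlated_t(concrete, abstract)
print(result)
```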


    Confidence interval: $CI = \bar{D} \pm t_{cv}(s_{\bar{D}})$

    E.g. on word memorization differences between concrete and abstract words, we could answer that we are 95% confident that the difference in performance on the 20-item memory test between the two word type conditions would be between 0.96 and 4.04 words recalled correctly.
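
    A small sketch of the interval calculation, assuming the form mean difference ± t_cv × standard error. The notes do not give the inputs behind the 0.96 to 4.04 interval, so the mean difference (2.5), standard error (0.65), and two-tailed critical t for df = 7 (2.365) used below are assumptions picked so that the result matches the interval quoted above.

```python
def confidence_interval(mean_diff, se_diff, t_cv):
    """Return (lower, upper) limits: mean difference plus or minus t_cv * SE."""
    margin = t_cv * se_diff
    return mean_diff - margin, mean_diff + margin

# Assumed inputs (not stated in the notes); see the paragraph above
low, high = confidence_interval(mean_diff=2.5, se_diff=0.65, t_cv=2.365)
print(round(low, 2), round(high, 2))
```

    The same function applies to the independent-groups case by passing the difference between the two sample means and the standard error of the difference between means.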

    Nonparametric Tests

    A nonparametric test does not use any population parameters, such as the mean and standard deviation. Three nonparametric tests: the Wilcoxon rank-sum test, the Wilcoxon matched-pairs signed-ranks T test (both used with ordinal data), and the chi-square test of independence, used with nominal data (p.240).

    Wilcoxon Rank-Sum Test: p.265

    The Wilcoxon rank-sum test is similar to the independent-groups t test; however, it uses ordinal data (rankings) rather than interval-ratio data and compares medians rather than means. Interval or ratio data may be converted to ranked ordinal data. It is used when the underlying distribution is not normal. First, rank all scores together and sum the ranks for the group expected to have the smaller total. This value needs to be equal to or less than the critical value to be statistically significant. Refer to Table A.6, in which n1 (the number of participants in a group) is always the smaller of the two groups. Table A.6 presents the critical values for one-tailed tests only. If a two-tailed test is used, the table can be adapted by dividing the alpha level in half, i.e. use the critical values listed for alpha = .025 to run a two-tailed test at alpha = .05. Assumptions of this test: p.266
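
    The ranking and summing steps can be sketched as below. The data and helper names are hypothetical, and the critical value still has to be looked up in a table such as Table A.6; it is passed in as a plain number here.

```python
def average_ranks(values):
    """Map each value to its rank (1 = smallest); tied values share the mean rank."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # positions i..j-1 hold the same value and occupy ranks i+1..j
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks

def rank_sum(expected_smaller, other):
    """Sum of ranks for the group expected to have the smaller total."""
    ranks = average_ranks(list(expected_smaller) + list(other))
    return sum(ranks[v] for v in expected_smaller)

def is_significant(w_obt, w_cv):
    """Significant when the obtained sum is equal to or less than the critical value."""
    return w_obt <= w_cv

# Hypothetical scores for two independent groups
w_obt = rank_sum([1, 3, 4], [2, 5, 6, 7])
print(w_obt)
```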

    Wilcoxon Matched-Pairs T Test

    This nonparametric statistic is the counterpart of the correlated-groups t test and is used when the data are ordinal or the distribution is skewed (i.e. not normal) (p.243).

    e.g. during the first term, the teacher measures the number of books her students read and ranks them ordinally; during

    the second term, a rewards program is instituted and the students are again ranked. Is there a statistically significant

    difference between the # of books read? The null hypothesis is that the median number of books read does not differ;

    the alternative hypothesis is that the median number of books read during rewards is greater.

    Step 1: for each student, compute a difference score (subtract books read in the 2nd term from those read in the first term); if the program had no effect, we would expect most scores to be close to 0.

    Step 2: rank the absolute values of the difference scores. If two scores at position 1 have the same numerical value,

    they are both ranked 1.5 and the next score gets a 3. Note that any values with a difference score of zero are not

    ranked and do not figure into the N value.

    Step 3: give each rank the sign of the difference score it represents.

    Step 4: sum the positive ranks and the negative ranks separately. For a two-tailed test, Tobt is equal to the smaller of the summed ranks. In contrast, the Tobt for a one-tailed test is the sum of the signed ranks predicted to be smaller (p.268). As with the Wilcoxon rank-sum test, the obtained value needs to be equal to or less than the critical value to be statistically significant.
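
    Steps 1 to 4 can be sketched as follows. The book counts are hypothetical, and the function name and returned keys are choices made for this example only.

```python
def wilcoxon_matched_pairs(first, second):
    """Steps 1-4 of the matched-pairs signed-ranks T test."""
    # Step 1: difference scores; zero differences are dropped and do not count toward N
    diffs = [a - b for a, b in zip(first, second) if a != b]
    # Step 2: rank the absolute differences, giving tied values the mean of their ranks
    ordered = sorted(abs(d) for d in diffs)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    # Steps 3 and 4: attach signs and sum positive and negative ranks separately
    positive = sum(rank_of[abs(d)] for d in diffs if d > 0)
    negative = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return {"N": len(diffs), "positive": positive, "negative": negative,
            "T_two_tailed": min(positive, negative)}

# Hypothetical books read per student in the first and second term
first_term = [3, 5, 2, 4, 6]
second_term = [5, 5, 3, 9, 4]
outcome = wilcoxon_matched_pairs(first_term, second_term)
print(outcome)
```

    For a one-tailed test, the relevant value is the sum predicted to be smaller (here the positive ranks, if the rewards program is expected to increase reading).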

    Chi-Square Test of Independence

    This nonparametric test compares an observed frequency distribution to an expected frequency distribution of two nominal variables (p.245). The difference between the chi-square test of independence and the chi-square goodness-of-fit test (ch.7) is that the goodness-of-fit test compares how well an observed frequency distribution of one nominal variable fits some expected pattern of frequencies, whereas the test of independence compares how well an observed frequency distribution of two nominal variables fits some expected pattern of frequencies. The degrees of freedom for this test are equal to (r - 1)(c - 1), where r is the number of rows and c is the number of columns.


    Example objective: determine whether babysitters are more likely to have taken first aid than those who have never worked as babysitters.

    To determine the expected frequency for each cell:

    $E = \frac{(RT)(CT)}{N}$, where RT is the row total, CT is the column total, and N is the total number of observations (p.246).

    If the $\chi^2_{obt}$ exceeds the $\chi^2_{cv}$, then the null hypothesis can be rejected.

    Chi-Square test and effect size: Phi Coefficient

    As with the t tests discussed earlier in this chapter, we can also compute the effect size for a test of independence.

    $\phi = \sqrt{\frac{\chi^2}{N}}$. Cohen's (1988) specifications for the phi coefficient indicate that a phi coefficient of .10 is a small effect, .30 is a medium effect, and .50 is a large effect. In our particular example, if the phi value is small, then the difference observed in whether a teenager had taken a first aid class is not strongly accounted for by being a babysitter.
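
    A minimal sketch of the whole test, assuming the usual statistic (the sum over cells of (O - E)² / E) together with the expected-frequency rule, the (r - 1)(c - 1) degrees of freedom, and phi = sqrt(χ² / N). The frequency table is invented and is not the textbook's babysitter data.

```python
from math import sqrt

def chi_square_independence(observed):
    """Chi-square test of independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for r, row in enumerate(observed):
        for c, o in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n  # (RT)(CT) / N
            chi2 += (o - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    phi = sqrt(chi2 / n)
    return chi2, df, phi

# Hypothetical counts: rows = babysitter / not, columns = first aid class yes / no
table = [[30, 10],
         [20, 40]]
chi2_obt, df, phi = chi_square_independence(table)
print(round(chi2_obt, 2), df, round(phi, 2))
```

    The obtained value is compared with the critical chi-square for the computed df from a table.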

    Summary:

    First consideration: determine whether to use a parametric or a nonparametric statistic. If the data are not normally distributed, use a nonparametric test; likewise if population parameters such as the mean and standard deviation are not provided, use a nonparametric test (Wilcoxon or chi-square). If the data are normal, use a parametric test, such as the t test.

    Second consideration: whether a between-participants or correlated-groups design has been used. P.248

    A nonparametric test is one that does not involve the use of any population parameters, such as the mean and standard deviation. In addition, a nonparametric test does not assume a bell-shaped distribution. The χ² test is nonparametric because it fits this definition.

    Chapter 11: Experimental Designs with More than Two Levels of an Independent Variable

    The experiments described in Chapter 9 involved manipulating one independent variable with only two levels (aka treatments): either a control group and an experimental group or two experimental groups. Researchers may want more than 2 levels of an independent var b/c they can compare multiple treatments, e.g. compare a placebo group w/ control/experimental groups (p.281).

    If group 1 is compared to group 2, 2 to 3, 3 to 4, and so on, the probability of a Type I error rises to $1 - (1 - \alpha)^c$, where c equals the number of comparisons performed. One way of counteracting this is to use a more stringent alpha level by performing the Bonferroni adjustment, in which the desired alpha level is divided by the number of tests or comparisons. However, the risk of a Type II error is increased. A better method is to use a single statistical test that compares all groups: ANOVA.
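
    The inflation of the Type I error rate and the Bonferroni adjustment are easy to tabulate. The sketch below assumes the comparisons are independent, which is what the 1 - (1 - alpha)^c expression presumes; the function names are chosen for this example.

```python
def familywise_error(alpha, comparisons):
    """Probability of at least one Type I error across independent comparisons."""
    return 1 - (1 - alpha) ** comparisons

def bonferroni_alpha(alpha, comparisons):
    """Per-comparison alpha after the Bonferroni adjustment."""
    return alpha / comparisons

for c in (1, 3, 6):
    print(c, round(familywise_error(0.05, c), 3), round(bonferroni_alpha(0.05, c), 4))
```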

    ANOVA is an inferential parametric statistical test for comparing the means of three or more groups that have interval or ratio data (p.286). If the data are ordinal, use the Kruskal-Wallis analysis of variance for a between-subjects design; for a within-subjects design, where the data are skewed and/or ordinal, use the Friedman rank test. If the data are nominal, use the chi-square test. If the Fobt value is greater than the Fcrit value, the results of the ANOVA indicate that at least one of the sample means differs significantly from the others. In that case, a post hoc test comparing each of the groups in the study with each of the other groups must be conducted to determine which ones differ significantly from each other, e.g. Tukey's HSD test (p.297). Also, see p.296 for the assumptions of the ANOVA (interval-ratio data, normally distributed, etc.).

    One-way randomized ANOVA


    A significant ANOVA result (i.e. F value) indicates that at least one of the sample means differs significantly from the others. To determine which means differ significantly from the others, one needs to perform a post hoc test (such as Tukey's HSD) (p.297). Assumptions (p.296): data are interval/ratio, normally distributed, observations are independent, etc. The term randomized indicates that participants are randomly assigned to conditions in a between-participants design. The term one-way indicates that the design uses only one independent variable. E.g. rote rehearsal vs. imagery vs. story-telling on # of words recalled. This is a design with one independent var with 3 levels. The null hypothesis is $H_0: \mu_1 = \mu_2 = \mu_3$. The alternative hypothesis is that at least one μ is not equal to another μ. When a researcher rejects H0 using an ANOVA, it means that the independent variable affected the dependent variable to the extent that at least one group mean differs from the others by more than would be expected based on chance.

    The grand mean is the mean performance across all participants in all conditions. Since none of the participants scored the grand mean, there is variability between conditions. Is this variability due to the independent var or due to error variance (chance or uncontrolled variables such as individual differences between participants)?

    Within-groups variance

    This is an estimate of the population error variance. Error variance can be ascertained by seeing the variability within

    each condition b/c. participants were treated similarly.

    Between-groups variance is made up of:
    o Systematic variance, due either to the effects of the independent variable or to uncontrolled confounding vars
    o Error variance

    The F-ratio

    If we assume that the systematic variance is due to the effects of the independent variable, then if the independent var

    has a strong effect, the F-ratio will be substantially greater than one; else it will be around 1. P.264

    Step 1: Sum of Squares (p.291): Several types of sums of squares (SS) are used in the calculation of an ANOVA; SSwithin + SSbetween = SStotal.

    Total sum of squares: $SS_{total} = \sum (X - \bar{X}_G)^2$, the sum of the squared deviations of each score from the grand mean $\bar{X}_G$. The squared deviations for all participants in all groups are added together to produce the total sum of squares value.

    Within-groups sum of squares: $SS_{within} = \sum (X - \bar{X}_g)^2$, where X is each individual score and $\bar{X}_g$ is the mean for each group or condition. This is the sum of the squared deviations of each score from its group or condition mean and is a reflection of the amount of error variance.

    Between-groups sum of squares: $SS_{between} = \sum [(\bar{X}_g - \bar{X}_G)^2 n]$. This is the sum of the squared deviations of each group's mean from the grand mean, multiplied by the number of participants (n) in each group. The between-groups variance is an indication of the systematic variance across the groups. The basic idea: if the independent var has no effect, the group means would be similar to the grand mean, and there would be little variance across conditions.

    Step 2: Mean Square (MS) is the mean squared deviation; it is an estimate of the variance between and within the groups. MSwithin and MSbetween are calculated by dividing each SS by the appropriate df. dftotal = N - 1, where N is the total number of subjects in the study; dfwithin = N - k, where k = # of groups; dfbetween = k - 1. Note that if the dfwithin number is not present in the table at the back, use the next lowest number (because when df values decrease, the critical value increases) (p.294).


    Step 3: Calculate the F-ratio, F = MSbetween / MSwithin (p.293).

    In APA format, to say that a test with a between-groups df of 2 and a within-groups df of 21 has a value of 11.07 and is significant at the 0.01 level, we write: F(2, 21) = 11.07, p < .01.
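
    Steps 1 to 3 of the one-way randomized ANOVA can be sketched as follows. The recall scores are hypothetical, and eta-squared is included on the assumption that it is computed as SSbetween / SStotal.

```python
from statistics import mean

def one_way_anova(groups):
    """One-way randomized ANOVA from a list of groups (each a list of scores)."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Step 1: sums of squares
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Step 2: degrees of freedom and mean squares
    k, n_total = len(groups), len(all_scores)
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    # Step 3: F-ratio (plus eta-squared as an effect size)
    return {"SS_total": ss_total, "SS_within": ss_within, "SS_between": ss_between,
            "df_between": df_between, "df_within": df_within,
            "F": ms_between / ms_within, "eta_squared": ss_between / ss_total}

# Hypothetical words recalled under rote rehearsal, imagery, and story-telling
anova = one_way_anova([[2, 4, 3], [5, 6, 7], [8, 9, 10]])
print(anova)
```

    In this made-up data set the sums of squares satisfy SSwithin + SSbetween = SStotal (6 + 54 = 60), which is a useful check on hand calculations.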


    within-groups sum of squares, the error sum of squares is left. Thus the error sum of squares, $SS_{error}$, equals $SS_{within} - SS_{participants}$.

    Step 3: calculate F = MSbetween / MSerror (p.302)

    Each MS (mean square) is the SS divided by its df. dftotal = N - 1, where N is the total number of scores in the study; dfparticipants = n - 1, where n is the number of participants (p.304); dfbetween = k - 1, where k is the # of conditions; dferror = dfbetween × dfparticipants. In Table A.8, use dfbetween and dferror to find the Fcv.

    Effect size in the repeated measures ANOVA is calculated similarly to one-way ANOVA (p.280).
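
    A sketch of the repeated measures calculation follows. Because the passage above is incomplete, this assumes the standard decomposition: SSparticipants is the sum over participants of k × (participant mean - grand mean)², and SSerror = SSwithin - SSparticipants. The data are hypothetical, with rows as participants and columns as conditions.

```python
from statistics import mean

def repeated_measures_anova(scores):
    """One-way repeated measures ANOVA; scores[i][j] = participant i, condition j."""
    n = len(scores)       # number of participants
    k = len(scores[0])    # number of conditions
    all_scores = [x for row in scores for x in row]
    grand = mean(all_scores)
    ss_total = sum((x - grand) ** 2 for x in all_scores)
    ss_between = sum(n * (mean(col) - grand) ** 2 for col in zip(*scores))
    ss_participants = sum(k * (mean(row) - grand) ** 2 for row in scores)
    ss_within = ss_total - ss_between
    ss_error = ss_within - ss_participants   # assumed decomposition, see lead-in
    df_between, df_participants = k - 1, n - 1
    df_error = df_between * df_participants
    f_ratio = (ss_between / df_between) / (ss_error / df_error)
    return {"SS_between": ss_between, "SS_participants": ss_participants,
            "SS_error": ss_error, "df_between": df_between,
            "df_error": df_error, "F": f_ratio}

# Hypothetical scores for three participants tested in three conditions
rm = repeated_measures_anova([[1, 4, 4],
                              [2, 3, 7],
                              [3, 5, 7]])
print(rm)
```

    Removing the participant variability shrinks the error term, which is the source of the extra statistical power in within-participants designs.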

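    Tukey's post hoc HSD test: $HSD = Q(k, df_{error}) \sqrt{\frac{MS_{error}}{n}}$, where Q is the studentized range value from the table, k is the number of conditions, and n is the number of scores per condition.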

    Chapter 12: Complex Experimental Designs p.316

    In the previous chapter, we discussed designs with more than two levels of an independent variable. In this chapter, we will look at designs with more than one independent variable: factorial designs (p.316). A complete factorial design is one in which all levels of each independent variable are paired with all levels of every other independent variable. In an incomplete factorial design, not all levels are paired with all levels of every other var.

    The factorial notation for a factorial design is determined as follows: (number of levels of independent variable 1) X (number of levels of independent variable 2) X ...

    Thus, a 3 X 6 factorial design is one with two independent variables, the first one of which has 3 levels and the second one, 6 levels, for a total of 18 possible conditions. It is not possible to have a 1 X 3 factorial design, b/c a variable with only one level does not vary.
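
    The notation and the number of conditions can be generated mechanically. In this sketch the level labels are placeholders, and the two-level minimum per variable is enforced as described above.

```python
from itertools import product
from math import prod

def factorial_design(*level_lists):
    """Return (notation, list of conditions) for a complete factorial design."""
    if any(len(levels) < 2 for levels in level_lists):
        raise ValueError("each independent variable needs at least two levels")
    notation = " X ".join(str(len(levels)) for levels in level_lists)
    conditions = list(product(*level_lists))  # every level paired with every other
    return notation, conditions

# Placeholder level labels for a 3 X 6 design
iv1 = ["a1", "a2", "a3"]
iv2 = ["b1", "b2", "b3", "b4", "b5", "b6"]
notation, conditions = factorial_design(iv1, iv2)
print(notation, len(conditions), prod(len(v) for v in (iv1, iv2)))
```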

    A main effect is an effect of a single independent variable. The main effect of each independent variable tells us about the relationship between that single independent variable and the dependent variable. In other words, do different levels of one independent variable bring about changes in the dependent variable? For example, in a study about the effects of different rehearsal types (rote, imagery) and different word types (concrete, abstract) on memory, rehearsal type and word type are the independent variables, and memory is the dependent variable (p.317). There can be as many main effects as there are independent variables. An interaction effect is the effect of each independent variable across the levels of the other independent variable.

    The relationship can be graphed. The dependent variable always goes on the y-axis. One independent variable is placed on the x-axis, and the levels of the other independent variable are captioned in the graph (p.294). The possible outcomes of a 2 X 2 factorial design come from three yes/no questions: Main effect of A? Main effect of B? Interaction effect? So 2*2*2 = 8 possible outcomes (p.296).

    Question p.322: How many main effect(s) and interaction effect(s) are possible in a 4 X 6 factorial design? A 4 X 6

    factorial design has two independent variables. Thus, there is the possibility of two main effects (one for each

    independent variable) and one interaction effect (the interaction between the two independent variables).

    Two-Way ANOVA p.323

    For the factorial designs discussed in this chapter, a two-way ANOVA would be used. The term two-way indicates that

    there are two independent variables in the study. As with one-way ANOVA, if either of the variables has an effect, the

    variance between the groups should be greater than the variance within the groups. In a 2 X 2 factorial design, such as

    the one we have been looking at in this chapter, there are three null and alternative hypotheses. The null hypothesis for

    factor A states that there is no main effect for factor A, and the alternative hypothesis states that there is an effect of

    factor A. A second null hypothesis states that there is no main effect for factor B. The third null hypothesis states that

    there is no interaction of factors A and B.


    Step 1: Calculate SStotal. This is calculated in the same manner as in one-way ANOVA. The dftotal is also the same: N - 1.

    Step 2: Calculate SSA (p.325). This is the sum of the squared deviations of each factor A group mean from the grand mean, multiplied by the number of scores in each factor A condition (column). The definitional formula is:

    $SS_A = \sum [(\bar{X}_A - \bar{X}_G)^2 n_A]$, where $\bar{X}_A$ is the mean for each condition of factor A, $\bar{X}_G$ is the grand mean, and $n_A$ is the number of people in each of the factor A conditions. dfA = the number of levels of factor A minus 1 (p.325). SSB is calculated similarly.

    Step 3: Calculate the sum of squares interaction (SSA X B):

    $SS_{A \times B} = \sum [(\bar{X}_C - \bar{X}_G)^2 n_C] - SS_A - SS_B$, where $\bar{X}_C$ is the mean for each condition (cell), $\bar{X}_G$ is the grand mean, and $n_C$ is the number of scores in each condition or cell. The degrees of freedom for the interaction are based on the number of conditions in the study. To determine the degrees of freedom across the conditions, we multiply the degrees of freedom for the factors involved in the interaction: dfA X B = (A - 1)(B - 1) (p.327).

    Step 4: Calculate the sum of squares error (SSError). This is the sum of the squared deviations of each score from its condition (cell) mean:

    $SS_{Error} = \sum (X - \bar{X}_C)^2$. dfError is calculated as follows: the number of conditions in the study is multiplied by the number of participants in each condition minus the one score not free to vary, or AB(n - 1) (p.303).

    In the table below, A = # of conditions in A (e.g. concrete vs. abstract), B = # of conditions in B (e.g. rote vs. imagery)

    To determine the Fcritical value in Table A.8, we use dferror running down the left side of the table and the dfbetween

    running across the top of the table. p.329 However, note that there are three dfbetween values and thus three Fcv

    values. For factor A, dfbetween is dfA; for factor B, dfbetween is dfB; for the interaction, dfbetween is dfinteraction. If FA is

    significant, this means that there was a significant main effect for factor A.

    Note that Tukey's post hoc test needs to be completed only if either or both of the independent variables have more than two levels (assuming that the main effects are significant to begin with). E.g. in a 2 X 6 factorial design for which both main effects are significant, the post hoc test needs to be calculated only for the independent variable that has six levels, to determine which pairs of these six levels differ significantly (p.331).

    eta-squared = SSbetween/SStotal; here SSbetween equals SSA, SSB, and SSA X B, respectively (p.331).
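
    Putting the four steps together, a two-way ANOVA can be sketched as below. This assumes equal numbers of scores per cell and the usual definitional formulas (the interaction SS is the cell-based SS minus SSA and SSB); the data and level names are hypothetical.

```python
from statistics import mean

def two_way_anova(cells):
    """cells maps (level_of_A, level_of_B) -> list of scores (equal n per cell)."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    all_scores = [x for s in cells.values() for x in s]
    grand = mean(all_scores)

    def ss_factor(levels, index):
        # squared deviation of each level mean from the grand mean, times its n
        total = 0.0
        for level in levels:
            pooled = [x for key, s in cells.items() if key[index] == level for x in s]
            total += len(pooled) * (mean(pooled) - grand) ** 2
        return total

    ss = {"total": sum((x - grand) ** 2 for x in all_scores),
          "A": ss_factor(a_levels, 0),
          "B": ss_factor(b_levels, 1)}
    ss_cells = sum(len(s) * (mean(s) - grand) ** 2 for s in cells.values())
    ss["AxB"] = ss_cells - ss["A"] - ss["B"]
    ss["error"] = sum((x - mean(s)) ** 2 for s in cells.values() for x in s)

    n_cell = len(next(iter(cells.values())))
    df = {"A": len(a_levels) - 1, "B": len(b_levels) - 1}
    df["AxB"] = df["A"] * df["B"]
    df["error"] = len(a_levels) * len(b_levels) * (n_cell - 1)  # AB(n - 1)

    ms_error = ss["error"] / df["error"]
    f = {src: (ss[src] / df[src]) / ms_error for src in ("A", "B", "AxB")}
    eta_sq = {src: ss[src] / ss["total"] for src in ("A", "B", "AxB")}
    return ss, df, f, eta_sq

# Hypothetical recall scores: factor A = word type, factor B = rehearsal type
data = {("concrete", "rote"): [3, 5], ("concrete", "imagery"): [7, 9],
        ("abstract", "rote"): [3, 5], ("abstract", "imagery"): [3, 5]}
ss, df, f, eta_sq = two_way_anova(data)
print(ss, df, f, eta_sq)
```

    Each of the three F values is compared with its own critical value, using the matching dfbetween (dfA, dfB, or dfinteraction) and dferror.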

    Chapter 13: Quasi-Experimental and Single-Case Designs

    Non-manipulated independent variables (aka participant vars, e.g. gender, age, ethnicity, political affiliation): as with experimental studies, groups are compared and hypotheses regarding causality are tested; however, the participants are not assigned randomly and the groups occur naturally (p.345).


    Single-group posttest-only design: involves the use of a single group of participants to whom some treatment is given. There is neither a comparison group nor a comparison of the results to any previous measurements.
    Single-group pretest/posttest design: an improvement over the posttest-only design in that measures are taken twice, before the treatment and after the treatment.
    Single-group time-series design: involves using a single group of participants, taking multiple measures over a period of time before introducing the treatment, and then continuing to take several measures after the treatment.
    Nonequivalent control group posttest-only design: similar to the single-group posttest-only design; however, a nonequivalent control group is added as a comparison group. Nonequivalent means that group membership is not random, but already established. Thus, the differences observed between the two groups on the dependent variable may be due to the nonequivalence of the groups and not to the treatment (p.323).
    Nonequivalent control group pretest/posttest design: an improvement over the previous design through the addition of a pretest measure. A pretest allows us to assess whether the groups are equivalent on the dependent measure before the treatment is given to the experimental group.
    Multiple-group time-series design: the logical extension of the previous design is to take more than one pretest and posttest. Several measures are taken on nonequivalent groups before and after treatment.

    Internal validity is the extent to which the results of an experiment can be attributed to the manipulation of the independent variable, rather than to some confounding variable. Because participants are not randomly assigned, quasi-experimental designs have weaker internal validity than true experiments (p.325).

    Statistical Analysis:

    Depending on the type of data (nominal, ordinal, or interval-ratio), the number of levels of the independent variable, the

    number of independent variables, and whether the design is between-participants or within-participants, we choose the

    appropriate statistic as we did for the experimental designs.

    Cross-sectional Designs p.352

    Researchers study individuals of different ages at the same time. The advantage of this design is that a wide variety of

    ages can be studied in a short period of time. The main issue is that the researcher is typically attempting to determine

    whether or not there are differences across different ages; however, the reality of the design is such that the researcher

    tests not only individuals of different ages but also individuals who were born at different times and raised in different

    generations or cohorts, so rather than testing age differences, may be testing generational differences.

    Longitudinal Design

    With a longitudinal design, the same participants are studied repeatedly over a period of time. Disadvantage: attrition; participants who drop out may differ from those who remain in the study.

    Sequential Designs

    A researcher begins with participants of different ages (a cross-sectional design) and tests or measures them. Then, a number of months or years later, the researcher retests or measures the same individuals (a longitudinal design) (p.352).

    Single-case research: versions of a within-participants experiment in which only one person is measured repeatedly. Often the research is replicated on one or two other participants. Thus, we sometimes refer to these studies as small-n designs.

    A reversal design is a within-participants design with only one participant in which the independent variable is introduced and removed one or more times.

    o An ABA reversal design involves taking baseline measures (A), introducing the independent variable (B) and measuring behavior again, and then removing the independent variable and retaking the baseline measures (A). The reversal controls for confounds that may be changing the dependent variable.

    o The ABAB reversal design involves reintroducing the independent variable after the second baseline measurement.

    Multiple-baseline designs: Because single-case designs are a type of within-participants design, carryover effects from one condition to another are of concern.


    In a multiple-baseline design we therefore assess the effect of introducing the treatment over multiple participants, behaviors, or situations. We control for confounds not by reversing back to baseline after treatment, as in a reversal design, but by introducing the treatment at different times across different people, behaviors, or situations (p.331). This makes it unlikely that some other extraneous variable produced the results.

    o Multiple baselines across participants: introduce the treatment at different times for different participants.

    o Multiple baselines across behaviors: an alternative multiple-baseline design uses only one participant and assesses the effects of introducing a treatment over several behaviors. E.g. first introduce the treatment for aggressive behaviors, then days later for talking out of turn, then days later for temper tantrums.

    o Multiple baselines across situations: introduce the treatment across different situations. E.g. treat first for bad behavior in math class, then days later for bad behavior in English class. Introducing the treatment at different times in the two classes minimizes the possibility that a confounding variable is responsible for the behavior change.